Much research has analysed and compared Multi Microphone Arrays for surround sound music recording (Heck & Rieseberg, 2001; Preston, 2003; Schellstede & Faller, 2007; Williams & Le Du, 1999; Williams & Le Du, 2000). However, relatively few studies directly compare the Soundfield system with Multi Microphone Arrays in the context of classical music recording. The relevant studies are discussed in the following sections.
Kassier et al. (2005) tested a selection of Multi Microphone Arrays for surround sound recording in a variety of musical styles performed by soloists or accompaniments. The aim of the paper was to fill a gap left by previous microphone comparisons, which did not record their musical sources simultaneously, meaning that real-time switching between arrays was impossible. Simultaneous recording of a piece of music by two or more arrays means that differences in the performance from take to take cannot influence participants' decisions when they are presented with comparisons. The paper set out to build an understanding of the practicalities of surround sound recording of classical music for future research by providing the multichannel recordings. Recording arrays were split into front and rear segments.
The front segment recording arrays used were:
- Variant of the Fukada Tree,
- Variant of the OCT technique,
- A near coincident technique designed by an author of the paper.
The rear segment arrays used were:
- IRT Cross variant,
- Hamasaki Square,
- Cortex MK2,
- Spaced Cardioid.
In addition to the main aims of the project, an informal comparison was made using a small sample of expert listeners. The results of this comparison show that the Fukada Tree was the most preferred front segment array and the Hamasaki Square the most preferred rear segment array. The authors warned that their comparison was informal and that the use of different makes and models of microphone across the arrays may have influenced the results. Because expert listeners were used, and because the results are supported by other, more formal comparisons using similar arrays, it can be argued that this is not a significant issue. However, the results cannot be said to be representative of all listeners, as naïve listeners were not used in the testing.
Kornacki et al. (2001) made recordings of a classical quartet with a selection of surround sound Multi Microphone Arrays. The aim of the paper was to establish a recommended recording array for engineers to use in situations, rather than rely on the inconsistent set of recommendations which the authors felt preceded it. The results show that the most preferred technique was the Double ORTF, followed closely by the Decca Tree. These results back up those of Kassier et al. (2005), where the most preferred array included the Fukada Tree, which is a variant of the Decca Tree (Fukada, Tsujimoto, & Akita, 1997). It was not stated whether the recordings were made simultaneously.
Sungyoung et al. (2006) used a selection of Multi Microphone Arrays and the Soundfield system to record solo piano recitals. The aim of the research was to investigate the importance of musical genre on listeners’ preference. The results show that the Soundfield was not highly regarded in testing and that the Fukada Tree was consistently one of the most preferred arrays, if not the most preferred, across the musical selections. These results are important because the musical selections covered different styles of playing. This gives further credence to the Decca Tree and Fukada Tree arrays as the most preferred representatives of the Multi Microphone Array technique.
Paquier & Koehl (2011) used expert and naïve listeners to compare two Multi Microphone Arrays and two ambisonic surround recording techniques, the latter including the Soundfield system, when recording a big band. The aim of the paper was to compare these arrays with respect to listener preference and to four attributes associated with the assessment of surround sound. Results showed that expert listeners preferred the Multi Microphone Array technique, which was an OCT Surround array. Naïve listeners preferred the higher order ambisonics recording. The equipment used to create the higher order ambisonic recording was not listed. The authors state that the results of the attribute section of the test were not enough to fully explain the preference results; however, they speculate that the quality of the microphone capsules across each array may have been the cause of this. It should also be noted that some participants were disturbed by the level of direct sound in the rear of the ambisonic recordings.
This section discusses the methodologies of the research presented in Sections 2.1.1 and 2.1.2 to analyse their positive and negative aspects. Once established, these aspects are then considered in the development of the testing methodology used in this project which is outlined in Section 4.
Two conclusions can be drawn at this point. The first is that Multi Microphone Arrays based on the Fukada Tree or Decca Tree are among the most popular, as they are frequently represented and favoured in past studies. The second is that the Soundfield microphone is not considered to perform well for surround sound production of classical music.
However, in most of the studies presented, either very small or very large musical ensembles were used. Sungyoung et al. (2006) and Kassier et al. (2005) used solo and/or accompaniments for their recordings. Soloists can provide a single point of reference for image focus and clarity; however, they may not do enough to challenge the array in the area of localisation or image stability. Additionally, a solo instrument may not excite the room in as acoustically significant a way as a larger ensemble, whose multiple instruments cover a wider overall audio spectrum and dynamic range.
Paquier and Koehl (2011) used a twenty-piece big band for their recordings. A common technique when recording large ensembles is to employ accent microphones. The job of these microphones, which are placed close to certain sections of a musical ensemble, is to add clarity and definition to that section (Moylan, 2007, pp. 294 – 295). For example, the more delicate sections of a choir or brass band which need to be clearly defined in the mix would invariably require accent microphones to be added to the stereo or front image of the mix and manipulated in level as required. These microphones can ultimately be seen as a support for the main array, which is otherwise being overloaded by the size of the ensemble. The addition of accent signals would improve the front image but manipulate it in such a way that the fundamental array performance cannot be accurately judged.
Kornacki et al. (2001) used a string quartet for their comparison of Multi Microphone Arrays. The use of a string quartet can be seen as a reasonable middle ground between soloists, who may not challenge a recording array, and larger ensembles, which would exceed the capabilities of an array. Quartets are wide enough to challenge an array in terms of image characteristics, while the combination of a number of different sound sources and playback spectra around the performance space creates a complex reverberation pattern which will be picked up by the rear segments of the test arrays. However, the study, which included the Decca Tree, did not use the Soundfield system.
The results of the research outlined indicate that the Soundfield microphone is not the best choice for a recording engineer when recording a classical ensemble in stereo or surround sound. However, given that the research of Paquier and Koehl (2011) used a very large ensemble, no array could reasonably be expected to perform at its best in that situation. Sungyoung et al. (2006) used very small classical music sources which may not translate to the application of recording a larger ensemble. Therefore, further research which makes use of other appropriately sized musical sources with the Soundfield would be beneficial and contribute positively to the current body of knowledge.
It would be a misuse of resources and time to repeat past research by using a number of Multi Microphone Arrays. Instead, drawing on the results of the comparisons outlined thus far, the use of a Multi Microphone Array based on the Decca and Fukada trees means that one of the most preferred Multi Microphone Arrays is tested against the Soundfield, which gives the research a tighter focus.
The studies of Kassier et al. (2005) and Kornacki et al. (2001) highlight issues which include the use of different models and qualities of microphone within the Multi Microphone Arrays. To eliminate these issues, professional quality microphones of the same type should be used in all Multi Microphone Arrays in this project.
Listening tests are used to compare, rate or assess the quality of various test stimuli. In a listening test, a participant sits in an acoustically suitable environment and interacts with a test interface which plays back audio stimuli and allows them to answer questions based on what they heard.
Kassier et al. (2005), Sungyoung et al. (2006) and Paquier & Koehl (2011) all use listening tests to compare the recording arrays in order to meet their respective research aims. This section discusses the components required to create a robust listening test methodology.
Two groups of test participant are available for this type of project: expert and naïve. To be considered an expert, a listener must have experience in some area of audio engineering, with skills in critical listening. This experience results in the ability to objectively judge aspects of the sound source (Moylan, 2007, p. 89). Naïve listeners do not have these critical listening abilities and therefore require training in the related areas (Bech & Zacharov, 2006, pp. 310 – 315).
The research of Kassier et al. (2005) and Sungyoung et al. (2006) used expert listeners as part of their tests. The term expert is used to describe listeners who either work in audio or have a musical background. Paquier & Koehl (2011) used a nearly equal number of expert and naïve listeners. The opinions of these two listener types can greatly aid the research, as similarities or differences between the groups' results can highlight significant issues to consider. In the context of this project, obtaining enough listeners of each type may prove difficult, but both should be sought where possible.
In objective audio testing, an audio signal is analysed using hardware or software analysers which measure objective metrics, such as the readings outlined in Section 3.5.3 (Cox, Objective Metrics, 2013). In subjective testing, a sample of participants is asked for their opinion on aspects of an audio stimulus (Cox, Subjective Methods, 2013). In the context of this project, subjective testing is the most suitable method of arriving at a conclusion which meets the project aims, as it requires asking a sample of listeners their opinions on test stimuli.
Kornacki et al. (2001) used parametric and non-parametric methods of pairwise comparison testing, where answers between two test stimuli are given on scales or as categories respectively (Sprent, 1989, pp. 1 – 2). In a non-parametric pairwise comparison of recording extracts, participants are asked questions to which the answers are binary and categorical. The comparison allows two or more options to be rated against each other (Sprent, 1989, pp. 20 – 47). Answers can be one or the other, like or dislike, narrow or wide, etc. Because a pairwise comparison collects categorical answers, it eliminates many of the possible causes of scale based bias outlined in Section 2.2.3; however, this is at the expense of detailed answers. The forms of bias should still be appreciated and considered in the design of the listening tests.
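As an illustration of how categorical pairwise answers might be collected and counted, the following Python sketch enumerates every pair from a set of stimuli and tallies how often each stimulus was chosen. The function names and the placeholder labels are hypothetical and are not taken from any of the studies cited.

```python
from collections import Counter
from itertools import combinations


def all_pairs(stimuli):
    """Return every unordered pair of stimuli, in input order."""
    return list(combinations(stimuli, 2))


def tally_preferences(answers):
    """Count how often each stimulus was chosen.

    `answers` is an iterable of (pair, chosen) tuples, where `pair` is a
    two-item tuple of stimulus labels and `chosen` is the label selected
    by the participant for that pair.
    """
    wins = Counter()
    for (first, second), chosen in answers:
        if chosen not in (first, second):
            raise ValueError(f"{chosen!r} is not part of the pair presented")
        wins[chosen] += 1
    return wins


# Hypothetical example: three participants answer one preference question.
pair = ("Array A", "Array B")
answers = [(pair, "Array A"), (pair, "Array A"), (pair, "Array B")]
print(tally_preferences(answers))
```

The tally keeps only which option was chosen, which reflects the trade-off described above: the data are simple to analyse but carry no information about the strength of a preference.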
A paper by Rumsey, Zielinski, & Bech (2008) outlines possible forms of bias which can be found when conducting listening tests. Although the paper was written around scale based testing types such as MUSHRA (ITU, BS.1534-1, 2003), the following sources of bias should still be considered (Rumsey, Zielinski, & Bech, 2008).
The recency effect is a situation where a participant's preference is biased towards the most recently heard sound source. In a test environment where two sources are played in succession, the latter may tend to be preferred. The use of stimuli with short durations is recommended, as is randomising the playback order of test stimuli.
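A minimal sketch of how playback order might be randomised is given below in Python. It assumes, beyond what the paper recommends, that each pair is presented once in each order so that neither stimulus is heard last more often than the other. The function name and the seed parameter are hypothetical.

```python
import random


def randomised_trials(pairs, seed=None):
    """Build a shuffled trial list with each pair presented in both orders.

    Presenting (a, b) and (b, a) counterbalances the recency effect;
    shuffling prevents a predictable sequence of trials.
    """
    rng = random.Random(seed)  # a fixed seed makes a session reproducible
    trials = []
    for first, second in pairs:
        trials.append((first, second))
        trials.append((second, first))
    rng.shuffle(trials)
    return trials


# Hypothetical example with a single pair of arrays.
for played_first, played_second in randomised_trials(
    [("Multi Microphone Array", "Soundfield")], seed=1
):
    print(played_first, "then", played_second)
```

Counterbalancing doubles the number of trials, so it has to be weighed against the risk of presenting participants with too many stimuli.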
The paper describes bias due to participants' interactions with the test environment, emotions or preferences. One method of bias reduction listed is to use a large sample base with multiple backgrounds. Achieving a range of backgrounds may be impractical in a university project environment with limited time frames; however, a large sample size should be sought.
The paper goes into detail about ensuring that participants are not presented with too many stimuli and questions. In a test where scales are used, results may be contaminated by participants either using extremes of the scales or following a pattern of using middling scale values. The issue is less important for the collection of categorical data such as like or dislike.
Paquier & Koehl (2011) asked test participants to answer questions based on certain sonic attributes. By gathering data about the test material in this way, the reasons why a particular array is preferred over another can be investigated.
Choisel & Wickelmaier (2007) found that consistent judgments of naïve listeners could be obtained on their preference and assessment of attributes of a sound source when provided with the attribute definitions. By using a similar technique, the reasons behind the preference may be indicated for further investigative work. The attributes chosen for this purpose are spaciousness, envelopment, clarity and naturalness. The definitions used by Choisel & Wickelmaier (2007) as supplied to test participants are given in Table 2.1.
|Attribute|Definition|
|---|---|
|Spaciousness|A sound is said to be spacious when you have a good impression of the space which it is played. Try to imagine this space; it can be a small room for example, or a large hall. Select the sound which the impression of the space is greater.|
|Envelopment|A sound is enveloping when it wraps around you. A very enveloping sound will give you the impression of being immersed in it, while a non-enveloping one will give you the impression of being outside of it.|
|Clarity|The clearer the sound, the more details you can perceive in it, choose the sound that appears clearer to you.|
|Naturalness|A sound is natural if it gives you a realistic impression, as opposed to sounding artificial.|
Table 2.1 – Attribute Definitions
The ability of an array to convey spaciousness and envelopment is essential. Without a sense of space, the listener cannot be enveloped in it. The use of the surround signals is then required to enable the envelopment of listeners when playing back appropriate material (ITU, BS.775-3, 2012, p. 7).
The Multi Microphone Array is a spaced technique while the Soundfield is a coincident technique. Each operational method has inherent differences with respect to their recorded images as highlighted in Section 1.1.2. The definition of instrumentation and room ambience could be positively or negatively affected by each array type.
Given the significant physical and operational differences between recording arrays outlined and with respect to the sense of space and envelopment discussed previously, it may be possible that listeners feel one array sounds more natural than the other.
The suitability of the definitions chosen, and the ability to test them reliably, was assessed with an online experiment. Binaural reproduction allows the three dimensional sound space to be reproduced over headphones (Eargle, 2005, pp. 187 – 191). A selection of mixes was exported, in some cases using the binaural function of the Harpex-B plugin, in the following ways:
- To test the spaciousness of sounds, a stereo mix was sourced from the front left and right Multi Microphone Array microphones. A second mix was sourced in the same way with the addition of the ambient signals from the rear Multi Microphone Array microphones. These mixdowns were presented to participants who were asked which mix sounded more spacious.
- To test the envelopment of sounds, a binaural mixdown from Harpex-B was used in conjunction with a stereo mix sourced from the left and right Multi Microphone Array microphones.
- To test the clarity between audio extracts, stereo mixdowns of the Multi Microphone Array and Soundfield arrays were presented to participants who were asked which mix sounded clearer.
- To test whether pieces sound natural, a mono mix was sourced from the centre microphone of the Multi Microphone Array and compared with a stereo mix sourced from the left and right Multi Microphone Array microphones.
These extracts were then hosted on SoundCloud and included on a dedicated survey page of a website (Kelly, 2014). The web test presented listeners with four pairwise comparison tests, one for each definition. Participants were asked to select which recording extract displayed more of each attribute.
The link to the test was distributed through Twitter, Facebook and internet forums covering music and audio engineering. By using the binaural mixdown process, surround sound signals could be played back over headphones. Participants were asked to supply information about the headphones being used for post screening purposes.
In total, thirty participants took part. The answers of ten participants were discounted because they supplied insufficient information about the headphones used. Across the remaining twenty sets of answers, all definitions were supported, with results meeting a minimum 95% confidence level. Results obtained with lower quality headphones, such as personal media player ear buds, matched those obtained with higher quality equipment such as the Beyerdynamic DT150. The results of this experiment serve to confirm the conclusions of Choisel & Wickelmaier (2007). Results are listed in Appendix B.5.
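The statistical test behind the 95% figure is not detailed in this section. One common way to attach a confidence level to a two-alternative choice is an exact one-sided binomial test against chance, and the Python sketch below assumes that test purely for illustration; it is not necessarily the analysis used for the results in Appendix B.5. The function names are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Probability of k or more agreeing answers out of n under chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_agreeing(n, alpha=0.05):
    """Smallest number of agreeing answers whose tail probability is <= alpha."""
    for k in range(n + 1):
        if binomial_tail(k, n) <= alpha:
            return k
    return None  # no count reaches significance for this n


# With twenty usable participants, at least 15 must choose the same extract.
print(min_agreeing(20))                  # 15
print(round(binomial_tail(15, 20), 4))   # 0.0207
```

Under this assumption, 14 agreeing answers out of 20 would not be sufficient, since the corresponding tail probability is approximately 0.058.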
With respect to the current body of research, there is scope for a focused study into the comparison of a Multi Microphone Array and the Soundfield microphone in the surround sound recording of classical music.
The ensemble size in previous research can be seen as influencing the results of the respective studies. A string quartet offers a balanced ensemble size: large enough to challenge an array, yet small enough not to require reinforcement with accent microphones.
Past research has compared techniques for a variety of reasons, using the assessment of listener preference as the basis for its results. In these studies, the Fukada and Decca Tree techniques are frequently the most favoured arrays. By using a derivative of these arrays, one of the most preferred arrays can be compared against the Soundfield system, which avoids duplicating past research and ensures that resources are used efficiently.
Past research used different forms of testing methodology to meet its aims. A testing methodology using non-parametric pairwise comparisons has been selected as it provides clear preference results from test participants. In addition, the assessment of sound attributes has been selected to allow the possible reasons behind preference to be established.