3 – Audio Production

3.1           Recording

In association with the Music Department and the Sonic Arts Research Centre of Queen’s University Belfast, the Ulster String Quartet was recorded in the Harty Room of the Music Department on October the 10th, 2013 (The Ulster String Quartet, 2014; Queen’s University Belfast, 2014; The Sonic Arts Research Centre, 2014).

During the three-hour recording session, a number of classical pieces were recorded using two arrays: the Fukada Tree Multi Microphone Array, which was adapted for the session, and the Soundfield array microphone. With the microphone arrays set up in the Harty Room, the array signals were routed to the Pro Tools HD equipped Harrison Studio. An image of the studio is shown in Figure 3.1. Monitoring was provided by five Genelec 1032A monitors (Genelec, 2014). Recordings were made at a sample rate of 48 kHz and a bit depth of 24 bits. These settings were retained throughout the testing process.


Figure 3.1 – The Queen’s University Harrison Studio

3.2           Harty Room Specification

The Harty Room is part of the School of Music at Queen’s University Belfast. It is primarily used for classes, concerts and recording sessions by students of the Sonic Arts Research Centre. The room has an estimated volume of 1150 m³ and a reverberation time of 1.4 seconds (Kuster, 2008, p. 983). As the time available in the Harty Room was limited, the critical distance of the room could not be measured and was instead approximated as 1.64 metres, calculated using the formula detailed in Equation 3.1 (Sengspiel, 2014).


$$d_c \approx 0.057\sqrt{\frac{V}{RT_{60}}}$$

where $V$ is the room volume in m³

and $RT_{60}$ is the reverberation time of the room in seconds

Equation 3.1 – Sabine Critical Distance Approximation
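The approximation can be checked numerically. The sketch below assumes the commonly quoted Sabine-based form of the critical distance for an omnidirectional source, in which the coefficient 0.057 multiplies the square root of volume over reverberation time; the coefficient and the function name are assumptions rather than details taken from the text.

```python
import math


def critical_distance(volume_m3: float, rt60_s: float, coefficient: float = 0.057) -> float:
    """Approximate critical distance in metres.

    Assumes the Sabine-based form d_c = coefficient * sqrt(V / RT60)
    for an omnidirectional source (coefficient 0.057 is an assumption).
    """
    if volume_m3 <= 0 or rt60_s <= 0:
        raise ValueError("volume and reverberation time must be positive")
    return coefficient * math.sqrt(volume_m3 / rt60_s)


# Harty Room figures quoted in the text: V = 1150 m^3, RT60 = 1.4 s
harty_room_dc = critical_distance(1150.0, 1.4)
print(f"Approximate critical distance: {harty_room_dc:.2f} m")
```

With the quoted room figures this form evaluates to roughly 1.63 m, which is within a centimetre of the 1.64 m used in this project; the small difference may come from rounding or a slightly different coefficient.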

The room is in the shape of a cross, which can be seen in Figure 3.2. With the stage at one end of the room (image left), the rear microphones of a surround array have been known to pick up early reflections from the alcoves at the centre of the cross. A more sonically pleasing reverberation sound can be captured by pushing the rear section of a surround array slightly further back. This issue was observed during the quartet setup and the required adjustments were implemented for this project.


Figure 3.2 – Aerial shot of the Harty Room (Google, 2014)

3.3           Choice of Recording Arrays

Based on its performance in past research and considering the performance environment, a variant on the Fukada Tree was selected for the Multi Microphone Array recording. The Fukada Tree replaces the Neumann M50 microphones outlined in the original Decca Tree design with cardioid microphones. The M50, shown in Figure 3.3, is an omni-directional microphone; however, it shows directional characteristics higher up its frequency response (Eargle, 2005, p. 183). The use of a cardioid based array allows more directionality and control of ambience during the production.


Figure 3.3 – Neumann M50 microphone internals (Recording Hacks, 2014)

With reference to Figure 3.4, the left and right microphones of the Fukada Tree are angled slightly away from the stage to give a recording angle of ~108°. Fukada also states that additional microphones to the left and right of the array, separate from the L, C and R microphones, should be used to improve low frequency response and to better capture a large ensemble’s width (Fukada, 2001).


Figure 3.4 – Basic Fukada Tree Layout

Although the array could have been set up exactly to the guidelines, each array should be adjusted to suit the recording environment and situation to ensure the recorded sound is at its best (Moylan, 2007, p. 294). A consideration of working in the Harty Room is that the building’s central heating system causes issues with the very low frequency elements of recordings. In addition, a string quartet does not require the reinforcement that the additional outer microphones provide; these microphones were therefore omitted from the recording array.

The recording angle was also adjusted to better suit the width of the stage and the musicians. The exact angle was determined after the quartet had set up on stage and is detailed in Section 3.4.

3.4           Microphone Setup Details


Figure 3.5 – Microphone Setup

With reference to Figure 3.5, L, C, R, LS and RS are the left, centre, right, left surround and right surround Multi Microphone Array microphones respectively. S is the Soundfield microphone. The front to rear distance of 220cm is not to scale in this figure.

Microphone C of the Multi Microphone Array was placed 175cm from the front of the performers. Microphone S was placed 225cm from the front of the performers which is 50cm behind Microphone C. These placement decisions were made after listening to the quartet warm up where a balanced sound was observed with respect to direct sound and ambience. The placement is 11cm beyond the critical distance of the space which was calculated in Section 3.2.

Microphones L, C and R were 190cm high, taking into account the stage height. Microphone S was 210cm high to allow it to clear microphone C. Microphones LS and RS were both 200cm high.

The ensemble was 213cm wide and 152cm deep. The centre point of the ensemble’s width was placed on the centre line of the Harty Room. This line then denoted the position of the centre and Soundfield microphones. The Soundfield microphone was placed between the left and right Multi Microphone Array microphones and directly behind the centre microphone.

3.5           Editing and Processing

The editing and processing of the recordings were completed using Steinberg’s Cubase 6.5 in the University of Salford Newton building studios A and D. Cubase was installed on a Dell E6530 and interfaced with the studios’ monitoring systems via a Focusrite Saffire Pro 24 audio interface. Studios A and D feature Genelec and Blue Sky surround sound monitoring systems respectively.

3.5.1       B-Format Processing

The Soundfield B-format decoders were not available for this project. Instead, the Harpex-B VST plugin was used to decode the B-format recording to a 5.1 surround sound signal (Harpex, 2014). This plugin accepts the W, X, Y and Z signals of the B-format signal and processes them to provide outputs such as binaural, stereo, surround and ambisonic. Listening tests show that the Harpex method provides high quality surround sound decoding of B-format signals (Berge & Barrett, 2010b).

After the recording session, the options provided in the Harpex-B plugin were used to create a surround sound signal conforming to the ITU-R BS.775-3 loudspeaker angles for a 5.1 setup (ITU, BS.775-3, 2012, p. 4). This derivation was used because it is the default output configuration presented to an engineer during a production.

The selling point of the Soundfield array is the ability to manipulate the recorded signals after recording, so a second derivation was created. By adjusting the rear angles of the Harpex-B plugin to 150°, a more pleasing reverberation signal was obtained.
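The effect of moving the rear angles can be illustrated with a much simpler decoder than the one used here. The sketch below is not the Harpex-B algorithm: it assumes a basic first-order decode in which each loudspeaker feed is a virtual cardioid pointed at the loudspeaker, with traditional B-format scaling (W attenuated by 3 dB), horizontal components only, and nominal ITU surround angles of ±110°. All names and the layout dictionaries are hypothetical.

```python
import math

# Azimuths in degrees, positive to the left (assumed convention).
ITU_ANGLES = {"L": 30.0, "C": 0.0, "R": -30.0, "LS": 110.0, "RS": -110.0}
ADJUSTED_ANGLES = {**ITU_ANGLES, "LS": 150.0, "RS": -150.0}


def encode_plane_wave(azimuth_deg: float) -> tuple[float, float, float]:
    """Horizontal B-format gains (W, X, Y) for a unit plane wave."""
    a = math.radians(azimuth_deg)
    return (1.0 / math.sqrt(2.0), math.cos(a), math.sin(a))


def virtual_cardioid(w: float, x: float, y: float, azimuth_deg: float) -> float:
    """Output of a virtual cardioid microphone aimed at azimuth_deg."""
    a = math.radians(azimuth_deg)
    return 0.5 * (math.sqrt(2.0) * w + x * math.cos(a) + y * math.sin(a))


def decode(w: float, x: float, y: float, layout: dict[str, float]) -> dict[str, float]:
    """One virtual cardioid feed per loudspeaker in the layout."""
    return {name: virtual_cardioid(w, x, y, az) for name, az in layout.items()}


# A source on the stage centre line (azimuth 0 degrees).
w, x, y = encode_plane_wave(0.0)
itu_gains = decode(w, x, y, ITU_ANGLES)
adjusted_gains = decode(w, x, y, ADJUSTED_ANGLES)
print(f"LS gain at 110 degrees: {itu_gains['LS']:.3f}")
print(f"LS gain at 150 degrees: {adjusted_gains['LS']:.3f}")
```

In this simplified model a frontal source contributes less to surround feeds aimed at ±150° than to feeds aimed at ±110°, which is one plausible reason the adjusted derivation sounded more ambient; the parametric decoding in Harpex-B will behave differently in detail.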

For use in this project, the 5.1 surround function was used. Harpex-B uses an omni-directional signal as the default source for the LFE (Harpex, 2011, p. 9). For the same reasons that the low frequency reinforcement microphones of the Fukada Tree were omitted from the Multi Microphone Array, the LFE signal was not used.

The plugin automatically applies phase shifts to simulate a microphone spacing of 17cm, similar to the approximate distance between the human ears. This default setting was retained for use in the tests (Harpex, 2011, p. 9).

Basis of Operation

Harpex-B uses a technique called parametric decoding to process the Soundfield B-format signals. This technique splits the incoming signal into frequency bands using a filter bank. Analysis of the contents of each filter bin can be used to determine directional information. Harpex-B also uses signal phase relationships for this purpose. Without this, sounds which occupy the same frequencies within a given filter bin would all be determined as coming from the direction of the loudest signal (Harpex, 2014; Berge & Barrett, 2010a; Berge & Barrett, 2010b).

3.5.2       Recording Extract Processing

Pro Tools HD was used for the recording as it was the digital audio workstation installed in the Harrison Studio. These recordings were then transferred to Cubase 6.5. The processing included:

  • Examining recorded material,
  • Selection and editing of suitable extracts,
  • Setting markers for exporting of the extracts,
  • Exporting of each extract with respect to desired output format,
  • Importing of the sections into a finalisation project,
  • Fades applied to beginning and end of extracts,
  • Amplitude examination and levelling,
  • Final export.

3.5.3       Amplitude Levelling

All recording extracts were individually run through the Steinberg SLM 128 plugin (Tischmeyer, 2012), which provided metering that met the EBU R128 loudness recommendations (European Broadcasting Union, 2011). The key figure assessed was the integrated LUFS value, which measures loudness from the beginning to the end of the programme material. By playing each recording extract in its entirety, its overall loudness can be objectively measured. After resetting the meter, the process was repeated for each remaining extract so that the extracts could be matched collectively.

Adjustments were then made to pre-export fader values with reference to the SLM 128 to ensure the integrated LUFS readings within each set of recordings matched to a tolerance of ±1 LU. This means that amplitude related preference on the part of the test participants would not be a factor, as the playback material would be the same loudness within each set (Katz, 2007, p. 168). Table 3.1 shows the LUFS readings for the arrays in each set of recordings.

Array (integrated loudness, LUFS) | Set 1 | Set 2 | Set 3 | Set 4
Multi Microphone Array | -27.8 | -18.2 | -18.1 | -18.9
Soundfield ITU | -27.7 | -18.0 | -18.1 | -18.8
Soundfield Adjusted | -27.7 | -18.3 | -18.1 | -18.8

Table 3.1 – LUFS figures for arrays in each musical piece or set.
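The matching can be verified directly from the tabulated readings. The following sketch computes, for each set, the difference between the loudest and quietest array and compares it with the tolerance. Treating the tolerance as a maximum spread of 1 LU is a conservative reading of the ±1 figure and is an assumption, as are the variable and function names.

```python
# Integrated loudness readings (LUFS) from Table 3.1, one value per set.
LUFS_READINGS = {
    "Multi Microphone Array": [-27.8, -18.2, -18.1, -18.9],
    "Soundfield ITU": [-27.7, -18.0, -18.1, -18.8],
    "Soundfield Adjusted": [-27.7, -18.3, -18.1, -18.8],
}
TOLERANCE_LU = 1.0  # assumed to apply to the spread within a set


def spread_per_set(readings: dict[str, list[float]]) -> list[float]:
    """Loudest minus quietest reading for each set, in LU."""
    columns = zip(*readings.values())
    return [round(max(column) - min(column), 1) for column in columns]


spreads = spread_per_set(LUFS_READINGS)
all_matched = all(spread <= TOLERANCE_LU for spread in spreads)
print(spreads, all_matched)
```

The largest spread is found in Set 2, and every set sits comfortably inside the tolerance under this reading.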

3.5.4       Extract Length

The ITU-R BS.1116-1 standard states that test material should be between ten and twenty-five seconds in length (ITU, BS.1116-1, 1997, p. 7). Assisted by the use of a professional ensemble, the examination of the recorded material confirmed that a composite take was not required and issues of noticeable edit points during testing could be eliminated. All test recording extracts were twenty seconds in length.
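As a quick check of what a twenty second extract represents at the project settings, the sketch below converts the duration to samples and to raw PCM size. The channel count of five (a 5.1 layout with the LFE unused) and all names are assumptions made for illustration.

```python
SAMPLE_RATE_HZ = 48_000
BIT_DEPTH = 24
EXTRACT_SECONDS = 20
MIN_SECONDS, MAX_SECONDS = 10, 25  # range quoted from ITU-R BS.1116-1
CHANNELS = 5  # assumption: L, C, R, LS, RS with the LFE unused


def samples_per_channel(seconds: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in one channel of an extract."""
    return seconds * sample_rate_hz


def pcm_bytes(seconds: int, channels: int, bit_depth: int = BIT_DEPTH) -> int:
    """Uncompressed PCM size in bytes, excluding any file header."""
    return samples_per_channel(seconds) * channels * (bit_depth // 8)


within_range = MIN_SECONDS <= EXTRACT_SECONDS <= MAX_SECONDS
extract_samples = samples_per_channel(EXTRACT_SECONDS)
extract_bytes = pcm_bytes(EXTRACT_SECONDS, CHANNELS)
print(within_range, extract_samples, extract_bytes)
```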

3.6          Summary

A classical string quartet was recorded simultaneously with a Fukada Tree derivative and the Soundfield MKV system. Recordings were edited to produce the listening test stimuli which were taken from two musical pieces of differing style and performance characteristics. In addition to the Fukada Tree test material, two derivations of the Soundfield system were produced. The first uses the default settings derived from the ITU-R BS.775 surround sound standard while the second was adjusted to produce the most sonically pleasing rear image.


2 – Literature Review


2.1           Comparison of Recording Arrays

Much research has taken place to analyse and compare Multi Microphone Arrays for surround sound music recording (Heck & Rieseberg, 2001; Preston, 2003; Schellstede & Faller, 2007; Williams & Le Du, 1999; Williams & Le Du, 2000). However, relatively few studies directly compare the Soundfield system to Multi Microphone Arrays in the context of classical music recording. Those studies are discussed in the following sections.

2.1.1       Surround Multi Microphone Array Comparison

Kassier et al. (2005) tested a selection of Multi Microphone Arrays for surround sound recording in a variety of musical styles performed by soloists or accompaniments. The aim of the paper was to fill a gap left by previous microphone comparisons which did not record their musical sources simultaneously, meaning real-time switching between arrays was impossible. Simultaneous recording of a piece of music by two or more recording arrays means that differences in the performance from take to take will not be a factor in test participant decisions when presented with comparisons. The paper set out to build an understanding of the practicalities of the surround sound recording of classical music and to support future research by providing the multichannel recordings. Recording arrays were split into front and rear segments.

The front segment recording arrays used were:

  • Variant of the Fukada Tree,
  • Variant of the OCT technique,
  • INA3,
  • A near coincident technique designed by an author of the paper.

The rear segment arrays used were:

  • IRT Cross variant,
  • Hamasaki Square,
  • Cortex MK2,
  • Spaced Cardioid.

In addition to the main aims of the project, an informal comparison was made using a small sample of expert listeners. The results of this comparison show that the Fukada Tree was the most preferred front segment array with the Hamasaki Square being the most preferred rear segment array. The authors warned that their comparison was informal and also that the use of different makes and models of microphone throughout each array may have had an influence on the results. As expert listeners were used in the comparison it can be argued that this is not a significant issue as the results are supported by other studies using similar arrays in other more formal comparisons. Results cannot be said to be representative of all people as naïve listeners were not used in the testing.

Kornacki et al. (2001) made recordings of a classical quartet with a selection of surround sound Multi Microphone Arrays. The aim of the paper was to establish a recommended recording array for engineers to use, rather than rely on the inconsistent set of recommendations which the authors felt preceded it. The results show that the most preferred technique is the Double ORTF followed closely by the Decca Tree. These results back up those of Kassier et al. (2005), where the most preferred array included the Fukada Tree, which is a variant on the Decca Tree (Fukada, Tsujimoto, & Akita, 1997). It was not stated whether recordings were made simultaneously.

2.1.2       Multi Microphone Array and Soundfield Comparison

Sungyoung et al. (2006) used a selection of Multi Microphone Arrays and the Soundfield system to record solo piano recitals. The aim of the research was to investigate the importance of musical genre on listeners’ preference. The results show that the Soundfield was not highly regarded in testing and that the Fukada Tree was consistently one of the most preferred if not the most preferred across the musical selections. These results are important as the musical selections covered different styles of playing. This gives further credence to the Decca Tree and Fukada Tree arrays as being the most preferred representatives of the Multi Microphone Array microphone technique.

Paquier & Koehl (2011) used expert and naïve listeners to compare two Multi Microphone Arrays and two ambisonic surround recording techniques, the latter including the Soundfield system, when recording a big band. The aim of the paper was to compare these arrays with respect to listener preference and to four attributes associated with the assessment of surround sound. Results showed that expert listeners preferred the Multi Microphone Array technique, which was an OCT Surround array. Naïve listeners preferred the higher order ambisonics recording. The equipment used to create the higher order ambisonic recording was not listed. The authors state that the results of the attribute section of the test were not enough to fully explain the preference results; however, they speculate that the quality of the microphone capsules across each array may have been the cause of this. It should also be noted that some participants were disturbed by the level of direct sound in the rear of the ambisonic recordings.

2.1.3       Analysis

This section discusses the methodologies of the research presented in Sections 2.1.1 and 2.1.2 to analyse their positive and negative aspects. Once established, these aspects are then considered in the development of the testing methodology used in this project which is outlined in Section 4.

Musical Selection

Two conclusions can be made at this point. The first is that the Fukada Tree or Decca Tree based Multi Microphone Arrays are among the most popular as they are frequently represented and favoured in past studies. Secondly, the Soundfield microphone is not considered to be able to perform well for surround sound production of classical music.

However, in most of the studies presented, either very small or very large musical ensembles were used. Sungyoung et al. (2006) and Kassier et al. (2005) used soloists and/or accompaniments for their recordings. Soloists can provide a single point of reference for image focus and clarity; however, they may not do enough to challenge the array in the area of localisation or image stability. Additionally, a solo instrument may not excite the room in as acoustically significant a way as a larger ensemble, whose multiple instruments cover a wider overall audio spectrum and dynamic range.

Paquier and Koehl (2011) used a twenty piece big band for their recordings. A common technique when recording large ensembles is to employ accent microphones. The job of these microphones, which are placed close to certain sections of a musical ensemble, is to add clarity and definition to that section (Moylan, 2007, pp. 294 – 295). For example, the more delicate sections of a choir or brass band which require their sounds to be clearly defined in the mix would invariably require accent microphones to be added to the stereo or front image of the mix and manipulated in level as required. These microphones can ultimately be seen as a support for the main array which is otherwise being overloaded by the size of the ensemble. The addition of accent signals would result in the front image being improved but manipulated in such a way that the fundamental array performance cannot be accurately judged.

Kornacki et al. (2001) used a string quartet for their comparison of Multi Microphone Arrays. The use of a string quartet can be seen as a reasonable middle ground between soloists who may not challenge a recording array and larger ensembles which would exceed the capabilities of an array. Quartets are wide enough that they can challenge an array in terms of image characteristics, while the combination of a number of different sound sources and playback spectra around the performance space creates a complex reverberation pattern which will be picked up by the rear segments of the test arrays. However, the study which included the Decca Tree did not use the Soundfield system.

Soundfield Microphone Performance

The results of the research outlined indicate that the Soundfield microphone is not the best choice for a recording engineer when recording a classical ensemble in stereo or surround sound. However, given that the research of Paquier and Koehl (2011) used a very large ensemble, no array could reasonably be expected to perform at its best in that situation. Sungyoung et al. (2006) used very small classical music sources which may not translate to the application of recording a larger ensemble. Therefore, further research which makes use of other appropriately sized musical sources with the Soundfield would be beneficial and contribute positively to the current body of knowledge.

Choice of Multi Microphone Array

It would be a misuse of resources and time to repeat past research by using a number of Multi Microphone Arrays. Instead, by using the results of the comparisons outlined thus far, a Multi Microphone Array based on the Decca and Fukada trees would mean that one of the most preferred Multi Microphone Arrays is being tested against the Soundfield which results in a tighter focus to the research.

The studies of Kassier et al. (2005) and Kornacki et al. (2001) highlight issues which include different models and quality of microphones used for the Multi Microphone Arrays. To eliminate these issues, professional quality microphones of the same type should be used in all Multi Microphone Arrays used in this project.

2.2           Testing Methodology

Listening tests are used to compare, rate or assess the quality of various test stimuli. A listening test comprises a test participant sitting in an acoustically suitable environment where they interact with a test interface which plays back audio stimuli and allows them to answer questions based on what they heard.

The research of Kassier et al. (2005), Sungyoung et al. (2006) and Paquier & Koehl (2011) all make use of listening tests to make the comparisons of the recording arrays which are used to meet their respective research aims. This section will discuss the components required to create a robust listening test methodology.

2.2.1       Test Participants

There are two groups of test participant available for this type of project: expert and naïve. To be considered an expert, a listener would need experience in some area of audio engineering with skills in critical listening. This experience would result in the ability to objectively judge aspects of the sound source (Moylan, 2007, p. 89). Naïve listeners will not have these critical listening abilities and therefore would require training in the related areas (Bech & Zacharov, 2006, pp. 310 – 315).

The research of Kassier et al. (2005) and Sungyoung et al. (2006) used expert listeners as part of their tests. The term expert is used to describe listeners who either work in audio or have a musical background. Paquier & Koehl (2011) used a nearly equal number of expert and naïve listeners. The opinions of these two listener types can greatly aid the research, as similarities or differences between the groups’ results can highlight significant issues to consider. However, in the context of this project, obtaining enough of each type of listener may prove to be difficult but should be sought where possible.

2.2.2       Testing Type

Objective audio testing is where an audio signal is analysed by using hardware or software analysers which measure objective metrics, such as the readings outlined in Section 3.5.3 (Cox, Objective Metrics, 2013). Subjective testing is where a sample of participants is asked for their opinion on aspects of an audio stimulus (Cox, Subjective Methods, 2013). In the context of this project, subjective testing is the most suitable method of arriving at a conclusion which meets the project aims as it requires asking a sample of listeners their opinions on test stimuli.

Kornacki et al. (2001) used parametric and non-parametric methods of pairwise comparison testing, where answers between two test stimuli are given on scales or as categories respectively (Sprent, 1989, pp. 1 – 2). In a non-parametric pairwise comparison of recording extracts, participants are asked questions to which the answers are binary and categorical. The comparison allows two or more options to be rated against each other (Sprent, 1989, pp. 20 – 47). Answers can be one or the other, like or dislike, narrow or wide, etc. Because the collected answers are categorical, a pairwise comparison eliminates many of the possible causes of scale based bias outlined in Section 2.2.3; however, this is at the expense of detailed answers. The forms of bias should still be appreciated and considered in the design of the listening tests.

2.2.3       Bias

A paper by Rumsey, Zielinski, & Bech (2008) outlines possible forms of bias which can be found when conducting listening tests. Although the paper was written around scale based testing types such as MUSHRA (ITU, BS.1534-1, 2003), the following sources of bias should still be considered (Rumsey, Zielinski, & Bech, 2008).

Recency Effect

The recency effect is a situation where participants’ preference is biased towards the most recently heard sound source. In a test environment where a set of two sources is played, the latter may tend to be the most preferred. The use of stimuli with short duration is recommended, as well as randomising the playback of test stimuli.

Environmental and Personal

The paper describes bias due to participants’ interactions with the test environment, emotions or preferences. A method of bias reduction listed is to use a large sample base with multiple backgrounds. The practicalities of the latter in a University project environment with limited time frames may be restrictive; however, a large sample size should be sought.

Frequency

The paper goes into detail about ensuring that participants are not presented with too many stimuli and questions. In a test where scales are used, results may be contaminated by participants either using extremes of the scales or following a pattern of using middling scale values. The issue is less important for the collection of categorical data such as like or dislike.
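The randomisation recommended against the recency effect can be sketched as follows. The example builds every pairwise comparison between three stimuli for each set of recordings, randomises which stimulus is heard first within a pair, and then randomises the order of the trials. The stimulus names, the number of sets, the seed and the function name are hypothetical and do not describe the test interface actually used.

```python
import itertools
import random

STIMULI = ["Multi Microphone Array", "Soundfield ITU", "Soundfield Adjusted"]


def build_trials(stimuli: list[str], sets: list[int], seed: int) -> list[tuple[int, str, str]]:
    """Return (set, first stimulus, second stimulus) trials in random order."""
    rng = random.Random(seed)  # seeded so a session can be reproduced
    trials = []
    for set_id in sets:
        for first, second in itertools.combinations(stimuli, 2):
            pair = [first, second]
            rng.shuffle(pair)  # randomise which stimulus is played first
            trials.append((set_id, pair[0], pair[1]))
    rng.shuffle(trials)  # randomise the order in which pairs are presented
    return trials


trials = build_trials(STIMULI, sets=[1, 2, 3, 4], seed=2014)
for set_id, first, second in trials:
    print(f"Set {set_id}: {first} then {second}")
```

Using a different seed for each participant would spread any remaining order effect across the sample rather than concentrating it on one stimulus.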

2.3           Comparison Using Attributes

The research of Paquier & Koehl (2011) asked test participants to answer questions based on certain sonic attributes. By gathering data about the test material in this way, the reasons why a particular array is preferred over another can be investigated.

Choisel & Wickelmaier (2007) found that consistent judgments of naïve listeners could be obtained on their preference and assessment of attributes of a sound source when provided with the attribute definitions. By using a similar technique, the reasons behind the preference may be indicated for further investigative work. The attributes chosen for this purpose are spaciousness, envelopment, clarity and naturalness. The definitions used by Choisel & Wickelmaier (2007) as supplied to test participants are given in Table 2.1.

Attribute | Definition
Spaciousness | A sound is said to be spacious when you have a good impression of the space which it is played. Try to imagine this space; it can be a small room for example, or a large hall. Select the sound which the impression of the space is greater.
Envelopment | A sound is enveloping when it wraps around you. A very enveloping sound will give you the impression of being immersed in it, while a non-enveloping one will give you the impression of being outside of it.
Clarity | The clearer the sound, the more details you can perceive in it, choose the sound that appears clearer to you.
Naturalness | A sound is natural if it gives you a realistic impression, as opposed to sounding artificial.

Table 2.1 – Attribute Definitions

The ability of an array to convey spaciousness and envelopment is essential. Without a sense of space, the listener cannot be enveloped in it. The use of the surround signals is then required to enable the envelopment of listeners when playing back appropriate material (ITU, BS.775-3, 2012, p. 7).

The Multi Microphone Array is a spaced technique while the Soundfield is a coincident technique. Each operational method has inherent differences with respect to their recorded images as highlighted in Section 1.1.2. The definition of instrumentation and room ambience could be positively or negatively affected by each array type.

Given the significant physical and operational differences between recording arrays outlined and with respect to the sense of space and envelopment discussed previously, it may be possible that listeners feel one array sounds more natural than the other.

2.3.1       Web Test

The suitability of the definitions chosen and the ability to reliably test these definitions were assessed with an online experiment. Binaural reproduction allows the three dimensional sound space to be reproduced over headphones (Eargle, 2005, pp. 187 – 191). Using the binaural function of the Harpex-B plugin where required, a selection of mixes was exported in the following ways:

  • To test the spaciousness of sounds, a stereo mix was sourced from the front left and right Multi Microphone Array microphones. A second mix was sourced in the same way with the addition of the ambient signals from the rear Multi Microphone Array microphones. These mixdowns were presented to participants who were asked which mix sounded more spacious.
  • To test the envelopment of sounds, a binaural mixdown from Harpex-B was used in conjunction with a stereo mix sourced from the left and right Multi Microphone Array microphones.
  • To test the clarity between audio extracts, stereo mixdowns of the Multi Microphone Array and Soundfield arrays were presented to participants who were asked which mix sounded clearer.
  • To test whether pieces sound natural, a mono mix was sourced from the centre microphone of the Multi Microphone Array and compared with a stereo mix sourced from the left and right Multi Microphone Array microphones.

These extracts were then hosted on Soundcloud and included on a dedicated survey page of a website (Kelly, 2014). The web test presented listeners with four pairwise comparison tests, one for each definition. Participants were asked to select which recording extract displayed more of each attribute.

The link to the test was distributed through Twitter, Facebook and internet forums of musical and audio engineering topic areas. By using the binaural mixdown process, surround sound signals could be played back over headphones. Participants were asked to supply information about the headphones being used for post screening purposes.

In total, thirty participants took part. The answers of ten participants were discounted due to a lack of information supplied on headphones used. Out of the remaining twenty results, all definitions were supported with results meeting a minimum 95% confidence level. Results of lower quality headphones, such as personal media player ear buds, matched those of higher quality equipment such as the Beyerdynamic DT150. Results from this experiment serve to confirm conclusions of Choisel & Wickelmaier (2007). Results are listed in Appendix B.5.
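To give a sense of what a 95% confidence level implies for twenty valid responses, the sketch below applies a one-sided exact binomial (sign) test to a single pairwise comparison, under the null hypothesis that each option is equally likely to be chosen. The text does not state which statistical test was used, so the choice of test and the function names are assumptions.

```python
from math import comb
from typing import Optional


def one_sided_p_value(votes: int, n: int) -> float:
    """Probability of at least `votes` out of n choices under a 50/50 null."""
    return sum(comb(n, k) for k in range(votes, n + 1)) / 2**n


def min_votes_for_confidence(n: int, alpha: float = 0.05) -> Optional[int]:
    """Smallest number of votes whose one-sided p-value falls below alpha."""
    for votes in range(n + 1):
        if one_sided_p_value(votes, n) < alpha:
            return votes
    return None  # no vote count reaches significance for this n


threshold = min_votes_for_confidence(20)
print(threshold, round(one_sided_p_value(15, 20), 4))
```

Under these assumptions, at least fifteen of the twenty participants would need to select the same extract for a definition to be supported at the 95% level.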

2.4           Summary

With respect to the current body of research, there is scope for a focused study into the comparison of a Multi Microphone Array and the Soundfield microphone in the surround sound recording of classical music.

The ensemble size in previous research can be seen as influencing the results of the respective studies. A string quartet offers a balanced ensemble size: it is large enough to challenge an array while not requiring reinforcement with accent microphones.

Past research has compared techniques for a variety of reasons which used the assessment of listener preference as a basis for their results. From these studies, the Fukada and Decca Tree techniques are frequently the most favoured arrays. By using a derivative of these arrays, one of the most preferred arrays can be compared against the Soundfield system which ensures there is no overlap between past research and that resources are used most efficiently.

To meet aims, different forms of testing methodology were used in past research. A testing methodology using non-parametric pairwise comparisons has been selected as it provides clear preference results from test participants. In addition, the use of sound attributes has been selected to allow for the possible reasons behind preference to be established.



1 – Introduction

Multi-microphone recording arrays use two or more microphones for the recording of acoustic musical sources for stereo and surround sound productions. Many configurations are available to the recording engineer. The basic layout of a Multi Microphone Array is to use one microphone per speaker in the desired playback system.

A two microphone stereo array would be implemented in a similar way to the orange and green segments of Figure 1.1 where the left and right microphone signals are routed directly to the left and right playback channels respectively.

Some stereo arrays make use of a third microphone placed in the centre to improve the phantom centre imaging and stereo to mono compatibility (Eargle, 2005, p. 173). This implementation is denoted in Figure 1.1 by the dotted blue line where the centre microphone signal is routed in equal amounts to the left and right playback channels. In 5.1 surround sound production, the centre microphone would instead feed the dedicated centre loudspeaker which is denoted by the solid blue line. This centre signal is also used to contribute to centre image stability (ITU, BS.775-3, 2012, p. 7).


Figure 1.1 – Example of two and three microphone arrays and speaker routing.

By using established microphone technology, Multi Microphone Arrays have been perfected for use in a variety of situations with recording engineers often having a preferred array for a given situation based on their sonic aims for the recording project (Moylan, 2007, pp. 261 – 274).

The Soundfield microphone allows users to record the 360° sound scene which surrounds it using a single contained unit (Craven & Gerzon, 1977). The microphone produces four signals which are subsequently processed to create the surround sound image. The processing required to create the component signals of a stereo or surround sound production can take place within the engineer’s digital audio workstation at any point after the recording takes place (Harpex, 2014; Soundfield Ltd., 2014).

This allows the recording engineer to revisit the recorded material and process it in a different way to achieve the best sonic results for a production in return for minimal setup effort. This is in contrast to a Multi Microphone Array, where the engineer must ensure that the array has been set up perfectly with respect to the angles and spacing between the individual microphones, as errors cannot be fixed in the mix.

The Soundfield MKV system (Soundfield Ltd., 2013) costs in the region of £5750 incl. VAT at the time of writing (HHB Communications Ltd, 2014) while a set of professional quality microphones such as the AKG C414 XLII can cost around £4795 incl. VAT for a set of five required for surround sound recording (DV247, 2014). Although many may find the price difference to be significant, recording engineers may also experience practical and value for money differences between the techniques which could impact on their ultimate choice when purchasing a system.

The set of five AKG C414 microphones, or similar, can be redeployed across multiple instruments in a variety of studio and live recording scenarios, for example as close microphones on individual instruments. The Soundfield system would not be capable of this as it is a single contained unit. The versatility of the multi microphone approach would greatly benefit recording engineers in terms of available recording options and value for money unless there were clear sonic advantages of using the Soundfield system with respect to audio production quality.

This project sets out to compare the two recording options in order to establish whether the Soundfield system is capable of producing the clear sonic advantages required for it to be recommended to engineers. The remainder of this section will describe the background of microphone operation in order to make an initial comparison between the recording methods. A literature review in Section 2 will establish the requirement for research into this area and highlight the elements required to construct a robust production and testing methodology to investigate these initial comparisons. Section 3 will describe the production of the test material and Section 4 will describe the testing phase of the project. The methods of statistical analysis will be discussed in Section 5 with the results, discussion, conclusions and avenues for further work discussed in Sections 6, 7, 8 and 9 respectively.

1.1           Background

1.1.1       Basic Operation

Polar patterns are the basis of microphone sound pickup. Some of these are detailed in Figure 1.2. Each pattern has its own pickup characteristics when used in music recording. An omnidirectional microphone will pick up sound equally in all directions, meaning the direct sound from a sound source will be picked up as well as the reflections it creates in a space. Cardioid microphones will pick up the direct sound from the source while rejecting a proportion of the reflected sound, which results in better intelligibility of the direct sound. If the cardioid microphone is not picking up enough ambient sound for the production, a figure-of-8 pattern can provide a better ratio of direct to ambient sound (Eargle, 2005, pp. 7 – 21).


Figure 1.2 – Example of an omnidirectional, cardioid and figure-of-8 pickup patterns (Houghton & Sound on Sound, 2011)

Advances in technology have improved microphone characteristics such as signal to noise ratio and frequency response (Eargle, 2005, pp. 1 – 6). Despite these advances, the basic concept of operation has remained the same. Great care is therefore still required to ensure the recording array has been set up correctly with respect to the geometry and interplay between microphones, as mistakes in these areas cannot be fixed during editing or mixing.

1.1.2       Human Hearing

Recording arrays for classical music are designed to make use of the human hearing system’s ability to locate a sound source’s direction; in other words, how the brain determines a sound source’s directional cues. Figure 1.3 shows the paths of direct sound between a sound source and the ears of a listener. By processing the difference in arrival time of the sound at each ear, known as the interaural time difference, the brain can determine the source’s directional cues (Howard & Angus, 2009, pp. 107 – 111).


Figure 1.3 – Interaural Time and Intensity Differences (Howard & Angus, 2009, p. 107)

Additionally, the signal coming into the right ear is being attenuated by the physical presence of the listener’s head. This creates a sound intensity difference between what the left and right ears receive which can be processed by the brain to determine the sound source’s directional cues. This is called the interaural intensity difference (Howard & Angus, 2009, pp. 107 – 113).

Each interaural difference operates at a different area of the human hearing spectrum. Interaural time differences operate at frequencies below 700Hz and interaural intensity differences operate at frequencies above 2.8kHz while both are at work at the cross over region from 700Hz to 2.8kHz (Howard & Angus, 2009, pp. 112 – 113).
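To give a sense of the scale of the interaural time differences described above, the following is a minimal sketch and not a model taken from the cited text. It assumes a simple straight-line path difference between the ears (ear spacing multiplied by the sine of the source angle), ignores diffraction around the head, takes the ear spacing of around 18 centimetres quoted in Section 1.2.1 and assumes a speed of sound of 343 m/s. The function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C
EAR_SPACING = 0.18      # m, approximate ear spacing quoted in Section 1.2.1


def interaural_time_difference(azimuth_deg, spacing=EAR_SPACING, c=SPEED_OF_SOUND):
    """Approximate interaural time difference in seconds for a distant source.

    Assumption: the extra path to the far ear is spacing * sin(azimuth).
    Diffraction around the head is ignored, so this is only a rough estimate.
    """
    return spacing * math.sin(math.radians(azimuth_deg)) / c


# 0 degrees is straight ahead, 90 degrees is fully to one side.
for angle in (0, 30, 60, 90):
    itd_us = interaural_time_difference(angle) * 1e6
    print(f"{angle:>2} deg -> {itd_us:6.1f} microseconds")
```

Under these assumptions the difference is zero for a source straight ahead and grows to around half a millisecond for a source fully to one side.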

All recording arrays have been designed to use one or both of these interaural differences as a basis of how they capture the sound image as they have an impact on the characteristics of the image when played back on a standard stereo or surround sound system (Eargle, 2005, pp. 168, 174 – 175). These characteristics are considered by recording engineers in their array choice to meet the aims of a given production.

1.2           Multi Microphone Techniques

Multi microphone techniques employ a number of discrete microphones to capture stereo and surround sound images. Sections 1.2.1 and 1.2.2 outline examples of a stereo and surround sound Multi Microphone Array respectively for initial comparison with the Soundfield microphone which is detailed in Section 1.3.

1.2.1       Example of a Multi Microphone Array Stereo Technique

The ORTF stereo technique, detailed in Figure 1.4, was developed by the Office de Radiodiffusion Télévision Française and uses two cardioid microphones spaced a specified distance and angle from each other (Eargle, 2005, pp. 179 – 181). The microphone spacing used in this array is similar to the distance between human ears, which is generally around 18 centimetres (Howard & Angus, 2009, p. 107).


Figure 1.4 – ORTF Stereo Array

1.2.2       Example of a Multi Microphone Array Surround Technique

The Sound Performance Lab array uses five microphones to capture the surround field. The left, centre, right, left surround and right surround signals are routed to the left, centre, right, left surround and right surround speakers of a 5.1 loudspeaker setup respectively (ITU, BS.775-3, 2012, p. 2). The array, shown in Figure 1.5, can be seen as having two distinct sections with the front three microphones being a significant distance ahead of the rear surround microphones. In terms of recording, this is perhaps the single biggest difference which can be observed between Multi Microphone Arrays and the Soundfield microphone, although front to rear distances can vary depending on the Multi Microphone Array chosen.


Figure 1.5 – SPL Array

1.2.3       The Mid/Side Technique

Alan Blumlein developed the concept of creating what is now known as the Mid/Side recording technique whereby a stereo image can be recorded by using a figure-of-8 microphone in combination with a cardioid microphone (Eargle, 2005, pp. 173 – 174). This technique splits the stereo image into three pieces which make up the middle and side components of the image.

The figure-of-8 microphone is placed in parallel to the width axis of the performers. This means that the front and rear polar pattern lobes of the microphone can feed the left and right signals in the mix when routed through a matrix. The cardioid microphone is placed on axis to the centre point of the ensemble. These signals can then be manipulated with respect to level at the mixing stage to influence the characteristics of the stereo image with the aim of improving it and/or solving issues in the recording which could not have been otherwise addressed (Blumlein, 1931, p. 91).

The setup and routing of the Mid/Side technique is shown in Figure 1.6. This technique can be implemented using existing technology with no need for specialist equipment and has become a popular method of stereo recording.


Figure 1.6 – Mid/Side Patterns and Routing (Eargle, 2005, p. 174)
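The matrix routing described for the Mid/Side technique can be sketched as a sum and difference of the two microphone signals. This is a minimal illustration, assuming that the positive lobe of the figure-of-8 microphone faces left and that the signals are plain lists of samples. The function name and the side_gain parameter are hypothetical and stand in for the level adjustment made at the mixing stage.

```python
def ms_decode(mid, side, side_gain=1.0):
    """Decode Mid/Side sample lists into left and right sample lists.

    Assumption: the positive lobe of the figure-of-8 (side) microphone
    faces left, so left = mid + side and right = mid - side.
    side_gain scales the side signal to widen or narrow the image.
    """
    left = [m + side_gain * s for m, s in zip(mid, side)]
    right = [m - side_gain * s for m, s in zip(mid, side)]
    return left, right


mid = [1.0, 0.5]
side = [0.5, -0.5]
left, right = ms_decode(mid, side)
print(left, right)
```

Setting side_gain to zero collapses the image to the mid signal alone, while raising it increases the contribution of the side microphone to the stereo width.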

1.3           Soundfield Concept

The Soundfield concept builds on the Mid/Side technique. The Soundfield system uses four unidirectional microphone capsules, each mounted parallel to one face of a tetrahedron, which are labelled 12A, 12B, 12C and 12D in Figure 1.7. The four signals from the capsules are collectively known as the A-format.


Figure 1.7 – Soundfield tetrahedral capsule layout (Craven & Gerzon, 1977, p. 1)

The A-format is then processed to produce the B-format signals using a Soundfield decoder (Soundfield Ltd., 2013). This process uses signals from each capsule and passes them through mathematical and frequency equalisation processes. The output of these processes is called the B-format. Equation 1.1 details the A-format to B-format conversion (Gerzon, 1975).


Where A, B, C and D are the tetrahedral signals of the A-format.

Equation 1.1 – B-Format derivation from A-format (Craven & Gerzon, 1977, p. 6)

After the mathematical operation outlined in Equation 1.1 is complete, the signals are sent through an equalisation process (Craven & Gerzon, 1977, pp. 4 – 5).
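As an illustration of the kind of sum and difference operation involved, the sketch below uses the capsule orientation commonly quoted for tetrahedral microphones: front-left-up, front-right-down, back-left-down and back-right-up. This orientation, the variable names and the omission of any scaling factor are assumptions; how they map onto the capsule labels 12A to 12D is not confirmed here, and the equalisation stage is left out entirely.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Combine four tetrahedral capsule samples into unequalised B-format-style signals.

    Assumed capsule orientations (not taken from the source):
      flu = front-left-up,  frd = front-right-down,
      bld = back-left-down, bru = back-right-up.
    Returns (w, x, y, z) before any equalisation or scaling is applied.
    """
    w = flu + frd + bld + bru   # omnidirectional sum
    x = flu + frd - bld - bru   # front minus back
    y = flu - frd + bld - bru   # left minus right
    z = flu - frd - bld + bru   # up minus down
    return w, x, y, z


# A sound picked up only by the two front-facing capsules.
print(a_to_b_format(1.0, 1.0, 0.0, 0.0))
```

In this example only the omnidirectional and front to rear components are non-zero, which is what would be expected from a source directly in front of the array.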

The B-format is made up of an omnidirectional component called the W signal and three figure-of-8 components known as the X, Y and Z signals which correspond to front to rear, left to right and height axes respectively, as seen in Figure 1.8. W, X, Y and Z are the post-equalisation signals of E, F, G and H respectively (Gerzon, 1975, pp. 4, 5). With the B-format signals recorded and by using a suitable decoder, a set of polar patterns can then be created to conform to the desired output standard.


Figure 1.8 – Soundfield Polar Patterns (Robjohns & Sound on Sound, 2005)
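The way a decoder can derive a polar pattern from the B-format signals can be sketched as a weighted mix of the omnidirectional and figure-of-8 components. This is a simplified horizontal-only illustration and not the behaviour of any particular Soundfield decoder. It assumes the traditional convention in which W is recorded 3 dB lower than X and Y, and that positive Y points to the left. The function name and the pattern parameter are hypothetical.

```python
import math


def virtual_microphone(w, x, y, azimuth_deg, pattern=0.5):
    """Return one sample of a horizontal virtual microphone derived from B-format.

    pattern: 1.0 = omnidirectional, 0.5 = cardioid, 0.0 = figure-of-8.
    Assumption: W carries a gain of 1/sqrt(2) relative to X and Y.
    azimuth_deg is the direction the virtual microphone points, 0 = front.
    """
    theta = math.radians(azimuth_deg)
    omni_part = pattern * math.sqrt(2.0) * w
    figure8_part = (1.0 - pattern) * (x * math.cos(theta) + y * math.sin(theta))
    return omni_part + figure8_part


# A unit-amplitude source directly in front: W = 1/sqrt(2), X = 1, Y = 0.
w, x, y = 1.0 / math.sqrt(2.0), 1.0, 0.0
front = virtual_microphone(w, x, y, azimuth_deg=0)
rear = virtual_microphone(w, x, y, azimuth_deg=180)
print(f"front-facing cardioid: {front:.3f}, rear-facing cardioid: {rear:.3f}")
```

Because the pattern and pointing angle are chosen after the recording, the same B-format material can be turned into different sets of virtual microphones for stereo or surround outputs.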

1.4           Initial Comparison

This section will highlight the main differences which can be observed between Multi Microphone Arrays and Soundfield recording techniques in preparation for the Literature Review in Section 2 which will outline a method of investigating the significance of these differences.

1.4.1       Directional Cues

Figure 1.9 shows a set of Soundfield capsules. There is a very small physical distance between them. When a recording array uses small distances between the microphone capsules, the array is referred to as being coincident. As the arrival time of a sound would be close to identical in terms of directional pickup, a coincident array will utilise the interaural intensity differences of a sound source (Eargle, 2005, p. 168).


Figure 1.9 – The Soundfield capsules (Soundfield Ltd., 2014)

A spaced array, such as the ORTF array outlined in Section 1.2.1, uses both the interaural time differences and interaural intensity differences for the determination of directional cues (Eargle, 2005, pp. 174 – 175).

Both coincident and spaced stereo techniques have inherent stereo image characteristics. For example, coincident techniques generally exhibit strong localisation and image sharpness where spaced techniques give a softer and less defined image (Eargle, 2005, p. 175).

1.4.2       Critical Distance

The most distinct difference between the Soundfield and multi microphone techniques is the spacing between the front and rear sound pickup. The significance of this can be understood by looking at the concept of the critical distance, shown in Figure 1.10. This distance is defined by the point at which the intensity level of a direct source is equal to that of the reverberant field it creates (Pohlmann & Everest, 2001, p. 37).

Figure 1.10 – Critical Distance (Eargle, 2005, p. 15)
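Where the critical distance cannot be measured, it is often estimated from the room volume and reverberation time. The sketch below uses a widely quoted Sabine-based approximation for an omnidirectional source, in which the distance is 0.057 multiplied by the square root of volume divided by reverberation time. The coefficient, the directivity parameter and the function name are assumptions for illustration, and the example room figures are the Harty Room estimates given in Section 3.2.

```python
import math


def critical_distance(volume_m3, rt60_s, directivity=1.0):
    """Estimate the critical distance in metres.

    Assumption: Sabine-based approximation
        d_c = 0.057 * sqrt(Q * V / RT60)
    with Q = 1 for an omnidirectional source.
    """
    return 0.057 * math.sqrt(directivity * volume_m3 / rt60_s)


# Harty Room estimates from Section 3.2: 1150 cubic metres, 1.4 seconds.
print(f"{critical_distance(1150, 1.4):.2f} m")
```

For these figures the estimate is roughly 1.6 metres, which gives a sense of how close to the performers the front microphones must be placed for direct sound to dominate.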

Placement of the front microphones affects both the width of the front image and the amount of ambient sound in the recording, as seen in Figure 1.11. If the placement is too close, similar to the blue array in Figure 1.11, an unnaturally wide stereo image with too much direct sound can result (Eargle, 2005, p. 245). If the placement is too far away, similar to the red array, the image can have a squashed or narrow characteristic while allowing a large proportion of ambient sound into the image.

The ideal placement will place musical sources across the image in a balanced and pleasing way (Moylan, 2007, p. 294). For example, the orange microphone placement in Figure 1.11 may provide the most pleasing image with a trio of musicians; however, if there are more than three or four sound sources, the image may become too full, resulting in poor clarity and localisation, which would require a closer placement and the possible use of ambient accent microphones. These are important considerations when placing a stereo array or the front section of a surround sound array of any type.


Figure 1.11 – Stereo Image Characteristics

The purpose of the rear speakers of a surround sound system is to convey a sense of ambience rather than to reproduce the musical instruments, which is the role of the front speakers (Howard & Angus, 2009, pp. 367 – 368). With the front array placement chosen as described earlier in this section, the rear microphones in a multi microphone context will invariably be placed a distance away from the front microphones and further from the sound sources to allow for the pickup of sufficient reverberation from the space for use in the mix (Eargle, 2005, p. 245; Wuttke, 2005, p. 6). As the Soundfield capsules are coincident, the front and rear pickup share a single position, so a consequence of using the Soundfield system may be a problematic quality of ambient sound for the rear sections of a production.

1.4.3       Front to Rear Correlation

By spacing the left and right microphones of a spaced stereo recording array too far apart, the correlation between the signals they record will drop. In other words, the proportion of direct sound which they both pick up will drop to such a point that a distinct gap between sound sources can be perceived between the left and right loudspeakers (Eargle, 2005, p. 176).

In the front image, the drop in correlation has a similar result on the image width as placing the microphones too close to the sound sources and is not desirable as there would be no cohesive or balanced front image. If the rear image is highly correlated with the front image in classical music production, the listener may perceive musical sources which were in front of the recording array in the rear of the playback image, which is also not desirable (Howard & Angus, 2009, p. 368; Eargle, 2005, p. 245). The fully coincident nature of the Soundfield system may result in problems with front to rear correlation upon playback.
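One common way of putting a number on the correlation between two channels is the Pearson correlation coefficient, sketched below. This is offered only as an illustration of the concept and is not a measurement method taken from the cited texts. The function name and the test signals are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sample lists.

    Returns a value from -1.0 (inverted) through 0.0 (uncorrelated)
    to 1.0 (identical apart from level).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least two samples")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_a * var_b)


# One full cycle of a sine wave as a stand-in for a front channel.
n = 480
front = [math.sin(2 * math.pi * k / n) for k in range(n)]
rear_same = list(front)                                            # identical rear signal
rear_shifted = [math.cos(2 * math.pi * k / n) for k in range(n)]   # quarter-cycle shift
print(correlation(front, rear_same), correlation(front, rear_shifted))
```

A value near one for a front and rear channel pair would indicate that the rear channel is largely repeating the front image, while a value nearer zero would indicate that it carries mostly different information, such as reverberation.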

1.4.4       Convenience

A relative advantage of the Soundfield system over a Multi Microphone Array technique is that only a single stand and set of cabling are required for a full surround recording. This reduces setup time as there is no need to carefully measure the distance and angle relationships which would otherwise be required for a Multi Microphone Array technique. It can also make the recording array less obtrusive to audience members if recording is taking place live, and easier to restore if the array is disturbed.

1.4.5       Hyper Realism

Audio production for television, cinema and music relies on the principle of hyper real sound (Holman, 2010, p. xviii; Fazenda, 2012). Under this concept, sounds are created or manipulated with respect to listener enjoyment rather than realism (Moylan, 2007, p. 263). Examples of this concept can be seen in music production where the recording of instruments in an acoustically dry room is supplemented with artificial reverb during the mixing stage.

Similarly, the recording of instruments in a reverberant space can be supplemented by using dedicated room reverberation microphone signals in the mix which are adjusted in level until suitable (Eargle, 2005, pp. 194 – 195). Neither of these production methods stays true to the perception of a listener if he or she were sat in the performance space; however, these methods are intended to improve listener enjoyment.

1.5           Summary

The operational principles of Multi Microphone Arrays and the Soundfield system have been outlined. The differences between these methods have been highlighted with respect to their influence on recorded material. With these aspects considered, it is the aim of this study to compare a Multi Microphone Array with the Soundfield recording system in the recording of classical music in an effort to highlight if and how the differences translate into the preference of listeners. The results of this can then be used by engineers when considering their options of recording arrays for similar situations.

Surround Sound Recording Technique Research

Early in 2014, I completed an original research project for my Master of Science degree at the University of Salford. Below, you can find the abstract of the dissertation followed by links to each of the chapters. I hope you find it interesting and helpful. If you have any questions, please get in touch by email by clicking here. This project could not have taken place without the assistance of the people listed in the Acknowledgements section further down this page, so thanks again to them!
