Spatial Audio for VR & Ambisonics

One of my big interests is spatial audio, in particular Ambisonics and VR sound. What I find so exciting about it is that it's under-explored and has the potential to evolve into an entirely new artistic medium.

I devoted my undergraduate engineering thesis to the study of Ambisonics under Frank Boland and spent the summer of 2016 doing a research internship in VR sound, collaborating with members of Trinity College's Sigmedia and Thrive audio research groups.

During my research internship, I worked with Dublin composer and academic Enda Bates on developing my engineering thesis project (which I'll go into in more depth later) into a JS (Jesusonic) plugin for Reaper. I also assisted Enda with other projects, such as designing MUSHRA tests for VR headphones and making a Higher-Order Ambisonic (HOA) recording of a live performance for cello, guitar, flute and sax.

H2n First-Order Ambisonic Encoder Plugin

In 2016, I coded a plugin that converts 4-channel Zoom H2n recordings to horizontal-only First-Order Ambisonics (FOA). Why is this useful? The Zoom H2n is a popular, affordable and surprisingly capable microphone favoured by spatial audio and Ambisonics practitioners worldwide, and it comes bundled with Google's professional Jump VR camera rig. While Zoom eventually added onboard FOA encoding to the H2n, that mode doesn't give you the flexibility of keeping the 4-channel recordings as a pair of stereo files (front and back); you'd have to decode the FOA files to get those back.

The plugin was released to the online spatial audio community through various forums on social media. It was well received and it seems that a good few people are actually using it!

You can find the code for the plugin on my GitHub, or go to Enda's blog and download the plugin with a manual and sample Reaper project.

The code works by matrixing the front left (Lf), front right (Rf), back left (Lb) and back right (Rb) channels of the H2n recordings (AKA the A-format signals) into the 3 horizontal FOA components (AKA the B-format signals): W (omnidirectional component), X (front-back component) and Y (left-right component). I derived the equations for this in my engineering thesis.
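The matrixing boils down to a single 3×4 matrix multiply per block of samples. Here's a minimal numpy sketch of the idea; the coefficients below are placeholders, as the real values come from the polar-pattern derivation in the thesis (the actual plugin is a JS effect for Reaper, not Python):

```python
import numpy as np

# Placeholder mixing coefficients -- the real values come from the
# H2n's polar-pattern equations derived in the thesis.
A, B, C, D, E, F = 0.5, 0.5, 0.5, 0.5, 0.5, 0.5

# Rows: W, X, Y; columns: Lf, Rf, Lb, Rb (A-format channel order).
encode = np.array([
    [A,  A,  B,  B],   # W = A(Lf + Rf) + B(Lb + Rb)
    [C,  C, -D, -D],   # X = C(Lf + Rf) - D(Lb + Rb)
    [E, -E,  F, -F],   # Y = E(Lf - Rf) + F(Lb - Rb)
])

def a_to_b(a_format: np.ndarray) -> np.ndarray:
    """Convert a (4, n_samples) A-format block to (3, n_samples) horizontal B-format."""
    return encode @ a_format
```

Because the operation is just a fixed matrix multiply, it runs comfortably in real time inside a DAW plugin.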

Ambisonic Recording of a Performance

For a department study comparing various Ambisonic microphones, we recorded a performance of one of Enda's spatial pieces using the mics in question: the Zoom H2n, the MH Acoustics EigenMike, the Core Sound TetraMic and the Soundfield MKV system. The EigenMike can record HOA; the rest can record FOA. These recordings were then used in MUSHRA tests to assess the subjective spatial quality and overall sound quality of the different recording methods.

You can watch a 360 video of the performance above. You can check out a 360 video of another one of Enda's spatial pieces here.

Virtual Headphone Tests

Another project I worked on was a comparative study of two virtual methods of testing headphones. We designed MUSHRA tests to gauge the spatial quality and overall sound quality of headphones and hence their suitability for VR use.

Getting unbiased results from comparative tests of headphones can be difficult, as impressions of sound quality are skewed by the brand, the price, the look and the feel of the headphones. One way around this is a virtual method, where a single pair of headphones is used to test many.

In one virtual method, the filtered method, the transfer function of each headphone under test is measured. The test stimulus (music, speech etc.) is then filtered with that transfer function, effectively altering the sound of the test audio in the same way the physical headphone would. Subjects listen to these recordings on a single pair of flat-response monitor headphones, hopefully eliminating any influence the physical attributes of the headphones have on the perceived sound quality.

To get the transfer functions, we played a sine sweep through each pair of headphones and recorded this with a Neumann KU-100 dummy head mic. Voxengo's Deconvolver was used to extract the frequency responses from these recordings.
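Once you have an impulse response for a headphone, applying it to a stimulus is a straightforward per-channel convolution. A minimal sketch of that filtering step, assuming scipy is available and a stereo impulse response has already been extracted (the actual study used Voxengo Deconvolver and a DAW rather than Python):

```python
import numpy as np
from scipy.signal import fftconvolve

def apply_headphone_response(stimulus: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Filter a stereo stimulus with a measured headphone impulse response,
    so that (played on flat monitors) it sounds like that headphone.

    stimulus: (n_samples, 2), impulse_response: (ir_len, 2) -- left/right channels."""
    left = fftconvolve(stimulus[:, 0], impulse_response[:, 0])
    right = fftconvolve(stimulus[:, 1], impulse_response[:, 1])
    out = np.stack([left, right], axis=1)
    # Normalise to avoid clipping after convolution
    return out / np.max(np.abs(out))
```

Feeding an impulse through this function simply returns the (normalised) impulse response itself, which is a handy sanity check on the filtering chain.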

We worked on another method of virtual testing, the recorded method, which involves recording the test audio played through the physical headphones onto the dummy head. As with the filtered method, these recordings are listened to on a single pair of monitor headphones during comparative tests. Effectively, when you listen to one of these recordings you're hearing the test audio as it would sound on a particular headphone.

The project was completed after I finished the internship; in short, the filtered method gave more consistent results than the recorded method and is generally the better approach to virtual headphone testing.

My Thesis - Localising Sounds with Ambisonics

My undergraduate engineering thesis involved assessing how well sounds can be located on the horizontal plane using the Zoom H2n. Why? As mentioned, the H2n is widely used in VR sound. Objectively assessing how accurately the H2n preserves spatial information in a captured sound field gives an indication of how suitable it is for recording immersive VR sound.

The sounds used were pink noise and speech, played from successive anticlockwise positions on an 8-speaker circular array with an H2n at the centre (see diagram). The H2n was assessed by comparing the calculated angle of incidence of the sounds against their actual angle of incidence. The sound direction was calculated from the 4-channel recordings by converting them to FOA and using intensity vectors to estimate the source angle.

The conversion to FOA is a little involved and uses polar pattern equations, but in essence it's as follows:

W = A(Lf + Rf) + B(Lb + Rb)

X = C(Lf + Rf) - D(Lb + Rb)

Y = E(Lf - Rf) + F(Lb - Rb)

where A-F are constants. The X and Y components can be thought of as vectors of instantaneous sound intensity values for the two horizontal directions. Taking the arctangent of Y/X (using atan2, so the correct quadrant is resolved) then gives an estimate of the source angle at that instant. That's pretty much the gist of it; if you're interested in the gory details, feel free to contact me via the form in the "about/say hi" section.
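The intensity-vector idea above can be sketched in a few lines of numpy. This is an illustrative simplification, not the exact thesis implementation: it time-averages the active intensity components (W·X and W·Y) over the whole recording and takes atan2 of the result.

```python
import numpy as np

def estimate_source_angle(w: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Estimate the horizontal source angle in degrees (anticlockwise from front)
    from the B-format components W, X and Y.

    Uses time-averaged active intensity: I_x ~ mean(W*X), I_y ~ mean(W*Y)."""
    ix = np.mean(w * x)
    iy = np.mean(w * y)
    # atan2 resolves the full 360-degree range, unlike a bare tan(Y/X) inversion
    return float(np.degrees(np.arctan2(iy, ix)))
```

For a synthetic plane wave from a known direction this recovers the source angle exactly, which makes it a useful sanity check before pointing the method at real H2n recordings.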

The results were analysed by calculating the localisation bias for the recordings from each source angle, i.e. how far "off" sources would sound from their true direction on H2n recordings. The H2n performed well for the price, with an average bias of 4.5 degrees per direction.
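One detail worth getting right when computing this kind of bias is angular wrap-around: an estimate of 350 degrees for a source at 10 degrees is only 20 degrees off, not 340. A small sketch of how that error calculation might look (my assumed formulation, not the thesis code verbatim):

```python
import numpy as np

def angular_error(estimated_deg, true_deg):
    """Signed angular difference in degrees, wrapped into [-180, 180)."""
    return (np.asarray(estimated_deg) - np.asarray(true_deg) + 180) % 360 - 180

def mean_bias(estimated_deg, true_deg) -> float:
    """Mean absolute localisation error across all source directions."""
    return float(np.mean(np.abs(angular_error(estimated_deg, true_deg))))
```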

As another application for the localisation method, it was used to assess how well a Yamaha Soundbar fares in emulating real 5.1 surround sound using "virtual speakers" (see diagram). This was done by playing pink noise through each speaker on both systems, recording it using a Zoom H2n, and comparing the estimated angle for the virtual speakers to that of the physical speakers.

The graph marked "Soundbar vs. Physical 5.1 Comparison" shows the results: the blue line is the estimated source angle for the real 5.1 system (physical speakers) and the red line is the estimated source angle for the Soundbar (virtual speakers). Clearly the Soundbar works to some extent, as the two curves follow a similar trajectory, but the definition of the virtual speakers is far blurrier than that of the physical ones.