Mastering Surround Sound

April 27, 2017

By Aimee Baldridge

Sennheiser’s AMBEO VR mic is one of a new breed of audio recording devices designed to capture sound from all directions to better mix with VR video.

Good sound is a critical element of any wedding film, and virtual reality footage is no exception. But if your viewers will be able to see everything in the scene, what should they be able to hear? The simplest answer: everything. In a truly immersive VR video, viewers wearing headphones can hear sound from all directions, as if they’re sitting in the room. And now the tools required to create that kind of experience through spatial audio are becoming widely available.

The recently launched Vuze VR camera from Humaneyes incorporates four microphones, capturing sound that can be spatialized to match the 4K 360-degree imagery it records with its eight lenses. Both Core Sound and Sennheiser have also introduced professional tetrahedral microphones that capture four channels of sound to create spatial audio. Each tetrahedral mic head has four capsules pointed in different directions. “Just as with the 360-degree camera, the microphone allows you to capture audio from all of the directions surrounding you,” explains Brian Glasscock, a user experience researcher on the Sennheiser team that developed the company’s AMBEO VR mic. “So in playback as the viewer turns their head left, right, up and down, the appropriate audio perspective can be played back.”

Where most cameras have one or two mics, the Vuze VR camera has four. The net result is a much richer audio recording.


To create a cohesive audio experience of the space for VR video viewers, a tetrahedral microphone needs to be placed close to the camera. “Just as with a 360-degree camera, the perspective of the microphone defines the perspective of the viewer,” Glasscock says. “With the 360-degree camera, you want to place it where you want the viewer to look. With the VR mic, you want to place it where you want the viewer to hear.” Directly above or below the camera is usually the best option, both because the audio perspective will match the video’s and because the microphone can be placed in the camera’s blind spot. Of course, VR camera operators have to be especially careful not to make noises that will be picked up during recording, whether using a separate tetrahedral mic or a built-in audio capture system like the one in the Vuze.

Shooters who are willing to do a little audio mixing can also supplement their spatial audio capture with spot mics, such as a lavalier on the officiant at a ceremony or a speaker at a reception. “What that allows you to do is capture the overall environmental sound but be able to emphasize certain sounds,” Glasscock explains. “The [VR] microphone can be thought of as a kind of ambience or an environmental capture microphone. It gives you an ambisonic bed, or a 3D bed, on which to make your final mix.” Ambisonics is a system for capturing and reproducing spherical surround sound, which places sounds above and below the listener as well as around them, rather than only between the left and right of a stereo system or around the horizontal ring of channels in a 2D surround sound system. The most popular current VR video platforms, including YouTube and Facebook, support ambisonic formats for sound.


Whether you opt to incorporate audio from spot mics or not, producing spatial audio requires some post-production work that mono and stereo sound mixing does not. Tetrahedral mics output sound in the ambisonic A-format, while VR distribution platforms like YouTube, Facebook and various VR headsets support spatial audio in the ambisonic B-format. Tetrahedral microphone makers provide plug-ins to convert from ambisonic A to B, and third-party software is available to convert audio from the Vuze camera. However, there are a few additional steps between converting your sound to the right flavor of ambisonics and uploading it to your favorite platform. In addition to mixing in any sound captured by spot mics, you have to combine the finished ambisonic B audio mix with your VR video file, a process called “muxing.”
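For readers curious about what an A-to-B conversion plug-in is doing, the core idea can be sketched in a few lines of Python. This is a simplified illustration, not any manufacturer's actual algorithm: it assumes the four capsules are ordered front-left-up, front-right-down, back-left-down and back-right-up, the 0.5 scaling is an arbitrary choice, and the function name is made up for this example. Real converters also apply frequency-dependent correction filters to account for the spacing between capsules, which this sketch leaves out.

```python
def a_to_b(flu, frd, bld, bru):
    """Turn one sample from each capsule (A-format) into W, X, Y, Z (B-format).

    Assumed capsule order (an assumption, check your mic's documentation):
    front-left-up, front-right-down, back-left-down, back-right-up.
    """
    w = 0.5 * (flu + frd + bld + bru)  # W: sound from all directions
    x = 0.5 * (flu + frd - bld - bru)  # X: front minus back
    y = 0.5 * (flu - frd + bld - bru)  # Y: left minus right
    z = 0.5 * (flu - frd - bld + bru)  # Z: up minus down
    return w, x, y, z


def convert_recording(capsule_frames):
    """Convert a list of (flu, frd, bld, bru) frames into B-format frames."""
    return [a_to_b(*frame) for frame in capsule_frames]
```

A sound picked up only by the two front capsules, for instance, ends up in the W and X channels and leaves the left-right (Y) and up-down (Z) channels at zero.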

While audio mixing can become a complicated process for someone working with multiple microphone recordings in a professional sound-editing program, it can also be done simply and fairly quickly, especially if you’re working with only the output from a tetrahedral mic or a camera like the Vuze. Mixing a 3D audio “bed” with the sound from a lavalier mic that was placed on a wedding ceremony officiant can be done with relatively simple tools. “For someone who’s just getting into it and wants to spatialize or mix in a couple mono sources,” Glasscock says, “there are a few free plug-in suites available for standard digital audio workstations and even for editing in a video editor like Premiere.” Matthias Kronlachner’s ambiX plug-in suite and Trond Lossius’ Ambisonic Toolkit are among the most popular tools for working with spatial audio.
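Placing a mono source such as the officiant's lavalier into the ambisonic bed comes down to a little trigonometry, which the plug-ins handle behind a pan control. The sketch below is a minimal illustration under stated assumptions, not the method used by any particular plug-in: the function names are hypothetical, azimuth is measured counterclockwise from straight ahead (so positive angles are to the left), and the W channel carries the source at full level, as in the SN3D normalization. Other conventions scale W differently.

```python
import math


def encode_mono(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample as first-order B-format (W, X, Y, Z).

    Assumptions: azimuth is counterclockwise from the front, elevation is
    upward from the horizon, and W is unscaled (SN3D-style normalization).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample
    x = sample * math.cos(az) * math.cos(el)  # front-back
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    return w, x, y, z


def mix_spot_into_bed(bed, spot, azimuth_deg, elevation_deg=0.0, spot_gain=1.0):
    """Add a mono spot-mic track to an ambisonic bed, sample by sample.

    bed  : list of (w, x, y, z) tuples from the VR microphone
    spot : list of mono samples of the same length
    """
    mixed = []
    for (bw, bx, by, bz), s in zip(bed, spot):
        sw, sx, sy, sz = encode_mono(s * spot_gain, azimuth_deg, elevation_deg)
        mixed.append((bw + sw, bx + sx, by + sy, bz + sz))
    return mixed
```

Calling `mix_spot_into_bed(bed, lav_track, azimuth_deg=0.0, spot_gain=0.8)` would place the lavalier straight ahead of the viewer at slightly reduced level, which suits an officiant standing in front of the camera.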

Once your sound has been mixed and muxed with your video in a video editor, the output has to be prepared for the specific platform on which you want to distribute it. “Each platform has a different set of metadata requirements and file format requirements,” Glasscock explains. “They differ between YouTube and Facebook.” Fortunately, both of those sites provide detailed tutorials on exporting your VR video for display on their platforms. VR headsets from Google and Oculus, which is owned by Facebook, also use the YouTube and Facebook formats, respectively. Because each platform that supports audio in the ambisonic B-format renders the audio a little differently, there may be slight differences in the way sound comes out on different platforms. However, the spatialized effect will be intact on all of them, and you can use your spatial audio plug-in to preview playback for specific platforms and optimize sound for a particular one.
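Part of what a player does when it renders B-format audio is rotate the sound field to follow the viewer's head before converting it to a headphone signal. The rotation step for a left-right head turn can be sketched as follows. This is an illustrative simplification, not a description of how YouTube, Facebook or any headset actually implements playback: the function name is hypothetical, angles follow the assumption that a positive yaw means turning to the left, and the final conversion to two headphone channels is left out.

```python
import math


def rotate_for_head_yaw(w, x, y, z, head_yaw_deg):
    """Rotate one B-format sample to compensate for a head turn.

    Assumption: head_yaw_deg is positive when the viewer turns to the left.
    Sounds then shift toward the viewer's right by the same angle, so a
    voice that was straight ahead stays anchored to the scene.
    """
    t = math.radians(head_yaw_deg)
    x_rot = x * math.cos(t) + y * math.sin(t)
    y_rot = y * math.cos(t) - x * math.sin(t)
    # W (overall level) and Z (up-down) are unaffected by a left-right turn.
    return w, x_rot, y_rot, z
```

With this convention, a source directly in front of the camera moves into the negative side of the Y channel, meaning the viewer's right, when the head turns 90 degrees to the left.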


As new tools for capturing immersive audio come to market, new questions about how—and whether—to use them will arise as well. For filmmakers capturing VR video at events like weddings, one of the main questions will be how much of everything everyone really wants to hear. As Humaneyes sales director Jeff Miller points out about the Vuze camera, “if you set it down at the family table, you’ll be able to hear everything.” When placing a VR camera with spatial audio capture, wedding shooters will need to consider not only how guests nearby might feel about having their every sniffle and snipe recorded, but also how the captured sound will contribute to the final video. Hearing everyone in the scene creates a realistic ambiance, but it can also clutter the footage with random chatter. “Who would want that?” asks Miller. “It’s just not interesting.”

But others see an opportunity for event filmmakers to craft a rich experience for their clients that was previously unavailable. “If you place the viewer in a wedding scene and you don’t hear Grandma or the aunt behind you coughing or laughing or crying, I think that removes you from the experience,” Glasscock notes. “I think audio is extremely important to have spatialized. It’s about 50 percent or more of the experience in 360-degree video. It gives the viewer a truly immersive experience from the audio perspective that they would have if they were actually there.”

Aimee Baldridge is a New York-based writer who covers the art, technology and business of photography and filmmaking.  
