Who will define the immersive video experience file format? MPEG, Apple, Adobe or Facebook?

Tuesday, 22 August 2017

We have file formats and codecs to store 2D video as seen from a single point. Soon we will need ways recording light information in a 3D space, so immersed viewers will be able to move around inside and choose what to look at, and where to look from.

In 1994 Apple tried to kick off VR on the Mac using an extension to their QuickTime video framework: QuickTimeVR. As with the Newton personal digital assistant, it was the right idea, wrong time.

Today different are companies are hoping to earn money from creating VR and AR experience standards, markets and distribution systems. The Motion Pictures Experts Group think it is time to encourage the development of a standard - so as to prevent multiple VR and AR ‘walled gardens’ (where individual companies hope to capture users in limited ecosystems).

This summer Apple announced that their 4K+ codec of choice is HEVC. That can encode video at very high resolutions. Apple also plan to incorporate depth information capture, encoding, editing and playback into iOS and macOS. 

Structured light encoding

Knowing the depth of the environment corresponding to every pixel in a flat 2D video frame is very useful. With VR video, that flat 2D video can represent all the pixels from the point of view of a single point. Soon we will want more. Structured light recording is more advanced. It captures the light in a given 3D volume. Currently light field sensors do this by capturing the light information arriving at multiple points on a 2D plane (instead of the single point we use today in camera lenses). The larger the 2D plane, the larger the distance viewers will be able to move their heads when immersed in the experience to see from different points of view. 

However the light information is captured, we will need file formats and codecs to encode, store and decode structured light information.

Streaming Media has written about MPEG-I, a standard that is being developed:

The proposed ISO/ IEC 23090 (or MPEG-I) standard targets future immersive applications. It's a five-stage plan which includes an application format for omnidirectional media (OMAF) "to address the urgent need of the industry for a standard is this area"; and a common media application format (CMAF), the goal of which is to define a single format for the transport and storage of segmented media including audio/video formats, subtitles, and encryption. This is derived from the ISO Base Media File Format (ISOBMFF).

While a draft OMAF is expected by end of 2017 and will build on HEVC and DASH, the aim by 2022 is to build a successor codec to HEVC, one capable of lossy compression of volumetric data.

"Light Field scene representation is the ultimate target," according to Gilles Teniou, Senior Standardisation Manager - Content & TV services at mobile operator Orange. "If data from a Light Field is known, then views from all possible positions can be reconstructed, even with the same depth of focus by combining individual light rays. Multiview, freeview point, 360° are subsampled versions of the Light Field representation. Due to the amount of data, a technological breakthrough – a new codec - is expected."

This breakthrough assumes that capture devices will have advanced by 2022 – the date by which MPEG aims to enable lateral and frontal translations with its new codec. MPEG has called for video test material, including plenoptic cameras and camera arrays, in order to build a database for the work.

Already too late?

I wonder if taking until 2022 for MPEG to finish work on MPEG I 1 will be too late. In 2016 there was debate about the best way of encoding ambisonic audio for VR video. The debate wasn't settled by MPEG or SMPTE. Google’s YouTube and Facebook agreed on the format they would support. That became the de facto standard.

Apple have advertised a job vacancy for a CoreMedia VR File Format Engineer with ‘Direct experience with implementing and/or designing media file formats.’

Facebook have already talked about 6 degrees of freedom video at their 2017 developer conference. They showed alpha versions of VR video plugins from Mettle running in Premiere Pro CC for 6DoF experiences. Adobe have since acquired Mettle.

Facebook won’t want to wait until 2022 to have serve immersive experiences where users will be able to move left, right, up, down, back and forth while video plays back.

Now the race is on to define the first immersive video file format.