What is Field-of-View Adaptive Streaming (FOVAS), and how does it optimize VR video for the highest possible quality at the lowest bandwidth requirements, for any given VR headset?

VR videos are big. REALLY big.
Pixvana is introducing unique field-of-view transforms to tackle the challenge of streaming them efficiently over networks. VR video begins as simultaneous streams from multiple cameras, sometimes as many as 24. These streams are then stitched together to create a spherical view that surrounds the viewer. This giant sphere can range from HD resolution to more than 10,000 pixels wide! The finished VR video is then typically squeezed down into 4K resolution (3840 x 1920) so that it can be transmitted over a standard network. But 4K isn’t nearly enough resolution to make a truly lifelike experience with today’s head-mounted displays. Often this results in soft and muddy video that breaks the feeling of immersion.
Why 4K VR Video Streaming Doesn’t Work
Streaming a VR video in 4K leads to poor quality and wasted bandwidth. Today’s VR headsets display a field-of-view (FOV) of ~90 degrees, which covers only about a sixth of the entire video sphere. Looking through this viewport at 4K VR video, the viewer sees fewer pixels than the headset display is capable of rendering, which makes the video look soft and blocky. At any moment, more than 80% of the video is out of view and unseen by the viewer: a huge waste of data bandwidth. Here's a video that elaborates on viewports for VR adaptive streaming:
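The "about a sixth" figure can be checked with a quick solid-angle calculation. This is a sketch that treats the headset FOV as a 90-degree-square rectangular pyramid, which is an approximation of real optics:

```python
import math

def visible_fraction(h_fov_deg, v_fov_deg):
    """Fraction of the full sphere covered by a rectangular field of view,
    using the solid-angle formula for a rectangular pyramid:
    omega = 4 * arcsin(sin(h/2) * sin(v/2))."""
    h = math.radians(h_fov_deg)
    v = math.radians(v_fov_deg)
    omega = 4 * math.asin(math.sin(h / 2) * math.sin(v / 2))
    return omega / (4 * math.pi)

frac = visible_fraction(90, 90)
print(f"visible: {frac:.1%}, out of view: {1 - frac:.1%}")
# visible: 16.7%, out of view: 83.3%
```

A 90 x 90 degree viewport sees exactly one sixth of the sphere, so more than 80% of a uniformly encoded stream is pixels the viewer never sees.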
Viewports = FOV optimized streams for VR Video
The solution is to deconstruct the VR video into multiple streams, each of which acts as a “viewport” into a region of the video sphere. Each of these viewports can now contain enough pixels to perfectly fill a VR headset’s maximum resolution. A viewport will contain both the current FOV as well as a lower-resolution version of the out-of-view areas to maintain peripheral vision for the viewer.
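As a sketch of the idea, a viewport set might be described like this. The viewport count, FOV, and periphery scale below are illustrative assumptions, not Pixvana's actual parameters:

```python
from dataclasses import dataclass

@dataclass
class Viewport:
    """One FOV-optimized stream: full resolution inside the field of view,
    reduced resolution in the out-of-view periphery."""
    center_yaw_deg: float          # direction this viewport faces
    fov_deg: float = 90.0          # high-resolution region
    periphery_scale: float = 0.25  # periphery encoded at 1/4 resolution (illustrative)

def build_viewport_set(count: int = 8) -> list:
    """Cover the equator with `count` evenly spaced, overlapping viewports."""
    step = 360.0 / count
    return [Viewport(center_yaw_deg=i * step) for i in range(count)]

for vp in build_viewport_set(4):
    print(vp.center_yaw_deg)   # 0.0, 90.0, 180.0, 270.0
```

Because every viewport still covers the whole sphere (at reduced peripheral resolution), the viewer never sees a black hole while the player switches streams.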
Viewboxes = FOV optimized encoding for VR Video
To accomplish this mix of high and low resolution image areas within a single video stream, we need to use an image projection technique that is non-proportional. We call these projections a Viewbox.
A VR “Viewbox” is the projection model used to represent a video stream as if it were a spherical image surrounding the viewer.
To understand the magic of Viewboxes and viewports, it helps to have a good understanding of the inherent challenge of representing a sphere as a rectangle.
The problem with Greenland
Much as old-time navigators needed a way to lay a map out on a table, today's VR producers need a way to review spherical images on their flat computer monitors.
Mapping a spherical image into a flat one has vexed cartographers for centuries. For example, Google Maps represents the spherical earth using the Mercator projection that stretches objects near the poles. Greenland, it turns out, is not the size of Africa. It only looks that way because it is positioned on the map near the pole. Move the country to the equator, and we see that it is closer in size to Mexico.
A similar projection, called an equirectangular projection, is commonly used to represent spherical VR video within traditional flat video files. (An equirectangular projection is also called “lat-long,” since latitude and longitude map to evenly spaced rows and columns.) It is a convenient view of the whole sphere, but it spends far too many pixels covering the top and bottom of the image, greatly misrepresenting the size of everything near the poles. Those extra pixels carry redundant information, wasting bandwidth and making Greenland huge again.
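The oversampling is easy to quantify: every row of an equirectangular image holds the same number of pixels, but the circle of latitude it represents shrinks toward the poles by the cosine of the latitude. A quick sketch:

```python
import math

def equirect_oversampling(lat_deg):
    """How many times more pixels an equirectangular row stores than the
    sphere actually needs at a given latitude: every row has the same
    pixel count, but the circle of latitude it covers shrinks by
    cos(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))

for lat in (0, 45, 60, 80):
    print(f"latitude {lat:2d}: {equirect_oversampling(lat):.1f}x oversampled")
# latitude  0: 1.0x ... latitude 80: 5.8x
```

At 80 degrees of latitude the format stores almost six times the pixels the sphere needs; that is Greenland's inflation expressed in bandwidth.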
There are alternative projections that more accurately represent elements on the sphere, and others that more accurately show the relative sizes of objects across it. A sinusoidal projection accurately reflects the actual resolution at the top and bottom of the sphere. Buckminster Fuller proposed the Dymaxion map, which unfolds the world onto a 20-sided solid (an icosahedron). If you’ve ever played Dungeons and Dragons, this is the same shape as a d20 die. Less distortion means better image quality. Unfortunately, these are inefficient techniques for video encoding: encoders can’t handle the irregular shapes, and the result is ugly motion artifacts where the edges meet.
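For contrast with the equirectangular case, here is a sketch of how a sinusoidal projection sizes each row: it scales the row by the cosine of its latitude, so pixel density matches the sphere and nothing is oversampled near the poles (the 3840-pixel width is just an example):

```python
import math

def sinusoidal_row_width(full_width, lat_deg):
    """Pixel width of one row in a sinusoidal projection: each row is
    scaled by cos(latitude), matching the sphere's real circumference at
    that latitude so no pixels are wasted near the poles."""
    return int(round(full_width * math.cos(math.radians(lat_deg))))

for lat in (0, 45, 60, 80):
    print(lat, sinusoidal_row_width(3840, lat))
```

The shrinking rows are exactly what defeats video encoders: the image is no longer a rectangle, and the jagged boundary produces the motion artifacts described above.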
Here is a video with some side-by-side examples of Viewbox projections applied to the same video, each of which optimizes the video stream to show more resolution in the direct line of sight of the viewer, facing the actresses.
Adaptive Streaming VR Video with Viewboxes & Viewports
In Pixvana’s system, we combine a viewbox image projection with multiple viewports to form an adaptive streaming solution for high-quality VR video.
- We switch the video stream to the optimal viewport based on where the viewer’s head is turned at any moment
- Our smart player takes head position, network conditions, codec, and projection into account, so a viewer gets the best quality no matter where they look
- Content creators get the best of both worlds: better resolution where the viewer is looking, and lower streaming costs because less data needs to be streamed.
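The stream-switching step can be sketched as a nearest-viewport lookup. This toy version tracks yaw only; a real player would also weigh pitch, network conditions, and codec as noted above:

```python
def nearest_viewport(head_yaw_deg, viewport_yaws_deg):
    """Return the viewport center angularly closest to the viewer's yaw."""
    def angular_distance(a, b):
        # shortest way around the circle between two headings
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return min(viewport_yaws_deg, key=lambda c: angular_distance(head_yaw_deg, c))

yaws = [0, 45, 90, 135, 180, 225, 270, 315]  # 8 viewports around the equator
print(nearest_viewport(350.0, yaws))  # -> 0 (wraps past 360)
print(nearest_viewport(130.0, yaws))  # -> 135
```

The wrap-around in `angular_distance` matters: a viewer looking at 350 degrees is only 10 degrees from the viewport facing 0, not 350.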
Pixvana has explored a range of projections that provide better resolution in a viewport. We call these projections Viewboxes. One example of a viewbox is the commonly used projection of a sphere onto a cube. A standard cube map lowers distortion because its faces are flat, but because all six faces carry the same amount of data, it wastes bandwidth on high-resolution video behind the viewer’s head. Simply lowering the resolution of some faces would cause noticeable degradation whenever the viewer turned toward them.
Think of a viewbox as a lampshade that you pull over your head when you are in VR. If the front of the viewbox is larger than the rear, you have a shape called a frustum. When the viewer watches VR video in a frustum-shaped viewbox, they perceive that they are in a sphere, but the video is sharpest when they look straight ahead at the front face, and becomes softer as they turn around.
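A rough pixel-budget comparison shows why a frustum-style viewbox beats a uniform cube map. This is a simplified model with illustrative numbers, not Pixvana's actual projection: the front face keeps full resolution while the other five faces are uniformly downscaled:

```python
def cube_map_pixels(face_px):
    """Standard cube map: six faces at identical resolution."""
    return 6 * face_px * face_px

def fov_viewbox_pixels(face_px, periphery_scale=0.5):
    """Simplified frustum-style viewbox: the front face keeps full
    resolution; the other five faces are downscaled by periphery_scale.
    (Illustrative model only, not Pixvana's actual projection.)"""
    side = int(face_px * periphery_scale)
    return face_px * face_px + 5 * side * side

full = cube_map_pixels(1024)
fov = fov_viewbox_pixels(1024)
print(f"cube map: {full:,} px  viewbox: {fov:,} px  saving: {1 - fov / full:.1%}")
# cube map: 6,291,456 px  viewbox: 2,359,296 px  saving: 62.5%
```

Even with the periphery at only half resolution, the viewer still sees a full sphere; the savings come entirely from pixels that were behind their head.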
Combine viewport tracking with multiple viewboxes built for different head positions, and you have Pixvana’s optimized system for adaptive streaming of VR video. Our system adapts to the resolutions of both the source cameras and the headsets, delivering just the right pixels from the original to match the bandwidth of the network and the resolution of the headset.
Field-of-View Adaptive Streaming (FOVAS) optimizes VR video for the highest possible quality at the lowest bandwidth requirements, for any given VR headset.
The Pixvana XR Cloud
We are building a platform for VR video creation and delivery that we call the "XR Cloud", which includes native support for Field-of-View Adaptive Streaming processing. Our goal is to put an amazing cloud-based engine in the hands of storytellers so they can create VR, AR, and MR (... X-R!) videos of the highest quality that can reach the widest possible audience of consumers, across the many head-mounted displays coming to market in the years ahead. VR video created with our system will be viewable on multiple platforms, including the Samsung Gear VR, HTC Vive, PlayStation VR, Google Cardboard, and a common web player. We will enable Field-of-View Adaptive Streaming delivery through an SDK as well as a Unity plug-in that developers can add to their existing VR applications.
With Pixvana, higher quality VR video means that VR films will feel more immersive and lifelike, giving everyone a better experience.