Lifecast creates volumetric video software that takes footage recorded with 180-degree cameras and turns it into 3D scenes for use in virtual production and mixed-reality experiences. We spoke with Lifecast’s CEO, Forrest Briggs, Ph.D., about their new Volumetric Video Editor tool and what roles machine learning and AI will play in the future of content creation.
Can you give me the basic rundown of how the Volumetric Video Editor works and some use cases?
The Volumetric Video Editor (VVE) is a professional tool for Mac and Windows that makes it possible to render, edit, preview, and encode volumetric video (or photos).
Typically, volumetric video is captured in a light stage with many cameras pointing inward at a person but not capturing the background environment, or with depth sensors, which have limited resolution and field of view.
Instead, we use cameras with dual 180-degree lenses, combined with machine learning, to construct a 3D model of each frame of video. This approach makes it possible to capture a complete, immersive volumetric scene, with both foreground and background, in any environment.
The use cases include virtual production, virtual or mixed reality, games, and web development.
Lifecast’s tools work with any VR180 camera as input, including the Canon EOS R5 or R5C, CALF, FXG Duo, ZCam K1-Pro, Lenovo Mirage, and others.
VVE also includes some basic tools for editing the depth maps and alpha channels in the volumetric video, and a live 3D preview of the results.
Once processed, the volumetric video can be compressed into standard mp4, jpg, or png files, then played back in our open-source volumetric video player projects for Unreal Engine, Unity, and JavaScript (which supports desktop, mobile, and VR viewing).
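As a rough illustration, embedding one of these encoded files in a web page might look like the sketch below. The module name, class, and options are hypothetical placeholders, not the actual API of Lifecast's open-source JavaScript player; the real entry points are documented in the player's repository.

```ts
// Hypothetical sketch of embedding an LDI3-encoded volumetric video on a web page.
// "lifecast-player", "LifecastPlayer", and the option names are placeholders,
// not the real API of Lifecast's open-source player.
import { LifecastPlayer } from "lifecast-player";

const player = new LifecastPlayer({
  container: document.getElementById("volumetric-view") as HTMLElement,
  src: "scene_ldi3.mp4", // a standard mp4 file carrying the volumetric encoding
  enableVR: true,        // show a WebXR "Enter VR" button where supported
});

player.play();
```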
Is this a 2.5D parallax effect or are you able to move through a 3D space?
Yes, you are able to move a virtual camera in 3D space with six degrees of freedom (6DOF): forward/backward, left/right, up/down, pitch, roll, and yaw.
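For readers unfamiliar with the term, a 6DOF pose is just a 3D position plus an orientation. The sketch below is purely illustrative (not Lifecast's API) and shows one common way to represent it.

```ts
// Illustrative 6DOF camera pose: three translational and three rotational
// degrees of freedom. Names and units are for explanation only.
interface CameraPose6DOF {
  x: number;     // left/right, in meters
  y: number;     // up/down
  z: number;     // forward/backward
  pitch: number; // rotation about the x axis, in radians
  yaw: number;   // rotation about the y axis
  roll: number;  // rotation about the z axis
}

// Example: dolly the virtual camera 10 cm to the left without rotating it.
const pose: CameraPose6DOF = { x: 0, y: 0, z: 0, pitch: 0, yaw: 0, roll: 0 };
const dollied: CameraPose6DOF = { ...pose, x: pose.x - 0.1 };
```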
There are some trade-offs compared with photogrammetry or NeRF, which require tens or hundreds of images of a scene that is not moving.
In contrast, we are creating volumetric video with only two images from each moment in time.
With only two images, the result deteriorates the farther the virtual camera moves from its original location. This is still an improvement over current-generation 3D VR video, which responds when a user rotates their head but not when they move side to side, which can cause motion sickness.
Our 3D reconstruction of each frame of video is sufficient to correctly render the scene in response to such motion, which makes the experience more comfortable and immersive.
For virtual production, we can render photorealistic shots where the virtual camera moves a small amount; for example, this would be useful for a dolly shot where we want to see some parallax in the background.
This article goes into more detail about our volumetric video format using layered depth images.
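In broad strokes, a layered depth image stores a small number of color-plus-depth layers per view rather than a full 3D mesh. The sketch below is a simplified illustration of that idea, under my own naming, and is not Lifecast's actual LDI3 file layout.

```ts
// Simplified illustration of a layered depth image (LDI).
// Each layer is a full image with per-pixel color, alpha, and depth;
// nearer layers occlude farther ones, and alpha lets background layers
// show through where the foreground is absent.
interface LdiLayer {
  rgba: Uint8ClampedArray; // width * height * 4 color + alpha values
  depth: Float32Array;     // width * height depth values, in meters
}

interface LayeredDepthImage {
  width: number;
  height: number;
  layers: LdiLayer[];      // e.g. three layers in an "LDI3"-style encoding
}

// To render, each layer is projected into 3D using its depth map and the
// camera intrinsics, then composited back-to-front from the new viewpoint.
```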
How does the depth estimation process work?
Lifecast has spent years developing a custom pipeline for 3D geometric computer vision. The depth estimation uses a transformer-based neural net with temporal stabilization, and it is robust to calibration error (which is common in VR180 cameras).
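Lifecast has not published the internals of this pipeline, but a common way to temporally stabilize per-frame depth predictions is to blend each new depth map with the smoothed result from the previous frame. The sketch below shows that generic idea only; it is not Lifecast's method, and a production system would also compensate for camera and scene motion before blending.

```ts
// Illustrative exponential-moving-average smoothing of per-frame depth maps.
// This is a generic stabilization technique, not Lifecast's published pipeline.
function smoothDepth(
  prevSmoothed: Float32Array | null, // smoothed depth from the previous frame
  current: Float32Array,             // raw depth prediction for this frame
  alpha = 0.8                        // weight on the new frame, between 0 and 1
): Float32Array {
  if (prevSmoothed === null) return current.slice(); // first frame: nothing to blend
  const out = new Float32Array(current.length);
  for (let i = 0; i < current.length; i++) {
    out[i] = alpha * current[i] + (1 - alpha) * prevSmoothed[i];
  }
  return out;
}
```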
How does this integrate with Unreal and virtual production?
We provide a free, open source volumetric video player for Unreal Engine. Here are a couple of tutorials on how to set it up in Unreal:
- Tutorial: Virtual production with volumetric video and text-to-3D in Unreal Engine
- Tutorial: Volumetric video in Unreal Engine
In virtual production, one of the challenges is creating convincing 3D environments to be rendered on an LED wall in Unreal Engine or Unity.
Lifecast’s software can save time and money in this process, using photorealistic volumetric video captured with VR180 cameras in any environment.
Video is important to make the background feel convincing and alive.
Lifecast’s software has some limitations, so it is not a replacement for hand-made 3D environments in every virtual production scenario, but it works well for shots where the virtual camera only needs to move a small amount.
Imagine you are filming a dialogue with highly paid actors in a remote location. Later, for some reason, you need to re-film some of the shots. It would be expensive to bring the actors back on location, and it might be impossible to recreate the same weather conditions.
Why not capture the environment with a VR180 camera, just in case this situation arises?
Then Lifecast’s software makes it possible to recreate those shots in a virtual production studio, at a significantly lower cost.
We are also building a library of royalty-free volumetric videos that are ready to use for virtual production.
What role do you see ML and AI playing in the future of content creation?
Immersive and/or volumetric media will gradually displace 2D photos and videos. Capturing volumetric video from the real world is a challenging problem in machine learning and computer vision. Machine learning will be a core component of all systems for capturing volumetric media.
AI art generators such as Stable Diffusion will be increasingly important for content creation.
At Lifecast, we are exploring how to generate immersive volumetric media from text prompts with diffusion models. This uses the exact same LDI3 volumetric format that we use for photos and videos. Text-to-volumetric-scene generation is available at holovolo.tv (by Lifecast).
One of the interesting things we learned while making immersive volumetric video is that when you give people the freedom to move the virtual camera, they will inevitably try to look somewhere the real cameras never captured, and they expect to see something plausible.
We are using AI to fill in these missing parts of volumetric scenes. There is a continuum of volumetric media, ranging from mostly real-world capture with missing parts filled in by AI, to whole scenes completely imagined by AI, and everything in between.