They each posted a video of the visit on their channels, but Cleo’s goes into more detail and showcases the three distinct tech setups they demoed.
Here’s the chapter breakdown from Cleo’s video if you want to jump to a specific section:
And here’s MKBHD’s video.
Both videos were pretty high-level primers aimed at an audience with very little knowledge of virtual production.
Fortunately, we were able to talk to ZeroSpace’s Head of Research Engineering, Evan Clark, to get a more detailed breakdown of the gear they were using and what a more democratized future of virtual production for content creators could look like.
Can you give me a breakdown of the tech being used in each of the three setups they demoed?
So starting with the LED volume (a joint venture with 4Wall Entertainment): it’s an Unreal Engine 5-powered, 65 ft curved ICVFX build driven by Brompton LED processing. The system can currently run at up to 120 Hz, with capacity to scale.
Brompton frame remapping is used to cycle through different images on the volume for multicam support.
We chose these cameras because their full-frame sensors are what really make frame multiplexing possible. Everything on the system is synchronized via genlock to any frame rate between 24 and 60 Hz.
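To make the multicam idea above concrete, here is a minimal sketch of how frame remapping shares a high-refresh LED wall between genlocked cameras. The round-robin scheduling and the specific numbers (a 120 Hz wall, two cameras) are illustrative assumptions, not ZeroSpace's actual implementation; the real slot assignment is handled by the Brompton processors.

```python
# Sketch of frame multiplexing: the LED wall cycles through each camera's
# backdrop on successive refreshes, and each genlocked camera's shutter is
# timed so it only integrates its own slots. Numbers here are hypothetical.

WALL_HZ = 120  # wall refresh rate (illustrative)

def slot_schedule(num_cameras: int, num_slots: int) -> list[int]:
    """Round-robin assignment of wall refresh slots to camera indices."""
    return [slot % num_cameras for slot in range(num_slots)]

def slots_for_camera(cam: int, num_cameras: int, num_slots: int) -> list[int]:
    """Refresh slots during which `cam`'s backdrop is shown on the wall."""
    return [s for s, owner in enumerate(slot_schedule(num_cameras, num_slots))
            if owner == cam]

if __name__ == "__main__":
    # One second of wall refreshes shared by two cameras:
    sched = slot_schedule(2, WALL_HZ)
    print(sched[:8])                              # alternating ownership
    print(len(slots_for_camera(0, 2, WALL_HZ)))   # effective fps per camera
```

With two cameras sharing a 120 Hz wall, each camera ends up with an effective 60 fps backdrop, which is why the wall's headroom above the camera frame rate matters.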
For motion capture, we run a 12-camera Vicon Vero setup capable of tracking up to three people at 120 FPS, with the ability to track a number of additional props.
The system is all synchronized via Ambient to our master clock, which guarantees perfectly timed frame-to-video comparisons of the data.
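The value of that shared master clock can be shown with a small sketch: when the mocap rate is an integer multiple of the video rate, every video frame aligns with an exact, predictable group of mocap samples. The 120/24 rates come from the interview; the mapping function itself is a generic illustration, not ZeroSpace's software.

```python
# Why a common clock matters: mocap at 120 FPS genlocked with video at
# 24 FPS means each video frame lines up with exactly 5 mocap samples,
# so data-to-video comparisons need no interpolation or guesswork.

MOCAP_FPS = 120
VIDEO_FPS = 24

def mocap_samples_for_video_frame(frame: int) -> range:
    """Mocap sample indices falling within a given video frame."""
    ratio = MOCAP_FPS // VIDEO_FPS   # assumes an integer rate ratio
    return range(frame * ratio, (frame + 1) * ratio)

print(list(mocap_samples_for_video_frame(0)))  # [0, 1, 2, 3, 4]
print(list(mocap_samples_for_video_frame(2)))  # [10, 11, 12, 13, 14]
```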
More recently we’ve added markerless tracking via Captury. We can flex up to 30 cameras to cover massive spaces and track several actors in the volume, all without suits, making us the most advanced motion capture stage on the East Coast.
The volumetric capture rig is composed of an array of Azure Kinects. It all feeds back to a main server that ingests the data into Depthkit and generates an RGB-packed data stream, which can be loaded into anything that implements their SDK to produce real-time volumetric holograms.
Marques was pretty interested in the possibility of having a virtual studio to keep changing sets. He’s also probably one of the few YouTubers who has the resources to build his own volume (he has a $250K Motorized Precision Cinema Robot to record product shots).
Let’s say he went that route – what would be the gear setup needed and what kind of extra crew (or training) would he need to operate it?
It really depends on one thing: what type of shots do you actually want to get? So many people come to us wanting to use this tech with only a basic understanding of what each piece is capable of, or of the time and experience needed to create the virtual art that makes it effective.
To give a precise answer to that is really challenging; what I will highlight is his interest in frame multiplexing.
In that case, it starts with:
- Picking the right LED tile: how accurate is the color rendering, what pixel density do you want, and how close do you need to shoot to the volume?
- Choosing whether you want to do frame multiplexing.
- Deciding how many cameras, and what type of cameras, you want to use. Then it comes down to which platform works best: nDisplay, Disguise, StypeLand, etc. Each has its own pros and cons, including but not limited to cost, complexity, and hardware density.
In our space, we currently need roughly 3 – 4 people to run the volume (one managing screens, one managing Unreal, one managing color, and one managing camera) in addition to your usual production roles. But each shoot is different.
One thing people also need to take into account is the content: you need a team of trained real-time artists who can craft photoreal worlds that are optimized to run on a large canvas but are also easy to work with from a cinematography standpoint, adding another 2 – 3 people on top of the operation.
Can you break down how the volumetric video capture works and are there any potential new uses for it once the Apple Vision Pro comes out?
Volumetric video uses a number of different camera perspectives that are solved into a point cloud.
In Scatter’s case it uses depth sensors plus RGB data, but it can also be done with RGB data alone through tools like RealityCapture or Meshroom (though those are more for stationary photogrammetry).
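The depth-plus-RGB approach can be sketched with the standard pinhole back-projection: each depth pixel is lifted into 3D using the camera intrinsics, then paired with its color. This is a minimal illustration of the general technique, not Depthkit's actual pipeline, and the intrinsic values (`fx`, `fy`, `cx`, `cy`) below are made-up numbers.

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map into a colored point cloud.

    depth: (H, W) array in metres; rgb: (H, W, 3) array.
    Returns an (N, 6) array of XYZRGB points for pixels with valid depth.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx        # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    cols = rgb.reshape(-1, 3)
    valid = pts[:, 2] > 0        # drop pixels with no depth reading
    return np.hstack([pts[valid], cols[valid]])

# Tiny synthetic example: a 2x2 depth map, everything 1 m from the sensor.
depth = np.ones((2, 2))
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
cloud = depth_to_point_cloud(depth, rgb, fx=500.0, fy=500.0, cx=1.0, cy=1.0)
print(cloud.shape)  # (4, 6)
```

A multi-sensor rig like the one described runs this kind of solve per camera, then merges the clouds into one shared coordinate space using each sensor's calibrated pose.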
Really this tech does have some maturing to do before it is ready for something like Apple Vision Pro, but the uses are endless.
As Marques said in Cleo’s video, one great example is taking a volumetric capture of a product and then being able to move around it in post, redesigning your camera shots ad hoc.
We currently have a few things in the pipeline targeting the future of spatial computing like this, but I think we are very excited about the prospects of being able to create truly hybrid events via this and other mechanisms.
What would need to happen (new hardware, easier software, etc) for smaller productions/content creators to take advantage of VP?
I talk a lot about the democratization of VP, and to some extent it already has happened.
I started in my living room with a green screen, Unreal Engine, and some production knowledge and got away with creating the content that led me to where I am today.
I think one thing that is already happening is the increase in compute density, meaning you need fewer machines to do the math that makes any of this possible.
There is still, however, a lot of growth in knowledge needed: understanding the nuances of how you make something look cinematic with these technologies, how and when to use them for the right shot, and how to set these systems up and get them to cooperate in a way that is easily manageable.
I think after that, the main challenge left is your virtual art direction: how do you build something you want to see come to life?