How the MOCAP tools behind Avengers’ Thanos brought “The Quarry” to life

Ted Raimi in “The Quarry.” (Images courtesy of 2K Games)

Performance-capture acting for video games is often a sterile experience. An actor straps on a helmet camera and performs their lines in a sound booth, then that data gets transferred to a game engine where it’s put on an in-game skeleton for artists to review and clean up. With “The Quarry,” the process felt more like filming a stage play, according to Aruna Inversin, creative director and visual effects supervisor for Digital Domain, the VFX studio that partnered with developer Supermassive Games on their spiritual successor to “Until Dawn.”

The Oscar-winning studio has produced visual effects for movies like “Titanic,” “The Curious Case of Benjamin Button” and several Marvel films. To create the photorealistic characters seen in “The Quarry,” it used the AI facial capture system Masquerade, which was developed to replicate Josh Brolin’s likeness for his character Thanos in “Avengers: Infinity War.” Masquerade was originally designed to do one thing: take the performance from a head-mounted camera and translate it into a digital mesh that could then be rendered in a movie. For “The Quarry,” the VFX team needed something that could track the movement and facial expressions of actors and create digital characters that could be edited in real time. So they built Masquerade 2.0.

This new tech expedited production tenfold. Actors would perform their scenes, Digital Domain would upload all the performances onto a computer with their body performance, head performance and audio time-coded and synced, then send that to Supermassive to review in the game engine. The developer would provide feedback, and by the next day “The Quarry” director Will Byles, who also directed “Until Dawn,” could watch the footage to determine if any particular performance needed to be reshot. In total, Digital Domain said it rendered 250 million frames for “The Quarry” — far more than it would have for a typical film.

Like “Until Dawn,” “The Quarry” is an interactive spin on classic slasher and horror films. A group of camp counselors get stuck overnight after camp has ended and are hunted by werewolves; whether they survive the night is determined by the player, who makes decisions like whether to investigate a mysterious noise or take an ill-advised detour to go skinny dipping.


Transposing the actors’ performances onto their in-game character models involved a multistep process Inversin called the “DD pipeline.” First, the team conducted facial scans of each cast member to build a library of face shapes that Supermassive could reference to create their characters in the game. Then they filmed their performances at Digital Domain’s performance capture stage in Los Angeles. Each day, the actors would don full motion-capture suits and facial rigs to record their expressions. Their faces were covered in dots, which the cameras tracked along with the markers on their suits to triangulate and map their movements to a virtual skeleton using Masquerade 2.0.


To calibrate the equipment correctly, the actors performed range-of-motion tests, ensuring the data was tracked consistently across different shoots, said Paul Pianezza, a senior producer at Digital Domain. Maintaining that consistency for the actors’ faces was a bit more involved: The team constructed molds of each actor’s face, placed the dots on the mold and drilled holes in it to create a physical template, which they then used to ensure the dots were placed in similar locations throughout shooting. If some of the markers were blocked or missing during a shot, Masquerade’s AI could automatically fill in the blanks.

There’s a human element to the DD pipeline too, Inversin said. Artists analyzed the footage and identified the actor’s expressions — a smile, a frown, a scream, etc. — to note how the dots moved in those instances, which Inversin compared to establishing key frames for each character.

The team used the data recorded throughout these shoots, facial scans and range-of-motion tests to build a library of each actor’s idiosyncrasies that could then be used to train their AI so the computer could read their movements and expressions correctly. As Inversin explained: “machines only are as good as what you can feed [them] in terms of information.”

The team also worked with Supermassive to line up objects on the set with their in-game locations using the stage’s gridded layout.

“So if someone is opening the door and peeking through, they’re moving that door and we’re capturing that data,” Pianezza said.

Filming began around the same time as the covid-19 pandemic, which placed limits on how many actors could be in the studio at one time and added another layer of choreography to the process. Most of the group scenes were filmed in segments with two or three actors over multiple shoots. The fire pit scene, for example, involved three groups of two counselors each shot at different times.



While filming, actors didn’t have to worry about containing their movements as they ran, jumped or got attacked by werewolves, Inversin said. All performances in “The Quarry” — save for some stunt sequences — were performed by the actors on the motion capture stage, exactly as they appear in the game.

“That’s the performance that Ted Raimi did and you see it in the game and that’s his lip quivering and that’s his look around, you know, that’s him,” Inversin said. “An animator didn’t go in and fix that. You know, that’s what he did onstage.”

Digital Domain had to tackle two common problems in motion capture to make this process possible: helmet stabilization and eye tracking. The team heavily modified an open-source eye-tracking software package, Gaze ML, over the course of three years to improve the accuracy and appearance of the digital eyes. New machine learning algorithms added to Masquerade 2.0 allowed it to analyze the capture footage and compensate for any jostling as the actors moved around.

As in movies, Masquerade uses this capture footage to create photorealistic animations of an actor’s performance. Unlike in movies, though, video games need to render those animations in real time in response to the player’s actions — a monumental computing task since each actor’s face is composed of upward of a thousand unique blend shapes, or facial movements corresponding to different expressions.
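The blend-shape math underneath all this is a weighted sum: a final face is the neutral mesh plus each expression's vertex offsets scaled by how strongly it fires. The sketch below is a toy illustration of that idea — the shape names, vertex counts and weights are invented for the example, not Digital Domain's actual rig:

```python
def blend(neutral, shapes, weights):
    """Blend-shape evaluation: start from the neutral mesh and add a
    weighted sum of per-expression vertex offsets (deltas from neutral)."""
    out = list(neutral)
    for name, w in weights.items():
        out = [v + w * d for v, d in zip(out, shapes[name])]
    return out

# Toy 1-D "mesh" of three vertex heights (purely illustrative).
neutral = [0.0, 0.0, 0.0]
shapes = {
    "smile": [0.1, 0.1, 0.1],       # delta that raises the mouth corners
    "jaw_open": [-0.2, -0.2, -0.2], # delta that lowers the jaw
}
# A frame where the actor is 80% smiling with the jaw slightly open:
frame = blend(neutral, shapes, {"smile": 0.8, "jaw_open": 0.3})
# each vertex: 0.8 * 0.1 + 0.3 * (-0.2) = 0.02
```

A production rig evaluates the same kind of sum over hundreds or thousands of shapes per frame, which is why the runtime cost matters.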

“When you try to playback that performance, at runtime, there are so many different complex shapes, the real time system doesn’t know what to pull from in the appropriate time frame it needs,” Inversin said.

The solution: Chatterbox, a tool developed by the company’s Digital Human Group to streamline the process of analyzing and rendering live-action facial expressions. The library of facial expressions from each cast member and the key frames identified by Digital Domain’s artists are fed into Chatterbox, which then uses machine learning algorithms to automatically track the dots on each actor’s face in each shot and calculate the best possible options for altering facial expressions without sacrificing quality.


“So to kind of make it more optimized for a game engine, we take those thousand different facial shapes and decimate them,” Inversin said. “We reduce them to the idealized shapes based on the performance. And what that means is basically if a character is just talking, they don’t need a blend shape for, you know, yelling, right?”

Put another way, Chatterbox is doing with facial expressions what video games already do with other in-game assets — rendering them as needed based on what the player is doing in that moment.
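The pruning idea Inversin describes can be sketched in a few lines: given a weight curve over time for each blend shape, drop any shape that never meaningfully fires during the performance, so the runtime only evaluates shapes the scene actually uses. This is a hypothetical simplification with invented names and numbers, not Chatterbox's actual algorithm:

```python
def prune_shapes(weight_tracks, threshold=0.05):
    """Keep only blend shapes that are actually used in a performance.
    A shape whose weight never exceeds `threshold` across all frames is
    dropped — e.g. no 'yell' shape for a scene of quiet dialogue."""
    return {name: track for name, track in weight_tracks.items()
            if max(abs(w) for w in track) > threshold}

# Per-shape weight curves over a short talking clip (illustrative).
tracks = {
    "mouth_open": [0.2, 0.5, 0.4, 0.1],
    "smile":      [0.0, 0.1, 0.2, 0.0],
    "yell":       [0.0, 0.0, 0.01, 0.0],  # never fires in this clip
}
kept = prune_shapes(tracks)  # "yell" is pruned; two shapes remain
```

In practice the reduction would be driven per performance, so a character who only talks in a scene carries only the shapes that talking needs.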

Altogether, after 42 days of shooting on the motion capture stage, Digital Domain had 32 hours of captured footage to put into the game. Of the 4,500 shots deemed game-quality, only about 27 — less than 1 percent — had to be tweaked by animators in post-production. Traditionally, animators would need to manually touch up expressions throughout the footage, but between Chatterbox and advancements in Masquerade 2.0, the team only needed “to fix a couple of things where the machine got angry,” Pianezza said. That was crucial for a project of this size.

“You can’t brute force anything at 30 hours,” Pianezza added. “The system has to work.”

The result was in-game performances that felt bespoke to each actor. Inversin said he thought that being onstage and being able to freely move around and express their emotions elevated the actors’ performances. And that has some promising implications for motion capture in the entertainment industry.

“To have the ability for actors and directors to capture stuff on the mocap stage and know that their performances are being translated effectively in a video game with the nuance of their face and their performance, I think it’s a big selling point for everyone that wants to see those experiences as well as direct and consume that media. Because it’s like yeah that’s Ted Raimi’s walk as he walks across the jail cell. That’s Lance Henriksen smacking his lips. That’s what he does. We didn’t add that in, that’s his natural performance.”
