A New Trick Lets Artificial Intelligence See in 3D
The current wave of artificial intelligence can be traced back to 2012 and an academic contest that measured how well algorithms could recognize objects in photographs.
That year, researchers found that feeding thousands of images into an algorithm inspired loosely by the way neurons in a brain respond to input produced a huge leap in accuracy. The breakthrough sparked an explosion in academic research and commercial activity that is transforming some companies and industries.
Now a new trick, which involves training the same kind of AI algorithm to turn 2D images into a rich 3D view of a scene, is sparking excitement in the worlds of both computer graphics and AI. The technique has the potential to shake up video games, virtual reality, robotics, and autonomous driving. Some experts believe it might even help machines perceive and reason about the world in a more intelligent—or at least humanlike—way.
“It is ultra-hot, there is a huge buzz,” says Ken Goldberg, a roboticist at the University of California, Berkeley, who is using the technology to improve the ability of AI-enhanced robots to grasp unfamiliar shapes. Goldberg says the technology has “hundreds of applications,” in fields ranging from entertainment to architecture.
The new approach involves using a neural network to capture and generate 3D imagery from a few 2D snapshots, a technique dubbed “neural rendering.” It arose from the merging of ideas circulating in computer graphics and AI, but interest exploded in April 2020 when researchers at UC Berkeley and Google showed that a neural network could capture a scene photorealistically in 3D simply by viewing several 2D images of it.
The algorithm models how light travels through a scene, computing the density and color of points in 3D space. This makes it possible to convert 2D images into a photorealistic 3D representation that can be viewed from any vantage point. Its core is the same sort of neural network as the 2012 image-recognition algorithm, which analyzes the pixels in a 2D image. The new algorithms convert 2D pixels into the 3D equivalent, known as voxels. Videos of the trick, which the researchers called Neural Radiance Fields, or NeRF, wowed the research community.
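For readers who want a feel for what "the density and color of points in 3D space" means in practice, here is a rough sketch in Python of how one pixel can be computed from such values. It marches along a single camera ray, samples density and color at regular steps, and blends them so that nearer, denser material hides what lies behind it. This is the standard volume-rendering recipe rather than the researchers' actual code. The names `toy_field` and `render_ray` are made up for illustration, and the hand-written red sphere stands in for the trained neural network, which is an assumption of this sketch.

```python
import math


def toy_field(point):
    """Hypothetical stand-in for the trained network.

    Returns (density, rgb) at a 3D point: a dense red sphere of
    radius 1 centred at the origin, and empty space everywhere else.
    """
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0
    if inside:
        return 50.0, (1.0, 0.0, 0.0)
    return 0.0, (0.0, 0.0, 0.0)


def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64, field=toy_field):
    """Blend density and colour samples along one ray into a pixel colour.

    Returns the pixel colour and the fraction of light that passed
    through the whole ray without being absorbed.
    """
    step = (far - near) / n_samples
    transmittance = 1.0  # light not yet blocked by earlier samples
    pixel = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * step  # midpoint of the i-th segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        density, rgb = field(point)
        # Opacity contributed by this short segment of the ray.
        alpha = 1.0 - math.exp(-density * step)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance


# One ray aimed at the sphere, one that passes well above it.
hit_color, hit_leftover = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
miss_color, miss_leftover = render_ray((0.0, 5.0, -2.0), (0.0, 0.0, 1.0))
```

The ray aimed at the sphere comes back essentially pure red, while the ray that misses returns black with all of its light intact. In a NeRF-style system, as the article describes it, the density and color values would come from a neural network trained on the 2D photographs, and one such ray would be traced for every pixel of the new view.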
“I’ve been doing computer vision for 20 years, but when I saw this video, I was like ‘Wow, this is just incredible,’” says Frank Dellaert, a professor at Georgia Tech.
For anyone working on computer graphics, Dellaert explains, the approach is a breakthrough. Creating a detailed, realistic 3D scene normally requires hours of painstaking manual work. The new method makes it possible to generate these scenes from ordinary photographs in minutes. It also provides a new way to create and manipulate synthetic scenes. “It’s seminal and important, which is something crazy to say for work that’s only two years old,” he says.
Dellaert says the speed and variety of ideas that have emerged since then have been breathtaking. Others have used the idea to create moving selfies (or “nerfies”), which let you pan around a person’s head based on a few stills; to create 3D avatars from a single headshot; and to automatically relight scenes.
The work has gained industry traction with surprising speed. Ben Mildenhall, one of the researchers behind NeRF who is now at Google, describes the flourishing of research and development as “a slow tidal wave.”