GANs Power a Dimensional Shift
Because real-world datasets that capture the same object from different angles are rare, most AI tools that convert images from 2D to 3D are trained on synthetic 3D datasets like ShapeNet. To obtain multi-view images from real-world data, such as images of cars publicly available on the web, the NVIDIA researchers instead turned to a GAN model, manipulating its neural network layers to turn it into a data generator.

The team found that leaving the first four layers of the neural network variable and freezing the remaining 12 caused the GAN to render images of the same object from different viewpoints. Conversely, keeping the first four layers frozen and the other 12 variable caused the network to generate different images from the same viewpoint. By manually assigning standard viewpoints, with vehicles pictured at a specific elevation and camera distance, the researchers could rapidly generate a multi-view dataset from individual 2D images.

The final model, trained on 55,000 car images generated by the GAN, outperformed an inverse graphics network trained on the popular Pascal3D dataset.

Read the full ICLR paper, authored by Wenzheng Chen; fellow NVIDIA researchers Jun Gao and Huan Ling; Sanja Fidler, director of NVIDIA’s Toronto research lab; University of Waterloo student Yuxuan Zhang; Stanford student Yinan Zhang; and MIT professor Antonio Torralba. Additional collaborators on the CVPR paper include Jean-Francois Lafleche, NVIDIA researcher Kangxue Yin and Adela Barriuso.

The NVIDIA Research team consists of more than 200 scientists around the globe, focusing on areas such as AI, computer vision, self-driving cars, robotics and graphics. Learn more about the company’s latest research and industry breakthroughs in NVIDIA CEO Jensen Huang’s keynote address at this week’s GPU Technology Conference. GTC registration is free and open through April 23. Attendees will have access to on-demand content through May 11.
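
For readers who want to see the four-versus-12 layer split described earlier in concrete terms, here is a minimal Python sketch. It is not the researchers' code. It assumes a generator that accepts one input code per layer, which the article does not state, and every name in it (sample_code, mix_codes, CODE_DIM and so on) is hypothetical. No images are generated; the sketch only shows which per-layer inputs change and which stay fixed in each of the two modes.

```python
import random

NUM_LAYERS = 16   # 4 + 12 layers, as described in the article
VIEW_LAYERS = 4   # the first four layers
CODE_DIM = 8      # hypothetical code size, kept tiny for illustration


def sample_code(rng):
    """Draw one hypothetical code vector (assumption: Gaussian entries)."""
    return [rng.gauss(0.0, 1.0) for _ in range(CODE_DIM)]


def mix_codes(view_code, content_code):
    """Build per-layer inputs: view_code feeds the first four layers,
    content_code feeds the remaining twelve."""
    return [view_code if i < VIEW_LAYERS else content_code
            for i in range(NUM_LAYERS)]


def multi_view_batch(content_code, view_codes):
    """Same object, different viewpoints: only the first four layers vary."""
    return [mix_codes(v, content_code) for v in view_codes]


def same_view_batch(view_code, content_codes):
    """Different objects, same viewpoint: only the last twelve layers vary."""
    return [mix_codes(view_code, c) for c in content_codes]


rng = random.Random(0)
views = [sample_code(rng) for _ in range(3)]  # three hypothetical viewpoints
cars = [sample_code(rng) for _ in range(2)]   # two hypothetical objects

multi_view = multi_view_batch(cars[0], views)
same_view = same_view_batch(views[0], cars)
```

In a real pipeline, each list of 16 per-layer codes would be passed to the generator to produce one image, and the manually assigned viewpoint labels would be attached per entry in views.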