Table of Links
Abstract and 1. Introduction
2. Related work
3. Method and 3.1. Architecture
3.2. Loss and 3.3. Implementation details
4. Data curation
4.1. Training dataset
4.2. Evaluation benchmark
5. Experiments and 5.1. Metrics
5.2. Baselines
5.3. Comparison with SOTA methods
5.4. Qualitative results and 5.5. Ablation study
6. Limitations and discussion
7. Conclusion and references
A. Additional qualitative comparison
B. Inference on AI-generated images
C. Data curation details
C. Data curation details
In this section, we describe our data generation procedures for training and for rendering object scans from OmniObject3D to create one of our evaluation benchmarks.
C.1. Synthetic training dataset generation
Image rendering. For a randomly sampled 3D mesh asset, our Blender-based rendering pipeline first loads it into a scene and normalizes it to fit inside a unit cube. Our scene consists of a large flat-bottomed rectangular bowl (a common setup used by 3D artists to allow realistic shading), four point light sources, and a single area light source. We place cameras randomly around the object with a focal length ranging from 30mm to 70mm, relative to a 35mm-equivalent sensor size. We randomly vary the camera's distance, elevation (from 5 to 65 degrees), and LookAt point, and render images at a resolution of 600 x 600 (see Figure 11). This variation in object/camera geometry captures the diversity of projective geometry found in real-world captures, which come from different capture devices and camera positions. This is in contrast to previous work that uses fixed intrinsics, a fixed distance, and a LookAt point at the center of the object.
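To make the camera randomization concrete, the sketch below shows one way such sampling could be scripted with Blender's Python API (bpy). The function name, the distance range, and the LookAt jitter magnitude are illustrative assumptions; only the focal length range, elevation range, 35mm-equivalent sensor, and 600 x 600 render resolution come from the description above.

```python
# A minimal sketch, assuming a Blender (bpy) scene that already contains the
# normalized object at the origin. Helper names are illustrative, not the paper's code.
import math
import random

import bpy
from mathutils import Vector

def sample_random_camera(name="RandomCam",
                         focal_range_mm=(30.0, 70.0),
                         elevation_range_deg=(5.0, 65.0),
                         distance_range=(1.5, 3.0),   # assumed range, not stated in the text
                         lookat_jitter=0.1):          # assumed jitter magnitude
    """Create a camera with randomized intrinsics/extrinsics around the unit-cube object."""
    cam_data = bpy.data.cameras.new(name)
    cam_data.sensor_width = 36.0                      # 35mm-equivalent sensor width
    cam_data.lens = random.uniform(*focal_range_mm)   # focal length in mm

    cam_obj = bpy.data.objects.new(name, cam_data)
    bpy.context.scene.collection.objects.link(cam_obj)

    # Sample the camera position on a sphere segment around the object.
    azimuth = random.uniform(0.0, 2.0 * math.pi)
    elevation = math.radians(random.uniform(*elevation_range_deg))
    distance = random.uniform(*distance_range)
    cam_obj.location = Vector((
        distance * math.cos(elevation) * math.cos(azimuth),
        distance * math.cos(elevation) * math.sin(azimuth),
        distance * math.sin(elevation),
    ))

    # Jitter the LookAt point around the object center, then orient the camera toward it.
    lookat = Vector([random.uniform(-lookat_jitter, lookat_jitter) for _ in range(3)])
    direction = lookat - cam_obj.location
    cam_obj.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()
    return cam_obj

# Render at 600 x 600 as described in the text.
scene = bpy.context.scene
scene.render.resolution_x = 600
scene.render.resolution_y = 600
scene.camera = sample_random_camera()
```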
In addition to RGB images, we extract segmentation masks, depth maps, camera intrinsics and extrinsics, and object pose. We center the objects, mask out the background, resize the images to 224 x 224, and adjust the additional annotations accordingly to account for the cropping, masking, and resizing.
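As a concrete example of this annotation bookkeeping, the following sketch shows how cropping and resizing to 224 x 224 would be propagated to the camera intrinsics. The pinhole matrix K and the (x0, y0, size) crop convention are illustrative assumptions, not the paper's actual code.

```python
# A minimal sketch of adjusting pinhole intrinsics for a square crop followed by a resize.
import numpy as np

def adjust_intrinsics_for_crop_resize(K, crop_box, out_size=224):
    """Return intrinsics valid for the cropped-and-resized image."""
    x0, y0, crop_size = crop_box
    scale = out_size / crop_size

    K_new = K.astype(np.float64).copy()
    # Cropping shifts the principal point by the crop offset.
    K_new[0, 2] -= x0
    K_new[1, 2] -= y0
    # A uniform resize scales the focal lengths and the principal point.
    K_new[:2, :] *= scale
    return K_new

# Example: a 600 x 600 render with a 50mm lens on a 36mm-wide sensor (illustrative values).
f_px = 50.0 / 36.0 * 600.0
K = np.array([[f_px, 0.0, 300.0],
              [0.0, f_px, 300.0],
              [0.0, 0.0, 1.0]])
K_crop = adjust_intrinsics_for_crop_resize(K, crop_box=(150, 120, 320))
```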
C.2. OmniObject3D test set generation
The original videos released with the OmniObject3D dataset have noisy foreground masks and are mostly captured indoors on a tabletop. To obtain better illumination and accurate segmentation, we follow the rendering procedure described in the previous section to generate the test data. Unlike our training set generation, we use HDRI environment maps for scene lighting, which yields high lighting quality and diversity (see Figure 12).
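A minimal sketch of HDRI-based world lighting with Blender's Python API is shown below; the function name and file path are placeholders, and only the use of an HDRI environment map for scene lighting follows from the text.

```python
# A minimal sketch, assuming Blender's Python API (bpy) and a default world with a
# Background shader node. The hdri_path argument is a placeholder.
import bpy

def set_hdri_lighting(hdri_path, strength=1.0):
    """Light the scene with an equirectangular HDRI environment map."""
    world = bpy.context.scene.world
    world.use_nodes = True
    nodes = world.node_tree.nodes
    links = world.node_tree.links

    # Environment texture node holding the HDRI image.
    env_node = nodes.new("ShaderNodeTexEnvironment")
    env_node.image = bpy.data.images.load(hdri_path)

    # Feed it into the default Background shader, which drives world lighting.
    background = nodes["Background"]
    background.inputs["Strength"].default_value = strength
    links.new(env_node.outputs["Color"], background.inputs["Color"])

set_hdri_lighting("/path/to/environment.hdr")
```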
Authors:
(1) Zixuan Huang, University of Illinois at Urbana-Champaign (both authors contributed equally to this work);
(2) Stefan Stojanov, Georgia Institute of Technology (both authors contributed equally to this work);
(3) Anh Thai, Georgia Institute of Technology;
(4) Varun Jampani, Stability AI;
(5) James M. Rehg, University of Illinois at Urbana-Champaign.