1. Abstract and Introduction

2. Related work

3. Method and 3.1. Architecture

3.2. Loss and 3.3. Implementation details

4. Data processing

4.1. Training data set

4.2. Evaluation standard

5. Experiments and 5.1. Metrics

5.2. Baselines

5.3. Comparison to SOTA methods

5.4. Qualitative results and 5.5. Ablation study

6. Limitations and discussion

7. Conclusion and references

A. Additional qualitative comparison

B. Inference on images generated by artificial intelligence

C. Data organization details

C. Data organization details

In this section, we describe our data generation procedures for training and for rendering object scans from OmniObject3D to create one of our benchmark test suites.

C.1. Creating the synthetic training dataset

Rendering. For a random 3D mesh asset, our Blender-based rendering pipeline first loads it into a scene and normalizes it to fit inside a unit cube. Our scene consists of a large flat-bottomed rectangular bowl (a common scene setup used by 3D artists to allow for realistic shading), four point light sources, and a single area light source. We place cameras randomly around the object with focal lengths ranging from 30mm to 70mm (35mm-sensor equivalent). We randomly vary the distance, elevation (from 5 to 65 degrees), and LookAt point of the camera, and render images at a resolution of 600 x 600 (see Figure 11). This variation in object/camera geometry captures the diversity of projective geometry found in real-world scenarios, which arises from different capture devices and camera positions. This is in contrast to previous work that uses a fixed focal length, a fixed distance, and a LookAt point at the center of the object.
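To make the camera randomization concrete, the following is a minimal sketch of how such sampling could be implemented. It is not the authors' Blender code: the helper name sample_camera, the distance range, the LookAt jitter, and the 36mm sensor width used to convert the 35mm-equivalent focal length to pixels are all illustrative assumptions; only the focal-length range, elevation range, and 600 x 600 resolution come from the text above.

```python
import numpy as np

def sample_camera(rng=None):
    """Sample a randomized pinhole camera around an object in a unit cube (sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    focal_mm = rng.uniform(30.0, 70.0)           # 35mm-equivalent focal length
    elevation = np.radians(rng.uniform(5.0, 65.0))
    azimuth = rng.uniform(0.0, 2.0 * np.pi)
    distance = rng.uniform(1.5, 3.0)             # assumed range (object fits a unit cube)
    look_at = rng.uniform(-0.1, 0.1, size=3)     # assumed jitter around the object center

    # Camera position on a sphere around the LookAt point.
    cam_pos = look_at + distance * np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])

    # World-to-camera (extrinsic) matrix from the LookAt convention
    # (camera x right, y down, z forward).
    forward = look_at - cam_pos
    forward /= np.linalg.norm(forward)
    up = np.array([0.0, 0.0, 1.0])
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])
    t = -R @ cam_pos
    extrinsic = np.concatenate([R, t[:, None]], axis=1)   # 3x4 [R | t]

    # Intrinsics for a 600x600 render, assuming a 36mm-wide reference sensor.
    f_px = focal_mm / 36.0 * 600.0
    K = np.array([[f_px, 0.0, 300.0],
                  [0.0, f_px, 300.0],
                  [0.0, 0.0, 1.0]])
    return K, extrinsic
```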

In addition to RGB images, we export segmentation masks, depth maps, camera intrinsics and extrinsics, and object pose. We center the objects, mask out the background, resize images to 224 x 224, and adjust the associated annotations to account for the cropping, segmentation, and resizing.
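As one example of the annotation adjustment, the intrinsics must follow the crop and resize. Below is a hedged sketch of this bookkeeping; the helper name adjust_intrinsics and the (x0, y0, x1, y1) crop-box format are assumptions for illustration, not the paper's actual interface.

```python
import numpy as np

def adjust_intrinsics(K, crop_box, out_size=224):
    """Update a 3x3 pinhole intrinsic matrix after an object-centered crop and resize.

    K        : intrinsics of the original 600x600 render.
    crop_box : (x0, y0, x1, y1) crop in original pixel coordinates (assumed format).
    out_size : side length of the resized square image (224 here).
    """
    x0, y0, x1, y1 = crop_box
    K = K.copy()
    # Cropping shifts the principal point.
    K[0, 2] -= x0
    K[1, 2] -= y0
    # Resizing scales focal lengths and the principal point.
    sx = out_size / (x1 - x0)
    sy = out_size / (y1 - y0)
    K[0, :] *= sx
    K[1, :] *= sy
    return K
```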

C.2. Creating the OmniObject3D test suite

The original videos released with the OmniObject3D dataset have noisy foreground masks and are mostly captured indoors on a tabletop. To obtain accurate segmentation masks and more varied illumination, we follow the rendering procedure described in the previous section to generate the test data. Unlike our training set generation, we use HDRI environment maps for scene lighting, resulting in higher lighting quality and diversity (see Figure 12).
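For reference, the snippet below shows one generic way to light a Blender scene with an HDRI environment map via the world shader nodes. It is a minimal sketch of the standard bpy node setup, not necessarily the authors' exact configuration, and hdri_path is a placeholder for one of the environment maps used.

```python
import bpy

def set_hdri_lighting(hdri_path):
    """Light the scene with an HDRI environment map (minimal bpy sketch)."""
    world = bpy.context.scene.world
    world.use_nodes = True
    nodes = world.node_tree.nodes
    links = world.node_tree.links

    # Environment texture node feeding the default world Background shader.
    env = nodes.new(type="ShaderNodeTexEnvironment")
    env.image = bpy.data.images.load(hdri_path)
    background = nodes["Background"]
    links.new(env.outputs["Color"], background.inputs["Color"])
```

Swapping the loaded image between renders is what provides the lighting diversity described above.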

Figure 7. Additional qualitative comparison results on OmniObject3D.

Figure 8. Additional qualitative comparison results on Ocrtoc3D.

Figure 9. Qualitative comparison results on Pix3D.

Figure 10. Qualitative results on images generated using DALL·E 3. These results demonstrate the zero-shot generalization ability of ZeroShape to complex novel images.

Figure 11. Generating synthetic training data. We render training images with varying illumination and camera intrinsics and extrinsics. Images are cropped around the object, foreground-segmented, and resized before being used as training input.

Figure 12. Generating OmniObject3D test data. For OmniObject3D, we render realistic test images with varying illumination and camera intrinsics and extrinsics. To increase the realism and diversity of the renderings, we use various HDRI environment maps to illuminate the scene.


Authors:

(1) Zixuan Huang, University of Illinois at Urbana-Champaign (these authors contributed equally to this work);

(2) Stefan Stojanov, Georgia Institute of Technology (these authors contributed equally to this work);

(3) Anh Thai, Georgia Institute of Technology;

(4) Varun Jampani, Stability AI;

(5) James M. Rehg, University of Illinois at Urbana-Champaign.
