- Tram Ho
In a blog post ahead of the International Conference on Computer Vision (ICCV) in Seoul, South Korea, Facebook highlighted its latest advances in computers’ ability to understand content.
Its systems can now detect objects even in complex scenes with overlapping elements, such as chair legs or other furniture. They do so using state-of-the-art machine learning algorithms that extract objects from two-dimensional pictures and accurately reconstruct them in 3D.
The technology could also serve augmented reality applications and robots, for example by helping them navigate through space.
Researchers at Facebook, including Georgia Gkioxari, Shubham Tulsiani and David Novotny, said: “Our research builds on recent advances in using deep learning techniques to predict and localize objects in an image, as well as new tools and architectures for understanding three-dimensional shapes, such as voxels, point clouds, and meshes.”
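For readers unfamiliar with the three shape representations named in the quote, here is a minimal Python sketch of each. It is illustrative only: the function names are hypothetical and nothing here is taken from Facebook's code.

```python
# Illustrative only: three common ways to store a 3D shape.
# All function names are hypothetical, chosen for this sketch.

def make_voxel_cube(size):
    """Voxels: a dense occupancy grid; grid[z][y][x] is 1 where the shape is solid."""
    return [[[1 for _ in range(size)] for _ in range(size)] for _ in range(size)]

def voxels_to_points(grid):
    """Point cloud: an unordered list of (x, y, z) samples,
    here the centres of the occupied voxels."""
    points = []
    for z, plane in enumerate(grid):
        for y, row in enumerate(plane):
            for x, filled in enumerate(row):
                if filled:
                    points.append((x + 0.5, y + 0.5, z + 0.5))
    return points

def make_tetrahedron_mesh():
    """Mesh: a vertex list plus triangular faces that index into it."""
    vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
    return vertices, faces

grid = make_voxel_cube(2)
cloud = voxels_to_points(grid)
vertices, faces = make_tetrahedron_mesh()
print(len(cloud), "points;", len(vertices), "vertices;", len(faces), "faces")
```

Voxels are simple but their memory use grows with the cube of the resolution, point clouds carry no surface information, and meshes describe surfaces compactly, which is one reason a mesh is an attractive output format.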
One of their projects is Mesh R-CNN, a method that can predict three-dimensional shapes from images of cluttered and occluded objects.
Mesh R-CNN converts images from 2D into 3D.
Facebook researchers said they achieved this by extending the open-source Mask R-CNN (an advanced system for object segmentation in images). They added a mesh prediction branch, strengthened with the Torch3D library, which contains highly optimized 3D operators.
Mesh R-CNN uses Mask R-CNN to detect and classify the different objects in an image, then infers their three-dimensional shapes with the mesh prediction branch described above.
Facebook said that, evaluated on the public Pix3D dataset, Mesh R-CNN successfully detected objects of every category and predicted a full three-dimensional shape for each piece of furniture in the photos. On another dataset, ShapeNet, Mesh R-CNN outperformed previous tools by 7%.
Images extracted by C3DPO.
Another system developed by Facebook, Canonical 3D Pose Networks (C3DPO), handles situations where meshes and corresponding images are not available for training.
It reconstructs the three-dimensional keypoints of an object, refining the reconstruction using supervision from two-dimensional keypoints. (Keypoints here are tracked parts of an object that provide clues about its shape and about changes in viewpoint.)
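The idea of supervising a 3D reconstruction with 2D keypoints can be illustrated with a reprojection error: project the estimated 3D keypoints into the image and measure how far they land from the observed 2D keypoints. The sketch below is not Facebook's implementation. It assumes an orthographic camera and a single rotation about the vertical axis, both simplifications chosen for illustration, and all function names are hypothetical.

```python
import math

def rotate_y(point, angle):
    """Rotate a 3D point about the y-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project_orthographic(point):
    """Orthographic camera (an assumption of this sketch): drop the depth coordinate."""
    x, y, _ = point
    return (x, y)

def reprojection_error(keypoints_3d, keypoints_2d, angle):
    """Mean squared distance between projected 3D keypoints and observed 2D keypoints."""
    total = 0.0
    for p3, (u, v) in zip(keypoints_3d, keypoints_2d):
        pu, pv = project_orthographic(rotate_y(p3, angle))
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total / len(keypoints_3d)

# Toy data: a known shape observed from a viewpoint rotated by 0.3 radians.
shape = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
observed = [project_orthographic(rotate_y(p, 0.3)) for p in shape]

print(reprojection_error(shape, observed, 0.3))  # correct viewpoint: error is (near) zero
print(reprojection_error(shape, observed, 0.0))  # wrong viewpoint: error is larger
```

In a learning setting, an error of this kind would be minimized over both the 3D shape and the viewpoint, so that only 2D annotations are needed as a training signal.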
Facebook emphasized that such reconstruction was previously not possible, in part because of memory constraints. Now, Facebook’s C3DPO architecture allows three-dimensional reconstruction even where deploying hardware to capture 3D data is not feasible, as with large objects.
Reference: VentureBeat
Source: Trí Thức Trẻ