More information
Our Mission
With GRADE we provide a pipeline to easily simulate robots in photorealistic dynamic environments. This simplifies robotics research to a great extent by enabling easy data generation and precise custom simulations. The project is based on Isaac Sim, Omniverse and the open-source USD format.
In our first work, we focused on four main components: i) indoor environments, ii) dynamic assets, iii) robot/camera setup and control, iv) simulation control, data processing and extra tools. Using our framework we generated an indoor dynamic-environment dataset and showcased various scenarios such as a heterogeneous multi-robot setup and outdoor video capturing. The generated data is based solely on publicly available datasets and tools such as 3D-Front, Google Scanned Objects, ShapeNet, Cloth3D, AMASS, SMPL and BlenderProc. We used it to extensively test various SLAM libraries and to evaluate the usability of synthetic data in human detection/segmentation tasks with both Mask R-CNN and YOLOv5. You can already check the data used for training, and the sequences we used to test the SLAM frameworks, here.
After that, we used the very same framework to generate a dataset of zebras captured outdoors from aerial views, and demonstrated that we can train a detector without using any real-world images, achieving 94% mAP. We have already released the images and most of the data for you to experiment with.
Thanks to the ROS support and the Python interface, each of our modules can be easily removed from the pipeline or exchanged with your own implementation. Our framework is easily expandable and customizable for any use case, and we welcome any contribution you may have.
Note that the use of ROS is not mandatory. You can use the system without any working knowledge of ROS or its components.
Environments
As the perception and processing capabilities of our robots increase, the quality of our simulated environments needs to increase as well. This enables research based on simulations that are more coherent with the real world the robot will sense when deployed, closing the sim-to-real gap. Using Omniverse's connectors you can convert almost any environment to the USD format, which can then be imported directly into Isaac Sim. Our conversion tool, based on BlenderProc, focuses on the 3D-Front dataset, the biggest available semantically annotated indoor dataset with actual 3D meshes. However, as we show in our work, we can easily import environments from other applications such as Unreal Engine, or download environments from popular marketplaces such as Sketchfab. For example, we use the same system to convert BlenderProc environments (download them with this) and FBX files. Our script automatically extracts other useful information such as the STL of the environment, converts it to X3D (which can then be converted to an octomap), and computes an approximate enclosing polygon. Find more information on how you can easily convert an environment here.
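To give a rough idea of this kind of post-processing, the sketch below extracts an STL from an environment mesh and computes an approximate enclosing polygon of its footprint. It is only an illustration, not our conversion script: `trimesh`, `shapely` and the file names are assumptions.

```python
# Minimal sketch (not the GRADE script): export an STL of an environment mesh
# and compute an approximate enclosing polygon of its ground footprint.
# Assumes `trimesh` and `shapely` are installed; paths are placeholders.
import trimesh
from shapely.geometry import MultiPoint

# Load the environment mesh and save its STL trace
mesh = trimesh.load("environment.obj", force="mesh")
mesh.export("environment.stl")

# Project the vertices onto the ground plane and take the convex hull
# as an approximate enclosing polygon for placement/navigation checks
footprint = MultiPoint(mesh.vertices[:, :2]).convex_hull
print(footprint.bounds)  # (min_x, min_y, max_x, max_y)
```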
Dynamic assets
Most robotics research is carried out in static environments, mainly because animated assets are difficult to simulate, place and manage. We focused mostly on animated clothed humans, converting Cloth3D and AMASS (CMU) animated assets and placing them inside our environments. To do so, we implemented an easy-to-use tool that provides a streamlined conversion of SMPL body and clothing animations to the USD format. The tool can be extended to different body models (e.g. SMPL-X) or different data sources. The animations are then placed in the simulation with an easy-to-understand technique that can be exchanged with your own script. Our strategy uses the STL traces of the animation and of the environment to check for collisions, using a custom service based on MoveIt's FCL interface.
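The sketch below is a simplified, hypothetical version of such a placement check. It uses `trimesh`'s FCL-backed collision manager instead of the MoveIt-based service; the file names and candidate pose are placeholders.

```python
# Illustrative placement check (not the MoveIt-based service): test whether an
# animated human's STL trace collides with the environment STL at a candidate pose.
# Assumes `trimesh` with the optional `python-fcl` backend is installed.
import trimesh

env = trimesh.load("environment.stl", force="mesh")
human = trimesh.load("human_animation_trace.stl", force="mesh")

manager = trimesh.collision.CollisionManager()
manager.add_object("environment", env)

# Try a candidate placement: translate the human trace and test for collision
candidate_pose = trimesh.transformations.translation_matrix([1.0, 2.0, 0.0])
colliding = manager.in_collision_single(human, transform=candidate_pose)
print("placement rejected" if colliding else "placement accepted")
```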
However, dynamic assets are not limited to humans: they can also be objects or animals. Luckily, you can easily simulate those too. For example, here we show how you can add flying properties to objects. The zebras showcased in the video (get them here) were converted to the USD format using Blender. The translation and rotation offsets were manually animated using this procedure, and the assets were then manually placed and scaled in the savanna environment. The zebras used for the data generation, instead, are randomly scaled and placed (statically) at simulation time. Check how here and here.
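Conceptually, that random placement boils down to sampling a scale and a ground-plane pose for each asset before spawning it. The snippet below only illustrates that idea with NumPy; it does not call the Isaac Sim API, and all bounds and ranges are made up.

```python
# Illustrative pose/scale sampling for N static assets (e.g. zebras).
# The actual spawning into the USD stage is handled by the simulation scripts.
import numpy as np

rng = np.random.default_rng(seed=42)
n_assets = 10
x_min, x_max, y_min, y_max = -50.0, 50.0, -50.0, 50.0  # environment bounds (placeholder)

scales = rng.uniform(0.8, 1.2, size=n_assets)           # random uniform scaling
positions = np.column_stack([
    rng.uniform(x_min, x_max, size=n_assets),            # x on the ground plane
    rng.uniform(y_min, y_max, size=n_assets),            # y on the ground plane
    np.zeros(n_assets),                                   # z = 0 (on the ground)
])
yaws = rng.uniform(0.0, 2 * np.pi, size=n_assets)        # random heading

for scale, pos, yaw in zip(scales, positions, yaws):
    print(f"spawn asset at {pos} with yaw {yaw:.2f} and scale {scale:.2f}")
```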
Robot/camera setup and control
Any robot can be imported into the simulation. You can control the robot in various ways, with or without ROS and with or without physics. Non-physics-enabled possibilities include control through teleporting and as a flying object. Physics-enabled ones include software in the loop, joint-based waypoints, pre-developed controllers (e.g. standard vehicles) and direct joint commands. Find out more here.
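As a minimal, hypothetical example of the non-physics "teleporting" idea, the snippet below keyframes the pose of a robot prim directly through the USD Python API (`usd-core`). It does not use the GRADE control scripts, and the prim path and waypoints are placeholders.

```python
# Teleport-style control sketch: set the robot's pose at given time samples
# by writing translate/rotate ops on its USD prim (no physics involved).
from pxr import Usd, UsdGeom, Gf

stage = Usd.Stage.CreateInMemory()
robot = UsdGeom.Xform.Define(stage, "/World/my_robot")   # placeholder prim path

translate_op = robot.AddTranslateOp()
rotate_op = robot.AddRotateXYZOp()

# Waypoints expressed as (time, position, rotation in degrees)
waypoints = [
    (0.0, Gf.Vec3d(0.0, 0.0, 1.0), Gf.Vec3f(0.0, 0.0, 0.0)),
    (30.0, Gf.Vec3d(2.0, 1.0, 1.5), Gf.Vec3f(0.0, 0.0, 90.0)),
]
for time, position, rotation in waypoints:
    translate_op.Set(position, time)
    rotate_op.Set(rotation, time)

stage.Export("teleport_demo.usda")  # inspect the resulting animation as text
```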
Since Isaac Sim does not provide fluid-dynamics simulation or frictionless perpendicular movement, we developed a custom virtual 6DOF joint controller that works with both position and velocity setpoints. This allowed us to control both a drone and a 3-wheeled omnidirectional robot. The joint controller will work for any robot you have, provided that you include your own joint definitions.
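A heavily simplified, purely illustrative version of such a virtual joint controller is sketched below: a proportional position loop that outputs clipped velocity setpoints for the six virtual joints (x, y, z, roll, pitch, yaw). This is not the controller shipped with GRADE; the gains and limits are arbitrary.

```python
# Simplified 6DOF "virtual joint" controller sketch: converts a position
# setpoint into clipped velocity commands for six prismatic/revolute joints.
import numpy as np

class Virtual6DofController:
    def __init__(self, kp=1.5, max_lin_vel=2.0, max_ang_vel=1.0):
        self.kp = kp
        self.limits = np.array([max_lin_vel] * 3 + [max_ang_vel] * 3)

    def velocity_command(self, current_pose, target_pose):
        """Proportional control on the 6D pose error [x, y, z, roll, pitch, yaw]."""
        error = np.asarray(target_pose, dtype=float) - np.asarray(current_pose, dtype=float)
        # Wrap the angular errors to [-pi, pi] before scaling
        error[3:] = (error[3:] + np.pi) % (2 * np.pi) - np.pi
        return np.clip(self.kp * error, -self.limits, self.limits)

controller = Virtual6DofController()
cmd = controller.velocity_command(
    current_pose=[0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    target_pose=[1.0, 0.5, 1.5, 0.0, 0.0, 1.57],
)
print(cmd)  # velocity setpoints to forward to the six virtual joints
```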
The robot's sensors can be created at run-time (main script, function) and dynamically changed, or pre-configured directly in the USD file.
Data publishing can be controlled manually, so each sensor can have a custom publishing rate and, if you want, a failure rate (link).
Since the simulator is ROS-enabled, any sensor can be published manually with added noise, and any custom topic can be added or listened to. Again, the use of ROS is not mandatory.
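For instance, a custom publisher with its own rate, a simulated failure rate and additive Gaussian noise could look like the hypothetical ROS 1 sketch below. The topic name, rate and noise level are placeholders, and GRADE's own publishing helpers may differ.

```python
#!/usr/bin/env python
# Hypothetical custom sensor publisher: fixed rate, random message drops,
# and additive Gaussian noise on a scalar reading (e.g. a range sensor).
import random
import rospy
from std_msgs.msg import Float32

def read_sensor():
    return 1.0  # placeholder for the value obtained from the simulator

rospy.init_node("noisy_sensor_publisher")
pub = rospy.Publisher("/my_robot/noisy_range", Float32, queue_size=1)
rate = rospy.Rate(30)          # custom publishing rate [Hz]
failure_rate = 0.05            # 5% of the messages are silently dropped
noise_std = 0.02               # standard deviation of the additive noise

while not rospy.is_shutdown():
    if random.random() > failure_rate:
        value = read_sensor() + random.gauss(0.0, noise_std)
        pub.publish(Float32(data=value))
    rate.sleep()
```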
Simulation control, data processing and extras
Simulation control includes loading the environment, placing and controlling the robot, placing the animations, animating objects, integrating ROS, randomizing assets/materials/lights, dynamically setting simulation settings, and saving data. The pipeline is fully customizable, e.g. with new placement strategies, by using or not using ROS, by having software in the loop, by adding noise to the data, etc. Check out more information about this here and here. Depending on the complexity of the environment and on the number of lights, performance can reach near-realtime processing (>15 fps) with RTX rendering settings. Without using physics and ROS, it should be possible to reach realtime performance.

The generated information includes RGB, depth, semantics, 2D-3D bounding boxes, semantic instances, motion vectors (optical flow), and asset skeleton/vertex positions. We implemented a set of tools to post-process the data, extract it, add noise, and fix some known issues of the simulation. Instructions to evaluate various SLAM frameworks with our data, or with your own, can be found here. We also provide scripts and procedures to prepare the data and train both YOLO and Mask R-CNN (via `detectron2`). However, you can directly download the images, masks and boxes we used in the paper from our data repository. Data visualization, an instance-to-semantic mapping tool, extraction of SMPL and corrected 3D bounding boxes, and automatic conversion of the USD to a processable text file are already available. Finally, we developed a way to replay any experiment so that you can generate data with newly added/modified sensors and different environment settings; check the code here.
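As an example of the kind of post-processing these tools cover, the following sketch adds depth-dependent Gaussian noise to a rendered depth map before saving it back. It is a generic NumPy illustration, not the actual GRADE noise model, and the file name is a placeholder.

```python
# Generic noise-augmentation sketch for a rendered depth map (illustrative;
# the GRADE tools implement their own, more elaborate noise and fix-ups).
import numpy as np

rng = np.random.default_rng(0)
depth = np.load("depth_0001.npy")               # placeholder file name, meters

noise_std = 0.01 * depth                         # depth-dependent noise (1%)
noisy_depth = depth + rng.normal(0.0, 1.0, depth.shape) * noise_std
noisy_depth = np.clip(noisy_depth, 0.0, None)    # keep depths non-negative

np.save("depth_0001_noisy.npy", noisy_depth)
```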