This repository contains the implementation code of the paper "Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications" (arXiv, paper, video). Our work was accepted to the CVPR 2023 Workshop on Vision Datasets Understanding.
In this work, we create an RGB-D dataset, called the Digital Twin Tracking Dataset (DTTD), to enable further research on digital-twin tracking and to extend potential solutions to longer, meter-scale ranges. We select the Microsoft Azure Kinect as a state-of-the-art time-of-flight (ToF) camera. In total, 103 scenes of 10 common off-the-shelf objects with rich textures are recorded, with each frame annotated with a per-pixel semantic segmentation and ground-truth object poses provided by a commercial motion capture system. This repository also provides the source code of the data generation and annotation pipeline described in our paper.
- 10/10/2023: We introduce a follow-up work to DTTD, "Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset" (arXiv, project page), which proposes a novel depth-robust pose estimator as well as an iPhone dataset (DTTDv2).
- 06/28/2023: DTTDv1 (Azure Kinect) & DTTDv2 (iPhone) released here.
DTTD_Dataset
├── train_data_list.txt
├── test_data_list.txt
├── classes.txt
├── cameras
│ ├── az_camera1
│ └── iphone12pro_camera1 (to be released...)
├── data
│ ├── az_new_night_1
│ │ ├── data
│ │ │ ├── 00001_color.jpg
│ │ │ ├── 00001_depth.png
│ │ │ ├── 00001_label_debug.png
│ │ │ ├── 00001_label.png
│ │ │ ├── 00001_meta.json
│ │ │ └── ...
│ │ └── scene_meta.yaml
│ ├── az_new_night_2
│ │ ├── data
│ │ └── scene_meta.yaml
│ └── ...
└── objects
├── apple
│ ├── apple.mtl
│ ├── apple.obj
│ ├── front.xyz
│ ├── points.xyz
│ ├── textured_0_etZloZLC.jpg
│ ├── textured_0_norm_etZloZLC.jpg
│ ├── textured_0_occl_etZloZLC.jpg
│ ├── textured_0_roughness_etZloZLC.jpg
│ └── textured.obj.mtl
├── black_expo_marker
├── blue_expo_marker
├── cereal_box_modified
├── cheezit_box_modified
├── chicken_can_modified
├── clam_can_modified
├── hammer_modified
├── itoen_green_tea
├── mac_cheese_modified
├── mustard_modified
├── pear
├── pink_expo_marker
├── pocky_pink_modified
├── pocky_red_modified
├── pocky_white_modified
├── pop_tarts_modified
├── spam_modified
├── tomato_can_modified
└── tuna_can_modified
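As a quick orientation to the per-frame files listed above, here is a minimal loading sketch in Python. It assumes the depth and label images are single-channel PNGs; the exact depth scale and camera parameters should be taken from each scene's `scene_meta.yaml` and the per-frame `*_meta.json`.

```python
# Minimal sketch for reading one frame from a scene's data/ folder.
# Assumption: depth/label are single-channel PNGs; verify the depth scale
# and camera parameters against scene_meta.yaml and the per-frame meta.json.
import json
import cv2

frame = "DTTD_Dataset/data/az_new_night_1/data/00001"
color = cv2.imread(frame + "_color.jpg")                        # HxWx3 BGR image
depth = cv2.imread(frame + "_depth.png", cv2.IMREAD_UNCHANGED)  # HxW depth map
label = cv2.imread(frame + "_label.png", cv2.IMREAD_UNCHANGED)  # HxW per-pixel object labels
with open(frame + "_meta.json") as f:
    meta = json.load(f)                                         # per-frame metadata

print(color.shape, depth.dtype, label.max())
```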
Before running our data generation and annotation pipeline, create and activate a conda environment with Python >= 3.7:
conda create --name [YOUR ENVIR NAME] python=[PYTHON VERSION]
conda activate [YOUR ENVIR NAME]
then install all necessary packages:
pip install -r requirements.txt
- calculate_extrinsic: extrinsic information
- cameras: camera information
- data_capturing: helper package for data capturing
- data_processing: helper package for data processing
- demos: demo videos
- doc: demo images
- extrinsics_scenes: folder to save all extrinsic scenes
- iphone_app: iPhone app development for capturing RGBD images for iPhone 12 Pro camera
- manual_pose_annotation: helper package for pose annotation
- models: baseline deep learning 6D pose estimation algorithms
- objects: object models that we use in DTTD (with corresponding scale and texture)
- pose_refinement: helper package for pose refinement
- quality_control: helper package for reviewing manual annotations
- scene_labeling_generation: helper package for generating labels
- scenes: folder to save all recorded RGBD data
- synthetic_data_generation: helper package for generating synthetic data
- testing: package to test ArUco marker appearance, extrinsic validity, etc.
- toolbox: package to generate data for model training
- tools: commands for running the pipelines. Details in tools/README.md.
- utils: utils package
- OptiTrack Motion Capture system with Motive tracking software
- This doesn't have to be running on the same computer as the other sensors. We will export the tracked poses to a CSV file.
- Create a rigid body to track the camera's OptiTrack markers, and give the rigid body the same name that is passed into `tools/capture_data.py`
- Microsoft Azure Kinect
- We interface with the camera using Microsoft's K4A SDK: https://github.com/microsoft/Azure-Kinect-Sensor-SDK
- iPhone 12 Pro / iPhone 13 (to be released...)
- Please build the project in `iphone_app/` in Xcode and install it on the mobile device.
Link to tutorial video: https://youtu.be/ioKmeriW650.
- Place an ArUco marker somewhere visible.
- Place OptiTrack markers on the corners of the ArUco marker; we use these to compute the (ArUco -> OptiTrack) transform.
- Place the marker corner positions into `calculate_extrinsic/aruco_corners.yaml`, labeled under the keys `quad1`, `quad2`, `quad3`, and `quad4` (see the loading sketch below).
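As an illustration of how these corner positions might be consumed downstream, here is a hedged Python sketch that simply loads `calculate_extrinsic/aruco_corners.yaml`. The assumption that each `quadN` key maps to an `[x, y, z]` list is ours and may not match the repository's actual schema.

```python
# Hypothetical sketch: read the four ArUco-corner positions recorded in OptiTrack space.
# Assumption (not verified against the repo): each quadN key maps to an [x, y, z] list.
import numpy as np
import yaml

with open("calculate_extrinsic/aruco_corners.yaml", "r") as f:
    corners = yaml.safe_load(f)

# Stack the corners in a fixed order so an (ArUco -> OptiTrack) transform
# can later be fit against the marker's known geometry.
quad_keys = ["quad1", "quad2", "quad3", "quad4"]
corner_points = np.array([corners[k] for k in quad_keys], dtype=np.float64)
print(corner_points.shape)  # expected (4, 3): four corners, xyz each
```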
- Data collection
  - If this is an extrinsic scene, the data collection phase should be spent observing the ArUco marker; run `tools/capture_data.py --extrinsic`.
  - Example data collection scene (not extrinsic): `python tools/capture_data.py --scene_name test az_camera1`
- Start the OptiTrack recording
- Synchronization Phase
  - Press `c` to begin recording data
  - Observe the ArUco marker in the scene and move the camera along different trajectories to build synchronization data (a timestamp-matching sketch follows this list)
  - Press `p` when finished
- Data Capturing Phase
  - Press `d` to begin recording data
  - If extrinsic scene, observe the ArUco marker
  - If data collection scene, observe the objects to track
  - Press `q` when finished
- Stop the OptiTrack recording
- Export the OptiTrack recording to a CSV file with a 60Hz report rate
- Move the tracking CSV file to `<scene name>/camera_poses/camera_pose.csv`
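The synchronization phase exists so that camera frames can be aligned in time with the 60Hz OptiTrack pose stream. The sketch below is not the repository's implementation (see `tools/process_data.py` for that); it only illustrates the nearest-timestamp matching idea, assuming both clocks have already been brought into the same time base.

```python
# Illustrative sketch of nearest-timestamp matching between camera frames and
# OptiTrack samples. Not the repository's actual implementation.
import numpy as np

def nearest_pose_indices(frame_times, opti_times, time_offset=0.0):
    """For each frame timestamp, return the index of the closest OptiTrack sample."""
    shifted = frame_times + time_offset          # align camera clock to OptiTrack clock
    idx = np.searchsorted(opti_times, shifted)   # insertion positions in the sorted pose times
    idx = np.clip(idx, 1, len(opti_times) - 1)
    # Pick whichever neighbor (left or right) is closer in time.
    left_closer = (shifted - opti_times[idx - 1]) < (opti_times[idx] - shifted)
    return idx - left_closer.astype(int)

# Toy example: 30 fps camera frames vs. 60 Hz OptiTrack samples.
frames = np.arange(0.0, 1.0, 1 / 30)
opti = np.arange(0.0, 1.0, 1 / 60)
print(nearest_pose_indices(frames, opti)[:5])
```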
- Clean raw opti poses (`tools/process_data.py --extrinsic`)
- Sync opti poses with frames (`tools/process_data.py --extrinsic`)
- Calculate the camera extrinsic (`tools/calculate_camera_extrinsic.py`)
- Output will be placed in `cameras/<camera name>/extrinsic.txt` (see the loading sketch below)
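Once the extrinsic is computed, it can be loaded back for later scenes. A small hedged sketch, assuming `extrinsic.txt` stores a whitespace-separated matrix (the exact shape and convention should be checked against the repository code):

```python
# Hedged sketch: load the saved camera extrinsic.
# Assumption: extrinsic.txt is a whitespace-separated matrix readable by np.loadtxt.
import numpy as np

extrinsic = np.loadtxt("cameras/az_camera1/extrinsic.txt")
print(extrinsic.shape)

# If the file stores a 3x4 [R|t], pad to a 4x4 homogeneous transform for convenience.
if extrinsic.shape == (3, 4):
    extrinsic = np.vstack([extrinsic, [0.0, 0.0, 0.0, 1.0]])
```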
- Clean raw opti poses (`tools/process_data.py`)
  Example: `python tools/process_data.py --scene_name [SCENE_NAME]`
- Sync opti poses with frames (`tools/process_data.py`)
  Example: `python tools/process_data.py --scene_name [SCENE_NAME]`
- Manually annotate first-frame object poses (`tools/manual_annotate_poses.py`)
  1. Modify `[SCENE_NAME]/scene_meta.yml` by adding an `objects` field according to the objects in the scene and their corresponding IDs.
  Example: `python tools/manual_annotate_poses.py test`
- Recover all frame object poses and verify correctness (`tools/generate_scene_labeling.py`); a projection sketch for visual verification follows these steps.
  Example: `python tools/generate_scene_labeling.py --fast [SCENE_NAME]`
- Generate semantic labeling (`tools/generate_scene_labeling.py`)
  Example: `python tools/generate_scene_labeling.py [SCENE_NAME]`
- Generate per-frame object poses (`tools/generate_scene_labeling.py`)
  Example: `python tools/generate_scene_labeling.py [SCENE_NAME]`
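One simple way to eyeball the recovered poses is to project an object's model points into a frame with the annotated pose and camera intrinsics. The sketch below is only illustrative: the `meta.json` key names (`intrinsic`, `poses`) are hypothetical placeholders, and `points.xyz` is assumed to be a whitespace-separated point list; consult the dataset's actual schema (and `quality_control/`) before relying on it.

```python
# Illustrative verification sketch: project an object's model points into a frame
# using its annotated pose. Key names "intrinsic" and "poses" are hypothetical;
# check the real meta.json schema before use.
import json
import cv2
import numpy as np

scene = "DTTD_Dataset/data/az_new_night_1/data"
color = cv2.imread(f"{scene}/00001_color.jpg")
with open(f"{scene}/00001_meta.json") as f:
    meta = json.load(f)

K = np.array(meta["intrinsic"], dtype=np.float64)      # assumed 3x3 camera matrix
pose = np.array(meta["poses"][0], dtype=np.float64)    # assumed 3x4 or 4x4 object-to-camera pose
model = np.loadtxt("DTTD_Dataset/objects/apple/points.xyz")[:, :3]

# Transform model points into the camera frame, then apply the pinhole projection.
cam_pts = (pose[:3, :3] @ model.T + pose[:3, 3:4]).T
uv = (K @ cam_pts.T).T
uv = (uv[:, :2] / uv[:, 2:3]).astype(int)

# Draw the projected points in green; they should land on the object if the pose is correct.
for u, v in uv:
    if 0 <= u < color.shape[1] and 0 <= v < color.shape[0]:
        color[v, u] = (0, 255, 0)
cv2.imwrite("projection_check.png", color)
```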
If DTTD is useful or relevant to your research, please kindly recognize our contributions by citing our papers:
@InProceedings{DTTDv1,
author = {Feng, Weiyu and Zhao, Seth Z. and Pan, Chuanyu and Chang, Adam and Chen, Yichen and Wang, Zekun and Yang, Allen Y.},
title = {Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2023},
pages = {3288-3297}
}
@misc{DTTDv2,
title={Robust Digital-Twin Localization via An RGBD-based Transformer Network and A Comprehensive Evaluation on a Mobile Dataset},
author={Zixun Huang and Keling Yao and Seth Z. Zhao and Chuanyu Pan and Tianjian Xu and Weiyu Feng and Allen Y. Yang},
year={2023},
eprint={2309.13570},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
- Extrinsic scenes have their color images inside of `data` stored as `png` to maximize performance. Data scenes have their color images inside of `data` stored as `jpg`; this keeps the dataset at a usable size.
- The iPhone outputs `jpg` raw color images, while the Azure Kinect outputs `png` raw color images.
- Ensure a good synchronization phase by observing the ArUco marker; for the Azure Kinect, keep in mind interference from the OptiTrack system.
- Don't have objects that are in our dataset in the background. Make sure they are out of view!
- Minimize the number of extraneous ArUco markers/AprilTags that appear in the scene.
- Stay in the yellow area for best OptiTrack tracking.
- Move other cameras out of the area when collecting data to avoid confusing OptiTrack.
- Run `manual_annotate_poses.py` on all scenes after collection in order to archive the extrinsic.
- We want to keep the data anonymized. Avoid school logos and members of the lab appearing in frame.
- Perform a 90-180 degree revolution around the objects, one way. Try to minimize stand-still time.