We present an integrated, real-time approach for 2D hand pose detection from a monocular RGB image, with a common backbone shared between the bounding-box detection and keypoint detection subnets. This contrasts with traditional pipelines, which use two separate models for hand localization and keypoint detection with no feature sharing. We build on the popular RetinaNet object detection architecture and introduce an integrated model that performs both hand localization and keypoint detection in real time. We evaluate our approach on two different datasets and show that our model obtains accurate results.
The files defining the new architecture can be found in the KP_RN_Configs folder.
Model weights for the LSMV and NZSL datasets can be found here.
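As a quick sanity check, a script along the following lines should load one of the KP_RN_Configs files together with downloaded weights and run single-image inference through detectron2. This is only a sketch: the config file name, weight file name, and test image below are placeholders rather than files shipped with this repository, and the custom keypoint-RetinaNet architecture must already be importable for the config to resolve.

```python
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

# Build a config from one of the files in KP_RN_Configs.
# NOTE: the config and weight names below are placeholders; use the actual
# config from KP_RN_Configs and the weights downloaded from the link above.
cfg = get_cfg()
cfg.merge_from_file("KP_RN_Configs/keypoint_retinanet_R_50_FPN.yaml")
cfg.MODEL.WEIGHTS = "model_final_lsmv.pth"
cfg.MODEL.RETINANET.SCORE_THRESH_TEST = 0.5  # confidence threshold for hand boxes

# DefaultPredictor handles resizing/normalization and returns per-image
# Instances with predicted boxes and keypoints.
predictor = DefaultPredictor(cfg)
image = cv2.imread("example_hand.jpg")  # BGR image, as detectron2 expects
outputs = predictor(image)
print(outputs["instances"].pred_boxes)
print(outputs["instances"].pred_keypoints)
```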
- Python 3.6+
- We use detectron2 v0.1.1 for all experiments.
- For usage on local machines, clone this repository and install the packages in requirements.txt.
- For ease of use in Colab notebooks, add the KP_RN_Configs folder to your Google Drive and follow the training notebooks for setup (see the Drive-mount sketch after this list).
- To use your own dataset, convert your keypoint and hand bounding-box annotations to COCO format (see the dataset-registration sketch after this list).
- We use the LSMV dataset and the NZSL dataset.
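If you run the notebooks on Colab, a minimal setup cell along these lines mounts Google Drive and makes the KP_RN_Configs folder visible to Python; the Drive path is an assumption, so adjust it to wherever you copied the folder.

```python
# Mount Google Drive so the notebook can access KP_RN_Configs.
from google.colab import drive
drive.mount('/content/drive')

import sys
# Assumed location; change this if KP_RN_Configs lives elsewhere in your Drive.
sys.path.append('/content/drive/My Drive/KP_RN_Configs')
```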
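Once your annotations are in COCO format, detectron2's register_coco_instances can expose them to training and evaluation. The dataset names, file paths, and 21-point hand skeleton below are illustrative assumptions, not conventions defined by this repository; the keypoint ordering must match your annotation files.

```python
from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances

# Register train/val splits from COCO-format JSON files (paths are placeholders).
register_coco_instances("my_hands_train", {}, "annotations/train.json", "images/train")
register_coco_instances("my_hands_val", {}, "annotations/val.json", "images/val")

# Keypoint metadata is required for keypoint training and COCO-style evaluation.
# This 21-point ordering is only an example; it must mirror your annotations.
hand_keypoints = ["wrist"] + [
    f"{finger}_{joint}"
    for finger in ("thumb", "index", "middle", "ring", "pinky")
    for joint in ("mcp", "pip", "dip", "tip")
]
for split in ("my_hands_train", "my_hands_val"):
    MetadataCatalog.get(split).keypoint_names = hand_keypoints
    MetadataCatalog.get(split).keypoint_flip_map = []  # single-hand joints have no left/right pairs
```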
For a detailed guide to training and evaluation, please go through the Colab notebooks added above.
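As a rough outline of what those notebooks do, a detectron2 training and evaluation run usually looks like the sketch below. The config path, solver settings, and dataset names are assumptions carried over from the examples above, and the custom keypoint-RetinaNet meta-architecture must be registered with detectron2 before the config can be built.

```python
import os
from detectron2.config import get_cfg
from detectron2.data import build_detection_test_loader
from detectron2.engine import DefaultTrainer
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

cfg = get_cfg()
cfg.merge_from_file("KP_RN_Configs/keypoint_retinanet_R_50_FPN.yaml")  # placeholder name
cfg.DATASETS.TRAIN = ("my_hands_train",)
cfg.DATASETS.TEST = ("my_hands_val",)
cfg.SOLVER.IMS_PER_BATCH = 4   # example values; tune for your GPU and dataset size
cfg.SOLVER.BASE_LR = 1e-3
cfg.SOLVER.MAX_ITER = 10000
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

# Train the integrated box + keypoint model.
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

# Evaluate with COCO-style box and keypoint metrics on the validation split.
evaluator = COCOEvaluator("my_hands_val", cfg, False, output_dir=cfg.OUTPUT_DIR)
val_loader = build_detection_test_loader(cfg, "my_hands_val")
print(inference_on_dataset(trainer.model, val_loader, evaluator))
```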