Deep Neural Labeling: Hybrid Hand Pose Estimation Using Unlabeled Motion Capture Data With Color Gloves in Context of German Sign Language

Kristoffer Waldow, Arnulph Fuhrmann, Daniel Roth

in 6th IEEE International Conference on Artificial Intelligence & Extended and Virtual Reality 2024 (AIxVR ’24)

[Figure: teaser image]

Overview of our Deep Neural Labeling method, which reconstructs hand poses from motion capture performances of German sign language recorded with a color glove. Motion capture data is recorded alongside video. Each marker in the unlabeled point cloud of finger-joint markers is assigned a label by image reprojection, followed by a neural network that has learned the point cloud shape for a regression classification task. Finally, the labeled points are used to animate the hand by solving an optimization problem that finds the best joint angles by minimizing the positional error of the markers.


Hands are fundamental to conveying emotions and ideas, especially in sign language. In virtual reality, motion capture is becoming essential for mapping real human movements to avatars in immersive environments. Current hand motion capture methods offer, to varying degrees, good usability, accuracy, and real-time performance, but they have limitations. Industry-standard motion capture with sensor gloves produces acceptable results, yet still yields occasional errors due to the proximity of the fingers and sensor drift. This, in turn, requires time-consuming correction and manual labeling of optical markers during post-processing for offline use cases, and it rules out real-time scenarios such as VR communication. To overcome these limitations, we introduce a novel hybrid hand pose estimation method that leverages both an optical motion capture system and a color-coded fabric glove. This approach merges the strengths of both techniques, enabling the automated labeling of 3D marker positions through a data-driven machine-learning approach. Using a spherical capture rig and a deep learning algorithm, we improve efficiency and accuracy. The labeled markers then drive a robust optimization procedure that solves for the hand posture, taking the limits of finger movement into account and applying validation checks. We evaluate our system in the context of German sign language, where 97% of marker assignments are correct. Our approach aims to enhance the accuracy and immersion of sign language communication in VR, making it more inclusive for both deaf and hearing people.
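To illustrate the final step described above (finding joint angles that minimize the positional error of labeled markers while respecting finger movement limits), here is a minimal sketch. It is not the solver used in the paper: the planar three-segment finger, the segment lengths, the joint limits, and the choice of projected gradient descent with finite-difference gradients are all assumptions made for illustration.

```python
import math

# Hypothetical planar finger: three segments (lengths in cm) and per-joint
# flexion limits in radians. These values are illustrative assumptions.
SEGMENT_LENGTHS = [4.0, 2.5, 2.0]
JOINT_LIMITS = [(0.0, math.pi / 2)] * 3


def marker_positions(angles):
    """Forward kinematics: 2D position of the marker at the end of each segment."""
    x = y = total = 0.0
    points = []
    for length, angle in zip(SEGMENT_LENGTHS, angles):
        total += angle  # joint angles accumulate along the chain
        x += length * math.cos(total)
        y += length * math.sin(total)
        points.append((x, y))
    return points


def positional_error(angles, targets):
    """Sum of squared distances between model markers and labeled markers."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(marker_positions(angles), targets))


def clamp(angles):
    """Project the angles back into the allowed joint ranges."""
    return [min(max(a, lo), hi) for a, (lo, hi) in zip(angles, JOINT_LIMITS)]


def solve_angles(targets, steps=8000, rate=0.004, eps=1e-6):
    """Projected gradient descent using central finite-difference gradients."""
    angles = [0.1] * len(SEGMENT_LENGTHS)
    for _ in range(steps):
        grad = []
        for i in range(len(angles)):
            plus, minus = list(angles), list(angles)
            plus[i] += eps
            minus[i] -= eps
            grad.append((positional_error(plus, targets)
                         - positional_error(minus, targets)) / (2 * eps))
        angles = clamp([a - rate * g for a, g in zip(angles, grad)])
    return angles


# Synthetic "labeled markers" generated from a known pose, then recovered.
true_angles = [0.4, 0.7, 0.3]
targets = marker_positions(true_angles)
estimate = solve_angles(targets)
residual = positional_error(estimate, targets)
```

In this toy setup, `estimate` can be compared against `true_angles` and `residual` inspected to check the fit. A full hand model would use 3D joints with several degrees of freedom, and would more likely rely on a dedicated bounded least-squares solver than on hand-written gradient descent.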



Link to preprint version
