
OpenVTuber - 虚拟アイドル共享计划 (Virtual Idol Sharing Project)


Thank you all so much! Please keep supporting us!

OpenVTuber: an application of real-time face and gaze analysis via deep neural networks.

  • Lightweight network architectures for devices with low computing capability.
  • 3D gaze estimation based on whole-face semantic information.
  • The overall framework is an upgrade of the [ECCV 2018] version.
  • Drive MMD models through a single RGB camera.

Setup

Requirements

  • Python 3.6+
  • pip3 install -r requirements.txt
  • Node.js and npm or yarn
  • cd NodeServer && yarn # install node modules

Socket-IO Server

  • cd NodeServer
  • yarn start

Python Client

  • cd PythonClient
  • python3 vtuber_link_start.py <your-video-path>
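The split above means the Python client only needs to stream tracking results to the Node server over Socket-IO; the browser handles rendering. Below is a minimal sketch of that pattern using the python-socketio package; the address, port, and event name are illustrative assumptions, not the project's actual protocol:

# Minimal Socket-IO push loop (illustrative only).
# Requires: pip install "python-socketio[client]"
import time
import socketio

sio = socketio.Client()
sio.connect("http://127.0.0.1:6789")  # hypothetical NodeServer address/port

for _ in range(300):
    # In the real client these values come from the tracking pipeline.
    payload = {"yaw": 0.0, "pitch": 0.0, "roll": 0.0,
               "mouth": 0.2, "left_eye": 0.9, "right_eye": 0.9}
    sio.emit("result_data", payload)  # hypothetical event name
    time.sleep(1 / 30)                # ~30 updates per second

sio.disconnect()

Decoupling capture from rendering this way lets the tracker and the renderer run at independent rates.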

Face Detection

RetinaFace: Single-stage Dense Face Localisation in the Wild (CVPR 2020) is a practical single-stage SOTA face detector. Reading the official repo, RetinaFace (mxnet version), is highly recommended.

However, since the detection target of the face capture system is in the middle-to-close range, there is no need for complex pyramid scaling. We designed and published Faster RetinaFace to trade off speed against accuracy; it reaches 500~1000 FPS on ordinary laptops.

Platform        Inference   Postprocess   Throughput Capacity (FPS)
9750HQ+1660TI   0.9 ms      1.5 ms        500~1000
Jetson-Nano     4.6 ms      11.4 ms       80~200
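For reference, the two stages in the table can be timed separately with a harness like the one below; the detect and postprocess functions are placeholders, not the repo's actual API:

# Rough per-stage timing harness for a face detector (placeholders only).
import time
import numpy as np

def detect(frame):
    # Placeholder for the network forward pass.
    return np.zeros((1, 15), dtype=np.float32)

def postprocess(raw):
    # Placeholder for box decoding + NMS.
    return raw

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy 640x480 frame
t0 = time.perf_counter()
raw = detect(frame)
t1 = time.perf_counter()
boxes = postprocess(raw)
t2 = time.perf_counter()
print(f"inference {(t1 - t0) * 1e3:.1f} ms, "
      f"postprocess {(t2 - t1) * 1e3:.1f} ms")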

Face Alignment

In this project, we use the facial landmarks to calculate head pose and to slice out the eye regions for gaze estimation. Moreover, mouth and eye status can be inferred from these key points.
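As a toy illustration of inferring eye status from key points, here is a classic eye-aspect-ratio sketch; the six-point ordering follows the common 68-landmark convention, not this project's 106-point layout:

# Eye aspect ratio (EAR): a common open/closed heuristic from landmarks.
import numpy as np

def eye_aspect_ratio(eye):
    # eye: (6, 2) array ordered
    # [outer corner, top-1, top-2, inner corner, bottom-2, bottom-1]
    v1 = np.linalg.norm(eye[1] - eye[5])  # first vertical distance
    v2 = np.linalg.norm(eye[2] - eye[4])  # second vertical distance
    h = np.linalg.norm(eye[0] - eye[3])   # horizontal eye width
    return (v1 + v2) / (2.0 * h)

open_eye = np.array([[0, 0], [2, -1], [4, -1], [6, 0], [4, 1], [2, 1]], float)
print(eye_aspect_ratio(open_eye))  # ~0.33; values near 0 suggest a closed eye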


The 2D pre-trained 106-landmark model is provided by the insightface repository and is based on a coordinate-regression face alignment algorithm. We refined this model into a TFLite version with smaller weights (4.7 MB), which can be found here. To check the effectiveness of landmark detection, run the following command in the PythonClient subdirectory:

python3 TFLiteFaceAlignment.py <your-video-path>
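For orientation, invoking a TFLite landmark model generally follows the pattern below; the model filename, the 192x192 input size, and the lack of input normalization are assumptions to check against the actual script:

# Generic TFLite landmark inference sketch (filenames/sizes hypothetical).
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="face_alignment.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

face = np.zeros((192, 192, 3), dtype=np.float32)  # stand-in for a face crop
interpreter.set_tensor(inp["index"], face[None])  # add batch dimension
interpreter.invoke()
landmarks = interpreter.get_tensor(out["index"]).reshape(-1, 2)
print(landmarks.shape)  # expect (106, 2) for the 106-point model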

Head Pose Estimation

Perspective-n-Point (PnP) is the problem of determining the 3D position and orientation (pose) of a camera from observations of known point features. PnP is typically formulated and solved either linearly (via lifting), algebraically, or directly.

Briefly, head pose estimation requires a set of pre-defined 3D facial landmarks and their corresponding 2D image projections. In this project, we employed the eyebrow, eye, nose, mouth and jaw landmarks in the AIFI Anthropometric Model as the source 3D feature points. The pre-defined vectors and mapping protocol can be found here.

We adopt the cv2.solvePnP API to calculate the rotation vector and translation vector. Run the following command in the PythonClient subdirectory for real-time head pose estimation:

python3 SolvePnPHeadPoseEstimation.py <your-video-path>
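The core cv2.solvePnP call looks like the sketch below. The 3D model points and camera intrinsics here are generic stand-ins, not the project's pre-defined AIFI vectors:

# Head pose from landmarks with cv2.solvePnP (illustrative values only).
import cv2
import numpy as np

object_pts = np.array([     # rough 3D facial feature points (mm)
    [0.0, 0.0, 0.0],        # nose tip
    [0.0, -63.6, -12.5],    # chin
    [-43.3, 32.7, -26.0],   # left eye outer corner
    [43.3, 32.7, -26.0],    # right eye outer corner
    [-28.9, -28.9, -24.1],  # left mouth corner
    [28.9, -28.9, -24.1],   # right mouth corner
], dtype=np.float64)

image_pts = np.array([      # matching 2D detections (pixels, example values)
    [320, 240], [318, 340], [250, 200],
    [390, 200], [270, 300], [370, 300],
], dtype=np.float64)

w, h = 640, 480             # assumed frame size
cam = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(4)          # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, cam, dist)
rot, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
print(ok, rvec.ravel(), tvec.ravel())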

Iris Localization

Estimating human gaze from a single RGB face image is a challenging task. Theoretically, the gaze direction can be defined by the pupil and the eyeball center; however, the latter is unobservable in 2D images. Previous work by Park et al. presents a method that extracts the semantic information of the iris and eyeball into an intermediate representation, so-called gazemaps, and then decodes the gazemaps into Euler angles through a regression network.

Inspired by this, we propose a gaze estimation method based on 3D semantic information. Instead of employing gazemaps as the intermediate representation, we estimate the center of the eyeball directly from the average geometric information of human gaze.
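Once the eyeball center is fixed, recovering a gaze direction is straightforward geometry: the gaze ray runs from the eyeball center through the pupil. A minimal sketch with illustrative coordinates and sign conventions:

# Gaze direction from an estimated eyeball center (illustrative numbers).
import numpy as np

eyeball_center = np.array([0.0, 0.0, 30.0])  # estimated, behind the iris
pupil = np.array([2.0, -1.0, 18.0])          # localized iris center

gaze = pupil - eyeball_center
gaze /= np.linalg.norm(gaze)                 # unit gaze vector

yaw = np.arctan2(gaze[0], -gaze[2])          # left/right angle
pitch = np.arcsin(-gaze[1])                  # up/down angle (y points down)
print(np.degrees([yaw, pitch]))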


Our eye-region landmark detection and iris localization models are more robust than the original implementation, which leads to higher accuracy in more complex situations. The iris localization demo can be run as follows:

python3 TFLiteIrisLocalization.py <your-video-path>

More details about 3D gaze estimation can be found in the Laser Eye repository.

Special Thanks

  • threejs.org: Applying Three.js WebGL Loader to render MMD models on web pages.
  • kizunaai.com: The model is available for free.

Become a VTuber

This document, OpenVTuberTools, aims to provide a wide range of toolkits that help you become a VTuber at very low cost.

With one webcam and one decent PC, you can become a VTuber!

Live2D-Widget


[Sample demo] You can also carry out secondary development within the permitted scope; here are some examples.

KizunaAI Language (キズナアイ言語) ლ(´ڡ`ლ)


An intelligent super A.I.: the 'KizunaAI Language' is a Brainfuck-family language built on the 'r-fxxk' Ruby gem.


Libraries used & original project


Go to -> KizunaAI-Lang

🍮 Community


--> 南嘉 (Nanga) is from the 《詩經》 / Book of Songs

南有嘉鱼,烝然罩罩。君子有酒,嘉宾式燕以乐。
南有嘉鱼,烝然汕汕。君子有酒,嘉宾式燕以衎。
南有樛木,甘瓠累之。君子有酒,嘉宾式燕绥之。
翩翩者鵻,烝然来思。君子有酒,嘉宾式燕又思。

————《诗经·小雅·南有嘉鱼》

'南有嘉鱼,烝然汕汕。' means:

'In the south is the barbel, And, in multitudes, they are taken with wicker nets. The host has spirits, On which his admirable guests feast with him, delighted.' The fish swarming in the water sway swiftly or glide lightly and freely, each perfectly happy, giving rise to rich associations.

Acknowledgement

@kwea123 @1996scarlet @stevenjoezhang

Citation

@InProceedings{Park_2018_ECCV,
      author = {Park, Seonwook and Spurr, Adrian and Hilliges, Otmar},
      title = {Deep Pictorial Gaze Estimation},
      booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
      month = {September},
      year = {2018}
}

@inproceedings{Liu_2018_ECCV,
      author = {Liu, Songtao and Huang, Di and Wang, Yunhong},
      title = {Receptive Field Block Net for Accurate and Fast Object Detection},
      booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
      month = {September},
      year = {2018}
}

To-Do

  • Improve face tracking in complex surroundings
  • Face tracking on all platforms via the web browser
  • Add more facial expressions
  • Add pose presets
  • Add more utility functions

--> Go to CHANGELOG

Contributing

  • Fork it!
  • Create your feature branch: git checkout -b my-new-feature
  • Commit your changes: git commit -am 'Add some feature'
  • Push to the branch: git push origin my-new-feature
  • Submit a pull request :D

Support this project

Donations help me continue working on this project. Buy Me a Coffee


Donate with PayPal | Donate with WeChat