Anime AI Waifu is an AI-powered voice assistant with a VTuber model that combines the charm of anime characters with cutting-edge technologies. The project aims to create an engaging experience in which you can interact with your chosen character in real time, without powerful hardware.
- 🎤 Voice Interaction: Speak to your AI waifu and get (almost) instant responses.
  - Whisper - OpenAI's paid speech recognition.
  - Google SR - a free speech recognition alternative.
  - Console - if you don't want to use a microphone, just type prompts with your keyboard.
- 🤖 AI Chatbot Integration: Conversations are powered by an AI chatbot, ensuring engaging and dynamic interactions.
  - OpenAI's 'gpt-3.5-turbo' or any other available model.
  - A file with a personality and behaviour description.
  - Remembers previous messages.
- 📢 Text-to-Speech: Hear your AI waifu's responses as she speaks back to you, creating an immersive experience.
  - Google TTS - a free and simple solution.
  - ElevenLabs - amazing results, tons of voices.
  - Console - get text responses in your console (but the VTube model will just stay idle).
- 🌐 Integration with VTube Studio: Seamlessly connect your AI waifu to VTube Studio for an even more lifelike and visually engaging interaction.
  - Lip sync while talking.
*The demonstration is recorded in real time, without cuts or speed-up. The delay in answers is real.
To run this project, follow these steps:
- Install Python 3.10.5 if you don't already have it installed.
- Clone the repository by running
      git clone https://github.com/JarikDem-Bot/ai-waifu.git
- Install the required Python packages by running
      pip install -r requirements.txt
  in the project directory.
- Create a .env file inside the project directory and enter your API keys.
  .env template:
      OPENAI_API_KEY='YOUR_OPEN_AI_KEY'
      ELEVENLABS_API_KEY='YOUR_ELEVENLABS_KEY'
- Install VB-Cable.
- Install and set up VTube Studio.
- Select the settings you need in main.py, in waifu.initialize. Arguments:
  - user_input_service (str) - the way to interact with Waifu.
    - "whisper" - OpenAI's Whisper speech-to-text service; paid, requires an OpenAI API key.
    - "google" - free Google speech-to-text service.
    - "console" - type your prompt in the console (absolutely free).
    - None or unspecified - default value is "whisper".
  - stt_duration (float) - the maximum number of seconds spent dynamically adjusting the recognition threshold to ambient noise before returning. This value should be at least 0.5 in order to get a representative sample of the ambient noise. Default value is 0.5.
  - mic_index (int) - index of the device to use for audio input. If None or unspecified, the default microphone is used.
  - chatbot_service (str) - the service that generates responses.
    - "openai" - OpenAI text generation service; paid, requires an OpenAI API key.
    - "test" - returns a prewritten message; used as dummy text during development to reduce the time and cost of testing.
    - None or unspecified - default value is "openai".
  - chatbot_model (str) - the model used for text generation. See OpenAI's documentation for the list of available models. Default value is "gpt-3.5-turbo".
  - chatbot_temperature (float) - determines the creativity of the generated text. A higher value leads to more creative results; a lower value leads to less creative, more similar results. Default value is 0.5.
  - personality_file (str) - relative path to the txt file with the waifu's description. Default value is "personality.txt".
  - tts_service (str) - the service that "reads" Waifu's responses.
    - "google" - Google's free TTS; the voice feels very "robotic".
    - "elevenlabs" - ElevenLabs TTS with good quality; paid, requires an ElevenLabs API key.
    - "console" - output is printed in the console (free).
    - None or unspecified - default value is "google".
  - output_device - (int) output device ID or (str) output device name substring. If VB-Cable is used, find the device whose name starts with "CABLE Input (VB-Audio Virtual" using the sd.query_devices() command.
  - tts_voice (str) - ElevenLabs voice name. Default value is "Elli".
  - tts_model (str) - ElevenLabs model. Recommended values are "eleven_monolingual_v1" and "eleven_multilingual_v1". Default value is "eleven_monolingual_v1".
- Run the project by executing
      python main.py
  in the project directory.
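
For the output_device argument described above, the VB-Cable device ID can be looked up programmatically instead of being read off the printed device list. The following is a minimal sketch, not part of the project: the helper name find_output_device is hypothetical, and it assumes each entry returned by sd.query_devices() (from the sounddevice package) is a dict-like object with "name" and "max_output_channels" keys.

```python
from typing import Iterable, Mapping, Optional

# Name prefix quoted in the argument description above.
CABLE_PREFIX = "CABLE Input (VB-Audio Virtual"


def find_output_device(devices: Iterable[Mapping],
                       prefix: str = CABLE_PREFIX) -> Optional[int]:
    """Return the index of the first output-capable device whose name
    starts with `prefix`, or None if there is no such device.

    `devices` is expected to look like the result of sd.query_devices():
    an iterable of mappings with "name" and "max_output_channels" keys.
    """
    for index, device in enumerate(devices):
        name = str(device.get("name", ""))
        has_output = device.get("max_output_channels", 0) > 0
        if name.startswith(prefix) and has_output:
            return index
    return None


# Intended use (requires the sounddevice package and a working audio setup):
#   import sounddevice as sd
#   output_device = find_output_device(sd.query_devices())
```

The returned index can then be passed as output_device. If the function returns None, VB-Cable is probably not installed or the device is named differently on your system.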
Depending on the selected services, the program may send recorded audio or other data to third parties such as Google (STT, TTS), OpenAI (STT, text generation), and ElevenLabs (TTS).
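
To make the note above concrete, the sketch below maps a combination of user_input_service, chatbot_service and tts_service to the third parties that would receive data, and to the .env keys that combination needs. It is only an illustration built from the argument descriptions in this README: the functions third_parties and required_api_keys are hypothetical and do not exist in the project.

```python
from typing import Optional, Set


def third_parties(user_input_service: Optional[str] = None,
                  chatbot_service: Optional[str] = None,
                  tts_service: Optional[str] = None) -> Set[str]:
    """Return the external services that would receive data."""
    # None falls back to the defaults documented in this README.
    stt = user_input_service or "whisper"
    chatbot = chatbot_service or "openai"
    tts = tts_service or "google"

    parties: Set[str] = set()
    if stt == "whisper":
        parties.add("OpenAI")      # speech to text
    elif stt == "google":
        parties.add("Google")      # speech to text
    if chatbot == "openai":
        parties.add("OpenAI")      # text generation
    if tts == "google":
        parties.add("Google")      # text to speech
    elif tts == "elevenlabs":
        parties.add("ElevenLabs")  # text to speech
    return parties


def required_api_keys(user_input_service: Optional[str] = None,
                      chatbot_service: Optional[str] = None,
                      tts_service: Optional[str] = None) -> Set[str]:
    """Return the .env keys needed for the chosen services."""
    parties = third_parties(user_input_service, chatbot_service, tts_service)
    keys: Set[str] = set()
    if "OpenAI" in parties:
        keys.add("OPENAI_API_KEY")
    if "ElevenLabs" in parties:
        keys.add("ELEVENLABS_API_KEY")
    return keys


if __name__ == "__main__":
    # Default settings: Whisper input, OpenAI chatbot, Google TTS.
    print(sorted(third_parties()))
    print(sorted(required_api_keys()))
```

With "console" input, the "test" chatbot and "console" output, both functions return an empty set, which matches the fully offline development setup described in the argument list.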