Introduction to this insanity
In series: Streamtech
Before I start, let’s backtrack a bit. Sometime in 2023-2024 I came up with a wonderful idea for an indie game I wanted to make. There’s an old Amiga/DOS game called Sink or Swim that I really enjoyed as a kid. It was one of the first games I ever played. I wanted to make something similar, but with modern technology and modern game design. Can’t be that hard these days, no?
I’m an old nerd who grew up with the internet. Over time I picked up the skills to make a game like this, I just hadn’t had the energy and willpower to do it. I tried, and it worked reasonably well, but a new demotivator came along: the slow iteration cycle of game development. I am a creative nerd and I want to enjoy my work as I work on it. Solo developing a game takes way too long with the little energy I have. All the doubts kept creeping up my spine. Not a good time.
So here’s another thing: I watch a lot of YouTube and Twitch. I never even considered doing any content of my own because I hate my voice, my face, and <pick your poison>, you know, the usual, but… I got a different idea…
Go to the Twitch stream directory some day. You see them all over. V-Tubers. Or VTubers. Or vTubers? I don’t know, anyway: streamers who present themselves through a virtual avatar. Also, in this day of AI gimmickry, there’s another, even newer breed: AI characters, such as “Neuro-sama”.
Neuro made me laugh because she’s so insane, and got repeatedly banned on Twitch for saying out loud something she perhaps shouldn’t have. VTubers, while often also quite insane, pique my interest not because I care about their streams but because of the technical aspects of virtual characters and the low-hanging fruit. I look at these VTubers and all I see is discomfort: they move awkwardly, their mouth movements are janky and lag behind their speech, they wobble around like giant balloons. I cannot stand watching them.
These little shortcomings gave me an idea, and I started googling and searching the depths of YouTube for very particular tech videos. In a couple of days I had found loose answers to most of my questions: the problems were solvable, and I could solve them. The reason VTubers are janky is the jank of the software controlling them. You just hire somebody to make a 3D model for you and off you go. Streamers are powerless against the jank of the software. Of course, most are perfectly fine in their mediocrity, but I see both a design and a technical adventure!
What of AI, then? Things like Neuro are funny, but she in particular is funny because she’s supposed to be a little bit shit. She’s talking nonsense, she’s doing nonsensical things, but sometimes also creative things - to much admiration. She’s supposed to be laughed at. “What’s my favourite flavour of burp? I think my favourite is either aluminium or potassium,” she just said, as I’m typing this.
But what about serious AI? I’ve built AI-based tools at work and I don’t believe we’ve peaked yet. I wish to make a… Navi, from Zelda. Or Cortana, from Halo. Something like that. A helper for myself. Imagine an AI character on a stream that has a well-defined personality, knowledge base, skillset, and understands everything that’s happening on the stream. The complexity of an assistant like this has no upper bound, but it does have a lower bound: what I didn’t want to do is make a virtual character/friend that was just a ChatGPT relay. It would have to offer something unique.
At this point I’ve been so thoroughly nerd-sniped that my qualifications as an “interesting streamer” mean very little. There are problems that need solving, and I want to make something awesome because I can. And with a much faster iteration cycle than developing a vertical slice of a game! Yeah, sounds good.
Let’s go
With all that out of the way: this website / blog exists to document my findings, dreams and solutions regarding this… grand stream project.
I will not be posting my code repositories as-is: I won’t have time to clean them up, and they get complex enough that they wouldn’t be very useful in raw form. I will, however, be posting the important parts on this blog, topic by topic. I want to document the fun things I discover, after I have discovered (and possibly solved) them. If you would like me to turn one of these tricks into a dedicated tool to help with your video productions, that can be arranged, for the low-low price of imaginary internet points. Probably.
The following list is the general outline of the project. Not in work-order, because I do whatever inspires me at any given moment. Subsequent posts here on this site will go into greater detail.
- Initial setup for a stream assistant character: Viola the Arcane, or “Fio”
  - Draw a placeholder sprite animation to get started
  - Finalize a design for the character
- Set up a game engine project to serve as stream overlay and as Fio’s space
  - Engine of choice: Godot
  - R&D: Set it up as another input/output system
  - Animate Fio’s placeholder sprite with some funky particle effects and maybe shaders to boot
  - Fio needs the ability to move around the screen autonomously
  - Fio needs the ability to respond to chat on-screen
  - Classical Twitch stream features in-engine: follower alerts, show chat on screen, etc
  - Create fun interactivity things to entertain friends in chat (some will require “cybernetics”, some just game engine gimmicks)
- Design and create a “cybernetics” system to serve as Fio’s decision-making system and persistent memory
  - R&D: Fio needs to be able to remember things
  - R&D: Fio needs to be able to make decisions
  - System to communicate with the game engine
  - System to communicate with the Twitch chat
  - R&D: Design and experiment with uses for large language models (forever ongoing)
  - R&D: Design, experiment with, and possibly train custom models for specific use cases (chat message inference/classification/etc)
  - R&D: Design and implement an extendable plugin system: all inputs must be broadcast to features requiring those inputs
- Design and create a Twitch bot to serve as an input/output system
  - All chat messages must be sent to the cybernetics loop
  - For use in the game engine, download and re-package all emotes into engine-compatible spritesheets
  - Be able to receive messages from the cybernetics loop to speak in chat
- Fio second stage: 3D model
  - Abuse a friend or hire a professional to finalize Fio’s design
  - R&D: Learn Blender. Nbd.
  - Create a 3D model of Fio using the final design as reference
  - R&D: Plonk the 3D model into Godot and figure out how to apply realtime shaders to only certain parts of the model
- Personal VTuber project
  - Create a placeholder 3D model for my own virtual avatar (upper torso, face)
  - R&D: realtime phoneme detection from audio input
  - R&D: animate the model’s mouth texture based on phoneme detection
  - R&D: use the same audio input through a speech-to-text system to interact with Fio through voice alone
  - R&D: decide on a 3D gesture capture system that’s compatible with the game engine and its 3D model operation
  - R&D: set up a comfortable gesture capture camera system
  - R&D: design an animation mechanism that doesn’t look wobbly and awkward
  - When all this works, finalize the avatar’s design and create a proper 3D model
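To make the plugin idea from the cybernetics part of the outline a bit more concrete: "all inputs must be broadcast to features requiring those inputs" could look roughly like the sketch below. This is only an illustration in Python, not the actual project code. The names `InputEvent`, `InputBus`, `subscribe`, `broadcast`, and the event kinds `"chat_message"` and `"follow"` are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InputEvent:
    """One input entering the loop (hypothetical shape)."""
    kind: str      # assumed event kinds, e.g. "chat_message" or "follow"
    payload: dict  # free-form data attached to the event


Handler = Callable[[InputEvent], None]


class InputBus:
    """Broadcasts each input to every feature that asked for that kind."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Handler) -> None:
        # A feature (plugin) declares which kind of input it requires.
        self._handlers[kind].append(handler)

    def broadcast(self, event: InputEvent) -> int:
        # Copy the list so a handler may subscribe others while we iterate.
        handlers = list(self._handlers.get(event.kind, ()))
        for handler in handlers:
            handler(event)
        # Return how many features received the event.
        return len(handlers)


# Usage: two imaginary features listening to chat, none listening to follows.
bus = InputBus()
chat_log: list[str] = []
greetings: list[str] = []

bus.subscribe("chat_message", lambda e: chat_log.append(e.payload["text"]))
bus.subscribe("chat_message", lambda e: greetings.append("hi " + e.payload["user"]))

delivered = bus.broadcast(InputEvent("chat_message", {"user": "viewer", "text": "hello Fio"}))
ignored = bus.broadcast(InputEvent("follow", {"user": "viewer"}))
```

The point of the design is that the Twitch bot and the game engine only ever talk to the bus, so a new feature can be added by subscribing to the inputs it cares about without touching anything else.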
The final product should be one virtual character (me) and one AI character (Fio) whose interactions should produce something unique, and their appearance and behavior should be something that’s pleasant to watch. Everything will be made from scratch: no “VTuber Studios” or other skips. The journey is the fun part, after all. The technical challenges around the Personal VTuber project are what fascinate me, and I can’t wait to get to it. Stay tuned!
This should keep me busy for a year or two, after which… who knows. Mayhaps I’ll get back on that Sink or Swim spiritual sequel.