Twitch emotes and a game engine
In series: Streamtech
Have you ever had an inexplicable urge to invent a new custom binary file format? I know, me too! But, well, I work with stuff adjacent to the internet in some fashion. 99% of the time a product’s needs for data persistence are served by good old relational databases, which are some of the most battle-tested pieces of software known to man.
At the time of writing this, all data persistence cases in the stream project are served by local SQLite files. Twitch chat log? SQLite! Fio needs to make a decision? SQLite! A plugin needs its own pool of dynamic items? Plonk down a new SQLite file! They are easy to manage and I can commit them to source control to move my test data around easily. Good stuff. Hard to argue against them…
Moving to the next phase
I had spent too long just faffing about with Fio’s internal organs – I was starting to lose track of what’s fun and what’s important. To move on, I wanted to build an entertaining feature that I could leave running while I move on to the 3D modeling phase of the project, where it would remain as a minor interaction gimmick.
For this feature, I decided to imitate a common thing seen on streams: inserting physics-enabled emotes into the overlay. Designwise, this kind of trick may be introduced in many, many ways. One popular example is a whole idle game called Intro Fighters where the emotes do battle; another common trick is firing emotes at a VTuber model to comedic effect, perhaps by chatters’ command, and so on and so forth.
From the user’s perspective, this is how I wanted my first emote-based amusement to work:
- There is a “chute” graphic at the top of the screen
- A chatter may say !box EmoteName in chat
- That emote, regardless of emote provider, will fall down from the “chute” as a physics box, strike the bottom of the screen, and bounce about with some randomness.
- The emote disappears after a set time.
I already had in place a Twitch bot to read chat messages, a plugin system to enable “capturing” chat messages with emotes in them, and a game engine overlay. The groundwork is there, now… in short, we need to download any emote and draw a new texture, on the fly. Let’s see what kind of obstacles we bump into!
Downloading the emotes
The Twitch emote ecosystem is a bit fragmented. We have a total of four sources to download emotes from: BetterTTV, FrankerFaceZ, 7TV, and native Twitch emotes.
It turns out, getting the actual emotes is slightly janky. Since third-party emotes are simply words in chat messages, every single chat message must be filtered using a closed list of “available words” as emotes. Since all of my code is in JavaScript (TypeScript), a package called twitch-emoticons seemed alright. It’s a bit incomplete, but it does properly detect (parse) emotes in a message, and gets the URLs for the emotes I have, so I can download them. Cool!
Except… Twitch has a feature where, if you subscribe to a channel, you may use the channel’s emotes on other channels’ chats. Thus, every single Twitch-native emote on the platform is a possibility, so in the end you cannot prefetch the emotes as your Twitch bot starts, they must be handled on the fly, as they appear. The twitch-emoticons library does not help with this and a manual solution is required.
Luckily, for each message, Twitch’s chat events provide a convenient list of items of interest, called fragments, which include native emotes used in that message. Emote fragments have the following shape:
type TwitchEmoteFragment = {
  type: "emote"; // "emote" for emote fragments
  text: string; // the emote in question, such as "Kappa"
  // for fragment type "emote", the following object is provided:
  emote: {
    id: string; // long id, format like "emotesv2_1234abcdefgh"
    emote_set_id: string; // number as a string, like "1234567890"
    owner_id: string; // number as a string, like "5678901234"
    format: string[]; // list of details: [ "static", "animated" ]
  };
};
Notice the id property on the emote. This is what we need to download a Twitch native emote. They are served by Twitch’s content delivery system like so:
https://static-cdn.jtvnw.net/emoticons/v2/EMOTE_ID_HERE/default/dark/3.0
In short, to download a Twitch-native emote, we place the emote id into that URL template and that’s that. Third-party emotes are fetched similarly, by an id on a URL template, but the aforementioned twitch-emoticons library does the emote-name-to-emote-id conversion for us.
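Concretely, the download step boils down to filling in that template. A minimal sketch (twitchEmoteUrl and fetchEmoteImage are illustrative names of mine):

// Build the CDN URL for a Twitch-native emote; "3.0" requests the largest scale
function twitchEmoteUrl(emoteId: string): string {
  return `https://static-cdn.jtvnw.net/emoticons/v2/${emoteId}/default/dark/3.0`;
}

// Download the raw image bytes for an emote id
async function fetchEmoteImage(emoteId: string): Promise<Uint8Array> {
  const response = await fetch(twitchEmoteUrl(emoteId));
  if (!response.ok) throw new Error(`Failed to fetch emote ${emoteId}`);
  return new Uint8Array(await response.arrayBuffer());
}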
This means that whenever the Godot overlay requests the image data for an emote, it must be able to provide this Twitch-internal emote id as well. Maybe I’ll toss the twitch-emoticons library away later because this subscriber-emotes-special-case needs to be handled manually anyway, and a unified approach would be cleaner.
Anyway, now we have all emotes and their image data, moving on…
Emotes in Godot
A problem appears: game engines would really like you to import your assets while you’re working on the game, so that the game engine itself can properly manage resourcing and stuff. However, our possible textures are random memes somewhere on the internet. We can’t pre-import them.
Luckily, in Godot there appears to be a way to create a texture resource on the fly. The texture resource, in turn, is constructed from an Image object, which lets us directly attempt loading raw image data from a byte array, or a “buffer”.
Before we can investigate it further, there’s another problem: Godot’s texture system is fundamentally static. While common modern image formats like WebP are supported, their animation frames are not.
To create a 2D animation in Godot, it must be built as an AnimatedSprite2D node, with an atlas texture representing the individual frames attached to it through a “SpriteFrames” resource. This makes sense from the perspective of a game engine: the animation frames must be controlled by the game, not by the image itself. As such, “animated gifs” and the like are fundamentally counter-productive to a game engine. There are tricks to embed a web browser’s renderer within the program to draw the animated images, but that’s performance-insanity, and we won’t be giving it a second thought.
It gets worse: Twitch native emotes are usually PNG, but GIF if they are animated. Third-party emotes may be WebP, animated WebP, AVIF (seriously?), among others. It’s all a big mess. Meanwhile in Godot, as mentioned earlier, for the AnimatedSprite2D we’ll be creating a texture resource from an Image object. Looking at the documentation for Image we can see the supported range:
load_from_file(path: String) static
load_bmp_from_buffer(buffer: PackedByteArray)
load_jpg_from_buffer(buffer: PackedByteArray)
load_ktx_from_buffer(buffer: PackedByteArray)
load_png_from_buffer(buffer: PackedByteArray)
load_svg_from_buffer(buffer: PackedByteArray, scale: float = 1.0)
load_svg_from_string(svg_str: String, scale: float = 1.0)
load_tga_from_buffer(buffer: PackedByteArray)
load_webp_from_buffer(buffer: PackedByteArray)
It’s… a wider support than I expected, but narrower than I hoped, if that makes sense. We need a solution: how are we displaying emote images of arbitrary image file formats within Godot, and at run time? How do we know its desired size and scaling? How do we know if it’s animated?
A thought strikes me. Now… what if… what if we were to minimize complexity on the game engine side, so that the emote data the overlay program receives is always in a predictable state and format, prepared in advance to be so? We could check for the existence of the data by the emote’s name. The feature, and possible future features requiring loading images at runtime, could work without a hard dependency on a database server or a Twitch bot. This emote data could be created once and stored for later.
The FIO1 Emote File Format
If the Twitch bot would simply create a physical file which represents a single emote and all data related to the emote’s behavior, without depending on yet another service to provide and store the data, suddenly loading it in the game engine would become a predictable operation. The file would be prepared in a manner that reading the image contained within into a Godot texture would be as straightforward as we can make it. Could this be… my first ever realistic use-case for a custom binary file format? A case I can actually argue for? Pog champ and omega lul, let’s try it!
The outlines shall be as follows:
- Image format is WebP for its wide range of features, and its robust support in Godot
- If the source image was not WebP, it is converted to WebP
- Always treated as animated. If the source emote isn’t animated, it will be “animated” with a single frame
- All animation frames of the original emote are concatenated horizontally into a single image - an atlas texture
- Depending on the original animation format, detecting the framerate might or might not be a possibility
- (for example GIFs work on a delay-per-frame basis, which sucks for days)
- If framerate is not known, it’s assumed to be 15 frames per second
Under the hood I’ll be using the Sharp library to convert images to WebP. Sharp enables us to read images into a well-defined intermediary format, so we don’t have to worry about the source image format itself, we just tell Sharp what to do with the image, whatever it was originally.
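As a minimal sketch of that conversion step (toWebp is an illustrative name of mine, assuming the raw download sits in a Buffer):

import sharp from "sharp";

// Normalize any source emote into WebP;
// { animated: true } keeps all frames instead of only the first one
async function toWebp(sourceBuffer: Buffer): Promise<Buffer> {
  return sharp(sourceBuffer, { animated: true }).webp().toBuffer();
}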
Basics of binary files
A binary file is simply a sequence of bytes. When dealing with binary data, we always work with contracts, so to speak (a term I like to use). It’s just a mess of bytes unless we know in advance how to make sense of the chaos. Famously, the first 4 bytes of a file are usually called the “magic number” or the header, and are used to identify the file type. In our case, the magic number will be a sequence of four 8-bit numbers: 70 73 79 49. If we were to treat these numbers as ASCII text characters, they would spell out "FIO1".
Obviously, FIO is named after the character I’m working on, and the 1 represents a version number. A running version number enables us to make changes or alternatives to the file format without breaking existing files.
Additionally, we will need to include metadata of variable length. This complicates matters a bit: we will have to know the exact length of the metadata block in bytes when reading the file. So let’s add a metadata length indicator right after the header, and before the metadata section itself. The length indicator shall be a 32-bit unsigned integer, telling us the length of the following metadata in bytes.
The reason we’re so specific about the integer byte sizes is that the programming languages I’m using take a bunch of shortcuts. JavaScript’s very vague number type is not a general standard outside the JavaScript world: it’s not a fixed-length integer, nor, to be exact, is it even an integer. To get predictable results when stepping outside our programming environment, we’ll treat data as explicit 8-bit or 32-bit integers, which is where JavaScript’s Uint8Array and Uint32Array classes come into play. More on those soon below.
The shape of FIO1
The full contract for writing and reading our FIO1 file goes like this:
[First 4 bytes]: FIO1
70 73 79 49 (constant)
[Next 4 bytes]: Number N
Length N of the following metadata block as a 32-bit unsigned integer
[Next N bytes]: Metadata
Metadata block as a JSON object of exactly N bytes in length
[Remaining data]: WebP
Starting from 4+4+N bytes to the end of the file: The atlas texture in WebP
At version 1 the metadata block has the following shape:
type EmoteBinMetadata = {
  name: string; // e.g. "Kappa"
  uniqueId: string; // even third-party emote vendors give an id
  animated: boolean; // for now ignored, always assumed true
  type: string; // for now will always be "webp"
  frameRate: number;
  frameCount: number;
  frameWidth: number;
  frameHeight: number;
};
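For illustration, here’s what the metadata for the seven-frame peepoWave we’ll meet later might look like (the uniqueId and exact frame dimensions are made up):

const metadata: EmoteBinMetadata = {
  name: "peepoWave",
  uniqueId: "60aed0dc9d598ea72fdba503", // hypothetical vendor id
  animated: true,
  type: "webp",
  frameRate: 15, // the fallback framerate from our outline
  frameCount: 7,
  frameWidth: 112, // made-up dimensions
  frameHeight: 112,
};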
So when we read a file which starts with the magic number FIO1, the above description is the set of assumptions we are taking when reading the file.
Writing a FIO1 file
Our endgame is creating a Blob, which we can either return over the wire as a downloadable application/octet-stream item, or turn into an ArrayBuffer and write to disk.
In JavaScript we can start with an instance of the native TextEncoder class:
const encoder = new TextEncoder();
const header = encoder.encode("FIO1");
The encode method takes a string and returns a Uint8Array. In this case, inspecting the header variable would reveal encode has transformed the input "FIO1" into its 8-bit numeric representation: [ 70, 73, 79, 49 ]
Similarly, assuming our metadata is a stringified JSON object (shape described earlier), we can now create the chunk:
const metadataBytes = encoder.encode(metadataJsonString);
const metadataLength = new Uint32Array([metadataBytes.length]); // 4 bytes
We don’t know how big the stringified metadata is in advance, so once we have metadataBytes, we simply read the length of the byte array and trust it.
Assuming our image data is already read into a Buffer (using Sharp), we turn it into a Uint8Array as well. One caveat: a Node Buffer can be a view into a larger, shared ArrayBuffer, so we slice out exactly our own bytes rather than grabbing the whole underlying buffer:
const imageBytes = new Uint8Array(
  imageBuffer.buffer,
  imageBuffer.byteOffset,
  imageBuffer.length
);
Now, we’re going to create the Blob from a full Uint8Array with everything in it. Uint8Arrays are not variable length: to create a Uint8Array, we need to know the size of the entire final byte array at initialization. We know the header is 4 bytes, the metadata size indicator is 4 bytes, we know how many bytes the metadata will take up, and we know how many bytes the image data will take up.
const emoteData = new Uint8Array(
4 + 4 + metadataBytes.length + imageBytes.length
);
Now it’s the right size, but empty. All that’s left is to fill in this emoteData array with the header, the metadata length indicator, the metadata, and the image data.
// write the magic header numbers, "FIO1", starting at byte 0
emoteData.set(header, 0);
// convert our 32-bit metadata length indicator
// into 4 8-bit integers and write it starting at byte 4
emoteData.set(new Uint8Array(metadataLength.buffer), 4);
// Starting at byte 8, write the metadata
emoteData.set(metadataBytes, 8);
// Starting at where metadata ends, write the image data
emoteData.set(imageBytes, 8 + metadataBytes.length);
// We have the file, ready to be written or sent
const binary = new Blob([emoteData], { type: "application/octet-stream" });
Reading it back is very similar, with the same contract:
- First 4 bytes spell out FIO1? Continue
- Reading 4 bytes starting at byte 4, we assume these bytes form a 32-bit unsigned integer.
- This 32-bit integer tells us how many of the following bytes to treat as part of the metadata object.
- Then we take the data starting at byte 4 + 4 + metadata’s length to the end of the file, and assume all that remaining stuff is a well-formed WebP image.
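In code, a minimal sketch of the reading side might look like this (readFio1 is an illustrative name of mine; note that the Uint32Array on the writing side wrote in the platform’s native byte order, which is little-endian on every machine this project touches, so the reader assumes little-endian):

function readFio1(data: Uint8Array) {
  const decoder = new TextDecoder();

  // First 4 bytes must spell out "FIO1"
  if (decoder.decode(data.subarray(0, 4)) !== "FIO1") {
    throw new Error("Not a FIO1 file");
  }

  // Next 4 bytes: metadata length as a 32-bit unsigned integer
  // (true = little-endian, matching what Uint32Array wrote)
  const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
  const metadataLength = view.getUint32(4, true);

  // N bytes of JSON metadata...
  const metadata = JSON.parse(
    decoder.decode(data.subarray(8, 8 + metadataLength))
  );

  // ...and everything after that is the WebP atlas
  const webpBytes = data.subarray(8 + metadataLength);
  return { metadata, webpBytes };
}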
The plan for the feature
Keeping in mind my existing tools, this is how I thought the control flow would work:
- When a chatter says !box EmoteName in chat, the Twitch bot, as normal, will send the chat message through the central logic program I’ve dubbed “Cybernetics.”
- A new plugin in the Cybernetics plugin system will catch that a message begins with the keyword !box.
- This plugin will command the Godot overlay: spawn a physics box, with graphic “EmoteName”.
- The Godot overlay receives the command, and the following flow begins:
  - The overlay checks its cache: do we already have data for “EmoteName”?
  - If yes, spawn the physics box (done)
  - If no, ask the Twitch bot for it
  - The Twitch bot downloads the emote and creates the FIO1 file
  - The Twitch bot wires the FIO1 file to the overlay
  - The overlay saves it to disk for future use and spawns the physics box (done)
For a new, previously unseen emote, that’s the whole sequence, end to end. On subsequent Kappas there’s a hit on Godot’s cache, and the whole noise simplifies into the first two steps: check the cache, spawn the box.
Let’s try it out
Earlier I briefly mentioned that we’ll have to spread the animation frames horizontally into a single image, because it’s the easiest way to do animation frames at runtime.
Our new file format will be read by our Godot overlay program, but before I’d get my knickers in a twist with GDScript, I wrote a small program in JavaScript to dump the WebP atlas texture, to see if it turns out all right.
Here’s a well-known peepoWave:
At a glance it seemed alright. Most emotes on Twitch are a little janky and hastily made; it’s not my place to fix them, right? Some frames seem vertically misaligned, but I can’t tell if it’s part of the emote’s jank or not. Anyway, all 7 beautiful animation frames are laid out horizontally as expected.
This emote available on 7TV, called BIGCAT, brings great joy to me and my friends. I decided to test it as well.
This is what came out:
We… uhh…? We have a problem. I must have missed something and I’m not sure what. We have some clues though:
- The distortion is limited to certain frames.
- The distortion intensity is specific to the frame where it occurs.
- The distortion has a pattern: it seems to violently skew the image.
My initial suspicion is that there is something wrong with the frame dimensions. Mayhaps the previously observed vertical misalignment is important after all. Color data for digital images is simply a linear sequence of colors. Image width is not part of the color data itself, it’s a separate property of the picture.
Let’s imagine we have an image of 10 pixels: two orange pixels, two green pixels, one purple, repeated twice. Visualized in-memory, it would look like this:
🟧🟧🟩🟩🟪🟧🟧🟩🟩🟪
Now if we define that this set of pixels are laid out as a 5x2 pixel image, a line break will occur every 5 pixels. The 6th pixel will be the first pixel of the next line.
🟧🟧🟩🟩🟪
🟧🟧🟩🟩🟪
It looks nice and orderly: an orange cube, a green cube, and a purple block. Now, what will happen if we say the image is suddenly 4 pixels wide?
🟧🟧🟩🟩
🟪🟧🟧🟩
We can sort of tell there used to be an orange cube, but… the layout of the color data is just broken. It wraps at the wrong place. New lines begin at the wrong index. The height of the image has not changed, so the overflow color data for pixels 9 and 10 are simply lost.
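The arithmetic behind the wrap is worth spelling out. In a row-major pixel buffer, a pixel’s flat index is a simple function of the reported width (a tiny illustrative helper):

// Row-major layout: pixel (x, y) lives at index y * width + x.
// If the reported width doesn't match the width the data was laid
// out with, every row after the first starts at the wrong index.
function pixelIndex(x: number, y: number, width: number): number {
  return y * width + x;
}

With our 10-pixel example, pixel (0, 1) sits at index 5 when the width is 5, but at index 4 when the width is 4, which is exactly this kind of skew.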
Something like this is obviously happening with BIGCAT’s atlas texture, but I need more data.
There’s a WebP utility library called node-webpmux that should help resolve this mystery.
import WebP from 'node-webpmux';

const img = new WebP.Image();
await img.load('emotes/BIGCAT.webp');
console.log(img.frames);
I noticed among the frame info dumps we had entries for coordinates and dimensions:
[...]
x: 0,
y: 0,
width: 128,
height: 128,
[...]
Apparently in WebP animations it’s perfectly legal to have the image dimensions change per frame; presumably this lets encoders store only the subregion that changes between frames. Whatever the reason, I suspect it’s our culprit. Making our Godot overlay program aware of variable frame sizes sounds like introducing way too much complexity to the file format. Let’s just enforce a fixed size for all frames.
Inspecting the data more closely, here are some properties of the first four frames:
{ x: 0, y: 0, width: 124, height: 128 }
{ x: 0, y: 2, width: 128, height: 124 }
{ x: 0, y: 4, width: 128, height: 116 }
{ x: 0, y: 6, width: 128, height: 108 }
This confirms my suspicions. The first frame’s width is 4 pixels too short. In the warped atlas texture, the first frame was broken, but the next few frames were fine. There is some variation in the height as well, which means it will not be centered properly. The x and y properties seem to be informing of the offset of the frame contents in pixels. So in the above example, if frame 2 is 4 pixels too short in height, the y property informs us to push it down by 2 pixels to center it.
We need to fix this when we’re writing the atlas texture. My solution goes like this:
- Scan through all frames and find the frame with the largest dimensions. Treat these dimensions as the “base” size.
- All frames are to be laid out on the x-axis in multiples of this base width.
- For now I’m ignoring the x and y properties: once we decide on a base frame size, we’ll automatically know how much individual frame contents need to be offset by, if need be. (See the sketch below.)
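Here’s a sketch of how that layout step could look with Sharp, assuming each source frame has already been decoded into raw RGBA pixels (the Frame type and buildAtlas are illustrative names of mine, not the actual implementation):

import sharp from "sharp";

type Frame = { data: Buffer; width: number; height: number };

async function buildAtlas(frames: Frame[]): Promise<Buffer> {
  // The largest frame dimensions become the fixed "base" cell size
  const baseWidth = Math.max(...frames.map((f) => f.width));
  const baseHeight = Math.max(...frames.map((f) => f.height));

  // Start from a transparent canvas wide enough for every cell...
  return sharp({
    create: {
      width: baseWidth * frames.length,
      height: baseHeight,
      channels: 4,
      background: { r: 0, g: 0, b: 0, alpha: 0 },
    },
  })
    // ...and composite frame i into cell i, centered within its cell
    .composite(
      frames.map((frame, i) => ({
        input: frame.data,
        raw: { width: frame.width, height: frame.height, channels: 4 },
        left: baseWidth * i + Math.floor((baseWidth - frame.width) / 2),
        top: Math.floor((baseHeight - frame.height) / 2),
      }))
    )
    .webp()
    .toBuffer();
}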
Much better!
Finishing up
So all this to make a predictably-shaped texture atlas? Pretty much, yeah! This is how we’ll add the texture region in Godot’s GDScript after decoding the file.
# Rect2(
#   x offset,
#   y offset,
#   frame width,
#   frame height
# )
var region = Rect2(
    frameWidth * index,
    0,
    frameWidth,
    frameHeight
)
Let’s say our frame width is set at 128 pixels. That means all frame positions on the atlas are multiples of 128 on the x-axis. Y-position is always zero since we’re always framing horizontally only.
The final problem I had was with my FIO1 downloader in the Godot overlay program, when requesting a FIO1 file from the Twitch bot. According to Godot’s documentation, file handles are automatically closed and flushed when resources are freed. Yet if I didn’t manually close the handle with FileAccess.close(), the file written to disk would be about 2 kilobytes smaller than it should be, for some reason (???), and reading the WebP section of the emote file would fail and crash the overlay. Manually closing the file handle with close() fixed the issue. Mayhaps somebody wiser than me might enlighten me as to why.
var file = FileAccess.open("user://" + filename, FileAccess.WRITE)
file.store_buffer(body)
file.close() # the fix
emote_download_complete.emit(true)
As a final finishing “a bit of fun” touch, I added a second parameter to our !box command. If someone were to summon some emotes with, say, !box Kappa 50, the 50 means that it will spawn 50 copies of the emote, one every 100 milliseconds.
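In the plugin, that amounts to parsing an optional count and scheduling the spawns. A hypothetical sketch (handleChatMessage and spawn are illustrative names; only the 100-millisecond cadence comes from the actual feature):

function handleChatMessage(message: string, spawn: (emote: string) => void) {
  // "!box Kappa 50" -> emote name, plus an optional copy count
  const match = message.match(/^!box\s+(\S+)(?:\s+(\d+))?/);
  if (!match) return;

  const emoteName = match[1];
  const count = Number(match[2] ?? "1");

  // Spawn one copy every 100 milliseconds
  for (let i = 0; i < count; i++) {
    setTimeout(() => spawn(emoteName), i * 100);
  }
}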
Q&A
Q: Since you’re already using a horizontally framed WebP, why didn’t you just open that in Godot? Why the new binary file buzz?
A: Our horizontally framed WebP does not contain information about its intended use. It being WebP does not in fact matter - it could be anything. The WebP aspect is not important. What is important is that we know what we’re using this data for. Perhaps WebP has a metadata section for folderol, but I didn’t look into it. We’d be stuck with the limitations of that image format if we were to rely on it.
Using our own file, we can trust the image has been prepared in a specific manner, so that we can run it into a texture and trust it works as we’d expect.
Done
Alright. Onwards to the next learning experience: a summer of 3D modeling and realtime 3D animation!