Animation System
Overview
The SyntH Animation System is powered by KaradaStateServer — the single source of truth for three independent streams: the active VRM model, the animation state, and face blend-shape values. Clients receive push-updates via WebSocket; only real state changes are broadcast.
For a complete list of HTTP and WebSocket endpoints exposed by the WebUI (including those used by the animation subsystem) see API Endpoints Reference.
Note
“Karada” (からだ) is the Japanese word for body. The KaradaStateServer is the centralized body state manager for VRM animations and facial values. In earlier versions this module was casually referred to as the “animation handler”; the new name emphasizes its broader role as a general state server and aligns with the project’s move away from the legacy AnimationHandler concept.
The system coordinates between backend logic and frontend rendering to create a coherent and responsive avatar experience across any number of simultaneously-connected WebUI windows.
Architecture
The animation system consists of three main components:
Backend: KaradaStateServer
Located in core/animation_handler.py, this component is the canonical VRM state service:
Maps logical animation states to FBX animation files (clients never choose a file directly; they request a state and Karada picks an appropriate animation based on the active skin and Rei fallback)
Tracks and broadcasts the current animation state to all connected clients
Manages VRM model state and pushes it on connect/change
Manages animation contexts and automatic fallback to Idle
Client‑side Animation Renderer
Located in res/synth_webui/js/vrm-viewer.mjs, this frontend component:
Receives animation commands from the backend
Loads and manages FBX animation files
Controls the THREE.js AnimationMixer
Handles smooth transitions between animations
(The previous “Frontend Animation Handler” terminology has been retired to avoid confusion with the server‑side KaradaStateServer.)
WebUI Integration
The SynthWebUIInterface (core/webui.py) coordinates between the backend and frontend:
Initializes KaradaStateServer on startup
Triggers animations at appropriate lifecycle points
Sends animation commands via WebSocket
Animation States
Clients and internal components never supply a file path; they request one of the
logical states below. KaradaStateServer chooses an appropriate
FBX from the active skin (falling back to Rei) and sends that filename to the
front-end.
The system defines four logical animation states:
Idle
Trigger: No active animations or tasks
Files: Idle.fbx, Idle2.fbx, Happy Idle.fbx (random selection)
Loop: Yes
Description: Default state when the avatar is not actively processing or responding
Think
Trigger: When a message is received from a user
Files: Thinking.fbx
Loop: Yes
Description: Indicates the AI is processing the incoming message
Write
Trigger: When the LLM starts generating a response
Files: Texting While Standing.fbx, Texting.fbx (random selection)
Loop: Yes
Description: Indicates the AI is formulating and writing a response
Talk
Trigger: Can be triggered by components/plugins for speech output
Files: talking.fbx
Loop: Yes
Description: Indicates the avatar is speaking or vocalizing
Animation Flow
Standard Message Handling
User sends message → Backend triggers THINK animation
LLM starts processing → Backend triggers WRITE animation
Response complete → Backend triggers IDLE animation (via context cleanup)
The animation flow is automatic and managed by the WebUI’s message handling logic.
Usage
Backend Usage
Components access the global KaradaStateServer instance:
from core.animation_handler import get_karada_state_server, AnimationState
# Get the server instance
handler = get_karada_state_server()
# Trigger an animation
await handler.transition_to(
    AnimationState.THINK,
    session_id="session_123",
    context_id="my_context",
)
# Stop animation and return to idle
await handler.stop_animation("my_context", "session_123")
Context Management
The animation handler uses context IDs to track multiple concurrent animations. When all contexts are stopped, the handler automatically returns to Idle state.
Example:
# Start animation with context
await handler.play_animation(
    AnimationState.WRITE,
    session_id="session_123",
    loop=True,
    context_id="response_generation",
)
# Later, stop this context
await handler.stop_animation("response_generation", "session_123")
# If no other contexts are active, returns to Idle
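The context bookkeeping described above can be pictured with a small stand-alone sketch. `ContextTracker` and its methods are hypothetical names used only for illustration and are not part of `core.animation_handler`; the rule that the most recently started context wins is also an assumption, and the real server additionally handles sessions, priorities, and transports.

```python
from dataclasses import dataclass, field


@dataclass
class ContextTracker:
    """Hypothetical stand-in: tracks active contexts and falls back to 'idle'."""

    # Maps context_id -> requested state; dicts preserve insertion order.
    active: dict[str, str] = field(default_factory=dict)

    def start(self, context_id: str, state: str) -> str:
        """Register a context and return the state that should now play."""
        self.active[context_id] = state
        return self.current_state()

    def stop(self, context_id: str) -> str:
        """Remove a context (unknown ids are ignored) and return the new state."""
        self.active.pop(context_id, None)
        return self.current_state()

    def current_state(self) -> str:
        """Assumption: the most recently started context wins; none means idle."""
        if not self.active:
            return "idle"
        return next(reversed(self.active.values()))
```

Stopping the last remaining context yields `"idle"`, which mirrors the automatic fallback described in this section.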
Frontend WebSocket Protocol
The backend communicates animation state to connected WebUI clients through the WebSocket message types described below. All messages are JSON objects.
vrm_animation — Play an animation
Emitted by KaradaStateServer._send_animation_command() whenever the active
animation changes. This is the primary playback command.
{
  "type": "vrm_animation",
  "file": "/skins/Rei/animations/think/Thinking.fbx",
  "state": "think",
  "loop": true,
  "reset_eyes": true,
  "descriptor": {
    "intro": {"start_frame": 0, "end_frame": 15},
    "loop": {"start_frame": 16, "end_frame": 60},
    "outro": {"start_frame": 61, "end_frame": 90},
    "fps": 30
  },
  "animation_state": {
    "action": "think",
    "phase": "loop",
    "animation": "/skins/Rei/animations/think/Thinking.fbx",
    "descriptor": { "…": "…" },
    "clip": {"name": "Thinking", "duration": 1.47, "fps": 30.0},
    "timing": {"started_at": "2026-03-04T12:00:00Z", "time_in_clip": 0.0, "current_frame": 0},
    "expressions": [],
    "blink": {"auto": true},
    "eye_movement": {"auto": true},
    "emotions": {"dominant": "happy", "values": {"happy": 7.5}},
    "lipsync": false,
    "priority": 10,
    "source": "core"
  }
}
Note
The descriptor object above comes directly from the companion
<animation>.fbx.json file located next to the FBX. That file is
the single source of truth for loop/intro/outro timings, fps, and
related metadata; duplicating the same values elsewhere is a bug.
When no descriptor file exists, the handler synthesises sensible
defaults (idle animations loop, other states play once, and the
implicit loop section spans frames 0–max).
Note
The key is "file" (not "animation"). The legacy "type": "animation"
spelling is still accepted by chat-window.mjs for backwards compatibility,
but the backend always emits "vrm_animation".
reset_eyes is emitted only for targeted session plays (not broadcast), so
each client can perform a smooth eyes-reset when the animation changes. It is
not included in global broadcasts (session_id=None).
animation_state is populated only when a descriptor and/or emotions are
available. Clients should treat it as optional.
vrm_model — Set active VRM model
Emitted by KaradaStateServer.set_vrm_model() when the persona’s VRM changes.
{
"type": "vrm_model",
"name": "SyntH.vrm",
"url": "/avatars/SyntH.vrm",
"hash": "sha256:abc123"
}
The optional hash field allows clients to skip reloading an already-cached model.
vrm_face — Blend-shape / emotion values
Emitted to update the avatar’s facial expression sliders. Because the
client’s smoothing pipeline decays values rapidly, this message is generally
only used for slow-changing EmotionManager state. (Facial expression tags
use vrm_expression_set instead.)
{
"type": "vrm_face",
"values": {"happy": 0.8, "neutral": 0.1}
}
vrm_expression_set / vrm_expression_clear — instantaneous facial overrides
New in the 2026 release. These packets allow the backend to inject a short-lived
facial expression with high priority, bypassing the normal emotion decay.
They are consumed by the WebUI’s expression pipeline which applies a smooth
lerp and automatically removes the source when a matching vrm_expression_clear
packet is received (typically after a cooldown). This is the mechanism used
by [em_*] tags inserted into LLM-generated text.
{"type": "vrm_expression_set", "name": "grin", "intensity": 0.9}
To clear the override and return to the emotional baseline:
{"type": "vrm_expression_clear"}
preload_animation — Preload an animation file
Asks the frontend to preload an FBX file in the background so it is ready when
vrm_animation requests it. Up to 3 IDLE variants are pre-warmed before any
non-IDLE animation plays (see ensure_idle_preloaded()).
{
"type": "preload_animation",
"animation": "/skins/Rei/animations/idle/Idle.fbx",
"descriptor": {
"loop": {"start_frame": 0, "end_frame": 120},
"fps": 30
}
}
Note
The message type is "preload_animation" and the URL key is "animation".
animation_state — Informational state summary
A lightweight broadcast that communicates what is playing without necessarily
triggering a re-play. Used by clients that arrived after the original
vrm_animation command was sent and need to know the current state.
{
"type": "animation_state",
"state": "think",
"animation_file": "Thinking.fbx"
}
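To illustrate how a client might route the message types listed in this section, here is a minimal Python sketch. The function name `parse_karada_message` and the decision to return `None` for unknown or malformed input are assumptions made for the example; the actual WebUI client is written in JavaScript and may behave differently.

```python
import json

# Message types named in this section.
KNOWN_TYPES = {
    "vrm_animation", "vrm_model", "vrm_face",
    "vrm_expression_set", "vrm_expression_clear",
    "preload_animation", "animation_state",
}


def parse_karada_message(raw: str):
    """Return (type, payload) for a known message, or None otherwise."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict):
        return None
    msg_type = msg.get("type")
    # The legacy "animation" spelling is mapped onto "vrm_animation".
    if msg_type == "animation":
        msg_type = "vrm_animation"
    if not isinstance(msg_type, str) or msg_type not in KNOWN_TYPES:
        return None
    payload = {key: value for key, value in msg.items() if key != "type"}
    return msg_type, payload
```

A caller would typically switch on the returned type and treat optional keys such as `animation_state` or `hash` as possibly absent.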
New-client handshake (hello / has_assets)
When a new client establishes a WebSocket connection it may send a hello
message listing assets it already has cached:
{ "type": "hello", "has_assets": ["/avatars/SyntH.vrm"] }
The backend calls get_missing_assets(has_assets) and pushes only the missing
assets to the client, avoiding redundant transfers. Immediately after, the full
current state (VRM model + active animation + face values) is pushed via
get_full_state().
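The handshake logic can be sketched as a simple set difference. The helper names `missing_assets` and `handle_hello` are hypothetical, and passing the server-known assets in as an argument is a simplification for the example; the real `get_missing_assets(has_assets)` reads that list from server state.

```python
def missing_assets(known_assets: list[str], has_assets: list[str]) -> list[str]:
    """Return server-known assets the client did not list, keeping server order."""
    cached = set(has_assets)
    return [url for url in known_assets if url not in cached]


def handle_hello(message: dict, known_assets: list[str]) -> list[str]:
    """Extract has_assets from a hello message and compute what to push.

    Assumption: a missing or malformed has_assets list is treated as empty.
    """
    has_assets = message.get("has_assets")
    if not isinstance(has_assets, list):
        has_assets = []
    return missing_assets(known_assets, [a for a in has_assets if isinstance(a, str)])
```

A client that already caches the active VRM gets an empty list back, so only the state push follows.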
Note
All vrm_animation commands are broadcast to all connected sessions
(session_id=None), ensuring that every open WebUI window shows the
same animation simultaneously.
Centralized Animation State
The animation system maintains a centralized state on the backend that is synchronized across all connected clients. This ensures that when multiple users/devices view the same avatar simultaneously (through different WebUI windows), they all see the exact same animation.
How It Works
Single Source of Truth: KaradaStateServer maintains the current animation state
- Current state (IDLE, THINK, WRITE, TALK)
- Current animation file being played
- Animation descriptor (frame info for intro/loop/outro)
State Change Notifications: When an animation changes:
- Backend notifies all registered callbacks via _notify_animation_state_changed()
- WebUI broadcasts the new animation state to all connected WebSocket clients
- Each client receives the identical animation command
New Client Synchronization: When a client connects:
- WebSocket endpoint retrieves current animation state via get_current_animation_state()
- Sends the current animation to the new client before any other messages
- New client immediately displays the correct animation
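The three mechanisms above amount to an observer pattern around one canonical snapshot. The following sketch uses a hypothetical `AnimationStateHub` class; its method names are illustrative and do not claim to match the internals of KaradaStateServer.

```python
from typing import Callable, Optional

Snapshot = dict[str, object]


class AnimationStateHub:
    """Hypothetical miniature of the single-source-of-truth idea."""

    def __init__(self) -> None:
        self._snapshot: Snapshot = {"state": "idle", "file": None, "descriptor": None}
        self._callbacks: list[Callable[[Snapshot], None]] = []

    def register_callback(self, callback: Callable[[Snapshot], None]) -> None:
        """Subscribe to state changes (e.g. a WebUI broadcaster)."""
        self._callbacks.append(callback)

    def get_current(self) -> Snapshot:
        """Return a copy so callers cannot mutate the canonical state."""
        return dict(self._snapshot)

    def set_animation(self, state: str, file: str,
                      descriptor: Optional[dict] = None) -> bool:
        """Update the snapshot; notify subscribers only on a real change."""
        new = {"state": state, "file": file, "descriptor": descriptor}
        if new == self._snapshot:
            return False  # nothing changed, nothing to broadcast
        self._snapshot = new
        for callback in self._callbacks:
            callback(self.get_current())
        return True
```

A newly connected client would call `get_current()` once, then rely on the callback-driven pushes for every later change.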
Use Case Example
Timeline:
--------
User 1 (Telegram) → sends message
↓
Backend (KaradaStateServer)
↓ triggers THINK animation
↓ updates _current_animation_file, _current_animation_descriptor
↓ calls _notify_animation_state_changed()
↓ WebUI broadcasts to all clients
Client A (WebUI, Device 1) ← receives THINK animation
Client B (WebUI, Device 2) ← receives THINK animation (same video view!)
Client C (WebUI, Phone) ← receives THINK animation
All three devices see the same avatar doing the same THINKING motion simultaneously.
Configuration
No special configuration required. The synchronization is automatic:
Backend calls register_animation_state_changed_callback() during initialization
WebUI broadcasts to all connected clients when animation changes
New clients receive current state on connection
KaradaStateServer API
The following public methods are available for plugins and interfaces.
get_full_state() → dict
Returns the complete VRM state in a single dict with four keys:
{
    "vrm_model": {"name": "...", "url": "...", "hash": "..."},
    "animation": {"file": "...", "url": "...", "state": "idle", "loop": True, "descriptor": {...}},
    "face_values": {"happy": 0.0, ...},
    "audio": {"url": "...", "audio_duration_s": 3.2, "offset_s": 0.7, "lipsync_data": None}
}
The audio key is None when no TTS audio is currently playing. Called on every new WebSocket connection to push the current state to the newly-connected client.
async set_vrm_model(url, name, hash_=None) → None
Stores the active VRM model info internally and broadcasts a vrm_model message to all connected WebSocket clients. Should be called by the persona manager or WebUI when the active VRM file changes.
get_missing_assets(has_assets: list[str]) → list[str]
Given a list of asset URLs that the client already has cached, returns the subset of server-known assets (currently the active VRM) the client is missing. Used during the hello/has_assets handshake.
register_state_animations(state, animations: dict[str, list[str]], sequential=False) → None
Override plugin animations for a logical state. animations is a dict with optional keys loop, post, other, each mapping to a list of FBX file names.
handler.register_state_animations(
    "think",
    {"loop": ["DeepThought.fbx"], "post": ["PostThink.fbx"]},
    sequential=True,
)
add_temporary_search_path(path: Path) → None
Prepend a high-priority search path (used by animation uploads). Temporary paths are tracked separately and can be removed via remove_temporary_search_path().
remove_temporary_search_path(path: Path) → None
Remove a previously-added temporary search path.
get_animation_variants(state: str) → dict
Returns discovered animation variants for a state, classified into three buckets:
{
    "loop": ["Thinking.fbx"],       # descriptor has loop section (or play_once=False)
    "post": ["ThinkPost.fbx"],      # descriptor has play_once=True
    "other": ["Unclassified.fbx"],  # no descriptor at all
}
async ensure_idle_preloaded(session_id=None) → None
Pre-warms up to 3 IDLE animation variants. Called automatically before any non-IDLE animation plays so that returning to IDLE is instant.
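As an illustration of the three buckets returned by get_animation_variants(), the sketch below classifies files from their descriptors. `classify_variants` is a hypothetical helper, and two details are assumptions: `play_once=True` takes precedence when a descriptor also has a loop section, and a missing `play_once` key is treated as false.

```python
from typing import Optional


def classify_variants(descriptors: dict[str, Optional[dict]]) -> dict[str, list[str]]:
    """Sort FBX names into loop/post/other buckets based on their descriptors.

    descriptors maps a file name to its parsed descriptor, or None when no
    companion .fbx.json file exists.
    """
    buckets: dict[str, list[str]] = {"loop": [], "post": [], "other": []}
    for name in sorted(descriptors):
        descriptor = descriptors[name]
        if descriptor is None:
            buckets["other"].append(name)      # no descriptor at all
        elif descriptor.get("play_once") is True:
            buckets["post"].append(name)       # explicit one-shot clip
        else:
            buckets["loop"].append(name)       # loop section or play_once false
    return buckets
```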
Adding New Animations
Backend
Place your FBX file(s) under the active skin’s animations/<state>/ folder, e.g. skins/Rei/animations/think/Thinking.fbx
Optionally add a JSON descriptor alongside an FBX file, named <animation>.fbx.json, to describe intro, loop and outro frame ranges or a play_once flag.
The backend will dynamically discover available animations. Plugins may also:
Register override lists via register_state_animations(state, animations, sequential=False)
Register aliases via register_state_aliases({})
Add search paths via set_animation_search_paths([...])
Frontend
Ensure the animation files are accessible via the /animations/ endpoint
Update the animationMappings in the WebUI template if adding a new state:
const animationMappings = {
    think: ['Thinking.fbx'],
    write: ['Texting While Standing.fbx', 'Texting.fbx'],
    talk: ['talking.fbx'],
    idle: ['Idle.fbx', 'Idle2.fbx', 'Happy Idle.fbx'],
    custom: ['CustomAnimation.fbx'] // New animation
};
Temporary Animation Uploads
The WebUI exposes endpoints to upload temporary animations that do not modify
the active persona skin until explicitly promoted. Uploaded files are stored under
skins/temp/<upload_id>/animations/<state>/ with a companion metadata file at
skins/temp/<upload_id>/meta.json.
Upload flow
Client uploads an FBX/VRMA to POST /api/animations/upload.
The server writes the file to skins/temp/<upload_id>/animations/<state>/.
The KaradaStateServer adds the upload root as a temporary search path so the animation can be discovered without touching the active skin.
Promotion flow
When you are ready to make the animation permanent, call
POST /api/animations/promote to copy the upload into
skins/<persona>/animations/<state>/.
Endpoints
POST /api/animations/upload (multipart)
GET /api/animations/uploads
DELETE /api/animations/uploads/{upload_id}
POST /api/animations/promote
Notes
Temporary uploads are prioritized using search paths; they can be removed at any time.
Descriptors can be provided alongside the upload as JSON (<file>.fbx.json).
Cleanup runs automatically based on SYNTH_MATEENGINE_UPLOAD_TTL_DAYS (default: 7 days).
Promotion is guarded by SYNTH_MATEENGINE_PROMOTE_ENABLED=1.
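The two environment variables above could be interpreted roughly as follows. This is a sketch only: the helper names are hypothetical, and falling back to 7 days when the value is missing, non-numeric, or not positive is an assumption about how invalid input might be handled.

```python
import os
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_DAYS = 7  # default stated in this section


def upload_ttl_days(env=None) -> int:
    """Read SYNTH_MATEENGINE_UPLOAD_TTL_DAYS, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get("SYNTH_MATEENGINE_UPLOAD_TTL_DAYS", str(DEFAULT_TTL_DAYS))
    try:
        days = int(raw)
    except ValueError:
        return DEFAULT_TTL_DAYS  # assumption: invalid values use the default
    return days if days > 0 else DEFAULT_TTL_DAYS


def is_expired(created_at: datetime, now: datetime, ttl_days: int) -> bool:
    """True when an upload is older than the configured TTL."""
    return now - created_at > timedelta(days=ttl_days)


def promotion_enabled(env=None) -> bool:
    """Promotion is allowed only when the guard variable is exactly '1'."""
    env = os.environ if env is None else env
    return env.get("SYNTH_MATEENGINE_PROMOTE_ENABLED") == "1"
```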
Integration with Interfaces
While the WebUI interface automatically manages animations for message handling, other interfaces (Telegram, Discord, Matrix) can integrate with the animation system.
Preferred integration pattern
Interfaces should generally not broadcast animation states directly on message receipt. The core message queue is the fallback owner of the lifecycle:
Message accepted/enqueued → THINK
Generation start → WRITE (or TALK)
Generation end → IDLE
If an interface needs a different mapping (e.g. TTS prefers TALK instead of WRITE), it should pass override hints through the message context (or implement the optional interface hooks used by the queue) rather than bypassing the core chain.
Direct control (advanced)
If an interface explicitly opts out of the core queue animation broadcast (and takes full responsibility for animation state), it may call the server directly.
Example for an interface that wants to show the avatar is “thinking”:
from core.animation_handler import get_karada_state_server, AnimationState

class MyInterface:
    async def handle_message(self, message):
        handler = get_karada_state_server()
        # Get the session_id from WebUI if available.
        # Note: this only works if the user has a WebUI session;
        # for pure Telegram/Discord, animations are WebUI-only.
        webui_session = self.get_webui_session_for_user(message.from_user.id)
        if webui_session:
            await handler.transition_to(
                AnimationState.THINK,
                session_id=webui_session,
                context_id=f"interface_{message.message_id}",
            )
Flexible Animation Sections (Intro/Loop/Outro)
The animation system supports flexible combinations of intro, loop, and outro sections, allowing for more sophisticated animation sequences:
Full Animation Flow
An animation can define up to three sections:
Intro: Initial/setup frames (e.g., transition into thinking pose)
Loop: Repeating frames that play continuously (e.g., thinking motion)
Outro: Wind-down/transition frames (e.g., returning to rest pose)
{
"intro": {"start_frame": 0, "end_frame": 20},
"loop": {"start_frame": 21, "end_frame": 120},
"outro": {"start_frame": 121, "end_frame": 160}
}
Smart Playback
When play_animation() is called:
If loop section exists → always loop until stop_animation() is called
If only intro → play once and stop automatically
WebUI uses the descriptor to determine which frames to play
Graceful Stopping
When stop_animation() is called:
If outro exists → play outro sequence before returning to Idle
If no outro → immediately return to Idle
Duration calculated from frame count (approximately 30fps)
Supported Combinations
All combinations work correctly:
intro + loop + outro: Full animation flow
loop + outro: Repeating with graceful ending
intro + loop: Intro then repeating motion
loop only: Simple repeating animation
intro + outro: One-shot animation
Any solo section: Works as expected
See Animation Flow System - Flexible Intro/Loop/Outro for detailed documentation.
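The playback and stopping rules in this section can be summarised in a small planning function. `playback_plan` is a hypothetical helper written for illustration; in particular, treating a descriptor without a loop section as a one-shot that plays all of its sections in order is an assumption.

```python
SECTION_ORDER = ("intro", "loop", "outro")


def playback_plan(descriptor: dict) -> dict:
    """Describe what plays on start and on stop for a given descriptor."""
    present = [name for name in SECTION_ORDER if name in descriptor]
    if "loop" in present:
        return {
            # Intro (if any) leads into the loop, which repeats until stopped.
            "on_play": [name for name in present if name != "outro"],
            "loops": True,
            # The outro (if any) runs before returning to Idle.
            "on_stop": ["outro"] if "outro" in present else [],
        }
    # No loop section: play whatever exists once, then stop automatically.
    return {"on_play": present, "loops": False, "on_stop": []}
```

Keys other than the three section names, such as `fps`, are ignored by this sketch.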
effective_loop Determination
play_animation() computes the effective loop behaviour from the descriptor and the
requested loop parameter according to the following hierarchy:
| Condition | |
|---|---|
| State is IDLE | |
| Descriptor has | |
| Descriptor has | |
| Descriptor has | |
| Descriptor has | |
| Descriptor present, | |
| No descriptor | Respects the |
When effective_loop is False and the state is not IDLE, a background task
(_non_loop_fallback) schedules a return to IDLE after the clip completes,
acting as a safety net in case the client does not send a completion event.
The duration passed to _non_loop_fallback is calculated as follows:
If a descriptor is available, all defined sections (intro, loop, outro) are measured in frames, converted to seconds using the descriptor’s fps value (default 30), and summed together. A safety buffer of 1.5 seconds is then added to accommodate network latency and the front end’s own “finished” event handling.
If no descriptor or usable frame information exists, a conservative default of 3 seconds is used before adding the 1.5 second buffer.
This scheme prevents the backend fallback from firing partway through an
animation’s outro – a problem that used to manifest as the VRM dropping into
T‑pose mid‑transition when playing non‑looping clips such as write.
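The duration rule above translates into a few lines of arithmetic. The function name `fallback_delay` is hypothetical, and counting each section as an inclusive frame range (`end_frame - start_frame + 1`) is an assumption; the source does not say whether the end frame is included.

```python
from typing import Optional

DEFAULT_FPS = 30.0
DEFAULT_CLIP_SECONDS = 3.0
SAFETY_BUFFER_SECONDS = 1.5


def fallback_delay(descriptor: Optional[dict]) -> float:
    """Seconds to wait before the backend forces a return to IDLE."""
    total_frames = 0
    fps = None
    if descriptor:
        fps = descriptor.get("fps")
        for name in ("intro", "loop", "outro"):
            section = descriptor.get(name)
            if not isinstance(section, dict):
                continue
            start = section.get("start_frame")
            end = section.get("end_frame")
            if isinstance(start, int) and isinstance(end, int) and end >= start:
                total_frames += end - start + 1  # inclusive range (assumption)
    if not isinstance(fps, (int, float)) or fps <= 0:
        fps = DEFAULT_FPS
    if total_frames == 0:
        # No usable frame information: conservative default plus buffer.
        return DEFAULT_CLIP_SECONDS + SAFETY_BUFFER_SECONDS
    return total_frames / fps + SAFETY_BUFFER_SECONDS
```

For a descriptor with sections 0–15, 16–60 and 61–90 at 30 fps, this counts 91 frames, so the delay is 91 / 30 seconds plus the 1.5 second buffer.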
IDLE Rotation Loop
When IDLE has multiple FBX variants, the _rotation_loop() background task
switches to the next variant every 30–60 seconds (random interval). IDLE defaults
to sequential mode, cycling through its variants in alphabetical order;
other states use random selection by default.
# Register a state as sequential (cycles in order, no repeat)
handler.register_state_animations(
    "idle",
    {"loop": ["Idle.fbx", "Idle2.fbx", "Happy Idle.fbx"]},
    sequential=True,
)
The rotation task is cancelled automatically when a higher-priority context starts and restarted when returning to IDLE.
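The sequential rotation with a random hold time can be modelled as a generator. `idle_rotation` is a hypothetical helper, not the actual `_rotation_loop()`; it only produces the schedule and leaves sleeping and cancellation to the caller.

```python
import itertools
import random
from typing import Iterator, Optional


def idle_rotation(variants: list[str],
                  rng: Optional[random.Random] = None,
                  min_s: float = 30.0,
                  max_s: float = 60.0) -> Iterator[tuple[str, float]]:
    """Yield (file_name, seconds_to_hold) pairs forever, in alphabetical order."""
    if not variants:
        raise ValueError("at least one IDLE variant is required")
    if rng is None:
        rng = random.Random()
    for name in itertools.cycle(sorted(variants)):
        # Each variant is held for a random interval within the bounds.
        yield name, rng.uniform(min_s, max_s)
```

Because this is a generator, the empty-list check runs when the first item is requested, not when `idle_rotation()` is called.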
Smart Eye Behaviour
The frontend automatically suspends blink and saccade loops when the avatar’s
eyes_closed blend-shape exceeds 0.5 (e.g. during a blinking animation).
They resume once the value drops below the threshold.
Additionally, the eyes_closed value is clamped to 0.85 to prevent
visual artefacts (eyelash/cheek clipping).
The backend emits "reset_eyes": true in every targeted (non-broadcast)
vrm_animation command so that the client can smoothly reset eye state when
a new animation starts.
Known Issues Fixed
Stale window.animationHandler (idle-only animation)
Symptom: Only the Idle animation played; Think/Write state changes were logged by
chat-window.mjs (vrm_animation received: think) but [KaradaStateServer] startAction
never appeared in the console.
Root cause: window.animationHandler was set at module-load time (when
vrm-viewer.mjs was parsed), at which point the closure variable animationHandler
was still null. The real AnimationHandler instance was created inside
loadDefaultAnimations() (called after a VRM file is loaded) and assigned only to
the module-scoped closure variable, never back to window.animationHandler.
Every subsequent call to VRMAnimations.play() hit the guard
if (!window.animationHandler) return and silently exited.
Fix (vrm-viewer.mjs, 2026-03-03):
loadDefaultAnimations() now updates window.animationHandler immediately after creating the real instance:
animationHandler = new AnimationHandler(currentMixer, vrm);
window.animationHandler = animationHandler; // ← added
VRMAnimations.play / preload / setFaceValues use the closure variable first, falling back to the global only as a safety net:
const handler = animationHandler || window.animationHandler;
if (!handler) return;
handler.startAction(state, animation, playOnce, playSection, descriptor);
Debugging
Enable debug logging to see animation state changes:
export LOGGING_LEVEL=debug
Animation state server logs appear with the prefix [KaradaStateServer].
HTTP fallback
If a WebSocket client cannot receive the state via push, it can poll:
GET /api/animation_state
The frontend (vrm-viewer.mjs) uses this as a safety net on reconnect.
Limitations
Animations are only visible in WebUI and Karada API clients
Multiple concurrent animations on the same session may conflict (use context IDs properly)
Animation files must be Mixamo-compatible FBX format
File names in the mapping must match exactly (case-sensitive)
Transport Layer
KaradaStateServer is decoupled from any specific I/O mechanism through an abstract
transport layer (core/karada_transport.py). Each transport implements
KaradaTransport and is registered at runtime via add_transport().
Built-in transports:
WebSocketTransport (core/karada_ws_transport.py): wraps the WebUI’s connections dict and delegates to WebSocket.send_json().
KaradaApiTransport (core/karada_api.py): serves clients that connect through the public REST + WebSocket API (/api/karada/ws).
Custom transports (e.g. a native desktop client or XR headset) only need to
subclass KaradaTransport and call handler.add_transport(transport).
from core.karada_transport import KaradaTransport

class MyTransport(KaradaTransport):
    async def broadcast_animation(self, payload: dict) -> None: ...
    async def broadcast_audio(self, payload: dict) -> None: ...
    async def broadcast_face(self, payload: dict) -> None: ...
    async def broadcast_model(self, payload: dict) -> None: ...
    async def broadcast_expression(self, payload: dict) -> None: ...
    async def send_to_session(self, session_id: str, payload: dict) -> None: ...
    async def preload_asset(self, session_id: str | None, payload: dict) -> None: ...
    def get_connected_sessions(self) -> list[str]: ...

handler.add_transport(MyTransport())
Priority & Preemption
Every animation state has a numeric priority (higher = more important). When
play_animation() is called, the server compares the new request’s priority
against the currently active priority. Lower-priority requests are silently
rejected.
Default priorities (defined in ANIMATION_STATE_PRIORITIES):
| State | Priority |
|---|---|
| IDLE | 0 |
| WRITE | 3 |
| TALK | 5 |
| THINK | 10 |
Plugins can register custom priorities:
handler.register_state_priority("touch", 7)
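The preemption rule can be sketched as a small gate object. `PriorityGate` is a hypothetical class; the default values come from the table above, while accepting requests of equal priority and giving unknown states priority 0 are assumptions, since the source only says that lower-priority requests are rejected.

```python
from typing import Optional

DEFAULT_PRIORITIES = {"idle": 0, "write": 3, "talk": 5, "think": 10}


class PriorityGate:
    """Hypothetical sketch of priority-based request filtering."""

    def __init__(self, priorities: Optional[dict[str, int]] = None) -> None:
        self._priorities = dict(priorities or DEFAULT_PRIORITIES)
        self._active = "idle"

    def register_state_priority(self, state: str, priority: int) -> None:
        """Add or override the priority of a (possibly custom) state."""
        self._priorities[state] = priority

    def request(self, state: str) -> bool:
        """Accept the request unless it ranks below the active state."""
        new = self._priorities.get(state, 0)       # unknown states rank lowest
        current = self._priorities.get(self._active, 0)
        if new < current:
            return False                            # silently rejected
        self._active = state
        return True

    def release(self) -> None:
        """Return to idle, e.g. once the active context has stopped."""
        self._active = "idle"
```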
Audio State Tracking
KaradaStateServer tracks the currently playing TTS audio so that late-joining clients can resume playback from the correct offset.
# Called by the WebUI when TTS audio starts
handler.set_current_audio("/static/audio/tts/reply_42.wav", duration_s=3.2)
# Late-joining client receives this when it connects
audio = handler.get_current_audio()
# → {"url": "...", "audio_duration_s": 3.2, "offset_s": 0.7, "lipsync_data": ...}
Audio state is automatically cleared when the estimated playback duration
(plus a small buffer) has elapsed. get_full_state() includes an "audio"
key so late-joiners get everything in one shot.
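A minimal model of this bookkeeping records the start time and derives the offset on demand. `AudioTracker` is a hypothetical class, the 0.5 second buffer is an assumed value (the source only says "a small buffer"), and clearing lazily on read stands in for whatever timer the real server uses.

```python
import time
from typing import Callable, Optional


class AudioTracker:
    """Hypothetical sketch of late-join audio offset tracking."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 buffer_s: float = 0.5) -> None:
        self._clock = clock
        self._buffer_s = buffer_s
        self._audio: Optional[tuple[str, float, float]] = None

    def set_current_audio(self, url: str, duration_s: float) -> None:
        """Remember what started playing and when."""
        self._audio = (url, duration_s, self._clock())

    def get_current_audio(self) -> Optional[dict]:
        """Return playback info with the elapsed offset, or None if finished."""
        if self._audio is None:
            return None
        url, duration_s, started_at = self._audio
        offset_s = self._clock() - started_at
        if offset_s > duration_s + self._buffer_s:
            self._audio = None  # estimated playback has ended
            return None
        return {
            "url": url,
            "audio_duration_s": duration_s,
            "offset_s": offset_s,
            "lipsync_data": None,  # not modelled in this sketch
        }
```

Injecting the clock keeps the sketch easy to test and avoids depending on wall-clock time.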
Watchdog
A background watchdog task (10 s interval) detects stuck animation states — for
example if THINK remains active after all _active_tasks have been cleared due
to a race condition or bug. When a stuck state is detected, the watchdog forces
a return to IDLE.
The watchdog starts automatically when the first transport is registered. No manual configuration is needed.
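The watchdog idea can be sketched with a pure check plus a periodic asyncio loop. All names here (`is_stuck`, `watchdog`, and the callables passed in) are hypothetical, and the stuck condition shown, a non-idle state with no active tasks, is a simplified reading of the description above.

```python
import asyncio
from typing import Callable, Collection, Optional


def is_stuck(state: str, active_tasks: Collection) -> bool:
    """A non-idle state with no active tasks is considered stuck."""
    return state != "idle" and not active_tasks


async def watchdog(get_state: Callable[[], str],
                   get_active_tasks: Callable[[], Collection],
                   force_idle: Callable[[], None],
                   interval_s: float = 10.0,
                   max_checks: Optional[int] = None) -> None:
    """Periodically check for a stuck state and force a return to idle.

    max_checks exists only so the sketch can terminate; a real watchdog
    would run until cancelled.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        await asyncio.sleep(interval_s)
        if is_stuck(get_state(), get_active_tasks()):
            force_idle()
        checks += 1
```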
Karada REST & WebSocket API
A public API router is mounted at /api/karada/ and exposes the full body
state to external clients (native apps, XR headsets, monitoring dashboards).
State endpoints (GET):
/api/karada/state: full state (model + animation + face + audio)
/api/karada/state/animation: current animation only
/api/karada/state/model: current VRM model
/api/karada/state/face: current face blend-shapes
/api/karada/state/audio: current audio playback (if any)
Action endpoints (POST):
/api/karada/action: request a state change ({"state": "think"})
Discovery endpoints (GET):
/api/karada/animations/{state}: list available animations for a state
/api/karada/animations/{state}/{file}/descriptor: get descriptor for a file
/api/karada/skins: list available skins
Asset distribution (GET / POST):
/api/karada/assets/manifest: SHA-256 asset manifest for cache validation
/api/karada/assets/missing: given a list of owned hashes, returns missing assets
WebSocket (WS):
/api/karada/ws: real-time push stream (same protocol as the WebUI WS)
Future Enhancements
Potential improvements to the animation system:
Dynamic animation blending based on response content
Configurable animation mappings via config system
Integration with TTS for lip-sync animations
Binary (non-URL) file transfer for embedded VRM assets
See Also
VRM Avatar Animations - VRM animation file documentation
Component Development Pattern - Two-Phase Initialization - Component development patterns
Interfaces - Interface development guide