Key architectural details
- Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization (see the sketch after this list).
- 119B total parameters, with 6B active per token (8B including embedding and output layers).
- 256k context window, supporting long-form interactions and document analysis.
- Configurable reasoning effort: toggle between fast, low-latency responses and deep, reasoning-intensive outputs.
- Native multimodality: accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
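
For context on what "128 experts, 4 active per token" means mechanically: a router scores every expert for each token, and only the top-4 experts' feed-forward weights actually run. Here is a minimal PyTorch sketch of that generic top-k routing pattern; the expert count and top-k come from the article, but the class name, hidden sizes, and router design are made-up placeholders, not the model's real internals:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k MoE feed-forward block: route each token to k of n experts."""
    def __init__(self, d_model=1024, d_ff=4096, n_experts=128, k=4):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # one score per expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.router(x)                     # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep the k best-scoring experts per token
        weights = F.softmax(weights, dim=-1)        # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e            # tokens that picked expert e in this slot
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.randn(8, 1024)       # 8 tokens
print(TopKMoE()(x).shape)      # torch.Size([8, 1024]); only 4 of 128 experts ran per token
```

This is also where the 119B-total vs. 6B-active split comes from: the weights of all 128 experts have to sit in memory, but each token only pays the compute cost for the 4 it is routed to.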

At this point, these small models should add explicit minimum hardware requirements just so they can stand out: an STM32 with xx GB of PSRAM; an Android phone with this much RAM, this many TOPS, and a minimum OS version; ESP32-S3 or S4? That sort of thing.
If you just say ‘small,’ you get lost in the noise.
tbh that’s the main thing I took away from this: since when did small equal 119B?!
Does that mean they’ve got larger models lined up approaching 1T parameters?
We are being gaslit. From the article:
No big. Your typical homelab setup. 🙄
Also: https://github.com/jahrulnr/esp32-picoTTS