Run Gemma on Reachy Mini, an open source robot

Ian Ballantyne, Developer Relations Engineer at Google DeepMind, shows how Gemma runs on hardware like Raspberry Pi and Jetson Nano, letting a model see, hear, and act the way a robot would. The demo uses Reachy Mini, the open source robot from Hugging Face and Pollen Robotics. It sees with cameras, reacts with emotion and head movement, and holds a conversation through its microphone and speaker.

What’s covered:

A live conversation with Reachy Mini about why local on-device models matter for privacy and speed
The robot’s ability to move its head, show emotion, and take pictures to understand its surroundings
Controlling smart devices and APIs like lights, thermostats, calendars, and live data
An early look at the robot reasoning about a chessboard and explaining how a knight moves

Explore the Reachy Mini project from Hugging Face and Pollen Robotics, and try running Gemma on your own hardware.

What would you build with Gemma on a robot or IoT device? Drop it in the comments.

Resources: Reachy Mini → https://goo.gle/4xJNpVJ

Local Reachy Mini → https://goo.gle/4f1dOXC

Gemma Docs → https://goo.gle/4xMnVXS

Gemma Cookbook → https://goo.gle/4epYEuZ

Yes, Gemma can be used to manage smart home systems and devices.

The video explains that running a local model like Gemma on hardware such as a Reachy Mini robot or other IoT devices lets the AI interact with smart devices and APIs.

Potential applications include:

Home Automation: Controlling lights and thermostats (1:43 – 1:50).
Data Management: Fetching live data or managing calendars (2:01 – 2:06).
Privacy and Speed: Because the model runs locally on your device, it offers improved privacy and faster reaction times since data doesn’t need to travel over the internet (1:04 – 1:12).
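
As a rough illustration of how a locally running model could drive the smart devices listed above, the sketch below routes a structured tool call to a handler function. Everything in it is hypothetical: the tool names `set_light` and `set_thermostat`, the JSON shape `{"name": ..., "arguments": {...}}`, the temperature range, and the in-memory `home_state` dictionary are stand-ins for illustration, not an API shown in the video.

```python
import json

# Hypothetical in-memory stand-in for real smart-home devices.
home_state = {"light": "off", "thermostat_c": 20.0}


def set_light(state: str) -> str:
    """Switch the (simulated) light on or off."""
    if state not in ("on", "off"):
        raise ValueError(f"unsupported light state: {state!r}")
    home_state["light"] = state
    return f"light is now {state}"


def set_thermostat(celsius: float) -> str:
    """Set the (simulated) thermostat; the 10-30 C range is an assumption."""
    if not 10.0 <= celsius <= 30.0:
        raise ValueError(f"temperature out of range: {celsius}")
    home_state["thermostat_c"] = float(celsius)
    return f"thermostat set to {float(celsius):.1f} C"


# Registry mapping tool names the model may emit to local handlers.
TOOLS = {"set_light": set_light, "set_thermostat": set_thermostat}


def handle_tool_call(raw: str) -> str:
    """Parse a JSON tool call and run the matching handler."""
    call = json.loads(raw)
    handler = TOOLS.get(call.get("name"))
    if handler is None:
        return f"unknown tool: {call.get('name')!r}"
    try:
        return handler(**call.get("arguments", {}))
    except (TypeError, ValueError) as exc:
        # Bad or missing arguments are reported back instead of crashing.
        return f"tool error: {exc}"
```

Returning error strings rather than raising lets the result be fed back to the model as text, so it can correct a malformed call on the next turn.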

Running Google’s Gemma locally on Reachy Mini, the open-source desktop robot developed by Hugging Face and Pollen Robotics, lets the robot see, hear, speak, and act completely offline. The integration uses a local, cascaded speech-to-speech pipeline to give the robot low-latency voice interaction, vision reasoning, and expressive motor responses without relying on the cloud. [1, 2, 3, 4]

Choose Your Setup Architecture

You can deploy Gemma on Reachy Mini through three primary architectural approaches, depending on your hardware availability and use case: [5]

Fully On-Device (Single-Board Computer): Run highly quantized variants directly on edge boards like the NVIDIA Jetson Orin Nano (8GB) or a Raspberry Pi 5. [2, 6, 7]
Local Companion PC (Recommended): Run the heavy LLM backend on an external Mac or PC (e.g., M3 Pro MacBook or an AI workstation), pointing the robot’s UI dashboard to your local host. [1, 3]
In-Browser WebAI: Use a browser, a USB-C tether, and Transformers.js to execute Gemma 4 entirely offline via WebGPU and WebSerial. [8]

Step-by-Step Implementation Guide

To implement the fully offline, zero-internet speech-to-speech stack recommended by Hugging Face, follow these configuration steps: [1, 9]

1. Deploy the Local Speech Backend [10]

Set up a high-speed inference server on your companion hardware or advanced edge board using the Hugging Face speech-to-speech library. This relies on a cascaded architecture: [1]

LLM Engine: Deploy llama.cpp hosting the latest instruction-tuned Gemma variants.
Voice Activity Detection (VAD): Use Silero VAD for background noise isolation and instant voice detection.
Speech-to-Text (STT): Use Parakeet-TDT 0.6B v3 to transcribe the robot’s microphone audio streams.
Text-to-Speech (TTS): Integrate Qwen3-TTS or GPU-accelerated Kokoro TTS for low-latency verbal output. [1, 6, 11]
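
The cascade described above can be pictured as four stages chained in order: VAD, then STT, then the LLM, then TTS. The sketch below shows only that control flow, with each stage injected as a plain function. It does not use the Hugging Face speech-to-speech library or any of the named models; the `Cascade` class and the `stub_*` functions are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Cascade:
    """Chains the four stages; each is injected so real models can be swapped in."""

    is_speech: Callable[[bytes], bool]   # VAD: does this chunk contain speech?
    transcribe: Callable[[bytes], str]   # STT: audio -> text
    generate: Callable[[str], str]       # LLM: prompt -> reply text
    synthesize: Callable[[str], bytes]   # TTS: reply text -> audio

    def run_turn(self, audio: bytes) -> Optional[bytes]:
        """Process one chunk of microphone audio; return reply audio or None."""
        if not self.is_speech(audio):
            return None  # skip the expensive stages when nobody is talking
        text = self.transcribe(audio).strip()
        if not text:
            return None  # nothing intelligible was heard
        return self.synthesize(self.generate(text))


# Hypothetical stubs standing in for Silero VAD, Parakeet, Gemma, and a TTS model.
def stub_vad(audio: bytes) -> bool:
    return any(audio)  # treat an all-zero buffer as silence


def stub_stt(audio: bytes) -> str:
    return "hello robot"


def stub_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = Cascade(stub_vad, stub_stt, stub_llm, stub_tts)
```

Gating on the VAD first is the main design point: the heavier STT, LLM, and TTS stages only run when speech is actually detected.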

2. Connect Reachy Mini to the Pipeline

Once the backend environment is initialized, it exposes a Realtime API-compatible WebSocket (/v1/realtime). [1]

Open the Reachy Mini Desktop App or web interface.
Navigate to the connection configuration panel.
Map the robot’s media streams directly to your local WebSocket URL (e.g., ws://localhost:8000/v1/realtime). [1, 3, 6, 12]
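
For readers scripting against the endpoint instead of using the app, the sketch below builds the WebSocket URL from the example above and wraps a chunk of raw audio in a JSON event. The event type `input_audio_buffer.append` with a base64 `audio` field comes from the OpenAI Realtime API; that the local backend accepts this exact event is an assumption based on the "Realtime API-compatible" description, and the helper names are hypothetical.

```python
import base64
import json


def realtime_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the local WebSocket URL; host and port defaults mirror the example above."""
    return f"ws://{host}:{port}/v1/realtime"


def audio_append_event(pcm_chunk: bytes) -> str:
    """Wrap raw audio bytes in a Realtime-API-style JSON event (assumed accepted)."""
    return json.dumps(
        {
            "type": "input_audio_buffer.append",
            # JSON cannot carry raw bytes, so the audio is base64-encoded.
            "audio": base64.b64encode(pcm_chunk).decode("ascii"),
        }
    )
```

A WebSocket client of your choice would open `realtime_url()` and send the string returned by `audio_append_event(chunk)` for each captured chunk.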

3. Establish the Media and Control Stack

The system coordinates audio streams, computer vision frames, and motor movements simultaneously:

Audio & Vision Streams: Reachy Mini streams audio from its built-in dual microphones and video from its 160° wide-angle camera locally. [4, 13]
Tool Calling: When Gemma decides to execute an action (e.g., taking a picture, tracking a hand, or moving a chess piece), it outputs structured commands. [2, 8, 14]
Actuation: These commands trigger Python SDK behaviors over WebSerial or Wi-Fi, converting the model’s logic into animated head turns, expressions, or other physical actions. [4, 8, 15]
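
Between tool calling and actuation, it is usually worth validating what the model emitted before anything moves. The sketch below parses a structured head-movement command and clamps its angles to a safe range. The JSON fields (`action`, `yaw_deg`, `pitch_deg`), the `HeadCommand` type, and the angle limits are all assumptions for illustration; they are not taken from the Reachy Mini SDK, and the real call that moves the head is deliberately left out.

```python
import json
from dataclasses import dataclass

# Assumed limits in degrees, for illustration only; not Reachy Mini's real range.
YAW_LIMIT = 60.0
PITCH_LIMIT = 30.0


@dataclass(frozen=True)
class HeadCommand:
    """Validated head pose, ready to hand to whatever SDK call moves the robot."""

    yaw_deg: float
    pitch_deg: float


def clamp(value: float, limit: float) -> float:
    """Restrict value to the interval [-limit, +limit]."""
    return max(-limit, min(limit, value))


def parse_head_command(raw: str) -> HeadCommand:
    """Turn the model's JSON output into a clamped HeadCommand."""
    data = json.loads(raw)
    if data.get("action") != "move_head":
        raise ValueError(f"unsupported action: {data.get('action')!r}")
    return HeadCommand(
        yaw_deg=clamp(float(data.get("yaw_deg", 0.0)), YAW_LIMIT),
        pitch_deg=clamp(float(data.get("pitch_deg", 0.0)), PITCH_LIMIT),
    )
```

Clamping rather than rejecting out-of-range angles keeps the robot responsive when the model overshoots, while unknown actions still fail loudly.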

What You Can Do With Local Gemma

Offline Chess Play: Give the robot visual awareness of a physical chessboard; it can use its cameras to inspect piece positions and explain rules or moves. [2, 4, 16]
Low-Latency Conversations: Chat seamlessly without waiting for data to travel over the internet, giving the robot a responsive, fluid personality. [2, 17]
Total Privacy Isolation: Because no video or audio packets leave the local environment, the robot operates securely in private homes, classrooms, or confidential research spaces. [2]