Run Gemma on the edge with the Coral Board

By | June 16, 2026
What’s covered: Live translation (speech in, translated speech out) running entirely on the board; natural language controlling physical hardware; and vision and sound working together in a lightweight version of the pre-I/O show that generates music from an aquarium of jellyfish. All of it runs on a single board.
The Coral Board is available this summer. Every demo in the video is open source and on GitHub to get you started. Follow the links below to learn more.
What will you build on the edge with Gemma? Drop it in the comments.
Speaker: Ian Ballantyne
Products Mentioned: Google AI, Gemini

Running Gemma on-device with the Coral Board allows for high-performance, private, and offline AI inference. Everything happens directly on the board, eliminating the need for cloud connectivity for these tasks (0:14-0:17).

How Gemma runs on-device:

  • Hardware Acceleration: The Coral Board uses Google’s Coral NPU (Neural Processing Unit), an ultra-low-power, RISC-V based accelerator designed to handle machine learning workloads efficiently at the edge (0:04-0:06).
  • Software Integration: Developers use the MediaPipe LLM Inference API, which provides specialized wrappers to manage on-device memory and handle the model’s operations directly on the hardware accelerator.
  • Quantization: Because edge devices have limited memory and compute, Gemma models are typically converted to 4-bit or 8-bit quantized formats. This significantly reduces the model’s memory footprint and improves inference speed, usually at only a small cost in output quality, which makes the models suitable for hardware like the Coral Board.
  • Workflow: The process generally involves taking Gemma model weights, quantizing them for compatibility with LiteRT (formerly TensorFlow Lite), and deploying them via the MediaPipe framework to execute on the NPU.
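
The quantization step above can be illustrated with a minimal sketch. It assumes simple symmetric per-tensor quantization written in plain Python; the source does not say which scheme the Gemma conversion tools use, so the function names and the sample weights here are hypothetical and stand in for whatever the real toolchain does.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


def max_error(original: List[float], restored: List[float]) -> float:
    """Largest absolute difference between original and restored weights."""
    return max(abs(o - r) for o, r in zip(original, restored))


# Illustrative weights only; real tensors hold millions of values.
weights = [0.50, -1.00, 0.25, 0.75]

q8, s8 = quantize_symmetric(weights, bits=8)
q4, s4 = quantize_symmetric(weights, bits=4)

err8 = max_error(weights, dequantize(q8, s8))
err4 = max_error(weights, dequantize(q4, s4))

print(f"8-bit max round-trip error: {err8:.4f}")
print(f"4-bit max round-trip error: {err4:.4f}")
```

Fewer bits mean a coarser grid of representable values, so the 4-bit round trip loses more precision than the 8-bit one. That is the trade-off behind choosing a quantization level for a given board.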

This setup enables developers to build sophisticated edge applications, such as live speech translation, natural language hardware control, and real-time creative audio-visual generation, all running locally (0:30-0:55).
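
The source does not describe how the demos connect model output to hardware. One common pattern for natural language hardware control is to have the model emit a structured command and to validate it before anything touches a device. The sketch below assumes that pattern; the JSON shape, the handler names (`set_led`, `set_fan`), and the `device_state` dictionary are all hypothetical stand-ins, not part of any Coral or Gemma API.

```python
import json
from typing import Callable, Dict

# Hypothetical stand-in for real hardware drivers: handlers only record state.
device_state: Dict[str, object] = {"led": "off", "fan_speed": 0}


def set_led(state: str) -> str:
    if state not in ("on", "off"):
        raise ValueError("led state must be 'on' or 'off'")
    device_state["led"] = state
    return f"led {state}"


def set_fan(speed: int) -> str:
    if not 0 <= speed <= 100:
        raise ValueError("fan speed must be between 0 and 100")
    device_state["fan_speed"] = speed
    return f"fan {speed}%"


# Allow-list of actions the model is permitted to trigger.
HANDLERS: Dict[str, Callable[..., str]] = {"set_led": set_led, "set_fan": set_fan}


def dispatch(model_output: str) -> str:
    """Parse a JSON command emitted by the model and run the matching handler."""
    try:
        command = json.loads(model_output)
    except json.JSONDecodeError:
        return "rejected: not valid JSON"
    if not isinstance(command, dict):
        return "rejected: command must be a JSON object"
    action = command.get("action")
    args = command.get("args", {})
    if not isinstance(action, str) or action not in HANDLERS:
        return "rejected: unknown action"
    if not isinstance(args, dict):
        return "rejected: args must be a JSON object"
    try:
        return HANDLERS[action](**args)
    except (TypeError, ValueError) as exc:
        return f"rejected: {exc}"


# Example: text a model might produce for "turn the light on".
print(dispatch('{"action": "set_led", "args": {"state": "on"}}'))
print(dispatch('{"action": "set_fan", "args": {"speed": 250}}'))
```

Keeping an explicit allow-list and validating arguments means that malformed or unexpected model output is rejected instead of reaching the hardware.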

Running Gemma on the edge is practical, especially with Google’s dedicated hardware and software stacks designed for low-power, local AI inference. The Gemma model family includes small variants intended for edge and ultra-mobile deployment. [1, 2]


Hardware Options


To run Gemma locally on specialized edge hardware, you have a few primary routes depending on the exact board version you are using:
  • The Coral Dev Board (Gemma-optimized): Google offers a dedicated Coral development board engineered to run lightweight Gemma models (such as Gemma 3 270M). It acts as a full-stack, open hardware platform capable of on-device AI tasks like real-time voice translation and natural language hardware control.
  • Classic Coral Dev Board (Edge TPU): If you are using older Coral Dev Boards (which have ~1 to 4 TOPS ML accelerators), running a full large language model can be resource-heavy. However, these boards excel at highly optimized encoder-based, vision, and audio tasks at the edge. [4, 5]
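
To see why model size and precision matter when choosing a board, here is a rough back-of-the-envelope estimate of weight storage. It assumes a nominal 270 million parameters (taken from the "270M" model name above) and counts weights only; activations, the KV cache, and per-tensor quantization metadata are ignored, so real memory use will be higher. The helper name is hypothetical.

```python
def weight_footprint_mb(num_params: int, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1_000_000


# Nominal parameter count implied by the "270M" model name (an assumption).
PARAMS_270M = 270_000_000

for bits in (16, 8, 4):
    size_mb = weight_footprint_mb(PARAMS_270M, bits)
    print(f"{bits:>2}-bit weights: {size_mb:.0f} MB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts storage to a quarter, which is often the difference between fitting in an edge device's memory and not fitting at all.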

Deployment Tools for Edge & Mobile


To execute Gemma or its optimized variants (including the Gemma 4 small sizes), Google provides specific cross-platform runtimes:
  • LiteRT (formerly TFLite) for LLMs: The fully open-source framework specifically built to run LLMs directly on user devices, offering fine-grained control and direct NPU/GPU acceleration.
  • MediaPipe: Provides the LLM Inference API, a higher-level way to integrate Gemma into cross-platform edge applications. [6]

