Aether — Local Smart Speaker


Open-source distributed smart speaker system in Rust — all AI processing runs on your own hardware. Multi-node mTLS gRPC, zero-config mDNS discovery, Whisper STT, Ollama LLM, Qdrant RAG. No cloud APIs, no telemetry.


STATUS: active

LANGUAGE: Rust

METRIC: 100% · local inference

STACK: Rust · gRPC · mTLS · Whisper STT · Ollama · Qdrant · Docker · ARM SBC · mDNS

REPO: miki-przygoda/Aether

COMMITS: 164

STARS: 0

REPOSITORY

Latest commit by miki-przygoda: Merge pull request #8 from miki-przygoda/app-improvements (2d9dbe7, 4 days ago)

·	.dockerignore
·	.gitattributes
▾	.github
·	.gitignore
·	CLAUDE.md
·	Cargo.lock
·	Cargo.toml
·	Cross.toml
·	Dockerfile
·	LICENSE
·	README.md
·	compose.gpu.yml
·	compose.yml
▾	crates
▾	docs
▾	finetuning
▾	models
▾	proto
▾	scripts
▾	todos


Aether

Local-first, privacy-centric smart assistant — built in Rust.

Aether is an open-source smart speaker system that keeps every piece of AI processing on your own hardware. No cloud APIs, no telemetry, no accounts, no data leaving your network. Wake word detection, speech-to-text, language model inference, and text-to-speech all run locally — on devices you own and control.

How It Works

Aether splits the workload between two roles:

Edge node: a low-power ARM board (e.g. Raspberry Pi) that sits in the room. It listens for the wake word locally and streams audio to the brain only after activation. No audio leaves the device until the wake word is detected.
Brain node: a more powerful machine running Docker that handles the heavy lifting: Whisper for transcription, Ollama for the language model, and Kokoro for speech synthesis.
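
The split between the two roles can be pictured as a small state machine on the edge node. The sketch below is illustrative only: the names EdgeState, EdgeEvent, step, and should_stream are hypothetical and are not taken from the Aether codebase. It shows the one property the text describes, namely that audio is forwarded to the brain only in the state entered after the wake word.

```rust
/// Hypothetical states of an edge node; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeState {
    /// Listening locally for the wake word; no audio leaves the device.
    Listening,
    /// Wake word heard; microphone audio is streamed to the brain.
    Streaming,
    /// The brain has replied; the response audio plays on the speaker.
    Playing,
}

/// Hypothetical events that move the edge node between states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeEvent {
    WakeWordDetected,
    ResponseReceived,
    PlaybackFinished,
}

/// Pure transition function: events that do not apply to the current
/// state are ignored and the state is returned unchanged.
pub fn step(state: EdgeState, event: EdgeEvent) -> EdgeState {
    use EdgeEvent::*;
    use EdgeState::*;
    match (state, event) {
        (Listening, WakeWordDetected) => Streaming,
        (Streaming, ResponseReceived) => Playing,
        (Playing, PlaybackFinished) => Listening,
        (other, _) => other,
    }
}

/// Audio frames may be sent to the brain only while streaming.
pub fn should_stream(state: EdgeState) -> bool {
    state == EdgeState::Streaming
}
```

Keeping the transition function pure makes the privacy property easy to check in a unit test: starting from Listening, no event other than WakeWordDetected can make should_stream return true.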

Edge nodes discover the brain automatically over your local network via mDNS. All traffic between them is encrypted with mutual TLS using a self-hosted certificate authority created during a one-time pairing ceremony.
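
To make the discovery step concrete, here is a minimal sketch of the packet an mDNS client sends when browsing for a service: a standard DNS query with one PTR question, sent to the multicast group 224.0.0.251 on UDP port 5353. The service name "_aether._tcp.local" and the function names are assumptions for illustration; the name Aether actually advertises is not stated here, and the sketch covers only discovery, not the mTLS handshake that follows.

```rust
use std::net::Ipv4Addr;

/// Well-known IPv4 multicast group and UDP port used by mDNS.
pub const MDNS_GROUP: Ipv4Addr = Ipv4Addr::new(224, 0, 0, 251);
pub const MDNS_PORT: u16 = 5353;

/// Encode a dotted name as DNS labels: each label is prefixed by its
/// length and the name ends with a zero byte. Returns None if a label
/// is empty or longer than the 63-byte DNS limit.
pub fn encode_name(name: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for label in name.trim_end_matches('.').split('.') {
        if label.is_empty() || label.len() > 63 {
            return None;
        }
        out.push(label.len() as u8);
        out.extend_from_slice(label.as_bytes());
    }
    out.push(0);
    Some(out)
}

/// Build a DNS query with a single PTR question for `service`.
/// The service name passed in is a hypothetical example.
pub fn build_ptr_query(service: &str) -> Option<Vec<u8>> {
    let mut packet = Vec::new();
    packet.extend_from_slice(&[0, 0]); // ID: zero for multicast queries
    packet.extend_from_slice(&[0, 0]); // flags: standard query
    packet.extend_from_slice(&1u16.to_be_bytes()); // QDCOUNT = 1
    packet.extend_from_slice(&[0; 6]); // ANCOUNT, NSCOUNT, ARCOUNT = 0
    packet.extend_from_slice(&encode_name(service)?);
    packet.extend_from_slice(&12u16.to_be_bytes()); // QTYPE = PTR
    packet.extend_from_slice(&1u16.to_be_bytes()); // QCLASS = IN
    Some(packet)
}
```

A client would send the resulting bytes from a UDP socket to MDNS_GROUP on MDNS_PORT and parse the responses for the brain's address. In practice a dedicated mDNS library would normally handle this.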

Edge Node ────────────────────────────────────────────────────────────┐
  (ARM SBC)                                                             │
                                                                        │
    Microphone → wake word detection (on-device, always private)        │
                                                                        │
    Wake word detected → mTLS gRPC stream ───────────────────────────►  │
                                                                        │  Brain Node
                         WAV response ◄───────────────────────────────  │  (Docker Compose)
                                                                        │
    Speaker ← playback                                                  │   Whisper  ·  Ollama  ·  Kokoro
                                                                        │   Qdrant (RAG + history)
  Browser ──── http://<brain>:8080/ui/ ──────────────────────────────►  │   Web UI
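
The diagram shows the brain returning a WAV response to the edge node. As a rough illustration of that container, the sketch below wraps 16-bit PCM samples in the canonical 44-byte WAV header. The function name and the choice of 16-bit PCM are assumptions; the sample rate and channel layout Aether actually uses are not stated here.

```rust
/// Wrap 16-bit PCM samples in a minimal RIFF/WAVE container.
/// Hypothetical helper; sample format and layout are assumptions.
pub fn wav_from_pcm16(samples: &[i16], sample_rate: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align = channels * (bits_per_sample / 8); // bytes per frame
    let byte_rate = sample_rate * u32::from(block_align);
    let data_len = (samples.len() * 2) as u32; // two bytes per sample

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    // RIFF chunk descriptor
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // "fmt " sub-chunk: 16 bytes describing uncompressed PCM
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    // "data" sub-chunk: little-endian samples
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

Because the header carries the sample rate and channel count, the edge node could hand such a buffer to its playback path without any out-of-band format negotiation.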

The brain serves a self-hosted web UI on port 8080. Use it from any browser on your local network to pair nodes, train your wake word, manage documents, and configure every setting.

Getting Started

What You Need

A machine to run the brain (any x86-64 or ARM64 machine with Docker and at least 8 GB RAM)
A Raspberry Pi or similar ARM single-board computer for the edge node
A USB microphone and speaker for the edge node

1. Launch the Brain

Clone the repository and download the AI model weights, then start the brain with Docker Compose.

git clone https://github.com/miki-przygoda/Aether
cd Aether

# Download Whisper and Kokoro model weights (~2 GB)
./scripts/download-models.sh

# Start the full brain stack
docker compose up

If your machine has an NVIDIA GPU and the nvidia-container-toolkit installed, enable GPU acceleration:

docker compose -f compose.yml -f compose.gpu.yml up

Once running, open the web UI in a browser on the same network:

http://<brain-machine-ip>:8080/ui/

A setup wizard will guide you through the remaining steps.

2. Install the Edge Node on Your Pi

ARCHITECTURE

Edge nodes (ARM SBCs) run Porcupine wake-word detection entirely on-device; no audio is streamed until the wake word is spoken.
mDNS zero-config discovery — edge nodes find the brain automatically on the local network; no IP addresses, no accounts.
mTLS gRPC — all inter-node traffic uses mutual TLS via a self-hosted certificate authority established in a one-time pairing ceremony.
Brain node (Dockerised) — Whisper STT + Ollama LLM (Llama 3.2 / Mistral Nemo) + Piper/Kokoro TTS + Qdrant vector DB for RAG memory.
Multi-node — any number of edge nodes connect independently; each runs the same pipeline against the shared brain.
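
The RAG memory mentioned above relies on vector similarity search, which Qdrant performs at scale. The in-memory sketch below shows the underlying idea only: rank stored embeddings by cosine similarity to a query embedding and keep the best k. The function names, the tiny two-dimensional embeddings in the usage note, and the choice of cosine as the metric are assumptions for illustration, not Aether's actual configuration or Qdrant's API.

```rust
/// Cosine similarity of two vectors. Returns 0.0 for mismatched
/// lengths, empty input, or a zero-length vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the `k` documents most similar to `query`, best first.
/// `docs` pairs each text with its precomputed embedding.
pub fn top_k<'a>(
    query: &[f32],
    docs: &'a [(&'a str, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = docs
        .iter()
        .map(|(text, emb)| (*text, cosine_similarity(query, emb)))
        .collect();
    // Sort descending by score; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

For example, with documents embedded as [1.0, 0.0], [0.0, 1.0], and [0.9, 0.1], a query of [1.0, 0.0] with k = 2 returns the first and third documents in that order. The retrieved texts would then be added to the prompt sent to the language model.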

STACK

Three-crate workspace: aether-core (shared types), brain-node (inference), edge-node (ARM SBC binary).
Cross-compilation for ARM SBCs via cross-rs.
GPIO hardware control via rppal — LED status indicators and physical panic button.
Docker Compose brain — CPU default, single flag for GPU acceleration.
All 4 development phases complete: audio pipe, neural engine, hardware feedback, RAG memory.