Aether — Local Smart Speaker

Open-source distributed smart speaker system in Rust — all AI processing runs on your own hardware. Multi-node mTLS gRPC, zero-config mDNS discovery, Whisper STT, Ollama LLM, Qdrant RAG. No cloud APIs, no telemetry.

STATUS: active
LANGUAGE: Rust
METRIC: 100% · local inference
STACK: Rust · gRPC · mTLS · Whisper STT · Ollama · Qdrant · Docker · ARM SBC · mDNS
COMMITS: 134
STARS: 0
miki-przygoda: Merge branch 'master' of https://github.com/miki-przygoda/Aether (9ef0ea11 months ago)
.dockerignore
.gitattributes
.github/
.gitignore
CLAUDE.md
Cargo.lock
Cargo.toml
Cross.toml
Dockerfile
LICENSE
README.md
compose.gpu.yml
compose.yml
crates/
docs/
finetuning/
models/
proto/
scripts/
todos/
Aether

Local-First, Privacy-Centric Smart Assistant — built in Rust.

Aether is an open-source distributed smart speaker system that keeps all AI processing on your own hardware. No cloud APIs. No telemetry. No data leaving your network.

Why Aether?

Commercial smart speakers trade convenience for privacy. Aether takes a different approach: split the work between low-power edge nodes (always-on listening) and a Dockerised brain node (heavy AI inference), connected over an encrypted local network. The result is Alexa-like responsiveness with full data sovereignty — and no accounts, subscriptions, or external services required.

Architecture
  Edge Node 1 ──┐
  (ARM SBC)     │                     ┌─────────────────────────────┐
                ├── mTLS gRPC ──────► │   Brain Node (Docker)       │
  Edge Node 2 ──┤   (local network)   │  ┌──────────┐ ┌─────────┐   │
  (ARM SBC)     │                     │  │brain-node│ │ ollama  │   │
                │   WAV stream ◄───── │  │  Rust    │ │  LLM    │   │
  Edge Node N ──┘                     │  └──────────┘ └─────────┘   │
                                      └─────────────────────────────┘

Edge nodes discover the brain automatically on the local network via mDNS — no accounts, no manual configuration. All traffic is encrypted with mutual TLS using a self-hosted certificate authority established during a one-time pairing ceremony.

Repository Layout
crates/
├── aether-core/   — shared types and traits (LlmResponse, NodeState, …)
├── brain-node/    — Docker-deployed inference server (STT · LLM · TTS)
└── edge-node/     — ARM SBC binary (wake word · audio capture · gRPC client)
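
As a rough illustration of what "shared types" in `aether-core` could look like, here is a minimal sketch using only the standard library. Only the names `LlmResponse` and `NodeState` come from this README; every variant, field, and method below is a hypothetical stand-in and may not match the real crate.

```rust
/// State of an edge node. The name comes from the README; the variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    Idle,
    Streaming,
    Speaking,
}

/// What the LLM asked for. The name comes from the README; the shape is assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LlmResponse {
    /// A plain spoken answer.
    Reply { text: String },
    /// An action for the edge node, with an optional spoken confirmation.
    Action { name: String, confirmation: Option<String> },
}

impl LlmResponse {
    /// Text to hand to TTS, if the response carries any.
    pub fn speech(&self) -> Option<&str> {
        match self {
            LlmResponse::Reply { text } => Some(text.as_str()),
            LlmResponse::Action { confirmation, .. } => confirmation.as_deref(),
        }
    }

    /// Name of the action the edge node should execute, if any.
    pub fn action(&self) -> Option<&str> {
        match self {
            LlmResponse::Reply { .. } => None,
            LlmResponse::Action { name, .. } => Some(name.as_str()),
        }
    }

    /// State the edge node would move to after receiving this response.
    pub fn next_state(&self) -> NodeState {
        if self.speech().is_some() {
            NodeState::Speaking
        } else {
            NodeState::Idle
        }
    }
}
```

Keeping types like these in one crate means the brain and edge binaries agree on the response shape at compile time rather than by convention.
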
Tech Stack
| Layer                 | Technology                               |
|:----------------------|:-----------------------------------------|
| **Language**          | Rust                                     |
| **Audio I/O**         | `cpal` (ALSA / PulseAudio)               |
| **Wake Word**         | Porcupine (local, on-device)             |
| **Discovery**         | `mdns-sd` (zero-config local network)    |
| **Networking**        | `tonic` (gRPC) over mTLS                 |
| **TLS**               | `rustls` + `rcgen` (self-hosted CA)      |
| **STT**               | Whisper.cpp via `whisper-rs`             |
| **LLM**               | Ollama (Llama 3.2 / Mistral Nemo)        |
| **TTS**               | Piper (fast) or Kokoro-82M (natural)     |
| **GPIO / Hardware**   | `rppal` (I2C, PWM, GPIO)                 |
| **Brain Deployment**  | Docker Compose (CPU default, GPU opt-in) |
| **Cross-compilation** | `cross-rs`                               |

How It Works

1. Idle — Edge node listens locally for the wake word using a small on-device model. No audio leaves the device.

2. Activation — Wake word detected; an mTLS gRPC stream opens to the brain node (discovered automatically via mDNS).

3. Transcription — Audio chunks are streamed to Whisper for speech-to-text.

4. Inference — The transcript is sent to Ollama. The LLM responds with structured JSON describing an action or reply.

5. Synthesis — Response text is converted to speech via TTS and streamed back.

6. Playback & Action — Edge node plays the audio and executes any GPIO actions (LEDs, buttons, etc.).
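
The six steps above can be read as a small state machine. The sketch below models them with the standard library only. The stage names mirror the list; the `Event` names, `step`, and `run` are hypothetical and are not taken from the Aether codebase.

```rust
/// The six stages named in the "How It Works" list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Idle,
    Activation,
    Transcription,
    Inference,
    Synthesis,
    Playback,
}

/// Hypothetical events that move an interaction from one stage to the next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    WakeWordDetected,
    StreamOpened,
    TranscriptReady,
    LlmResponded,
    SpeechReady,
    PlaybackFinished,
}

/// Returns the next stage, or `None` if the event is not valid in this stage.
pub fn step(stage: Stage, event: Event) -> Option<Stage> {
    use Event::*;
    use Stage::*;
    match (stage, event) {
        (Idle, WakeWordDetected) => Some(Activation),
        (Activation, StreamOpened) => Some(Transcription),
        (Transcription, TranscriptReady) => Some(Inference),
        (Inference, LlmResponded) => Some(Synthesis),
        (Synthesis, SpeechReady) => Some(Playback),
        (Playback, PlaybackFinished) => Some(Idle),
        _ => None,
    }
}

/// Feeds events in order, stopping at the first invalid transition and
/// reporting the stage and event that did not fit.
pub fn run(start: Stage, events: &[Event]) -> Result<Stage, (Stage, Event)> {
    events
        .iter()
        .try_fold(start, |stage, &event| step(stage, event).ok_or((stage, event)))
}
```

A full interaction returns the node to `Stage::Idle`. An out-of-order event is rejected rather than silently skipping a stage, which is one way to ensure no audio is streamed before the wake word.
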

Getting Started
Brain Node (any machine with Docker)
ABOUT

Local-First Privacy-Centric Smart Speaker

ACTIVITY
Stars: 0
Forks: 0
Commits: 134
License: MIT
LANGUAGES
Rust: 82.3%
HTML: 10%
CSS: 2.5%
Shell: 1.9%
Python: 1.8%
Dockerfile: 1.2%
ARCHITECTURE
  • Edge nodes (ARM SBCs) run Porcupine wake-word detection entirely on-device; nothing is streamed until the wake word is spoken.
  • mDNS zero-config discovery — edge nodes find the brain automatically on the local network; no IP addresses, no accounts.
  • mTLS gRPC — all inter-node traffic uses mutual TLS via a self-hosted certificate authority established in a one-time pairing ceremony.
  • Brain node (Dockerised) — Whisper STT + Ollama LLM (Llama 3.2 / Mistral Nemo) + Piper/Kokoro TTS + Qdrant vector DB for RAG memory.
  • Multi-node — any number of edge nodes connect independently; each runs the same pipeline against the shared brain.
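
To make the multi-node point concrete, the following sketch simulates several edge nodes talking to one shared brain inside a single process. It is only an analogy: standard-library channels and threads stand in for the mTLS gRPC streams, and `Request`, `run_brain`, and `simulate` are hypothetical names, not part of Aether.

```rust
use std::sync::mpsc;
use std::thread;

/// One request from an edge node; `reply` is that node's private return channel.
struct Request {
    node_id: u32,
    utterance: String,
    reply: mpsc::Sender<String>,
}

/// Stand-in for the brain: serves requests until every edge sender is dropped.
fn run_brain(inbox: mpsc::Receiver<Request>) {
    for req in inbox {
        let answer = format!("node {} said: {}", req.node_id, req.utterance);
        // Ignore the error if the edge node has already gone away.
        let _ = req.reply.send(answer);
    }
}

/// Spawns `n` edge threads against one brain thread and returns their
/// replies in node-id order.
fn simulate(n: u32) -> Vec<String> {
    let (to_brain, inbox) = mpsc::channel::<Request>();
    let brain = thread::spawn(move || run_brain(inbox));

    let edges: Vec<_> = (1..=n)
        .map(|node_id| {
            let to_brain = to_brain.clone();
            thread::spawn(move || {
                // Each edge node owns its own reply channel, so nodes never
                // see each other's answers.
                let (reply, from_brain) = mpsc::channel();
                to_brain
                    .send(Request { node_id, utterance: "hello".to_string(), reply })
                    .expect("brain is running");
                from_brain.recv().expect("brain replied")
            })
        })
        .collect();

    // Drop the original sender so the brain loop ends once all edges finish.
    drop(to_brain);

    let replies: Vec<String> = edges
        .into_iter()
        .map(|handle| handle.join().expect("edge thread panicked"))
        .collect();
    brain.join().expect("brain thread panicked");
    replies
}
```

Calling `simulate(3)` yields one reply per node. The property to notice is that edge nodes share nothing with each other, only the brain's inbox.
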
STACK
  • Three-crate workspace: aether-core (shared types), brain-node (inference), edge-node (ARM SBC binary).
  • Cross-compilation for ARM SBCs via cross-rs.
  • GPIO hardware control via rppal — LED status indicators and physical panic button.
  • Docker Compose brain — CPU default, single flag for GPU acceleration.
  • All 4 development phases complete: audio pipe, neural engine, hardware feedback, RAG memory.