Open-source distributed smart speaker system in Rust — all AI processing runs on your own hardware. Multi-node mTLS gRPC, zero-config mDNS discovery, Whisper STT, Ollama LLM, Qdrant RAG. No cloud APIs, no telemetry.
Local-First, Privacy-Centric Smart Assistant — built in Rust.
Aether is an open-source distributed smart speaker system that keeps all AI processing on your own hardware. No cloud APIs. No telemetry. No data leaving your network.
Commercial smart speakers trade convenience for privacy. Aether takes a different approach: split the work between low-power edge nodes (always-on listening) and a Dockerised brain node (heavy AI inference), connected over an encrypted local network. The result is Alexa-like responsiveness with full data sovereignty — and no accounts, subscriptions, or external services required.
    Edge Node 1 ──┐
    (ARM SBC)     │                     ┌─────────────────────────────┐
                  ├── mTLS gRPC ──────► │     Brain Node (Docker)     │
    Edge Node 2 ──┤   (local network)   │  ┌──────────┐  ┌─────────┐  │
    (ARM SBC)     │                     │  │brain-node│  │ ollama  │  │
                  │   WAV stream ◄───── │  │   Rust   │  │   LLM   │  │
    Edge Node N ──┘                     │  └──────────┘  └─────────┘  │
                                        └─────────────────────────────┘

Edge nodes discover the brain automatically on the local network via mDNS, with no accounts and no manual configuration. All traffic is encrypted with mutual TLS using a self-hosted certificate authority established during a one-time pairing ceremony.
    crates/
    ├── aether-core/   shared types and traits (LlmResponse, NodeState, …)
    ├── brain-node/    Docker-deployed inference server (STT · LLM · TTS)
    └── edge-node/     ARM SBC binary (wake word · audio capture · gRPC client)
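To give a feel for the kind of shared type `aether-core` might hold, here is a minimal sketch of an `LlmResponse` that distinguishes a spoken reply from an action. Only the type name comes from the layout above; the variants, field names, and the `speech` helper are assumptions made for illustration and may not match the actual crate.

```rust
/// Hypothetical shape of the shared response type. The real
/// `LlmResponse` in `aether-core` may be structured differently.
#[derive(Debug, Clone, PartialEq)]
pub enum LlmResponse {
    /// The model only wants to say something back to the user.
    Reply { text: String },
    /// The model requests a named action, optionally with a spoken confirmation.
    Action { name: String, speech: Option<String> },
}

impl LlmResponse {
    /// Returns the text that should be handed to TTS, if there is any.
    pub fn speech(&self) -> Option<&str> {
        match self {
            LlmResponse::Reply { text } => Some(text.as_str()),
            LlmResponse::Action { speech, .. } => speech.as_deref(),
        }
    }
}
```

Keeping a type like this in a shared crate would let both node binaries agree on one definition; in practice it would probably also derive serialisation traits, which this sketch leaves out to stay dependency-free.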
| Layer                 | Technology                               |
|:----------------------|:-----------------------------------------|
| **Language**          | Rust                                     |
| **Audio I/O**         | `cpal` (ALSA / PulseAudio)               |
| **Wake Word**         | Porcupine (local, on-device)             |
| **Discovery**         | `mdns-sd` (zero-config local network)    |
| **Networking**        | `tonic` (gRPC) over mTLS                 |
| **TLS**               | `rustls` + `rcgen` (self-hosted CA)      |
| **STT**               | Whisper.cpp via `whisper-rs`             |
| **LLM**               | Ollama (Llama 3.2 / Mistral Nemo)        |
| **TTS**               | Piper (fast) or Kokoro-82M (natural)     |
| **GPIO / Hardware**   | `rppal` (I2C, PWM, GPIO)                 |
| **Brain Deployment**  | Docker Compose (CPU default, GPU opt-in) |
| **Cross-compilation** | `cross-rs`                               |
1. Idle — Edge node listens locally for the wake word using a small on-device model. No audio leaves the device.
2. Activation — The wake word is detected and an mTLS gRPC stream opens to the brain node (discovered automatically via mDNS).
3. Transcription — Audio chunks are streamed to Whisper for speech-to-text.
4. Inference — The transcript is sent to Ollama. The LLM responds with structured JSON describing an action or reply.
5. Synthesis — Response text is converted to speech via TTS and streamed back.
6. Playback & Action — Edge node plays the audio and executes any GPIO actions (LEDs, buttons, etc.).
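The six steps above can be read as a small state machine running on each edge node. The sketch below is one possible way to model it. `NodeState` is named in the crate layout, but the variants, the `Event` enum, and `next_state` are hypothetical and chosen only to mirror the steps listed here.

```rust
/// Hypothetical edge-node states; variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeState {
    /// Step 1: listening locally for the wake word.
    Idle,
    /// Steps 2-3: stream open, audio chunks flowing to the brain node.
    Streaming,
    /// Steps 4-5: waiting while the brain runs the LLM and synthesises speech.
    AwaitingReply,
    /// Step 6: playing the returned audio and running any GPIO actions.
    Playing,
}

/// Hypothetical events that drive the transitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    WakeWordDetected,
    UtteranceEnded,
    ReplyAudioReceived,
    PlaybackFinished,
    /// Any failure (lost connection, timeout, ...) returns the node to idle.
    Fault,
}

/// Pure transition function: given a state and an event, return the next state.
/// Events that make no sense in the current state are ignored.
pub fn next_state(state: NodeState, event: Event) -> NodeState {
    match (state, event) {
        (NodeState::Idle, Event::WakeWordDetected) => NodeState::Streaming,
        (NodeState::Streaming, Event::UtteranceEnded) => NodeState::AwaitingReply,
        (NodeState::AwaitingReply, Event::ReplyAudioReceived) => NodeState::Playing,
        (NodeState::Playing, Event::PlaybackFinished) => NodeState::Idle,
        (_, Event::Fault) => NodeState::Idle,
        (unchanged, _) => unchanged,
    }
}
```

Keeping the transition logic in a pure function like this makes it testable without audio hardware or a network: feeding the four happy-path events in order from `NodeState::Idle` brings the node back to `NodeState::Idle`.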