Implementing Real-Time Analytics on Raspberry Pi Fleet with ClickHouse
Build a Pi fleet telemetry pipeline to ClickHouse for real-time dashboards — practical tutorial with code, schema, and scaling tips for edge analytics in 2026.
Ship a real-time IoT analytics demo: Raspberry Pi fleet -> ClickHouse dashboards
If you manage edge fleets or build IoT analytics demos, you know the pain: fragmented telemetry, slow ingestion, and dashboards that lag behind reality. This tutorial walks through an end-to-end project that turns a fleet of Raspberry Pi devices into a reliable, low-latency telemetry pipeline backed by ClickHouse for real-time dashboards: a practical demo you can deploy and show in interviews, presentations, or POCs.
Why this matters in 2026
ClickHouse has continued to mature as a high-performance OLAP backend and, following major funding and product expansion in 2025, it's now a top choice for time-series and analytics workloads at scale. At the same time, Raspberry Pi 5 and the new AI HAT+ 2 (late 2025) make Pi devices capable edge nodes for telemetry plus lightweight inference. Combining both gives you a modern, extensible playground for IoT analytics, edge computing, and real-time metrics.
"ClickHouse raised a large round in late 2025, accelerating development for analytics use-cases in 2026." — Bloomberg/Tech press coverage
Project overview — what you'll build
End result: a working demo where multiple Raspberry Pi devices send telemetry (CPU, memory, sensor, and inference metrics) to a lightweight edge gateway or directly to ClickHouse for ingestion. ClickHouse stores telemetry in an efficient time-series schema, materialized views pre-aggregate for low-latency dashboards, and Grafana visualizes the real-time metrics.
Key features you'll implement:
- Pi telemetry agent (Python) that samples system metrics and custom sensors.
- Secure ingestion via HTTP API or edge gateway with TLS and API keys.
- ClickHouse schema optimized for time-series: MergeTree, partitioning, compression, TTL.
- Real-time materialized views to produce minute-level aggregates for dashboards.
- Grafana dashboard wired to ClickHouse for live panels.
Architecture choices and reasoning
There are three common architectures for this kind of project:
- Direct ingestion: Pis send telemetry to ClickHouse HTTP endpoint. Simple for demos, low ops, but fewer buffer guarantees.
- Edge aggregator: Pis send to an edge gateway (NATS / MQTT / Vector / Telegraf); gateway batches and writes to ClickHouse. Better buffering and transformation.
- Message queue + connector: Pis → Kafka/RabbitMQ → ClickHouse Kafka engine or connector. Production-grade for large fleets.
For this tutorial we'll show a hybrid: start with direct HTTP ingestion from Pi for simplicity, and include an optional section to replace direct writes with a lightweight aggregator using Vector or Kafka as you scale.
Prerequisites
- Raspberry Pi devices (Pi 4 or Pi 5 recommended). Pi 5 + AI HAT+ 2 is excellent for local inference telemetry.
- ClickHouse server (self-hosted, Docker, or ClickHouse Cloud). ClickHouse 23/24/25+ works — 2026 versions include more features optimized for cloud and streaming.
- Grafana (8+) with ClickHouse datasource plugin.
- Basic Linux, Python 3.9+, and networking skills.
Step 1 — ClickHouse schema and ingestion endpoints
Design principles:
- Time-ordered keys: ORDER BY (device_id, timestamp) for efficient range scans per device.
- Partitioning by month or day depending on retention.
- Compression tuning for telemetry types (Float32 for metrics, LowCardinality(String) for device tags).
- Materialized views to pre-aggregate for dashboard queries (per minute).
Example table DDL
```sql
CREATE TABLE telemetry_raw (
    timestamp DateTime64(3),
    device_id String,
    metric String,
    value Float32,
    tags Map(String, String)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (device_id, timestamp)
TTL timestamp + toIntervalDay(30)
SETTINGS index_granularity = 8192;
```
Notes:
- TTL keeps 30 days by default — adjust for your demo.
- Use Decimal types for monetary metrics or high-accuracy values.
Materialized view for real-time aggregation
```sql
CREATE MATERIALIZED VIEW telemetry_minute
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (device_id, metric, minute)
POPULATE AS
SELECT
    device_id,
    toStartOfMinute(timestamp) AS minute,
    metric,
    sum(value) AS sum_value,
    count() AS samples
FROM telemetry_raw
GROUP BY device_id, minute, metric;
```
Note that metric is part of the sorting key; without it, SummingMergeTree would merge rows for different metrics into a single meaningless sum.
This view gives minute-level aggregates, dramatically reducing query time for Grafana panels.
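The view stores sums and counts rather than averages because averages are not mergeable: when ClickHouse combines parts in the background it can add sums and counts, but it cannot correctly combine pre-computed averages. This toy Python sketch illustrates the point; the function names are illustrative, not ClickHouse APIs.

```python
# Each bucket is a (sum_value, samples) pair for one (device_id, metric, minute) key,
# mirroring what the SummingMergeTree view stores.

def merge_buckets(a, b):
    """Combine two partial aggregates the way a background merge would."""
    return (a[0] + b[0], a[1] + b[1])

def average(bucket):
    """Derive the true average at query time: sum / count."""
    sum_value, samples = bucket
    return sum_value / samples

part1 = (150.0, 3)   # three samples averaging 50
part2 = (90.0, 1)    # one sample of 90

merged = merge_buckets(part1, part2)
print(average(merged))  # 60.0 — the correct overall average
# Naively averaging the per-part averages would give (50 + 90) / 2 = 70, which is wrong.
```

This is why dashboard queries against telemetry_minute should compute sum(sum_value) / sum(samples) rather than averaging sum_value directly.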
Step 2 — Telemetry agent for Raspberry Pi
The agent runs on each Pi, samples metrics, and pushes them to ClickHouse using the HTTP Insert API (JSONEachRow). The code below is minimal, resilient, and uses asynchronous batching.
Python agent (async, batching)
```python
#!/usr/bin/env python3
"""Telemetry agent: samples system metrics and batches inserts to ClickHouse."""
import asyncio
import json
from datetime import datetime, timezone

import aiohttp
import psutil

# INSERT via the ClickHouse HTTP interface, using the JSONEachRow format.
CLICKHOUSE_URL = "https://clickhouse.example.com:8443/?query=INSERT%20INTO%20telemetry_raw%20FORMAT%20JSONEachRow"
API_KEY = "REPLACE_WITH_API_KEY"
DEVICE_ID = "pi-01"

def now_str():
    # "YYYY-MM-DD HH:MM:SS.mmm" parses cleanly as DateTime64(3).
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]

async def gather_metrics():
    cpu = psutil.cpu_percent(interval=None)
    mem = psutil.virtual_memory().percent
    rows = [
        {"timestamp": now_str(), "device_id": DEVICE_ID, "metric": "cpu_percent", "value": cpu, "tags": {}},
        {"timestamp": now_str(), "device_id": DEVICE_ID, "metric": "mem_percent", "value": mem, "tags": {}},
    ]
    try:
        # SoC temperature, exposed in millidegrees Celsius on Raspberry Pi OS.
        with open("/sys/class/thermal/thermal_zone0/temp") as f:
            temp = float(f.read()) / 1000.0
        rows.append({"timestamp": now_str(), "device_id": DEVICE_ID, "metric": "cpu_temp_c", "value": temp, "tags": {}})
    except OSError:
        pass
    return rows

async def send_batch(session, batch):
    # X-Api-Key is checked by a reverse proxy or gateway, not by ClickHouse itself.
    headers = {"X-Api-Key": API_KEY, "Content-Type": "application/json"}
    body = "\n".join(json.dumps(row) for row in batch)
    async with session.post(CLICKHOUSE_URL, data=body.encode("utf-8"), headers=headers) as resp:
        if resp.status != 200:
            print("Insert failed:", resp.status, await resp.text())

async def run():
    async with aiohttp.ClientSession() as session:
        batch = []
        while True:
            batch.extend(await gather_metrics())
            if len(batch) >= 20:  # flush every 20 rows
                await send_batch(session, batch)
                batch = []
            await asyncio.sleep(5)

if __name__ == "__main__":
    asyncio.run(run())
```
Deployment tips:
- Wrap this script in a systemd service (or use balena / Fleet device manager) for auto-start and logging.
- Use device-specific names and include firmware & agent version in tags for observability.
- Batch to reduce requests; use exponential backoff on failures.
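The backoff advice above can be sketched as a small wrapper around the agent's send coroutine. This is a minimal sketch, assuming a `send` coroutine like the agent's `send_batch` that raises on failure; the function name and parameters are illustrative.

```python
import asyncio
import random

async def send_with_backoff(send, batch, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call `send(batch)`; on failure, wait base_delay * 2**attempt (jittered, capped)."""
    for attempt in range(max_retries):
        try:
            return await send(batch)
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter spreads retries out so a fleet doesn't reconnect in lockstep.
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
```

The jitter matters at fleet scale: without it, a network blip makes every Pi retry at the same instant and hammer the endpoint in waves.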
Step 3 — Secure, production-ready ingestion
Even for demos, security and reliability matter. Follow these practices:
- Use HTTPS and strong TLS configurations on ClickHouse or via a reverse proxy (NGINX, Caddy).
- API keys or mTLS to authenticate Pi agents. ClickHouse supports client certificates and HTTP headers.
- Edge buffering for intermittent networks. Vector or Telegraf on Pi can buffer to disk and retry.
- Rate limit and input validation at the gateway to avoid malformed inserts.
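One way to implement the per-device rate limiting mentioned above is a token bucket at the gateway. This is a minimal sketch; the class name and parameters are illustrative, not from any specific library.

```python
import time

class TokenBucket:
    """Per-device throttle a gateway might apply before forwarding inserts."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second (sustained rate)
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would keep one bucket per device_id and reject (or buffer) inserts when `allow()` returns False, protecting ClickHouse from a misbehaving agent.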
Step 4 — Scaling beyond the demo
When you move from 10 Pis to thousands, architecture shifts. Here are recommended patterns:
- Use Kafka as a durable decoupling layer. ClickHouse Kafka engine + materialized view can stream data into MergeTree tables.
- Autoscale ClickHouse shards or use ClickHouse Cloud for managed scaling.
- Partitioning & TTL strategies: daily partitions, shorter retention for raw data, longer for aggregates.
- Monitoring: Use system tables (system.metrics, system.parts) and Grafana alerts to detect back-pressure or compactions. Also have a plan for vendor SLAs and escalation.
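The system-table monitoring above can be scripted over the HTTP interface. The sketch below builds a query against system.parts and parses the TabSeparatedWithNames response; the threshold and database name are assumptions for this demo, and fetching the text (e.g. with requests or aiohttp) is left out so the parsing logic stands alone.

```python
MONITOR_QUERY = """
SELECT table, count() AS parts, sum(rows) AS rows
FROM system.parts
WHERE active AND database = 'default'
GROUP BY table
FORMAT TabSeparatedWithNames
"""

def parse_tsv(text):
    """Parse a TabSeparatedWithNames response into a list of dicts."""
    lines = [ln for ln in text.strip().split('\n') if ln]
    header = lines[0].split('\t')
    return [dict(zip(header, row.split('\t'))) for row in lines[1:]]

def too_many_parts(rows, threshold=300):
    """Flag tables whose active part count suggests merge back-pressure."""
    return [r['table'] for r in rows if int(r['parts']) > threshold]
```

A high active part count usually means inserts are arriving in batches too small for merges to keep up, which is exactly the failure mode batching and an aggregator prevent.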
Step 5 — Grafana dashboards and queries
Grafana connects to ClickHouse via the ClickHouse datasource plugin. Use the pre-aggregated materialized view for fast panels. Example query for fleet-wide average CPU over the last 15 minutes (divide the stored sums by the sample counts; averaging sum_value directly would be wrong):
```sql
SELECT
    minute,
    sum(sum_value) / sum(samples) AS avg_cpu
FROM telemetry_minute
WHERE metric = 'cpu_percent'
  AND minute >= now() - INTERVAL 15 MINUTE
GROUP BY minute
ORDER BY minute
```
Tips:
- Use template variables for device_id to quickly switch devices in Grafana.
- Leverage Grafana's streaming panels or 5s refresh for near-real-time visibility.
- For many devices, use downsampling (per-minute or per-5-min aggregates) to keep panels responsive.
Advanced strategies — low latency, high cardinality, and cost
ClickHouse is fast, but edge and IoT workloads have unique constraints:
Handling high-cardinality tags
Device IDs are high-cardinality by nature. Use LowCardinality(String) for tags when appropriate, and avoid storing hundreds of dynamic tag keys per row. For very dynamic labels, consider a separate tag table linked by device_id.
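The tag-splitting idea above can be sketched in the agent or gateway: stable, low-cardinality tags travel with each telemetry row, while dynamic or rare keys are diverted to a separate device-metadata table keyed by device_id. The allow-list below is an assumption for this demo, not a ClickHouse feature.

```python
# Tags safe to store per-row as LowCardinality-friendly values (illustrative set).
STABLE_TAGS = {"firmware", "agent_version", "region"}

def split_tags(tags):
    """Return (row_tags, side_table_tags) from a raw tag dict."""
    row_tags = {k: v for k, v in tags.items() if k in STABLE_TAGS}
    side = {k: v for k, v in tags.items() if k not in STABLE_TAGS}
    return row_tags, side
```

Joining the side table by device_id at query time keeps telemetry_raw narrow and compressible while still letting dashboards filter on rich metadata.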
Reducing query latency
- Pre-aggregate in materialized views.
- Use SummingMergeTree or AggregatingMergeTree for frequent aggregated queries.
- Keep order keys aligned with common query patterns (device, time).
Cost optimization
Retention + compression = cost control. Use TTL to drop raw data and keep aggregates. ClickHouse has excellent columnar compression; choose appropriate codecs (for example, ZSTD) and tune compression levels per column.
Observability & troubleshooting
Key checks when things go wrong:
- ClickHouse system tables: system.mutations, system.parts, system.asynchronous_metrics.
- Agent logs on Pi: ping times, failed HTTP statuses, batching backlog.
- Network: NAT, firewall rules, and TLS handshake issues (use openssl s_client to debug).
- Keep incident playbooks and recovery runbooks ready, informed by standard incident response practice.
Case study: 100 Pi fleet demo (example timeline)
Here's a concise runbook for a demo you can build in a weekend and present by Monday:
- Day 1 morning: Stand up ClickHouse in Docker on a cloud VM or use ClickHouse Cloud trial; create tables and materialized views.
- Day 1 afternoon: Build the Python telemetry agent and test local inserts via curl.
- Day 1 evening: Deploy the agent to 10 Pis via Ansible or balena; verify ingestion and Grafana panels.
- Day 2 morning: Add additional metrics (AI inference latency, model confidence from AI HAT+ 2 if available) and update schema.
- Day 2 afternoon: Introduce an edge aggregator (Vector) to demonstrate buffering and resilience; show network failure and recovery scenario.
- Day 2 evening: Polish Grafana dashboard, add alerts, and prepare demo scripts (simulate load, show scaling).
Security and privacy considerations
For real deployments, remember:
- Encrypt data in transit (TLS). For Pi fleet, consider mutual TLS if you can manage cert provisioning.
- Limit sensitive telemetry; anonymize or hash identifiers if privacy concerns exist.
- Use least-privilege API keys; rotate keys periodically.
- Automate safe backups and versioning as part of your pipeline to prevent data loss when agents misbehave.
Trends and future predictions (2026+)
What to expect and how to position your demos:
- ClickHouse will continue to add streaming-friendly connectors and cloud features (2025 funding accelerated product expansion); expect tighter integrations with Kafka, Snowflake-like features, and managed ClickHouse offerings in 2026.
- Raspberry Pi hardware (Pi 5 + AI HATs) is turning the Pi into a credible edge inference device — include inference metrics in your telemetry to showcase AI at the edge.
- Edge-first analytics pipelines will become more common: local aggregation, privacy-preserving sampling, and selective cloud sync.
Common pitfalls and how to avoid them
- Pitfall: Sending one HTTP request per metric. Fix: Batch JSONEachRow inserts.
- Pitfall: Unbounded high-cardinality tags causing huge table sizes. Fix: Normalize tags or store them in a secondary table.
- Pitfall: Relying on raw data for dashboards. Fix: Pre-aggregate with materialized views.
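The batching fix from the first pitfall comes down to serializing many rows into one JSONEachRow body (newline-delimited JSON) so the fleet makes one HTTP insert per batch instead of one request per metric. A minimal sketch, with placeholder timestamps and host:

```python
import json

def to_jsoneachrow(rows):
    """Encode a list of dicts as a JSONEachRow request body (one JSON object per line)."""
    return '\n'.join(json.dumps(row, separators=(',', ':')) for row in rows)

batch = [
    {"timestamp": "2026-01-15 12:00:00.000", "device_id": "pi-01",
     "metric": "cpu_percent", "value": 41.5, "tags": {}},
    {"timestamp": "2026-01-15 12:00:00.000", "device_id": "pi-01",
     "metric": "mem_percent", "value": 62.0, "tags": {}},
]
body = to_jsoneachrow(batch)
# POST body to: https://<host>:8443/?query=INSERT INTO telemetry_raw FORMAT JSONEachRow
```

Larger batches also help ClickHouse itself: each INSERT creates a data part, and fewer, bigger parts mean less merge pressure.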
Actionable checklist
- Provision ClickHouse (Docker / Cloud) and create telemetry_raw and telemetry_minute.
- Build and test the Python agent on one Pi; validate data in ClickHouse with SELECT queries.
- Connect Grafana and import a simple dashboard that queries telemetry_minute.
- Secure the ingestion endpoint (TLS + API key) and deploy agents to the fleet with systemd or balena.
- Introduce an edge aggregator when scaling beyond tens of devices for resilience.
Resources and further reading
- ClickHouse HTTP interface docs — for JSONEachRow insert syntax and tuning.
- Grafana ClickHouse plugin documentation — for datasource setup and best practices.
- Raspberry Pi OS and balena docs — for fleet deployment and OS imaging.
Final notes — why this project is a great demo
This tutorial compresses modern edge and analytics practices into a reproducible project. It addresses common pain points — fragmented telemetry, slow insights, and brittle demos — by demonstrating batching, schema design, and aggregation patterns that are production-relevant. In 2026, with ClickHouse's growth and Raspberry Pi's edge capabilities, this stack is both practical and impressive to stakeholders.
Takeaways:
- Use ClickHouse for fast, columnar IoT analytics and pre-aggregate with materialized views for real-time dashboards.
- Start simple (direct HTTP inserts), then add buffering and Kafka when scaling.
- Secure ingestion and plan retention; compress and TTL old raw data.
- Leverage Pi 5 + AI HAT+ 2 for additional inference telemetry to demonstrate edge AI metrics.
Call to action
Ready to build this demo? Start by spinning up a ClickHouse instance and deploying the Python agent to one Pi. If you want a jump-start, download the example repository (includes Docker, ClickHouse DDL, and the agent) on our GitHub and join the CodeWithMe community to share dashboards and get feedback. Ship a working real-time demo this week — and show how edge + analytics can transform telemetry into immediate insights.