Edge AI / Privacy-First Monitoring
Noctivana
The project treats privacy and safety as co-equal constraints: actionable alerts matter, but raw nursery media should not become the system's default output.
Noctivana is a privacy-first infant monitoring system built around a Raspberry Pi 4 edge stack, local inference, sensor fusion, BLE fallback, and a companion mobile app. It monitors prone sleep, face occlusion, respiratory absence, cry events, and room conditions on-device, then emits small alert payloads instead of pushing nursery media into a cloud pipeline.
Prone Detection
9/10
documented acceptance-test result
Alert Latency
7.2s P95
reported from the validation docs
Soak Run
11h
continuous operation noted in repo documentation
The Core Problem
Baby monitors stream video. This one reasons locally.
Commercial infant monitors typically transmit raw nursery video to cloud servers. A connectivity problem can delay or drop an alert, and a data breach exposes the most private space in a home. The pattern recognition happens in the cloud, outside the family's control.
Noctivana inverts that model. Inference runs on a Raspberry Pi 4 at the bedside. MoveNet Lightning INT8 estimates pose in ~20ms per frame and YAMNet INT8 classifies audio in ~5ms (both measured on an x86-64 development machine; the Pi 4 is slower). ZMQ routes signals between eight independent services. Only compact JSON alert payloads ever leave the device.
Zero video. Zero audio. Zero raw sensor streams. Privacy as a hardware constraint, not a policy checkbox.
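To make the "alerts only" constraint concrete, here is a minimal Python sketch of what a compact alert payload could look like. The field names (`type`, `severity`, `ts`, `detail`), the `build_alert` helper, and the per-type topic layout under `edgewatch/alert/` are illustrative assumptions, not the project's documented schema.

```python
import json
import time
from typing import Any, Dict, Optional, Tuple


def build_alert(alert_type: str, severity: str, detail: Dict[str, Any],
                now: Optional[float] = None) -> Tuple[str, bytes]:
    """Return (topic, body) for one alert. Schema and topic layout are assumed."""
    payload = {
        "type": alert_type,      # e.g. "prone_position"
        "severity": severity,    # "CRITICAL" or "WARN"
        "ts": int(now if now is not None else time.time()),
        "detail": detail,        # small derived values only, never media
    }
    topic = f"edgewatch/alert/{alert_type}"
    # Compact separators keep the JSON small on the wire.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return topic, body


topic, body = build_alert("prone_position", "CRITICAL",
                          {"sustained_s": 5.0}, now=1_700_000_000)
print(topic, len(body), "bytes")
```

A payload of this shape carries a label, a timestamp, and a few derived numbers, so there is nothing in it from which nursery video or audio could be reconstructed.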
Cloud baby monitors
Always-on video
streaming to remote servers
Noctivana
0 media packets
verified by packet-capture audit

Packet-capture validation: no video, audio, or raw sensor streams ever leave the device. Alerts only.
Architecture
8 Services. One ZMQ Bus.
An XPUB/XSUB proxy at the center means any service can publish or subscribe without coupling to another service's lifecycle. Services crash and restart independently.

The XPUB/XSUB proxy decouples producers from consumers. All topic routing is zero-copy at the proxy layer.
Service Registry — CPU Budgets on Raspberry Pi 4
| Service | Role | Publishes | CPU Budget |
|---|---|---|---|
| zmq_proxy | XPUB/XSUB central hub, :5555/:5556 | — | ~1% |
| vision_service | MoveNet inference, face occlusion, night mode | vision/pose · vision/occlusion · vision/motion | ~55% |
| audio_service | YAMNet classification, dB monitor, breath detect | audio/cry · audio/dblevel · audio/breath | ~15% |
| vitals_service | Optical flow respiratory rate, rPPG (experimental) | vitals/resp · vitals/resp_absence | ~18% |
| env_service | SCD40, SGP30, SHT31 I2C sensors at 1Hz | env/climate · env/alert | ~3% |
| alert_engine | Fusion rules, suppression logic, rate limiting | MQTT edgewatch/alert/* | ~4% |
| session_manager | SQLCipher session storage, AES-256 | — | ~2% |
| ble_service | BLE GATT fallback notification layer | GATT notify | ~2% |

End-to-end system: ceiling camera → edge inference → ZMQ bus → alert engine → MQTT → phone.
Vision Pipeline
Pose inference, occlusion detection, optical-flow respiration — no GPU.
The vision_service captures frames at 5fps from a ceiling-mounted camera, crops the crib ROI, resizes it to MoveNet's 192×192 input, runs INT8 quantized inference, and publishes 17 keypoints to the ZMQ bus in ~150ms total.
Face occlusion is detected by comparing face-keypoint confidence against body-keypoint confidence. If face drops below 0.20 while body remains above 0.15 for three sustained seconds, the alert engine receives a face_occlusion event.
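The occlusion rule described above can be sketched as a small stateful check. This is a minimal illustration rather than the project's implementation: the `OcclusionDetector` class and its method names are hypothetical, and how the per-frame face and body confidences are aggregated from the 17 keypoints is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional

FACE_CONF_MAX = 0.20   # face confidence must fall below this (from the text)
BODY_CONF_MIN = 0.15   # body confidence must stay above this (from the text)
SUSTAIN_S = 3.0        # condition must hold this long (from the text)


@dataclass
class OcclusionDetector:
    """Tracks how long the face-low / body-visible condition has held."""
    started_at: Optional[float] = None  # time at which the condition began

    def update(self, t: float, face_conf: float, body_conf: float) -> bool:
        """Feed one frame; return True once the condition has lasted SUSTAIN_S."""
        suspicious = face_conf < FACE_CONF_MAX and body_conf > BODY_CONF_MIN
        if not suspicious:
            self.started_at = None  # any clear frame resets the timer
            return False
        if self.started_at is None:
            self.started_at = t
        return (t - self.started_at) >= SUSTAIN_S


detector = OcclusionDetector()
# 5 fps for four seconds: face hidden, body still visible.
results = [detector.update(i / 5, face_conf=0.05, body_conf=0.60) for i in range(21)]
print(results.index(True) / 5, "seconds until the event fires")
```

Requiring the body to stay visible is what separates "face covered" from "baby not in frame": if both confidences drop, the timer resets instead of firing.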
Night mode switches to IR with CLAHE preprocessing, recovering roughly 12% keypoint confidence in low light. Optical-flow respiration (Farneback) tracks chest movement with a mean absolute error of 0.384 bpm against a synthetic reference across the 15–60 bpm test range.
MoveNet INT8 inference
~20ms
mean on x86-64 development machine (desktop benchmark)
Optical-flow respiration MAE
0.384 bpm
synthetic 15–60 bpm range
rPPG status
Experimental
≥20×20 px face required; not used in fusion
CLAHE night gain
+12%
keypoint confidence recovery in IR mode

The vision pipeline runs entirely on Raspberry Pi 4. No GPU. No cloud. Frame → crop → INT8 inference → ZMQ publish in ~150ms per frame.

The alert engine receives all ZMQ topics. Fusion rules evaluate sustained conditions — not instantaneous spikes — to minimise false positives.
Alert Fusion Engine
Six rules. Sustained conditions. Caregiver-presence suppression.
The alert_engine subscribes to every ZMQ topic and evaluates fusion rules continuously. Rules fire only after sustained conditions are met — prone must persist ≥5s, respiratory absence must persist >15s while motion is "still". Instantaneous spikes are ignored.
CRITICAL alerts (prone, occlusion, respiratory absence, high CO₂) have cooldowns of 60–300s, depending on the rule, to prevent alert storms. WARN alerts (temp, loud events) run on 60s cooldowns. Every CRITICAL alert is published via MQTT and, in parallel, sent as a BLE GATT notification as a fallback path.
Caregiver suppression: the alert engine uses skeleton size heuristics to detect an adult's presence. While a larger skeleton is visible and classified as in-motion, CRITICAL alerts are held until the adult exits the frame.
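One way to picture the skeleton-size heuristic is the sketch below. It is an assumption-heavy illustration: the size ratio, the confidence floor, and the function names are hypothetical, and how a second skeleton is obtained and classified as in-motion is outside the sketch.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (y, x, confidence) in normalised image units

ADULT_SIZE_RATIO = 1.8  # hypothetical: how much larger an adult skeleton must be
MIN_CONF = 0.15         # hypothetical: ignore keypoints below this confidence


def skeleton_span(keypoints: List[Keypoint]) -> float:
    """Diagonal of the bounding box around the confident keypoints."""
    pts = [(y, x) for y, x, conf in keypoints if conf >= MIN_CONF]
    if not pts:
        return 0.0
    ys = [p[0] for p in pts]
    xs = [p[1] for p in pts]
    return ((max(ys) - min(ys)) ** 2 + (max(xs) - min(xs)) ** 2) ** 0.5


def hold_critical(infant: List[Keypoint], other: List[Keypoint],
                  other_in_motion: bool) -> bool:
    """True while a clearly larger, moving skeleton shares the frame."""
    infant_span = skeleton_span(infant)
    if infant_span == 0.0:
        return False  # no reliable infant skeleton to compare against
    larger = skeleton_span(other) >= ADULT_SIZE_RATIO * infant_span
    return larger and other_in_motion


infant = [(0.40, 0.40, 0.9), (0.55, 0.50, 0.8)]
adult = [(0.10, 0.10, 0.9), (0.80, 0.70, 0.7)]
print(hold_critical(infant, adult, other_in_motion=True))
```

Requiring motion as well as size means a large static shape, such as a blanket misread as a skeleton, does not suppress alerts on its own.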
Alert Fusion Rules — Full Specification
| Rule | Trigger Condition | Severity | Alert Type | Cooldown |
|---|---|---|---|---|
| R1 | prone ≥ 5s AND motion ≠ restless | CRITICAL | prone_position | 300s |
| R2 | face_conf < 0.20 sustained > 3s AND body_conf > 0.15 | CRITICAL | face_occlusion | 300s |
| R3 | no respiratory signal > 15s AND motion = still | CRITICAL | resp_absence | 120s |
| R4 | CO₂ > 1500 ppm | CRITICAL | co2_high | 60s |
| R5 | temperature > 28°C | WARN | temp_high | 60s |
| R6 | dB SPL > 70 sustained for 5s | WARN | loud_event | 60s |
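The rules in the table share one shape: a condition must hold for a sustain window, and a fired rule then stays quiet for its cooldown. A minimal sketch of that pattern follows, using R1's 5s window and 300s cooldown. The `SustainedRule` class is hypothetical, and the caller is assumed to combine the raw signals (for R1, prone AND motion ≠ restless) into a single boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedRule:
    name: str
    sustain_s: float                      # how long the condition must hold
    cooldown_s: float                     # quiet period after firing
    active_since: Optional[float] = None  # when the condition last became true
    last_fired: Optional[float] = None    # when the rule last emitted an alert

    def evaluate(self, t: float, condition: bool) -> bool:
        """Return True only when an alert should be emitted at time t."""
        if not condition:
            self.active_since = None      # instantaneous spikes never accumulate
            return False
        if self.active_since is None:
            self.active_since = t
        if t - self.active_since < self.sustain_s:
            return False                  # not sustained long enough yet
        if self.last_fired is not None and t - self.last_fired < self.cooldown_s:
            return False                  # still cooling down
        self.last_fired = t
        return True


prone = SustainedRule("prone_position", sustain_s=5.0, cooldown_s=300.0)
# Condition stays true for 400 seconds, sampled once per second.
fired_at = [t for t in range(400) if prone.evaluate(float(t), condition=True)]
print(fired_at)
```

With a continuously true condition, the rule fires once the window elapses and again only after the cooldown, which is how a persistent hazard produces a repeated reminder rather than an alert storm.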
End-to-End Latency
~5.8s average time-to-alert. P95 under 8s.
The 5-second sustained-condition window dominates the budget. The actual signal transport from camera capture to phone notification adds ~860ms of infrastructure latency — the majority being camera exposure time and network delivery.
ZMQ proxy routing costs only 0.215ms mean for a 218-byte payload. The 5000ms fusion hold is deliberate, not a bottleneck: it verifies that the alert condition is sustained before emitting.
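The budget arithmetic can be checked in a few lines. The figures are the ones quoted in this section; treating ZMQ routing as part of the 860ms infrastructure figure is an assumption made for the comparison.

```python
HOLD_WINDOW_MS = 5000     # sustained-condition window
INFRASTRUCTURE_MS = 860   # camera -> fusion -> MQTT -> phone, excluding the hold
ZMQ_ROUTING_MS = 0.215    # mean proxy routing cost for a 218-byte payload

total_ms = HOLD_WINDOW_MS + INFRASTRUCTURE_MS
hold_share = HOLD_WINDOW_MS / total_ms
zmq_share_of_infra = ZMQ_ROUTING_MS / INFRASTRUCTURE_MS

print(f"time-to-alert: {total_ms / 1000:.2f} s")
print(f"hold window share of total: {hold_share:.0%}")
print(f"ZMQ share of infrastructure latency: {zmq_share_of_infra:.3%}")
```

The sum lands close to the reported ~5.8s average, and the hold window accounts for roughly 85% of it, so shaving transport latency would barely move time-to-alert.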
Infrastructure latency
860ms
camera→fusion→MQTT→phone, excluding hold window
Alert latency P95
7.2s
documented acceptance-test result
ZMQ routing (P95)
0.295ms
218-byte payload, 300 runs
Fusion hold window
5000ms
required sustained-condition check
Per-Stage Latency Budget
Performance Evidence
Desktop Benchmarks — Separated from Device Validation
These benchmarks were run on an x86-64 development machine, not on the Raspberry Pi 4. They establish inference feasibility — model speed, ZMQ routing overhead, optical-flow accuracy — rather than production device-level measurements.
| Model / System | Role | Mean | Median | Min | Max | Runs | Note |
|---|---|---|---|---|---|---|---|
| YAMNet INT8 | Cry / audio classification | 4.8ms | 4.36ms | 4.02ms | 6.12ms | 50 | 521-class output, 96kB model, x86-64 development machine |
| MoveNet Lightning INT8 | Pose keypoint detection | 19.74ms | 18.78ms | 18.59ms | 31.37ms | 50 | 17 keypoints × (y, x, confidence), 2.8MB model |
| ZMQ Bus (XPUB/XSUB) | Intra-device message routing | 0.215ms | 0.202ms | 0.17ms | 1.239ms | 300 | 218-byte payload, P95: 0.295ms, P99: 0.418ms |
| Optical Flow Respiration | Chest-movement respiratory rate | 0.384 bpm MAE | — | 0.09 bpm error | 1.07 bpm error | 10 | Farneback on synthetic frames (15–60 bpm range). Real-world: 82% within ±4 bpm |
YAMNet INT8: 4.8ms Mean Inference
Audio Classification
521-class audio classification at 4.8ms mean on x86-64. The model processes 0.975-second audio windows. On Raspberry Pi 4 (ARM Cortex-A72), inference is approximately 3–5× slower (roughly 14–24ms), which keeps it well within the audio service's 100ms processing budget at 10 audio updates per second.
MoveNet Lightning: 19.74ms Mean on Desktop
Pose Keypoint Detection
2.8MB model, 17 keypoints × (y, x, confidence). On Raspberry Pi 4 with INT8 quantization and TFLite delegate, the vision service achieves ~150ms per frame including ROI preprocessing — the dominant per-frame cost, not the model itself. CPU budget: ~55% of a single Pi 4 core.
ZMQ Bus: 0.215ms Mean, 1.239ms Max
Intra-Device Message Routing
The XPUB/XSUB proxy handles 218-byte JSON payloads with P95 at 0.295ms and P99 at 0.418ms across 300 runs. The 1.239ms max spike is an outlier — likely OS scheduler jitter. At 5fps vision + 10fps audio + 1Hz env, the bus handles ~16 messages per second, far below its saturation point.
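A back-of-the-envelope check of the bus load, as a sketch. It assumes one message per service tick, that every message is about the benchmarked 218 bytes, and that mean routing time approximates how long the proxy is busy per message; none of these is confirmed by the source.

```python
RATES_HZ = {"vision": 5, "audio": 10, "env": 1}  # publish rates from the text
PAYLOAD_BYTES = 218                              # benchmarked payload size
MEAN_ROUTING_S = 0.215e-3                        # mean proxy routing time

messages_per_s = sum(RATES_HZ.values())
bytes_per_s = messages_per_s * PAYLOAD_BYTES
busy_fraction = messages_per_s * MEAN_ROUTING_S  # share of each second spent routing

print(f"{messages_per_s} msg/s, {bytes_per_s} B/s, proxy busy {busy_fraction:.2%}")
```

Under these assumptions the proxy spends well under 1% of each second routing, which is consistent with its ~1% CPU budget in the service registry.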
Optical Flow: 0.384 bpm MAE
Chest-Movement Respiration Rate
Farneback optical flow on synthetic frames across the 15–60 bpm range achieves 0.384 bpm mean absolute error. Real-world device testing shows 82% of 30-second windows within ±4 bpm against reference. The rPPG implementation is labeled experimental — at 1.5m ceiling distance, the face occupies ~20×20 pixels, dominated by interpolation artefacts.
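The two accuracy figures quoted here are different statistics: MAE averages the error size, while the ±4 bpm figure counts how many windows fall inside a tolerance. A short sketch of both follows; the function names are hypothetical and the sample values are made up for illustration, not project measurements.

```python
from typing import Sequence


def mean_absolute_error(estimated: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute difference between estimated and reference rates (bpm)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


def fraction_within(estimated: Sequence[float], reference: Sequence[float],
                    tolerance_bpm: float = 4.0) -> float:
    """Share of windows whose error is within +/- tolerance_bpm."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(1 for e, r in zip(estimated, reference) if abs(e - r) <= tolerance_bpm)
    return hits / len(estimated)


# Illustrative values only: one entry per 30-second window.
estimated = [30.0, 42.0, 18.0, 55.0]
reference = [31.0, 40.0, 18.0, 50.0]
mae = mean_absolute_error(estimated, reference)
share = fraction_within(estimated, reference)
print(f"MAE {mae:.2f} bpm, {share:.0%} of windows within ±4 bpm")
```

A low MAE on synthetic frames and an 82% within-tolerance rate on the device are therefore not directly comparable; the second is the one the acceptance test uses.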
Engineering Honesty
8 Known Limitations, Documented in Full
These are real constraints discovered during development and acceptance testing, not theoretical edge cases. Each has a documented mitigation. None are hidden.
IR Occlusion: 8/10 vs 9/10 Target
Issue
Thin IR-transmissive fabrics (e.g. muslin) cannot be distinguished from an uncovered face by keypoint confidence alone. The algorithm improved from 6/10 to 8/10 with the temporal filter, but the 9/10 target remains unmet in IR mode.
Mitigation
The occlusion algorithm switches to a full face-keypoint-dropout rule at night; CLAHE recovers roughly 12% keypoint confidence.
rPPG Unreliable at Ceiling Distance
Issue
At 1.5m ceiling distance, the face region is approximately 20×20 pixels. Green-channel variation is dominated by interpolation artefacts rather than actual perfusion signal. Implemented as a proof-of-concept, labeled "experimental": true in all payloads.
Mitigation
rPPG is not used in any fusion rule. Primary respiratory monitoring is optical-flow-based.
Side Position Detection ~70%
Issue
The shoulder–hip rotation metric is geometrically ambiguous when viewed from directly above: a baby lying at an angle between supine and true side-lying produces similar keypoint patterns. Detection is unreliable for this position.
Mitigation
Prone and supine detection remain robust. Side-lying is logged as a warning, not a CRITICAL trigger.
SGP30 Occasional Zero Reads
Issue
After 2+ hours of continuous operation, the SGP30 TVOC/eCO2 sensor occasionally produces zero readings. Root cause appears to be I2C timing or baseline drift after extended uptime.
Mitigation
Zero reads are replaced with the last known good value and logged at WARNING level. Environmental alerts remain active during substitution.
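Last-known-good substitution can be sketched as a small filter in front of the sensor value. The class name and logger name are hypothetical, and the sketch assumes an eCO2 reading, for which zero is never a plausible value.

```python
import logging
from typing import List, Optional

log = logging.getLogger("env_service")  # hypothetical logger name


class LastKnownGood:
    """Replaces zero reads with the most recent non-zero value."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def filter(self, reading: int) -> Optional[int]:
        if reading != 0:
            self._last = reading  # remember every valid reading
            return reading
        if self._last is None:
            return None           # nothing to substitute yet
        log.warning("SGP30 zero read; substituting last known good value %d", self._last)
        return self._last


eco2 = LastKnownGood()
cleaned: List[Optional[int]] = [eco2.filter(r) for r in [0, 412, 0, 430, 0]]
print(cleaned)
```

Because the substituted value still flows into the environmental rules, a threshold alert keeps working through a glitch; the trade-off is that a long run of zero reads would hide a genuine change in air quality.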
Thermal Throttling After 7+ Hours
Issue
Raspberry Pi 4 reaches 72°C peak with passive heatsink during extended operation. The OS throttles the CPU above 75°C. A fan is required for indefinite continuous operation.
Mitigation
Thermal-aware mode: drops to 3fps and pauses rPPG above 75°C. Low-power mode at 2fps when baby is still.
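The two frame-rate reductions can be combined in one small policy function. This is a sketch under assumptions: the text does not say which mode wins when the device is hot and the baby is still, so the lower frame rate is chosen here, and the function name is hypothetical.

```python
from typing import Tuple

NORMAL_FPS = 5           # default capture rate from the vision pipeline
THERMAL_FPS = 3          # thermal-aware mode
STILL_FPS = 2            # low-power mode while the baby is still
THERMAL_LIMIT_C = 75.0   # threshold quoted for thermal-aware mode


def capture_plan(temp_c: float, baby_still: bool) -> Tuple[int, bool]:
    """Return (fps, rppg_enabled) for the current temperature and motion state."""
    hot = temp_c > THERMAL_LIMIT_C
    fps = NORMAL_FPS
    if hot:
        fps = min(fps, THERMAL_FPS)
    if baby_still:
        fps = min(fps, STILL_FPS)  # assumed precedence: lowest rate wins
    return fps, not hot            # rPPG pauses only when hot


for temp, still in [(60.0, False), (76.0, False), (60.0, True), (76.0, True)]:
    print(temp, still, capture_plan(temp, still))
```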
BLE Reconnection Fragility
Issue
Android drops BLE connections after ~5 minutes of idle. The keepalive workaround (30-second ping) is functional but not robust. Proper GATT connection parameters are needed for reliable long-session BLE.
Mitigation
MQTT is the primary alert path. BLE is labeled a fallback. MQTT remains active and unaffected by BLE state.
Single-Crib Only
Issue
The system defines one ROI per camera. Multi-crib or twin monitoring is not supported. The caregiver suppression logic assumes the larger skeleton is always the adult.
Mitigation
Documented scope constraint. Multi-crib support identified as future work requiring a second camera or wider-angle lens.
No Automated Unit Tests
Issue
Integration testing was performed manually during development due to timeline pressure. The fusion logic is verified via benchmark.py (13/13 tests pass), but individual service unit tests were not written.
Mitigation
Acceptance tests cover system-level correctness. Fusion logic is tested synthetically. Hardware integration is tested manually.
Validation
10 Acceptance Tests. 9 PASS. 1 MARGINAL.
System-level tests against the full Raspberry Pi 4 stack with a real crib setup, infant mannequin, and reference sensors. These are the primary evidence artifacts — not desktop inference timings.
| Test | Requirement | Result | Status |
|---|---|---|---|
| Prone detection (mannequin) | 9/10 scenarios | 9/10 | PASS |
| Face occlusion (daytime) | 9/10 scenarios | 9/10 | PASS |
| Face occlusion (IR night mode) | 9/10 scenarios | 8/10 | MARGINAL |
| Respiratory rate accuracy | ±4 bpm in 80% of windows | 82% within ±4 bpm | PASS |
| Temperature accuracy | ±1°C vs reference | ±0.8°C | PASS |
| Humidity accuracy | ±5% RH vs reference | ±4.2% RH | PASS |
| False CRITICAL alerts | < 3 per 8-hour session | 2.1 avg | PASS |
| Alert latency P95 | < 8s in 95% of tests | 7.2s | PASS |
| Zero video/audio transmitted | Packet capture audit | Zero packets | PASS |
| Continuous operation | 10-hour uptime | 11h 2min | PASS |
Privacy Proof
0 packets
Wireshark packet-capture audit across the full soak run. Zero video, audio, or raw sensor packets transmitted.
Alert Latency P95
7.2s
Measured end-to-end from event onset to phone notification. Requirement was <8s. Sustained-event hold window accounts for 5s of this budget.
Continuous Operation
11h 2min
Single uninterrupted soak run. Requirement was 10 hours. Thermal throttling observed after 7h; mitigated by adaptive fps reduction above 75°C.