Edge AI / Privacy-First Monitoring

Noctivana

The project treats privacy and safety as co-equal constraints: actionable alerts matter, but raw nursery media should not become the system's default output.

Noctivana is a privacy-first infant monitoring system built around a Raspberry Pi 4 edge stack, local inference, sensor fusion, BLE fallback, and a companion mobile app. It monitors prone sleep, face occlusion, respiratory absence, cry events, and room conditions on-device, then emits small alert payloads instead of pushing nursery media into a cloud pipeline.

Python · Edge AI · Raspberry Pi · Sensor Fusion · React Native · MQTT


Prone Detection

9/10

documented acceptance-test result

Alert Latency

7.2s P95

reported from the validation docs

Soak Run

11h

continuous operation noted in repo documentation

The Core Problem

Baby monitors stream video. This one reasons locally.

Commercial infant monitors transmit raw nursery video to cloud servers. That design makes alerts depend on network latency and connectivity, and a single data breach exposes the most private space in a home. The pattern recognition happens on someone else's servers, not at the crib.

Noctivana inverts that model. Inference runs on a Raspberry Pi 4 at the bedside. MoveNet Lightning INT8 estimates pose keypoints in ~20ms per frame and YAMNet INT8 classifies audio in ~5ms (both are desktop benchmark figures; on the Pi the full vision stage takes ~150ms per frame). ZMQ routes signals between eight independent services. Only compact JSON alert payloads ever leave the device.

Zero video. Zero audio. Zero raw sensor streams. Privacy as a hardware constraint, not a policy checkbox.
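"Only compact JSON alert payloads" is a property that can be enforced in code at the point where a payload is built. The sketch below is a minimal illustration, assuming hypothetical field names (`type`, `severity`, `ts`, `detail`) and a hypothetical rule that alert details may only hold scalar values, so frames, audio buffers, or raw sensor arrays cannot ride along. The repository's actual payload schema may differ.

```python
import json
import time
from typing import Optional

# Hypothetical rule: detail values must be scalars, so no frames, audio
# buffers, or raw sensor arrays can be smuggled into an alert payload.
SCALAR_TYPES = (str, int, float, bool, type(None))


def build_alert(alert_type: str, severity: str, detail: dict,
                ts: Optional[float] = None) -> bytes:
    """Serialise an alert as compact JSON, rejecting non-scalar detail values."""
    for key, value in detail.items():
        if not isinstance(value, SCALAR_TYPES):
            raise TypeError(f"detail field {key!r} is not a scalar: {type(value).__name__}")
    payload = {
        "type": alert_type,          # e.g. an alert type from the fusion rules table
        "severity": severity,        # "CRITICAL" or "WARN"
        "ts": ts if ts is not None else time.time(),
        "detail": detail,
    }
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


msg = build_alert("prone_position", "CRITICAL", {"duration_s": 5.4}, ts=1700000000.0)
print(len(msg), msg)
```

Rejecting bytes and containers at build time means a coding mistake fails loudly on the device instead of silently widening what leaves it.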

Cloud baby monitors

Always-on video

streaming to remote servers

Noctivana

0 media packets

verified by packet-capture audit

Noctivana privacy proof — packet-capture showing zero media packets transmitted


Packet-capture validation: no video, audio, or raw sensor streams ever leave the device. Alerts only.

Architecture

8 Services. One ZMQ Bus.

An XPUB/XSUB proxy at the center means any service can publish or subscribe without coupling to another service's lifecycle. Services crash and restart independently.

Noctivana ZMQ bus architecture — XPUB/XSUB proxy connecting 8 services


The XPUB/XSUB proxy decouples producers from consumers. All topic routing is zero-copy at the proxy layer.
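ZeroMQ SUB sockets filter on a byte prefix of the topic, which is what lets the alert engine receive every topic while another consumer takes only `vision/`. The stdlib-only model below mimics that matching rule using topic names from the service registry. It illustrates the routing semantics and is not pyzmq code; `TopicBus` is a hypothetical name. In pyzmq the real pieces are an `XSUB`/`XPUB` socket pair joined with `zmq.proxy(frontend, backend)` and subscriptions set via `sock.setsockopt(zmq.SUBSCRIBE, b"vision/")`.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, dict], None]


class TopicBus:
    """In-process model of ZeroMQ pub/sub prefix matching (hypothetical helper).

    A real SUB socket delivers a message once even if several of its
    prefixes match; this model keeps one prefix per handler to stay simple.
    """

    def __init__(self) -> None:
        self._subs: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, prefix: str, handler: Handler) -> None:
        # An empty prefix matches every topic, like SUBSCRIBE with b"".
        self._subs[prefix].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver to every handler whose prefix matches; return delivery count."""
        delivered = 0
        for prefix, handlers in self._subs.items():
            if topic.startswith(prefix):
                for handler in handlers:
                    handler(topic, message)
                    delivered += 1
        return delivered


bus = TopicBus()
seen_by_alert_engine: List[str] = []
seen_by_vision_logger: List[str] = []
bus.subscribe("", lambda t, m: seen_by_alert_engine.append(t))
bus.subscribe("vision/", lambda t, m: seen_by_vision_logger.append(t))

for topic in ("vision/pose", "audio/cry", "vision/occlusion", "env/climate"):
    bus.publish(topic, {"demo": True})
```

Because publishers and subscribers only agree on topic strings, a crashed vision_service simply stops producing `vision/*` messages; nothing else has to be restarted.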

Service Registry — CPU Budgets on Raspberry Pi 4

Service	Role	Publishes	CPU Budget
zmq_proxy	XPUB/XSUB central hub, :5555/:5556	—	~1%
vision_service	MoveNet inference, face occlusion, night mode	vision/pose · vision/occlusion · vision/motion	~55%
audio_service	YAMNet classification, dB monitor, breath detect	audio/cry · audio/dblevel · audio/breath	~15%
vitals_service	Optical flow respiratory rate, rPPG (experimental)	vitals/resp · vitals/resp_absence	~18%
env_service	SCD40, SGP30, SHT31 I2C sensors at 1Hz	env/climate · env/alert	~3%
alert_engine	Fusion rules, suppression logic, rate limiting	MQTT edgewatch/alert/*	~4%
session_manager	SQLCipher session storage, AES-256	—	~2%
ble_service	BLE GATT fallback notification layer	GATT notify	~2%

Noctivana full system overview — edge stack, alert path, and mobile app


End-to-end system: ceiling camera → edge inference → ZMQ bus → alert engine → MQTT → phone.

Vision Pipeline

Pose inference, occlusion detection, optical-flow respiration — no GPU.

The vision_service captures frames from a ceiling-mounted camera at 5fps, crops the crib ROI, upscales it to MoveNet's 192×192 input, and runs INT8 quantized inference (~150ms per frame for preprocessing and inference). It then publishes the 17 keypoints to the ZMQ bus.
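Because the crib ROI is cropped and resized before inference, keypoints come back in normalised model coordinates (MoveNet reports each keypoint as y, x, score in [0, 1]) and must be mapped back to the full frame before sizes or positions can be compared across frames. A minimal sketch, assuming a hypothetical `Roi` rectangle in frame pixels and a plain stretch resize with no letterbox padding:

```python
from dataclasses import dataclass
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (y, x, score), normalised to the model input


@dataclass(frozen=True)
class Roi:
    """Crib region in full-frame pixel coordinates (hypothetical helper)."""
    top: int
    left: int
    height: int
    width: int


def roi_to_frame(keypoints: List[Keypoint], roi: Roi) -> List[Tuple[float, float, float]]:
    """Map normalised keypoints from a stretched ROI crop back to frame pixels.

    Assumes the crop was resized straight to 192x192 without padding, so the
    normalised y and x scale independently by the ROI height and width.
    """
    return [(roi.top + y * roi.height, roi.left + x * roi.width, score)
            for y, x, score in keypoints]


roi = Roi(top=120, left=200, height=300, width=400)
print(roi_to_frame([(0.5, 0.25, 0.9)], roi))
```

If the crop were letterboxed to preserve aspect ratio instead, the padding offsets would have to be subtracted before scaling.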

Face occlusion is detected by comparing face-keypoint confidence against body-keypoint confidence. If face drops below 0.20 while body remains above 0.15 for three sustained seconds, the alert engine receives a face_occlusion event.
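The occlusion rule can be sketched as a small stateful detector. The sketch assumes MoveNet's standard 17-keypoint COCO order (indices 0–4 are nose, eyes, and ears; 5–16 are shoulders through ankles) and summarises each region by its mean confidence. That aggregation choice and the `OcclusionDetector` name are assumptions for illustration, not taken from the repository.

```python
from typing import Optional, Sequence

FACE_IDX = range(0, 5)    # nose, left/right eye, left/right ear (COCO order)
BODY_IDX = range(5, 17)   # shoulders through ankles

FACE_MAX = 0.20   # face confidence must drop below this...
BODY_MIN = 0.15   # ...while body confidence stays above this...
HOLD_S = 3.0      # ...for longer than three seconds


def mean_conf(scores: Sequence[float], idx: range) -> float:
    return sum(scores[i] for i in idx) / len(idx)


class OcclusionDetector:
    """Returns True once the face-low / body-present condition has held > HOLD_S."""

    def __init__(self) -> None:
        self._since: Optional[float] = None

    def update(self, t: float, scores: Sequence[float]) -> bool:
        occluded = (mean_conf(scores, FACE_IDX) < FACE_MAX
                    and mean_conf(scores, BODY_IDX) > BODY_MIN)
        if not occluded:
            self._since = None      # any clear frame resets the timer
            return False
        if self._since is None:
            self._since = t         # condition just started
        return t - self._since > HOLD_S


covered = [0.05] * 5 + [0.6] * 12   # face keypoints lost, body still visible
det = OcclusionDetector()
results = [det.update(i * 0.2, covered) for i in range(21)]  # 5fps for 4s
```

Requiring the body to stay visible is what separates "face covered" from "baby out of frame", where every keypoint drops together.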

Night mode switches to IR with CLAHE preprocessing, recovering ~12% keypoint confidence in low light. Optical-flow respiration (Farneback) tracks chest movement with a mean absolute error of 0.384 bpm against a synthetic reference across the 15–60 bpm test range.
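Farneback flow itself comes from OpenCV (`cv2.calcOpticalFlowFarneback`); what turns it into a respiratory rate is a frequency estimate over a per-frame chest-motion signal. A stdlib-only sketch, assuming the vision service already reduces each flow field to one number per frame (for example, mean vertical flow inside a hypothetical chest region) at 5fps. It scans the DFT magnitude across the 15–60 bpm band; this is a swapped-in estimator for illustration, not necessarily the one the repository uses.

```python
import math
from typing import Sequence


def estimate_bpm(signal: Sequence[float], fs: float,
                 lo_bpm: float = 15.0, hi_bpm: float = 60.0,
                 step_bpm: float = 0.5) -> float:
    """Return the breathing rate (bpm) whose sinusoid best matches the signal.

    Evaluates the DFT power at each candidate rate in [lo_bpm, hi_bpm]
    rather than only at FFT bin centres, giving finer resolution on short windows.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]              # remove DC so it cannot dominate
    steps = int(round((hi_bpm - lo_bpm) / step_bpm))
    best_bpm, best_power = lo_bpm, -1.0
    for k in range(steps + 1):
        bpm = lo_bpm + k * step_bpm
        w = 2.0 * math.pi * (bpm / 60.0) / fs   # radians per sample
        re = sum(v * math.cos(w * i) for i, v in enumerate(x))
        im = sum(v * math.sin(w * i) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


fs = 5.0                                         # vision frame rate
window = [math.sin(2 * math.pi * 0.5 * i / fs) for i in range(150)]  # 30s at 30 bpm
print(estimate_bpm(window, fs))
```

At 5fps the Nyquist limit is 2.5 Hz (150 bpm), comfortably above the 60 bpm top of the band.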

MoveNet INT8 inference

~20ms

median in the x86-64 desktop benchmark, not on Raspberry Pi 4

Optical-flow respiration MAE

0.384 bpm

synthetic 15–60 bpm range

rPPG status

Experimental

≥20×20 px face required; not used in fusion

CLAHE night gain

+12%

keypoint confidence recovery in IR mode

Noctivana vision pipeline — ROI crop, MoveNet INT8, keypoint classification, night-mode path


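The vision pipeline runs entirely on Raspberry Pi 4. No GPU. No cloud. Capture, crop, and INT8 inference take ~350ms per frame (200ms capture plus 150ms preprocessing and inference, per the latency budget), followed by pose classification and the ZMQ publish.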

Noctivana alert fusion logic — multi-signal evaluation and suppression rules


The alert engine receives all ZMQ topics. Fusion rules evaluate sustained conditions — not instantaneous spikes — to minimise false positives.

Alert Fusion Engine

Six rules. Sustained conditions. Caregiver-presence suppression.

The alert_engine subscribes to every ZMQ topic and evaluates fusion rules continuously. Rules fire only after sustained conditions are met — prone must persist ≥5s, respiratory absence must persist >15s while motion is "still". Instantaneous spikes are ignored.

CRITICAL alerts (prone, occlusion, respiratory absence, high CO₂) have cooldowns of 60–300s to prevent alert storms. WARN alerts (temperature, loud events) use 60s cooldowns. Every CRITICAL alert is published via MQTT and simultaneously sent as a BLE GATT notification as a fallback.

Caregiver suppression: the alert engine uses skeleton size heuristics to detect an adult's presence. While a larger skeleton is visible and classified as in-motion, CRITICAL alerts are held until the adult exits the frame.
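The caregiver check can be sketched as a comparison of skeleton extents. Everything here is an assumption for illustration: the bounding-box area as the size measure, the hypothetical 2× ratio against an infant baseline area, and the 0.3 keypoint-confidence cutoff. The repository's heuristic may differ, and as the Scope limitation notes, any such rule presumes the larger skeleton is the adult.

```python
from typing import Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (y, x, score) in frame pixels

MIN_SCORE = 0.3      # hypothetical: ignore keypoints below this confidence
ADULT_RATIO = 2.0    # hypothetical: box area >= 2x the infant baseline means "adult"


def box_area(skeleton: Sequence[Keypoint]) -> float:
    """Area of the bounding box around the confident keypoints of one skeleton."""
    ys = [y for y, x, s in skeleton if s >= MIN_SCORE]
    xs = [x for y, x, s in skeleton if s >= MIN_SCORE]
    if len(ys) < 2:
        return 0.0
    return (max(ys) - min(ys)) * (max(xs) - min(xs))


def hold_critical(skeleton: Sequence[Keypoint], infant_area: float, moving: bool) -> bool:
    """True while a much larger, moving skeleton (assumed to be an adult) is in view."""
    return moving and box_area(skeleton) >= ADULT_RATIO * infant_area


infant = [(100.0, 100.0, 0.9), (140.0, 160.0, 0.9), (120.0, 130.0, 0.1)]
adult = [(0.0, 0.0, 0.9), (300.0, 200.0, 0.9)]
infant_area = box_area(infant)
```

How the infant baseline is obtained (for example, a running median of box areas while no adult is present) is a further assumption.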

Alert Fusion Rules — Full Specification

Rule	Trigger Condition	Severity	Alert Type	Cooldown
R1	prone ≥ 5s AND motion ≠ restless	CRITICAL	prone_position	300s
R2	face_conf < 0.20 sustained > 3s AND body_conf > 0.15	CRITICAL	face_occlusion	300s
R3	no respiratory signal > 15s AND motion = still	CRITICAL	resp_absence	120s
R4	CO₂ > 1500 ppm	CRITICAL	co2_high	60s
R5	temperature > 28°C	WARN	temp_high	60s
R6	dB SPL > 70 sustained for 5s	WARN	loud_event	60s
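The sustained-condition and cooldown semantics in the table can be captured by one small rule object. A minimal sketch, assuming a flat `state` dict with hypothetical keys (`pose`, `motion`, `co2_ppm`) and treating every hold as inclusive (strict ">" and "≥" boundaries differ only at the exact instant). It illustrates the table's behaviour and is not the repository's alert_engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

State = dict  # latest fused readings, e.g. {"pose": "prone", "motion": "still"}


@dataclass
class FusionRule:
    name: str
    severity: str
    condition: Callable[[State], bool]
    hold_s: float        # condition must persist this long before firing
    cooldown_s: float    # minimum gap between two alerts from this rule
    _since: Optional[float] = field(default=None, init=False, repr=False)
    _last_fired: Optional[float] = field(default=None, init=False, repr=False)

    def evaluate(self, now: float, state: State) -> Optional[str]:
        if not self.condition(state):
            self._since = None          # spikes shorter than hold_s never fire
            return None
        if self._since is None:
            self._since = now
        if now - self._since < self.hold_s:
            return None
        if self._last_fired is not None and now - self._last_fired < self.cooldown_s:
            return None                 # still cooling down: no alert storm
        self._last_fired = now
        return f"{self.severity}:{self.name}"


r1 = FusionRule("prone_position", "CRITICAL",
                lambda s: s.get("pose") == "prone" and s.get("motion") != "restless",
                hold_s=5.0, cooldown_s=300.0)
r4 = FusionRule("co2_high", "CRITICAL",
                lambda s: s.get("co2_ppm", 0) > 1500,
                hold_s=0.0, cooldown_s=60.0)
```

Keeping the hold timer and the cooldown per rule means a CO₂ alert cannot delay or mask a prone alert.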

End-to-End Latency

~5.8s average time-to-alert. P95 under 8s.

The 5-second sustained-condition window dominates the budget. Signal transport from camera capture to phone notification adds ~860ms of infrastructure latency; the largest contributors are camera capture and network delivery at 200ms each.

ZMQ proxy routing costs only 0.215ms mean for a 218-byte payload in the desktop benchmark; the on-device budget allows 20ms for publish plus routing. The 100ms fusion evaluation is a deliberate hold, not a bottleneck: it verifies that the alert condition is still sustained before emitting.

Infrastructure latency

860ms

camera→fusion→MQTT→phone, excluding hold window

Alert latency P95

7.2s

documented acceptance-test result

ZMQ routing (P95)

0.295ms

218-byte payload, 300 runs (x86-64 desktop benchmark)

Fusion hold window

5000ms

required sustained-condition check

Per-Stage Latency Budget

Stage	Latency
Camera capture + frame ready	200ms
ROI upscale + MoveNet INT8 inference	150ms
Pose classification	10ms
ZMQ publish + proxy routing	20ms
Alert engine fusion evaluation	100ms
MQTT broker publish	80ms
Network delivery to phone	200ms
App notification display	100ms
Per-path delivery total	860ms
+ Fusion hold (5s sustained event)	5000ms
Time to first alert (avg)	~5.8s
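The budget is plain arithmetic, and writing it down makes it easy to re-check whenever a stage changes. The sketch copies the per-stage figures above: the stages sum to 860ms, and adding the 5s hold gives 5.86s, which the page reports as ~5.8s. It also shows that camera capture plus network delivery is just under half of the transport path.

```python
# Per-stage figures from the latency budget (milliseconds).
STAGES_MS = {
    "camera capture + frame ready": 200,
    "ROI upscale + MoveNet INT8 inference": 150,
    "pose classification": 10,
    "ZMQ publish + proxy routing": 20,
    "alert engine fusion evaluation": 100,
    "MQTT broker publish": 80,
    "network delivery to phone": 200,
    "app notification display": 100,
}
FUSION_HOLD_MS = 5000  # sustained-condition window for a 5s rule such as R1

delivery_ms = sum(STAGES_MS.values())
time_to_alert_s = (delivery_ms + FUSION_HOLD_MS) / 1000
capture_plus_network = (STAGES_MS["camera capture + frame ready"]
                        + STAGES_MS["network delivery to phone"]) / delivery_ms
print(delivery_ms, time_to_alert_s, round(capture_plus_network, 3))
```

Rules with longer holds (R3 waits more than 15s) shift the total accordingly; only the hold term changes.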

Performance Evidence

Desktop Benchmarks — Separated from Device Validation

These benchmarks were run on an x86-64 development machine, not on the Raspberry Pi 4. They establish inference feasibility — model speed, ZMQ routing overhead, optical-flow accuracy — rather than production device-level measurements.

Model / System	Role	Mean	Median	Min	Max	Runs	Note
YAMNet INT8	Cry / audio classification	4.8ms	4.36ms	4.02ms	6.12ms	50	521-class output, 96kB model, x86-64 development machine
MoveNet Lightning INT8	Pose keypoint detection	19.74ms	18.78ms	18.59ms	31.37ms	50	17 keypoints × (y, x, confidence), 2.8MB model
ZMQ Bus (XPUB/XSUB)	Intra-device message routing	0.215ms	0.202ms	0.17ms	1.239ms	300	218-byte payload, P95: 0.295ms, P99: 0.418ms
Optical Flow Respiration	Chest-movement respiratory rate	0.384 bpm MAE	—	0.09 bpm error	1.07 bpm error	10	Farneback on synthetic frames (15–60 bpm range). Real-world: 82% within ±4 bpm

YAMNet INT8: 4.8ms Mean Inference

Audio Classification

521-class audio classification at 4.8ms mean on x86-64. The model processes 0.975-second audio windows. On Raspberry Pi 4 (ARM Cortex-A72), inference is approximately 3–5× slower (about 14–24ms), still well within the audio service's 100ms processing budget at a 10fps audio processing rate.
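YAMNet consumes 16 kHz mono audio, and the TFLite model's 0.975s input window is 15,600 samples. Reading "10fps" as ten classifications per second (an assumption), consecutive windows overlap heavily with a 1,600-sample hop. The stdlib sketch below shows that framing plus the budget check implied by the 3–5× slowdown estimate; `frame_windows` and `fits_budget` are hypothetical helpers, not part of any TensorFlow API.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000                    # YAMNet expects 16 kHz mono audio
WINDOW = round(0.975 * SAMPLE_RATE)     # 15,600 samples per inference
HOP = SAMPLE_RATE // 10                 # assumed 10 classifications per second


def frame_windows(samples: Sequence[float], window: int = WINDOW,
                  hop: int = HOP) -> Iterator[Sequence[float]]:
    """Yield overlapping fixed-length windows; a trailing partial window is dropped."""
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]


def fits_budget(desktop_ms: float, slowdown: float, budget_ms: float) -> bool:
    """Check a desktop timing, scaled by an estimated slowdown, against a budget."""
    return desktop_ms * slowdown <= budget_ms


two_seconds = [0.0] * (2 * SAMPLE_RATE)
print(sum(1 for _ in frame_windows(two_seconds)))   # windows available in 2s
print(fits_budget(4.8, 5.0, 100.0))                 # worst case of the 3-5x range
```

With a 100ms hop each inference has the full hop interval to finish before the next window is due.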

MoveNet Lightning: 19.74ms Mean on Desktop

Pose Keypoint Detection

2.8MB model, 17 keypoints × (y, x, confidence). On Raspberry Pi 4 with INT8 quantization and a TFLite delegate, the vision service takes ~150ms per frame including ROI preprocessing, which accounts for more of the per-frame cost than the model itself. CPU budget: ~55% of a single Pi 4 core.

ZMQ Bus: 0.215ms Mean, 1.239ms Max

Intra-Device Message Routing

The XPUB/XSUB proxy handles 218-byte JSON payloads with P95 at 0.295ms and P99 at 0.418ms across 300 runs. The 1.239ms max spike is an outlier — likely OS scheduler jitter. At 5fps vision + 10fps audio + 1Hz env, the bus handles ~16 messages per second, far below its saturation point.
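Mean, median, P95, and P99 figures like these are straightforward to reproduce from raw timings. The sketch uses the nearest-rank percentile definition; the original benchmark may interpolate between samples, so small differences at the tail are expected. `summarize` is a hypothetical helper.

```python
import math
import statistics
from typing import Dict, Sequence


def nearest_rank(sorted_samples: Sequence[float], pct: int) -> float:
    """Smallest sample with at least pct% of all samples at or below it."""
    rank = math.ceil(pct * len(sorted_samples) / 100)
    return sorted_samples[max(rank, 1) - 1]


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    s = sorted(samples_ms)
    return {
        "mean": statistics.fmean(s),
        "median": statistics.median(s),
        "min": s[0],
        "max": s[-1],
        "p95": nearest_rank(s, 95),
        "p99": nearest_rank(s, 99),
    }


summary = summarize([float(i) for i in range(1, 101)])
print(summary)
```

Reporting P95 and P99 next to the max is what lets a single 1.239ms spike be read as jitter rather than a trend.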

Optical Flow: 0.384 bpm MAE

Chest-Movement Respiration Rate

Farneback optical flow on synthetic frames across the 15–60 bpm range achieves 0.384 bpm mean absolute error. Real-world device testing shows 82% of 30-second windows within ±4 bpm of the reference. The rPPG implementation is labeled experimental: at 1.5m ceiling distance the face occupies only ~20×20 pixels, and its green-channel signal is dominated by interpolation artefacts.
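The two accuracy figures measure different things: MAE averages the absolute error over all windows, while the acceptance criterion counts the share of windows that land within ±4 bpm. A sketch of both metrics; the numbers below are made up for illustration and are not project measurements.

```python
from typing import Sequence


def _check(estimates: Sequence[float], reference: Sequence[float]) -> None:
    if len(estimates) != len(reference) or not estimates:
        raise ValueError("need two equal-length, non-empty sequences")


def mae(estimates: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error in bpm."""
    _check(estimates, reference)
    return sum(abs(e - r) for e, r in zip(estimates, reference)) / len(estimates)


def fraction_within(estimates: Sequence[float], reference: Sequence[float],
                    tol: float = 4.0) -> float:
    """Share of windows whose error is at most tol bpm."""
    _check(estimates, reference)
    hits = sum(1 for e, r in zip(estimates, reference) if abs(e - r) <= tol)
    return hits / len(estimates)


# Illustrative values only (not project measurements).
est = [30.5, 41.0, 24.0, 52.0, 33.0]
ref = [30.0, 40.0, 29.0, 50.0, 33.0]
print(mae(est, ref), fraction_within(est, ref))
```

A low MAE can coexist with a few large misses, which is why the window-level ±4 bpm criterion is the one used for acceptance.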

Engineering Honesty

8 Known Limitations, Documented in Full

These are real constraints discovered during development and acceptance testing, not theoretical edge cases. Each has a documented mitigation. None are hidden.


Perception—Limitation 1 of 8

IR Occlusion: 8/10 vs 9/10 Target

Issue

Thin IR-transmissive fabrics (e.g. muslin) cannot be distinguished from an uncovered face by keypoint confidence alone. The algorithm improved from 6/10 to 8/10 with the temporal filter, but the 9/10 target remains unmet in IR mode.

Mitigation

The occlusion algorithm switches to a full face-keypoint-dropout rule at night; CLAHE recovers ~12% keypoint confidence.

Physics—Limitation 2 of 8

rPPG Unreliable at Ceiling Distance

Issue

At 1.5m ceiling distance, the face region is approximately 20×20 pixels. Green-channel variation is dominated by interpolation artefacts rather than actual perfusion signal. rPPG is implemented as a proof of concept, and every payload it emits carries "experimental": true.

Mitigation

rPPG is not used in any fusion rule. Primary respiratory monitoring is optical-flow-based.

Geometry—Limitation 3 of 8

Side Position Detection ~70%

Issue

The shoulder–hip rotation metric is geometrically ambiguous when viewed from directly above: a baby lying at an angle between supine and true side-lying produces similar keypoint patterns. Detection is unreliable for this position.

Mitigation

Prone and supine detection remain robust. Side-lying is logged as a warning, not a CRITICAL trigger.

Hardware—Limitation 4 of 8

SGP30 Occasional Zero Reads

Issue

After 2+ hours of continuous operation, the SGP30 TVOC/eCO2 sensor occasionally produces zero readings. Root cause appears to be I2C timing or baseline drift after extended uptime.

Mitigation

Mitigated by last-known-good value substitution and WARNING log entry. Environmental alerts remain active during substitution.
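Last-known-good substitution needs only a few lines of state. A sketch, assuming a zero eCO2 reading is treated as invalid (the SGP30 reports eCO2 from a 400 ppm floor, so 0 cannot be real; TVOC can legitimately read 0 ppb and would need a different check) and using the standard `logging` module for the WARNING entry. `LastKnownGood` is a hypothetical name.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("env_service")


class LastKnownGood:
    """Replace invalid sensor reads with the most recent valid value."""

    def __init__(self, name: str, is_valid: Callable[[float], bool]) -> None:
        self.name = name
        self.is_valid = is_valid
        self.value: Optional[float] = None
        self.substitutions = 0

    def update(self, raw: float) -> Optional[float]:
        if self.is_valid(raw):
            self.value = raw
            return raw
        self.substitutions += 1
        log.warning("%s: invalid read %r, substituting last known good %r",
                    self.name, raw, self.value)
        return self.value   # None until a first valid read has been seen


eco2 = LastKnownGood("sgp30_eco2", lambda v: v > 0)
filtered = [eco2.update(r) for r in [612.0, 0.0, 0.0, 655.0]]
print(filtered, eco2.substitutions)
```

Counting substitutions gives a cheap health signal: a long run of them suggests the sensor needs a reset rather than more substitution.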

Thermal—Limitation 5 of 8

Thermal Throttling After 7+ Hours

Issue

The Raspberry Pi 4 reaches a 72°C peak with a passive heatsink during extended operation. The OS throttles the CPU above 75°C. A fan is required for indefinite continuous operation.

Mitigation

Thermal-aware mode drops to 3fps and pauses rPPG above 75°C. Low-power mode runs at 2fps when the baby is still.
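The thermal policy reduces to a small pure function. A sketch, assuming the normal rate is the 5fps used elsewhere on this page and that the lower rate wins when the device is hot and the baby is still; both are assumptions. On Raspberry Pi OS the SoC temperature can be read from `/sys/class/thermal/thermal_zone0/temp`, which reports millidegrees Celsius.

```python
from typing import Tuple

NORMAL_FPS = 5
HOT_FPS = 3
STILL_FPS = 2
HOT_THRESHOLD_C = 75.0


def parse_millidegrees(text: str) -> float:
    """Convert a sysfs thermal_zone reading (e.g. '68500') to degrees C."""
    return int(text.strip()) / 1000.0


def read_soc_temp(path: str = "/sys/class/thermal/thermal_zone0/temp") -> float:
    with open(path) as f:
        return parse_millidegrees(f.read())


def capture_policy(temp_c: float, baby_still: bool) -> Tuple[int, bool]:
    """Return (frames per second, rppg_enabled) for the current conditions."""
    hot = temp_c > HOT_THRESHOLD_C
    fps = NORMAL_FPS
    if hot:
        fps = min(fps, HOT_FPS)      # thermal-aware mode
    if baby_still:
        fps = min(fps, STILL_FPS)    # low-power mode
    return fps, not hot              # rPPG pauses whenever the SoC is hot
```

Keeping the policy separate from the sysfs read makes it testable off-device.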

Connectivity—Limitation 6 of 8

BLE Reconnection Fragility

Issue

Android drops BLE connections after ~5 minutes of idle time. The keepalive workaround (a 30-second ping) works but is not robust. Reliable long-session BLE needs properly tuned GATT connection parameters.

Mitigation

MQTT is the primary alert path. BLE is labeled a fallback. MQTT remains active and unaffected by BLE state.

Scope—Limitation 7 of 8

Single-Crib Only

Issue

The system defines one ROI per camera. Multi-crib or twin monitoring is not supported. The caregiver suppression logic assumes the larger skeleton is always the adult.

Mitigation

Documented scope constraint. Multi-crib support identified as future work requiring a second camera or wider-angle lens.

Coverage—Limitation 8 of 8

No Automated Unit Tests

Issue

Integration testing was performed manually during development due to timeline pressure. The fusion logic is verified via benchmark.py (13/13 tests pass), but individual service unit tests were not written.

Mitigation

Acceptance tests cover system-level correctness. Fusion logic is tested synthetically; hardware integration is tested manually.

Validation

10 Acceptance Tests. 9 PASS. 1 MARGINAL.

System-level tests against the full Raspberry Pi 4 stack with a real crib setup, infant mannequin, and reference sensors. These are the primary evidence artifacts — not desktop inference timings.

Test	Requirement	Result	Status
Prone detection (mannequin)	9/10 scenarios	9/10	PASS
Face occlusion (daytime)	9/10 scenarios	9/10	PASS
Face occlusion (IR night mode)	9/10 scenarios	8/10	MARGINAL
Respiratory rate accuracy	±4 bpm in 80% of windows	82% within ±4 bpm	PASS
Temperature accuracy	±1°C vs reference	±0.8°C	PASS
Humidity accuracy	±5% RH vs reference	±4.2% RH	PASS
False CRITICAL alerts	< 3 per 8-hour session	2.1 avg	PASS
Alert latency P95	< 8s in 95% of tests	7.2s	PASS
Zero video/audio transmitted	Packet capture audit	Zero packets	PASS
Continuous operation	10-hour uptime	11h 2min	PASS

Privacy Proof

0 packets

Wireshark packet-capture audit across the full 10-hour soak run. Zero video, audio, or raw sensor packets transmitted.

Alert Latency P95

7.2s

Measured end-to-end from event onset to phone notification. The requirement was <8s; the sustained-event hold window accounts for 5s of this budget.

Continuous Operation

11h 2min

Single uninterrupted soak run. Requirement was 10 hours. Thermal throttling observed after 7h; mitigated by adaptive fps reduction above 75°C.
