The problem

An ultrafiltration machine runs continuously and can’t simply be rebooted — if the control software stops, water treatment stops. On top of that, operators run these machines across many sites with no reliable way to see how each unit is doing without standing in front of it.

What it does

On the machine, Guardian drives the process and shows operators a live, animated diagram of the whole system — valves, pumps, membranes, pressures — and raises alarms the moment something drifts. In the cloud, it gives a fleet-wide view of every machine’s health and history, and lets a technician run a process (like a backwash) on a specific machine remotely.

The hard part

This is a genuinely complex system. The parts I’m most proud of:

  • It can’t go down. Guardian is built on Elixir/OTP, so a component that fails restarts in isolation without taking the rest with it. The unusual bit: it also watches for the subtler failure, a process that’s technically alive but frozen on stuck hardware I/O, and self-heals that too, with cooldowns between restarts so it never thrashes. The machine stays protected even if the on-screen UI crashes.
  • Live, not polled. When a valve opens, the change is pushed to both the on-machine screen and the cloud dashboard as a tiny delta, so the diagram animates in real time instead of refreshing on a timer.
  • It speaks the machine’s language reliably. Talking to industrial controllers over Modbus is a minefield of addressing and byte-order gotchas. Guardian standardises register addressing and byte order into one convention and decodes values automatically, killing a whole class of “the sensor reads 32 million instead of 5 bar” field bugs.
  • Alarms that match how operators actually work. Each alarm tracks two separate things — whether the problem is fixed, and whether a human has seen it — so “resolved but unacknowledged” is its own state. Every transition is logged for audits.
  • Offline-first. The machine keeps running and recording entirely on its own; telemetry batches up to the cloud when the network is there, and never blocks the live screen when it isn’t.
  • Automation that operators define themselves. They compose conditions visually, for example “if membrane pressure is high and the inlet valve is open, start a backwash”, with no code, and the machine fires the sequence when the condition first becomes true.
  • Remote fleet control, safely. The cloud sends commands to a specific machine over a message bus rather than exposing the machine to the internet, and each distributor only ever sees their own units.
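
To make the Modbus byte-order problem above concrete, here is a minimal sketch in Python. It is not Guardian's actual Elixir code. It assumes a hypothetical sensor that reports pressure in hundredths of a bar as an unsigned 32-bit value split across two 16-bit registers, low word first; the function names and the scale factor are illustrative only.

```python
# Assumed for illustration: the sensor reports hundredths of a bar.
SCALE = 100


def decode_u32(registers, low_word_first):
    """Combine two 16-bit Modbus registers into one unsigned 32-bit value."""
    first, second = registers
    for word in (first, second):
        if not 0 <= word <= 0xFFFF:
            raise ValueError(f"register out of 16-bit range: {word}")
    # Pick which register holds the high half according to the word order.
    high, low = (second, first) if low_word_first else (first, second)
    return (high << 16) | low


def pressure_bar(registers, low_word_first=True):
    """Decode a raw register pair into bar using the assumed scale."""
    return decode_u32(registers, low_word_first) / SCALE


# 5.00 bar is the raw value 500 (0x01F4); this device sends the low word first.
registers = [0x01F4, 0x0000]

wrong = decode_u32(registers, low_word_first=False)  # word order guessed wrong
right = pressure_bar(registers)                      # word order matches device

print(wrong)  # 32768000
print(right)  # 5.0
```

Under these assumptions, reading the same two registers with the wrong word order turns 5 bar into roughly 32 million. That is the kind of bug a single fixed convention removes.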

What makes it distinctive

  • Self-healing the “alive but stuck” failure mode, not just crashes — the failure that usually means a 3am call.
  • Real-time, delta-based state sync (a web/mobile technique) brought into industrial monitoring, so dashboards feel instant instead of laggy.
  • One shared diagram definition is the single source of truth — edited on the machine, shown identically in the cloud, with no drift between the two.
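
The delta-based sync mentioned above can be sketched in a few lines of Python. This is an illustration of the general technique, not Guardian's implementation: the state keys, the `diff` and `apply_delta` names, and the `set`/`unset` message shape are all assumptions made for the example.

```python
_MISSING = object()  # sentinel so a stored value of None still compares correctly


def diff(old, new):
    """Return only what changed between two flat state snapshots."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "unset": removed}


def apply_delta(state, delta):
    """Apply a delta to a subscriber's copy without mutating the original."""
    merged = {**state, **delta["set"]}
    for key in delta["unset"]:
        merged.pop(key, None)
    return merged


# Hypothetical machine state before and after an inlet valve opens.
before = {"valve_inlet": "closed", "pump_1": "running", "pressure_bar": 1.2}
after = {**before, "valve_inlet": "open"}

delta = diff(before, after)
print(delta)  # {'set': {'valve_inlet': 'open'}, 'unset': []}

# A subscriber (on-machine screen or cloud dashboard) rebuilds the new state
# from its old copy plus the delta alone.
assert apply_delta(before, delta) == after
```

Only the changed key travels over the wire, so the size of an update depends on what changed rather than on the size of the whole diagram.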

Outcome

Machines that keep running and protecting the process through faults and network drops, operators who can see and control the whole fleet remotely, and a complete audit trail for compliance. In production on Vontech ultrafiltration machines.
