
I Compared Akka and Elixir for Fault-Tolerant Systems. The Differences Are Structural.

2026-03-07
elixir · erlang · akka · architecture · distributed-systems

I've been researching how different actor model implementations handle fault-tolerant, real-time workloads. The kind of systems where you have thousands of concurrent device connections, sensor streams that can't drop, and deployments that can't cause downtime. Both Akka (Scala/JVM) and Elixir (Erlang/BEAM) implement the actor model. Both have production track records. But the deeper I dug, the more I realized the differences aren't surface-level API choices. They're baked into the VM.

## Both use actors. They don't use them the same way.

On paper, Akka and Elixir look similar. You spawn lightweight processes, they communicate via message passing, and they don't share memory. It's the actor model in both cases.
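In Elixir, that shared model looks like the sketch below (`Counter` is a hypothetical module, not from either library): a process owning private state, reachable only through its mailbox.

```elixir
defmodule Counter do
  # Spawn an isolated process that owns its state; the only way
  # in or out is message passing through its mailbox.
  def start(initial), do: spawn(fn -> loop(initial) end)

  defp loop(count) do
    receive do
      {:increment, by} ->
        loop(count + by)

      {:get, caller} ->
        send(caller, {:count, count})
        loop(count)
    end
  end
end

pid = Counter.start(0)
send(pid, {:increment, 5})
send(pid, {:get, self()})

receive do
  {:count, n} -> IO.puts("count: #{n}")   # prints "count: 5"
end
```

The Akka Typed equivalent would be a `Behaviors.receive` actor holding the count in its behavior; the shape of the code is much the same, which is exactly why the interesting differences are below the API.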

The difference is what sits underneath. Akka actors run on the JVM — a runtime designed for general-purpose computing with shared-memory threads, JIT compilation, and a unified garbage collector. Elixir processes run on BEAM — the Erlang VM, built at Ericsson starting in the late '80s specifically for telecom switches that needed 99.999% uptime across thousands of concurrent connections.

That origin story matters. BEAM didn't bolt on concurrency primitives after the fact. Isolation, fault tolerance, and distribution are the foundation. Akka built excellent concurrency abstractions on top of a VM that wasn't designed with those as first principles. The abstractions are good. But abstractions have edges, and those edges show up under pressure.

## Hot code swapping is a real operational advantage

BEAM lets you deploy new code to a running system without dropping connections. The VM can hold two versions of a module at once — the current and the old — and migrates running processes between them. If you have an active data pipeline streaming sensor readings and you need to push a fix, you push it. No restart. No reconnection window. No buffered data at risk of being lost.
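The hook OTP gives you for this is the `code_change/3` callback, which the VM invokes on each running GenServer during a release upgrade so the process can migrate its in-memory state in place. A minimal sketch — the `SensorPipeline` module and its state shapes are hypothetical, and a real upgrade is driven by an OTP release with an appup file:

```elixir
defmodule SensorPipeline do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  @impl true
  def init(readings), do: {:ok, readings}

  @impl true
  def handle_cast({:reading, r}, readings), do: {:noreply, [r | readings]}

  # Called by the VM mid-upgrade, while the process keeps running.
  # Suppose v1 stored a bare list of readings and v2 wants a map:
  # we migrate the state here, with no restart and no dropped
  # connections.
  @impl true
  def code_change(_old_vsn, readings, _extra) when is_list(readings) do
    {:ok, %{readings: readings}}
  end

  @impl true
  def code_change(_old_vsn, state, _extra), do: {:ok, state}
end
```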

Akka on the JVM doesn't have this. Your best option is rolling restarts — spin up new nodes, drain old ones, hope the transition is fast enough. For most web apps, that works fine. For systems where a dropped connection during a critical operation means lost data or lost money, the gap is real.

This is one of those things that sounds like a nice-to-have until you're running something in production where you literally cannot restart the service. Then it becomes the feature.

## Garbage collection tells you a lot about a runtime's priorities

This is the one that surprised me most when I actually dug into the internals.

BEAM garbage collects per-process. Each process has its own heap. When GC triggers, it only affects that single process — microsecond pauses, completely isolated from every other process on the system. Ten thousand processes running, one gets collected, the other 9,999 don't notice.

The JVM uses a shared heap. All threads, all actors, one memory space. Modern collectors like ZGC and Shenandoah have gotten impressively good at reducing pause times. But the fundamental architecture means GC events still ripple across the system. When you need predictable sub-2-second latency on every message, a GC pause at the wrong moment breaks your SLA. On BEAM, that category of problem doesn't exist.
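You can watch the per-process isolation directly from a shell. A small sketch — the specific heap numbers will vary by machine and emulator flags, but the collection touches only the one process:

```elixir
# Each BEAM process has a private heap. Collecting one process
# pauses only that process; every other heap is untouched.
pid =
  spawn(fn ->
    # Allocate a large transient list; it becomes garbage once
    # length/1 returns, but sits on the heap until the next GC.
    1..100_000 |> Enum.to_list() |> length()

    receive do
      :stop -> :ok
    end
  end)

Process.sleep(100)                        # let the allocation finish
{:heap_size, before} = Process.info(pid, :heap_size)
:erlang.garbage_collect(pid)              # collect this one process only
{:heap_size, after_gc} = Process.info(pid, :heap_size)

IO.puts("heap words before/after GC: #{before} / #{after_gc}")
send(pid, :stop)
```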

## Process weight compounds at scale

A BEAM process costs roughly 2KB and takes microseconds to spawn. An Akka actor is lighter than an OS thread but still heavier than a BEAM process. At small scale, you don't feel the difference. At thousands of concurrent connections — each with its own state, message queue, and lifecycle — it compounds.
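The per-process cost is easy to measure yourself. A sketch — exact numbers depend on the Erlang/OTP version and word size, but the average should land in the low single-digit kilobytes:

```elixir
# Spawn 100_000 idle processes and measure what each one costs.
# (The default process limit is 262_144, so this fits comfortably.)
pids =
  for _ <- 1..100_000 do
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

total_bytes =
  Enum.reduce(pids, 0, fn pid, acc ->
    {:memory, bytes} = Process.info(pid, :memory)
    acc + bytes
  end)

IO.puts("avg bytes per process: #{div(total_bytes, length(pids))}")
Enum.each(pids, &send(&1, :stop))
```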

The production numbers speak for themselves. WhatsApp handled 2+ million concurrent TCP connections per server on Erlang. Discord scaled Elixir to 5 million concurrent users, eventually pushing past 12 million with 26 million WebSocket events per second. Five engineers running 20+ Elixir services. Those aren't benchmarks. Those are production systems.

## BEAM's scheduler gives you guarantees Akka can't

BEAM runs its own preemptive scheduler using reduction counting. Every process gets a budget of roughly 4,000 reductions — function calls, arithmetic, message passes — before the scheduler forces a context switch. No single process can starve the others. It's deterministic and baked into the VM.

Akka actors run on JVM thread pools. The JVM relies on OS-level thread scheduling, which doesn't have the same granular control over application-level fairness. A blocking or compute-heavy actor can hog a thread and degrade latency for everything sharing that pool. You can mitigate it with dispatcher configuration, but you're managing around the problem rather than having the runtime solve it for you.
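Reductions aren't hidden bookkeeping; the VM exposes the running count per process, which makes the accounting concrete. A small sketch (the exact count charged for the loop will vary by OTP version):

```elixir
# Reductions are BEAM's unit of scheduled work — roughly one per
# function call. The scheduler preempts a process after its slice
# of ~4_000 reductions, so no process can monopolize a scheduler.
{:reductions, before} = Process.info(self(), :reductions)

Enum.each(1..10_000, fn _ -> :ok end)     # burn some reductions

{:reductions, after_work} = Process.info(self(), :reductions)
IO.puts("reductions charged for the loop: #{after_work - before}")
```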

## Distribution is a first-class concept vs an added layer

On BEAM, you send a message to a process on another node with the same syntax as to a local one. Supervisors and monitors work across nodes. Node discovery is native. Distribution isn't a library — it's part of the runtime.
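A sketch of what location transparency looks like in practice. This assumes two nodes started with short names (e.g. `iex --sname pipeline` on another host) — the node and process names here are hypothetical:

```elixir
# Connect to a peer node in the cluster (name is illustrative).
Node.connect(:"pipeline@otherhost")

# Same send/2 call either way. A {name, node} tuple addresses a
# process registered on a specific remote node; a bare name is local.
send({:ingest, :"pipeline@otherhost"}, {:reading, 42})
send(:ingest, {:reading, 42})

# Monitoring crosses node boundaries too: this process gets a
# {:nodedown, node} message if the remote node disappears.
Node.monitor(:"pipeline@otherhost", true)
```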

Akka Cluster works and it's proven at LinkedIn and PayPal. But it's configuration you maintain, tune, and debug. For smaller teams, that operational overhead is time not spent on the product.

## Then there's the licensing situation

Lightbend moved Akka from Apache 2.0 to the Business Source License in late 2022. Companies above $25 million in revenue now pay $1,995 to $2,995 per core for production use. The community responded by forking Akka into Apache Pekko, which graduated as a Top-Level Project in 2024. Pekko has solid momentum — Play Framework, Apache Flink, and others have adopted it — but you're still betting on a fork of a project whose original maintainer walked away from open source.

Elixir and Erlang are Apache 2.0. No revenue thresholds. No surprises.

## Where Akka still makes sense

The JVM ecosystem is massive. Library availability, tooling maturity, hiring pool — if your team has deep JVM expertise and your workload is more about raw computational throughput than massive concurrency, Akka is a strong choice. HotSpot's JIT compilation will outperform BEAM on pure number crunching. That's not a knock on BEAM. It was never optimized for speed. It was optimized for safety.

Akka has also been battle-tested at serious scale. If your concurrency requirements are moderate and your team already knows the JVM inside out, migrating to BEAM for theoretical purity would be over-engineering.

## What I took away from this

Both Akka and Elixir implement the actor model. But the runtime underneath changes what guarantees you actually get. BEAM gives you per-process GC, preemptive scheduling, hot code swapping, and native distribution not as features on top of the VM, but as the VM itself. Akka gives you excellent actor abstractions on top of a runtime that was designed for different problems.

For greenfield systems where the core requirement is fault-tolerant concurrency at scale — IoT pipelines, real-time ingestion, anything where uptime isn't negotiable — BEAM is the more natural fit. Not because Akka can't do it. It can. But with BEAM you're working with the grain of the runtime instead of around it.