Scale and Performance

Microbus is designed to scale horizontally without bottlenecks and to perform well per instance. The two are complementary axes - scaling adds throughput by adding instances; performance raises the throughput each instance can deliver. The substrate works hard at both, and the work is done at the layer where it matters rather than left to the operator to figure out at deploy time.

This page consolidates how each axis works. Each subsection links to the deeper doc for that mechanism.

Scaling Horizontally

Every component of a Microbus deployment scales horizontally. There is no central element that becomes a bottleneck as the system grows - the transport, the microservices, the workflow engine, the cache, the configurator, the trust-root services, and the ingress all add capacity by adding instances rather than upgrading hardware.

NATS Cluster

NATS is the transport layer and the foundation of the substrate’s scalability. A NATS cluster forms a full mesh: every broker connects to every other broker, and any message traverses at most two nodes regardless of cluster size. There is no central broker, no leader, and no shared state that bounds throughput.

Adding capacity is operationally cheap:

  • Add a NATS node to absorb more publishers, subscribers, and message volume.
  • Span clusters across availability zones via NATS clustering.
  • Span regions via NATS supercluster (gateway connections).

The framework treats clustering as transparent: microservices connect to any node and the cluster routes their traffic. Connection-level concerns - TLS, credentials, slow-consumer policy - are documented in NATS Connection Settings.

Replicating Microservices

Every microservice is independently replicable. Multiple instances of the same microservice subscribe to the same NATS subjects, and the broker delivers each request to exactly one of them. The result is built-in load balancing across replicas with no external load balancer required.

Three properties make replica scaling trivial:

  • Dynamic discovery. A new replica becomes discoverable the instant it subscribes; a stopping replica drops out the instant it unsubscribes. There is no service registry to update and no DNS TTL to wait through.
  • Load balancing. The NATS queue group dispatches each message to one random replica. Load distribution is automatic and stateless.
  • Locality-aware routing. Unicast requests prefer replicas in the same availability zone as the caller, reducing cross-AZ traffic and tail latency at scale.

Adaptable topology covers the patterns - simple replication of a bundled application, weighted replication when one microservice needs more capacity than its siblings, asymmetrical hardware, and individually-wrapped applications - with diagrams.
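The locality-aware routing preference above can be sketched in a few lines of Go. This is an illustrative selection function only - the `Replica` type and `pickReplica` name are assumptions for the sketch, not the framework's API, which performs this routing internally.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Replica describes one instance of a microservice.
// The type and its fields are illustrative, not the framework's API.
type Replica struct {
	ID string
	AZ string // availability zone
}

// pickReplica prefers a random replica in the caller's availability
// zone and falls back to any replica when none is local, mirroring
// the routing preference described above.
func pickReplica(replicas []Replica, callerAZ string) Replica {
	local := make([]Replica, 0, len(replicas))
	for _, r := range replicas {
		if r.AZ == callerAZ {
			local = append(local, r)
		}
	}
	if len(local) > 0 {
		return local[rand.Intn(len(local))]
	}
	return replicas[rand.Intn(len(replicas))]
}

func main() {
	replicas := []Replica{
		{ID: "a1", AZ: "us-east-1a"},
		{ID: "b1", AZ: "us-east-1b"},
		{ID: "b2", AZ: "us-east-1b"},
	}
	// A caller in us-east-1b is always routed to b1 or b2.
	r := pickReplica(replicas, "us-east-1b")
	fmt.Println("routed to", r.ID)
}
```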

Replicating Core Microservices

Every core microservice replicates the same way solution microservices do, since they all share the same Connector. There is no special path.

The HTTP ingress is the one operators most often replicate behind a load balancer, since it fronts all external traffic. The rest typically need fewer replicas because their workloads are smaller, but the ceiling is the operator's choice.

Replicating the Foreman

The Foreman orchestrates agentic workflows and is itself a microservice. It scales the same way every other microservice does: deploy multiple replicas, and the broker distributes pending steps across them. There is no leader, no coordinator. Each replica picks up steps independently from the SQL store.

Two scaling levers specific to the Foreman:

  • Replica count. More Foreman replicas mean higher concurrent step throughput. The replicas coordinate through the database; there is no in-memory shared state.
  • Database sharding. The Foreman’s NumShards config distributes flows across multiple database instances. Each shard is opened and migrated independently. Shards can be added dynamically (but never removed).

A workload that runs many concurrent flows scales by adding Foreman replicas. A workload whose database becomes the bottleneck scales by sharding - executing a flow involves frequent reads and writes per step (state load, step record append, transition evaluation, state save), and sharding distributes that I/O across multiple database instances. The two levers are independent.
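One plausible way to distribute flows across NumShards database instances is a stable hash of the flow ID. The `shardOf` function below is an illustrative sketch, not the Foreman's actual assignment scheme - and since shards can be added but never removed, a real implementation would likely record a flow's shard at creation rather than recompute it, so the hash only illustrates an initial assignment.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardOf maps a flow ID to one of numShards database instances using
// a stable FNV-1a hash. Illustrative only; the Foreman's actual
// sharding function may differ.
func shardOf(flowID string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(flowID))
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	// The same flow always maps to the same shard, so every step of a
	// flow reads and writes the same database instance.
	fmt.Println(shardOf("flow-1234", 4) == shardOf("flow-1234", 4)) // true
}
```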

Distributed Cache

The distributed cache is per-microservice and replica-scoped. Cache entries are sharded across the replicas of a microservice via consistent hashing. Adding a replica increases cache capacity proportionally; removing one reduces capacity but does not invalidate other replicas’ entries.

Two important properties:

  • Capacity scales with replica count. Doubling the replicas doubles the cache capacity for that microservice.
  • No central cache. Cross-microservice cache sharing is deliberately absent. Each microservice’s cache is its own; coupling microservices through a shared cache is the anti-pattern this design rules out.
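Consistent hashing is what lets a replica join or leave without invalidating the other replicas' entries: only the keys adjacent to the new replica's ring positions move. The following is a minimal sketch of the idea - the `ring` type and its methods are illustrative, not the framework's cache implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a minimal consistent-hash ring. Each replica is placed at
// several points on the ring; a key is owned by the first replica
// point at or after the key's hash, wrapping around at the end.
type ring struct {
	points []uint32
	owner  map[uint32]string
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newRing(replicas []string, pointsPerReplica int) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, rep := range replicas {
		for i := 0; i < pointsPerReplica; i++ {
			p := hash32(fmt.Sprintf("%s#%d", rep, i))
			r.points = append(r.points, p)
			r.owner[p] = rep
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// ownerOf returns the replica responsible for a cache key.
func (r *ring) ownerOf(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	two := newRing([]string{"replica-1", "replica-2"}, 64)
	three := newRing([]string{"replica-1", "replica-2", "replica-3"}, 64)
	moved := 0
	for i := 0; i < 1000; i++ {
		key := fmt.Sprintf("key-%d", i)
		if two.ownerOf(key) != three.ownerOf(key) {
			moved++
		}
	}
	// Adding a replica moves only a fraction of the keys (roughly a
	// third here); the rest stay put, which is why existing entries
	// survive scaling.
	fmt.Printf("%d of 1000 keys moved\n", moved)
}
```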

Application Bundling and Scale

How microservices are bundled into applications is independent of broker-level enforcement, but it does shape scaling in practice:

  • Bundling several microservices into one application means they scale together. One replica adds capacity to all of them at once.
  • Splitting a high-traffic microservice into its own application lets it scale independently. The framework’s topology tooling makes this an operator decision rather than a code change.
  • Application bundling covers the security implications of these choices, which often correlate with scaling decisions (trust roots get their own application; broad-ACL services get their own).

Performance

Per-instance performance is the other half of throughput. Benchmarks indicate Microbus can process upward of 94,500 req/sec on a 10-core MacBook Pro M1 CPU connected to a messaging bus on localhost. The performance work happens at the layer where it matters - in the runtime, in the transport, and in the routing.

Go Runtime

Microbus is written in Go. Microservices are not heavy OS processes - they are goroutines that consume messages from NATS subscriptions. The runtime properties this gives the framework:

  • Lightweight concurrency. A microservice can serve thousands of concurrent requests as cheap goroutines. There is no thread-per-request overhead, no thread pool tuning.
  • Fast startup. A bundled application of dozens of microservices starts in seconds. Iteration is tight; restarts during local development are not a wait.
  • Small memory footprint. A microservice’s working set is its goroutines, its handlers, and its NATS connection buffers - on the order of megabytes per microservice, not hundreds of megabytes.
  • Static binaries. A Microbus application compiles to a single binary. No interpreter, no JVM, no installed dependency tree on the host. Container images are small and start fast.

These are properties of Go itself, made available throughout the framework by writing every layer in Go - the Connector, the core microservices, and the codegen all run in the same runtime.
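The lightweight-concurrency claim can be sketched directly: handling each request on its own goroutine, the way a microservice handles each incoming message. The function below is an illustration of the runtime property, not framework code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// serveConcurrently handles n requests as n goroutines. Each goroutine
// costs kilobytes of stack, not an OS thread, so tens of thousands of
// concurrent handlers are cheap.
func serveConcurrently(n int) int64 {
	var handled atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			handled.Add(1) // stand-in for real request processing
		}()
	}
	wg.Wait()
	return handled.Load()
}

func main() {
	fmt.Println(serveConcurrently(10000)) // 10000
}
```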

Persistent Multiplexed Connections

Microservices connect to NATS over a persistent multiplexed connection. The connection is established at startup and stays open for the microservice’s lifetime, monitored constantly by NATS pings and reconnected automatically on failure.

Multiplexing means many in-flight requests share one connection:

  • No per-request connection setup. Unlike HTTP/1.1’s connection-per-request model (or the connection-pool dance of keep-alive), every Microbus request reuses the same connection.
  • No head-of-line blocking. Multiple concurrent requests over the same connection do not block each other.
  • Lower kernel overhead. Fewer sockets, fewer file descriptors, fewer TCP handshakes.

The result is dramatically less per-request overhead than HTTP-on-TCP, especially for chatty service-to-service patterns.
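A toy model helps show what multiplexing buys: many in-flight requests share one "wire", and responses are matched back to their callers by ID. This sketches the idea behind correlated request/reply over a shared connection - it is not the framework's or NATS's actual wire protocol, and all names here are invented for the sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// frame is one message on the shared "connection".
type frame struct {
	id      int
	payload string
}

// mux multiplexes many concurrent requests over a single channel,
// correlating responses back to callers by ID.
type mux struct {
	mu      sync.Mutex
	nextID  int
	pending map[int]chan string
	wire    chan frame // the single shared "connection"
}

func newMux() *mux {
	m := &mux{pending: map[int]chan string{}, wire: make(chan frame)}
	// The "server" echoes each frame back. Each frame is handled in
	// its own goroutine, so responses interleave freely across
	// requests - no head-of-line blocking.
	go func() {
		for f := range m.wire {
			go m.deliver(f.id, "echo:"+f.payload)
		}
	}()
	return m
}

func (m *mux) deliver(id int, payload string) {
	m.mu.Lock()
	ch := m.pending[id]
	delete(m.pending, id)
	m.mu.Unlock()
	ch <- payload
}

// request sends one request over the shared connection and waits for
// its correlated response.
func (m *mux) request(payload string) string {
	m.mu.Lock()
	m.nextID++
	id := m.nextID
	ch := make(chan string, 1)
	m.pending[id] = ch
	m.mu.Unlock()
	m.wire <- frame{id: id, payload: payload}
	return <-ch
}

func main() {
	m := newMux()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		i := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			got := m.request(fmt.Sprintf("req-%d", i))
			if got != fmt.Sprintf("echo:req-%d", i) {
				panic("response matched to wrong request")
			}
		}()
	}
	wg.Wait()
	fmt.Println("100 concurrent requests over one connection")
}
```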

Short-Circuit Transport

When two microservices are bundled into the same application (and therefore the same OS process), service-to-service calls between them can take a short-circuit transport - in-process Go channels instead of NATS.

The short-circuit path:

  • Skips the NATS round-trip entirely.
  • Skips message serialization and deserialization (where the language stack permits).
  • Drops latency to the bare cost of a goroutine context switch.

The framework chooses the short-circuit path automatically when both endpoints are in-process. Cross-application traffic always goes through NATS, which is what gives the security model its broker-side enforcement. Operators tune this lever via application bundling: bundle for performance, separate for isolation.
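In spirit, the short-circuit path reduces a service-to-service call to a channel round-trip between goroutines. The sketch below illustrates that cost model only; the Connector's actual mechanism is internal to the framework, and `startLocalService` and `call` are names invented for this example.

```go
package main

import (
	"fmt"
)

// call represents an in-process request: the response comes back on a
// channel instead of over the network.
type call struct {
	payload string
	reply   chan string
}

// startLocalService runs a "microservice" as a goroutine consuming
// calls from an in-process channel - no broker round-trip, no
// serialization.
func startLocalService(handler func(string) string) chan<- call {
	in := make(chan call)
	go func() {
		for c := range in {
			c.reply <- handler(c.payload)
		}
	}()
	return in
}

func main() {
	greeter := startLocalService(func(req string) string {
		return "hello, " + req
	})
	// A "call" between two bundled microservices is just a channel
	// round-trip: roughly the cost of a goroutine context switch.
	c := call{payload: "world", reply: make(chan string, 1)}
	greeter <- c
	fmt.Println(<-c.reply) // hello, world
}
```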

Time Budget and Fail-Fast Patterns

Two transport-layer patterns avoid wasted work and tail-latency cliffs:

  • Time budget. A depleting timeout that propagates downstream along the call chain. A request with 200ms left does not waste cycles starting a 500ms downstream call - it fails immediately. This bounds tail latency cleanly even when the call graph is deep.
  • Ack or fail fast. The server acknowledges receipt of a request before processing it. The client knows whether to wait for a response (ack received) or fail quickly (no ack). This avoids the silent-timeout failure mode where the client hangs against a server that never received the message.

The combined effect is that overload manifests as fast failures rather than queue buildup and cascading slowness.

See Also