The Mesh

Architecture Decisions

Why The Mesh was built this way — key architecture decision records and the reasoning behind them

Every decision has a reason. If a decision seems wrong, read the context first. If it is still wrong, open a PR.


Decision Summary

| # | Decision | Choice | Why |
|---|---|---|---|
| ADR-001 | Server language | Go 1.24+ | Goroutines handle hundreds of concurrent WebSocket connections without Node.js event loop bottlenecks. |
| ADR-002 | Default database | SQLite (pure Go) | Zero-config, single file, no external process; passes the walkaway test. |
| ADR-003 | License | AGPL v3 | Network copyleft prevents cloud providers from taking the code without contributing back. |
| ADR-004 | Authorization | UCAN over OAuth | Decentralized proof chains work across federated meshes without a central authority. |
| ADR-005 | Identity | DID:key over DID:web | Self-sovereign identity with no DNS or CA dependency. |
| ADR-006 | Bot lifecycle | Kubernetes Pods | True isolation, resource limits, auto-restart, with subprocess fallback for dev. |
| ADR-007 | Frontend state | Redux Toolkit over Zustand | Enforced patterns, middleware for WebSocket sync, time-travel debugging via DevTools. |
| ADR-008 | Deployment | Single Go binary | One process, one port, one log stream; no microservices complexity for self-hosters. |
| ADR-009 | Agent protocols | MCP + A2A | Open standards with growing adoption; any MCP-compatible agent connects without Mesh-specific code. |
| ADR-010 | Model strategy | OSS-first | Self-hosted models keep data in your mesh; API models supported as convenience fallback. |
| ADR-011 | Funding | Token over VC | Open-source sovereignty infrastructure should not be owned by a venture fund. |
| ADR-012 | UI framework | Tailwind v4, dark-only | Consistent visual identity, simpler CSS, no dual-theme complexity. |

Expanded Highlights

Go Over Node.js (ADR-001)

The original v1 was Node.js/Express with an ECS architecture. Performance issues emerged with real-time WebSocket traffic at scale: memory overhead, and event loop bottlenecks when WebSocket, bot lifecycle, and file I/O workloads ran together.

Goroutines handle this I/O-heavy, highly concurrent workload more efficiently than a single event loop. The result is a single binary (~20MB) with no node_modules and no runtime dependency. The trade-off is the loss of TypeScript type sharing between server and client, which is mitigated by the packages/protocol package defining wire types as Zod schemas.
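To make the concurrency model concrete, here is a minimal sketch of the goroutine-per-connection pattern. It is not The Mesh's server code: plain channels stand in for WebSocket connections, and the name handleAll is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// handleAll simulates `conns` connections, each delivering `perConn` messages.
// Every connection gets its own reader goroutine; a blocked reader parks only
// itself, so slow connections do not stall the others.
// Assumption: a channel stands in for a real WebSocket connection.
func handleAll(conns, perConn int) int {
	var handled atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < conns; i++ {
		inbox := make(chan string)
		wg.Add(2)
		// Reader: one goroutine per connection.
		go func() {
			defer wg.Done()
			for range inbox {
				handled.Add(1)
			}
		}()
		// Simulated remote peer sending messages, then disconnecting.
		go func() {
			defer wg.Done()
			for j := 0; j < perConn; j++ {
				inbox <- "ping"
			}
			close(inbox)
		}()
	}
	wg.Wait()
	return int(handled.Load())
}

func main() {
	const conns, perConn = 500, 3
	fmt.Printf("handled %d messages across %d connections\n",
		handleAll(conns, perConn), conns)
}
```

The point of the sketch is that no scheduling code is written by hand: the Go runtime multiplexes the goroutines onto OS threads.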

Rust was considered and rejected: its learning curve is too steep for contributors, which violates the walkaway test.

SQLite as Default (ADR-002)

The Mesh targets solo operators and small teams first. Requiring Postgres or MongoDB for a single-node deployment adds unnecessary infrastructure complexity. SQLite via modernc.org/sqlite (pure Go, no CGo) gives zero-config storage in a single file. Backup is a file copy.

SQLite's limitation (a single writer, which makes it unsuitable for horizontal scaling) is mitigated by the storage adapter pattern. The same API surface works with MongoDB for multi-node cloud deployments, selected via the MONGODB_URI environment variable.
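The storage adapter pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual API: the Store interface, NewStore, and memStore are hypothetical names, and an in-memory map stands in for both backends so the example runs without a database driver. Only the MONGODB_URI variable comes from this document.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Store is a hypothetical adapter interface; callers depend only on this,
// never on a concrete database.
type Store interface {
	Name() string
	Put(key, value string) error
	Get(key string) (string, error)
}

// ErrNotFound is returned when a key has no value.
var ErrNotFound = errors.New("key not found")

// memStore is an in-memory stand-in for a real SQLite or MongoDB adapter.
type memStore struct {
	name string
	data map[string]string
}

func (m *memStore) Name() string { return m.name }

func (m *memStore) Put(key, value string) error {
	m.data[key] = value
	return nil
}

func (m *memStore) Get(key string) (string, error) {
	v, ok := m.data[key]
	if !ok {
		return "", ErrNotFound
	}
	return v, nil
}

// NewStore selects a backend: an empty URI means the SQLite default,
// anything else means MongoDB. (Assumed selection rule.)
func NewStore(mongoURI string) Store {
	if mongoURI != "" {
		return &memStore{name: "mongodb", data: map[string]string{}}
	}
	return &memStore{name: "sqlite", data: map[string]string{}}
}

func main() {
	store := NewStore(os.Getenv("MONGODB_URI"))
	if err := store.Put("room:1", "general"); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	v, err := store.Get("room:1")
	if err != nil {
		fmt.Println("get failed:", err)
		return
	}
	fmt.Printf("backend=%s room:1=%s\n", store.Name(), v)
}
```

Because the calling code only sees Store, swapping the backend is a configuration change rather than a code change.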

AGPL v3 License (ADR-003)

If a cloud provider takes the code, wraps it in a managed service, and never contributes back, the open-source community gets nothing. Permissive licenses such as MIT and Apache allow exactly this.

AGPL v3 adds network copyleft: if you modify The Mesh and offer it as a service, you must share your modifications. A dual-license commercial option is available for enterprises that need an AGPL exemption. This model is well established: MongoDB, Elastic, and GitLab have all used variants of it.

UCAN Over OAuth (ADR-004)

Traditional auth systems require a central authority to validate tokens. In a federated mesh, there is no central authority. UCANs form cryptographic proof chains verifiable without contacting the issuer.

The Anti-CLU Principle, named for the antagonist in Tron: Legacy who granted himself escalating privileges, requires that capabilities only narrow, never expand. An agent spawned with read-only access to one room cannot grant itself write access to all rooms, regardless of what code it runs. UCAN enforces this at the protocol level.
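The narrowing rule can be illustrated with a simplified delegation check. This is a sketch, not a UCAN implementation: real UCANs are signed tokens with DID principals and richer capability semantics, and signature verification is omitted here entirely. The Capability, Token, and ValidChain names are hypothetical, and capabilities match only by exact equality.

```go
package main

import "fmt"

// Capability is a simplified capability: one action on one resource.
type Capability struct {
	Resource string // e.g. "room:lobby"
	Action   string // e.g. "read" or "write"
}

// Token is a stand-in for one link in a proof chain: Issuer delegates
// Caps to Audience. Signatures are deliberately left out of this sketch.
type Token struct {
	Issuer   string
	Audience string
	Caps     []Capability
}

// covers reports whether the parent capability set contains c.
func covers(parent []Capability, c Capability) bool {
	for _, p := range parent {
		if p == c {
			return true
		}
	}
	return false
}

// ValidChain checks that each token is issued by the previous token's
// audience and claims only capabilities the previous token held.
// chain[0] is treated as the trusted root.
func ValidChain(chain []Token) bool {
	if len(chain) == 0 {
		return false
	}
	for i := 1; i < len(chain); i++ {
		prev, cur := chain[i-1], chain[i]
		if cur.Issuer != prev.Audience {
			return false
		}
		for _, c := range cur.Caps {
			if !covers(prev.Caps, c) {
				return false
			}
		}
	}
	return true
}

func main() {
	read := Capability{Resource: "room:lobby", Action: "read"}
	write := Capability{Resource: "room:lobby", Action: "write"}

	root := Token{Issuer: "owner", Audience: "agent", Caps: []Capability{read}}
	narrowed := Token{Issuer: "agent", Audience: "child", Caps: []Capability{read}}
	escalated := Token{Issuer: "agent", Audience: "child", Caps: []Capability{read, write}}

	fmt.Println("narrowed chain valid:", ValidChain([]Token{root, narrowed}))
	fmt.Println("escalated chain valid:", ValidChain([]Token{root, escalated}))
}
```

The check needs only the chain itself, which mirrors the property that matters in a federated setting: a verifier does not have to contact the issuer.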

Single Binary, Not Microservices (ADR-008)

The Mesh targets self-hosters running on a single machine or a small cluster. A microservices architecture requires service discovery, inter-service communication, and distributed tracing, and that operational complexity violates the walkaway test.

One Go binary owns everything: HTTP API, WebSocket, auth, storage, bot lifecycle, model proxy, federation. One process, one port (4000), one deployment unit. Copy the binary, set environment variables, run it. Horizontal scaling happens at the mesh-to-mesh level (federation), not at the process level.
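As a rough sketch of what "one process, one port" looks like in Go, the example below mounts several subsystems on a single router. Only the model proxy path and port 4000 come from this document; the /api/ and /ws paths, the handler bodies, and the probe helper are assumptions for illustration. The example exercises the router in-process so it terminates; a real server would listen on the port instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// respond returns a placeholder handler that names the subsystem.
func respond(name string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, name)
	}
}

// newMux mounts every subsystem on one router, so a single listener
// serves all of them. Paths other than the model proxy are hypothetical.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/", respond("http-api"))
	mux.HandleFunc("/ws", respond("websocket"))
	mux.HandleFunc("/api/models/v1/chat/completions", respond("model-proxy"))
	return mux
}

// probe sends one in-process request and reports which subsystem answered.
func probe(h http.Handler, path string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	mux := newMux()
	paths := []string{"/api/rooms", "/ws", "/api/models/v1/chat/completions"}
	for _, p := range paths {
		fmt.Printf("%-34s -> %s\n", p, probe(mux, p))
	}
	// A real deployment would block here instead:
	//   log.Fatal(http.ListenAndServe(":4000", mux))
}
```

The more specific model proxy pattern takes precedence over the /api/ subtree, so subsystems can share a URL prefix without a separate gateway.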

OSS-First Model Architecture (ADR-010)

Running open-source models on your own hardware is both a sovereignty position and a security decision. Your data never leaves your mesh. No API provider sees your prompts or completions. No third party trains on your data.

The model proxy at /api/models/v1/chat/completions is OpenAI-compatible, so switching from a centralized API to a self-hosted model (Llama, Mistral, DeepSeek) requires changing a URL, not rewriting integration code. API models are supported as a convenience fallback, not the default path.
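The "change a URL" claim can be sketched with a small client that builds an OpenAI-style chat completion request against a configurable base URL. The request shape (model plus a messages array) follows the OpenAI chat completions format. The hosted base URL, the localhost host name, the model name, and the function names are placeholders, not values from the project.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// completionsURL joins an OpenAI-style base URL with the chat
// completions path, tolerating a trailing slash on the base.
func completionsURL(baseURL string) string {
	return strings.TrimRight(baseURL, "/") + "/chat/completions"
}

// newChatRequest builds (but does not send) a chat completion request.
func newChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, completionsURL(baseURL), bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	bases := []string{
		"https://api.example.com/v1",          // placeholder hosted provider
		"http://localhost:4000/api/models/v1", // mesh model proxy (host assumed)
	}
	for _, base := range bases {
		req, err := newChatRequest(base, "example-model", "hello")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(req.Method, req.URL)
	}
}
```

Everything except the base URL is identical between the two requests, which is the practical meaning of OpenAI compatibility for integration code.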


Further Reading