Method · How we build systems

We prove the code works.

Most agencies hope their code works. We run mathematical verification, drift detection between spec and code, and 8+ destructive tests per surface, all before a single line reaches production. That's why our systems don't panic at three in the morning.

Pipeline: Spec → Allium → Plan → Impl → Playwright → TLA+
Verification: TLA+ formal · Allium drift · Playwright destructive
Stack: .NET · ASP.NET Core · React · Blazor · TypeScript · SQL Server · Postgres · SQLite · Docker Swarm · Azure · Semantic Kernel
Test coverage: 1:1 functional + 8 destructive per surface
AI-augmented: Claude Code · spec register · auto-hooks
In production: 6 systems live on live4.se · all demo-open

"We use agile methodology" is the most meaningless thing you can say about software development in 2026. Everyone says it. Nobody can show you what it means. It's not a process — it's a plaque to hang on the wall after a daily stand-up.

Our method can be audited. It leaves artefacts — spec files, Allium reports, TLA+ models, Playwright reports — that you can read and argue with. It catches race conditions before they reach production, drift between specification and implementation before anyone wonders why the code and the documentation no longer agree, and illogical state transitions before an end user finds them for us.

The pipeline is not a secret. It runs on every non-trivial change in every system we own. Below is what each phase does, why it exists, and what it means in practice.

We don't build more features than the customer needs. The features we do build have provable logic behind them.

The pipeline in six phases.

Each phase produces artefacts that humans can read and machines can verify. No phase-skipping, no shortcuts. The phases run in order on every non-trivial change — whether that's a new field on a form or a new domain model.

Phase 01

Specification.

We write what the system should do before we write how. One spec file per feature, reviewed before any code is written. Plus a clarification pass that catches gaps that would otherwise surface in production.

/specify · /clarify
Phase 02

Formal spec (Allium).

The spec is formalised in Allium, a specification interrogation language. Inconsistencies and ambiguities surface as open questions before implementation. Later, the finished code is compared against the spec to detect drift.

/allium:elicit
Phase 03

Plan + tasks.

The spec is broken down into a plan and a dependency-ordered task list. An analysis pass then auto-applies every recommended remediation. Implementation begins against a fully worked-out blueprint, not a jumble of ideas.
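To make "dependency-ordered" concrete, here is a minimal sketch that sorts a task list so every task comes after the tasks it depends on. The Task shape, the task ids and the orderTasks function are hypothetical, and the approach (Kahn's algorithm) is one common way to do this, not a description of what /tasks does internally.

```typescript
// Hypothetical task shape: ids and dependencies are illustrative only.
interface Task {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm: returns task ids so that every task comes after its
// dependencies. Throws on an unknown dependency or a dependency cycle.
function orderTasks(tasks: Task[]): string[] {
  const remaining = new Map<string, number>(); // unmet dependency count per task
  const dependents = new Map<string, string[]>(); // task id -> tasks waiting on it

  for (const t of tasks) {
    remaining.set(t.id, t.dependsOn.length);
    dependents.set(t.id, []);
  }
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      const waiting = dependents.get(dep);
      if (waiting === undefined) {
        throw new Error(`Task ${t.id} depends on unknown task ${dep}`);
      }
      waiting.push(t.id);
    }
  }

  const ready = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const ordered: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    ordered.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = (remaining.get(next) ?? 0) - 1;
      remaining.set(next, left);
      if (left === 0) ready.push(next);
    }
  }

  if (ordered.length !== tasks.length) {
    throw new Error("Dependency cycle in task list");
  }
  return ordered;
}

const plan: Task[] = [
  { id: "api-endpoint", dependsOn: ["domain-model"] },
  { id: "domain-model", dependsOn: [] },
  { id: "playwright-tests", dependsOn: ["api-endpoint", "ui-form"] },
  { id: "ui-form", dependsOn: ["api-endpoint"] },
];

console.log(orderTasks(plan));
// → [ 'domain-model', 'api-endpoint', 'ui-form', 'playwright-tests' ]
```

A cycle or a dangling dependency raises an error instead of producing a partial order, which fits the idea that implementation should not start on an inconsistent plan.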

/plan · /tasks · /speckit.analyze
Phase 04

Implementation.

Code is written against the plan — not in an abstract sense, but task by task. Backend-first as a principle, not a slogan: all authorisation, audit, soft-delete and idempotency are enforced server-side. The frontend is a thin view layer.
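As an illustration of what "enforced server-side" can look like for two of those concerns, the sketch below handles idempotency keys and soft-delete in a single service. It is written in TypeScript for brevity rather than in the .NET backend the text describes, and IssueService, its methods and the in-memory storage are all hypothetical; a real backend would more likely lean on a database unique constraint for the key.

```typescript
// Hypothetical in-memory stand-in for a server-side store.
interface Issue {
  id: number;
  title: string;
  deletedAt: Date | null; // null means the issue is active
}

class IssueService {
  private issues: Issue[] = [];
  private resultsByKey = new Map<string, Issue>();
  private nextId = 1;

  // A repeated request with the same idempotency key returns the original
  // result instead of creating a second issue.
  create(idempotencyKey: string, title: string): Issue {
    const previous = this.resultsByKey.get(idempotencyKey);
    if (previous !== undefined) return previous;
    const issue: Issue = { id: this.nextId++, title, deletedAt: null };
    this.issues.push(issue);
    this.resultsByKey.set(idempotencyKey, issue);
    return issue;
  }

  // Soft delete: the record is marked, never physically removed.
  softDelete(id: number, now: Date = new Date()): boolean {
    const issue = this.issues.find((i) => i.id === id && i.deletedAt === null);
    if (issue === undefined) return false;
    issue.deletedAt = now;
    return true;
  }

  // Reads hide soft-deleted records.
  listActive(): Issue[] {
    return this.issues.filter((i) => i.deletedAt === null);
  }

  // Counts every stored record, including soft-deleted ones.
  countStored(): number {
    return this.issues.length;
  }
}

const service = new IssueService();
const first = service.create("key-123", "Printer on fire");
const retry = service.create("key-123", "Printer on fire");
console.log(first.id === retry.id, service.countStored()); // true 1

service.softDelete(first.id);
console.log(service.listActive().length, service.countStored()); // 0 1
```

Because both rules live in the service, a client that retries a request or hides a delete button cannot change the outcome.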

.NET · React · EF Core · SSE
Phase 05

Browser tests.

1:1 functional coverage: twelve functions get twelve tests. Plus eight destructive tests per surface across six categories: input validation, authorisation, injection (XSS, SQL injection), race conditions, boundary values, state corruption. All in Playwright. Never just the happy path.

Playwright · Functional + destructive
Phase 06

Formal verification (TLA+).

State machines are modelled in TLA+ and invariants proven mathematically. The TLC model checker finds race conditions, deadlocks, and states nobody thought of. What comes out of that phase is not a test that "passed" — it's a proof.

TLA+ · TLC · Invariants

What most agencies do. What we do.

An honest comparison against what we find when we take over a system built by someone else. Not an attack — just a calibration of what "production-ready" actually means.

Testing

A manual click-through before release.

Typical agency. Someone clicks through the system, sees nothing exploding, and presses deploy. Edge cases are discovered by the customer.

Manual · Wishful thinking
Testing

1:1 functional + 8 destructive.

Us. Every implemented function has at least one Playwright test. Plus eight destructive tests per surface that try to break the system in the six most common ways it can be broken.

Playwright · Auditable
Spec

The spec lives in a Confluence doc.

Typical agency. The spec is written once, then drifts away from the code. Six months in, nobody can answer the question "why does it work like this?".

Drift · Forgetting
Spec

Allium drift between spec and code.

Us. Spec and code are compared automatically. Drift surfaces as open questions, not guesses. The spec is source code — versioned, reviewed, always in sync.

Allium · Drift detection
State

Race conditions found in production.

Typical agency. A state machine is sketched on a whiteboard, then coded on the fly. Race conditions are found by unlucky users on a Thursday afternoon.

Whiteboard · Hope
State

TLA+ proves invariants.

Us. State machines are modelled formally, invariants proven. TLC finds the cluster of states the logic doesn't handle — before it becomes a production incident.

TLA+ · Provable

The stack — what we choose, and why.

Few technical choices, made once, deepened over ten years. We don't change stacks every sprint. We change when there's an actual reason, and then we justify it in writing.

Backend

.NET · ASP.NET Core.

A mature, fast, free runtime with a strong type system. EF Core with SaveChanges interceptors that capture the audit trail centrally — no code can sneak past. We've been writing .NET since version 1.1.

.NET · EF Core · ASP.NET
Frontend

React · Blazor where it fits.

React as the default for new frontends — server components, suspense, native streaming. Blazor WASM for PWAs where end-to-end C# is worth it. Never SPA religion: we choose per system.

React · Blazor WASM · TypeScript
Database

SQLite. Even in production.

SQLite on NFS holds multiple projects without tenant_id ceremony. Backups are files you can zip. No ops cost for a separate database server. We switch to PostgreSQL the day load demands it — rarely sooner.

SQLite · WAL · NFS
AI

Self-hosted LLM that fails open.

A local model inside the Docker Swarm cluster. If the AI is down, the system keeps working: auto-categorisation falls back to manual and nothing crashes. GDPR-friendly because data never leaves the cluster.
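A minimal sketch of the fail-open idea, assuming the local model is reached through some asynchronous classifier function. The Classifier type, the categorise and withTimeout helpers and the two-second default are hypothetical and only illustrate the pattern: any error or timeout routes the item to manual handling instead of failing the request.

```typescript
type Categorisation =
  | { source: "ai"; category: string }
  | { source: "manual"; category: null };

// Hypothetical signature for whatever client talks to the local model.
type Classifier = (text: string) => Promise<string>;

// Rejects if the work does not settle within ms milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Fail open: a broken or slow classifier never blocks the caller.
async function categorise(
  text: string,
  classify: Classifier,
  timeoutMs = 2000,
): Promise<Categorisation> {
  try {
    const category = await withTimeout(classify(text), timeoutMs);
    return { source: "ai", category };
  } catch {
    return { source: "manual", category: null };
  }
}

// Simulated outage: the item falls back to manual categorisation.
categorise("Invoice 4711", async () => {
  throw new Error("model offline");
}).then((result) => console.log(result.source)); // manual
```

The caller only ever sees a result, so whatever depends on categorisation can continue and show the item as "to be categorised manually".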

Local LLM · Semantic Kernel · RAG
Ops

Docker Swarm on Azure.

We own the whole chain: architecture, database, frontend, ops. Our own Swarm cluster on live4.se. No managed services with murky pricing. You know exactly what's running and where.

Docker Swarm · Azure · live4.se
Tooling

Claude Code · spec register.

AI-augmented development with deterministic hooks. A spec register as the source of truth for what gets built next. Pipeline hooks that block feature work without a spec. The AI follows a method we wrote down, reviewed, and versioned.

Claude Code · Hooks · Subagents

The full breadth of competence.

We pick few technologies per system, but the list of what we actually know is longer than that. 25+ years of deliveries give us a broad base to lean on when the problem doesn't fit the standard solution.

Backend

The .NET ecosystem.

.NET 6–10 · C# · ASP.NET Core · .NET Aspire · EF Core · Dapper · MediatR · Quartz.NET · SignalR · gRPC · GraphQL · REST · Clean Architecture · CQRS · DDD · multi-tenant SaaS · microservices · Node.js

Frontend

Web and mobile.

TypeScript · JavaScript · React · React Native · Expo · Next.js · Vue · Svelte · Blazor WASM · Blazor Server · HTMX · Tailwind · Bootstrap · jQuery · HTML · CSS · WordPress · Android (Java) · iOS (Obj-C)

Databases

Relational, document, search.

SQL Server · T-SQL · PostgreSQL · pgvector · pg_trgm · MySQL · MongoDB · Redis · SQLite · Elasticsearch · index tuning · backup strategies · execution plans · schema migrations

AI & formal verification

Agents, RAG, provable logic.

Semantic Kernel · Microsoft Agent Framework · Azure OpenAI · Ollama · local LLMs · Claude Code · Cursor · GitHub Copilot · RAG · embeddings · hybrid search (pgvector + pg_trgm) · TLA+ · Allium · spec-driven dev · custom skills

Cloud (Azure)

Services we use.

App Service · Functions · Container Apps · Service Bus · Event Grid · SQL · Storage · Key Vault · API Management · Entra ID · Monitor · App Insights

Containers & infra

Ops, virt, networking.

Docker · Docker Swarm · Docker Compose · Kubernetes · Linux (Ubuntu/Debian) · Windows Server · Active Directory · DNS · IIS · Nginx · virtualisation · GPU farm

CI/CD & test

Pipelines and quality gates.

Azure DevOps · GitHub Actions · Bitbucket · Bicep (IaC) · xUnit · NUnit · Playwright · Testcontainers · k6 stress tests · code review · pull requests · Git/GitHub

Security & identity

Auth, GDPR, accessibility.

OAuth2 · OpenID Connect · JWT · WebAuthn · Passkeys · GDPR · Zero Trust · audit trail · soft-delete · idempotency keys · WCAG 2.1 AA

Integrations & data

Payment, communication, data governance.

Stripe · Swish · Twilio · Mailjet · Google APIs · Fortnox · SCORM · OpenTelemetry · Prometheus · Grafana · data governance · data catalogue · metadata management · concept modelling · data quality

The long version.

For anyone who wants the technical depth — here is what each phase actually produces, why it exists, and how it connects to the next. Broadly: the pipeline is a chain of artefacts where each phase takes in a document from the previous phase and produces a new document the next phase can read. No verbal handoffs. No "I think we talked about that at Wednesday's stand-up".

Allium — drift as a data point.

Allium is an interrogation language: you write a specification, then run an elicitation pass that finds gaps, ambiguities and inconsistencies. It produces open questions that must be answered before we move on. Later a distillation pass compares the finished code against the spec — drift surfaces in three categories: specified-but-not-implemented, implemented-but-not-specified, and behavioural drift. Each item gets an individual decision: fix now, defer (track in spec), or dismiss with a reason. No group decisions to "deal with it later" — every finding has an owner and a date.
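The three drift categories can be illustrated with a small comparison. This is not Allium's own format or output: the behaviour names, the one-line rules and the findDrift function are hypothetical, and real drift detection works on far richer material than matching strings.

```typescript
// Hypothetical representation: behaviour name -> one-line rule.
type Behaviours = Map<string, string>;

type DriftKind =
  | "specified-but-not-implemented"
  | "implemented-but-not-specified"
  | "behavioural-drift";

interface DriftItem {
  behaviour: string;
  kind: DriftKind;
}

// Compares what the spec says against what the code does and sorts every
// difference into one of the three categories.
function findDrift(spec: Behaviours, code: Behaviours): DriftItem[] {
  const items: DriftItem[] = [];
  for (const [name, rule] of spec) {
    const implemented = code.get(name);
    if (implemented === undefined) {
      items.push({ behaviour: name, kind: "specified-but-not-implemented" });
    } else if (implemented !== rule) {
      items.push({ behaviour: name, kind: "behavioural-drift" });
    }
  }
  for (const name of code.keys()) {
    if (!spec.has(name)) {
      items.push({ behaviour: name, kind: "implemented-but-not-specified" });
    }
  }
  return items;
}

const spec: Behaviours = new Map([
  ["close-issue", "only the assignee may close"],
  ["reopen-issue", "allowed within 30 days"],
  ["export-csv", "admins only"],
]);
const code: Behaviours = new Map([
  ["close-issue", "only the assignee may close"],
  ["reopen-issue", "always allowed"],
  ["bulk-delete", "admins only"],
]);

for (const item of findDrift(spec, code)) {
  console.log(`${item.behaviour}: ${item.kind}`);
}
// reopen-issue: behavioural-drift
// export-csv: specified-but-not-implemented
// bulk-delete: implemented-but-not-specified
```

Each item in the resulting list is what would then receive its own decision: fix now, defer, or dismiss with a reason.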

TLA+ — proofs, not tests.

A test that passes shows that one execution worked. It says nothing about the executions you didn't test. TLA+ models the system as a state space, and the TLC model checker walks it exhaustively: it tries every possible sequence of events in the model and finds the states where invariants break. For case management: can an issue be both closed and in-progress at the same time? For SSE: what happens if a client reconnects mid-update? For idempotency: what happens when the same idempotency key arrives in two simultaneous requests? Within the bounds of the model, TLC answers mathematically, not probabilistically.
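To show what "trying every sequence of events" means, here is a toy explicit-state checker for the idempotency question. It is neither TLA+ nor TLC: the State shape, the two handler models and the check function are hypothetical and cover only two requests sharing one key. The invariant is that at most one record is ever stored for that key.

```typescript
// Toy model: two requests arrive with the same idempotency key.
interface State {
  pc: [number, number]; // per request: 0 = start, 1 = checked, 2 = done
  sawFree: [boolean, boolean]; // what each request observed when it checked
  records: number; // records stored under the shared key
}

// Returns the next state for one request, or null if it has no step left.
type Step = (s: State, req: 0 | 1) => State | null;

const clone = (s: State): State => ({
  pc: [s.pc[0], s.pc[1]],
  sawFree: [s.sawFree[0], s.sawFree[1]],
  records: s.records,
});

// Naive handler: "check the key" and "insert" are two separate steps.
const naive: Step = (s, req) => {
  const n = clone(s);
  if (s.pc[req] === 0) {
    n.sawFree[req] = s.records === 0;
    n.pc[req] = 1;
    return n;
  }
  if (s.pc[req] === 1) {
    if (s.sawFree[req]) n.records += 1;
    n.pc[req] = 2;
    return n;
  }
  return null;
};

// Atomic handler: check and insert happen in one indivisible step.
const atomic: Step = (s, req) => {
  if (s.pc[req] !== 0) return null;
  const n = clone(s);
  if (s.records === 0) n.records = 1;
  n.pc[req] = 2;
  return n;
};

// Breadth-first search over every reachable state. Returns the first state
// that breaks the invariant, or null if no reachable state does.
function check(step: Step, invariant: (s: State) => boolean): State | null {
  const initial: State = { pc: [0, 0], sawFree: [false, false], records: 0 };
  const seen = new Set<string>([JSON.stringify(initial)]);
  const queue: State[] = [initial];
  while (queue.length > 0) {
    const s = queue.shift()!;
    if (!invariant(s)) return s;
    for (const req of [0, 1] as const) {
      const n = step(s, req);
      if (n === null) continue;
      const key = JSON.stringify(n);
      if (!seen.has(key)) {
        seen.add(key);
        queue.push(n);
      }
    }
  }
  return null;
}

const atMostOneRecord = (s: State): boolean => s.records <= 1;

console.log(check(naive, atMostOneRecord)?.records); // 2
console.log(check(atomic, atMostOneRecord)); // null
```

The naive handler passes any test that sends the requests one after the other; the violation only appears in the interleaving where both requests check before either inserts, which is exactly the kind of state an exhaustive walk is meant to surface.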

Destructive tests — six categories.

Functional tests verify that the happy path works. Destructive tests verify that the system doesn't fall over when somebody tries to break it. We run at least eight destructive scenarios per surface, across six categories.

Input validation bombards fields with overlong strings, invalid characters, nulls and empty values. Authorisation tries to read and write resources the user doesn't own. The injection category feeds XSS into text fields and SQL injection into search parameters. Race conditions force two clients to change the same resource in the same second. Boundary values exercise maximum lengths, maximum counts, and pagination at the final page. State corruption sends changes in the wrong order or against resources that have already been deleted.

None of this is theoretical. Every category has caught real bugs in production systems we've built — before they shipped.
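The "at least eight scenarios, across six categories" rule can be expressed as a simple gate. The sketch below is an illustration only: the Scenario shape, the surface name and the checkSurface function are hypothetical, and it assumes every surface must cover all six categories, which is one reading of the rule above.

```typescript
const CATEGORIES = [
  "input-validation",
  "authorisation",
  "injection",
  "race-condition",
  "boundary-value",
  "state-corruption",
] as const;

type Category = (typeof CATEGORIES)[number];

interface Scenario {
  name: string;
  category: Category;
}

interface GateResult {
  ok: boolean;
  problems: string[];
}

// Checks one surface: enough destructive scenarios, and none of the six
// categories left uncovered (assumption: all six are required per surface).
function checkSurface(surface: string, scenarios: Scenario[], minimum = 8): GateResult {
  const problems: string[] = [];
  if (scenarios.length < minimum) {
    problems.push(`${surface}: ${scenarios.length} scenarios, need at least ${minimum}`);
  }
  for (const category of CATEGORIES) {
    if (!scenarios.some((s) => s.category === category)) {
      problems.push(`${surface}: no scenario for ${category}`);
    }
  }
  return { ok: problems.length === 0, problems };
}

const issueFormScenarios: Scenario[] = [
  { name: "10 000-character title", category: "input-validation" },
  { name: "null and empty required fields", category: "input-validation" },
  { name: "read another user's issue", category: "authorisation" },
  { name: "script tag in comment field", category: "injection" },
  { name: "SQL fragment in search parameter", category: "injection" },
  { name: "two clients save the same issue at once", category: "race-condition" },
  { name: "pagination at the final page", category: "boundary-value" },
  { name: "update an already deleted issue", category: "state-corruption" },
];

console.log(checkSurface("issue-form", issueFormScenarios).ok); // true
```

A gate like this says nothing about whether the scenarios are good; it only makes a missing category visible before anyone has to remember to ask.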

AI-augmented development — without AI religion.

Claude Code is used as a tool, not a replacement. The pipeline above is not AI-generated: it is documented, versioned, and enforced via deterministic hooks. The hooks block feature work without a spec, require clarification before plan, run Allium elicitation before implementation, and refuse to release a feature without both Playwright and TLA+ validation. The AI follows the method; it doesn't invent it. The result: AI speed with human review of architecture decisions.
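"Deterministic" here means the gate is a plain rule over the feature's artefacts, with no model in the loop. A minimal sketch of such a rule follows; FeatureState, Stage and blockers are hypothetical names, and the sketch does not use or represent the actual hook interface of Claude Code.

```typescript
// Hypothetical record of which artefacts exist for a feature.
interface FeatureState {
  hasSpec: boolean;
  clarified: boolean;
  alliumElicited: boolean;
  playwrightPassed: boolean;
  tlaValidated: boolean;
}

type Stage = "plan" | "implement" | "release";

// Returns the reasons a feature may not enter the given stage.
// An empty list means the gate is open.
function blockers(stage: Stage, f: FeatureState): string[] {
  const reasons: string[] = [];
  if (!f.hasSpec) reasons.push("no spec");
  if (!f.clarified) reasons.push("clarification pending");
  if (stage !== "plan" && !f.alliumElicited) {
    reasons.push("Allium elicitation not run");
  }
  if (stage === "release") {
    if (!f.playwrightPassed) reasons.push("Playwright validation missing");
    if (!f.tlaValidated) reasons.push("TLA+ validation missing");
  }
  return reasons;
}

const feature: FeatureState = {
  hasSpec: true,
  clarified: true,
  alliumElicited: true,
  playwrightPassed: true,
  tlaValidated: false,
};

console.log(blockers("implement", feature)); // []
console.log(blockers("release", feature)); // [ 'TLA+ validation missing' ]
```

The same inputs always give the same answer, so a blocked release can be explained by pointing at the missing artefact.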

This method takes longer in the first week. It saves months once the system is a year old.

We're not the right fit for everything.

Drop us a line about what you're trying to build. We'll come back with a read on whether we're the right call, and if we're not, we'll say so.

Write to us