DOS and the alternatives

Every tool on this page is good at its job. This page shows which job each one does, so you can see the one job none of them does: the only job DOS does.

DOS is a small deterministic kernel that adjudicates completed agent work from evidence the agent did not author: git ancestry, exit codes, file trees, read-backs of the world. It is a referee, not an orchestrator — it runs beside everything below, and most sections end with the two composing. The one question to carry through this page, including to DOS itself: when this tool says "OK," what evidence did it read, and who wrote that evidence?
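To make "evidence the agent did not author" concrete, here is a minimal sketch of checking a claim like "I committed X" against git ancestry. It is an illustration only, not DOS's implementation or API. The function name `commit_is_on_branch` and the injectable `run` parameter are assumptions made for this example. The only real interface used is `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second.

```python
import subprocess
from typing import Callable


def commit_is_on_branch(
    commit: str,
    branch: str = "HEAD",
    repo: str = ".",
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> bool:
    """Return True only if git itself reports `commit` as an ancestor of `branch`.

    Hypothetical helper for illustration. The agent's own statement about
    the commit is never consulted; the verdict comes from git's exit code.
    """
    result = run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, branch],
        capture_output=True,
    )
    # Exit 0: commit is an ancestor. Exit 1: it is not.
    # Any other code (for example an unknown commit) is an error,
    # which this sketch also treats as "not witnessed".
    return result.returncode == 0
```

The `run` parameter exists only so the check can be exercised without a real repository. In normal use the default `subprocess.run` does the work.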

Hosted evals & observability (LangSmith-class)

Eval platforms are how you find out whether your agent is any good: offline evaluation against datasets, LLM-as-judge scoring, human annotation, deterministic code evaluators, and online evaluation of live production traffic. For prompt iteration and quality drift, DOS is no substitute. What DOS adds: an evaluator scores the run your application emitted, and for "did the agent actually do it?" the run is the wrong witness, because the claim under test is part of the run. DOS's verdicts read nothing the agent emitted. Evals tell you the work is good; DOS tells you the work is real.

Framework guardrails (OpenAI Agents SDK, CrewAI)

The Agents SDK halts a run when an output guardrail trips; CrewAI validates each task's output and retries on failure. These are the right seams in the right places. What DOS adds: a guardrail checking schema or content is still reading text the agent wrote. DOS ships a driver for each seat — one import line — that checks the output's claim ("I committed X") against a read-back the agent didn't author, so the retry loop retries until the work is done, not until the narration parses.
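The difference between "retry until the narration parses" and "retry until the work is done" can be sketched as a loop whose exit condition is a read-back rather than the output text. This is a generic illustration, not the DOS driver for either framework. The names `run_until_witnessed`, `attempt`, and `witness` are hypothetical.

```python
from typing import Callable, Optional


def run_until_witnessed(
    attempt: Callable[[], str],
    witness: Callable[[], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry `attempt` until `witness` confirms the effect.

    `attempt` runs the agent task and returns its output text.
    `witness` reads the world back (a git check, a file hash, an exit code)
    and never looks at that text.
    """
    for _ in range(max_attempts):
        output = attempt()  # the agent's narration, kept only to hand back
        if witness():       # evidence the agent did not author
            return output
    return None             # the claim was never witnessed
```

A schema or content guardrail would put its check on `output`. Here the output can say anything, and the loop still ends only when the witness returns true or the attempts run out.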

Durable execution (Temporal-class)

Temporal records every workflow step in an event history and resumes from the last recorded event after a crash — the engineered standard for must-not-be-lost processes, and a design DOS's own recovery syscalls learned from. What DOS adds: durability faithfully records what each step returned; whether a returned claim about the world is true is outside its scope. DOS adjudicates exactly that residue. Temporal makes sure the work survives; DOS makes sure the claimed work happened.

Supply-chain attestation (in-toto, witness)

in-toto attestations are signed, verifiable claims about how software was produced; witness gathers that evidence at every pipeline step. Philosophically the closest relative here: both refuse to let the worker be the only author of the evidence about its work. What's different: in-toto attests pipeline steps with real signing infrastructure, portable across organizations; DOS referees an agent's claims at runtime with no keys and no setup — a plain git repo is enough. They compose: a DOS verdict could ride in an attestation predicate (tracked publicly).
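As a rough picture of what "a verdict riding in an attestation predicate" might look like, the sketch below builds an unsigned in-toto Statement around a verdict. Only the Statement layer (`_type`, `subject`, `predicateType`, `predicate`) follows the in-toto v1 layout. The predicate type URI and the predicate fields `claim` and `witnessed` are invented for this example and are not a published DOS format. Signing is out of scope here.

```python
import hashlib
import json

STATEMENT_TYPE = "https://in-toto.io/Statement/v1"
# Hypothetical predicate type URI, made up for this sketch.
VERDICT_PREDICATE_TYPE = "https://example.com/dos-verdict/v0"


def verdict_statement(
    subject_name: str, artifact: bytes, claim: str, witnessed: bool
) -> dict:
    """Wrap a verdict about `artifact` in an unsigned in-toto Statement."""
    digest = hashlib.sha256(artifact).hexdigest()
    return {
        "_type": STATEMENT_TYPE,
        "subject": [{"name": subject_name, "digest": {"sha256": digest}}],
        "predicateType": VERDICT_PREDICATE_TYPE,
        # Assumed predicate shape: the claim under test and the verdict.
        "predicate": {"claim": claim, "witnessed": witnessed},
    }


if __name__ == "__main__":
    stmt = verdict_statement("patch.diff", b"example bytes", "I committed X", True)
    print(json.dumps(stmt, indent=2))
```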

Plain CI & branch protection

Required status checks are the one evidence-authored gate almost everyone already runs — and for plenty of setups, genuinely sufficient. What DOS adds: CI guards one path (the merge) at one time (the end). A fleet's false claims mostly happen before and beside that path — "committed!" with no commit, two agents in one working tree, a loop spinning all night. DOS runs the same exit-code discipline at the work surface itself, and plugs back into CI: a shipped GitHub Action posts dos commit-audit as a required check — this repo gates on it.

When NOT to use DOS

One agent, reviewed diffs, good CI. Your review plus required checks already provide the independent witness. A referee for one honest player is overhead.
Fully isolated agents merging through a gated queue. Isolation already serialized the effects; the collision half of DOS has little to do.
You need hard, in-band prevention. DOS decides and reports; if a write must be physically impossible, you want a sandbox or policy engine in the execution path.
Your question is quality, not truth. DOS never grades whether work is good — only whether a claim is witnessed. That's what evals and review are for.

At a glance

Tool	Gates what	Reads what
Evals / observability	quality of outputs	the run the app emitted
Framework guardrails	one task's output	the output text
Temporal-class	execution progress	its own event history
in-toto / witness	artifact provenance	signed step attestations
Plain CI	the merge	the checks' exit codes
DOS	belief in "done"	git ancestry, exit codes, read-backs — never the agent's narration

The full page, with every claim cited to the neighbor's primary docs, lives in the repo: docs/ALTERNATIVES.md. Spotted a claim about your project that's wrong or stale? Open an issue — this page holds itself to the standard it describes.