
Rendered from docs/scoreboard/README.md — the Markdown in the repo is the source of truth; this page is generated by scripts/build_incident_pages.py, never hand-edited.

How AI built the software you already use

Agents now write a real share of the popular open-source projects you depend on — and they write their own commit messages too. This board looks at the recent history of well-known repos and asks three plain questions: how much of it did AI write, which agent did it, and what kind of work was it — fixes, tests, docs.

The catch is that a commit message is just text the agent typed; the diff is what git actually recorded, and the two can disagree. So every number here is checked against the diff, never the message alone. That is the difference between this board and a star count: it reads the thing that can't be talked up.
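
A minimal sketch of what a message-vs-diff check could look like. This is an illustration, not dos-kernel's actual rules: the `Commit` type, the keyword patterns, and the path heuristics are all hypothetical. The idea is only that each kind of work a subject claims should be matched by at least one changed path of that kind.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical record: what the message says vs. what git recorded."""
    subject: str
    paths: tuple[str, ...]  # files touched by the commit's own diff


def is_doc_path(path: str) -> bool:
    # Assumed heuristic: Markdown/reST/text files or anything under docs/.
    p = path.lower()
    return p.startswith("docs/") or p.endswith((".md", ".rst", ".txt"))


def is_test_path(path: str) -> bool:
    # Assumed heuristic: any path mentioning "test".
    return "test" in path.lower()


def claimed_kinds(subject: str) -> set[str]:
    """Kinds of work the subject line claims (illustrative keywords only)."""
    s = subject.lower()
    kinds = set()
    if re.search(r"\bfix(es|ed)?\b", s):
        kinds.add("fix")
    if re.search(r"\btests?\b", s):
        kinds.add("tests")
    if re.search(r"\bdocs?\b", s):
        kinds.add("docs")
    return kinds


def unbacked_claims(commit: Commit) -> set[str]:
    """Claims in the subject that no changed path supports."""
    evidence = set()
    for path in commit.paths:
        if is_doc_path(path):
            evidence.add("docs")
        elif is_test_path(path):
            evidence.add("tests")
        else:
            evidence.add("fix")  # any other file counts as code
    return claimed_kinds(commit.subject) - evidence
```

Under these assumed rules, a subject that claims a fix on an empty diff comes back as `{"fix"}`, while a commit whose subject and changed files agree comes back empty. The check says nothing about whether the fix is correct.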

The picture

Three views of the same audited history. Every figure is generated from the committed per-repo data — no live calls, reproducible offline by anyone who clones the repo.

[Chart: AI-built share of each repo]

[Chart: Which agent built which repo]

[Chart: What kind of work AI commits claimed]

Across these 19 repos, claude is the most prolific agent — it wrote 63% of all the AI-authored commits here, with 7 other toolchains sharing the rest, and 75% of what they all claimed was shipping code, not tests or docs.

Score your own repo in one command

pip install dos-kernel
dos commit-audit --sweep --workspace . BASE..HEAD

That is the exact same check the board runs, on your history — before you trust the next "done". No account, no upload, no one named.
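
For readers curious what the raw inputs to such a check look like, here is a rough sketch of collecting each commit's subject and changed paths for a range such as `BASE..HEAD` using plain `git log`. It is not how `dos` is implemented; the `MARK` prefix and both function names are made up for illustration.

```python
import subprocess

# Hypothetical marker so commit header lines can be told apart from file paths.
MARK = "@@commit@@"


def parse_log(text: str) -> list[tuple[str, str, list[str]]]:
    """Parse `git log --name-only` output into (sha, subject, paths) tuples."""
    commits: list[tuple[str, str, list[str]]] = []
    for line in text.splitlines():
        if line.startswith(MARK):
            # Header line: marker, full hash, a tab, then the subject.
            sha, _, subject = line[len(MARK):].partition("\t")
            commits.append((sha, subject, []))
        elif line.strip() and commits:
            # Any other non-blank line is a path changed by the latest commit.
            commits[-1][2].append(line)
    return commits


def read_range(rev_range: str, repo: str = ".") -> list[tuple[str, str, list[str]]]:
    """Run git locally for a range like 'BASE..HEAD'. No network calls."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--name-only",
         f"--format={MARK}%H%x09%s", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_log(out)
```

A commit with no path lines after its header is one that recorded no file changes. By default `git log --name-only` also lists no files for merge commits, so a real check would have to treat merges separately.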

Start here — the auditor grades itself

We ran the check on our own repo first and published whatever it said. The result is non-zero: a few commits claim a fix but touch nothing. They're a deliberate house convention, and the page shows exactly why. We left them in. A scoreboard that airbrushed its own page to zero wouldn't be worth reading.

Repo by repo

The detail behind the charts — each repo's AI-built share, the agents that did it, and whether every checkable claim was backed by its own diff. Sorted by AI-built share. Click a repo for the full receipt.

| Repo | AI-built | Agents | Claims checked | Backed |
| --- | --- | --- | --- | --- |
| kenn-io/roborev | 65% | claude 430 · copilot 1 · cursor 1 | 273 | 100% |
| JuliusBrussee/caveman | 32% | claude 65 | 49 | 100% |
| getzep/graphiti | 15% | claude 127 | 66 | 100% |
| pydantic/pydantic-ai | 9% | claude 188 · devin 7 · copilot 4 · … | 139 | 100% |
| openai/codex | 5% | codex 331 · claude 10 · copilot 3 | 155 | 100% |
| exo-explore/exo | 4% | claude 99 · cursor 1 · jules 1 | 67 | 100% |
| OpenInterpreter/open-interpreter | 4% | codex 240 · claude 10 · copilot 3 | 118 | 100% |
| assistant-ui/assistant-ui | 4% | claude 119 · copilot 12 · devin 2 · … | 79 | 100% |
| crewAIInc/crewAI | 3% | devin 51 · claude 29 · aider 3 · … | 69 | 100% |
| mem0ai/mem0 | 3% | claude 77 | 66 | 100% |
| agno-agi/agno | 3% | claude 159 · copilot 7 · aider 1 · … | 103 | 100% |
| charmbracelet/crush | 3% | crush 86 · copilot 9 · claude 1 | 50 | 100% |
| farion1231/cc-switch | 2% | claude 40 · copilot 1 · cursor 1 | 30 | 100% |
| livekit/agents | 2% | claude 45 · devin 17 · cursor 6 · … | 58 | 100% |
| danny-avila/LibreChat | 1% | claude 24 · copilot 13 · cursor 1 | 24 | 100% |
| microsoft/autogen | 1% | copilot 28 · claude 2 | 27 | 100% |
| unslothai/unsloth | <1% | claude 26 · cursor 2 | 22 | 100% |
| langchain-ai/langchain | <1% | copilot 24 · claude 15 | 29 | 100% |
| anthony-chaudhary/dos-kernel | | | 315 | 98% |
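
To work with the Agents column yourself, a small sketch like the one below can split a cell into per-agent counts and compute one agent's share of them. It assumes the numbers are AI-authored commit counts per agent, which the table does not label explicitly. `parse_agents` and `share` are illustrative names, not part of dos-kernel.

```python
def parse_agents(cell: str) -> dict[str, int]:
    """Split a cell like 'claude 430 · copilot 1' into {agent: count}."""
    counts: dict[str, int] = {}
    for part in cell.split("·"):
        part = part.strip()
        if not part or part == "…":
            continue  # an elided tail stays elided; nothing is guessed for it
        name, _, number = part.rpartition(" ")
        counts[name] = int(number)
    return counts


def share(counts: dict[str, int], agent: str) -> float:
    """Fraction of the listed counts that belong to one agent."""
    total = sum(counts.values())
    return counts.get(agent, 0) / total if total else 0.0


roborev = parse_agents("claude 430 · copilot 1 · cursor 1")
print(f"{share(roborev, 'claude'):.1%}")  # 430 of 432 listed -> 99.5%
```

For rows that end in "…", the listed counts are incomplete, so a share computed this way covers only the agents shown.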

The fine print (it matters)

A mismatch is not an accusation. It does not mean the code is wrong, or that anyone lied. It means one thing only: a commit's subject claimed something its own diff doesn't show. A real fix to the wrong bug passes the check; an honest doc cleanup with a sloppy subject can flag. A message-vs-diff mismatch is never a correctness, honesty, or intent grade — only a note that a commit's words and its own diff disagree.

The pages above are the 19 repos we've audited and named. A repo is named only when its verdict is published; a non-clean or unadjudicated verdict is reported only as a count, never as a named page (docs/311 §2).

The kernel is the part that doesn't believe the agents.