Rendered from docs/scoreboard/README.md — the Markdown in the repo is the source of truth; this page is generated by scripts/build_incident_pages.py, never hand-edited.

How AI built the software you already use

Agents now write a measurable share of the popular open-source projects you depend on, and they write their own commit messages too. This board looks at the recent history of well-known repos and asks three plain questions: how much of it did AI write, which agent did it, and what kind of work was it (fixes, tests, docs)?

The catch is that a commit message is just text the agent typed; the diff is what git actually recorded, and the two can disagree. So every number here is checked against the diff, never the message alone. That is the difference between this board and a star count: it reads the thing that can't be talked up.
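To make the message-versus-diff idea concrete, here is a toy sketch in Python. It is not the dos-kernel implementation: the `Commit` type, the subject prefixes, and the path rules are all hypothetical, chosen only to show a claim being checked against the files a commit actually touched.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical stand-in for one commit: its subject and the paths in its diff."""
    subject: str
    changed_paths: Tuple[str, ...]


# Assumed mapping from a conventional-commit-style prefix to a kind of work.
PREFIX_TO_KIND = {"test": "tests", "tests": "tests", "docs": "docs",
                  "doc": "docs", "fix": "code", "feat": "code"}


def claimed_kind(subject: str) -> Optional[str]:
    """Return the kind of work the subject claims, or None if nothing is checkable."""
    prefix = subject.split(":", 1)[0].split("(", 1)[0].strip().lower()
    return PREFIX_TO_KIND.get(prefix)


def path_kind(path: str) -> str:
    """Classify one changed path as docs, tests, or code (illustrative rules only)."""
    lower = path.lower()
    name = lower.split("/")[-1]
    if lower.endswith((".md", ".rst")) or lower.startswith("docs/"):
        return "docs"
    if (name.startswith("test_") or "_test." in name
            or lower.startswith("tests/") or "/tests/" in lower):
        return "tests"
    return "code"


def is_backed(commit: Commit) -> Optional[bool]:
    """True if the diff shows the claimed kind of work, False if not, None if no claim."""
    kind = claimed_kind(commit.subject)
    if kind is None:
        return None
    # An empty diff can never back a claim: the subject is all there is.
    return any(path_kind(p) == kind for p in commit.changed_paths)
```

Under these assumed rules, a subject of `test: cover retry path` that touches only `src/retry.py` is not backed, the same subject touching `tests/test_retry.py` is, and a merge commit with no recognizable prefix is simply not checkable.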

The board at a glance

The picture

Three views of the same audited history. Every figure is generated from the committed per-repo data — no live calls, reproducible offline by anyone who clones the repo.

[Figure: Which agent built which repo]

[Figure: What kind of work AI commits claimed]

Across these 19 repos, claude is the most prolific agent — it wrote 63% of all the AI-authored commits here, with 7 other toolchains sharing the rest, and 75% of what they all claimed was shipping code, not tests or docs.

Who builds whose repo

One agent shows up far more than the rest: claude is the single biggest committer in 12 of the 18 repos on this board that list agent commits (the auditor's own repo shows no agent breakdown). The exceptions are telling. Most are a vendor's own tool building the vendor's own project:

OpenInterpreter/open-interpreter — led by codex (its makers' own agent)
charmbracelet/crush — led by crush (its makers' own agent)
microsoft/autogen — led by copilot (its makers' own agent)
openai/codex — led by codex (its makers' own agent)

And claude does not stay in its lane. It turns up inside the histories of repos another agent leads: 29 commits in crewAIInc/crewAI, 15 commits in langchain-ai/langchain, 10 commits in OpenInterpreter/open-interpreter, 10 commits in openai/codex, 2 commits in microsoft/autogen, and 1 commit in charmbracelet/crush.

None of this is a quality or honesty judgment — it is just who pressed the keys, read straight from the commit attribution. It is the kind of picture a star count can't show.

Score your own repo in one command

pip install dos-kernel
dos commit-audit --sweep --workspace . BASE..HEAD

That is the exact same check the board runs, on your history — before you trust the next "done". No account, no upload, no one named.
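If you want to call the same command from a script, a minimal Python wrapper might look like the sketch below. It assumes `BASE..HEAD` is an ordinary git revision range, and the helper names `build_audit_command` and `run_audit` are hypothetical. The page does not document the command's exit codes, so the wrapper only passes the code through without interpreting it.

```python
import shutil
import subprocess
from typing import List


def build_audit_command(base: str, head: str = "HEAD", workspace: str = ".") -> List[str]:
    """Build the argv for the command shown above, filling in the BASE..HEAD range."""
    return ["dos", "commit-audit", "--sweep", "--workspace", workspace, f"{base}..{head}"]


def run_audit(base: str, head: str = "HEAD", workspace: str = ".") -> int:
    """Run the audit and return the process exit code unchanged."""
    if shutil.which("dos") is None:
        raise RuntimeError("dos is not on PATH; run `pip install dos-kernel` first")
    completed = subprocess.run(build_audit_command(base, head, workspace), check=False)
    return completed.returncode


# Example call (the base ref "origin/main" is only an illustration):
#   run_audit("origin/main")
```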

Start here — the auditor grades itself

We ran the check on our own repo first and published whatever it said. The result is not zero: it flags a few deliberate empty re-stamp commits, whose subjects re-anchor a plan after a renumber. By house convention these commits carry no diff, so the claim rests on the subject text alone. The page shows each one, and the methodology explains why the auditor is right to count them. We left them in. A scoreboard that airbrushed its own page to zero wouldn't be worth reading.

anthony-chaudhary/dos-kernel — our own grade, every flag explained.

Repo by repo

The detail behind the charts — each repo's AI-built share, the agents that did it, and whether every checkable claim was backed by its own diff. Sorted by AI-built share. Click a repo for the full receipt.

Repo	AI-built	Agents	Claims checked	Backed
kenn-io/roborev	65%	claude 430 · copilot 1 · cursor 1	273	100%
JuliusBrussee/caveman	32%	claude 65	49	100%
getzep/graphiti	15%	claude 127	66	100%
pydantic/pydantic-ai	9%	claude 188 · devin 7 · copilot 4 · …	139	100%
openai/codex	5%	codex 331 · claude 10 · copilot 3	155	100%
exo-explore/exo	4%	claude 99 · cursor 1 · jules 1	67	100%
OpenInterpreter/open-interpreter	4%	codex 240 · claude 10 · copilot 3	118	100%
assistant-ui/assistant-ui	4%	claude 119 · copilot 12 · devin 2 · …	79	100%
crewAIInc/crewAI	3%	devin 51 · claude 29 · aider 3 · …	69	100%
mem0ai/mem0	3%	claude 77	66	100%
agno-agi/agno	3%	claude 159 · copilot 7 · aider 1 · …	103	100%
charmbracelet/crush	3%	crush 86 · copilot 9 · claude 1	50	100%
farion1231/cc-switch	2%	claude 40 · copilot 1 · cursor 1	30	100%
livekit/agents	2%	claude 45 · devin 17 · cursor 6 · …	58	100%
danny-avila/LibreChat	1%	claude 24 · copilot 13 · cursor 1	24	100%
microsoft/autogen	1%	copilot 28 · claude 2	27	100%
unslothai/unsloth	<1%	claude 26 · cursor 2	22	100%
langchain-ai/langchain	<1%	copilot 24 · claude 15	29	100%
anthony-chaudhary/dos-kernel	—	—	315	98%
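
The headline figures earlier on this page can be roughly re-derived from the Agents column alone. The sketch below copies that column by hand and tallies it. It is an approximation, not the board's own pipeline: rows ending in "…" elide smaller agents, so the totals here are lower bounds, and the variable names are hypothetical.

```python
from collections import Counter

# Agents column copied from the table above. "…" marks entries the table elides.
AGENTS_BY_REPO = {
    "kenn-io/roborev": "claude 430 · copilot 1 · cursor 1",
    "JuliusBrussee/caveman": "claude 65",
    "getzep/graphiti": "claude 127",
    "pydantic/pydantic-ai": "claude 188 · devin 7 · copilot 4 · …",
    "openai/codex": "codex 331 · claude 10 · copilot 3",
    "exo-explore/exo": "claude 99 · cursor 1 · jules 1",
    "OpenInterpreter/open-interpreter": "codex 240 · claude 10 · copilot 3",
    "assistant-ui/assistant-ui": "claude 119 · copilot 12 · devin 2 · …",
    "crewAIInc/crewAI": "devin 51 · claude 29 · aider 3 · …",
    "mem0ai/mem0": "claude 77",
    "agno-agi/agno": "claude 159 · copilot 7 · aider 1 · …",
    "charmbracelet/crush": "crush 86 · copilot 9 · claude 1",
    "farion1231/cc-switch": "claude 40 · copilot 1 · cursor 1",
    "livekit/agents": "claude 45 · devin 17 · cursor 6 · …",
    "danny-avila/LibreChat": "claude 24 · copilot 13 · cursor 1",
    "microsoft/autogen": "copilot 28 · claude 2",
    "unslothai/unsloth": "claude 26 · cursor 2",
    "langchain-ai/langchain": "copilot 24 · claude 15",
}


def parse_agents(cell: str) -> dict:
    """Turn 'claude 430 · copilot 1' into {'claude': 430, 'copilot': 1}, skipping '…'."""
    counts = {}
    for part in cell.split(" · "):
        if part == "…":
            continue
        name, count = part.rsplit(" ", 1)
        counts[name] = int(count)
    return counts


parsed = {repo: parse_agents(cell) for repo, cell in AGENTS_BY_REPO.items()}

# The agent with the most commits in each repo.
leaders = Counter(max(counts, key=counts.get) for counts in parsed.values())

# Commits per agent across all listed repos (visible entries only).
totals = Counter()
for counts in parsed.values():
    totals.update(counts)

claude_share = totals["claude"] / sum(totals.values())
print(f"claude leads {leaders['claude']} of {len(parsed)} repos")
print(f"claude share of visible AI-authored commits: {claude_share:.1%}")
```

On the visible entries this gives claude as the leader in 12 of 18 repos and a share of about 63%, in line with the figures quoted above.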

The fine print (it matters)

A mismatch is not an accusation. It does not mean the code is wrong, or that anyone lied. It means one thing only: a commit's subject claimed something its own diff doesn't show. A real fix to the wrong bug passes the check; an honest doc cleanup with a sloppy subject can flag. So a mismatch is never a grade of correctness, honesty, or intent.

How it works — exactly what the check reads, what it skips, and every time the check itself was wrong (we narrow the check, never trust the subject).
The big picture — the population mismatch rate across public repos, with every flag hand-checked and denominators everywhere.
The live roll-up — the published set above, folded into one aggregate by scripts/scoreboard_rollup.py. Every number is derived from the committed per-repo data, reproducible offline.
Want your repo listed? Clean or not, it's opt-in and you see the result before it publishes. See the methodology's registration section.

The pages above are the 19 repos we've audited and named. A repo is named only when its verdict is published; a non-clean or unadjudicated verdict is reported only as a count, never as a named page (docs/311 §2).

The kernel is the part that doesn't believe the agents.