fak — the agent kernel
Treat the model like an untrusted program, and the tool call like a syscall.
The headline benchmarks as a ~40-second reveal · full-resolution MP4
fak is an agent kernel (an agent tool firewall): an in-process,
default-deny permission gate for AI agents, fused with an addressable,
bit-exact KV cache, written in Go. Every tool call an agent makes passes
through a kernel the model doesn’t control — the same boundary that enforces
security (which effects are allowed, which tool results may enter the model’s
context) also drives performance (do shared work once instead of every turn).
In one line: prompt-injection containment, capability security, and cache-efficient inference for self-hosted LLM agent fleets — at one boundary.
How the boundary works — the agent tool firewall as a ~44-second reveal · full-resolution MP4
▶ See it run. Three interactive demos drive the real kernel live on GCP: the turn-tax race (a SOTA agent loop vs fak’s one-shot kernel), the multi-agent context-reuse proof, and a live model reuse race. Nothing to install; they run in your browser. Open the demos → Or run your own copy with `go run ./cmd/turntaxdemo` locally, in a container, or on your own cloud VM.
What fak does
- Stops prompt injection and tool poisoning by structure. Suspicious tool results are quarantined out of the model’s context entirely; dangerous tools are never on the allow-list. Two independent gates, not one evadable classifier. Addresses the OWASP Agentic Top-10 and the MCP Top-10 (Tool Poisoning, Memory Poisoning).
- Default-deny capability security. The permission policy runs inside the kernel, on the same call path as the tool call. It fails closed, not open.
- Addressable, bit-exact KV cache. Evict one span from the middle of a kept model run (a poisoned result, an expired secret) and leave the cache bit-for-bit identical to a run that never saw it (max|Δ| = 0). No shipped serving engine offers mid-run causal eviction.
- Cache-efficient agent fleets. ~4× fewer tokens than a tuned warm-cache stack on a 50-turn × 5-agent run; 8.8–9.7× measured prefill elimination on real WebVoyager web-agent workloads.
What fak is not
fak is not a faster model server. vLLM, SGLang, and llama.cpp win raw throughput
and front-of-prompt prefix caching, and fak doesn’t try to beat them — it owns the
orthogonal questions they don’t: which effects are allowed, which results may enter
memory, when reuse is still legal, and what survives a session boundary. You can run
`fak serve` in front of any of them.
Try it in 2 minutes (no key, no model, no GPU)
go run ./cmd/fak preflight --policy examples/customer-support-readonly-policy.json --tool refund_payment --args "{}"
go run ./cmd/fak agent --offline
`refund_payment` returns DENY (POLICY_BLOCK); `agent --offline` runs the same task
twice (tools wired directly vs. behind fak) and prints the before/after.
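The deny decision above can be pictured as a default-deny lookup. The following is a minimal sketch, not fak's implementation: the `Policy` and `Decision` types, the `Preflight` function, and the `lookup_order` tool name are all hypothetical, and the real policy file schema is not reproduced here. The only behavior taken from the text is that a tool absent from the allow-list is denied with POLICY_BLOCK, and that the check fails closed.

```go
package main

import "fmt"

// Decision is the outcome of a preflight check (hypothetical type).
type Decision struct {
	Verdict string // "ALLOW" or "DENY"
	Reason  string // "POLICY_BLOCK" on deny; empty when allowed
}

// Policy is a hypothetical allow-list. The schema of fak's real policy
// files is not shown in this document and is not assumed here.
type Policy struct {
	AllowedTools map[string]bool
}

// Preflight is default-deny: a tool is allowed only if it is explicitly
// listed. A missing entry, or an empty Policy with a nil map, yields DENY,
// so the gate fails closed.
func Preflight(p Policy, tool string) Decision {
	if p.AllowedTools[tool] {
		return Decision{Verdict: "ALLOW"}
	}
	return Decision{Verdict: "DENY", Reason: "POLICY_BLOCK"}
}

func main() {
	// A read-only support policy: lookups are listed, refunds are not.
	readOnly := Policy{AllowedTools: map[string]bool{"lookup_order": true}}

	for _, tool := range []string{"lookup_order", "refund_payment"} {
		d := Preflight(readOnly, tool)
		fmt.Printf("%s -> %s %s\n", tool, d.Verdict, d.Reason)
	}
}
```

Because the allow branch is the only way to return ALLOW, forgetting to list a tool can only ever make the gate stricter, never looser.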
Learn more
| If you want… | Read |
|---|---|
| The quick answers | FAQ |
| A guided first run | Tutorial |
| The two core ideas | Policy in the kernel · Addressable KV cache |
| Every benchmark number | Benchmark authority |
| Every machine fak runs on | Hardware matrix (4 platforms · 2 CPU ISAs · 4 GPU backends) |
| What’s real, what’s not | Claims ledger |
| A machine-readable map (for LLMs) | llms.txt |
License: Apache-2.0 · Report a vulnerability · Keywords: agent kernel, agent tool firewall, AI agent security, prompt injection defense, tool poisoning, capability security, default-deny permission gate, KV cache, addressable KV cache, self-hosted LLM, LLM agent fleet, agentic AI, Go.