My AI dev stack — what I run, and what I deleted — Notes

People keep asking what my "AI setup" actually looks like — which model, which tools, how the autonomy doesn't turn into chaos. So here's the whole thing, top to bottom: the models, the MCP servers, the agents, the memory layer, the skills, and the hooks that keep it honest.

The short version: Claude Code is the runtime, and everything else is plumbing I bolted on until the loop stopped fighting me. The interesting part isn't what I added — it's what I deleted.

The brains — which models actually run

Two Claude models do ~95% of the work, on purpose:

Opus 4.8 (1M context) — the driver. Planning, multi-file edits, the hard reasoning, anything that touches architecture. The 1M window means I rarely have to babysit context on a big repo.
Sonnet 4.6 — the reviewer. My pre-commit code-review hook runs on Sonnet, not Opus. It's fast, it's cheap, and for "did this diff just introduce a SQL injection" it's more than enough. No reason to burn Opus on a gate that fires on every commit.

On top of that I keep a model playground wired in through Qubu (more on that below), so I can fan a prompt across other models when I want a second opinion or need a non-Claude modality. But the daily loop is Opus-drives-Sonnet-reviews, and that split alone cut my cost without me noticing any quality drop.

The hands — MCP servers

MCP is where Claude stops being a chatbot and starts being able to do things. Ten servers run globally for me; the rest are per-project. The global set:

qubu — my own AI platform. Models, datasets, notebooks, inference, a generation playground, even the forum/journal. This is also what I use to generate media.
context7 — live, version-accurate library docs. Knowledge cutoffs lie about minor versions; this doesn't. I reach for it before trusting my own memory on any framework API.
figma-desktop + figma-developer-mcp — design → code without the screenshot-and-guess dance. Real variables, real metadata, real layout.
kie-ai — image/video generation when I need media inline.
cloudflare-api — DNS, Workers, the whole CF surface across several accounts, from the terminal.
mt-glitch-blog — yes, the server I'm publishing this post through. Create, update, upload covers — the blog is just another tool call.
agent-browser + playwright — two browser drivers. Snapshot a page, click through a flow, scrape, or run a real E2E check.
markdowner — turn any URL/document into clean Markdown so it's actually usable in context.

Per-project I layer in more: SSH MCPs into infra, Postgres/pgbouncer MCPs, Sentry, Vercel, a YouGile board MCP. The point isn't to have — it's that each repo only sees the servers it needs.

My AI dev stack — what I run, and what I deleted

The brains — which models actually run

The hands — MCP servers

The memory — claude-mem

The crew — agents

The muscle memory — skills (and why I deleted most of them)

The reflexes — hooks

The one principle