Solo engineer · 2026

claude-controller

A single command that points Claude Code at any inference backend — local Ollama, Anthropic, OpenAI, or Gemini — with a clean environment and no manual env-var juggling.

Problem

Running Claude Code against non-Anthropic backends meant hand-exporting environment variables, standing up a LiteLLM proxy, and hoping stale vars from the last session didn't leak. Switching backends mid-workflow was fiddly, and when a local model misbehaved there was no structured way to diagnose why.

Contribution

Built an interactive picker that resolves live model facts from the Ollama API, merges them with a curated compatibility table (tool support, known issues per model), pre-flight-checks the target endpoint, and launches Claude Code with a clean environment. Added two post-session tools: cc-health, a JSONL session analyser that flags nine named failure patterns; and cc-diagnose, an Ollama performance probe for VRAM, GPU layer split, and tok/s.
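The "clean environment" step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the prefix list, the function name `build_clean_env`, and the example values are all assumptions made for the sketch.

```python
from typing import Mapping

# Hypothetical prefix list: variables that could leak from a previous session.
STALE_PREFIXES = ("ANTHROPIC_", "OPENAI_", "GEMINI_", "LITELLM_")


def build_clean_env(base_env: Mapping[str, str],
                    backend_vars: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of base_env with stale backend variables removed,
    then overlaid with the variables for the chosen backend."""
    env = {k: v for k, v in base_env.items()
           if not k.startswith(STALE_PREFIXES)}
    env.update(backend_vars)
    return env


# Example: a stale key from the last session is dropped, PATH survives,
# and only the selected backend's settings are present.
previous = {"PATH": "/usr/bin", "ANTHROPIC_API_KEY": "stale", "OPENAI_API_KEY": "old"}
selected = {"ANTHROPIC_BASE_URL": "http://localhost:4000",  # example value only
            "ANTHROPIC_MODEL": "example-model"}             # placeholder name
clean = build_clean_env(previous, selected)
```

The resulting dictionary could then be passed as `env=` to `subprocess.run` when starting Claude Code, so nothing from the parent shell reaches the child process unless it was kept deliberately.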

Outcome

Open-sourced; actively maintained. [Add adoption signal — forks, dependent projects, or a user quote when available.]

Bash · Python · Ollama API · LiteLLM · Claude Code

The core insight was that “which backend should I use” is not a one-time setup question — it’s answered differently depending on the task, the available hardware, and what broke last session. A controller that treats backend selection as a first-class, repeatable workflow decision — rather than a config file you edit once and forget — fits how developers actually work.

The post-session tooling is where the project earns its depth. cc-health reads Claude Code’s own JSONL session logs and names failure modes precisely: a model stuck calling the same tool with identical parameters (TOOL_LOOP), raw <function=...> syntax leaking as plain text (RAW_XML), and an identical input token count on every turn, which suggests the context is being truncated at a fixed cap (CONTEXT_CAP). There are nine failure patterns in total, each with a plain-English label. That turns “the local model didn’t work”, which is where most debugging stops, into a diagnosable, actionable signal.
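To make the three named patterns concrete, here is a minimal sketch of how such checks might look. It does not reflect cc-health's real implementation or Claude Code's actual log schema: the record fields (`tool`, `params`, `text`, `input_tokens`), the loop threshold, and the minimum turn count are simplifying assumptions.

```python
import json
import re
from typing import Iterable

# Raw tool-call syntax that should never appear in plain assistant text.
RAW_XML_RE = re.compile(r"<function=[^>]*>")


def load_turns(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines into turn records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def detect_patterns(turns: list[dict], loop_threshold: int = 3) -> set[str]:
    """Return the labels of the failure patterns found in a session."""
    found: set[str] = set()

    # TOOL_LOOP: the same tool called with identical parameters
    # loop_threshold times in a row.
    prev, streak = None, 0
    for turn in turns:
        tool = turn.get("tool")
        if tool is None:
            prev, streak = None, 0
            continue
        key = (tool, json.dumps(turn.get("params", {}), sort_keys=True))
        streak = streak + 1 if key == prev else 1
        prev = key
        if streak >= loop_threshold:
            found.add("TOOL_LOOP")

    # RAW_XML: <function=...> syntax leaking into plain text.
    if any(RAW_XML_RE.search(turn.get("text") or "") for turn in turns):
        found.add("RAW_XML")

    # CONTEXT_CAP: every turn reports the same input token count.
    counts = [t["input_tokens"] for t in turns if "input_tokens" in t]
    if len(counts) >= 3 and len(set(counts)) == 1:
        found.add("CONTEXT_CAP")

    return found


# A synthetic session that trips all three checks.
looping_call = '{"tool": "Read", "params": {"path": "a.py"}, "input_tokens": 4096}'
bad_session = load_turns([
    looping_call, looping_call, looping_call, "",
    '{"text": "<function=Read>a.py", "input_tokens": 4096}',
])
bad_labels = detect_patterns(bad_session)

# A short, healthy session trips none.
ok_session = load_turns([
    '{"tool": "Read", "params": {"path": "a.py"}, "input_tokens": 100}',
    '{"text": "done", "input_tokens": 250}',
])
ok_labels = detect_patterns(ok_session)
```

Serialising the parameters with `sort_keys=True` means two calls compare equal regardless of key order, which is one simple way to define "identical parameters".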

cc-diagnose completes the picture: VRAM headroom at various context sizes, GPU vs CPU layer split, cold-start latency, and prefill/decode token speed. Together the two tools make local model selection an informed choice rather than trial and error.
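The speed figures can be derived from timing fields that Ollama returns with a completed generate response (`load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds), and the `size` and `size_vram` fields of its running-models listing give a rough view of how much of a model sits in GPU memory. The sketch below works on already-fetched dictionaries; the function names are hypothetical and this is not cc-diagnose's actual code. Note that the VRAM fraction is a memory share, which only approximates the layer split.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def _rate(count: int, duration_ns: int) -> float:
    """Tokens per second, or 0.0 when no duration was reported."""
    return count / (duration_ns / NS_PER_S) if duration_ns else 0.0


def speed_metrics(resp: dict) -> dict[str, float]:
    """Derive load time and prefill/decode speed from a generate response."""
    return {
        "load_s": resp.get("load_duration", 0) / NS_PER_S,
        "prefill_tok_s": _rate(resp.get("prompt_eval_count", 0),
                               resp.get("prompt_eval_duration", 0)),
        "decode_tok_s": _rate(resp.get("eval_count", 0),
                              resp.get("eval_duration", 0)),
    }


def vram_fraction(size: int, size_vram: int) -> float:
    """Share of the loaded model's bytes resident in GPU memory."""
    return size_vram / size if size else 0.0


# Synthetic numbers for illustration, not measurements.
sample_response = {
    "load_duration": 2_000_000_000,
    "prompt_eval_count": 500,
    "prompt_eval_duration": 250_000_000,
    "eval_count": 120,
    "eval_duration": 4_000_000_000,
}
metrics = speed_metrics(sample_response)
gpu_share = vram_fraction(size=8_000_000_000, size_vram=6_000_000_000)
```

Reporting prefill and decode separately matters because a model that reads a long prompt quickly can still generate slowly once layers spill to the CPU.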