Cost Optimization
Model routing, hardware decisions, cheap models that actually work, and what this realistically costs per month. The biggest wins come from understanding what doesn't need an expensive model.
The mistake most people make early on
The most common mistake is treating OpenClaw like a single super-intelligent chatbot that should handle everything at once. Conversation, planning, research, coding, memory, task tracking, monitoring. All through one model, all the time.
That setup ends in endless follow-up questions, permission loops, silent failures, and burned quotas. When it works, it's expensive. When it breaks, it's hard to tell why.
What clicked for me was that the main model should be a coordinator, not a worker. The default agent should be capable but not overkill. Expensive models stay out of the hot path.
My config looks roughly like this:
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-sonnet-4-5",
"fallbacks": [
"kimi-coding/k2p5",
"synthetic/hf:zai-org/GLM-4.7",
"openrouter/google/gemini-3-flash-preview",
"openrouter/openai/gpt-5-mini",
"openrouter/openai/gpt-5-nano",
"openrouter/google/gemini-2.5-flash-lite"
]
}
}
}The exact model list matters less than the intent. Expensive models aren't sitting in the default loop, and fallback behavior is explicit.
Auto-mode and blind routing
I tried auto-mode and blind routing early on. Stopped using both.
The idea of letting the system decide which model to use sounds great. When I actually ran it, it led to indecision, unexpected cost spikes, and behavior I couldn't reason about when something went wrong.
Being explicit works better. Default routing stays cheap and predictable. Agents get pinned to specific models for specific jobs. When something expensive runs, it's because I asked for it.
Key takeaway: Less magical. Far more debuggable.
Why strong models shouldn't be defaults
High-quality models like Opus are useful. I use them. They're great at restructuring prompts, designing agents, reasoning through messy problems, and unfucking things that are already broken.
Where I got burned was leaving that level of model running all day.
It felt powerful until I hit rate limits and ended up locked out waiting for quotas to refresh. At that point you're not building anything. You're just waiting.
Don't do this: Strong models work best when they're scoped. Pin them to specific agents and call them when you actually need them. Don't leave them in the default coordinator loop burning through your quota on routine work.
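In config terms, scoping a strong model to one agent looks something like this. The agent name and the Opus model string below are illustrative placeholders modeled on the defaults block shown earlier, not a verified OpenClaw schema:

```json
"agents": {
  "defaults": {
    "model": { "primary": "openrouter/google/gemini-2.5-flash-lite" }
  },
  "architect": {
    "model": { "primary": "anthropic/claude-opus-4-5" }
  }
}
```

The default loop stays cheap; the expensive model only runs when you invoke that specific agent.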
Don't buy hardware yet
There's been a lot of hype around buying Mac minis or Mac Studios just to run OpenClaw. I'd strongly recommend against doing this early.
Not everyone has $600 to drop on a tool, and even if you do, it's usually the wrong move to make first. The FOMO around OpenClaw is real. It's easy to feel like you need dedicated hardware immediately.
Learn your workflow first. Learn your costs. Figure out your failure modes. I would have saved money if I had done that before buying anything.
The reality of local models
Local models get pitched as the solution to everything. The math rarely works out unless you already have serious hardware.
The setups people recommend look like this:

- Mac Studio with 512 GB unified memory and 2 TB storage
- Two Mac Studios needed to realistically host Kimi 2.5 with usable performance
Unless you're building a business that needs that hardware for more than just OpenClaw, skip it.
Local models are fine for experimentation and simple tasks. But I've found that bending over backwards to save a few cents usually costs more in lost time and degraded performance than just paying for API calls.
Free ≠ usable: NVIDIA NIM's free tier for Kimi K2.5 regularly has 150+ requests in queue. That kind of latency makes it unusable for agent workflows where you need responses in seconds, not minutes.
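The arithmetic on that queue is unforgiving. Assuming, purely for illustration, around 3 seconds of generation time per queued request on a single serving slot, 150 queued requests means several minutes before yours even starts:

```python
# Back-of-envelope queue wait. Both numbers are assumptions for
# illustration, not measurements of NVIDIA NIM.
queued_requests = 150
seconds_per_request = 3  # assumed average generation time per request

wait_seconds = queued_requests * seconds_per_request
print(wait_seconds / 60)  # 7.5 -- minutes of waiting before your turn
```

An agent loop that needs a dozen round trips per task turns that into hours.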
The hype problem
This part is worth saying.
There is a lot of hype around OpenClaw right now. Flashy demos, YouTube videos promising it will replace everything you do, "this changes everything" energy on every social platform. I've watched people spend more time configuring OpenClaw than doing the work they wanted OpenClaw to help with.
I'd encourage people to resist the FOMO and ignore most of the YouTube content. A lot of it is optimized for clicks, not for the kind of boring Tuesday-afternoon usage that actually matters.
OpenClaw gets useful when you stop expecting magic and start expecting a tool that needs tuning.
Cheap models are fine, actually
One of the bigger mental shifts for me was realizing how cheap some models are when used correctly.
Heartbeats run often but do simple checks. No reason to burn premium models on background plumbing. I've seen tens of thousands of heartbeat tokens cost fractions of a cent on cheap models.
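To make "fractions of a cent" concrete, here's the arithmetic with an illustrative price of $0.10 per million input tokens. That price is an assumption in cheap-model territory, not a quoted rate for any specific provider:

```python
# Heartbeat cost at cheap-model prices. The per-token price is an
# assumed example -- check your provider's current pricing.
tokens = 50_000                # tens of thousands of heartbeat tokens
price_per_million = 0.10       # assumed $ per 1M input tokens

cost = tokens / 1_000_000 * price_per_million
print(f"${cost:.4f}")  # $0.0050 -- half a cent
```

Run the same heartbeat traffic through a premium model at hundred-times that price and the plumbing starts showing up on your bill.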
For the specific model recommendations and cost math, see the Config Reference chapter.
Concurrency limits also matter for cost control:
"maxConcurrent": 4,
"subagents": {
"maxConcurrent": 8
}Those limits prevent one bad task from cascading into retries and runaway cost.
What this costs me per month
I don't pay for everything through APIs.
I use two coding subscriptions at about $20 each. On top of that, API usage runs about $5-$10 per month split between OpenRouter and OpenAI. All in, that's roughly $45-$50 a month.
Your numbers will be different
If you let agents run nonstop, allow unlimited retries, or route everything through premium models, costs will climb. I've seen people hit $200+ in a weekend by leaving things uncapped.
If you scope models, cap concurrency, and keep background work on cheap models, costs flatten out fast.
Costs climb when:

- Agents run nonstop
- Unlimited retries
- Premium models as defaults
- No concurrency limits

Costs flatten when:

- Scoped model usage
- Capped concurrency
- Cheap background models
- Explicit routing