🧠 Chapter 3

Memory Configuration

Cheap embeddings, context pruning with cache-TTL, and compaction that actually prevents "why did it forget that" moments. The one change that made OpenClaw reliable.

📖 8 min read · ⚙️ 3 config sections

Making memory explicit

Most memory complaints I see come from assuming memory is automatic. It isn't, and the default behavior is confusing if you don't configure it.

I made memory explicit and cheap. I use cheap embeddings for search (text-embedding-3-small), prune context based on cache TTL (6 hours), and set up compaction to automatically flush sessions to daily memory files when they hit 40k tokens.

This one change eliminated most of the "why did it forget that" moments I was having. Before I set this up, I was losing context constantly and blaming the model when it was really a configuration problem.

The fix: Configure memory explicitly. The three settings below — search, pruning, and compaction — work together to keep your agent's context relevant and costs low.

Memory Search (memorySearch)

Uses cheap embeddings (text-embedding-3-small) to search your memory files.

openclaw.json — Memory Search
"memorySearch": {
  "sources": ["memory", "sessions"],
  "experimental": { "sessionMemory": true },
  "provider": "openai",
  "model": "text-embedding-3-small"
}
Cost Comparison — Memory Search
  • Thousands of searches with text-embedding-3-small: ~$0.10
  • Same searches with premium models: $5–10+

The sources array tells it where to look — both your stored memory files and past sessions. The sessionMemory experimental flag indexes session content for recall.
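
Mechanically, this kind of search just ranks stored chunks by vector similarity to the query. Here's a minimal sketch of the idea in TypeScript; the types and function names are hypothetical, and this is not OpenClaw's actual implementation.

memory-search-sketch.ts (conceptual, not OpenClaw source)
interface MemoryChunk {
  source: "memory" | "sessions"; // mirrors the `sources` array in memorySearch
  text: string;
  embedding: number[];           // precomputed with text-embedding-3-small
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against a query embedding and return the top k.
function searchMemory(queryEmbedding: number[], index: MemoryChunk[], k = 5): MemoryChunk[] {
  return [...index]
    .sort((x, y) => cosine(queryEmbedding, y.embedding) - cosine(queryEmbedding, x.embedding))
    .slice(0, k);
}

The config above is all you actually set; the point of the sketch is that each lookup is just one embedding call plus a similarity ranking, which is why a cheap embedding model is enough.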

Context Pruning (contextPruning)

openclaw.json — Context Pruning
"contextPruning": {
  "mode": "cache-ttl",
  "ttl": "6h",
  "keepLastAssistants": 3
}

cache-ttl mode explained

  • Keeps prompt cache valid for 6 hours
  • Automatically drops old messages when cache expires
  • keepLastAssistants: 3 preserves recent continuity — the last 3 assistant messages always stay in context
💡
Why TTL matters: Without this, you'll hit token limits faster and pay to re-process the same context repeatedly. The cache ensures you're not paying to send the same conversation prefix over and over; the sketch below shows the idea.
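
A minimal sketch of what cache-TTL pruning plus keepLastAssistants amounts to, with hypothetical types and names (this is not OpenClaw's source):

pruning-sketch.ts (conceptual, not OpenClaw source)
interface Message {
  role: "user" | "assistant" | "tool";
  timestampMs: number;
}

const TTL_MS = 6 * 60 * 60 * 1000;   // "ttl": "6h"
const KEEP_LAST_ASSISTANTS = 3;      // "keepLastAssistants": 3

// Drop messages older than the cache TTL, but always keep the last
// N assistant messages so recent continuity survives pruning.
function prune(messages: Message[], nowMs: number): Message[] {
  const protectedIdx = messages
    .map((m, i) => (m.role === "assistant" ? i : -1))
    .filter((i) => i >= 0)
    .slice(-KEEP_LAST_ASSISTANTS);
  const keep = new Set(protectedIdx);
  return messages.filter((m, i) => keep.has(i) || nowMs - m.timestampMs < TTL_MS);
}

Everything older than six hours falls away unless it's one of the protected recent assistant turns.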

Compaction (compaction.memoryFlush)

This is the most important memory setting. When context gets too long, the agent distills the session into a daily memory file instead of just losing it.

openclaw.json — Compaction
"compaction": {
  "mode": "default",
  "memoryFlush": {
    "enabled": true,
    "softThresholdTokens": 40000,
    "prompt": "Distill this session to memory/YYYY-MM-DD.md. Focus on decisions, state changes, lessons, blockers. If nothing worth storing: NO_FLUSH",
    "systemPrompt": "Extract only what is worth remembering. No fluff."
  }
}

How it works

1. Context hits threshold. When context reaches softThresholdTokens (40k), the flush triggers.
2. Agent distills the session. Using the flush prompt, the agent extracts decisions, state changes, lessons, and blockers.
3. Writes to daily memory file. Output goes to memory/YYYY-MM-DD.md, one file per day, automatically organized.
4. Or writes NO_FLUSH. If nothing worth storing happened, the agent skips the write, leaving no clutter in your memory files.
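
The whole flow is a check, distill, write loop. Here's a rough sketch under assumed helper names (countTokens, distillSession); it's a conceptual illustration, not OpenClaw's implementation.

memory-flush-sketch.ts (conceptual, not OpenClaw source)
import { appendFile } from "node:fs/promises";

const SOFT_THRESHOLD_TOKENS = 40_000; // "softThresholdTokens": 40000

// Crude token estimate (~4 characters per token); a real agent uses a tokenizer.
const countTokens = (session: string[]): number =>
  Math.ceil(session.join("\n").length / 4);

// Stand-in for the model call that runs the flush prompt over the session.
async function distillSession(session: string[]): Promise<string> {
  return "NO_FLUSH"; // placeholder; a real call returns the distilled notes
}

async function maybeFlush(session: string[]): Promise<void> {
  // 1. Only trigger once the session crosses the soft threshold.
  if (countTokens(session) < SOFT_THRESHOLD_TOKENS) return;

  // 2. Distill the session with the flush prompt.
  const distilled = await distillSession(session);

  // 3./4. Skip on NO_FLUSH, otherwise append to today's memory file.
  if (distilled.trim() === "NO_FLUSH") return;
  const today = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  await appendFile(`memory/${today}.md`, distilled + "\n");
}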

⚠️
The flush prompt matters. It tells the agent what to remember. Focus on decisions, state changes, and lessons — not routine exchanges. A bad flush prompt creates memory files full of noise that pollute future context.
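
For a sense of what a good flush produces, here's a hypothetical daily memory file. The entries are invented for illustration, but the section headings follow the flush prompt above.

memory/YYYY-MM-DD.md (hypothetical flush output)
## Decisions
- Switched memory search embeddings to text-embedding-3-small to keep search cheap.
## State changes
- contextPruning moved to cache-ttl mode with a 6h TTL.
## Lessons
- Flushing at 40k tokens keeps sessions responsive without losing history.
## Blockers
- None.

Short, factual entries like these are what memory search retrieves later; transcripts of routine exchanges are exactly the noise the warning above is about.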

Complete Memory Configuration

Here's how all three pieces fit together in your openclaw.json:

openclaw.json — Full Memory Block
{
  "memorySearch": {
    "sources": ["memory", "sessions"],
    "experimental": { "sessionMemory": true },
    "provider": "openai",
    "model": "text-embedding-3-small"
  },
  "contextPruning": {
    "mode": "cache-ttl",
    "ttl": "6h",
    "keepLastAssistants": 3
  },
  "compaction": {
    "mode": "default",
    "memoryFlush": {
      "enabled": true,
      "softThresholdTokens": 40000,
      "prompt": "Distill this session to memory/YYYY-MM-DD.md. Focus on decisions, state changes, lessons, blockers. If nothing worth storing: NO_FLUSH",
      "systemPrompt": "Extract only what is worth remembering. No fluff."
    }
  }
}

Why this works

🔍 Search: cheap embeddings find past context without burning premium tokens.
✂️ Prune: cache-TTL drops old messages so you don't re-process stale context.
📦 Compact: auto-flush distills long sessions into searchable daily memory files.

Together these three settings create a memory system that's cheap, reliable, and doesn't lose important context. The agent remembers what matters, forgets what doesn't, and the whole thing runs on a fraction of what premium models would cost.

For more memory-related configuration options, see the Config Reference chapter or the full JSON configuration guide.