How to Fix AI Memory Loss: Building a Skill-Specific System for Hermes Agent

Love Hermes Agent but find it forgets what you told it yesterday? Here's how a Just-in-Time, skill-based memory system keeps your agent fast, sharp, and focused.

Do you love Hermes Agent but still find that it sometimes forgets what you told it yesterday, or your preferences about certain things?

The Tiny Brain Problem

Hermes Agent doesn't have a massive, sprawling brain. By default, its "active" prompt memory is strictly capped at a combined total of roughly 1,300 tokens, split between two core files: MEMORY.md and USER.md.

Why so small? It's intentional.

Keeping this "hot" memory tiny keeps the system fast and cost-effective, and it forces the agent to be selective about what it keeps. Technically, you could raise the limit, but you shouldn't: a "big brain" mostly adds noise. When the prompt gets too crowded, the agent loses focus, struggles to follow complex instructions, and becomes more prone to hallucinating. The goal isn't a bigger brain; it's a cleaner one.
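If you want to see how close your own core files are to that cap, a rough estimate is enough. The helper below is a minimal, hypothetical sketch, not part of Hermes: it assumes the two files are MEMORY.md and USER.md in one directory, uses a heuristic of about 4 characters per token (a real tokenizer will count differently), and treats the ~1,300-token figure above as the budget.

```python
import tempfile
from pathlib import Path

# Assumptions (not Hermes internals): ~4 characters per token for English
# text, and a combined budget of ~1,300 tokens for the two core files.
CHARS_PER_TOKEN = 4
HOT_MEMORY_BUDGET_TOKENS = 1_300


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ceiling of character count / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def hot_memory_report(memory_dir: Path, names=("MEMORY.md", "USER.md")) -> dict:
    """Estimate tokens per core file and flag whether the total fits the budget."""
    per_file = {}
    for name in names:
        path = memory_dir / name
        text = path.read_text(encoding="utf-8") if path.exists() else ""
        per_file[name] = estimate_tokens(text)
    total = sum(per_file.values())
    return {
        "per_file": per_file,
        "total": total,
        "within_budget": total <= HOT_MEMORY_BUDGET_TOKENS,
    }


if __name__ == "__main__":
    # Demo against throwaway files so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "MEMORY.md").write_text("- Project uses AWS.\n", encoding="utf-8")
        (d / "USER.md").write_text("- Senior developer; no fluff.\n", encoding="utf-8")
        print(hot_memory_report(d))
```

The point isn't precision; it's noticing when the always-on files start to creep toward the cap.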

Databases Aren't for Thinking

People see that Hermes uses a SQLite database (state.db) and assume it's the agent's memory. It's not.

In Hermes, the SQLite database is a structured archive, organized into three main tables:

Sessions: Metadata like IDs, titles, and timestamps.

Messages: The raw, word-for-word history of every interaction.

Messages_FTS: A full-text search (FTS) index over those messages, so the agent can look up old conversations by keyword.

This database is built for retrieval, not reasoning. It's like a library. The agent can go there to find a specific book, but it can't "hold" the entire library in its head while it's trying to write code for you. Confusing storage with active memory is the fastest way to break an agent's logic.
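To make "retrieval, not reasoning" concrete, here is a minimal sketch of a keyword lookup against an archive shaped like the one described above. The schema is a hypothetical stand-in, not Hermes's actual state.db layout, and the example assumes your Python's SQLite build includes the FTS5 extension (most standard builds do). The agent gets back only the few matching rows, never the whole database.

```python
import sqlite3

# Hypothetical miniature of an archive like state.db. Table and column names
# are illustrative only; they are not Hermes's real schema.
SCHEMA = """
CREATE TABLE sessions (id TEXT PRIMARY KEY, title TEXT, started_at TEXT);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT REFERENCES sessions(id),
    role TEXT,
    content TEXT
);
-- FTS5 index; each row's rowid is set to the matching messages.id
CREATE VIRTUAL TABLE messages_fts USING fts5(content);
"""


def build_archive(messages):
    """Create an in-memory archive and index each message for full-text search."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for session_id in {m[1] for m in messages}:
        conn.execute("INSERT INTO sessions (id) VALUES (?)", (session_id,))
    for msg_id, session_id, role, content in messages:
        conn.execute("INSERT INTO messages VALUES (?, ?, ?, ?)",
                     (msg_id, session_id, role, content))
        conn.execute("INSERT INTO messages_fts (rowid, content) VALUES (?, ?)",
                     (msg_id, content))
    conn.commit()
    return conn


def search(conn, query, limit=5):
    """Return (session_id, content) for the top keyword matches, best first."""
    return conn.execute(
        """
        SELECT m.session_id, m.content
        FROM (
            SELECT rowid AS msg_id, rank AS score
            FROM messages_fts
            WHERE messages_fts MATCH ?
            ORDER BY rank
            LIMIT ?
        ) AS hit
        JOIN messages AS m ON m.id = hit.msg_id
        ORDER BY hit.score
        """,
        (query, limit),
    ).fetchall()
```

A call like `search(conn, "pytest")` returns a handful of rows that can be quoted into the prompt; everything else stays on disk.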

The Just-in-Time Fix

To fix the clutter, we taught our agent to use a "Just-in-Time" memory system. We moved the heavy lifting out of the core prompt and into a file-based skill system.

We asked the agent to create a dedicated skill_memory folder. For the tasks we do most, like Python development or content editing, it creates a task-specific Markdown file (e.g., coding_memory.md) that stores the how-to guidelines and technical preferences for that task.

Now, when Hermes launches a specific skill, it's set up to check that folder first. It pulls only the relevant Markdown file into its active context and ignores the rest. In our setup, that freed up about 80% of the active memory space right away, and the agent became noticeably lighter, faster, and more focused.
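Here is a minimal sketch of that selection step, assuming the skill_memory/<skill>_memory.md naming from the coding_memory.md example. The function names are hypothetical and this is not how Hermes implements skill loading internally; it only illustrates the rule: core memory always, plus exactly one skill file.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: skill_memory/<skill>_memory.md (e.g. coding_memory.md).
SKILL_MEMORY_DIR = Path("skill_memory")


def load_skill_memory(skill: str, memory_dir: Path = SKILL_MEMORY_DIR) -> str:
    """Return one skill's notes, or an empty string if that file doesn't exist yet."""
    path = memory_dir / f"{skill}_memory.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""


def build_context(core_memory: str, skill: str,
                  memory_dir: Path = SKILL_MEMORY_DIR) -> str:
    """Always include core memory; add only the active skill's file, never the others."""
    parts = [core_memory.strip()]
    notes = load_skill_memory(skill, memory_dir).strip()
    if notes:
        parts.append(f"## {skill} memory\n{notes}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "coding_memory.md").write_text("- Use pytest, not unittest.\n", encoding="utf-8")
        (d / "editing_memory.md").write_text("- Prefer short paragraphs.\n", encoding="utf-8")
        # Only coding_memory.md is pulled in; editing_memory.md stays on disk.
        print(build_context("User prefers no-fluff answers.", "coding", d))
```

Returning an empty string for a missing file means a brand-new skill simply runs on core memory until it has notes of its own.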

Guarding the Golden Facts

If the skills handle the heavy lifting, what stays in the core "active" memory? We call these the Golden Facts. These live in the USER.md and MEMORY.md files and are injected into every single prompt.

These should be high-level, "permanent" truths that never change, regardless of the task:

Identity: "The user is a Senior Developer who prefers technical, no-fluff explanations."

Global Context: "Always assume we are working within the AWS ecosystem unless stated otherwise."

Safety & Style: "Never take destructive actions (like deleting files) without an explicit 'yes' in the chat."

Communication: "Use 'American English' and avoid emojis."

Everything else? It's noise. If it isn't a Golden Fact, it belongs in a skill file. Efficiency isn't about how much the agent remembers—it's about how much it can afford to forget.
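That routing rule is simple enough to sketch. The helper below is hypothetical: the caller decides whether a fact is a Golden Fact (no skill) or skill-specific, and the character limit on the core file is an assumed stand-in for the real cap, not a Hermes setting. Golden Facts go to the core file only while it has room; everything else lands in the matching skill file.

```python
import tempfile
from pathlib import Path
from typing import Optional


def remember(fact: str, root: Path, skill: Optional[str] = None,
             core_file: str = "MEMORY.md", core_char_limit: int = 5_200) -> Path:
    """Append a fact as a Markdown bullet and return the file it was written to.

    skill=None marks a Golden Fact for the core file; otherwise the fact goes to
    root/skill_memory/<skill>_memory.md. core_char_limit is a hypothetical cap
    (~1,300 tokens at an assumed ~4 characters per token).
    """
    line = f"- {fact.strip()}\n"
    if skill is None:
        target = root / core_file
        current = target.read_text(encoding="utf-8") if target.exists() else ""
        if len(current) + len(line) > core_char_limit:
            # Refuse rather than silently bloat the always-on prompt.
            raise ValueError(f"{core_file} is full; demote something to a skill file")
    else:
        target = root / "skill_memory" / f"{skill}_memory.md"
        target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as f:
        f.write(line)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(remember("Never delete files without an explicit 'yes' in the chat.", root))
        print(remember("Use type hints in all new Python code.", root, skill="coding"))
```

Raising an error when the core file is full forces the same decision this article argues for: promote only what is truly global, and demote the rest.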