Sunday, April 12, 2026
newsaiworld

Stop Treating AI Memory Like a Search Problem

by Admin
April 12, 2026
in Machine Learning



, my AI assistant saved a memory with an importance score of 8/10. Content: “Investigating Bun.js as a potential runtime switch.”

I never actually switched to Bun. To be fair, it was a two-day curiosity that went nowhere. But this memory persisted for six months, popping up whenever I asked about my build process and quietly pushing the AI towards Bun-flavoured answers, with confidence.

There was nothing wrong with the system; it was doing exactly what it was supposed to do. That was the problem.

Here's the failure mode nobody talks about when building AI memory systems. You get it working properly. It remembers things, retrieves things, all the good stuff. And for a while, the AI seems clever.

Then you actually start using it.

Memories pile up. Decisions get reversed. Preferences shift. The system doesn't notice.

You casually mention something in January, and it gets saved with high importance.

Cool.

By April, the AI treats it like a current fact. And sometimes, it takes a while to realise you've been working from stale information.

A system that remembers everything doesn't have a memory. It has an archive. And an archive that grows without hygiene quickly becomes messier than having no memory at all.

Nick Lawson wrote a great piece here on TDS describing how he implemented exactly that. You'll want to read it; the storage/retrieval architecture is really good.

But there's a problem with this kind of system: what happens to memories as they age?

When should they die?

Which memories are more reliable than others?

How many overlapping memories should be combined into one?

That's what this article is about. Not storing, not retrieving, but what happens in between.

I'll cover enough of the base layer to follow along, even if you haven't read Nick's piece. But the new ground starts where his article ends.

Let's get into it.

The Problem With “Store and Retrieve”

Most memory systems assume a two-step process. Write. Read. Checkmate.

Sure, that's fine if you're building a filing cabinet. Not if you're trying to build an assistant you can rely on for months.

What does that look like?

The memory you wrote in week one stays in week eight just as fresh and high-priority as the day you made it, even though the decision it records was reversed two weeks ago.

The other memory, the one that contradicts your earlier decision, was filed away casually and simply never had time to become a priority, because it hasn't received nearly enough accesses to push itself up the queue.

And so, without hesitation, your assistant pulls up a decision you unmade. It's not until the third attempt that you finally catch on to the pattern: your assistant has been relying on obsolete information the whole time.

The problem isn't remembering, it's failing to let go.

A comparison between a standard append-only archive and a lifecycle memory system that actively manages superseded information. Image by author.

The difference I wanted to build: an approach to memory that works like a brain, not like a database. Memory decays. It gets superseded.

Some memories aren't very reliable from the start. Others expire after a certain period. The brain manages all of this automatically, without you doing anything. That was my target.

The Foundation (Brief, I Promise)

Let's do a quick context check.

Rather than embedding your memories and running cosine-similarity searches, you keep them in plain text inside a SQLite database, which the LLM consults via a concise index on every request.

There's no need for an embedding step, a third-party API, or extra files. The LLM's language understanding does the retrieval work. It seems too simple. But it actually performs surprisingly well at a personal scale.

My schema builds on top of that with lifecycle fields:

# memory_store.py
import sqlite3
import json
from datetime import datetime
from pathlib import Path
from contextlib import contextmanager

DB_PATH = Path("agent_memory.db")

@contextmanager
def _db():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row
    try:
        yield conn
    finally:
        conn.close()

def init_db():
    with _db() as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS memories (
                id              INTEGER PRIMARY KEY AUTOINCREMENT,
                content         TEXT NOT NULL,
                summary         TEXT,
                tags            TEXT DEFAULT '[]',

                -- Lifecycle fields — this is what this article adds
                importance      REAL DEFAULT 5.0,
                confidence      REAL DEFAULT 1.0,
                access_count    INTEGER DEFAULT 0,
                decay_score     REAL DEFAULT 1.0,
                status          TEXT DEFAULT 'active',
                contradicted_by INTEGER REFERENCES memories(id),

                created_at      TEXT NOT NULL,
                last_accessed   TEXT,
                expires_at      TEXT
            )
        """)
        conn.execute("""
            CREATE TABLE IF NOT EXISTS memory_events (
                id          INTEGER PRIMARY KEY AUTOINCREMENT,
                memory_id   INTEGER REFERENCES memories(id),
                event_type  TEXT NOT NULL,
                detail      TEXT,
                occurred_at TEXT NOT NULL
            )
        """)
        conn.commit()

def store_memory(content: str, summary: str = None, tags: list[str] = None,
                 importance: float = 5.0, confidence: float = 1.0) -> int:
    with _db() as conn:
        cur = conn.execute("""
            INSERT INTO memories
                (content, summary, tags, importance, confidence, created_at)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (
            content,
            summary or content[:120],
            json.dumps(tags or []),
            importance,
            confidence,
            datetime.now().isoformat()
        ))
        conn.commit()
        return cur.lastrowid

def log_event(memory_id: int, event_type: str, detail: str = ""):
    # Pulled this out of every module that needed it — was copy-pasting
    # the same INSERT four times. Classic.
    with _db() as conn:
        conn.execute("""
            INSERT INTO memory_events (memory_id, event_type, detail, occurred_at)
            VALUES (?, ?, ?, ?)
        """, (memory_id, event_type, detail, datetime.now().isoformat()))
        conn.commit()

init_db()

The interesting columns are the ones you don't see in a standard memory schema: confidence, decay_score, status, contradicted_by, expires_at. Each answers a question about a memory's health that “does it exist?” can't.
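To make that concrete, here's a minimal sketch of reading those columns back as a quick health report. The in-memory table, the rows, and the `health_report` helper are invented for illustration; the rank formula is my assumption, chosen to mirror the importance × confidence × decay product the retrieval query sorts by later.

```python
import sqlite3

# In-memory stand-in for the schema above, with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("""
    CREATE TABLE memories (
        id INTEGER PRIMARY KEY, summary TEXT,
        importance REAL, confidence REAL, decay_score REAL,
        status TEXT, contradicted_by INTEGER
    )
""")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Uses PostgreSQL",              8.0, 1.0, 0.2, "superseded", 3),
        (2, "Prefers short function names", 6.0, 0.7, 0.9, "active",     None),
        (3, "Migrated to MySQL",            7.0, 0.9, 1.0, "active",     None),
    ],
)

def health_report(conn) -> list[str]:
    """One line per memory: status, effective rank, and what replaced it."""
    rows = conn.execute("""
        SELECT id, summary, status, contradicted_by,
               importance * confidence * decay_score AS rank
        FROM memories ORDER BY rank DESC
    """).fetchall()
    return [
        f"#{r['id']} [{r['status']}] rank={r['rank']:.1f} {r['summary']}"
        + (f" (superseded by #{r['contradicted_by']})" if r["contradicted_by"] else "")
        for r in rows
    ]

for line in health_report(conn):
    print(line)
```

The superseded PostgreSQL memory is still there for auditing, but its decayed score pushes it to the bottom of the report.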

Memory Decay

The first problem is pretty simple: old memories don't tidy themselves.

Each memory in the database gets a decay_score between 0 and 1. It starts at 1.0 at creation and decays over time, depending on how long ago the memory was last accessed.

Memories you keep referencing stay fresh, while memories that aren't consulted for a few months fade towards zero.

Once they fall below the relevance threshold, they're archived, not deleted, because fading away doesn't mean they were wrong, just no longer useful.

# decay.py
import math
from datetime import datetime
from memory_store import _db, log_event

HALF_LIFE_DAYS = 30  # tune this — 30 works well for conversational memory,
                     # push it to 90+ if you're tracking long-running projects

def _decay_score(last_accessed: str | None, created_at: str, access_count: int) -> float:
    ref = last_accessed or created_at
    days_idle = (datetime.now() - datetime.fromisoformat(ref)).days

    # Standard exponential decay: e^(-ln2 * t / half_life)
    # (In practice, the score halves every HALF_LIFE_DAYS.)
    score = math.exp(-0.693 * days_idle / HALF_LIFE_DAYS)

    # Frequently accessed memories earn a small freshness bonus.
    # Cap at 1.0 — this isn't meant to inflate past fresh.
    return min(1.0, score + min(0.3, access_count * 0.03))

def run_decay_pass():
    """Run daily. Updates scores, archives anything below 0.1."""
    with _db() as conn:
        rows = conn.execute("""
            SELECT id, created_at, last_accessed, access_count
            FROM memories WHERE status = 'active'
        """).fetchall()

        # Score each row once, then split into the two batches.
        scored = [(r["id"], _decay_score(r["last_accessed"], r["created_at"], r["access_count"]))
                  for r in rows]
        to_archive = [(mid,) for mid, score in scored if score < 0.1]
        to_update  = [(score, mid) for mid, score in scored if score >= 0.1]

        if to_archive:
            conn.executemany(
                "UPDATE memories SET status='archived', decay_score=0.0 WHERE id=?",
                to_archive
            )
        if to_update:
            conn.executemany(
                "UPDATE memories SET decay_score=? WHERE id=?",
                to_update
            )
        conn.commit()

    for (mid,) in to_archive:
        log_event(mid, "archived", "decay below threshold")

    print(f"Decay pass: {len(to_update)} updated, {len(to_archive)} archived.")

HALF_LIFE_DAYS lives at module level because it's the number you'll most likely want to change, and defaults buried in function signatures live somewhere in limbo.

The batched executemany instead of a looping execute matters once you've accumulated a few hundred memories. SQLite is fast, but not “500 individual commits in a daily cron job” fast.

This is also what would have caught the Bun.js issue from the intro of this post. My forgotten memory would have faded away within two months, without me even having to delete it.
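As a quick sanity check on the curve, here's a standalone sketch of the same formula; this `decay_score` takes the idle-day count directly instead of timestamps, and the exact day a memory fades depends on HALF_LIFE_DAYS and the access bonus:

```python
import math

HALF_LIFE_DAYS = 30

def decay_score(days_idle: int, access_count: int = 0) -> float:
    # Same shape as _decay_score above, with idle time passed in directly.
    score = math.exp(-0.693 * days_idle / HALF_LIFE_DAYS)
    return min(1.0, score + min(0.3, access_count * 0.03))

# An untouched memory halves every 30 days: 1.0 -> ~0.5 -> ~0.25 -> ~0.125
for d in (0, 30, 60, 90):
    print(d, round(decay_score(d), 3))

# A memory accessed ten times keeps a +0.3 freshness bonus on top.
print(90, round(decay_score(90, access_count=10), 3))
```

By around day 90, an unreferenced memory has dropped below the 0.15 freshness floor the retrieval query uses, so it vanishes from the index well before the 0.1 archive threshold removes it entirely.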

Contradiction Detection

This is the part nobody builds, and the one that causes the most damage when it's missing.

Take this scenario: you tell the AI that you're using PostgreSQL. Three months later, you migrate to MySQL, mentioning it briefly in conversation.

Now you have fourteen memories related to PostgreSQL with high importance, while your single memory involving MySQL has low importance.

So when you ask about your database setup six months from now, the AI confidently says “you're using PostgreSQL,” and you spend ten minutes confused before you realise what's happening.

I ran into this myself. I'd stopped using poetry and started using uv as my dependency manager. I mentioned it once, without triggering a high importance score, and then spent a week wondering why the assistant kept suggesting poetry commands. The old memory wasn't wrong; it just hadn't been superseded.

The fix: when a new memory is created, check whether it contradicts anything already stored, and actively mark the older ones as superseded.

# contradiction.py
import json
from openai import OpenAI
from memory_store import _db, log_event

client = OpenAI()

def _build_index(exclude_id: int) -> str:
    with _db() as conn:
        rows = conn.execute("""
            SELECT id, summary FROM memories
            WHERE status = 'active' AND id != ?
            ORDER BY importance DESC, created_at DESC
            LIMIT 80
        """, (exclude_id,)).fetchall()
    return "\n".join(f"[{r['id']}] {r['summary']}" for r in rows)

def check_for_contradictions(new_content: str, new_id: int) -> list[int]:
    """
    Call immediately after storing a new memory.
    Returns IDs of memories now superseded by the new one.
    """
    index = _build_index(exclude_id=new_id)
    if not index:
        return []

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": f"""A new memory was just stored:
"{new_content}"

Which of these existing memories does it directly contradict or supersede?
A contradiction means the new info makes the old one factually wrong or outdated.

NOT contradictions:
- "User likes Python" vs "User also uses JavaScript" (additive, not contradictory)
- "Working on study tracker" vs "Added auth to study tracker" (same project, progression)

CONTRADICTIONS:
- "Uses PostgreSQL" vs "Migrated to MySQL" (one replaces the other)
- "Deadline is March 15" vs "Deadline pushed to April 1" (superseded)

EXISTING MEMORIES:
{index}

JSON array of IDs only. [] if none."""}]
    )

    raw = resp.choices[0].message.content.strip()
    try:
        old_ids = json.loads(raw)
        if not isinstance(old_ids, list):
            return []
    except json.JSONDecodeError:
        return []

    if not old_ids:
        return []

    with _db() as conn:
        conn.executemany("""
            UPDATE memories
            SET status = 'superseded', contradicted_by = ?
            WHERE id = ? AND status = 'active'
        """, [(new_id, oid) for oid in old_ids])
        conn.commit()

    for oid in old_ids:
        log_event(oid, "superseded", f"by #{new_id}: {new_content[:100]}")

    return old_ids

The contradicted_by column deserves an extra mention. When a memory is superseded by a newer one, it isn't simply deleted. Instead, it gets a reference to its replacement, so you can trace between the original memory and the updated one when needed.

If you're debugging why the AI said something weird, you can pull up the memory it used and trace its history through memory_events. It turns out “why does the AI think this?” is a question you ask more often than you'd expect.

As for the 80-memory limit in the contradiction check, it's a reasonable cap, since you don't need every memory available to find conflicts. The memories most likely to contradict the new one are recent and highly important anyway, which is exactly what the ORDER BY surfaces.

Confidence Scoring

Two memories can be about the same fact. In one case, the claim is explicit: “I use FastAPI, always have.” In the other, it was inferred (“they seem to favour async frameworks”). These shouldn't be weighted equally.

Confidence scores are what help the system distinguish between what you told it and what it inferred about you. It starts at assessment time, the moment a memory is stored, with one small LLM call:

# confidence.py
from openai import OpenAI
from memory_store import _db, log_event
from datetime import datetime

client = OpenAI()

def assess_confidence(content: str, user_msg: str, assistant_msg: str) -> float:
    """
    Synchronous LLM call in the write path. Adds ~200ms.
    Worth it for memories that'll influence responses for months.
    """
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": f"""Rate confidence in this memory (0.0-1.0):

MEMORY: {content}

FROM THIS EXCHANGE:
User: {user_msg}
Assistant: {assistant_msg}

Scale:
1.0 = explicit, direct statement ("I use Python", "deadline is March 15")
0.7 = clearly implied but not stated outright
0.5 = reasonable inference, could be wrong
0.3 = weak inference — user might disagree
0.1 = speculation

Single float only."""}]
    )

    try:
        return max(0.0, min(1.0, float(resp.choices[0].message.content.strip())))
    except ValueError:
        return 0.5


def reinforce(memory_id: int, bump: float = 0.1):
    """
    Bump confidence when a later conversation confirms something the system already knew.

    TODO: I haven't wired up the detection that triggers this yet —
    figuring out "this new conversation confirms memory X" is harder than it sounds.
    The function works, the caller doesn't exist. Will update when I have something
    that doesn't produce too many false positives.
    """
    with _db() as conn:
        conn.execute("""
            UPDATE memories
            SET confidence    = MIN(1.0, confidence + ?),
                access_count  = access_count + 1,
                last_accessed = ?
            WHERE id = ?
        """, (bump, datetime.now().isoformat(), memory_id))
        conn.commit()
    log_event(memory_id, "reinforced", f"+{bump:.2f}")

The reinforce function is partially complete, and I'm being upfront about that.

The logic for detecting “this conversation confirms an existing memory” is genuinely hard to get right without producing false positives, and I'd rather ship honest, incomplete code than confident code that quietly does the wrong thing. It's in there, it works, the trigger just doesn't exist yet.

Confidence directly influences retrieval sorting. A memory rated 8 for importance but only 0.3 for confidence ranks behind a memory with importance 6 and confidence 0.9.

That's exactly the idea. High confidence in a weaker memory beats low confidence in a strong-seeming one when the question is “what does the AI actually know?”
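The arithmetic behind that example, as a tiny sketch; the product mirrors the importance × confidence × decay ordering the retrieval query uses, with decay held at 1.0 here:

```python
def rank(importance: float, confidence: float, decay: float = 1.0) -> float:
    # Effective retrieval rank: all three signals multiply.
    return importance * confidence * decay

strong_but_guessed = rank(8, 0.3)  # high importance, weak evidence: ~2.4
modest_but_stated  = rank(6, 0.9)  # lower importance, explicit statement: ~5.4

print(strong_but_guessed < modest_but_stated)  # the explicit memory wins
```

Multiplying rather than adding means any one weak signal drags the whole rank down, so a confidently-stored but stale memory can't coast on importance alone.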

Compression and Elevation

Nick's consolidation agent looks for similarities across memories. What I want to do is more aggressive: find groups of memories that are mostly repeating themselves across conversations, and replace them with one better entry.

Not “what connects these?” but “can I replace these five with one?”

In other words, you're not grouping memories, you're rewriting them into a cleaner version of the truth.

After a few months of working with a personal assistant, you accumulate quite a few duplicate memories. “User prefers short function names” from January. “User mentioned keeping code readable over clever” from February. “User asked to avoid one-liners in the refactor” from March.

That's the same preference. It should be rolled up into a single memory.

# compression.py
import json
from openai import OpenAI
from memory_store import _db, log_event
from datetime import datetime

client = OpenAI()

def run_compression_pass():
    """
    Full compression cycle: find clusters, merge each, archive the originals.
    Runs weekly. Calls gpt-4o for synthesis so it isn't cheap — don't
    trigger this on every session.
    """
    with _db() as conn:
        rows = conn.execute("""
            SELECT id, summary, confidence, access_count, importance
            FROM memories
            WHERE status = 'active'
            ORDER BY importance DESC, access_count DESC
            LIMIT 100
        """).fetchall()

    if len(rows) < 5:
        return

    index = "\n".join(
        f"[{r['id']}] (conf:{r['confidence']:.1f} hits:{r['access_count']}) {r['summary']}"
        for r in rows
    )

    # gpt-4o-mini for cluster identification — just grouping, not synthesising
    cluster_resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": f"""Review this memory index and identify groups that
could be merged into a single, more useful memory.

Merge candidates:
- Multiple memories about the same topic from different conversations
- Incremental updates that could be expressed as one current state
- Related preferences that form a clear pattern

Do NOT merge:
- Different topics that share a tag
- Memories where each individual detail matters separately

MEMORY INDEX:
{index}

JSON array of arrays. Example: [[3,7,12],[5,9]]
Return [] if nothing qualifies."""}]
    )

    try:
        clusters = json.loads(cluster_resp.choices[0].message.content.strip())
        clusters = [c for c in clusters if isinstance(c, list) and len(c) >= 2]
    except (json.JSONDecodeError, TypeError):
        return

    if not clusters:
        return

    row_map = {r["id"]: r for r in rows}
    for cluster_ids in clusters:
        valid = [mid for mid in cluster_ids if mid in row_map]
        if len(valid) >= 2:
            _compress(valid, row_map)


def _compress(memory_ids: list[int], row_map: dict):
    """Synthesise a cluster into one elevated memory, archive the rest."""
    with _db() as conn:
        ph = ",".join("?" * len(memory_ids))
        source_rows = conn.execute(
            f"SELECT id, content, importance, access_count FROM memories WHERE id IN ({ph})",
            memory_ids
        ).fetchall()

    if not source_rows:
        return

    bullets        = "\n".join(f"- {r['content']}" for r in source_rows)
    avg_importance = sum(r["importance"] for r in source_rows) / len(source_rows)
    peak_access    = max(r["access_count"] for r in source_rows)

    # gpt-4o for the actual merge — this is the step that decides
    # what survives, so use the better model
    synth_resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user", "content": f"""Compress these related memories into one better memory.
Be specific. Keep all important details. Don't repeat yourself.

MEMORIES:
{bullets}

JSON: {{"content": "...", "summary": "max 120 chars", "tags": ["..."]}}"""}]
    )

    try:
        merged = json.loads(synth_resp.choices[0].message.content.strip())
    except json.JSONDecodeError:
        return  # synthesis failed, leave the originals alone

    with _db() as conn:
        ph = ",".join("?" * len(memory_ids))
        conn.execute(
            f"UPDATE memories SET status='compressed' WHERE id IN ({ph})",
            memory_ids
        )
        cur = conn.execute("""
            INSERT INTO memories
                (content, summary, tags, importance, confidence,
                 access_count, decay_score, status, created_at)
            VALUES (?, ?, ?, ?, 0.85, ?, 1.0, 'active', ?)
        """, (
            merged["content"],
            merged.get("summary", merged["content"][:120]),
            json.dumps(merged.get("tags", [])),
            min(10.0, avg_importance * 1.2),
            peak_access,
            datetime.now().isoformat()
        ))
        conn.commit()
        new_id = cur.lastrowid

    for mid in memory_ids:
        log_event(mid, "compressed", f"merged into #{new_id}")

    print(f"[compression] {len(memory_ids)} memories collapsed into #{new_id}")

Cluster identification uses gpt-4o-mini, since grouping is all we're doing at that point. Synthesis uses gpt-4o, because that's where actual knowledge is being created from multiple sources.

Doing both with the cheap model to save a few cents felt like the wrong trade-off for something that runs once a week and makes permanent decisions.

The merged memory gets confidence=0.85. Definitely not 1.0, since compression is still a synthesis step, which can lose nuance. But 0.85 acknowledges the strong signal of multiple converging conversations.

Expiring Memories

Some things shouldn't last forever, by design. A deadline. A temporary blocker. “Waiting to hear back from Alice about the API spec.” That's useful context today. In three weeks, it's just noise.

# expiry.py
import json
from openai import OpenAI
from memory_store import _db, log_event
from datetime import datetime

client = OpenAI()

def maybe_set_expiry(content: str, memory_id: int):
    """Check at write time whether this memory has a natural end date."""
    today = datetime.now().strftime("%Y-%m-%d")

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": f"""Does this memory have a natural expiration?

MEMORY: "{content}"
TODAY: {today}

Expires if it contains:
- A deadline or specific due date
- A temporary state ("currently blocked on...", "waiting for...")
- A one-time event ("meeting Thursday", "presenting tomorrow")
- An explicit time bound ("this sprint", "until we ship v2")

If yes: {{"expires": true, "date": "YYYY-MM-DD"}}
If no:  {{"expires": false}}

JSON only."""}]
    )

    try:
        parsed = json.loads(resp.choices[0].message.content.strip())
    except json.JSONDecodeError:
        return

    if parsed.get("expires") and parsed.get("date"):
        with _db() as conn:
            conn.execute(
                "UPDATE memories SET expires_at=? WHERE id=?",
                (parsed["date"], memory_id)
            )
            conn.commit()


def purge_expired():
    """Archive anything past its expiry date. Safe to call daily."""
    now = datetime.now().isoformat()

    with _db() as conn:
        expired = [
            r["id"] for r in conn.execute("""
                SELECT id FROM memories
                WHERE expires_at IS NOT NULL
                  AND expires_at < ?
                  AND status = 'active'
            """, (now,)).fetchall()
        ]
        if expired:
            conn.executemany(
                "UPDATE memories SET status='expired' WHERE id=?",
                [(mid,) for mid in expired]
            )
            conn.commit()

    # Log events after closing the write connection.
    # log_event opens its own connection — nesting them on the same
    # SQLite file can deadlock in default journal mode.
    for mid in expired:
        log_event(mid, "expired", "past expiry date")

    if expired:
        print(f"Expired {len(expired)} memories.")

The reason field that was in an earlier version of this got cut. It was satisfying to model, but nothing ever read it. Unused columns in SQLite are still columns you have to remember exist. The date string is enough.

Wiring It Together

The overall architecture: a fast, synchronous write path, separated from the asynchronous background lifecycle scheduler. Image by author.

The background passes need a scheduler. Here's the coordinator, with the threading done properly:

# lifecycle.py
import threading
from datetime import datetime, timedelta
from decay import run_decay_pass
from expiry import purge_expired
from compression import run_compression_pass

class LifecycleScheduler:
    """
    Background maintenance for the memory store.
    Decay + expiry run daily. Compression runs weekly (calls gpt-4o).

    Usage:
        scheduler = LifecycleScheduler()
        scheduler.start()      # once at startup
        scheduler.force_run()  # for testing
        scheduler.stop()       # clean shutdown
    """

    def __init__(self, decay_interval_h: int = 23, compression_interval_days: int = 6):
        self._decay_interval    = timedelta(hours=decay_interval_h)
        self._compress_interval = timedelta(days=compression_interval_days)
        self._last_decay        = None
        self._last_compression  = None
        self._stop_event        = threading.Event()
        self._thread            = None

    def start(self):
        if self._thread and self._thread.is_alive():
            return
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop_event.set()

    def force_run(self):
        self._run(force=True)

    def _loop(self):
        while not self._stop_event.is_set():
            self._run()
            # Wait on the shared stop event so stop() is actually responsive.
            # A fresh threading.Event().wait() inside the loop would create a
            # new Event each iteration that's never set — it looks right and
            # blocks correctly, but stop() never actually wakes it up.
            self._stop_event.wait(timeout=3600)

    def _run(self, force: bool = False):
        now = datetime.now()
        print(f"[lifecycle] {now.strftime('%H:%M:%S')}")

        purge_expired()

        if force or not self._last_decay or (now - self._last_decay) >= self._decay_interval:
            run_decay_pass()
            self._last_decay = now

        if force or not self._last_compression or (now - self._last_compression) >= self._compress_interval:
            run_compression_pass()
            self._last_compression = now

        print("[lifecycle] done.")

And the write path, where contradiction detection, confidence scoring, and expiry all get triggered every time a memory is saved:

# memory_writer.py
import json
from openai import OpenAI
from memory_store import store_memory
from confidence import assess_confidence
from contradiction import check_for_contradictions
from expiry import maybe_set_expiry

client = OpenAI()

def maybe_store(user_msg: str, assistant_msg: str) -> int | None:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": f"""Should this conversation turn be saved to long-term memory?

USER: {user_msg}
ASSISTANT: {assistant_msg}

Save if it contains:
- user preferences or personal context
- project decisions, trade-offs made
- bugs found, fixes applied, approaches ruled out
- explicit instructions ("always...", "never...", "I prefer...")

Don't save: greetings, one-off lookups, generic back-and-forth.

If yes: {{"save": true, "content": "...", "summary": "max 100 chars", "tags": [...], "importance": 1-10}}
If no:  {{"save": false}}
JSON only."""}]
    )

    try:
        decision = json.loads(resp.choices[0].message.content.strip())
    except json.JSONDecodeError:
        return None

    if not decision.get("save"):
        return None

    confidence = assess_confidence(decision["content"], user_msg, assistant_msg)

    mid = store_memory(
        content    = decision["content"],
        summary    = decision.get("summary"),
        tags       = decision.get("tags", []),
        importance = decision.get("importance", 5),
        confidence = confidence
    )

    superseded = check_for_contradictions(decision["content"], mid)
    if superseded:
        print(f"[memory] #{mid} superseded {superseded}")

    maybe_set_expiry(decision["content"], mid)

    return mid

What Retrieval Looks Like Now

With the lifecycle running, the memory index the LLM reads on every query carries an actual signal about each memory's health:

# retrieval.py
import json
from datetime import datetime
from openai import OpenAI
from memory_store import _db

client = OpenAI()

def get_active_memories(limit: int = 60) -> list[dict]:
    with _db() as conn:
        rows = conn.execute("""
            SELECT id, content, summary, tags, importance,
                   confidence, decay_score, access_count, created_at
            FROM memories
            WHERE status = 'active'
              AND decay_score > 0.15
            ORDER BY (importance * confidence * decay_score) DESC
            LIMIT ?
        """, (limit,)).fetchall()
    return [dict(r) for r in rows]

def retrieve_relevant_memories(query: str, top_n: int = 6) -> list[dict]:
    memories = get_active_memories()
    if not memories:
        return []

    index = "\n".join(
        f"[{m['id']}] (conf:{m['confidence']:.1f} fresh:{m['decay_score']:.1f}) {m['summary']}"
        for m in memories
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": f"""Pick the most relevant memories for this message.

MEMORY INDEX (conf=confidence 0-1, fresh=recency 0-1):
{index}

MESSAGE: {query}

Prefer high-conf, high-fresh memories when relevance is otherwise equal.
JSON array of IDs, max {top_n}. Return [] if nothing fits."""}]
    )

    raw = resp.choices[0].message.content.strip()
    try:
        ids = json.loads(raw)
        if not isinstance(ids, list):
            return []
    except json.JSONDecodeError:
        return []

    mem_by_id = {m["id"]: m for m in memories}
    selected  = []
    now       = datetime.now().isoformat()

    with _db() as conn:
        for mid in ids:
            if mid not in mem_by_id:
                continue
            conn.execute("""
                UPDATE memories
                SET access_count = access_count + 1, last_accessed = ?
                WHERE id = ?
            """, (now, mid))
            selected.append(mem_by_id[mid])
        conn.commit()

    return selected

The sort order in get_active_memories is importance * confidence * decay_score. That composite score is where all five lifecycle principles converge into a single number. A memory that's important but poorly supported surfaces below one that's moderately important and consistently reinforced. One that hasn't been touched in three months competes poorly against a recent one, regardless of its original score.

This is what the health of the stored knowledge actually looks like. And that's exactly what we want.

Is This Overkill?

No. But I thought it was, for longer than I'd like to admit.

I kept telling myself I'd add this stuff "later, when the system got bigger." But size isn't the real variable. It's not about how big the system is; it's about how long it's been around. Three months of everyday usage is more than enough.

In my case, I found myself manually fighting decay by the second month: opening the SQLite file in DB Browser, deleting rows, and updating importance scores by hand.

And that's exactly what you should never do: if you're manually cleaning the system, the system isn't really working.

The overhead is real, but it's small. Decay and expiry are pure SQLite, milliseconds. Contradiction detection adds one gpt-4o-mini call per write, maybe 200ms. Compression calls gpt-4o but runs once a week on a handful of clusters.

Overall, the cost for a daily personal assistant is a few extra mini calls per conversation and a weekly synthesis job that probably costs less than a cup of coffee per month.

That said, it depends on your intent. If you're building a system you'll use for two weeks and then repurpose, forget about all of this. Store-and-retrieve is enough. But if you're building something you expect to get to know you, which is what's interesting here, this lifecycle is non-negotiable.


The place This Really Leaves You

Nick Lawson confirmed that the embedding pipeline may be non-obligatory at a private scale. This opened up the potential for an easier structure. What this text offers is the operational framework that makes that structure work past the primary month.

There are other possible principles for a memory lifecycle; decay, contradiction, confidence, compression, and expiry aren't the only options, but they're the ones I kept wishing I had while debugging my own database.

And because each of them relies on the same SQLite data structure and LLM-judgment framework Nick introduced, you're still at zero infrastructure. One local file. You can read all of it. You can trace your entire memory lifecycle through the events in memory_events.

You can open the database and ask: why does the agent think this? What got superseded? What decayed? What got merged into what? The system's reasoning is transparent in a way that a vector index never is.
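As a sketch of what that audit looks like in practice: the query below assumes a memory_events table with (memory_id, event, detail, created_at) columns, which is my guess at a plausible schema, not its documented shape. The in-memory database and sample rows exist only to make the example runnable:

```python
import sqlite3

# Hypothetical event log; the real memory_events schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE memory_events (
    memory_id INTEGER, event TEXT, detail TEXT, created_at TEXT)""")
conn.executemany(
    "INSERT INTO memory_events VALUES (?, ?, ?, ?)",
    [(12, "created",    "importance=8",    "2026-01-05"),
     (12, "superseded", "by memory #41",   "2026-03-02"),
     (41, "created",    "importance=6",    "2026-03-02")])

# "Why does the agent think this?" -> read memory #12's full history.
rows = conn.execute(
    "SELECT event, detail, created_at FROM memory_events "
    "WHERE memory_id = ? ORDER BY created_at", (12,)).fetchall()
for event, detail, when in rows:
    print(f"{when}  {event:<10} {detail}")
```

Every question in the paragraph above reduces to a query like this one: filter the event log by memory id, event type, or date, and read the answer in plain text.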

That matters more than I expected it to. Not just for debugging. For trust. An AI assistant you can audit is one you'll actually trust. And trust is what turns a tool into something you rely on.

And that only happens when your system knows not just how to remember, but when to forget.


Before you go!

I'm building a community for developers and data scientists where I share practical tutorials, break down complex CS concepts, and drop the occasional rant about the tech industry.

If that sounds like your kind of space, join my free newsletter.

Connect With Me
