meetingsmemoryagents

Meeting Transcript to Agent Memory Pipeline

Extract action items and decisions from meeting transcripts, enrich with web context via search API, store as agent-accessible memory. Full pipeline.

May 20, 2026

8 min

Meeting data becomes useful agent memory only after splitting into structured buckets: facts mentioned, decisions made, action items with owners and dates, open questions, and stale context. Raw transcript search is useful but treating every sentence as durable truth breaks agents that act on the information.

The problem with raw transcripts

A 60-minute meeting transcript is 8,000-12,000 tokens. Feeding it directly into an agent's context window wastes capacity and includes irrelevant chatter, tentative statements, and contradicted points. The agent cannot distinguish "we decided to use PostgreSQL" from "maybe we should consider PostgreSQL."

Structured extraction pipeline

Python

def extract_meeting_memory(transcript: str) -> dict:
    """Extract structured memory from meeting transcript."""
    prompt = """Extract from this meeting transcript:
    1. FACTS: Verified statements (not opinions or maybes)
    2. DECISIONS: Things explicitly decided with consensus
    3. ACTION_ITEMS: Tasks with owner and deadline
    4. OPEN_QUESTIONS: Unresolved issues needing follow-up
    5. STALE_CONTEXT: Things mentioned as changing or uncertain

    Format as JSON. Only include items with high confidence.
    Mark anything tentative or debated as OPEN_QUESTIONS, not DECISIONS."""

    result = call_llm(prompt + "\n\n" + transcript)
    return json.loads(result)

# Example output:
# {
#   "facts": ["Q1 revenue was $2.3M", "Current team size is 12"],
#   "decisions": ["Switch to PostgreSQL by March",
#                  "Hire 2 more engineers"],
#   "action_items": [
#     {"task": "PostgreSQL migration plan", "owner": "Sarah",
#      "deadline": "2026-02-15"},
#   ],
#   "open_questions": ["Should we use RDS or self-host?"],
#   "stale_context": ["Budget may change after board meeting"]
# }

Making it queryable for agents

Python

import sqlite3, json

db = sqlite3.connect("meeting_memory.db")
db.execute("""CREATE TABLE IF NOT EXISTS memory (
    id INTEGER PRIMARY KEY, meeting_date TEXT,
    category TEXT, content TEXT, owner TEXT,
    deadline TEXT, confidence REAL)""")

def store_memory(meeting_date: str, extracted: dict):
    for category in ["facts", "decisions", "open_questions"]:
        for item in extracted.get(category, []):
            text = item if isinstance(item, str) else json.dumps(item)
            db.execute(
                "INSERT INTO memory (meeting_date, category, content, confidence) VALUES (?,?,?,?)",
                (meeting_date, category, text, 0.9))
    for item in extracted.get("action_items", []):
        db.execute(
            "INSERT INTO memory (meeting_date, category, content, owner, deadline, confidence) VALUES (?,?,?,?,?,?)",
            (meeting_date, "action_item", item["task"],
             item.get("owner"), item.get("deadline"), 0.95))
    db.commit()

Cross-model portability

A structured meeting memory database outlasts any single model's context window and survives model version upgrades. When the next Claude or GPT version drops and the agent stack needs re-onboarding, the meeting memory is already in a portable format that any model can query.

What not to store

Casual conversation and greetings (noise)
Tentative ideas presented but not agreed on (misleading as facts)
Context that will change by next meeting (set TTL on stale items)

Meeting Transcript to Agent Memory Pipeline

The problem with raw transcripts

Structured extraction pipeline

Making it queryable for agents

Cross-model portability

What not to store

Continue reading

Connect Scavio to Any AI Assistant with MCP

Build a Cross-Platform Product Research Agent with LangGraph