RAG & Memory

Document ingestion, semantic search, and agent memory systems

This guide covers the RAG (Retrieval-Augmented Generation) and Memory components in detail.

RAG — Document Search Pipeline

RAG enables your AI to answer questions based on your own documents.

Setup

gencow add RAG

This creates:

  • gencow/rag.ts — legacy local RAG helpers (ingest, retrieve, ask) plus canonical grounded-answer facades
  • gencow/schema-rag.ts — local rag_documents table for the legacy helpers

Important: Re-export schema-rag.ts from your schema so the rag_documents table is created:

// gencow/schema.ts
export * from "./schema-rag";

Grounded answers use a different corpus path. rag.askGrounded(), rag.compareCorpus(), and rag.extractTopics() read canonical Phase 2 rag_* tables populated through documents.ingest.*; documents inserted by rag.ingest() are only available to rag.retrieve() / rag.ask().

For production RAG and grounded answers, use the built-in canonical pipeline:

storage file -> documents.ingest.start -> rag_* tables -> ctx.search() / ctx.grounding.answer()

The canonical tables are rag_corpora, rag_sources, rag_sections, rag_chunks, rag_ingest_jobs, and rag_operation_metrics. This path is tenant-scoped and is the only path used by grounded citations.

Canonical ingest creates chunk embeddings through the OpenAI-compatible /platform/ai/v1/embeddings route in deployed apps. The runtime no longer uses ctx.ai for document ingest. If no embedding target is configured in local development, chunks are still stored for keyword search and their embedding stays null. If a configured proxy returns a non-JSON response, ingest fails, and the error includes the upstream status, route, and content type.
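The fallback and failure behavior can be sketched as a small pure function. This is an illustration only, not the runtime's implementation: storeChunk, HttpResponse, and StoredChunk are hypothetical names, and the response body is assumed to follow the OpenAI embeddings shape ({ data: [{ embedding }] }). Only the route and the described behavior come from this guide.

```typescript
// Hypothetical sketch: these names and shapes are assumptions, not Gencow APIs.
interface HttpResponse {
    status: number;
    contentType: string;
    body: string;
}

interface StoredChunk {
    text: string;
    embedding: number[] | null; // null means the chunk is keyword-searchable only
}

const EMBEDDINGS_ROUTE = "/platform/ai/v1/embeddings";

// `res` is the raw reply from the embeddings route, or null when no
// embedding target is configured (for example in local development).
function storeChunk(text: string, res: HttpResponse | null): StoredChunk {
    // No target configured: keep the chunk and leave the embedding null.
    if (res === null) return { text, embedding: null };

    // A non-JSON reply fails ingest and reports status, route, and content type.
    if (res.contentType.toLowerCase().indexOf("application/json") === -1) {
        throw new Error(
            `Embedding request failed: status=${res.status} ` +
                `route=${EMBEDDINGS_ROUTE} contentType=${res.contentType}`,
        );
    }

    // Assumed OpenAI-compatible body: { data: [{ embedding: number[] }] }
    const parsed = JSON.parse(res.body) as { data?: { embedding: number[] }[] };
    const first = parsed.data ? parsed.data[0] : undefined;
    if (!first) {
        throw new Error(`Embedding response from ${EMBEDDINGS_ROUTE} contained no data`);
    }
    return { text, embedding: first.embedding };
}
```

Keeping the chunk when no target is configured means keyword search still works locally, while a misconfigured proxy (for instance one returning an HTML error page) fails loudly instead of storing a bad embedding.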

Start Ingest

import { defineMutation, defineQuery, useMutation } from "@gencow/react";

export const api = {
    ragIngest: {
        start: defineMutation("documents.ingest.start"),
        jobs: defineQuery("rag_ingest_jobs.list"),
    },
};

// inside a React component
const { mutate: startIngest } = useMutation(api.ragIngest.start);

async function handleIngest(storageId: string) {
    await startIngest({
        storageId,
        corpus: "manuals",
        visibility: "shared",
        sourceKey: "refund-policy.pdf",
        mode: "auto",
        provider: "auto",
    });
}

Search Canonical Chunks

import { v } from "@gencow/core";
import { procedure } from "./runtime";

export const searchManuals = procedure.query
    .name("manuals.search")
    .input(v.object({ question: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        return ctx.search("rag_chunks", input.question, {
            fields: ["chunk_text", "lexical_text"],
            scope: { corpus: "manuals", visibility: "shared" },
            limit: 10,
        });
    });

Grounded Answer

import { v } from "@gencow/core";
import { procedure } from "./runtime";

export const askManuals = procedure.query
    .name("manuals.ask")
    .input(v.object({ question: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();
        if (!ctx.grounding) throw new Error("Grounding runtime is not available");

        return ctx.grounding.answer({
            question: input.question,
            scope: { corpus: "manuals", visibility: "shared" },
            mode: "qa",
            budget: {
                maxVerifyLoops: 2,
                maxResearchQueriesPerLoop: 3,
                maxCitationsPerClaim: 3,
            },
        });
    });

Operations Surface

import { defineMutation, defineQuery } from "@gencow/react";

export const api = {
    ragOps: {
        summary: defineQuery("rag_operations.summary"),
        metrics: defineQuery("rag_operations.metrics"),
        evaluate: defineMutation("rag_evaluations.run"),
        reindexPlan: defineQuery("rag_reindex.plan"),
    },
};

Use these operations to inspect source/chunk counts, ingest status, grounded-answer metrics, evaluation fixtures, and reindex candidates.

Legacy Local Ingest

rag.ingest(), rag.retrieve(), and rag.ask() are lightweight starter helpers backed by the generated rag_documents table. They are useful for demos and local experiments, but they do not populate canonical rag_* tables.

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { rag } from "./rag";

export const ingestDoc = procedure.mutation
    .name("docs.ingest")
    .input(v.object({ source: v.string(), content: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();
        await rag.ingest(ctx, input.source, input.content);
        return { status: "ingested", source: input.source };
    });

Legacy Search Documents

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { rag } from "./rag";

export const search = procedure.query
    .name("docs.search")
    .input(v.object({ question: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();
        return rag.retrieve(ctx, input.question);
    });

Legacy RAG + AI Q&A Pipeline

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { ai } from "./ai";
import { rag } from "./rag";

export const ask = procedure.mutation
    .name("docs.ask")
    .input(v.object({ question: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        // 1. Retrieve relevant chunks
        const results = await rag.retrieve(ctx, input.question);

        // 2. Build context from results
        const context = results
            .map(r => `[${r.source}]: ${r.content}`)
            .join("\n\n");

        // 3. Ask AI with context
        const result = await ai.chat({
            system: `Answer based on the following context. If the answer is not in the context, say "I don't know."

Context:
${context}`,
            messages: [{ role: "user", content: input.question }],
        });

        return {
            answer: result.text,
            sources: results.map(r => r.source),
        };
    });
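The context-building step of this pipeline is easy to pull out and test on its own. The sketch below assumes only what the handler above already relies on: each retrieved result exposes source and content. The helper names (buildContext, uniqueSources) are hypothetical, and whether rag.retrieve() can return several chunks from the same source is an assumption.

```typescript
// Shape assumed from the handler above: each result has `source` and `content`.
interface RetrievedChunk {
    source: string;
    content: string;
}

// Join chunks into one context string, one labeled chunk per paragraph.
function buildContext(results: RetrievedChunk[]): string {
    return results.map((r) => `[${r.source}]: ${r.content}`).join("\n\n");
}

// Keep the first occurrence of each source so a document cited by
// several chunks is listed once, in retrieval order.
function uniqueSources(results: RetrievedChunk[]): string[] {
    const sources = results.map((r) => r.source);
    return sources.filter((source, index) => sources.indexOf(source) === index);
}
```

With helpers like these, the handler could return uniqueSources(results) instead of one entry per chunk, if duplicate source names are not useful to the caller.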

Legacy Parsers + Reranker

Production file ingest should use documents.ingest.* and workflow document conversion rather than calling legacy parsers directly. See Document Conversion for PDF, HWPX, DOCX, and XLSX routing.

Full document pipeline:

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { parsers } from "./parsers";
import { rag } from "./rag";
import { reranker } from "./reranker";

// Ingest: Parse PDF → chunk → embed → store
export const ingestPdf = procedure.mutation
    .name("docs.ingestPdf")
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        const file = input?.["file"] as File;
        if (!file || typeof file === "string") throw new Error("No file");

        const text = await parsers.pdf(Buffer.from(await file.arrayBuffer()));
        await rag.ingest(ctx, file.name, text);
    });

// Search: Query → search → rerank → top results
export const smartSearch = procedure.query
    .name("docs.smartSearch")
    .input(v.object({ question: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();
        return reranker.searchAndRerank(ctx, rag, input.question);
    });

// Grounded answer: returns answer, claims, citations, warnings, grounded.
// Requires canonical Phase 2 ingestion into rag_* tables, not rag.ingest().
export const askGroundedDocs = procedure.query
    .name("docs.askGrounded")
    .input(v.object({ question: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        return await rag.askGrounded(ctx, input.question, {
            corpus: "default",
            visibility: "shared",
        });
    });
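The comment on askGroundedDocs lists the fields a grounded answer returns: answer, claims, citations, warnings, and grounded. A caller might branch on the grounded flag before showing the answer. The sketch below is a guess at how that could look: the field names come from that comment, but every type in GroundedResult and the presentAnswer helper are assumptions made for illustration.

```typescript
// Hypothetical result shape: field names are taken from the comment above,
// but all types here are assumptions, not the documented return type.
interface GroundedResult {
    answer: string;
    grounded: boolean;
    warnings: string[];
    claims: unknown[];
    citations: unknown[];
}

// Show the answer only when it is grounded; otherwise surface the warnings.
function presentAnswer(result: GroundedResult): string {
    if (result.grounded) {
        return `${result.answer} (${result.citations.length} citation(s))`;
    }
    const reason =
        result.warnings.length > 0 ? result.warnings.join("; ") : "no supporting sources found";
    return `No grounded answer available: ${reason}`;
}
```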

Memory — Agent Memory System

Memory gives your AI agents persistent context across conversations.

Setup

gencow add Memory

This creates:

  • gencow/memory.ts — Memory engine
  • gencow/schema-memory.ts — Database tables for memory storage

Re-export it from your schema: export * from "./schema-memory";

Three Memory Types

Type        Purpose                       Example
Episodic    Recent conversation history   "5 minutes ago you asked about…"
Semantic    Facts learned about the user  "User prefers Korean language"
Procedural  Learned behaviors and rules   "When user says 'report', generate PDF"
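One way to picture the three types in code is as a tagged record. The shapes below are hypothetical and exist only to illustrate the distinction; the tables generated in schema-memory.ts may be organized differently.

```typescript
// Hypothetical record shape, for illustration only.
type MemoryType = "episodic" | "semantic" | "procedural";

interface MemoryRecord {
    type: MemoryType;
    userId: string;
    content: string;
}

// Group record contents by type so each kind can be rendered
// in its own section of a prompt.
function groupByType(records: MemoryRecord[]): Record<MemoryType, string[]> {
    const grouped: Record<MemoryType, string[]> = {
        episodic: [],
        semantic: [],
        procedural: [],
    };
    for (const record of records) {
        grouped[record.type].push(record.content);
    }
    return grouped;
}
```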

Building Context

Before each AI response, build context from memory:

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { ai } from "./ai";
import { memory } from "./memory";

export const chat = procedure.mutation
    .name("agent.chat")
    .input(v.object({
        sessionId: v.string(),
        message: v.string(),
    }))
    .handler(async ({ context: ctx, input }) => {
        const session = ctx.auth.requireAuth();
        const userId = session.user.id;

        const memCtx = await memory.buildContext(
            ctx, userId, input.sessionId, input.message
        );

        const result = await ai.chat({
            system: `You are a helpful assistant.

Memory context:
${memCtx.episodic ? `Recent: ${memCtx.episodic}` : ""}
${memCtx.semantic ? `Known facts: ${memCtx.semantic}` : ""}
${memCtx.procedural ? `Learned rules: ${memCtx.procedural}` : ""}`,
            messages: [{ role: "user", content: input.message }],
        });

        await memory.extract(ctx, userId, input.sessionId, [
            { role: "user", content: input.message },
            { role: "assistant", content: result.text },
        ]);

        return { role: "assistant", content: result.text };
    });
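The system prompt template in the handler above leaves blank lines when a memory type is empty. If that matters for your prompts, the formatting can be moved into a helper that emits only the sections that have content. This is a sketch under assumptions: formatMemoryContext is a hypothetical helper, and the three fields are assumed to be optional strings, as the handler's truthiness checks suggest.

```typescript
// Assumed shape: the handler above reads these three fields from
// the value returned by memory.buildContext().
interface MemoryContext {
    episodic?: string;
    semantic?: string;
    procedural?: string;
}

const BASE_PROMPT = "You are a helpful assistant.";

// Build the system prompt, skipping memory sections that are empty.
function formatMemoryContext(memCtx: MemoryContext): string {
    const sections = [
        memCtx.episodic ? `Recent: ${memCtx.episodic}` : "",
        memCtx.semantic ? `Known facts: ${memCtx.semantic}` : "",
        memCtx.procedural ? `Learned rules: ${memCtx.procedural}` : "",
    ].filter((line) => line !== "");

    // With no memory at all, return the base prompt without a header.
    if (sections.length === 0) return BASE_PROMPT;

    return [BASE_PROMPT, "", "Memory context:"].concat(sections).join("\n");
}
```

The handler could then pass formatMemoryContext(memCtx) as the system value instead of the inline template.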

Memory Lifecycle

User: "I prefer responses in Korean"
        │
        ├── memory.extract() → stores semantic memory:
        │   "User prefers Korean language responses"
        │
        [Next conversation]
        │
        ├── memory.buildContext() → retrieves:
        │   semantic: "User prefers Korean language responses"
        │
        └── AI responds in Korean ✅

Next Steps