RAG & Memory
Document ingestion, semantic search, and agent memory systems
This guide covers the RAG (Retrieval-Augmented Generation) and Memory components in detail.
RAG — Document Search Pipeline
RAG enables your AI to answer questions based on your own documents.
Setup
gencow add RAG

This creates:
- gencow/rag.ts — legacy local RAG helpers (ingest, retrieve, ask) plus canonical grounded-answer facades
- gencow/schema-rag.ts — local rag_documents table for the legacy helpers
Important: Import schema-rag.ts in your schema to create the tables:
// gencow/schema.ts
export * from "./schema-rag";
Grounded answers use a different corpus path.
rag.askGrounded(), rag.compareCorpus(), and rag.extractTopics() read canonical Phase 2 rag_* tables populated through documents.ingest.*; documents inserted by rag.ingest() are only available to rag.retrieve() / rag.ask().
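The split between the two corpus paths can be summarized as data. The sketch below is only a reading aid built from the statements above, not an API that Gencow exposes; `STORE_WRITTEN_BY`, `STORE_READ_BY`, and `canRead` are hypothetical names.

```typescript
// Hypothetical reading aid: which store each helper writes to or reads from.
type Store = "legacy rag_documents" | "canonical rag_*";

const STORE_WRITTEN_BY: Record<string, Store> = {
  "rag.ingest": "legacy rag_documents",
  "documents.ingest.*": "canonical rag_*",
};

const STORE_READ_BY: Record<string, Store> = {
  "rag.retrieve": "legacy rag_documents",
  "rag.ask": "legacy rag_documents",
  "rag.askGrounded": "canonical rag_*",
  "rag.compareCorpus": "canonical rag_*",
  "rag.extractTopics": "canonical rag_*",
};

// True when documents written by `writer` are visible to `reader`.
function canRead(reader: string, writer: string): boolean {
  const read = STORE_READ_BY[reader];
  const written = STORE_WRITTEN_BY[writer];
  return read !== undefined && read === written;
}
```

For example, `canRead("rag.askGrounded", "rag.ingest")` is false, which is why a document added with rag.ingest() never appears in a grounded answer.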
Canonical RAG Foundation (Recommended)
For production RAG and grounded answers, use the built-in canonical pipeline:
storage file -> documents.ingest.start -> rag_* tables -> ctx.search() / ctx.grounding.answer()

The canonical tables are rag_corpora, rag_sources, rag_sections, rag_chunks,
rag_ingest_jobs, and rag_operation_metrics. This path is tenant-scoped and is
the only path used by grounded citations.
Canonical ingest creates chunk embeddings through the OpenAI-compatible
/platform/ai/v1/embeddings route in deployed apps. The runtime no longer uses
ctx.ai for document ingest. If no embedding target is configured in local
development, chunks are still stored for keyword search and embedding remains
null; if a configured proxy returns non-JSON, ingest fails with the upstream
status, route, and content type in the error.
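The non-JSON failure mode can be pictured with a small sketch. This is not the runtime's actual code: `RawResponse` and `parseEmbeddingResponse` are hypothetical names, and the body is assumed to follow the OpenAI embeddings response format (`data[i].embedding`). It shows how a non-JSON reply might be turned into an error that carries the upstream status, route, and content type.

```typescript
// Hypothetical sketch: not the Gencow runtime implementation.
interface RawResponse {
  status: number;
  contentType: string;
  body: string;
}

// Assumes an OpenAI-compatible body: { data: [{ embedding: number[] }, ...] }.
function parseEmbeddingResponse(route: string, res: RawResponse): number[][] {
  const isJson = res.contentType.toLowerCase().includes("application/json");
  if (!isJson) {
    // Surface enough detail to identify a misconfigured proxy.
    throw new Error(
      `Embedding request failed: status=${res.status} route=${route} contentType=${res.contentType}`,
    );
  }
  const parsed = JSON.parse(res.body) as { data?: { embedding: number[] }[] };
  if (!Array.isArray(parsed.data)) {
    throw new Error(`Embedding response from ${route} has no "data" array`);
  }
  // One vector per input chunk, in request order.
  return parsed.data.map((item) => item.embedding);
}
```

An HTML error page from a proxy would fail the content-type check before any JSON parsing is attempted, so the error message points at the upstream rather than at a parse failure.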
Start Ingest
import { defineMutation, defineQuery, useMutation } from "@gencow/react";

export const api = {
  ragIngest: {
    start: defineMutation("documents.ingest.start"),
    jobs: defineQuery("rag_ingest_jobs.list"),
  },
};

// inside a React component
const { mutate: startIngest } = useMutation(api.ragIngest.start);

async function handleIngest(storageId: string) {
  await startIngest({
    storageId,
    corpus: "manuals",
    visibility: "shared",
    sourceKey: "refund-policy.pdf",
    mode: "auto",
    provider: "auto",
  });
}

Search Canonical Chunks
import { v } from "@gencow/core";
import { procedure } from "./runtime";

export const searchManuals = procedure.query
  .name("manuals.search")
  .input(v.object({ question: v.string() }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    return ctx.search("rag_chunks", input.question, {
      fields: ["chunk_text", "lexical_text"],
      scope: { corpus: "manuals", visibility: "shared" },
      limit: 10,
    });
  });

Grounded Answer
import { v } from "@gencow/core";
import { procedure } from "./runtime";

export const askManuals = procedure.query
  .name("manuals.ask")
  .input(v.object({ question: v.string() }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    if (!ctx.grounding) throw new Error("Grounding runtime is not available");
    return ctx.grounding.answer({
      question: input.question,
      scope: { corpus: "manuals", visibility: "shared" },
      mode: "qa",
      budget: {
        maxVerifyLoops: 2,
        maxResearchQueriesPerLoop: 3,
        maxCitationsPerClaim: 3,
      },
    });
  });

Operations Surface
import { defineMutation, defineQuery } from "@gencow/react";

export const api = {
  ragOps: {
    summary: defineQuery("rag_operations.summary"),
    metrics: defineQuery("rag_operations.metrics"),
    evaluate: defineMutation("rag_evaluations.run"),
    reindexPlan: defineQuery("rag_reindex.plan"),
  },
};

Use these operations to inspect source/chunk counts, ingest status, grounded-answer metrics, evaluation fixtures, and reindex candidates.
Legacy Local Ingest
rag.ingest(), rag.retrieve(), and rag.ask() are lightweight starter helpers
backed by the generated rag_documents table. They are useful for demos and local
experiments, but they do not populate canonical rag_* tables.
import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { rag } from "./rag";

export const ingestDoc = procedure.mutation
  .name("docs.ingest")
  .input(v.object({ source: v.string(), content: v.string() }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    await rag.ingest(ctx, input.source, input.content);
    return { status: "ingested", source: input.source };
  });

Legacy Search Documents
import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { rag } from "./rag";

export const search = procedure.query
  .name("docs.search")
  .input(v.object({ question: v.string() }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    return rag.retrieve(ctx, input.question);
  });

Legacy RAG + AI Q&A Pipeline
import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { ai } from "./ai";
import { rag } from "./rag";

export const ask = procedure.mutation
  .name("docs.ask")
  .input(v.object({ question: v.string() }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    // 1. Retrieve relevant documents
    const results = await rag.retrieve(ctx, input.question);
    // 2. Build context from results
    const context = results
      .map(r => `[${r.source}]: ${r.content}`)
      .join("\n\n");
    // 3. Ask AI with context
    const result = await ai.chat({
      system: `Answer based on the following context. If the answer is not in the context, say "I don't know."
Context:
${context}`,
      messages: [{ role: "user", content: input.question }],
    });
    return {
      answer: result.text,
      sources: results.map(r => r.source),
    };
  });

Legacy Parsers + Reranker
Production file ingest should use documents.ingest.* and workflow document
conversion rather than calling legacy parsers directly. See
Document Conversion for PDF, HWPX, DOCX, and
XLSX routing.
Full document pipeline:
import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { parsers } from "./parsers";
import { rag } from "./rag";
import { reranker } from "./reranker";

// Ingest: Parse PDF → chunk → embed → store
export const ingestPdf = procedure.mutation
  .name("docs.ingestPdf")
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    const file = input?.["file"] as File;
    if (!file || typeof file === "string") throw new Error("No file");
    const text = await parsers.pdf(Buffer.from(await file.arrayBuffer()));
    await rag.ingest(ctx, file.name, text);
  });

// Search: Query → search → rerank → top results
export const smartSearch = procedure.query
  .name("docs.smartSearch")
  .input(v.object({ question: v.string() }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    return reranker.searchAndRerank(ctx, rag, input.question);
  });

// Grounded answer: returns answer, claims, citations, warnings, grounded.
// Requires canonical Phase 2 ingestion into rag_* tables, not rag.ingest().
export const askGroundedDocs = procedure.query
  .name("docs.askGrounded")
  .input(v.object({ question: v.string() }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    return await rag.askGrounded(ctx, input.question, {
      corpus: "default",
      visibility: "shared",
    });
  });

Memory — Agent Memory System
Memory gives your AI agents persistent context across conversations.
Setup
gencow add Memory

This creates:
- gencow/memory.ts — Memory engine
- gencow/schema-memory.ts — Database tables for memory storage
Import in schema:
export * from "./schema-memory";
Three Memory Types
| Type | Purpose | Example |
|---|---|---|
| Episodic | Recent conversation history | "5 minutes ago you asked about…" |
| Semantic | Facts learned about the user | "User prefers Korean language" |
| Procedural | Learned behaviors and rules | "When user says 'report', generate PDF" |
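One way to picture the three types is as tagged records that are grouped into labeled prompt sections. The sketch below is illustrative only: `MemoryRecord`, `LABELS`, and `renderMemoryPrompt` are hypothetical names, and the actual storage format generated in gencow/schema-memory.ts may differ.

```typescript
// Hypothetical model of the three memory types; not the generated schema.
type MemoryType = "episodic" | "semantic" | "procedural";

interface MemoryRecord {
  type: MemoryType;
  content: string;
}

const LABELS: Record<MemoryType, string> = {
  episodic: "Recent",
  semantic: "Known facts",
  procedural: "Learned rules",
};

// Groups records by type and renders one line per non-empty type.
function renderMemoryPrompt(records: MemoryRecord[]): string {
  const order: MemoryType[] = ["episodic", "semantic", "procedural"];
  return order
    .map((type) => {
      const items = records.filter((r) => r.type === type).map((r) => r.content);
      return items.length > 0 ? `${LABELS[type]}: ${items.join("; ")}` : "";
    })
    .filter((line) => line !== "")
    .join("\n");
}
```

Types with no records are dropped entirely, so the resulting prompt text contains no empty sections.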
Building Context
Before each AI response, build context from memory:
import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { ai } from "./ai";
import { memory } from "./memory";

export const chat = procedure.mutation
  .name("agent.chat")
  .input(v.object({
    sessionId: v.string(),
    message: v.string(),
  }))
  .handler(async ({ context: ctx, input }) => {
    const session = ctx.auth.requireAuth();
    const userId = session.user.id;
    // Load relevant memories before answering
    const memCtx = await memory.buildContext(
      ctx, userId, input.sessionId, input.message
    );
    const result = await ai.chat({
      system: `You are a helpful assistant.
Memory context:
${memCtx.episodic ? `Recent: ${memCtx.episodic}` : ""}
${memCtx.semantic ? `Known facts: ${memCtx.semantic}` : ""}
${memCtx.procedural ? `Learned rules: ${memCtx.procedural}` : ""}`,
      messages: [{ role: "user", content: input.message }],
    });
    // Store new memories extracted from this exchange
    await memory.extract(ctx, userId, input.sessionId, [
      { role: "user", content: input.message },
      { role: "assistant", content: result.text },
    ]);
    return { role: "assistant", content: result.text };
  });

Memory Lifecycle
User: "I prefer responses in Korean"
│
├── memory.extract() → stores semantic memory:
│ "User prefers Korean language responses"
│
[Next conversation]
│
├── memory.buildContext() → retrieves:
│ semantic: "User prefers Korean language responses"
│
└── AI responds in Korean ✅

Next Steps
- CLI Reference — All CLI commands
- React Hooks — Frontend API reference
- Core API — Backend API reference