AI Engine

Built-in AI with import { ai } — chat, images, embeddings, structured output, model catalog, and service-credit billing

Gencow includes a built-in AI engine powered by the Vercel AI SDK and the Gencow Platform AI proxy. Application code should call the generated helper:

import { ai } from "./ai";

In local development, the helper calls OpenAI directly with OPENAI_API_KEY. After cloud deployment, the same code automatically uses the Gencow proxy, so tenant apps do not manage provider API keys.

Quick Setup

# Add the AI component (installs ai + @ai-sdk/openai)
npx gencow@latest add AI

This creates gencow/ai.ts and gencow/ai-image.ts with the pre-configured AI helper.

Runtime Contract

Application code must import the generated gencow/ai.ts helper:

import { ai } from "./ai";

ctx.ai is not part of the runtime context. It was removed with the legacy platform routes, so generated apps should not rely on autocomplete suggestions or older examples that call ctx.ai.chat(), ctx.ai.embed(), or ctx.ai.embedMany().

Deployed apps use OpenAI-compatible platform routes:

Capability	Platform route	Cloud support
Chat completions	`/platform/ai/v1/chat/completions`	Non-streaming only
Embeddings	`/platform/ai/v1/embeddings`	Supported
Image generation	`/platform/ai/v1/images/generations`	Single-image generation
Proxy health	`/platform/ai/health`	Supported

Cloud streaming status: ai.stream() can work in local/direct mode, but cloud AI proxy streaming is not supported yet. For deployed apps, use ai.chat() until the proxy adds streaming support.

Using AI in Mutations

Chat

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { ai } from "./ai";

export const chat = procedure.mutation
    .name("chat.send")
    .input(v.object({
        messages: v.array(v.object({
            role: v.string(),
            content: v.string(),
        })),
    }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        const result = await ai.chat({
            system: "You are a concise support assistant.",
            messages: input.messages,
            model: "gpt-5.4-mini",
        });

        return {
            role: "assistant",
            content: result.text,
            usage: result.usage,
            creditsCharged: result.creditsCharged,
        };
    });

You can omit model to use the helper default. For new production workloads, pick an explicit model so quality/cost tradeoffs are intentional.

Image Generation

ai.image.generate() uses gpt-image-2 by default. In local development it calls OpenAI directly with OPENAI_API_KEY; in cloud it uses the Gencow AI proxy and charges service credits. The helper does not fall back from cloud proxy mode to direct OpenAI calls.

const icon = await ai.image.generate({
    prompt: "A clean app icon for a project management product",
    model: "gpt-image-2",
    size: "1024x1024",
    quality: "low",
    format: "png",
});

return {
    base64: icon.images[0].base64,
    mimeType: icon.images[0].mimeType,
    creditsCharged: icon.creditsCharged,
};
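To show the returned image in a browser, the base64 and mimeType fields can be combined into a data URL. This is a minimal sketch that assumes only the two image fields used above; GeneratedImage and toDataUrl are hypothetical names and are not part of the generated helper.

```typescript
// Hypothetical shape: only the two fields used in the example above.
interface GeneratedImage {
    base64: string;
    mimeType: string;
}

// Build a data URL that can be used directly as an <img src> value.
function toDataUrl(image: GeneratedImage): string {
    return `data:${image.mimeType};base64,${image.base64}`;
}

// Placeholder payload for illustration; a real call would pass icon.images[0].
const sample: GeneratedImage = { base64: "QUJD", mimeType: "image/png" };
console.log(toDataUrl(sample)); // data:image/png;base64,QUJD
```

Data URLs are convenient for previews, but large images are usually better written to storage and served by URL.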

Model Selection Examples

// Highest-quality reasoning/coding path
await ai.chat({
    model: "gpt-5.5",
    messages: [{ role: "user", content: "Review this architecture..." }],
});

// Strong default for most production chat, coding, and agent steps
await ai.chat({
    model: "gpt-5.4-mini",
    messages: [{ role: "user", content: "Draft the reply." }],
});

// High-volume classification/extraction
await ai.chat({
    model: "gpt-5.4-nano",
    messages: [{ role: "user", content: "Classify this ticket." }],
});

// Compatibility with older generated examples
await ai.chat({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hi?" }],
});

Structured Output

Use ai.generateObject() when the handler needs typed JSON. Do not ask a model to return JSON text and then call JSON.parse().

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { z } from "zod";
import { ai } from "./ai";

export const analyze = procedure.mutation
    .name("tasks.analyze")
    .input(v.object({ text: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        const { object } = await ai.generateObject({
            model: "gpt-5.4-mini",
            system: "Analyze the text and extract structured data.",
            schema: z.object({
                sentiment: z.enum(["positive", "negative", "neutral"]),
                score: z.number().min(0).max(1),
                keywords: z.array(z.string()),
                summary: z.string(),
            }),
            prompt: input.text,
        });

        return object;
    });

Avoid ai.chat() + JSON.parse(): LLMs can include Markdown fences or violate the schema. ai.generateObject() uses SDK-level structured output and Zod validation.
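The failure mode is easy to reproduce without calling a model. The sketch below simulates a reply that wraps valid JSON in a Markdown fence and shows that JSON.parse() rejects it; tryParse and the sample reply are illustrative only.

```typescript
// Simulated model reply: valid JSON wrapped in a Markdown code fence.
const fence = "`".repeat(3);
const reply = `${fence}json\n{"sentiment":"positive"}\n${fence}`;

// Returns the parsed value, or null when the text is not pure JSON.
function tryParse(text: string): unknown {
    try {
        return JSON.parse(text);
    } catch {
        return null;
    }
}

console.log(tryParse(reply));                      // null: the fence breaks JSON.parse
console.log(tryParse('{"sentiment":"positive"}')); // the parsed object
```

Stripping fences by hand would fix this one case but still leaves schema violations undetected, which is why schema-validated output is the safer path.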

Return Type

ai.chat() returns a structured result object, not a raw string:

const result = await ai.chat({ messages: [{ role: "user", content: "Hi" }] });

result.text              // AI response text
result.usage.totalTokens // Token usage
result.creditsCharged    // Service credits charged in cloud; 0 in local mode
result.model             // Model name used

// Wrong: logs the entire result object instead of the reply text
console.log(result);

// Correct
console.log(result.text);
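When a handler makes several calls, the usage fields can be added up for a single log line or response. The following is a sketch under the assumption that the result carries at least the four fields listed above; ChatResult and summarizeUsage are hypothetical names, and the real result type exported by the helper may include more.

```typescript
// Hypothetical type covering only the fields documented above.
interface ChatResult {
    text: string;
    usage: { totalTokens: number };
    creditsCharged: number;
    model: string;
}

// Sum tokens and credits across several calls.
function summarizeUsage(results: ChatResult[]): { totalTokens: number; creditsCharged: number } {
    return results.reduce(
        (acc, r) => ({
            totalTokens: acc.totalTokens + r.usage.totalTokens,
            creditsCharged: acc.creditsCharged + r.creditsCharged,
        }),
        { totalTokens: 0, creditsCharged: 0 },
    );
}
```

In local mode the credit total stays at 0, since local calls are not charged.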

Embeddings

const embedding = await ai.embed("Hello, world!");
// number[]; text-embedding-3-small currently returns 1536 dimensions

For bulk document search, prefer gencow add RAG and the canonical documents.ingest.* flow. It calls /platform/ai/v1/embeddings through the platform path and keeps indexing, metering, and visibility scope consistent.
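For ad hoc comparisons outside the RAG flow, two embedding vectors are commonly compared with cosine similarity. The helper below is a generic sketch, not part of the generated helper; cosineSimilarity is a hypothetical name.

```typescript
// Cosine similarity of two equal-length vectors; returns a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) {
        throw new Error("Vectors must have the same length");
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // A zero vector has no direction; treat it as unrelated to everything.
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A typical use is to embed a query and a handful of candidate strings with ai.embed(), then rank the candidates by this score.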

API Key Setup

Local Development

Add your OpenAI key to .env:

OPENAI_API_KEY=sk-...

Local calls go directly to OpenAI and do not charge Gencow service credits.

Cloud Deployment

No provider key is required in tenant app code or tenant app environment variables. The platform injects GENCOW_AI_PROXY_URL, GENCOW_AI_PROXY_URL_ALT, and GENCOW_AI_PROXY_TOKEN at runtime.

Do not add OPENAI_API_KEY to a deployed tenant app to bypass the proxy. That breaks centralized key management, service-credit charging, and usage reporting.
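The local/cloud split can be pictured as a small decision function. This is an illustration only: it assumes that the presence of GENCOW_AI_PROXY_URL and GENCOW_AI_PROXY_TOKEN signals cloud mode, and resolveAiMode is a hypothetical name. The generated helper's actual logic may differ.

```typescript
type AiMode =
    | { kind: "proxy"; url: string; token: string }
    | { kind: "direct"; apiKey: string };

type Env = Record<string, string | undefined>;

function resolveAiMode(env: Env): AiMode {
    const url = env.GENCOW_AI_PROXY_URL;
    const token = env.GENCOW_AI_PROXY_TOKEN;

    // Assumption: proxy variables take precedence and there is no
    // fallback to direct OpenAI calls once proxy mode is selected.
    if (url && token) return { kind: "proxy", url, token };
    if (url || token) throw new Error("Incomplete proxy configuration");

    // Local development: call OpenAI directly with the developer's key.
    const apiKey = env.OPENAI_API_KEY;
    if (apiKey) return { kind: "direct", apiKey };
    throw new Error("Set OPENAI_API_KEY for local development");
}
```

Under this sketch, an OPENAI_API_KEY left in a deployed environment would simply be ignored rather than used to bypass the proxy.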

Supported Models

The cloud proxy accepts active rows from the platform model_pricing table. The catalog below is seeded by the platform bootstrap and migrations 20260512_seed_current_openai_models.sql and 20260513_seed_openai_image_models.sql.

The displayed credit rates are base service credits per 1K tokens before plan markup. Internally, the gateway divides these rates by 1000 and multiplies by the actual input/output token counts.

Model	Best fit	Input cr / 1K	Output cr / 1K
`gpt-5.5`	Frontier reasoning, coding, complex professional work	50	300
`gpt-5.4`	High-quality coding/professional work at lower cost than 5.5	25	150
`gpt-5.4-mini`	Recommended strong default for production chat, coding, agents	7.5	45
`gpt-5.4-nano`	Simple high-volume extraction, ranking, classification	2	12.5
`gpt-5.3-chat-latest`	ChatGPT-style instant chat compatibility	17.5	140
`gpt-5.2`	Previous frontier reasoning/professional model	17.5	140
`gpt-5.2-chat-latest`	GPT-5.2 ChatGPT-style chat compatibility	17.5	140
`gpt-5.1`	Previous coding/agentic model	12.5	100
`gpt-5`	Previous reasoning/coding model	12.5	100
`gpt-5-mini`	Low-latency GPT-5-class tasks	2.5	20
`gpt-5-nano`	Cheapest GPT-5-class summarization/classification	0.5	4
`gpt-4.1`	Non-reasoning instruction following and tool calling	20	80
`gpt-4.1-mini`	Smaller/faster GPT-4.1 family model	4	16
`gpt-4.1-nano`	Very low-cost GPT-4.1 family model	1	4
`gpt-4o`	Legacy/compatibility multimodal chat workloads	25	100
`gpt-4o-mini`	Legacy low-cost default in older generated examples	1.5	6

Embedding models:

Model	Best fit	Input cr / 1K	Output cr / 1K
`text-embedding-3-small`	Default RAG/search embedding	0.2	0
`text-embedding-3-large`	Higher-quality embedding when cost is acceptable	1.3	0

Image models:

Model	Best fit	Notes
`gpt-image-2`	Default image generation path	Supports `quality: "low"` for cheaper smoke/testing
`gpt-image-1.5`	Compatibility/fallback image path	Uses legacy image size options
`gpt-image-1-mini`	Lower-cost image generation	Useful for smoke tests and budget-sensitive apps

Image generation supports only single-image (n=1) requests in the MVP. The moderation parameter is server-gated to auto; low moderation is not exposed until platform policy and plan gating exist. The proxy validates the provider's b64_json output before returning it and rejects oversized images with a typed error.

Choosing a Model

Requirement	Recommended model
Best possible answer quality	`gpt-5.5`
Strong production default	`gpt-5.4-mini`
Cheaper strong model	`gpt-5-mini`
Cheapest high-volume GPT-5-class path	`gpt-5-nano` or `gpt-5.4-nano`
Fast non-reasoning tool/instruction workloads	`gpt-4.1-mini`
Existing 4o-era app compatibility	`gpt-4o-mini` or `gpt-4o`
Vector search/RAG embeddings	`text-embedding-3-small`
Image generation	`gpt-image-2`; use `gpt-image-1-mini` for lower-cost smoke tests

OpenAI recommends starting with the newest frontier model for complex reasoning and smaller variants for latency/cost-sensitive work. Gencow follows that shape but lets platform admins disable or reprice a model without tenant code changes.
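One way to keep model choices intentional is to name them in a single place, so a repriced or disabled model needs one edit rather than a search through handlers. The sketch below uses model IDs from the table above; the Workload categories and modelFor are hypothetical names chosen for illustration.

```typescript
// Hypothetical workload categories mapped to models from the table above.
type Workload = "frontier" | "default" | "highVolume" | "toolCalling" | "legacy";

const MODEL_FOR: Record<Workload, string> = {
    frontier: "gpt-5.5",
    default: "gpt-5.4-mini",
    highVolume: "gpt-5.4-nano",
    toolCalling: "gpt-4.1-mini",
    legacy: "gpt-4o-mini",
};

// Look up the model ID to pass as the `model` option.
function modelFor(workload: Workload): string {
    return MODEL_FOR[workload];
}
```

A handler would then call ai.chat({ model: modelFor("default"), ... }) instead of repeating the string literal.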

Service-Credit Billing

Gencow uses service credits for AI and other provider-backed services.

Base calculation:

baseCredits =
  inputTokens  * inputCrPerToken +
  outputTokens * outputCrPerToken

creditsCharged = baseCredits * plan.serviceMarkup

Default plan markups:

Plan	AI service markup
Free	1.5x
Pro	1.0x
Scale	0.8x

Example with gpt-5.4-mini, Pro plan, 1,000 input tokens and 1,000 output tokens:

base = 7.5 + 45 = 52.5 credits
charged = 52.5 * 1.0 = 52.5 credits
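The same arithmetic can be written as a small estimator for budgeting. This is a sketch: the rates are copied from the Supported Models table and the markups from the default plan table, live pricing may differ, and the gateway's rounding behavior is not specified here. estimateCredits and the type names are hypothetical.

```typescript
type Plan = "free" | "pro" | "scale";

// Default plan markups from the table above.
const SERVICE_MARKUP: Record<Plan, number> = { free: 1.5, pro: 1.0, scale: 0.8 };

interface ModelRate {
    inputCrPer1K: number;
    outputCrPer1K: number;
}

// A few rates copied from the Supported Models table.
const RATES: Record<string, ModelRate> = {
    "gpt-5.5": { inputCrPer1K: 50, outputCrPer1K: 300 },
    "gpt-5.4-mini": { inputCrPer1K: 7.5, outputCrPer1K: 45 },
    "gpt-5.4-nano": { inputCrPer1K: 2, outputCrPer1K: 12.5 },
};

function estimateCredits(
    model: string,
    inputTokens: number,
    outputTokens: number,
    plan: Plan,
): number {
    const rate = RATES[model];
    if (!rate) throw new Error(`No rate for model: ${model}`);
    // Per-1K rates divided by 1000 give per-token rates.
    const base = (inputTokens * rate.inputCrPer1K + outputTokens * rate.outputCrPer1K) / 1000;
    return base * SERVICE_MARKUP[plan];
}

console.log(estimateCredits("gpt-5.4-mini", 1000, 1000, "pro")); // 52.5
```

The same request on the Free plan would come to 52.5 * 1.5 = 78.75 credits under these defaults.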

If service credits are exhausted or a spend cap blocks the request, the proxy returns a 402 response before or after provider execution depending on whether a reservation was possible.

Image generation records ai_image_input_tokens, ai_image_output_tokens, and ai_image_count service-usage metrics so image costs do not mix into generic AI chat token rows.

Error Handling

Common cloud proxy responses:

Status	Cause	Fix
400	Unsupported `model`	Use an active model from `model_pricing`
400	`stream: true` through cloud proxy	Use `ai.chat()` for deployed apps
401/403	Missing or invalid proxy/app token	Redeploy or reinstall the AI component
402	Service credits exhausted or spend cap exceeded	Top up credits, disable the cap, or use a cheaper model
413	Generated image exceeds platform byte cap	Retry with smaller/lower-quality output
500/502	Provider or platform transient failure	Retry with backoff; check platform health/logs
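Only the 500/502 row calls for a retry; the 4xx rows indicate problems that a retry will not fix. The sketch below wraps any async call with exponential backoff. It assumes that a thrown error exposes the HTTP status as a status property, which is not confirmed by this page; withRetry, isRetryable, and StatusError are hypothetical names.

```typescript
// Assumption: errors thrown by the helper expose the HTTP status as `status`.
interface StatusError {
    status?: number;
}

// Only transient provider/platform failures are worth retrying.
function isRetryable(status: number | undefined): boolean {
    return status === 500 || status === 502;
}

async function withRetry<T>(
    call: () => Promise<T>,
    maxAttempts = 3,
    baseDelayMs = 500,
    sleep: (ms: number) => Promise<void> = (ms) =>
        new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
    for (let attempt = 1; ; attempt++) {
        try {
            return await call();
        } catch (err) {
            const status =
                typeof err === "object" && err !== null
                    ? (err as StatusError).status
                    : undefined;
            // Give up on the last attempt or on non-transient errors (400, 402, ...).
            if (attempt >= maxAttempts || !isRetryable(status)) throw err;
            // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
            await sleep(baseDelayMs * 2 ** (attempt - 1));
        }
    }
}
```

Usage would look like withRetry(() => ai.chat({ messages })). The sleep parameter exists so tests can skip real delays.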

Do Not Call LLM APIs Directly

// Wrong: provider SDK/key management bypasses Gencow
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
await openai.chat.completions.create({
    model: "gpt-5.4-mini",
    messages: [{ role: "user", content: "Hello" }],
});

// Correct: generated Gencow AI helper
import { ai } from "./ai";
const result = await ai.chat({
    model: "gpt-5.4-mini",
    messages: [{ role: "user", content: "Hello" }],
});
console.log(result.text);

Why Not Direct LLM Calls?

	Direct SDK	`import { ai }`
API key	Each app manages provider keys	Platform manages provider keys
Cloud billing	Not tracked by Gencow	Service-credit charging and usage snapshots
Spend caps	App must implement	Platform enforced
Supported model list	Hardcoded in app	Active `model_pricing` rows
RAG/Memory	App must assemble manually	`gencow add RAG` / `gencow add Memory`
Guardrails	App must assemble manually	`gencow add Guardrails`

Quick Decision Tree

Need AI?
    |
    |-- Chat / text response       -> gencow add AI -> ai.chat()
    |-- Typed JSON extraction      -> gencow add AI -> ai.generateObject({ schema })
    |-- Image generation           -> gencow add AI -> ai.image.generate()
    |-- Document search / RAG      -> gencow add RAG -> rag.retrieve() / ctx.search()
    |-- Agent memory               -> gencow add Memory -> memory.buildContext()
    |-- Safety filtering           -> gencow add Guardrails -> guardrails.validateInput()
    `-- Vector embedding           -> gencow add AI -> ai.embed()

# Do not install provider SDKs directly for app AI calls
npm install openai
npm install @anthropic-ai/sdk
npm install @google/generative-ai

# Use Gencow components
npx gencow@latest add AI
npx gencow@latest add RAG
npx gencow@latest add Memory

Next Steps

Components — All gencow add components
RAG & Memory — Document search and agent memory
Local Development — Local .env setup
