AI Engine

Built-in AI with import { ai } — chat, images, embeddings, structured output, model catalog, and service-credit billing

Gencow includes a built-in AI engine powered by the Vercel AI SDK and the Gencow Platform AI proxy. Application code should call the generated helper:

import { ai } from "./ai";

In local development, the helper calls OpenAI directly with OPENAI_API_KEY. After cloud deployment, the same code automatically uses the Gencow proxy, so tenant apps do not manage provider API keys.
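The switch between the two modes can be pictured as a pure function over environment variables. The sketch below assumes the helper uses proxy mode whenever the platform-injected GENCOW_AI_PROXY_URL and GENCOW_AI_PROXY_TOKEN (described under Cloud Deployment) are present, and only otherwise uses OPENAI_API_KEY. The name resolveAiMode, the precedence order, and the error message are hypothetical; this is not the generated gencow/ai.ts code.

```typescript
// Hypothetical sketch of how an AI helper might pick its transport.
// Not the generated gencow/ai.ts implementation.
type AiMode =
    | { kind: "proxy"; url: string; altUrl?: string; token: string }
    | { kind: "direct"; apiKey: string };

type Env = Record<string, string | undefined>;

function resolveAiMode(env: Env): AiMode {
    const url = env.GENCOW_AI_PROXY_URL;
    const token = env.GENCOW_AI_PROXY_TOKEN;
    // Cloud (assumed): platform-injected proxy settings win, even if a provider key is also set.
    if (url && token) {
        return { kind: "proxy", url, altUrl: env.GENCOW_AI_PROXY_URL_ALT, token };
    }
    // Local development: call OpenAI directly with the developer's own key.
    const apiKey = env.OPENAI_API_KEY;
    if (apiKey) {
        return { kind: "direct", apiKey };
    }
    throw new Error("No AI credentials: set OPENAI_API_KEY locally, or deploy to use the proxy.");
}

const local = resolveAiMode({ OPENAI_API_KEY: "sk-test" });
console.log(local.kind); // prints: direct
```

Giving the proxy settings precedence mirrors the rule that a deployed app should never silently switch to direct provider calls.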

Quick Setup

# Add the AI component (installs ai + @ai-sdk/openai)
npx gencow@latest add AI

This creates gencow/ai.ts and gencow/ai-image.ts with the pre-configured AI helper.

Runtime Contract

Application code must import the generated gencow/ai.ts helper:

import { ai } from "./ai";

ctx.ai is not part of the runtime context. It was removed with the legacy platform routes, so generated apps should not rely on autocomplete suggestions or older examples that call ctx.ai.chat(), ctx.ai.embed(), or ctx.ai.embedMany().

Deployed apps use OpenAI-compatible platform routes:

| Capability | Platform route | Cloud support |
| --- | --- | --- |
| Chat completions | /platform/ai/v1/chat/completions | Non-streaming only |
| Embeddings | /platform/ai/v1/embeddings | Supported |
| Image generation | /platform/ai/v1/images/generations | Single-image generation |
| Proxy health | /platform/ai/health | Supported |
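Because the routes are OpenAI-compatible, a chat call through the proxy carries an ordinary chat-completions request body. A minimal sketch of building one, assuming the proxy base URL is an origin such as the value of GENCOW_AI_PROXY_URL and that the token is sent as a bearer Authorization header; the header scheme and the buildChatRequest name are assumptions. It always sets stream: false because the cloud proxy is non-streaming only.

```typescript
// Hypothetical sketch: a non-streaming request to the OpenAI-compatible chat route.
// The Authorization scheme and base-URL handling are assumptions.
interface ChatMessage {
    role: "system" | "user" | "assistant";
    content: string;
}

interface ProxyRequest {
    url: string;
    init: { method: "POST"; headers: Record<string, string>; body: string };
}

const CHAT_ROUTE = "/platform/ai/v1/chat/completions";

function buildChatRequest(
    proxyBaseUrl: string,
    token: string,
    model: string,
    messages: ChatMessage[],
): ProxyRequest {
    return {
        // An absolute path replaces any path on the base URL, keeping only its origin.
        url: new URL(CHAT_ROUTE, proxyBaseUrl).toString(),
        init: {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                Authorization: `Bearer ${token}`,
            },
            // The cloud proxy rejects stream: true with a 400.
            body: JSON.stringify({ model, messages, stream: false }),
        },
    };
}

const req = buildChatRequest("https://proxy.example.com", "tok", "gpt-5.4-mini", [
    { role: "user", content: "Hello" },
]);
console.log(req.url); // https://proxy.example.com/platform/ai/v1/chat/completions
```

The result could be passed to fetch(req.url, req.init). In app code, prefer ai.chat(), which is meant to hide these transport details.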

Cloud streaming status: ai.stream() can work in local/direct mode, but cloud AI proxy streaming is not supported yet. For deployed apps, use ai.chat() until the proxy adds streaming support.

Using AI in Mutations

Chat

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { ai } from "./ai";

export const chat = procedure.mutation
    .name("chat.send")
    .input(v.object({
        messages: v.array(v.object({
            role: v.string(),
            content: v.string(),
        })),
    }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        const result = await ai.chat({
            system: "You are a concise support assistant.",
            messages: input.messages,
            model: "gpt-5.4-mini",
        });

        return {
            role: "assistant",
            content: result.text,
            usage: result.usage,
            creditsCharged: result.creditsCharged,
        };
    });

You can omit model to use the helper default. For new production workloads, pick an explicit model so quality/cost tradeoffs are intentional.

Image Generation

ai.image.generate() uses gpt-image-2 by default. In local development it calls OpenAI directly with OPENAI_API_KEY; in cloud it uses the Gencow AI proxy and charges service credits. The helper does not fall back from cloud proxy mode to direct OpenAI calls.

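import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { ai } from "./ai";

export const generateIcon = procedure.mutation
    .name("images.generateIcon")
    .input(v.object({ prompt: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        const icon = await ai.image.generate({
            prompt: input.prompt, // e.g. "A clean app icon for a project management product"
            model: "gpt-image-2",
            size: "1024x1024",
            quality: "low",
            format: "png",
        });

        // Requests are single-image (n=1), so read the first entry.
        return {
            base64: icon.images[0].base64,
            mimeType: icon.images[0].mimeType,
            creditsCharged: icon.creditsCharged,
        };
    });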

Model Selection Examples

// Highest-quality reasoning/coding path
await ai.chat({
    model: "gpt-5.5",
    messages: [{ role: "user", content: "Review this architecture..." }],
});

// Strong default for most production chat, coding, and agent steps
await ai.chat({
    model: "gpt-5.4-mini",
    messages: [{ role: "user", content: "Draft the reply." }],
});

// High-volume classification/extraction
await ai.chat({
    model: "gpt-5.4-nano",
    messages: [{ role: "user", content: "Classify this ticket." }],
});

// Compatibility with older generated examples
await ai.chat({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hi!" }],
});

Structured Output

Use ai.generateObject() when the handler needs typed JSON. Do not ask a model to return JSON text and then call JSON.parse().

import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { z } from "zod";
import { ai } from "./ai";

export const analyze = procedure.mutation
    .name("tasks.analyze")
    .input(v.object({ text: v.string() }))
    .handler(async ({ context: ctx, input }) => {
        ctx.auth.requireAuth();

        const { object } = await ai.generateObject({
            model: "gpt-5.4-mini",
            system: "Analyze the text and extract structured data.",
            schema: z.object({
                sentiment: z.enum(["positive", "negative", "neutral"]),
                score: z.number().min(0).max(1),
                keywords: z.array(z.string()),
                summary: z.string(),
            }),
            prompt: input.text,
        });

        return object;
    });

Avoid ai.chat() + JSON.parse(): LLMs can include Markdown fences or violate the schema. ai.generateObject() uses SDK-level structured output and Zod validation.
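The failure mode is easy to reproduce without calling a model. This self-contained sketch feeds JSON.parse() a plausible chat reply wrapped in a Markdown json fence (the sample text is invented for illustration): parsing fails, and even a successful parse yields an unvalidated value.

```typescript
// Why ai.chat() + JSON.parse() is fragile: a typical model reply
// wrapped in a Markdown code fence (sample text is invented).
const fence = "`".repeat(3);
const reply = `${fence}json\n{"sentiment": "positive", "score": 0.9}\n${fence}`;

function naiveParse(text: string): unknown {
    try {
        return JSON.parse(text);
    } catch {
        return undefined; // The fence makes the text invalid JSON.
    }
}

console.log(naiveParse(reply)); // undefined

// Even clean JSON is only `unknown`: nothing checks that "score" is in [0, 1]
// or that the sentiment is one of the allowed values.
const unchecked = naiveParse('{"sentiment": "great", "score": 7}');
console.log(unchecked); // parses, but violates the intended schema
```

ai.generateObject() with a Zod schema addresses both problems.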

Return Type

ai.chat() returns a structured result object, not a raw string:

const result = await ai.chat({ messages: [{ role: "user", content: "Hi" }] });

result.text              // AI response text
result.usage.totalTokens // Token usage
result.creditsCharged    // Service credits charged in cloud; 0 in local mode
result.model             // Model name used

// Wrong: result is an object
console.log(result);

// Correct
console.log(result.text);
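If results are passed between functions, a local type mirroring the fields listed above keeps the object-versus-string distinction explicit. A sketch only; the interface name ChatResult and the exact field types are assumptions inferred from this page, not types exported by the helper.

```typescript
// Hypothetical local type mirroring the documented fields of ai.chat()'s result.
interface ChatResult {
    text: string;
    usage: { totalTokens: number };
    creditsCharged: number; // 0 in local mode
    model: string;
}

// Rendering code works with the text, not the whole object.
function toDisplay(result: ChatResult): string {
    return result.text.trim();
}

function formatUsage(result: ChatResult): string {
    return `${result.model}: ${result.usage.totalTokens} tokens, ${result.creditsCharged} credits`;
}

const sample: ChatResult = {
    text: " Hello! ",
    usage: { totalTokens: 12 },
    creditsCharged: 0,
    model: "gpt-5.4-mini",
};
console.log(toDisplay(sample)); // Hello!
console.log(formatUsage(sample)); // gpt-5.4-mini: 12 tokens, 0 credits
```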

Embeddings

const embedding = await ai.embed("Hello, world!");
// number[]; text-embedding-3-small currently returns 1536 dimensions
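Embedding vectors are usually compared with cosine similarity. A minimal sketch, assuming ai.embed() returns a plain number[] as noted above; the toy three-dimension vectors stand in for real 1536-dimension embeddings, and the document IDs are made up.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) {
        throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // Avoid dividing by zero for empty or all-zero vectors.
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored documents against a query vector.
const query = [1, 0, 0];
const docs = [
    { id: "a", vector: [0.9, 0.1, 0] },
    { id: "b", vector: [0, 1, 0] },
];
const ranked = docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.score - x.score);
console.log(ranked[0].id); // a
```

A linear scan like this suits a handful of vectors; for real document collections the RAG component handles indexing and retrieval.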

For bulk document search, prefer gencow add RAG and the canonical documents.ingest.* flow. It calls /platform/ai/v1/embeddings through the platform path and keeps indexing, metering, and visibility scope consistent.

API Key Setup

Local Development

Add your OpenAI key to .env:

OPENAI_API_KEY=sk-...

Local calls go directly to OpenAI and do not charge Gencow service credits.

Cloud Deployment

No provider key is required in tenant app code or tenant app environment variables. The platform injects GENCOW_AI_PROXY_URL, GENCOW_AI_PROXY_URL_ALT, and GENCOW_AI_PROXY_TOKEN at runtime.

Do not add OPENAI_API_KEY to a deployed tenant app to bypass the proxy. That breaks centralized key management, service-credit charging, and usage reporting.

Supported Models

The cloud proxy accepts any model that has an active row in the platform model_pricing table. The catalog below is seeded by the platform bootstrap and by migrations 20260512_seed_current_openai_models.sql and 20260513_seed_openai_image_models.sql.

The displayed credit rates are base service credits per 1K tokens before plan markup. Internally, the gateway divides these rates by 1000 and multiplies by the actual input/output token counts.

| Model | Best fit | Input cr / 1K tokens | Output cr / 1K tokens |
| --- | --- | --- | --- |
| gpt-5.5 | Frontier reasoning, coding, complex professional work | 50 | 300 |
| gpt-5.4 | High-quality coding/professional work at lower cost than 5.5 | 25 | 150 |
| gpt-5.4-mini | Recommended strong default for production chat, coding, agents | 7.5 | 45 |
| gpt-5.4-nano | Simple high-volume extraction, ranking, classification | 2 | 12.5 |
| gpt-5.3-chat-latest | ChatGPT-style instant chat compatibility | 17.5 | 140 |
| gpt-5.2 | Previous frontier reasoning/professional model | 17.5 | 140 |
| gpt-5.2-chat-latest | GPT-5.2 ChatGPT-style chat compatibility | 17.5 | 140 |
| gpt-5.1 | Previous coding/agentic model | 12.5 | 100 |
| gpt-5 | Previous reasoning/coding model | 12.5 | 100 |
| gpt-5-mini | Low-latency GPT-5-class tasks | 2.5 | 20 |
| gpt-5-nano | Cheapest GPT-5-class summarization/classification | 0.5 | 4 |
| gpt-4.1 | Non-reasoning instruction following and tool calling | 20 | 80 |
| gpt-4.1-mini | Smaller/faster GPT-4.1 family model | 4 | 16 |
| gpt-4.1-nano | Very low-cost GPT-4.1 family model | 1 | 4 |
| gpt-4o | Legacy/compatibility multimodal chat workloads | 25 | 100 |
| gpt-4o-mini | Legacy low-cost default in older generated examples | 1.5 | 6 |

Embedding models:

| Model | Best fit | Input cr / 1K tokens | Output cr / 1K tokens |
| --- | --- | --- | --- |
| text-embedding-3-small | Default RAG/search embedding | 0.2 | 0 |
| text-embedding-3-large | Higher-quality embedding when cost is acceptable | 1.3 | 0 |

Image models:

| Model | Best fit | Notes |
| --- | --- | --- |
| gpt-image-2 | Default image generation path | Supports quality: "low" for cheaper smoke/testing |
| gpt-image-1.5 | Compatibility/fallback image path | Uses legacy image size options |
| gpt-image-1-mini | Lower-cost image generation | Useful for smoke tests and budget-sensitive apps |

In the MVP, image generation supports only single-image (n=1) requests. The moderation setting is fixed server-side to auto; low moderation is not exposed until platform policy and plan gating exist. The proxy validates the provider's b64_json output before returning it and rejects oversized images with a typed error.
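A byte cap on base64 output can be checked without decoding the payload, because the decoded size follows from the string length and padding. A sketch of such a validation step; the function names, the 20 MB cap, and the error shape are hypothetical, since this page does not state the platform's actual limit or error type.

```typescript
// Hypothetical sketch: validate base64 image output and enforce a byte cap.
// The cap value is illustrative only.
const MAX_IMAGE_BYTES = 20 * 1024 * 1024;

class ImageTooLargeError extends Error {
    readonly status = 413;
    readonly bytes: number;
    readonly limit: number;
    constructor(bytes: number, limit: number) {
        super(`Generated image is ${bytes} bytes; limit is ${limit}`);
        this.name = "ImageTooLargeError";
        this.bytes = bytes;
        this.limit = limit;
    }
}

// Decoded size of padded base64: 3 bytes per 4 characters, minus padding bytes.
function decodedBase64Bytes(b64: string): number {
    const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
    return (b64.length / 4) * 3 - padding;
}

function checkImagePayload(b64: string, limit = MAX_IMAGE_BYTES): number {
    if (b64.length === 0 || b64.length % 4 !== 0 || !/^[A-Za-z0-9+\/]*={0,2}$/.test(b64)) {
        throw new Error("Provider returned malformed base64 image data");
    }
    const bytes = decodedBase64Bytes(b64);
    if (bytes > limit) throw new ImageTooLargeError(bytes, limit);
    return bytes;
}

console.log(checkImagePayload("aGVsbG8=")); // 5 (the bytes of "hello")
```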

Choosing a Model

| Requirement | Recommended model |
| --- | --- |
| Best possible answer quality | gpt-5.5 |
| Strong production default | gpt-5.4-mini |
| Cheaper strong model | gpt-5-mini |
| Cheapest high-volume GPT-5-class path | gpt-5-nano or gpt-5.4-nano |
| Fast non-reasoning tool/instruction workloads | gpt-4.1-mini |
| Existing 4o-era app compatibility | gpt-4o-mini or gpt-4o |
| Vector search/RAG embeddings | text-embedding-3-small |
| Image generation | gpt-image-2; use gpt-image-1-mini for lower-cost smoke tests |
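Keeping the workload-to-model mapping in one place makes a later model change a one-line edit. A sketch that encodes the table above as a typed lookup; the workload keys and the pickModel name are hypothetical, and where the table lists two models the first is used.

```typescript
// Hypothetical workload keys mapped to the recommended models in the table above.
const MODEL_FOR = {
    bestQuality: "gpt-5.5",
    productionDefault: "gpt-5.4-mini",
    cheaperStrong: "gpt-5-mini",
    highVolume: "gpt-5-nano",
    fastToolCalls: "gpt-4.1-mini",
    legacy4o: "gpt-4o-mini",
    embeddings: "text-embedding-3-small",
    image: "gpt-image-2",
    imageSmokeTest: "gpt-image-1-mini",
} as const;

type Workload = keyof typeof MODEL_FOR;

function pickModel(workload: Workload): (typeof MODEL_FOR)[Workload] {
    return MODEL_FOR[workload];
}

// e.g. await ai.chat({ model: pickModel("productionDefault"), messages });
console.log(pickModel("highVolume")); // gpt-5-nano
```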

OpenAI recommends starting with the newest frontier model for complex reasoning and smaller variants for latency/cost-sensitive work. Gencow follows that shape but lets platform admins disable or reprice a model without tenant code changes.

Service-Credit Billing

Gencow uses service credits for AI and other provider-backed services.

Base calculation:

baseCredits =
  inputTokens  * inputCrPerToken +
  outputTokens * outputCrPerToken

creditsCharged = baseCredits * plan.serviceMarkup

Default plan markups:

| Plan | AI service markup |
| --- | --- |
| Free | 1.5x |
| Pro | 1.0x |
| Scale | 0.8x |

Example with gpt-5.4-mini, Pro plan, 1,000 input tokens and 1,000 output tokens:

base = 7.5 + 45 = 52.5 credits
charged = 52.5 * 1.0 = 52.5 credits
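The same arithmetic as code, handy for estimating cost before picking a model. A sketch using the published gpt-5.4-mini rates and the default plan markups from this page; real charges come from the platform's model_pricing rows and plan settings, and any rounding the gateway applies is not modeled. The names estimateCredits and PLAN_MARKUP are hypothetical.

```typescript
// Estimate service credits from per-1K-token rates and the plan markup.
interface ModelRate {
    inputCrPer1K: number;
    outputCrPer1K: number;
}

const PLAN_MARKUP = { free: 1.5, pro: 1.0, scale: 0.8 } as const;
type Plan = keyof typeof PLAN_MARKUP;

function estimateCredits(rate: ModelRate, inputTokens: number, outputTokens: number, plan: Plan): number {
    // Equivalent to tokens * (rate / 1000); dividing once at the end avoids extra float rounding.
    const baseCredits =
        (inputTokens * rate.inputCrPer1K + outputTokens * rate.outputCrPer1K) / 1000;
    return baseCredits * PLAN_MARKUP[plan];
}

const gpt54mini: ModelRate = { inputCrPer1K: 7.5, outputCrPer1K: 45 };
console.log(estimateCredits(gpt54mini, 1000, 1000, "pro")); // 52.5
console.log(estimateCredits(gpt54mini, 1000, 1000, "free")); // 78.75
```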

If service credits are exhausted or a spend cap blocks the request, the proxy returns a 402 response before or after provider execution depending on whether a reservation was possible.

Image generation records ai_image_input_tokens, ai_image_output_tokens, and ai_image_count service-usage metrics so image costs do not mix into generic AI chat token rows.

Error Handling

Common cloud proxy responses:

| Status | Cause | Fix |
| --- | --- | --- |
| 400 | Unsupported model | Use an active model from model_pricing |
| 400 | stream: true sent through the cloud proxy | Use ai.chat() for deployed apps |
| 401/403 | Missing or invalid proxy/app token | Redeploy, or reinstall the AI component |
| 402 | Service credits exhausted or spend cap exceeded | Top up service credits, raise or disable the spend cap, or use a cheaper model |
| 413 | Generated image exceeds the platform byte cap | Retry with a smaller or lower-quality output |
| 500/502 | Transient provider or platform failure | Retry with backoff; check platform health and logs |
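Only the transient rows in this table are worth retrying automatically; 400, 401/403, 402, and 413 need a code, configuration, or billing change. A minimal sketch of that policy with capped exponential backoff and full jitter. The HttpError class, attempt count, and delays are assumptions; how the generated helper actually surfaces proxy errors is not documented here, so a caller would need to map failures to HttpError first.

```typescript
// Hypothetical retry policy for AI proxy calls.
class HttpError extends Error {
    readonly status: number;
    constructor(status: number, message: string) {
        super(message);
        this.name = "HttpError";
        this.status = status;
    }
}

// Retry only transient provider/platform failures.
function isRetryable(status: number): boolean {
    return status === 500 || status === 502;
}

// Capped exponential backoff with full jitter: random value in [0, min(max, base * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8000, random = Math.random): number {
    const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
    return Math.floor(random() * ceiling);
}

async function withRetry<T>(call: () => Promise<T>, maxAttempts = 3): Promise<T> {
    for (let attempt = 0; ; attempt++) {
        try {
            return await call();
        } catch (err) {
            const status = err instanceof HttpError ? err.status : undefined;
            const lastAttempt = attempt + 1 >= maxAttempts;
            // Non-HTTP errors, non-transient statuses, and the final attempt are rethrown.
            if (status === undefined || !isRetryable(status) || lastAttempt) throw err;
            await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
        }
    }
}
```

Usage would look like withRetry(() => callProxy()), where callProxy is whatever function throws HttpError on failure. Jitter spreads retries from many instances so they do not hit a recovering provider at the same moment.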

Do Not Call LLM APIs Directly

// Wrong: provider SDK/key management bypasses Gencow
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
await openai.chat.completions.create({
    model: "gpt-5.4-mini",
    messages: [{ role: "user", content: "Hello" }],
});

// Correct: generated Gencow AI helper
import { ai } from "./ai";
const result = await ai.chat({
    model: "gpt-5.4-mini",
    messages: [{ role: "user", content: "Hello" }],
});
console.log(result.text);

Why Not Direct LLM Calls?

| Aspect | Direct SDK | import { ai } |
| --- | --- | --- |
| API key | Each app manages provider keys | Platform manages provider keys |
| Cloud billing | Not tracked by Gencow | Service-credit charging and usage snapshots |
| Spend caps | App must implement | Platform enforced |
| Supported model list | Hardcoded in app | Active model_pricing rows |
| RAG/Memory | App must assemble manually | gencow add RAG / gencow add Memory |
| Guardrails | App must assemble manually | gencow add Guardrails |

Quick Decision Tree

Need AI?
    |
    |-- Chat / text response       -> gencow add AI -> ai.chat()
    |-- Typed JSON extraction      -> gencow add AI -> ai.generateObject({ schema })
    |-- Image generation           -> gencow add AI -> ai.image.generate()
    |-- Document search / RAG      -> gencow add RAG -> rag.retrieve() / ctx.search()
    |-- Agent memory               -> gencow add Memory -> memory.buildContext()
    |-- Safety filtering           -> gencow add Guardrails -> guardrails.validateInput()
    `-- Vector embedding           -> gencow add AI -> ai.embed()

# Wrong: do not install provider SDKs directly for app AI calls
# npm install openai
# npm install @anthropic-ai/sdk
# npm install @google/generative-ai

# Correct: use Gencow components
npx gencow@latest add AI
npx gencow@latest add RAG
npx gencow@latest add Memory

Next Steps