AI Engine
Built-in AI with import { ai } — chat, images, embeddings, structured output, model catalog, and service-credit billing
Gencow includes a built-in AI engine powered by the Vercel AI SDK and the Gencow Platform AI proxy. Application code should call the generated helper:
import { ai } from "./ai";

In local development, the helper calls OpenAI directly with OPENAI_API_KEY.
After cloud deployment, the same code automatically uses the Gencow proxy, so
tenant apps do not manage provider API keys.
Quick Setup
# Add the AI component (installs ai + @ai-sdk/openai)
npx gencow@latest add AI

This creates gencow/ai.ts and gencow/ai-image.ts with the pre-configured
AI helper.
Runtime Contract
Application code must import the generated gencow/ai.ts helper:
import { ai } from "./ai";

ctx.ai is not part of the runtime context. It was removed with the legacy
platform routes, so generated apps should not rely on autocomplete suggestions
or older examples that call ctx.ai.chat(), ctx.ai.embed(), or
ctx.ai.embedMany().
Deployed apps use OpenAI-compatible platform routes:
| Capability | Platform route | Cloud support |
|---|---|---|
| Chat completions | /platform/ai/v1/chat/completions | Non-streaming only |
| Embeddings | /platform/ai/v1/embeddings | Supported |
| Image generation | /platform/ai/v1/images/generations | Single-image generation |
| Proxy health | /platform/ai/health | Supported |
Cloud streaming status:
ai.stream() can work in local/direct mode, but cloud AI proxy streaming is not supported yet. For deployed apps, use ai.chat() until the proxy adds streaming support.
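The cloud limitation above can also be checked before a request leaves the app. The sketch below is not part of the generated helper: `AiMode`, `ChatRequest`, and `assertProxyCompatible` are hypothetical names, and it assumes the caller already knows whether it is running in proxy or direct mode.

```typescript
// Hypothetical types for illustration; the generated gencow/ai.ts may differ.
type AiMode = "direct" | "proxy";

interface ChatRequest {
  model?: string;
  stream?: boolean;
  messages: { role: string; content: string }[];
}

// Reject streaming requests in proxy mode, mirroring the 400 response the
// cloud proxy is documented to return for `stream: true`.
function assertProxyCompatible(mode: AiMode, request: ChatRequest): void {
  if (mode === "proxy" && request.stream === true) {
    throw new Error("Cloud AI proxy does not support streaming yet; use ai.chat().");
  }
}
```

Failing fast in application code gives a clearer error than waiting for the proxy to reject the request.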
Using AI in Mutations
Chat
import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { ai } from "./ai";

export const chat = procedure.mutation
  .name("chat.send")
  .input(v.object({
    messages: v.array(v.object({
      role: v.string(),
      content: v.string(),
    })),
  }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    const result = await ai.chat({
      system: "You are a concise support assistant.",
      messages: input.messages,
      model: "gpt-5.4-mini",
    });
    return {
      role: "assistant",
      content: result.text,
      usage: result.usage,
      creditsCharged: result.creditsCharged,
    };
  });

You can omit model to use the helper default. For new production workloads,
pick an explicit model so quality/cost tradeoffs are intentional.
Image Generation
ai.image.generate() uses gpt-image-2 by default. In local development it
calls OpenAI directly with OPENAI_API_KEY; in cloud it uses the Gencow AI
proxy and charges service credits. The helper does not fall back from cloud
proxy mode to direct OpenAI calls.
const icon = await ai.image.generate({
  prompt: "A clean app icon for a project management product",
  model: "gpt-image-2",
  size: "1024x1024",
  quality: "low",
  format: "png",
});
return {
  base64: icon.images[0].base64,
  mimeType: icon.images[0].mimeType,
  creditsCharged: icon.creditsCharged,
};

Model Selection Examples
// Highest-quality reasoning/coding path
await ai.chat({
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Review this architecture..." }],
});

// Strong default for most production chat, coding, and agent steps
await ai.chat({
  model: "gpt-5.4-mini",
  messages: [{ role: "user", content: "Draft the reply." }],
});

// High-volume classification/extraction
await ai.chat({
  model: "gpt-5.4-nano",
  messages: [{ role: "user", content: "Classify this ticket." }],
});

// Compatibility with older generated examples
await ai.chat({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello?" }],
});

Structured Output
Use ai.generateObject() when the handler needs typed JSON. Do not ask a model
to return JSON text and then call JSON.parse().
import { v } from "@gencow/core";
import { procedure } from "./runtime";
import { z } from "zod";
import { ai } from "./ai";

export const analyze = procedure.mutation
  .name("tasks.analyze")
  .input(v.object({ text: v.string() }))
  .handler(async ({ context: ctx, input }) => {
    ctx.auth.requireAuth();
    const { object } = await ai.generateObject({
      model: "gpt-5.4-mini",
      system: "Analyze the text and extract structured data.",
      schema: z.object({
        sentiment: z.enum(["positive", "negative", "neutral"]),
        score: z.number().min(0).max(1),
        keywords: z.array(z.string()),
        summary: z.string(),
      }),
      prompt: input.text,
    });
    return object;
  });

Avoid ai.chat() + JSON.parse(): LLMs can include Markdown fences or violate the schema. ai.generateObject() uses SDK-level structured output and Zod validation.
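To see why parsing free-form model text is fragile, the sketch below feeds `JSON.parse()` a reply wrapped in a Markdown fence. The reply string is made up for illustration, and `tryParseJson` is a hypothetical helper, not a Gencow API.

```typescript
// A made-up model reply that wraps otherwise valid JSON in a Markdown fence.
const fence = "`".repeat(3);
const fencedReply = `${fence}json\n{"sentiment":"positive"}\n${fence}`;

// Returns the parsed value, or null when the text is not valid JSON.
function tryParseJson(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// The fence characters are not valid JSON, so parsing fails.
const fromFenced = tryParseJson(fencedReply);

// The same payload without the fence parses as expected.
const fromPlain = tryParseJson('{"sentiment":"positive"}');
```

Stripping fences by hand only handles one failure mode; the reply can still miss fields or use the wrong types, which is what schema validation is meant to catch.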
Return Type
ai.chat() returns a structured result object, not a raw string:
const result = await ai.chat({ messages: [{ role: "user", content: "Hi" }] });
result.text // AI response text
result.usage.totalTokens // Token usage
result.creditsCharged // Service credits charged in cloud; 0 in local mode
result.model // Model name used
// Wrong: result is an object
console.log(result);
// Correct
console.log(result.text);

Embeddings
const embedding = await ai.embed("Hello, world!");
// number[]; text-embedding-3-small currently returns 1536 dimensions

For bulk document search, prefer gencow add RAG and the canonical
documents.ingest.* flow. It calls /platform/ai/v1/embeddings through the
platform path and keeps indexing, metering, and visibility scope consistent.
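For small, ad hoc comparisons outside the RAG flow, two vectors returned by ai.embed() can be compared with cosine similarity. This is a minimal sketch that does not depend on Gencow; `cosineSimilarity` is a hypothetical helper name, and any vectors passed to it in examples are toy values rather than real embeddings.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means more similar direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and have the same length.");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat it as not similar to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In an app, the inputs would be the arrays returned by two `await ai.embed(...)` calls. For anything beyond a handful of documents, the RAG component remains the better fit.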
API Key Setup
Local Development
Add your OpenAI key to .env:
OPENAI_API_KEY=sk-...

Local calls go directly to OpenAI and do not charge Gencow service credits.
Cloud Deployment
No provider key is required in tenant app code or tenant app environment
variables. The platform injects GENCOW_AI_PROXY_URL,
GENCOW_AI_PROXY_URL_ALT, and GENCOW_AI_PROXY_TOKEN at runtime.
Do not add OPENAI_API_KEY to a deployed tenant app to bypass the proxy. That
breaks centralized key management, service-credit charging, and usage reporting.
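The split between local and cloud configuration can be pictured as a small mode check. The sketch below is an assumption about how such a check might look, not the generated helper's actual logic: `resolveAiMode` and `ResolvedAiMode` are hypothetical names, and the rule that proxy mode requires both GENCOW_AI_PROXY_URL and GENCOW_AI_PROXY_TOKEN is assumed.

```typescript
type Env = Record<string, string | undefined>;
type ResolvedAiMode = "proxy" | "direct";

// ASSUMPTION: proxy mode is active when both the proxy URL and token are set.
// Proxy mode wins even if OPENAI_API_KEY is present, so a deployed app never
// falls back to direct provider calls.
function resolveAiMode(env: Env): ResolvedAiMode {
  const hasProxy = Boolean(env["GENCOW_AI_PROXY_URL"] && env["GENCOW_AI_PROXY_TOKEN"]);
  if (hasProxy) return "proxy";
  if (env["OPENAI_API_KEY"]) return "direct";
  throw new Error("No AI configuration found: set OPENAI_API_KEY for local development.");
}
```

Passing the environment in as an argument, rather than reading a global, keeps the check easy to test.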
Supported Models
The cloud proxy accepts active rows from the platform model_pricing table. The
catalog below is seeded by the platform bootstrap and migrations
20260512_seed_current_openai_models.sql and
20260513_seed_openai_image_models.sql.
The displayed credit rates are base service credits per 1K tokens before plan markup. Internally, the gateway divides these rates by 1000 and multiplies by the actual input/output token counts.
| Model | Best fit | Input cr / 1K | Output cr / 1K |
|---|---|---|---|
| gpt-5.5 | Frontier reasoning, coding, complex professional work | 50 | 300 |
| gpt-5.4 | High-quality coding/professional work at lower cost than 5.5 | 25 | 150 |
| gpt-5.4-mini | Recommended strong default for production chat, coding, agents | 7.5 | 45 |
| gpt-5.4-nano | Simple high-volume extraction, ranking, classification | 2 | 12.5 |
| gpt-5.3-chat-latest | ChatGPT-style instant chat compatibility | 17.5 | 140 |
| gpt-5.2 | Previous frontier reasoning/professional model | 17.5 | 140 |
| gpt-5.2-chat-latest | GPT-5.2 ChatGPT-style chat compatibility | 17.5 | 140 |
| gpt-5.1 | Previous coding/agentic model | 12.5 | 100 |
| gpt-5 | Previous reasoning/coding model | 12.5 | 100 |
| gpt-5-mini | Low-latency GPT-5-class tasks | 2.5 | 20 |
| gpt-5-nano | Cheapest GPT-5-class summarization/classification | 0.5 | 4 |
| gpt-4.1 | Non-reasoning instruction following and tool calling | 20 | 80 |
| gpt-4.1-mini | Smaller/faster GPT-4.1 family model | 4 | 16 |
| gpt-4.1-nano | Very low-cost GPT-4.1 family model | 1 | 4 |
| gpt-4o | Legacy/compatibility multimodal chat workloads | 25 | 100 |
| gpt-4o-mini | Legacy low-cost default in older generated examples | 1.5 | 6 |
Embedding models:
| Model | Best fit | Input cr / 1K | Output cr / 1K |
|---|---|---|---|
| text-embedding-3-small | Default RAG/search embedding | 0.2 | 0 |
| text-embedding-3-large | Higher-quality embedding when cost is acceptable | 1.3 | 0 |
Image models:
| Model | Best fit | Notes |
|---|---|---|
| gpt-image-2 | Default image generation path | Supports quality: "low" for cheaper smoke/testing |
| gpt-image-1.5 | Compatibility/fallback image path | Uses legacy image size options |
| gpt-image-1-mini | Lower-cost image generation | Useful for smoke tests and budget-sensitive apps |
Image generation supports single-image n=1 requests in MVP. moderation is
server-gated to auto; low moderation is not exposed until platform policy and
plan gating exist. The proxy validates provider b64_json output before
returning it and rejects oversized images with a typed error.
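Because images come back as base64, an app that stores or forwards them may want to know the decoded size without decoding. The sketch below computes that from the string length alone. `base64ByteLength`, `isWithinByteCap`, and `EXAMPLE_MAX_IMAGE_BYTES` are hypothetical names, and the cap value is a placeholder; the platform's actual byte cap is not stated here.

```typescript
// Number of bytes encoded by a standard, padded base64 string.
// Every 4 base64 characters encode 3 bytes, minus one byte per "=" of padding.
function base64ByteLength(base64: string): number {
  if (base64.length % 4 !== 0) {
    throw new Error("Expected padded base64 input.");
  }
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

// HYPOTHETICAL cap for illustration only; not the platform's real limit.
const EXAMPLE_MAX_IMAGE_BYTES = 4 * 1024 * 1024;

// True when the decoded image would fit within the given byte cap.
function isWithinByteCap(base64: string, maxBytes: number = EXAMPLE_MAX_IMAGE_BYTES): boolean {
  return base64ByteLength(base64) <= maxBytes;
}
```

A check like this would sit on the app side, for example before writing `icon.images[0].base64` to storage; the proxy's own validation still applies.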
Choosing a Model
| Requirement | Recommended model |
|---|---|
| Best possible answer quality | gpt-5.5 |
| Strong production default | gpt-5.4-mini |
| Cheaper strong model | gpt-5-mini |
| Cheapest high-volume GPT-5-class path | gpt-5-nano or gpt-5.4-nano |
| Fast non-reasoning tool/instruction workloads | gpt-4.1-mini |
| Existing 4o-era app compatibility | gpt-4o-mini or gpt-4o |
| Vector search/RAG embeddings | text-embedding-3-small |
| Image generation | gpt-image-2; use gpt-image-1-mini for lower-cost smoke tests |
OpenAI recommends starting with the newest frontier model for complex reasoning and smaller variants for latency/cost-sensitive work. Gencow follows that shape but lets platform admins disable or reprice a model without tenant code changes.
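Since admins can disable or reprice models, it can help to keep model choices in one place rather than scattering string literals through handlers. The sketch below encodes the table above as a lookup; `Requirement`, `RECOMMENDED_MODEL`, and `recommendModel` are hypothetical names, and where the table lists two options the sketch arbitrarily keeps the first.

```typescript
// Hypothetical requirement labels matching the rows of the table above.
type Requirement =
  | "best-quality"
  | "production-default"
  | "cheaper-strong"
  | "high-volume"
  | "fast-non-reasoning"
  | "legacy-4o"
  | "embeddings"
  | "image";

// One model per requirement; rows with two options keep the first one listed.
const RECOMMENDED_MODEL: Record<Requirement, string> = {
  "best-quality": "gpt-5.5",
  "production-default": "gpt-5.4-mini",
  "cheaper-strong": "gpt-5-mini",
  "high-volume": "gpt-5-nano",
  "fast-non-reasoning": "gpt-4.1-mini",
  "legacy-4o": "gpt-4o-mini",
  "embeddings": "text-embedding-3-small",
  "image": "gpt-image-2",
};

function recommendModel(requirement: Requirement): string {
  return RECOMMENDED_MODEL[requirement];
}
```

A handler could then pass `model: recommendModel("production-default")` to ai.chat(), so a model swap becomes a one-line change.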
Service-Credit Billing
Gencow uses service credits for AI and other provider-backed services.
Base calculation:
baseCredits =
  inputTokens * inputCrPerToken +
  outputTokens * outputCrPerToken
creditsCharged = baseCredits * plan.serviceMarkup

Default plan markups:
| Plan | AI service markup |
|---|---|
| Free | 1.5x |
| Pro | 1.0x |
| Scale | 0.8x |
Example with gpt-5.4-mini, Pro plan, 1,000 input tokens and 1,000 output
tokens:
base = 7.5 + 45 = 52.5 credits
charged = 52.5 * 1.0 = 52.5 credits

If service credits are exhausted or a spend cap blocks the request, the proxy returns a 402 response before or after provider execution depending on whether a reservation was possible.
Image generation records ai_image_input_tokens, ai_image_output_tokens, and
ai_image_count service-usage metrics so image costs do not mix into generic AI
chat token rows.
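The token-based formula and worked example in this section can be reproduced with a small estimator, which is handy for budgeting before a deployment. This is a sketch only: `estimateCredits`, `RATES`, and `PLAN_MARKUP` are hypothetical names, the rates are copied from the Supported Models table for three models, and any rounding the gateway applies is not modeled.

```typescript
interface ModelRate {
  inputCrPer1K: number;
  outputCrPer1K: number;
}

// Base rates in credits per 1K tokens, before plan markup (subset of the catalog).
const RATES: Record<string, ModelRate> = {
  "gpt-5.5": { inputCrPer1K: 50, outputCrPer1K: 300 },
  "gpt-5.4-mini": { inputCrPer1K: 7.5, outputCrPer1K: 45 },
  "gpt-5-nano": { inputCrPer1K: 0.5, outputCrPer1K: 4 },
};

// Default plan markups from the table in this section.
const PLAN_MARKUP = { free: 1.5, pro: 1.0, scale: 0.8 } as const;
type Plan = keyof typeof PLAN_MARKUP;

// Estimate credits: per-1K rates are divided by 1000 to get per-token rates,
// multiplied by the token counts, then scaled by the plan markup.
function estimateCredits(
  model: string,
  plan: Plan,
  inputTokens: number,
  outputTokens: number,
): number {
  const rate = RATES[model];
  if (!rate) throw new Error(`No rate for model: ${model}`);
  const base =
    (inputTokens * rate.inputCrPer1K) / 1000 +
    (outputTokens * rate.outputCrPer1K) / 1000;
  return base * PLAN_MARKUP[plan];
}
```

For gpt-5.4-mini on the Pro plan with 1,000 input and 1,000 output tokens, this yields the same 52.5 credits as the worked example. Treat the result as an estimate; `creditsCharged` on the response is the authoritative figure.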
Error Handling
Common cloud proxy responses:
| Status | Cause | Fix |
|---|---|---|
| 400 | Unsupported model | Use an active model from model_pricing |
| 400 | stream: true through cloud proxy | Use ai.chat() for deployed apps |
| 401/403 | Missing or invalid proxy/app token | Redeploy or reinstall the AI component |
| 402 | Service credits exhausted or spend cap exceeded | Top up credits, disable the cap, or use a cheaper model |
| 413 | Generated image exceeds platform byte cap | Retry with smaller/lower-quality output |
| 500/502 | Provider or platform transient failure | Retry with backoff; check platform health/logs |
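The table above maps naturally onto a small decision function plus a backoff schedule for the retryable cases. The sketch below assumes the caller has access to the numeric HTTP status; how the generated helper surfaces that status is not specified here. `ProxyAction`, `actionForStatus`, and `backoffDelayMs` are hypothetical names, and the default delays are illustrative.

```typescript
type ProxyAction =
  | "fix-request"
  | "fix-credentials"
  | "add-credits"
  | "reduce-image"
  | "retry"
  | "unknown";

// Map a proxy HTTP status to the kind of fix suggested by the table above.
function actionForStatus(httpStatus: number): ProxyAction {
  switch (httpStatus) {
    case 400:
      return "fix-request"; // unsupported model or stream: true
    case 401:
    case 403:
      return "fix-credentials";
    case 402:
      return "add-credits";
    case 413:
      return "reduce-image";
    case 500:
    case 502:
      return "retry";
    default:
      return "unknown";
  }
}

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
// `attempt` is zero-based.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Only the "retry" action is worth repeating automatically; the others need a code, configuration, or billing change before a retry can succeed.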
Do Not Call LLM APIs Directly
// Wrong: provider SDK/key management bypasses Gencow
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
await openai.chat.completions.create({
  model: "gpt-5.4-mini",
  messages: [{ role: "user", content: "Hello" }],
});

// Correct: generated Gencow AI helper
import { ai } from "./ai";
const result = await ai.chat({
  model: "gpt-5.4-mini",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(result.text);

Why Not Direct LLM Calls?
| | Direct SDK | import { ai } |
|---|---|---|
| API key | Each app manages provider keys | Platform manages provider keys |
| Cloud billing | Not tracked by Gencow | Service-credit charging and usage snapshots |
| Spend caps | App must implement | Platform enforced |
| Supported model list | Hardcoded in app | Active model_pricing rows |
| RAG/Memory | App must assemble manually | gencow add RAG / gencow add Memory |
| Guardrails | App must assemble manually | gencow add Guardrails |
Quick Decision Tree
Need AI?
|
|-- Chat / text response     -> gencow add AI         -> ai.chat()
|-- Typed JSON extraction    -> gencow add AI         -> ai.generateObject({ schema })
|-- Image generation         -> gencow add AI         -> ai.image.generate()
|-- Document search / RAG    -> gencow add RAG        -> rag.retrieve() / ctx.search()
|-- Agent memory             -> gencow add Memory     -> memory.buildContext()
|-- Safety filtering         -> gencow add Guardrails -> guardrails.validateInput()
`-- Vector embedding         -> gencow add AI         -> ai.embed()

# Do not install provider SDKs directly for app AI calls
npm install openai
npm install @anthropic-ai/sdk
npm install @google/generative-ai
# Use Gencow components
npx gencow@latest add AI
npx gencow@latest add RAG
npx gencow@latest add Memory

Next Steps
- Components: all gencow add components
- RAG & Memory: document search and agent memory
- Local Development: local .env setup