openai#
What it is#
openai is the official JavaScript/TypeScript SDK for the OpenAI API — chat completions, the Responses API (the current preferred surface as of 2025+), streaming, tool/function calling, structured outputs (JSON Schema-enforced), embeddings, image generation (DALL-E / gpt-image-1), speech-to-text (Whisper), text-to-speech, vision input, fine-tuning, batch, files, and assistants.
It runs on Node (18+ for the 4.x line; the 5.x README lists Node 20 LTS or later), the browser (with dangerouslyAllowBrowser: true and proper key handling), Cloudflare Workers, Vercel Edge, Bun, and Deno. The v5 major (mid-2025) treats the Responses primitive as the headline surface: a unified “give me a response” call that covers what previously took Chat Completions plus separate tool and file plumbing. The Responses API itself first shipped in late 4.x releases, so it is not exclusive to v5.
In 2026, the canonical pattern is client.responses.create({ model: "gpt-...", input: "..." }) — the Chat Completions endpoint (client.chat.completions.create) still works and is supported, but the Responses API is what new OpenAI features ship in first.
Install#
# npm / pnpm / yarn / bun
npm install openai
pnpm add openai
yarn add openai
bun add openai
Output: runtime dep. ~600 KB unpacked (large — includes all model/endpoint type definitions and shapes).
# Optional — zod for runtime validation of structured outputs
npm install zod
Output: Zod schemas plug into the SDK’s structured-output helpers.
CLI: the package is a library, not a general-purpose command-line client for the API. Do not expect an npx openai --help style workflow; call the SDK from code and refer to the docs.
Versioning & Node support#
- Current major line is 5.x (stable since mid-2025). It presents the Responses API as the headline, with the older Chat Completions still supported, and provides expanded streaming events, native browser support, and many type-shape changes.
- 4.x is the previous major (the rewrite from the axios-based ^3 to a fetch-based client). Still supported via security backports; many existing codebases live here.
- Node ≥18 is required on 4.x; the 5.x README lists Node 20 LTS or later. The SDK uses the built-in fetch; older Node requires a polyfill.
- Pure TS source compiled to dual ESM + CJS. Types bundled.
- Always a runtime dependency: your code calls the API at runtime.
- API versions are separate from SDK versions. The OpenAI REST API is versioned by path (/v1); dated values such as api-version: 2024-12-01 are an Azure OpenAI convention. Pin the SDK to a release that supports the endpoints and features you rely on.
Package metadata#
- Maintainer: OpenAI (@openai), official SDK
- Project home: github.com/openai/openai-node
- Docs: platform.openai.com/docs
- npm: npmjs.com/package/openai
- License: Apache-2.0
- First released: 2022 (v1 thin wrapper); rewrites in 2023 (v4) and 2025 (v5).
- Downloads: ~5 million per week, the top AI SDK on npm by a wide margin
Peer dependencies & extras#
| Package | Purpose |
|---|---|
| zod | Structured-output schemas; the openai SDK has helpers for Zod. |
| @anthropic-ai/sdk | Sibling SDK; pattern is similar. |
| ai (Vercel AI SDK) | Higher-level abstraction over openai + others. Pick when you want a unified multi-provider interface. |
| langchain | Heavier orchestration layer; uses openai under the hood for OpenAI calls. |
| llamaindex | RAG-focused; also uses openai. |
| tiktoken | OpenAI’s tokenizer for counting tokens client-side. |
| gpt-tokenizer | Pure-JS tokenizer alternative. |
Alternatives#
| Library | Trade-off |
|---|---|
| Bare fetch | Zero-dep. Manual streaming, manual error handling. Pick for tiny scripts that call one endpoint. |
| Vercel ai SDK | Provider-agnostic (OpenAI, Anthropic, Google, etc.). Streaming primitives, React hooks. Pick for full-stack AI apps. |
| LangChain JS | Heavyweight RAG/agent framework. Pick for complex multi-step pipelines. |
| LlamaIndex | RAG-centric. Pick when retrieval is the focus. |
| Anthropic SDK | For Claude models. Different provider, same SDK pattern. |
| @google/genai | Gemini. |
| openai-fetch | Third-party tiny fetch wrapper around the OpenAI API. Pick to avoid the 600 KB bundle. |
Common gotchas#
- Don’t put your API key in client-side code. Even with dangerouslyAllowBrowser: true, exposing the key is a key-exfiltration vector. Use server-side proxying for browser apps.
- Streaming requires for await consumption. const stream = await client.responses.create({ stream: true, ... }) returns an async iterable of events, not a single completed response. Looping over it with for await is the basic consumption pattern; the SDK’s higher-level stream helpers build on the same iterator.
- Token limits aren’t enforced client-side. Sending too many tokens returns an API error. Count with tiktoken (or a heuristic) before sending if you need predictable behaviour.
- responses ≠ chat.completions. The Responses API is a different surface: different request shape, different event types in streaming. Code written for chat.completions doesn’t drop in.
- Network errors get auto-retried by default. The SDK retries 2× by default on connection errors, 429s, and 5xx responses. Set maxRetries: 0 to disable; turn it up for flaky integrations.
- tool_calls arrive in chunks during streaming. With raw chunks, reassembly is your responsibility. The SDK provides helpers (for example client.chat.completions.stream(...) with finalChatCompletion()), but the raw event flow is fine-grained.
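To make the last gotcha concrete, here is a minimal sketch of reassembling streamed tool-call fragments from Chat Completions chunks. `ToolCallDelta` is a simplified, hand-written stand-in for the fragment found at `chunk.choices[0].delta.tool_calls[n]`, and `mergeToolCallDeltas` is a hypothetical helper name, not an SDK function.

```typescript
// Simplified stand-in for one streamed tool-call fragment (assumption:
// only the fields used here; the SDK's own type has more).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text, complete only once the stream has ended
}

// Hypothetical helper: folds fragments into whole calls, keyed by `index`.
function mergeToolCallDeltas(deltas: ToolCallDelta[]): AssembledToolCall[] {
  const calls: AssembledToolCall[] = [];
  for (const delta of deltas) {
    const call = (calls[delta.index] ??= { id: "", name: "", arguments: "" });
    if (delta.id) call.id = delta.id;
    if (delta.function?.name) call.name += delta.function.name;
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
  }
  return calls.filter(Boolean); // drop holes if indices were sparse
}

// Fragments as they might arrive across several chunks.
const fragments: ToolCallDelta[] = [
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"ci' } },
  { index: 0, function: { arguments: 'ty":"Tokyo"}' } },
];

const assembled = mergeToolCallDeltas(fragments);
console.log(assembled[0].name, JSON.parse(assembled[0].arguments));
```

Only parse `arguments` after the stream ends; a partial fragment is usually not valid JSON.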
Real-world recipes#
Responses API — the canonical recipe (v5+)#
import OpenAI from "openai";
const client = new OpenAI(); // reads OPENAI_API_KEY from env
const response = await client.responses.create({
model: "gpt-4.1",
input: "Write a haiku about TypeScript.",
});
console.log(response.output_text);
Output:
Types guide each step taken,
Compiler whispers gently,
Bugs found before run.
The Responses API is the simplest entry point in v5. output_text is a convenience field; the full structured output lives in response.output.
Chat Completions — still supported#
import OpenAI from "openai";
const client = new OpenAI();
const completion = await client.chat.completions.create({
model: "gpt-4.1",
messages: [
{ role: "system", content: "You are a helpful assistant." },
{ role: "user", content: "What's 2 + 2?" },
],
});
console.log(completion.choices[0].message.content);
Output:
2 + 2 = 4.
Chat Completions remains fully supported. New apps prefer responses; existing apps don’t need to migrate.
Streaming response#
import OpenAI from "openai";
const client = new OpenAI();
const stream = await client.responses.create({
model: "gpt-4.1",
input: "Count to 5 slowly.",
stream: true,
});
for await (const event of stream) {
if (event.type === "response.output_text.delta") {
process.stdout.write(event.delta);
}
}
Output:
One.
Two.
Three.
Four.
Five.
Streaming with the Responses API uses a typed event stream: response.output_text.delta for text chunks, response.function_call_arguments.delta while a function call’s arguments stream in, and response.completed at the end.
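One way to keep a stream consumer manageable is to fold events into a small state object. The sketch below assumes a hand-written subset of the event union (the SDK's real union is much larger), and `reduceEvent` is a hypothetical name used only for illustration.

```typescript
// Simplified subset of Responses streaming events (assumption for illustration).
type StreamEvent =
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.function_call_arguments.delta"; delta: string }
  | { type: "response.completed" };

interface StreamState {
  text: string;
  toolArguments: string;
  done: boolean;
}

// Hypothetical reducer: returns the next state for a single event.
function reduceEvent(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case "response.output_text.delta":
      return { ...state, text: state.text + event.delta };
    case "response.function_call_arguments.delta":
      return { ...state, toolArguments: state.toolArguments + event.delta };
    case "response.completed":
      return { ...state, done: true };
    default:
      return state; // ignore event types this sketch does not model
  }
}

// In real code the events come from `for await (const event of stream)`.
const events: StreamEvent[] = [
  { type: "response.output_text.delta", delta: "One. " },
  { type: "response.output_text.delta", delta: "Two." },
  { type: "response.completed" },
];

const initialState: StreamState = { text: "", toolArguments: "", done: false };
const finalState = events.reduce(reduceEvent, initialState);
console.log(finalState.text);
```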
For Chat Completions streaming (legacy):
const stream = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Count to 5." }],
  stream: true,
});
for await (const chunk of stream) {
  // Some chunks carry no choices or no content, so guard every step
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
Tool / function calling#
import OpenAI from "openai";
const client = new OpenAI();
// Hypothetical stand-in for a real weather lookup; replace with your own implementation.
async function fetchWeather(city: string) {
  return { city, tempC: 22, conditions: "clear" };
}
const response = await client.responses.create({
  model: "gpt-4.1",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
        additionalProperties: false, // needed when strict is true
      },
      strict: true,
    },
  ],
});
// Check if the model called a tool
for (const output of response.output) {
  if (output.type === "function_call") {
    const args = JSON.parse(output.arguments);
    const weather = await fetchWeather(args.city);
    // Send the result back
    const followUp = await client.responses.create({
      model: "gpt-4.1",
      input: [
        { type: "function_call_output", call_id: output.call_id, output: JSON.stringify(weather) },
      ],
      previous_response_id: response.id,
    });
    console.log(followUp.output_text);
  }
}
Output:
The current weather in Tokyo is 22°C and clear.
Two API round-trips: (1) model calls the tool, (2) you send the result back via previous_response_id, model produces the final answer.
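With more than one tool, the middle step (run the tool, package the result) is easier to reason about as a small dispatcher. The following is a sketch under stated assumptions: the item interfaces are simplified hand-written shapes, `handlers` and `runToolCall` are hypothetical names, and the handler is a synchronous stub where real code would usually be async.

```typescript
// Simplified shapes for the two item types involved (assumption for illustration).
interface FunctionCallItem {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string; // JSON text produced by the model
}

interface FunctionCallOutputItem {
  type: "function_call_output";
  call_id: string;
  output: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical registry: only tools listed here can ever run.
// A Map avoids accidental hits on Object.prototype keys such as "toString".
const handlers = new Map<string, ToolHandler>([
  ["get_weather", (args) => ({ city: String(args.city), tempC: 22 })], // stubbed result
]);

function runToolCall(call: FunctionCallItem): FunctionCallOutputItem {
  const reply = (payload: unknown): FunctionCallOutputItem => ({
    type: "function_call_output",
    call_id: call.call_id,
    output: JSON.stringify(payload),
  });

  const handler = handlers.get(call.name);
  if (!handler) return reply({ error: `unknown tool: ${call.name}` });

  let args: unknown;
  try {
    args = JSON.parse(call.arguments);
  } catch {
    return reply({ error: "arguments were not valid JSON" });
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return reply({ error: "arguments must be a JSON object" });
  }
  return reply(handler(args as Record<string, unknown>));
}

const toolResult = runToolCall({
  type: "function_call",
  call_id: "call_1",
  name: "get_weather",
  arguments: '{"city":"Tokyo"}',
});
console.log(toolResult.output);
```

Returning an error payload instead of throwing lets the model see what went wrong and recover in the follow-up request.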
Structured outputs (JSON Schema)#
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const client = new OpenAI();
const Recipe = z.object({
  title: z.string(),
  ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
  steps: z.array(z.string()),
});
const completion = await client.chat.completions.parse({
  model: "gpt-4.1",
  messages: [
    { role: "user", content: "Give me a recipe for pancakes." },
  ],
  response_format: zodResponseFormat(Recipe, "recipe"),
});
// parsed is null if the model refused or produced no parsable output
const recipe = completion.choices[0].message.parsed;
console.log(JSON.stringify(recipe, null, 2));
Output:
{
"title": "Classic Pancakes",
"ingredients": [
{ "name": "flour", "amount": "1.5 cups" },
{ "name": "milk", "amount": "1.25 cups" },
{ "name": "eggs", "amount": "1" }
],
"steps": ["Whisk dry ingredients.", "Add wet ingredients.", "Cook on a griddle."]
}
parsed is typed against the Zod schema — fully type-safe structured output. The model is constrained at decode time to produce JSON matching the schema.
Embeddings#
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.embeddings.create({
model: "text-embedding-3-small",
input: ["The quick brown fox", "jumps over the lazy dog"],
});
console.log(response.data.length); // 2
console.log(response.data[0].embedding.length); // 1536 (dimensions)
console.log(response.data[0].embedding.slice(0, 5)); // first 5 floats
Output:
2
1536
[0.013, -0.041, 0.022, 0.011, -0.008]
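Embed in batches of up to ~2048 inputs per request. Pricing is per token either way, but one batched call avoids per-request overhead and is much faster than embedding one at a time. Use text-embedding-3-small (1536-d, cheap) or text-embedding-3-large (3072-d, better quality).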
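Embedding vectors are usually compared with cosine similarity. The sketch below ranks a few documents against a query; the 3-dimensional vectors are made-up toy values standing in for real 1536-dimensional embeddings, and `cosineSimilarity` is a plain helper written for this example rather than an SDK function.

```typescript
// Cosine similarity of two equal-length vectors (returns NaN for a zero vector).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors (assumption): in practice these come from response.data[i].embedding.
const queryVector = [1, 0, 0];
const documents = [
  { id: "unrelated", vector: [0, 1, 0] },
  { id: "close", vector: [0.9, 0.1, 0] },
  { id: "exact", vector: [2, 0, 0] },
];

// Highest similarity first.
const ranked = documents
  .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked.map((r) => r.id));
```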
Vision input#
import OpenAI from "openai";
const client = new OpenAI();
const response = await client.responses.create({
model: "gpt-4.1",
input: [
{
role: "user",
content: [
{ type: "input_text", text: "What's in this image?" },
{ type: "input_image", image_url: "https://example.com/photo.jpg" },
],
},
],
});
console.log(response.output_text);
Output:
The image shows a golden retriever sitting on a grassy field with mountains in the background.
Pass a URL (HTTPS only) or a base64-encoded data URL such as data:image/jpeg;base64,/9j/.... For local files, read the bytes and base64-encode them.
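For the local-file case, a small helper can turn raw bytes into a data URL suitable for the `image_url` field. This is a sketch: `toDataUrl` is a hypothetical name, and it relies on the global `btoa`, which is assumed to be available (browsers, recent Node, Deno, Bun, and edge runtimes provide it).

```typescript
// Builds a base64 data URL from raw bytes.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]); // one char per byte, as btoa expects
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// The first three bytes of a JPEG file, used here as a tiny stand-in for real image data.
const jpegMagic = new Uint8Array([0xff, 0xd8, 0xff]);
const dataUrl = toDataUrl(jpegMagic, "image/jpeg");
console.log(dataUrl);
```

In Node, the bytes would typically come from reading the file into a Buffer, which is a Uint8Array subclass and can be passed to the helper directly. Very large images are better served from a URL, since string concatenation per byte is slow and the request body grows by about a third under base64.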
Production deployment#
API key handling#
// Server-side — env var
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
Never ship the API key in client-side bundles. For browser apps, proxy through your server:
// Client
const res = await fetch("/api/llm", { method: "POST", body: JSON.stringify({ prompt }) });
// Server (Next.js route, Express, etc.)
import OpenAI from "openai";
const client = new OpenAI();
export async function POST(req: Request) {
const { prompt } = await req.json();
const response = await client.responses.create({ model: "gpt-4.1", input: prompt });
return Response.json({ text: response.output_text });
}
Timeouts and retries#
const client = new OpenAI({
timeout: 60 * 1000, // 60s per request
maxRetries: 3, // retry 3× on 5xx / connection errors
});
// Per-request override
const response = await client.responses.create(
{ model: "gpt-4.1", input: "..." },
{ timeout: 120 * 1000, maxRetries: 0 }
);
For long-running streams, increase per-request timeout. The SDK uses exponential backoff between retries.
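To give a feel for what exponential backoff does to wait times, here is an illustrative schedule calculator. The constants (500 ms initial delay, 8 s cap, 25% jitter) are assumptions chosen for the example, not a statement of the SDK's internal values, and `backoffDelayMs` is a hypothetical helper.

```typescript
interface BackoffOptions {
  initialDelayMs: number;
  maxDelayMs: number;
  jitter: number; // fraction (0..1) of the delay that may be randomly subtracted
}

// Delay before retry number `attempt` (0-based). `random` is injectable for tests.
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const exponential = opts.initialDelayMs * 2 ** attempt; // 1x, 2x, 4x, ...
  const capped = Math.min(exponential, opts.maxDelayMs);
  return capped * (1 - opts.jitter * random());
}

// Illustrative settings (assumption).
const backoffOptions: BackoffOptions = { initialDelayMs: 500, maxDelayMs: 8000, jitter: 0.25 };

// Schedule without jitter, to show the doubling and the cap.
const schedule = [0, 1, 2, 3, 4, 5].map((attempt) =>
  backoffDelayMs(attempt, backoffOptions, () => 0),
);
console.log(schedule);
```

Jitter matters when many clients hit a rate limit at the same moment: without it they all retry in lockstep.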
Edge runtime#
The SDK works on Cloudflare Workers, Vercel Edge, Deno, Bun. Pass fetch explicitly if your runtime needs custom request handling:
const client = new OpenAI({
apiKey: env.OPENAI_API_KEY,
fetch: globalThis.fetch,
});
Rate limit handling#
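The SDK retries on rate-limit (429) responses automatically. For batch jobs, use the Batch API (client.batches.create): it is 50% cheaper, and results arrive within a 24h completion window. The input_file_id below refers to a JSONL file uploaded beforehand through the Files API with purpose "batch".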
const batch = await client.batches.create({
input_file_id: file.id,
endpoint: "/v1/chat/completions",
completion_window: "24h",
});
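The batch input file is JSONL: one request per line, each with a `custom_id`, `method`, `url`, and `body`. The sketch below builds that text from a list of prompts. `toBatchJsonl` and the `request-N` id scheme are hypothetical choices for illustration; check the current Batch guide for the full set of accepted fields.

```typescript
// One line of a batch input file targeting Chat Completions.
interface BatchLine {
  custom_id: string; // your own id, used to match results back to requests
  method: "POST";
  url: "/v1/chat/completions";
  body: { model: string; messages: { role: "user"; content: string }[] };
}

// Hypothetical helper: one JSON object per line, no trailing newline.
function toBatchJsonl(prompts: string[], model: string): string {
  return prompts
    .map((prompt, i): BatchLine => ({
      custom_id: `request-${i + 1}`,
      method: "POST",
      url: "/v1/chat/completions",
      body: { model, messages: [{ role: "user", content: prompt }] },
    }))
    .map((line) => JSON.stringify(line))
    .join("\n");
}

const jsonl = toBatchJsonl(["Summarise document A.", "Summarise document B."], "gpt-4.1-mini");
console.log(jsonl);
```

The `url` on every line should match the `endpoint` passed to `client.batches.create`.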
Performance tuning#
Pick the right model#
| Model | Latency | Cost (relative) | Use |
|---|---|---|---|
| gpt-4.1 / gpt-5 | High | Highest | Complex reasoning, tool use, vision |
| gpt-4.1-mini / gpt-5-mini | Medium | Mid | Most app workflows; great default |
| gpt-4.1-nano / gpt-5-nano | Low | Lowest | Simple classification, light tasks |
| text-embedding-3-small | Very low | Cheap | Embeddings (always use small unless you’ve measured) |
(Treat the specific names as examples: model lineups change frequently, so check the OpenAI docs for current canonical names.)
Streaming reduces perceived latency#
Streaming doesn’t reduce total latency, but the first-token latency is far lower than waiting for the full response. Always stream for user-facing chat.
Batch embeddings#
// Slow — 1 request per input
for (const text of texts) {
await client.embeddings.create({ model: "...", input: text });
}
// Fast — 1 request for all
await client.embeddings.create({ model: "...", input: texts });
Batch up to ~2048 inputs per call.
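When there are more inputs than fit in one call, split them first. A minimal sketch, with `chunk` as a generic helper written for this example and 2048 taken from the limit mentioned above:

```typescript
// Splits an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// 5000 placeholder texts standing in for real documents.
const allTexts = Array.from({ length: 5000 }, (_, i) => `document ${i}`);
const batches = chunk(allTexts, 2048);

// Each batch would become one request, e.g.
//   await client.embeddings.create({ model, input: batch })
console.log(batches.map((b) => b.length));
```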
Connection reuse#
The SDK uses Node’s keep-alive by default — no special config needed.
Token counting client-side#
import { encoding_for_model } from "tiktoken";
const enc = encoding_for_model("gpt-4");
const tokens = enc.encode("hello world").length;
enc.free();
Useful for staying under context windows; saves a round-trip vs API trial-and-error.
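A typical use of a token count is trimming chat history to a budget before sending. The sketch below keeps system messages plus the most recent turns that fit. The default counter is a rough heuristic (about four characters per token for English text, an assumption); pass a tiktoken-based counter for accurate numbers. `trimHistory` and `estimateTokens` are hypothetical names.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keeps all system messages, then as many of the newest other messages as fit.
function trimHistory(
  messages: ChatMessage[],
  budget: number,
  count: (text: string) => number = estimateTokens,
): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + count(m.content), 0);

  const kept: ChatMessage[] = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = count(rest[i].content);
    if (used + cost > budget) break; // stop at the first message that does not fit
    used += cost;
    kept.unshift(rest[i]); // preserve chronological order
  }
  return [...system, ...kept];
}

const history: ChatMessage[] = [
  { role: "system", content: "Be brief." },          // 9 chars  -> 3 tokens
  { role: "user", content: "a".repeat(40) },         // 40 chars -> 10 tokens
  { role: "assistant", content: "b".repeat(40) },    // 40 chars -> 10 tokens
  { role: "user", content: "c".repeat(8) },          // 8 chars  -> 2 tokens
];

const trimmed = trimHistory(history, 16);
console.log(trimmed.map((m) => m.role));
```

This counts message content only; real requests carry some per-message overhead, so leave headroom in the budget.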
Version migration guide#
v3 → v4 (2023) — the axios → fetch rewrite#
- Drop the openai v3 axios shape.
- New TS-first API with fully typed responses.
- ESM + CJS dual published.
// v3
const { Configuration, OpenAIApi } = require("openai");
const cfg = new Configuration({ apiKey: "..." });
const openai = new OpenAIApi(cfg);
const res = await openai.createChatCompletion({ model: "...", messages: [...] });
// v4
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "..." });
const completion = await client.chat.completions.create({ model: "...", messages: [...] });
v4 → v5 (2025) — Responses API#
The v5 migration is the important one in 2026. New canonical surface:
| Concept | v4 | v5 |
|---|---|---|
| Simple response | client.chat.completions.create({ messages: [...] }) | client.responses.create({ input: "..." }) (input also accepts an array of { role, content } messages) |
| Streaming | chunk.choices[0].delta.content | typed events: response.output_text.delta |
| Tools | messages: [{ role: "tool", tool_call_id, content }] | input: [{ type: "function_call_output", call_id, output }] |
| Structured output | response_format: { type: "json_schema", ... } | text: { format: { type: "json_schema", ... } } (same schema, different request field) |
| Multi-turn | rebuild full history each time | previous_response_id chains |
| File inputs | Files API + assistants | content part { type: "input_file", file_id: "..." } inside an input message |
// v4 — Chat Completions
const c = await client.chat.completions.create({
model: "gpt-4",
messages: [
{ role: "system", content: "..." },
{ role: "user", content: "Hi" },
],
});
// v5 — Responses
const r = await client.responses.create({
model: "gpt-4.1",
instructions: "...",
input: "Hi",
});
Chat Completions still works in v5. Many teams stay on Chat Completions and adopt Responses incrementally. The migration is opt-in, not forced.
Things to watch when upgrading from v4 to v5:
- Type names changed for many response shapes (Response, the global web/DOM type, is now used in the SDK’s own signatures).
- Streaming event types are different: rewrite stream consumers.
- Tools: the request and result shapes differ; not a drop-in.
- File handling: the Responses API has its own input/output file types, separate from the legacy Files API.
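For text-only conversations, the request-shape change can be handled mechanically. The sketch below maps a Chat Completions style message list onto the `instructions` plus `input` shape shown earlier. The types are simplified hand-written shapes, `toResponsesParams` is a hypothetical helper, and tool messages or multi-part content are deliberately out of scope.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Simplified subset of the Responses request (assumption: text-only turns).
interface ResponsesParams {
  instructions?: string;
  input: { role: "user" | "assistant"; content: string }[];
}

// System messages become `instructions`; everything else stays in order in `input`.
function toResponsesParams(messages: ChatMessage[]): ResponsesParams {
  const systemParts: string[] = [];
  const input: ResponsesParams["input"] = [];
  for (const message of messages) {
    if (message.role === "system") {
      systemParts.push(message.content);
    } else {
      input.push({ role: message.role, content: message.content });
    }
  }
  return systemParts.length > 0
    ? { instructions: systemParts.join("\n\n"), input }
    : { input };
}

const converted = toResponsesParams([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello! How can I help?" },
  { role: "user", content: "What's 2 + 2?" },
]);
console.log(converted);
```

The result can be spread into a request alongside `model`. If the app switches to `previous_response_id` chaining, only the newest user turn needs to be sent.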
Stay on v4#
If you’re not adopting Responses-specific features (statefulness via previous_response_id, simpler streaming, image generation in-line), staying on v4 with Chat Completions is fine. Backports continue.
Security considerations#
- API key exfiltration is the #1 risk. Never put OPENAI_API_KEY in client bundles, in environment files committed to the repo, or in browser localStorage. Use server-side proxying.
- Prompt injection. User input embedded in a system prompt can override instructions (“Ignore previous and …”). Sanitise, or rely on careful prompt engineering, for any user-input → system-prompt flow.
- Data exfiltration via tools. A model with tool access can be tricked into calling tools with attacker-controlled args. Filter tool arguments server-side; don’t let the model invoke arbitrary HTTP fetches.
- PII leakage. OpenAI’s data-retention policy varies by tier; check the policy for your account. For PII-heavy workloads, look into OpenAI’s Zero Data Retention option or run an on-prem model.
- Rate-limit abuse. A buggy frontend can drain your monthly quota. Always wrap API calls with per-user rate limits.
- Untrusted tool outputs. If a tool returns attacker-controlled content (e.g. web search results), the model may follow instructions in it. Treat tool outputs like untrusted input.
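The per-user rate limit mentioned above can start as something very small. Below is a sketch of an in-memory fixed-window limiter. `FixedWindowLimiter` is a hypothetical class written for this example; it only works within a single process, so a multi-instance deployment would need a shared store instead.

```typescript
// In-memory fixed-window limiter: at most `limit` calls per user per window.
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the call may proceed. `now` is injectable for tests.
  allow(userId: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(userId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(userId, { windowStart: now, count: 1 }); // start a new window
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

// Illustrative setting (assumption): 2 requests per user per second.
const limiter = new FixedWindowLimiter(2, 1000);
const decisions = [
  limiter.allow("alice", 0),
  limiter.allow("alice", 10),
  limiter.allow("alice", 20), // third call inside the window is rejected
  limiter.allow("bob", 20), // other users are unaffected
  limiter.allow("alice", 1000), // a new window has started
];
console.log(decisions);
```

In a route handler, check `allow(userId)` before calling the SDK and return a 429 to the client when it is false.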
Testing & CI integration#
import { describe, it, expect, vi } from "vitest";
import OpenAI from "openai";
describe("ai integration", () => {
it("calls responses API", async () => {
const mockCreate = vi.fn().mockResolvedValue({
output_text: "Hello!",
output: [{ type: "message", content: [{ type: "output_text", text: "Hello!" }] }],
});
const client = { responses: { create: mockCreate } } as unknown as OpenAI;
const r = await client.responses.create({ model: "gpt-4.1", input: "Hi" });
expect(r.output_text).toBe("Hello!");
});
});
Output: mock the SDK at the method level. For integration tests, use a separate test key with strict spend limits.
For CI, set spend caps in the OpenAI dashboard — never run the SDK against prod keys in CI without limits.
Ecosystem integrations#
| Tool | Integration |
|---|---|
| zod | zodResponseFormat(schema) for structured outputs |
| ai (Vercel AI SDK) | Higher-level streaming, React hooks; uses openai for OpenAI calls |
| langchain | OpenAI provider for chains/agents |
| llamaindex | OpenAI provider for RAG |
| tiktoken | Token counting client-side |
| next.js | Server actions / route handlers; call the SDK server-side |
| cloudflare-workers | Pass fetch: globalThis.fetch |
| vercel edge | Works out of the box |
| mcp (Model Context Protocol) | The Responses API can call remote MCP servers through its mcp tool type |
Troubleshooting common errors#
- 401 Unauthorized: OPENAI_API_KEY not set or invalid. Check env loading.
- 429 Too Many Requests: rate limit hit. The SDK auto-retries; for sustained load, upgrade tier or use the Batch API.
- 404: model does not exist. Model name typo, the model was retired, or your account has no access to it. Check the docs for current names.
- context_length_exceeded: input + output > context window. Trim history, use a model with a larger window, or summarise.
- Stream hangs: the iterator never exits. Always have a timeout via AbortController or the timeout option.
- TS: Cannot find module 'openai/helpers/zod'. TypeScript isn’t resolving subpath exports. Set "moduleResolution" to "node16" or "bundler".
- High latency on first request: connection setup. Warm the connection (no-op embedding call) at app start if startup latency matters.
- output_text is empty: the model returned tool calls, not text. Inspect response.output[] for function_call items.
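For the last item, a quick way to see what came back is to summarise the `output` array. This sketch uses a hand-written two-member subset of the output item union (real responses can contain more item and content-part types), and `summarizeOutput` is a hypothetical debugging helper.

```typescript
// Simplified subset of Responses output items (assumption for illustration).
type OutputItem =
  | { type: "message"; content: { type: "output_text"; text: string }[] }
  | { type: "function_call"; call_id: string; name: string; arguments: string };

// Collects any text plus the names of the functions the model asked to call.
function summarizeOutput(output: OutputItem[]): { text: string; functionCalls: string[] } {
  let text = "";
  const functionCalls: string[] = [];
  for (const item of output) {
    if (item.type === "message") {
      for (const part of item.content) text += part.text;
    } else {
      functionCalls.push(item.name);
    }
  }
  return { text, functionCalls };
}

// A response that contains only a tool call, so there is no text to show.
const toolOnly: OutputItem[] = [
  { type: "function_call", call_id: "call_1", name: "get_weather", arguments: '{"city":"Tokyo"}' },
];
const summary = summarizeOutput(toolOnly);
console.log(summary);
```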
When NOT to use this#
- You need a multi-provider abstraction. Use Vercel’s ai SDK: swap between OpenAI, Anthropic, and Google with a config change.
- You only need one endpoint and want a tiny bundle. Use bare fetch against the REST API: ~50 lines, zero deps. The SDK is ~600 KB.
- You’re doing complex multi-step agent flows. Use LangChain JS or the Vercel AI SDK’s agents API for more orchestration scaffolding.
- You’re using a local model (Ollama, LM Studio). They expose an OpenAI-compatible API, so the SDK can be pointed at them via baseURL, but for a single endpoint it adds little; plain fetch is usually enough.
- You need browser-side streaming with token auth. Use server-sent events from your backend; don’t put the SDK in the browser.
See also#
- Concept: agents — tool-using LLM patterns, multi-step flows
- Concept: API — API key handling, rate limiting, retries
- JavaScript: fetch — what the SDK uses under the hood