openai#

What it is#

openai is the official JavaScript/TypeScript SDK for the OpenAI API — chat completions, the Responses API (the current preferred surface as of 2025+), streaming, tool/function calling, structured outputs (JSON Schema-enforced), embeddings, image generation (DALL-E / gpt-image-1), speech-to-text (Whisper), text-to-speech, vision input, fine-tuning, batch, files, and assistants.

It works on Node 18+, the browser (with dangerouslyAllowBrowser: true and proper key handling), Cloudflare Workers, Vercel Edge, Bun, and Deno. The v5 major (mid-2025) reorganised the API surface around the Responses primitive — a unified “give me a response” surface that subsumes Chat Completions, Tools, and Files into one consistent flow.

In 2026, the canonical pattern is client.responses.create({ model: "gpt-...", input: "..." }) — the Chat Completions endpoint (client.chat.completions.create) still works and is supported, but the Responses API is what new OpenAI features ship in first.

Install#

# npm / pnpm / yarn / bun
npm install openai
pnpm add openai
yarn add openai
bun add openai

Output: runtime dep. ~600 KB unpacked (large — includes all model/endpoint type definitions and shapes).

# Optional — zod for runtime validation of structured outputs
npm install zod

Output: Zod schemas plug into the SDK’s structured-output helpers.

# CLI (community, not official) — only the SDK is published officially
npx openai --help     # NOT a real subcommand; refer to docs

Output: no official CLI; the package is a library only.

Versioning & Node support#

Current major line is 5.x (stable since mid-2025) — introduces the Responses API as the headline, with the older Chat Completions still supported. Provides expanded streaming events, native browser support, and many type-shape changes.
4.x is the previous major (the rewrite from ^3 axios-based to native fetch). Still supported via security backports; many existing codebases live here.
Node ≥18 required. Uses built-in fetch; older Node requires a polyfill.
Pure TS source compiled to dual ESM + CJS. Types bundled.
Always a runtime dependency — your code calls the API at runtime.
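API versions are separate from SDK versions. Date-stamped values such as api-version: 2024-12-01 are the Azure OpenAI convention (set through the SDK’s AzureOpenAI client and its apiVersion option). Where an API version applies, pin the SDK against the one your features expect.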

Package metadata#

Maintainer: OpenAI (@openai) — official SDK
Project home: github.com/openai/openai-node
Docs: platform.openai.com/docs
npm: npmjs.com/package/openai
License: Apache-2.0
First released: 2022 (v1 thin wrapper); rewrites in 2023 (v4) and 2025 (v5).
Downloads: ~5 million per week — top AI-SDK by far on npm

Peer dependencies & extras#

Package	Purpose
`zod`	Structured-output schemas — `openai` SDK has helpers for Zod.
`@anthropic-ai/sdk`	Sibling SDK; pattern is similar.
`ai` (Vercel AI SDK)	Higher-level abstraction over `openai` + others. Pick when you want a unified multi-provider interface.
`langchain`	Heavier orchestration layer; uses `openai` under the hood for OpenAI calls.
`llamaindex`	RAG-focused; also uses `openai`.
`tiktoken`	WASM port of OpenAI’s tiktoken tokenizer, for counting tokens client-side.
`gpt-tokenizer`	Pure-JS tokenizer alternative.

Alternatives#

Library	Trade-off
Bare `fetch`	Zero-dep. Manual streaming, manual error handling. Pick for tiny scripts that call one endpoint.
Vercel `ai` SDK	Provider-agnostic (OpenAI, Anthropic, Google, etc.). Streaming primitives, React hooks. Pick for full-stack AI apps.
LangChain JS	Heavyweight RAG/agent framework. Pick for complex multi-step pipelines.
LlamaIndex	RAG-centric. Pick when retrieval is the focus.
Anthropic SDK	For Claude models. Different provider, same SDK pattern.
`@google/genai`	Gemini.
`openai-fetch`	Third-party tiny fetch wrapper around the OpenAI API. Pick to avoid the 600 KB bundle.

Common gotchas#

Don’t put your API key in client-side code. Even with dangerouslyAllowBrowser: true, exposing the key is a key-exfiltration vector. Use server-side proxying for browser apps.
Streaming is consumed as an async iterable. const stream = await client.responses.create({ stream: true, ... }) resolves to a stream object that yields events one at a time, not to an array of all events. Looping over it with for await is the standard consumption pattern.
Token limits aren’t enforced client-side. Sending too many tokens returns an API error. Count with tiktoken (or a heuristic) before sending if you need predictable behaviour.
responses ≠ chat.completions. The Responses API is a different surface — different request shape, different event types in streaming. Code written for chat.completions doesn’t drop in.
Failed requests get auto-retried by default. The SDK retries 2× on connection errors, 408, 409, 429, and 5xx responses. Set maxRetries: 0 to disable; turn it up for flaky integrations.
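tool_calls arrive in chunks during streaming. Reassembly is your responsibility if you consume the raw events. The SDK’s streaming helpers can do it for you (for example finalChatCompletion() on the Chat Completions stream helper; finalRunStep() belongs to the Assistants stream), but the raw event flow is fine-grained.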
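
The chunked tool_calls gotcha above can be handled along these lines. This is a minimal sketch, assuming the Chat Completions streaming delta shape (an index, an id and function name that arrive once, and argument text that arrives in fragments). The ToolCallDelta type and the mergeToolCallDeltas name are hypothetical and defined locally rather than imported from the SDK.

```typescript
// Local stand-in for the fragments found on chunk.choices[0].delta.tool_calls.
// Defined here (not imported) so the sketch runs without the SDK installed.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string; // JSON text, complete only once the stream has ended
}

// Merge fragments by index: id and name arrive once, arguments arrive in pieces.
function mergeToolCallDeltas(deltas: ToolCallDelta[]): AssembledToolCall[] {
  const calls: AssembledToolCall[] = [];
  for (const delta of deltas) {
    let call = calls[delta.index];
    if (!call) {
      call = { id: "", name: "", arguments: "" };
      calls[delta.index] = call;
    }
    if (delta.id) call.id = delta.id;
    if (delta.function?.name) call.name = delta.function.name;
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
  }
  return calls;
}

// Three fragments of one call, as they might arrive over a stream.
const assembled = mergeToolCallDeltas([
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"ci' } },
  { index: 0, function: { arguments: 'ty":"Tokyo"}' } },
]);
const parsedArgs = JSON.parse(assembled[0].arguments) as { city: string };
```

Only parse the arguments after the stream has finished; a partial fragment is not valid JSON.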

Real-world recipes#

Responses API — the canonical recipe (v5+)#

import OpenAI from "openai";

const client = new OpenAI();   // reads OPENAI_API_KEY from env

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "Write a haiku about TypeScript.",
});

console.log(response.output_text);

Output:

Types guide each step taken,
Compiler whispers gently,
Bugs found before run.

The Responses API is the simplest entry point in v5. output_text is a convenience field; the full structured output lives in response.output.

Chat Completions — still supported#

import OpenAI from "openai";

const client = new OpenAI();

const completion = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What's 2 + 2?" },
  ],
});

console.log(completion.choices[0].message.content);

Output:

2 + 2 = 4.

Chat Completions remains fully supported. New apps prefer responses; existing apps don’t need to migrate.

Streaming response#

import OpenAI from "openai";

const client = new OpenAI();

const stream = await client.responses.create({
  model: "gpt-4.1",
  input: "Count to 5 slowly.",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

Output:

One.
Two.
Three.
Four.
Five.

Streaming with the Responses API uses a typed event stream: response.output_text.delta for token chunks, response.function_call_arguments.delta while a function call’s arguments are being generated, and response.completed at the end.
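
One way to keep stream handling testable is to fold events into a state object with a pure function. The sketch below models only two of the event types named above; the StreamEvent union, StreamState, and reduceEvent are hypothetical local names, and the real SDK emits many more event types than this.

```typescript
// Minimal local model of two Responses streaming events.
// An assumption for this sketch: real streams carry many more event types.
type StreamEvent =
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.completed" };

interface StreamState {
  text: string;
  done: boolean;
}

// Pure reducer: append text deltas, flip `done` on completion.
function reduceEvent(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case "response.output_text.delta":
      return { text: state.text + event.delta, done: false };
    case "response.completed":
      return { text: state.text, done: true };
    default:
      return state;
  }
}

const sampleEvents: StreamEvent[] = [
  { type: "response.output_text.delta", delta: "One. " },
  { type: "response.output_text.delta", delta: "Two." },
  { type: "response.completed" },
];
const initialState: StreamState = { text: "", done: false };
const finalState = sampleEvents.reduce(reduceEvent, initialState);
```

Inside a for await loop the same reducer can be applied event by event, which keeps UI updates and accumulation logic separate from the network code.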

For Chat Completions streaming (legacy):

const stream = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Count to 5." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

Tool / function calling#

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
});

// Check if the model called a tool
for (const output of response.output) {
  if (output.type === "function_call") {
    const args = JSON.parse(output.arguments);
    // fetchWeather is your own function (not part of the SDK) that looks up the weather
    const weather = await fetchWeather(args.city);

    // Send the result back
    const followUp = await client.responses.create({
      model: "gpt-4.1",
      input: [
        { type: "function_call_output", call_id: output.call_id, output: JSON.stringify(weather) },
      ],
      previous_response_id: response.id,
    });

    console.log(followUp.output_text);
  }
}

Output:

The current weather in Tokyo is 22°C and clear.

Two API round-trips: (1) model calls the tool, (2) you send the result back via previous_response_id, model produces the final answer.
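
When more than one tool is registered, the step between the two round-trips is usually a dispatch table. The following is a sketch under stated assumptions: FunctionCallItem and FunctionCallOutput are local stand-ins that mirror the item shapes used in the recipe above, while toolHandlers, runToolCall, and the canned weather data are hypothetical.

```typescript
// Local stand-ins for the item shapes used in the recipe above.
interface FunctionCallItem {
  type: "function_call";
  name: string;
  arguments: string; // JSON text produced by the model
  call_id: string;
}

interface FunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  output: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical registry: only tools listed here can ever run.
const toolHandlers = new Map<string, ToolHandler>([
  [
    "get_weather",
    (args) => {
      if (typeof args.city !== "string") throw new Error("city must be a string");
      return { city: args.city, tempC: 22, sky: "clear" }; // canned data for the sketch
    },
  ],
]);

// Run one tool call and always return an output item, even on failure,
// so the model sees the error instead of the exchange breaking.
function runToolCall(call: FunctionCallItem): FunctionCallOutput {
  let result: unknown;
  try {
    const handler = toolHandlers.get(call.name);
    if (!handler) throw new Error(`unknown tool: ${call.name}`);
    result = handler(JSON.parse(call.arguments) as Record<string, unknown>);
  } catch (err) {
    result = { error: err instanceof Error ? err.message : String(err) };
  }
  return { type: "function_call_output", call_id: call.call_id, output: JSON.stringify(result) };
}

const okOutput = runToolCall({
  type: "function_call", name: "get_weather", arguments: '{"city":"Tokyo"}', call_id: "call_1",
});
const rejectedOutput = runToolCall({
  type: "function_call", name: "delete_everything", arguments: "{}", call_id: "call_2",
});
```

The handlers here are synchronous to keep the sketch short; real tools that hit the network would return promises and runToolCall would await them.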

Structured outputs (JSON Schema)#

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const client = new OpenAI();

const Recipe = z.object({
  title: z.string(),
  ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
  steps: z.array(z.string()),
});

const completion = await client.chat.completions.parse({
  model: "gpt-4.1",
  messages: [
    { role: "user", content: "Give me a recipe for pancakes." },
  ],
  response_format: zodResponseFormat(Recipe, "recipe"),
});

const recipe = completion.choices[0].message.parsed;
console.log(JSON.stringify(recipe, null, 2));

Output (whitespace condensed):

{
  "title": "Classic Pancakes",
  "ingredients": [
    { "name": "flour", "amount": "1.5 cups" },
    { "name": "milk", "amount": "1.25 cups" },
    { "name": "eggs", "amount": "1" }
  ],
  "steps": ["Whisk dry ingredients.", "Add wet ingredients.", "Cook on a griddle."]
}

parsed is typed against the Zod schema — fully type-safe structured output. The model is constrained at decode time to produce JSON matching the schema.
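
For projects that prefer not to add Zod, the same constraint can be written by hand as a JSON Schema response_format. This is a sketch that mirrors the Recipe schema above; the recipeResponseFormat name is hypothetical, and with this approach you would parse message.content yourself instead of reading parsed.

```typescript
// Hand-written JSON Schema mirroring the Recipe Zod schema above.
// Strict mode expects every property to be listed in `required`
// and `additionalProperties: false` on each object.
const recipeResponseFormat = {
  type: "json_schema",
  json_schema: {
    name: "recipe",
    strict: true,
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        ingredients: {
          type: "array",
          items: {
            type: "object",
            properties: { name: { type: "string" }, amount: { type: "string" } },
            required: ["name", "amount"],
            additionalProperties: false,
          },
        },
        steps: { type: "array", items: { type: "string" } },
      },
      required: ["title", "ingredients", "steps"],
      additionalProperties: false,
    },
  },
} as const;
```

Passed as response_format to client.chat.completions.create, this asks for the same JSON shape, but the result arrives as a string and carries no static type until you validate it.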

Embeddings#

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: ["The quick brown fox", "jumps over the lazy dog"],
});

console.log(response.data.length);                    // 2
console.log(response.data[0].embedding.length);       // 1536 (dimensions)
console.log(response.data[0].embedding.slice(0, 5));  // first 5 floats

Output:

2
1536
[0.013, -0.041, 0.022, 0.011, -0.008]

Embed in batches of up to ~2048 inputs per request: one round-trip instead of thousands, which is much faster than one-at-a-time (billing is per token either way). Use text-embedding-3-small (1536-d, cheap) or text-embedding-3-large (3072-d, better quality).
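
Once you have the vectors, the usual next step is comparing them. The sketch below shows cosine similarity and a nearest-match lookup over plain number arrays; cosineSimilarity and mostSimilarIndex are hypothetical helper names, not SDK functions.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Index of the candidate closest to the query, or -1 if there are none.
function mostSimilarIndex(query: number[], candidates: number[][]): number {
  let best = -1;
  let bestScore = -Infinity;
  candidates.forEach((candidate, i) => {
    const score = cosineSimilarity(query, candidate);
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return best;
}

// Toy 2-d vectors stand in for real 1536-d embeddings.
const nearest = mostSimilarIndex([1, 0], [[0, 1], [0.9, 0.1], [-1, 0]]);
```

A linear scan like this is fine for a few thousand vectors; larger collections usually call for a vector index.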

Vision input#

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4.1",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "What's in this image?" },
        { type: "input_image", image_url: "https://example.com/photo.jpg" },
      ],
    },
  ],
});

console.log(response.output_text);

Output:

The image shows a golden retriever sitting on a grassy field with mountains in the background.

Pass URLs (HTTPS only) or base64-encoded data: data:image/jpeg;base64,/9j/.... For local files, read and base64-encode.
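
The base64 route can be sketched as a small helper. It assumes a Node runtime (Buffer is a Node global); toDataUrl and imagePart are hypothetical names, and the three bytes below are just the start of a JPEG header, not a real image.

```typescript
// Build a data URL from raw image bytes. Buffer is a Node global;
// a browser would need a different base64 routine.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Wrap the data URL in the input_image content part used in the recipe above.
function imagePart(bytes: Uint8Array, mimeType: string) {
  return { type: "input_image" as const, image_url: toDataUrl(bytes, mimeType) };
}

// 0xFF 0xD8 0xFF is how JPEG files begin, which is why JPEG data URLs start with "/9j/".
const jpegPart = imagePart(new Uint8Array([0xff, 0xd8, 0xff]), "image/jpeg");
```

For a file on disk, readFileSync from node:fs returns a Buffer, which is a Uint8Array and can be passed to imagePart directly.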

Production deployment#

API key handling#

// Server-side — env var
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Never ship the API key in client-side bundles. For browser apps, proxy through your server:

// Client
const res = await fetch("/api/llm", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ prompt }) });

// Server (Next.js route, Express, etc.)
import OpenAI from "openai";
const client = new OpenAI();
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const response = await client.responses.create({ model: "gpt-4.1", input: prompt });
  return Response.json({ text: response.output_text });
}

Timeouts and retries#

const client = new OpenAI({
  timeout: 60 * 1000,          // 60s per request
  maxRetries: 3,                // retry 3× on 5xx / connection errors
});

// Per-request override
const response = await client.responses.create(
  { model: "gpt-4.1", input: "..." },
  { timeout: 120 * 1000, maxRetries: 0 }
);

For long-running streams, increase per-request timeout. The SDK uses exponential backoff between retries.
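
To make the backoff behaviour concrete, here is an illustrative schedule. The base delay, cap, and jitter hook are assumptions chosen for this sketch, not the SDK’s internal values, and backoffDelayMs is a hypothetical name.

```typescript
// Illustrative exponential backoff: the delay doubles per attempt up to a cap.
// `jitter` returns a multiplier; the default of 1 keeps the result deterministic.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 500,
  maxMs: number = 8000,
  jitter: () => number = () => 1,
): number {
  const exponential = baseMs * 2 ** attempt; // attempt 0 -> base, 1 -> 2x base, ...
  return Math.min(exponential, maxMs) * jitter();
}

const retrySchedule = [0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt));
// retrySchedule is [500, 1000, 2000, 4000, 8000, 8000]
```

In practice a random jitter multiplier (for example between 0.75 and 1) helps stop many clients from retrying in lockstep.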

Edge runtime#

The SDK works on Cloudflare Workers, Vercel Edge, Deno, Bun. Pass fetch explicitly if your runtime needs custom request handling:

const client = new OpenAI({
  apiKey: env.OPENAI_API_KEY,
  fetch: globalThis.fetch,
});

Rate limit handling#

The SDK retries on rate-limit (429) responses automatically. For batch jobs, use the Batch API (client.batches.create): 50% cheaper, with results returned within a 24-hour completion window.

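// `file` is a previously uploaded JSONL file of requests (client.files.create with purpose: "batch")
const batch = await client.batches.create({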
  input_file_id: file.id,
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
});
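
The batch input file is JSONL, one request per line. A sketch of building that content is below; BatchRequestLine and toBatchJsonl are hypothetical names, and the custom_id scheme is an arbitrary choice for illustration.

```typescript
// One line of a batch input file targeting the Chat Completions endpoint.
interface BatchRequestLine {
  custom_id: string; // your own key for matching results back to requests
  method: "POST";
  url: "/v1/chat/completions";
  body: { model: string; messages: { role: "user"; content: string }[] };
}

// Turn a list of prompts into JSONL text: one JSON object per line.
function toBatchJsonl(prompts: string[], model: string): string {
  const lines = prompts.map((prompt, i): BatchRequestLine => ({
    custom_id: `request-${i + 1}`,
    method: "POST",
    url: "/v1/chat/completions",
    body: { model, messages: [{ role: "user", content: prompt }] },
  }));
  return lines.map((line) => JSON.stringify(line)).join("\n");
}

const batchJsonl = toBatchJsonl(["Summarise document A.", "Summarise document B."], "gpt-4.1");
```

The url on every line should match the endpoint passed to client.batches.create, and results are not guaranteed to come back in input order, which is what custom_id is for.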

Performance tuning#

Pick the right model#

Model	Latency	Cost (relative)	Use
`gpt-4.1` / `gpt-5`	High	Highest	Complex reasoning, tool use, vision
`gpt-4.1-mini` / `gpt-5-mini`	Medium	Mid	Most app workflows; great default
`gpt-4.1-nano` / `gpt-5-nano`	Low	Lowest	Simple classification, light tasks
`text-embedding-3-small`	Very low	Cheap	Embeddings (always use small unless you’ve measured)

(Treat the specific names as provisional: model lineups change frequently, so check the OpenAI docs for the current canonical names.)

Streaming reduces perceived latency#

Streaming doesn’t reduce total latency, but the first-token latency is far lower than waiting for the full response. Always stream for user-facing chat.

Batch embeddings#

// Slow — 1 request per input
for (const text of texts) {
  await client.embeddings.create({ model: "...", input: text });
}

// Fast — 1 request for all
await client.embeddings.create({ model: "...", input: texts });

Batch up to ~2048 inputs per call.
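
When the input list is longer than the per-call limit, split it first. A minimal sketch follows; chunk is a hypothetical helper, and 2048 is the approximate limit quoted above.

```typescript
// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new Error("size must be a positive integer");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// 5000 inputs become three requests instead of 5000.
const allTexts = Array.from({ length: 5000 }, (_, i) => `doc ${i}`);
const embeddingBatches = chunk(allTexts, 2048);
// batch sizes: 2048, 2048, 904
```

Each batch then becomes one client.embeddings.create call, and concatenating the response.data arrays in batch order keeps the vectors aligned with the inputs.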

Connection reuse#

The SDK uses Node’s keep-alive by default — no special config needed.

Token counting client-side#

import { encoding_for_model } from "tiktoken";

const enc = encoding_for_model("gpt-4");
const tokens = enc.encode("hello world").length;
enc.free();

Useful for staying under context windows; saves a round-trip vs API trial-and-error.
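
Where an exact count is not needed, a character-based estimate avoids the tokenizer dependency. The sketch below assumes roughly 4 characters per token, a rule of thumb for English text that can be well off for code or other languages; estimateTokens and trimToBudget are hypothetical names.

```typescript
// Rough estimate: about 4 characters per token for English text.
// A rule of thumb, not a tokenizer; use tiktoken when accuracy matters.
function estimateTokens(text: string, charsPerToken: number = 4): number {
  return Math.ceil(text.length / charsPerToken);
}

// Keep the newest messages that fit the budget, dropping the oldest first.
function trimToBudget(messages: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const trimmedHistory = trimToBudget(["aaaaaaaa", "bbbb", "cccc"], 2);
// keeps ["bbbb", "cccc"]; the oldest message no longer fits
```

Leave generous headroom when relying on an estimate like this, since the real count is only known to the tokenizer.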

Version migration guide#

v3 → v4 (2023) — the axios → fetch rewrite#

Drops the v3 axios-based shape (Configuration / OpenAIApi).
New TS-first API with full typed responses.
ESM + CJS dual published.

// v3
const { Configuration, OpenAIApi } = require("openai");
const cfg = new Configuration({ apiKey: "..." });
const openai = new OpenAIApi(cfg);
const res = await openai.createChatCompletion({ model: "...", messages: [...] });

// v4
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "..." });
const completion = await client.chat.completions.create({ model: "...", messages: [...] });

v4 → v5 (2025) — Responses API#

The v5 migration is the important one in 2026. New canonical surface:

Concept	v4	v5
Simple response	`client.chat.completions.create({ messages: [...] })`	`client.responses.create({ input: "..." })` (`chat.completions.create` with `messages` still works)
Streaming	`chunk.choices[0].delta.content`	typed events: `response.output_text.delta`
Tools	`messages: [{ role: "tool", tool_call_id, content }]`	`input: [{ type: "function_call_output", call_id, output }]`
Structured output	`response_format: { type: "json_schema" }`	`text: { format: { type: "json_schema" } }` in Responses; `response_format` is unchanged in Chat Completions
Multi-turn	rebuild full history each time	`previous_response_id` chains
File inputs	Files API + assistants	`input: [{ type: "input_file", file_id: "..." }]`

// v4 — Chat Completions
const c = await client.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "..." },
    { role: "user", content: "Hi" },
  ],
});

// v5 — Responses
const r = await client.responses.create({
  model: "gpt-4.1",
  instructions: "...",
  input: "Hi",
});

Chat Completions still works in v5. Many teams stay on Chat Completions and adopt Responses incrementally. The migration is opt-in, not forced.

Things to watch when upgrading from v4 to v5:

Type names changed for many response shapes (Response is a global DOM type now used in the SDK).
Streaming event types are different — re-write stream consumers.
Tools: the request and result shapes differ; not a drop-in.
File handling: Responses API has its own input/output file types separate from the legacy Files API.

Stay on v4#

If you’re not adopting Responses-specific features (statefulness via previous_response_id, simpler streaming, image generation in-line), staying on v4 with Chat Completions is fine. Backports continue.

Security considerations#

API key exfiltration is the #1 risk. Never put OPENAI_API_KEY in client bundles, .env files committed to the repo, or browser localStorage. Use server-side proxying.
Prompt injection. User input embedded in a system prompt can override instructions (“Ignore previous and …”). Sanitise — or rely on careful prompt engineering — for any user-input → system-prompt flow.
Data exfiltration via tools. A model with tool access can be tricked into calling tools with attacker-controlled args. Filter tool arguments server-side; don’t let the model invoke arbitrary HTTP fetches.
PII leakage. OpenAI’s data-retention policy varies by tier; check the policy for your account. For PII-heavy workloads, use the zero-retention tier or run an on-prem model.
Rate-limit abuse. A buggy frontend can drain your monthly quota. Always wrap API calls with per-user rate limits.
Untrusted tool outputs. If a tool returns attacker-controlled content (e.g. web search results), the model may follow instructions in it. Treat tool outputs like untrusted input.
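
The per-user rate limit mentioned above can be sketched as a fixed-window counter. This version is in-memory and assumes a single server process; FixedWindowLimiter is a hypothetical class, and multi-instance deployments would need a shared store instead.

```typescript
// Fixed-window limiter: at most `limit` calls per user per window.
// In-memory only, so it assumes a single server process.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    if (limit < 1 || windowMs <= 0) throw new Error("limit and windowMs must be positive");
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the call is allowed. `now` is injectable so the logic can be tested.
  allow(userId: string, now: number = Date.now()): boolean {
    const current = this.windows.get(userId);
    if (!current || now - current.start >= this.windowMs) {
      this.windows.set(userId, { start: now, count: 1 }); // start a fresh window
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

// Example policy: 20 model calls per user per minute.
const perUserLimiter = new FixedWindowLimiter(20, 60_000);
```

Call allow(userId) before each SDK request in the server route and return a 429 to the client when it comes back false.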

Testing & CI integration#

import { describe, it, expect, vi } from "vitest";
import OpenAI from "openai";

describe("ai integration", () => {
  it("calls responses API", async () => {
    const mockCreate = vi.fn().mockResolvedValue({
      output_text: "Hello!",
      output: [{ type: "message", content: [{ type: "output_text", text: "Hello!" }] }],
    });

    const client = { responses: { create: mockCreate } } as unknown as OpenAI;
    const r = await client.responses.create({ model: "gpt-4.1", input: "Hi" });
    expect(r.output_text).toBe("Hello!");
  });
});

Output: mock the SDK at the method level. For integration tests, use a separate test key with strict spend limits.

For CI, set spend caps in the OpenAI dashboard — never run the SDK against prod keys in CI without limits.

Ecosystem integrations#

Tool	Integration
`zod`	`zodResponseFormat(schema)` for structured outputs
`ai` (Vercel AI SDK)	Higher-level streaming, React hooks; uses `openai` for OpenAI calls
`langchain`	OpenAI provider for chains/agents
`llamaindex`	OpenAI provider for RAG
`tiktoken`	Token counting client-side
`next.js`	Server actions / route handlers — call SDK server-side
`cloudflare-workers`	Pass `fetch: globalThis.fetch`
`vercel edge`	Works out of the box
`mcp` (Model Context Protocol)	Pair with OpenAI’s MCP server APIs

Troubleshooting common errors#

401 Unauthorized — OPENAI_API_KEY not set or invalid. Check env loading.
429 Too Many Requests — rate limit hit. SDK auto-retries; for sustained load, upgrade tier or use Batch API.
400: model does not exist — model name typo, or model was retired. Check the docs for current names.
context_length_exceeded — input + output > context window. Trim history, use a model with a larger window, or summarise.
Stream hangs — iterator never exits. Always have a timeout via AbortController or timeout: option.
TS: Cannot find module 'openai/helpers/zod' — bundler doesn’t honour subpath exports. Ensure modern Node TS resolution ("moduleResolution": "node16" or "bundler").
High latency on first request — connection setup. Warm the connection (no-op embedding call) at app start if startup latency matters.
output_text is undefined — model returned tool calls, not text. Inspect response.output[] for function_call items.

When NOT to use this#

You need a multi-provider abstraction. Use Vercel’s ai SDK — swap between OpenAI, Anthropic, Google with a config change.
You only need one endpoint and want a tiny bundle. Use bare fetch against the REST API — ~50 lines, zero deps. The SDK is ~600 KB.
You’re doing complex multi-step agent flows. Use LangChain JS or the Vercel AI SDK’s agents API — more orchestration scaffolding.
You’re using a local model (Ollama, LM Studio). They expose an OpenAI-compatible API but the SDK adds nothing — use fetch.
You need browser-side streaming with token auth. Use server-sent events from your backend; don’t put the SDK in the browser.
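
For the bare-fetch case, the request itself is small. The sketch below builds the URL and options for a Responses call without sending anything; buildResponsesRequest is a hypothetical helper and the key shown is a placeholder.

```typescript
// Build the pieces of a Responses API call for use with fetch.
// Returned as plain data so it can be inspected or tested without sending anything.
function buildResponsesRequest(apiKey: string, model: string, input: string) {
  return {
    url: "https://api.openai.com/v1/responses",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, input }),
    },
  };
}

const bareRequest = buildResponsesRequest("sk-placeholder", "gpt-4.1", "Hi");
// Send with: const res = await fetch(bareRequest.url, bareRequest.init);
```

What this leaves out is what the SDK provides: retries, timeouts, typed responses, and stream parsing all become your code.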
