openai — official OpenAI Node SDK

Package-level reference for openai on npm — Chat Completions, the Responses API, streaming, tool calls, structured outputs, embeddings, and the v4→v5 migration.

openai#

What it is#

openai is the official JavaScript/TypeScript SDK for the OpenAI API — chat completions, the Responses API (the current preferred surface as of 2025+), streaming, tool/function calling, structured outputs (JSON Schema-enforced), embeddings, image generation (DALL-E / gpt-image-1), speech-to-text (Whisper), text-to-speech, vision input, fine-tuning, batch, files, and assistants.

It works on Node (18+ for 4.x; 5.x requires Node 20 LTS or later), the browser (with dangerouslyAllowBrowser: true and proper key handling), Cloudflare Workers, Vercel Edge, Bun, and Deno. The Responses API first shipped in late 4.x releases (early 2025); the v5 major (mid-2025) then made it the headline surface: a unified “give me a response” flow that covers the ground of Chat Completions, tools, and file inputs.

In 2026, the canonical pattern is client.responses.create({ model: "gpt-...", input: "..." }) — the Chat Completions endpoint (client.chat.completions.create) still works and is supported, but the Responses API is what new OpenAI features ship in first.

Install#

# npm / pnpm / yarn / bun
npm install openai
pnpm add openai
yarn add openai
bun add openai

Output: runtime dep. ~600 KB unpacked (large — includes all model/endpoint type definitions and shapes).

# Optional — zod for runtime validation of structured outputs
npm install zod

Output: Zod schemas plug into the SDK’s structured-output helpers.

# CLI (community, not official) — only the SDK is published officially
npx openai --help     # NOT a real subcommand; refer to docs

Output: no official CLI; the package is a library only.

Versioning & Node support#

  • Current major line is 5.x (stable since mid-2025; check npm for anything newer). It makes the Responses API the headline surface, although client.responses first appeared in late 4.x releases, and keeps Chat Completions supported. It also expands streaming events and changes many type shapes.
  • 4.x is the previous major (the rewrite away from the axios-based 3.x client). Still supported via security backports; many existing codebases live here.
  • Node 20 LTS or later for 5.x, which uses the runtime’s built-in fetch. 4.x supports Node 18+ and bundles a node-fetch shim.
  • Pure TS source compiled to dual ESM + CJS. Types bundled.
  • Always a runtime dependency — your code calls the API at runtime.
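  • API versions are separate from SDK versions. Date-style versions such as api-version: 2024-12-01 are an Azure OpenAI convention, set through the apiVersion option of the SDK’s AzureOpenAI client. Against the OpenAI API itself, check that the SDK version you pin supports the endpoints and features you rely on.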

Peer dependencies & extras#

| Package | Purpose |
| --- | --- |
| zod | Structured-output schemas; the openai SDK has helpers for Zod. |
| @anthropic-ai/sdk | Sibling SDK; pattern is similar. |
| ai (Vercel AI SDK) | Higher-level abstraction over openai + others. Pick when you want a unified multi-provider interface. |
| langchain | Heavier orchestration layer; uses openai under the hood for OpenAI calls. |
| llamaindex | RAG-focused; also uses openai. |
| tiktoken | OpenAI’s tokenizer for counting tokens client-side. |
| gpt-tokenizer | Pure-JS tokenizer alternative. |

Alternatives#

| Library | Trade-off |
| --- | --- |
| Bare fetch | Zero-dep. Manual streaming, manual error handling. Pick for tiny scripts that call one endpoint. |
| Vercel ai SDK | Provider-agnostic (OpenAI, Anthropic, Google, etc.). Streaming primitives, React hooks. Pick for full-stack AI apps. |
| LangChain JS | Heavyweight RAG/agent framework. Pick for complex multi-step pipelines. |
| LlamaIndex | RAG-centric. Pick when retrieval is the focus. |
| Anthropic SDK | For Claude models. Different provider, same SDK pattern. |
| @google/genai | Gemini. |
| openai-fetch | Third-party tiny fetch wrapper around the OpenAI API. Pick to avoid the 600 KB bundle. |

Common gotchas#

  1. Don’t put your API key in client-side code. Even with dangerouslyAllowBrowser: true, exposing the key is a key-exfiltration vector. Use server-side proxying for browser apps.
  2. Streaming is consumed incrementally. const stream = await client.responses.create({ stream: true, ... }) resolves to an async iterable of events, not to the finished response. Iterate it with for await; the SDK also offers helper wrappers such as client.responses.stream(), but for await is the baseline pattern.
  3. Token limits aren’t enforced client-side. Sending too many tokens returns an API error. Count with tiktoken (or a heuristic) before sending if you need predictable behaviour.
  4. responses ≠ chat.completions. The Responses API is a different surface: different request shape, different event types in streaming. Code written for chat.completions doesn’t drop in.
  5. Network errors get auto-retried by default. SDK retries 2× on 5xx / connection errors. Set maxRetries: 0 to disable; turn it up for flaky integrations.
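  6. tool_calls arrive in chunks during streaming. With raw Chat Completions chunks, reassembly is your responsibility. The SDK’s streaming helpers (for example client.chat.completions.stream() with finalChatCompletion()) do the accumulation for you, but the raw event flow is fine-grained. (finalRunStep() belongs to the Assistants streaming helper, not to chat or Responses streams.)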
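
Gotcha 6 is easier to see in code. The following is a minimal sketch of reassembling tool calls from raw Chat Completions stream fragments. The ToolCallDelta interface is a local stand-in for the subset of fields used here (index, id, function.name, function.arguments), and assembleToolCalls is a hypothetical helper name, not an SDK function.

```typescript
// Local stand-in for the subset of a streamed tool-call fragment used here.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text; parse only after the stream has ended
}

// Merge fragments by `index`: id and name arrive once, arguments arrive in pieces.
function assembleToolCalls(deltas: ToolCallDelta[]): AssembledToolCall[] {
  const calls: AssembledToolCall[] = [];
  for (const delta of deltas) {
    const call = calls[delta.index] ?? { id: "", name: "", arguments: "" };
    if (delta.id) call.id = delta.id;
    if (delta.function?.name) call.name = delta.function.name;
    if (delta.function?.arguments) call.arguments += delta.function.arguments;
    calls[delta.index] = call;
  }
  // Drop any holes left by indexes that never appeared.
  return calls.filter((call) => call !== undefined);
}

const calls = assembleToolCalls([
  { index: 0, id: "call_1", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Tokyo"}' } },
]);
console.log(calls[0].name, JSON.parse(calls[0].arguments).city); // get_weather Tokyo
```

In a real consumer you would push each chunk's delta.tool_calls entries into the list as they arrive and parse the arguments only once the stream reports completion, because partial JSON will not parse.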

Real-world recipes#

Responses API — the canonical recipe (v5+)#

import OpenAI from "openai";

const client = new OpenAI();   // reads OPENAI_API_KEY from env

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "Write a haiku about TypeScript.",
});

console.log(response.output_text);

Output:

Types guide each step taken,
Compiler whispers gently,
Bugs found before run.

The Responses API is the simplest entry point in v5. output_text is a convenience field; the full structured output lives in response.output.

Chat Completions — still supported#

import OpenAI from "openai";

const client = new OpenAI();

const completion = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What's 2 + 2?" },
  ],
});

console.log(completion.choices[0].message.content);

Output:

2 + 2 = 4.

Chat Completions remains fully supported. New apps prefer responses; existing apps don’t need to migrate.

Streaming response#

import OpenAI from "openai";

const client = new OpenAI();

const stream = await client.responses.create({
  model: "gpt-4.1",
  input: "Count to 5 slowly.",
  stream: true,
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

Output:

One.
Two.
Three.
Four.
Five.

Streaming with the Responses API uses a typed event stream: response.output_text.delta for text chunks, response.function_call_arguments.delta while the model is building a tool call, and response.completed at the end.

For Chat Completions streaming (legacy):

const stream = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Count to 5." }],
  stream: true,
});

for await (const chunk of stream) {
  // choices can be empty on some chunks (e.g. a trailing usage chunk), so guard the access
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
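
The Responses event loop shown earlier in this section can be factored into a small helper that works on any async iterable, which also makes it testable without a network call. This is a sketch under assumptions: StreamEvent is a simplified local type (the SDK's real event union is much larger), and collectText and fakeStream are hypothetical names introduced for illustration.

```typescript
// Simplified local event shape; only the fields this helper reads.
interface StreamEvent {
  type: string;
  delta?: string;
}

// Accumulate text deltas from a stream, optionally forwarding each chunk as it arrives.
async function collectText(
  stream: AsyncIterable<StreamEvent>,
  onDelta?: (text: string) => void,
): Promise<string> {
  let text = "";
  for await (const event of stream) {
    if (event.type === "response.output_text.delta" && typeof event.delta === "string") {
      text += event.delta;
      onDelta?.(event.delta);
    }
  }
  return text;
}

// Fake stream standing in for the SDK's stream object in tests.
async function* fakeStream(): AsyncGenerator<StreamEvent> {
  yield { type: "response.created" };
  yield { type: "response.output_text.delta", delta: "One. " };
  yield { type: "response.output_text.delta", delta: "Two." };
  yield { type: "response.completed" };
}

collectText(fakeStream()).then((text) => console.log(text)); // One. Two.
```

Passing the stream in as a parameter, rather than creating it inside the helper, keeps the API call and the consumption logic separate.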

Tool / function calling#

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4.1",
  input: "What's the weather in Tokyo?",
  tools: [
    {
      type: "function",
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  ],
});

// Check if the model called a tool
for (const output of response.output) {
  if (output.type === "function_call") {
    const args = JSON.parse(output.arguments);
    // fetchWeather is your own lookup function, not part of the SDK
    const weather = await fetchWeather(args.city);

    // Send the result back
    const followUp = await client.responses.create({
      model: "gpt-4.1",
      input: [
        { type: "function_call_output", call_id: output.call_id, output: JSON.stringify(weather) },
      ],
      previous_response_id: response.id,
    });

    console.log(followUp.output_text);
  }
}

Output:

The current weather in Tokyo is 22°C and clear.

Two API round-trips: (1) model calls the tool, (2) you send the result back via previous_response_id, model produces the final answer.
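
When there is more than one tool, the middle step of that round-trip (turning function_call items into function_call_output items) is worth isolating. A minimal sketch, assuming the item fields used in the recipe above (type, call_id, name, arguments); OutputItem is a simplified local type and runToolCalls is a hypothetical helper, not an SDK function.

```typescript
// Simplified local view of a Responses output item.
interface OutputItem {
  type: string;
  call_id?: string;
  name?: string;
  arguments?: string; // JSON text produced by the model
}

interface FunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  output: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown | Promise<unknown>;

// Run every function_call item through its handler and build the follow-up input items.
async function runToolCalls(
  output: OutputItem[],
  handlers: Record<string, ToolHandler>,
): Promise<FunctionCallOutput[]> {
  const results: FunctionCallOutput[] = [];
  for (const item of output) {
    if (item.type !== "function_call" || !item.call_id || !item.name) continue;
    let result: unknown;
    try {
      // Own-property check so names like "constructor" never resolve to a handler.
      if (!Object.prototype.hasOwnProperty.call(handlers, item.name)) {
        throw new Error(`Unknown tool: ${item.name}`);
      }
      result = await handlers[item.name](JSON.parse(item.arguments ?? "{}"));
    } catch (err) {
      // Report the failure to the model instead of aborting the whole turn.
      result = { error: err instanceof Error ? err.message : String(err) };
    }
    results.push({ type: "function_call_output", call_id: item.call_id, output: JSON.stringify(result) });
  }
  return results;
}
```

The returned array is what you would pass as input in the follow-up request, together with previous_response_id.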

Structured outputs (JSON Schema)#

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const client = new OpenAI();

const Recipe = z.object({
  title: z.string(),
  ingredients: z.array(z.object({ name: z.string(), amount: z.string() })),
  steps: z.array(z.string()),
});

const completion = await client.chat.completions.parse({
  model: "gpt-4.1",
  messages: [
    { role: "user", content: "Give me a recipe for pancakes." },
  ],
  response_format: zodResponseFormat(Recipe, "recipe"),
});

const recipe = completion.choices[0].message.parsed;
console.log(recipe?.title);
console.log(recipe?.ingredients);

Output:

{
  "title": "Classic Pancakes",
  "ingredients": [
    { "name": "flour", "amount": "1.5 cups" },
    { "name": "milk", "amount": "1.25 cups" },
    { "name": "eggs", "amount": "1" }
  ],
  "steps": ["Whisk dry ingredients.", "Add wet ingredients.", "Cook on a griddle."]
}

parsed is typed against the Zod schema — fully type-safe structured output. The model is constrained at decode time to produce JSON matching the schema.

Embeddings#

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: ["The quick brown fox", "jumps over the lazy dog"],
});

console.log(response.data.length);                    // 2
console.log(response.data[0].embedding.length);       // 1536 (dimensions)
console.log(response.data[0].embedding.slice(0, 5));  // first 5 floats

Output:

2
1536
[0.013, -0.041, 0.022, 0.011, -0.008]

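Embed in batches of up to 2048 inputs per request. Pricing is per token, so batching does not lower the bill; it cuts round-trips and per-request overhead compared with one-at-a-time calls. Use text-embedding-3-small (1536-d, cheap) or text-embedding-3-large (3072-d, better quality).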

Vision input#

import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4.1",
  input: [
    {
      role: "user",
      content: [
        { type: "input_text", text: "What's in this image?" },
        { type: "input_image", image_url: "https://example.com/photo.jpg" },
      ],
    },
  ],
});

console.log(response.output_text);

Output:

The image shows a golden retriever sitting on a grassy field with mountains in the background.

Pass URLs (HTTPS only) or base64-encoded data: data:image/jpeg;base64,/9j/.... For local files, read and base64-encode.
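
For the base64 route, a small helper can build the data URL from raw bytes. This sketch uses btoa so it runs in Node, browsers, and edge runtimes alike; toImageDataUrl is a hypothetical name, and reading the bytes from disk or from an upload is left to the caller.

```typescript
// Build a data URL from raw image bytes using the global btoa.
function toImageDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// The first three bytes of a JPEG file encode to the familiar "/9j/" prefix.
const jpegMagic = new Uint8Array([0xff, 0xd8, 0xff]);
console.log(toImageDataUrl(jpegMagic, "image/jpeg")); // data:image/jpeg;base64,/9j/
```

The resulting string goes in the image_url field of an input_image part, in place of the HTTPS URL.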

Production deployment#

API key handling#

// Server-side — env var
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Never ship the API key in client-side bundles. For browser apps, proxy through your server:

// Client
const res = await fetch("/api/llm", { method: "POST", body: JSON.stringify({ prompt }) });

// Server (Next.js route, Express, etc.)
import OpenAI from "openai";
const client = new OpenAI();
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const response = await client.responses.create({ model: "gpt-4.1", input: prompt });
  return Response.json({ text: response.output_text });
}

Timeouts and retries#

const client = new OpenAI({
  timeout: 60 * 1000,          // 60s per request
  maxRetries: 3,                // retry 3× on 5xx / connection errors
});

// Per-request override
const response = await client.responses.create(
  { model: "gpt-4.1", input: "..." },
  { timeout: 120 * 1000, maxRetries: 0 }
);

For long-running streams, increase per-request timeout. The SDK uses exponential backoff between retries.
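
Timeout and retry settings multiply, which matters when you budget for worst-case latency. The sketch below estimates that upper bound for plain exponential backoff. The 500 ms base and 8 s cap are assumptions chosen for illustration, not values read from the SDK, and real implementations usually add random jitter that this ignores.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Upper bound on time spent if every attempt runs into the timeout.
function worstCaseMs(timeoutMs: number, maxRetries: number, baseMs = 500, capMs = 8000): number {
  let total = timeoutMs * (maxRetries + 1); // the first try plus each retry
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    total += backoffDelayMs(attempt, baseMs, capMs);
  }
  return total;
}

console.log(worstCaseMs(60_000, 3)); // 243500
```

With timeout: 60s and maxRetries: 3, a request that keeps timing out can hold a caller for about four minutes, so pick both values with your own upstream deadline in mind.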

Edge runtime#

The SDK works on Cloudflare Workers, Vercel Edge, Deno, Bun. Pass fetch explicitly if your runtime needs custom request handling:

const client = new OpenAI({
  apiKey: env.OPENAI_API_KEY,
  fetch: globalThis.fetch,
});

Rate limit handling#

The SDK retries on rate-limit (429) automatically. For batch jobs, use the Batch API (client.batches.create): roughly 50% cheaper, with results delivered within a 24-hour completion window rather than immediately.

const batch = await client.batches.create({
  input_file_id: file.id,
  endpoint: "/v1/chat/completions",
  completion_window: "24h",
});
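
The input_file_id above refers to an uploaded JSONL file with one request per line. A minimal sketch of building that file's contents for the /v1/chat/completions endpoint follows; BatchPrompt and toBatchJsonl are hypothetical names, and uploading the result (via the Files API with a batch purpose) is assumed to happen separately.

```typescript
interface BatchPrompt {
  id: string; // your own identifier, echoed back so results can be matched to inputs
  prompt: string;
}

// One JSON object per line: custom_id, HTTP method, endpoint URL, and the request body.
function toBatchJsonl(prompts: BatchPrompt[], model: string): string {
  return prompts
    .map((p) =>
      JSON.stringify({
        custom_id: p.id,
        method: "POST",
        url: "/v1/chat/completions",
        body: { model, messages: [{ role: "user", content: p.prompt }] },
      }),
    )
    .join("\n");
}

console.log(toBatchJsonl([{ id: "req-1", prompt: "Hi" }], "gpt-4.1-mini"));
```

The url in each line has to match the endpoint passed to client.batches.create.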

Performance tuning#

Pick the right model#

| Model | Latency | Cost (relative) | Use |
| --- | --- | --- | --- |
| gpt-4.1 / gpt-5 | High | Highest | Complex reasoning, tool use, vision |
| gpt-4.1-mini / gpt-5-mini | Medium | Mid | Most app workflows; great default |
| gpt-4.1-nano / gpt-5-nano | Low | Lowest | Simple classification, light tasks |
| text-embedding-3-small | Very low | Cheap | Embeddings (always use small unless you’ve measured) |

(Treat the specific names as examples: model lineups change often, so check the OpenAI docs for current names.)

Streaming reduces perceived latency#

Streaming doesn’t reduce total latency, but the first-token latency is far lower than waiting for the full response. Always stream for user-facing chat.

Batch embeddings#

// Slow — 1 request per input
for (const text of texts) {
  await client.embeddings.create({ model: "...", input: text });
}

// Fast — 1 request for all
await client.embeddings.create({ model: "...", input: texts });

Batch up to ~2048 inputs per call.
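
When the list is longer than one request allows, split it first. A minimal sketch of that chunking step; chunk is a generic helper written for this example, and 2048 is the per-request figure quoted above.

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const docs = Array.from({ length: 5000 }, (_, i) => `doc ${i}`);
console.log(chunk(docs, 2048).map((batch) => batch.length)); // [ 2048, 2048, 904 ]
```

Each batch then becomes one client.embeddings.create call with input set to that batch.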

Connection reuse#

The SDK uses Node’s keep-alive by default — no special config needed.

Token counting client-side#

import { encoding_for_model } from "tiktoken";

const enc = encoding_for_model("gpt-4");
const tokens = enc.encode("hello world").length;
enc.free();

Useful for staying under context windows; saves a round-trip vs API trial-and-error.
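
Where exact counts are not needed, a heuristic can stand in for the tokenizer. The sketch below uses the common rule of thumb of about four characters per token for English text, which is an approximation only, and trims chat history to a budget. Message, estimateTokens, and trimHistory are hypothetical names for this example.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough estimate: about 4 characters per token for English text. Not exact.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep all system messages, then as many of the most recent turns as fit the budget.
function trimHistory(messages: Message[], budget: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept: Message[] = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > budget) break; // stop at the first turn that does not fit
    used += cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}
```

Swap estimateTokens for a real tokenizer call when you need to run close to the context limit; the trimming logic stays the same.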

Version migration guide#

v3 → v4 (2023) — the axios → fetch rewrite#

  • The axios-based v3 client shape (Configuration / OpenAIApi) is gone.
  • New TS-first API with full typed responses.
  • ESM + CJS dual published.

// v3
const { Configuration, OpenAIApi } = require("openai");
const cfg = new Configuration({ apiKey: "..." });
const openai = new OpenAIApi(cfg);
const res = await openai.createChatCompletion({ model: "...", messages: [...] });

// v4
import OpenAI from "openai";
const client = new OpenAI({ apiKey: "..." });
const completion = await client.chat.completions.create({ model: "...", messages: [...] });

v4 → v5 (2025) — Responses API#

The v5 migration is the important one in 2026. New canonical surface:

| Concept | Chat Completions (v4 style) | Responses (v5 style) |
| --- | --- | --- |
| Simple response | client.chat.completions.create({ messages: [...] }) | client.responses.create({ input: "..." }) (input also accepts an array of role/content messages) |
| Streaming | chunk.choices[0].delta.content | typed events: response.output_text.delta |
| Tools | messages: [{ role: "tool", tool_call_id, content }] | input: [{ type: "function_call_output", call_id, output }] |
| Structured output | response_format: { type: "json_schema", ... } | text: { format: { type: "json_schema", ... } } |
| Multi-turn | rebuild full history each time | previous_response_id chains |
| File inputs | Files API + assistants | content part { type: "input_file", file_id: "..." } inside an input message |

// v4 — Chat Completions
const c = await client.chat.completions.create({
  model: "gpt-4",
  messages: [
    { role: "system", content: "..." },
    { role: "user", content: "Hi" },
  ],
});

// v5 — Responses
const r = await client.responses.create({
  model: "gpt-4.1",
  instructions: "...",
  input: "Hi",
});

Chat Completions still works in v5. Many teams stay on Chat Completions and adopt Responses incrementally. The migration is opt-in, not forced.
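
For incremental adoption, the first two rows of that mapping can be applied mechanically. A minimal sketch, assuming plain string content only: system messages are folded into instructions and the remaining turns become the input array. ChatMessage, ResponsesParams, and toResponsesParams are hypothetical names; tool messages and multi-part content are out of scope here.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ResponsesParams {
  model: string;
  instructions?: string;
  input: { role: "user" | "assistant"; content: string }[];
}

// Convert a Chat Completions style message list into Responses style parameters.
function toResponsesParams(model: string, messages: ChatMessage[]): ResponsesParams {
  const systemParts: string[] = [];
  const input: ResponsesParams["input"] = [];
  for (const m of messages) {
    if (m.role === "system") {
      systemParts.push(m.content);
    } else {
      input.push({ role: m.role, content: m.content });
    }
  }
  // Omit `instructions` entirely when there was no system message.
  return systemParts.length > 0
    ? { model, instructions: systemParts.join("\n\n"), input }
    : { model, input };
}

console.log(
  toResponsesParams("gpt-4.1", [
    { role: "system", content: "Be brief." },
    { role: "user", content: "Hi" },
  ]),
);
```

The result is shaped to be passed to client.responses.create; check it against the SDK's own parameter types before relying on it.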

Things to watch when upgrading from v4 to v5:

  • Type names changed for many response shapes. The SDK’s own Response type (under OpenAI.Responses) shares its name with the global fetch Response, so be explicit about which one you are importing.
  • Streaming event types are different — re-write stream consumers.
  • Tools: the request and result shapes differ; not a drop-in.
  • File handling: Responses API has its own input/output file types separate from the legacy Files API.

Stay on v4#

If you’re not adopting Responses-specific features (statefulness via previous_response_id, simpler streaming, image generation in-line), staying on v4 with Chat Completions is fine. Backports continue.

Security considerations#

  1. API key exfiltration is the #1 risk. Never put OPENAI_API_KEY in client bundles, environment files in repo, or browser localStorage. Use server-side proxying.
  2. Prompt injection. User input embedded in a system prompt can override instructions (“Ignore previous and …”). Sanitise — or rely on careful prompt engineering — for any user-input → system-prompt flow.
  3. Data exfiltration via tools. A model with tool access can be tricked into calling tools with attacker-controlled args. Filter tool arguments server-side; don’t let the model invoke arbitrary HTTP fetches.
  4. PII leakage. OpenAI’s data-retention policy varies by tier; check the policy for your account. For PII-heavy workloads, use the zero-retention tier or run an on-prem model.
  5. Rate-limit abuse. A buggy frontend can drain your monthly quota. Always wrap API calls with per-user rate limits.
  6. Untrusted tool outputs. If a tool returns attacker-controlled content (e.g. web search results), the model may follow instructions in it. Treat tool outputs like untrusted input.
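
Item 5 can be made concrete with a small per-user limiter placed in front of the API call. This is a sketch of a fixed-window counter held in memory. createRateLimiter is a hypothetical helper, it assumes a single process, and it never evicts idle users; a production setup would more likely keep counters in a shared store.

```typescript
// Allow at most `limit` calls per user within each window of `windowMs` milliseconds.
function createRateLimiter(limit: number, windowMs: number) {
  const windows = new Map<string, { start: number; count: number }>();

  // Returns true if the call may proceed, false if the user is over the limit.
  return function allow(userId: string, now: number = Date.now()): boolean {
    const current = windows.get(userId);
    if (!current || now - current.start >= windowMs) {
      // First call, or the previous window has expired: start a new window.
      windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (current.count >= limit) return false;
    current.count += 1;
    return true;
  };
}

const allow = createRateLimiter(2, 1000);
console.log(allow("alice", 0), allow("alice", 1), allow("alice", 2)); // true true false
```

Call allow(userId) in the server route before invoking the SDK and return a 429 to the client when it yields false.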

Testing & CI integration#

import { describe, it, expect, vi } from "vitest";
import OpenAI from "openai";

describe("ai integration", () => {
  it("calls responses API", async () => {
    const mockCreate = vi.fn().mockResolvedValue({
      output_text: "Hello!",
      output: [{ type: "message", content: [{ type: "output_text", text: "Hello!" }] }],
    });

    const client = { responses: { create: mockCreate } } as unknown as OpenAI;
    const r = await client.responses.create({ model: "gpt-4.1", input: "Hi" });
    expect(r.output_text).toBe("Hello!");
  });
});

Output: mock the SDK at the method level. For integration tests, use a separate test key with strict spend limits.

For CI, set spend caps in the OpenAI dashboard — never run the SDK against prod keys in CI without limits.

Ecosystem integrations#

| Tool | Integration |
| --- | --- |
| zod | zodResponseFormat(schema) for structured outputs |
| ai (Vercel AI SDK) | Higher-level streaming, React hooks; uses openai for OpenAI calls |
| langchain | OpenAI provider for chains/agents |
| llamaindex | OpenAI provider for RAG |
| tiktoken | Token counting client-side |
| next.js | Server actions / route handlers; call the SDK server-side |
| cloudflare-workers | Pass fetch: globalThis.fetch |
| vercel edge | Works out of the box |
| mcp (Model Context Protocol) | Pair with OpenAI’s MCP server APIs |

Troubleshooting common errors#

  • 401 Unauthorized: OPENAI_API_KEY not set or invalid. Check env loading.
  • 429 Too Many Requests: rate limit hit. The SDK auto-retries; for sustained load, upgrade tier or use the Batch API.
  • 400, model does not exist: model name typo, or the model was retired. Check the docs for current names.
  • context_length_exceeded: input + output > context window. Trim history, use a model with a larger window, or summarise.
  • Stream hangs: the iterator never exits. Always have a timeout via AbortController or the timeout option.
  • TS: Cannot find module 'openai/helpers/zod': the resolver doesn’t honour subpath exports. Use a modern TS resolution mode ("moduleResolution": "node16" or "bundler").
  • High latency on first request: connection setup. Warm the connection (no-op embedding call) at app start if startup latency matters.
  • output_text is empty: the model returned tool calls, not text. output_text is a convenience field built from the text parts of response.output, so inspect response.output[] for function_call items.
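
For the last item, a quick way to see what a response actually contains is to walk response.output and separate text from tool calls. A minimal sketch, assuming the item shapes used elsewhere on this page (message items carrying output_text parts, and function_call items carrying a name); OutputItem is a simplified local type and describeOutput is a hypothetical helper.

```typescript
// Simplified local view of a Responses output item.
interface OutputItem {
  type: string;
  name?: string;
  content?: { type: string; text?: string }[];
}

// Split an output array into its concatenated text and the names of any tool calls.
function describeOutput(output: OutputItem[]): { text: string; toolCalls: string[] } {
  const textParts: string[] = [];
  const toolCalls: string[] = [];
  for (const item of output) {
    if (item.type === "function_call" && item.name) {
      toolCalls.push(item.name);
    }
    if (item.type === "message") {
      for (const part of item.content ?? []) {
        if (part.type === "output_text" && part.text) textParts.push(part.text);
      }
    }
  }
  return { text: textParts.join(""), toolCalls };
}

console.log(describeOutput([{ type: "function_call", name: "get_weather" }]));
// { text: '', toolCalls: [ 'get_weather' ] }
```

An empty text with a non-empty toolCalls list means the model is waiting for tool results, not that the request failed.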

When NOT to use this#

  • You need a multi-provider abstraction. Use Vercel’s ai SDK — swap between OpenAI, Anthropic, Google with a config change.
  • You only need one endpoint and want a tiny bundle. Use bare fetch against the REST API — ~50 lines, zero deps. The SDK is ~600 KB.
  • You’re doing complex multi-step agent flows. Use LangChain JS or the Vercel AI SDK’s agents API — more orchestration scaffolding.
  • You’re using a local model (Ollama, LM Studio). They expose an OpenAI-compatible API but the SDK adds nothing — use fetch.
  • You need browser-side streaming with token auth. Use server-sent events from your backend; don’t put the SDK in the browser.
