@voightxyz/openai instruments the official OpenAI Node SDK. Wrap your client once and every chat.completions.create or responses.create call, streaming or not, lands in Voight with prompts, tokens, cache reads, tool calls, latency, and errors. This is the same SDK-instrumentation model that Sentry, Vercel AI, and LangChain use.

It shares a backend and dashboard with @voightxyz/anthropic and library mode, so events from all three land side by side under the same agent.

From the root of your app:
npx -y @voightxyz/sdk init
The wizard detects openai in your package.json, prompts for your Voight key + privacy level, validates the key, and writes a ready-to-import src/lib/voight.ts with the wrapped client. 30 seconds, zero copy-paste. Continue below if you’d rather wire it manually.

Install

npm install openai @voightxyz/openai
Requirements:
  • Node.js 18+ (uses global fetch)
  • openai SDK 4.0.0+

Quick start

import OpenAI from 'openai'
import { wrapOpenAI } from '@voightxyz/openai'

const client = wrapOpenAI(new OpenAI(), {
  voightApiKey: process.env.VOIGHT_KEY,
  agent: 'my-prod-agent',
})

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello' }],
})
That’s it. Every call is captured automatically. Visit your dashboard to see them in the AI Apps section.

Tracing & per-user tags

For production apps, wrap each request boundary with withTrace to group every LLM call inside one request into one trace, and to attribute cost per end-user with one line of code:
import OpenAI from 'openai'
import { wrapOpenAI, withTrace, log } from '@voightxyz/openai'

const openai = wrapOpenAI(new OpenAI(), { agent: 'production-chat-api' })

app.post('/api/chat', async (req, res) => {
  await withTrace(
    async () => {
      log('chat request received')

      const reply = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: [{ role: 'user', content: req.body.prompt }],
      })

      res.json({ reply })
    },
    {
      routeTag: 'POST /api/chat',
      tags: { userId: req.user.id, plan: req.user.plan },
    },
  )
})
Every wrapped LLM call inside the withTrace block gets stamped with metadata.tags = { userId, plan, ... } automatically. The dashboard’s AI Apps section then surfaces:
  • Users sub-tab — per-user spend, traces, tokens (driven by tags.userId)
  • Trace cards — every withTrace block as one drillable card with cost, latency, and event timeline
  • User filter pill — narrow Overview / Models / Tools to one user
Full surface in Tracing. The per-user pattern in depth — with examples for Auth0, Clerk, NextAuth, custom JWT, and anonymous flows — lives at per-user spend.
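
The tags object in the example reads req.user directly, which fails on unauthenticated requests. A minimal sketch of a tag builder that falls back to an anonymous identifier follows. The AuthUser type, the buildTags helper, the 'anon:' prefix, and the 'free' default plan are all hypothetical names chosen for illustration; none of them come from the package.

```typescript
// Hypothetical shape of the authenticated user attached to a request.
interface AuthUser {
  id: string
  plan?: string
}

// Build the `tags` object handed to withTrace. When nobody is logged in,
// fall back to an anonymous identifier (for example a session cookie value)
// so spend is still grouped per visitor.
function buildTags(user: AuthUser | undefined, anonymousId: string): Record<string, string> {
  if (!user) {
    return { userId: `anon:${anonymousId}`, plan: 'anonymous' }
  }
  // 'free' is an assumed default for users without an explicit plan.
  return { userId: user.id, plan: user.plan ?? 'free' }
}
```

The result can be passed as the tags field of the second withTrace argument, in place of the inline object.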

Options

| Option | Type | Default | Purpose |
| --- | --- | --- | --- |
| voightApiKey | string | env VOIGHT_KEY | Your Voight key from the dashboard |
| agent | string | env VOIGHT_AGENT, then HOSTNAME, then 'unknown-agent' | Stable identifier surfaced in the dashboard |
| apiBase | string | https://api.voight.xyz | Override for self-hosted deployments |
| privacy | 'minimal' \| 'standard' \| 'full' | 'standard' | Capture aggressiveness |
| sessionId | string | auto UUID v4 | Trace grouping. Stable across calls of one wrapper instance; events sharing a sessionId render as a single trace in the dashboard |
| enabled | boolean | true | Kill switch: when false, returns the original client untouched (zero overhead) |
| otel | boolean | false | Emit captured calls as OpenTelemetry spans alongside the direct ingest. See OpenTelemetry side-channel below. |
A missing or empty API key is non-fatal: the wrapper prints a one-line warning and returns the original client. Production keeps running.
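
The defaults and the non-fatal key handling can be read as a resolution order. The sketch below is one way to express it, assuming the agent default falls back from VOIGHT_AGENT to HOSTNAME to 'unknown-agent' in that order. resolveOptions and its return shape are hypothetical; this is not the package's source.

```typescript
interface WrapOptions {
  voightApiKey?: string
  agent?: string
  enabled?: boolean
}

type Env = Record<string, string | undefined>

// Returns null when the wrapper should hand back the original client
// untouched, otherwise the resolved key and agent name.
function resolveOptions(opts: WrapOptions, env: Env): { apiKey: string; agent: string } | null {
  // Kill switch: skip instrumentation entirely.
  if (opts.enabled === false) return null

  // A missing or empty key is non-fatal (the real wrapper also logs a warning).
  const apiKey = opts.voightApiKey || env['VOIGHT_KEY']
  if (!apiKey) return null

  // Assumed fallback order for the agent name; empty strings fall through.
  const agent = opts.agent || env['VOIGHT_AGENT'] || env['HOSTNAME'] || 'unknown-agent'
  return { apiKey, agent }
}
```

Passing the environment in as a parameter, for example resolveOptions({}, process.env), keeps the function easy to test.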

What’s captured

| Signal | Field on the event |
| --- | --- |
| Model id (with version suffix) | model |
| API surface used (chat completions vs responses) | metadata.api ('responses' on responses events; omitted on chat) |
| Prompt messages (chat) | input.messages |
| Input payload (responses) | input.input |
| Response text | metadata.responseText |
| Token counts (input / output / total) | metadata.tokens |
| Cache reads (prompt_tokens_details.cached_tokens) | metadata.tokens.cache_read |
| Reasoning tokens (Responses API, o1 / o3 / future reasoning models) | metadata.tokens.reasoning |
| Tool / function calls (full array) | metadata.toolCalls |
| First tool’s name (audit-log compat) | toolExecuted |
| Streaming flag | metadata.streaming |
| Finish reason / response status | metadata.finishReason |
| Trace grouping | metadata.sessionId |
| Trace ID (when inside withTrace) | metadata.traceId |
| Route tag (when inside withTrace) | metadata.routeTag |
| User / plan / org tags (when inside withTrace) | metadata.tags |
| Capture level used | metadata.privacyLevel |
| Latency in milliseconds | durationMs |
| Errors (re-thrown to the caller, recorded with outcome: 'failed') | errorMessage, outcome |
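
For readers who consume these events in their own tooling, the fields above can be sketched as a TypeScript type. The field paths come from the table; the value types, the optionality, and the input / output / total key names inside tokens are assumptions, and CapturedEvent and apiSurface are hypothetical names rather than exports of the package.

```typescript
interface CapturedTokens {
  input: number // key names for the three basic counts are assumed
  output: number
  total: number
  cache_read?: number
  reasoning?: number
}

interface CapturedEvent {
  model: string
  input: { messages?: unknown[]; input?: unknown }
  toolExecuted?: string
  durationMs: number
  outcome?: string // 'failed' on errors; other values are not listed here
  errorMessage?: string
  metadata: {
    api?: 'responses' // omitted on Chat Completions events
    responseText?: string
    tokens?: CapturedTokens
    toolCalls?: { id: string; name: string; arguments: string }[]
    streaming?: boolean
    finishReason?: string
    sessionId?: string
    traceId?: string
    routeTag?: string
    tags?: Record<string, string>
    privacyLevel?: 'minimal' | 'standard' | 'full'
  }
}

// metadata.api is only present on Responses events, so its absence means chat.
function apiSurface(event: CapturedEvent): 'chat' | 'responses' {
  return event.metadata.api === 'responses' ? 'responses' : 'chat'
}
```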

Supported endpoints

The wrapper intercepts two paths:
  • client.chat.completions.create — Chat Completions (non-streaming + streaming, tool calling)
  • client.responses.create — Responses API (non-streaming + streaming, function calling, reasoning models)
Everything else on the OpenAI client passes through untouched. Embeddings, images, audio, and the Azure OpenAI client are on the 0.2.0 roadmap.

Chat Completions

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hi' }],
})
Streaming works without setup. The wrapper auto-injects stream_options.include_usage: true so the final chunk carries the token count, and a per-index aggregator reassembles tool-call argument fragments.
const stream = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  stream: true,
  messages: [{ role: 'user', content: 'count to five' }],
})

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}
An explicit stream_options: { include_usage: false } from the caller is preserved — you opt out of token capture for streaming events but everything else is still captured.
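
The injection rule described above is small enough to sketch. The helper below is a hypothetical illustration of the behaviour, not the wrapper's actual code: it adds include_usage: true to streaming requests and leaves any explicit caller choice alone.

```typescript
interface ChatParams {
  model: string
  stream?: boolean
  stream_options?: { include_usage?: boolean }
}

// Return params with stream_options.include_usage set to true, unless the
// request is non-streaming or the caller already chose a value.
function withUsageOptIn(params: ChatParams): ChatParams {
  if (!params.stream) return params
  if (params.stream_options?.include_usage !== undefined) return params
  return {
    ...params,
    stream_options: { ...params.stream_options, include_usage: true },
  }
}
```

Returning a copy rather than mutating the argument keeps the caller's request object unchanged.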

Responses API

The Responses API is OpenAI’s surface for apps built after Mar 2025: typed output items, stateful conversations via previous_response_id, built-in tools, and explicit reasoning_tokens for o1 / o3 models.
const response = await client.responses.create({
  model: 'gpt-4o-mini',
  input: 'Reply with: pong',
})
console.log(response.output_text)
Events from this surface carry metadata.api: 'responses' so dashboards can distinguish call sites from Chat Completions. Streaming is a typed event sequence (response.created, response.output_text.delta, response.output_item.added, response.function_call_arguments.delta, response.completed); the wrapper’s state machine reacts to the critical events and passes the rest through unchanged.
const stream = await client.responses.create({
  model: 'gpt-4o-mini',
  input: 'Count to five',
  stream: true,
})
for await (const event of stream) {
  if (event.type === 'response.output_text.delta') {
    process.stdout.write(event.delta)
  }
}
For reasoning models, metadata.tokens.reasoning separates the “thinking” overhead from the visible answer so cost analysis stays accurate.
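
A state machine over the typed event stream can be sketched as a reducer. The version below handles only text deltas and the final response.completed event, using a simplified, locally defined event shape; the CaptureState fields and the step function are hypothetical, and function-call events are left out for brevity.

```typescript
// Simplified local view of a Responses streaming event.
interface ResponsesStreamEvent {
  type: string
  delta?: string
  response?: {
    status?: string
    usage?: {
      input_tokens: number
      output_tokens: number
      output_tokens_details?: { reasoning_tokens?: number }
    }
  }
}

interface CaptureState {
  text: string
  status: string | null
  inputTokens: number | null
  outputTokens: number | null
  reasoningTokens: number | null
}

const initialState: CaptureState = {
  text: '',
  status: null,
  inputTokens: null,
  outputTokens: null,
  reasoningTokens: null,
}

// React to the events that matter for capture; ignore everything else.
function step(state: CaptureState, event: ResponsesStreamEvent): CaptureState {
  switch (event.type) {
    case 'response.output_text.delta':
      return { ...state, text: state.text + (event.delta ?? '') }
    case 'response.completed': {
      const usage = event.response?.usage
      return {
        ...state,
        status: event.response?.status ?? null,
        inputTokens: usage?.input_tokens ?? null,
        outputTokens: usage?.output_tokens ?? null,
        reasoningTokens: usage?.output_tokens_details?.reasoning_tokens ?? null,
      }
    }
    default:
      return state
  }
}
```

Folding a recorded list of events with events.reduce(step, initialState) yields the assembled text and token counts.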

Tool / function calling

Works the same for both endpoints:
const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: "what's the weather in Tokyo?" }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get weather for a city',
      parameters: {
        type: 'object',
        properties: { location: { type: 'string' } },
        required: ['location'],
      },
    },
  }],
})
On the captured event:
  • toolExecuted: 'get_weather' — first tool’s name. Renders in the audit-log DETAIL column with the same shape as Bash/Edit hook events.
  • metadata.toolCalls: [{ id, name, arguments }] — full array. arguments is the raw JSON string the model produced (we don’t parse it — invalid JSON is a real failure mode you need to debug).
Streaming function calls work the same — fragment deltas across chunks (Chat Completions) or function_call_arguments.delta events (Responses) are concatenated per tool index.
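
To make the per-index concatenation concrete, here is a minimal sketch for the Chat Completions case. It takes the tool_calls deltas collected from each chunk's choices[0].delta and rebuilds the { id, name, arguments } array. aggregateToolCalls is a hypothetical helper, not the wrapper's internal aggregator.

```typescript
// Shape of one entry in choices[0].delta.tool_calls on a streamed chunk.
interface ToolCallDelta {
  index: number
  id?: string
  function?: { name?: string; arguments?: string }
}

interface ToolCall {
  id: string
  name: string
  arguments: string // raw JSON string, deliberately left unparsed
}

// Merge deltas that share an index: id and name arrive once, while the
// arguments string arrives in fragments that must be concatenated in order.
function aggregateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const byIndex = new Map<number, ToolCall>()
  for (const delta of deltas) {
    const call = byIndex.get(delta.index) ?? { id: '', name: '', arguments: '' }
    if (delta.id) call.id = delta.id
    if (delta.function?.name) call.name = delta.function.name
    if (delta.function?.arguments) call.arguments += delta.function.arguments
    byIndex.set(delta.index, call)
  }
  return Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map((entry) => entry[1])
}
```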

OpenTelemetry side-channel

By default the wrapper POSTs each captured call directly to api.voight.xyz. Set otel: true to additionally emit each call as an OpenTelemetry span — useful when the host process already runs an OTel pipeline (Langfuse, Phoenix, Datadog, Sentry, or @voightxyz/vercel-ai) and you want Voight events to appear there too.
const client = wrapOpenAI(new OpenAI(), { agent: 'my-app', otel: true })
Each span is named voight.openai.chat (or voight.openai.responses) and carries the standard gen_ai.* semantic-convention attributes (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.cache_read_input_tokens, gen_ai.response.finish_reasons) plus the parallel Vercel-style ai.* namespace. The direct ingest path is unchanged — otel: true is purely additive.
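
As a rough picture of what such a span carries, the sketch below maps a captured call onto the gen_ai.* attribute names listed above. The attribute keys are taken from this page; the toGenAiAttributes helper, its input shape, and the 'openai' value for gen_ai.system are assumptions, and the ai.* namespace is omitted.

```typescript
interface CallSummary {
  model: string
  inputTokens: number
  outputTokens: number
  cacheReadTokens?: number
  finishReason: string
}

type AttributeValue = string | number | string[]

// Build a flat attribute map in the gen_ai.* naming scheme.
function toGenAiAttributes(call: CallSummary): Record<string, AttributeValue> {
  const attributes: Record<string, AttributeValue> = {
    'gen_ai.system': 'openai', // assumed value
    'gen_ai.request.model': call.model,
    'gen_ai.usage.input_tokens': call.inputTokens,
    'gen_ai.usage.output_tokens': call.outputTokens,
    'gen_ai.response.finish_reasons': [call.finishReason],
  }
  // Only emit the cache attribute when the provider reported cache reads.
  if (call.cacheReadTokens !== undefined) {
    attributes['gen_ai.usage.cache_read_input_tokens'] = call.cacheReadTokens
  }
  return attributes
}
```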

Dedup marker

Every emitted span carries voight.source: 'wrapper'. If you also use @voightxyz/vercel-ai ≥ 0.1.1 in the same process, that exporter skips wrapper-emitted spans automatically — no duplicate events in your dashboard.
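
If you run your own span processing alongside the wrapper, the same marker can drive a filter. A minimal sketch, using a locally defined SpanLike shape rather than real OpenTelemetry types; dropWrapperSpans is a hypothetical helper.

```typescript
interface SpanLike {
  name: string
  attributes: Record<string, unknown>
}

// Keep only spans that were not emitted by the Voight wrapper.
function dropWrapperSpans(spans: SpanLike[]): SpanLike[] {
  return spans.filter((span) => span.attributes['voight.source'] !== 'wrapper')
}
```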

Optional peer dependency

@opentelemetry/api is now an optional peer dependency. If you never set otel: true, nothing changes. If you set otel: true but the package isn’t installed, the wrapper logs a single warning and falls back to direct ingest only.

Privacy

Three levels apply to prompts, response text, and tool-call arguments. The function name in toolExecuted is treated as a tag (not user content) and survives all levels.
| Level | Prompts | Response text | Tool arguments | toolExecuted (name) |
| --- | --- | --- | --- | --- |
| minimal | dropped | dropped | dropped | kept |
| standard | scrubbed | scrubbed | scrubbed | kept |
| full | verbatim | verbatim | verbatim | kept |
Standard scrubs 12 patterns: PEM private keys, JWTs, Anthropic / OpenAI / Stripe live / GitHub / AWS / Slack / Voight API keys, emails, E.164 phones, and Luhn-validated credit cards. Token counts, model ids, and timing are NEVER scrubbed — they’re numeric or tags, no PII risk. See PII patterns for the full catalogue.
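
To illustrate what pattern-based scrubbing with Luhn validation looks like, here is a sketch covering two of the twelve categories: emails and credit cards. The regular expressions and the [EMAIL] / [CARD] placeholders are assumptions made for this example and are not the package's actual patterns or replacement tokens.

```typescript
// Luhn checksum: double every second digit from the right, subtract 9 from
// results above 9, and require the total to be divisible by 10.
function luhnValid(digits: string): boolean {
  let sum = 0
  let doubleNext = false
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48
    if (doubleNext) {
      d *= 2
      if (d > 9) d -= 9
    }
    sum += d
    doubleNext = !doubleNext
  }
  return digits.length > 0 && sum % 10 === 0
}

const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g
// 13 to 19 digits, optionally separated by single spaces or hyphens.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g

function scrub(text: string): string {
  return text
    .replace(EMAIL, '[EMAIL]')
    // Only redact digit runs that pass the Luhn check, so order numbers
    // and other long ids are left alone.
    .replace(CARD_CANDIDATE, (match) =>
      luhnValid(match.replace(/[ -]/g, '')) ? '[CARD]' : match,
    )
}
```

The Luhn check is what keeps false positives down: a 16-digit order number rarely has a valid checksum, so it passes through unredacted.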

How it compares

| Use case | Reach for |
| --- | --- |
| Coding agent (Claude Code, Cursor, Codex) capturing your dev sessions | Hooks-based SDK |
| Autonomous TS/JS bot you wrote yourself emitting custom events | Library mode |
| Production app calling OpenAI in user-facing flows | This package |
| Production app calling Anthropic | @voightxyz/anthropic |
| Per-user / per-tenant cost attribution in any of the above | Per-user spend |
| Anything else (Python, Go, Rust) | HTTP API |
The packages coexist — wrap your OpenAI client AND call voight.log() for your own domain events under the same agent. Adding withTrace on top groups them all per-request.

Roadmap

  • Embeddings (embeddings.create)
  • Image generation (images.generate)
  • Audio (Whisper / TTS)
  • Azure OpenAI client
See the changelog for shipped releases.