@voightxyz/openai instruments the official OpenAI Node SDK. Wrap your client once and every chat.completions.create or responses.create call — non-streaming or streaming — lands in Voight with prompts, tokens, cache reads, tool calls, latency, and errors.
This is the same SDK-instrumentation model Sentry / Vercel AI / LangChain use. Same backend, same dashboard as @voightxyz/anthropic and library mode — events from all three land side-by-side under the same agent.
Quick setup (recommended)
From the root of your app, run the setup command. It looks for openai in your package.json, prompts for your Voight key and privacy level, validates the key, and writes a ready-to-import src/lib/voight.ts with the wrapped client. 30 seconds, zero copy-paste.
Continue below if you’d rather wire it manually.
Install
- Node.js 18+ (uses global fetch)
- openai SDK 4.0.0+
Quick start
Tracing & per-user tags
For production apps, wrap each request boundary with withTrace to group every LLM call inside one request into one trace, and to attribute cost per end-user with one line of code:
Every LLM call made inside a withTrace block gets stamped with metadata.tags = { userId, plan, ... } automatically. The dashboard’s AI Apps section then surfaces:
- Users sub-tab: per-user spend, traces, tokens (driven by tags.userId)
- Trace cards: every withTrace block as one drillable card with cost, latency, and event timeline
- User filter pill: narrow Overview / Models / Tools to one user
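One way request-scoped stamping like this can work in Node is with AsyncLocalStorage. The sketch below is an illustration of that mechanism only: `withTraceSketch`, `stampEvent`, and the `TraceContext` shape are hypothetical names, not the package's real API or internals.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical shape for illustration; not the package's real types.
interface TraceContext {
  traceId: string;
  tags: Record<string, string>;
}

interface StampedEvent {
  model: string;
  metadata: { traceId?: string; tags?: Record<string, string> };
}

const traceStore = new AsyncLocalStorage<TraceContext>();

// Runs `fn` with a trace context visible to everything it calls, sync or async.
function withTraceSketch<T>(ctx: TraceContext, fn: () => T): T {
  return traceStore.run(ctx, fn);
}

// What a wrapped client could do just before sending an event:
// read the ambient context, if any, and copy it into metadata.
function stampEvent(event: { model: string }): StampedEvent {
  const ctx = traceStore.getStore();
  return {
    model: event.model,
    metadata: ctx ? { traceId: ctx.traceId, tags: ctx.tags } : {},
  };
}

// Inside the block the event picks up the tags; outside it does not.
const stamped = withTraceSketch(
  { traceId: "trace-1", tags: { userId: "u_123", plan: "pro" } },
  () => stampEvent({ model: "gpt-4o" }),
);
const unstamped = stampEvent({ model: "gpt-4o" });
```

Because the context travels with the async call chain, call sites deep inside a request handler would not need to pass the user id around explicitly.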
Options
| Option | Type | Default | Purpose |
|---|---|---|---|
| voightApiKey | string | env VOIGHT_KEY | Your Voight key from the dashboard |
| agent | string | env VOIGHT_AGENT → HOSTNAME → 'unknown-agent' | Stable identifier surfaced in the dashboard |
| apiBase | string | https://api.voight.xyz | Override for self-hosted deployments |
| privacy | 'minimal' \| 'standard' \| 'full' | 'standard' | Capture aggressiveness |
| sessionId | string | auto UUID v4 | Trace grouping. Stable across calls of one wrapper instance; events sharing a sessionId render as a single trace in the dashboard |
| enabled | boolean | true | Kill switch: when false, returns the original client untouched (zero overhead) |
| otel | boolean | false | Emit captured calls as OpenTelemetry spans alongside the direct ingest. See OpenTelemetry side-channel below. |
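The default chain in the table can be read as a simple fallback resolution. The following is a minimal sketch of that reading, not the package's implementation: `resolveOptions` and `WrapperOptions` are hypothetical names, the environment is passed in explicitly, and sessionId (auto UUID v4) is left out to keep the example dependency-free.

```typescript
type Privacy = "minimal" | "standard" | "full";

// Hypothetical option bag mirroring the table above.
interface WrapperOptions {
  voightApiKey?: string;
  agent?: string;
  apiBase?: string;
  privacy?: Privacy;
  enabled?: boolean;
  otel?: boolean;
}

type Env = Record<string, string | undefined>;

// Explicit option first, then environment, then the documented default.
function resolveOptions(opts: WrapperOptions, env: Env) {
  const privacy: Privacy = opts.privacy ?? "standard";
  return {
    voightApiKey: opts.voightApiKey ?? env["VOIGHT_KEY"],
    agent:
      opts.agent ?? env["VOIGHT_AGENT"] ?? env["HOSTNAME"] ?? "unknown-agent",
    apiBase: opts.apiBase ?? "https://api.voight.xyz",
    privacy,
    enabled: opts.enabled ?? true,
    otel: opts.otel ?? false,
  };
}

// Example: no explicit agent, so HOSTNAME is used.
const resolved = resolveOptions({}, { VOIGHT_KEY: "vk_example", HOSTNAME: "web-1" });
```

In a real process the second argument would typically be `process.env`. Note that this sketch treats an empty-string variable as set; whether the package does the same is not stated here.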
What’s captured
| Signal | Field on the event |
|---|---|
| Model id (with version suffix) | model |
| API surface used (chat completions vs responses) | metadata.api ('responses' on responses events; omitted on chat) |
| Prompt messages (chat) | input.messages |
| Input payload (responses) | input.input |
| Response text | metadata.responseText |
| Token counts (input / output / total) | metadata.tokens |
| Cache reads (prompt_tokens_details.cached_tokens) | metadata.tokens.cache_read |
| Reasoning tokens (Responses API, o1 / o3 / future reasoning models) | metadata.tokens.reasoning |
| Tool / function calls (full array) | metadata.toolCalls |
| First tool’s name (audit-log compat) | toolExecuted |
| Streaming flag | metadata.streaming |
| Finish reason / response status | metadata.finishReason |
| Trace grouping | metadata.sessionId |
| Trace ID (when inside withTrace) | metadata.traceId |
| Route tag (when inside withTrace) | metadata.routeTag |
| User / plan / org tags (when inside withTrace) | metadata.tags |
| Capture level used | metadata.privacyLevel |
| Latency in milliseconds | durationMs |
| Errors (re-thrown to the caller, recorded with outcome: 'failed') | errorMessage, outcome |
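To make the token rows concrete, here is a sketch of how a Chat Completions usage object could map onto metadata.tokens. The usage field names are those of the OpenAI API; the `input` / `output` / `total` key names and the `toTokenCounts` helper are assumptions for illustration (only `cache_read` is named in the table above).

```typescript
// Subset of the usage object returned by Chat Completions.
interface ChatUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Assumed shape of metadata.tokens; key names other than cache_read are guesses.
interface TokenCounts {
  input: number;
  output: number;
  total: number;
  cache_read?: number;
}

function toTokenCounts(usage: ChatUsage): TokenCounts {
  const tokens: TokenCounts = {
    input: usage.prompt_tokens,
    output: usage.completion_tokens,
    total: usage.total_tokens,
  };
  // Only set cache_read when the API actually reported it.
  const cached = usage.prompt_tokens_details?.cached_tokens;
  if (cached !== undefined) tokens.cache_read = cached;
  return tokens;
}

const counts = toTokenCounts({
  prompt_tokens: 1200,
  completion_tokens: 80,
  total_tokens: 1280,
  prompt_tokens_details: { cached_tokens: 1024 },
});
```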
Supported endpoints
The wrapper intercepts two paths:
- client.chat.completions.create: Chat Completions (non-streaming + streaming, tool calling)
- client.responses.create: Responses API (non-streaming + streaming, function calling, reasoning models)
Chat Completions
For streaming calls, the wrapper sets stream_options.include_usage: true so the final chunk carries the token count, and a per-index aggregator reassembles tool-call argument fragments.
An explicit stream_options: { include_usage: false } from the caller is preserved: you opt out of token capture for streaming events, but everything else is still captured.
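The two streaming behaviors described above can be sketched as follows. This is an illustration under assumptions, not the wrapper's source: `withUsageDefault` and `aggregateToolCalls` are hypothetical helper names, and the delta type is a trimmed-down version of the tool_calls entries found in Chat Completions streaming chunks.

```typescript
interface StreamParams {
  stream?: boolean;
  stream_options?: { include_usage?: boolean };
}

// Adds include_usage: true to streaming calls unless the caller set it explicitly.
function withUsageDefault(params: StreamParams): StreamParams {
  if (!params.stream) return params;
  if (params.stream_options?.include_usage !== undefined) return params;
  return {
    ...params,
    stream_options: { ...params.stream_options, include_usage: true },
  };
}

// Trimmed shape of one entry in chunk.choices[0].delta.tool_calls.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON string, concatenated and never parsed
}

// Fragments for different tool calls can interleave, so group them by index.
function aggregateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const byIndex = new Map<number, ToolCall>();
  for (const d of deltas) {
    const call = byIndex.get(d.index) ?? { id: "", name: "", arguments: "" };
    if (d.id) call.id = d.id;
    if (d.function?.name) call.name = d.function.name;
    if (d.function?.arguments) call.arguments += d.function.arguments;
    byIndex.set(d.index, call);
  }
  return Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map((entry) => entry[1]);
}

const calls = aggregateToolCalls([
  { index: 0, id: "call_a", function: { name: "get_weather", arguments: "" } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 1, id: "call_b", function: { name: "get_time", arguments: "{}" } },
  { index: 0, function: { arguments: '"Paris"}' } },
]);
```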
Responses API
The Responses API is OpenAI’s surface for apps built after Mar 2025: typed output items, stateful conversations via previous_response_id, built-in tools, and explicit reasoning_tokens for o1 / o3 models.
Events are tagged with metadata.api: 'responses' so dashboards can distinguish call sites from Chat Completions. Streaming is a typed event sequence (response.created, response.output_text.delta, response.output_item.added, response.function_call_arguments.delta, response.completed); the wrapper’s state machine reacts to the critical events and passes the rest through unchanged.
metadata.tokens.reasoning separates the “thinking” overhead from the visible answer so cost analysis stays accurate.
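A state machine over that event sequence might look roughly like the sketch below. The event types and usage field names follow the Responses API; `Captured`, `newCapture`, and `onEvent` are hypothetical names, the event shapes are trimmed to the fields used here, and the real wrapper may track more or different state.

```typescript
type StreamEvent =
  | { type: "response.created" }
  | { type: "response.output_text.delta"; delta: string }
  | {
      type: "response.output_item.added";
      output_index: number;
      item: { type: string; call_id?: string; name?: string };
    }
  | { type: "response.function_call_arguments.delta"; output_index: number; delta: string }
  | {
      type: "response.completed";
      response: {
        status: string;
        usage?: {
          input_tokens: number;
          output_tokens: number;
          total_tokens: number;
          output_tokens_details?: { reasoning_tokens?: number };
        };
      };
    };

interface ToolCall { id: string; name: string; arguments: string }

interface Captured {
  responseText: string;
  toolCalls: Map<number, ToolCall>;
  finishReason?: string;
  tokens?: { input: number; output: number; total: number; reasoning?: number };
}

function newCapture(): Captured {
  return { responseText: "", toolCalls: new Map() };
}

// Updates capture state for the events that matter; everything else is ignored.
function onEvent(state: Captured, event: StreamEvent): void {
  switch (event.type) {
    case "response.output_text.delta":
      state.responseText += event.delta;
      break;
    case "response.output_item.added":
      if (event.item.type === "function_call") {
        state.toolCalls.set(event.output_index, {
          id: event.item.call_id ?? "",
          name: event.item.name ?? "",
          arguments: "",
        });
      }
      break;
    case "response.function_call_arguments.delta": {
      const call = state.toolCalls.get(event.output_index);
      if (call) call.arguments += event.delta;
      break;
    }
    case "response.completed": {
      state.finishReason = event.response.status;
      const u = event.response.usage;
      if (u) {
        state.tokens = {
          input: u.input_tokens,
          output: u.output_tokens,
          total: u.total_tokens,
          reasoning: u.output_tokens_details?.reasoning_tokens,
        };
      }
      break;
    }
    default:
      break; // e.g. response.created: nothing to record
  }
}

const events: StreamEvent[] = [
  { type: "response.created" },
  { type: "response.output_text.delta", delta: "Hi " },
  { type: "response.output_text.delta", delta: "there" },
  { type: "response.output_item.added", output_index: 1, item: { type: "function_call", call_id: "call_1", name: "get_weather" } },
  { type: "response.function_call_arguments.delta", output_index: 1, delta: '{"city":' },
  { type: "response.function_call_arguments.delta", output_index: 1, delta: '"Paris"}' },
  { type: "response.completed", response: { status: "completed", usage: { input_tokens: 20, output_tokens: 30, total_tokens: 50, output_tokens_details: { reasoning_tokens: 12 } } } },
];

const capture = newCapture();
for (const e of events) onEvent(capture, e);
```

In a real wrapper each event would also be yielded to the caller unchanged, so the capture stays invisible to application code.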
Tool / function calling
Works the same for both endpoints:
- toolExecuted: 'get_weather' is the first tool’s name. It renders in the audit-log DETAIL column with the same shape as Bash/Edit hook events.
- metadata.toolCalls: [{ id, name, arguments }] is the full array. arguments is the raw JSON string the model produced (we don’t parse it, because invalid JSON is a real failure mode you need to debug).
When streaming, tool-call argument fragments (Chat Completions) and function_call_arguments.delta events (Responses) are concatenated per tool index.
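Since arguments arrives as an unparsed string, code that consumes captured events may want to parse it defensively. A small sketch, where `parseToolArguments` is a hypothetical helper on the consumer side and not part of the package:

```typescript
type ParsedArgs =
  | { ok: true; value: unknown }
  | { ok: false; raw: string; error: string };

// Parses the raw arguments string, keeping the original text when it is invalid
// so the malformed output can still be inspected.
function parseToolArguments(raw: string): ParsedArgs {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return {
      ok: false,
      raw,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}

const good = parseToolArguments('{"city":"Paris"}');
const truncated = parseToolArguments('{"city":');
```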
OpenTelemetry side-channel
By default the wrapper POSTs each captured call directly to api.voight.xyz. Set otel: true to additionally emit each call as an OpenTelemetry span, which is useful when the host process already runs an OTel pipeline (Langfuse, Phoenix, Datadog, Sentry, or @voightxyz/vercel-ai) and you want Voight events to appear there too.
Each span is named voight.openai.chat (or voight.openai.responses) and carries the standard gen_ai.* semantic-convention attributes (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.cache_read_input_tokens, gen_ai.response.finish_reasons) plus the parallel Vercel-style ai.* namespace. The direct ingest path is unchanged; otel: true is purely additive.
Dedup marker
Every emitted span carries voight.source: 'wrapper'. If you also use @voightxyz/vercel-ai ≥ 0.1.1 in the same process, that exporter skips wrapper-emitted spans automatically, so no duplicate events appear in your dashboard.
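If you run your own exporter or span processor in the same pipeline, the same marker can be used to filter. A minimal sketch, assuming a simplified span shape; `SpanLike` and `dropWrapperSpans` are hypothetical names and not OpenTelemetry or Voight APIs:

```typescript
// Simplified stand-in for an exported span: a name plus attributes.
interface SpanLike {
  name: string;
  attributes: Record<string, unknown>;
}

// Keeps only spans that were not emitted by the Voight wrapper.
function dropWrapperSpans(spans: SpanLike[]): SpanLike[] {
  return spans.filter((s) => s.attributes["voight.source"] !== "wrapper");
}

const kept = dropWrapperSpans([
  { name: "voight.openai.chat", attributes: { "voight.source": "wrapper" } },
  { name: "ai.generateText", attributes: { "gen_ai.system": "openai" } },
]);
```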
Optional peer dependency
@opentelemetry/api is now an optional peer dependency. If you never set otel: true, nothing changes. If you set otel: true but the package isn’t installed, the wrapper logs a single warning and falls back to direct ingest only.
Privacy
Three levels apply to prompts, response text, and tool-call arguments. The function name in toolExecuted is treated as a tag (not user content) and survives all levels.
| Level | Prompts | Response text | Tool arguments | toolExecuted (name) |
|---|---|---|---|---|
| minimal | dropped | dropped | dropped | kept |
| standard | scrubbed | scrubbed | scrubbed | kept |
| full | verbatim | verbatim | verbatim | kept |
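The table can be read as a per-field transform. The sketch below shows that reading; the `scrub` function is a hypothetical stand-in that redacts email addresses and sk- style keys, since the package's actual scrubbing rules are not described here, and `applyPrivacy` / `redactCapture` are illustrative names.

```typescript
type PrivacyLevel = "minimal" | "standard" | "full";

// Hypothetical scrubber for illustration only; the real rules may differ.
function scrub(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[email]")
    .replace(/\bsk-[A-Za-z0-9_-]{8,}\b/g, "[secret]");
}

// minimal drops the field, standard scrubs it, full keeps it verbatim.
function applyPrivacy(level: PrivacyLevel, text: string): string | undefined {
  if (level === "minimal") return undefined;
  if (level === "standard") return scrub(text);
  return text;
}

interface CallCapture {
  prompt: string;
  responseText: string;
  toolArguments: string;
  toolExecuted: string;
}

function redactCapture(level: PrivacyLevel, c: CallCapture) {
  return {
    prompt: applyPrivacy(level, c.prompt),
    responseText: applyPrivacy(level, c.responseText),
    toolArguments: applyPrivacy(level, c.toolArguments),
    toolExecuted: c.toolExecuted, // treated as a tag, kept at every level
  };
}

const sample: CallCapture = {
  prompt: "Contact alice@example.com about key sk-abcdef123456",
  responseText: "Done.",
  toolArguments: '{"to":"alice@example.com"}',
  toolExecuted: "send_email",
};

const minimal = redactCapture("minimal", sample);
const standard = redactCapture("standard", sample);
const full = redactCapture("full", sample);
```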
How it compares
| Use case | Reach for |
|---|---|
| Coding agent (Claude Code, Cursor, Codex) capturing your dev sessions | Hooks-based SDK |
| Autonomous TS/JS bot you wrote yourself emitting custom events | Library mode |
| Production app calling OpenAI in user-facing flows | This package |
| Production app calling Anthropic | @voightxyz/anthropic |
| Per-user / per-tenant cost attribution in any of the above | Per-user spend |
| Anything else (Python, Go, Rust) | HTTP API |
These compose: pair the wrapper with voight.log() for your own domain events under the same agent. Adding withTrace on top groups them all per request.
Source
Roadmap
- Embeddings (embeddings.create)
- Image generation (images.generate)
- Audio (Whisper / TTS)
- Azure OpenAI client