openclaw/extensions/xai/stt.test.ts
Jaaneek 5f1df99a9c xai: OAuth login fixes plus openclaw User-Agent attribution
OAuth login flow
----------------
- Hard-require refresh_token after the authorization-code exchange in
  xai-oauth.ts. Access-only responses persisted credentials that the
  downstream usability check later rejected; the new requireRefreshToken
  option fails the exchange instead. Error wording explains the missing
  refresh_token in OIDC scope terms (offline_access scope rejected),
  not a "grant".
- Derive token expiry from the access-token JWT exp claim when
  expires_in is missing. id_token exp is intentionally not used as a
  fallback because id_token lifetime tracks the OIDC session, not the
  access token, and would defer refresh past actual expiry.
- Handle CORS preflight OPTIONS on the loopback OAuth callback in
  src/plugin-sdk/provider-auth-runtime.ts. The previous handler treated
  any non-callback request as a failed GET, returned "Missing code or
  state", and tore the server down before the real GET arrived. The
  CORS allowlist is now an optional `corsOriginAllowlist` parameter on
  waitForLocalOAuthCallback so the SDK helper stays generic. The xAI
  plugin passes ["auth.x.ai", "accounts.x.ai"] from loginXaiOAuth.
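
The expiry fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in xai-oauth.ts: the names `TokenResponseSketch`, `decodeJwtExp`, and `resolveExpiresAtMs` are hypothetical, and the access token is assumed to be a three-part JWT whose payload may carry a numeric `exp` in seconds.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical shape; only the two fields relevant to expiry are modeled.
interface TokenResponseSketch {
  access_token: string;
  expires_in?: number;
}

// Read the `exp` claim (seconds since epoch) from a JWT payload without
// verifying the signature. Returns undefined for opaque or malformed tokens.
function decodeJwtExp(jwt: string): number | undefined {
  const parts = jwt.split(".");
  const payloadPart = parts.length === 3 ? parts[1] : undefined;
  if (!payloadPart) return undefined;
  try {
    const json = Buffer.from(payloadPart, "base64url").toString("utf8");
    const payload = JSON.parse(json) as { exp?: unknown };
    return typeof payload.exp === "number" && Number.isFinite(payload.exp)
      ? payload.exp
      : undefined;
  } catch {
    return undefined;
  }
}

// Prefer expires_in; fall back to the access-token exp claim.
// The id_token is deliberately never consulted.
function resolveExpiresAtMs(token: TokenResponseSketch, nowMs: number): number | undefined {
  if (typeof token.expires_in === "number" && token.expires_in > 0) {
    return nowMs + token.expires_in * 1000;
  }
  const exp = decodeJwtExp(token.access_token);
  return exp === undefined ? undefined : exp * 1000;
}
```

An opaque (non-JWT) access token with no expires_in yields undefined in this sketch; how the real code treats that case is not stated here.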

Sidecar surfaces
----------------
- speech-provider.ts (POST /v1/tts) honors the xAI OAuth profile in
  addition to provider config and XAI_API_KEY. isConfigured now also
  reports true when an xAI auth profile is configured (via
  isProviderAuthProfileConfigured), so OAuth-only users are no longer
  silently filtered out by the selection layer. The bearer resolver
  threads req.cfg into resolveApiKeyForProvider so the right xAI auth
  profile is picked when a user has multiple.
- realtime-transcription-provider.ts (WSS /stt) gets the same
  isConfigured fix, and the lazy headers() resolver threads req.cfg
  into the OAuth bearer lookup. createSession stays sync per its
  plugin contract.
- stt.ts: drop the plugin-side OAuth fallback. The media-understanding
  core already resolves auth (cfg/agentDir-aware) via
  resolveProviderExecutionContext before calling transcribeAudio, so
  the wrapper was redundant. transcribeAudio is now the registered
  hook directly.
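
As a rough illustration of the isConfigured change for the TTS and realtime STT providers, the check can be thought of as an OR over three credential sources. The input shape and the name `isXaiConfiguredSketch` are hypothetical; in the real code the profile check goes through isProviderAuthProfileConfigured, whose signature is not shown here, so it is modeled as a plain boolean.

```typescript
// Hypothetical inputs; the real providers read these from req.cfg and the
// process environment rather than from a flat object.
interface XaiConfiguredInputs {
  providerApiKey?: string;
  env: Record<string, string | undefined>;
  // Stand-in for the result of the auth-profile lookup.
  hasXaiAuthProfile: boolean;
}

// True when any credential source is present, so OAuth-only users are
// not filtered out before the bearer is resolved.
function isXaiConfiguredSketch(inputs: XaiConfiguredInputs): boolean {
  const hasConfigKey = Boolean(inputs.providerApiKey?.trim());
  const hasEnvKey = Boolean(inputs.env.XAI_API_KEY?.trim());
  return hasConfigKey || hasEnvKey || inputs.hasXaiAuthProfile;
}
```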

User-Agent attribution
----------------------
- New buildXaiAttributionPolicy in src/agents/provider-attribution.ts
  injects User-Agent: openclaw/<version>, originator, and version on
  /v1/responses and /v1/chat/completions traffic that goes through
  resolveProviderRequestHeaders. Gated to xai-native and default
  endpoint classes; attribution remains withheld for custom proxy
  baseUrls. reviewNote states which headers are spec-verified and
  which are mirrored.
- Shared extensions/xai/src/xai-user-agent.ts helper exports
  xaiUserAgentHeaderFor(baseUrl) which only emits the User-Agent when
  the resolved baseUrl points at the xAI-native API host. Threaded
  through TTS and realtime STT (WS upgrade headers) so user-configured
  proxy baseUrls do not receive the openclaw identity. OAuth discovery
  and token endpoints still send User-Agent unconditionally because
  isTrustedXaiOAuthEndpoint already restricts those URLs to *.x.ai.
- Image gen, batch STT, and video gen rely on the attribution policy
  alone (no manual User-Agent in defaultHeaders), so attribution
  withholding on user-configured proxy baseUrls is preserved
  end-to-end.
- UA is bearer-agnostic: same value whether the bearer comes from an
  xAI API key or the xAI OAuth flow.
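
The host gating for the User-Agent helper might look like the sketch below. It is an assumption-laden illustration: the real xaiUserAgentHeaderFor takes only baseUrl, while this version takes an explicit `version` argument to stay self-contained, and the host set, the default base URL, and the name `xaiUserAgentHeaderForSketch` are hypothetical.

```typescript
// Assumed xAI-native host set; the retired api.grok.x.ai alias is absent.
const XAI_NATIVE_HOSTS_SKETCH = new Set(["api.x.ai"]);
// Assumed default used when no baseUrl is configured.
const XAI_DEFAULT_BASE_URL_SKETCH = "https://api.x.ai/v1";

// Emit the openclaw User-Agent only when the resolved baseUrl points at the
// xAI-native host; proxies and unparsable URLs get no attribution header.
function xaiUserAgentHeaderForSketch(
  baseUrl: string | undefined,
  version: string,
): Record<string, string> {
  let host: string;
  try {
    host = new URL(baseUrl ?? XAI_DEFAULT_BASE_URL_SKETCH).hostname.toLowerCase();
  } catch {
    return {};
  }
  return XAI_NATIVE_HOSTS_SKETCH.has(host) ? { "User-Agent": `openclaw/${version}` } : {};
}
```

Returning an empty object rather than undefined lets callers spread the result into an existing header map without a branch.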

Drop dead api.grok.x.ai alias
-----------------------------
- xAI retired the api.grok.x.ai alias; DNS now returns NXDOMAIN from
  xAI's own authoritative nameservers. Drop it from the xai-native
  endpoint host set in extensions/xai/openclaw.plugin.json,
  extensions/xai/api.ts, extensions/xai/tts.ts, and the
  openai-responses payload policy. Update the attribution test to
  classify api.grok.x.ai as "custom" (no live user can reach it; the
  classification keeps documenting the host's status).

Video generation now matches xAI's actual API behavior
------------------------------------------------------
Previously, real video generation requests failed with
"xAI video generation response malformed" because the poll-status
handler validated against a closed enum that did not match what the
xAI service actually returns. Four fixes:
- Loosen the poll-status handler. xAI returns intermediate strings
  outside `["queued", "processing", "done", "failed", "expired"]`
  (commonly `submitted`, `pending`, `in_progress`, ...). Treat `done`
  as terminal-success, `["failed", "error", "expired", "cancelled"]`
  as terminal-failure, and any other string (including empty) as
  continue-polling.
- Send default duration/aspect_ratio/resolution on every generate and
  reference-image submit. xAI rejects bodies that omit these fields.
  Defaults: duration=8s, aspect_ratio="16:9", resolution="720p".
- Accept lowercase resolution input ("480p"/"720p"/"1080p") in
  addition to uppercase, and normalize to lowercase on the wire.
- Add an `x-idempotency-key` header (fresh `crypto.randomUUID()`) on
  every submit so a network retry does not double-charge the user.
  Polls intentionally reuse the unmodified `headers` without the key.
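
The four fixes above can be sketched together as follows. This is a hedged illustration rather than the plugin's code: every function and type name is hypothetical, and the `prompt` field in the request body is an assumption. The status sets, defaults, and header name come from the list above.

```typescript
import { randomUUID } from "node:crypto";

type PollOutcome = "success" | "failure" | "continue";

const TERMINAL_FAILURE_STATUSES = new Set(["failed", "error", "expired", "cancelled"]);

// Open-ended classification: unknown or empty statuses keep the poll loop
// going instead of being rejected as malformed.
function classifyVideoStatus(status: string): PollOutcome {
  if (status === "done") return "success";
  if (TERMINAL_FAILURE_STATUSES.has(status)) return "failure";
  return "continue";
}

// Hypothetical request shape for illustration only.
interface VideoRequestSketch {
  prompt: string;
  duration?: number;
  aspectRatio?: string;
  resolution?: string;
}

// Always send duration, aspect_ratio, and resolution, filling in defaults
// and lowercasing the resolution for the wire.
function buildSubmitBody(req: VideoRequestSketch) {
  return {
    prompt: req.prompt,
    duration: req.duration ?? 8,
    aspect_ratio: req.aspectRatio ?? "16:9",
    resolution: (req.resolution ?? "720p").toLowerCase(),
  };
}

// Submit headers get a fresh idempotency key; the base headers are left
// unmodified so polls can reuse them without the key.
function buildSubmitHeaders(base: Record<string, string>): Record<string, string> {
  return { ...base, "x-idempotency-key": randomUUID() };
}
```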

Ergonomics
----------
- All "missing xAI credentials" errors (code_execution, lazy
  code_execution fallback in extensions/xai/index.ts, x_search,
  web_search grok in web-search-provider.runtime.ts, TTS, batch STT,
  realtime STT) now mention `openclaw onboard --auth-choice xai-oauth`
  first.
- Dedupe the Grok model-id alias table: model-compat.ts re-exports
  normalizeXaiModelId from model-id.ts as normalizeNativeXaiModelId.

Test coverage
-------------
- src/plugin-sdk/provider-auth-runtime.test.ts: locks the new pure
  buildOAuthCallbackOriginResolver gate (allowlist match,
  case-normalization, https-only, non-allowlisted hosts dropped,
  multi-Origin handling).
- extensions/xai/xai-oauth.test.ts: locks
  XAI_OAUTH_CALLBACK_CORS_ORIGIN_ALLOWLIST so loginXaiOAuth keeps
  threading the right hosts to the SDK helper.
- extensions/xai/speech-provider.test.ts: OAuth-only auth profile
  flips isConfigured to true; cfg threads into the OAuth fallback
  resolver.
- extensions/xai/realtime-transcription-provider.test.ts: same
  coverage, plus upgrade headers carry the OAuth bearer end-to-end.
- extensions/xai/stt.test.ts: explicit assertion that transcribeAudio
  trusts the core-resolved apiKey (no plugin-side wrapper).
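
For orientation, the origin gate exercised by provider-auth-runtime.test.ts could be approximated as below. This is a sketch under stated assumptions, not buildOAuthCallbackOriginResolver itself: the name `buildOriginResolverSketch` is hypothetical, the allowlist is assumed to hold bare hostnames, and multiple Origin values are assumed to be rejected outright (the real multi-Origin behavior is not described here).

```typescript
// Returns a function mapping a request's Origin header to the value that may
// be echoed in Access-Control-Allow-Origin, or undefined to send no CORS grant.
function buildOriginResolverSketch(
  allowlist: readonly string[],
): (originHeader: string | string[] | undefined) => string | undefined {
  const allowedHosts = new Set(allowlist.map((host) => host.trim().toLowerCase()));
  return (originHeader) => {
    // Assumption: a missing or repeated Origin header never matches.
    if (typeof originHeader !== "string") return undefined;
    let url: URL;
    try {
      url = new URL(originHeader);
    } catch {
      return undefined;
    }
    // https-only: plain-http origins are dropped even for allowlisted hosts.
    if (url.protocol !== "https:") return undefined;
    // URL parsing lowercases the hostname, which gives case-normalization.
    return allowedHosts.has(url.hostname) ? url.origin : undefined;
  };
}
```

With the xAI allowlist this would be used as `buildOriginResolverSketch(["auth.x.ai", "accounts.x.ai"])`, and the OPTIONS handler would answer the preflight without tearing the callback server down.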

Verification
------------
- pnpm install: clean
- 154/154 vitest tests pass across 13 touched test files
- pnpm check:changed: typecheck core/ext + tests, oxlint core/ext,
  runtime guards, dependency pin guard, package patch guard, runtime
  import cycles, sidecar loader guard - all green
- pnpm build: 0 errors, 0 [INEFFECTIVE_DYNAMIC_IMPORT] warnings
2026-05-18 02:43:12 +01:00

import { describe, expect, it, vi } from "vitest";
import {
  buildXaiMediaUnderstandingProvider,
  transcribeXaiAudio,
  XAI_DEFAULT_STT_MODEL,
} from "./stt.js";

const { postTranscriptionRequestMock } = vi.hoisted(() => ({
  postTranscriptionRequestMock: vi.fn(
    async (_params: { headers: Headers; body: BodyInit; url: string; timeoutMs?: number }) => ({
      response: new Response(JSON.stringify({ text: "hello from audio" }), { status: 200 }),
      release: vi.fn(),
    }),
  ),
}));

function requireLastPostTranscriptionCall(): {
  url?: string;
  timeoutMs?: number;
  auditContext?: string;
  headers: Headers;
  body: BodyInit;
} {
  const params = (postTranscriptionRequestMock.mock.calls as unknown as Array<[unknown]>).at(
    -1,
  )?.[0] as
    | {
        url?: string;
        timeoutMs?: number;
        auditContext?: string;
        headers?: Headers;
        body?: BodyInit;
      }
    | undefined;
  if (!params?.headers || !params.body) {
    throw new Error("Expected transcription request params");
  }
  return {
    ...params,
    headers: params.headers,
    body: params.body,
  };
}

vi.mock("openclaw/plugin-sdk/provider-http", async (importOriginal) => {
  const actual = await importOriginal<typeof import("openclaw/plugin-sdk/provider-http")>();
  return {
    ...actual,
    postTranscriptionRequest: postTranscriptionRequestMock,
  };
});

describe("xai stt", () => {
  it("posts audio files to the xAI STT endpoint", async () => {
    const result = await transcribeXaiAudio({
      buffer: Buffer.from("audio-bytes"),
      fileName: "sample.wav",
      mime: "audio/wav",
      apiKey: "xai-key",
      baseUrl: "https://api.x.ai/v1/",
      model: XAI_DEFAULT_STT_MODEL,
      language: "en",
      prompt: "ignored provider hint",
      timeoutMs: 10_000,
    });
    expect(result).toEqual({ text: "hello from audio", model: XAI_DEFAULT_STT_MODEL });
    const call = requireLastPostTranscriptionCall();
    expect(call.url).toBe("https://api.x.ai/v1/stt");
    expect(call.timeoutMs).toBe(10_000);
    expect(call.auditContext).toBe("xai stt");
    expect(call.headers.get("authorization")).toBe("Bearer xai-key");
    expect(call.body).toBeInstanceOf(FormData);
    const form = call.body as FormData;
    expect(form.get("model")).toBe(XAI_DEFAULT_STT_MODEL);
    expect(form.get("language")).toBe("en");
    expect(form.get("prompt")).toBeNull();
    expect(form.get("file")).toBeInstanceOf(Blob);
  });

  it("registers as an audio media-understanding provider", () => {
    const provider = buildXaiMediaUnderstandingProvider();
    expect(provider.id).toBe("xai");
    expect(provider.capabilities).toEqual(["audio"]);
    expect(provider.defaultModels).toEqual({ audio: XAI_DEFAULT_STT_MODEL });
    expect(provider.autoPriority).toEqual({ audio: 25 });
  });

  it("trusts the core-resolved apiKey on transcribeAudio (no plugin-side OAuth fallback)", async () => {
    const provider = buildXaiMediaUnderstandingProvider();
    if (!provider.transcribeAudio) {
      throw new Error("xAI media-understanding provider should register transcribeAudio");
    }
    await provider.transcribeAudio({
      buffer: Buffer.from("audio-bytes"),
      fileName: "sample.wav",
      mime: "audio/wav",
      apiKey: "core-resolved-bearer",
      baseUrl: "https://api.x.ai/v1/",
      model: XAI_DEFAULT_STT_MODEL,
      timeoutMs: 10_000,
    });
    const call = requireLastPostTranscriptionCall();
    expect(call.headers.get("authorization")).toBe("Bearer core-resolved-bearer");
  });
});