parakeet.js API
    Preparing search index...

    parakeet.js API

    Parakeet.js

    Live Demo | Keet | NPM | API Docs

    Browser speech-to-text for NVIDIA Parakeet ONNX models.

    parakeet.js runs fully in the browser with onnxruntime-web. It can use WebGPU for the encoder and WASM for the decoder, so apps can transcribe audio without sending it to a server.

    npm i parakeet.js
    
    import { fromHub } from 'parakeet.js';

    const model = await fromHub('parakeet-tdt-0.6b-v3', {
    backend: 'webgpu',
    encoderQuant: 'fp32',
    decoderQuant: 'int8',
    });

    const result = await model.transcribe(pcm, 16000, {
    returnTimestamps: true,
    returnConfidences: true,
    });

    console.log(result.utterance_text);

    pcm is mono Float32Array audio. The sample rate should be 16000. In a browser app, decode files with the Web Audio API or your existing audio pipeline before calling transcribe.

    For a complete React example, see examples/demo.

    • Client-side transcription in the browser.
    • WebGPU and WASM execution through ONNX Runtime Web.
    • Hugging Face model loading with IndexedDB caching.
    • Local or self-hosted model files via explicit URLs.
    • Timestamp and confidence output when requested.
    • Long-audio chunking with sentence-aware merge behavior.
    • Stateful streaming helpers for live transcription apps.

    The easiest path is fromHub:

    import { fromHub } from 'parakeet.js';

    const model = await fromHub('parakeet-tdt-0.6b-v3', {
    backend: 'webgpu',
    encoderQuant: 'fp32',
    decoderQuant: 'int8',
    preprocessorBackend: 'js',
    });

    Use fromUrls when you host the files yourself:

    import { fromUrls } from 'parakeet.js';

    const model = await fromUrls({
    encoderUrl: '/models/encoder-model.onnx',
    decoderUrl: '/models/decoder_joint-model.int8.onnx',
    tokenizerUrl: '/models/vocab.txt',
    backend: 'webgpu',
    preprocessorBackend: 'js',
    });

    If your ONNX model uses external data, pass the matching .data URL too:

    const model = await fromUrls({
    encoderUrl: '/models/encoder-model.onnx',
    encoderDataUrl: '/models/encoder-model.onnx.data',
    decoderUrl: '/models/decoder_joint-model.int8.onnx',
    tokenizerUrl: '/models/vocab.txt',
    backend: 'webgpu',
    });

    backend: 'webgpu' is the recommended browser mode. It runs the encoder on WebGPU and the decoder on WASM.

    Available backend values:

    • webgpu
    • webgpu-hybrid (kept for compatibility; same behavior as webgpu)
    • webgpu-strict
    • wasm

    Quantization options are fp32, fp16, and int8, depending on which files exist in the model repo. In WebGPU modes, int8 encoder requests are upgraded to fp32 because the encoder path does not support int8 there.

    preprocessorBackend: 'js' is the default and usually the best choice. Use preprocessorBackend: 'onnx' only when you specifically want the ONNX preprocessor file.

    Short audio:

    const result = await model.transcribe(pcm, 16000);
    console.log(result.utterance_text);

    Timestamps and confidences:

    const result = await model.transcribe(pcm, 16000, {
    returnTimestamps: true,
    returnConfidences: true,
    });

    console.log(result.words);

    Long audio:

    const result = await model.transcribeLongAudio(pcm, 16000, {
    returnTimestamps: true,
    });

    console.log(result.text);
    console.log(result.chunks);

    Streaming:

    const transcriber = model.createStreamingTranscriber();
    const partial = await transcriber.pushAudioChunk(chunkPcm, 16000);

    For the full option and result types, use the published API docs or the TypeScript declarations in types/.

    • examples/demo: current development demo. Use it to test local source and npm-package behavior.
    • compat-tests/demo-v*: older demo snapshots. These help catch breaking changes against previous app code.
    • Keet: real-time reference app built on parakeet.js.

    Common demo commands:

    cd examples/demo
    npm install
    npm run dev:local # use local repo source
    npm run dev # use package dependency
    npm install
    npm test
    npm run verify:frame-copy
    npm run docs:api

    The project keeps behavior checks in tests/ and browser/manual checks in examples/demo and compat-tests/.

    • WebGPU requires a browser/runtime with WebGPU enabled.
    • Multithreaded WASM needs cross-origin isolation (COOP/COEP) in deployed apps.
    • Hugging Face model files are cached in IndexedDB for faster reloads.
    • If browser model cache becomes stale, the hub loader validates cached blobs and redownloads when needed.

    MIT

    Thanks to istupakov/onnx-asr for the reference implementation and model tooling foundations.