parakeet.js API

    Parakeet.js

    Live Demo | Keet | NPM Package

    parakeet.js is a browser speech-to-text library for NVIDIA Parakeet ONNX models. It runs fully client-side using onnxruntime-web with WebGPU or WASM execution.

    Install:

    npm i parakeet.js
    # or
    yarn add parakeet.js

    • Use WebGPU when available for best throughput.
    • Use WASM when WebGPU is not available or for compatibility-first setups (see the runtime check below).
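
    A minimal runtime check for picking a backend (not part of parakeet.js; navigator.gpu is the standard WebGPU feature probe in browsers):

    // Prefer WebGPU when the browser exposes it; otherwise fall back to WASM.
    const backend = 'gpu' in navigator ? 'webgpu' : 'wasm';
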
    Quick start:

    import { fromHub } from 'parakeet.js';

    const model = await fromHub('parakeet-tdt-0.6b-v3', {
      backend: 'webgpu-hybrid',
      encoderQuant: 'fp32',
      decoderQuant: 'int8',
    });

    // `file` should be a File (for example from <input type="file">)
    const pcm = await getMono16kPcm(file); // returns mono Float32Array at 16 kHz
    const result = await model.transcribe(pcm, 16000, {
      returnTimestamps: true,
      returnConfidences: true,
    });

    console.log(result.utterance_text);

    Use your existing app audio pipeline for getMono16kPcm(file) (Web Audio API, ffmpeg, server-side decode, etc.). A complete browser example is available in examples/demo/src/App.jsx (transcribeFile flow).
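
    getMono16kPcm is not a library export; here is a minimal sketch using the Web Audio API, mirroring the demo's decode-and-downmix approach:

    // Minimal sketch: decode a File to mono 16 kHz Float32Array PCM.
    // Assumes a browser environment; not part of the parakeet.js API.
    async function getMono16kPcm(file) {
      const ctx = new AudioContext({ sampleRate: 16000 }); // resamples on decode
      const audioBuffer = await ctx.decodeAudioData(await file.arrayBuffer());
      const { numberOfChannels, length } = audioBuffer;
      const mono = new Float32Array(length);
      // Average all channels into a single mono track.
      for (let ch = 0; ch < numberOfChannels; ch++) {
        const data = audioBuffer.getChannelData(ch);
        for (let i = 0; i < length; i++) mono[i] += data[i] / numberOfChannels;
      }
      await ctx.close();
      return mono;
    }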

    Two loading paths are available:

    • fromHub(repoIdOrModelKey, options): the easiest path. Accepts model keys like parakeet-tdt-0.6b-v3 or full repo IDs.
    • fromUrls(cfg): explicit URL wiring when you host the assets yourself.
    import { fromUrls } from 'parakeet.js';

    const model = await fromUrls({
      encoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/encoder-model.onnx',
      decoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/decoder_joint-model.int8.onnx',
      tokenizerUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/vocab.txt',
      // Only needed if you choose preprocessorBackend: 'onnx'
      preprocessorUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/nemo128.onnx',
      backend: 'webgpu-hybrid',
      preprocessorBackend: 'js',
    });
    • Backends are selected with backend:
      • webgpu (accepted as an alias)
      • wasm (see the example after this list)
      • advanced: webgpu-hybrid, webgpu-strict
    • In WebGPU modes, the encoder prefers WebGPU, but the decoder session runs on WASM (hybrid execution).
    • In getParakeetModel/fromHub, if backend starts with webgpu and encoderQuant is int8, encoder quantization is forced to fp32.
    • Encoder/decoder quantization supports int8, fp32, and fp16.
    • FP16 requires FP16 ONNX artifacts (for example encoder-model.fp16.onnx).
    • ONNX Runtime Web does not convert FP32 model files into FP16 at load time.
    • getParakeetModel/fromHub are strict about requested quantization: they do not auto-switch fp16 to fp32.
    • If requested FP16 artifacts are missing or fail to load, API calls throw actionable errors so callers can choose a different quantization explicitly.
    • Decoder runs on WASM in WebGPU modes; if decoder FP16 is unsupported in your runtime, choose decoderQuant: 'int8' or decoderQuant: 'fp32' explicitly.
    • preprocessorBackend is js (default) or onnx.
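
    For example, a compatibility-first WASM setup (a sketch using only options documented above):

    import { fromHub } from 'parakeet.js';

    // WASM execution with int8 quantization for both sessions.
    const model = await fromHub('parakeet-tdt-0.6b-v3', {
      backend: 'wasm',
      encoderQuant: 'int8',
      decoderQuant: 'int8',
    });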

    parakeet.js now uses the pr74 real-FFT path in the default JS preprocessor (preprocessorBackend: 'js'). This keeps feature compatibility with the previous implementation while reducing mel extraction cost.

    | Item            | Previous JS path                   | New JS path (default)                                                 |
    |-----------------|------------------------------------|-----------------------------------------------------------------------|
    | FFT strategy    | Full N=512 complex FFT per frame   | Real-FFT via one N/2=256 complex FFT + spectrum reconstruction (pr74) |
    | Expected speed  | Baseline                           | Faster mel stage (commonly ~1.5x in local mel benchmarks)             |
    | Output behavior | NeMo-compatible normalized log-mel | Same behavior; ONNX-reference accuracy thresholds preserved           |
    | API changes     | N/A                                | None (JsPreprocessor / IncrementalMelProcessor unchanged)             |

    If you need exact ONNX preprocessor execution instead of JS mel, set preprocessorBackend: 'onnx'.
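
    A sketch of that configuration, reusing the asset URLs from the fromUrls example above:

    import { fromUrls } from 'parakeet.js';

    const model = await fromUrls({
      encoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/encoder-model.onnx',
      decoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/decoder_joint-model.int8.onnx',
      tokenizerUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/vocab.txt',
      // preprocessorUrl is required when preprocessorBackend is 'onnx'.
      preprocessorUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/nemo128.onnx',
      preprocessorBackend: 'onnx',
      backend: 'webgpu-hybrid',
    });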

    Before using the FP16 examples below, ensure FP16 artifacts exist in the target repo and that your browser/runtime supports FP16 execution (the WebGPU FP16 path).

    Load a known FP16 model key:

    import { fromHub } from 'parakeet.js';

    const model = await fromHub('parakeet-tdt-0.6b-v3', {
      backend: 'webgpu-hybrid',
      encoderQuant: 'fp16',
      decoderQuant: 'fp16',
    });

    Use explicit FP16 URLs:

    import { fromUrls } from 'parakeet.js';

    const model = await fromUrls({
      encoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/encoder-model.fp16.onnx',
      decoderUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/decoder_joint-model.fp16.onnx',
      tokenizerUrl: 'https://huggingface.co/ysdede/parakeet-tdt-0.6b-v3-onnx/resolve/main/vocab.txt',
      preprocessorBackend: 'js',
      backend: 'webgpu-hybrid',
    });
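
    Because the loaders never auto-switch quantization, handle FP16 load failures yourself. A sketch of an explicit fallback (the error type is not specified here, so it catches broadly):

    import { fromHub } from 'parakeet.js';

    let model;
    try {
      model = await fromHub('parakeet-tdt-0.6b-v3', {
        backend: 'webgpu-hybrid',
        encoderQuant: 'fp16',
        decoderQuant: 'fp16',
      });
    } catch (err) {
      // The loader throws an actionable error instead of silently
      // downgrading; retry with an explicit alternative quantization.
      console.warn('FP16 load failed, retrying with fp32/int8:', err);
      model = await fromHub('parakeet-tdt-0.6b-v3', {
        backend: 'webgpu-hybrid',
        encoderQuant: 'fp32',
        decoderQuant: 'int8',
      });
    }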

    The demo flow in examples/demo/src/App.jsx is:

    1. Load a model with public APIs (fromHub(...) for hub loading, or fromUrls(...) for explicit URLs).
    2. Decode uploaded audio with AudioContext({ sampleRate: 16000 }) + decodeAudioData(...).
    3. Convert decoded audio to mono 16 kHz PCM (Float32Array) by averaging channels when needed.
    4. Call model.transcribe(pcm, 16000, options) and render utterance_text.
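
    Condensed as code (getMono16kPcm is the helper sketched earlier, not a library export):

    import { fromHub } from 'parakeet.js';

    // Step 1: load the model via the public hub loader.
    const model = await fromHub('parakeet-tdt-0.6b-v3', { backend: 'webgpu-hybrid' });

    // Steps 2-3: decode the upload and downmix to mono 16 kHz PCM.
    const pcm = await getMono16kPcm(file);

    // Step 4: transcribe and render the text.
    const result = await model.transcribe(pcm, 16000, { returnTimestamps: true });
    console.log(result.utterance_text);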

    Reference code:

    • App component in examples/demo/src/App.jsx (loadModel / transcribeFile flow)

    model.transcribe(...) returns a TranscribeResult with this shape:

    type TranscribeResult = {
      utterance_text: string;
      words: Array<{
        text: string;
        start_time: number;
        end_time: number;
        confidence?: number;
      }>;
      tokens?: Array<{
        token: string;
        raw_token?: string;
        is_word_start?: boolean;
        start_time?: number;
        end_time?: number;
        confidence?: number;
      }>;
      confidence_scores?: {
        token?: number[] | null;
        token_avg?: number | null;
        word?: number[] | null;
        word_avg?: number | null;
        frame: number[] | null;
        frame_avg: number | null;
        overall_log_prob: number | null;
      };
      metrics?: {
        preprocess_ms: number;
        encode_ms: number;
        decode_ms: number;
        tokenize_ms: number;
        total_ms: number;
        rtf: number;
        mel_cache?: { cached_frames: number; new_frames: number };
      } | null;
      is_final: boolean;
      tokenIds?: number[];
      frameIndices?: number[];
      logProbs?: number[];
      tdtSteps?: number[];
    };
    • Enable returnTimestamps for meaningful start_time/end_time.
    • Enable returnConfidences for per-token/per-word confidence fields.
    • Advanced alignment/debug outputs are opt-in: returnTokenIds, returnFrameIndices, returnLogProbs, returnTdtSteps.
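
    A small sketch consuming these fields, assuming the transcribe call from the quick start:

    const result = await model.transcribe(pcm, 16000, {
      returnTimestamps: true,
      returnConfidences: true,
    });

    // Word-level output, per the TranscribeResult shape above.
    for (const w of result.words) {
      console.log(`${w.start_time.toFixed(2)}-${w.end_time.toFixed(2)}s`,
                  w.text, w.confidence?.toFixed(3) ?? 'n/a');
    }
    console.log('avg word confidence:', result.confidence_scores?.word_avg);
    console.log('real-time factor:', result.metrics?.rtf);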

    Keet is a reference real-time app built on parakeet.js (repo).

    • For contiguous chunk streams, Keet uses createStreamingTranscriber(...).
    • Keet currently defaults to v4 utterance-based merging (UtteranceBasedMerger) with cursor/windowed chunk processing.
    Demo: Real-time transcription with Keet and parakeet.js.
    Generate this API reference locally:

    npm run docs:api

    License: MIT