Browser speech-to-text for NVIDIA Parakeet ONNX models.
parakeet.js runs fully in the browser with onnxruntime-web. It can use WebGPU for the encoder and WASM for the decoder, so apps can transcribe audio without sending it to a server.
npm i parakeet.js
import { fromHub } from 'parakeet.js';
const model = await fromHub('parakeet-tdt-0.6b-v3', {
  backend: 'webgpu',
  encoderQuant: 'fp32',
  decoderQuant: 'int8',
});
const result = await model.transcribe(pcm, 16000, {
  returnTimestamps: true,
  returnConfidences: true,
});
console.log(result.utterance_text);
pcm is mono audio as a Float32Array, and the sample rate should be 16000 Hz. In a browser app, decode files with the Web Audio API or your existing audio pipeline before calling transcribe.
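As a rough sketch of that preparation step, the helpers below downmix decoded channels to mono and resample to 16 kHz. Both functions (downmixToMono, resampleLinear) are hypothetical and not part of parakeet.js. The resampler uses plain linear interpolation with no low-pass filter, so it is only a starting point; a real pipeline may prefer resampling through the Web Audio API.

```javascript
// Hypothetical helpers for illustration; they are not part of parakeet.js.

// Average equally sized channel arrays into one mono Float32Array.
function downmixToMono(channels) {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighboring input samples.
// No anti-aliasing filter is applied, so downsampling can alias.
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate) return input;
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Browser usage, assuming `buf` is an AudioBuffer from decodeAudioData:
// const channels = Array.from({ length: buf.numberOfChannels },
//   (_, c) => buf.getChannelData(c));
// const pcm = resampleLinear(downmixToMono(channels), buf.sampleRate, 16000);
```

The resulting pcm can then be passed to model.transcribe(pcm, 16000).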
For a complete React example, see examples/demo.
The easiest path is fromHub:
import { fromHub } from 'parakeet.js';
const model = await fromHub('parakeet-tdt-0.6b-v3', {
  backend: 'webgpu',
  encoderQuant: 'fp32',
  decoderQuant: 'int8',
  preprocessorBackend: 'js',
});
Use fromUrls when you host the files yourself:
import { fromUrls } from 'parakeet.js';
const model = await fromUrls({
  encoderUrl: '/models/encoder-model.onnx',
  decoderUrl: '/models/decoder_joint-model.int8.onnx',
  tokenizerUrl: '/models/vocab.txt',
  backend: 'webgpu',
  preprocessorBackend: 'js',
});
If your ONNX model uses external data, pass the matching .data URL too:
const model = await fromUrls({
  encoderUrl: '/models/encoder-model.onnx',
  encoderDataUrl: '/models/encoder-model.onnx.data',
  decoderUrl: '/models/decoder_joint-model.int8.onnx',
  tokenizerUrl: '/models/vocab.txt',
  backend: 'webgpu',
});
backend: 'webgpu' is the recommended browser mode. It runs the encoder on WebGPU and the decoder on WASM.
Available backend values:
- webgpu
- webgpu-hybrid (kept for compatibility; same behavior as webgpu)
- webgpu-strict
- wasm
Quantization options are fp32, fp16, and int8, depending on which files exist in the model repo. In WebGPU modes, int8 encoder requests are upgraded to fp32 because the encoder path does not support int8 there.
preprocessorBackend: 'js' is the default and usually the best choice. Use preprocessorBackend: 'onnx' only when you specifically want the ONNX preprocessor file.
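The backend and quantization rules above can be pictured with a small sketch. The helpers below (pickBackend, effectiveEncoderQuant) are hypothetical and only mirror the documented behavior; they are not the library's internal code. The sketch assumes that all three webgpu* values count as WebGPU modes, and the fallback to wasm when WebGPU is missing is an app-level choice, not something the library is stated to do.

```javascript
// Hypothetical helpers that mirror the documented rules; not parakeet.js internals.

// Assumption: every webgpu* backend value counts as a "WebGPU mode".
const WEBGPU_BACKENDS = ['webgpu', 'webgpu-hybrid', 'webgpu-strict'];

// App-level choice: prefer WebGPU when the browser exposes it, else use WASM.
function pickBackend(hasWebGPU) {
  return hasWebGPU ? 'webgpu' : 'wasm';
}

// In WebGPU modes an int8 encoder request is upgraded to fp32.
// Every other combination is returned unchanged.
function effectiveEncoderQuant(backend, requested) {
  if (WEBGPU_BACKENDS.includes(backend) && requested === 'int8') {
    return 'fp32';
  }
  return requested;
}

// `navigator.gpu` is the WebGPU entry point. Its presence does not
// guarantee that a usable adapter exists.
const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator;
const backend = pickBackend(hasWebGPU);
console.log(backend, effectiveEncoderQuant(backend, 'int8'));
```

The chosen backend string can then be passed as the backend option of fromHub or fromUrls.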
Short audio:
const result = await model.transcribe(pcm, 16000);
console.log(result.utterance_text);
Timestamps and confidences:
const result = await model.transcribe(pcm, 16000, {
  returnTimestamps: true,
  returnConfidences: true,
});
console.log(result.words);
Long audio:
const result = await model.transcribeLongAudio(pcm, 16000, {
  returnTimestamps: true,
});
console.log(result.text);
console.log(result.chunks);
Streaming:
const transcriber = model.createStreamingTranscriber();
const partial = await transcriber.pushAudioChunk(chunkPcm, 16000);
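For illustration, the streaming call above could be driven from an already-loaded buffer by splitting it into fixed-length chunks. The helpers chunkPcm and streamPcm are hypothetical, as is the one-second default chunk length. The shape of the value returned by pushAudioChunk is not assumed here; the sketch simply returns whatever the last call produced.

```javascript
// Hypothetical helpers for illustration; they are not part of parakeet.js.

// Yield consecutive views of `pcm`, each at most `chunkSamples` long.
// subarray() creates views, so no audio data is copied.
function* chunkPcm(pcm, chunkSamples) {
  for (let start = 0; start < pcm.length; start += chunkSamples) {
    yield pcm.subarray(start, Math.min(start + chunkSamples, pcm.length));
  }
}

// Push chunks in order to any object exposing pushAudioChunk(chunk, sampleRate)
// and return the result of the final call (undefined for empty input).
async function streamPcm(transcriber, pcm, sampleRate, chunkSeconds = 1) {
  const chunkSamples = Math.max(1, Math.round(sampleRate * chunkSeconds));
  let last;
  for (const chunk of chunkPcm(pcm, chunkSamples)) {
    last = await transcriber.pushAudioChunk(chunk, sampleRate);
  }
  return last;
}

// Usage, assuming `model` and `pcm` exist as in the examples above:
// const transcriber = model.createStreamingTranscriber();
// const finalPartial = await streamPcm(transcriber, pcm, 16000);
```

In a live capture setup the chunks would come from the microphone pipeline instead, but the call pattern stays the same.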
For the full option and result types, use the published API docs or the TypeScript declarations in types/.
- examples/demo: current development demo. Use it to test local source and npm-package behavior.
- compat-tests/demo-v*: older demo snapshots. These help catch breaking changes against previous app code.
Common demo commands:
cd examples/demo
npm install
npm run dev:local # use local repo source
npm run dev # use package dependency
Common development commands for the package itself:
npm install
npm test
npm run verify:frame-copy
npm run docs:api
The project keeps behavior checks in tests/ and browser/manual checks in examples/demo and compat-tests/.
COOP/COEP) in deployed apps.
License: MIT
Thanks to istupakov/onnx-asr for the reference implementation and model tooling foundations.