According to the OpenASR Leaderboard [1], looks like Parakeet V2/V3 and Canary-Qwen (a Qwen finetune) handily beat Moonshine. All 3 models are open, but Parakeet is the smallest of the 3. I use Parakeet V3 with Handy and it works great locally for me.
No idea why 'sudo pip install --break-system-packages moonshine-voice' is the recommended way to install on raspi?
The authors do acknowledge this though and give a slightly too complex way to do this with uv in an example project (FYI, you dont need to source anything if you use uv run)
asqueella 22 minutes ago [-]
For those wondering about the language support, currently English, Arabic, Japanese, Korean, Mandarin, Spanish, Ukrainian, Vietnamese are available (most in Base size = 58M params)
armcat 49 minutes ago [-]
This is awesome, well done guys, I’m gonna try it as my ASR component on the local voice assistant I’ve been building https://github.com/acatovic/ova. The tiny streaming latencies you show look insane
pzo 32 minutes ago [-]
haven't tested yet but I'm wondering how it will behave when talking about many IT jargon and tech acronyms. For those reason I had to mostly run LLM after STT but that was slowing done parakeet inference. Otherwise had problems to detect properly sometimes when talking about e.g. about CoreML, int8, fp16, half float, ARKit, AVFoundation, ONNX etc.
g-mork 43 minutes ago [-]
How does this compare to Parakeet, which runs wonderfully on CPU?
> This code, apart from the source in core/third-party, is licensed under the MIT License, see LICENSE in this repository.
> The English-language models are also released under the MIT License. Models for other languages are released under the Moonshine Community License, which is a non-commercial license.
> The code in core/third-party is licensed according to the terms of the open source projects it originates from, with details in a LICENSE file in each subfolder.
altruios 1 hours ago [-]
reading through readme.md
"License
This code, apart from the source in core/third-party, is licensed under the MIT License, see LICENSE in this repository.
The English-language models are also released under the MIT License. Models for other languages are released under the Moonshine Community License, which is a non-commercial license.
The code in core/third-party is licensed according to the terms of the open source projects it originates from, with details in a LICENSE file in each subfolder."
[1]: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
The authors do acknowledge this though and give a slightly too complex way to do this with uv in an example project (FYI, you dont need to source anything if you use uv run)
> This code, apart from the source in core/third-party, is licensed under the MIT License, see LICENSE in this repository.
> The English-language models are also released under the MIT License. Models for other languages are released under the Moonshine Community License, which is a non-commercial license.
> The code in core/third-party is licensed according to the terms of the open source projects it originates from, with details in a LICENSE file in each subfolder.
The English-language models are also released under the MIT License. Models for other languages are released under the Moonshine Community License, which is a non-commercial license.
The code in core/third-party is licensed according to the terms of the open source projects it originates from, with details in a LICENSE file in each subfolder."