Alibaba Cloud's Qwen team open-sources two voice base models that outperform OpenAI's Whisper in speech recognition
Alibaba Cloud's Qwen team recently open-sourced two speech base models on GitHub, SenseVoice and CosyVoice: the former is designed for speech recognition, the latter for speech generation. Both demonstrate strong performance, with SenseVoice outperforming OpenAI's Whisper model in recognition accuracy.
Both models are fully open-sourced under the Apache 2.0 license, allowing individuals, developers, and enterprises to download and use them for free, offering an alternative to paid services such as OpenAI's Whisper API.
SenseVoice Model:
SenseVoice is a multilingual audio understanding model supporting speech recognition, language identification, speech emotion recognition, acoustic event detection, and inverse text normalization. It was trained on hundreds of thousands of hours of labeled audio to ensure broad recognition capability, and it can transcribe Mandarin, Cantonese, English, Japanese, and Korean speech into rich text with emotion and event annotations (see the inference sketch after the feature list below).
- Multilingual Recognition: Trained on over 400,000 hours of data covering more than 50 languages, with recognition accuracy surpassing the Whisper model.
- Rich Text Recognition: Offers strong emotion recognition, matching or exceeding the best current emotion recognition models on test data.
- Sound Event Detection: Capable of detecting a variety of common human-computer interaction events, including music, applause, laughter, crying, coughing, and sneezing.
- Efficient Inference: The SenseVoice-Small model uses a non-autoregressive end-to-end architecture with very low inference latency, processing 10 seconds of audio in just 70 ms, 15 times faster than Whisper-Large.
- Fine-tuning and Customization: Provides convenient fine-tuning scripts and strategies so users can address long-tail sample issues in their own business scenarios.
- Service Deployment: Ships with a complete service deployment pipeline supporting concurrent requests, with clients available for Python, C++, HTML, Java, and C#.
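For orientation, here is a minimal recognition sketch based on the FunASR AutoModel interface that the SenseVoice repository builds on. The model ID, argument names, and the rich-transcription helper follow the project README at the time of writing; treat them as assumptions and check the repository for the exact current API.

```python
# Minimal SenseVoice inference sketch (assumes the funasr package and the
# model ID from the SenseVoice README; verify against the repo before use).
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Load SenseVoice-Small; device can be "cuda:0" or "cpu".
model = AutoModel(
    model="iic/SenseVoiceSmall",
    trust_remote_code=True,
    device="cuda:0",
)

# Transcribe an audio file; language="auto" enables language identification,
# use_itn=True applies inverse text normalization (punctuation, numbers).
res = model.generate(
    input="example.wav",  # hypothetical input file for illustration
    language="auto",      # or "zh", "yue", "en", "ja", "ko"
    use_itn=True,
)

# The raw output embeds emotion and event tags; this helper renders them
# as a readable rich transcription.
print(rich_transcription_postprocess(res[0]["text"]))
```

The post-processing step is what turns the model's inline emotion and event tags into the rich annotations described above.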
CosyVoice Model:
CosyVoice likewise supports multiple languages and offers control over timbre and emotion, excelling at multilingual speech generation, zero-shot voice generation, cross-lingual voice cloning, and instruction-following synthesis.
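As a rough illustration of zero-shot voice cloning with the repository's Python interface (the class, method, and file names below follow the CosyVoice README; the pretrained-model path and the output format may differ between versions):

```python
# Minimal CosyVoice sketch for zero-shot voice cloning, assuming the CLI
# classes from the CosyVoice README; verify against the current repo.
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')

# A short reference clip (16 kHz) plus its transcript define the target voice.
prompt_speech_16k = load_wav('prompt.wav', 16000)  # hypothetical reference clip
output = cosyvoice.inference_zero_shot(
    'Text to synthesize in the cloned voice.',  # text to speak
    'Transcript of the prompt audio.',          # what the reference clip says
    prompt_speech_16k,
)
torchaudio.save('zero_shot.wav', output['tts_speech'], 22050)
```

The reference clip and its transcript give the model the target voice purely in context, with no fine-tuning, which is what "zero-shot" means here.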
Both models are part of the FunAudioLLM series, a framework aimed at enabling natural speech interaction between humans and large language models, powering applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, and pushing the boundaries of speech interaction technology.
These models are now available on the ModelScope and Hugging Face platforms for interested developers to download and test.
- SenseVoice Model: https://github.com/FunAudioLLM/SenseVoice
- CosyVoice Model: https://github.com/FunAudioLLM/CosyVoice
For a complete description of FunAudioLLM: https://fun-audio-llm.github.io/