Google AI Language Model: Over 300 Languages Available
Google’s Universal Speech Model (USM) has expanded its language support to over 300 languages, a milestone that brings the search giant closer to its stated goal of covering 1,000 languages. The announcement follows the company’s pledge to build a machine learning model that encompasses as many languages as possible.
The USM uses a standard encoder-decoder architecture with a Conformer encoder, which takes the log-mel spectrogram of the speech signal as input and applies convolutional sub-sampling. The model has two billion parameters and was trained on 12 million hours of speech and 28 billion sentences of text. In simpler terms, Google’s researchers pre-trained the model on a large unlabeled multilingual dataset and fine-tuned it on a smaller set of labeled data.
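To make the front-end concrete, here is a toy sketch of what convolutional sub-sampling does to a log-mel feature sequence: the time axis shrinks while the feature dimension is preserved. The shapes (1,000 frames of 80 mel bins) and the frame-averaging stand-in for a strided convolution are illustrative assumptions, not USM's actual configuration.

```python
import numpy as np

# Hypothetical input: 1,000 frames of 80-dim log-mel features
# (80 mel bins is a common choice; USM's exact settings may differ).
log_mel = np.random.randn(1000, 80).astype(np.float32)

def conv_subsample(features: np.ndarray, stride: int = 2) -> np.ndarray:
    """Toy stand-in for convolutional sub-sampling: average adjacent
    frames so the sequence length shrinks by `stride`."""
    n = (features.shape[0] // stride) * stride
    return features[:n].reshape(-1, stride, features.shape[1]).mean(axis=1)

# Two stride-2 stages give the 4x time reduction typical of Conformer encoders.
out = conv_subsample(conv_subsample(log_mel))
print(out.shape)  # (250, 80)
```

Shortening the sequence this way is what makes the attention layers downstream affordable, since their cost grows with sequence length.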
This approach is more effective than prior techniques, and the model is already being used by YouTube to generate closed captions for under-resourced languages such as Amharic, Cebuano, Assamese, and Azerbaijani. The USM achieves an average word error rate below 30% across 73 languages on YouTube.
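For readers unfamiliar with the metric: word error rate (WER) is the word-level edit distance between the recognized transcript and a reference, divided by the number of reference words. A minimal implementation, assuming simple whitespace tokenization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of reference words, via word-level
    Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words:
print(wer("the cat sat on the mat", "the cat sit on mat"))  # ≈ 0.333
```

A WER of 30% therefore means roughly three word-level errors for every ten words spoken, which is usable for captioning but far from human-level transcription.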
Compared to OpenAI’s Whisper model, USM achieves a lower word error rate across 18 languages. However, its computational efficiency needs to improve before language coverage and quality can expand enough to reach the company’s lofty goal.
In Conclusion
Google’s USM is an impressive feat of machine learning that sets the standard for natural language processing. With its expanding language support, the model will continue to drive innovation and enable accessibility for under-resourced languages worldwide.