Meta Reveals New Voicebox AI for Text-to-Speech
Meta has just unveiled Voicebox, a generative AI model that converts text into spoken audio.
Its developers claim that the model will do for spoken language what ChatGPT and DALL-E did for text and images.
Like generative systems for text and images, Voicebox can create new content, change styles, and modify provided samples. The system was trained on 50,000 hours of speech recordings and transcripts from public-domain audiobooks in English, French, Spanish, German, Polish, and Portuguese.
With Voicebox, you can edit audio clips, remove noise, and correct mispronounced words. The model can also generate speech that matches the voice in a sample as short as two seconds, transfer speaking style across languages, and produce varied synthetic speech datasets.
However, while announcing the technology, Meta made it clear that AI enthusiasts shouldn’t expect to try it any time soon. Citing the “potential risk of misuse,” the company is not yet releasing the model or its source code.