Microsoft's new AI, VALL-E, replicates a voice from 3 seconds of audio

VALL-E is the name of a new artificial intelligence that unsettles people who marvel at how far technology has advanced and how close it is coming, one invention at a time, to doing what a person can do.

The reason is that we have already seen AI imitate human activities such as holding long conversations, doing housework, creating images and text, and even researching historical events. More people are also coming to understand how artificial intelligence learns: through repetition, encoded information, and patterns of behavior that are rewarded or penalized. That understanding helps advance the capabilities of the technology.

Now a project has emerged that can copy a person's voice after hearing only three seconds of it. It is a new application of artificial intelligence that has taken us by surprise.

This project is known as VALL-E. It is a language model created by Microsoft for text-to-speech (TTS) synthesis, a type of technology the corporation has worked hard to improve in recent years. Once this AI is good enough, it could be integrated with ChatGPT, which is known for building text out of basic information and making it look like you are chatting with someone else (even going as far as writing reviews of celebrities). That is, over time, this voice simulator could imitate a conversation, giving users the impression that they are speaking with the person whose voice was captured, even though both the words and the voice are generated by artificial intelligence.

One of the most remarkable aspects of VALL-E is that it needs to hear only three seconds of the voice of the person you want to copy, either live or from a recording. According to Microsoft, the AI can reproduce not only the speech but also the original rhythm and pitch of the voice sample. This heightens the feeling that you are chatting with a friend.

What is VALL-E?

VALL-E can achieve so much with so little information because it combines techniques from different areas of AI, such as TTS, voice editing, and GPT-3, which replicates human language patterns. This helps it capture the logical structure of speech, as well as the patterns that emerge when a speaker expresses emotions such as anger or exhaustion.

The model is not yet available for use. However, there are examples of how VALL-E can pick up a speaker's emotion from just three seconds of speech and reproduce it in its synthesized voice.

“In terms of speech naturalness and speaker similarity, the results of the experiment suggest that VALL-E outperforms the state-of-the-art zero-shot TTS system [AI that recreates voices it has never heard],” according to the VALL-E research paper, published on arXiv, the preprint server hosted by Cornell University. “Furthermore, we found that during synthesis, VALL-E was able to preserve the speaker’s emotion as well as the acoustic environment of the acoustic prompt.”

How does VALL-E work?

Microsoft has unveiled VALL-E, a new artificial intelligence (AI) technology that can reproduce any voice from just three seconds of audio. According to Gizmochina, the tool was trained on 60,000 hours of English speech data. It can also mimic the speaker’s emotions and tone, something previous models could not do.

However, there are concerns about the ethical consequences of the new technology. As the voices generated by VALL-E and related technology become more convincing, they may pave the way for realistic spam calls that impersonate the voices of real people a potential victim knows.

Another risk is the impersonation of politicians and other public figures, which could spread false material on social media. In addition, some banks use voice recognition technology to authenticate a caller's identity, and AI-generated voices could make it harder to determine whether a caller is legitimate.

As a result, it is critical that Microsoft develop controls to ensure that VALL-E is used for good and not evil, according to the document.

Categories: Technology
Source: vtt.edu.vn
