New AI voice-cloning tools 'add fuel' to misinformation fire

FILE - President Joe Biden speaks about Ukraine from the Roosevelt Room at the White House in Washington on Jan. 25, 2023. An altered video that shows Biden making comments that attack transgender people was created with a new generation of artificial intelligence tools. While Hollywood studios have long been able to distort reality in this way, experts say the technology has been democratized without considering how it can fall into the wrong hands and be used to spread disinformation. (AP Photo/Susan Walsh, File)

NEW YORK (AP) — In a video from a Jan. 25 news report, President Joe Biden talks about tanks. But a doctored version of the video has amassed hundreds of thousands of views this week on social media, making it appear he gave a speech that attacks transgender people.

Digital forensics experts say the video was created using a new generation of artificial intelligence tools, which allow anyone to quickly generate audio simulating a person's voice with a few clicks of a button. And while the Biden clip on social media may have failed to fool most users this time, the clip shows how easy it now is for people to generate hateful and disinformation-filled "deepfake" videos that could do real-world harm.

"Tools like this are going to basically add more fuel to fire," said Hafiz Malik, a professor of electrical and computer engineering at the University of Michigan who focuses on multimedia forensics. "The monster is already on the loose."

It arrived last month with the beta phase of ElevenLabs' voice synthesis platform, which allowed users to generate realistic audio of any person's voice by uploading a few minutes of audio samples and typing in any text for it to say.

The startup says the technology was developed to dub audio in different languages for movies, audiobooks and gaming to preserve the speaker鈥檚 voice and emotions.

Social media users quickly began sharing an AI-generated audio sample of Hillary Clinton reading the same transphobic text featured in the Biden clip, along with fake audio clips of Bill Gates supposedly saying that the COVID-19 vaccine causes AIDS and actress Emma Watson purportedly reading Hitler's manifesto "Mein Kampf."

Shortly after, ElevenLabs said it was seeing "an increasing number of voice cloning misuse cases" and announced that it was exploring safeguards to tamp down on abuse. One of the first steps was to make the feature available only to those who provide payment information. Initially, anonymous users were able to access the voice cloning tool for free. The company also claims that if there are issues, it can trace any generated audio back to the creator.

But even the ability to track creators won't mitigate the tool's harm, said Hany Farid, a professor at the University of California, Berkeley, who focuses on digital forensics and misinformation.

"The damage is done," he said.

As an example, Farid said bad actors could move the stock market with fake audio of a top CEO saying profits are down. And already there's a clip on YouTube that used the tool to alter a video to make it appear Biden said the U.S. was launching a nuclear attack against Russia.

Free and open-source software with the same capabilities has also emerged online, meaning paywalls on commercial tools aren't an impediment. Using one free online model, the AP generated audio samples to sound like actors Daniel Craig and Jennifer Lawrence in just a few minutes.

"The question is where to point the finger and how to put the genie back in the bottle?" Malik said. "We can't do it."

When deepfakes first made headlines about five years ago, they were easy enough to detect since the subject didn't blink and audio sounded robotic. That's no longer the case as the tools become more sophisticated.

The altered video of Biden making derogatory comments about transgender people, for instance, combined the AI-generated audio with a real clip of the president, taken from a Jan. 25 CNN live broadcast announcing the U.S. dispatch of tanks. Biden's mouth was manipulated in the video to match the audio. While most Twitter users recognized that the content was not something Biden was likely to say, they were nevertheless shocked at how realistic it appeared. Others appeared to believe it was real — or at least didn't know what to believe.

Hollywood studios have long been able to distort reality, but access to that technology has been democratized without considering the implications, said Farid.

"It's a combination of the very, very powerful AI based technology, the ease of use, and then the fact that the model seems to be: let's put it on the internet and see what happens next," Farid said.

Audio is just one area where AI-generated misinformation poses a threat.

Free online AI image generators like Midjourney and DALL-E can churn out photorealistic images of war and natural disasters in the style of legacy media outlets with a simple text prompt. Last month, some school districts in the U.S. began blocking ChatGPT, which can produce readable text — like student term papers — on demand.

ElevenLabs did not respond to a request for comment.

The Associated Press. All rights reserved.
