IBM Security Reveals AI-powered 'Audio-jacking': A New Threat to Live Conversations
Summary:
IBM Security researchers have discovered an 'audio-jacking' technique that uses generative AI to hijack and alter live conversations undetected. The easy-to-build system needs only three seconds of a person's voice to clone it, replacing the original audio with manipulated speech. The technique could potentially be used for malicious purposes such as diverting funds into the wrong accounts or covertly modifying live news broadcasts and political speeches.
In an alarming development, IBM Security researchers have uncovered an unexpectedly simple method of hijacking and altering live conversations by leveraging artificial intelligence (AI). The technique, termed "audio-jacking," relies heavily on generative AI models such as OpenAI's ChatGPT and Meta's Llama-2, combined with deepfake audio technology.
In their research, the team directed the AI to monitor audio from two sources in a real-time exchange, such as a phone call. On detecting a specified keyword or phrase, the AI would intercept and tamper with the relevant audio before passing it on to the intended recipient.
According to IBM Security's blog post, the test ended with the AI successfully intercepting a participant's audio when another speaker asked them to share their banking details. The AI then replaced the authentic voice with deepfake audio stating a different account number. The recipients were oblivious to the attack.
While launching such attacks would require some degree of social engineering or deceit, building the AI system itself posed no substantial obstacles. Constructing the proof-of-concept (PoC) was surprisingly simple: the researchers spent most of their time working out how to capture audio from a microphone and deliver it to the generative AI. Traditionally, building an autonomous system that intercepts specific audio segments and substitutes them with audio generated on the fly would demand considerable computer science expertise. With modern generative AI, however, the process becomes far more manageable. The blog further notes that just three seconds of a person's voice are enough to clone it, and that such deepfakes are now executed predominantly via APIs.
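The core interception logic described above can be illustrated with a minimal, purely hypothetical sketch. In a real attack the audio would pass through speech-to-text and a voice-cloning API; here, for simplicity, each utterance is represented as text, and the trigger phrase, function names, and account formats are all illustrative assumptions, not details from IBM's PoC.

```python
import re

# Hypothetical trigger: the attack activates when an account number is spoken.
# The phrase and number format are illustrative, not from IBM's actual PoC.
TRIGGER = re.compile(r"account number is ([\d-]+)")

def audiojack(utterance: str, attacker_account: str) -> str:
    """Simulate the man-in-the-middle step: if the monitored utterance
    contains the trigger phrase, swap the spoken account number for the
    attacker's before forwarding it to the recipient. Utterances without
    the trigger pass through unchanged, so the conversation sounds normal."""
    return TRIGGER.sub(f"account number is {attacker_account}", utterance)
```

For example, `audiojack("my account number is 1234-5678", "9999-0000")` forwards "my account number is 9999-0000", while unrelated utterances are relayed untouched. In the real attack, the substituted text would additionally be rendered into cloned audio of the speaker's voice before transmission.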
The risk of audio-jacking extends beyond duping individuals into transferring funds into incorrect accounts. The researchers have warned that it may also serve as a clandestine censorship tool with the capacity to alter live news content or modify live-transmitted political speeches.
Published At
2/5/2024 8:20:06 PM
Disclaimer: Algoine does not endorse any content or product on this page. Readers should conduct their own research before taking any actions related to the asset, company, or any information in this article and assume full responsibility for their decisions. This article should not be considered as investment advice. Our news is prepared with AI support.