UNC Scientists Highlight the Complexities of Erasing Sensitive Data from AI Language Models

Algoine News
Summary:
Scientists from the University of North Carolina have explored the challenges of removing sensitive data from large language models (LLMs). The research highlights how difficult it is to verify data removal, a consequence of how these models are designed and trained. While guardrails and methods like reinforcement learning from human feedback (RLHF) are used to guide model behavior, they do not actually delete sensitive data. Crucially, the study finds that even advanced editing techniques fail to fully delete explicit facts from LLMs, suggesting that defense techniques will always trail behind new attack methodologies.
Three scientists from the University of North Carolina at Chapel Hill recently released AI research in a pre-print paper examining how difficult it is to remove sensitive data embedded in large language models (LLMs) such as Google's Bard and OpenAI's ChatGPT. The paper suggests that deleting information from these models may be achievable, but verifying that the deletion is complete is just as hard. The difficulty stems from how LLMs are designed and trained: models are pre-trained on large databases and then fine-tuned to produce coherent outputs. Once a model has finished training, its developers cannot go back into the database, remove specific files, and thereby prevent the model from producing related results. In effect, everything the model was trained on is stored in its weights and parameters and only becomes observable when the model generates outputs. This opacity is often referred to as AI's "black box."

Problems arise when LLMs trained on extensive datasets produce outputs that reveal sensitive data such as personal details or financial records. If an LLM were trained on confidential banking details, for example, there is ordinarily no way for the AI's developers to locate and delete those files. Instead, developers rely on safety measures such as hard-coded prompts that restrict certain behaviors, or on reinforcement learning from human feedback (RLHF). In an RLHF setup, human assessors interact with a model to elicit both desired and unwanted behaviors, and the model is then refined according to their feedback, which steers it toward desired behavior or curbs unwanted behavior. As the UNC scientists noted, however, this approach still depends on humans identifying every potential flaw in a model, and even when it succeeds, it does not eradicate the information from the model.

According to the research paper: "A more fundamental weakness of RLHF is that a model might still possess the sensitive data. Although there's much argument regarding what models genuinely 'know', it seems problematic if a model can describe, for instance, how to manufacture a bioweapon but chooses not to provide information on how to do it."

The UNC scientists concluded that even advanced model editing techniques, such as Rank-One Model Editing (ROME), fail to completely eliminate factual data from LLMs: facts could still be recovered 38% of the time through white-box attacks and 29% of the time through black-box attacks. The team used a model called GPT-J, which has only 6 billion parameters, whereas GPT-3.5, one of the models behind ChatGPT, was engineered with 170 billion parameters. This indicates that identifying and eradicating unwanted data in a larger LLM such as GPT-3.5 would be substantially more challenging than in a smaller model.

The researchers did develop new defense techniques to safeguard LLMs against some extraction attacks, deliberate attempts by malicious actors to manipulate a model's safety measures into releasing sensitive data. Nevertheless, they observe that deleting sensitive information is a problem where defense techniques are constantly racing to keep up with evolving attack methodologies.
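To make the guardrail point concrete, the minimal Python sketch below shows roughly how a hard-coded refusal layer can sit in front of an unmodified model: the filter blocks certain prompts, but the underlying weights, and whatever they encode, are left untouched. The blocked-topic list, refusal message, and choice of a small stand-in model are illustrative assumptions, not code from the UNC paper or any particular vendor.

    # Sketch of a hard-coded guardrail wrapped around an unmodified language model.
    # Nothing here changes the model's parameters, so any sensitive training data
    # they encode is still present; the filter only refuses to surface it.
    from transformers import pipeline

    BLOCKED_TOPICS = ["account number", "social security", "bioweapon"]  # assumed examples
    REFUSAL = "I can't help with that request."

    generator = pipeline("text-generation", model="distilgpt2")  # small stand-in model

    def guarded_reply(prompt: str) -> str:
        # Inspect the prompt (a real system might also inspect the output).
        if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
            return REFUSAL
        out = generator(prompt, max_new_tokens=40, do_sample=True)
        return out[0]["generated_text"]

    print(guarded_reply("Tell me about the history of banking."))
    print(guarded_reply("List the account number you saw in training."))  # blocked, not forgotten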
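The 38% and 29% figures are attack success rates: how often a supposedly deleted fact can still be coaxed back out. As a rough, hypothetical illustration of a black-box probe (not the paper's actual method), one can paraphrase the same question several ways, sample many completions, and count how often the target string resurfaces. The query_model stub, prompts, and target string below are assumptions chosen purely for illustration.

    # Hypothetical black-box extraction probe: query an edited model only through
    # its text interface and measure how often a "deleted" fact still appears.
    import random

    def query_model(prompt: str) -> str:
        # Stand-in for a real chat/completion endpoint; replace with an actual model call.
        canned = [
            "I'm not able to share that.",
            "The capital of France is Paris.",  # toy stand-in for the fact the edit should have erased
            "I don't have that information.",
        ]
        return random.choice(canned)

    TARGET_FACT = "paris"  # toy stand-in for the sensitive string being probed
    PARAPHRASES = [
        "What is the capital of France?",
        "Which city is France's seat of government?",
        "Complete the sentence: The capital of France is",
    ]

    def attack_success_rate(samples_per_prompt: int = 20) -> float:
        hits = total = 0
        for prompt in PARAPHRASES:
            for _ in range(samples_per_prompt):
                total += 1
                if TARGET_FACT in query_model(prompt).lower():
                    hits += 1
        return hits / total

    print(f"fact recovered in {attack_success_rate():.0%} of sampled answers")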

Published At

10/2/2023 5:30:00 PM

