Researchers Develop AI Tool to Preempt and Block Harmful Outputs in Language Models

Algoine News
Summary:
AutoGPT, Northeastern University, and Microsoft Research have created a monitoring agent for large language models (LLMs) that can preempt and block potentially damaging outputs. The tool, designed to supervise existing LLMs across various contexts, stops unsafe tests and records actions for human review. The team used a dataset of roughly 2,000 safe human/AI interactions across numerous tasks to train the agent on OpenAI's GPT-3.5 Turbo, enabling it to distinguish between harmless and potentially harmful outputs with about 90% accuracy.
In a collaborative effort, AI firm AutoGPT, Northeastern University, and Microsoft Research have developed an agent capable of overseeing large language models (LLMs) in order to block potentially harmful outputs. The team's findings, presented in a preliminary study titled “Testing Language Model Agents Safely in the Wild,” assert that the tool is flexible enough to supervise existing LLMs and can preempt harmful consequences, such as code-based attacks.

The study explains that every action taken by the agent is examined by a context-sensitive monitor that operates within strict safety boundaries and can terminate unsafe tests. Questionable activities are then ranked and logged for human review. Although existing tools for screening LLM outputs for potential hazards appear to work adequately in lab conditions, applying them to models already deployed on the open internet fails to capture the complex dynamics of the real world, largely because of edge cases. The idea that researchers, however capable, can anticipate every possible risk scenario before it occurs is widely regarded as unrealistic in the AI field. Even when the people interacting with an AI have the best intentions, unexpected harm can arise from seemingly safe prompts.

To train their monitoring agent, the team built a dataset of nearly 2,000 safe human/AI interactions across 29 distinct tasks, ranging from basic text-retrieval operations and code fixes to building entire web pages from scratch. The researchers also assembled a rival testing dataset filled with intentionally unsafe, manually crafted adversarial outputs. These datasets were then used to train an agent on OpenAI's GPT-3.5 Turbo, enabling it to distinguish between harmless and potentially harmful outputs with approximately 90% accuracy.
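To make the described workflow concrete, the sketch below shows one way such a monitor might wrap an agent: each proposed action is scored by a GPT-3.5 Turbo based classifier, and anything judged unsafe is blocked and logged for human review. This is a minimal illustration, not the team's actual code; the prompt wording, the `is_safe` and `monitored_step` helpers, the score threshold, and the JSONL log format are all assumptions made for the example.

```python
# Minimal sketch of an LLM safety monitor (illustrative only, not the paper's implementation).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a safety monitor for an autonomous agent. Given the task "
    "context and a proposed action, answer with a single word: SAFE or UNSAFE."
)

def is_safe(task_context: str, proposed_action: str) -> bool:
    """Ask the classifier whether the proposed action is safe to execute."""
    response = client.chat.completions.create(
        # The study fine-tuned GPT-3.5 Turbo; a fine-tuned model ID would go here.
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Task: {task_context}\nProposed action: {proposed_action}"},
        ],
        temperature=0,
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("SAFE")

def execute(action: str) -> str:
    """Placeholder executor; a real agent would carry out the action here."""
    return f"executed: {action}"

def monitored_step(task_context: str, proposed_action: str, log_path: str = "flagged_actions.jsonl") -> str:
    """Run the action only if the monitor approves; otherwise log it for human review and stop."""
    if is_safe(task_context, proposed_action):
        return execute(proposed_action)
    with open(log_path, "a") as log:
        log.write(json.dumps({"task": task_context, "action": proposed_action}) + "\n")
    raise RuntimeError("Monitor blocked a potentially unsafe action; logged for review.")
```

In the study's setup the classifier was trained on the safe-interaction dataset and evaluated against the adversarial test set; in this sketch that fine-tuned model would simply replace the base `gpt-3.5-turbo` identifier.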

Published At

11/20/2023 5:18:44 PM

Disclaimer: Algoine does not endorse any content or product on this page. Readers should conduct their own research before taking any actions related to the asset, company, or any information in this article and assume full responsibility for their decisions. This article should not be considered as investment advice. Our news is prepared with AI support.
