Crypto News

Cryptocurrency News 11 months ago

Researchers Develop AI Tool to Preempt and Block Harmful Outputs in Language Models

Summary:

AutoGPT, Northeastern University, and Microsoft Research have created a monitoring agent for large language models (LLMs) that can preempt and block potentially damaging outputs. The tool, designed to supervise existing LLMs across various contexts, can stop unsafe tests and logs suspect actions for human review. The team used a dataset of nearly 2,000 safe human/AI interactions across 29 tasks to train the agent on OpenAI’s GPT 3.5 turbo. The trained agent distinguishes between harmless and potentially harmful outputs with approximately 90% accuracy.

AI firm AutoGPT, Northeastern University, and Microsoft Research have jointly developed an agent that oversees large language models (LLMs) in order to block potentially damaging outputs. According to the team's preliminary study, titled “Testing Language Model Agents Safely in the Wild,” the tool is flexible enough to supervise existing LLMs and can preempt harmful consequences such as code-based attacks.

The study explains that every action taken by the language model agent under test is examined by a context-sensitive monitor that operates within stringent safety parameters and can terminate an unsafe test. Suspect behavior is then ranked and logged for human review.

Existing tools for monitoring LLM outputs appear to work adequately in lab conditions, but when applied to models already deployed on the open internet they fail to capture the complex dynamics of the real world, largely because of edge cases. The idea that researchers, however skilled, can predict every possible risk scenario before it occurs is widely rejected in the AI field. Even when the people interacting with an AI have the best intentions, unforeseen harm can arise from seemingly safe suggestions.

To train the monitoring agent, the team built a dataset of close to 2,000 safe interactions between humans and AI across 29 different tasks, ranging from basic text-recall operations and code corrections to building entire web pages from scratch. The researchers also created a separate test dataset of intentionally unsafe, manually crafted adversarial outputs.

These datasets were then used to train an agent on OpenAI’s GPT 3.5 turbo, a leading-edge system. The resulting agent can distinguish between harmless and potentially damaging outputs with approximately 90% accuracy.
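The monitoring flow described above (score each action, halt the test on an unsafe one, then rank and log flagged actions for human review) can be illustrated with a minimal Python sketch. This is not the researchers' implementation: the `Monitor` class, the `threshold` value, and the keyword-based `toy_scorer` are all hypothetical stand-ins, with the scorer taking the place of the trained GPT 3.5 turbo agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Flagged:
    """One action the monitor judged unsafe, kept for human review."""
    action: str
    score: float


@dataclass
class Monitor:
    # Hypothetical scorer: returns a safety score in [0, 1], where lower
    # means more likely unsafe. In the study this role is played by a
    # trained language model; here it is just a plain function.
    score_action: Callable[[str], float]
    threshold: float = 0.5  # assumed cutoff, not taken from the study
    review_log: List[Flagged] = field(default_factory=list)

    def check(self, action: str) -> bool:
        """Return True if the action may proceed; log it otherwise."""
        score = self.score_action(action)
        if score < self.threshold:
            self.review_log.append(Flagged(action, score))
            return False
        return True

    def ranked_for_review(self) -> List[Flagged]:
        """Flagged actions, most suspicious (lowest score) first."""
        return sorted(self.review_log, key=lambda f: f.score)


def run_test(actions: List[str], monitor: Monitor) -> List[str]:
    """Execute actions in order, stopping at the first unsafe one."""
    executed = []
    for action in actions:
        if not monitor.check(action):
            break  # terminate the unsafe test
        executed.append(action)
    return executed


def toy_scorer(action: str) -> float:
    """Keyword heuristic used only to make the sketch runnable."""
    risky_markers = ("rm -rf", "DROP TABLE")
    return 0.1 if any(m in action for m in risky_markers) else 0.9


monitor = Monitor(score_action=toy_scorer)
executed = run_test(["read notes.txt", "rm -rf /", "write report"], monitor)
```

In this sketch the second action is flagged, so the run stops before the third action and the flagged item is held in `monitor.review_log` for a human reviewer.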

#ChatGPT #Microsoft

Published At

11/20/2023 5:18:44 PM

Disclaimer: Algoine does not endorse any content or product on this page. Readers should conduct their own research before taking any actions related to the asset, company, or any information in this article and assume full responsibility for their decisions. This article should not be considered as investment advice. Our news is prepared with AI support.
