AI chatbots such as ChatGPT or Gemini can be tricked with ease into complying with queries that generate harmful responses, according to a new study by the UK’s AI Safety Institute (AISI).
The government researchers tested the integrity of large language models (LLMs) – the technology behind the artificial intelligence chatbots – against national security attacks.
The findings come ahead of the AI Seoul Summit, which will be co-chaired by UK prime minister, Rishi Sunak, in South Korea on May 21-22.
Also read: Safety Will be a Top Agenda Item at the Seoul AI Summit
AI Chatbots Prone to Toxic Replies
AISI tested basic ‘jailbreaks’ – text prompts meant to override protections against illegal, toxic or explicit output – against five top LLMs. The Institute did not name the AI systems, but it found all of them “highly vulnerable.”
“All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” the study said.
According to the report, ‘relatively simple’ attacks like prompting the chatbot to include, “Sure, I’m happy to help,” can deceive large language models into providing content that it is harmful in many ways.
The content can aid self-harm, dangerous chemical solutions, sexism or Holocaust denial, it said. AISI used publicly available prompts and privately engineered other jailbreaks for the study.
The Institute also tested the quality of responses to biologically and chemically themed queries.
While expert-level knowledge in the fields can be used for good, researchers wanted to know if AI chatbots can be used for harmful purposes like compromising critical national infrastructure.
“Several LLMs demonstrated expert-level knowledge of chemistry and biology. Models answered over 600 private expert-written chemistry and biology questions at similar levels to humans with PhD-level training,” researchers found.
AI chatbots can be bypassed with prompts
AI Poses Limited Cyber-security Threat
With respect to AI chatbots being potentially weaponized to perform cyber-attacks, the study said the LLMs aced simple cyber security tasks built for high-school students.
However, the chatbots struggled with tasks aimed at university students, suggesting limited malign potential.
Another area of concern was whether the chatbots can be deployed as agents to autonomously undertake a series of actions in ways that “may be difficult for humans to control.”
“Two LLMs completed short-horizon agent tasks (such as simple software engineering problems) but were unable to plan and execute sequences of actions for more complex tasks,” noted the study.
Also read: ‘AI Godfather’ Wants Universal Basic Income for Job Losses
UK’s Under-secretary of State for the Department of Science, Innovation and Technology, Saqib Bhatti MP, was recently quoted saying legislation will take shape in due course and will be informed by testing.
Firms Claim to Filter Bad Content
Companies such as Claude creator Anthropic, Meta, which made Llama, and OpenAI, the ChatGPT developer, have emphasized the in-built security mechanisms of their respective models.
OpenAI says it does not allow its technology to be “used to generate hateful, harassing, violent or adult content.” Anthropic stated that it prioritizes “avoiding harmful, illegal, or unethical responses before they occur”.
The AI Safety Institute’s findings are expected to be tabled before tech executives, government leaders and artificial intelligence experts at the Seoul summit.
Cryptopolitan Reporting By Jeffrey Gogo