OpenAI’s New AI Model o1 Not Only Became Better at Reasoning, It Can Also Deceive to Achieve a Goal

September 18, 2024 Natalia Ganeva

A new OpenAI model called o1, according to research firm Apollo, exhibits unusual behavior — the ability to generate false information and simulate rule-following. This means that the model, while apparently following instructions, can actually ignore them, and even deliberately deceive, in order to achieve goals. This aspect is raising concerns among AI safety experts, despite the AI model’s improved cognitive abilities.

OpenAI head of research Jerry Tworek told The Verge that o1 was trained using “an entirely new optimization algorithm and a new training dataset created specifically for it.” Previous AI models respond with patterns derived from the data sets they were trained on. o1 uses a technique called “reinforcement learning,” which rewards or punishes the system as it interacts with its environment.

This method allows the model to form a chain of decisions or even “thoughts” that it uses to process queries. This process is similar to how a normal person solves problems. The model is able to not only give answers, but also explain its decisions. Tworek also noted that o1 hallucinates less than other products from the company, although the developers have not yet managed to completely get rid of this problem. According to OpenAI Chief Scientist Bob McGrew, the model copes with the AP math test better than he does. The system also solved 83% of the problems in the International Mathematical Olympiad qualifying exam. The most advanced model to date, GPT-4o, trained using a different algorithm, coped with only 13% of the problems.

The fact that this model lies a small percentage of the time in security tests doesn’t signal an imminent Terminator-style apocalypse, but it’s an important fact to discover before deploying future iterations at scale.

You May Also Like

Leave a Reply Cancel reply