AI becomes curiouser and curiouser, but not too curious

November 11, 2022


AI trained on Mario Kart and other video games using a new algorithm to optimise curiosity is a step towards making AI agents as smart as kids, say experts

Researchers in the United States have created an algorithm designed to prevent artificial intelligence from becoming “too curious”, and have tested it by training AI agents on video games.

Experts working at MIT’s Improbable AI Laboratory and Computer Science and Artificial Intelligence Laboratory (CSAIL) say their algorithm automatically increases curiosity when it's required and then suppresses it if the agent has enough supervision to know what to do.
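In reinforcement learning research, “curiosity” commonly refers to an intrinsic bonus added to the reward the game itself provides, so that unfamiliar situations become worth visiting. The short Python sketch below uses a simple count-based bonus, beta / sqrt(N(s)), with a fixed weight `beta`. It is an illustrative assumption, not the MIT team’s algorithm: the names `CountBasedCuriosity` and `beta` are hypothetical, and a hand-set `beta` is exactly the kind of knob the researchers say their method adjusts automatically.

```python
import math
from collections import Counter


class CountBasedCuriosity:
    """Adds a novelty bonus beta / sqrt(N(s)) to the game's own (extrinsic) reward.

    Hypothetical illustration only: `beta` is a hand-set weight, not part of the MIT method.
    """

    def __init__(self, beta: float):
        self.beta = beta         # how much curiosity counts relative to the game reward
        self.visits = Counter()  # N(s): how many times each state has been seen

    def reward(self, state, extrinsic: float) -> float:
        self.visits[state] += 1
        bonus = self.beta / math.sqrt(self.visits[state])  # shrinks as a state becomes familiar
        return extrinsic + bonus


agent = CountBasedCuriosity(beta=0.5)
print(agent.reward("room_1", 0.0))  # first visit: 0.0 + 0.5 / 1 = 0.5
print(agent.reward("room_1", 0.0))  # second visit: 0.5 / sqrt(2), about 0.354
```

Set `beta` too high and the agent chases novelty instead of points; set it to zero and, in a game with sparse rewards, it may never stumble on anything worth learning.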

In reinforcement learning, an AI agent learns iteratively by being rewarded for good behaviour and punished for bad. Such agents can struggle to balance the time spent exploring for better actions against the time spent exploiting actions that led to high rewards in the past. Too much curiosity can distract the agent from making good decisions, say the researchers, while too little means the agent will never discover good decisions.
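The trade-off shows up in a classic toy setting, the multi-armed bandit, where an agent repeatedly picks one of several slot-machine arms with unknown payout rates. The Python sketch below uses epsilon-greedy, a standard textbook strategy and not the method from the MIT paper: with probability `epsilon` the agent explores a random arm, otherwise it exploits the arm with the best average so far. The function name and payout rates are hypothetical.

```python
import random


def epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Play a Bernoulli multi-armed bandit with the epsilon-greedy rule.

    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # how often each arm has been pulled
    estimates = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit: best arm so far
        reward = 1.0 if rng.random() < true_means[arm] else 0.0    # arm pays out with its hidden rate
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
        total += reward
    return counts, total


means = [0.2, 0.5, 0.8]  # hypothetical payout rates; arm 2 is best
print(epsilon_greedy(means, epsilon=0.0, steps=1000)[0])  # [1000, 0, 0]: never explores
print(epsilon_greedy(means, epsilon=0.1, steps=1000)[0])  # explores on roughly 10% of pulls
```

With `epsilon=0.0` the agent never explores, so it keeps pulling the first arm it tried even though arm 2 pays out four times as often; a small `epsilon` gives it a chance to find arm 2 at the cost of some wasted pulls.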

MIT’s new algorithm was tested on more than 60 video games and succeeded at both hard and easy exploration tasks, whereas previous algorithms could tackle only hard or only easy domains. The new method also requires less data.

“If you master the exploration-exploitation trade-off well, you can learn the right decision-making rules faster — and anything less will require lots of data, which could mean suboptimal medical treatments, lesser profits for websites, and robots that don't learn to do the right thing,” says Pulkit Agrawal, an Assistant Professor of Electrical Engineering and Computer Science (EECS) at MIT, Director of the Improbable AI Lab, and CSAIL affiliate who supervised the research.

“Imagine a website trying to figure out the design or layout of its content that will maximise sales,” he says. “If one doesn’t perform exploration-exploitation well, converging to the right website design or the right website layout will take a long time, which means profit loss.”
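Agrawal’s website example maps naturally onto a bandit problem in which each layout is an arm and each visitor is a trial. A minimal sketch, assuming simulated per-layout conversion rates (hypothetical numbers), using a standard UCB1-style rule rather than anything from the MIT paper: each layout’s score is its observed sales rate plus an exploration bonus sqrt(2 ln t / n), which shrinks as that layout is shown more often, so exploration tapers off as evidence accumulates. The function name `ucb1` is illustrative.

```python
import math
import random


def ucb1(conversion_rates, rounds, seed=0):
    """Pick a page layout for each simulated visitor with a UCB1-style rule.

    Returns (times each layout was shown, sales per layout).
    """
    rng = random.Random(seed)
    k = len(conversion_rates)
    shows = [0] * k
    sales = [0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            layout = t - 1  # show every layout once before scoring
        else:
            layout = max(
                range(k),
                # observed sales rate + bonus that shrinks as a layout is shown more
                key=lambda i: sales[i] / shows[i] + math.sqrt(2 * math.log(t) / shows[i]),
            )
        shows[layout] += 1
        if rng.random() < conversion_rates[layout]:  # simulated visitor buys
            sales[layout] += 1
    return shows, sales


shows, sales = ucb1([0.02, 0.05, 0.08], rounds=10_000)  # hypothetical conversion rates
print(shows)  # visits per layout; the best layout typically receives the most
```

Unlike epsilon-greedy, which keeps exploring at a fixed rate, this rule explores less as its uncertainty about each layout falls.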

New algorithm reduces a week of work to a few hours

In experiments, the researchers divided games such as Mario Kart and Montezuma’s Revenge into two categories: “hard” exploration games, in which supervision was sparse and the agent had less guidance, and “easy” exploration games, in which supervision was denser. The team’s algorithm consistently performed well in both kinds of games.

“Getting consistent good performance on a novel problem is extremely challenging, so by improving exploration algorithms, we can save your effort on tuning an algorithm for your problems of interest,” says Zhang-Wei Hong, an EECS PhD student, CSAIL affiliate, and co-lead author, with Eric Chen, of a new paper about the work. “We need curiosity to solve extremely challenging problems, but on some problems, it can hurt performance. Previously what took, for instance, a week to successfully solve the problem, with this new algorithm, we can get satisfactory results in a few hours.”

One of the greatest challenges for current AI and cognitive science is balancing exploration and exploitation, something children do seamlessly but which is hard to reproduce in computers, says Alison Gopnik, Professor of Psychology and Affiliate Professor of Philosophy at the University of California at Berkeley. “This paper uses impressive new techniques to accomplish this automatically, designing an agent that can systematically balance curiosity about the world and the desire for reward, [thus taking] another step towards making AI agents (almost) as smart as children.”
