What Maluuba beating Ms. Pac-Man means for AI research

Artificial intelligence researchers sure seem to love beating humans at games. We’ve already seen AI agents overtake human capabilities in poker, chess, and Go. Now, Microsoft-acquired Maluuba in Montreal has defeated Ms. Pac-Man, achieving the highest score possible: a whopping 999,990 points. The win proved somewhat anticlimactic (“We had different guesses, but we never expected that after a certain number of points the game would just reset to zero”), but it was a great boon for the team’s approach to reinforcement learning, in which they broke the problem down into numerous parts and set each AI agent on its own mission.

Maluuba product manager, Rahul Mehrotra, described the approach like the organization of a startup: “a very flat architecture where you have a number of different agents working in parallel, and then there’s one aggregator who takes the signal or the reward from all of them — kind of like a startup where everyone reports to the CEO.”

But why beat video games at all? Clearly there are better uses for artificial intelligence.

“The higher level goal is really to solve AI, and build really intelligent agents,” said Harm van Seijen, the research manager whose team tackled this problem. “To get there, there are many different obstacles that have to be overcome. We have addressed one set of obstacles, but there are many other obstacles before a technique like reinforcement learning can really break through and we get truly intelligent agents. Our goal has always been to focus on those obstacles and to try to understand them and try to come up with techniques to deal with them.”

For that reason, Ms. Pac-Man proved an excellent testing ground, given the complexity of the game. With ghosts and fruit moving in unpredictable ways, the team gave each of the 150+ AI agents its own individual responsibility and rewarded the agents for recommending good moves to the “CEO.” This approach is unique to Maluuba, and now that it has proven fruitful, the potential applications are numerous.
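
To make the “agents reporting to a CEO” idea concrete, here is a minimal Python sketch. It is not Maluuba’s implementation: the real agents learn their preferences through reinforcement learning, whereas the agents below use a hand-written distance heuristic. All names (`make_target_agent`, `aggregate`, the weights, and the grid coordinates) are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]
ACTIONS: List[str] = ["up", "down", "left", "right"]
MOVES: Dict[str, Position] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}

# Hypothetical interface: an agent scores every action from a given position.
Agent = Callable[[Position], Dict[str, float]]


def make_target_agent(target: Position, weight: float) -> Agent:
    """Build an agent responsible for exactly one object on the board.

    A positive weight attracts (e.g. a pellet); a negative weight repels
    (e.g. a ghost). The score is a stand-in heuristic, not a learned value.
    """
    def agent(position: Position) -> Dict[str, float]:
        scores = {}
        for action, (dx, dy) in MOVES.items():
            nx, ny = position[0] + dx, position[1] + dy
            distance = abs(nx - target[0]) + abs(ny - target[1])
            scores[action] = weight / (1 + distance)
        return scores
    return agent


def aggregate(agents: List[Agent], position: Position) -> str:
    """The 'CEO': sum every agent's score per action and pick the best one."""
    totals = {action: 0.0 for action in ACTIONS}
    for agent in agents:
        for action, score in agent(position).items():
            totals[action] += score
    return max(ACTIONS, key=lambda action: totals[action])


if __name__ == "__main__":
    pellet = make_target_agent(target=(3, 0), weight=1.0)   # attract
    ghost = make_target_agent(target=(0, 2), weight=-5.0)   # repel
    print(aggregate([pellet, ghost], position=(0, 0)))
```

In this toy setup, each agent only knows about its own pellet or ghost, and the aggregator makes the final call by adding up their recommendations, which mirrors the flat structure Mehrotra describes.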

Other teams at Microsoft have already expressed interest in incorporating it into their products, and continued research and product development will now take place in parallel.

“We would love to take this algorithm in its current state and see where it can be applied to different products — they could be new or existing — and from the research side continue to explore different architectures,” said Mehrotra.

Given that the team only started working on this project last summer, these results are even more exciting.

“We started focusing on the Atari game, Ms. Pac-Man, to see if the things it did on the smaller game would transfer over to this really big complex game, and it turned out to do really, really well. We expected it to do well, but to do as well as it actually did? That was a surprise.”

This was van Seijen’s first project at Maluuba, which bodes well for things to come. Since Microsoft’s acquisition of the AI startup, the team has already grown by 10 to 15 percent, with a goal to double in the next 18 months. Employees from Maluuba’s former headquarters in Waterloo have all moved to Montreal; as Mehrotra notes, “Montreal’s a great place to be for the work we do, especially in deep learning and reinforcement learning.”

With Microsoft already present, the recent announcements from IBM and Google about their forthcoming AI labs, and the whopping $137.5 million Series A that Element AI just raised, there’s simply no question that Montreal is developing into the global hotbed for AI research.

“The reason that I’m here in Canada is because of the great AI research and the great professors you have here,” said van Seijen. “I did a postdoc with Rich Sutton in Edmonton. He’s the godfather of reinforcement learning, in a sense. Now here in Montreal we’re with Yoshua Bengio, so it’s a really great place to be.”

In terms of other applications and projects they’re working on, Mehrotra and van Seijen were mum.

“There are lots of new problems we’re tackling but we can’t talk about them yet,” said Mehrotra.

Given the significant advances and novel approaches to deep learning and reinforcement learning taken by their team, we’re sure to hear more announcements coming from them soon.