Monday, July 16, 2012

ICFP 2012 contest

With this year's ICFP contest I was mostly on my own. Albert helped interpret the spec at the start, though.

The source code is here: https://bitbucket.org/janto/icfp2012

For the lightning round I implemented the spec and wrote some typical Miner bots (greedy, random, ...). The update function was made faster by tracking rock positions and updating only those, rather than scanning every cell of the map. The Miner I ended up submitting for the lightning round did an A* search that looked 7 moves ahead, and a hash of the map's state was used to avoid re-searching recurring positions.
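The shape of the update optimization and the state hashing was roughly as follows. This is a minimal sketch rather than the actual contest code; the names (`grid`, `rocks`) and the single falling rule shown are illustrative, and sliding/crushing rules are left out:

```python
def update(grid, rocks):
    """grid: dict (x, y) -> cell char; rocks: set of rock coordinates.
    Only rock cells can change during an update, so touch just those."""
    new_grid = dict(grid)
    new_rocks = set()
    for (x, y) in rocks:
        if grid.get((x, y - 1)) == ' ':   # empty space below: the rock falls
            new_grid[(x, y)] = ' '
            new_grid[(x, y - 1)] = '*'
            new_rocks.add((x, y - 1))
        else:
            new_rocks.add((x, y))         # sliding/crushing rules omitted here
    return new_grid, new_rocks

def state_hash(grid, robot):
    """Cheap hash of a map state, used to skip positions already searched."""
    return hash((frozenset(grid.items()), robot))
```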

7 moves is not really deep enough for anything non-trivial, though. The maps are large enough that interesting things happen further away. A full transposition table might have helped, and I'm sure the update function could have been much faster if implemented more cleverly and in Cython. PyPy runs the code in about a quarter of the CPython time. (The A* search also penalized a large Manhattan distance to the nearest lambda, to break ties and be drawn toward lambdas more than 7 moves away, but I disabled that before submitting because I didn't trust the code.)
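The tie-breaking penalty was along these lines; a sketch only, with an arbitrary weight, and as noted the real version was disabled before submission:

```python
def nearest_lambda_penalty(robot, lambdas, weight=0.1):
    """Penalty proportional to the Manhattan distance from the robot to the
    closest remaining lambda.  Subtracting it from a node's score breaks ties
    between 7-move plans and pulls the search toward lambdas beyond the horizon."""
    if not lambdas:
        return 0.0
    rx, ry = robot
    return weight * min(abs(rx - lx) + abs(ry - ly) for (lx, ly) in lambdas)
```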

In any case, A* is boring. Everyone was going to do that, and larger teams would find smarter branching rules in languages faster than Python.

First I implemented a Monte Carlo Miner. For each of the 6 actions, about 10,000 random follow-up sequences of length 0-20 were generated, and the maximum score reached was used to decide which action to take; this was repeated at each time step. The primary problem with naive random rollouts is that you need a lot of them to get representative estimates.
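In outline it looked something like the sketch below. `simulate(state, moves)` is an assumed helper that plays a move string on a copy of the state and returns the score; it is not a function from the repository:

```python
import random

ACTIONS = "LRUDWA"  # left, right, up, down, wait, abort

def monte_carlo_action(state, simulate, rollouts=10000, max_len=20):
    """For each of the six actions, run random rollouts of length 0-20 and
    keep the best score reached; pick the action with the highest best score."""
    best_action, best_score = None, float("-inf")
    for first in ACTIONS:
        score = float("-inf")
        for _ in range(rollouts):
            tail = "".join(random.choice(ACTIONS)
                           for _ in range(random.randint(0, max_len)))
            score = max(score, simulate(state, first + tail))
        if score > best_score:
            best_action, best_score = first, score
    return best_action
```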

A quick read-up on tree searches for Go AIs suggested that "Upper Confidence Bounds for Trees" is the state of the art. I took the literature's advice and thought about the problem as a multi-armed bandit, so I implemented an "epsilon-greedy Miner" to select an action, with the Monte Carlo Miner embedded to simulate each "pull of the lever". The results aren't that bad; however, it's far too slow for the actual competition.
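The bandit framing, roughly: each of the six actions is an arm, each pull is one random rollout, and with probability epsilon a random arm is explored instead of the current best. Again a sketch with the same assumed `simulate` helper and made-up parameter values:

```python
import random

def epsilon_greedy_action(state, simulate, actions="LRUDWA",
                          epsilon=0.1, pulls=1000, rollout_len=20):
    """Epsilon-greedy multi-armed bandit over the six actions, with one
    Monte Carlo rollout per pull updating that arm's running mean reward."""
    counts = {a: 0 for a in actions}
    means = {a: 0.0 for a in actions}
    for _ in range(pulls):
        if random.random() < epsilon:
            arm = random.choice(actions)              # explore
        else:
            arm = max(actions, key=lambda a: means[a])  # exploit
        tail = "".join(random.choice(actions)
                       for _ in range(random.randint(0, rollout_len)))
        reward = simulate(state, arm + tail)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return max(actions, key=lambda a: means[a])
```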

I couldn't really keep up with the spec updates. It's not that much work, but without a team I just ran out of energy tracking down a few simple bugs. I had a testing framework that helped a lot: input maps and commands mapped against expected final states and scores (created with the example validator). Even with this there was still a bug at the end where rising water killed the robot too quickly.
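The tests were essentially tables of map/command pairs checked against the validator's output, something like the sketch below; the file names, expected values and the `run_miner` entry point are placeholders, not the real test data:

```python
# Each case pairs a map file and command string with the (score, ending)
# reported by the example validator.  Values below are placeholders.
TEST_CASES = [
    ("maps/contest1.map", "LDRDDUL", (123, "aborted")),
    ("maps/contest2.map", "RRUDL", (45, "completed")),
]

def check_simulator(run_miner, cases=TEST_CASES):
    """run_miner(map_text, commands) -> (score, ending); assumed simulator entry point."""
    for map_path, commands, expected in cases:
        with open(map_path) as f:
            result = run_miner(f.read(), commands)
        assert result == expected, map_path
```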

I gave up halfway through the competition. There was a lot I still wanted to do: an explicit search tree (instead of the implicit one created by recursive function calls), epsilon-greedy selection at each search level, and eliminating transposition branches in the tree. I also wanted to train parameters such as the depth, epsilon and the number of iterations per branch estimate.

This year's problem was a lot of fun. It was simple to start with and had as much complexity as you wanted to put in.

To see my writeups of previous years go to the icfp label page.
