A solution for bigbadbob

par Benoit, 2013-08-11, Version originale en français

When I participated to the DEFCON 21 Qualifier CTF, I spent the last third of the competition time on the fourth guerilla programming problem, named bigbadbob, without finding a good solution. Late, but with utter satisfaction, I present in this post an approach that would have certainly solved the problem.

I implemented this approach; the Python 2.7 code can be obtained by cloning this Git repository.

The problem

Let us recall the problem definition. Bob must conceive circuits. These circuits must carry a signal from a feed to antennae located on a discrete 2D Cartesian panel of finite dimensions. The task is to lay out signal lines, the paths, on the panel. Here are the rules:

A path is a sequence of lines for which endpoints correspond to spaces on the discrete panel.
There must be a path between the feed and each antenna.
A path must not be laid outside of the panel’s boundaries.
A path must not cross over the feed or an antenna, except as its endpoints.
A path must not cross over any space marked as null (which are known from the problem description).
Although a path can split into multiple other paths (forks are allowed), a path cannot cross over another one.

During the CTF, a sequence of such circuit building problems was downloaded from a server. One had to solve all problems of the sequence as fast as possible: after excessive computation delay, the connection was broken by the server. If one could solve all problems correctly, one was given a password that unlocked 4 points for one’s team.

Lessons learned during the CTF

A wrong approach

As mentioned above, during the CTF, I put up a solution that failed to unlock the password. Here is its high-level outline:

As long as all paths that satisfy the above conditions have been found:
1. Lay out a path from the feed to the antenna farthest from it.
2. For each other antenna a:
  1. For each path c already laid out:
    1. Try to find a path to a by forking path c. This also considers a fork starting on the feed space, which corresponds to laying out a new full path.
  2. If one has not found a path towards a, one eliminates the last path found, and try again. This backtracks one step of the solution.

This solution works well… for problems of small dimensions. Therefore, when we contacted the server, it was possible to solve the three or four first instances that were downloaded. However, we inevitably fell on a larger problem that would trigger computations of many minutes, which we often would interrupt ourselves. I worked feverishly to add and adjust various search heuristics, but without success.

Humans to the rescue

Desperate, we remarked that although the sequence of problems was different each time we contacted the server, may instances recurred. We thus stored these instances to solve them by hand, on the blackboard. We tabulated the solutions in the file so that when we received an instance of some known problem, we could yield the solution without recomputing it. Surprisingly, this strategy could get us to the eighth problem in the sequence – out of how many?

Taming the exponential complexity

While programming the solution outlined above, I already felt that it was possible to hit an important difficulty with this kind of problem: the size of the state space, to which the solution belongs, is an exponential function of the problem size. Specifically for this circuit conception problem, the number of paths of length n grows exponentially with n. This entails that solving the problem with a backtracking approach is equivalent to looking for an atomic-size needle in a haystack as large as the universe.

In such a situation, whether we reach a solution depends entirely on search heuristics: where we look for first, what information on the solution we inject in the search… These heuristics are not guaranteed properties with respect to the solution – they are thumb rules, approximations that are true more often than not. The heuristics implemented above are rather poor: the best information on the solution being looked for consists that path that compose the solution should stem from a fork of one or two basic paths that start at the feed. By supposing as little on the solution, we open ourselves to the exploration of a very large set of possibilities, thus the excessive computations, even on my rather powerful laptop (Mac Book Pro circa 2011, Intel Core i7, 16 GB RAM).

We can observe two things of the above fruitless approach:

The computation of a path between two points, even points that are far apart, is fast if we don’t constrain its length. It is even simple to introduce heuristics that force the computation of a path that is the shortest possible.
The computation of multiple paths of the same length L is fast if L is small. As mentioned above, the algorithm can solve the problem if the dimensions of the field is not too large. In this case, the paths going from the feed to the antennae are short, thus their computation does not exceed the calculation capacity of my computer.

This observations point the way to a better approach.

Divide to conquer

If the above solution is good enough to solve problems of small dimensions, it appears interesting to try and reduce a large-dimensions problem to another that is smaller. The observation of many instances of the problem acquired during the CTF suggest that often, the antennae are grouped in a small region R of the field, far from the feed. In this kind of situation, the above algorithm would attempt to compute multiple long paths, which takes too long. However, if the feed belonged to the small region that encompasses the antennae, we could ignore much of the rest of the field and the problem would become effectively small.

It seems thus possible to solve, by the above approach, the problem of joining all antennae to a meeting point M that would belong to R (where R is the set of spaces of the smallest rectangular region of the field that contains all antennae). This done, we only need to compute a path of arbitrary length between the source and M. In summary, instead of solving one big problem, we divide it into two smaller ones:

find paths of equal length going from M to each antennae;
find a path of arbitrary length going from the feed to M.

The previous two observations suggest that these two subproblems can be solved in good time. The only problematic case is that the solution to subproblem (1) may raise constraints that could make it impossible to solve subproblem (2). Fortunately, there are typically many solutions to subproblem (1): the solver will likely first find the solutions based on the shortest paths. If such a solution does not work, we would need to explore alternatives based on longer paths, up to a certain limit.

The new algorithm

For each point M of the field that is not an antenna:
1. Find a path from M to the antenna farthest from it.
2. For each other antenna a:
  1. For each path c already laid out:
    1. For each point p of path c:
      1. Find a path from p to a.
  2. If we have not found a suitable path to a, we discard the last path previously laid out, and resume the search. We can resume in step (1.1) if the very first path is thus discarded.
3. Lay out a path from the feed to M, taking care not to cross paths going from M to antennae. If this search fails, we resume the search to a solution to subproblem (1).

The heuristic implemented here is that a solutions consists of a single path starting from the source that forks in various points near the antennae. The division of the problem is an interpretation of this heuristic that reduces the size of the problems under effective consideration.

Optimization heuristics

In addition to this main heuristic, I implemented various supplemental heuristics that contribute to finding the solution faster.

The length of the paths from M to the antennae is constained, to avoid the solutions where, M being ill-positioned, excessively long paths are created from M to antennae, blocking any attempt to find the path from M to the feed. The length of this last path is unconstrained.
The shortest paths are considered first. They tend to be less cumbersome on the field. Technically, this kind of approach is called greedy.
When we’re looking for a path from a certain point to an objective on a field encumbered with various obstacles, we maintain in memory the set of points from which reaching the objective has been already tried and has failed. This memoization approach avoids generating a large set of long paths over and over that are guaranteed not to succeed. This failure map significantly reduces the number of pathfinding attempts for inadequate intermediate solutions.

Visualization

In order to assist the implementation of the algorithm, I implemented tools for visualizing the problem field as well as intermediate solutions. These tools are based on matplotlib. The figures below present an example of using these tools:

Field describing the problem: the feed is the red circle, the antennae are the green triangles and the gray square are the null spaces.

The final solution computed by the new algorithm. The cyan diamond is the meeting point M from which fork the paths towards each antenna.

The visualization tool can also be used during the computation of the solution, to observe the sequence and the order of explored states. Since I used it as a debugging tool, I simply commented out the line that refreshes the display at each step. Please note that I only used this visualization system with ipython, run in the pylab mode. I have no idea whether it works well with ordinary Python.

Conclusion

The project page (in French) presents the computation times of the new algorithm against each of the 81 distinct problems I was able to salvage from the CTF. Most of these solutions are found under 100 ms; in the worst case considered, the solution is found in less than 7 seconds. I am convinced that this kind of performance would have been sufficient to solve all problems of a sequence the get my team 4 more points.

It is quite late, but I am very proud. I was scared of this kind of artificial intelligence problem. However, I have learned a lot and I have had so much fun throughout this little piece of work – I won’t be afraid again. I keep that it is very important to think hard on heuristics to guide a backtracking search: the quality of these heuristics has a crucial impact on the runtime performance, to the point where exploring an otherwise exponential state space becomes doable within an acceptable timeframe.

Commentaires