`bigbadbob`

When I participated to the DEFCON 21 Qualifier
CTF, I spent the last third of the
competition time on the fourth guerilla programming problem, named
`bigbadbob`

, without finding a good solution. Late, but with utter
satisfaction, I present in this post an approach that would have certainly
solved the problem.

I implemented this approach; the Python 2.7 code can be obtained by cloning this Git repository.

# The problem

Let us recall the problem definition. Bob must conceive *circuits*. These
circuits must carry a signal from a *feed* to *antennae* located on a
discrete 2D Cartesian panel of finite dimensions. The task is to lay out
signal lines, the *paths*, on the panel. Here are the rules:

- A path is a sequence of lines for which endpoints correspond to spaces on the discrete panel.
- There must be a path between the feed and each antenna.
- A path must not be laid outside of the panel’s boundaries.
- A path must not cross over the feed or an antenna, except as its endpoints.
- A path must not cross over any space marked as
*null*(which are known from the problem description). - Although a path can split into multiple other paths (forks are allowed), a path cannot cross over another one.

During the CTF, a sequence of such circuit building problems was downloaded from a server. One had to solve all problems of the sequence as fast as possible: after excessive computation delay, the connection was broken by the server. If one could solve all problems correctly, one was given a password that unlocked 4 points for one’s team.

# Lessons learned during the CTF

## A wrong approach

As mentioned above, during the CTF, I put up a solution that failed to unlock the password. Here is its high-level outline:

- As long as all paths that satisfy the above conditions have been found:
- Lay out a path from the feed to the antenna farthest from it.
- For each other antenna
*a*:- For each path
*c*already laid out:- Try to find a path to
*a*by forking path*c*. This also considers a fork starting on the feed space, which corresponds to laying out a new full path.

- Try to find a path to
- If one has not found a path towards
*a*, one eliminates the last path found, and try again. This backtracks one step of the solution.

- For each path

This solution works well… for problems of small dimensions. Therefore, when we contacted the server, it was possible to solve the three or four first instances that were downloaded. However, we inevitably fell on a larger problem that would trigger computations of many minutes, which we often would interrupt ourselves. I worked feverishly to add and adjust various search heuristics, but without success.

### Humans to the rescue

Desperate, we remarked that although the sequence of problems was different
each time we contacted the server, may instances recurred. We thus stored
these instances to solve them *by hand*, on the blackboard. We tabulated the
solutions in the file so that when we received an instance of some known
problem, we could yield the solution without recomputing it. Surprisingly,
this strategy could get us to the eighth problem in the sequence – out of how
many?

## Taming the exponential complexity

While programming the solution outlined above, I already felt that it was
possible to hit an important difficulty with this kind of problem: the size of
the *state space*, to which the solution belongs, is an exponential function
of the problem size. Specifically for this circuit conception problem, the
number of paths of length *n* grows exponentially with *n*. This entails that
solving the problem with a backtracking approach is equivalent to
looking for an atomic-size needle in a haystack as large as the universe.

In such a situation, whether we reach a solution depends entirely on *search
heuristics*: where we look for first, what information on the solution we
inject in the search… These heuristics are not guaranteed properties with
respect to the solution – they are thumb rules, approximations that are true
more often than not. The heuristics implemented above are rather poor: the
best information on the solution being looked for consists that path that
compose the solution should stem from a fork of one or two basic paths that
start at the feed. By supposing as little on the solution, we open ourselves
to the exploration of a very large set of possibilities, thus the excessive
computations, even on my rather powerful laptop (Mac Book Pro circa 2011,
Intel Core i7, 16 GB RAM).

We can observe two things of the above fruitless approach:

- The computation of a path between two points, even points that are far apart, is fast if we don’t constrain its length. It is even simple to introduce heuristics that force the computation of a path that is the shortest possible.
- The computation of multiple paths of the same length
*L*is fast if*L*is small. As mentioned above, the algorithm can solve the problem if the dimensions of the field is not too large. In this case, the paths going from the feed to the antennae are short, thus their computation does not exceed the calculation capacity of my computer.

This observations point the way to a better approach.

# Divide to conquer

If the above solution is good enough to solve problems of small dimensions, it
appears interesting to try and *reduce* a large-dimensions problem to another
that is smaller. The observation of many instances of the problem acquired
during the CTF suggest that often, the antennae are grouped in a small region
*R* of the field, far from the feed. In this kind of situation, the above
algorithm would attempt to compute multiple long paths, which takes too long.
However, if the feed belonged to the small region that encompasses the
antennae, we could ignore much of the rest of the field and the problem would
become effectively small.

It seems thus possible to solve, by the above approach, the problem of joining
all antennae to a *meeting point* *M* that would belong to *R* (where *R* is
the set of spaces of the smallest rectangular region of the field that
contains all antennae). This done, we only need to compute a path of arbitrary
length between the source and *M*. In summary, instead of solving one big
problem, we divide it into two smaller ones:

- find paths of equal length going from
*M*to each antennae; - find a path of arbitrary length going from the feed to
*M*.

The previous two observations suggest that these two subproblems can be solved in good time. The only problematic case is that the solution to subproblem (1) may raise constraints that could make it impossible to solve subproblem (2). Fortunately, there are typically many solutions to subproblem (1): the solver will likely first find the solutions based on the shortest paths. If such a solution does not work, we would need to explore alternatives based on longer paths, up to a certain limit.

## The new algorithm

- For each point
*M*of the field that is not an antenna:- Find a path from
*M*to the antenna farthest from it. - For each other antenna
*a*:- For each path
*c*already laid out:- For each point
*p*of path*c*:- Find a path from
*p*to*a*.

- Find a path from

- For each point
- If we have not found a suitable path to
*a*, we discard the last path previously laid out, and resume the search. We can resume in step (1.1) if the very first path is thus discarded.

- For each path
- Lay out a path from the feed to
*M*, taking care not to cross paths going from*M*to antennae. If this search fails, we resume the search to a solution to subproblem (1).

- Find a path from

The heuristic implemented here is that a solutions consists of a single path starting from the source that forks in various points near the antennae. The division of the problem is an interpretation of this heuristic that reduces the size of the problems under effective consideration.

## Optimization heuristics

In addition to this main heuristic, I implemented various supplemental heuristics that contribute to finding the solution faster.

- The length of the paths from
*M*to the antennae is constained, to avoid the solutions where,*M*being ill-positioned, excessively long paths are created from*M*to antennae, blocking any attempt to find the path from*M*to the feed. The length of this last path is unconstrained. - The shortest paths are considered first. They tend to be less cumbersome on
the field. Technically, this kind of approach is called
*greedy*. - When we’re looking for a path from a certain point to an objective on a
field encumbered with various obstacles, we maintain in memory the set of
points from which reaching the objective has been already tried and has
failed. This memoization approach avoids generating a large set of long
paths over and over that are guaranteed not to succeed. This
*failure map*significantly reduces the number of pathfinding attempts for inadequate intermediate solutions.

## Visualization

In order to assist the implementation of the algorithm, I implemented tools
for visualizing the problem field as well as intermediate solutions. These
tools are based on `matplotlib`

. The figures below
present an example of using these tools:

The visualization tool can also be used during the computation of the
solution, to observe the sequence and the order of explored states. Since I
used it as a debugging tool, I simply commented out the line that refreshes
the display at each step. Please note that I only used this visualization
system with ipython, run in the `pylab`

mode. I have no
idea whether it works well with ordinary Python.

# Conclusion

The project page (in French) presents the computation times of the new algorithm against each of the 81 distinct problems I was able to salvage from the CTF. Most of these solutions are found under 100 ms; in the worst case considered, the solution is found in less than 7 seconds. I am convinced that this kind of performance would have been sufficient to solve all problems of a sequence the get my team 4 more points.

It is quite late, but I am very proud. I was scared of
this kind of artificial intelligence problem. However, I have learned *a lot*
and I have had so much fun throughout this little piece of work – I won’t be
afraid again. I keep that it is very important to think hard on heuristics to
guide a backtracking search: the quality of these heuristics has a crucial
impact on the runtime performance, to the point where exploring an otherwise
exponential state space becomes doable within an acceptable timeframe.