I recently had to make a couple of exhaustive search, and i’m going to share the solution i’ve crafted really quickly. This is for my own future benefits later so that i don’t have to search my disk, but this could be useful and a good starting point for anyone attempting something similar.
First exhaustive search is a search into a set of data for one or multiple values that validate a specific proposition.
An algorithm to search exhaustively is quite easy:
1 2 3 4
A Parallel solution
Exhaustive search is usually a massively parallel problem: you can throw lots of processing unit by splitting the search space into smaller units. This is the main feature i want to focus on.
To complete this, i’m going to introduce 2 differents objects.
The first one is a ComputationUnit: Takes a data set to iterate, and test the proposition.
The second one is a Result type, that represent the value of a terminated ComputationUnit.
Last we need a ComputationManager: It orchestrates ComputationUnits, making sure ComputationUnit are spawned, and that in case a ComputationUnit find a candidate terminate the search.
Starting up we need traditional module definition and imports:
1 2 3
The result type is really simple, either your ComputationUnit terminates without finding any candidate (Terminated), or the ComputationUnit found a candidate that validate the proposition.
This is exactly similar to a Maybe type, but for code self-documentation I chose to create an ad-hoc type.
The Computation type is just a closure that takes a data set “a” and gives back a Result type. The IO monad is just there for maximum flexibility, but in many case, propositions are pure and doesn’t need any monad to run.
Now the only thing missing is the ComputationManager code, first the signature is self explanatory:
1 2 3 4
The manager is constantly going to wait for result from ComputationUnit. For communication, the manager is going to share a Chan with the computation units. It’s going to loop over each data sets by starting computation units and going to wait for result from the computation units. As soon as one computation get a Found result to the manager, the manager will abort all remaining threads and return the result.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Using the code with all your machine cpu cores is very easy and just requires you to compile the code using -rtsopt and -threaded.
ghc --make -O2 -threaded -rtsopts <module-name>
And, then you can just run the program:
./Computation +RTS -N<#cpu> -RTS
Have fun, and with any luck that will inspire someone to write an even better blog post, or a library.
posted byon April 1, 2012.