Fast String Search on Cell


Might be interesting/useful for some, so I thought I'd post it here for reference :)

String searching is one of these basic algorithms. It has a host of applications, including search engines, network intrusion detection, virus scanners, spam filters, and DNA analysis, among others. The Cell processor, with its multiple cores, promises to speed-up string searching a lot.

In this article, we show how we mapped string searching efficiently on the Cell. We present two implementations:

* The fast implementation supports a small dictionary size (approximately 100 patterns) and provides a throughput of 40 Gbps, which is 100 times faster than reference implementations on x86 architectures.
* The heavy-duty implementation is slower (3.3-4.3 Gbps), but supports dictionaries with tens of thousands of strings.

This task is not trivial. We had to change our algorithm significantly to reach top performance. In particular, to exploit the memory subsystem at its best, we employ a pipelined parallelization strategy, and we shuffle the data layout to fight congestion—techniques that are unfamiliar to most programmers of traditional architectures. The source code that implements the techniques we present here is available electronically; see