Hi,
I'm looking to build a system (hardware and software) to handle strings fast, without doing much math at all. Most of the tasks will be based on searching and statistics:
Searching: does this chunk of data include this particular string?
Where in this chunk of data does this string occur? (These slightly different functions are needed for different tasks.) Strings to search for may contain wildcards -- that is, the option of flagging a positive match regardless of certain bits: either a known number of bits ([sequence] [14 bits not to check] [more sequence]) or an unknown number, and perhaps some more complex rules -- but most of the matches will be literal.
Statistics: is there a correlation between the occurrence of string a and the occurrence of string b...
immediately before it
immediately after it
within a certain distance before/after
in a certain location in my input data chunk (e.g. 17th to 24th bytes)
anywhere within the same chunk of data
...etc
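To make the wildcard idea concrete, here is a minimal sketch (my own illustration, not part of any existing library) of matching with a "don't care" mask: only bits set to 1 in the mask are compared, so a mask byte of 0x00 wildcards that whole byte. For simplicity this version is byte-aligned; a real bit-level version would also shift the pattern across bit offsets.

```python
def masked_find(data: bytes, pattern: bytes, mask: bytes) -> int:
    """Return the byte offset of the first match of `pattern` in `data`,
    honouring `mask`: only bits set to 1 in the mask are compared.
    Returns -1 if the pattern does not occur."""
    n, m = len(data), len(pattern)
    for i in range(n - m + 1):
        # XOR highlights differing bits; AND with the mask ignores wildcards.
        if all((data[i + j] ^ pattern[j]) & mask[j] == 0 for j in range(m)):
            return i
    return -1

# A three-byte pattern whose middle byte is fully wildcarded (mask 0x00):
data    = bytes([0x00, 0x11, 0xDE, 0xFF, 0xAD, 0x22])
pattern = bytes([0xDE, 0x00, 0xAD])
mask    = bytes([0xFF, 0x00, 0xFF])
print(masked_find(data, pattern, mask))  # -> 2
```

The same mask trick vectorises well (XOR, AND, compare-to-zero), which is one reason it maps naturally onto SIMD units or GPU threads.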
My aim is to implement all of this in an utterly astoundingly immensely parallel way: search an input binary data chunk (say 5k) for 500 million different test strings -- needless to say, the vast majority of them won't appear. In general, I want high processing throughput, but I don't have any mission-critical or time-critical applications going on. I can take the odd failure, crash, or error in my stride.
I don't know whether this is something that GPGPU would be good for. I can imagine being able to load the input data to a graphics card over PCIe and then take advantage of massive GPU memory bandwidth to scan it for a vast matrix of test strings stored in the GPU's own memory, thus doing some good filtering and not having to worry about the relatively low bandwidth between GPU and main memory. I've seen a couple of papers on GPU for SQL queries or searching network data streams for threats, but I know this application goes against the classic GPGPU dictum that arithmetic intensity should be high, so I don't know how much real world memory bandwidth I can expect to see. I also realize that a mid-range GPU with high bandwidth but low computational power might fit the bill.
Alternatively I could do the whole thing on CPUs, in which case I still don't know what system architecture is best for such a task. AMD or Intel? A large number of less powerful units, or a small number of powerful Core i7s or suchlike? (The $1000 one is probably out of my price range, though!) A consideration is how much RAM I can stack onto it, as I'm guessing I get more memory for my dollar with cheaper CPUs and motherboards, even if memory per node is lower. I imagine I'll be using a cluster on a gigabit ethernet backbone, but I'm considering a heterogeneous cluster -- for instance, if GPUs can give me a big speed-up on the filtering, I might not need to have one on each node. In any case, I get the feeling that mass-market solutions are much better value than high-performance niche hardware (server equipment, Cell processors, etc.).
In total, I don't want the cost of the system to go much over US$10,000, so it's a question of how much processing I can get for that money -- anything from maybe 5 big motherboards with Core i7 processors and multiple graphics cards, through to a hundred little Intel Atom systems linked together (which sounds very cool to me, but is almost certainly not a serious option).
If I start developing now it'll still take a good few months or a year to get the software done, so if need be I can probably wait for Fermi or Larrabee if they might present options for a major enhancement. As I have never programmed a GPU I'll need to learn that from scratch, but that's not a problem; I like learning, have lots of time on my hands, and tend to pick things up fast. A final thought: I live in a country where electricity is expensive, so a power-efficient system would be a big plus.
Huge thanks to anyone who's bothered to read this, particularly if you can give me any orientation.