Desktop PC Floating Point Accelerator

Acert93

This is only slightly console related, in respect of what it may hold for the future competitive landscape of chip design. It is more of an interesting read, and yet another alternative approach to the issues traditional x86 processors (like the Xbox 1 CPU) face.

http://www.tomshardware.com/hardnews/20050620_114721.html

Yet another sign that the good old workstation may not be extinct: Clearspeed will demonstrate on Tuesday a co-processor PCI Express add-in card that promises a floating point performance of up to 50 GFlops - about 10x the performance of a regular desktop PC.

It is interesting how the PC evolves. Over the last 2 years, ever since Intel hit the wall (where is my 10GHz Netburst!) and alternative designs like CELL were announced, it seems hardware designers have been looking for new solutions. The PPU was one. This is another. I find it interesting that on their roadmap for the future they mentioned that future multicore processors would have non-symmetric cores for specialized tasks. I wonder how far away Intel is from reaching that goal?

Obviously I am interested in how this pans out with CELL. Do companies stick with their current PCs and software and just plug in a "Floating Point accelerator" or do they make an investment to move on to a new platform like CELL?

Clearspeed, which announced co-processor chips for desktop PCs in 2003 and 2004, will announce on Tuesday that it is preparing to ship its first production-ready product. The company claims that the new chip, named CSX600, is the world's fastest 64-bit floating point processor, delivering a sustained performance of 25 GFlops. CSX600 boards will integrate two processors delivering 50 GFlops. Since the boards follow the single-slot PCI Express standard, the only limitation on the number of boards in a system is the number of available slots. Clearspeed said that two cards, for example, will deliver 100 GFlops.

If I am reading this correctly the board has 2 chips, together capable of 50 GFLOPs of sustained (not theoretical) performance. I wonder what the theoretical performance is? I also wonder what kind of architecture it has.
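A quick back-of-the-envelope sketch of the numbers quoted above, just to make the chip/board/system arithmetic explicit. The function names are made up for illustration, and it assumes the perfectly linear scaling the article implies, which real workloads may not deliver.

```python
# Figures as quoted in the article (claimed sustained 64-bit performance).
GFLOPS_PER_CHIP = 25   # per CSX600 chip
CHIPS_PER_BOARD = 2    # two processors per board


def board_gflops(chips=CHIPS_PER_BOARD, per_chip=GFLOPS_PER_CHIP):
    """Claimed sustained GFlops for a single board."""
    return chips * per_chip


def system_gflops(boards):
    """Claimed GFlops for a system, ASSUMING linear scaling across boards."""
    return boards * board_gflops()


print(board_gflops())    # one board
print(system_gflops(2))  # two boards, the article's example
```

That matches the article's 50 GFlops per board and 100 GFlops for two cards, but only because linear scaling is assumed in the first place.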

What makes Clearspeed's card attractive is the fact that it can be installed in an existing computer within minutes and immediately can result in added performance for 32-bit and 64-bit systems. Speed increases however are purely limited to floating point operations and mainly address traditional workstation environments - such as scientific applications in the biological or network simulation segment. According to Clearspeed, enthusiasts can also take advantage of the added performance, especially with professional audio and precision rendering software. Per card, such applications can gain about 5x to 10x in speed, the company said.

The CSX600's real estate consists about half of logic and half of memory. The 128-million-transistor chip is built in a 130 nm process and integrates 96 cores with a clock speed of 250 MHz. The low clock speed allows keeping power consumption down at about 10 watts per chip and 25 watts per board, according to Clearspeed. CSX600 boards also offer two banks for DDR2 memory per processor - totaling in 4 GByte of DDR2 per board. On-chip memory bandwidth is rated at 200 GByte per second.
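Putting the quoted figures together gives a rough feel for what each core is doing and how efficient the part is. This is only a sketch built from the article's numbers (25 GFlops sustained, 96 cores, 250 MHz, 10 W per chip, 25 W per board); the variable names are hypothetical and the results are only as good as those rounded marketing figures.

```python
# All inputs are the figures quoted in the article, nothing measured.
SUSTAINED_GFLOPS_PER_CHIP = 25.0
CORES_PER_CHIP = 96
CLOCK_MHZ = 250
WATTS_PER_CHIP = 10
WATTS_PER_BOARD = 25
CHIPS_PER_BOARD = 2

# Sustained floating point operations per core per clock cycle.
flops_per_core_per_cycle = (SUSTAINED_GFLOPS_PER_CHIP * 1e9) / (
    CORES_PER_CHIP * CLOCK_MHZ * 1e6
)

# Efficiency at chip level and at board level (the board also powers the DDR2 etc.).
gflops_per_watt_chip = SUSTAINED_GFLOPS_PER_CHIP / WATTS_PER_CHIP
gflops_per_watt_board = (
    SUSTAINED_GFLOPS_PER_CHIP * CHIPS_PER_BOARD / WATTS_PER_BOARD
)

print(round(flops_per_core_per_cycle, 3))
print(gflops_per_watt_chip, gflops_per_watt_board)
```

The per-core figure comes out at just over one operation per cycle, so either each core can retire more than one floating point operation per clock or the quoted numbers are rounded. The article does not say which.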

To compare, the PPU has 125M transistors, built on the 130nm process, and I believe the chip is @ 500MHz and consumes 25W.

As a console and PC gamer the PPU and the Floating Point Accelerator are interesting because they are different perspectives on the issues hardware makers are facing and may give a glimpse of the competitive landscape of the future.

Anyhow, I thought this was interesting and worth the read. Only slightly console related, but good info to know.
 
PC-Engine was talking this co-processor up a while ago, and the consensus at that time was that it's capable of great performance, but its range of applications is pretty limited. Forget what thread that was or I'd pull it up.

The Intel chips you're referring to are on their roadmaps for 2010 I think. I'll go find that one since I know where to look.

EDIT: Here's the thread with the Clearspeed processor mentioned: Thread

and here's the link to some Intel comparisons: Anandtech

I'll include the relevant images as well.

evolution.jpg


vision.jpg
 
totaling in 4 GByte of DDR2 per board. On-chip memory bandwidth is rated at 200 GByte per second

Well, it will not be cheap ;) so consoles should stay very competitive.

I think this will only sell for workstations. It is FP only, and "only" 25-50 GFlops is too low (for a PC). I think VMX units are far better: they give the 25 GF but cost a lot less.

Edit: "then such boards may become available in the low four-figure range."
 
pc999 said:
totaling in 4 GByte of DDR2 per board. On-chip memory bandwidth is rated at 200 GByte per second

Well, it will not be cheap ;) so consoles should stay very competitive.

The on-chip bandwidth sounds like cache or maybe some spin on eDRAM. The boards will be expensive because of their small market appeal.

I think this will only sell for workstations. It is FP only, and "only" 25-50 GFlops is too low (for a PC). I think VMX units are far better: they give the 25 GF but cost a lot less.

Do they give 25 GFLOPs of sustained real world performance? There is a big difference between theoretical peak performance and sustained performance.

In a market like rendering, performance in a small area can be important; frequently time > cost. So if something like this could double or triple the performance of your render farm while using the exact same software and systems it could be very viable. Workstation parts are NOT cheap. Workstation consumers buy basically the same GPUs as a PC gamer (well, in many cases at least) yet pay through the nose because of the specialized drivers and whatnot. Just look at the fact that a mainstream consumer is paying $400 for the same product a workstation buyer gets for $1500-$2000. Prices have come down some since ATI and NV have become more competitive with each other in the market, but it still ain't cheap. So $4k may, or may not, be pricing itself out of the market. I do not know enough about the demand and needs renderhouses and small scientific research houses are facing right now, and whether or not a farm of cheap PCs is a better solution overall.
 
Yes, I agree with you.

I just don't see how it will fare against consoles that can do 4x the floating point performance and a lot of other things at the same time. But this doesn't mean that consoles can do what this does, like... --->
Although this may be (and probably is, given it targets science and such) double precision FP. In that case this is a beast, but it is not related to consoles at all.
 
pc999 said:
Yes, I agree with you.

I just don't see how it will fare against consoles that can do 4x the floating point performance and a lot of other things at the same time. But this doesn't mean that consoles can do what this does, like... --->

That is the catch. No console is performing 4x sustained real world floating point. The XeCPU and CELL figures are all theoretical.

Although this may be (and probably is, given it targets science and such) double precision FP. In that case this is a beast, but it is not related to consoles at all.

Yes, it is not directly related to consoles now. But come next gen it may be very valid. If a company like Intel or IBM picks up similar designs and puts them on a better process (less power, more frequency) and integrates it into chips used for consoles it would be an alternative in some ways.

I just thought it was interesting and it was in today's news so I posted it :D
 
This is a vector processor. "Sustained" performance of 25 GFLOPS per processor will only be attainable with certain applications--very much the same ones that Cell will be very good at eating up.

Isn't Cell capable of ~26 GFLOPS double-precision? How many bits of precision is that?
 
phat said:
This is a vector processor. "Sustained" performance of 25 GFLOPS per processor will only be attainable with certain applications--very much the same ones that Cell will be very good at eating up.

Yes, it is of limited use. Actually it is probably a lot more limited than the CELL SPEs as this is just a coprocessor. They will both be good at FP, but SPEs can be used for other stuff as well.

Isn't cell capable of ~26 GFLOPS double-precision? How many bits of precision is that?

I believe that single precision is 32bit and double is 64bit but I am sure someone can correct me if I am wrong (because I am not 100% certain).
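For anyone who wants to check the bit widths rather than take my word for it, here is a small sketch using Python's standard `struct` module with the IEEE 754 single ("f") and double ("d") formats. It says nothing about how the CSX600 or CELL implement their FPUs; it only illustrates what "single" and "double" usually mean, and it assumes a platform where Python floats are IEEE doubles (practically all of them).

```python
import struct
import sys

# Storage size of the standard IEEE 754 formats, in bits.
single_bits = struct.calcsize("<f") * 8
double_bits = struct.calcsize("<d") * 8
print(single_bits, double_bits)

# Python's own float is a double; mant_dig is the number of significand bits.
print(sys.float_info.mant_dig)

# Squeezing a double through single precision loses digits.
x = 0.1
x_as_single = struct.unpack("<f", struct.pack("<f", x))[0]
print(x_as_single == x)
```

So single is 32 bits of storage and double is 64, but note that not all of those bits are precision: a double carries 53 significand bits, with the rest going to sign and exponent.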

Does anyone have an idea how long it will be before we can see how CELL performs in the real world? We know on paper it has pretty good performance, but seeing what its sustainable performance is in the real world will give a good idea of where the market is.
 
Acert93 said:
Does anyone have an idea how long it will be before we can see how CELL performs in the real world? We know on paper it has pretty good performance, but seeing what its sustainable performance is in the real world will give a good idea of where the market is.

Well, if the IBM FFT test is real-world enough for you, then it can get pretty good. That's a very ideal environment, however. Still, it shows that it can approach theoretical.
 
OK, CPU architecture is pretty far out of my realm, but couldn't this type of core or process be used "in conjunction with" the CPU, very much like the old days of the 486 with a math coprocessor? (Hope I have the generation correct) :oops:

Maybe to add to the power of current CPUs by way of on-die or logic enhancement for FP power? Instead of using our current dual core processors, one core could be a current gen CPU and the other a Clearspeed (or PPU?) core, sharing the same cache but performing as a dual core, with each used to its respective strength? (Did that make any sense?)
 
They suggest it works out of the box. I'd guess it takes FP instructions away from the CPU for this, but to work at its optimum it would need to be addressed specifically.

What gets me is the claim of 200 GB/s off DDR2 RAM! How'd they get that much speed? What'll it cost? Why's it not in next-gen console? :devilish:
 
Shifty Geezer said:
What gets me is the claim of 200 GB/s off DDR2 RAM! How'd they get that much speed? What'll it cost? Why's it not in next-gen console? :devilish:

I think the 200GB/s is for the SRAM.
 
Shifty Geezer said:
They suggest it works out of the box. I'd guess it takes FP instructions away from the CPU for this
I don't think that's possible. It hangs off of the PCIe bus, to run program code it would have to directly interface with the CPU. Besides, I bet its internal structure is not x87-compatible (very weird FPU compared to modern chips).

What gets me is the claim of 200 GB/s off DDR2 RAM!
No, that figure is claimed for ON-CHIP memory! :) They didn't state off-chip memory speed in the text that was quoted in this thread, only type (DDR2) and size (4GB).
 
Oops :oops: Yes, DDR speed isn't specified. This is one of those spurious 'local store' bandwidth figures. Looks set to be a new trend :(
 
It sounds like you can fill up your PCI slots with these.

Perhaps a dual dual-core Athlon 64 setup with 4 of these boards. Would be great for a rendering farm.
 
The chip most likely has eDRAM. Clear Speed is the same company that was making the very flexible PixelFuzion 150 chip. PF 150 had 3Mb of eDRAM divided among several ALUs. I think they have developed the same system even further.

And 200GB/s from eDRAM isn't really a big deal with today's eDRAM lines... with a clock speed of 500MHz, you need a 3200-bit wide datapath.

(3200 bits *500MHz = 200GB/s)
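The same arithmetic as a small sketch, in case anyone wants to plug in other numbers. The helper name is made up, and it assumes one transfer per clock and 1 GB = 10^9 bytes. The 500MHz figure is the guess used above; the article quotes 250MHz for the CSX600 itself, so that case is included too.

```python
def datapath_bits(bandwidth_gb_per_s, clock_mhz):
    """Datapath width (bits) needed for a given bandwidth at a given clock.

    Assumes one transfer per clock cycle and 1 GB = 10**9 bytes.
    """
    bytes_per_cycle = bandwidth_gb_per_s * 1000 / clock_mhz
    return bytes_per_cycle * 8


print(datapath_bits(200, 500))  # the 500MHz guess used above
print(datapath_bits(200, 250))  # the 250MHz clock quoted in the article
```

At the article's 250MHz the path would have to be twice as wide (6400 bits) to reach the same 200GB/s, assuming the on-chip memory runs at the core clock.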


And why am I interested in this? Well, during my active days, there was some info floating about regarding PixelFusion / Clear Speed having some connections with 'Boys projects back then, but I never got any confirmation on it...
 