london-boy said:blakjedi said:If the XeCPU and the Cell chips were clocked at exactly the same rate, either both 3 GHz or both 4 GHz, which chip would be more powerful? My guess (uneducated at best) is that you could do "more" with the XeCPU.
What makes you think that?
If there is one thing that's for sure, Cell will be hard to beat.
Anything could happen though. And it still has to be seen what each CPU will have to take care of.
blakjedi said:I'm no librarian, so I can't cite all the MILLIONS of threads on this topic from just this site, but it seems that XeCPU can do "more different types" of work than Cell by virtue of having more PPEs available.
Cell is better at streaming/FP because it has more SPEs, which I consider to be less capable than, but faster than, a VMX unit. But those SPEs only have value when attached to a PPE.
Maybe I should go pull up the numbers in terms of "work" expected to be done by an SPE versus a VMX unit, but in a very touchy-feely, based-on-my-memory kind of way I sense that at the same clock rate the XeCPU could do more, not faster, than the Cell.
Titanio said:blakjedi said:I'm no librarian, so I can't cite all the MILLIONS of threads on this topic from just this site, but it seems that XeCPU can do "more different types" of work than Cell by virtue of having more PPEs available.
You can't simply say Cell excels at X,Y,Z but three PPEs are better for A,B,C,D,E, hence 3 PPEs > Cell. You have to look at how much time X,Y,Z take versus A,B,C,D,E. My impression is that STI have optimised for those things that generally take longest, and that the impact of other "types of work" will be relatively small compared to them.
Titanio said:Aside from all that, I don't think it matters. Cell has been optimised for use in a games console. When you're doing that kind of optimising, you look at the biggest bottlenecks. Cell will excel at those things which take longest per frame to compute. I've read that the biggest offenders include collision detection and physics, things which should mesh very nicely with Cell (I would think)*. The "more different types of work" probably aren't going to be taking up a hell of a lot of time per frame anyway, so there's not much point in dedicating your silicon to them.
*I had read in Christer Ericson's "Real-Time Collision Detection" that collision detection can take over 30% of frame time. I imagine physics ain't a walk in the park either.
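The "look at how much time each thing takes" argument is essentially Amdahl's law. A minimal sketch follows, assuming the 30% frame-time share quoted above; the 5% share and the 4x speedup factor are made-up numbers for illustration only, not measurements of either chip.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-frame speedup when only `fraction` of the
    frame time is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Hypothetical: the same 4x speedup applied to a big slice of the frame
# (30%, the collision-detection share quoted above) and to a small one (5%).
big_slice = overall_speedup(0.30, 4.0)
small_slice = overall_speedup(0.05, 4.0)

print(f"4x on 30% of the frame: {big_slice:.2f}x overall")
print(f"4x on  5% of the frame: {small_slice:.2f}x overall")
```

The same local speedup is worth far more when it lands on the part of the frame that dominates, which is the point being made about optimising for the biggest offenders.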
blakjedi said:I remember some folks from the board writing out chip vs chip performance peaks and rating the XeCPU between 90 and 240 GFLOPS at 3 or 3.5 GHz and the Cell at 256 GFLOPS based on 4 to 4.6 GHz.
blakjedi said:On another note, it would be hard for me to believe that MS, with basically very extensive prior knowledge of the Cell project, would not develop with Cell in mind.
blakjedi said:The "beefier" VMX that L-B talked about may just be a dual-issue VMX (if it doesn't exist already). Dual-issue VMX units would even things up a bit, I think. The XeCPU would have the equivalent of 6 single-issue SPEs and three dual-issue PPEs. In all, 9 vector threads and three integer threads simultaneously every clock cycle.
Gubbi said:Titanio said:blakjedi said:I'm no librarian, so I can't cite all the MILLIONS of threads on this topic from just this site, but it seems that XeCPU can do "more different types" of work than Cell by virtue of having more PPEs available.
You can't simply say Cell excels at X,Y,Z but three PPEs are better for A,B,C,D,E, hence 3 PPEs > Cell. You have to look at how much time X,Y,Z take versus A,B,C,D,E. My impression is that STI have optimised for those things that generally take longest, and that the impact of other "types of work" will be relatively small compared to them.
True, there'll be no apples to apples comparisons this generation.
Titanio said:Aside from all that, I don't think it matters. Cell has been optimised for use in a games console. When you're doing that kind of optimising, you look at the biggest bottlenecks. Cell will excel at those things which take longest per frame to compute. I've read that the biggest offenders include collision detection and physics, things which should mesh very nicely with Cell (I would think)*. The "more different types of work" probably aren't going to be taking up a hell of a lot of time per frame anyway, so there's not much point in dedicating your silicon to them.
*I had read in Christer Ericson's "Real-Time Collision Detection" that collision detection can take over 30% of frame time. I imagine physics ain't a walk in the park either.
Collision detection is a part of physics I'd say.
The problem with collision detection is that you'll need a space decomposition structure to speed up queries. This is almost always some sort of tree (octree).
In a normal CPU you'd sort/bundle the objects for spatial locality. That way you get a fair amount of reuse of the octree nodes thanks to the caches of a general purpose CPU.
An SPE doesn't have automatically demand-loaded caches; it has explicitly loaded local RAM. So you have to do something to either increase the reuse of data or hide the latency of loading the nodes. This can be done in a variety of ways:
1.) Explicitly load the node needed from main memory when traversing the tree. Vertically thread your collision detection code to have many queries executing simultaneously and thereby hide the main memory latency.
2.) Have a software cache system. Instead of explicitly fetching each node from main memory, use a cache_load function or macro to implicitly load a node into a chunk of local RAM. The first query to hit a node will load the node into the cache; subsequent queries that hit in the cache will get the locally cached node.
3.) Do something completely different than an octree.
1. Is going to be hard, since main memory latency will be on the order of 40-50 ns (my guess) and cycle time will be 0.26 ns, so you'd have to cover roughly 160-200 cycles' worth of latency (or 320-400 instruction slots, since the SPE can issue two instructions per cycle).
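The arithmetic behind those cycle counts can be sketched as follows. The latency and cycle-time figures are the guesses from the post; the 20 cycles of useful work per node and the helper names are hypothetical, chosen only to show how many queries would need to be in flight.

```python
import math

def cycles_to_cover(latency_ns, cycle_ns):
    """Number of clock cycles that elapse during one main-memory access."""
    return latency_ns / cycle_ns

def queries_in_flight(latency_cycles, work_cycles_per_node):
    """Queries needed so that, while one waits on memory, the others have
    enough work to fill the gap: (n - 1) * work >= latency."""
    return math.ceil(latency_cycles / work_cycles_per_node) + 1

for latency_ns in (40, 50):
    at_026 = cycles_to_cover(latency_ns, 0.26)  # cycle time quoted in the post
    at_025 = cycles_to_cover(latency_ns, 0.25)  # a flat 4 GHz, for comparison
    print(f"{latency_ns} ns: {at_026:.0f} cycles at 0.26 ns, {at_025:.0f} at 0.25 ns")

# Hypothetical: 20 cycles of real work per tree node, 160 cycles of latency.
print(queries_in_flight(160, 20), "queries in flight to hide the latency")
```

The 160-200 figure corresponds exactly to a 0.25 ns cycle; at 0.26 ns it comes out slightly lower, which doesn't change the conclusion.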
2. Will induce latency, since every memory reference will go through a software layer. A simple flat cache (1-way associative, i.e. direct-mapped) will require a mask and a compare. In order to get a cached value you'd need to:
a) Mask the low bits of the address (a simple AND)
b) Load the address stored at that cache index (using the masked value)
c) Compare it to the requested address
d) If ok, return the cached value (a branch and subsequent load)
e) Otherwise start main memory fetch
A cache hit (a-d) would be adding 20-something cycles of latency to each load as compared to 3-4 cycles of latency of common L1 caches.
Having a multiple way cache is going to add cost since it requires multiple compares and branch mispredicts are expensive (19 cycles).
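Steps a) to e) can be sketched as a direct-mapped software cache. This is a toy model in Python, not SPE code: the class name, the dict standing in for main memory, and the 64-line size are all assumptions made for illustration.

```python
class SoftwareCache:
    """Direct-mapped (1-way) software cache in front of a slow backing store."""

    def __init__(self, backing, num_lines=64):
        if num_lines & (num_lines - 1):
            raise ValueError("num_lines must be a power of two")
        self.backing = backing            # stands in for main memory
        self.mask = num_lines - 1
        self.tags = [None] * num_lines    # address currently held in each line
        self.data = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def load(self, addr):
        index = addr & self.mask          # a) mask the low bits of the address
        tag = self.tags[index]            # b) load the address stored at that index
        if tag == addr:                   # c) compare it to the requested address
            self.hits += 1
            return self.data[index]       # d) hit: return the cached value
        self.misses += 1                  # e) miss: fetch from main memory
        value = self.backing[addr]
        self.tags[index] = addr
        self.data[index] = value
        return value

memory = {addr: f"node{addr}" for addr in range(256)}
cache = SoftwareCache(memory, num_lines=64)
cache.load(5)    # miss, fills line 5
cache.load(5)    # hit
cache.load(69)   # 69 & 63 == 5, so this evicts address 5
cache.load(5)    # miss again (conflict)
print(cache.hits, cache.misses)
```

Every load pays for the mask, the tag load, the compare and the branch even on a hit, which is where the extra latency per reference comes from.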
3. Since latency is the killer and CELL appears to have ample bandwidth you'd probably stuff multiple nodes into one supernode and trade off bandwidth for latency.
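The supernode trade-off can be put in rough numbers. A sketch, assuming a full octree (branching factor 8) and a root-to-leaf walk; the depth of 8 levels is a hypothetical figure, and the function names are illustrative only.

```python
import math

def fetches_per_query(tree_depth, levels_per_supernode):
    """Main-memory round trips for one root-to-leaf walk when each
    fetch brings in `levels_per_supernode` levels at once."""
    return math.ceil(tree_depth / levels_per_supernode)

def nodes_per_supernode(levels_per_supernode, branching=8):
    """Nodes packed into one supernode: 1 + 8 + 64 + ... for an octree."""
    return sum(branching ** i for i in range(levels_per_supernode))

# Hypothetical tree depth of 8 levels.
for levels in (1, 2, 3):
    print(f"{levels} level(s) per supernode: "
          f"{fetches_per_query(8, levels)} fetches, "
          f"{nodes_per_supernode(levels)} nodes per fetch")
```

Fewer round trips means less exposed latency, but each fetch moves many nodes that a given query never touches, so the cost shifts to bandwidth.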
To sum it up: Collision is one area where a CPU with demand loaded caches will do better than a SPE, IMO.
The SPEs will do well on workloads with stream properties, including vector workloads. Have any kind of memory indirection and the overhead becomes staggering.
Cheers
Gubbi
pc999 said:Just to make one note.
Most of you seem to think that PS3 is the reason Cell exists, but that may not be so.
Cell should be very good for reviving markets like TV (most people only buy a new TV because the old one is broken, and TVs with new features are too expensive for most people), and these markets have a lot more potential customers (= money) than PS3.
While Cell, no matter what, looks like a great CPU, assuming that it is built only for PS3 is dangerous.
BOOMEXPLODE said:Yeah you shouldn't think Cell is only for PS3, but don't assume it has unlimited applications either. Even a scaled down Cell is going to be relatively large and high in transistor count for most embedded applications, when ARM processors etc. can be bought for much less. I really wonder how Sony will ever make back their investment in Cell. I wouldn't be surprised if it blows away the XeCPU though.
Titanio said:pc999 said:Most of you seem to think that PS3 is the reason Cell exists, but that may not be so.
PS3 is the driving force behind Cell. Without a new Playstation, work on Cell would never have started.
I'd be willing to bet that PS3 will be Cell's most popular application.
Titanio said:blakjedi said:The "beefier" VMX that L-B talked about may just be a dual-issue VMX (if it doesn't exist already). Dual-issue VMX units would even things up a bit, I think. The XeCPU would have the equivalent of 6 single-issue SPEs and three dual-issue PPEs. In all, 9 vector threads and three integer threads simultaneously every clock cycle.
Well, dual-issue is different from dual threads, IIRC. And in the vast majority of cases, I don't think it'd compensate for the lack of extra physical units. Dual-issue has its limitations, it's not like doubling your power. You can't say it'd be like having 6 single-issue cores.
Pozer said:Seriously, maybe the murmurs I've been hearing are about a possibility of Cell ending up in future Apples and Sony getting out of the clone business.
blakjedi said:Based on this thread...
http://www.beyond3d.com/forum/viewtopic.php?t=22250&start=100
If the XeCPU has six VMX units (2 per PPE), then what? Or what if it turns out that each core is actually dual-core, meaning six physical cores on die? Anything is possible, I guess.
still only 144 GFlops rating on the CPU?
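For reference, the peak figures thrown around in this thread fall out of a simple product. A sketch, assuming each vector unit is 4 lanes wide (128-bit, single precision) and that a fused multiply-add counts as 2 FLOPs per lane per cycle; scalar FPUs are ignored, and the function name is just for illustration.

```python
def peak_gflops(vector_units, clock_ghz, lanes=4, flops_per_lane=2):
    """Theoretical peak: units x lanes x FLOPs per lane per cycle x clock (GHz)."""
    return vector_units * lanes * flops_per_lane * clock_ghz

print(peak_gflops(6, 3.0))   # six VMX units at 3 GHz
print(peak_gflops(8, 4.0))   # eight SPEs at 4 GHz
```

Under those assumptions, six VMX units at 3 GHz give the 144 figure and eight SPEs at 4 GHz give the 256 figure quoted earlier. These are paper peaks only and say nothing about sustained throughput.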