mistan said:Ok, I have gone long enough without asking this.
Version, just who exactly are you?
I've been reading on here and PsiNext and you've been throwing out hints here and there and all over the place (especially at PsiNext).
I just wanted to know who you were.
version said:and you hear vector,matrix, or matrix,matrix multiply?
version said:compute again after TGS with 4 ghz
aaronspink said:Um, the critical path in both the XeCPU and the Sony CELL is most likely going to be in the PPE, what makes you think that the same design in the same process is going to yield high enough to get an additional 25% in clock frequency?
In other words, keep dreaming. Both sony and MS are using the same design for all intents and purposes and the likelihood is that the XeCPU has more frequency headroom than CELL.
Aaron Spink
speaking for myself inc.
xbdestroya said:Riiiight, except that the Cell has reached in excess of 5 GHz in tests thus far and to our knowledge the XeCPU never has. Not to mention that the XeCPU throws out a lot more heat than the Cell, even when both are at 3.2 GHz. I don't think that Cell *will* see a clockspeed increase in PS3, but I certainly feel that it would more likely see one than the XeCPU. I see the PPEs as the greater clockspeed bottlenecks - the SPEs should clock quite readily.
cho said:The CELL GFLOPS number of PS3 should be:
PPE: 3.2 GHz*1 VMX*4D VMX FMAC + 3.2 GHz *1 FPU*2D Paired-single FMAC
=3.2 GHz*1*8 FLOPs + 3.2 GHz*1*4 FLOPs
=25.6 GFLOPS + 12.8 GFLOPS
SPE: 3.2 GHz*7 SPEs*4D SPE FMAC
=3.2 GHz*7*8 FLOPs
=179.2 GFLOPS
PPE + SPEs = 25.6 + 12.8 + 179.2 = 217.6 GFLOPS
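The arithmetic in this post can be checked with a minimal C sketch. The function names `peak_gflops` and `cell_peak_gflops` are made up for illustration, and the sketch assumes what the post assumes: every unit issues one FMAC per lane per cycle, with an FMAC counted as 2 FLOPs. That is a theoretical peak, not a measured figure.

```c
/* Peak GFLOPS = clock (GHz) x units x lanes x FLOPs per lane per cycle.
   Assumption from the post: each lane retires one fused multiply-add
   per cycle, counted as 2 FLOPs (one multiply + one add). */
double peak_gflops(double clock_ghz, int units, int lanes)
{
    const int flops_per_lane = 2; /* fused multiply-add */
    return clock_ghz * units * lanes * flops_per_lane;
}

/* Sum of the three terms in the post: PPE VMX, PPE FPU, and the SPEs. */
double cell_peak_gflops(double clock_ghz, int spes)
{
    double ppe_vmx = peak_gflops(clock_ghz, 1, 4);    /* 4-wide VMX FMAC  */
    double ppe_fpu = peak_gflops(clock_ghz, 1, 2);    /* 2-wide FPU FMAC  */
    double spe     = peak_gflops(clock_ghz, spes, 4); /* 4-wide FMAC each */
    return ppe_vmx + ppe_fpu + spe;
}
```

`cell_peak_gflops(3.2, 7)` reproduces the 217.6 total above. Under the same assumptions, `cell_peak_gflops(4.0, 7)` gives 272 for the 4 GHz scenario mentioned earlier in the thread.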
aaron said:So sustained is actually: 204.8 GFLOPs
Oh come on - for one, calling anything that assumes code composed of 100% MADDs "sustained" is more than a little silly.
aaronspink said:In other words, keep dreaming. Both sony and MS are using the same design for all intents and purposes and the likelihood is that the XeCPU has more frequency headroom than CELL.
Well, people have been advertising PPX cores as being more complex since day one, and with 3 of them wouldn't that make XeCPU less likely to scale? I mean, if like you say, it's the PPC cores that limit the clock speed most.
ISA support to eliminate branches
The SPU ISA defines compare instructions to set masks that can be used in three
operand select instructions to create efficient conditional assignments. Such conditional
assignments can be used to avoid difficult-to-predict branches.
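The compare-then-select idea can be illustrated with a portable scalar sketch in plain C. This is not SPU code and does not use the SPU intrinsics; the helper names `mask_gt`, `select_bits`, and `max_branchless` are hypothetical, and the SPU applies the same pattern across a whole vector register rather than one 32-bit value.

```c
#include <stdint.h>

/* Compare step: produce an all-ones mask if a > b, otherwise all zeros.
   (a > b) evaluates to 1 or 0; subtracting it from 0 in unsigned
   arithmetic yields 0xFFFFFFFF or 0. */
uint32_t mask_gt(int32_t a, int32_t b)
{
    return (uint32_t)0 - (uint32_t)(a > b);
}

/* Three-operand select: take bits from if_set where the mask bit is 1,
   and from if_clear where it is 0. No branch is involved. */
uint32_t select_bits(uint32_t if_clear, uint32_t if_set, uint32_t mask)
{
    return (if_clear & ~mask) | (if_set & mask);
}

/* Conditional assignment "r = (a > b) ? a : b" written without a branch. */
int32_t max_branchless(int32_t a, int32_t b)
{
    return (int32_t)select_bits((uint32_t)b, (uint32_t)a, mask_gt(a, b));
}
```

Both candidate values are computed and the mask picks one, so there is nothing for the hardware to mispredict. The trade-off is that the work for both sides of the condition is always done.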
1.7. Programmer Directed Branch Prediction
Branch prediction can be significantly improved by using feedback-directed optimization. However, feedback-directed
optimization is not always practical in situations where typical data sets do not exist. Instead, programmer-directed
branch prediction is provided using an enhanced version of GCC’s __builtin_expect function.
int __builtin_expect(int exp, int value)
Programmers can use __builtin_expect to provide the compiler with branch prediction information. The return
value of __builtin_expect is the value of the exp argument, which must be an integral expression. For dynamic
prediction, the value argument can be either a compile-time constant or a variable. The __builtin_expect
function assumes that exp equals value.
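A minimal usage sketch, assuming GCC or a compatible compiler such as Clang. The `LIKELY`/`UNLIKELY` macros and the `sum_samples` function are illustrative names, not part of the quoted documentation. Only the compile-time-constant form of the hint is shown here; the variable-valued "dynamic prediction" form described above belongs to the enhanced SPU version and is not used.

```c
#include <stddef.h>

/* Conventional wrappers: !!(x) normalizes any truthy value to 1 so it
   can be compared against the expected constant. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sum an array of samples. Negative samples are assumed (for this
   example) to be rare errors, so that branch is marked unlikely.
   Returns -1 as soon as a negative sample is seen. */
long sum_samples(const int *samples, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(samples[i] < 0))
            return -1;          /* hinted as the cold path */
        total += samples[i];    /* hinted as the hot path  */
    }
    return total;
}
```

The hint does not change what the function returns. It only tells the compiler which path to lay out as the fall-through case.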
Draikin said:killzone ai presentation so much effort for nothing.
can you be a little more specific please?
Draikin said:"Killzone's AI: Dynamic Procedural Tactics"
The result: everyone thinks Killzone suffers from bad AI.
aaronspink said:Looking at shmoo plots from ISSCC and predicting yieldable frequencies is pretty pointless. Pentium 4s have reached well beyond 5 GHz, yet you don't see Intel selling them. Likewise, K8s have reached well in excess of 3 GHz, yet you don't see AMD selling them.
A shmoo plot is interesting in a technical sense but you have to be careful to examine the practical aspects of the ranges used to generate the shmoo plot.
The heat of both XeCPU and CELL are fairly indeterminate but if you want to believe that XeCPU throws out a lot more heat than CELL then be my guest, but you'll be doing it on jack all for data.
Look, the PPE design, which you assume has the greater clockspeed bottlenecks is pretty much the exact same design between X360 and PS3 for the most part down to the polygon level. On a given process they are both going to be within spitting distance of each other in frequency.
Aaron Spink
speaking for myself inc.
I have read that the Cell processor was designed in part to run an RTOS -- I guess that's obvious, given its gaming focus. What other embedded processors are interesting right now?
Paul: ARM is heavily used for embedded and Linux runs on both 64-bit and 32-bit ARM. There are even SMP ARM parts out there.
It really blew my mind when I first saw that -- a four-core, single-chip ARM, running at 350 and 550MHz, providing 1,440 Dhrystone MIPS, all on 600 milliwatts of power.
Of course, compare that to our PowerPC processor for Xbox which does 700 times as many floating-point operations per second as the four-core ARM does integer operations per second. It has only three cores instead of four, but it does use quite a bit more power. Still, 85 watts is well within range for a consumer device and not that long ago you couldn't buy a supercomputer that could do what PowerPC can now do, regardless of how much power you had available.
But again, the important question is "what does your application need?" If you're running off a battery, the ARM processor we just talked about is high power. You get only a few minutes of that kind of power from a D cell. On the other hand, if you have a wall outlet, 85 watts is trivial -- less than an amp.
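The two back-of-envelope figures in this answer can be worked through with a short C sketch. The function names are hypothetical. The 120 V mains voltage is an assumption (the interview does not state one), and the Dhrystone MIPS figure is treated as millions of integer operations per second, as the comparison in the interview implicitly does.

```c
/* Current in amps drawn by a load: I = P / V. */
double current_amps(double watts, double volts)
{
    return watts / volts;
}

/* Throughput implied by "N times as many operations per second". */
double implied_ops_per_sec(double baseline_ops_per_sec, double ratio)
{
    return baseline_ops_per_sec * ratio;
}
```

With the assumed 120 V supply, `current_amps(85.0, 120.0)` is about 0.71 A, consistent with "less than an amp". `implied_ops_per_sec(1440e6, 700.0)` comes to roughly 1.0e12, so the quoted ratio implies a figure on the order of one trillion floating-point operations per second.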
Fafalada said:Oh come on - for one, calling anything that assumes code composed of 100% MADDs "sustained" is more than a little silly.
And second, if we DO write an idealized benchmark that will get that kind of utilization, I can do a LOT better than 1:1 ratio with non-arithmetic ops. So it would definitely be higher than 25.6GFlop if we could truly dual-issue VMX MADDs.