Will Microsoft trump CELL by using Proximity Communication in XB720 CPU?

It's worth pointing out that HL2 actually calculates its physics on a relatively small number of objects at once. It's the way HL2 uses physics that's so cool, rather than the amount of physics work it's doing. That's where we get into the real debate IMO, which is about how Cell (or anything else) is going to change the face of gaming.


Yeah, that's no point at all. Even the 8-year-old PS2 runs Havok in games! It just depends on the number of objects involved.

I myself consider Motorstorm the current benchmark for in-game physics and effects, and I've yet to see any game on 360 or PC that comes even close to the combination of computations shown in that game: 14 highly detailed cars in play; loads of wrecks and buildings that you (and every other of those 14 cars) can drive through and crash into with correct physics; plus real-time mud with loads of muddy shader effects on it, all at a perfect framerate with AF and very good textures. And this is only a launch game!
 
Gubbi, would you be able to elaborate on PA Semi's OoO PPC design points for the uninitiated (like me!)? I'm interested to learn more.

Sure, head to their download section and read through their white papers.

Summary: a bunch of CPU-architect veterans designed a 3-way OoO core focused on power consumption while maintaining a high level of performance. The core burns only 7W @ 2GHz in a 65nm process and runs up to 2.5GHz. It is only 10mm^2 discounting the level 1 caches, but even if we more than double its size for a speculative 90nm implementation, it would still be smaller than current XCPUs.
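(Rough scaling check, assuming an idealised shrink: die area goes roughly as the square of the linear feature size, so 65nm to 90nm is (90/65)^2 ≈ 1.9x. A 10mm^2 core would come out around 19mm^2, which is why "more than double" leaves a comfortable margin.)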

PA Semi's primary objective is to displace Freescale's offerings in the networking-gear market.

IMO, this would be an ideal building block for the next XBOX CPU. All future CPUs will be limited by their power envelope, so a small, fast, low-power core seems like a good starting point. The fact that it doesn't require one to jump through hoops to program it is a bonus.

Cheers
 
That said, Intel did produce that Ice demo thingy, which I personally didn't find that impressive; but on a technical level, how does it compare?
What demo is that? If you have impressive looking things on PCs, please link to them, as I haven't seen much there!

BTW, there were some interesting points in those slides to note...
We lack context for most of the discussion. All we can be sure of is what we saw on the screen and the numbers provided. 95% of one SPE was the figure given for the rigid body dynamics.

Based on the separation of work shown in the slides, the PS3 Cell could only handle twice as many ducks, since five SPEs are already being used for other things (including the one reserved for the OS). So there is only one spare.
Take out the cloth and fluid dynamics and just have ducks. Spend all the SPEs on ducks. You could manage a whole lot more ducks!
Besides, the interaction of the ducks with each other is the least impressive thing about that demo IMO; as I said, there are "interacting blocks" demos on the PC as well with at least as many units. The impressive thing to me was the water.
Okay, I think you're missing some key info. Blocks are easy to process. They're quick. Second only to spheres, boxes let you use simple solvers. The moment you move to mesh collisions, things become a hell of a lot harder. A box has 8 vertices you need to check against other boxes for collisions. With meshes, you have hundreds of points per rigid body, and you need to check those against hundreds of points on every other rigid body it's colliding with. Thus, between a demo of 100 boxes colliding and one of 100 ducks, the 100 ducks is probably an order of magnitude harder to do.
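To put rough numbers on why, here's a minimal sketch. The vertex counts and the brute-force narrow phase are illustrative assumptions on my part, not how Havok or Novodex actually work; real engines prune far more aggressively, but the per-pair gap is the point:

Code:
#include <cstdio>

struct Vec3 { float x, y, z; };

// Cheap case: two spheres collide iff the distance between their
// centres is less than the sum of the radii; constant cost per pair.
bool sphereSphere(const Vec3& a, float ra, const Vec3& b, float rb)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    float r = ra + rb;
    return dx * dx + dy * dy + dz * dz < r * r;
}

// Expensive case: naive mesh-vs-mesh proximity, every vertex of one
// body against every vertex of the other; O(n * m) per body pair.
int closeVertexPairs(const Vec3* va, int n, const Vec3* vb, int m, float eps)
{
    int hits = 0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < m; ++j) {
            float dx = va[i].x - vb[j].x;
            float dy = va[i].y - vb[j].y;
            float dz = va[i].z - vb[j].z;
            if (dx * dx + dy * dy + dz * dz < eps * eps) ++hits;
        }
    return hits;
}

int main()
{
    // Per colliding pair: a box has 8 vertices (8 * 8 = 64 checks);
    // a duck mesh of ~300 vertices needs 300 * 300 = 90,000 checks.
    std::printf("box pair: %d checks, duck pair: %d checks\n", 8 * 8, 300 * 300);
    return 0;
}

(With a convex box you'd really use a separating-axis test rather than raw vertex checks, which is cheaper still; the counting above is just to show the scale of the difference.)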

Now it's not clear that the LOD demo was using mesh collisions. They could have used multiple spheres to define a collision volume. However, looking closely at the demo, I notice no mesh penetrations at all, so I believe it's using mesh collision. The use of the same mesh for every body fits in nicely with Cell, because the mesh will fit into LS, eliminating slow main-RAM access, which would explain a lot.
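(Back-of-envelope, with a guessed vertex count since the slides don't give one: a duck mesh of a few hundred vertices at 12-16 bytes apiece is well under 10KB, so the shared mesh plus working state sits comfortably inside an SPE's 256KB local store.)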

I can't categorically state that a dual-core x86 can't handle this; I've no idea how well its cache would manage the data structures. It actually wouldn't be a hard test to create, but I don't have a dual-core CPU or Novodex/Havok to create a test myself!

I point to the CryEngine 2 gameplay demo because it's actually in a game, so its relevance is clear.
A hundred rocks flying through the air is a doddle to solve. A hundred rocks piled high isn't. It's the chained collisions that really push a physics solver, and Crysis doesn't have a huge amount of that. Of course, it's running a game too. If they managed LOD-type physics along with the rest of the game, that would end any argument about Cell's superiority right there!

Obviously Cell has a long way to go and we may yet see the evidence that I'm speaking of, but all I'm saying is that so far, to me at least, there is nothing out there that conclusively proves Cell is well beyond a powerful multicore x86 as a CPU in a games machine.
You'll probably never see that. As Joshua explained in one of these threads, if one processor isn't as good as another, you just scale down. If Cell can handle cloth 4x faster than a P4, you only need to quarter the resolution of your cloth and the P4 runs at the same speed. The game will look as good until you really start analysing it. Likewise, if the Cell LOD demo runs with mesh collisions and a dual-core P4 can't do that, you could switch to a clever layout of sphere collision objects and get the same impression from less work. So actually looking for better performance might yield very little insight. Physics and animation are tied to the same problem of diminishing returns as everything else. At some point, a 4x better physics processor won't actually appear to be 4x better.
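(Worked example of the scale-down, with illustrative grid sizes: if cloth cost is roughly proportional to node count, a 64x64 grid is 4,096 nodes while a 32x32 grid is 1,024 nodes, exactly a quarter of the work for half the resolution in each direction.)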

I actually hadn't seen that demonstration, but it does sound like the kind of evidence I'm looking for. Do you have a link? Assuming we are talking about relatively equal levels of optimisation and multithreading, that certainly does sound like compelling evidence.
http://forum.beyond3d.com/showthread.php?t=24061&highlight=alias+wavefront+maya+cell+cloth

Also, here's the thread I remember about PhysX on Cell. Somewhere along the line, Ageia said Cell would handle PhysX almost as well (in the same ball-park, anyway) as their PhysX processor.

Another example is the developer who mentioned not that long ago a fluid-dynamics engine they could create on Cell that'll be a download title. Can't remember the details or find the B3D talk on it, though.

There's plenty of reason to think Cell can manage impressive things. Whether they're things that'll matter is another debate. And tying this in with the original discussion (!): will MS want a Cell-type CPU or a multicore x86? It depends a lot on what happens on PS3, because there we have Cell with lots of potential. If that potential makes a difference to the games, MS will want a fast vector-throughput processor. If it doesn't make much difference, MS can be content to go with the simpler multicore design and make XBox developers' lives easier (in Sony's case, switching to something like multicore x86 would make their devs' lives harder, as they'll be used to Cell rather than x86).
 
BTW, there were some interesting points in those slides to note. Slide 29 shows a demo that I was running on my XP machine years ago; obviously I don't know the context of its involvement in that presentation, but it's worth pointing out. Slide 44 shows a comparison of PPE to SPE performance, with four SPEs only being about 3 times faster in the best case. Again, we don't know the context, and I would be surprised if far bigger gains can't be seen in other situations, but given how the PPE compares to a modern x86 core, that's not particularly impressive. Also, slide 47 states that there were early PC prototypes of some pieces of the duck demo. Who knows what performance was like, or whether they were using one or multiple cores, but it does suggest that it's not completely outside the realms of possibility for an x86.

In slide 44, it is unclear (to me) whether the benchmark exercises the PPU's VMX unit or its scalar units (I don't know enough). If it's the former, the PPU's unoptimized scalar performance should not come into play. The 4-SPU tests may run on 4 SIMD engines, and so on.

Also, early prototypes running on x86 say nothing about the scope of the early prototype. It might be an apples-to-oranges comparison.

I would like to make it clear at this point, though, that I'm not trying to claim Cell wouldn't be better than a dual core at physics. I think it probably is; my argument is against this whole "order of magnitude" or more difference between x86 (almost regardless of how many cores are involved) and Cell. I think over the course of PS3's life we will see little, if anything, that couldn't be done the same, or very similarly, on a powerful multi-core x86 based CPU. Whether the number of cores is 2, 4, or 8, though, I'm not quite sure yet.

Perhaps, although I question the motivation for comparing an 8-core server-class CPU with a console CPU. If it's an "open-ended" comparison, we all know that, given enough money, time and supporting gear, some x86 CPU can match or outrun the Cell implementation in PS3 in various scenarios. If not, you can always bank on a future revision. But there will be other Cell variants for direct comparison with those CPUs. I don't know where we will stop.

For now, Cell needs to run cool and cheap in a small package. PS3 will likely shrink (like PS2 Slim, though probably not as small), run cooler and cost less moving forward.

Obviously Cell has a long way to go and we may yet see the evidence that I'm speaking of, but all I'm saying is that so far, to me at least, there is nothing out there that conclusively proves Cell is well beyond a powerful multicore x86 as a CPU in a games machine.

Define "games machine" to put things in perspective. There are other issues that render PC gaming less attractive for me (like price, noise, size... besides the game library).

A powerful multicore x86 may be irrelevant in this context unless it can run within the cost, space and heat envelope a console demands today, and still maintain the same performance specs on its marketing collateral.

Beyond this, who knows what innovation will get rev'ed into future Cells, just as with future x86 revisions?

Conversely, isn't it impressive that the very same console CPU implementation can run intensive work applications better than its peers, and occasionally power the world's fastest supercomputer?

But alas, yes, we have not seen the best of Cell in gaming yet. It's only been four months since release, and we are only getting a more complete SDK 1.6 in March 2007. It certainly tests one's patience.
 
Perhaps, although I question the motivation for comparing an 8-core server-class CPU with a console CPU. If it's an "open-ended" comparison, we all know that, given enough money, time and supporting gear, some x86 CPU can match or outrun the Cell implementation in PS3 in various scenarios. If not, you can always bank on a future revision. But there will be other Cell variants for direct comparison with those CPUs. I don't know where we will stop.

I expect we will see 8 cores on the desktop fairly soon; if not by the end of this year, then certainly some time in 2008. Obviously these processors have a far bigger budget in just about every area than Cell, so it's not a fair comparison in terms of which is the more efficient architecture for gaming, but my reasoning for bringing it up was to highlight that x86 as a gaming CPU isn't necessarily so far behind Cell in the PS3 in performance terms.

Perhaps a dual core is already as good for that workload, maybe not, but 8-core desktops will be available within 2 years of PS3's launch. So if an 8-core is what it takes to match Cell's performance for that workload, then it's less than 2 years behind, while some claims would have us believe it is a decade or more behind, as perhaps it is in some non-gaming applications (like the BFS).

Define "games machine" to put things in perspective. There are other issues that render PC gaming less attractive for me (like price, noise, size... besides the game library).

A system built to play games as one of its primary functions, I guess. Regardless of the other factors (which may or may not be issues depending on your circumstances), the reason I stress the point is that we already know Cell can be much better for certain workloads, so that's not in dispute. My argument is whether or not its performance advantage in F@H, BFS and all the other scientific apps we have seen it excel in actually transfers into running a game. Hence I emphasised the "as part of a games machine" point.
 
Yeah, that's no point at all. Even the 8-year-old PS2 runs Havok in games! It just depends on the number of objects involved.

I myself consider Motorstorm the current benchmark for in-game physics and effects, and I've yet to see any game on 360 or PC that comes even close to the combination of computations shown in that game: 14 highly detailed cars in play; loads of wrecks and buildings that you (and every other of those 14 cars) can drive through and crash into with correct physics; plus real-time mud with loads of muddy shader effects on it, all at a perfect framerate with AF and very good textures. And this is only a launch game!

Motorstorm's physics is good, but it can't hold up against CoH. Yep, that's right: Company of Heroes. It has physics calculations for everything. My modded version (I made it myself; it alters graphics + physics) allows 600+ objects to have unique physics, and it sure looks great to see debris and vehicle parts fly everywhere (I counted 300+ parts on screen in a medium-sized tank battle).
Drive over anything with a tank and see how both it and the tank react with realistic physics. Visual effects are also top notch in CoH and I've yet to see another game match it (released, that is, of course!). ;)


Code:
HighSpec =
	{
		numAllowedOrphans = 600,      -- max detached/orphaned physics objects
		numAllowedOrphansAlt1 = 300,
		numAllowedPhyFX = 600,        -- max objects with unique physics FX
		numAllowedPhantoms = 100,
		desiredStepFrequency = 120,   -- Hz
	}
 
I expect we will see 8 cores on the desktop fairly soon; if not by the end of this year, then certainly some time in 2008. Obviously these processors have a far bigger budget in just about every area than Cell, so it's not a fair comparison in terms of which is the more efficient architecture for gaming, but my reasoning for bringing it up was to highlight that x86 as a gaming CPU isn't necessarily so far behind Cell in the PS3 in performance terms.

But it's about efficiency to begin with, isn't it (i.e., it's not a feature that can be ignored)? In my view, it's about reality and needs.

In 2008, PS3 should be cheaper; sometime further out, even smaller. Cell needs sufficient legroom to get there while maintaining high performance. Efficiency is one of the key design points.

High-end x86 will be irrelevant if priced high (regardless of how powerful it is). It is still unclear what form of x86 could fit into the PS3 form factor and its more stringent power/space requirements in subsequent years.

If it's not clear by now, I am also trying to pre-empt "silly" multi-Cell desktop configurations, or future-Cell conjectures... since budget is not an issue in a fictitious world.

It may be a more fruitful exercise to discuss whether, and what, x86 can deliver in the console space (more on-topic!).

Perhaps a dual core is already as good for that workload, maybe not, but 8-core desktops will be available within 2 years of PS3's launch. So if an 8-core is what it takes to match Cell's performance for that workload, then it's less than 2 years behind, while some claims would have us believe it is a decade or more behind, as perhaps it is in some non-gaming applications (like the BFS).

What decade-long gap are you referring to? I have not heard that line of thought before.

My argument is whether or not its performance advantage in F@H, BFS and all the other scientific apps we have seen it excel in actually transfers into running a game. Hence I emphasised the "as part of a games machine" point.

As described, it's the total package: not only the performance advantage, but the power, heat and space/cost savings. As for its gaming potential, I think there are other factors in play (e.g., time, money, content pipeline, network bottlenecks, ease-of-gaming precluding difficult AI, ...). It's tough, but no one said it's going to be easy.

Even then, I don't see an absolute barrier yet. There is no argument, just uncertainty since no one knows for sure.
 
What demo is that? If you have impressive looking things on PCs, please link to them, as I haven't seen much there!

Ice Storm Fighters, that was it:

http://www.intelcapabilitiesforum.net/ISF_demo?s=a

Although this demo is stressing a lot more than just physics, so it's not really comparable to the duck demo.

Take out the cloth and fluid dynamics and just have ducks. Spend all the SPEs on ducks. You could manage a whole lot more ducks!
Okay, I think you're missing some key info. Blocks are easy to process. They're quick. Second only to spheres, boxes let you use simple solvers. The moment you move to mesh collisions, things become a hell of a lot harder. A box has 8 vertices you need to check against other boxes for collisions. With meshes, you have hundreds of points per rigid body, and you need to check those against hundreds of points on every other rigid body it's colliding with. Thus, between a demo of 100 boxes colliding and one of 100 ducks, the 100 ducks is probably an order of magnitude harder to do.

The Meqon physics Demo is actually a pretty good test of this:

http://www.fileplanet.com/146070/140000/fileinfo/Meqon-Physics-Demo

There is a ragdoll test which bears some similarities to the duck test, but with this we are probably talking about an even more difficult problem, as each ragdoll has arms and legs which flail about. It certainly takes a hell of a lot of them (in the hundreds) to bring my Core 2 E6400 to its knees. It scales across both cores as well, so it's actually a very nice little test :D . There's also a water demo in there that's very similar to the duck one. Fewer objects in the water, but it behaves pretty much the same and performs superbly.

There are other demos which are simpler (balls in a bowl, for example) which don't even begin to stress my CPU, but they seem to demonstrate pretty comparable rigid body physics to the LOD demo.



Thanks, that's actually the type of thing I'm looking for. I think it's fairly clear that this is relevant to gaming physics and thus would provide a real-world advantage. It's worth pointing out, though, that the performance advantage at those speeds, accounting for the number of SPEs used, means it wouldn't be much faster than a current quad core (if at all), assuming performance scales well, and it would probably be slower than an 8-core CPU. The Meqon demo above also has a cloth simulator, btw; runs great on my dual core ;)


Also, here's the thread I remember about PhysX on Cell. Somewhere along the line, Ageia said Cell would handle PhysX almost as well (in the same ball-park, anyway) as their PhysX processor.


Yes, but without knowing how the PhysX PPU performs, that doesn't tell us an awful lot. The demos and rare game examples we have seen so far have been less impressive IMO than what has been achieved through pure software (except perhaps CellFactor, but as I mentioned, you can run that in software too).
 
I expect we will see 8 cores on the desktop fairly soon; if not by the end of this year, then certainly some time in 2008. Obviously these processors have a far bigger budget in just about every area than Cell, so it's not a fair comparison in terms of which is the more efficient architecture for gaming, but my reasoning for bringing it up was to highlight that x86 as a gaming CPU isn't necessarily so far behind Cell in the PS3 in performance terms.
Intel's Nehalem is 1-8 cores and produced at 45nm, but its introduction is sometime in 2008, and the 8-core version won't be available for the desktop for a while. Also, the 8-core version is likely to be just a dual-die CPU with 2 quad-core processors in one package, just like the Pentium D. As for performance, you shouldn't forget memory latency, where Cell has an advantage.
Motorstorm's physics is good, but it can't hold up against CoH. Yep, that's right: Company of Heroes. It has physics calculations for everything. My modded version (I made it myself; it alters graphics + physics) allows 600+ objects to have unique physics, and it sure looks great to see debris and vehicle parts fly everywhere (I counted 300+ parts on screen in a medium-sized tank battle).
Cool, but does it allow different objects to have different physical properties?
 
Cool, but does it allow different objects to have different physical properties?

Of course; it's based on material, weight, size, and a lot more. Wooden planks will fly away and split hard when arty is deployed, compared to stones, which roll, fly away, or get destroyed depending on size and type (e.g. wall rocks, cliff rocks and so on). In short, all objects have their own physics properties (even debris). :smile:
 
This thread is all over the place, and has gone through like five topic iterations at this point.

1) It started with links to articles from 2003, tied said articles to potential future efforts for the 720, and raised the prospect of interval-centric computing. Well, as for the interval computing thing, we might as well be talking about quantum computing or any other fundamental paradigm shift; i.e. we have no idea when it will happen, and as such there's no use discussing its relevance to consoles.

In terms of the memory-centric advances, it would be senseless to ignore that Cell seeks to address these issues as well; indeed, its memory performance is one of its key differentiating features, and the reason for the moniker "Broadband Engine." It's not as overarching as what Sun seems set on achieving, but then again it's been achieved, whereas Sun's efforts are still MIA.

2) It went on to the Monarch chip. My first thought was along the lines of... "what?" I'm not sure if the idea of that tangent was to show Cell as having "superior" competition, to indicate that MS might be working with Raytheon for their next console, or some other reason... but again, if sensors are what Raytheon is targeting with Monarch, they had better get on it. It seems promising, but in the world of defense, Cell is already garnering a lot of attention in that role.

3) Then the potential architectures for what the next consoles might contain were discussed on a more fundamental level - GPGPU, Fusion, etc etc...

We just can't know or begin to guess right now. A Cell variant in PS4 seems at this time to be the single "safe" bet to make, but even that is not a certainty. Between now and the onset of the next generation, we have the advent of same-die CPU/GPU chips, the move towards utilizing GPUs for more general computing, and Intel's efforts to move graphics onto x86-extensible sub-processors via Larrabee. Between all of that, potential mergers/buyouts in the computing space, architectural failures/implosions, and any one of a number of projects potentially pulling away outright and changing the game, how can we make guesses in 2007 about what will happen, when we know things between now and 2009 are going to be big?

4) It went to some weird tangent about "which would you prefer, 4 CPUs or 4 GPUs?", which again gets a "what?" from me. Since when were chip budgets viewed outside the measure of actual physical area? The number of cores... toss that aside. Just pretend each console will have two chips roughly ~200mm^2 in surface area; whatever makes the most sense in that context performance-wise is what we should be discussing, not some arbitrary "more GPU vs more CPU" scenario. It's worth noting as an aside that in the current G80 generation on the desktop, CPUs are again the bottleneck; there has to be balance. It may never be perfect, but you can't just leave the CPU behind entirely and expect good next-gen performance by only jacking up the GPU. Both the GPUs and CPUs will be improved; why act as if it's mutually exclusive?

5) And now the conversation has gone through the ever-reviled "game code/general code" iteration (oh god, make it stop) and has landed in some physics discussion. I'm not sure... what else can be said? Cell is better at physics than the present crop of x86 processors. It doesn't matter if people point to Crysis or Company of Heroes and say "where's Cell?" It is what it is, and pointing to what's been done so far is no indicator of what can be done. If the discussion is going to be about games, make it about games. If it's going to be about processors, make it about processors; they're not the same thing. Hell, Company of Heroes is back on Havok 3 vs. the present Havok 4.5... so imagine the glory that Havok 4 could bring to it. But again, these are software implementations; they don't by themselves indicate which architecture is stronger in a given field. That Cell is better than x86 at operations like physics is not even in dispute IMO, and animated GIFs and physics-centric RTSes don't change that reality.
 
Sure, head to their download section and read through their white papers.

Summary: a bunch of CPU-architect veterans designed a 3-way OoO core focused on power consumption while maintaining a high level of performance. The core burns only 7W @ 2GHz in a 65nm process and runs up to 2.5GHz. It is only 10mm^2 discounting the level 1 caches, but even if we more than double its size for a speculative 90nm implementation, it would still be smaller than current XCPUs.

PA Semi's primary objective is to displace Freescale's offerings in the networking-gear market.

IMO, this would be an ideal building block for the next XBOX CPU. All future CPUs will be limited by their power envelope, so a small, fast, low-power core seems like a good starting point. The fact that it doesn't require one to jump through hoops to program it is a bonus.

Cheers
I'm under the impression that MS either made a bad choice and wanted to push high frequency for PR reasons, or got shafted by IBM.

What would be the size of one of these cores with 128 registers for the VMX units? 25mm²? Close to that, anyway.

For almost the same die size as Xenon, MS could have got:
4 cores
almost 2MB of L2 cache
OoO execution
2GHz
~80 GFLOPS (one 3.2GHz XCPU is ~30 GFLOPS, so ~20 GFLOPS at ~2GHz, times 4; don't hold me to the accuracy, lol... it's just a ballpark figure; rough math below)
This would have destroyed Xenon performance-wise in every situation...
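(Sanity check on that ballpark, assuming one vec4 single-precision FMADD per VMX unit per cycle, i.e. 8 flops/cycle: 8 x 3.2GHz ≈ 25.6 GFLOPS per core (the "~30"), 8 x 2GHz = 16 GFLOPS (the "~20"), so four such cores land in the 64-80 GFLOPS range depending on rounding.)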

I don't know if MS was disappointed with Xenon or if it was simply aiming at big PR numbers (i.e. 3.2GHz, etc.).

EDIT: I don't feel like going through all these white papers (sorry for the laziness). Gubbi, do these power-efficient PPC cores allow SMT?
 
Sure, head to their download section and read through their white papers.

Summary: a bunch of CPU-architect veterans designed a 3-way OoO core focused on power consumption while maintaining a high level of performance. The core burns only 7W @ 2GHz in a 65nm process and runs up to 2.5GHz. It is only 10mm^2 discounting the level 1 caches, but even if we more than double its size for a speculative 90nm implementation, it would still be smaller than current XCPUs.

There's a big difference between typical and max power usage. Max is 25W, which is the more meaningful number. This would have increased at 90nm, and clockspeeds may drop too.

PA Semi's primary objective is to displace Freescale's offerings in the networking-gear market.

IMO, this would be an ideal building block for the next XBOX CPU. All future CPUs will be limited by their power envelope, so a small, fast, low-power core seems like a good starting point. The fact that it doesn't require one to jump through hoops to program it is a bonus.

Cheers

A lot of low-power cores tend to have poor FP performance. It's not clear how good FP performance is for these chips.
 
I'm under the impression that MS either made a bad choice and wanted to push high frequency for PR reasons, or got shafted by IBM.

What would be the size of one of these cores with 128 registers for the VMX units? 25mm²? Close to that, anyway.

For almost the same die size as Xenon, MS could have got:
4 cores
almost 2MB of L2 cache
OoO execution
2GHz
~80 GFLOPS (one 3.2GHz XCPU is ~30 GFLOPS, so ~20 GFLOPS at ~2GHz, times 4; don't hold me to the accuracy, lol... it's just a ballpark figure)
This would have destroyed Xenon performance-wise in every situation...

I don't know if MS was disappointed with Xenon or if it was simply aiming at big PR numbers (i.e. 3.2GHz, etc.).

EDIT: I don't feel like going through all these white papers (sorry for the laziness). Gubbi, do these power-efficient PPC cores allow SMT?

I've looked over the slides and something doesn't add up: they say it's 200M transistors with 1156 pins. That's a pretty big chip with a big package. I can't find a reference to die size at all, except on Wikipedia.

I would venture it is 100-120mm^2 at 65nm. If this is the case, then only needing 25W at 2GHz would be pretty impressive, but die-size-wise and cost-wise it could not be a Xenon replacement. Plus it's not coming out till Q4 2007, which means there's plenty of time for it to fall short (highly impressive performance and power-consumption claims tend to do that ;)).

FP performance isn't too bad either, now that I've found it. There seems to be a separate FP unit and a VMX unit. The FP unit is double-precision only and does a scalar MADD (2 DP FLOPS per cycle). The VMX unit can do a vec4 single-precision MADD. Since it's a dual-core, total performance at 2GHz is 8 DP GFLOPS and 32 SP GFLOPS. I'm not sure if each core can use both units at the same time.
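(The arithmetic behind those totals, assuming one MADD issued per unit per cycle as the figures imply: scalar DP MADD = 2 flops/cycle x 2GHz x 2 cores = 8 DP GFLOPS; vec4 SP MADD = 8 flops/cycle x 2GHz x 2 cores = 32 SP GFLOPS.)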
 
It's an SOC, so there's a lot of other stuff on the same die. Are your figures taking that into account? A console version would likely not have all of that, or need all the IO pins.
 
It's an SOC, so there's a lot of other stuff on the same die. Are your figures taking that into account? A console version would likely not have all of that, or need all the IO pins.

http://hightech.lbl.gov/DCTraining/docs/server-conference/pbannon-case-study.pdf
http://www.pasemi.com/downloads/PA_Semi_ISSCC_2007.pdf

Eyeballing the block diagram, the 2 main cores use up just under half the chip, with the memory controller and L2 cache eating another third. The rest is SOC components and IO. I imagine this thing could be 70-80mm^2 if you tried to make it as small as possible, but that's still pretty sizable.
 
There's a big difference between typical and max power usage. Max is 25W, which is the more meaningful number. This would have increased at 90nm, and clockspeeds may drop too.

25W is the max for the entire dual-core SOC at 65nm @ 2GHz; 7W is for a single core.

A lot of low-power cores tend to have poor FP performance. It's not clear how good FP performance is for these chips.

From their white paper: >2000 SPECfp. That is with compiled generic code (which is miles ahead of what the PPU and the XCPU can do).

Single-precision throughput is 4 FMADDs per cycle (8 flops), so 16 GFLOPS per core @ 2GHz.

Cheers
 
http://hightech.lbl.gov/DCTraining/docs/server-conference/pbannon-case-study.pdf
http://www.pasemi.com/downloads/PA_Semi_ISSCC_2007.pdf

Eyeballing the block diagram, the 2 main cores use up just under half the chip, with the memory controller and L2 cache eating another third. The rest is SOC components and IO. I imagine this thing could be 70-80mm^2 if you tried to make it as small as possible, but that's still pretty sizable.

Going by the die photo, they take up less than 30%.

And in this presentation they compare the core size at 65nm to that of the StrongARM at 0.65um, which is ~10mm^2.

Cheers
 
What would be the size of one of these cores with 128 registers for the VMX units? 25mm²? Close to that, anyway.

I'm guessing it already has 96 registers: the 32 architected registers plus one extra for each instruction-scheduler slot (which would imply a ~64-entry window). The core would probably come in below 20-25mm^2 per core in 90nm. The primary reason for the 128 VMX registers in the current XCPU is to allow for unrolling/manual scheduling around data-dependency hazards.

Gubbi, do these power-efficient PPC cores allow SMT?

Nope. In order to support SMT you'd need to double the instruction scheduler (have two, one for each context) and double the register file. To avoid thrashing, the caches would have to be bigger (or at least have higher associativity). All of these expansions would make those structures slower, and the only benefit would be higher utilization of the execution units.

Looking at page 13 of this, it is quite obvious that the execution units are only around 25% of a core, so it makes complete sense to just duplicate the entire core and actually get double the thread performance.

Cheers
 