CELL Patents (J Kahle): APU, PU, DMAC, Cache interactions?

Panajev2001a said:
Jaws said:
Panajev2001a said:
Also note that by taking these unit areas from the PSX core, we're also inheriting the areas of the datapaths etc., which would scale with our calculations.

The problem they are seeing right now is that routing logic and wires do not scale as fast as you would want them to, and they are in any case a big limit on clock-speed scaling in future chips.

http://realworldtech.com/page.cfm?ArticleID=RWT062004172947

If Cell is a VLIW processor, then we'd see most of the scheduling/control logic removed from the hardware and put into software (a clever compiler), no?
....
The PU will have to juggle tons of threads to keep the system running efficiently, and a good thread scheduler will need a nice and fast PU to do its job.

A superscalar PU with a good chunk of cache?

Panajev2001a said:
....
I will even grant you that yields on such a 300 mm^2 beast would not be so bad: the degree of redundancy you have helps you.

You have lots of repeating blocks: SRAM cells, DRAM cells, APUs, PUs, DMACs.

And that would also apply to the GPU, including their repeating Salc/ Salps...

Panajev2001a said:
Plus, and this you forgot, you can add a few more blocks just to increase yields: if you put 9 APUs per PE it will be much more probable to have working PEs with 8 fully functional APUs.

Ahhh yes... an advantageous case where increasing die space would actually increase yields! Nice... 8)
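The spare-APU yield argument can be made concrete with a simple binomial model. This is only an illustrative sketch: the 90% per-APU yield figure below is an assumption for the example, not anything from the patents or the thread.

```python
from math import comb

def pe_yield(n_apus, n_needed, p_apu):
    """Probability that at least n_needed of n_apus APUs work,
    assuming independent, identically likely APU defects."""
    return sum(comb(n_apus, k) * p_apu**k * (1 - p_apu)**(n_apus - k)
               for k in range(n_needed, n_apus + 1))

p = 0.90  # hypothetical yield of one APU

# 8 APUs, all 8 must work: ~43% of PEs usable
print(round(pe_yield(8, 8, p), 3))
# 9 APUs, any 8 must work: ~77% of PEs usable
print(round(pe_yield(9, 8, p), 3))
```

So under this toy model, one spare APU nearly doubles the fraction of usable PEs, which is the "more die area, better yield" effect being described.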
 
Wouldn't a VLIW compiler deal with that?

In this case: NO.

Each APU is its own processor, with its own main RAM (Local Storage) and access to a DMAC.

VLIW is a technology in which, basically, each instruction specifies both what should be done and how it should be done by the CPU executing it, since the CPU's execution units and resources are exposed to the compiler.

You are saying that basically a compiler could take care of threading and synchronization work in a large SMP system.


Posted: Mon Aug 23, 2004 11:54 am
Panajev2001a wrote:
Jaws wrote:
Panajev2001a wrote:
Quote:
Scared of multi-processors!


No, I want better cache for the APUs and the PU.

I want more general purpose performance:


Sounds like you'd prefer the Xe CPU! Razz


Are they that "incredibly" different ? Wink Cough...


You might want that cough seeing to! Wink If they are not that fundamentally different, then what is the real agenda behind Cell and all this investment? Why not just ask IBM for 4 dual-issue PowerPC 970/980s to burn into a 65nm CPU and perhaps save some cash?

The three CPU cores in Xenon CPU are not PPC970 or hypothetical PPC976 derived from POWER 5.

They are dual-issue processors (a G5 is more like an 8-issue CPU IIRC [8 instructions fetched, broken if possible into smaller instructions, grouped into 5 groups for the Out-of-Order logic and then sent to the 8 execution units]): simple yet effective (how else do you think they can be pushed to over 3 GHz in a tri-core configuration?).

The G5 has not reached 3 GHz with a single core and a single, simpler VMX unit.
 
Fafalada said:
.....
I wouldn't expect anything silly like the 64MB eDram pool some have suggested on TOP of APU local storage.

Meh... didn't you see that die image of the PSX core, with 4 MB eDRAM at 130nm and the rest of the core at 90nm, at only 86mm2 die area, and my attempt to show a BE ~ 300 mm2 at 65 nm? ;)

passerby said:
1c contribution.

Some time ago there was a report of a near-TF (or multi-100 GF) CPU created by an institute - was it based in Israel?

IIRC, wasn't this the optical chip from Intel?

passerby said:
Hey, as another really good example, the newest video cards only clock in the half-GHz range, and can still post high GF numbers. :D

IIRC, 3Dlabs Wildcat Realizm 800 claimed 700 GFlops, 3DLabs Flops! ;) ... Source..

passerby said:
My current guess:
Each APU is just in the sub-1GHz regions - maybe even much lower. How they claim high GF numbers - well we'll wait and see. After all we don't know much about what an APU is - just lots of papers about how APUs deploy and work together, but nothing technical on an APU itself.

Sub-1 GHz is quite low... is that for 32 APUs?

Yeah... we've kinda postulated that with an APU at 4GHz we'd get 8 Flops per cycle, but this hasn't really been confirmed. What if they could do 16 or 32 Flops per cycle? Then we wouldn't need to aim as high as 4 GHz! 8)
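The trade-off Jaws is describing is just peak-FLOPS arithmetic: clock rate times FLOPs per cycle times APU count. A quick sketch, where every number is the thread's own speculation rather than a confirmed spec:

```python
def peak_gflops(clock_ghz, flops_per_cycle, n_apus):
    # peak throughput = clock * FLOPs per cycle per APU * number of APUs
    return clock_ghz * flops_per_cycle * n_apus

print(peak_gflops(4.0, 8, 32))   # 1024.0 GFLOPS: the "1 TFLOPS at 4 GHz" case
print(peak_gflops(1.0, 32, 32))  # 1024.0 GFLOPS again, at a quarter the clock
```

Wider APUs at a lower clock hit the same headline number, which is exactly why the per-cycle throughput matters as much as the 4 GHz target.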

Panajev2001a said:
....
Fine by me as that is what I expect: 1 PE at 4 GHz with 8 APUs per PE.

Panajev, what bugs me about that is that if we could show 4 PEs at 65 nm ~ 300 mm2 die... then we could achieve 1 PE at 90 nm ~ 150 mm2 die! So why are STI so heavily investing in 65 nm and 45 nm :?: What's the big rush to lower process then :?:
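The die-size arithmetic behind that claim is simple ideal scaling, where area goes with the square of the feature size; real chips scale worse, since wires and pads don't shrink fully. A sketch using the thread's own numbers:

```python
def scale_area(area_mm2, from_nm, to_nm):
    # idealized optical-shrink scaling: area ~ (feature size)^2
    return area_mm2 * (to_nm / from_nm) ** 2

pe_at_65nm = 300 / 4                    # 4 PEs ~ 300 mm^2 -> ~75 mm^2 per PE
print(scale_area(pe_at_65nm, 65, 90))   # ~144 mm^2, close to the ~150 claimed
```

So the "1 PE at 90 nm ~ 150 mm^2" figure follows from the "4 PEs at 65 nm ~ 300 mm^2" estimate under this optimistic model.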
 
Jaws,

from the calculations we saw so far it is a bit more than 300 mm^2.

If die-size is not a concern for them, they can certainly go for 4 PEs and 1 GHz of clock speed, or 2 PEs and 2 GHz of clock speed.

4 PEs and 4 GHz of clock-speed makes for a more expensive chip which produces a lot of heat.

Why invest in 65 nm and 45 nm manufacturing processes? To have, as a group, a very important asset to compete with the other major semiconductor players, and to save as much money as they can on each chip: that lets them increase their profit margin while still delivering a powerful chip, without worrying too much if Microsoft abuses its enormous capital and goes into a relatively nasty price war.

Even with PlayStation 2: if Sony had not run into problems with their 180 nm GS, they would never have released that 279 mm^2 beast that is the 250 nm GS.

Developing their own Sony LIVE network, keeping 3-4 consoles alive, and pushing initiatives like OpenGL ES, OpenMAX, COLLADA and other major software efforts to sustain PlayStation 3 and CELL... there is money that needs to be spent there.

Plus, you have not seen their GPU yet ;).
 
Panajev2001a said:
Wouldn't a VLIW compiler deal with that?
.....
You are saying that basically a compiler could take care of threading and synchronization work in a large SMP system.

Yes, some sort of hybrid VLIW-type compiler. The reason I say that is that if they plan on any success for their distributed computing / cyberworld 'pie in the sky' dreams, you would need something like that, or you would get lost in a forest of PEs. And if they plan to carry this ISA forward to PS4, how about dealing with a 40 PE system that has 320 APUs! :oops: ...Of course they could've ditched these aspirations...


Panajev2001a said:
The three CPU cores in Xenon CPU are not PPC970 or hypothetical PPC976 derived from POWER 5.

They are dual-issue processors....

The G5 has not reached 3 GHz with a single core and a single, simpler VMX unit.

Well, I thought the PPC 970/976/980s were derived from Power4 but at different processes... 130/90/65 nm respectively... but I may be mistaken... and I thought the Xe CPU was a modification of those cores to 3 cores that are dual issue each ~ 6 threads per CPU... but I prolly need to re-read the Xenon specs again! :p

But my point was they could've saved a lot of hassle by going the Xenon CPU route...

Panajev2001a said:
Why invest in 65 nm and 45 nm manufacturing processes? To have, as a group, a very important asset to compete with the other major semiconductor players, and to save as much money as they can on each chip: that lets them increase their profit margin while still delivering a powerful chip, without worrying too much if Microsoft abuses its enormous capital and goes into a relatively nasty price war.

Yes that would also make sense, but when has Sony been conservative recently! Just look at the PSP specs...


Panajev2001a said:
Plus, you have not seen their GPU yet ;).

Yummy... :devilish:
 
Some semi-far-out desires:

I hope Sony can extend the life of the PS3 by having the ability to connect multiple PS3s together directly, physically (not over the internet), by stacking them somehow (like the Sega Genesis on top of the Sega CD, or SGI machines), so that you can share the processing resources of all the PS3s you have to increase power for realtime applications. The Sega CD did this with the Genesis. SGI machines do this. Graphics cards do this. I know it sounds wild, but not as wild as sharing processing resources over the internet in realtime (which seems impossible). If PS3s come down to $100 4-5 years after launch, this might be feasible.

The PS4 comes along 7-8 years after the PS3. The PS4 gets to use new optical processors / optical technology to allow for effects and rendering techniques previously impossible because of the limits of normal silicon: everything like raytracing and global illumination... The PS4 would be able to do current film-quality CGI in games, almost anyway, leaving only one final generation needed (PS5/Xbox4/N7) to bring graphics to the point of not needing any more advancement.


ok, back to something much more knowable and likely... if the PS3 uses 32 to 64 APUs as the patents (Sony's) and reports (Mercury News) suggest, and if the PS4 comes out within the standard 5-6 years and uses newer Cell-based technology, we could be looking at 512+ APUs or some such figure.

The PS3 *will* have DOZENS of processors & sub-processors (the PUs, the APUs, the elements within each Pixel Engine, etc.)... and then the PS4 will have HUNDREDS of more powerful processors & sub-processors.
 
Jaws said:
Panajev2001a said:
Wouldn't a VLIW compiler deal with that?
.....
You are saying that basically a compiler could take care of threading and synchronization work in a large SMP system.

Yes, some sort of hybrid VLIW-type compiler. The reason I say that is that if they plan on any success for their distributed computing / cyberworld 'pie in the sky' dreams, you would need something like that, or you would get lost in a forest of PEs. And if they plan to carry this ISA forward to PS4, how about dealing with a 40 PE system that has 320 APUs! :oops: ...Of course they could've ditched these aspirations...

The ISA should not contain info about the number of APUs in a system.

Such a thing is beyond retarded: it is like fixing the number of execution units in an ISA.

EPIC in fact has already taken care of it: the compiler knows in advance the kind of resources IA-64 machines have (in terms of registers, and of instruction bundles that are legal or not [which do depend on the execution units of the CPU: if you do not have three units capable of doing Branch-type operations you won't be able to schedule a BBB bundle, for example, but will break it into different bundles, etc.]): this imposes no limit on the number of bundles you can process in any given cycle, or on the number of cores you can have.

The compiler (EPIC traces its roots back to VLIW) does not deal with issues inherent to SMP machines.

A compiler CANNOT take care of thread scheduling and synchronization all by itself.

VLIW is just another kind of ISA, like the x86 or EPIC or x86-64.

At the compiler level you can do another kind of work: you could insert instructions to spawn threads to speculatively process a branch (predication: you execute all the paths of the branch and discard the incorrect results once the branch target has been computed).
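As an aside, the select-instead-of-branch pattern behind predication can be sketched in a few lines. This is a toy illustration of the general idea, not anything Cell- or IA-64-specific:

```python
def branchy_abs(x):
    # ordinary control flow: the CPU has to predict this branch
    if x < 0:
        return -x
    return x

def predicated_abs(x):
    # predicated style: compute both paths, then select on the
    # predicate with arithmetic (hardware would use a conditional move)
    p = int(x < 0)
    return p * (-x) + (1 - p) * x

print(predicated_abs(-7), predicated_abs(7))  # 7 7
```

Both functions agree on every input; the predicated version just trades a branch for extra work on the untaken path, which is the deal predication always makes.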

Still, thread management is an OS thing. We are talking about very different levels here.

VLIW can specify how instructions are scheduled in a single CPU: it does not schedule threads and processes, that is the OS's job.


Panajev2001a said:
The three CPU cores in Xenon CPU are not PPC970 or hypothetical PPC976 derived from POWER 5.

They are dual-issue processors....

The G5 has not reached 3 GHz with a single core and a single, simpler VMX unit.

Well, I thought the PPC 970/976/980s were derived from Power4 but at different processes... 130/90/65 nm respectively... but I may be mistaken... and I thought the Xe CPU was a modification of those cores to 3 cores that are dual issue each ~ 6 threads per CPU... but I prolly need to re-read the Xenon specs again! :p

But my point was they could've saved a lot of hassle by going the Xenon CPU route...

Who went after who's route ;) ?

The PowerPC 970 FX is the 90 nm shrink of the 130 nm PowerPC 970.

The G5, or PowerPC 970, is an 8-issue machine (8 instructions fetched, 5 groups of iops tracked in the scheduling logic and 8 iops issued to the execution units), not a dual-issue machine.

Panajev2001a said:
Why invest in 65 nm and 45 nm manufacturing processes? To have, as a group, a very important asset to compete with the other major semiconductor players, and to save as much money as they can on each chip: that lets them increase their profit margin while still delivering a powerful chip, without worrying too much if Microsoft abuses its enormous capital and goes into a relatively nasty price war.

Yes, that would also make sense, but when has Sony been conservative recently! Just look at the PSP specs...

No more 7.1 sound, 32 MB of external RAM and only 4 MB of e-DRAM (e-DRAM was cut from 12 MB to 4 MB).


Panajev2001a said:
Plus, you have not seen their GPU yet ;).

Yummy... :devilish:

Indeed.

Btw,

as you can see in this document:

http://www.sony.net/SonyInfo/IR/info/presen/eve_03/handout.pdf

The EE was, even at 250 nm, only 240 mm^2 while the GS at 180 nm was only 188 mm^2.

This is far from two 300+ mm^2 chips at 4 GHz that you are asking for.
 
Panajev2001a said:
Plus, you have not seen their GPU yet ;).

"Well, then, we shall have to make you speak."

Come on, Panajev, do you know something about the Realizer™ that you'd want to share with us? :devilish:
 
Vysez said:
Panajev2001a said:
Plus, you have not seen their GPU yet ;).

"Well, then, we shall have to make you speak."

Come on, Panajev, do you know something about the Realizer™ that you'd want to share with us? :devilish:

No, but I trust SCE not to let people down with the GPU: by 1999 standards the GS was very fast and powerful, and they did not deliver a disappointing GPU.

Of course I suspect that between the time the GS was feature-locked and the time the PlayStation 3's GPU is feature-locked, SCE will have learned quite a few new tricks to make the GPU not only powerful, but also flexible and versatile.
 
Panajev2001a said:
No more 7.1 sound, 32 MB of external RAM and only 4 MB of e-DRAM (e-DRAM was cut from 12 MB to 4 MB).

The 7.1 is still there, and the 32MB of external RAM was added because developers wanted it (32MB of eDRAM would have been too expensive for them, and 12MB of eDRAM + 20MB of external RAM would not have been really practical in real-world scenarios; for instance, look at the A-RAM on the GC).
So, IMO, I wouldn't call this change a letdown from a hardware perspective. :D
 
Panajev2001a said:
Jaws said:
Panajev2001a said:
Wouldn't a VLIW compiler deal with that?
.....
You are saying that basically a compiler could take care of threading and synchronization work in a large SMP system.

Yes, some sort of hybrid VLIW-type compiler. The reason I say that is that if they plan on any success for their distributed computing / cyberworld 'pie in the sky' dreams, you would need something like that, or you would get lost in a forest of PEs. And if they plan to carry this ISA forward to PS4, how about dealing with a 40 PE system that has 320 APUs! :oops: ...Of course they could've ditched these aspirations...

The ISA should not contain info about the number of APUs in a system.

Such a thing is beyond retarded: it is like fixing the number of execution units in an ISA.
......
A compiler CANNOT take care of thread scheduling and synchronization all by itself.
........

Didn't imply that at all! ;) You mis-interpreted the post.

My point was that you seem to want some sort of low-level access to manually control the scheduling of these threads for optimisation, rather than trusting abstraction. Which seems reasonable if you get your 1 PE with 8 APUs... but if you're working in a distributed LAN/WAN environment, or in the future the PS4 has 40 PEs and 320 APUs or more... ouch... that's gonna give someone a headache! ;)

I'm talking about forcing this abstraction. In this case it would be a combination of the following working intimately together,

realtime OS + Compiler + Hardware (PUs)

PS: I stated my definition of low level access before as, register level/ assembly level/ microcode level to avoid any confusion ! ;)


Panajev2001a said:
Jaws said:
Panajev2001a said:
The three CPU cores in Xenon CPU are not PPC970 or hypothetical PPC976 derived from POWER 5.

They are dual-issue processors....

The G5 has not reached 3 GHz with a single core and a single, simpler VMX unit.

Well, I thought the PPC 970/976/980s were derived from Power4 but at different processes... 130/90/65 nm respectively... but I may be mistaken... and I thought the Xe CPU was a modification of those cores to 3 cores that are dual issue each ~ 6 threads per CPU... but I prolly need to re-read the Xenon specs again! :p

But my point was they could've saved a lot of hassle by going the Xenon CPU route...

Who went after who's route ;) ?

True...will we ever find out...? :p



Panajev2001a said:
The PowerPC 970 FX is the 90 nm shrink of the 130 nm PowerPC 970.

The G5, or PowerPC 970, is an 8-issue machine (8 instructions fetched, 5 groups of iops tracked in the scheduling logic and 8 iops issued to the execution units), not a dual-issue machine.

Thanks for clarifying the processes....

I'm aware of the G5 being 8-issue. What I'm saying is that, AFAIK, the Xe CPU is a modified PPC970/G5 core that is dual issue. And the Xe CPU uses three such cores, effectively being a 6-issue CPU? Or have I mis-read the specs :?

Panajev2001a said:
.....
No more 7.1 sound, 32 MB of external RAM and only 4 MB of e-DRAM (e-DRAM was cut from 12 MB to 4 MB).

PSP having 7.1 sound was just silly ;)

Yes, eDRAM was cut, but external RAM, AFAIK, was increased from 8 MB to 32 MB due to dev requests... still, those are not conservative specs for a portable ;)

Edit:

Panajev2001a said:
The EE was, even at 250 nm, only 240 mm^2 while the GS at 180 nm was only 188 mm^2.

This is far from two 300+ mm^2 chips at 4 GHz that you are asking for.

But the GS was 279 mm2 at 250 nm...and EE ~ 240 mm2 = 519 mm2 total....not that far from 600 mm2 total... I'm not asking for blood!

However, I find that more realistic than attaining 4GHz... that seems far more difficult to achieve...

The roles could be reversed: the GPU in PS3 could be smaller than the BE. It would be difficult to estimate the area of the 4 Pixel Engines with Salc/Salps, and whether they have any peripheral texture / local memories etc. as well...
 
Panajev,

We will just have to wait and see! ;) Btw, do you have an estimate of how quickly Sony will move to 45 nm from 65 nm?


Megadrive1988 said:
......
leaving only one final generation needed (PS5/Xbox4/N7) to bring graphics to the point of not needing any more advancement.

Megadrive, you know that will never happen... :D There are too many closet graphics whores in this world! :p
 
But the GS was 279 mm2 at 250 nm...and EE ~ 240 mm2 = 519 mm2 total....not that far from 600 mm2 total... I'm not asking for blood!

300mm2 is doable, but at the moment I think the BE is going to be the size of the NV40.

However, I find that more realistic than attaining 4GHz... that seems far more difficult to achieve...

Yeah, I bet something like an NV40 clocked at 4GHz would give amazing performance too, provided it's paired with proper memory.

Has Intel really scrapped those 4GHz P4s they were planning? I was hoping I could get some 4GHz CPU around this time of the year, and 5 GHz by this time next year.

If Sony and MS are going to put out $300 consoles with 4 GHz CPUs, it'll be really nice. Especially if Sony releases another Linux kit or something; then I won't have to buy PCs any more.
 
Jaws said:
Thanks for clarifying the processes....

I'm aware of the G5 being 8-issue. What I'm saying is that, AFAIK, the Xe CPU is a modified PPC970/G5 core that is dual issue. And the Xe CPU uses three such cores, effectively being a 6-issue CPU? Or have I mis-read the specs :?

I doubt you have the specs to read. I guess that's a version of the old physics saying "Not even wrong".
 
V3 said:
Has Intel really scrapped those 4GHz P4s they were planning? I was hoping I could get some 4GHz CPU around this time of the year, and 5 GHz by this time next year.

Just read an interesting, if true, article about Intel. Here

PS. How big is NV40? I know it's large, but how large?

PPS. Notice that Sony and Toshiba are moving for a quick transition to 45nm, possible parallels to the GS and 180nm? And on a 300mm wafer.... I'm with V3 on what's possible.
 
V3 said:
But the GS was 279 mm2 at 250 nm...and EE ~ 240 mm2 = 519 mm2 total....not that far from 600 mm2 total... I'm not asking for blood!

300mm2 is doable, but at the moment I think the BE is going to be the size of the NV40.

I thought the NV40's die size was ~ 300 mm2?

V3 said:
....
Has Intel really scrapped those 4GHz P4s they were planning? I was hoping I could get some 4GHz CPU around this time of the year, and 5 GHz by this time next year.

IIRC, they've pushed them back to Q1 2005...

V3 said:
If Sony and MS are going to put out $300 consoles with 4 GHz CPUs, it'll be really nice. Especially if Sony releases another Linux kit or something; then I won't have to buy PCs any more.

Hmmm...I wonder what this next Linux / Cell OS kit would be like... I hope they don't lock things down like the current kit...

DeanoC said:
Jaws said:
Thanks for clarifying the processes....

I'm aware of the G5 being 8-issue. What I'm saying is that, AFAIK, the Xe CPU is a modified PPC970/G5 core that is dual issue. And the Xe CPU uses three such cores, effectively being a 6-issue CPU? Or have I mis-read the specs :?

I doubt you have the specs to read. I guess that's a version of the old physics saying "Not even wrong".

Well, the leaked specs I do ;) ... I suppose it's the old adage, close but no cigar! :p

The PPC 970FX is the dual-issue, SOI, strained-silicon, higher-clocking, 90 nm modified version of the G5/PPC 970 core... so three PPC 970FX dual-issue cores would make an ideal candidate for the Xe CPU, IMHO...

970FX, each core ~ 66 mm2 at 90 nm --> Xe CPU ~ 200 mm2 at ~ 3GHz.

On a side note, I own 3 PowerPCs between 2 Macs and a GameCube... and sheeshh, between Apple and IBM, they well and truly have a fucked up CPU naming scheme with G3s, G4s, G5s, PowerPC 6**, 7**, 9**, Power X, blah, blah, blah... no wonder I stopped following their roadmap! :LOL:
 
Vince said:
....
PPS. Notice that Sony and Toshiba are moving for a quick transition to 45nm, possible parallels to the GS and 180nm? And on a 300mm wafer.... I'm with V3 on what's possible.

Ditto, sounds like their business model for PS2 all over again...

Vysez said:
Vince said:
PS. How big is NV40? I know it's large, but how large?

~280 mm2 with 222M transistors on a 130nm process. :D

You don't have a link for that, do you? Cheers.. ;)
 