DM's 65 nm transistor density analysis based on Sony EE data

...

To Paul

Then there would be no need for Sony to be taking a huge risk at 65 nm now, would there? Not when 1 PE could be shoved right onto 0.09 micron.
Smaller die = lower production cost. I am sure Sony management doesn't like the cost of a 280 mm2 chip any more than I do.

To nonamer

You can't just scale things like this DM. There isn't a fixed size for a transistor;
Yes, transistor size is fixed for each type. How they are laid out varies.

How else do you think we have Gigabit DRAM chips today?
DRAM transistors are the smallest type and are laid out in a dense, uniform grid; you can't do this with processor logic.

To Geeforcer

GS has ~32 million non-EDRAM transistors.
Says who? GS has 7 million logic transistors and 36 million eDRAM transistors. This is why its transistor density is much higher.

To Grail

CPUs today are typically made up of at least 40% SRAM
Can't apply this to Sony processors since they aren't typical.

You made a bad mistake in your estimate
No I haven't, sir.

But you're no CMOS engineer! All you do is make stuff up!
Based on Sony production data, of course.

EE in its first incarnation was an enormous chip, to a large extent due to very few metal layers (four, I think), and the large amount of wide buses the chip uses. Lots of real-estate was lost due to this.
CELL buses seem wider.

Cell will be made with at least seven layers and perhaps even more!
Those additional layers are for eDRAM cell forming. They don't shrink the size of logic gates.

Why can't you realize that you just DON'T UNDERSTAND these things,
I can tell I understand the process better than you do.

and that you can't simply scale things linearly like you're trying to do?
Works all the time.
 
Based on Sony production data, of course.

Do you have access to Sony production (raw) data? Or are you just applying learned knowledge to publicly available materials?



Can't apply this to Sony processors since they aren't typical.

assuming they don't change anything I'd agree.
 
Re: ...

DeadmeatGA said:
CPUs today are typically made up of at least 40% SRAM
Can't apply this to Sony processors since they aren't typical.

Now you're backpedalling. It was your assertion CPUs typically are rich in logic, now you're limiting yourself to saying just Sony CPUs are.

It's typical of a person losing an argument to suddenly switch from one foot to another. I kind of expected this to happen since you do it so often. Either changing viewpoint, or simply ignoring what other people tell you - even those who have actual experience with the hardware in question that you're criticizing. (Remember you telling Faf he was wrong about bottlenecks in PS2? Hahahalololol!)

Will this silliness ever end?

You made a bad mistake in your estimate
No I haven't, sir.

Yes you did. GPUs aren't TYPICALLY rich in SRAM; CPUs are. Exact opposite of what you're claiming!

But you're no CMOS engineer! All you do is make stuff up!
Based on Sony production data, of course.

Problem is, you draw entirely incorrect conclusions, something you should have understood yourself as soon as you came up with that RIDICULOUS 166mio transistors on an enormous 280sqmm chip at a process so modern it's not even commercially available yet. Damn, you're so lost you couldn't find your own arse even if you got to use both hands and a flashlight...

CELL buses seem wider.

True, they'll also be on more metal layers.

Cell will be made with at least seven layers and perhaps even more!
Those additional layers are for eDRAM cell forming. They don't shrink the size of logic gates.

They have nothing to do with eDRAM cells. Metal layers connect different parts of the chip; they're used in virtually ALL integrated circuits, of course including those having no eDRAM at all. As usual, you have no clue.

You really should have realized as much yourself from my previous post; as you know, EE contains NO eDRAM at all, yet it still has metal layers! Funny you managed to miss that little fact, huh? ;)

I can tell I understand the process better than you do.

Haha. Yeah, right. Even in the face of overwhelming resistance and piles of evidence pointing towards the contrary, you persist in claiming the Earth is flat. I'm beginning to think Marconelly (?) was right when he asked you for your own sake to seek professional help.

Dude, you have ISSUES. We ALL KNOW that you are WRONG, yet you refuse to listen to a word of what we're saying. You just ignore everything, instead you make up new fantasy stuff and tout that as truth.

Like I said, you have ISSUES. Seek help.

and that you can't simply scale things linearly like you're trying to do?
Works all the time.

Only in your own mind, dude... Different CMOS processes have different characteristics, including transistor size etc. That's why you need to tweak the layout and print new masks when switching from one process/fab to another. One size does NOT fit all.

*G*
 
Smaller die = lower production cost. I am sure Sony management doesn't like the cost of a 280 mm2 chip any more than I do

65 nm isn't a cakewalk, there are so many things that can go wrong it isn't even funny. The risk to put 1 PE on 65 nm isn't worth it, what if something happens and they botch the PS3 launch enough to give MS or Nintendo a huge head start?

Like I said, if PS3's Cell is going to be just 1 PE, they would put it on 90 nm THEN move to 65 nm later.
 
Re: ...

DeadmeatGA said:
Says who? GS has 7 million logic transistors and 36 million eDRAM transistors. This is why its transistor density is much higher.

Yes, my bad. For some reason, I forgot that GS has 4 MB of RAM, not one.
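The 36 million figure can be sanity-checked against the 4 MB of embedded RAM. The sketch below assumes one transistor per eDRAM bit (a 1T1C cell) and ignores sense amplifiers, decoders, and redundancy; the function name is made up for illustration and nothing here comes from Sony documentation.

```python
def edram_cell_transistors(megabytes: int, transistors_per_bit: int = 1) -> int:
    """Count cell transistors only; the 1-transistor-per-bit default is an assumption."""
    bits = megabytes * 1024 * 1024 * 8
    return bits * transistors_per_bit


# 4 MB of embedded DRAM, as stated in the thread for the GS.
gs_edram = edram_cell_transistors(4)
print(f"4 MB eDRAM: about {gs_edram / 1e6:.1f} M cell transistors")

# The same count with only 1 MB, the figure the poster had misremembered.
print(f"1 MB eDRAM: about {edram_cell_transistors(1) / 1e6:.1f} M cell transistors")
```

The cell array alone lands a little under the quoted 36 million, which is plausible once peripheral circuitry is added on top.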
 
They have nothing to do with eDRAM cells. Metal layers connect different parts of the chip; they're used in virtually ALL integrated circuits, of course including those having no eDRAM at all. As usual, you have no clue.

I don't think you understand what a metal layer is. Are you confusing this with interconnects or vias?
 
Why does this have to happen when I am on vacation ( still am ) ? I miss all the fun :p

Mfa,

Do you really think that the 4 PUs will be full blown PPC cores ?

Last time I heard, IBM was looking into tiny cores with an ISA of not even 60 instructions ( more like 50 IIRC ).

Another thing is that CELL will indeed push 65 nm quite a bit: Toshiba already stated that they will upgrade their Oita #2 plant from 65 nm to 45 nm as soon as possible ( they basically said the upgrade to 45 nm will be in their minds while they build the 65 nm lines and the upgrade should be relatively smooth ).

The die shrink to 45 nm will not make it "ultra" cheap to make ( of course it will cut costs quite nicely ), but more like more "reasonable" to make... 45 nm + high volumes should also help things quite a bit.
 
Why does this have to happen when I am on vacation ( still am ) ? I miss all the fun

Then go back on your vacation, nothing interesting here only Deadmeat being ass usual
 
Panajev, it depends. If the PEs are flexible enough to handle branch-heavy code which has to touch dispersed data that won't fit in its own memory in its entirety, then I don't see any reason for powerful PPC cores. If not ...

The EE+GS were already expensive ... and they were also expecting to be able to shrink them down fast. To me it seems their decision on what kind of size to aim for won't be too far shy of the last time (cost per area is unlikely to have gone down, even with 300 mm wafers).
 
It was Sony's plan to shrink the EE+GS to lower their cost, and they succeeded. With Cell they seem to be aiming for the same strategy, but with a cascadable design (which I guess from the optical-link stuff they have in their schemes). If they have 4 cores, and each of these starts out at the same size as the EE did, then by 2010 they will have all of them in less than the size of a single EE.

This is dangerous extrapolation. I can go on, since the EE is ~10 x larger (13M -> 166M), and 10 times faster (300MHz -> 3GHz), it will be 100 times faster. 620GFlops.

OH WAIT. The figure from the beginning is just using the max die size of the EE as an example to get the number 280mm2, since that's the max size Sony will go for. They swallowed 280+240mm2 = 520mm2 for a consumer application. Then we are able to double our earlier GFlops: 1.24TFlops. This includes 40MB of RAM for the GS2.

I won't even go into where the fallacy is. It's just a hint... ANYONE can produce bullshit numbers, so shut up, DmGA. Go back to hankfiles.
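To see how quickly this kind of naive multiplication inflates a number, here is a small sketch of the arithmetic in the post. The factors are the poster's own rounded ones; the baseline GFlops value is not stated in the thread and is only back-derived from the quoted 620 figure, so treat it as an inference.

```python
# Rounded factors taken from the post.
transistor_factor = 10  # "~10 x larger (13M -> 166M)"
clock_factor = 10       # 300 MHz -> 3 GHz

# Naive claim: performance scales with the product of both factors.
naive_speedup = transistor_factor * clock_factor

quoted_gflops = 620
implied_base_gflops = quoted_gflops / naive_speedup  # inferred, not stated in the thread

# Counting both dies (280 + 240 mm2) against the single 280 mm2 die.
area_factor = (280 + 240) / 280

print(f"naive speedup: {naive_speedup}x")
print(f"implied baseline: {implied_base_gflops:.1f} GFlops")
print(f"area factor: {area_factor:.2f}x (rounded up to 2x in the post)")
print(f"doubled figure: {2 * quoted_gflops / 1000:.2f} TFlops")
```

Every step multiplies a rounded guess by another rounded guess, which is the point the post is making.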
 
Hey Panajev, it's good to see you around. Hope you've been enjoying your vacation.

The speculation on the next gen of consoles never ends, hehe. We are what, 2 or 3 years away from their launch in the U.S. in all likelihood.


this week is Soul Calibur 2 week.

I am preparing for it. tomorrow I'm getting the component cables for Xbox and Ps2 and a universal arcade stick for PS2-GC-Xbox. cannot wait! All three versions, as good as they will be, will probably only get me wanting SC3 on the next systems, with realtime breaking armor, blood, wounds, with motion blur, etc, all in 1080i and with sub-pixel lighting and such. hehe.
 
...

To Grall

Now you're backpedalling. It was your assertion CPUs typically are rich in logic, now you're limiting yourself to saying just Sony CPUs are.
You are not seeing the big picture. Look at "High-end" parts from Intel, AMD, and IBM; they all include large on-chip L2 & L3 caches to boost performance, so the percentage of logic gates in the total gate count is in the low 15~30% range. Compare this figure to "Low-end" parts like Sony EE and Intel XScale; they all skimp on cache to save transistor count, so the percentage of logic gates in the total gate count is much higher, say 70~90%. Since Sony likes to skimp on SRAM gates in order to pack more FPUs into their CPUs, the total number of gates packed into each Sony CPU tends to be lower, even on advanced processes. Why? Because logic gates take up much more space than SRAM gates.

Yes you did. GPUs aren't TYPICALLY rich in SRAM; CPUs are. Exact opposite of what you're claiming!
Yes they are. Texture cache, vertex cache, frame buffer write cache, Z-buffer compression cache, cache everywhere... GPU designs are simpler because the tasks are highly repetitive and need little to no branch control in their designs. In contrast, CPU design is about branch control. Hell, they throw in millions of gates just to predict what the next branch will be...

as soon as you came up with that RIDICULOUS 166mio transistors on an enormous 280sqmm chip at a process so modern it's not even commercially available yet.
Well, just look at Sony's EE figures and talk again. At 90 nm, EE takes up 43 mm2, a surprisingly large amount of silicon real estate for a mere 13 million transistor design. Why? Lots of logic gates and few SRAM gates, a fate CELL cannot avoid... Kutaragi wants to pack in 32 VUs and 4 PPCs, and only 4 MB of cache. (Worth 36 million transistors if eDRAM cache, a drop in the bucket next to the transistor count of 132 FPUs.) Everybody else packs only 4 FPUs into their designs by comparison, be it Pentium4, Opteron, or Power4+.
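For readers who want to check the arithmetic behind this kind of estimate, here is a minimal sketch of linear density scaling. It uses the EE figures quoted above (13 million transistors in 43 mm2 at 90 nm) and assumes, as an idealization, that area shrinks with the square of the feature size. The function names are made up for illustration, and real processes do not scale this cleanly, which is the objection raised elsewhere in the thread.

```python
def density_per_mm2(transistors_millions: float, area_mm2: float) -> float:
    """Transistor density in millions of transistors per mm2."""
    return transistors_millions / area_mm2


def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized density gain for a shrink: area scales with feature size squared."""
    return (old_nm / new_nm) ** 2


# Figures quoted in the thread: EE is 13 M transistors in 43 mm2 at 90 nm.
ee_density_90nm = density_per_mm2(13, 43)

# Assumption: perfect quadratic scaling from 90 nm to 65 nm.
ee_density_65nm = ee_density_90nm * ideal_density_gain(90, 65)

# Transistor budget of a 280 mm2 die at that density, in millions.
budget_280mm2 = ee_density_65nm * 280

print(f"90 nm density: {ee_density_90nm:.3f} M/mm2")
print(f"65 nm density: {ee_density_65nm:.3f} M/mm2")
print(f"280 mm2 budget: {budget_280mm2:.0f} M transistors")
```

Under these assumptions the budget comes out in the low 160s of millions, the same ballpark as the 166 million figure being argued over.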
 
Re: ...

DeadmeatGA said:
Now you're backpedalling. It was your assertion CPUs typically are rich in logic, now you're limiting yourself to saying just Sony CPUs are.
You are not seeing the big picture. Look at "High-end" parts from Intel, AMD, and IBM; they all include large on-chip L2 & L3 caches to boost performance, so the percentage of logic gates in the total gate count is in the low 15~30% range.

...Which is the OPPOSITE of what you initially claimed! Now you're just parroting what I told you. ...Except, I doubt you'll find a processor with a logic count as low as 15%, possibly with the exception of a 6MB cache Itanic.

Since Sony likes to skimp on SRAM gates in order to pack more FPUs into their CPUs, the total number of gates packed into each Sony CPU tends to be lower, even on advanced processes. Why? Because logic gates take up much more space than SRAM gates.

"Since Sony likes"? You take ONE processor and try to use it as if it was the norm. PSP has lots of eDRAM integrated on-die, as will Cell have. Sony doesn't like to skimp on SRAM, they did it with EE because that's how they designed that chip. One feather does not a duck make you know.

Yes you did. GPUs aren't TYPICALLY rich in SRAM; CPUs are. Exact opposite of what you're claiming!
Yes they are. Texture cache, vertex cache, frame buffer write cache, Z-buffer compression cache, cache everywhere...

There may be several of them, but none are in the 128-512k range of typical L1/L2 caches found on CPUs. Z and framebuffer write caches are likely to be no more than a bunch of pixels' worth, probably in the form of a FIFO buffer.



GPU designs are simpler because the tasks are highly repetitive and need little to no branch control in their designs.

Irrelevant. You're going off on a tangent here...

Well, just look at Sony's EE figures and talk again. At 90 nm, EE takes up 43 mm2, a surprisingly large amount of silicon real estate for a mere 13 million transistor design. Why? Lots of logic gates and few SRAM gates

So using your math, how big would a model 1 AMD Athlon at 22 million transistors be at .09u then? Certainly way way bigger than what it would be using AMD's current .13u process! ;)

Note, it occupies 102sqmm at .18u, and before you come crying about logic vs. cache trannies, remember that the model 1 only has 128k of L1 cache on-die, and that's worth less than 1/4th the total transistor budget.

See now how stupid your example is, or do you need some more intellectual abuse before the message hits home that you're just PLAIN WRONG?
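The Athlon counter-example can be worked through numerically. The sketch below applies the EE's 90 nm density (13 M transistors in 43 mm2) to a 22 M transistor Athlon, and compares that with an idealized quadratic shrink of the 102 mm2 die from .18u to .13u. The ideal shrink is an assumption for illustration, not AMD's actual .13u die size.

```python
# EE density at 90 nm from the figures quoted in the thread (M transistors per mm2).
ee_density_90nm = 13 / 43

# Size of a 22 M transistor Athlon if it were built at the EE's 90 nm density.
athlon_area_at_ee_density = 22 / ee_density_90nm


def ideal_shrink_area(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Idealized shrink: area scales with the square of the feature size."""
    return area_mm2 * (new_nm / old_nm) ** 2


# Assumption: perfect quadratic scaling of the 102 mm2 die from 180 nm to 130 nm.
athlon_area_130nm_ideal = ideal_shrink_area(102, 180, 130)

print(f"Athlon at EE's 90 nm density: {athlon_area_at_ee_density:.1f} mm2")
print(f"Athlon ideal shrink to 130 nm: {athlon_area_130nm_ideal:.1f} mm2")
```

With these inputs the 90 nm estimate comes out larger than the idealized 130 nm die, which is the contradiction the post is pointing at.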

a fate CELL cannot avoid...

Here we KNOW you're just...out there. So you, with ZERO real knowledge and experience about what you're talking about presume to know better than a crack team of engineers? Christ... Calling you an intellectual monkey would be an insult to the monkeys of this world! :)

Btw, neither P4 nor Opteron has four FPUs.

*G*
 
I do believe Simplex Solutions worked extensively with Sony in the past. They helped design the Graphics Synthesizer for the PS2. SS is now a part of Cadence, but they are pushing their X Architecture. One of the companies cited by them as being involved is Toshiba.

Cadence is a co-sponsor of the X Initiative, a consortium of semiconductor supply chain companies that was created to advance the usage of the X Architecture, a new interconnect architecture based on the pervasive use of diagonal routing. Targeted at chips with five or more metal layers, the X Architecture rotates the primary direction of the interconnect in the fourth and fifth metal layers by 45 degrees in relation to conventional orthogonal, or "Manhattan," architecture. Layers one through three remain unchanged, preserving the design community's investment in existing cell libraries, memory cells, memory compilers, datapath compilers, and IP hard cores. In addition, the X Architecture allows 45-degree "wrong-way jogs," which provides an additional four degrees of freedom in each layer of routing.

The X Architecture's pervasive use of diagonal routing reduces wire-length by an average of 20%, resulting in simultaneous improvements in chip speed, power, and cost. Based on design results to date, X Architecture chips are expected to have:

20% wirelength reduction
30% fewer vias

All of these benefits contribute to an increased probability of first-silicon success. In addition, the X Architecture's wirelength reduction makes the routing problem 20% easier to solve, resulting in faster timing closure, improved reliability, and a reduction in signal-integrity problems.
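The geometric idea behind the wirelength claim can be illustrated for a single two-pin connection. This is only a sketch of the geometry, not of the Cadence router: it compares a Manhattan route (horizontal and vertical segments only) with a route that may also use 45-degree segments. The function names are made up for illustration.

```python
import math


def manhattan_length(dx: float, dy: float) -> float:
    """Shortest route using only horizontal and vertical segments."""
    return abs(dx) + abs(dy)


def octilinear_length(dx: float, dy: float) -> float:
    """Shortest route when 45-degree diagonal segments are also allowed."""
    short, long_ = sorted((abs(dx), abs(dy)))
    # Go diagonally until the shorter offset is used up, then straight.
    return (long_ - short) + math.sqrt(2) * short


for dx, dy in [(5, 0), (3, 4), (1, 1)]:
    m = manhattan_length(dx, dy)
    o = octilinear_length(dx, dy)
    print(f"offset ({dx}, {dy}): Manhattan {m:.2f}, diagonal {o:.2f}, saving {100 * (1 - o / m):.1f}%")
```

The saving depends on the direction of the connection: nothing for a purely horizontal or vertical wire, and the most for a pure diagonal. An average figure such as the 20% quoted above would depend on the mix of connections in a real design.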

Diagonal routing, with its more direct connection of chip components, is not a new idea. Full-custom and memory designs have used hand-routed diagonals for short, local routes for years. And the concept of an architecture with the fourth and fifth metal layers rotated by 45 degrees has been discussed and debated in academic circles for almost as long. (For more information, read The X Architecture: Not Your Father's Diagonal Wiring.) However, successful implementation of the architecture has, until today, been elusive, because the automatic physical design and parasitic extraction and analysis technologies required to automatically create and to model pervasive diagonal routing did not exist.

Over three years ago, a company called Simplex Solutions recognized this opportunity for innovation, and set about the complex and intense process of inventing the design technologies that would enable chipmakers to easily take advantage of the benefits offered by the pervasive use of diagonal interconnect. Simplex, which is now part of Cadence Design Systems, has collaborated with Toshiba Corporation on the feasibility and development of the X Architecture.


To address these concerns for the X Architecture, a growing group of leading companies from throughout the semiconductor supply chain has formed the X Initiative: a consortium for supply chain providers to learn about the X Architecture, to accelerate fabrication of X Architecture chips, and to track and promote success through first silicon and beyond. Cadence is proud to join Toshiba as a co-sponsor of the X Initiative. For more information about the X Architecture and the X Initiative, visit the X Initiative website at http://www.xinitiative.org.

THE PATH TO X
Toshiba, as the initial licensee of the Cadence technologies that enable the X Architecture, will produce the first X Architecture chips. Cadence will employ its enabling technology to create a limited number of X Architecture chips for customers this year. X Architecture chips will be more generally available, both through Cadence's SoC Design Foundry and through licensing arrangements with semiconductor partners, in the second half of 2002.


http://www.cadence.com/industry/x2.html
 
...

...Which is the OPPOSITE of what you initially claimed!
How is it so??? I tried to teach you nicely but you are trying my patience. Now get this straight into your head. You have a fixed amount of die real estate, and the final transistor count is determined by what you put in. If you decide to put in all logic and little SRAM, the final transistor count will be very low. If you decide to put in little logic and lots of SRAM, then the final transistor count will be higher.

"High-end" workstation/PC CPU vendors have old architectures dating back a decade or more and can't extract any more performance out of the CPU core through superscalar execution and branch prediction, so they resort to a shit load of cache to boost performance.

Except, I doubt you'll find a processor with a logic count as low as 15%, possibly with the exception of a 6MB cache Itanic.
Itanium2 is one, but there are others.

Pentium3 Coppermine had 28 million in total, but only around 6 million were used for the CPU core (21%).

Power4 is another CPU with most of its 170 million transistors put into cache.

Intel's upcoming 1 billion transistor Itanium has around 175 million transistors in logic, with the rest going into cache (18%).

But CELL is not one of these. CELL packs in more FPUs at the expense of SRAM gates, thus its gate density is much lower than any of the above designs.
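The percentages quoted above follow directly from the transistor counts given in the post. A quick check, using only those counts (which are the poster's figures, not verified vendor data):

```python
def logic_share_percent(logic_millions: float, total_millions: float) -> float:
    """Share of the transistor budget spent on logic, in percent."""
    return 100 * logic_millions / total_millions


# Counts as stated in the post, in millions of transistors.
coppermine_share = logic_share_percent(6, 28)
itanium_share = logic_share_percent(175, 1000)

print(f"Pentium3 Coppermine logic share: {coppermine_share:.1f}%")
print(f"1 billion transistor Itanium logic share: {itanium_share:.1f}%")
```

Both results round to the 21% and 18% given in the post.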

PSP has lots of eDRAM integrated on-die, as will Cell have.
In case you haven't noticed, Sony is using a pair of vintage R4000s (1~2 million transistors) in the PSP design to cut down on the logic gate count, so they can put in more of other stuff. That doesn't work with a supposedly 132-FPU design like EE3.

Sony doesn't like to skimp on SRAM, they did it with EE because that's how they designed that chip. One feather does not a duck make you know.
Well, Sony did skimp on SRAM and eDRAM gates in both EE and GS; it shows that Sony faces design challenges like anyone else and has to make trade-offs.

There may be several of them, but none are in the 128-512k range of typical L1/L2 caches found on CPUs.
How large is the typical texture cache??? 32 KB is typical, and some have it larger. That alone is worth 1 million transistors.
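The "1 million transistors" estimate for a 32 KB cache can be checked with a rough cell count. The sketch assumes a standard 6-transistor SRAM cell by default and, as an alternative, a 4-transistor cell; tags, decoders, and sense amplifiers are ignored. The cell types are assumptions, since the post does not say which was meant.

```python
def sram_cell_transistors(kilobytes: int, transistors_per_bit: int = 6) -> int:
    """Count data-array transistors only; the cell type is an assumption."""
    bits = kilobytes * 1024 * 8
    return bits * transistors_per_bit


cache_6t = sram_cell_transistors(32)
cache_4t = sram_cell_transistors(32, transistors_per_bit=4)

print(f"32 KB with 6T cells: {cache_6t / 1e6:.2f} M transistors")
print(f"32 KB with 4T cells: {cache_4t / 1e6:.2f} M transistors")
```

Either way the data array alone is on the order of one to one and a half million transistors, so the figure in the post is the right order of magnitude.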

Irrelevant. You're going off on a tangent here...
It is relevant since simpler logic design means higher gate density. This is why you can't directly compare a GPU's gate density to a CPU's gate density.

So using your math, how big would a model 1 AMD Athlon at 22 million transistors be at .09u then?
I don't have the time to look at AMD's data, but it should be smaller than the 43 mm2 that Sony is getting with EE.

Certainly way way bigger than what it would be using AMD's current .13u process!
Every architecture has a different gate density, and Sony's architecture has one of the lowest gate densities in the business because they like to pack in stuff.

See now how stupid your example is, or do you need some more intellectual abuse before the message hits home that you're just PLAIN WRONG?
You are an idiot if you do not see the varying gate density between different architectures based on logic/SRAM ratio, and try to enforce a single density on all chips. But then again, you show you are absolutely clueless.

Btw, neither P4 nor Opteron has four FPUs.
You need four FMACs to perform SSE or 3DNow instructions. Wrong again!
 
No, 2 multipliers and 2 accumulators. It would be nice for some computations if they had 4 FMACs instead which you could use for either multiplication or addition, but they don't.
 
The first post is wrong for a lot of reasons.

I didn't look hard for this...
But did anyone ever mention the number of circuit layers?

Die size is a big deal in relation to latency, heat/watt dissipation etc.

But these processors have a different number of layers.
I think they even increased the layers as things shrink.

Presently, whatever Cell is, 8 or 9 layers seems reasonable.
Maybe more, going as high as 12 layers.

These numbers don't matter. Just my point.
Post #1 missed the facts on its way to being posted.
 
5 generations of shrinks, a clock which by now is so low as to not be a factor in design anymore ... I think the size of the present VUs represents just about the absolute minimum of what they are going to fit 4-wide SIMD FP processors into at 90 nm, and appropriately scaled at 65 nm too, regardless of the limitations of their old process.

In that, there is IMO some value in looking at the EE.
 