SCEI & Toshiba unveil 65nm process with eDRAM

If Sony goes for an all-software approach in PS3, they will lose.

I don't think they'll take an all-software approach; anything that's not appropriate perf.-wise for the CPU will be done in h/w... the next GS is likely to be more feature-rich.

PS: Isn't Intel using a tri-gate design or something like that to stop leakage? If so, this Sony/Toshiba approach certainly takes up less space... seems better, not as brute-force as what I've heard Intel's doing.

PPS: EE/CELL, etc... 500M+ trans. GS3 (more than 256Mbit embedded RAM). Clearly the system will have, at the very least, 1B trans. combined...
 
The triple-gated transistors are for even smaller feature sizes, an alternative to double-gated transistors such as IBM's (and, by extension, in the future Sony/Toshiba's) FinFET.
 
Vince-

What's the difference between a VU and a VS? Explain to me how a VU is 'general computing'. What's the fundamental difference between a VU and the NV3x's new TCL front-end?

Explain to me why a 'general processor' like the EE or SH-4 can outpace the "hardwired" solutions you speak of that nVidia produced at the same time. Why does the EE utterly destroy the NV1x's TCL front-end?

ANSWER: Because it's throwing more transistors at the problem. You [programmable] can maintain parity with a hardwired solution if you devote more resources [read: logic] to the problem. This is simple. This is my point. Yet you fight it, over and over.

I guess, to sum up my end of countering what you are saying, I see the GeForce1 as a better product in terms of graphics than the PS2 (obviously the PS2 has the enormous benefit of being a fixed hardware platform, which gives it a huge edge in real-world situations). You keep wanting to look at narrowly defined areas and say that a CPU can compete in those particular areas. Let's throw eight million polys per second with trilinear, anisotropic, Dot3 and CubeMaps at the PS2 and see how it holds up. Carmack has stated that Doom3 was built around what was made possible with the GeForce1. Giants, when it was ported over to the PS2, had to have downgraded graphics versus the DX7 build, which ran very nicely on a GeForce1 at console resolutions. Obviously the game wasn't designed from the ground up around the PS2, else it would have eliminated many of the image-enhancing features that the dedicated hardware offered and instead relied on increased poly counts.

The GeForce1 has advantages in certain areas despite the, according to you, 3x increase in complexity. Fabrication advancements gave it three times the transistor count and it still has shortcomings. Now why would I ever think that dedicated hardware would be better? ;)

Thank you Lord!! Look at OGL and DX10+. The merger of the PS and VS is coming; architectures like the P10/9 are going to be the future.

And this means what to you? Let's see how a CPU enjoys trying to compute the logic on early Z to reduce the amount of OD, loops back instructions that exceed the processor's ability to handle in a single 'pass', and then deals with a BPU miscalc. SGI has viz machines with hundreds of MIPS processors already pushing out TFLOPS, and yet they rely on cutting-edge dedicated hardware for their rasterization. Of course, they don't know nearly as much about 3D as Sony, right? ;)

Yep, what I'm advocating is that through advanced lithography, you can increase the programmability of an architecture by a large amount while still maintaining performance parity with a comparable hardwired design. But in order to do this, you must 'beat' Moore's Law - and thus, to equal a hardwired design while maintaining flexibility, you must increase the usable transistor counts.

In rasterization terms the EE was not competitive. First they must significantly exceed Moore's Law to catch rasterizers; they have a very long way to go to think about beating them.

This can be done through: (a) More Advanced Lithography, (b) Multichip, (c) GRID/Cluster/Pervasive/or otherwise Computing, (d) Architectural Advance.

All of these have been used for a long time in offline rendering. They still can't compete with dedicated hardware. The Alpha chips were packing as much L2 cache per core as CELL is supposed to have, several years ago (although it was off-die; the typical 300mm limit for consumer products doesn't apply to the higher-end parts, which has the same impact as more advanced build techniques). Placed up against an Athlon with a GeForce, Alphas got throttled. Render farms have been around for years; their latency (and this is LAN-based, not WAN) makes them useless for anything nearing real time.

Hey, and you're always ready to argue back ;)

Well of course ;) :D

There is no way they will yield a true 6.6TFLOPS in one console... period. I'll be impressed if they can output a true TFLOP and sustain it, but I'm not so sure.

Are you saying that Sony will not hit their initial claim?

I, too, wonder if SCE will use a full software rendering approach with PS3, with the idea that the hardware will be more of a 'VU'-like, scientific-computing approach [not like the traditional CPU].

I don't think they will. The next GS has to have a decent amount of feature support or they will be killed by the XBox2 in the early going. Dealing with an entirely new architecture that is significantly different than anything else, with, as of now, no compiler support? Their first-gen games would be lucky to look much better than late-life-cycle XB or GC titles. Yes, five years down the road they might be able to pull off some very impressive things considering it's software, but it would only compound the problems a lot of developers brought up when the PS2 dev kits first started circulating.

I doubt it would be near an nVidia-powered solution, but does it have to be? Interesting questions emerge, such as: with 1TFLOP [which is well over the GSCube IIRC, which rendered FF:TSW and Antz at 60fps], isn't that sufficient? How much of a visual difference would be seen?

1TFLOP isn't that much for real-time rendering. If you focus an extreme amount on your raw FP power, you are going to sacrifice your integer performance. Current rasterizers are already pushing a trillion ops per second; if you dedicate your die space to vector-based ops, it comes out of somewhere else. As far as FF:TSW being rendered in real time on a TFLOP machine, no, it wasn't perfect. The render farm for FF:TSW was pushing over a TFLOP (not sure exactly how much) and IIRC its total render time was in the several-months range (for a two-hour movie). Antz was rendered on a multi-TFLOP farm and also took months.

The biggest question I have is, if a developer had full control and could tailor the entire 3D pipeline for his title, how much is gained in efficiency? I mean, they could literally do anything... hell, banish triangles. All the petty arguments about the nV2A's PS and the TEV's features would disappear.

But current rasterizers are already headed in the fully programmable direction. The big difference is that the hardware is custom-built around the pitfalls that general-purpose CPUs will fall into.

But I bet there will be some sort of rasterizer/GSx.

I expect so also, which you do realize makes most of our argument pointless (though I'm sure we will continue it for some time to come :) ).

V3-

Hmm, don't know about that. Those P4s are getting pretty fast.

It's not speculation. A TNT2 is roughly fifty times faster than a GHz P3 at rendering real-time graphics (trilinear filtering etc., etc., not software-compromised code). Given the P6 core's IPC edge, it's going to take the P4 some time to catch up.

Randycat-

It's not like they are trying to do "software rendering" on some x86 CPU (had they been, you would certainly be indisputably correct).

I'm also comparing a TNT2, nothing comparable to an R9700 :)

If it is an array of rapid-execution vector units (albeit governed by software), that pretty much blurs the line with "dedicated hardware". It just happens to not be what nVidia is up to, IMO.

Let's say Sony squeezes a billion transistors on to their CELL chips for the PS3. You take 100Million for the 8MB (64Mb) of eDRAM, which leaves you with 900Million. Figuring for 32 cores, you are looking at 28,125,000 transistors per core. That means on a per-core basis you are dealing with about as many transistors as a P4 (minus the L2 cache) with less memory per core. If Sony does manage to get 1Billion transistors on a .065u build process, I'm sure the clock speed will be well short of that offered by desktop x86 parts of the time frame. What do you expect them to do per core with a budget comparable to the P4?
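
A quick sanity check on that arithmetic, as a minimal sketch using the purely speculative figures above (1B transistors total, ~100M assumed for the eDRAM, 32 cores):

# Back-of-the-envelope per-core budget for a hypothetical 1B-transistor CELL
# (all figures are the speculative estimates from the post above)
total_transistors = 1_000_000_000
edram_transistors = 100_000_000      # rough cost assumed for the 8MB of eDRAM
cores = 32

logic_transistors = total_transistors - edram_transistors   # 900,000,000
per_core = logic_transistors // cores
print(per_core)   # 28,125,000 - roughly a P4 core minus its L2 cache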

As far as it not being what nVidia is up to, you could add every other company involved in 3D to the list of those going along with nV ;)
 
As far as it not being what nVidia is up to, you could add every other company involved in 3D to the list of those going along with nV

Well, some Sony dev teams are working with nV tech, and I'm sure the R&D teams are in contact with them. So I really don't see them deviating to a path that won't yield equally good results graphics-wise...

If they take the vertex-dedicated functions out of the GPU and throw them in a CPU, they can increase the speed and number of processing units... this would give the GPU a lot more free space for the pixel-dedicated area, and allow for more b/w and perf... at least that's what I think they want with GS3 and EE3/Cell/etc...
 
In rasterization terms the EE was not competitive.
When, pray tell, was the EE ever built for rasterizing? Its largest parts are custom-built vertex processors. This is akin to saying Flipper's geometry processor is slow at rasterizing, yet, behold, it's a completely hardwired dedicated part - nothing general-purpose about it at all.

Although for a '99 part, the EE would still make a somewhat competent rasterizer - a 150Mpix/sec fillrate of JPEG-decoded data is nothing to scoff at. :p
 
When, pray tell, was the EE ever built for rasterizing?

That is pretty much my argument, Faf :) Using something built with one particular thing in mind will beat a general-purpose design easily. The fact that the EE wasn't built for rasterizing is actually precisely why I bring it up.
 
Don't pay attention to analysts; Intel has said it is technology for the second half of the decade and that they don't absolutely need it until <30nm.
 
I doubt it would be near an nVidia-powered solution, but does it have to be? Interesting questions emerge, such as: with 1TFLOP

I know it's a bit unrealistic... but I believe 10+ TFLOPS will be achieved...
 
Let's throw eight million polys per second with trilinear, anisotropic, Dot3 and CubeMaps at the PS2 and see how it holds up.
I really don't know how a GF1 would handle 8 million single-textured polygons per second, much less anything of what you've mentioned, considering just how much the GF2 I have chokes on far less geometry-intensive scenes.

I haven't really followed your discussion with Vince, but from what I've heard from people in the know, the VS and VU really have a lot in common. The VU basically has an expanded feature set that makes it more multi-purpose, but the VS is so similar to it instruction-wise that some people used to joke that nVidia reverse-engineered the VUs and implemented them in their chips :)
 
It is becoming apparent to me now that Ben was presupposing software rasterization when he typed "software rendering", and that implies that Sony would not use a GS-n in their next machine, whereas I (and possibly the rest of us) were assuming he was referring to T&L/vertex shading as part of the "software rendering" process vs. the hardware implementation done on an nVidia GPU. If that is the case, all I can say is "Duh" - I wouldn't anticipate Sony going with software rasterization, either (but it's not completely improbable, of course). It seems pretty obvious there would be a GS variant to do the actual rasterization in hardware, possibly based around a programmable pixel shader implementation this time around.
 
BenSkywalker said:
That is pretty much my argument, Faf :) Using something built with one particular thing in mind will beat a general-purpose design easily. The fact that the EE wasn't built for rasterizing is actually precisely why I bring it up.

True, but I was comparing the EE (the VUs specifically) to the TCL front-end of the NV3x in particular. It would appear that, at a fundamental level, they're quite similar.

My whole point is based on the idea that Cell will not be a 'general processor' like a P4 or Athlon. Just as you've done the transistor math, I have as well: see your question first.

What do you expect them to do per core with a budget comparable to the P4?

Actually, I've been having problems getting the transistor budget to fit with the expected performance. I've been thinking of Cell in the 500-600M transistor (liberal, IMHO) range and having problems getting the eDRAM to fit - which is obviously a necessary part of the cellular computing ideal.

<Speculation, but it's grounded>

If they can yield 700M transistors on a 65nm or smaller process and clock it at 1GHz, they can do a TFLOP as Kutaragi stated.

Devote 200M transistors to 16MB of eDRAM and we're down to around 500M for logic. SCE has historically used MIPS and has already licensed the MIPS64 core. If they follow the true ideal of cellular computing and reduce the core to only the minimum instruction set and features, as well as the L2 cache, they could reduce the MIPS64 core to ~2M transistors (look up the core; the actual core is small), while it yields roughly 4GFLOPS @ 1GHz.

Do the math and they can yield 1TFLOP at 1GHz with 16MB of eDRAM.

If they do away with 9 of every 10 MIPS cores and replace them with an FPU/VU-type array (similar to the NV30's front end), they can yield even more and are down to 25 cores. There are many such combinations possible, but these are some FPU-heavy ones. Perhaps someone better acquainted with the size of a TU or ALU could comment.
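
The math above works out as a rough sketch (all of these numbers are my speculative estimates, not anything official):

# Sanity check on the speculative 1TFLOP budget above:
# 700M transistors total, 200M spent on 16MB of eDRAM,
# ~2M transistors and ~4GFLOPS per stripped-down MIPS64 core at 1GHz
logic_budget    = 700_000_000 - 200_000_000   # 500M transistors left for logic
core_size       = 2_000_000
gflops_per_core = 4

cores = logic_budget // core_size             # 250 cores
print(cores, "cores ->", cores * gflops_per_core, "GFLOPS")   # 250 cores -> 1000 GFLOPS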
 
Even if it's unrealistic(due to da ram) i believe 3Ghz will be achieved....
(ps1 30approxx... Mhz... ps2 300 approxx Mhz... ps3 3 approxxx Ghz)
 
zidane1strife said:
Even if it's unrealistic(due to da ram) i believe 3Ghz will be achieved....
(ps1 30approxx... Mhz... ps2 300 approxx Mhz... ps3 3 approxxx Ghz)

It's MHz/GHz not Mhz/Ghz. Also it's "the" not "da" :oops:
 
Randy-

It is becoming apparent to me now that Ben was presupposing software rasterization when he typed "software rendering"

No, I was stating exactly what I meant: software rendering. You were supposing that software rendering does not include rasterization, which it does and always has. I worked with 3D viz for years; software rendering is software rendering - rasterization is included in the rendering.

Vince-

True, but I was comparing the EE (the VUs specifically) to the TCL front-end of the NV3x in particular. It would appear that, at a fundamental level, they're quite similar.

My whole point is based on the idea that Cell will not be a 'general processor' like a P4 or Athlon. Just as you've done the transistor math, I have as well: see your question first.

If you look at it from strictly a T&L perspective, then a decent enough CPU can compare in performance (well, no single CPU can currently match the T&L performance of an NV3X, but in a hypothetical situation), but that ignores a lot of the other aspects that current GPUs handle.

Actually, I've been having problems getting the transistor budget to fit with the expected performance. I've been thinking of Cell in the 500-600M transistor (liberal, IMHO) range and having problems getting the eDRAM to fit - which is obviously a necessary part of the cellular computing ideal.

I find your numbers far more likely than mine; I was trying to give Sony every benefit though :)

Devote 200M transistors to 16MB of eDRAM and we're down to around 500M for logic. SCE has historically used MIPS and has already licensed the MIPS64 core. If they follow the true ideal of cellular computing and reduce the core to only the minimum instruction set and features, as well as the L2 cache, they could reduce the MIPS64 core to ~2M transistors (look up the core; the actual core is small), while it yields roughly 4GFLOPS @ 1GHz.

Do the math and they can yield 1TFLOP at 1GHz with 16MB of eDRAM.

In that situation you only leave each core with 65K of RAM, just over half what a Celery is packing. That would make things a bit sticky to try and deal with, IMO.
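
For reference, that per-core figure just follows from splitting the speculative 16MB of eDRAM across the ~250 cores sketched above:

# Per-core share of the 16MB eDRAM under the hypothetical 250-core layout
edram_bytes = 16 * 1024 * 1024
cores = 250
print(edram_bytes // cores)   # 67108 bytes, i.e. roughly 65K per core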

If they do away with 9 of every 10 MIPS cores and replace them with an FPU/VU-type array (similar to the NV30's front end), they can yield even more and are down to 25 cores. There are many such combinations possible, but these are some FPU-heavy ones. Perhaps someone better acquainted with the size of a TU or ALU could comment.

Even if they are capable of getting an FPU/VU-type array crammed into a very dense space, would they really want to? Cut the pipeline down too short and their clock speed is going to take a big hit. As it stands now, MIPS chips aren't exactly known for their blistering clock rates :)

Marconelly-

I really don't know how a GF1 would handle 8 million single-textured polygons per second, much less anything of what you've mentioned, considering just how much the GF2 I have chokes on far less geometry-intensive scenes.

I'm speaking on a chip level, not system. Have a developer custom-code a game from the ground up for what the GeForce1 is capable of and you end up with something like Doom3 (Carmack stated the game was built around what was possible with the GeForce1). Obviously the poly counts are extremely low in that particular title, and performance on a GF1 will stink (although a ~100% boost in performance would occur if it was custom-coded, according to Carmack), but the capabilities of the GF1 are not limited to what you see exploited in a typical PC game.

I haven't really followed your discussion with Vince, but from what I've heard from people in the know, the VS and VU really have a lot in common. The VU basically has an expanded feature set that makes it more multi-purpose, but the VS is so similar to it instruction-wise that some people used to joke that nVidia reverse-engineered the VUs and implemented them in their chips :)

There are external factors when comparing the two; for instance, you have to move the data from the VUs to the GS, and using up the VUs for T&L means they aren't available for other uses, etc.
 
BenSkywalker said:
No, I was stating exactly what I meant: software rendering. You were supposing that software rendering does not include rasterization, which it does and always has. I worked with 3D viz for years; software rendering is software rendering - rasterization is included in the rendering.

Take it easy there, chief. I wasn't trying to bash you, just to understand where you are coming from in saying what you did. It seems to me that most people here assumed that you were referring to the operation of the vector units as T&L modules in a "software-related" manner when you mentioned "software rendering". It kind of goes w/o saying that the basic rasterization will be done by dedicated hardware, as that has been shown to be the best way to go about it at this time. So if you want to get technical about it, I suppose you could say Sony's current graphics approach includes both software and hardware aspects.

The idea you were presenting to us seems to be, indeed, a full software rendering implementation all the way from T&L to rasterization. Whether or not Sony is really planning such an approach remains to be seen, IMO. I don't think you will find much argument that it would be tough to make a general CPU do rasterization faster than dedicated hardware (so you are somewhat preaching to the choir). It just makes the most sense that if someone was to adopt a "software rendering" approach, they are likely talking about the front-end stuff with basic rasterization still done in hardware. Maybe that is a bit too informal for your tastes, but there's no need for you to presume that I was implying the polar opposite of your idea to show how wrong I was. That's my take on it.

On the flipside, I guess it wouldn't be completely out of the question to do the rasterization using an army of general purpose vector units. In that respect, I guess Sony could actually do a fully "software renderer" implementation, but you have to admit that the distinction of what is "traditional hardware" or "traditional software" becomes more blurred. Does it really matter if virtually the same mechanism is doing the task if it is on a chip marked as "CPU" vs. a chip marked as "GPU"?
 
I'm speaking on a chip level, not system. Have a developer custom-code a game from the ground up for what the GeForce1 is capable of and you end up with something like Doom3 (Carmack stated the game was built around what was possible with the GeForce1). Obviously the poly counts are extremely low in that particular title, and performance on a GF1 will stink (although a ~100% boost in performance would occur if it was custom-coded, according to Carmack), but the capabilities of the GF1 are not limited to what you see exploited in a typical PC game.
I understand all that, but considering that on my GF2 the Doom 3 alpha ran at approx. 1-5 FPS whenever there was anything moving on the screen, and that a GF1 would at best reach that same framerate if it was optimized for it, I still don't see much point.

I just don't see any imaginable GeForce1 configuration being capable of running a game like MGS2 or Silent Hill 3 at 60 or 30 FPS respectively, where the PS2 obviously is.
 
I'm speaking on a chip level, not system.
Well, the GF1 needs a CPU to perform a great many things, so comparing the chip alone to a system (you compared it to the whole PS2) is pointless.
While on the subject, what magic copy of Giants did you run "well" on a GF1 that even looked good, if I may ask? The one I played runs like complete crap on anything below a GF3 and a 1+GHz CPU.

Btw Marc, considering the D3 demo likes to be on the sluggish side even on an R9700 with a P4 2400, I dread to think what it must be like on a GF2 :p Of course, that's what you get for trying non-public alphas...
 
Randy-

Does it really matter if virtually the same mechanism is doing the task if it is on a chip marked as "CPU" vs. a chip marked as "GPU"?

From a performance perspective: does eDRAM offer any benefits? Why? That is just one reason why having certain functions on the GPU is an advantage over a CPU. That ignores things such as customizations to eliminate certain possible major performance shortcomings (a BPU miscalc on a lengthy shader op with branches would be a major hit on a CPU).

Marconelly-

I understand all that, but considering that on my GF2 the Doom 3 alpha ran at approx. 1-5 FPS whenever there was anything moving on the screen, and that a GF1 would at best reach that same framerate if it was optimized for it, I still don't see much point.

An alpha demo build isn't a good way to gauge performance. A GeForce1 is currently still the baseline for playing the game, last I was aware.

I just don't see any imaginable GeForce1 configuration being capable of running a game like MGS2 or Silent Hill 3 at 60 or 30 FPS respectively, where the PS2 obviously is.

And the PS2 can't run Giants without compromises, nor could it run JKII or Mafia (both without compromises, of course), to name a few examples. Obviously when coding you are going to take certain things into consideration on a fixed platform. The big difference is I can pull up examples that were not built from the ground up for the GF1 that won't run on the PS2.
 