Huddy says "R600"


Jawed
20-Sep-2006, 23:20
The X1k "sell sheets" also had their errors.

There is one thing there that gives a glimmer of credibility: "Major Z/stencil architecture improvement". A patent on precisely this subject popped up recently.

So if it's fake, the faker is paying attention :lol:

The fact that these say practically nothing otherwise is pretty boring, though.

Jawed

Geo
20-Sep-2006, 23:48
at least we are over the watt discussion because it started to annoy me :wink:

Thank you for that!

Call me naive, but those slides look like they might be legit to me. Obviously written by a marketing guy, as they maximize the gosh gee whiz without actually saying a great deal.

The lack of a few more proprietary buzz words bothers me tho. There's just gotta be some new buzzwords with R600, doesn't there? Where's our Kaleidoscope? :lol: Actually, the lack of calling the HD video thingy "UVD", which they've already started to evangelize, bothers me.

Edit: Btw, there's always the possibility that "or/of" typo is there on purpose to ID the recipient of this copy, and someone will be getting a nasty-gram because of it.

Pete
21-Sep-2006, 00:41
That's a lot of IDs on one slide, geo: "Major shaders improvement" and "No performance degredation," too. (If anyone deserves a nastygram, it's the writer, from his high-school English teacher. He capitalizes as randomly as Hellbinder, too. :smile: )

The slides say enough things just right enough to make me think they're legit. Considering them "sell sheets" explains their vague and overzealous (spectacular/ supreme/ massive/ absolute/ double performance) proclamations. They got pretty specific with "80GB/s" and "2006 Holiday," though.

As always, questions remain. "4x AA w/o perf. degradation?" They've been peddling this for a while, but "4x" now? Ballsy. And does this directly relate to Z/stencil? Jawed, could you link to that patent (if it's not this one)?

Also, "More powerful, unified shaders" contrasts with Xenos' individually less powerful but altogether more shaders. I thought the idea of unified shaders was to strip away some of the unique PS/VS functionality and just go with as many "general-purpose" processing units as possible. Then again, if, as nAo said (in the new G80 thread), 85% of a GPU's power is dedicated to massaging fragments, I guess PS-specific acceleration isn't really extra but optimal.

Rangers
21-Sep-2006, 01:27
Well, in the battle of the dueling, probably fake, slides, G80 is said to be ~3X as fast as 7900GTX (or, "3X 3Dmark"), and R600 twice as fast as R580.

Considering R580 is maybe 5-10% faster than G71 to start with, that would be a ~36% lead for Nvidia by my calculations. Actually not entirely unrealistic.
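Rangers' figure is easy to re-derive. A minimal Python sketch, taking the rumoured (and quite possibly fake) slide numbers at face value: G80 about 3x G71, R600 about 2x R580, and R580 5-10% faster than G71. The function name is made up for illustration.

```python
def g80_lead_over_r600(g80_vs_g71, r600_vs_r580, r580_vs_g71):
    """Return G80's relative lead over R600 (0.36 means a 36% lead).

    All inputs are relative-performance ratios taken from rumoured slides.
    """
    g80 = g80_vs_g71                      # G80 in units of G71 performance
    r600 = r600_vs_r580 * r580_vs_g71     # R600 in units of G71 performance
    return g80 / r600 - 1.0


for r580_advantage in (1.05, 1.10):
    lead = g80_lead_over_r600(3.0, 2.0, r580_advantage)
    print(f"R580 {r580_advantage - 1:.0%} faster than G71 -> G80 lead {lead:.0%}")
```

With these inputs the lead comes out at about 36% at the 10% end of the range and about 43% at the 5% end, so ~36% is the more conservative reading.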

YES, I'M AWARE THE ABOVE IS UTTER SPECULATION.

DmitryKo
21-Sep-2006, 02:26
Last time I looked into my UPS software, a P4 521 with i915G, 1GB of RAM, an Audigy2, a hard drive, and a GF6600GT card was consuming 115 W when idle and 150 W when playing a D3D game. Now I've sold the video card and this same system is consuming 75 W when idle and 100 W when playing the same game with GMA900. And my PSU is rated at 400 W (http://www.fsp-group.com.tw/english/1_product/2_detail.asp?mainid=1&fid=52&proid=198) (it's an OEM rebadge of a high-end unit (http://www.fsp-group.com.tw/english/1_product/2_detail.asp?mainid=1&fid=52&proid=214)) and provides up to 350 W at 12 V, so it would probably handle Quad SLI with no problem.

Now why any SLI setup requires at least a 600 W PSU, even one that uses lowly cards like the 7300GS, is simply beyond me (not to mention these monster 1000 W units)... they must either be playing safe with possible use of pre-ATX2.01 designs (where 12 V power supply is rather limited), or trying to avoid high loads for increased heat efficiency and lower noise...

Jawed
21-Sep-2006, 02:58
Pete, I'm referring to what's mentioned in this thread (the last two patent apps):

http://www.beyond3d.com/forum/showthread.php?t=32653

Shockingly, there was absolutely no response.

[One thing I've learnt (not just from here, mind), though I often ignore the resulting principle: post a lot of content aimed at no-one in particular and everyone just shrugs]

In fact, blimey, I need to have a proper root through all that stuff in that thread. There's gotta be a hell of a lot of R600-relevant stuff in there :shock: :shock: That's tomorrow morning's rummage sorted. Now, can I resist a peek before going to bed?...

Hmm, no...

Jawed

LeStoffer
21-Sep-2006, 08:06
Look at the small symbol on the G80 slide: http://img132.imageshack.us/my.php?image=hardspellslidenj6.jpg
Clearly, it's the same morons spreading the same FUD.

Uttar

I'll second that. Besides the 'over 80 GB/s memory bandwidth' it's all just fluff talk that most of us could do better. :roll:

Rangers
21-Sep-2006, 10:14
New Fudo FUD!

http://www.theinquirer.net/default.aspx?article=34510

I can't wait to see how this compares architecturally to Xenos, and whether we can draw any conclusions about Xenos performance from it.

It all depends on whether the "64 shaders" are single or single+mini, mostly, I guess.

Also, if they are targeting only 700... that makes the 1.5 upper bound from the G80 slides all the more suspect.

_xxx_
21-Sep-2006, 13:26
Crappy, they definitely look fake.

"R600 will double the performance or R580"

...

EDIT: Beaten! was searching for more spelling/grammatical errors :p

You know, I always have to wonder what kind of sad, boring life people must have to invest their valuable time into making faked slides for forum fame. I really don't get it. :???:

_xxx_
21-Sep-2006, 13:30
Now why any SLI setup requires at least a 600 W PSU, even one that uses lowly cards like the 7300GS, is simply beyond me (not to mention these monster 1000 W units)... they must either be playing safe with possible use of pre-ATX2.01 designs (where 12 V power supply is rather limited), or trying to avoid high loads for increased heat efficiency and lower noise...

Simple, because many people buy the cheapest PSU they can find, which might effectively deliver (unstable) half of what it says on the box. Just "making sure" people will have enough juice to power the rig, not like anyone will ever need real 600W.

Jawed
25-Sep-2006, 17:54
Ted brought us this patent a long time ago (when we were discussing R520, and in which it didn't show up):

Appearance determination using fragment reduction (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US2005179700&F=0)

A method for determining the appearance of a pixel includes receiving fragment data for a pixel to be rendered; storing the fragment data; and determining an appearance value for the pixel based on the stored fragment data, wherein a portion of the stored fragment data is dropped when the number of fragment data per pixel exceeds a threshold value enabling large savings in memory footprint without impacting perceivably on the image quality. A graphics processor includes a rasterizer operative to generate fragment data for a pixel to be rendered in response to primitive information; and a render back end circuit, coupled to the rasterizer, operative to determine a pixel appearance value based on the fragment data by dropping the fragment data having the least effect on pixel appearance.
I'm keeping my fingers crossed that this will see the light of day in R600.

With it high levels of MSAA can be applied at much lower storage/bandwidth cost than ATI's current MSAA scheme.
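The abstract only describes the mechanism in outline. A minimal Python sketch of the idea as described, under loud assumptions: each fragment carries a colour and a count of covered samples, "least effect on pixel appearance" is approximated as "fewest covered samples", and the pixel is resolved as a coverage-weighted average of the surviving fragments. `Fragment`, `add_fragment`, `resolve` and the threshold of 2 are all hypothetical, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    color: tuple    # (r, g, b), hypothetical representation
    coverage: int   # number of MSAA samples this fragment covers


def add_fragment(stored, frag, threshold):
    """Store a fragment; if the pixel now holds more than `threshold`
    fragments, drop the one assumed to matter least (smallest coverage)."""
    stored.append(frag)
    if len(stored) > threshold:
        least = min(stored, key=lambda f: f.coverage)
        stored.remove(least)
    return stored


def resolve(stored):
    """Coverage-weighted average of the surviving fragments' colours."""
    total = sum(f.coverage for f in stored)
    return tuple(
        sum(f.color[c] * f.coverage for f in stored) / total for c in range(3)
    )


pixel = []
for frag in (Fragment((1.0, 0.0, 0.0), 5),
             Fragment((0.0, 1.0, 0.0), 2),
             Fragment((0.0, 0.0, 1.0), 1)):
    add_fragment(pixel, frag, threshold=2)
print(len(pixel), resolve(pixel))
```

In this sketch the dropped fragment's samples are effectively shared out among the survivors; whether real hardware does anything similar is not stated in the abstract.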

Presuming that M$ will create strict definitions for MSAA quality, is this technique effectively off-limits?...

Jawed

vertex_shader
26-Sep-2006, 16:45
Fuad goes crazy again, or not ?:wink:

"ATI has daft codenames for RV6x0 chips"

"The first one is codenamed RV610 or, as the engineers codenamed it, Bum. The second one is RV630, or engineering codename Shaka, and the last one RV660 is codenamed Laka.

When you put these three codenames together you get Bum Shaka Laka, which is, of course, richly amusing. Who said that engineers don't have the sense of humour?"
http://www.theinq.com/default.aspx?article=34627

Sounds funny :lol:

digitalwanderer
26-Sep-2006, 16:47
So the high end will of course be "Boom"...

Geo
26-Sep-2006, 17:29
So the high end will of course be "Boom"...


And I'm thinking there will be two "Laka" skus. . . .

Pete
30-Sep-2006, 07:06
Anyone attend ATI's Stream Computing event and notice if their still-CEO was winking or in any other way twitching when he said (http://techreport.com/onearticle.x/10907)
48 today, general FP engines. Direction now is to move to common elements for geometry, raster processing. Today, it's 48, future, maybe it's 96.
But he stems the wonderment with
We have 375 gigaflops, 64 GB/s. We're about a third of a teraflop. Next generation will exceed half a teraflop on a single GPU.
Well, poop. No more specific than "exceed half a teraflop?" Is that like G80 rumored to exceed half a billion transistors? :) B/c the former could lead to 64 shader units (would be strange, considering R580's already packing 48 and R600's rumored to weigh in with a ginormous pin count but not make much clockspeed headway), while the latter promises 96. Anyone know if Orton's been reading VR-Zone's rumors? :twisted:

(Very interesting to compare this to Intel's upcoming teraflop chip. Not in direct competition, eh?)

Arun
30-Sep-2006, 10:35
(Very interesting to compare this to Intel's upcoming teraflop chip. Not in direct competition, eh?)
Yeah, R600 would most likely trounce it in integer math.

Uttar

Jawed
30-Sep-2006, 13:39
Well, poop. No more specific than "exceed half a teraflop?" Is that like G80 rumored to exceed half a billion transistors? :) B/c the former could lead to 64 shader units (would be strange, considering R580's already packing 48 and R600's rumored to weigh in with a ginormous pin count but not make much clockspeed headway), while the latter promises 96. Anyone know if Orton's been reading VR-Zone's rumors? :twisted:
I have to admit, the idea of R600 being a mere 64 ALU pipelines is very disappointing, it just doesn't sound like it'd be worth the wait.

I think it's worth cautioning that, per ALU pipeline, Xenos has fewer FLOPs (for what it's worth) than R5xx's pixel shaders.

The Xenos ALU pipeline is Vec4 MAD + Scalar special-function (rcp, rsq etc.). It's problematic because I think the scalar unit is not MAD capable, so you can't simplify FLOPs-counting by saying it's 10 FLOPs per clock. You can't call it 8 FLOPs (Vec4 MAD) either, because the scalar unit can still do at least an ADD. The more complex scalar functions are arguably not 1 FLOP, either.

But for the time being, say Xenos is 9 FLOPs. R5xx is Vec4 MAD (or Vec3 MAD + scalar special-function) plus Vec4 ADD, which is usually summarised as 12 FLOPs.

So I expect R600 to have 3/4 of R5xx's per-ALU-pipe FLOPs, based upon the Xenos pipeline (with additional integer functionality).

3/4 * 374 * 64/48 = 374 GFLOPs

3/4 * 374 * 96/48 = 561 GFLOPs

So, 96 ALU-pipes it is, then (assuming 650MHz) :razz:
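These numbers are just ALU pipes x FLOPs per pipe per clock x clock. A small Python sketch reproducing them, assuming the per-pipe counts above (12 FLOPs for an R5xx pixel ALU pipe, 9 for a Xenos-style pipe) and a 650MHz clock for both chips; these are forum estimates, not vendor figures.

```python
def gflops(pipes, flops_per_clock, mhz):
    """Peak shader throughput in GFLOPs: pipes * FLOPs per clock * clock (MHz)."""
    return pipes * flops_per_clock * mhz / 1000.0


R5XX_FLOPS = 12   # Vec4 MAD + Vec4 ADD, as usually summarised
XENOS_FLOPS = 9   # Vec4 MAD + scalar ADD, the working figure above

r580 = gflops(48, R5XX_FLOPS, 650)
print(f"R580 pixel shaders: {r580:.1f} GFLOPs")
for pipes in (64, 96):
    print(f"{pipes} Xenos-style pipes: {gflops(pipes, XENOS_FLOPS, 650):.1f} GFLOPs")
```

At 650MHz, R580's 48 pixel ALU pipes come to 374.4 GFLOPs, in line with the 374 used above; 64 Xenos-style pipes merely match that, while 96 reach 561.6.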

Jawed

Jawed
30-Sep-2006, 13:44
So, does that make R600 16-1-6-1 or 32-1-3-1?

Jawed

pjbliverpool
30-Sep-2006, 15:38
With Xenos-like ALUs and only 64 of them, R600 would need to run at 740MHz merely to equal R580's total shading power (including vertex shaders). Obviously it would need to have more than R580, meaning an even higher clock speed. I find that pretty unlikely, so 96 ALUs, assuming they are Xenos-like, seems a dead cert.

Even at only 500MHz R600 would edge out R580 in total shader FLOPs; 600MHz would give "over half a teraflop" at 518 GFLOPs.
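The 740MHz figure can be reproduced if R580's 8 vertex shaders are counted at 10 FLOPs per clock each (Vec4 MAD plus scalar MAD) on top of the 48 pixel ALU pipes at 12 FLOPs; that vertex shader figure is an assumption here, picked because it matches the numbers above. A Python sketch:

```python
R580_MHZ = 650
R580_PS = 48 * 12 * R580_MHZ / 1000.0   # 48 pixel ALU pipes at 12 FLOPs/clock
R580_VS = 8 * 10 * R580_MHZ / 1000.0    # 8 vertex shaders at an assumed 10 FLOPs/clock
R580_TOTAL = R580_PS + R580_VS          # GFLOPs


def xenos_style_gflops(alus, mhz, flops_per_alu=9):
    """Peak GFLOPs for `alus` Xenos-style ALUs at `mhz`."""
    return alus * flops_per_alu * mhz / 1000.0


def mhz_to_match(alus, target_gflops, flops_per_alu=9):
    """Clock (MHz) at which `alus` Xenos-style ALUs reach `target_gflops`."""
    return target_gflops * 1000.0 / (alus * flops_per_alu)


print(f"R580 total: {R580_TOTAL:.1f} GFLOPs")
print(f"64 ALUs need ~{mhz_to_match(64, R580_TOTAL):.0f}MHz to match it")
print(f"96 ALUs @ 500MHz: {xenos_style_gflops(96, 500):.1f} GFLOPs")
print(f"96 ALUs @ 600MHz: {xenos_style_gflops(96, 600):.1f} GFLOPs")
```

That gives 426.4 GFLOPs for R580, roughly 740MHz for 64 ALUs, 432 GFLOPs for 96 ALUs at 500MHz and 518.4 at 600MHz.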

trumphsiao
01-Oct-2006, 02:13
Jawed, R600's present clock speed is less than 4XXMHz...

Jawed
01-Oct-2006, 02:16
Jawed, R600's present clock speed is less than 4XXMHz...
LOL, that's prolly why it's gonna be later than we were expecting. On second thoughts, not so funny, R520 all over again? :cry:

Jawed

SugarCoat
01-Oct-2006, 02:24
LOL, that's prolly why it's gonna be later than we were expecting. On second thoughts, not so funny, R520 all over again? :cry:

Jawed

X=10!

41010!

Jawed
01-Oct-2006, 02:26
Damn I really did laugh then.

Jawed

trumphsiao
01-Oct-2006, 03:00
Damn I really did laugh then.

Jawed


I hope R600 won't be another NV30 fiasco, but who knows.

Razor1
01-Oct-2006, 03:05
I hope R600 won't be another NV30 fiasco, but who knows.


The more I hear about those 64 ALUs/pipes, the more I think that, if it's true, it's going to get creamed. Jawed, how did ya arrive at 96 ALUs? 48x2 arrays?

trumphsiao
01-Oct-2006, 03:32
The more I hear about those 64 ALUs/pipes, the more I think that, if it's true, it's going to get creamed. Jawed, how did ya arrive at 96 ALUs? 48x2 arrays?


Well, G80's PS ALU is MIMD 1D (I still haven't seen the raw performance of a MIMD PS ALU; this may be the first time we see such a PS ALU available, if this rumor is absolutely true.)

R6XX series, on the hindmost point, 64 ALUs as a whole from raw performance (C1 ALU???) maybe give Nvidia a comeuppance (if the double-clock feature is unavailable on the high-end/mid-end/low-end product line).

Pete
01-Oct-2006, 06:17
You know, I keep (re)forgetting the forest for the trees, and I get caught up in these random number snippets (in this case, not even a real leak) while forgetting current GPUs. Namely, Xenos already has 64 shader ALUs, it's just got 1/4 disabled for yields. So it's pretty daft of me to expect R600, with its rumored ginormous die/pad/pin count/whatever, to pack only as many ALUs as Xenos, even if it were clocked at 1GHz.

No, I'd allow for at least 96 and probably 128 ALUs, and most likely all enabled on the crown SKU. Now, if DX10's integer calc requirement means that R600's ALUs pack more transistors than R580's, that might favor the lower number.

But I keep referring back to G80's 48 pixel "processors," and wondering if we're going to see something significantly different than NV40/G70. In fact, I'm expecting decoupled TMUs, assuming DX10's integer requirement forced NV to rework its current ALUs enough to make the TMU "integrated" in ALU1 no longer a transistor-efficiency win. And 48 TMUs sounds stupid high. That could suggest 48 individual fragment ALUs, rather than the current 24x2 layout. This makes 64 ALUs on R600 sound reasonable (however seemingly improbable, given Xenos). But then you'd think games moving forward would require more shader calcs per pixel, so grouping two ALUs into a single pixel "processor" (dual core, heh) might make sense, in which case we have G80 with 96 fragment ALUs (double G70). That would certainly indicate--require, basically--R600 having way more than the 64 ALUs in Xenos.

I have no idea how to factor MIMD fragment ALUs into my over-interpretation of "48 pixel processors." In fact, I'm not even sure what "MIMD 1D" means, nor do I know if MIMD makes sense in a fragment processor. This makes my attempts at prognostication all the funnier. Well, I'm laughing, at least.

Damn this waiting. :lol:

Anyway, I'm with you, Jawed. It's got to be packing more than 64 ALUs.

Junkstyle
01-Oct-2006, 07:17
if the RS600 is going to be so freakin awesome then why did ATI sell to AMD?

trumphsiao
01-Oct-2006, 08:40
LOL, you guys are funny cause exact R600 mediocre performance. period

LeStoffer
01-Oct-2006, 08:59
LOL, you guys are funny cause exact R600 mediocre performance. period

Sorry, I can't make sense of what you're trying to say?

rwolf
01-Oct-2006, 10:08
if the RS600 is going to be so freakin awesome then why did ATI sell to AMD?

RS600 is a motherboard chipset not a graphics card chip like R600.

Sunrise
01-Oct-2006, 11:15
Sorry, I can't make sense of what you're trying to say?
He meant that the numbers speak for themselves, irrespective of their origin and relative nature, and that no one here but him can see R600 being a bad performer, because his sources already told him that R600 @ 4XXMHz < G80 @ ~600MHz. And of course, the typical layman's rule of thumb says something like: "If it's slow on the MHz, it has to be bad". :cool:

Who am I to question ATI's decisions, being an outsider with no knowledge of R600 at all (at least not at a conscious level :lol:)? So I find myself unable (granted, it's like an extreme sport, so it's also highly addictive, but never forget the damn rope! :lol:) to come to such "strange" conclusions beforehand.

If Jawed were a pot of shares from a new start-up, I'd rather try to subscribe for them all before they get listed and go through the roof, because otherwise you'll just keep looking at the numbers and asking yourself how the hell that could happen. :wink:

trumphsiao
01-Oct-2006, 13:04
The problems with R600 have now shifted to yield issues and core power consumption.

and history repeats itself . :grin:

Jawed
01-Oct-2006, 14:53
The more I hear about those 64 ALUs/pipes, the more I think that, if it's true, it's going to get creamed. Jawed, how did ya arrive at 96 ALUs? 48x2 arrays?
No, either 4 arrays of 24 (16-1-6-1) or 8 arrays of 12 (32-1-3-1).

It depends on whether there's 4 shader units (each of 4 TMUs, 24 ALUs, 4 ROPs) or 8 shader units (each of 4 TMUs, 12 ALUs, 4 ROPs).

In other words, is R600 16 TMUs/ROPs or 32 TMUs/ROPs? Assuming a 1:1 ratio between TMUs and ROPs.

Frankly 32 is unlikely, so 16-1-6-1 is my guess :lol: :lol: :lol: :lol:

I'd be pleased as punch if it was 16-1-6-2... Delirious if it was 32-1-3-2.
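The two candidate layouts can be expanded mechanically. A Python sketch, assuming the reading of labels like 16-1-3-1 used in this thread (pipes, then TMUs, ALUs and ROPs per pipe) and the guess of 4 TMUs and 4 ROPs per shader unit; `layout` is a hypothetical helper.

```python
def layout(shader_units, tmus_per_unit, alus_per_unit, rops_per_unit):
    """Expand a shader-unit guess into totals and a pipes-TMU-ALU-ROP label.

    Assumes one "pipe" per TMU, so the TMU field of the label is always 1.
    """
    pipes = shader_units * tmus_per_unit
    totals = {
        "TMUs": pipes,
        "ALUs": shader_units * alus_per_unit,
        "ROPs": shader_units * rops_per_unit,
    }
    label = f"{pipes}-1-{totals['ALUs'] // pipes}-{totals['ROPs'] // pipes}"
    return label, totals


print(layout(4, 4, 12, 4))   # R580: four shader units of 4 TMUs, 12 ALUs, 4 ROPs
print(layout(4, 4, 24, 4))   # first R600 guess: 16-1-6-1
print(layout(8, 4, 12, 4))   # second R600 guess: 32-1-3-1
```

Both R600 guesses land on 96 ALUs; they differ only in whether those ALUs are spread over 16 or 32 TMU/ROP pipes.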

Jawed

Jawed
01-Oct-2006, 15:05
Well, G80's PS ALU is MIMD 1D (I still haven't seen the raw performance of a MIMD PS ALU; this may be the first time we see such a PS ALU available, if this rumor is absolutely true.)
Aha! I finally understand what you're saying now.

In GPUs we're used to Vec4 or Vec5 ALUs, with the ability of one channel to split off for scalar operations. In NV4x NVidia had a more flexible design: Vec2+Vec2 within an ALU, or scalar+scalar.

What you're saying is that G80 executes all channels on separate scalar ALUs. This is more complicated to implement, but it means that the pipeline can issue more combinations of instructions at the same time. This also increases the chances that each channel (of a nominal 4-channel ALU) is actually being used.

In other words, this should result in a significant increase in per-ALU utilisation, per clock.
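A toy model of why separate scalar ALUs help utilisation: a Vec4 ALU spends a whole issue slot on each instruction however many channels it uses, while four scalar lanes, each working on its own pixel, only spend clocks on channels that are actually used. The instruction mix is made up, and the model ignores co-issue (NV4x's Vec2+Vec2 or R5xx's Vec3+scalar), which claws back some of the Vec4 waste; it is not a description of how G80 actually works.

```python
def vec4_clocks(widths, pixels):
    """One Vec4 ALU: every instruction costs a clock per pixel, whatever its width."""
    return len(widths) * pixels


def scalar_clocks(widths, pixels, lanes=4):
    """`lanes` scalar ALUs, each on its own pixel: every channel op costs
    one clock, so a pixel costs sum(widths) clocks on its lane."""
    if pixels % lanes:
        raise ValueError("pixels must be a multiple of lanes in this toy model")
    return sum(widths) * pixels // lanes


def utilisation(useful_channel_ops, clocks, lanes=4):
    """Fraction of the 4 channel/lane slots per clock doing useful work."""
    return useful_channel_ops / (clocks * lanes)


shader = [4, 3, 1, 1, 2, 3, 1]   # hypothetical channel widths per instruction
pixels = 64
useful = sum(shader) * pixels
for name, clocks in (("Vec4", vec4_clocks(shader, pixels)),
                     ("scalar", scalar_clocks(shader, pixels))):
    print(f"{name}: {clocks} clocks, {utilisation(useful, clocks):.0%} of slots used")
```

With this mix the Vec4 ALU needs 448 clocks at about 54% utilisation, while the same four lanes used as scalars need 240 clocks at 100%.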

Groovy :!: Needs taking to the G80 thread!

R6XX series, on the hindmost point, 64 ALUs as a whole from raw performance (C1 ALU???) maybe give Nvidia a comeuppance (if the double-clock feature is unavailable on the high-end/mid-end/low-end product line).
I admit I can't work out what you mean.

Jawed

Razor1
01-Oct-2006, 15:39
Ah ok, interesting, but I still have a concern about multiple ALUs per array: if an array is working on a pixel shader, would it also be able to work on a vertex shader at the same time?

Edit:

Even though we are saying 16 arrays, if we're talking about something more like R580's arrays, where the ALUs are independent of each other, might that be a possibility?

Jawed
01-Oct-2006, 16:23
Even though we are saying 16 arrays, if we're talking about something more like R580's arrays, where the ALUs are independent of each other, might that be a possibility?
In terms of R580:

R580 is 16-1-3-1, spread over four shader units. Each shader unit consists of 4 TMUs, 12 ALUs and 4 ROPs. The 12 ALUs in each shader unit are arranged as 3 quads.

"Array" in Xenos means "shared program counter". In R580 an array of 12 ALUs sharing a program counter is integrated with TMUs and ROPs within a single shader unit. So, a shader unit contains:

a quad of TMUs
a quad of ROPs
an array of 3 quads of ALUs

So, if R600 is 16-1-6-1, and it has four shader units, each shader unit consists of:

a quad of TMUs
a quad of ROPs
an array of 6 quads of ALUs

All of this is guesswork on my part about how R600 organises ALUs/arrays (program counters) alongside TMUs and ROPs. I'm pretty sure ATI is sticking with screen-space tiling (witness the latest hierarchical-Z patent), which implies to me that the architecture of R580's pixel shaders becomes the basis of the entire shading core of R600.

Other ideas always welcome...

Jawed

Chalnoth
01-Oct-2006, 16:26
With nVidia going 384-bit, a larger bus also seems relatively likely for the R600. Thus it seems much more likely that the R600 will not have only 16 texture units, but will move to 24 or 32. I personally doubt that an ALU to TEX ratio higher than 4-1 is likely.

Jawed
01-Oct-2006, 16:35
if an array is working on a pixel shader would it also be able to also work on a vertex shader at the same time?
Yes. Fragments, vertices and primitives are all just "types of object" as far as a unified pipeline is concerned.

When vertices or primitives are shaded the output data is collected in caches.

When fragments are shaded the output is collected by the ROPs for render target processing.

It's trivial for the pipeline to keep track of which type of object it's working on, and so determine whether to dump the results in the cache or the ROP.

Similarly, the unified pipeline collects data from the relevant cache or buffer depending on the instructions for a new thread, as determined by the thread despatcher. The thread despatcher merely surveys the caches/buffers for work to do and compares that with the running tables that indicate current workload, to work out which type of thread to despatch next.

e.g. if the vertex cache is full and the pixel workload is low, then it's prolly time to start a new batch of fragments!

Once threads are despatched, managing their relative priorities is a non-trivial matter of balancing shader execution time versus latency versus buffer/cache usage.

R580 already does a lot of this, but it only has to worry about one type of thread: fragments. R600's job is more complex, but logically it all comes down to time-sliced execution of a pool of threads across a number of independent arrays.
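The despatch logic described above can be sketched as a toy time-sliced loop. Everything in it is hypothetical (the cache size, the "four waiting batches" backlog rule, two fragment batches per vertex batch, one rasteriser step per slice): it only illustrates surveying input buffers and current workload to choose the next thread type, with vertex output going to a cache and fragment output going to the ROPs.

```python
def choose(vertex_left, cache_fill, cache_size, fragments_waiting):
    """Toy despatch heuristic: return "vertex", "fragment" or None (idle)."""
    cache_full = cache_fill >= cache_size
    # Drain fragments if the vertex cache is full, no vertices remain,
    # or a backlog of fragment batches has built up.
    if fragments_waiting and (cache_full or vertex_left == 0 or fragments_waiting >= 4):
        return "fragment"
    # Otherwise keep shading vertices while there is room to store the results.
    if vertex_left and not cache_full:
        return "vertex"
    return "fragment" if fragments_waiting else None


def simulate(vertex_batches, arrays=3, cache_size=4, frags_per_batch=2):
    """Time-slice vertex and fragment batches across `arrays` independent arrays.

    Each array runs one batch per time slice. Returns (slices, log), where
    log[i] lists what each array ran in slice i.
    """
    if arrays < 1:
        raise ValueError("need at least one array")
    cache = 0              # shaded vertex batches held in the vertex cache
    fragments = 0          # fragment batches waiting to be shaded
    shaded_fragments = 0   # fragment batches handed to the ROPs
    total_fragments = vertex_batches * frags_per_batch
    log = []
    while shaded_fragments < total_fragments:
        # The rasteriser turns one cached vertex batch into fragment batches.
        if cache:
            cache -= 1
            fragments += frags_per_batch
        issued = []
        for _ in range(arrays):
            kind = choose(vertex_batches, cache, cache_size, fragments)
            if kind == "vertex":
                vertex_batches -= 1
                cache += 1                 # vertex output goes to the cache
            elif kind == "fragment":
                fragments -= 1
                shaded_fragments += 1      # fragment output goes to the ROPs
            issued.append(kind)
        log.append(issued)
    return len(log), log


slices, log = simulate(vertex_batches=4)
for i, issued in enumerate(log, 1):
    print(i, issued)
```

With 4 vertex batches on 3 arrays, the first slice is all vertex work, later slices drain fragments, and some arrays sit idle waiting on the toy rasteriser: a small taste of the balancing problem described above.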

Jawed

no-X
01-Oct-2006, 16:42
With nVidia going 384-bit, a larger bus also seems relatively likely for the R600. Thus it seems much more likely that the R600 will not have only 16 texture units, but will move to 24 or 32. I personally doubt that an ALU to TEX ratio higher than 4-1 is likely.
4:1? So what about a 24-1-4-1/2 configuration? Is it complete nonsense?

Razor1
01-Oct-2006, 16:42
That makes more sense, but now I'm thinking the control silicon is going to be much more complex with a multiple-ALUs-per-array approach, since some vertex shaders have to be done before some of the pixel shader calculations; the scheduler/control silicon will have to be able to interpret this. Or there has to be a place to store the data before it gets used, and then we might see latencies creeping in.

Jawed
01-Oct-2006, 16:51
That makes more sense, but now I'm thinking the control silicon is going to be much more complex with a multiple-ALUs-per-array approach, since some vertex shaders have to be done before some of the pixel shader calculations; the scheduler/control silicon will have to be able to interpret this. Or there has to be a place to store the data before it gets used, and then we might see latencies creeping in.
Xenos already does all this stuff. It has 3 arrays, each of 16 ALU-pipes, each of which can simultaneously execute vertex and fragment threads (time-sliced, obviously). It contains the relevant caches/buffers which are used as input/output for these pipes depending on the type of thread.

I'm puzzled which bit you think is difficult.

Jawed

Razor1
01-Oct-2006, 17:02
I might be thinking along different lines. You are suggesting R580-type pipes, where each pipe has, let's say, 3 ALUs for now, and then you have the unified pipes of Xenos, where you have 1 ALU per pipe. Now, the Xenos chip can do vertex or pixel calculations on each ALU, controlled by the scheduler.

If the R580 pipes were capable of doing the same thing, I don't think we could look at them as multiple batched ALUs (I am assuming there is a cache pool for all 3 ALUs; this might be incorrect). I would expect each ALU of the R600 to have its own cache; without a dedicated cache I think it will get quite cumbersome when things are done out of order. The R580 didn't really need to deal with this since it has separate caches for vertex and shader data.

I forgot: the Xenos chip would also have a separate cache for each of its 48 ALUs.

I'm getting confused by the way ATi marketed the Xenos chip vs. the R580 with "pipelines".

Dave Baumann
01-Oct-2006, 17:47
Razor, actually, there's been relatively little direct marketing of Xenos. I think the confusion is elsewhere...

However, for starters - when you say "cache" are you talking about register storage?

There are general similarities between how R580 and Xenos process pixel data - if you think of an R580 quad as one of the SIMDs in Xenos, the only difference is that R580's is working on 3 pixel quads (of the same shader) while Xenos works on 4 pixel quads (of the same shader) - you could also say that R580 has 4 smaller SIMDs whereas Xenos has 3 larger ones.

Jawed
01-Oct-2006, 18:04
The problems with R600 have now shifted to yield issues and core power consumption.

and history repeats itself . :grin:
Gotta wonder if this is due to ATI aiming for 80nm.

I'm still convinced that 80nm has been anything but a smooth ride for ATI this year...

Jawed

Razor1
01-Oct-2006, 18:12
Yeah register storage is what I'm getting at.

Ok, that clears up the confusion; I was directly trying to compare the quads. Thx Jawed and Dave :smile:

Razor1
01-Oct-2006, 23:13
Gotta wonder if this is due to ATI aiming for 80nm.

I'm still convinced that 80nm has been anything but a smooth ride for ATI this year...

Jawed


They might just be waiting; they probably have a lot of cores left over from the X1600s and X1300s, so there's no rush for them to go to 80nm. Dell was still selling X300s and X600s a few months ago, and the way nV's lineup is positioned price/performance-wise might have hurt ATi's low- and mid-range sales substantially. But then again, so far ATi's plans for going to 80nm haven't come about at all, at least on the project timelines they have given.

MistaPi
01-Oct-2006, 23:41
Do you think ATi and R600 will have an advantage when it comes to multiplatform Xbox 360/PC games?

Razor1
01-Oct-2006, 23:50
Hard to say; I think it will really depend on the engine the game uses. Also, the Xbox 360 versions of a game will be specifically programmed with eDRAM in mind, so a lot of the tricks used on the Xbox 360 won't work on desktop cards. I don't think cross-platform titles will have any specific advantages (developers still have to take unified pipelines into account for PC games), though games ported from Xbox 360 to PC might.

Ailuros
02-Oct-2006, 06:12
With nVidia going 384-bit, a larger bus also seems relatively likely for the R600. Thus it seems much more likely that the R600 will not have only 16 texture units, but will move to 24 or 32. I personally doubt that more than a 4:1 ALU:TEX ratio is likely.

Before I'd personally speculate about R6x0 or G8x TMUs I'd first like to know what each is capable of. I'm not so convinced that NV will actually increase the TMU count in G8x.

My reasoning starts elsewhere; I can see "only" 8 ROPs on Xenos, but can I really compare them to R5x0 ROPs w/o regarding what each is capable of?

All I'm saying is that any speculations can easily be on the wrong track; while Xenos can give some hints on how R6x0 might be, G7x is rather a bad indication for G8x since the latter is IMO a complete overhaul of the architecture.

Questions:

1. Will R6x0 keep the TMU:ROP ratio at 2:1 or will it be 1:1?
2. Will R6x0 ROPs be a derivative of Xenos or PC GPU ROPs?
3. How will we have to count multitexturing fillrate with this coming generation?

CarstenS
02-Oct-2006, 08:59
No, I'd allow for at least 96 and probably 128 ALUs, and most likely all enabled on the crown SKU. Now, if DX10's integer calc requirement means that R600's ALUs pack more transistors than R580's, that might favor the lower number.

I was told that you can use one FP ALU with a little more sophisticated control logic as two integer ALUs. If they saved some transistors somewhere else, even 128 might be possible.

Mabru
02-Oct-2006, 11:49
By Fuad Abazovic: lunedì 02 ottobre 2006, 10.43 (Monday, 2 October 2006, 10:43)
ATI's next generation DirectX 10 core will end up with more than 500 million transistors.

We already told you that this will be one big, hot chip but we don’t think many people expected 500+ million transistors.

This is just a single piece of the puzzle of unveiling this big crazy chip. Don’t forget that we heard it will dissipate 250+ Watts of power and will also be really fast in both DirectX 9 and 10 games.

We hope there won't be any additional delays of this hot chip, which will be the first big and powerful graphic chip unveiled by AMD.
http://uk.theinquirer.net/?article=34799

_xxx_
02-Oct-2006, 12:06
Errm, first they speculate about G80 having 700+ mio and now 500+ is "huge"? Where's the logic in that? :???:

Sunrise
02-Oct-2006, 13:21
Errm, first they speculate about G80 having 700+ mio and now 500+ is "huge"? Where's the logic in that? :???:
None, I guess. On the other hand, if you had some kind of split personality... :wink:

Mariner
02-Oct-2006, 13:33
Errm, first they speculate about G80 having 700+ mio and now 500+ is "huge"? Where's the logic in that? :???:

If there's one thing we all should have learned by now it is not to expect anything in the way of logic from The Inquirer! :smile:

Pete
02-Oct-2006, 14:32
I don't get it, the date is in Italian? Is Fuad in Italy for an ATI press event (thus "500+"), or was the # whispered to him while attending prep for a more impending NV launch?

Edit: nAo, :lol:!

nAo
02-Oct-2006, 14:43
I don't get it, the date is in Italian? Is Fuad in Italy for an ATI press event (thus "500+"), or was the # whispered to him while attending prep for a more impending NV launch?
LOL, his browser is simply configured to display dates in Italian; a quick cut and paste did the rest of the job :)

trumphsiao
02-Oct-2006, 14:45
Errm, first they speculate about G80 having 700+ mio and now 500+ is "huge"? Where's the logic in that? :???:


I heard R600's transistor count will be less than 520M.

trumphsiao
02-Oct-2006, 15:08
I was told that you can use one FP ALU with a little more sophisticated control logic as two integer ALUs. If they saved some transistors somewhere else, even 128 might be possible.


As we understand it, R600 is at least a DX10.1-standard ASIC, which will consume a mammoth transistor budget on numerous advanced features. Damn, why does R6xx have to commit hara-kiri again as an unscalable product?

INKster
02-Oct-2006, 15:19
As we understand it, R600 is at least a DX10.1-standard ASIC, which will consume a mammoth transistor budget on numerous advanced features. Damn, why does R6xx have to commit hara-kiri again as an unscalable product?

?!?:???:

The DirectX 10.1 API isn't scheduled to be completed until late 2007/early 2008, so how can it be in R600, and how can they guarantee full compatibility with an unfinished spec?
I don't buy that.

Plus, the number of transistors you suggested above would likely be much higher, in order to accommodate the new 10.1 features.

SugarCoat
02-Oct-2006, 19:55
Errm, first they speculate about G80 having 700+ mio and now 500+ is "huge"? Where's the logic in that? :???:

The Inq never said Nvidia's next core was 700M; that was all from wacko VR-Zone reporting off the Chinese specs. In fact, the Inq said 700M was highly unlikely ;).

Megadrive1988
02-Oct-2006, 20:14
700M transistors on 90nm is not really mass-manufacturable, is it?

Unless we believe the two-die theory: a geometry die and a pixel shader die. But that would break with Nvidia's tradition of single-chip graphics processors (SLI never negated that).


I tend to believe the earlier report of 500M+ transistors.


Edit: sorry, I'm off-topic; I thought I was in the G80 thread :|

Xmas
04-Oct-2006, 22:15
I was told that you can use one FP ALU with a little more sophisticated control logic as two integer ALUs.
I'm not a hardware guy but that doesn't make sense to me. Of course floating point numbers consist of two integer parts, but they are obviously neither complete (no MUL for exponent, no full bit logic) nor wide enough for 32-bit integers as required by D3D10. And integer operations are too rare to make doubling the data paths and read ports worthwhile.
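The width problem Xmas describes can be shown directly: an IEEE-754 single-precision float has a 24-bit significand (23 stored bits plus the implicit leading one), so not every 32-bit integer survives a trip through an FP32 register. A small sketch using Python's standard `struct` module to round values to single precision; `round_trip_fp32` is an illustrative name:

```python
import struct

def round_trip_fp32(n: int) -> int:
    """Store an integer in an IEEE-754 single and read it back as an int."""
    (f,) = struct.unpack("<f", struct.pack("<f", float(n)))
    return int(f)

for n in (2**24, 2**24 + 1, 2**31 - 1):
    print(n, "->", round_trip_fp32(n))
```

Every integer up to 2^24 round-trips exactly; beyond that, values start getting rounded to the nearest representable float, which is why an FP32 datapath on its own can't stand in for full 32-bit integer ALUs.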

Farhan
04-Oct-2006, 23:06
I'm not a hardware guy but that doesn't make sense to me. Of course floating point numbers consist of two integer parts, but they are obviously neither complete (no MUL for exponent, no full bit logic) nor wide enough for 32-bit integers as required by D3D10. And integer operations are too rare to make doubling the data paths and read ports worthwhile.

Do the 32-bit integer multiplies result in a 64-bit integer? Or is the result a max of 32-bits? If the latter is true, are the inputs limited to 16-bits each?

Xmas
04-Oct-2006, 23:23
Do the 32-bit integer multiplies result in a 64-bit integer? Or is the result a max of 32-bits? If the latter is true, are the inputs limited to 16-bits each?
It's 32-bit -> 32-bit.

Tridam
05-Oct-2006, 00:02
Aha! I finally understand what you're saying now.

In GPUs we're used to Vec4 or Vec5 ALUs, with the ability of one channel to split off for scalar operations. In NV4x, NVidia had a more flexible design: Vec2+Vec2 within an ALU, or scalar+scalar.

What you're saying is that G80 executes all channels on separate scalar ALUs. This is more complicated to implement, but it means that the pipeline can issue more combinations of instructions at the same time. This also increases the chances that each channel (of a nominal 4-channel ALU) is actually being used.

In other words, this should result in a significant increase in per-ALU utilisation, per clock.



GMA X3000 ALUs work in MIMD 1D / 4-scalar mode when working on pixel quads, but in SIMD 4D / Vec4 mode when working on independent vertices. It's just an example that shows the way ALUs work can differ based on the elements they process in a unified architecture.
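A toy model of why per-lane utilisation goes up with scalar ALUs, as described in the quoted post. It assumes a hypothetical instruction mix, a vec4 ALU with no co-issue (each instruction takes a full slot whatever its width), and four scalar lanes that each run a different pixel with no scheduling or register limits; real hardware sits somewhere in between. All names are illustrative.

```python
from math import ceil

LANES = 4  # both designs get four ALU lanes, for a fair comparison

def vec4_cycles(widths: list[int], pixels: int) -> int:
    """One 4-wide vector ALU: each instruction of each pixel takes one cycle,
    however many of its four components are actually used."""
    return pixels * len(widths)

def scalar_cycles(widths: list[int], pixels: int) -> int:
    """Four scalar ALUs, each on a different pixel: a vecN instruction becomes
    N scalar ops, so cycles track the number of useful components."""
    return ceil(pixels / LANES) * sum(widths)

def utilisation(widths: list[int], pixels: int, cycles: int) -> float:
    """Useful component operations divided by lane-cycles spent."""
    return pixels * sum(widths) / (LANES * cycles)

# Hypothetical instruction mix: component width of each instruction.
mix = [4, 3, 3, 1, 1, 2, 4, 1]
pixels = 64
v, s = vec4_cycles(mix, pixels), scalar_cycles(mix, pixels)
print(f"vec4:   {v} cycles, {utilisation(mix, pixels, v):.0%} lane utilisation")
print(f"scalar: {s} cycles, {utilisation(mix, pixels, s):.0%} lane utilisation")
```

With this mix the vec4 ALU wastes roughly 40% of its lanes on unused components, while the scalar arrangement keeps every lane busy; co-issue (3+1, 2+2) on the vector side would narrow the gap.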

tEd
05-Oct-2006, 00:18
GMA X3000 ALUs work in MIMD 1D / 4-scalar mode when working on pixel quads, but in SIMD 4D / Vec4 mode when working on independent vertices. It's just an example that shows the way ALUs work can differ based on the elements they process in a unified architecture.

that's kinda cool

Jawed
05-Oct-2006, 00:28
GMA X3000 ALUs work in MIMD 1D / 4-scalar mode when working on pixel quads, but in SIMD 4D / Vec4 mode when working on independent vertices. It's just an example that shows the way ALUs work can differ based on the elements they process in a unified architecture.
Hmm, now I want to find out more about the detailed architecture!

Jawed

Farhan
05-Oct-2006, 01:29
It's 32-bit -> 32-bit.

So what happens on an overflow? I'm trying to envision the combined FP/int hardware that can do this multiply without needing a full 32x32 bit tree since the output can only be 32-bit.

silent_guy
05-Oct-2006, 02:31
It's 32-bit -> 32-bit.

Not always true. Some regular CPUs will store the result in two 32-bit registers. In x86, it will be stored in EDX:EAX, for example. The ARM instruction set has multiply (32-bit result) and long multiply (64-bit result). For the latter, you can freely specify the two registers where the result will be stored.

Edit: add ARM info
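The two flavours silent_guy mentions differ only in how much of the product is kept. A minimal sketch for unsigned operands (x86 `MUL r32` writes EDX:EAX; ARM's `UMULL` is the unsigned long multiply); the function names are illustrative, not any ISA's mnemonics:

```python
MASK32 = 0xFFFFFFFF

def mul32(a: int, b: int) -> int:
    """Plain 32x32 multiply keeping only the low 32 bits of the product."""
    return (a * b) & MASK32

def long_mul32(a: int, b: int) -> tuple[int, int]:
    """Unsigned 32x32 -> 64-bit multiply returned as (hi, lo) halves."""
    product = (a & MASK32) * (b & MASK32)
    return (product >> 32) & MASK32, product & MASK32

hi, lo = long_mul32(0x12345678, 0x9ABCDEF0)
print(f"hi=0x{hi:08X} lo=0x{lo:08X}")
```

The low half of the long multiply is exactly what the plain multiply returns; the long form just keeps the upper 32 bits instead of discarding them.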

3dcgi
05-Oct-2006, 02:57
Hmm, now I want to find out more about the detailed architecture!

Jawed
I'm surprised you missed this, Jawed. There was discussion about it after Intel open-sourced their Linux driver.

Jawed
05-Oct-2006, 03:10
http://www.beyond3d.com/forum/showthread.php?t=32643

I was doing "internet caff" time back then, so my attention span was fairly limited.

Yes, I should look at that more closely... Need some sleep right now.

Jawed

Xmas
05-Oct-2006, 13:35
Not always true. [...]
I'm aware of that, but the question was about SM4 integer operations. There are no 64-bit integer types in SM4.
IIRC overflow results in wrap-around.
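Assuming the wrap-around behaviour Xmas recalls (two's complement, keep the low 32 bits), a 32-bit -> 32-bit integer multiply can be sketched like this. The low 32 bits of a product are the same whether the inputs are read as signed or unsigned, so one multiplier can serve both cases. Function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def imul_wrap(a: int, b: int) -> int:
    """32-bit x 32-bit -> 32-bit signed multiply that wraps on overflow."""
    return to_int32(a * b)

print(imul_wrap(65536, 65536))   # 2**32 wraps to 0
print(imul_wrap(2**31 - 1, 2))   # 0xFFFFFFFE reads back as -2
print(imul_wrap(-3, 7))          # no overflow: -21
```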

aaronspink
05-Oct-2006, 21:25
700M transistors on 90nm is not really mass-manufacturable, is it?

You ask the wrong question. People have been producing and shipping ~700M-transistor designs for years in quantities that Nvidia and ATI combined could only dream of....

Aaron Spink
speaking for myself inc.

aaronspink
05-Oct-2006, 21:28
It's 32-bit -> 32-bit.

If it's just 32b->32b, then it can easily be done with the existing FP mul array hardware. It just needs some additional logic on the input and output of the mul array, which is a fairly minimal overhead. The thing to realize is that the mul array supporting 32b FP has to support a minimum of a 48b output, but due to scaling, etc., they tend to output more than that.

Aaron Spink
speaking for myself inc.

Bob
05-Oct-2006, 21:39
If it's just 32b->32b, then it can easily be done with the existing FP mul array hardware.
If you just use the existing FP32 hardware, you end up with only a 24b * 24b multiply, which is short of a 32b * 32b one. You'll need to expand the FP multiplier to deal with 32-bit mantissas, or take up several clocks doing the multiply by smaller pieces.
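One way to do the "several clocks by smaller pieces" approach Bob mentions, sketched under the assumption that only the low 32 bits are needed: split each operand into 16-bit halves so every partial product fits easily in a 24x24 FP32 mantissa multiplier. This illustrates the arithmetic only, not any actual GPU's datapath; the function name is hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mul32_from_16bit_pieces(a: int, b: int) -> int:
    """Low 32 bits of a*b built only from 16x16-bit partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    low = a_lo * b_lo                    # pass 1: full 32-bit partial product
    cross = a_hi * b_lo + a_lo * b_hi    # passes 2-3: only their low 16 bits survive
    # a_hi * b_hi only contributes to bits >= 32, so it can be skipped entirely.
    return (low + (cross << 16)) & MASK32

a, b = 0x12345678, 0x9ABCDEF0
assert mul32_from_16bit_pieces(a, b) == (a * b) & MASK32
```

Three narrow multiplies plus shifts and adds, so a 24-bit array can produce a wrapped 32-bit integer product in a few passes rather than needing a wider array.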

aaronspink
06-Oct-2006, 06:34
If you just use the existing FP32 hardware, you end up with only a 24b * 24b multiply, which is short of a 32b * 32b one. You'll need to expand the FP multiplier to deal with 32-bit mantissas, or take up several clocks doing the multiply by smaller pieces.

It's not uncommon for the inputs in an FP multiplier to be bigger than the mantissa width.

Bob
06-Oct-2006, 06:53
It's not uncommon for the inputs in an FP multiplier to be bigger than the mantissa width.
Wait a sec. If you have an existing FP32-only multiplier, why do you need 8 extra bits of mantissa precision? If it's not a plain FP32 multiplier but an existing hybrid FP/INT 32b multiplier, then *obviously* it can do 32b integer multiplies. If it's an FP64 multiplier, then we're not talking about the same thing and I totally misunderstood your earlier post.

_xxx_
06-Oct-2006, 09:25
You ask the wrong question. People have been producing and shipping ~700M-transistor designs for years in quantities that Nvidia and ATI combined could only dream of....

Aaron Spink
speaking for myself inc.

With clock speeds of just a few MHz, I guess?

nAo
06-Oct-2006, 09:33
With clock speeds of just a few MHz, I guess?
> 1 GHz & most of them used for memory, not logic, IIRC

_xxx_
06-Oct-2006, 10:26
> 1 GHz & most of them used for memory, not logic, IIRC

Well, that kinda answers it. Memory is a completely different beast, not even remotely comparable.

CJ
06-Oct-2006, 11:04
Some recent roadmaps: http://www.chilehardware.com/modules.php?op=modload&name=News&file=article&sid=1631&mode=thread&order=0&thold=0

It places R600 in Q1 and a 'next gen' R600 in Q3. It also shows RD580 with 3 graphics slots (probably 1 for physics, like RD600) while RD790 has 4 (running at PCIe 8x). Seeing as it's listed under "CrossFire Roadmap", it looks like ATi's own iteration of QuadSLI (4 GPUs?) is coming next year too.

SugarCoat
06-Oct-2006, 11:22
No slide showing the R600 doing Crysis 163% faster than X product? I'm disappointed. Kind of strange it says "3 graphics" instead of 2 graphics + 1 GPGPU or 1 physics. TripFire?

"OopsFire"

trumphsiao
06-Oct-2006, 11:23
No slide showing the R600 doing Crysis 163% faster than X product? I'm disappointed.

Maybe 6 months from now.

SugarCoat
06-Oct-2006, 11:26
Shut up, you! I'll just stick my fingers in my ears and close my eyes, it's not true. It's coming out in January, you hear! JANUARY!


shut up shut up shut upppp

la la la la la la la

vertex_shader
06-Oct-2006, 11:37
Some recent roadmaps: http://www.chilehardware.com/modules.php?op=modload&name=News&file=article&sid=1631&mode=thread&order=0&thold=0

It places R600 in Q1 and a 'next gen' R600 in Q3. It also shows RD580 with 3 graphics slots (probably 1 for physics, like RD600) while RD790 has 4 (running at PCIe 8x). Seeing as it's listed under "CrossFire Roadmap", it looks like ATi's own iteration of QuadSLI (4 GPUs?) is coming next year too.

The manufacturing process is missing. If R600 comes on 65nm, then the refresh comes on 55nm (TSMC already has 55nm plans for GPUs in Q2 2007: http://www.vr-zone.com/?i=4083); if R600 comes on 80nm, then the refresh comes on 65nm, but 80->65nm is not just a shrink.
My bet is that R600 comes on 65nm, with a paper launch at the end of January and mass availability in March, and the 55nm shrink (R680?) arrives in October with a hard launch.

PeterAce
06-Oct-2006, 12:58
I'd say it's much more likely to be R600 on 80nm in Q1 2007 and the 'next gen' R600 (I'm assuming a refresh) on 65nm in Q3 2007.

psurge
06-Oct-2006, 18:46
I'd say if they are doing 700million transistors on 80nm or 90nm, there's a good chance quite a few of them are dedicated to cache, fifos, register files, etc...

Anyway, did anyone else notice the 2007 Crossfire slide has RD790 with HT3 on it? Are we going to see multiple GPUs connected with HyperTransport?

aaronspink
06-Oct-2006, 19:09
> 1 GHz & most of them used for memory, not logic, IIRC


We have a winner!!! DRAM vendors have been shipping 90nm products with 700+ million transistors for quite a while; these are fairly mainstream and have *significant* volumes. In addition, things like the new dual-core Itanium also top the 1 billion transistor mark, though in lower volumes than something like the R600 or G80 will achieve.

It's all about redundancy, which should be reasonably straightforward to do in graphics.

Aaron Spink
speaking for myself inc.

aaronspink
06-Oct-2006, 19:11
Well, that kinda answers it. Memory is a completely different beast, not even remotely comparable.

Somewhat and somewhat not. The only way R600 is 700 million transistors, is if a very significant number of the transistors are memory.

Aaron Spink
speaking for myself inc.

trumphsiao
06-Oct-2006, 19:15
Somewhat and somewhat not. The only way R600 is 700 million transistors, is if a very significant number of the transistors are memory.

Aaron Spink
speaking for myself inc.


R600 accounts for at least 500~520M, and so far we have never heard that the final tapeout for R600 has been done.

Geo
06-Oct-2006, 19:15
Somewhat and somewhat not. The only way R600 is 700 million transistors, is if a very significant number of the transistors are memory.



[nazi mod=on]

edram daughter die suggestions for R600 will be met with severe repercussions.

[nazi mod=off]



:razz:

Bouncing Zabaglione Bros.
06-Oct-2006, 19:34
Somewhat and somewhat not. The only way R600 is 700 million transistors, is if a very significant number of the transistors are memory.

Aaron Spink
speaking for myself inc.

ATI has said in the recent past that they would not use EDRAM in PC space products.

Kaotik
06-Oct-2006, 20:42
R600 accounts for at least 500~520M, and so far we have never heard that the final tapeout for R600 has been done.

Didn't the Inquirer actually report the R600 tape-out a few months back? (Not that it's a trustworthy source, but it's still worth mentioning.)

edit: ah, here it is http://www.theinquirer.net/default.aspx?article=32546

Razor1
07-Oct-2006, 01:08
[nazi mod=on]

edram daughter die suggestions for R600 will be met with severe repercussions.

[nazi mod=off]



:razz:


LOL

DemoCoder
07-Oct-2006, 02:11
"edram daughter die suggestions for R600 will be met with severe repercussions."

NVidia protested a lot that they wouldn't have a USA (unified shader architecture). Hmm, maybe we shouldn't take them completely at their word.

Perhaps the ATI denials are only half-true.

Iron Tiger
07-Oct-2006, 04:23
"edram daughter die suggestions for R600 will be met with severe repercussions."

NVidia protested a lot that they wouldn't have a USA (unified shader architecture). Hmm, maybe we shouldn't take them completely at their word.

Perhaps the ATI denials are only half-true.

So no "daughter die", but actual embedded memory for this part.

trumphsiao
07-Oct-2006, 05:12
So no "daughter die", but actual embedded memory for this part.

R600 is an ASIC supporting a 512-bit external/internal bus, which makes embedded memory implausible and unnecessary (with an expensive fan and PCB layout: maybe a 12~16-layer PCB if ATI enables a 512-bit bus for "free" 4xAA).

Geo
07-Oct-2006, 05:15
"edram daughter die suggestions for R600 will be met with severe repercussions."

NVidia protested a lot that they wouldn't have a USA (unified shader architecture). Hmm, maybe we shouldn't take them completely at their word.

Perhaps the ATI denials are only half-true.

Oy vey. :lol:

I'll say this. I think they have at least one significant surprise in their pocket. I'm not sure what it is, but my gut feeling is they've held out at least one yowza we don't know about yet.

trumphsiao
07-Oct-2006, 05:19
Oy vey. :lol:

I'll say this. I think they have at least one significant surprise in their pocket. I'm not sure what it is, but my gut feeling is they've held out at least one yowza we don't know about yet.


R600 can support hardware sound acceleration... :lol:

Geo
07-Oct-2006, 05:55
R600 can support hardware sound acceleration... :lol:

Whatever it is, it will be better than if they'd stolen NV1 and embedded it. :razz:

Tho that would be pretty cool too. :lol:

Ailuros
07-Oct-2006, 06:10
D3D10 GPUs are quite a few steps ahead of current ones, so it's no surprise at all that, at least in theory, they're capable of a buttload more tasks, and yes, at the hardware level.

As for embedded memory, implementation or transistor cost would be the last thing to worry me (if we're talking about some sort of "MSAA framebuffer" a la Xenos). I'd be far more worried about a similar "macro-tiling" optimisation (as in Xenos); I'm not so sure developers would be that crazy about it after all, especially considering the rather mixed feelings they show for the 360.

If you really want what marketing/PR blurbs usually call "MSAA for free" (and that's always to be seen in a highly relative sense), better call for a full-blown TBDR, even more so if you're to combine it with float HDR. Even there, "free" only means a smaller memory footprint and lower bandwidth requirements. If you want something that comes close to single-digit performance penalties, you can use only 2x MSAA on current GPUs (but skip stencil shadows and/or HDR) or use a highly CPU-limited application; a close third option would be some exotic semi-stochastic algorithm, which doesn't seem that feasible right now.

Something from 2002:

http://www.beyond3d.com/forum/showpost.php?p=49013&postcount=42

*cough*

Acert93
07-Oct-2006, 06:24
ATI has said in the recent past that they would not use EDRAM in PC space products.

And NV said Unified Shaders were too soon :wink:

Although Xenos-style eDRAM for a PC would probably need ~60MB (1600x1200 with 4xMSAA; I don't see PC gamers putting up with anything less), which is about 500M transistors for memory alone. Maybe in 2 or 3 years with ZRAM or MRAM... so GDDR4 it is.

Edit: Seems Demo and Ailuros beat me...
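The ~60MB / ~500M-transistor estimate above can be reproduced under a few assumptions: 32-bit colour plus 32-bit depth/stencil stored per sample, no compression, and one transistor per bit as in a 1T1C eDRAM cell (sense amplifiers, redundancy and control logic ignored). For comparison, Xenos gets by with 10MB of eDRAM by tiling the frame. The function name is illustrative.

```python
def msaa_buffer_bytes(width: int, height: int, samples: int,
                      color_bytes: int = 4, depth_bytes: int = 4) -> int:
    """Bytes to hold colour and depth/stencil for every sample, uncompressed."""
    return width * height * samples * (color_bytes + depth_bytes)

size = msaa_buffer_bytes(1600, 1200, samples=4)
bits = size * 8
transistors = bits  # assumption: 1T1C eDRAM, one transistor per bit

print(f"{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")
print(f"~{transistors / 1e6:.0f}M transistors for the cell array alone")
```

That gives about 61MB and roughly 490M cell transistors, consistent with the "~60MB, about 500M transistors" figure.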

Chalnoth
07-Oct-2006, 06:32
Well, technically, the statement that it was too soon for unified shaders was made around two years ago, if I remember correctly.

INKster
07-Oct-2006, 06:45
Well, technically, the statement that it was too soon for unified shaders was made around two years ago, if I remember correctly.

Actually, it was made in July 2005, a little more than a year ago:

http://www.bit-tech.net/bits/2005/07/11/nvidia_rsx_interview/4.html

Chalnoth
07-Oct-2006, 07:18
Ah, okay, around the time the Xenos info was coming out. Makes sense. I know that we largely took it to mean that nVidia wasn't planning on going unified with the next architecture, but it doesn't necessarily have to be taken to mean that.

juicytuna
07-Oct-2006, 07:22
Reading that interview in hindsight, it seems clear that Kirk was simply downplaying the unified nature of Xenos in relation to the RSX. This is understandable as the contract with Sony was announced just a few months prior to this interview, and the general consensus at the time was that the RSX was NV47-based. I doubt Sony would have been too happy to hear Kirk declaring that what NVIDIA was delivering to Sony was stillborn.

Also he states:

"We will do a unified architecture in hardware when it makes sense. When it's possible to make the hardware work faster unified, then of course we will. It will be easier to build in the future, but for the meantime, there's plenty of mileage left in this architecture."

We have to remember that after the 7800 GTX came three new high-end parts: the 7800 GTX 512, the 7900 and the 7950. So yes, that is quite a lot of mileage.

But hindsight, as they say, is 20/20 :p

Arun
07-Oct-2006, 10:16
Reading that interview in hindsight, it seems clear that Kirk was simply downplaying the unified nature of Xenos in relation to the RSX. This is understandable as the contract with Sony was announced just a few months prior to this interview, and the general consensus at the time was that the RSX was NV47-based.

Yup, there is another interview I think is even more revealing, though: http://www.beyond3d.com/forum/showthread.php?t=30014

Two interesting quotes that seem to explain nicely the decision to go for stream processors now, should DailyTech's information be accurate:
Besides, it requires more I/O (wires) because all connections with memory concentrate on the box. Registers and constants are put in a single box too. It's because you have to keep all vertex states, pixel states and geometry states together while doing load balancing. A bigger register array requires more ports.
D. Kirk: We want to remove special-purpose units from GPU. On the other hand, we also want to run (special graphics functions) really fast. If you remove all special-purpose implementations from GPU it's just a Pentium.
You know what amuses me? When you look at David Kirk's statements about a lot of things (Unification, HDR+MSAA), you realize he's a true master at redefining the terms used to fit what he wants to say. Here, he nearly seems to be implying that if an architecture has ROPs or TMUs, it's not unified! There also is another interview with him where he basically answers the MSAA+HDR question as if he was talking of deferred rendering. Good ole David Kirk! :)


Uttar

Ailuros
07-Oct-2006, 10:30
There also is another interview with him where he basically answers the MSAA+HDR question as if he was talking of deferred rendering. Good ole David Kirk!

I sure hope you don't mean the one with references to abnormal memory amounts and software-based alternative solutions :roll:

trumphsiao
07-Oct-2006, 20:34
I wonder how ATI will make even a tiny profit from R600?

A 14~16-layer PCB (512-bit), which means more than twice the PCB cost, over 2000 pins, an expensive fan... it all sounds like a Voodoo 6000 :sad:

INKster
07-Oct-2006, 20:40
512 bit ring-bus or 512 bit external memory bus ?

I fail to see the reason for that, with GDDR4 in full swing and all...
The cost will be astronomical, as will power consumption (if this gets confirmation).

Arun
07-Oct-2006, 20:44
I wonder how ATI will make even a tiny profit from R600?

A 14~16-layer PCB (512-bit), which means more than twice the PCB cost, over 2000 pins, an expensive fan... it all sounds like a Voodoo 6000 :sad:

R600 isn't what you think it is.

Uttar

Jawed
07-Oct-2006, 21:23
512 bit ring-bus or 512 bit external memory bus ?

I fail to see the reason for that, with GDDR4 in full swing and all...
The cost will be astronomical, as will power consumption (if this gets confirmation).

Alternatively, you could just drool over the prospect of 1GB of local memory with 120-150GB/s bandwidth...

Jawed
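Working backwards from Jawed's 120-150GB/s figure, and assuming the 512-bit external bus floated earlier in the thread with double-data-rate signalling (two transfers per clock), the required memory clock is easy to estimate. The helper name is illustrative.

```python
def required_clock_mhz(target_gb_s: float, bus_bits: int,
                       transfers_per_clock: int = 2) -> float:
    """Memory clock needed for a target peak bandwidth (GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return target_gb_s * 1e9 / (bytes_per_transfer * transfers_per_clock) / 1e6

for target in (120, 150):
    print(f"{target} GB/s on a 512-bit DDR bus: {required_clock_mhz(target, 512):.1f} MHz")
```

That works out to roughly 940-1170 MHz, the same clock range the thread is already discussing for GDDR3 and GDDR4.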

SugarCoat
07-Oct-2006, 22:41
Ah, okay, around the time the Xenos info was coming out. Makes sense. I know that we largely took it to mean that nVidia wasn't planning on going unified with the next architecture, but it doesn't necessarily have to be taken to mean that.

There are benefits and disadvantages to both approaches. Unified certainly is interesting in that it appears architecturally "new" and different, and has some interesting promise in terms of load balancing and extensibility.

Frankly speaking, however, the graphics industry has gotten very good at extracting a lot of performance from the current vertex/pixel shader architectures, so the competition for anything new architecturally is a highly evolved and efficient architecture. The first rule of any new GPU is to be better at the previous API. Any trade-off which might move you away from that goal has to be evaluated carefully.



There are plenty of examples, including some from recent memory, of architectures that looked new, and sounded good on paper, but in implementation suffered greatly in the first law of GPUs—which is every new GPU architecture must be better than the previous GPU architecture, as measured by the things which characterized that previous GPU architecture. Unified will come eventually, but only when being unified delivers the best performance, architectural, and power efficiency.


http://www.extremetech.com/article2/0,1697,1986932,00.asp

July 11, 2006. Either they don't communicate at all within the company, or the design is not truly unified in the sense we think it is, because so far they're literally in a very good position to downplay their new card.

I half expect the official comment about the new core to be something along the lines of

"well it works kind of okay unified but we reeeeaally wish we didn't do this."

Pete
07-Oct-2006, 23:36
Perhaps it's time to stop comparing R600 to NV30, Voodoo 6k, etc.? The same was said of R520, and it didn't turn out to be that bad. Details of R600 are good to have, but we can see from G80 that even with the details we still can't put together the big picture (1D/2D/4D, TMUs?).

And then we get Uttar's comment. If R600's not what trumphsiao thinks it is, then what's he describing? You're killin' me! :smile:

INKster
07-Oct-2006, 23:46
Alternatively, you could just drool over the prospect of 1GB of local memory with 120-150GB/s bandwidth...

Jawed

Pff, we already have that in 2x 7950 GX2 (Quad-SLI). :razz:
Theoretical numbers mean little to me. All I want is more *real* performance out of the hardware.
That's why I'm keeping a somewhat skeptical stance on G80's 128 "pipes/ALUs" until I see some cold, hard benchmark numbers.

Uttar, what are you hiding from us? ;)

Ailuros
08-Oct-2006, 00:04
Pff, we already have that in 2x 7950 GX2 (Quad-SLI). :razz:
Theoretical numbers mean little to me. All I want is more *real* performance out of the hardware.
That's why I'm keeping a somewhat skeptical stance on G80's 128 "pipes/ALUs" until I see some cold, hard benchmark numbers.

Uttar, what are you hiding from us? ;)

Bad example; on an SLI or quad-SLI setup you have X amount of memory and Y amount of bandwidth per GPU and per frame.

A 7950GX2 has 512MB per GPU; now think of another GPU with 1024MB per GPU and twice or more bandwidth and think of dual or quad GPUs setups.
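Ailuros's point can be made concrete with a small model. It assumes each GPU keeps its own full copy of textures and render targets (as in alternate-frame rendering), and uses assumed per-GPU figures for one half of a 7950 GX2: 512MB and a 256-bit GDDR3 bus at 600 MHz, i.e. 38.4GB/s. The `Gpu` class and `multi_gpu_view` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Gpu:
    memory_mb: int
    bandwidth_gb_s: float

def multi_gpu_view(gpu: Gpu, count: int) -> dict:
    """Board-level totals vs. what a single GPU can use for its own frame.

    Assumes every GPU holds its own copy of all resources (as in AFR),
    so capacity does not add up, while bandwidth only adds up in aggregate.
    """
    return {
        "advertised_memory_mb": gpu.memory_mb * count,
        "usable_memory_mb_per_gpu": gpu.memory_mb,
        "aggregate_bandwidth_gb_s": gpu.bandwidth_gb_s * count,
        "bandwidth_gb_s_per_gpu": gpu.bandwidth_gb_s,
    }

# Assumed figures for one G71 on a 7950 GX2: 512MB, 256-bit GDDR3 at 600 MHz.
gx2_gpu = Gpu(memory_mb=512, bandwidth_gb_s=600e6 * 2 * 256 / 8 / 1e9)
quad_sli = multi_gpu_view(gx2_gpu, count=4)

for key, value in quad_sli.items():
    print(f"{key}: {value:g}")
```

Quad-SLI advertises 2GB and about 154GB/s, but each GPU working on its own frame still sees 512MB and about 38GB/s, which is not the same as a single GPU with 1GB and 120-150GB/s.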

Chalnoth
08-Oct-2006, 00:37
http://www.extremetech.com/article2/0,1697,1986932,00.asp

July 11, 2006. Either they don't communicate at all within the company, or the design is not truly unified in the sense we think it is, because so far they're literally in a very good position to downplay their new card.

I half expect the official comment about the new core to be something along the lines of

"well it works kind of okay unified but we reeeeaally wish we didn't do this."

Ah, forgot about that interview. Well, we only have a few weeks to go and we'll all know for sure :)

Razor1
08-Oct-2006, 01:44
http://www.extremetech.com/article2/0,1697,1986932,00.asp

July 11, 2006. Either they don't communicate at all within the company, or the design is not truly unified in the sense we think it is, because so far they're literally in a very good position to downplay their new card.

I half expect the official comment about the new core to be something along the lines of

"well it works kind of okay unified but we reeeeaally wish we didn't do this."


I see where you're coming from, Sugar. I'm thinking it's the first one: if dev rel didn't know last week, I doubt PR knew anything either, especially in July :smile:

hmm never again will I call you Sugar lol, that didn't sound good at all!

aaronspink
08-Oct-2006, 03:30
ATI has said in the recent past that they would not use EDRAM in PC space products.

Both current G71 and R520/580 contain a large number of memory transistors.

trumphsiao
08-Oct-2006, 06:06
R600 isn't what you think it is.

Uttar


Why does R600 have more than 2000 pins??? :grin:

NocturnDragon
08-Oct-2006, 07:25
Alternatively, you could just drool over the prospect of 1GB of local memory with 120-150GB/s bandwidth...
Jawed

And some people think 150GB/s is too much. :wink:

Arty
08-Oct-2006, 07:39
Both current G71 and R520/580 contain a large number of memory transistors.
You mean transistors that make up the memory controller?

TurnDragoZeroV2G
08-Oct-2006, 08:18
You mean transistors that make up the memory controller?

Nope, a lot of transistors dedicated to being memory.

Texture cache, other caches/buffers, the registers for all the threads that R520/580 use, and a couple other uses. For the R520/R580 threading alone, it's 128 bits per register, 32/96 registers per thread (2 registers per pixel with an allocation of 16/48 pixels per thread?), 128 threads per quad, 4 quads...
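Multiplying out the figures above (which the poster flags as tentative) gives a feel for how much memory hides inside the register files alone. The 6-transistors-per-bit figure is an assumption (a plain 6T SRAM cell); multi-ported register files use larger cells, and decoders and sense amplifiers are ignored. The function name is illustrative.

```python
SRAM_TRANSISTORS_PER_BIT = 6  # assumption: plain 6T SRAM cell, no periphery

def register_file_bits(bits_per_reg: int, regs_per_thread: int,
                       threads_per_quad: int, quads: int) -> int:
    """Total register storage implied by the per-thread figures."""
    return bits_per_reg * regs_per_thread * threads_per_quad * quads

for chip, regs in (("R520", 32), ("R580", 96)):
    bits = register_file_bits(128, regs, threads_per_quad=128, quads=4)
    transistors = bits * SRAM_TRANSISTORS_PER_BIT
    print(f"{chip}: {bits // 8 // 1024} KiB, ~{transistors / 1e6:.1f}M transistors")
```

That's roughly 256KiB (~12.6M transistors) for R520 and 768KiB (~37.7M) for R580, before counting any caches or buffers.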

trumphsiao
08-Oct-2006, 10:20
Just heard R600 could be using advanced 1.4GHz GDDR4 modules instead.

Maybe the 2000+ pins mean something else.

Jawed
08-Oct-2006, 14:02
That's about 90GB/s on a 256-bit bus.

Jawed
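Jawed's figure follows from treating "1.4GHz" as the memory clock with two transfers per clock (2.8Gbit/s per pin). A minimal sketch; the helper name is illustrative:

```python
def bandwidth_gb_s(clock_mhz: float, bus_bits: int,
                   transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s) for a DDR-signalled bus."""
    return clock_mhz * 1e6 * transfers_per_clock * bus_bits / 8 / 1e9

print(f"{bandwidth_gb_s(1400, 256):.1f} GB/s")
```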

INKster
08-Oct-2006, 14:14
Don't forget the latencies of GDDR4 vs GDDR3.
GDDR3 is slightly faster clock-for-clock.
It may not give it the edge against a 384bit bus + 900/800 MHz GDDR3.

Also, we know the 1 GHz GDDR4 yield is relatively good, but what about 1.4 GHz? Any clues on that?
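For the comparison INKster raises, peak bandwidth can be lined up side by side, again assuming two transfers per clock for both memory types. This covers raw bandwidth only; the latency differences discussed here don't show up in it.

```python
def bandwidth_gb_s(clock_mhz: float, bus_bits: int,
                   transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s) for a DDR-signalled bus."""
    return clock_mhz * 1e6 * transfers_per_clock * bus_bits / 8 / 1e9

configs = {
    "256-bit GDDR4 @ 1400 MHz": (1400, 256),
    "384-bit GDDR3 @ 900 MHz": (900, 384),
    "384-bit GDDR3 @ 800 MHz": (800, 384),
}
for name, (clock, bus) in configs.items():
    print(f"{name}: {bandwidth_gb_s(clock, bus):.1f} GB/s")
```

On paper the 256-bit GDDR4 setup (89.6GB/s) edges out 384-bit GDDR3 at 900 MHz (86.4GB/s) and clearly beats it at 800 MHz (76.8GB/s).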

trumphsiao
08-Oct-2006, 14:23
Don't forget the latencies of GDDR4 vs GDDR3.
GDDR3 is slightly faster clock-for-clock.
It may not give it the edge against a 384bit bus + 900/800 MHz GDDR3.

Also, we know the 1 GHz GDDR4 yield is relatively good, but what about 1.4 GHz? Any clues on that?

You only need a DRAM core clock of 350MHz to achieve 1.4GHz, which is far easier than with GDDR3, apart from power consumption.

INKster
08-Oct-2006, 15:04
You only need a core clock of 350MHz to achieve 1.4GHz, which is far easier than with GDDR3, except with regard to power consumption.

350?

I don't follow...

trumphsiao
08-Oct-2006, 15:08
350? Don't you mean 700 MHz?

http://en.wikipedia.org/wiki/GDDR4

On the signaling front, the new GDDR4 protocol expands the chip I/O buffer to 8 bits per two cycles, allowing greater sustained bandwidth during burst transmission, but at the expense of significantly increased CAS latency (CL), determined mainly by the halved count of address/command pins and the half-clocked DRAM cells, compared to GDDR3.

Jawed
08-Oct-2006, 15:17
Hmm, so GDDR4 is actually quad-rate?...

Jawed

INKster
08-Oct-2006, 15:28
Hmm, so GDDR4 is actually quad-rate?...

Jawed

That can't be right.

This (http://www.digit-life.com/articles2/video/r580plus-part2.html) comparison between two X19xx with GDDR3 and GDDR4 running at the same frequency clearly shows us that GDDR4 is slightly slower, and that can't solely be explained by the higher latency (not to mention that the Quad Data Rate tech would be seen in benchmarks where the clock speed was the same between the two).

R580+ shows no changes versus R580. Higher results of the R580+ in the test with a single texture can be explained by the effect of the higher bandwidth - when the memory frequency is reduced, we can see parity with the R580.

Arun
08-Oct-2006, 15:39
This (http://www.digit-life.com/articles2/video/r580plus-part2.html) comparison between two X19xx with GDDR3 and GDDR4 running at the same frequency clearly shows us that GDDR4 is slightly slower

You should keep in mind that memory controllers can be (and are) tweaked (generally in the drivers nowadays!) based on the target latency and bandwidth. As such, overclocking or underclocking may not always give the expected results, as this tweaking is not automatic (i.e. the driver only has presets to choose from, and won't change the preset even if you overclock or underclock exactly to another frequency!)

As long as the changes are minimal, you won't get any major problem, but in extreme cases, you could see some weird stuff. All AFAIK, of course, and I do not guarantee the perfect accuracy of my explanation and/or comments.

As for the Quad Data Rate, that just means that a GDDR4 module clocked at 2GHz effective is really a 500MHz one, while a GDDR3 module clocked at 1800MHz effective is really a 900MHz one. That's why you should expect higher latency but identical bandwidth for GDDR3 and GDDR4 marketed at a similar effective clockrate. Beyond3D's R580+ review has an excellent explanation of this matter, should you be interested in reading more about it. Not that you shouldn't read all of our reviews and click all of our ads anyway! Errr... ;) :)


Uttar

INKster
08-Oct-2006, 15:43
You should keep in mind that memory controllers can be (and are) tweaked (generally in the drivers nowadays!) based on the target latency and bandwidth. As such, overclocking or underclocking may not always give the expected results, as this tweaking is not automatic (i.e. the driver only has presets to choose from, and won't change the preset even if you overclock or underclock exactly to another frequency!)

As long as the changes are minimal, you won't get any major problem, but in extreme cases, you could see some weird stuff. All AFAIK, of course, and I do not guarantee the perfect accuracy of my explanation and/or comments.

As for the Quad Data Rate, that just means that a GDDR4 module clocked at 2GHz effective is really a 500MHz one, while a GDDR3 module clocked at 1800MHz effective is really a 900MHz one. That's why you should expect higher latency but identical bandwidth for GDDR3 and GDDR4 marketed at a similar effective clockrate. Beyond3D's R580+ review has an excellent explanation of this matter, should you be interested in reading more about it. Not that you shouldn't read all of our reviews and click all of our ads anyway! Errr... ;) :)


Uttar

So, basically you're saying that GDDR4 is not actually a natural evolution of GDDR3 towards higher (nominal) speeds, but kind of a halfway point between GDDR3 and Rambus XDR's Octo Data Rate on a narrow bus (XDR's is 64-bit wide, IIRC), right?

Dave Baumann
08-Oct-2006, 16:28
Pin wise, GDDR4 is behaving similar to GDDR3.

_xxx_
08-Oct-2006, 17:36
Hmm, so GDDR4 is actually quad-rate?...

Jawed

Since the name says "DOUBLE Data Rate", I'm inclined to doubt it ;)

_xxx_
08-Oct-2006, 17:37
You only need a core clock of 350MHz to achieve 1.4GHz, which is far easier than with GDDR3, except with regard to power consumption.

Bullshit.

Jawed
08-Oct-2006, 17:53
Since the name says "DOUBLE Data Rate", I'm inclined to doubt it ;)
We've been talking about clock rate within the memory devices, not the bus speed.

Jawed

_xxx_
08-Oct-2006, 17:55
We've been talking about clock rate within the memory devices, not the bus speed.

Jawed

Well, that's variable, as with any CPU right now: bus speed x internal multiplier, both of which can be anything you wish and design for.

Regardless of that, it's double data rate on the bus with GDDR4.

nAo
08-Oct-2006, 18:02
We know memory speeds scale very slowly; if GDDR4 is not transferring 4 bits per clock cycle, how did they manage to (nominally) double the frequency?

silent_guy
08-Oct-2006, 18:58
We know memory speeds scale very slowly; if GDDR4 is not transferring 4 bits per clock cycle, how did they manage to (nominally) double the frequency?

The burst length is increased from a minimum of 4 (as in GDDR3) to 8. This way, they can partition the internal organization into more parallel blocks and run the internal clock at half the speed of BL4. If the internal speed used to be the previous critical path in GDDR3, you suddenly have doubled the cycle time. The critical path probably now becomes the IO pins.
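The clock-halving can be put as simple arithmetic: with an n-bit prefetch, each internal cycle supplies n bits per pin, so the core clock is the per-pin data rate divided by n. A sketch, assuming the prefetch equals the minimum burst length and using the 1.4GHz (2.8 Gbps per pin) GDDR4 figure from earlier in the thread; `core_clock_mhz` is a made-up name.

```python
def core_clock_mhz(data_rate_mbps_per_pin, prefetch):
    """Internal DRAM array clock needed to keep the pins fed at a given data rate."""
    return data_rate_mbps_per_pin / prefetch


pin_rate = 2800  # Mbps per pin: 1400 MHz interface clock, double data rate
print(core_clock_mhz(pin_rate, 4))  # BL4 / 4n prefetch (GDDR3-style)
print(core_clock_mhz(pin_rate, 8))  # BL8 / 8n prefetch (GDDR4-style)
```

Same pin rate, half the core clock (700MHz vs 350MHz), which is the distinction argued over above.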
Since your minimum transaction packet suddenly doubles in size when going from BL4 to BL8, you'll have to work hard to keep the RAMs busy at high efficiency. That requires (more) complex scheduling and larger cache sizes.

nAo
08-Oct-2006, 19:10
Thanks, now it makes sense!

silent_guy
08-Oct-2006, 19:20
Thanks, now it makes sense!

BTW, the R580+ article that Uttar pointed to doesn't make a lot of sense:


The core of a GDDR4 DRAM is clocked at half the rate compared to a GDDR3 DRAM of the same effective frequency. To achieve that the DRAM deals with double the number of bits on its I/O pins for each cycle of operation, up to 8 bits from 4. That reduction in core DRAM frequency allows a GDDR4 device to run with a lower operating voltage (1.5V nominal, down from 1.8V) compared to a GDDR3 device of the same effective frequency, leading to lower power consumption electrically.

If you really bend your head around it, you could maybe discover that it's trying to say that the burst size increased from 4 to 8, but it's far more likely that the writer didn't understand what he was talking about. How do you 'double the number of bits on an IO pin'? Only by going from double to quad data rate, which is definitely not the case for GDDR4.

The explanation about latency increase is also very questionable. There's no reason why GDDR4 should have more random access patterns and I don't see how changes in addressing (assuming they are at all there) impact latency.
It's the BL4 -> BL8 that's doing all the evil.

Edit: small correction: addressing latency will go up by 1 cycle, but, again, that impact is minor compared to the BL increase.

nAo
08-Oct-2006, 19:24
It's the BL4 -> BL8 that's doing all the evil.
Basically we can get from RGBA8 to FP64 for free!! j/k ;)

silent_guy
08-Oct-2006, 19:37
Basically we can get from RGBA8 to FP64 for free!! j/k ;)

When you think about it, it must be quite scary (for a chip architect): your minimal memory transaction is 32 pins * 2 (DDR) * 8 (BL) = 512 bits or 64 bytes. If you organize your memory controller as 64-bit controllers, your minimal transaction is 128 bytes. (I think that ATI doesn't do the latter.)

Say you need to change only 1 byte in such a 128 byte block: that gives you a BW efficiency of less than 1%! So you'd better find ways to group transactions as much as possible... which increases cache size and increases latency... which increases internal buffering requirements even more.
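A toy model of why grouping transactions matters. It is not how ATI's or anyone's controller actually works: it assumes a hypothetical fixed 64-byte minimum transaction and single-byte writes, and compares issuing one transaction per write against merging writes that land in the same block. `BLOCK` and `efficiency` are illustrative names.

```python
BLOCK = 64  # hypothetical minimum transaction size in bytes


def efficiency(addresses, coalesce):
    """Useful bytes / bytes moved for a list of single-byte writes."""
    if coalesce:
        # Merge writes that fall in the same block: one transaction per distinct block.
        transactions = len({addr // BLOCK for addr in addresses})
    else:
        # One full-block transaction per individual write.
        transactions = len(addresses)
    return len(addresses) / (transactions * BLOCK)


writes = [0, 1, 2, 3, 64, 65, 200]
print(f"one transaction per write: {efficiency(writes, False):.1%}")
print(f"coalesced by block:        {efficiency(writes, True):.1%}")
```

Merging more than doubles efficiency here (1.6% to 3.6%), but only because writes were held back until their neighbours arrived, which is exactly the buffering and latency cost described above.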

It gets worse when you need to switch from read to write and back, or switch from one row to another: there are a lot of different timing constraints that must be met, and they all reduce performance.

I'm happy I don't have to deal with that in my job. :grin:

nAo
08-Oct-2006, 19:41
Well..it gives them a lot of good reasons to give us more-samples-per-pixel-anti-aliasing(tm)

Xmas
09-Oct-2006, 00:39
When you think about it, it must be quite scary (for a chip architect): your minimal memory transaction is 32 pins * 2 (DDR) * 8 (BL) = 512 bits or 64 bytes.
Burst length is not given in clocks but bits AFAIK, so it's only half that.

silent_guy
09-Oct-2006, 02:30
Burst length is not given in clocks but bits AFAIK, so it's only half that.
Yes, you're right. I should have looked that up first... BTW, the datasheet can be found here (http://www.samsung.com/Products/Semiconductor/GraphicsMemory/GDDR4SDRAM/512Mbit/K4U52324QE/K4U52324QE.htm).
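With Xmas's correction (burst length counts bits per pin, not clocks), the minimum access sizes work out as below. A sketch assuming x32 devices and treating a 64-bit controller as two devices side by side; `min_transaction_bytes` is a hypothetical helper.

```python
def min_transaction_bytes(data_pins, burst_length, devices=1):
    """Smallest access in bytes: each data pin carries burst_length bits per burst."""
    return data_pins * burst_length * devices // 8


for name, bl in (("GDDR3, BL4", 4), ("GDDR4, BL8", 8)):
    one_device = min_transaction_bytes(32, bl)
    two_devices = min_transaction_bytes(32, bl, devices=2)  # 64-bit controller
    print(f"{name}: {one_device} B per x32 device, {two_devices} B per 64-bit controller")
```

BL4 to BL8 still doubles the minimum packet (16 to 32 bytes per device, 32 to 64 bytes per 64-bit controller), so the efficiency problem stands, just at half the size of the first estimate.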

jpr27
09-Oct-2006, 07:21
So in the end, at this point in time, with GDDR3 and GDDR4 in the coming generations, is it really worth the extra cost for GDDR4 now? I know that both card manufacturers keep things close to the chest, but hmmmm. I know ATI put a lot of effort into its memory controller in the R580? (If memory serves me correctly.) I'm just wondering if ATI might have some new systems for dealing with memory BW and latencies that make the most of the burst-length increase to 8 that comes with GDDR4.

Another thought, or question actually: does anyone see XDR memory coming into play at some point? I know right now it's not cost-effective, but just a thought. We all hear you can never have enough BW, and CPUs are often considered the bottlenecks. Now that dual and quad cores are coming, maybe we will see XDR in the R600 or G80 or close relatives?

aaronspink
09-Oct-2006, 09:05
So, basically you're saying that GDDR4 is not actually a natural evolution of GDDR3 towards higher (nominal) speeds, but kind of a halfway point between GDDR3 and Rambus XDR's Octo Data Rate on a narrow bus (XDR's is 64bit wide, IIRC), right ?

GDDR3 is really GDDR2; they just tweaked some of the signalling. GDDR4 ~= DDR3. In both GDDR4 and DDR3 the memory devices operate in a prefetch-of-8 mode internally, while GDDR2/3 and DDR2 operate in a prefetch-of-4 mode internally. So a part that bins to, say, 450 MHz as GDDR2/3 will run at a pin rate of 1800 MT/s, while a GDDR4 device at the same bin frequency would run at 3600 MT/s.

Aaron spink
speaking for myself inc.

trumphsiao
09-Oct-2006, 09:45
R600 has 8 MC channels (A/B/C/D/E/F/G/H), compared to R580, which has merely 4 (A/B/C/D).

Maybe someone knows what I'm hinting at. Don't spell it out...

trumphsiao
09-Oct-2006, 09:46
R600 isn't what you think it is.

Uttar

Nope, R600 would be an ASIC capable of either 32-bit x 8 or 64-bit x 8 MC channels.

Dave Baumann
09-Oct-2006, 13:53
R580 has 8 memory channels (4 primary ring stops with two channels per stop).

Geo
09-Oct-2006, 13:55
R600 has 8 MC channels (A/B/C/D/E/F/G/H), compared to R580, which has merely 4 (A/B/C/D).

Maybe someone knows what I'm hinting at. Don't spell it out...

512-bit & GDDR4? That's a snot-load of bw there. . . what are they planning to do with all that?

I could imagine R600 being physically big enough for 512-bit, but I'd be worried about R680 at 65nm being big enough to support all the pins it would require. I suspect that's why NV went 384-bit --less the limitations of G80 than the limitations of the 65nm refresh.

Jawed
09-Oct-2006, 14:00
GDDR4 requires fewer pins.

Jawed

Jawed
09-Oct-2006, 14:06
R580 has 8 memory channels (4 primary ring stops with two channels per stop).
It puzzled me why it wasn't 8 ring stops. I surmised that it boiled down to there being 4 PS pipelines (screen-space tiling).

Makes me wonder whether R600 could go as far as 8 ring stops, and therefore 8 pipelines, 32-1-3-1... Each ring stop would have a 32-bit channel to two 512Mbit chips, for a total of 1GB.
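Checking the totals for that hypothetical layout (8 ring stops, one 32-bit channel per stop, two 512Mbit chips per stop). How two chips would share one 32-bit channel isn't specified in the post, so this sketch only counts capacity and bus width; the constant names are illustrative.

```python
RING_STOPS = 8
CHANNEL_BITS = 32    # data width per ring stop, as speculated above
CHIPS_PER_STOP = 2
CHIP_MBIT = 512      # 512Mbit devices

total_chips = RING_STOPS * CHIPS_PER_STOP
capacity_mb = total_chips * CHIP_MBIT // 8   # megabits -> megabytes
bus_width = RING_STOPS * CHANNEL_BITS        # external bus width in bits

print(total_chips, capacity_mb, bus_width)
```

Sixteen chips give 1024MB, and eight 32-bit channels still add up to a 256-bit external bus, so this layout reaches 1GB without going 512-bit.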

But ATI makes noises about going higher than 3:1 ALU:TEX, so it seems doubtful it would be 32-1-3-1. Unless that's being saved for the refresh: 32-1-4-1...

Jawed

Ailuros
09-Oct-2006, 14:09
Is the first number the amount of "ROPs" hypothetically?

Razor1
09-Oct-2006, 14:12
Hmm, if ATi goes 512-bit, are they expecting that there will be no other bottlenecks to achieving peak performance on the R600? I would think the R600 would need 3 times the shader and fillrate performance of the R580 before that happens, if it really is 512-bit. That's a lot of complexity and cost to add for something that might be bottlenecked in other areas.

Geo
09-Oct-2006, 14:25
Hmm, if ATi goes 512-bit, are they expecting that there will be no other bottlenecks to achieving peak performance on the R600? I would think the R600 would need 3 times the shader and fillrate performance of the R580 before that happens, if it really is 512-bit. That's a lot of complexity and cost to add for something that might be bottlenecked in other areas.

Well, that's the thing, isn't it? You need to figure out what they have in mind to do with it. Tho it's also possible, much like the MC itself in R520, that to *some degree* it is meant to be forward looking. Even so, I'd expect they have something figured out --either in existing features or in new features-- that gets some significant degree of benefit from it above what gddr4+256-bit brings.

If 512-bit is true at all, that is. I'm not discounting it nor accepting it just yet. I would feel better about it if I understood where it would bring that benefit.

Farhan
09-Oct-2006, 14:36
It puzzled me why it wasn't 8 ring stops. I surmised that it boiled down to there being 4 PS pipelines (screen-space tiling).

Makes me wonder whether R600 could go as far as 8 ring stops, and therefore 8 pipelines, 32-1-3-1... Each ring stop would have a 32-bit channel to two 512Mbit chips, for a total of 1GB.

But ATI makes noises about going higher than 3:1 ALU:TEX, so it seems doubtful it would be 32-1-3-1. Unless that's being saved for the refresh: 32-1-4-1...

Jawed

Wouldn't 8 stops just mean a higher latency because the max number of hops is 4 instead of 2 in the current one? I guess if the chip is a lot bigger then they could make it 8 stops.
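The "4 instead of 2" follows from the ring geometry. A sketch assuming traffic can travel either way round the ring and always takes the shorter direction; it counts hops only and ignores per-stop delay. Function names are made up for illustration.

```python
def hops(src, dst, stops):
    """Hops between two ring stops when traffic takes the shorter direction."""
    d = abs(src - dst) % stops
    return min(d, stops - d)


def max_hops(stops):
    # By symmetry, distances from stop 0 cover every case.
    return max(hops(0, dst, stops) for dst in range(stops))


def avg_hops(stops):
    # Average over all destinations, including the local stop (0 hops).
    return sum(hops(0, dst, stops) for dst in range(stops)) / stops


for n in (4, 8):
    print(n, max_hops(n), avg_hops(n))
```

Going from 4 to 8 stops doubles both the worst case (2 to 4 hops) and the average (1 to 2 hops), which is the latency cost being pointed at.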

Razor1
09-Oct-2006, 14:40
Well, that's the thing, isn't it? You need to figure out what they have in mind to do with it. Tho it's also possible, much like the MC itself in R520, that to *some degree* it is meant to be forward looking. Even so, I'd expect they have something figured out --either in existing features or in new features-- that gets some significant degree of benefit from it above what gddr4+256-bit brings.

If 512-bit is true at all, that is. I'm not discounting it nor accepting it just yet. I would feel better about it if I understood where it would bring that benefit.


Very true about the forward-looking R520. I'm not discounting it either, but if ATi goes that route and the chip is bottlenecked in other areas, they will end up in the same or a similar situation as with the R520/R580: they produce something at higher cost whose benefits might never really be seen. Unfortunately, on the consumer's end, people buy something expecting that it will be utilized, and that never really happens. This could go both ways, though.

trumphsiao
09-Oct-2006, 14:48
512-bit & GDDR4? That's a snot-load of bw there. . . what are they planning to do with all that?

I could imagine R600 being physically big enough for 512-bit, but I'd be worried about R680 at 65nm being big enough to support all the pins it would require. I suspect that's why NV went 384-bit --less the limitations of G80 than the limitations of the 65nm refresh.

R600 code name: Pele

R600 has 8 MC channels, from A to H (A/B/C/D/E/F/G/H)
R580 has 4 MC channels, from A to D (A/B/C/D)

R600 has a pin count of over 2000

Maybe I'll get some details later on.

Actually, I want to buy an R600 as a souvenir to keep for ages...

Jawed
09-Oct-2006, 14:50
Wouldn't 8 stops just mean a higher latency because the max number of hops is 4 instead of 2 in the current one? I guess if the chip is a lot bigger then they could make it 8 stops.
Latency just isn't a big deal for GPUs. Latency-hiding is their raison d'etre.

It's interesting what you say about the chip being a lot bigger. ~500M transistors on 80nm is going to be about as big as R580 is, I guess.

Can you squeeze a 512-bit GDDR4 bus into the die size of R580? Seems doubtful to me. Even with fewer pins per memory channel.

Jawed

Geo
09-Oct-2006, 14:51
As I said, it would have to bring some benefits now even if not fully utilized. The R520 MC did make some stuff doable, I think, like the low performance hit of HQ AF.

I don't know that we know the performance characteristics of R600 well enough to say about its sensitivity to BW. We had a bit of a bullish feeling about what R580 could do with more BW, that was mostly a disappointment with X1950 (compared to expectations that is). But we do know that R600 is a different architecture, so I don't think that R580+BW experience is all that relevant.

Which is a long way of saying I'm not willing to rule out that R600 can get decent benefit from a BW boost beyond what GDDR4+256-bit can bring.

Dave Baumann
09-Oct-2006, 14:52
R580 has 4 MC channels, from A to D (A/B/C/D)
We don't appear to be particularly good at listening...

_xxx_
09-Oct-2006, 14:59
Can you squeeze a 512-bit GDDR4 bus into the die size of R580? Seems doubtful to me. Even with fewer pins per memory channel.

Jawed

I doubt it too, as well as that 2000 pins remark above.

Jawed
09-Oct-2006, 15:07
As I said, it would have to bring some benefits now even if not fully utilized. The R520 MC did make some stuff doable, I think, like the low performance hit of HQ AF.
I think that's more architectural than bandwidth per se, since texture caching and out-of-order threading both give significant boosts to R5xx texturing performance. But having said that, you can never rule out brute bandwidth or fine-grained access to textures in memory. It's a shame we have no better idea of the relative performance factors of all these architectural elements.

What you can say is that ATI's 16 TMUs are blindingly fast compared to NVidia's.

I don't know that we know the performance characteristics of R600 well enough to say about its sensitivity to BW. We had a bit of a bullish feeling about what R580 could do with more BW, that was mostly a disappointment with X1950 (compared to expectations that is). But we do know that R600 is a different architecture, so I don't think that R580+BW experience is all that relevant.
We don't know what effect drivers might have, yet. ATI's been under the gun for Vista...

Jawed

INKster
09-Oct-2006, 15:07
We don't appear to be particularly good at listening...

So... 256-bit "external" bus confirmed (with the 512-bit internal ring bus, as in R580), right? ;)

Jawed
09-Oct-2006, 15:08
I doubt it too, as well as that 2000 pins remark above.
But, is R600 merely 500M transistors?...

Jawed

Farhan
09-Oct-2006, 15:08
Latency just isn't a big deal for GPUs. Latency-hiding is their raison d'etre.

It's interesting what you say about the chip being a lot bigger. ~500M transistors on 80nm is going to be about as big as R580 is, I guess.

Can you squeeze a 512-bit GDDR4 bus into the die size of R580? Seems doubtful to me. Even with fewer pins per memory channel.

Jawed

Of course, but you wouldn't want to increase the latency more than you have to. If 4 stops is still enough to get the frequency where they want it, then I don't think they'll add more stops for fun. I'm quite sure the size of the chip will have something to do with this, since the ring bus runs along the edges of the chip.

trinibwoy
09-Oct-2006, 15:11
We don't appear to be particularly good at listening...

I think he's just relaying information - not intentionally ignoring you. Don't take it personally :razz:

Tridam
09-Oct-2006, 15:12
R600 code name: Pele


http://www.beyond3d.com/forum/showthread.php?t=24961#post607871 :twisted:

Jawed
09-Oct-2006, 15:28
Of course, but you wouldn't want to increase the latency more than you have to. If 4 stops is still enough to get the frequency where they want it, then I don't think they'll add more stops for fun. I'm quite sure the size of the chip will have something to do with this, since the ring bus runs along the edges of the chip.
If R600 is an 8 shader unit design (say each has 4 ROPs, 4 TMUs, 12 ALUs, 4 Z/stencil) you would prolly ask, "do shader units want to share a ring stop?" I dunno.

I don't think it's possible to say much of anything one way or another. Just fun to think about how things could vary.

Jawed

_xxx_
09-Oct-2006, 15:30
What you can say is that ATI's 16 TMUs are blindingly fast compared to NVidia's.

The old nVidias I take it? Who knows what they look like in G80.

As for 500+ mio trannies - no idea :) but 2000 pins is totally crazy. Madness. PCB designers will burn down the house.

nAo
09-Oct-2006, 15:42
perhaps it's 1000 + 1000 ;)

Razor1
09-Oct-2006, 15:44
perhaps it's 1000 + 1000 ;)


hmm think we heard about the dual core rumor for another chip ....................:grin:

silent_guy
09-Oct-2006, 16:30
... but 2000 pins is totally crazy. Madness. PCB designers will burn down the house.

Not necessarily. If you define 'pins' as 'balls' on a BGA, then a major part of those would be VDD/GND balls, of which a large number will be in the rectangle under the die.

PeterAce
09-Oct-2006, 18:43
Currently I'm assuming that R600 will keep the shader array size from R500/C1 (16-way, 4 quads) and that R600 will increase the number of arrays. With ALU latency at 8 clocks, switching shader types every 4 clocks hides that latency and keeps the batch/thread size at 64 (good for dynamic branching performance).

On 80nm, plus the extra D3D10 requirements, if they are conservative I think they will go with:

20 Texture
5 arrays (80 ALUs, each Vec4 + Scalar)
20 ROPS

If they are really going for it (then maybe if they also have fast GDDR4, like 1400/2800):

24 Texture
6 arrays (96 ALUs, each Vec4 + Scalar)
24 ROPS

Maybe the last config fits better with the later 65nm refresh..... just not sure yet!
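The arithmetic behind those configurations, under PeterAce's stated assumptions: 16-wide arrays with each lane a Vec4 + scalar ALU, each batch issuing for 4 clocks, an 8-clock ALU latency, and two shader types alternating. The constants are his speculation, not known R600 specs, and `config` is a made-up helper.

```python
ARRAY_WIDTH = 16       # ALUs per shader array (R500/C1-style, as assumed above)
CLOCKS_PER_BATCH = 4   # clocks a batch issues before switching shader type
ALU_LATENCY = 8        # assumed ALU pipeline latency in clocks


def config(arrays):
    alus = arrays * ARRAY_WIDTH
    batch_size = ARRAY_WIDTH * CLOCKS_PER_BATCH  # elements per batch
    # Two batches alternate every 4 clocks, so a batch's next instruction issues
    # 8 clocks after its previous one: enough to cover an 8-clock ALU latency.
    latency_hidden = 2 * CLOCKS_PER_BATCH >= ALU_LATENCY
    return alus, batch_size, latency_hidden


for arrays in (5, 6):
    print(arrays, config(arrays))
```

Five arrays give the 80 ALUs of the conservative option and six give 96, both with the 64-element batch size mentioned above.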

_xxx_
10-Oct-2006, 08:44
Not necessarily. If you define 'pins' as 'balls' on a BGA, then a major part of those would be VDD/GND balls, of which a large number will be in the rectangle under the die.

True, haven't thought of that. OTOH, this feeds speculations about a huge power draw...

But still, 2000 pins IS madness.

trumphsiao
10-Oct-2006, 10:44
True, haven't thought of that. OTOH, this feeds speculations about a huge power draw...

But still, 2000 pins IS madness.

The major problem with R600 is: "If you have to run a DX9 game, the only way to do so is hardware simulation."

LeStoffer
10-Oct-2006, 10:51
The major problem with R600 is: "If you have to run a DX9 game, the only way to do so is hardware simulation."

First, that doesn't make any sense. Second, ATI has promised that the R600 will be the fastest card in DX9.

hoom
10-Oct-2006, 11:35
Didn't R300 run DX7 & 8 in 'hardware simulation' mode?
i.e., fixed function was run as a shader on the FP24 hardware :razz:

kyetech
10-Oct-2006, 11:36
The major problem with R600 is: "If you have to run a DX9 game, the only way to do so is hardware simulation."

It's fair to say your knowledge of G80 is greater than your knowledge of R600.

aaronspink
10-Oct-2006, 11:48
GDDR4 requires fewer pins.

Jawed

Not really...

Arty
11-Oct-2006, 01:34
It's in Chinese (http://www.hardspell.com/news/showcont.asp?news_id=30133), which doesn't make it reliable, though. So have the salt handy! :razz:
512-bit memory bus: new flagship R600's bandwidth is astonishing

Earlier news already confirmed that NVIDIA's G80 will use a 384-bit memory interface, while current high-end cards all use a 256-bit memory bus. Although ATi's R520/R580 use a 512-bit internal ring bus, they have only 4 memory channels, and the external memory bus width is still 256-bit. R600 will bring an unprecedented true 512-bit memory bus; if paired with 2500MHz GDDR4 memory, its bandwidth will reach an astonishing 160GB/s. By comparison, even the eDRAM embedded in the Xbox 360 has a bandwidth of only 256GB/s; although the eDRAM is much faster, the Xbox 360's eDRAM is only 10MB, while R600 will carry 1024MB. R600's development code name is Pele; it will be manufactured on TSMC's 80 nanometer process, will fully support DirectX 10, and is expected to launch in the first quarter of next year.
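The 160GB/s figure checks out if "2500MHz" is read as the effective per-pin data rate (2.5 Gbps), as such reports usually quote it. A sketch; the helper name is made up and the result is a theoretical peak.

```python
def peak_bandwidth_gbs(bus_width_bits, effective_mts):
    """Theoretical peak in GB/s; effective_mts is the per-pin transfer rate in MT/s."""
    return bus_width_bits * effective_mts / 8 / 1000


print(peak_bandwidth_gbs(512, 2500))  # rumoured 512-bit bus with 2500MHz GDDR4
```

The same memory on a 256-bit bus would give half that, 80GB/s.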

trinibwoy
11-Oct-2006, 01:46
Heh, that's a shitload o' bandwidth :grin:

wishiknew
11-Oct-2006, 01:56
Can we finally have more Z samples without loopback for AA, please!

DemoCoder
11-Oct-2006, 02:03
I'm very skeptical of 512-bit external bus, and doubly skeptical of 2500Mhz GDDR4 + 512bit external bus.

INKster
11-Oct-2006, 02:44
A 512-bit bus would only make sense if paired with slower memory (also, the PCB design would be too complex and expensive).
And 2500MHz GDDR4 would only make sense if paired with a 256-bit bus (and would an internal ring bus the same width as the external bus even make sense?).
There are levels of common sense here, not to mention price.

If the price goes beyond any reasonable threshold, then the market will make it suffer from poor sales, and a hypothetical high profit margin per unit might not compensate that.

SugarCoat
11-Oct-2006, 03:06
Well, I don't know about suffering from sales. It's not like anyone expects ATI to have a 512-bit bus up and down their next-gen product line, but it would be great news for the mid and lower-end cards because it would make 256-bit more common, I should think. We have to remember these are generally their lowest-selling parts in terms of % of total sales, so it's a part for a niche market to begin with. I don't see them running out of the $550-600 range that has become acceptable, despite the chip's size.

Question is, is it useful, and if so, how? One really has to wonder what they would need it all for. I would think ATI would further advance their HDR + AA work; surely they were aware Nvidia would add the function or best it with their part, so it's fair to assume they are going to try to one-up them. Maybe some free AA and filtering no matter the circumstance :twisted:. That would be a nice little addition, and about time too, something I would certainly look for in a new architecture. Wonder how much bandwidth physics needs. Both at the same time could quite possibly strain a 256-bit bus. Can't forget the PCB for it is going to be sporting CrossFire 2.0 with increased up and down bandwidth across the connections. Really, the ideas are limitless.

That's a hell of a lot of memory per card too. Definitely think free AA/AF is a real possibility. Bet there are a few hidden features in it too.

INKster
11-Oct-2006, 03:10
Well, I don't know about suffering from sales. It's not like anyone expects ATI to have a 512-bit bus up and down their next-gen product line, but it would be great news for the mid and lower-end cards because it would make 256-bit more common, I should think. We have to remember these are generally their lowest-selling parts in terms of % of total sales, so it's a part for a niche market to begin with. I don't see them running out of the $550-600 range that has become acceptable, despite the chip's size.

Question is, is it useful, and if so, how? One really has to wonder what they would need it all for. I would think ATI would further advance their HDR + AA work; surely they were aware Nvidia would add the function or best it with their part, so it's fair to assume they are going to try to one-up them. Maybe some free AA and filtering no matter the circumstance :twisted:. That would be a nice little addition, and about time too, something I would certainly look for in a new architecture. Wonder how much bandwidth physics needs. Both at the same time could quite possibly strain a 256-bit bus. Can't forget the PCB for it is going to be sporting CrossFire 2.0 with increased up and down bandwidth across the connections. Really, the ideas are limitless.

That's a crapload of memory per card too. Definitely think free AA/AF is a real possibility.


I still think a "brute force" approach so soon is not in ATI's style.
But..., who knows ?

SugarCoat
11-Oct-2006, 03:18
If they were really serious about busting out of that +/- ~5-10% performance delta that has been plaguing product comparisons in the industry for the last, what, 3 years, I don't think it's out of the realm of possibility.

We still don't know enough about the actual core architecture yet, though, to come to any conclusions, even IF we accept that it has a 512-bit bus.

One thing is for certain, though: drivers are going to play a large part in the early benchmarks and reviews. I expect performance to scale for a number of months, so it may be hard to decide who the "winner" is right away. Especially if I'm right and Nvidia counters immediately with a clock-boosted card, which they left plenty of room for, if the X2800XT turns out to be the better half.

Pete
11-Oct-2006, 03:29
I dunno, we kept saying the same thing about 256bit a while ago, that wider with slower memory would be preferable to narrower with fast memory, and nothing's changed: they still use the fastest stuff available on the widest buses at the high end.

What's the main die-size problem with a 512b external bus, die area or circumference? Is the issue a disproportionate amount of die space for a 1024b internal bus, or insufficient space along the die for all the wires to RAM? I mean, can we say that if G71 can support 256b, an R600 that's probably larger than R580 could support 512b?

Razor1
11-Oct-2006, 03:37
Let's say the R600 has 2-3 times the performance of the R580: does that warrant the 2-2.25x increase in bandwidth that 2500MHz GDDR4 on a 512-bit bus would supply? The only time that amount of bandwidth will really be necessary is when high levels of AF and AA are used (and these modes would have to be much higher than what is currently available, since the R580 does a good job of conserving performance under heavy loads of AF and AA). It would really be overkill IMHO, but again, anything is possible. And it does seem possible to put a 512-bit bus on an R580-size chip: if nV was able to put a 256-bit bus on G71, which is half the size, why can't ATi do 512-bit on a chip double the size?

Ah Pete beat me to the punch lol!

Geo
11-Oct-2006, 03:38
Of course, if it really is 512-bit, that would make 80 or 96 shaders more likely I'd think, and from more than one direction. :smile:

LeStoffer
11-Oct-2006, 08:03
I'm very skeptical of 512-bit external bus, and doubly skeptical of 2500Mhz GDDR4 + 512bit external bus.

Agreed. Is it possible to make? Yes. Is it possible to make a decent profit with such a beast? Doubtful.

pakotlar
11-Oct-2006, 09:06
ATI is making me excited again. I'm getting feelings of deja vu with pre R300 launch. If they pull a 512bit bus with 1gb memory @ 2+ ghz GDDR4 I'm buying their product. BTW, this isn't out of the realm of possibility if nVidia is doing 384bit/1800 mhz memory. ATI wouldn't be able to stick with a 256bit bus and achieve higher bandwith than nVidia's product. If R300 on ATI has showed us anything, it is that they are willing to cut into their margins to grab market share from nvidia.

Nvidia has been gaining fast since NV40, so I could see this happening. Even if they put in 2.5ghz GDDR4 (and how likely is that?) 256bit bus would get them 6GB/s less bandwith than G80. I doubt that they would release a product 3 months (read 1/2 way to refresh) later and have it underspecced compared to the competition.

Bandwidth is going to seriously matter this generation. We were bandwidth limited with 24 TMUs on G71, and to a lesser extent with the X1950XTX (look at games that do a lot of texture lookups: Call of Duty 2 is a good example, with a huge framerate boost compared to the X1900XTX), but with 32+ TMUs, 80GB/s (the max with 256-bit/2.5GHz GDDR4) is not going to come close to cutting it.
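Plugging the numbers from this post into the peak-bandwidth formula (bus bytes times effective data rate) reproduces both the 6GB/s gap and the 80GB/s ceiling. The G80 configuration of 384 bits at 1800 MT/s is the rumour under discussion, not a confirmed spec.

```python
def bandwidth_gbs(bus_bits: int, data_rate_mts: float) -> float:
    """Peak memory bandwidth in GB/s (decimal units, effective data rate in MT/s)."""
    return bus_bits / 8 * data_rate_mts / 1000


g80_rumour = bandwidth_gbs(384, 1800)  # rumoured G80 memory setup
gddr4_256 = bandwidth_gbs(256, 2500)   # hypothetical 256-bit GDDR4 part
gap = g80_rumour - gddr4_256
print(f"{g80_rumour:.1f} - {gddr4_256:.1f} = {gap:.1f} GB/s")  # 86.4 - 80.0 = 6.4 GB/s
```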

This seems like a good generation for bus width increase. nVidia is doing it, and ATI is probably too.

Arty
11-Oct-2006, 09:30
Your comparison of R600 to R300 implies G80 -> NV30. I'm not sure some people would be too happy with that. Expect angry posts coming at you, or at least expect your theory to be questioned at great length. :sly:

I wouldn't mind another R300 from ATI, but I am not raising my hopes so early. :nope:

pakotlar
11-Oct-2006, 09:50
Your comparison of R600 to R300 implies G80 -> NV30. I'm not sure some people would be too happy with that. Expect angry posts coming at you, or at least expect your theory to be questioned at great length. :sly:

I wouldn't mind another R300 from ATI, but I am not raising my hopes so early. :nope:

Yeah, I'm not making that jump for G80 either; I'm not comfortable with it. But there were design errors with the NV30 that extended way past the 16GB/s vs 21GB/s issue. I don't doubt that nvidia has learned from its mistakes, and there will be differences, such as the G80's early arrival, which will preserve nvidia's standing even if the R600 makes a big splash.

But this generation will involve as big a design-philosophy step as the SM1.0->2.0 one, if not bigger, and each company's internal design/profit-model agenda will be made clear. I think the R600 has a good chance of trouncing G80's performance in certain cases, but I have to admit that is based on my intuition and the rumours that have been floating around on the internet, not on any empirical evidence.

I am skeptical of G80's low bandwidth figure, to be honest. That could turn out badly for them.

_xxx_
11-Oct-2006, 09:50
If they really do that, I can only see them losing lots of money despite the (possibly) good sales. Actually, selling more at prices similar to the competition's means losing more with every part sold in this theoretical case, since R600 + RAM@2GHz + the board would probably cost twice as much as the G80 counterpart to produce.

Ailuros
11-Oct-2006, 10:14
I personally don't judge a GPU by its theoretical peak numbers, bus width or bandwidth, but by its overall efficiency in the end. At this point, without detailed insider knowledge from both sides, it's practically impossible for me to judge either of the two products, let alone make any safe predictions.

I am skeptical of G80's low bandwidth figure, to be honest. That could turn out badly for them.

If the 86.4GB/sec figure is true, it translates into ~69% more bandwidth than the 7900GTX. I haven't yet understood how they reached that claimed 38.4 GPixels fillrate, have you? With so many unknown factors, I'm a tad puzzled how you're able to determine what is low and what would be high.
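The ~69% figure checks out if the 7900GTX is taken as a 256-bit card with 1600 MT/s GDDR3 (51.2GB/s). That baseline is an assumption of this sketch, and 86.4GB/s is still only the rumoured G80 number.

```python
def bandwidth_gbs(bus_bits: int, data_rate_mts: float) -> float:
    """Peak memory bandwidth in GB/s (decimal units, effective data rate in MT/s)."""
    return bus_bits / 8 * data_rate_mts / 1000


def percent_more(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1) * 100


gtx7900 = bandwidth_gbs(256, 1600)  # assumed 7900GTX config: 51.2 GB/s
g80_rumour = 86.4                   # figure quoted in the post above
print(f"{percent_more(g80_rumour, gtx7900):.0f}% more")  # 69% more
```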

I won't say that it's impossible that ATI might have gone for a 512-bit bus, but I'd prefer to have a few more serious indications than the random rumours that float around; likewise the supposed 8 quads on R5x0 prior to its initial launch. I'm still looking for the other half.

Arty
11-Oct-2006, 17:07
If they really do that, I can only see them losing lots of money despite the (possibly) good sales. Actually, selling more at prices similar to the competition's means losing more with every part sold in this theoretical case, since R600 + RAM@2GHz + the board would probably cost twice as much as the G80 counterpart to produce.

I think it's fairly well known that R580 cost almost twice as much as G71, at least at launch. The idea of ATI fighting Nvidia on that front is plausible, but that is not enough to make the 512-bit thingy any more believable.

ZioniX
13-Oct-2006, 10:42
Fuad is also supporting the idea of the 512-bit memory interface:

THE UP AND COMING R600 will have a real 512 bit memory controller. Unlike its predecessors which had an internal 512 ring memory bus, the R600 will have it externally as well.

The Inquirer (http://www.theinquirer.net/default.aspx?article=35062)

phenix
13-Oct-2006, 10:45
Fuad joins the 512bit bandwagon....

R600 has an external 512 memory bus
http://uk.theinquirer.net/?article=35062

SugarCoat
13-Oct-2006, 13:42
*does the free AA and filtering dance*

nAo
13-Oct-2006, 13:47
*does the free AA and filtering dance*
Keep on dancing... and dreaming...

SugarCoat
13-Oct-2006, 13:48
Keep on dancing... and dreaming...

Oh, I will, buddy! I will!

trinibwoy
13-Oct-2006, 14:42
Who knows...maybe ATi felt that margins were too high with R580 so they decided to completely destroy them with R600 :) The Inq mentions PCB complexity - will AIBs revolt if this thing really has a 512-bit external memory interface?

Jawed
13-Oct-2006, 14:45
AIBs buy ready-built boards don't they? They just specify which combination of parts should be on board (memory quantity, VIVO, output port config, cooler etc.) and put their sticker on.

Jawed

Jawed
13-Oct-2006, 14:49
Anyway, it seems that futzing around with 80nm (i.e. delays getting it out the door) is going to cost far more margin than the yield of the die or board complexity.

When stuff is late it completely screws-up your inventory as you have to make extra old stuff, which the fickle marketplace no longer wants to buy because of price/performance/features.

I'm still agog at the on-going 80nm delay. It truly is farcical that RV560/RV570 are hitting the market one year after R5xx debuted.

Jawed

Bouncing Zabaglione Bros.
13-Oct-2006, 14:53
I wonder if this will recreate the R300 situation? A stunning engineering design, with people wondering how and why they did what they did to give us a great product that seemed to be way ahead of its time. I remember everyone going :shock: at R300, and I have a suspicion that R600 could be the same.

Before someone else says it though, I'm not suggesting that G80 is going to be NV30. I think G80 could be a great chip. I just think that R600 will be :shock:.

Jawed
13-Oct-2006, 14:59
If one GPU really does have ~40% more bandwidth than the other, then that alone is enough of a :shock: that all the rest will fall sweetly into place.
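For reference, the "~40%" follows from the two rumoured figures that come up in this thread, 120GB/s for R600 against 86.4GB/s for G80; neither is confirmed.

```python
def percent_more(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1) * 100


r600_rumour = 120.0  # GB/s, rumoured
g80_rumour = 86.4    # GB/s, rumoured
print(f"{percent_more(r600_rumour, g80_rumour):.0f}% more")  # 39% more
```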

On the other hand, I'm not expecting any architectural :shock::shock: with R600. That truly is NVidia's preserve this generation.

Jawed

INKster
13-Oct-2006, 15:02
AIBs buy ready-built boards don't they? They just specify which combination of parts should be on board (memory quantity, VIVO, output port config, cooler etc.) and put their sticker on.

Jawed

Still, that means 16 GDDR4 chips instead of 12 or 8.
Surely a few coins will be lost per unit on the AIB front, no?
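The 16-versus-12-versus-8 chip count follows from dividing the bus width by the width of each memory device. A small sketch, assuming each GDDR3/GDDR4 chip supplies 32 bits of the bus:

```python
DEVICE_WIDTH_BITS = 32  # assumption: each GDDR3/GDDR4 chip contributes 32 bits


def chips_needed(bus_bits: int, device_width: int = DEVICE_WIDTH_BITS) -> int:
    """Number of memory chips required to populate a bus of the given width."""
    if bus_bits % device_width:
        raise ValueError("bus width must be a multiple of the device width")
    return bus_bits // device_width


for bus in (256, 384, 512):
    print(f"{bus}-bit bus -> {chips_needed(bus)} chips")
```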

Bouncing Zabaglione Bros.
13-Oct-2006, 15:07
If one GPU really does have ~40% more bandwidth than the other, then that alone is enough of a :shock: that all the rest will fall sweetly into place.


For a long time we've been talking about memory bandwidth being the bottleneck. There are two ways to deal with this: increase the memory bandwidth, or decrease how much memory bandwidth you need to use. If the 512-bit external memory bus is true, I wonder if it means ATI has decided to go for broke at the DX10 inflection point, just as they did at the DX9 inflection point.


On the other hand, I'm not expecting any architectural :shock::shock: with R600. That truly is NVidia's preserve this generation.


Don't you think fully unified is enough of an architectural :shock:, or are you blasé about it because it's assumed to be the case already? If ATI had kept it quiet and we didn't know what we know about Xenos, we'd be going :shock::shock: if we'd suddenly got a unified architecture.

Jawed
13-Oct-2006, 15:09
There'd be 512MB and 1GB versions, I guess. You'd expect to be able to retail 1GB cards for more, too.
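Board capacity then depends on per-chip density. Assuming a 512-bit board carries 16 chips, 256Mbit and 512Mbit parts (hypothetical choices for this sketch) give exactly the two versions guessed at here:

```python
def board_capacity_mb(chips: int, density_mbit: int) -> int:
    """Total frame-buffer size in megabytes for `chips` devices of `density_mbit` megabits."""
    return chips * density_mbit // 8


print(board_capacity_mb(16, 256))  # 512
print(board_capacity_mb(16, 512))  # 1024
```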

7950GX2 is seriously more costly than 7900GTX, isn't it?... Any signs of a revolt?...

Jawed

dnavas
13-Oct-2006, 15:14
I wonder if this will recreate the R300 situation?

I think the only way that happens is if G80 winds up being unable to support GDDR4 at high frequencies for some reason. Otherwise, a refresh could remedy a good bit of the bandwidth imbalance. Not to mention addressing power concerns....

I think it far more likely that we will be looking at these two architectures in late spring considering bandwidth vs. computational power. I consider it quite possible that G80 will be unbalanced in favor of computational ability, while R600 might be unbalanced in favor of higher bandwidth.

/me readies his popcorn emoticon.

trinibwoy
13-Oct-2006, 15:24
There'd be 512MB and 1GB versions, I guess. You'd expect to be able to retail 1GB cards for more, too.

7950GX2 is seriously more costly than 7900GTX, isn't it?... Any signs of a revolt?...

Jawed

Well, not only is each GX2 PCB probably (?) simpler than anything we'll see with R600, but the G71 chip itself is probably dirt cheap compared to what R600 will go for. GX2s are going for $500 now.

Jawed
13-Oct-2006, 15:25
For a long time we've been talking about memory bandwidth being the bottleneck. There are two ways to deal with this: increase the memory bandwidth, or decrease how much memory bandwidth you need to use. If the 512-bit external memory bus is true, I wonder if it means ATI has decided to go for broke at the DX10 inflection point, just as they did at the DX9 inflection point.

Oh, I believe ATI will both increase bandwidth and increase the efficiency with which it's used. The patents make that pretty clear. (And, no doubt, G80 will benefit from doing both these things, too.)

I'll be fairly shocked, for example, if R600 has 120GB/s of bandwidth but can fully utilise that with only 16 TMUs and 16 ROPs. That would be truly stunning.
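One rough way to see this point is to divide bandwidth by units times core clock, giving the bytes available per ROP per clock. The 650MHz core clock below is purely hypothetical; for scale, an uncompressed 32-bit colour write plus a 32-bit Z write comes to 8 bytes per pixel.

```python
def bytes_per_unit_per_clock(bandwidth_gbs: float, units: int, core_mhz: float) -> float:
    """Memory bytes available to each unit on each core clock."""
    return bandwidth_gbs * 1e9 / (units * core_mhz * 1e6)


for rops in (16, 32):
    share = bytes_per_unit_per_clock(120, rops, 650)  # 650 MHz is an assumed clock
    print(f"{rops} ROPs -> {share:.1f} bytes per ROP per clock")
```

Under these assumptions 16 ROPs would each have over 11 bytes per clock, more than a plain colour+Z write consumes, which is why such a bus would seem to call for more units or heavier per-pixel work.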

I dare say I believe in both IHVs' ability to design a GPU around the target bandwidth. If you look at what X1k and GF7 can do with only 22GB/s and compare that to what X8xx/GF6 can do, you'll see some pretty impressive things.

So, I'm highly confident that if ATI delivers a GPU with 120GB/s bandwidth, the entire GPU will live up to that.

What's more intriguing to me is how G80 can "double performance" with only 86GB/s...

Don't you think fully unified is enough of an architectural :shock:, or are you blasé about it because it's assumed to be the case already?

I'm blasé about it because of Xenos :grin: Xenos does things that aren't even in D3D10... Xenos is a truly stunning bit of kit.

Jawed

tEd
13-Oct-2006, 15:27
On the other hand, I'm not expecting any architectural :shock::shock: with R600. That truly is NVidia's preserve this generation.

Jawed

How so? If R600 isn't :shock:, what from ATI is?

Bouncing Zabaglione Bros.
13-Oct-2006, 15:29
I think the only way that happens is if G80 winds up being unable to support GDDR4 at high frequencies for some reason. Otherwise, a refresh could remedy a good bit of the bandwidth imbalance. Not to mention addressing power concerns....

I think it far more likely that we will be looking at these two architectures in late spring considering bandwidth vs. computational power. I consider it quite possible that G80 will be unbalanced in favor of computational ability, while R600 might be unbalanced in favor of higher bandwidth.



I'm not suggesting that R600's success will be dependent on G80's failure, and I don't really want the thread to drift off in that direction. R300 was not judged against the competing product from Nvidia, because that was not available. R300 was judged on what it brought us from a performance and IQ point of view, and the unexpected tech it used to do that.

I'm just suggesting that unification and the DX10 inflection point will give ATI the opportunity to do some interesting and unexpected things that might bring us the same kind of technology-related :shock: that we had when R300 was sprung on us. It could be especially important for ATI to gain back the ground it lost over the last year with its poor execution. We're already somewhat :shock: over the thought of a 512-bit bus, and we already know we're getting a unified architecture (which would be worth another :shock: if we didn't already know/expect it).

trinibwoy
13-Oct-2006, 15:31
I just hope R600 can live up to all the hype generated by its architectural sex appeal :lol: Although 128+GB/s of bandwidth is certainly a very good place to start.
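Running the bandwidth formula backwards shows what "128+GB/s" implies: the effective data rate needed to hit a target bandwidth on a given bus. A sketch using decimal units and effective data rates in MT/s:

```python
def required_data_rate_mts(target_gbs: float, bus_bits: int) -> float:
    """Effective memory data rate (MT/s) needed to reach `target_gbs` GB/s on a bus."""
    return target_gbs * 1000 / (bus_bits / 8)


print(required_data_rate_mts(128, 512))  # 2000.0
print(required_data_rate_mts(128, 256))  # 4000.0
```

On a 256-bit bus the same target would need 4000 MT/s, well beyond the 2500 MT/s GDDR4 figures floated in this thread, which is why 128GB/s and a 512-bit bus tend to be mentioned together.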

trinibwoy
13-Oct-2006, 15:32
How so? If R600 isn't :shock:, what from ATI is?

I think he means impact due to novelty. We've been talking about unified shaders, ring buses and ultra-threading for a year now. So some of R600's wow points are kind of "old" already.

Bouncing Zabaglione Bros.
13-Oct-2006, 15:39
I think he means impact due to novelty. We've been talking about unified shaders, ring buses and ultra-threading for a year now. So some of R600's wow points are kind of "old" already.

It certainly gives me more confidence that these things will work well in R600, as we know they've been effectively given a test-run in earlier products. It will be second generation unification and ring-bus memory controllers in R600, so I'd expect to see significant improvements and any first generation issues to have been sorted.

trinibwoy
13-Oct-2006, 15:46
Yep, so it will be ATi's sleek, sexy and refined vs Nvidia's immature, unnatural and all around weird :smile: Place your bets !

Bouncing Zabaglione Bros.
13-Oct-2006, 15:49
Yep, so it will be ATi's sleek, sexy and refined vs Nvidia's immature, unnatural and all around weird :smile: Place your bets !

To paraphrase that Nvidia engineer: "384 bits! That's just unnatural! Is there some IEEE standard that uses 384 for anything?"

I think I will dub G80 as "the design pulled from Satan's backside". :lol:

Jawed
13-Oct-2006, 15:49
Well, not only is each GX2 PCB probably (?) simpler than anything we'll see with R600, but the G71 chip itself is probably dirt cheap compared to what R600 will go for. GX2s are going for $500 now.
I doubt a pair of G71s at 7950GX2's launch were dirt cheap in comparison with R600 at its coming launch. And a GX2 PCB (either from the pair) is hardly as simple as a 7900GT's board, either.

But I expect 1GB of GDDR3 at 7950GX2's launch was notably cheaper than 1GB of GDDR4 will be at R600's launch.

I just think that at a $700 retail price, say, there won't be too many complaints about profit-margin per R600 from AIBs. And particularly not if there's a decent range of lower-specified R6xx cards to sell as well. It's quantities and actual performance that'll be the battleground in D3D10, at each price point.

They'll be complaining about not having R6xx to sell from November though. Who wants to sell X1950XTX against GF8800GTS? And if R600 costs $700 in Feb, while 8800GTX costs $550, will punters want to pay the extra for 30-50% extra performance (assuming there is a substantial difference in performance)?

I just can't believe ATI is missing Christmas with R600. If you want to argue about AIB happiness then that's a far bigger deal than how many layers the PCB is.

Jawed

Jawed
13-Oct-2006, 15:54
How so? If R600 isn't :shock:, what from ATI is?
If ATI built a GPU based upon scalar ALUs and/or unified TMUs and ROPs, then I'd be :shock:

We know what a unified GPU looks like.

Jawed

trinibwoy
13-Oct-2006, 15:56
I doubt a pair of G71s at 7950GX2's launch were dirt cheap in comparison with R600 at its coming launch. And a GX2 PCB (either from the pair) is hardly as simple as a 7900GT's board, either.

Hmmm I don't know. If I were a betting man, I'd say margins on G71's were pretty high from day one.

They'll be complaining about not having R6xx to sell from November though. Who wants to sell X1950XTX against GF8800GTS? And if R600 costs $700 in Feb, while 8800GTX costs $550, will punters want to pay the extra for 30-50% extra performance (assuming there is a substantial difference in performance)?

First bet is in - R600 will be 30-50% faster than G80 !!! :shock: :razz:

Bouncing Zabaglione Bros.
13-Oct-2006, 16:00
First bet is in - R600 will be 30-50% faster than G80 !!! :shock: :razz:


You've done it now - expect to see this reported on TheInq by end of Monday...

INKster
13-Oct-2006, 16:04
Yeah.
Along with G80 being 90nm, and them having difficulty putting a 512-bit bus on the card because the GPU is 90nm (as if anything smaller would actually make it easier?), and G80 having everything "dis-unified", etc., etc.

The Inq is really reaching these days...

trinibwoy
13-Oct-2006, 16:13
And if R600 costs $700 in Feb, while 8800GTX costs $550, will punters want to pay the extra for 30-50% extra performance (assuming there is a substantial difference in performance)?

Weren't early expectations for R600 around November 06 as well? Are you basing this assumption on the sheer expected superiority of R600, or do you think R600 (Feb 07) > R600 (Nov 06)?

Razor1
13-Oct-2006, 16:14
To paraphrase that Nvidia engineer: "384 bits! That's just unnatural! Is there some IEEE standard that uses 384 for anything?"

I think I will dub G80 as "the design pulled from Satan's backside". :lol:


LOL, hmm reminds me of fp24:lol:

Geo
13-Oct-2006, 16:27
Okay, nice historical quips re fp24/fp32 comparo to 384/256 or 512 are one thing. . . but Satan's Backside will please remain attached to his Dark Lordship's posterior. Thank you. :cool:

jamis
13-Oct-2006, 16:32
I'll be fairly shocked, for example, if R600 has 120GB/s of bandwidth but can fully utilise that with only 16 TMUs and 16 ROPs. That would be truly stunning.
Jawed
While the 512-bit bus might seem too good to be true, the R600, with its 32 ROPs and 64 unified "pipes" (2x32 TMUs and 96 ALUs), should be able to fill it up nicely. :wink:

Jawed
13-Oct-2006, 16:34
Weren't early expectations for R600 around November 06 as well? Are you basing this assumption on the sheer expected superiority of R600, or do you think R600 (Feb 07) > R600 (Nov 06)?
I'm fairly sure there was a time when we were expecting R600 before Christmas, but that's so long ago...

The performance thing is solely based upon bandwidth: 120GB/s versus 86GB/s. If the rumours have any worth, of course.

Jawed

Geo
13-Oct-2006, 16:43
I'm fairly sure there was a time when we were expecting R600 before Christmas, but that's so long ago...



I don't know that I'd ever heard a specific date range from ATI. Having said that, they certainly were at pains to give every impression, since before R520 even launched (go back to Orton's comment in the summer of '05 about when investment is made vs. when the benefits are reaped), that the vast majority of the heavy lifting for R600 had already been done by the time XB360 launched. The key items being the USA (unified shader architecture) of Xenos and the ring bus of R5xx.

This certainly led me, and I think most observers, to believe that they'd be out the door in a timely fashion with R600.

So what's the holdup? I read again recently somewhere that ATI is reported to have said that the launch is NOT tied to Vista, so that would lead one to think that it's not something in WDDM 2.0 or DX10 that they feel is a requirement to show R600 at its best.

Process? Could they be diddling us along until 65nm is ready?

Something else entirely? Maybe integrating GS?

Hellifino, but it looks darn curious to me, given how strong an impression they'd given that much of R600 was already in the bank at the beginning of this year.

Sunrise
13-Oct-2006, 17:32
I'm fairly sure there was a time when we were expecting R600 before Christmas, but that's so long ago...
Yeah, but that alone wasn't necessarily based on knowledge, just blind guessing. Same with G80, but different: we expected it to arrive no later than mid-year (after all the talk about it from devs and NV themselves), but it never happened.

Regarding their 80nm schedule, I don't think anyone can deny that they were already late to the game when they decided to go with the shrink, but they really didn't have much of a choice there, because introducing those parts on 90nm would have been a pointless waste of time and money. Hurting their margins even more (by lowering ASPs on their high-end SKUs) isn't really an option if NV can always outclass you with its considerable die-area advantage.

Let's also not forget that RV560/RV570 are not a "simple" 1:1 shrink to the next full node, but different ASICs which had to be designed around the 80nm half-node, which takes time and, depending on the process maturity, can also affect your introduction date (as can inventory or market demand). Q4 isn't that bad, but they had promised those parts no later than mid-September. Something like that shouldn't happen again if they want to keep their AIBs happy.

R600 was never (I haven't seen any roadmaps that hinted otherwise) intended to be introduced this year, which makes sense if some of these rumours turn out to be true. They basically want to make up for the mistakes that led to their >6 month shortfall relative to NV, so they probably decided: why waste any more time with half the guts, when we can show the market that we're still committed to leadership, even if we were a little late (as in: totally messed up our timetable) last generation. NV won't sit still, that's for sure, and since ATi knows that, there simply is no other way than to go full throttle.

However, the problem with that is, and always was, that if they offer something very powerful, making money out of it and pleasing the AIBs is a completely different story. R600 had better be scalable enough to compensate for their super-high-end, lower-margin SKU monster, or we really will have another R300-based story here, but ATi should know that by now.

Geo
13-Oct-2006, 17:41
Yeah, but that alone wasn't necessarily based on knowledge, just blind guessing. Same with G80, but different: we expected it to arrive no later than mid-year (after all the talk about it from devs and NV themselves), but it never happened.



You mean good old Marv Burkett in Nov '05:

Like I said the 7's are going to roll out from this point going forward well probably into January and the key there is just getting into production for the spring refresh which starts to build next February. Beyond that, the team is working very hard on our next brand new GPU. We have not disclosed any details on that but that one is really aligned for first half next year and that will be the beginning of the next new architecture for the company.

So the 7's will stick around. If you look at the 7's, they will be around in the midrange and low-end for probably another year, 1.5 years. And then the new GPU will come out at the high end in the first half and go from there. Expect a family of 7 in 90 nanometers to come out from now until probably January.

I've never suggested that G80 isn't later than they originally forecast... This being an R600 thread, I was just wondering about R600.

Sunrise
13-Oct-2006, 17:52
I doubt a pair of G71s at 7950GX2's launch were dirt cheap in comparison with R600 at its coming launch. And a GX2 PCB (either from the pair) is hardly as simple as a 7900GT's board, either.
I have seen many PCBs in my life, but the GX2 doesn't really look complex at all. Let's not forget we're not talking about something like 12-16 layers here, but mass-manufacturable, dirt-cheap, run-of-the-mill 8-layer PCBs delivered in pairs, which would probably drive the total cost to about 150-180% of one 7950GT (at launch; even cheaper now), but not much more.

Given the high margins NV still makes on its GTs in that price range, the GX2 isn't that much of a feat at all. What is so smart about it is the way NV marketed it and its performance relative to its price.

R600 should be a hell of a lot more expensive to make, if all the rumours are indeed true. They'll need GDDR4, they'll need at least 12 layers, they'll need buttloads of power-related components... and G80 already looks like it's at the limit of doable 12-layer designs, but that's hard to guess without wiring data etc.

Sunrise
13-Oct-2006, 17:57
You mean good old Marv Burkett in Nov '05:
Marv, some Crytek devs, and some contacts I always have handy to double-check.

I've never suggested that G80 isn't later than they originally forecast... This being an R600 thread, I was just wondering about R600.
Me too. I just wanted to give some relative examples of "introduction-date trouble", where we tend to keep our hopes up but instead there is some delay in the time-to-market schedule, or some unusual delay whose cause we can't know, at least not until we hear otherwise.

no-X
13-Oct-2006, 18:00
I was just reminded of this post...

Given Eric's comments in the R580 interview, if core speeds scale at a similar ratio to memory, why would we expect any more of the elements that consume bandwidth? Take a read of the interview again, carefully, and bear in mind what items are likely to be on his mind when he's replying (given that R580 is a historic item to him at that point).

Also, note, just because GDDR4 is coming doesn't mean that this will translate into an immediate and massive leap in bandwidth. GDDR3 started at the high point of GDDR2's end, i.e. GDDR2's high point was 500MHz, but 500MHz GDDR3 was far more prevalent; it wouldn't surprise me to see GDDR4 coming in at the 900-1000MHz range initially.

Having said that, I would not mind more texture power (assuming more BW), but I would not want to reduce the ratio of ALU : TEX.

If R600 has more texture power, it needs more bandwidth. If core speeds scale at a similar ratio to memory speeds, we won't get the necessary amount of bandwidth from faster memory modules. So the only way to increase bandwidth is...?
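The argument here is that bandwidth per core clock is what matters, and it stays flat if core and memory clocks rise together. A minimal sketch with made-up clocks (650MHz core, 2000 MT/s memory, a 20% bump on both) shows that only widening the bus moves the ratio:

```python
def bytes_per_core_clock(bus_bits: int, data_rate_mts: float, core_mhz: float) -> float:
    """Memory bytes delivered per GPU core clock."""
    return bus_bits / 8 * data_rate_mts / core_mhz


base = bytes_per_core_clock(256, 2000, 650)
both_faster = bytes_per_core_clock(256, 2000 * 1.2, 650 * 1.2)  # clocks scale together
wider_bus = bytes_per_core_clock(512, 2000, 650)                # same clocks, 2x bus

print(f"{base:.1f} {both_faster:.1f} {wider_bus:.1f}")
```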

trumphsiao
13-Oct-2006, 18:06
I think the only way that happens is if G80 winds up being unable to support GDDR4 at high frequencies for some reason. Otherwise, a refresh could remedy a good bit of the bandwidth imbalance. Not to mention addressing power concerns....

I think it far more likely that we will be looking at these two architectures in late spring considering bandwidth vs. computational power. I consider it quite possible that G80 will be unbalanced in favor of computational ability, while R600 might be unbalanced in favor of higher bandwidth.

/me readies his popcorn emoticon.


Actually, G80 currently has some problems supporting GDDR4.
But you will see a 7800GTX 512-style release again...

Someone who is responsible for the R600 board told me that, if he's being optimistic about the time frame for the R600 launch, he would guess February at the earliest. But R600 is a beast that would literally suck up everything the power supply can deliver.

trinibwoy
13-Oct-2006, 18:10
Actually, G80 currently has some problems supporting GDDR4.
But you will see a 7800GTX 512-style release again...

Someone who is responsible for the R600 board told me that, if he's being optimistic about the time frame for the R600 launch, he would guess February at the earliest. But R600 is a beast that would literally suck up everything the power supply can deliver.

You're just a barrel o' good news aren't ya :razz:

trumphsiao
13-Oct-2006, 18:10
Hmmm I don't know. If I were a betting man, I'd say margins on G71's were pretty high from day one.



First bet is in - R600 will be 30-50% faster than G80 !!! :shock: :razz:


G71 is a cash cow that earns Nvidia 3-5 times the gross margin of G72.

Sometimes "faster" just means a combination of faster RAM, higher clocks, and a better (and noisier) fan.

Razor1
13-Oct-2006, 18:15
Actually, G80 currently has some problems supporting GDDR4.
But you will see a 7800GTX 512-style release again...

Someone who is responsible for the R600 board told me that, if he's being optimistic about the time frame for the R600 launch, he would guess February at the earliest. But R600 is a beast that would literally suck up everything the power supply can deliver.


Hmm, in what sense? Are you saying there are GDDR3 RAM shortages at the required speeds? Or is the core clock causing issues with the G80? Or are you talking about the R600? LOL, I've lost something in the middle there.

trumphsiao
13-Oct-2006, 18:29
Hmm, in what sense? Are you saying there are GDDR3 RAM shortages at the required speeds? Or is the core clock causing issues with the G80? Or are you talking about the R600? LOL, I've lost something in the middle there.

Nvidia wants to repeat the "high-volume 7800GTX 256MB, then limited 7800GTX 512MB" approach. :grin:

Razor1
13-Oct-2006, 18:31
Nvidia wants to repeat the "high-volume 7800GTX 256MB, then limited 7800GTX 512MB" approach. :grin:


Ahh, OK. Well, that depends on whether they actually need it, of course.

Anyway, I haven't heard of any issues with GDDR4 and the G80. Are you talking about latency issues, i.e. the G80's memory controller not being able to handle the RAM's higher latency?

dnavas
13-Oct-2006, 18:33
Someone who is responsible for the R600 board told me that, if he's being optimistic about the time frame for the R600 launch, he would guess February at the earliest. But R600 is a beast that would literally suck up everything the power supply can deliver.

With only 500M transistors?
They've got to be clocking it pretty fast, then, no?

trumphsiao
13-Oct-2006, 18:35
AHHH ok well that depends if they only need it of course.

Anyway, I haven't heard of any issues with GDDR4 and the G80. Are you talking about latency issues, i.e. the G80's memory controller not being able to handle the RAM's higher latency?

Maybe, because my friend didn't go into detail.

Sunrise
13-Oct-2006, 18:37
Maybe, because my friend didn't go into detail.
If your friend mentioned it, how did he come to that conclusion, then? Either he has something substantial to say or he's just guessing.