View Full Version : The NEXT LAST R600 Rumours & Speculation Thread


trinibwoy
30-Apr-2007, 00:12
In fact I'm amazed ATI is doing this as well, given the opportunity Nvidia has presented to them.

Did ATI start designing RV630 after G84 launched? There was no opportunity to take advantage of...they just decided to release a shitty mid-range card as usual. The only difference is that Nvidia did the same. :)

INKster
30-Apr-2007, 00:14
Did ATI start designing RV630 after G84 launched? There was no opportunity to take advantage of...they just decided to release a shitty mid-range card as usual. The only difference is that Nvidia did the same. :)

And still either of them will sell a bunch of those due to "checklist features".
Mainstream is ever more boring as GPU generations go by... :D

doob
30-Apr-2007, 00:15
No rumor/slides about anisotropic filtering improvements?

fellix
30-Apr-2007, 00:16
6xCFAA (wide tent) (http://img250.imageshack.us/img250/4718/6xcfaawidetg9.png) | 8xCFAA (http://img250.imageshack.us/img250/4803/8xcfaajo7.png)

That is what the kernels for the lower AA modes should look like, but I can't work out how it would look for the narrow 6xCF one. :roll:

tEd
30-Apr-2007, 00:17
Semantics. ;)
R600 is still an ATI child.

The slide isn't ;)

...but as already said it's marketing not really worth getting upset over.

Rangers
30-Apr-2007, 00:17
And still either of them will sell a bunch of those due to "checklist features".
Mainstream is ever more boring as GPU generations go by... :D

Mid range is really the mid-upper range.

Really AKA just buy a GTS..

You can even get 640MB GTS's for cheap, mitigating the lack of memory problem.

trinibwoy
30-Apr-2007, 00:23
No rumor/slides about anisotropic filtering improvements?

One slide said that default quality is now the same as R580 HQ.

INKster
30-Apr-2007, 00:28
Mid range is really the mid-upper range.

Really AKA just buy a GTS..

You can even get 640MB GTS's for cheap, mitigating the lack of memory problem.

8600 GTS or 8800 GTS 320MB ? :lol:

À propos, if NV lowers this last one on the price scale, which product will AMD come up with to fight it ?
R600 XT 256MB ? Sounds a little bit unlikely. 256MB is already a little short on the high-end...

Rangers
30-Apr-2007, 00:31
8600 GTS or 8800 GTS 320MB ? :lol:

À propos, if NV lowers this last one on the price scale, which product will AMD come up with to fight it ?
R600 XT 256MB ? Sounds a little bit unlikely. 256MB is already a little short on the high-end...

8800, of course.

trinibwoy
30-Apr-2007, 00:33
Well I think we have a good idea of what R600 is now and although it doesn't seem that radical it's still exciting to finally see what the beast is made of. But I have a couple questions based on these slides.

1. They seem to be presenting 5-wide vectors as an advancement yet they don't specify if there is any co-issue or packing capability? I'm not sure how going wider is better if you don't have tricks to maximize utilization to go along with that?

Nevermind....just realized they are implying that R600 can issue five different scalar instructions per object like Jawed was suggesting earlier :)

2. Is CFAA a fancy blur filter during post-processing? If blurring isn't that bad it should result in better "AA" then?

3. Are the 20 FP32 samplers responsible for vertex/color texture and constant retrieval? What's the usual ratio of texture address units and samplers? 1:4?

4. Does the full speed FP16 filtering translate into free INT8 trilinear like on G80? This is something I am sure they would highlight.

5. If the answer for 4 is no, then does that explain the "16 TMU" claim - 4 INT8/FP16 bilerps per engine x 4 engines?

Subtlesnake
30-Apr-2007, 00:49
ATI did the same thing on R520 and people bitched then too.

It's not new even from ATI.
Yes, people would do well to examine the X1800 sell sheet. (http://www.techpowerup.com/reviews/ATI/R5XX/images/X1800-1.jpg)

Jawed
30-Apr-2007, 01:01
1. They seem to be presenting 5-wide vectors as an advancement yet they don't specify if there is any co-issue or packing capability? I'm not sure how going wider is better if you don't have tricks to maximize utilization to go along with that?
http://forum.beyond3d.com/showpost.php?p=978652&postcount=3679

The context implies multi-issue of independent instructions (a progression from single-issue, then co-issue, then 5-way issue). As far as I can tell it truly lives up to the name "superscalar" - people would quibble over NV4x and G7x but I don't think they'd quibble over this.

The fact that each of the 5 ALUs is really a 16-wide ALU (16 pixels in parallel per shader unit) might just be enough, though.

2. Is CFAA a fancy blur filter during post-processing? If blurring isn't that bad it should result in better "AA" then?
I don't see why this isn't simply a ROP-based function. Whoops, meant to say RBE. It's a custom resolve function based on whatever MSAA samples have been written. No extra samples are being generated regardless of AA mode (erm, except maybe in 24xCFAA mode?).

3. Are the 20 FP32 samplers responsible for vertex/color texture and constant retrieval?
Theoretically anything in memory that exists in a 1D, 2D or 3D structure, with 1-4x fp32s (or smaller) per coordinate. That would exclude constants I expect. Perhaps constants come in through the memory read cache over on the left?

There's bound to be a ton more slides.

What's the usual ratio of texture address units and samplers? 1:4?
ATI GPUs seem to be 1:1. I've been puzzling over the 8 TAs per quad TF...

4. Does the full speed FP16 filtering translate into free INT8 trilinear like on G80? This is something I am sure they would highlight.

5. If the answer for 4 is no, then does that explain the "16 TMU" claim - 4 INT8/FP16 bilerps per engine x 4 engines?
So far, the answer to 4 is no and 4 INT8/FP16 is what's happening. Guess we'll just have to wait to see. 8x TAs really is a puzzler.

Unless four of them are for generating VF addresses, of course. Though VF addresses should generally be simpler than TF addresses, as they're in the form of base + per-component scale * offset (1D).

So if the VFs can do arbitrary 2D/3D texture buffer fetches (no filtering) then I suppose you could argue a full TA is preferable.

But 20 VFs is kinda weird. Presumably all these types of fetch pipelines (including the filtering pipes) run in parallel, so it's just a matter of keeping as many of them as full as possible. So 16 of the VFs are a bit like a vec4 ALU, all working together and the last 4 are like a co-issued "scalar". (Just wider, teehee).

Jawed

nobody
30-Apr-2007, 01:03
One slide said that default quality is now the same as R580 HQ.

That's still worse than what G8x offers. bad.

Subtlesnake
30-Apr-2007, 01:05
That's still worse than what G8x offers. bad.
Noticeably worse?

trinibwoy
30-Apr-2007, 01:13
The context implies multi-issue of independent instructions (a progression from single-issue, then co-issue, then 5-way issue).

Yep I realized you had touched on that in your earlier post. But wouldn't that be a compiler and instruction scheduling nightmare? You would need the ability to issue up to five different instructions per clock!

The fact that each of the 5 ALUs is really a 16-wide ALU (16 pixels in parallel per shader unit) might just be enough, though.

?? I thought each of the 16-ALU's (one per pixel) per shader unit was a 5-wide ALU :???: Or is that the same thing?

tEd
30-Apr-2007, 01:14
The slide also says improved handling of problematic texture filtering cases whatever that means

...and these slides http://www.forum-3dcenter.org/vbulletin/showpost.php?p=5449613&postcount=364 with CFAA comparison to MSAA

BRiT
30-Apr-2007, 01:14
That's still worse than what G8x offers. bad.

The same slide also mentions improving those corner-cases. I imagine the top-notch AF to be the same between R600 / G80.

trinibwoy
30-Apr-2007, 01:24
So what are the bets on branching granularity - 16?

trinibwoy
30-Apr-2007, 01:26
New 'advanced' custom AA resolve passes are nice and everything but it seems to me that nothing could stop competition to implement the same filters, in fact I'm willing to bet R6xx uses its shaders code to implement those filters.

You know, now that you've said that, I wonder if they came up with this after they saw CSAA. If it is really just a post-processing effect run on the shaders it's something they could have tacked on. CSAA on the other hand seems to require hardware support.

Razor1
30-Apr-2007, 01:30
So what are the bets on branching granularity - 16?


Would that be a bet? :grin:

Razor1
30-Apr-2007, 01:34
If this AA method is a post-process effect, how much of a hit will it be? Will it hit both bandwidth and the shader rate (which seems fine at this point, since it seems to have an abundant amount of shader ability)? And if this is a 512 MB card, how much additional memory will it take up compared to traditional MSAA?

Jawed
30-Apr-2007, 01:40
Yep I realized you had touched on that in your earlier post. But wouldn't that be a compiler and instruction scheduling nightmare? You would need the ability to issue up to five different instructions per clock!
Yep! Pretty exciting huh! The compiler is definitely going to be fun, which is why all those static single assignment patent applications come in handy:

Method and apparatus for reducing instruction dependencies in extended SSA form instructions (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US2006070050&F=0)

as they specifically untangle vector code to decide where instruction dependencies lie, per component. The truth is, compilers have already done this kind of thing in the past (and G80's compiler must be doing pretty much the same) it's just that for ATI this is the most complicated.

For ATI it was mostly a question of treating RGB as always independent from A. That two-way split makes the compiler's job much easier. But swizzles and output masks add a ton of complexity.
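To make the scheduling problem a bit more concrete, here is a minimal sketch of packing scalar ops into 5-wide issue groups. It is not ATI's compiler or the method in the patent above; it is a plain greedy packer, and it assumes the code is already in single-assignment form (each destination written once). The function name `pack_bundles` and the `(dest, sources)` op format are made up for illustration.

```python
def pack_bundles(ops, width=5):
    """Greedily pack scalar ops into bundles of at most `width` ops.

    ops: list of (dest, sources) tuples in program order, assumed to be in
    single-assignment form (hypothetical format, for illustration only).
    An op may join a bundle only if every source is either a pure input
    (never written by any op) or was produced by an earlier bundle.
    """
    written = {dest for dest, _ in ops}
    ready = set()          # values produced by bundles already emitted
    pending = list(ops)
    bundles = []
    while pending:
        bundle, remaining = [], []
        for dest, srcs in pending:
            deps_ok = all(s in ready or s not in written for s in srcs)
            if deps_ok and len(bundle) < width:
                bundle.append((dest, srcs))
            else:
                remaining.append((dest, srcs))
        if not bundle:
            raise ValueError("unsatisfiable dependencies")
        ready.update(dest for dest, _ in bundle)
        bundles.append(bundle)
        pending = remaining
    return bundles


# A vec4 multiply split per component, two unrelated scalar ops,
# and one op that depends on two of the multiply results.
example_ops = [
    ("t.x", ["a.x", "b.x"]),
    ("t.y", ["a.y", "b.y"]),
    ("t.z", ["a.z", "b.z"]),
    ("t.w", ["a.w", "b.w"]),
    ("s", ["c"]),
    ("u", ["d"]),
    ("r", ["t.x", "t.y"]),
]
example_bundles = pack_bundles(example_ops)
```

In this toy case the first bundle fills all five slots and the second carries only two ops, which is the utilisation worry raised earlier: the width only helps when the compiler can find enough independent work.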

?? I thought each of the 16-ALU's (one per pixel) per shader unit was a 5-wide ALU :???: Or is that the same thing?
No. They're five independent ALUs. Each one can execute a different instruction.

They're 16-wide, because there's 16 pixels all running in parallel within the shader unit. So, it's sorta like a sideways G80. Remember the arguments about how G80's ALUs aren't really scalar, because they're 8-wide arrays of ALUs performing sequential instruction execution of the individual components of a vector? 8-pixels are locked together in G80. 16 in R600.

The operand fetch should be moderately exciting in R600. 3 operands have to be fetched for each of 5 ALUs. Need to think about this...

Jawed

Jawed
30-Apr-2007, 01:47
So what are the bets on branching granularity - 16?
Which chip?

Jawed

Jawed
30-Apr-2007, 01:50
OK, I'm curious now, why are people expecting CFAA to be a post-process shader?

Is it because D3D10 allows access to the individual samples, anyway? Or is there a more fundamental thing?

In my view, when you look at a render target in memory (or in cache) all those samples are cosying-up next to each other. The resolve hardware can easily stride through that data at whatever rate it feels like. Don't forget the resolve hardware has to be able to cope with both, at the very least, 32-bit and 64-bit per pixel render target formats.

Jawed

Unknown Soldier
30-Apr-2007, 01:52
23 years of trolling the internet and counting...

Jawed

You realise, you make me feel young again ;) since I've only been browsing the internet for 10 years plus. :D

US

Mize
30-Apr-2007, 02:01
Re: 23 years trolling the internet...

The web browser (invented in NeXTStep) isn't old. You must mean ARPANET or BITNET using FTP???

trinibwoy
30-Apr-2007, 02:03
Lol, how do you know Jawed isn't 23 years old?

trinibwoy
30-Apr-2007, 02:14
The truth is, compilers have already done this kind of thing in the past (and G80's compiler must be doing pretty much the same) it's just that for ATI this is the most complicated.

Ok, although G80 shouldn't have to worry about instruction dependency as much.

No. They're five independent ALUs. Each one can execute a different instruction.

They're 16-wide, because there's 16 pixels all running in parallel within the shader unit. So, it's sorta like a sideways G80. Remember the arguments about how G80's ALUs aren't really scalar, because they're 8-wide arrays of ALUs performing sequential instruction execution of the individual components of a vector? 8-pixels are locked together in G80. 16 in R600.

Ah yeah you're right that's a better way to think about it. That firms up my own understanding as well, thanks. One thing I'm not clear on - are all five ALU's running the same 16-pixel thread? If so, why not just run five different threads and not worry about multi-issue since the ALU's are independent anyway?

What chip?

R600

Razor1
30-Apr-2007, 02:17
OK, I'm curious now, why are people expecting CFAA to be a post-process shader?

Is it because D3D10 allows access to the individual samples, anyway? Or is there a more fundamental thing?

In my view, when you look at a render target in memory (or in cache) all those samples are cosying-up next to each other. The resolve hardware can easily stride through that data at whatever rate it feels like. Don't forget the resolve hardware has to be able to cope with both, at the very least, 32-bit and 64-bit per pixel render target formats.

Jawed

Well it seems like these new modes are programmable, would it be far fetched to think it would be easier to do it within a shader?

dnavas
30-Apr-2007, 02:18
Re: 23 years trolling the internet...

...You must mean ARPANET or BITNET using FTP???

Usenet would have been my assumption.

Jawed
30-Apr-2007, 02:19
Re: 23 years trolling the internet...

The web browser (invented in NeXTStep) isn't old. You must mean ARPANET or BITNET using FTP???
rn

http://en.wikipedia.org/wiki/Rn_(newsreader)

on BSD Unix on JANET in the UK, technically. I should have rec.music stuff printed out on good old fashioned 132 character stave paper somewhere, I think:

http://www.pdp8.net/images/greenbar.shtml

with 5-line staves instead of solid blocks :grin:

Jawed

trinibwoy
30-Apr-2007, 02:22
OK, I'm curious now, why are people expecting CFAA to be a post-process shader?

Well the slide says "Anti-aliasing is effectively a post-processing filter applied to each rendered frame". Are you saying that the filter could be run on the ROPs?

Rys
30-Apr-2007, 02:27
Well the slide says "Anti-aliasing is effectively a post-processing filter applied to each rendered frame". Are you saying that the filter could be run on the ROPs?
That depends on the resolve being asked for (and in R6's case whether you think it's custom).

3vi1
30-Apr-2007, 02:29
Re: 23 years trolling the internet...

The web browser (invented in NeXTStep) isn't old. You must mean ARPANET or BITNET using FTP???

Jawed is a shader running on a GPU from the futurrrree. :wink:

Seriously, that guy has great comments. I've been lurking here for 3 years and I've always enjoyed his insight, cheers!


On a side note, I've been wondering... Since AMD is implementing a sound component, no doubt helped by the removal of the audio stack from previous versions of Windows, what is the likelihood that AMD has implemented something like Creative's X-Fi sound processor (http://www.tomshardware.com/2005/08/18/creative_x/page2.html)?

Can we compare ring buses here, or am I jumping to conclusions?

Razor1
30-Apr-2007, 02:31
That depends on the resolve being asked for (and in R6's case whether you think it's custom).


How so? I know you probably know all about the R600 by now lol, are you saying it's not really programmable? It kinda sounds that way because of the drivers hint.

Jawed
30-Apr-2007, 02:32
Ok, although G80 shouldn't have to worry about instruction dependency as much.
Only if you pretend the MI/SF pipeline doesn't exist. R600 has dedicated interpolators outside of the ALU pipeline. SF instructions always last 4 or 8 clocks, which makes finding MAD-ALU-issuable instructions tough (if you want 100% MAD ALU utilisation). Doable, but complicated.

Ah yeah you're right that's a better way to think about it. That firms up my own understanding as well, thanks. One thing I'm not clear on - are all five ALU's running the same 16-pixel thread? If so, why not just run five different threads and not worry about multi-issue since the ALU's are independent anyway?
Hey, now that's good thinking. No idea! What a groovy idea if it is like that.

R600
RV610 could do as well. If it's four clocks per batch, then that'd be 16. Then R600 would be 64. I dunno actually...

But your comment about threads has piqued my curiosity...
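A quick sketch of the arithmetic behind those two numbers, plus what granularity means for divergent branches. The 4-clock batch and the 16 pixels per clock for R600 come from the posts above; the 4 pixels per clock for RV610 is only what the "16" figure would imply, so treat it as an assumption. The function names are made up.

```python
def batch_size(pixels_per_clock, clocks_per_batch=4):
    """Pixels that share one instruction stream (branching granularity)."""
    return pixels_per_clock * clocks_per_batch


def divergent_batches(taken, granularity):
    """Count batches whose pixels disagree on a branch.

    taken: one boolean per pixel (True = branch taken).
    A batch with mixed outcomes has to run both sides of the branch.
    """
    count = 0
    for start in range(0, len(taken), granularity):
        chunk = taken[start:start + granularity]
        if any(chunk) and not all(chunk):
            count += 1
    return count


r600_granularity = batch_size(16)   # 16 pixels per clock, from the thread
rv610_granularity = batch_size(4)   # assumption: 4 pixels per clock

# 64 pixels where only the first 16 take the branch.
pattern = [True] * 16 + [False] * 48
```

With that pattern, a granularity of 16 sees no mixed batches, while a granularity of 64 puts all the pixels into a single batch that has to execute both paths.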

Jawed

Rys
30-Apr-2007, 02:38
How so? I know you probably know all about the R600 by now lol, are you saying it's not really programmable? It kinda sounds that way because of the drivers hint.
I'm not saying anything, really :smile: And if I knew everything I'd be working for AMD. You can learn a hell of a lot about the chip from this thread though, and a bunch from Mr. Ashraf.

Razor1
30-Apr-2007, 02:40
LOL man you know how much info has been in these r600 threads :lol: .

Star_Hunter
30-Apr-2007, 02:47
LOL man you know how much info has been in these r600 threads :lol: .

It's the source of half the rumors on the internet.

Jawed
30-Apr-2007, 02:48
Well it seems like these new modes are programmable, would it be far fetched to think it would be easier to do it within a shader?
The screendoor antialiasing patent:

Sample-level screen-door transparency using programmable transparency sample masks (http://v3.espacenet.com/textdoc?DB=EPODOC&IDX=US2007070082&F=0)

refers to a "programmer-specified" sampling pattern. So the sample pattern can vary per pixel and the combination of MSAA and transparency AA samples can be defined. So that's the programmable bit, at least as far as the driver is concerned. Programmers won't get their hands on the oSampleMask function until after D3D10 it seems...

But I have to admit, I interpreted the document to mean that the RBEs do the rest. I know screendoor isn't CFAA - but the "programmable" processing implied here makes the RBEs rather more fun than we're used to.

A tough question: CFAA and HDR at the same time? Seems unlikely...

Actually does anyone know what 24xCFAA's sample pattern is? That's "edge detect", which implies it's similar to CSAA, which could imply all the samples are within the pixel

Jawed

trinibwoy
30-Apr-2007, 02:49
Only if you pretend the MI/SF pipeline doesn't exist. R600 has dedicated interpolators outside of the ALU pipeline. SF instructions always last 4 or 8 clocks, which makes finding MAD-ALU-issuable instructions tough (if you want 100% MAD ALU utilisation). Doable, but complicated.

Not sure what you mean here, that G80's shared MI/SF pipeline will cause bubbles in the main MAD ALU?

SirPauly
30-Apr-2007, 02:55
A tough question: CFAA and HDR at the same time? Seems unlikely...



http://i21.photobucket.com/albums/b299/Genocide737/newjp2.jpg


Edit: I'm assuming it may work with all these other modes offered. Don't know for sure though!

BRiT
30-Apr-2007, 02:59
A tough question: CFAA and HDR at the same time? Seems unlikely...


Maybe it's just marketing-speak, but this slide hints at CFAA and HDR at the same time, no?

As for the sampling pattern, that should be interesting to see. They only showed 16xCFAA.

Geeforcer
30-Apr-2007, 03:03
To me, that slide says "Hey, we still support all this stuff as well" but it could very well mean "CFAA works with all this".

Jawed
30-Apr-2007, 03:07
Not sure what you mean here, that G80's shared MI/SF pipeline will cause bubbles in the main MAD ALU?
Yep it's possible - but major bubbles are meant to be rare.

Remember that code segment? I found a way to schedule it on G80 so that there are no bubbles on the MAD ALU. Took a fair amount of to-ing and fro-ing...

Jawed

trinibwoy
30-Apr-2007, 03:37
Based on Rys's comment I'm leaning towards CFAA happening on the ROPs (might as well use that since I've got nothing else to go on :)) The slides indicate to me that CFAA works by borrowing MSAA color samples from neighbouring pixels during the resolve. So that explains why it can affect non-edge pixels. But that essentially boils down to a blur since you're blending neighbouring pixel colors. That blurring could cause a loss of detail on an otherwise properly filtered (AF) texture. In terms of performance (I think) the ROPs would need the ability to read 8 FP16 samples per clock for 16xCFAA + HDR - maybe that's what all the bandwidth is for!

Also, there doesn't seem to be any Z-trickery going on, just real color samples. So that's one obvious advantage there - it doesn't fall over when stencil shadows are used.
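For a feel of the numbers in that guess, a back-of-envelope sketch. It assumes an FP16 RGBA target (4 channels x 2 bytes = 8 bytes per sample, i.e. the 64-bit per pixel format mentioned earlier) and takes the "8 samples per resolved pixel" figure at face value. The 1600x1200 resolution is a hypothetical example, not something from the slides.

```python
FP16_RGBA_BYTES = 4 * 2   # four channels, two bytes each
INT8_RGBA_BYTES = 4 * 1   # the 32-bit per pixel case, for comparison


def resolve_bytes_per_pixel(samples, bytes_per_sample):
    """Bytes the resolve has to read to produce one output pixel."""
    return samples * bytes_per_sample


def resolve_bytes_per_frame(width, height, samples, bytes_per_sample):
    """Bytes read by one full-screen resolve, ignoring any compression."""
    return width * height * resolve_bytes_per_pixel(samples, bytes_per_sample)


hdr_pixel = resolve_bytes_per_pixel(8, FP16_RGBA_BYTES)
ldr_pixel = resolve_bytes_per_pixel(8, INT8_RGBA_BYTES)
hdr_frame = resolve_bytes_per_frame(1600, 1200, 8, FP16_RGBA_BYTES)
```

This ignores colour compression and caching entirely, so it is an upper bound on raw reads rather than a prediction of what the hardware actually moves.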

trinibwoy
30-Apr-2007, 03:47
Remember that code segment? I found a way to schedule it on G80 so that there are no bubbles on the MAD ALU. Took a fair amount of to-ing and fro-ing...

Hopefully it'll take less to-ing and fro-ing to efficiently schedule actual game code :razz:

R300King!
30-Apr-2007, 05:13
http://www.vr-zone.com/?i=4935

We learned that AIB partners for both NVIDIA and ATi have already received the GeForce 8800 Ultra cards and Radeon HD 2900 XT cards and are preparing for shipments now. These cards will hit the retail shelves by May 14th, the same day AMD officially launch their R6xx series. However, only 2900 XT cards will be available while 2900 XTX, 2600 and 2400 cards will be available at a later date. We heard that there are some 6000 pieces of 2900 XT cards by the first week of launch and no supply problems are expected.

Mize
30-Apr-2007, 05:22
rn

http://en.wikipedia.org/wiki/Rn_(newsreader)

on BSD Unix on JANET in the UK, technically. I should have rec.music stuff printed out on good old fashioned 132 character stave paper somewhere, I think:

http://www.pdp8.net/images/greenbar.shtml

with 5-line staves instead of solid blocks :grin:

Jawed

I started rn in 1988 :) but it was BITNET/ARPANET as nobody called it internet yet.

Rangers
30-Apr-2007, 05:24
Wow only 6,000 is enough to supply a launch? That doesn't sound right. Especially for a $400 somewhat mainstream priced part..

Nvidia is clueless with that Ultra. OT, we've criticized ATI enough for its stupid moves for sure.

Arty
30-Apr-2007, 05:45
6000 sounds like a launch allocation to one AIB, especially one that has both ATI/Nvidia cards (Asus or MSI). Also keep in mind that vr-zone has less credibility when it comes to ATI rumors. But then again it's ATI, so anything is possible. :grin:

Twinkie
30-Apr-2007, 05:50
On one of the slides, it compares CSAA and CFAA. The slides mention that CSAA doesn't apply to stencil shadows, but really I thought the Q modes from CSAA apply AA to stencil shadows?

nAo
30-Apr-2007, 05:55
If this AA method is a post-process effect, how much of a hit will it be? Will it hit both bandwidth and the shader rate (which seems fine at this point, since it seems to have an abundant amount of shader ability)? And if this is a 512 MB card, how much additional memory will it take up compared to traditional MSAA?
BW wise it should be extremely efficient as it works locally as long as your render target is tiled in memory with texture caches doing the rest

ChrisRay
30-Apr-2007, 05:58
On one of the slides, it compares CSAA and CFAA. The slides mention that CSAA doesn't apply to stencil shadows, but really I thought the Q modes from CSAA apply AA to stencil shadows?

Stencil shadows have always been based on the base level of color samples. 16x uses 4 color/Z samples, and 16xQ uses 8 color/Z samples. I'll wait till I see more about CFAA before passing comment. I have never been a fan of post-processing effects. Just hoping it's nothing like Quincunx.

nAo
30-Apr-2007, 06:02
Maybe I'm wrong and they do everything in the ROPs, but I don't know... the shader core + TMUs really have everything that is needed to perform such operations at high speed. Since the hw is already there, why not re-use it? :) And as I said, I can see the competition "easily" implementing something similar...

silent_guy
30-Apr-2007, 06:38
The context implies multi-issue of independent instructions (a progression from single-issue, then co-issue, then 5-way issue).

I'm a bit annoyed that AMD is calling this 320 stream processors, because that implies 320 independent threads. Mismatched terminology will only make it harder to get the point across. (And that's not even including today's admirable efforts of Fud to confuse the matter even more by injecting a pile of complete nonsense.)

My best effort to characterize this beast would then be 64 5-way super-scalar processors. After all, each of the 5 ops still has to be part of the same thread, right? I guess that's a subtlety that most shoppers-in-the-mall wouldn't understand when comparing to a product with 128 *real* streaming processors...

I wonder if they are able to encode those independent ops in a 32-bit instruction word. :wink:
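For what it's worth, here is how the different counts in this thread line up as plain arithmetic. The 320, 64, 5, 16, 128 and 8 figures are all quoted from posts above; the "4 shader units" value is simply what falls out of 320 / (16 x 5), so it is an inference rather than a confirmed spec. Function names are made up.

```python
def total_alu_lanes(shader_units, pixels_per_unit, alus_per_pixel):
    """The marketing-style count: every scalar ALU lane counted once."""
    return shader_units * pixels_per_unit * alus_per_pixel


def lockstep_groups(scalar_processors, pixels_locked_together):
    """How many groups of pixels run in lockstep."""
    return scalar_processors // pixels_locked_together


# R600 as discussed: 16 pixels per shader unit, 5 ALUs per pixel.
r600_shader_units = 320 // (16 * 5)                            # inferred, not confirmed
r600_lanes = total_alu_lanes(r600_shader_units, 16, 5)         # "320 stream processors"
r600_superscalar_units = r600_shader_units * 16                # "64 5-way" processors

# G80 as discussed: 128 scalar processors, 8 pixels locked together.
g80_groups = lockstep_groups(128, 8)
```

Whether a "processor" should mean a lane, a 5-way unit or a lockstep group is exactly the terminology problem being complained about here.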

fellix
30-Apr-2007, 06:42
The slides indicate to me that CFAA works by borrowing MSAA color samples from neighbouring pixels during the resolve. So that explains why it can affect non-edge pixels. But that essentially boils down to a blur since you're blending neighbouring pixel colors. That blurring could cause a loss of detail...
It would cause loss of detail (i.e. blur) if the filter kernel were a plain box filter (http://img341.imageshack.us/img341/6831/boxfilterii9.jpg), where the distance of each sample (incl. those outside the pixel) from the pixel center isn't taken into account, i.e. everything is flattened.
But in the case of CFAA, the tent averaging (http://img341.imageshack.us/img341/2397/tentfilteraz4.jpg) method is non-flat, so the weights gradually increase towards the pixel center, with very nice macro-pixel blending. Gaussian averaging (http://img341.imageshack.us/img341/8203/gaussianfilterwu9.jpg) is even more "natural", but it's quite a bit more math-demanding. :wink:
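A small sketch of the difference between the box and tent kernels described above. This is only the textbook form of the two filters applied to a handful of samples; the actual CFAA weights, radii and sample positions are not known from the slides, so every number here is made up for illustration.

```python
def box_weight(distance, radius):
    """Flat kernel: every sample inside the radius counts equally."""
    return 1.0 if distance <= radius else 0.0


def tent_weight(distance, radius):
    """Linear falloff: weight is 1 at the pixel center, 0 at the radius."""
    return max(0.0, 1.0 - distance / radius)


def resolve(samples, weight_fn, radius):
    """Weighted average of (distance_from_center, color) samples."""
    weights = [weight_fn(d, radius) for d, _ in samples]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no sample falls inside the kernel")
    return sum(w * c for w, (_, c) in zip(weights, samples)) / total


# Two bright samples near the center, two dark ones borrowed from a neighbor.
samples = [(0.0, 1.0), (0.5, 1.0), (1.0, 0.0), (1.0, 0.0)]
box_result = resolve(samples, box_weight, radius=2.0)
tent_result = resolve(samples, tent_weight, radius=2.0)
```

With the box kernel the borrowed samples pull the result down to an even average, while the tent kernel keeps it closer to the pixel's own samples, which is the "less blur" argument in a nutshell.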

Farhan
30-Apr-2007, 06:45
I'm a bit annoyed that AMD is calling this 320 stream processors, because that implies 320 independent threads. Mismatched terminology will only make it harder to get the point across. (And that's not even including today's admirable efforts of Fud to confuse the matter even more by injecting a pile of complete nonsense.)

My best effort to characterize this beast would then be 64 5-way super-scalar processors. After all, each of the 5 ops still has to be part of the same thread, right? I guess that's a subtlety that most shoppers-in-the-mall wouldn't understand when comparing to a product with 128 *real* streaming processors...

I wonder if they are able to encode those independent ops in a 32-bit instruction word. :wink:

:shock: Which product has 128 *real* streaming processors?

silent_guy
30-Apr-2007, 06:54
:shock: Which product has 128 *real* streaming processors?

I mean: 128 processors, in the sense that each processor has its own program counter. Because that's probably one of the weaknesses of R600 then: if there is branch divergence in a batch of threads, performance will still have a granularity of 5 instead of 1.

(See: it's really hard to come up with unambiguous terminology! :wink: )

FrameBuffer
30-Apr-2007, 07:00
Correct me if I am wrong, however it would seem to me that the 6,000 units was aimed at the 8800 GTX Ultra ??

(ok tiny rant)..

Why is it that time and time again they (nv) can seemingly do these Ultra paper launches as a means to try to "steal" ATI's thunder, and time and time again it is simply brushed over by the enthusiast hardware sites (and yes, I include B3D in this one as well)?

From the fabled 6800 Ultra Spectacular XXX, then the 7900 Ultra $999 edition (which iirc I was LOL@ for suggesting ahead of such announcement), and here we are again with the 8800 GTX Ultra LE (Limited Edition ? ).

Yes, I admit my blood is red and I tend to favor the red camp, however I haven't seen ATI pull this crap over and over .. and just as easily get a pat on the back by the supposed journalistic force

/end rant

Twinkie
30-Apr-2007, 07:05
corret me if I am wrong, however it would seem to me that the 6,000 units was aimed at the 8800 GTX Ultra ??

(ok tiny rant)..

why is it that time and time again that then (nv) can seemingly do these Ultra Paper launches as a means to try to "steal" ATI's thunder and time and time again it is simply brushed over by the enthusiatst hardware sites (and yes I included B3D in this one as well).

From the fabled 6800 Ultra Spectalcular XXX then the 7900 Ultra $999 edition (which iirc I was LOL@ for suggesting ahead of such announcement) and here we are again with the 8800 GTX Ultra LE (limited Edition ? ).

Yes I admit my blood is red and I tend to favor the red camp however I havent seen ATI pull this crap over and over .. and just as easily get a pat on the back by the supposed journalistic force

/end rant

How about the XTXXXXXPEXTXX edition?

Actually i got one better.. ever heard of the X1950XTX uber edition?

:lol:

Rangers
30-Apr-2007, 07:22
corret me if I am wrong, however it would seem to me that the 6,000 units was aimed at the 8800 GTX Ultra ??

(ok tiny rant)..

why is it that time and time again that then (nv) can seemingly do these Ultra Paper launches as a means to try to "steal" ATI's thunder and time and time again it is simply brushed over by the enthusiatst hardware sites (and yes I included B3D in this one as well).

From the fabled 6800 Ultra Spectalcular XXX then the 7900 Ultra $999 edition (which iirc I was LOL@ for suggesting ahead of such announcement) and here we are again with the 8800 GTX Ultra LE (limited Edition ? ).

Yes I admit my blood is red and I tend to favor the red camp however I havent seen ATI pull this crap over and over .. and just as easily get a pat on the back by the supposed journalistic force

/end rant

This is a real pointless launch by Nvidia anyway, since if I understand correctly many vendor OC'd 8800's have higher core clocks than the $999 Ultra!

Really makes no sense. Guess 8800 got too hot for Nvidia to at least follow their tradition of welding two together.

Twinkie
30-Apr-2007, 07:29
This is a real pointless launch by Nvidia anyway, since if I understand correctly many vendor OC'd 8800's have higher core clocks than the $999 Ultra!

Really makes no sense. Guess 8800 got too hot for Nvidia to at least follow their tradition of welding two together.

A bit OT, but I do agree with you on the Ultra variant of G80. Seems quite pointless, as even the 7800GTX 512MB provided a bigger performance gain over the 7800GTX 256MB than this.

SugarCoat
30-Apr-2007, 07:29
This is a real pointless launch by Nvidia anyway, since if I understand correctly many vendor OC'd 8800's have higher core clocks than the $999 Ultra!

Really makes no sense. Guess 8800 got too hot for Nvidia to at least follow their tradition of welding two together.

That's not exactly true. The shader clock is going to be at 1500MHz, as opposed to 1350 stock or 1400, which I think some of the more expensive AIB OC cards have. I'll be a little interested to see how it affects it. GTS vs GTX vs Ultra vs XT vs XTX should make for some quick'n interesting chart fun.

I kinda doubt the Ultra will be that expensive. If it's over $650 after the first month I'd actually be quite shocked. It's like asking for bad PR.

Russell
30-Apr-2007, 07:39
It's a wonderful feeling to return from work hearing both positive chatter about the R600 again and having earned the rest of the money I need to buy one.

Now, if they'd only release it...and Bearlake.

ChrisRay
30-Apr-2007, 08:13
http://img403.imageshack.us/img403/2249/20919688is2.jpg


Back when I did my CSAA investigation I used this tool as well. But here's something I noticed: this particular angle is the "worst" case scenario for CSAA. Also, is this a comparison of 16x normal or 16xQ?

Chris

fellix
30-Apr-2007, 08:22
16xQ or not, this shot is too heavily compressed to judge. :roll:

BTW, does anyone have a hint of what the 24xCFAA mode should look like? Maybe some CrossFireish thingie.

OT: this thread will hit 4K posts, for sure

ChrisRay
30-Apr-2007, 08:36
You're right, it is pretty compressed. Forgive me if this is not to scale. Just something I quickly threw together to look at uncompressed.

http://img209.imageshack.us/img209/542/csaatomsas6.th.png (http://img209.imageshack.us/my.php?image=csaatomsas6.png)

fellix
30-Apr-2007, 08:40
Well, as far as I can see, the slide shows a fair comparison -- CFAA has a better colour sampling distribution (gradients).
Applying a simple (1/2) Gauss filter would do an even better job, but it costs about 5~6 cycles per sample, IMHO.

Anon Lamer
30-Apr-2007, 11:13
Usenet would have been my assumption.

Usenet went to shit somewhere around '95 when the average joes came online. You may not believe it, but f.ex alt.sex was once free from spambots and people looking for someone to ejaculate upon. It was very much like this board is now, across all newsgroups.

Now, on R600: since the Chinese claim an 890 MHz core overclock achieved on the XT specimens, I suspect that ATI is delaying the XTX in order to smash down on the 8800 Ultra. Premium cores + fast GDDR3 = ~20% performance over XT. BUT on the other hand, if the latest rumors about the XT doing well against the GTX are true, then the XTX might not really be necessary. I can't wait until Wednesday, when Hugin and Munin bring us knowledge about R600. ;)

AnarchX
30-Apr-2007, 11:36
Hardspell has some nice pictures of the full HD 2000 line-up:
http://www.hardspell.com/doc/hardware/36904_page18123.html

And also interesting details about UVD.

dizietsma
30-Apr-2007, 11:37
On the point of when the NDA expires: has this changed since the takeover by AMD? For example, are we now going to have a 9am PST expiry? This is not good for me, as I live in the UK and that is late afternoon here.

My preference would be for a midnight PST NDA expiry if we have to use that timezone.

AnarchX
30-Apr-2007, 11:42
I uploaded the pictures from Hardspell, because it looks like they could delete them soon:
HD 2400
http://directupload.com/thumbs/ijqdnznu4xtnr35tdnyc.jpg (http://directupload.com/files/ijqdnznu4xtnr35tdnyc.jpg)http://directupload.com/thumbs/mmdkjryoeoixztizydxj.jpg (http://directupload.com/files/mmdkjryoeoixztizydxj.jpg)

HD 2600 Pro and XT
http://directupload.com/thumbs/nddrzjm5jknwzjoh5jka.jpg (http://directupload.com/files/nddrzjm5jknwzjoh5jka.jpg)http://directupload.com/thumbs/jflmnynmjmgy4jjnxtqj.jpg (http://directupload.com/files/jflmnynmjmgy4jjnxtqj.jpg)

HD 2900XT
http://img219.imageshack.us/img219/7199/483e6e55a5d2401298cebbeew2.th.jpg (http://img219.imageshack.us/my.php?image=483e6e55a5d2401298cebbeew2.jpg)

Skinner
30-Apr-2007, 12:20
I wonder if 2x PCIE8x (i975 mobo ) will be enough to feed 2 X2900XT's in CF?

IbaneZ
30-Apr-2007, 12:27
The 2600 Pro is tiny compared to the XT.

trinibwoy
30-Apr-2007, 12:28
It will cause loss of detail (e.g. blur) if the filter kernel is a plain box filter (http://img341.imageshack.us/img341/6831/boxfilterii9.jpg), where the distance of the samples (incl. those outside the pixel) from the pixel center isn't taken into account, i.e. everything is flattened.
But in the case of CFAA, the tent averaging (http://img341.imageshack.us/img341/2397/tentfilteraz4.jpg) method is non-flat, so it gradually increases the weights towards the pixel center, with very nice macro-pixel blending. Gaussian averaging (http://img341.imageshack.us/img341/8203/gaussianfilterwu9.jpg) is even more "natural", but it's quite a bit more math demanding. :wink:

But wouldn't any filter at all - regardless of its quality - result in a blur since it's using data from outside the pixel to determine its final color? Will this improve IQ across the screen or just on polygon edges?

jamis
30-Apr-2007, 12:40
The 2600 Pro is tiny compared to the XT.
But it doesn't have extra power plug so that rumour was true.:cool: Also it seems to be single slot not dual, there were some pics earlier of dual slot XT but I wasn't sure as it looked really like HD2900.
But that pimp my plastic style is really tacky on all of them.:lol:

vertex_shader
30-Apr-2007, 12:58
But it doesn't have extra power plug so that rumour was true.:cool: Also it seems to be single slot not dual, there were some pics earlier of dual slot XT but I wasn't sure as it looked really like HD2900.
But that pimp my plastic style is really tacky on all of them.:lol:

"Please AMD pimp my card" :lol:

fellix
30-Apr-2007, 12:58
But wouldn't any filter at all - regardless of its quality - result in a blur since it's using data from outside the pixel to determine its final color? Will this improve IQ across the screen or just on polygon edges?
Imagine the difference between adjusting the plain brightness vs. the gamma levels of your monitor (be it CRT or TFT). The type of weighted falloff from the components (in our case, the sub-samples) in determining the final pixel colour value is what makes the difference.
I remember now that XGI (or some other IHV) supported a very weird AA mode, whose box-filter kernel overlapped the subsamples of all eight neighboring pixels -- that's where the filter kernel type makes a blur, with rather significant loss of detail.
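
To put rough numbers on the box-versus-tent point above, here is a minimal sketch. It is not ATI's resolve: the 1D strip, the single colour sample per pixel and the 0.25 neighbour weight are all made up for illustration.

```python
def resolve(samples, weights):
    """Weighted average of colour samples (weights are normalised here)."""
    total = sum(weights)
    return sum(s * w for s, w in zip(samples, weights)) / total


def filter_strip(strip, weights):
    """Run a 3-tap kernel (left neighbour, centre, right neighbour) over a 1D strip."""
    return [resolve(strip[i - 1:i + 2], weights) for i in range(1, len(strip) - 1)]


# Hypothetical input: a one-pixel-wide bright line on a dark background.
strip = [0.0, 0.0, 1.0, 0.0, 0.0]

box = filter_strip(strip, [1.0, 1.0, 1.0])     # neighbours count as much as the centre
tent = filter_strip(strip, [0.25, 1.0, 0.25])  # neighbours weighted down (made-up 0.25)

print("box :", [round(v, 3) for v in box])
print("tent:", [round(v, 3) for v in tent])
```

With the box kernel the line is smeared evenly over three pixels; with these tent weights the centre pixel keeps two thirds of its brightness and the neighbours pick up one sixth each.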

Razor1
30-Apr-2007, 13:50
http://www.fudzilla.com/index.php?option=com_content&task=view&id=749&Itemid=1

Could this be why the RV630 is late, and not because of the million or so orders to OEMs?

Slappi
30-Apr-2007, 14:07
http://www.fudzilla.com/index.php?option=com_content&task=view&id=749&Itemid=1

Could this be why the RV630 is late, and not because of the million or so orders to OEMs?

Could Fuad be more full of crap?

Sobek
30-Apr-2007, 14:21
Could Fuad be more full of crap?

Impossible.

More silos needed.

trinibwoy
30-Apr-2007, 14:21
:lol:

Razor1
30-Apr-2007, 14:26
Impossible.

More silos needed.


LOL

compres
30-Apr-2007, 14:41
I still wonder how the likes of Fuad can keep their job. Just looking at his articles(I say article but they might not deserve to be called that) I can tell he does not even read them before publishing. His work is about the crappiest in the whole internet right now...

BTW where is mao5? Is it bed time in China?

anaqer
30-Apr-2007, 14:45
I still wonder how the likes of Fuad can keep their job.
As long as he gets hits...

Kaotik
30-Apr-2007, 14:50
BTW where is mao5? Is it bed time in China?

I think I saw men with black masks and AMD jackets heading his way :runaway:

Geeforcer
30-Apr-2007, 14:55
I still wonder how the likes of Fuad can keep their job. Just looking at his articles, I can tell...

Question answered?

DemoCoder
30-Apr-2007, 15:31
I started rn in 1988 :) but it was BITNET/ARPNET as nobody called it internet yet.

I was MUDing in '86, reading usenet news in '85 and it was already called the Internet back then. (However, there was no rec.music or anything prior to '87, back then it was netgroups)

Jawed
30-Apr-2007, 15:32
Based on Rys's comment I'm leaning towards CFAA happening on the ROPs (might as well use that since I've got nothing else to go on :)) The slides indicate to me that CFAA works by borrowing MSAA color samples from neighbouring pixels during the resolve. So that explains why it can affect non-edge pixels. But that essentially boils down to a blur since you're blending neighbouring pixel colors. That blurring could cause a loss of detail on an otherwise properly filtered (AF) texture. In terms of performance (I think) the ROPs would need the ability to read 8 FP16 samples per clock for 16xCFAA + HDR - maybe that's what all the bandwidth is for!

Also, there doesn't seem to be any Z-trickery going on, just real color samples. So that's one obvious advantage there - it doesn't fall over when stencil shadows are used.
Good summary.

The blur should be mitigated by the use of a tent filter, which looks like /\ across the width of a pixel, instead of the classic box. The samples outside of the pixel will be weighted lower than those inside. What we can't tell, as yet, is how granular the weightings are, whether it's a tent based on radius for all samples, or a tent for only those samples outside the pixel based on distance, or a simple fractional weighting based on the fact the samples are outside the pixel.

Jawed
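
The three weighting possibilities listed above can be sketched as simple functions of a sample's distance from the pixel centre. This is only an illustration of the options as described in the post, not the actual CFAA kernel; the half-pixel boundary (0.5), the kernel radius (1.0), the 0.5 outside fraction and the sample distances are assumptions picked for the example.

```python
def tent_all(d, radius=1.0):
    """Option 1: a tent over every sample, falling linearly with distance d."""
    return max(0.0, 1.0 - d / radius)


def tent_outside_only(d, inside=0.5, radius=1.0):
    """Option 2: full weight inside the pixel, linear falloff only outside it."""
    if d <= inside:
        return 1.0
    return max(0.0, (radius - d) / (radius - inside))


def fractional_outside(d, inside=0.5, radius=1.0, fraction=0.5):
    """Option 3: full weight inside, one fixed fractional weight outside."""
    if d <= inside:
        return 1.0
    return fraction if d <= radius else 0.0


def normalise(weights):
    """Scale weights so they sum to 1, as a resolve would need."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical sample distances from the pixel centre, in pixel widths.
distances = [0.1, 0.3, 0.45, 0.7, 0.9]

for fn in (tent_all, tent_outside_only, fractional_outside):
    weights = normalise([fn(d) for d in distances])
    print(f"{fn.__name__:20s}", [round(w, 3) for w in weights])
```

In all three cases the samples beyond the pixel boundary end up with less weight than those inside; the options differ only in how granular that falloff is.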

allnighter
30-Apr-2007, 15:36
I still wonder how the likes of Fuad can keep their job. Just looking at his articles(I say article but they might not deserve to be called that) I can tell he does not even read them before publishing. His work is about the crappiest in the whole internet right now...


One could argue that we all make our contribution.
I'm sure almost everyone on these boards, and many others as well, does his/her daily check of Fudzilla and the Inq for a fresh load of nonsense, and some will go as far as looking for that rare gem of "real info", providing our beloved entertainer with some nice ad revenue.
He's got a pretty safe business model going, I'd say. Post a lot of crap and people will come and read it, if for nothing else to have some fun.
Go Fudo!

3vi1
30-Apr-2007, 15:50
One could argue that we all make our contribution.
I'm sure almost everyone on these boards, and many others as well, does his/her daily check of Fudzilla and the Inq for a fresh load of nonsense,



I don't..

Next :arrow:

fellix
30-Apr-2007, 15:58
Good summary.
...
Actually, the tent kernel is very cheap (and simple) -- generally one MUL op per sample address, i.e. a linear f(x). The main point here is the steep weighting toward the pixel center, falling off toward the outer edge, thus suppressing the blur (d)effect but leaving enough residual colour gradients.
Nevertheless, I still want my precious Gauss kernel, some day in a DX10 driver release, maybe. :lol:

BTW, I expect a very frenzied & detailed R600 analysis here, in B3D. ;)

Jawed
30-Apr-2007, 16:01
I guess that's a subtlety that most shoppers-in-the-mall wouldn't understand when comparing to a product with 128 *real* streaming processor...
Eh? G80 has 16 8-way ALUs, each of which has a 2-way SF unit. There's your instruction granularity there.

Reading across the entire unified shader, R600 has a maximum of 20 distinct instructions in flight at a time (ignoring pipeline length): 4 units with 5 instructions. G80 has 32 distinct instructions: 16 units with 2 instructions.

I wonder if they are able to encode those independent ops in a 32-bit instruction word. :wink:
How many different instructions can the 4 skinny ALUs do?:

NOP
ADD, MAD, MUL, SUB ...
integer ADD, MAD, MUL, SUB ...
binary ops (<<, >>, |, &, !),
format conversions: fp<->uint<->int and 32<->16 variants (can't remember if there's a 24-bit variant)

Can't think of anything else. That should fit in 20 bits, as 4-way x 5 bits, leaving 12 bits to cover the SF gubbins on top of that lot in the fatboy ALU.

Jawed
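
The bit budget guessed at above (four 5-bit opcodes plus a 12-bit field for the fat ALU, 32 bits in total) can be sketched as a pack/unpack pair. The field order and the helper names are hypothetical; nothing here reflects the real R600 encoding, which also has to carry operands.

```python
OPCODE_BITS = 5    # per skinny ALU, as estimated in the post
SKINNY_ALUS = 4
SF_BITS = 12       # what is left of a 32-bit word for the fat ALU


def pack(skinny_ops, sf_field):
    """Pack four 5-bit opcodes and one 12-bit field into a 32-bit word."""
    if len(skinny_ops) != SKINNY_ALUS:
        raise ValueError("expected one opcode per skinny ALU")
    word = 0
    for i, op in enumerate(skinny_ops):
        if not 0 <= op < (1 << OPCODE_BITS):
            raise ValueError("opcode does not fit in 5 bits")
        word |= op << (i * OPCODE_BITS)
    if not 0 <= sf_field < (1 << SF_BITS):
        raise ValueError("SF field does not fit in 12 bits")
    return word | (sf_field << (SKINNY_ALUS * OPCODE_BITS))


def unpack(word):
    """Inverse of pack(): returns (list of four opcodes, SF field)."""
    mask = (1 << OPCODE_BITS) - 1
    ops = [(word >> (i * OPCODE_BITS)) & mask for i in range(SKINNY_ALUS)]
    return ops, word >> (SKINNY_ALUS * OPCODE_BITS)


word = pack([1, 2, 3, 4], 7)
print(hex(word), unpack(word))
print("total bits:", SKINNY_ALUS * OPCODE_BITS + SF_BITS)
```

Five bits per skinny ALU allows 32 distinct opcodes each, which is the room the list of ops above has to fit into.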

neliz
30-Apr-2007, 16:03
fuad's A12 story just recycles what inq posted a couple of weeks back about going from A12 to A14 for the 2600...

The weird thing is.. those 2600's don't have a PCIe power connector but the ones in crossfire on AMD's forum do. One could argue that AMD went the extra mile on a re-spin to bring power requirements down a notch

Jawed
30-Apr-2007, 16:04
Actually, the tent kernel is very cheap (and simple) -- generally one MUL op per sample address, i.e. a linear f(x). The main point here is the steep weighting toward the pixel center, falling off toward the outer edge, thus suppressing the blur (d)effect but leaving enough residual colour gradients.
I was only cautioning that it may not look like a teepee kind of tent. It might have a flat top and steep sides or a flat top and stepped sides.

Jawed

silent_guy
30-Apr-2007, 16:26
How many different instructions can the 4 skinny ALUs do?:

NOP
ADD, MAD, MUL, SUB ...
integer ADD, MAD, MUL, SUB ...
binary ops (<<, >>, |, &, !),
format conversions: fp<->uint<->int and 32<->16 variants (can't remember if there's a 24-bit variant)

Can't think of anything else. That should fit in 20 bits, as 4-way x 5 bits, leaving 12 bits to cover the SF gubbins on top of that lot in the fatboy ALU.

That covers the instruction type. What about the register selection? If each is doing a MAD and they are truly independent, you're looking at 3 operands and 1 destination * 5 = 20 register selections. If the register file is 32 deep, you're looking at 100 bits just for that.
But that would only be the case for a MAD.

It wouldn't surprise me if a MAD instruction will still have some restrictions wrt register dependence.
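
The register-selection arithmetic above, written out. The function name is made up for the example, and the 32-deep register file is the post's own "if", not a known R600 figure.

```python
def register_select_bits(alus, operands, destinations, register_file_depth):
    """Return (number of register selections, total bits needed to encode them)."""
    bits_per_select = (register_file_depth - 1).bit_length()  # 32 entries -> 5 bits
    selections = alus * (operands + destinations)
    return selections, selections * bits_per_select


# Five independent MADs: 3 source operands and 1 destination each.
selections, bits = register_select_bits(alus=5, operands=3, destinations=1,
                                        register_file_depth=32)
print(selections, "register selections,", bits, "bits")
```

A deeper register file only makes this worse: at 128 entries the same five MADs would need 140 bits of register selection.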

mao5
30-Apr-2007, 16:48
I'm coming with a Test Drive Unlimited gift http://www.chiphell.com/images/smilies/xizao.gif

1/R600XT, default E6600, default R600XT, ALL MAX 4xAA+HDR (1280x1024)
http://www.chiphell.com/attachments/month_0704/20070430_0400fa82c37352d946d3L3kMK47f6XI8.jpg

http://www.chiphell.com/attachments/month_0704/20070430_c75ef0d929375639ae83W9YksBRHKU2s.jpg

2/3.36g 6300 + x1950xtx, no game in setting info (1280x1024)
http://www.chiphell.com/attachments/month_0704/20070430_5bd953e4fed00463d6e3dhLNdgeZ0N2n.jpg

3/3.52G E6400 2GB ram 660/2200 8800GTX ALL MAX 4xAA +HDR (1280x960)

http://www.chiphell.com/attachments/month_0704/20070430_9372aaadc1957f360411Nkow9FjyyONs.jpg

more 3.52G E6400 2GB ram 660/2200 8800GTX (1280x960) pics:
http://we.pcinlife.com/thread-733559-1-1.html
http://we.pcinlife.com/thread-733065-1-1.html

Mod: Don't inline link images of that size!

tertsi
30-Apr-2007, 16:57
Yes, the R600XT can co-issue 5 vectors/scalars, and in some cases R600XT is over 2x faster than 8800 GTX.

Test instruction set (length 384, no texture fetch and 100 iterations)
MAD R0.xyz, R0, R0, R1;
MUL R2.x, R2, R3;
MAD R1.x, R1, R1, R3;
MAD R0.xyz, R1, R1, R0;
ADD R2.x, R1, R2;
MUL R3.x, R3, R1;


R600XT (co-issue 3 instructions) - 93,9277 GInstr/sec

8800 GTX - 39,1998 GInstr/sec

nicolasb
30-Apr-2007, 16:59
There will be a DVI-HDMI convert in retail box, R6XX DVI can send 5.1 ss out
Since when is it possible to transmit digital audio via a DVI connector? Surely part of the reason HDMI exists at all is precisely that it adds to DVI the ability to transmit video and audio down the same cable?

R300King!
30-Apr-2007, 17:01
From German site

http://directupload.com/files/hzmynwgg5zjdzniufoio.jpg
http://directupload.com/files/j2rwj2tyjxmcamngnimz.jpeg
http://directupload.com/files/nizonm4myzznr0ndjlma.jpg
http://directupload.com/files/myjzmivzgmyw5iwzdgxi.jpg

http://directupload.com/files/yyijk2zw5qmq3mmojdyd.jpg

Kaotik
30-Apr-2007, 17:10
The last doesn't seem as official as the rest; wasn't the general impression that it has 16x32bit, not 8x64bit controllers?

nAo
30-Apr-2007, 17:13
Integer support only via the fat ALU?

silent_guy
30-Apr-2007, 17:14
From German site...
Thanks! VLIW was the magic word I was looking for.

R300King!
30-Apr-2007, 17:14
The last doesn't seem as official as the rest; wasn't the general impression that it has 16x32bit, not 8x64bit controllers?

Honestly, not sure, I just reposted the images from www.3dcenter.de forum. The last image posted was by a different forum member than the first images.

oeLangOetan
30-Apr-2007, 17:32
Integer support only via the fat ALU?

The fat alu is probably the special function unit

2k internal ringbus, :o

Jawed
30-Apr-2007, 17:33
That covers the instruction type. What about the register selection?
It's not just registers for operands, though. It's also possible to have indexed constants. There are 16 constant arrays (actually that's just what can be bound to a shader), each constant can have 4096 entries and each entry can have 4 elements. I didn't realise you meant to include stuff beyond the type!

Also, it seems that R6xx will be able to directly read/write virtual memory addresses, which will be a chunk of bits. Maybe I've got the wrong end of the stick though and only TUs and RBEs can address memory.

If each is doing a MAD and they are truly independent, you're looking at 3 operands and 1 destination * 5 = 20 register selections. If the register file is 32 deep, you're looking at 100 bits just for that.
But that would only be the case for a MAD.
Yeah, hurts my head. Hoping we'll get some nice detail on this.

It wouldn't surprise me if a MAD instruction will still have some restrictions wrt register dependence.
In the past ATI has been very proud of the fact that its ALU pipeline has not had any operand fetch bandwidth restrictions. I kinda hope they've maintained that tradition.

Theoretically, R3xx...R5xx are all capable of fetching 5 vec4 operands per clock, to feed the MAD+ADD pipeline. But I've never seen it stated in such cold, hard terms. It was years before ATI admitted to the presence of the vec4 ADD unit in the R3xx-and-up ALU pipeline...

Jawed

nAo
30-Apr-2007, 17:42
The fat alu is probably the special function unit

That's for sure, but since they just mention "integer and logic ops support" it makes me think that these ops are not orthogonal to any ALU. I might be wrong of course.

Frank
30-Apr-2007, 17:46
Well, the specs certainly look very yummy! And close to what Jawed and I originally speculated about! Only with a lot stricter scheduling.

So, it does seem to be a case of bad drivers, after all. If it delivers, it should surely dethrone the 8800 easily.


I think they have hidden a large operand cache on the chip, to store all those VLIW ops. Think about it: they need something in the order of a 128 bit word each clock for each ALU simply to keep it going.

Diamond.G
30-Apr-2007, 17:52
Since when is it possible to transmit digital audio via a DVI connector? Surely part of the reason HDMI exists at all is precisely that it adds to DVI the ability to transmit video and audio down the same cable?

Higher bandwidth and mandatory HDCP (over HDMI) support are some others that come to mind.

nicolasb
30-Apr-2007, 17:57
Higher bandwidth and mandatory HDCP (over HDMI) support are some others that come to mind.
I did say part of the reason :); there are lots of other factors: resilience of signal is another (can be quite hard work getting DVI to work over a 10-metre cable, but not too tough with HDMI). But mao was claiming that R600 would output audio via DVI (and hence into an HDMI cable via a dongle); I wasn't aware that was possible.

Dave Baumann
30-Apr-2007, 18:00
Higher bandwidth and mandatory HDCP (over HDMI) support are some others that come to mind.
Technically HDMI does not mandate HDCP; it's just that the likely implementations mean they will nearly always go hand in hand.

Jawed
30-Apr-2007, 18:11
Well, the specs certainly look very yummy! And close to what Jawed and I originally speculated about! Only with a lot stricter scheduling.
Well, for me, thread packing was a big step beyond this, aimed at improving branching at the same time as throughput. R3xx...R5xx can issue 4 ALU instructions per clock (I think; at least 3).

So, it does seem to be a case of bad drivers, after all. If it delivers, it should surely dethrone the 8800 easily.
I think it's really the case that most DX9 games (that are capable of stretching G80/R600) are ROP or TMU limited (turn on all the shadows and make the trees look high res). I'm really doubtful we'll see much advantage for R600 due solely to ALU instruction throughput.

I think they have hidden a large operand cache on the chip, to store all those VLIW ops.
Confused, do you mean instruction cache?

Think about it: they need something in the order of a 128 bit word each clock for each ALU simply to keep it going.
I'm struggling to see how this is markedly different from R3xx or NV4x (or NV3x). Aren't these all VLIW processors? Don't forget the texture instructions too. Ooh, and the vertex fetches. And memory reads/writes (gather/scatter)?

Jawed

the maddman
30-Apr-2007, 18:19
I did say part of the reason :); there are lots of other factors: resilience of signal is another (can be quite hard work getting DVI to work over a 10-metre cable, but not too tough with HDMI). But mao was claiming that R600 would output audio via DVI (and hence into an HDMI cable via a dongle); I wasn't aware that was possible.

How about using the analog pins on the DVI along with a proprietary dongle to send the audio? They should have enough bandwidth to carry the signal, and aren't being used when it's sending digital video.

fellix
30-Apr-2007, 18:22
The last shot of the ring-bus isn't showing any difference in the topology, compared to the R500 line, as the earlier slides stated -- just the doubled internal and external data channels. :roll:

So, R600 seems to be a scalar outfit, just like G80, but with rather different structural stacking.

stevem
30-Apr-2007, 18:30
Dave, allowed to use a keyboard again?

The last doesn't seem as official as the rest; wasn't the general impression that it has 16x32bit, not 8x64bit controllers?
Hmm, both GDDR3/4 can be arranged as 16Mx32bit (8banksx2M)=512Mbit/8=64bit. The relationship between ring-stops/controller width hasn't been disclosed (AFAICR). The final pic indicates 8x64bit channels as per the 3,3,2 layout of DRAMs on the PCB. It also suggests that memories can share a channel, so back to back mounting lines up with 2900XT board pics. Of course it could also be expediency of layout.
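
A quick check of the bus arithmetic in this exchange: either controller arrangement adds up to the same external width, and a 64-bit channel leaves room for two 32-bit DRAMs. The helper names are invented for the example, and whether R600 really pairs chips per channel is the post's reading of the slide, not something confirmed here.

```python
def bus_width(controllers, bits_each):
    """Total external memory bus width in bits."""
    return controllers * bits_each


def drams_per_channel(channel_bits, dram_io_bits=32):
    """How many x32 DRAMs fit side by side on one channel."""
    return channel_bits // dram_io_bits


def chip_capacity_mbit(mega_words, io_bits):
    """Capacity of a DRAM organised as `mega_words` M words of `io_bits` bits."""
    return mega_words * io_bits


print(bus_width(16, 32), bus_width(8, 64))        # both arrangements, in bits
print(drams_per_channel(64))                      # chips sharing a 64-bit channel
print(chip_capacity_mbit(16, 32), "Mbit =",
      chip_capacity_mbit(16, 32) // 8, "MB per 16Mx32 chip")
```

So the 16x32-bit versus 8x64-bit question does not change the total width; it only changes how the DRAMs are grouped behind each controller.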

Frank
30-Apr-2007, 18:34
Well, for me, thread packing was a big step beyond this, aimed at improving branching at the same time as throughput. R3xx...R5xx can issue 4 ALU instructions per clock (I think; at least 3).
Agreed.

I think it's really the case that most DX9 games (that are capable of stretching G80/R600) are ROP or TMU limited (turn on all the shadows and make the trees look high res). I'm really doubtful we'll see much advantage for R600 due solely to ALU instruction throughput.
That might be the case. But then again, the performance hit from better filtering would be reduced, and it makes sense to do preprocessing through the vertex shaders that would have been too costly in the past. Less shading for the pixel shaders as well.

Confused, do you mean instruction cache?
Yes, sorry.

I'm struggling to see how this is markedly different from R3xx or NV4x (or NV3x). Aren't these all VLIW processors? Don't forget the texture instructions too. Ooh, and the vertex fetches. And memory reads/writes (gather/scatter)?
Yes, but you have many more ALUs, that can do more things as well. As they're scalar, that alone increases the amount of instructions needed five times. And there are L1 and L2 caches for those other things as well.

And while a ringbus is really neat, it does require a lot of additional buffers as well, including a good scheduling mechanism that needs to be aware of the data required plenty in advance. Switching to a different thread is no solution either, because that requires a data refresh as well.

wishiknew
30-Apr-2007, 18:55
This stuff is getting more and more over my head. I'm curious how hardocp will interpret it.

Natoma
30-Apr-2007, 18:56
This stuff is getting more and more over my head. I'm curious how hardocp will interpret it.

Probably the same way you did. :wink:

stevem
30-Apr-2007, 18:59
The last shot of the ring-bus isn't showing any difference in the topology, compared to the R500 line, as the earlier slides stated -- just the doubled internal and external data channels. :roll:
This previous slide indicates some differences as x1k design used an internal switch/x-bar. I guess the exact topology is unknown.
http://img106.imageshack.us/img106/9016/ringbusii1fw7jf0.jpg (http://imageshack.us)

wishiknew
30-Apr-2007, 19:02
Probably the same way you did. :wink:

:(

I need a 3D for dummies book now.

And the r/v6x0 series supporting pcf, does this mean those complaints about shadows not rendering right will go away?

Frank
30-Apr-2007, 19:05
:(

I need a 3D for dummies book now.
You might be better off with a "CPUs and distributed computing for dummies" book, as that's more or less what the GPUs are becoming.

;)

mao5
30-Apr-2007, 19:17
Seems nobody is paying attention to the Test Drive Unlimited comparison test.

trinibwoy
30-Apr-2007, 19:18
R600XT (co-issue 3 instructions) - 93,9277 GInstr/sec

8800 GTX - 39,1998 GInstr/sec

Those numbers seem to indicate that not only is R600 faster than G80 but it's also more efficient in terms of ALU usage. That's puzzling since R600's max theoretical advantage is ~ 40% assuming perfect utilization and ignoring G80's MUL.

tertsi, what are the theoretical maximums for instruction throughput on this test given what you know of G80 and R600?
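
For reference, the gap between the measured ratio and the paper ratio can be worked out as below. The GInstr/sec figures are the ones posted (decimal commas turned into points). The R600XT ALU count and clock (320 scalar ALUs at 742 MHz) are assumptions that are not stated in this part of the thread; the G80 side uses 128 ALUs and the 1350 MHz stock shader clock mentioned earlier.

```python
# Measured throughput from the shader test, in GInstr/sec.
measured = {"R600XT": 93.9277, "8800 GTX": 39.1998}


def peak_gmads(alus, clock_mhz):
    """Peak scalar MADs per second in billions, one MAD per ALU per clock."""
    return alus * clock_mhz / 1000.0


# ASSUMPTION: 320 ALUs at 742 MHz for R600XT (not given in the thread section).
r600_peak = peak_gmads(320, 742)
# 128 ALUs at 1350 MHz for the 8800 GTX, ignoring the extra MUL.
g80_peak = peak_gmads(128, 1350)

measured_ratio = measured["R600XT"] / measured["8800 GTX"]
theoretical_ratio = r600_peak / g80_peak

print(f"measured ratio:    {measured_ratio:.2f}x")
print(f"theoretical ratio: {theoretical_ratio:.2f}x")
```

Under those assumptions the paper advantage is a little under 40%, while the posted numbers are about 2.4x apart, which is the puzzle being raised here.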

satein
30-Apr-2007, 19:18
Additional slides about AA on R6xx... (Posted at Xtremesystems)

http://img135.imageshack.us/img135/6019/amd1dk8.jpg

http://img61.imageshack.us/img61/654/amd2zd4.jpg

Jawed, you're the man!! You are right on a tent filter!!

Good summary.

The blur should be mitigated by the use of a tent filter, which looks like /\ across the width of a pixel, instead of the classic box. The samples outside of the pixel will be weighted lower than those inside. What we can't tell, as yet, is how granular the weightings are, whether it's a tent based on radius for all samples, or a tent for only those samples outside the pixel based on distance, or a simple fractional weighting based on the fact the samples are outside the pixel.

Jawed

Kaotik
30-Apr-2007, 19:18
seems nobody pay attention to Test Drive Unlimited compare test.

Give 8800GTS & GTX numbers if possible from the same view as R600 shots and they're a lot more interesting :wink:
But the performance seemed quite nice, even though you can't compare directly when the scenes are different

mao5
30-Apr-2007, 19:19
A rumor suggests that there's no UVD in the Radeon HD 2900XT. Can anyone confirm it?

mao5
30-Apr-2007, 19:20
Give 8800GTS & GTX numbers if possible from the same view as R600 shots and they're a lot more interesting :wink:
But the performance seemed quite nice, even though you can't compare directly when the scenes are different

Mh, you are right, but the tester doesn't have a GTX on hand.

fellix
30-Apr-2007, 19:23
Additional slides about AA on R6xx...
Those were already posted a few moons earlier in the thread. ;)

Robin B
30-Apr-2007, 19:27
seems nobody pay attention to Test Drive Unlimited compare test.

The tests are only done at 1280x1024; why not test at 1680x1050 or 1600x1200? I don't think there are many hardcore gamers that play games at 1280x1024 anymore.

trinibwoy
30-Apr-2007, 19:34
seems nobody pay attention to Test Drive Unlimited compare test.

Maybe because all the shots were different and the performance was based on a single frame?

SugarCoat
30-Apr-2007, 19:37
The tests are only done at 1280x1024; why not test at 1680x1050 or 1600x1200? I don't think there are many hardcore gamers that play games at 1280x1024 anymore.

you'd be thinking wrong.


800 x 600 32,563 2.79 %
1024 x 768 483,863 41.53 %
1152 x 864 77,255 6.63 %
1280 x 1024 462,111 39.66 %
1440 x 900 36,775 3.16 %
1600 x 1200 21,472 1.84 %
1680 x 1050 26,851 2.30 %
1920 x 1200 11,917 1.02 %
Other 12,380 1.06 %

Those two unquestionably DOMINATE. Now if you want to talk about the resolution of choice for those who spend $500+ on a high-end card, then that's different ;). Most hardcore gamers don't even know what the hell a G80 is, let alone an R600; they just play games until their computer needs to be replaced.

Not to mention the growing popularity of LCD monitors, and the fact that ones with a native resolution above 1280x1024 come with a pretty big price hike. My monitor cost me about $300 when I bought it; a decent one that did 1600x1200 was at least double that.
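
The shares in the table can be re-derived from the raw counts. A small sketch using only the numbers posted above; the variable names are arbitrary.

```python
# Resolution -> number of respondents, as listed in the table above.
counts = {
    "800 x 600": 32563,
    "1024 x 768": 483863,
    "1152 x 864": 77255,
    "1280 x 1024": 462111,
    "1440 x 900": 36775,
    "1600 x 1200": 21472,
    "1680 x 1050": 26851,
    "1920 x 1200": 11917,
    "Other": 12380,
}

total = sum(counts.values())
share = {res: 100.0 * n / total for res, n in counts.items()}

for res, pct in sorted(share.items(), key=lambda item: item[1], reverse=True):
    print(f"{res:12s} {pct:6.2f} %")

top_two = share["1024 x 768"] + share["1280 x 1024"]
print(f"1024x768 + 1280x1024 combined: {top_two:.2f} %")
```

The two leading resolutions together account for a bit over 81% of the sample, which is what "DOMINATE" rests on.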

Kaotik
30-Apr-2007, 19:38
rumor suggests that there's no UVD in Radeon HD 2900XT? Anyone can confirm it?

I'm not 100% sure, but I think in the Hardspell.com's article, which had a lot of UVD stuff including drivers installed, showing the controls etc, they had HD 2900XT on it.

The article is already gone though, I think

flippin_waffles
30-Apr-2007, 19:38
But the framerates were the same at both 104 mph and 0 mph. Is that significant?

Moloch
30-Apr-2007, 19:50
you'd be thinking wrong.


800 x 600 32,563 2.79 %
1024 x 768 483,863 41.53 %
1152 x 864 77,255 6.63 %
1280 x 1024 462,111 39.66 %
1440 x 900 36,775 3.16 %
1600 x 1200 21,472 1.84 %
1680 x 1050 26,851 2.30 %
1920 x 1200 11,917 1.02 %
Other 12,380 1.06 %

Those two unquestionably DOMINATE. Now if you want to talk about the resolution of choice for those who spend $500+ on a high-end card, then that's different ;). Most hardcore gamers don't even know what the hell a G80 is, let alone an R600; they just play games until their computer needs to be replaced.

Not to mention the growing popularity of LCD monitors, and the fact that ones with a native resolution above 1280x1024 come with a pretty big price hike. My monitor cost me about $300 when I bought it; a decent one that did 1600x1200 was at least double that.
You can get (many) a 22" WS that does 1680x1050 for well under 400 clams :wink:
Closer to 300 in fact.
Only problem is they're all TN panels, but that's supposed to be the "gamer" panel.

Kaotik
30-Apr-2007, 19:59
You can get (many) a 22" WS that does 1680x1050 for well under 400 clams :wink:
Closer to 300 in fact.
Only problem is they're all TN panels, but that's supposed to be the "gamer" panel.

Ye well, you know that 400, even 300 is damn big money for some to spend :wink:

Moloch
30-Apr-2007, 20:03
Ye well, you know that 400, even 300 is damn big money for some to spend :wink:
if you spend 400 on a videocard, why not spend close to that on a display ;)
No sense buying a high end card for gaming at 1280 or even 1024 :smile:

Frank
30-Apr-2007, 20:10
if you spend 400 on a videocard, why not spend close to that on a display ;)
No sense buying a high end card for gaming at 1280 or even 1024 :smile:
If you have to choose between high-res or maxed settings or a high framerate, what do you choose? And that's when you have the card and monitor to be able to make that choice in the first place?

compres
30-Apr-2007, 20:12
Question answered?

LOL I = pwned?

To be honest it is hard to avoid clicking the links when someone quotes him...

dess
30-Apr-2007, 20:30
A slide on the last page is titled "SIMD Arrays". But then it says that "Each instruction word can include up to 6 independent, co-issued operations", and "All operations are performed in parallel on each data element in the current thread". Now, is it really SIMD, or is it MIMD, instead?
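
One way to read the two slide statements together, offered as an interpretation rather than anything AMD has confirmed: each instruction word is a VLIW bundle of independent operations, and that same bundle is applied in lockstep to every data element, which is still SIMD across elements. A toy sketch of that reading, with invented register names and a two-op bundle:

```python
def run_bundle(bundle, elements):
    """Execute one VLIW-style bundle on every data element.

    bundle:   list of (dest, function, source register names); the ops are
              independent, so each one reads the register values from before
              the bundle started (co-issue).
    elements: list of register dicts, one per data element; all of them run
              the same bundle (SIMD across elements).
    """
    for regs in elements:
        results = {dest: fn(*(regs[s] for s in srcs)) for dest, fn, srcs in bundle}
        regs.update(results)
    return elements


# Hypothetical bundle: two independent ops issued in one instruction word.
bundle = [
    ("x", lambda a, b: a + b, ("a", "b")),
    ("y", lambda b, c: b * c, ("b", "c")),
]

elements = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 10, "b": 20, "c": 30},
]

run_bundle(bundle, elements)
print(elements)
```

In this picture there is a single instruction stream, so it would not be MIMD: the parallelism inside the word is instruction-level, and the parallelism across elements is data-level.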

Bouncing Zabaglione Bros.
30-Apr-2007, 20:37
if you spend 400 on a videocard, why not spend close to that on a display ;)
No sense buying a high end card for gaming at 1280 or even 1024 :smile:


I'd rather have a lower res with every bit of eyecandy, AA, high IQ and special effect at good frame rates, than have a high res, but have to turn off AA or the eyecandy to get good framerate.

Bjorn
30-Apr-2007, 20:40
So, it does seem to be a case of bad drivers, after all. If it delivers, it should surely dethrone the 8800 easily.


So AMD is going to release the 2900 XT at 400$ knowing that it'll soon demolish the 8800 GTX, and it's just the drivers that are holding it back?

Not saying that it's impossible, just that it sounds a bit too much like wishful thinking.

Sound_Card
30-Apr-2007, 20:40
I'd rather have a lower res with every bit of eyecandy, AA, high IQ and special effect at good frame rates, than have a high res, but have to turn off AA or the eyecandy to get good framerate.

I'd rather have high res with eye candy. :razz:

My wallet, however, says no.

Robin B
30-Apr-2007, 20:46
I'd rather have high res with eye candy. :razz:

My wallet, however, says no.

My wallet says yes, my wife says no.:cry:

Sound_Card
30-Apr-2007, 20:47
So AMD is going to release the 2900 XT at 400$ knowing that it'll soon demolish the 8800 GTX, and it's just the drivers that are holding it back?

Not saying that it's impossible, just that it sounds a bit too much like wishful thinking.

One way to look at it is that they are 6 months late. Perhaps they feel they can catch a wider market with a different price point to help regain lost ground.

trinibwoy
30-Apr-2007, 21:02
I could understand undercutting the GTX to regain ground but undercutting the GTS too? That's pretty aggressive.

Kocur
30-Apr-2007, 21:02
Do you guys really think that 2900XT will demolish 8800GTX?

Why?

I expect the Radeon to end up faster, but not by much, up to 20%.
In some very peculiar shader bound situations (very peculiar, that is, all the shading power can be used at once and with the maximal attainable effectiveness) its lead might increase to 40%, not more.

What do you think?

Silent_Buddha
30-Apr-2007, 21:19
I was under the impression, from the innuendo of people under NDA, that the XTX, not the XT, is competitive with and at times ahead of the GTX.

And that both the XT and XTX appear to have a substantial lead in DX10 tests.

Then again I could be mis-interpreting what they are actually trying to say without saying it. :P

We'll know soon enough when the NDA expires.

Also, I'm still under the impression that the "real" XTX is the dual chip Dragonshead 2 or whatever it was called. Meaning the original XTX was possibly cancelled due to GDDR 4 not providing a performance boost as they were hoping for. And thus they moved the dual chip solution into that slot. This is of course almost purely speculative guessing... :wink:

Regards,
SB

flopper
30-Apr-2007, 21:20
Do you guys really think that 2900XT will demolish 8800GTX?

Why?

I expect the Radeon to end up faster, but not by much, up to 20%.
In some very peculiar shader bound situations (very peculiar, that is, all the shading power can be used at once and with the maximal attainable effectiveness) its lead might increase to 40%, not more.

What do you think?

It's technologically interesting, with features that seem really nice. It also overclocks a lot.
I like overclocking; it feels good.

If it goes head to head with a GTS and beats it, and overclocked beats a GTX, then it's a no-brainer at a price tag of 400 USD or whatever it becomes here.
Similar or even better performance with better image quality is a winner in my book.
You don't need the ultimate card if the performance is enough at a specific price point.
However, it's hard to say how well it will do.
I bet in Windows Vista it will be a really good card for gamers.
In XP, it will be so-so.

Rangers
30-Apr-2007, 21:40
Well, the specs certainly look very yummy! And close to what Jawed and I originally speculated about! Only with a lot stricter scheduling.

So, it does seem to be a case of bad drivers, after all. If it delivers, it should surely dethrone the 8800 easily.


It sounds like the SHADERS are ace, yes, but it'll probably be held back by 16 TMUs/ROPs, remember?

compres
30-Apr-2007, 21:41
It sounds like the SHADERS are ace, yes, but it'll probably held back by 16TMUS/ROPS remember?

Have you even seen the information about the rops? They are certainly not 16 r580 rops...

Farhan
30-Apr-2007, 21:43
Have you even seen the information about the rops? They are certainly not 16 r580 rops...

I haven't. Where is this information?

Rangers
30-Apr-2007, 21:45
Have you even seen the information about the rops? They are certainly not 16 r580 rops...

I'm more worried about the TMU capabilities anyway. Besides, I'm stupid compared to everybody else in this thread, so I don't know how to interpret such things anyway.

Kaotik
30-Apr-2007, 21:45
if you spend 400 on a videocard, why not spend close to that on a display ;)
No sense buying a high end card for gaming at 1280 or even 1024 :smile:

True, but then again, what if your current card isn't fast enough to enjoy it on the higher res display, and you can't ditch 800 on 'em to get both?
I personally got myself X1800XL used, so didn't quite pay 400 or even 300, not even close :razz:

Sound_Card
30-Apr-2007, 21:46
I could understand undercutting the GTX to regain ground but undercutting the GTS too? That's pretty aggressive.

$399 is not undercutting the GTS. That's $80 over the 320MB and $50 over the 640MB. Average of course.

vertex_shader
30-Apr-2007, 21:51
It seems nobody is paying attention to the Test Drive Unlimited comparison test.

In that area of the game the R600 XT gets 63 fps (http://www.chiphell.com/attachments/month_0704/20070430_0400fa82c37352d946d3L3kMK47f6XI8.jpg), which is not really impressive.

An E4300 with an X1950 Pro at 1280x1024, max settings, 4xAA+HDR gets 30 fps (http://img71.imageshack.us/img71/848/testdriveunlimitedgh2.jpg). As in FEAR, the XT is only about 2x as fast as the X1950 Pro, so I think the XT can't beat the GTX in any of today's games (except maybe at some cherry-picked resolution and setting). So the XT is not on par with the GTX performance-wise, as some rumors/users suggest.

I'm slowly becoming sure the R600 was made for DX10 and not for DX9, but maybe ~5 DX10 games are coming this year versus ~150 DX9 games, so I think ATI is on the wrong path overall and doesn't have any enthusiast card in its pocket. Enthusiast users will cry :wink:

There is still some hope left that the drivers can get better, but for that we need to wait for weeks.

INKster
30-Apr-2007, 21:53
$399 is not undercuting the GTS. That's $80 over the 320mb and $50 over the 640mb. Average of course.

$80 over the 320MB GTS ?
I would say it's more in the region of 100 to 120 USD over it.

trinibwoy
30-Apr-2007, 21:55
$399 is not undercuting the GTS. That's $80 over the 320mb and $50 over the 640mb. Average of course.

Lemme guess...you're comparing MSRP to street price again? The 640MB GTS is $449 MSRP. Not sure why you're even mentioning the 320MB card.....

compres
30-Apr-2007, 21:57
I haven't. Where is this information?

http://forum.beyond3d.com/showpost.php?p=978460&postcount=3629

Silent_Buddha
30-Apr-2007, 22:02
It depends. I'm wondering how many game companies will play with DX10 features in patches to existing games. A la Far Cry with SM 3.0 as a past example. And like Call of Juarez and Just Cause with DX10.

I'm guessing not a lot, but there's a few others. There was that screen of Company of Heroes with a DX10 option that wasn't enabled yet. Likewise the Flight Sim X crew appear to be working on DX10 features.

I don't think DX10 performance is going to sell this generation of cards, however...

I only buy a new graphics card at most once a year. Assuming R600 performs close to but slightly slower than G80 in DX9 games, but performs quite a bit better in DX10. Then the R600 may be the better card for me.

Reminds me a lot of the FX5800. Where NV had competitive and sometimes faster DX8 performance but fell far short when it came to DX9 perf. I'm hoping this isn't the case with G80. And that NV is just holding something back in the drivers.

EG - I'm hoping that it's just DX10 borked driver for G80 right now that makes it appear to be not so good in DX10. Granted the only thing I have to judge by is that one slide of Call of Juarez DX10 that may or may not be accurate.

So in the end this might just be a dead end train of thought with no connection to reality. :lol:

Regards,
SB

Sound_Card
30-Apr-2007, 22:04
Lemme guess...you're comparing MSRP to street price again? The 640MB GTS is $449 MSRP. Not sure why you're even mentioning the 320MB card.....

I know the difference; I was the one that pointed it out to you on the X1950 Pro.
Ahh, and it seems I was wrong too. I must have got confused with the 320MB GTS, because I would have sworn the 640MB GTS was lower than this (http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Description=8800GTS&x=0&y=0).

So damn, that really is undercutting the GTS. :shock:

Mintmaster
30-Apr-2007, 22:05
Yes, the R600XT can co-issue 5 vectors/scalars, and in some cases R600XT is over 2x faster than 8800 GTX.

Test instruction set (length 384, no texture fetch and 100 iterations)
MAD R0.xyz, R0, R0, R1;
MUL R2.x, R2, R3;
MAD R1.x, R1, R1, R3;
MAD R0.xyz, R1, R1, R0;
ADD R2.x, R1, R2;
MUL R3.x, R3, R1;


R600XT (co-issue 3 instructions) - 93,9277 GInstr/sec

8800 GTX - 39,1998 GInstr/sec

Why is the 8800GTX so slow in this? I see 10 scalar instructions among the 6 total instructions, so theoretically shouldn't it be 1.35*128*6/10 = 104 GInstr/sec?

It's almost as if the GTX is treating that set as 6 4D instructions and masking the rest. There must be significant compiler issues. On the other hand, the R600XT rate doesn't reach the naive theoretical peak either (unless it's clocked at 500MHz), so who knows what's going on.
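
The arithmetic above can be spelled out as a rough sketch. The G80 clock (1.35GHz) and ALU count (128) are the figures used in the post; the count of 64 five-wide units for R600 is an assumption that is not stated in this thread, and all function names are made up for illustration.

```python
# (opcode, number of components written) for the six test instructions.
SHADER = [("MAD", 3), ("MUL", 1), ("MAD", 1), ("MAD", 3), ("ADD", 1), ("MUL", 1)]

G80_CLOCK_GHZ = 1.35     # shader clock used in the post
G80_SCALAR_ALUS = 128    # scalar ALU count used in the post
R600_UNITS = 64          # ASSUMPTION: number of 5-wide units, not stated in the thread
R600_MEASURED = 93.9277  # GInstr/sec from the quoted test
R600_COISSUE = 3         # instructions co-issued per clock, per the quoted result


def scalar_ops(shader):
    """Total scalar components written by the instruction sequence."""
    return sum(width for _, width in shader)


def g80_ideal_rate(shader=SHADER):
    """GInstr/sec if each scalar ALU retires one component per clock."""
    return G80_CLOCK_GHZ * G80_SCALAR_ALUS * len(shader) / scalar_ops(shader)


def g80_padded_rate(shader=SHADER):
    """GInstr/sec if every instruction costs four components (masked vec4)."""
    return G80_CLOCK_GHZ * G80_SCALAR_ALUS * len(shader) / (4 * len(shader))


def r600_implied_clock_ghz():
    """Clock at which the naive co-issue peak would equal the measured rate."""
    return R600_MEASURED / (R600_UNITS * R600_COISSUE)
```

Under these assumptions the ideal G80 figure is about 103.7 GInstr/sec, the padded one is 43.2 (not far from the measured 39.2), and the R600 result corresponds to a clock just under 500MHz.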

Farhan
30-Apr-2007, 22:10
http://forum.beyond3d.com/showpost.php?p=978460&postcount=3629

That's the TMUs. Not the ROPs.

Jawed
30-Apr-2007, 22:12
That might be the case. But then again, the performance hit from better filtering would be reduced, and it makes sense to do preprocessing through the vertex shaders that would be too costly in the past. Less shading for the pixelshaders as well.
The question is, can this stuff be applied to DX9 games that are already out there?

Yes, but you have many more ALUs, that can do more things as well. As they're scalar, that alone increases the amount of instructions needed five times.
Hmm, compared with R5xx, which could issue 4 instructions per clock (or 3...?), I don't see how 5 instructions is a big change in terms of instruction issue.

Operand fetch, that's a different problem and seemingly a huge change. Maybe we're talking at cross-purposes: you're including operand fetch as part of instruction issue?

I'm curious to see what the repeat rate for an instruction is. In R5xx it's 4, and the pipeline alternates between 2 batches, making for 8 clocks total latency per pixel-instruction.

And while a ringbus is really neat, it does require a lot of additional buffers as well, including a good scheduling mechanism that needs to be aware of the data required plenty in advance.
This is where some kind of scoreboarding comes into play. ATI's had plenty of practice doing this stuff in R5xx and Xenos. Arguably the asynchronous texturing in R3xx provided an early taste... I think R600 is evolutionary in this respect.

Switching to a different thread is no solution either, because that requires a data refresh as well.
I know it sounds glib, but that's precisely what this architecture lives and breathes for. A queue of latency-inducing events (fetches, branches), another queue of latency-free events (instruction clauses, each bound by a latency event) and priorities+weightings to make it all hum. Again, an evolution from previous GPUs.

Jawed

vertex_shader
30-Apr-2007, 22:13
It depends. I'm wondering how many game companies will play with DX10 features in patches to existing games. A la Far Cry with SM 3.0 as a past example. And like Call of Juarez and Just Cause with DX10.

I'm guessing not a lot, but there's a few others. There was that screen of Company of Heroes with a DX10 option that wasn't enabled yet. Likewise the Flight Sim X crew appear to be working on DX10 features.

I don't think DX10 performance is going to sell this generation of cards, however...

I only buy a new graphics card at most once a year. Assuming R600 performs close to but slightly slower than G80 in DX9 games, but performs quite a bit better in DX10. Then the R600 may be the better card for me.

Reminds me alot of the FX5800. Where NV had competitive and sometimes faster DX8 performance but fell far short when it came to DX9 perf. I'm hoping this isn't the case with G80. And that NV is just holding something back in the drivers.

EG - I'm hoping that it's just DX10 borked driver for G80 right now that makes it appear to be not so good in DX10. Granted the only thing I have to judge by is that one slide of Call of Juarez DX10 that may or may not be accurate.

So in the end this might just be a dead end train of thought with no connection to reality. :lol:

Regards,
SB

These are the DX10 games that currently look to be coming this year (some will be delayed to 2008 for sure).
Crysis, Hellgate London, Clive Barker's Jericho, Age of Conan: Hyborian Adventures, Unreal 3, Bioshock, Lost Planet.
Dx10 patches coming for Flight Simulator X, Company of Heroes, Call of Juarez, Just Cause.

compres
30-Apr-2007, 22:17
That's the TMUs. Not the ROPs.

Even so, I don't think it is right to say "remember the 16 rops will be keeping you from performing" when we know nothing about official specs...

Moloch
30-Apr-2007, 22:17
I'd rather have a lower res with every bit of eyecandy, AA, high IQ and special effect at good frame rates, than have a high res, but have to turn off AA or the eyecandy to get good framerate.
How bout Both ?
There is the novel idea of saving up more :wink:

Also while fsaa does wonders for edges at lower res I'd rather use like 2x fsaa at like 1600x1200 than 4x fsaa at 1024 or 1280.

Galduta
30-Apr-2007, 22:19
STALKER with AA X4 ???

http://bp0.blogger.com/_hqH07Z7-z90/RjWuWPShO8I/AAAAAAAAAG8/IcV5E8m0wmY/s1600/66510766le0.jpg

http://i4.photobucket.com/albums/y117/jonelo/Dibujo-1.png

Kaotik
30-Apr-2007, 22:30
STALKER with AA X4 ???

http://bp0.blogger.com/_hqH07Z7-z90/RjWuWPShO8I/AAAAAAAAAG8/IcV5E8m0wmY/s1600/66510766le0.jpg

Without dynamic lighting it works :razz:

Rangers
30-Apr-2007, 22:31
Whoops incorrect information edit..

Bouncing Zabaglione Bros.
30-Apr-2007, 22:32
How bout Both ?
There is the novel idea of saving up more :wink:

Also while fsaa does wonders for edges at lower res I'd rather use like 2x fsaa at like 1600x1200 than 4x fsaa at 1024 or 1280.

I don't see a point where developers won't be piling on so many effects that you will get acceptable frame rates with all eyecandy and IQ features at high res. I don't think anyone will be running Crysis or Alan Wake with everything turned on (including AA) at very high resolutions and still be getting 100+ fps.

Give developers the kind of power that G80 or R600 have, and they will still suck it all up and ask for more.

Galduta
30-Apr-2007, 22:36
Without dynamic lighting it works :razz:

- AMD presented STALKER running on 2900 CrossFire in DX8 :grin::grin: . No, seriously.

fellix
30-Apr-2007, 22:48
Some thoughts:

G80 -- more "legacy" hardware for top notch DX9 experience and some mucho-MHz ALUs for the sake of marketing.
NV30 -- just the MHz, and even more for the tiny memory bus. And don't bother with that skinny FP32 thingie there.

R600 -- bet everything on an uber-massive VLIW number-crunching array and crowd every last corner of the PCB with an insanely wide bus.
R300 -- ...well, you should know its story.

INKster
30-Apr-2007, 22:53
Some thoughts:

G80 -- more "legacy" hardware for top notch DX9 experience and some mucho-MHz ALUs for the sake of marketing.
NV30 -- just the MHz, and even more for the tiny memory bus. And don't bother with that skinny FP32 thingie there.

R600 -- bet everything on an uber-massive VLIW number-crunching array and crowd every last corner of the PCB with an insanely wide bus.
R300 -- ...well, you should know its story.

From what is known, G80 has little in the way of legacy hardware from G7x and older GPUs, compared to the R580 -> R600.
Also, I'm fairly sure the highly clocked scalar ALUs were a true design decision right from the very beginning.

fellix
30-Apr-2007, 22:57
Well, at least they would keep the primary MADD unit from the G70 pipe, but now we have just a quasi-MUL here. :lol:
G80 just missed the chance to become a real MADDnes.

jimmyjames123
30-Apr-2007, 22:59
In the AMD slide which showed Crossfire scaling on various games, the gains appear to be impressive. However, at those resolutions and settings, NV SLI also boosts performance by a factor of 1.8-2.0.

I am a little surprised that there is very little mention of X2900 vs 8800 in the AMD slides. The only comparison is related to AA and what CFAA can do vs CSAA.

What type of performance hit would one expect for CFAA vs CSAA? From what I recall, there is very little performance penalty with CSAA.

Also, it appears that the 8800 series will have an advantage in AF quality and speed vs the HD 2900...?

Jawed
30-Apr-2007, 23:00
Tertsi's latest code seems like cruel and unusual punishment:

http://www.cupidity.f9.co.uk/b3d96.gif

Note the rates are for 1 iteration.

Also note I left R600 at 800MHz. Scale it down for whatever you think the real clock of XTX (or the tested XT) will be. And scale G80 up for whatever you think Ultra will be...

Jawed

fellix
30-Apr-2007, 23:12
What type of performance hit would one expect for CFAA vs CSAA? From what I recall, there is very little performance penalty with CSAA.
Both modes [CFAA & CSAA] peak at eight multi-samples as a base filter kernel (e.g. two 4-sample loops).
CFAA, with its new tent filter, just re-uses the already sampled subpixels in the neighboring pixels, so no extra memory is wasted, as I see it, just more readouts.
The coverage sampling (a kind of binary resolve) in G80, for its part, consumes just a fraction of the resources used by the real colour subsamples.

So, I think -- given the wider memory interface -- R600 should pop-up in this comparison.
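
To make the "re-using neighboring subpixels" idea concrete, here is a minimal 1D sketch of a tent-weighted resolve. The real CFAA kernel shape and weights are not given in this thread, so the radius, the sample positions and the function names below are all assumptions for illustration only.

```python
def tent_weight(distance, radius):
    """Linear falloff: 1 at the pixel centre, 0 at `radius` and beyond."""
    return max(0.0, 1.0 - abs(distance) / radius)


def resolve_pixel(samples, center, radius):
    """Weighted average of (position, colour) samples around `center`.

    Samples from neighbouring pixels are simply read again with a lower
    weight; nothing extra is stored, there are only more reads.
    """
    weighted = [(tent_weight(pos - center, radius), colour) for pos, colour in samples]
    total = sum(w for w, _ in weighted)
    return sum(w * colour for w, colour in weighted) / total


# Pixel centred at 0.5 with two own samples (colour 1.0) and one sample
# borrowed from each neighbouring pixel (colour 0.0).
samples = [(0.25, 1.0), (0.75, 1.0), (1.25, 0.0), (-0.25, 0.0)]
blended = resolve_pixel(samples, center=0.5, radius=1.0)
```

A plain box resolve over the pixel's own two samples would give 1.0 here; the tent filter pulls the result towards the neighbours, which is the source of both the extra smoothing and the extra readouts.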

nAo
30-Apr-2007, 23:14
G80 -- more "legacy" hardware for top notch DX9 experience and some mucho-MHz ALUs for the sake of marketing.
You're kidding, right?

3vi1
30-Apr-2007, 23:15
These are the DX10 games that currently look to be coming this year (some will be delayed to 2008 for sure).
Crysis, Hellgate London, Clive Barker's Jericho, Age of Conan: Hyborian Adventures, Unreal 3, Bioshock, Lost Planet.
Dx10 patches coming for Flight Simulator X, Company of Heroes, Call of Juarez, Just Cause.

Don't forget Alan Wake!

oh yea this too: Windows Vista Key to Live Anywhere Vision (http://www.microsoft.com/presspass/press/2006/may06/05-09G4WE3LineupPR.mspx)

Under Microsoft’s Live Anywhere vision, announced by Chairman and Chief Software Architect Bill Gates on stage today, the power of the Xbox Live games and entertainment network will storm onto the Windows platform. This vision will first arrive on Windows Vista this winter with the release of “Shadowrun.” Published by Microsoft Game Studios, “Shadowrun” will be shipped simultaneously on Windows Vista and Xbox 360, enabling cross-platform game play and continuing to bridge the gap between consoles and computers.

The new on line network will provide Windows Vista and Xbox 360 gamers with a consistent experience, with one gamertag, one set of achievements, one friends list and voice communications between all Xbox Live members. Bringing the Live Anywhere vision to Windows means that millions of gamers worldwide can access all the best elements of Xbox Live whether they’re playing on an Xbox 360 system, a Windows-based PC or their mobile phone.

“We’re incredibly excited to combine the best online innovations from Xbox Live with the platform innovations for which Windows is known,” said Peter Moore, corporate vice president of the Interactive Entertainment Business in the Entertainment and Devices Division at Microsoft. “Windows gamers have a lot to look forward to: Xbox Live experiences, revolutionary DirectX 10 technologies in Windows Vista and some of the best games in the world — all on Windows.”

I recall reading way back before Xbox360 came out that ATI actually switched directions about their future GPU. Apparently, Microsoft had come to them with this unified vision about XBOX and Vista being able to play on the same network. In order to facilitate that process, ATI got bribed (paid/promised other goodies?!) into designing their next GPU to be able to easily handle XBOX360 code conversion and vice versa.

So.. This might mean that all of XBOX360 games may end up migrating to Vista - provided they have ATI hardware! Also, this makes sense when reflecting on Microsoft removing DirectX sound - in order to undercut Creative and EAX!

Open AL: DirectSound3D on Windows Vista (http://www.openal.org/openal_vista.html)

With Microsoft's decision to remove the audio hardware layer in Windows Vista, legacy DirectSound 3D games will no longer use hardware 3D algorithms for audio spatialization. Instead they will have to rely upon the new Microsoft software mixer that is built into Windows Vista. This new software mixer will give the users basic audio support for their old Direct Sound games but since it has no hardware layer, all EAX® effects will be lost, and no individual per-voice processing can be performed using dedicated hardware processing.

EAX has become the de facto standard for real-time effects processing. It has been incorporated in hundreds of games and has become the method of choice for game developers wanting to add interactive environment effects to their titles. Some of the best selling games of all time use the EAX extensions to DirectSound 5.0 and beyond, including Warcraft3, Diablo2, World of Warcraft, Half Life, Ghost Recon, F.E.A.R. and many others. Under Windows Vista, these games will be losing the hardware support that came as standard under the previous Windows Operating Systems, and will no longer provide real-time interactive effects, making them sound empty and lifeless by comparison to the way they sound on Windows XP.

Anyhow.. I have a question.. What is the feasibility of ATI or some other company building a PHYSICS based application on Vista using DirectX 10 based cards like the R600? Would such a thing be possible with what I'm reading so far? I'm thinking something in the vein of Adobe but for Physics applications and engineering/gaming.


Cheers

fellix
30-Apr-2007, 23:17
Are you kidding right?
I hope I am -- we've had enough GFFX-style foot-shooting, right? ;)

Kaotik
30-Apr-2007, 23:20
Apparently Creative's ALchemy is working quite nicely considering it's beta on X-Fi cards though, to emulate DS3D + EAX via OpenAL?

willardjuice
30-Apr-2007, 23:33
Apparently Creative's ALchemy is working quite nicely considering it's beta on X-Fi cards though, to emulate DS3D + EAX via OpenAL?

Yeah for the most part it works for me, my only problem now is Guild Wars doesn't work flawlessly with it.

Sphinx
30-Apr-2007, 23:38
It depends. I'm wondering how many game companies will play with DX10 features in patches to existing games. A la Far Cry with SM 3.0 as a past example. And like Call of Juarez and Just Cause with DX10.

I'm guessing not a lot, but there's a few others. There was that screen of Company of Heroes with a DX10 option that wasn't enabled yet. Likewise the Flight Sim X crew appear to be working on DX10 features.

I don't think DX10 performance is going to sell this generation of cards, however...

I only buy a new graphics card at most once a year. Assuming R600 performs close to but slightly slower than G80 in DX9 games, but performs quite a bit better in DX10. Then the R600 may be the better card for me.

Reminds me alot of the FX5800. Where NV had competitive and sometimes faster DX8 performance but fell far short when it came to DX9 perf. I'm hoping this isn't the case with G80. And that NV is just holding something back in the drivers.

EG - I'm hoping that it's just DX10 borked driver for G80 right now that makes it appear to be not so good in DX10. Granted the only thing I have to judge by is that one slide of Call of Juarez DX10 that may or may not be accurate.

So in the end this might just be a dead end train of thought with no connection to reality. :lol:

Regards,
SB

That's what I feel too ~ but again, developers of these games will ship their engines/games with fallbacks for the pre-DX10 generation...

Will we ever be able to see the difference between DX9 and DX10 being used? What if developers can easily produce the same output in a DX9 fallback mode as would be used on DX10... we wouldn't really find any visual differences...

Just as an example...
If G80 performed at 200% in DX9.0 and only 50% in D3D10, then R600 might perform at only 100% in DX9.0 and 150% in D3D10.

You would see G80 run as fast as or faster than the R600, and on the visual side G80 would have enough raw power to render the same output in DX9.0+ as the R600 does in D3D10...

bigtabs
30-Apr-2007, 23:45
I could've sworn Microsoft were claiming 6x the speed for gaming under Vista and DX10 not so long ago. :lol: Surely that has to account for something? :wink:

Razor1
30-Apr-2007, 23:48
Some thoughts:

G80 -- more "legacy" hardware for top notch DX9 experience and some mucho-MHz ALUs for the sake of marketing.


Well, I wouldn't say legacy. Do you think DX10 games are going to be less texture or fillrate bound than DX9 games? It looks to me like both companies skimped in certain areas.

I don't remember ATi skimping on DX8 performance on the R300; it performed better than the GF4 and the FX while using less power. And now take an R300: can it really play heavy-duty DX9 games?

Frank
30-Apr-2007, 23:50
I could've sworn Microsoft were claiming 6x the speed for gaming under Vista and DX10 not so long ago. :lol: Surely that has to account for something? :wink:
Nah, they're way too conservative. It's more like at least 10 times as fast. ;)

Frank
30-Apr-2007, 23:54
Well I wouldn't say legacy, do you think Dx10 games are going to be less texture or fillrate bound then dx9 games? It looks to me both companies skimped in certain areas.

I don't remember ATi skimping on Dx8 performance on the r300, it performed better then the gf4 and the fx while using less power. And now take a r300 can it really play heavy duty dx9 games?
I agree somewhat. While the focus is and has been on shaders, the amount and size of the textures (and the amount of memory they can use) has steadily increased as well.

But the TMUs have become better as well, the clockspeeds have increased, the amount of TMUs has risen slowly, and the available bandwidth has increased.

So, it's a toss-up. But it would likely have had immediate benefits for current games if they had increased the amount a bit more.

Razor1
01-May-2007, 00:01
Immediate games are what make these cards sell. Great technology is only great if the benefits are usable now and in the near future, not 1 year from now; there will be one or two more launches by then.

bigtabs
01-May-2007, 00:10
Immidiate games are what makes these cards sell, great technology is only great if the benefits are usable now and near future, not 1 year from now, there will be one or two more launches by then.

I disagree with this somewhat, as I like having a card with a bit more legs. Means I can be comfortable skipping a gen, or at least waiting for a next gen refresh.

Razor1
01-May-2007, 00:11
If it doesn't perform better than a competing card in DX9 games because of a bottleneck, or in this case possibly 2 bottlenecks, how will that bottleneck be eased in DX10?

fellix
01-May-2007, 00:14
And now take a r300 can it really play heavy duty dx9 games?
Well, to be honest, D3D9 is far stretched API for its time, but R300 is maybe the longest survivor in the modern graphics industry -- a true evergreen. ;)

Razor1
01-May-2007, 00:15
Gotta agree with that, but the reason it did last so long is because 50% of the market was using FXs :wink:

jimmyjames123
01-May-2007, 01:57
So where are the R600 performance vs G80 leaks? I would have thought there would be more performance comparisons leaked by now. Does NDA actually expire on May 2?

Silent_Buddha
01-May-2007, 02:01
Heck one of my good friends only just recently upgraded from his 9700 pro to a 8800 GTX.

And I still have a machine with a 9700 pro that gets used during LAN parties. It's a file server otherwise.

If I had to vote for greatest 3D hardware of all time in terms of either impact or longevity, that would have to go to the 3dfx Voodoo1 or the ATI R300. Granted the R300 might have been made to look better due to its competition, but the fact that it's STILL usable in the vast majority of today's games speaks volumes.

Regards,
SB

silent_guy
01-May-2007, 02:51
It's not just registers for operands, though. It's also possible to have indexed constants.
Is it completely ridiculous to suggest that constants are loaded into registers before issuing ALU ops? I guess the performance drop would be catastrophic in some cases. :wink:

(Question for a 3D shader expert: How common are constant operands in real life 3D shaders?)

In the past ATI has been very proud of the fact that its ALU pipeline has not had any operand fetch bandwidth restrictions. I kinda hope they've maintained that tradition.
Yes, that's the other big question: but with the vec4, isn't the only freedom a permutation of xyza? (I haven't read the CTM docs...)
That reduces a 15-way fetch into a 6-way fetch (if the 1D supports MAD), if you're smart about the way you lay out registers in memory.
If there are no restrictions at all in R600, you'd need 15 independent reads and 5 independent writes per shader unit. Insane.
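
The port counts above follow from simple multiplication. As a hedged sketch (assuming five ALUs per shader unit and three source operands per MAD, as the post implies; the function names are invented for illustration):

```python
OPERANDS_PER_MAD = 3  # a MAD reads three source operands


def unrestricted_ports(alus=5):
    """Independent register reads and writes if every ALU fetches freely."""
    reads = alus * OPERANDS_PER_MAD
    writes = alus
    return reads, writes


def vec4_plus_scalar_reads():
    """Reads when the vec4 shares one wide (permuted) fetch per operand.

    Two fetch groups remain: the vec4 as a whole and the separate 1D unit.
    """
    fetch_groups = 2
    return fetch_groups * OPERANDS_PER_MAD
```

With these assumptions `unrestricted_ports()` gives 15 reads and 5 writes, and `vec4_plus_scalar_reads()` gives the 6-way fetch mentioned above.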

Another question: in R600, the branch logic is in parallel with the ALU's. Is that a departure from R5xx or has it always been like that?

Hmm, compared with R5xx, which could issue 4 instructions per clock (or 3...?), I don't see how 5 instructions is a big change in terms of instruction issue.
Yes, I'm not sure there is that much additional value in allowing independent scalar operations, since the majority of ops in 3D are vec3 or vec4 based anyway and scalar ops are not that common.
It's probably a different story for GPGPU.

silent_guy
01-May-2007, 02:56
And while a ringbus is really neat, it does require a lot of additional buffers as well, including a good scheduling mechanism that needs to be aware of the data required plenty in advance. Switching to a different thread is no solution either, because that requires a data refresh as well.
(Broken record mode, but I just can't help it: ) While it's undeniably true that it creates neat marketing pictures for the masses, I still don't see any structural advantage in using a ring bus over another solution. But I'm always willing to reconsider when presented with reasonable arguments!

Farhan
01-May-2007, 03:15
(Broken record mode, but I just can't help it: ) While it's undeniably true that it creates neat marketing pictures for the masses, I still don't see any structural advantage in using a ring bus over another solution. But I'm always willing to reconsider when presented with reasonable arguments!
Crossbars go up exponentially in complexity as you add more channels, a ring bus is much more forgiving. IIRC in the R520 article the other thing they were saying was the reduction/distribution of hot spots due to the distributed memory controllers. A memory crossbar would certainly be a very busy part of the chip, and it's all in one spot, so i can buy that argument i think.

silent_guy
01-May-2007, 03:48
Crossbars go up exponentially in complexity as you add more channels, a ring bus is much more forgiving.
Well, it's not exponential, but n*m where n is the number of feeds and m the number of clients. Assuming we won't see devices with more than 512-bit external busses anytime soon, n is not about to go up dramatically. The same is true for m.
In addition, when you're really desperate, you can still add stages in a crossbar too. The complexity of a GPU crossbar is smaller than the stuff you can see in some big iron routers, BTW.

A crossbar can result in more buses, but they are going to have a width that supports the bandwidth of individual memory channels. A ring needs a width that supports the aggregate bandwidth of the complete chip. What is hailed by the masses as a blessing ('OMG! A 1024 bit bus!') is actually a disadvantage.
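
A back-of-the-envelope sketch of the scaling argument, with made-up numbers: 8 memory channels of 64 bits each (to match a 512-bit external bus) and 4 clients are assumptions, not figures from any spec, and the function names are hypothetical.

```python
def crossbar_crosspoints(n_feeds, m_clients):
    """Crosspoint count grows as n*m, not exponentially."""
    return n_feeds * m_clients


def crossbar_bus_width_bits(channel_bits):
    """Each crossbar bus only has to match one memory channel."""
    return channel_bits


def ring_width_bits(n_channels, channel_bits):
    """A ring segment has to be able to carry the aggregate of all channels."""
    return n_channels * channel_bits


CHANNELS = 8       # ASSUMPTION
CHANNEL_BITS = 64  # ASSUMPTION: 8 x 64 = 512-bit external bus
CLIENTS = 4        # ASSUMPTION

crosspoints = crossbar_crosspoints(CHANNELS, CLIENTS)
doubled = crossbar_crosspoints(2 * CHANNELS, CLIENTS)
ring_bits = ring_width_bits(CHANNELS, CHANNEL_BITS)
```

Doubling the channel count doubles the crosspoints (32 to 64 here), whereas an exponential law would multiply them by 2^8; meanwhile the ring is 512 bits wide against 64 bits per crossbar bus.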

IIRC in the R520 article the other thing they were saying was the reduction/distribution of hot spots due to the distributed memory controllers. A memory crossbar would certainly be a very busy part of the chip, and it's all in one spot, so i can buy that argument i think.
Well, I don't buy it, if only because we have a real life proof that it doesn't have to be a problem. :wink:

All in all, wrt implementation, I think it's a wash.

Finally, one thing is for sure: a ring structure has multiple clients dumping different work loads onto the same transport fabric. It is extremely hard to manage the load, and predict (and avoid) freak cases that can kill your performance.

stevem
01-May-2007, 03:54
Crossbars go up exponentially in complexity as you add more channels, a ring bus is much more forgiving. IIRC in the R520 article the other thing they were saying was the reduction/distribution of hot spots due to the distributed memory controllers. A memory crossbar would certainly be a very busy part of the chip, and it's all in one spot, so i can buy that argument i think.
That's the rationale. A ring bus, esp R600 "double" ring bus, also adds latency cf x-bar which is apparently useful in scheduling for more in-flight operands.

I dunno how much I like this solution or the relatively lower uber ROP/TMU count. Proof in the pudding, I guess. Also, I think G80 isn't given enough credit for its design compromises/performance.

Colourless
01-May-2007, 04:07
Heck one of my good friends only just recently upgraded from his 9700 pro to a 8800 GTX.

And I still have a machine with a 9700 pro that gets used during LAN parties. It's a file server otherwise.

If I had to vote for greatest 3D hardware of all time in terms of either impact or longevity, that would have to go to the 3dfx Voodoo1 or the ATI R300. Granted the R300 might have been made to look better due to its competition, but the fact that it's STILL usable in the vast majority of today's games speaks volumes.

Regards,
SB

The thing about the cards is that while their fillrate is lower than current midrange cards, their memory bandwidth is still on par. That is why the cards are still usable. The biggest thing limiting the old 9700 Pro is its amount of memory: only 128 MB. Of course they are still better than current low end.

Rangers
01-May-2007, 04:10
That's the rationale. A ring bus, esp R600 "double" ring bus, also adds latency cf x-bar which is apparently useful in scheduling for more in-flight operands.

I dunno how much I like this solution or the relatively lower uber ROP/TMU count. Proof in the pudding, I guess. Also, I think G80 isn't given enough credit for its design compromises/performance.


If the bad things said about the ring bus here are true, it could be tying up a lot of die area, thereby essentially forcing the low TMU/ROP counts. I mean both these IHV's work in the same die area.

As far as giving credit to G80 design, I'd pretty much make the call on that as soon as we get R600 benches, not before, personally.

How are the R600 TMUs supposedly "uber", as you say? And how many ROPs does G80 have? 24?

Farhan
01-May-2007, 04:27
Well, it's not exponential, but n*m where n is the number of feeds and m the number of clients. Assuming we won't see devices with more than 512-bit external busses anytime soon, n is not about to go up dramatically. The same is true for m.
In addition, when you're really desperate, you can still add stages in a crossbar too. The complexity of a GPU crossbar is smaller than the stuff you can see in some big iron routers, BTW.

A crossbar can result in more buses, but they are going to have a width that supports the bandwidth of individual memory channels. A ring needs a width that supports the aggregate bandwidth of the complete chip. What is hailed by the masses as a blessing ('OMG! A 1024 bit bus!') is actually a disadvantage.


Well, I don't buy it, if only because we have a real life proof that it doesn't have to be a problem. :wink:

All in all, wrt implementation, I think it's a wash.

Finally, one thing is for sure: a ring structure has multiple clients dumping different work loads onto the same transport fabric. It is extremely hard to manage the load, and predict (and avoid) freak cases that can kill your performance.
Yeah i keep forgetting the n*m thing :oops:
Doesn't the wiring get crazy after a while though? After laying out large muxes, circles just look so much more appealing to me than criss-crossing wires, so that may explain my bias :-P

Pretty sure the big iron routers come in huge boxes and have entire chips dedicated to just routing? I don't think you can compare them with this.

I'm not saying one is inherently better than the other either, there are tradeoffs obviously. But i'm sure there must be more than just a marketing reason ATI decided to go for the ring bus...

ninelven
01-May-2007, 04:39
I think people are coming to that conclusion from the picture (slides)... Of course, that is just an odd way of looking at it.

From the slides, it seems that the primary bound on performance (at high resolutions) will be the ROPs (if it is indeed 16).

Edit: If you look at shading power to ROP ratio of G80 and R600, you have ~3:1 for G80 (slightly more) and ~5:1 for R600 (or somewhat lower depending on how you look at it). Interesting design decisions....

Sound_Card
01-May-2007, 04:47
I always thought that the ring bus was all about distribution and less centralization. I don't think ATi exactly claims the ring bus to be a performance advantage, but rather a more elegant layout than the crossbar set up. Just a different way of doing things.

mao5
01-May-2007, 05:03
The tests were only done at 1280x1024, why not test at 1680x1050 or 1600x1200? I don't think there are many hardcore gamers that play games at 1280x1024 anymore.

I think the tester is using a 19'' LCD monitor

mao5
01-May-2007, 05:07
You can get (many) a 22" WS that does 1680x1050 for well under 400 clams :wink:
Closer to 300 in fact.
Only problem is they're all TN panels, but that's supposed to be the "gamer" panel.

Oh man, a TN LCD panel only shows its best game image at 1680x1050, so TN will boost the spread of G80 and R600

Galduta
01-May-2007, 05:16
http://www.generation-3d.com/UserImgs/imgs/ATi/R600A/3dmarkhd2900.jpg


http://www.generation-3d.com/11335-sous-3Dmarks-2006-HD-2900-XT,ac8886.htm

The 2900, overclocked

And

http://www.generation-3d.com/La-radeon-HD-2600-XT-pulverise-la-X1950-XTX,ac8885.htm


The Radeon HD 2600 XT, the new mid-range card from the ATi/AMD group, is said to be 25 to 140% faster than ATi's latest very high-end card, the X1950 XTX.

The Radeon HD 2600 XT has 25 - 140 % more performance than the X1950 XTX, maybe it is the 2900 of DailyTech :mrgreen:

turtle
01-May-2007, 05:29
Judging by people with overclocked GTX's at 700+/2400+ that score a little lower in SM2/SM3, it looks slightly better than what a 8800U will be able to achieve ...but it is only 3dmark, and it's only about ~10%.

DemoCoder
01-May-2007, 05:32
As far as giving credit to G80 design, I'd pretty much make the call on that as soon as we get R600 benches, not before, personally.



I don't think it comes down to benchmarks alone, but also performance per tranny, and margin. For example, you can criticize NVidia for reusing interpolator hardware to do special functions as well as MULs, on the other hand, there is a certain elegance to doing so, resulting in a more compact design.

From what we've been seeing of the R600 so far, it seems like a monster of a chip, and I mean monster in a not necessarily pretty light. It may eke out a performance victory, but it seems to be doing so with lots of overhead, which doesn't necessarily bode well for the refresh, as NV may have far more headroom on the G8x in terms of transistor budgets. This almost seems a rehash of G7x vs R5xx, only this time around, NVidia will have less of an IQ disadvantage and more of a margin advantage, with G8x being another company cash cow.

Mint always seemed to be skeptical if the R5xx design was worth it, in terms of throwing so many trannies at DB performance. Now one has to ponder if all the trannies blown on the R600 will yield adequate ROI on performance. If DX10 games don't become prevalent soon, and if R600 doesn't spank the hell out of the G80 on them, one has to wonder if one really needs 320 SPs when 128 SPs double pumped perform almost as well. :)

Arty
01-May-2007, 05:33
The Radeon HD 2600 XT has 25 - 140 % more performance than the X1950 XTX, maybe it is the 2900 of DailyTech :mrgreen:
If it's 5% faster (overall) than the 1950XTX, I'll eat my hat. (and of course buy the 2600)

Sobek
01-May-2007, 05:37
I'm a little confused. Is it 140% as in 1.4x the speed (one x1950 + 140% additional 'performance' on top of that for the x2900), or just an additional '40%' (100% = x1950xtx, x2900 is 40% faster).

I hate percentages. Kill me.

turtle
01-May-2007, 05:41
I'm a little confused. Is it 140% as in 1.4x the speed (one x1950 + 140% additional 'performance' on top of that for the x2900), or just an additional '40%' (100% = x1950xtx, x2900 is 40% faster).

I hate percentages. Kill me.

2600.

I believe he means 1.25-2x x1950xtx +40% faster. 140% would be taken by taking R580's 48+8 shaders, and RV630's 120 shaders, and doing some division + adding the fact those shaders can be dynamic vertex or pixel. Pretty bold claim, but I guess it's possible...especially since "With made you saw the X2900XT is not as tall as a door plane as the rumours said it."

I personally am hoping for serenity to eat his hat...I want a mid-range-tide-me-over-and-later-become-physics-card. :razz:

Rangers
01-May-2007, 05:58
I don't think it comes down to benchmarks alone, but also performance per tranny, and margin. For example, you can criticize NVidia for reusing interpolator hardware to do special functions as well as MULs, on the other hand, there is a certain elegance to doing so, resulting in a more compact design.

From what we've been seeing of the R600 so far, it seems like a monster of a chip, and I mean monster in a not necessarily pretty light. It may eke out a performance victory, but it seems to be doing so with lots of overhead, which doesn't necessarily bode well for the refresh, as NV may have far more headroom on the G8x in terms of transistor budgets. This almost seems a rehash of G7x vs R5xx, only this time around, NVidia will have less of an IQ disadvantage and more of a margin advantage, with G8x being another company cash cow.

Mint always seemed to be skeptical if the R5xx design was worth it, in terms of throwing so many trannies at DB performance. Now one has to ponder if all the trannies blown on the R600 will yield adequate ROI on performance. If DX10 games don't become prevalent soon, and if R600 doesn't spank the hell out of the G80 on them, one has to wonder if one really needs 320 SPs when 128 SPs double pumped perform almost as well. :)

True but, whilst R580 was ~2X the die size of G70, ATI should already be in a much better position here, if they can get performance parity (big if, we'll see from the reviews) as the trans counts and die sizes between G80-R600 are similar AFAIK.

flippin_waffles
01-May-2007, 06:02
Well, if that were the case, that would also imply that it is only 25% faster in some situations as well, ie. 1/4x. So we're back to square 1. :lol: But he said 25-140% more performance, so that would imply 1.25x - 2.4x (or something like that :) ). Sounds pretty crazy!
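
Since the percentages keep tripping people up, here is a minimal Python sketch of the conversion. The function name is just illustrative; "X% faster" is read as "the old speed plus X% of it", which is the reading used for the 1.25x - 2.4x figures above.

```python
def faster_pct_to_multiplier(pct_faster):
    """Convert 'pct_faster % faster' into a speed multiplier over the baseline."""
    return 1 + pct_faster / 100


# The two ends of the claimed "25 - 140 %" range.
for pct in (25, 140):
    print(f"{pct}% faster = {faster_pct_to_multiplier(pct):.2f}x the baseline speed")
```

Read the other way, "140% of the speed" would be 1.4x, which is where the confusion comes from.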

turtle
01-May-2007, 06:10
G80+NVIO is quite a bit bigger. Isn't it something like 484+45mm2? Granted, the spread isn't G71-R580 big.

R600 is <430mm2...something like that.

Transistor count is similar though, I believe you are correct, and that does put them in a much better position than before...and if the 2600 is any kind of performer, a great position in the midrange.

mao5
01-May-2007, 06:48
One way to look at it is that they are 6 months late. Perhaps they feel they can catch a wider market with a different price point to help regain lost ground.

correct

PatrickL
01-May-2007, 07:01
2600.

I believe he means 1.25-2x x1950xtx +40% faster. 140% would be taken by taking R580's 48+8 shaders, and RV630's 120 shaders, and doing some division + adding the fact those shaders can be dynamic vertex or pixel. Pretty bold claim, but I guess it's possible...especially since

I personally am hoping for serenity to eat his hat...I want a mid-range-tide-me-over-and-later-become-physics-card. :razz:

Just ignore that 2600 thing. That site did not run any benchmarks and gives no source for that claim.

Silent_Buddha
01-May-2007, 07:13
I have to say while it would be a coup in the mid-range, I'm a bit sceptical of HD 2600 XT being 1.25-2.40 times the performance of the X1950XTX.

Regards,
SB

mao5
01-May-2007, 07:13
In that area of the game, the r600xt's 63 frames (http://www.chiphell.com/attachments/month_0704/20070430_0400fa82c37352d946d3L3kMK47f6XI8.jpg) in that picture are not really impressive.

An E4300 and x1950pro at 1280x1024, max settings, 4xaa+hdr gets 30 frames (http://img71.imageshack.us/img71/848/testdriveunlimitedgh2.jpg). As in FEAR, that is only ~200% faster than the x1950pro, so I think the XT can't beat the GTX in any of today's games (maybe at some cherry-picked resolution and settings). So the XT is not on par with the GTX performance-wise, as some rumors/users suggest.

I'm slowly becoming sure the r600 was made for dx10 and not for dx9, but maybe ~5 dx10 games are coming this year versus ~150 dx9 games, so I think ATi is on the wrong track overall and doesn't have any enthusiast card in its pocket. Enthusiast users will cry :wink:

There's still some hope left that the drivers can get better, but for this we need to wait for weeks.

so you totally ignored the 3xfps from 3.52GHz E6400+660/2200MHz GTX 1280x960 in Test Drive Unlimited?

silent_guy
01-May-2007, 07:45
Doesn't the wiring get crazy after a while though?
It's funny: back in college, when they taught me about scan chains as a DFT methodology, I thought the wiring overhead for this must be too much. (That was with 2 metal layers and abusing poly to route some of the wires.) :wink:

After laying out large muxes, circles just look so much more appealing to me than cris-crossing wires, so that may explain my bias.
One way to look at it, is that those large muxes are at their core nothing more than a whole lot of 1-bit 6-to-1 muxes. That's exactly how the tools look at it anyway. And 10(?) layers of metal clearly gives you quite a bit of freedom to indulge.

Pretty sure the big iron routers come in huge boxes and have entire chips dedicated to just routing? I don't think you can compare them with this.
I was talking about router chips, not boards! You'd be amazed about the crazy stuff they do.

I'm not saying one is inherently better than the other either, there are tradeoffs obviously. But i'm sure there must be more than just a marketing reason ATI decided to go for the ring bus...

Don't get me wrong: I really accept the argument that a ring has advantages wrt back-end implementation and getting data around the chip. So if you have no other choice, it's a solution that deserves to be looked at.

But the inevitable price to pay is that you give up something at the system level. And that's a bad thing. In this case, that price means more over-dimensioning to reduce freak corner cases, over-dimensioning to limit the latency penalty (2 opposite direction rings), less control over scheduling etc. On R580, that was manageable because it was only used for read return data. But if R600 also uses this to transport write data, it gets really ugly.

A crossbar doesn't have those system level disadvantages.

And since we have proof that a crossbar implementation is practically possible, it makes little sense to cheer about the presence of a ring...
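
To illustrate the latency point about running two opposite-direction rings, here is a toy Python model. It is only a sketch under stated assumptions: the number of ring stops is hypothetical, every hop costs the same, and contention is ignored entirely. It compares the average hop count to a random other stop on a one-direction ring against a pair of counter-rotating rings where data takes the shorter way around.

```python
def avg_hops_one_way(stops):
    """Average hops to any other stop on a single one-direction ring."""
    distances = range(1, stops)
    return sum(distances) / (stops - 1)


def avg_hops_two_way(stops):
    """Average hops when data may travel in either direction (shorter path wins)."""
    distances = [min(d, stops - d) for d in range(1, stops)]
    return sum(distances) / (stops - 1)


# Hypothetical stop count, not an actual R580/R600 figure.
stops = 8
print(f"one ring : {avg_hops_one_way(stops):.2f} hops on average")
print(f"two rings: {avg_hops_two_way(stops):.2f} hops on average")
```

In this toy model the second ring roughly halves the average distance, at the cost of doubling the wiring, which is the over-dimensioning being talked about. An ideal single-stage crossbar would be one hop regardless.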

Bjorn
01-May-2007, 07:51
One way to look at it is that they are 6 months late. Perhaps they feel they can catch a wider market with a different price point to help regain lost ground.

I'm unsure about the rationale of gaining back marketshare in the 400$ price range. There's also no DX10 games on the market yet so surely there's a lot of potential buyers left to fight for, while making money at the same time.

Nvidia's mid-end (8600) series isn't an exceptional performer either, so if they're confident that they've got the best architecture then surely they would expect to gain some significant marketshare in the low-mid end.

AnarchX
01-May-2007, 07:55
http://www.generation-3d.com/UserImgs/imgs/ATi/R600A/3dmarkhd2900.jpg


http://www.generation-3d.com/11335-sous-3Dmarks-2006-HD-2900-XT,ac8886.htm

The 2900, overclocked



Dry ice? The clocks had to be extremely high to get such a score with this slow CPU. Or is it a CF setup?


@ HD 2600XT: This card is supposed to be at X1950Pro performance level. This also matches the specifications (4ROPs/8TMUs/24ALUs@800MHz +GDDR4@128Bit).

fellix
01-May-2007, 08:06
R520 was a kind of abortion child from ATi. I mean - great DB performance (look ma' 4x4 batch size here) but highly inadequate shader processing capacity, as a huge part of the transistor budget was spent on the "manager", damaging the working force. After all, it was not easy to beat the competing 24 dual-MADD pipes (R580 just corrected that).
As for the ring-bus -- it's about optimizing the internal [circuit] wiring, not purely a performance design. Crossbar, being the simpler approach, has its low-latency advantage.

Rangers
01-May-2007, 08:10
R520 was a kind of abortion child from ATi. I mean - great DB performance (look ma' 4x4 batch size here) but highly inadequate shader processing capacity, as a huge part of the transistor budget was spent on the "manager", damaging the working force. After all, it was not easy to beat the competing 24 dual-MADD pipes (R580 just corrected that).
As for the ring-bus -- it's about optimizing the internal [circuit] wiring, not purely a performance design. Crossbar, being the simpler approach, has its low-latency advantage.

R520 was a great part and very fast compared to the competition at the time. Of course, it was very late.

Not sure why you think it was a failure performance wise.

fellix
01-May-2007, 08:32
I just hate this chip -- too many trade-offs, viewed as a complete design.

Mintmaster
01-May-2007, 08:38
Note the rates are for 1 iteration.

Also note I left R600 at 800MHz. Scale it down for whatever you think the real clock of XTX (or the tested XT) will be. And scale G80 up for whatever you think Ultra will be...

Jawed
That's exactly what I was saying. Multiply those rates by 6 instructions per iteration and you get 156.3 GInstr/s for R600XT and 105.6 GInstr/s for G80. Tertsi's numbers were ~100 GInstr/s for R600XT and ~40 GInstr/s for G80.

Either something's not right in the measurement or the compilers aren't doing their job. Can't see any other reason for the disparity.
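
For anyone who wants to check the arithmetic, a small Python sketch using only the figures quoted in this post (6 instructions per iteration, the 156.3 / 105.6 GInstr/s theoretical rates, and the roughly 100 / 40 GInstr/s measured ones). The per-iteration rates are back-computed from those totals, and the variable and function names are just illustrative.

```python
INSTR_PER_ITERATION = 6

# Figures as quoted above, in GInstr/s; measured values are approximate.
theoretical = {"R600XT": 156.3, "G80": 105.6}
measured = {"R600XT": 100.0, "G80": 40.0}


def per_iteration_rate(ginstr_per_s):
    """Back out the iteration rate (G iterations/s) from an instruction rate."""
    return ginstr_per_s / INSTR_PER_ITERATION


def fraction_of_peak(chip):
    """Measured throughput as a fraction of the theoretical rate."""
    return measured[chip] / theoretical[chip]


for chip in theoretical:
    print(f"{chip}: {per_iteration_rate(theoretical[chip]):.2f} Giter/s theoretical, "
          f"measured is {fraction_of_peak(chip):.0%} of peak")
```

On these numbers G80 lands much further below its theoretical rate than R600XT does, which is the disparity being pointed at.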

vertex_shader
01-May-2007, 08:42
so you totally ignored the 3xfps from 3.52GHz E6400+660/2200MHz GTX 1280x960 in Test Drive Unlimited?

That picture is from another area, and the game has very unbalanced frame rates. What driver is the guy using with the XT?:smile:

Farhan
01-May-2007, 08:56
It's funny: back in college, when they taught me about scan chains as a DFT methodology, I thought the wiring overhead for this must be too much. (That was with 2 metal layers and abusing poly to route some of the wires.) :wink:
Hehe, i was allowed 4 metal layers. :grin:


One way to look at it, is that those large muxes are at their core nothing more than a whole lot of 1-bit 6-to-1 muxes. That's exactly how the tools look at it anyway. And 10(?) layers of metal clearly gives you quite a bit of freedom to indulge.
This is true. Thank god for place and route tools! On the other hand, you can't use ALL those metal layers (usually top 2 reserved for power/ground IIRC?), then you have your clock distribution and what not. Also the higher metal layers are much less dense than the lower ones... I really would like to see a layout of a massive xbar though!


Don't get me wrong: I really accept the argument that a ring has advantages wrt back-end implementation and getting data around the chip. So if you have no other choice, it's a solution that deserves to be looked at.

But the inevitable price to pay is that you give up something at the system level. And that's a bad thing. In this case, that price means more over-dimensioning to reduce freak corner cases, over-dimensioning to limit the latency penalty (2 opposite direction rings), less control over scheduling etc. On R580, that was manageable because it was only used for read return data. But if R600 also uses this to transport write data, it gets really ugly.

A crossbar doesn't have those system level disadvantages.

And since we have proof that a crossbar implementation is practically possible, it makes little sense to cheer about the presence of a ring...
Hm... i'm not sure i'm seeing what disadvantages there are besides the overdimensioning. Wouldn't you just dump your data onto the stop closest to you and it will just go to its required stop? All the scheduling and whatnot should happen individually on each of the stops, no? Since the memory controllers are going to be on the stops i think? What am i missing here?

Rangers
01-May-2007, 10:02
One day left..we need a leaked whole review to liven things up...

Skinner
01-May-2007, 10:35
One day left..we need a leaked whole review to liven things up...

Isn't it May 14th?

bigtabs
01-May-2007, 10:46
I don't think so. All the May 2nd rumors seem to be sourced from that one VR-Zone piece. At least the ones I've read anyway. I'd be happy if it were right though!

neliz
01-May-2007, 11:24
Fudo just said it'll be May 14th again for the XT launch, nothing before that.

I think nobody is expecting an AMD launch tomorrow anyway.

Rangers
01-May-2007, 11:29
I see the FUD piece now. Bummer. Hope it's not true.

CJ
01-May-2007, 11:29
Hey, stop stealing my ideas!!! I have a bet running about the price tag of $399 already!



Already getting cold feet about your bet? ;)

Taken from the reviewers guide:

Highlight: DirectX® 10 Ready, Avivo™ HD, UVD (full hardware decoding of Blu-ray and HD DVD), Built-in HDMI and 5.1 surround audio, Free “Black Box” game bundle, killing price/performance ratio over 8800GTS/GTX.

:cool:

trinibwoy
01-May-2007, 11:35
Heh, I'm sure they wish that was "killing performance over 8800GTS/GTX". It'll be interesting to see Nvidia's response - a 2900XT at $400 will be mighty attractive.

fellix
01-May-2007, 11:45
It'll be interesting to see Nvidia's response - a 2900XT at $400 will be mighty attractive.
If they can keep up with the demand. :wink:

trinibwoy
01-May-2007, 12:01
True, street price should have some upward pressure for the first few weeks after launch.

dizietsma
01-May-2007, 12:06
I hope it is a price/performance killer so forcing nvidia to react by reducing GTX and GTS prices. We need the 8800GTS 320MB to be forced into the mid mainstream to replace the rather dull 8600!

fellix
01-May-2007, 12:13
Some SKUs just can't drop below certain price point, I think.
Unless your company is called Sony. :lol:

Sound_Card
01-May-2007, 13:26
I'm a little confused. Is it 140% as in 1.4x the speed (one x1950 + 140% additional 'performance' on top of that for the x2900), or just an additional '40%' (100% = x1950xtx, x2900 is 40% faster).

I hate percentages. Kill me.

140% faster = 2.4x the speed, not 1.4x.:razz:

Lux_
01-May-2007, 13:38
First, it's "25 - 140% faster". So, the 140% case is probably some corner case.
Second, these percentages are PR-percentages, therefore I'd say these have the margin of error about 10% ;).

"25%" - in games, "140%" in HD video decode CPU utilization perhaps.

dnavas
01-May-2007, 14:02
I just hate this chip -- too many trade-offs, viewed as a complete design.

Does no one remember "128 scalar? But that's less than G7's 24 * 4 * 2!!" ?

Obviously, I wished that the thing would come out and stomp the G80, but I'm just as interested in reading about the tradeoffs, and why they were made. There are likely to be interesting wins here.

-Dave

tEd
01-May-2007, 14:03
HL2 16x CFAA

http://img389.imageshack.us/my.php?image=20070501d14724f5c5bff91et2.jpg

http://img168.imageshack.us/my.php?image=20070501d05322923e0d643fd2.jpg

trinibwoy
01-May-2007, 14:08
Not sure if I think I'm seeing it cause I kinda expect it but does the ground texture in those two shots look different?

Robin B
01-May-2007, 14:09
HL2 16x CFAA

http://img389.imageshack.us/my.php?image=20070501d14724f5c5bff91et2.jpg

http://img168.imageshack.us/my.php?image=20070501d05322923e0d643fd2.jpg

Something tells me that this card was not meant for high res.:wink:

INKster
01-May-2007, 14:10
Not sure if I think I'm seeing it cause I kinda expect it but does the ground texture in those two shots look different?

I was thinking the exact same thing.
It actually looks sharper in the No AA example.

Razor1
01-May-2007, 14:13
blur effect of CFAA?

NocturnDragon
01-May-2007, 14:14
blur effect of CFAA?

Yep, that's probably why "24x" has an edge-detect resolve filter

neliz
01-May-2007, 14:17
I was thinking the exact same thing.
It actually looks sharper in the No AA example.

So do the glove textures..

anyone recognize the level? or is this the part where you first get the rocket launcher in hl2