3D Technology & Architecture

Ok, so R600 sucks. Get over it. Or start asking questions.

Why does it suck? And why doesn't G80? And in what scenarios can R600 suck a lot less, or even trounce G80 seriously? And what makes G80 really shine?

And there are lots of other things to consider as well. Like IQ: it is user-programmable on both, if wanted. Do we want wide-tent filters for each pixel? Is there something better? Is there something that looks at least as good and is cheaper?

I mean, that's what this forum is all about.

And who is to say that we cannot have an impact? Because the guys at ATi are so much more clever than all of us? Many of the very knowledgeable people in the industry are around here.

Think about Humus' Doom3 tweak. If a game sucks on R600, open up the shaders in an editor, and start hacking!
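(If memory serves, the Doom3 tweak boiled down to replacing a specular lookup-texture fetch in the interaction shader with straight arithmetic. The original was OpenGL ARB fragment-program assembly; the HLSL-flavoured sketch below, with made-up texture, sampler and exponent, is only meant to illustrate the general idea of trading a texture fetch for ALU work.)

// Hypothetical resources, for illustration only.
Texture2D    specularLUT;
SamplerState pointSampler;

// Original style: specular falloff fetched from a lookup texture.
float SpecularFromLUT(float NdotH)
{
    return specularLUT.Sample(pointSampler, float2(saturate(NdotH), 0.5f)).r;
}

// Tweaked style: compute the falloff directly, trading the fetch for ALU work.
float SpecularFromMath(float NdotH)
{
    return pow(saturate(NdotH), 16.0f);
}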
 
Many of the very knowledgeable people in the industry are around here.

What I like about B3D, and especially this section, is that you can always learn something new and keep up with the latest advancements in 3D tech easily. Honestly, before I came to this site, I didn't even know what a PS was.
 
Ok, so R600 sucks. Get over it. Or start asking questions.

Why does it suck? And why doesn't G80? And in what scenarios can R600 suck a lot less, or even trounce G80 seriously? And what makes G80 really shine?

If ATI had managed to get current leakage under control, it would really rock.

Something must be wrong either with their design, the 80nm process, or a combination of the two.
 
And in what scenarios can R600 suck a lot less, or even trounce G80 seriously?

The R600 spanks the G80 seriously in the geometry shader. The GS is essentially useless on G80 as performance drops exponentially as you increase workload. We've measured R600 to be up to 50 times faster in some cases.
 
The R600 spanks the G80 seriously in the geometry shader. The GS is essentially useless on G80 as performance drops exponentially as you increase workload. We've measured R600 to be up to 50 times faster in some cases.

Are these practical use cases? The DX10 SDK samples seem to show R600 being up to 2x faster than the GTS in GS usage scenarios but losing in others.
 
50x is the worst-case scenario for G80. The G80 is sort of saved by the upper limit on GS output. If you could output more than 1024 scalars, chances are the gap would be even bigger. Essentially the deal is that if you do things very DX9-style, the G80 can keep up, but for DX10-style rendering, like if you output more than just the input primitive, performance starts to drop off at an amazing rate. You don't need much amplification for it to become really bad. In real-world cases it might not be 50x, but you'll probably see much larger deltas than you're used to. In the previous generation you'd see maybe a 2-3x gap in dynamic branching in the best case, but for the GS you'll probably find real-world cases where the delta is more like 5-10x.
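(To put "output more than just the input primitive" in concrete terms, here is a small D3D10 HLSL sketch of an amplifying GS; the vertex struct, copy count and per-copy transform are all invented for illustration. The point is simply that the declared output, and the work per input primitive, grow with the amplification factor.)

struct GSVert
{
    float4 pos : SV_Position;
    float2 uv  : TEXCOORD0;
};

// Emits 8 offset copies of every input triangle: 24 vertices out per
// triangle in, versus 3 for a simple pass-through.
[maxvertexcount(24)]
void CloneGS(triangle GSVert input[3], inout TriangleStream<GSVert> stream)
{
    for (int copy = 0; copy < 8; ++copy)
    {
        for (int v = 0; v < 3; ++v)
        {
            GSVert o = input[v];
            o.pos.x += 0.1f * copy;   // arbitrary per-copy transform
            stream.Append(o);
        }
        stream.RestartStrip();        // each copy is its own triangle
    }
}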
 
Cool, it'll be interesting to see how that develops and whether R600's GS advantage ever comes into play and overshadows its relative shortcomings. Based on what Rys and Andy mentioned about G80's behavior, it seems Nvidia's handling of GS threads is pretty primitive.
 
50x is the worst-case scenario for G80. The G80 is sort of saved by the upper limit on GS output. If you could output more than 1024 scalars, chances are the gap would be even bigger. Essentially the deal is that if you do things very DX9-style, the G80 can keep up, but for DX10-style rendering, like if you output more than just the input primitive, performance starts to drop off at an amazing rate. You don't need much amplification for it to become really bad. In real-world cases it might not be 50x, but you'll probably see much larger deltas than you're used to. In the previous generation you'd see maybe a 2-3x gap in dynamic branching in the best case, but for the GS you'll probably find real-world cases where the delta is more like 5-10x.
Yeah, the R600 is still 2 times slower in branching and 5-10 times faster in the GS. But does it really matter if it's only 1-10% of the workload of a scene?
 
If the delta is as large as Humus claims, it'll be a miracle to have heavy GS loads for this generation. Not to mention that we don't know what we're multiplying by 50. Being 50 times faster than a no-legged dog doesn't necessarily make you Ben Johnson.
 
I'm more interested in what this says about Xenos's performance. No one has ever been able to benchmark Xenos, and we have simply assumed it must be really fast because of how great its architecture looks on paper.

But R600 has an even nicer architecture on paper and quite a bit more raw power, yet it seems to be performing pretty poorly considering all that.

Does that mean Xenos may also be a fair bit slower than its paper specs suggest?
 
The R600 spanks the G80 seriously in the geometry shader. The GS is essentially useless on G80 as performance drops exponentially as you increase workload. We've measured R600 to be up to 50 times faster in some cases.
Aha my suspicions were correct :)

That said, I think G80 is a bit hampered by drivers right now, which I expect to improve. I mentioned this in the other thread, but even making an "identity" geometry shader with a max vertex count of 3 halves the performance of a simple shader right now... there's really no excuse for that on the hardware side as far as I can tell, so I'm sure more work can be done in the software. (A minimal "identity" GS is sketched after this post for reference.)

I'm glad R600 has a high-performance geometry shader, but I guess only time will tell how much use it gets. GS certainly has its place but honestly I'm skeptical about the current "transform differently and CLONE!" examples... while some have merit, others could easily be handled with multiple rendering passes arguably more efficiently (especially with the significantly reduced CPU overhead of rendering in D3D10).

I'm certainly waiting for good geometry shader demos though :)
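(For reference, a minimal "identity" geometry shader in D3D10 HLSL looks roughly like the sketch below; the vertex layout is invented, but the shape is the point: one triangle in, the same triangle out, with nothing extra for the hardware to do.)

struct Vtx
{
    float4 pos : SV_Position;
    float2 uv  : TEXCOORD0;
};

// Pass-through GS: forwards the input triangle unchanged.
[maxvertexcount(3)]
void IdentityGS(triangle Vtx input[3], inout TriangleStream<Vtx> stream)
{
    stream.Append(input[0]);
    stream.Append(input[1]);
    stream.Append(input[2]);
    stream.RestartStrip();
}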
 
Yeah, the R600 is still 2 times slower in branching and 5-10 times faster in the GS. But does it really matter if it's only 1-10% of the workload of a scene?
Do you really think there would be a reason to use GS if it's only 1% of a scene?

Not taking this into account won't lead anywhere; there is absolutely no reason to use the GS if it takes up only 1-10% of the scene, as the goal is primarily to replace textures with triangles in order to increase realism.
 
There are more uses for the GS. And replacing textures with triangles to increase realism... maybe, a helluva long time in the future, when we can handle sub-pixel polys nicely and thus create everything with geometry instead of texturing. Not quite there yet.
 
50x is the worst-case scenario for G80. The G80 is sort of saved by the upper limit on GS output. If you could output more than 1024 scalars, chances are the gap would be even bigger.
Hmm, so did NVidia ask for the cap of 1024 fp32s on the output of the GS? What about the 4-stream limit in streamout (instead of 8 streams, to make it symmetrical with vertex input)? I'm wondering if they asked for that too. (The arithmetic behind that 1024-scalar cap is sketched after this post.)

Hmm...

Jawed
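(A quick back-of-the-envelope on that 1024-scalar cap, sketched in D3D10 HLSL with an invented "fat" output vertex: maxvertexcount multiplied by the output vertex size in 32-bit components has to stay within 1024, so the fatter the output vertex, the less amplification a GS can even declare.)

struct FatVertex
{
    float4 pos    : SV_Position;   //  4 scalars
    float4 color  : COLOR0;        //  4 scalars
    float4 uv01   : TEXCOORD0;     //  4 scalars
    float4 normal : TEXCOORD1;     //  4 scalars
};                                 // 16 scalars per output vertex

// 1024 / 16 = 64, so this shader cannot declare maxvertexcount above 64.
[maxvertexcount(64)]
void FatGS(point FatVertex input[1], inout PointStream<FatVertex> stream)
{
    // The declaration is the interesting part; just pass the point through.
    stream.Append(input[0]);
}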
 
GS certainly has its place but honestly I'm skeptical about the current "transform differently and CLONE!" examples... while some have merit, others could easily be handled with multiple rendering passes arguably more efficiently (especially with the significantly reduced CPU overhead of rendering in D3D10).
There's certainly an argument for saying that the ROPs sit idle if several rendering passes iterate through GS/streamout - but at the same time, there are surely plenty of uses for GS alone (without streamout) as part of a single pass: VS->GS->PS.

And, as for idling ROPs during GS/streamout, I presume that once D3D gets multiple concurrent contexts, this will become a limitation of the past. In theory this stuff all works in Xenos.

Jawed
 
How many games (or any apps) using the GS will be launched in the next 12 months?
Last I heard, AMD wouldn't name even a single one after being directly asked.
Until such an app is released, the GS hardware in R600 is just a bunch of useless transistors.
 
What is likely to happen is that devs won't use the feature much, since they have a 7-month-old architecture they have already been working with, and it tanks with the GS. Add to that the TWIMTBP program and you have a formula for success in nVidia's camp.

I would love to be wrong though.
 
Well, it would be great for games that use Novodex or Havok, if those would supply extended effects that make use of the GS. Half the reason you would want an Ageia card would be for that.
 