Beyond3D Forum

The NEXT LAST R600 Rumours & Speculation Thread (http://forum.beyond3d.com/showthread.php?t=39173)

Geo 08-May-2007 16:14

Might R600 finally (for an ATI part) having PCF support (as required in the DX10 spec) play in there somehow?

If the rumored pricing is correct, then it is likely very indicative. Even if it's an excellent price/performance part, history is pretty conclusive that it's unlikely to be an absolute performance king at that price. AMD has a fiduciary responsibility to their shareholders even to not give away the store. What was the last performance king that launched at the price point the rumor sites are publishing? R360? Edit: Nope, 9800 XT was $499. . . .

nicolasb 08-May-2007 16:28

Quote:

Originally Posted by INKster (Post 983257)
If the "R650 in the Summer" silly season starts in force now, AMD might have an "Osborne Effect" on their hands.
I don't think they would want to undermine the R600 family launch like that.

Much though it pains me to have to agree with Fuad Abazovic about anything, I suspect his guess of "late Q3" (i.e. well into September) is probably a good estimate for R650.

Jawed 08-May-2007 16:34

Quote:

Originally Posted by AnarchX (Post 983122)

So, A13 is still the current variation of the chip, it seems. When did this first appear? I can't remember, November? January?

Jawed

Creig 08-May-2007 16:38

Quote:

Originally Posted by Geo (Post 983275)
Might R600 finally (for an ATI part) having PCF support (as required in the DX10 spec) play in there somehow?

If the rumored pricing is correct, then it is likely very indicative. Even if it's an excellent price/performance part, history is pretty conclusive that it's unlikely to be an absolute performance king at that price. AMD has a fiduciary responsibility to their shareholders even to not give away the store. What was the last performance king that launched at the price point the rumor sites are publishing? R360? Edit: Nope, 9800 XT was $499. . . .

X1950 XTX was fairly close, wasn't it?

aeryon 08-May-2007 16:40

Quote:

Originally Posted by Geo (Post 983221)
Heh, the roadmap/codename juggling that began in 2002 finally rejoins the main line.

Btw, I'm feeling cranky today. If I see another rolly eyes sarcasm-dripping troll in this thread today there will be one or more vacations handed out that will last until a few days after the expected R600 launch. :twisted:

Sorry for your pain Geo, but like you with my post, I've started to be really annoyed by the last 20 pages of single shot framerates showing us nothing except some ATI fanatics trying to convince us R600 is sooooo good...

Hope you understand; I did not want to create any problems here, even if I admit my move was not the smartest one.

best regards
aeryon

CarstenS 08-May-2007 16:41

Quote:

Originally Posted by Geo (Post 983235)
Well, there are multiple possibilities here.

1). They're cheating. Get the pitchforks.
2). Their part is particularly suited to 3DM06. Possible, I suppose.
3). As some people have suggested upstream, their compiler "optimization opportunity" is pretty steep, and they're starting with high profile benchie stuff like 3DM06. . . and those kind of increases will be along for many other apps in the months to come as they get their arms around their new baby.

1) Everyone cheats. ;) In fact: The whole 3D-graphics-industry is all about cheating. ;)

2) Definitely. This and a 5-wide massive MAD-Kernel really makes an R600 shine.

3) Less likely. Though I agree that there's still performance buried somewhere in the 20 million lines of Catalyst code, I doubt that a 10 to 20 percent boost will be the standard we're gonna see in games. I'd be more inclined to make this 5 to 10 percent.

Geo 08-May-2007 16:41

Quote:

Originally Posted by Jawed (Post 983282)
So, A13 is still the current variation of the chip, it seems. When did this first appear? I can't remember, November? January?

What, not A15? :lol:

November is a bit early for A13 I think, at least if you mean coming back. . .

Geo 08-May-2007 16:44

Quote:

Originally Posted by Quasar (Post 983290)
1) Everyone cheats. ;) In fact: The whole 3D-graphics-industry is all about cheating. ;)


Oh, let's do that one in a different thread! Tho I understand what you mean there, re the whole mindset being how to approximate reality with as few resources as possible. I suppose since that's the mindset in the beginning, it can have some unpleasant flow-through consequences that might not seem all that odd or objectionable to the folks engaged in them.

But, like I said, "good point, wrong thread". :wink:

{Sniping}Waste 08-May-2007 16:47

Denny AKA Guess2098 has an HD2900XT 1 gig GDDR4 and it looks like there's no NDA on him. He has older drivers but might be able to get newer ones later. These are the latest benches, run with an OC but with the stock cooler. Stock speeds for the card he has are 750/1009. Here's the thread: http://www.xtremesystems.org/forums/...d.php?t=143104
http://www.iamxtreme.net/video/r600/2900XT1024_06.PNG

http://www.iamxtreme.net/video/r600/2900XT1024_07.PNG

http://www.iamxtreme.net/video/r600/2900XT1024_08.PNG

mao5 08-May-2007 17:00

Quote:

Originally Posted by Razor1 (Post 983250)
Because look at the crap Mao was posting for the past 2 weeks: he says it's way better than the GTX, but shows screenshots that aren't even close to the same, and when they are, the frame rates come out very close. Coincidence? I don't think so.

Crap Razor1, you can compare a default-clock 2900XT with an extreme OC 8800 GTX, you always can; I have strong faith in your crap comparisons.

Razor1 08-May-2007 17:02

Quote:

Originally Posted by mao5 (Post 983301)
Crap Razor1, you can compare a default-clock 2900XT with an extreme OC 8800 GTX, you always can; I have strong faith in your crap comparisons.


When did I do that? :grin: I don't put much stock in DT's benchmarks :wink:

Razor1 08-May-2007 17:03

Quote:

Originally Posted by Geo (Post 983291)
What, not A15? :lol:

November is a bit early for A13 I think, at least if you mean coming back. . .


I think A13 came back at end of Feb.

Geeforcer 08-May-2007 17:07

After reading this thread one thing is obvious: XT 2900 will score between 10 and 14K in 3DMark06. It's all clear now!

Jawed 08-May-2007 17:18

Quote:

Originally Posted by Mintmaster (Post 983003)
I don't know how I can explain it any simpler than I did in the last post. For the last time, DC is not talking about a vector instruction on R300 vs. a vector instruction on R600. He is simply pointing out that if ADD co-issue wasn't done on R300, no big deal. If scalar co-issue isn't done on R600, it is a big deal. He's talking about extracting the max throughput possible. Shader code is more likely to have lots of scalar and vec2 code than a crapload of ADDs.

Which will run at lower utilisation on R300 than on R600 if no co-issue is possible (due to dependency). Further, for scalar or vec2 instructions, any co-issue that can be identified by the R300 compiler is going to work on R600. So, again, R600 comes out better than R300.

Quote:

ATI obviously put a lot of work and die area into making R600 a 5x1D architecture, clearly for the purpose of improving speed more than could be done by extra Vec4+1D units instead. Not getting co-issue right in the compiler for R600 would be a lot more damaging (especially in comparison to G80) than with R300.

There's a lot of low-hanging fruit in co-issue for this architecture, though. As I keep saying, vec3/vec4 instructions make a mockery of the suggestion that ATI would be struggling with a compiler that can co-issue in the most basic cases. And that basic case makes up a lot of code.

Unravelling dependencies and eliminating dead code amongst vector instructions is what the set of static single assignment patent applications is all about. Sure, they make DemoCoder sneer in their obviousness - but they lie at the heart of making the compiler do the non-trivial co-issues that you guys are so desperate to show are impossibly complicated and bound to make R600 fall on its own sword.

I've got no argument that there are corner cases of tightly-dependent code that will run like a dog on R600 and I'm under no illusion that co-issue is generally trivial. The devrel guys are always begging gamedevs to explicitly mask their outputs - so that should be clue enough that the compiler guys have a hard time...
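To make the co-issue argument a bit more concrete, here is a toy Python model. It is only a sketch under loud assumptions, not ATI's compiler: a bundle is assumed to have 5 scalar lanes (the 5x1D arrangement discussed above), `Instr`, `pack` and `utilisation` are made-up names, and the packer is a naive in-order greedy one. An instruction joins the current bundle only if it fits in the free lanes and does not read a result produced in that same bundle.

```python
from dataclasses import dataclass, field

LANES = 5  # assumption: one bundle = 5 scalar lanes (5x1D)


@dataclass
class Instr:
    name: str
    width: int                               # components written, 1..4
    deps: set = field(default_factory=set)   # names of instructions it reads


def pack(instrs, lanes=LANES):
    """Naive in-order greedy bundling (hypothetical, for illustration only)."""
    bundles, current, used = [], [], 0
    for ins in instrs:
        in_bundle = {i.name for i in current}
        # Start a new bundle if there is no room, or if this instruction
        # depends on something issued in the current bundle.
        if current and (used + ins.width > lanes or ins.deps & in_bundle):
            bundles.append(current)
            current, used = [], 0
        current.append(ins)
        used += ins.width
    if current:
        bundles.append(current)
    return bundles


def utilisation(bundles, lanes=LANES):
    """Fraction of lane-slots doing useful work across all bundles."""
    busy = sum(i.width for b in bundles for i in b)
    return busy / (lanes * len(bundles))


# Independent vec3/scalar/vec4 mix: packs tightly.
easy = [Instr("a", 3), Instr("b", 1), Instr("c", 1), Instr("d", 4), Instr("e", 1)]
# Tightly dependent scalar chain: nothing can be co-issued.
chain = [Instr("s0", 1), Instr("s1", 1, {"s0"}), Instr("s2", 1, {"s1"})]

print(len(pack(easy)), utilisation(pack(easy)))
print(len(pack(chain)), utilisation(pack(chain)))
```

In this model the independent mix fills every lane, while the dependent scalar chain leaves four of five lanes idle in each bundle. That is the corner case conceded above; a real compiler would reorder instructions and be considerably smarter than this.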

Quote:

Oh, okay.

So basically you're suggesting the same rationale that I did previously. The batch size is too small for one quarter of the ALUs to all operate on the same channel like in G80.

No, my suggestion is that having a 4-way tiled architecture, they wouldn't then want to split-up each of the 4 tiles of batch scheduling with a more advanced sequencer. The sequencer from R5xx is enough (one ALU batch, one texture batch - roughly speaking) instead of using the Xenos sequencer (multiple ALU batches and one texture batch - roughly speaking).

Quote:

I still think that's the way to go. Just use a 64-pixel batch size. It won't make that much difference, and I certainly think it would be much less than from near-perfect utilization.

But now you have spent more die space on the sequencers to get the same batch size. The payback is that two consecutive and dependent scalar or vec2 instructions will run at full speed. Maybe the payback isn't worth it?

---

Last night I realised that R600's ALU organisation offers another fundamental advantage (a direct inheritance from Xenos, prolly). Since each of the four shader units contains five 16-way ALUs, ATI's fine-grained ALU-redundancy scheme works a charm. In this setup, a 17th ALU pipeline is added to each array. So the redundancy overhead is 6%. (The theory is that each of Xenos's three 16-way ALUs have a 17th pipeline for redundancy.)

If R600 was implemented as lots of smaller ALUs (e.g. to build a sequential scalar GPU like G80) then the redundancy overhead would be significantly higher.
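The arithmetic behind that can be sketched in a few lines of Python. The one-spare-per-array scheme is the theory stated above, not a confirmed design, and the narrower array widths are purely hypothetical comparison points; `redundancy_overhead` is a made-up helper name.

```python
def redundancy_overhead(width, spares=1):
    """Extra ALU pipelines added for redundancy, as a fraction of the array width."""
    return spares / width


# 16-way is the array width discussed above; 8 and 4 are hypothetical
# narrower arrays, each assumed to carry one spare pipeline.
for width in (16, 8, 4):
    print(f"{width}-way array + 1 spare: {redundancy_overhead(width):.2%} overhead")
```

With one spare per array the overhead is 1/16 (6.25%, the "6%" above) for a 16-way array, and it doubles each time the array width halves.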

Again, another indication of evolution...

Jawed

Jawed 08-May-2007 17:23

Quote:

Originally Posted by Geo (Post 983291)
What, not A15? :lol:

November is a bit early for A13 I think, at least if you mean coming back. . .

http://www.hexus.net/content/item.php?item=7437

Which implies A13 was just about to pop out.

If we're at A15, then, hairy muff...

Jawed

Geo 08-May-2007 17:30

Quote:

Originally Posted by Jawed (Post 983315)
http://www.hexus.net/content/item.php?item=7437

Which implies A13 was just about to pop out.

If we're at A15, then, hairy muff...

Jawed

A15 was me funnin'. Somebody or other was claiming that a few months back. :smile: I see what you see on that chip shot. I suspect the Hexus piece was closer to when the order for A13 went to the fab than when it came back. . .

chavvdarrr 08-May-2007 17:30

Quote:

Originally Posted by Jawed (Post 983312)
Last night I realised that R600's ALU organisation offers another fundamental advantage (a direct inheritance from Xenos, prolly). Since each of the four shader units contains five 16-way ALUs, ATI's fine-grained ALU-redundancy scheme works a charm. In this setup, a 17th ALU pipeline is added to each array. So the redundancy overhead is 6%. (The theory is that each of Xenos's three 16-way ALUs have a 17th pipeline for redundancy.)

If R600 was implemented as lots of smaller ALUs (e.g. to build a sequential scalar GPU like G80) then the redundancy overhead would be significantly higher.

Again, another indication of evolution...

Jawed

ehm... and what if there are 18 or 19 ALUs? Or if the whole redundancy scheme is different...
I just don't get why a hypothetical scheme should be considered as evolution.

Jawed 08-May-2007 17:42

Quote:

Originally Posted by Arnold Beckenbauer (Post 983159)
http://www.forum-3dcenter.org/vbulle...74#post5474374
Click on "spoiler" to see all graphics.

8 SPs per SM (streaming multi-processor), 512 threads per SM???

Why is the SFU "outside"?

Hm, is it possible, that it's G80?

Clicking on the spoiler, the top diagram shows NV4x/G7x ALUs.

The set of slides that follows is about writing good code. The first slide is to encourage developers to specify output masks on their code. The masks then allow the compiler to simply co-issue.

These examples are for old ATI hardware, though they'll continue to be relevant. They appear on slides 51 and 46 of:

http://ati.amd.com/developer/Dark_Se...r_Dev-Mojo.pdf

Jawed

Jawed 08-May-2007 17:51

Quote:

Originally Posted by chavvdarrr (Post 983319)
ehm... and what if there are 18 or 19 ALUs? Or if the whole redundancy scheme is different...
I just don't get why a hypothetical scheme should be considered as evolution.

GRAPHICS PROCESSING LOGIC WITH VARIABLE ARITHMETIC LOGIC UNIT CONTROL AND METHOD THEREFOR

Jawed

Robin B 08-May-2007 19:17

Does anyone here know if the R600 comes with an 8-pin PCI-E adapter? I think it could make or break some sales; not many PSUs have one yet.

Geo 08-May-2007 19:51

I don't know that anyone's seen a retail pack yet to say that for sure. Places like Newegg are generally pretty good about showing pictures of all the accoutrements, so checking there when available might be the way to go.

Mintmaster 08-May-2007 20:59

Quote:

Originally Posted by Jawed (Post 983312)
Which will run at lower utilisation on R300 than on R600 if no co-issue is possible (due to dependency).

Forget about it Jawed. You don't get it. This isn't about % ALU utilization on R300 vs. R600. It's about the importance of compiler co-issue for competitiveness.

The extra ADD doesn't use much die space on R300, nor does its usage make much difference in most shader code. Neither applies to R600's co-issue. NVidia said somewhere they get nearly 2x speed boost by going scalar, and looking at PS benchmarks I believe them.

Quote:

But now you have spent more die space on the sequencers to get the same batch size. The payback is that two consecutive and dependent scalar or vec2 instructions will run at full speed. Maybe the payback isn't worth it?

Why do we need more sequencers? You're really twisting my Xenos example around with the whole smaller Xenos quadrupled interpretation.

Forget about that example for now. Take R600, and assume the batch size is 64 pixels. Now make the single ALU array per quarter act on one channel of all pixels instead of all channels of 16 pixels. That's pretty much the sum of it (ignoring the SF details).

----------------

HOWEVER, I realized that I forgot something. Latency requires you to cycle between batches. An ALU array in R600 takes 4 cycles to get through a batch, so a 12-cycle instruction latency, for example, requires you to have three batches in a cycle that switches every 4 clocks. My design would need 12 batches in the cycle that switches every clock. That's a lot more data shuffling.
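The batch counts above follow from a simple ceiling division, sketched here in Python. The 12-cycle latency is only the example figure used in this post, and `batches_in_flight` is a hypothetical helper name.

```python
import math


def batches_in_flight(latency_cycles, cycles_per_batch):
    """Batches that must rotate through an ALU array so that a result is
    ready by the time its batch comes around again."""
    return math.ceil(latency_cycles / cycles_per_batch)


LATENCY = 12  # example instruction latency from the post, not a measured figure

# R600 as described: an array spends 4 cycles on each batch.
print(batches_in_flight(LATENCY, 4))
# One channel of all 64 pixels per clock: the array moves on every cycle.
print(batches_in_flight(LATENCY, 1))
```

Under these assumptions the per-channel design has to keep four times as many batches rotating through each array, which is the extra data shuffling mentioned above.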

I guess there is indeed a legitimate reason for going the co-issue route.

Russell 08-May-2007 21:04

Quote:

Originally Posted by Robin B (Post 983379)
Does anyone here know if the R600 comes with an 8-pin PCI-E adapter? I think it could make or break some sales; not many PSUs have one yet.

When HIS's site posted the R600 product page early, I did note that there was no reference to the 8-pin adapter in the accessories list. Whether that is the case, or is the case for all manufacturers, I do not know.

w0mbat 08-May-2007 21:05

http://www.imagebanana.com/img/b11ska11/xh29.jpg

Dave Baumann 08-May-2007 21:18

Quote:

Originally Posted by Mintmaster (Post 983432)
HOWEVER, I realized that I forgot something. Latency requires you to cycle between batches. An ALU array in R600 takes 4 cycles to get through a batch, so a 12-cycle instruction latency, for example, requires you to have three batches in a cycle that switches every 4 clocks.

If you take Xenos as an example, it is built around an 8-cycle ALU latency - each SIMD array has two sequencers/arbiters to handle and swap between two separate threads (executing over 4 cycles each).


All times are GMT +1.