Beyond3D Forum (http://forum.beyond3d.com/index.php)
-   Pre-release GPU Speculation (http://forum.beyond3d.com/forumdisplay.php?f=51)
-   -   The NEXT LAST R600 Rumours & Speculation Thread (http://forum.beyond3d.com/showthread.php?t=39173)

Kaotik 10-May-2007 00:07

Quote:

Originally Posted by digitalwanderer (Post 984035)
Did they mention what drivers they used in the review?

if you mean the it-review thing, no, they didn't.
they also "forgot" to mention nv drivers, chipset drivers, driver settings besides aa/af "level", operating system etc etc.

digitalwanderer 10-May-2007 00:08

I can forgive 'em that, but without knowing the drivers used I can't tell spit about performance. :???:

leoneazzurro 10-May-2007 00:10

Quote:

Originally Posted by Julidz (Post 984034)
hmm, so it isn't comparable

just one more question

R600's architecture is Vec5 (vec4 + 1 scalar), ok?

so, it can do a vec4 + vec1 instruction per clock?

It's not really simply "vec5", but yes, it can do vec4 + scalar.

nAo 10-May-2007 00:10

Quote:

Originally Posted by leoneazzurro (Post 984033)
Yes, but I already said G80 has a huge advantage in filtering. But when unfiltered data are requested, R600 has more units than G80.

maybe they decided GPGPU is more important than games for them
Quote:

PS: I saw a slide in which they claimed 10x filtering improvement on R580... seems really strange to me.
R580 could only filter FP16 and FP32 textures using shaders... wouldn't be surprised if they're comparing their new hw implementation to the old software implementation.

INKster 10-May-2007 00:11

Quote:

Originally Posted by Topman (Post 984021)
Raptor from techzone.pt has tested a DX10 demo (CoJ?) in HD 2900XT... < 15 fps in 1024 x 768... 8.37 drivers...

Time to forget high-end cards from ATi...


I think HD 2600XT will be a good card for your segment...

:)

He's not getting optimal results yet due to driver issues and problems regarding the somewhat "hacked" 8pin auxiliary power plug. ;)
BTW, it's techzonept.com, not techzone.pt. :D

SugarCoat 10-May-2007 00:13

Quote:

Originally Posted by digitalwanderer (Post 984037)
I can forgive 'em that, but without knowing the drivers used I can't tell spit about performance. :???:

Launch drivers are just as useless in my opinion. I don't call a battle won/lost until it's at least 6-8 months after a launch, especially with a new architecture. The R520 was pretty weathered (badly delayed and already released for 3-4 months, if I remember right) when it got its app-specific OpenGL improvements. Personally, if it does indeed perform as badly as it does in this many titles at launch, I'm blaming driver immaturity.

Kaotik 10-May-2007 00:13

Quote:

Originally Posted by INKster (Post 984040)
He's not getting optimal results yet due to driver issues and problems regarding the somewhat "hacked" 8pin auxiliary power plug. ;)
BTW, it's techzonept.com, not techzone.pt. :D

6pin vs 8pin power plug won't change your FPS at all. The card runs at default and even OCs with 2x6pin just fine; it's just that Overdrive won't work with 2x6pin.

Bob 10-May-2007 00:16

Quote:

Yes, but I already said G80 has a huge advantage in filtering. But when unfiltered data are requested, R600 has more units than G80. And it also has a 20% higher clock, which also shortens the distance in filtering.
Well, if you feel like comparing property A on GPU X with property B on GPU Y gives you a reasonable basis for comparison, more power to you!

Hint: Check your units.

leoneazzurro 10-May-2007 00:16

Quote:

Originally Posted by nAo (Post 984039)
maybe they decided GPGPU is more important than games for them

I don't think so, it's still a small niche market

Quote:

Originally Posted by nAo (Post 984039)
R580 could only filter FP16 and FP32 textures using shaders... wouldn't be surprised if they're comparing their new hw implementation to the old software implementation.

What if in R600 they could use both HW and SW shader filtering?
Anyway, this makes me wonder... if R580 is SO texture limited, then the whole argument that most current games are texture limited falls apart... because R580 doesn't perform so badly even compared to an 8800 GTS. And R600 has WAY more power than R580...

Julidz 10-May-2007 00:17

Quote:

Originally Posted by leoneazzurro (Post 984038)
It's not really simply "vec5", but yes, it can do vec4 + scalar.

that's because I've read in the Inquirer that it's not vec4 + scalar, but superscalar vec5... what the hell?

leoneazzurro 10-May-2007 00:19

Quote:

Originally Posted by Bob (Post 984043)
Well, if you feel like comparing property A on GPU X with property B on GPU Y gives you a reasonable basis for comparison, more power to you!

Hint: Check your units.

Sorry, I don't understand what you are saying.
Are you telling me G80 has 256 of something similar to R600's FP32 samplers?
Are you suggesting that 64 of R600's units can only fetch for the filtering units, so they are not available for fetching unfiltered samples?
What I know from reading is that G80 has 64 texture fetch and 64 texture filtering units. I could be wrong.

PS: I'm not an expert, I'm only a person wanting to learn :)

PS2: I understand now what you're saying, but maybe it was only me that could not explain well :)
I tried to say that, having less filtering power (fewer filtering units) per clock but higher clocks, R600's filtering power is closer to G80's than the mere unit comparison suggests.
And it was not tied to what I was saying about fetch units.
Conclusion: if R600 has half the filtering power of G80 per clock but a 20% higher clock, its theoretical peak filtering rate is 60% of G80's.
R600 has more fetch units and a 20% higher clock, so in theory it should perform better in this regard, unless there are dependencies.
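
The arithmetic in that conclusion can be checked with a few lines of Python. This is only a sketch of the "units x clock" reasoning used in the post; the 0.5 units ratio and the 1.2 clock ratio are the figures assumed in the thread, not confirmed specifications, and `relative_peak` is a hypothetical helper name.

```python
def relative_peak(units_ratio: float, clock_ratio: float) -> float:
    """Theoretical peak rate of chip A relative to chip B, taken as units x clock."""
    return units_ratio * clock_ratio

# Figures assumed in the post: half the filtering units per clock, 20% higher clock.
filtering_ratio = relative_peak(units_ratio=0.5, clock_ratio=1.2)
print(f"relative peak filtering rate: {filtering_ratio:.0%}")
```

The same helper applies to the fetch-unit comparison once a units ratio is known; the post does not give one, so none is filled in here.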

digitalwanderer 10-May-2007 00:21

Quote:

Originally Posted by SugarCoat (Post 984041)
Launch drivers are just as useless in my opinion. I don't call a battle won/lost until it's at least 6-8 months after a launch, especially with a new architecture. The R520 was pretty weathered (badly delayed and already released for 3-4 months, if I remember right) when it got its app-specific OpenGL improvements. Personally, if it does indeed perform as badly as it does in this many titles at launch, I'm blaming driver immaturity.

I agree/disagree with you. I think you have to wait 3-4 months until it gets fully optimized, but I think launch drivers are real damned important because that's what most people are going to look at.

Bad launch drivers = bad launch, IMHO.

Geeforcer 10-May-2007 00:26

I am with Digi. You can only get away with bad launch drivers if your hardware is otherwise absolutely superior. People who have been waiting for an upgrade are not going to wait ANOTHER 3-4 months to see how the drivers shake out. 6-8 months is crazy - by that point the replacement cards will be coming in.

INKster 10-May-2007 00:27

Quote:

Originally Posted by Kaotik (Post 984042)
6pin vs 8pin power plug won't change your FPS at all. The card runs at default and even OCs with 2x6pin just fine; it's just that Overdrive won't work with 2x6pin.

Precisely.

Geo 10-May-2007 00:30

Quote:

Originally Posted by digitalwanderer (Post 984047)
Bad launch drivers = bad launch, IMHO.

<kaff> Radeon 8500 <kaff>

Nvidia thinks they got turned over a slow spit for G80 Vista drivers, but that'd have been nothing compared to the howling they'd have faced if R600 had launched around the same time with working Vista drivers.

digitalwanderer 10-May-2007 00:31

Quote:

Originally Posted by Geo (Post 984054)
<kaff> Radeon 8500 <kaff>

Ah, but you forget I'm one of the minority who think that was the last of the awful ATi cards and not the start of the good ones. ;)

I never cared for the 8500. :razz:

Geo 10-May-2007 00:33

Neither here nor there --just a gruesome example of how icky drivers on launch can break your heart (and damage your part's reputation).

Silent_Buddha 10-May-2007 00:39

Quote:

Originally Posted by SugarCoat (Post 984031)
the number of reviewers that do CPU reviews at real world high res high quality settings is almost zero, the reason being that all the bars in their pretty graphs are within single-digit frames of each other, so there is no point to it.

One of the more recent examples I can remember of how useless CPU speed is at high res gaming on a new game was in Oblivion and Driverheaven's 8800 launch review. They did the test at 1920x1200 with 4AA/16AF at 2.66GHz and then at 3.6GHz, ~40% improvement in clock speed; the result was a whopping 0-2% without HDR and a MAX of 5% improvement with HDR, and this is from a 40% increase in clock speed. Therefore coming to the conclusion that a top end AMD processor is going to cause an automatic 15-25% performance loss in all new games (or at least the ones in question in the review) with similar settings or even more taxing ones is quite simply ridiculous.

The most I could grant you is that the 1280x1024 benchmarks might get a noticeable (5-15%) improvement on a 6700, but as I said, the other tests are fine and the trend is pretty much always there, so there is nothing horribly flawed with the results as far as what CPUs were used goes.

That said, there are some instances where games are heavily CPU limited even at extremely high resolutions and graphics quality, especially if they do any sort of physics processing (cloth physics, particle physics, collision physics, etc).

I'm not sure about the games that were tested, but games such as EQ2 and Vanguard show huge differences in framerate even at 1920x1200 or 2560x1600 res.

Especially in EQ2, you'll get much more performance at high resolutions by upgrading your CPU than you will by upgrading your Graphics card.

Granted, neither of those two were benched. However, my point is that any game that uses lots of physics calculations will be both CPU and GPU bound even at very high resolutions. And I would expect Core 2 Duo to have much better performance with regards to physics calculations than an AMD X2.

Benching with different CPUs on different graphics cards is just shoddy, lazy, and in extreme cases biased when you are trying to compare only the graphics cards.

Regards,
SB

Geeforcer 10-May-2007 00:41

Quote:

Originally Posted by Geo (Post 984054)
<kaff> Radeon 8500 <kaff>

Nvidia thinks they got turned over a slow spit for G80 Vista drivers, but that'd have been nothing compared to the howling they'd have faced if R600 had launched around the same time with working Vista drivers.

I will say this: R600 drivers better be delivery-room clean after all the "ggpwnd" statements they've made.

Luminescent 10-May-2007 01:08

Any word on whether R600's texture filtering and addressing arrays are globally available to all units for fetch and filter, or are they limited to certain ALU groups à la G80 and R580?

Jawed 10-May-2007 02:06

Quote:

Originally Posted by Mintmaster (Post 983958)
Yet AGAIN you're missing the point. Nobody ever said ATI can't do it. DemoCoder simply asserted that it's more important to be good at it now.

I disagree, because if you treat R600 as a vec4+scalar architecture and feed it the same code as for R300 (vertex or pixel shader), the throughput will be no worse.

Actually the DX8 modifiers seem like they could be a sore point in R600. What's the betting that, at the very least, source modifiers have to be issued as a distinct instruction? Pretty certain I'd say. So R600 will be slowed down compared with R300. Even the best compiler in the world can't make R600 fully overcome this deficit since R600's not wide enough for all combinations. But R300 has a pipeline hazard related to the DX8 modifier which means that R600 will claw back its loss in situations where R300 had to issue a NOP on the main ALU's prior clock.

The difference with R600 is that it's capable of running at higher instruction throughput and should average a higher percentage of its theoretical peak FLOPs. So the compiler writers have got something to get their teeth into.

It's important for maximum performance that the compiler is good. The difference is that this pipeline has a higher baseline to work from, even with a compiler that can do no more than issue a vec4/vec3 and (optionally) issue an alpha channel instruction, or issue a special function. Compared with R300, R600 has fewer corner cases. Well, at least on the surface ahead of NDA, anyway.

The way I see it, the dumbest compiler will get more out of the R600 pipeline than the same dumb compiler on R300. There are some corner cases centred on the DX8 modifier mini-ALU - see the CTM documentation if you feel like enumerating them. The example code snippet I gave last night is one of them, actually...

But, in conclusion, I think the stark simplicity of R600's ALU pipeline, when presented with vec3/vec4 + scalar code, makes it extremely hard to argue that "it will run worse than R300 unless the compiler is significantly better than R300's." At the same time I agree, R600 will benefit from a tricksy compiler, and those tricksy bits are new, uncharted, territory. When done right it'll show R300 a clean pair of heels.
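
To make the "dumb compiler versus tricksy compiler" argument concrete, here is a toy packing sketch. It is not ATI's compiler or the real R600 issue rules: it assumes five scalar lanes per bundle, treats every op as independent (no register dependencies, no DX8 modifiers), and the op list is invented for illustration.

```python
from typing import List, Tuple

LANES = 5  # assumption: five scalar lanes per ALU bundle

Op = Tuple[str, int]  # (hypothetical op name, number of components it uses)

def pack_vec_plus_scalar(ops: List[Op]) -> List[List[Op]]:
    """'Dumb' packing: one op per bundle, plus the next op if it is a lone scalar."""
    bundles, i = [], 0
    while i < len(ops):
        bundle = [ops[i]]
        nxt = i + 1
        if ops[i][1] > 1 and nxt < len(ops) and ops[nxt][1] == 1:
            bundle.append(ops[nxt])  # co-issue the scalar alongside the vector op
            i += 1
        bundles.append(bundle)
        i += 1
    return bundles

def pack_greedy(ops: List[Op]) -> List[List[Op]]:
    """'Tricksy' packing: keep adding ops in order while free lanes remain."""
    bundles, current, used = [], [], 0
    for op in ops:
        if used + op[1] > LANES:
            bundles.append(current)
            current, used = [], 0
        current.append(op)
        used += op[1]
    if current:
        bundles.append(current)
    return bundles

def utilisation(bundles: List[List[Op]]) -> float:
    """Fraction of lane slots that carry useful work."""
    used = sum(width for bundle in bundles for _, width in bundle)
    return used / (LANES * len(bundles))

ops = [("mul", 3), ("rcp", 1), ("add", 2), ("mad", 2), ("dp", 4), ("sin", 1)]
dumb = pack_vec_plus_scalar(ops)
greedy = pack_greedy(ops)
print(len(dumb), utilisation(dumb))
print(len(greedy), utilisation(greedy))
```

With this made-up op stream the vec+scalar packer needs four bundles and the greedy packer needs three, which is the gap a better compiler would be chasing. Real code has dependencies that would shrink that gap.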

Quote:

I still don't agree with this unless you're talking about latency. Changing ops each clock isn't hard as long as you don't need the result immediately.
Well, I think the latency/pipeline-turnaround issue is a big deal.

Quote:

Nearly every pipelined processor in the world, regardless of how simple, does this. You don't save much space by increasing this to more clocks.
It's interesting to note that CPUs with simultaneous multithreading tend to go with just 2 threads (batches, effectively). It really isn't easy to just keep ratcheting-up the number of hardware threads your pipeline will support. For each extra hardware thread you want to support, you have to correspondingly speed-up your "search" across available threads to identify what's issuable.

Quote:

I think one of the reasons we're having difficulty communicating is differing terminology when you say "batches in flight".
I was careful to specify this at the beginning (sorry, that's a few hundred posts ago now), to mean merely the batch "at the end of the pipe" (or at any single position marked off in the pipeline, effectively) - because the actual number of batches in flight is subject to the pipeline length. Something we don't always know. But we know that both Xenos and R5xx use an 8-clock pipeline, with 2 batches each running for 4 clocks. We just don't know what R600 is and I tried to keep away from that issue.
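
One way to read the "8-clock pipeline, 2 batches each running for 4 clocks" description is as a simple round-robin issue schedule. The sketch below assumes exactly that reading: an 8-clock ALU latency, each batch owning the issue slot for 4 consecutive clocks, and nothing else (no texture stalls, no branching). The function name is hypothetical.

```python
PIPELINE_CLOCKS = 8   # assumed ALU pipeline depth, per the post
CLOCKS_PER_BATCH = 4  # assumed clocks a batch holds the issue slot

def issuing_batch(clock: int) -> int:
    """Index of the batch that issues on the given clock under round-robin."""
    batches_in_rotation = PIPELINE_CLOCKS // CLOCKS_PER_BATCH
    return (clock // CLOCKS_PER_BATCH) % batches_in_rotation

schedule = [issuing_batch(clock) for clock in range(16)]
print(schedule)
```

Under these assumptions batch 0 issues on clocks 0-3 and again on clocks 8-11, so the result of what it issued on clock 0 has had the full 8 clocks to come out of the pipe before the batch issues its next instruction.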

Quote:

Before G80 came it was 512 for R5xx and 6 (albeit enormous ones) for G70. Now you're talking about batches in the immediate vicinity of the ALU arrays, ignoring all the other batches in the pipeline.
Sorry about the confusion, it gets tedious to qualify terminology every time. It might be better to use the CPU terminology "hardware thread" I guess.

Quote:

Simple cycling between batches for predictable ALU instructions isn't hard, and doesn't add measurably to the sequencer complexity. The tough task is managing the many threads in flight that are waiting for texture fetch results. I don't consider what you're talking about to be sequencer complexity. I think the complexity arises from the larger data pool that each stage of the ALUs need to select from.
In order to fill your pipeline's hardware threads, you have to have hardware that's fast enough to "survey" the status of all available batches.

A batch is in one of a number of states:
  1. executing as both ALU and TU hardware threads (will wait for both to finish)
  2. executing as an ALU hardware thread (waiting for clause completion)
  3. executing as a TU hardware thread (waiting for texturing result to appear in destination register, will then enter state 4)
  4. waiting to be issued to ALU
  5. waiting to be issued to TU
  6. waiting for instruction cache page-in
  7. [other stuff I can't think of right now...]
TU hardware threads have indeterminate latency, but in theory ALU hardware threads are fully predictable. Erm, except for when there's dynamic branching in the clause that's issued, etc.
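
The state list above maps naturally onto an enumeration plus a "survey" loop. This is a minimal sketch, not how any real sequencer is built: the state names are paraphrased from the list, the catch-all seventh state is left out, and the linear scan in `pick_issuable` is a hypothetical stand-in for whatever parallel logic the hardware uses.

```python
from enum import Enum, auto
from typing import List, Optional

class BatchState(Enum):
    EXEC_ALU_AND_TU = auto()  # 1. executing on both ALU and TU
    EXEC_ALU = auto()         # 2. executing on ALU, waiting for clause completion
    EXEC_TU = auto()          # 3. executing on TU, waiting for texturing result
    WAIT_ALU_ISSUE = auto()   # 4. waiting to be issued to ALU
    WAIT_TU_ISSUE = auto()    # 5. waiting to be issued to TU
    WAIT_ICACHE = auto()      # 6. waiting for instruction cache page-in

def pick_issuable(batches: List[BatchState], wanted: BatchState) -> Optional[int]:
    """Survey all batches and return the index of the first one in the wanted state."""
    for index, state in enumerate(batches):
        if state is wanted:
            return index
    return None

batches = [
    BatchState.EXEC_TU,
    BatchState.WAIT_ICACHE,
    BatchState.WAIT_ALU_ISSUE,
    BatchState.WAIT_TU_ISSUE,
]
next_alu = pick_issuable(batches, BatchState.WAIT_ALU_ISSUE)
next_tu = pick_issuable(batches, BatchState.WAIT_TU_ISSUE)
print(next_alu, next_tu)
```

The cost of the survey grows with the number of batches being tracked, which is the point being made about sequencer speed.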

Quote:

G80 may have more batches in flight in this sense, but the only reason for it to have more total batches in flight is the higher texture throughput.
Well, with the suggestions of Arnold Beckenbauer and leoneazzurro

http://forum.beyond3d.com/showpost.p...postcount=4812

http://forum.beyond3d.com/showpost.p...postcount=4815

http://forum.beyond3d.com/showpost.p...postcount=4847

it looks like R600 has rather more hardware threading than I surmised last night, so that means there's prolly rather more sequencer logic in there, in order to cut the batch size.

This also means that fine-grained ALU redundancy will cost more in terms of overhead, e.g. comparing 1-in-4 with 1-in-16.

So, ahem, R600 right now looks significantly more costly there...
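
A rough way to put numbers on that overhead, assuming "1-in-N" means one spare lane is built for every N working lanes (the post does not spell this out, so treat it as one possible reading; `redundancy_overhead` is a hypothetical helper):

```python
def redundancy_overhead(group_size: int, spares: int = 1) -> float:
    """Extra lanes built per working lane, with `spares` spare lanes per group."""
    return spares / group_size

fine_grained = redundancy_overhead(4)     # 1-in-4
coarse_grained = redundancy_overhead(16)  # 1-in-16
print(f"1-in-4:  {fine_grained:.2%} extra ALU lanes")
print(f"1-in-16: {coarse_grained:.2%} extra ALU lanes")
```

On that reading the finer granularity costs four times as much spare ALU area, 25% against 6.25%.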

Jawed

Jawed 10-May-2007 02:30

Quote:

Originally Posted by Luminescent (Post 984073)
Any word on whether R600's texture filtering and addressing arrays are globally available to all units for fetch and filter, or are they limited to certain ALU groups à la G80 and R580?

I think they'll be restricted, simply because of the difficulty of writing from the TU to an arbitrary register file location.

This question only seems to apply to R600. I think RV630 and RV610 are both small enough that there's only a single shader unit. The reason I say this is that both of them have just one RBE (ROP), 4 pipes. They're just like RV530 and RV510 in this respect - just that they have beefier SIMD and TU configurations inside that single shader unit.

---

You could interpret these diagrams as scalings based upon Xenos architecture. That would require the kind of routing you describe. I don't know how to affirm or deny this...

http://pcweb.mycom.co.jp/articles/20...mages/012l.jpg

Jawed

Rangers 10-May-2007 02:48

So from the it-review, which dovetails with a lot of the other rumors we've heard, X2900 is about 90% of an 8800GTS.

This product is an utter disaster for ATI. Why would anybody even buy it over a G80 product, even an 8800GTS, when it's such a power hog?

I can see sales being next to nil for this. As I said, the it-review shows it inferior to an 8800GTS. Let's say driver polishing or a stronger review CPU can get it to 100% or 105% of an 8800GTS; even so, why would you purchase it when it's so hot and power hungry? Therefore the market for this card is maybe the 10% of people who just really prefer to buy the ATI brand. That's about it. The other 90% of people, the vast majority of sales, will go to Nvidia.

I can't believe how bad ATI has gotten...

WaltC 10-May-2007 02:52

Quote:

Originally Posted by Silent_Buddha (Post 983955)
...
I also seem to remember there was quite a bit of hullabaloo about R300 cheating in order to beat NV35, as well as lots of hand waving that the only reason R300 was faster in early release Half-Life 2 benches was that Valve was obviously purposefully not optimizing the game for NV35....

I don't recall anything remotely close to that during that period...;) What I recall most vividly about the period was that, as nVidia was scrambling to design a competitive nV40 (which took the majority of its design cues directly from R300), nVidia was being roundly and properly called on the carpet for literally advertising nV3x as an 8-pixel-per-clock gpu when it was later discovered to have been a 4-pixel-per-clock gpu from the start--which of course neatly explained why R300 ran away with most everything at the time, and why the performance discrepancies between the gpus mystified many of us for so long. The second best-deserved criticism of nVidia at the time that I recall was that, even while nVidia was bragging publicly about its "128-bit FP pipeline" (fp32) and stating flatly that ATi's R300 "96-bit pipeline" (fp24) was, quote, "not enough," unquote, nVidia was in fact configuring its drivers to run fp16 in the benchmarks where R300 was running fp24, while attempting to maintain the public illusion that nVidia was running fp32 the whole time.

This was found out, too, later on, and in fact it was also revealed that the first couple of official nV3x drivers from nVidia never even permitted fp32 operation of the gpu--but would in fact run at fp16 while reporting to the end user/customer that he was running at fp32 precision--again, even while nVidia PR was boasting about nV3x's wonderful fp32 capabilities. Of course it was obvious that such a bold and shameless sham would be found out, which it was. That also cleared up any minor performance discrepancies that emerged, as when after public pressure nVidia finally released drivers that put nV3x into fp32 mode, running at fp24 the R300 walked all over nV3x running in fp32. nV3x was also horrible at running SM2.0 code of any description compared to R300, which also explained neatly nVidia's vigorous protests about how "ATI and Microsoft were taking 3d gaming in the wrong direction."....;) It was the "wrong direction" for nVidia at the time, but it seemed to be exactly the right direction for everybody else...;) Then there were the benchmarks like Eidos' Tomb Raider benches that starkly showed how poor nV3x was as an SM2.x gpu compared to R300, and nVidia was so incensed that the company pressured Eidos to actually can the benchmark and Eidos knuckled under to the pressure. And last but certainly not least was the hullabaloo begun by Extreme 3d and then our very own B3D, which demonstrated to the world how nVidia had deliberately compromised and cheated the 3dMark benchmark in order to produce benchmark scores that showed much better results than nV3x was actually capable of delivering. I cannot imagine that there is anybody alive who fails to clearly remember that...;)

To cap, I simply do not remember a single thing in the life of nV3x wherein ATi was accused of cheating--at least, I recall no such accusations by people who attempted even a modicum of objectivity about the situation. In terms of a comparison between nV3x and R3xx, the differences in the gpus were so great and so stark that the concept of ATi "having to cheat" to best nV3x in most every category, if not all of them, was simply required by nobody...;) Once we moved beyond the misleading falsehoods deliberately sanctioned and advanced by nVidia for nV3x, and we moved into the areas of fact in terms of what nV3x actually was as compared to R300, the notion of a cheating ATi simply wasn't required to illustrate the differences in the gpu performances.

I want to hasten to add that with nV40 nVidia turned the corner and showed that old dogs indeed can learn new tricks, and joined the 3d party that ATi started with R300, and I think we are all immeasurably better off because of it--even though it is clear to me at least that nVidia had absolutely no choice in the matter at all if it wished to remain a viable 3d gpu company for the long haul. The R3xx-nV3x saga, as sorry as it was, stands as a testament, I think, to the enduring value of competition and how it can improve everyone's lot over time. The companies that fall behind are the companies that fail to learn the new tricks that other companies teach them. It could happen to ATi as easily as it happened to nVidia back in '02, should ATi ever become so comfortable in its position that it feels it can rest on its laurels. I can imagine what a slap in the face R300 must've been in '02 to a cocksure nVidia convinced that after swallowing 3dfx whole it now reigned supreme in perpetuity in the 3d gpu marketplace--especially as nVidia failed to see--as most all of us did--just what a potent new bag of tricks ATi was bringing to the table in terms of the 9700 Pro. Indeed, I was no different, and it took some convincing for me to appreciate R300 at the time for what it was--but as the realization dawned after buying an R300--it was very easy to refrain from ever looking back.

I cannot say what R600 will in fact bring to the table in the current horse race. But my gut feeling is that it is going to be something very, very good, and probably pretty special in a number of ways. That's what I expect, anyway--and it's good to know that we have not very long at all to wait before discovering whether or not my gut feeling is on track. Competition is a wonderful thing!

Jawed 10-May-2007 03:15

Up to 5 ops!!!???
 


What are those 5 ops? Is this the direct ancestor of R600's ALU organisation or is it something rather less exciting?

This question's been bugging me for a while, so hopefully someone understands the capability of the vertex ALU pipeline in ATI SM2/3 hardware.

Jawed


All times are GMT +1.