The NEXT LAST R600 Rumours & Speculation Thread

Discussion in 'Pre-release GPU Speculation' started by Geo, Mar 1, 2007.

Thread Status:
Not open for further replies.
  1. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,532
    Location:
    Winfield, IN USA
    Ah, but you forget I'm one of the minority who think that was the last of the awful ATi cards and not the start of the good ones. ;)

    I never cared for the 8500. :razz:
     
  2. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    Neither here nor there --just a gruesome example of how icky drivers on launch can break your heart (and damage your part's reputation).
     
  3. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,426
    Likes Received:
    10,320
    Although there are some instances where games are also heavily CPU limited even at extremely high resolutions and graphics quality, especially if they do any sort of physics processing (cloth physics, particle physics, collision physics, etc.).

    I'm not sure about the games that were tested, but games such as EQ2 and Vanguard show huge differences in framerate even at 1920x1200 or 2560x1600 res.

    Especially in EQ2, you'll get much more performance at high resolutions by upgrading your CPU than by upgrading your graphics card.

    Granted, neither of those two were benched. However, my point is that any game that uses lots of physics calculations will be both CPU and GPU bound even at very high resolutions. And I would expect a Core 2 Duo to have much better performance with regards to physics calculations than an AMD X2.

    Benching with different CPUs on different graphics cards is just shoddy, lazy, and in extreme cases biased when you are trying to compare only the graphics cards.

    Regards,
    SB
     
    Jawed likes this.
  4. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,320
    Likes Received:
    525
    I will say this: R600's drivers had better be delivery-room clean after all the "ggpwnd" statements they've made.
     
  5. Luminescent

    Veteran

    Joined:
    Aug 4, 2002
    Messages:
    1,036
    Likes Received:
    0
    Location:
    Miami, Fl
    Any word on whether R600's texture filtering and addressing arrays are globally available to all units for fetch and filter, or whether they're limited to certain ALU groups à la G80 and R580?
     
    #4945 Luminescent, May 10, 2007
    Last edited by a moderator: May 10, 2007
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I disagree, because if you treat R600 as a vec4+scalar architecture and feed it the same code as for R300 (vector or pixel shader), the throughput will be no worse.

    Actually the DX8 modifiers seem like they could be a sore point in R600. What's the betting that, at the very least, source modifiers have to be issued as a distinct instruction? Pretty certain I'd say. So R600 will be slowed down compared with R300. Even the best compiler in the world can't make R600 fully overcome this deficit since R600's not wide enough for all combinations. But R300 has a pipeline hazard related to the DX8 modifier which means that R600 will claw back its loss in situations where R300 had to issue a NOP on the main ALU's prior clock.

    The difference with R600 is that it's capable of running at higher instruction throughput and should average a higher percentage of its theoretical peak FLOPs. So the compiler writers have got something to get their teeth into.

    It's important for maximum performance that the compiler is good. The difference is that this pipeline has a higher baseline to work from, even with a compiler that can do no more than issue a vec4/vec3 and (optionally) issue an alpha channel instruction, or issue a special function. Compared with R300, R600 has fewer corner cases. Well, at least on the surface ahead of NDA, anyway.

    The way I see it, the dumbest compiler will get more out of the R600 pipeline than the same dumb compiler on R300. There are some corner cases centred on the DX8 modifier mini-ALU - see the CTM documentation if you feel like enumerating them. The example code snippet I gave last night is one of them, actually...

    But, in conclusion, I think the stark simplicity of R600's ALU pipeline, when presented with vec3/vec4 + scalar code, makes it extremely hard to argue that "it will run worse than R300 unless the compiler is significantly better than R300's." At the same time I agree, R600 will benefit from a tricksy compiler, and those tricksy bits are new, uncharted, territory. When done right it'll show R300 a clean pair of heels.
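    The "dumbest compiler" baseline above can be pictured as a greedy packer that pairs each vec3/vec4 op with at most one independent scalar op per issue slot. This is a purely illustrative Python sketch; the instruction names and the assumption that all ops are independent are mine, not ATI's actual ISA:

```python
# Hypothetical sketch of naive vec+scalar co-issue packing.
# Instruction encoding is invented for illustration only.

def pack_slots(instructions):
    """Greedily pack (vector, scalar) co-issue slots.

    instructions: list of ("vec", name) or ("scalar", name) tuples,
    assumed mutually independent for simplicity.
    """
    slots = []
    pending_vec = None
    for kind, name in instructions:
        if kind == "vec":
            if pending_vec is not None:
                slots.append((pending_vec, None))  # previous vec issues alone
            pending_vec = name
        else:  # a scalar can co-issue alongside a waiting vector op
            if pending_vec is not None:
                slots.append((pending_vec, name))
                pending_vec = None
            else:
                slots.append((None, name))  # scalar issues alone
    if pending_vec is not None:
        slots.append((pending_vec, None))
    return slots

prog = [("vec", "MUL r0.xyz"), ("scalar", "RCP r1.w"),
        ("vec", "ADD r2.xyzw"), ("vec", "DP3 r3.xyz")]
print(pack_slots(prog))  # 4 instructions fit in 3 issue slots
```

    Even this trivial strategy never does worse than one slot per instruction, which is the sense in which a dumb compiler still gets a decent baseline out of a vec+scalar pipeline.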

    Well, I think the latency/pipeline-turnaround issue is a big deal.

    It's interesting to note that CPUs with simultaneous multithreading tend to go with just 2 threads (batches, effectively). It really isn't easy to just keep ratcheting up the number of hardware threads your pipeline will support. For each extra hardware thread you want to support, you have to correspondingly speed up your "search" across available threads to identify what's issuable.

    I was careful to specify this at the beginning (sorry, that's a few hundred posts ago now), to mean merely the batch "at the end of the pipe" (or at any single position marked off in the pipeline, effectively) - because the actual number of batches in flight is subject to the pipeline length. Something we don't always know. But we know that both Xenos and R5xx use an 8-clock pipeline, with 2 batches each running for 4 clocks. We just don't know what R600 is and I tried to keep away from that issue.

    Sorry about the confusion, it gets tedious to qualify terminology every time. It might be better to use the CPU terminology "hardware thread" I guess.

    In order to fill your pipeline's hardware threads, you have to have hardware that's fast enough to "survey" the status of all available batches.

    A batch is in one of a number of states:
    1. executing as both ALU and TU hardware threads (will wait for both to finish)
    2. executing as an ALU hardware thread (waiting for clause completion)
    3. executing as a TU hardware thread (waiting for texturing result to appear in destination register, will then enter state 4)
    4. waiting to be issued to ALU
    5. waiting to be issued to TU
    6. waiting for instruction cache page-in
    7. [other stuff I can't think of right now...]
    TU hardware threads have indeterminate latency, but in theory ALU hardware threads are fully predictable. Erm, except for when there's dynamic branching in the clause that's issued, etc.
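    For illustration only, the batch life-cycle above could be modelled as a small state enumeration, with the per-clock "survey" as a filter over the batch pool. The names are invented, since the real R600 sequencer states aren't public:

```python
# Toy model of the batch states enumerated above (names invented).
from enum import Enum, auto

class BatchState(Enum):
    ALU_AND_TU = auto()     # 1. executing on both ALU and TU
    ALU_RUNNING = auto()    # 2. executing an ALU clause
    TU_RUNNING = auto()     # 3. waiting on a texture result
    READY_FOR_ALU = auto()  # 4. waiting to be issued to ALU
    READY_FOR_TU = auto()   # 5. waiting to be issued to TU
    ICACHE_WAIT = auto()    # 6. waiting for instruction cache page-in

def issuable_to_alu(batches):
    """The per-clock 'survey': which batches may the ALU sequencer issue?"""
    return [name for name, state in batches
            if state is BatchState.READY_FOR_ALU]

pool = [("batch0", BatchState.TU_RUNNING),
        ("batch1", BatchState.READY_FOR_ALU),
        ("batch2", BatchState.READY_FOR_ALU)]
print(issuable_to_alu(pool))  # → ['batch1', 'batch2']
```

    The point of the sketch: the survey has to run every clock over the whole pool, so the cost of that logic scales with the number of hardware threads supported.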

    Well, with the suggestions of Arnold Beckenbauer and leoneazzurro

    http://forum.beyond3d.com/showpost.php?p=983673&postcount=4812

    http://forum.beyond3d.com/showpost.php?p=983679&postcount=4815

    http://forum.beyond3d.com/showpost.php?p=983799&postcount=4847

    it looks like R600 has rather more hardware threading than I surmised last night, so that means there's prolly rather more sequencer logic in there, in order to cut the batch size.

    This also means that fine-grained ALU redundancy will cost more in terms of overhead, e.g. comparing 1-in-4 with 1-in-16.
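    Back-of-envelope arithmetic on that comparison: with one spare unit per group of N, the spare units alone add 1/N overhead, so 1-in-4 redundancy costs 25% versus 6.25% for 1-in-16 (counting only the spare-unit fraction, ignoring routing):

```python
# Spare-unit overhead for 1-in-N fine-grained redundancy.
for group in (4, 16):
    overhead = 1 / group
    print(f"1-in-{group}: {overhead:.2%} spare-unit overhead")
```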

    So, ahem, R600 right now looks significantly more costly there...

    Jawed
     
    #4946 Jawed, May 10, 2007
    Last edited by a moderator: May 10, 2007
    digitalwanderer likes this.
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I think they'll be restricted, simply because of the difficulty of writing from the TU to an arbitrary register file location.

    This question only seems to apply to R600. I think RV630 and RV610 are both small enough that there's only a single shader unit. The reason I say this is that both of them have just one RBE (ROP), 4 pipes. They're just like RV530 and RV510 in this respect - just that they have beefier SIMD and TU configurations inside that single shader unit.

    ---

    You could interpret these diagrams as scalings based upon Xenos architecture. That would require the kind of routing you describe. I don't know how to affirm or deny this...

    [image]

    Jawed
     
  8. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    So from the It-review, which dovetails with a lot of the other rumors we've heard, the X2900 is about 90% of an 8800GTS.

    This product is an utter disaster for ATI. Why would anybody even buy it over a G80 product, even an 8800GTS, when it's such a power hog?

    I can see sales being next to nil for this. As I said, the It-review shows it inferior to an 8800GTS. Let's say driver polishing or a stronger review CPU can get it to 100% or 105% of the 8800GTS; even so, why would you purchase it when it's so hot and power hungry? The market for this card is therefore maybe the 10% of people who just really prefer to buy the ATI brand. That's about it. For the other 90% of people, the vast majority of sales will go to Nvidia.

    I can't believe how bad ATI has gotten...
     
  9. WaltC

    Veteran

    Joined:
    Jul 22, 2002
    Messages:
    2,710
    Likes Received:
    8
    Location:
    BelleVue Sanatorium, Billary, NY. Patient privile
    I don't recall anything remotely close to that during that period...;) What I recall most vividly about the period, as nVidia scrambled to design a competitive nV40 (which took the majority of its design cues directly from R300), was nVidia being roundly and properly called on the carpet for literally advertising nV3x as an 8-pixel-per-clock gpu when it was later discovered to have been a 4-pixel-per-clock gpu from the start--which of course neatly explained why R300 ran away with most everything at the time, and why the performance discrepancies between the gpus mystified many of us for so long. The second best-deserved criticism of nVidia at the time that I recall was that even while nVidia was bragging publicly about its "128-bit FP pipeline" (fp32) and stating flatly that ATi's R300 "96-bit pipeline" (fp24) was, quote, "not enough," unquote, nVidia was in fact configuring its drivers to run fp16 in the benchmarks where R300 was running fp24--while attempting to maintain the public illusion that nVidia was running fp32 the whole time.

    This was found out, too, later on, and in fact it was also revealed that the first couple of official nV3x drivers from nVidia never even permitted fp32 operation of the gpu--they would in fact run at fp16 while reporting to the end user/customer that he was running at fp32 precision--again, even while nVidia PR was boasting about nV3x's wonderful fp32 capabilities. Of course it was obvious that such a bold and shameless sham would be found out, which it was. That also cleared up the remaining performance discrepancies: when, after public pressure, nVidia finally released drivers that put nV3x into fp32 mode, the R300 running at fp24 walked all over nV3x running at fp32. nV3x was also horrible at running SM2.0 code of any description compared to R300, which also neatly explained nVidia's vigorous protests about how "ATI and Microsoft were taking 3d gaming in the wrong direction."....;) It was the "wrong direction" for nVidia at the time, but it seemed to be exactly the right direction for everybody else...;) Then there were benchmarks like Eidos' Tomb Raider benches that starkly showed how poor nV3x was as an SM2.x gpu compared to R300, and nVidia was so incensed that the company pressured Eidos to actually can the benchmark, and Eidos knuckled under to the pressure. And last but certainly not least was the hullabaloo begun by ExtremeTech and then our very own B3D, which demonstrated to the world how nVidia had deliberately compromised and cheated the 3dMark benchmark in order to produce scores much better than what nV3x was actually capable of delivering. I cannot imagine that there is anybody alive who fails to clearly remember that...;)

    To cap, I simply do not remember a single thing in the life of nV3x wherein ATi was accused of cheating--at least, I recall no such accusations by people who attempted even a modicum of objectivity about the situation. In terms of a comparison between nV3x and R3xx, the differences in the gpus were so great and so stark that the notion of ATi "having to cheat" to best nV3x in most every category, if not all of them, simply wasn't needed by anybody...;) Once we moved beyond the misleading falsehoods deliberately sanctioned and advanced by nVidia for nV3x, and into the facts of what nV3x actually was compared to R300, the idea of a cheating ATi simply wasn't required to explain the differences in gpu performance.

    I want to hasten to add that with nV40 nVidia turned the corner and showed that old dogs indeed can learn new tricks, and joined the 3d party that ATi started with R300, and I think we are all immeasurably better off because of it--even though it is clear to me at least that nVidia had absolutely no choice in the matter at all if it wished to remain a viable 3d gpu company for the long haul. The R3xx-nV3x saga, as sorry as it was, stands as a testament, I think, to the enduring value of competition and how it can improve everyone's lot over time. The companies that fall behind are the companies that fail to learn the new tricks that other companies teach them. It could happen to ATi as easily as it happened to nVidia back in '02, should ATi ever become so comfortable in its position that it feels it can rest on its laurels. I can imagine what a slap in the face R300 must've been in '02 to a cocksure nVidia convinced that after swallowing 3dfx whole it now reigned supreme in perpetuity in the 3d gpu marketplace--especially as nVidia failed to see--as most all of us did--just what a potent new bag of tricks ATi was bringing to the table in terms of the 9700 Pro. Indeed, I was no different, and it took some convincing for me to appreciate R300 at the time for what it was--but as the realization dawned after buying an R300--it was very easy to refrain from ever looking back.

    I cannot say what R600 will in fact bring to the table in the current horse race. But my gut feeling is that it is going to be something very, very good, and probably pretty special in a number of ways. That's what I expect, anyway--and it's good to know that we have not very long at all to wait before discovering whether or not my gut feeling is on track. Competition is a wonderful thing!
     
    digitalwanderer likes this.
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Up to 5 ops!!!???

    [image]


    What are those 5 ops? Is this the direct ancestor of R600's ALU organisation or is it something rather less exciting?

    This question's been bugging me for a while, so hopefully someone understands the capability of the vertex ALU pipeline in ATI SM2/3 hardware.

    Jawed
     
  11. cadaveca

    Newcomer

    Joined:
    Aug 3, 2006
    Messages:
    98
    Likes Received:
    3
    float4+float?
     
  12. Bob

    Bob
    Regular

    Joined:
    Apr 22, 2004
    Messages:
    424
    Likes Received:
    47
    Yeah, NV40 was entirely architected from the ground up, developed, laid out, emulated and tested in just 9 months, instead of the more usual 2-3 years. Or maybe your perception of reality has little to do with things like "facts" (which, as I hear, have a well-known NVIDIA bias).

    Your memory is faulty.
     
  13. SugarCoat

    Veteran

    Joined:
    Jul 17, 2005
    Messages:
    2,091
    Likes Received:
    52
    Location:
    State of Illusionism
    Anyone who's played either or both MMOs knows that those engines perform poorly due to massive memory requirements and don't scale with hardware in the normal sense compared to your FEAR or HL2. In Vanguard in particular I found the average framerate badly, and I mean badly, limited by the amount of memory bandwidth available. So no offense to you, but I think you cherry-picked two MMOs because they simply don't react like what most benchmarks show. It's also worth mentioning that both games hardly react to SLI/CrossFire no matter the resolution and settings, even though both do in fact show plenty of signs of GPU limitation. I disagree that it's mainly physics in either case as well; more so, again, it's memory bottlenecking due to the design of the game itself. After all, it's an MMO, so there's a heck of a lot of info being passed around on the fly all the time. Oblivion has plenty of physics and hardly reacts at all to CPU speeds at very high settings, as I already pointed out. Oblivion also doesn't need to render a massive area without loading, where you can run in either direction for 10-15 minutes without one hitch, and it doesn't need to track other player characters and their actions, which puts significant strain on other components like the system memory and the HDD.

    I'm not saying the processor isn't important, it is, and I'm sure in time, as software is tailored more and more for multicore and we see things like physics offloaded onto other cores, we'll see significant gains. But right now pretty much all gaming engines are mainly single threaded. The result is that all you have to depend on to increase speed is raw MHz, and in many instances this doesn't help because the bottleneck is in fact elsewhere for those of us gaming at high settings and high resolutions.

    What they did can be considered lazy, but I'd argue the bias part. The advantage they may have given the 8800 cards that you and a few others are complaining about is MINIMAL at best. Literally 2-5% if they did average fps. At the highest res they ran, I'd be pretty damn shocked if it wasn't under 2% for pretty much all of the tests due to the different processors.
     
    #4953 SugarCoat, May 10, 2007
    Last edited by a moderator: May 10, 2007
  14. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,532
    Location:
    Winfield, IN USA
    Is that the bit you're talking about, Bob? Because if so, I think you're doing a bit of revisionism; you can't compare how ATi "cheated" to how nVidia "cheated" during that time. :lol:

    Don't ya remember all the "magic drivers are coming!" talk, and the whole "REAL WORLD testing" that nVidia tried to blow smoke up our ass with back in those days? :|
     
  15. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,426
    Likes Received:
    10,320
    I would agree with you, except that in my case going with faster memory netted next to zero improvement in EQ2, other than the initial stuttering when loading all textures for a zone or when new characters entered my field of view with armor textures that weren't already loaded into memory.

    However, overclocking the CPU, or better yet just replacing the CPU with a faster version, sees rather large gains in performance, especially when you enable balanced or higher settings.

    Maybe I'll have to play Oblivion again, but the amount of physics calculation done in that game was absolutely minimal compared to EQ2, especially if you have cloth simulation on with many people in capes, clothing, flying carpets, etc.

    EQ2 is quite graphically nice to look at, especially when you get into the higher quality settings. However, it is also extremely CPU bound at all settings. Going from a X800XTPE to a X1800XT netted a whopping 1-2 fps more on average. Upgrading the CPU from a X2 3800 to a X2 5000+ netted closer to 5-15 fps on average depending on scene complexity at 1920x1200 res with 4x AA/16x AF.

    My friend who just upgraded his 9700 Pro to an 8800 GTX sees a minimal (less than 4 fps) gain in EQ2.

    I will agree, however, that Vanguard is CPU, GPU and memory bandwidth limited. You will get increases in FPS from all of those at ALL resolutions. Brad McQuaid has already gone on record saying that you cannot just put in a faster GPU and expect to get better performance from the game. It is limited by virtually all components in a system.

    It's why you see two people with 8800 GTXs in game, both running the same graphics settings and resolution, yet one complains he's only getting 15-20 FPS while the other says he's getting 50-60 FPS.

    To get away from those two games: Rise of Nations - Rise of Legends also benefits much more from higher CPU speed than it does from a faster GPU.
    So it isn't just limited to those two games.

    Yes, I used a couple of extreme cases. But it's to illustrate that just because you are at a resolution that is more likely to be GPU bound than CPU bound doesn't mean the CPU can't or won't affect framerates.

    And coming to any conclusions at all when each GPU might be paired with a different CPU isn't serving any purpose.

    Regards,
    SB

    [edit - last post on this as I think this is starting to go WAAAY off topic.]
     
  16. Topman

    Newcomer

    Joined:
    Oct 16, 2006
    Messages:
    73
    Likes Received:
    5
  17. INKster

    Veteran

    Joined:
    Apr 30, 2006
    Messages:
    2,110
    Likes Received:
    30
    Location:
    Io, lava pit number 12
    The Quadro FX 5600 has double the memory (1.5GB).
    The Quadro FX 5500 is actually an old G71 with 1GB of GDDR3 RAM.
    The closest thing would be the Quadro FX 4600 (don't let the GTS-like PCB fool you, that is a full 128 sp, 384bit, 768MB GDDR3 G80 GPU).
     
  18. Sc4freak

    Newcomer

    Joined:
    Dec 28, 2004
    Messages:
    233
    Likes Received:
    2
    Location:
    Melbourne, Australia
    It's photoshopped. One of his fingers on his left hand is eating into the cooler of the card on his leg.
     
  19. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    It's not photoshopped and the review is real, is my bet.
     
  20. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,532
    Location:
    Winfield, IN USA
    That's just the locking bracket pushing into his finger a bit.
     