Is the RSX even based on the 7800?

Discussion in 'Console Industry' started by BenQ, Dec 11, 2005.

  1. Bad_Boy

    Bad_Boy god of war.
    Veteran

    Joined:
    Apr 15, 2004
    Messages:
    3,355
    Likes Received:
    25
    I wonder if the newest nvidia news of the 7900u? has any effect on the RSX.
    /me runs away :p
     
  2. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Um, wanna give some evidence?

    In Shadermark, it's on average about 3% faster than NV40. In Rightmark3D, in which 6 of 8 shaders are similar lighting shaders with lots of math, it's 9% faster on average.

    The best improvement per clock per pipe is 22% in the 3 light phong shader. A bit OT, but RV530 gets a 2.8x increase in that shader over RV515, just to show you how math limited it is. G70's biggest improvements are in HDR data loading and storing.

    No, the secret to their success is much simpler: more pipes. That ATI can even come close to the 512MB GTX in many of today's most shader-intensive games, while being heavily deficient in texture rate, shader rate, and bandwidth, should show you that FLOPS ratings are meaningless between companies. The 7800 512MB has over 2.5 times the FLOPS rating of the X1800XT.
     
  3. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Wanna give some evidence too?
     
  4. Synergy34

    Regular

    Joined:
    Jul 26, 2005
    Messages:
    323
    Likes Received:
    4
    Granted, the E3 one wasn't the best of the bunch, but at least it was real time and still looked next-gen. Also, they re-did that E3 vid too, and it has a way better framerate and just looks THAT much better.

    Not sure which one you've seen.

    http://media.ps3.ign.com/media/748/748483/vids_1.html E3 vid, nice clean one too; looks pretty good to me.
     
    #24 Synergy34, Dec 12, 2005
    Last edited by a moderator: Dec 12, 2005
  5. Edge

    Regular

    Joined:
    Apr 26, 2002
    Messages:
    613
    Likes Received:
    10
    You're right, just as bandwidth, clock rate, number of execution units, amount of cache, etc., have no reflection on the power of the system.

    It makes you wonder: what does determine the power of a system?

    If FLOPS are not part of it, then neither is anything else. Specs are meaningless.

    Give me a 1 MHz GPU any day, doing just 1000 FLOPS; after all, that's all the power you need.
     
    #25 Edge, Dec 12, 2005
    Last edited by a moderator: Dec 12, 2005
  6. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Pixel units work on vec3+scalar and VS units work on vec4+scalar. You haven't made that differentiation, which would net you 255 32-bit GFLOPS.
     
  7. Eleazar

    Newcomer

    Joined:
    Nov 21, 2005
    Messages:
    95
    Likes Received:
    5
    Location:
    USA
    If you had actually read my post, you would have understood what I said. I said faster. Vauxhall, Opel, and Lotus all have cars around the 200hp mark that actually beat cars with 300hp in terms of speed. Meaning if they were to race, the Vauxhall, Opel, and Lotus would win. This is due to many factors, the main one being the car's weight: because they are so light, 200 horsepower is enough to make them fast. I also believe this is the base HP of the engine, not including add-ons such as a turbocharger or supercharger. However, you get the point: horsepower is not the only thing that makes a car fast, and that was the whole point.

    TFLOPS, whether you want to accept it or not, are not indicative of real-world performance, and here is some proof:
    (This is a section ripped out of one of Anandtech's articles)

    What about all those Flops?

    The one statement that we heard over and over again was that Microsoft was sold on the peak theoretical performance of the Xenon CPU. Ever since the announcement of the Xbox 360 and PS3 hardware, people have been set on comparing Microsoft's figure of 1 trillion floating point operations per second to Sony's figure of 2 trillion floating point operations per second (TFLOPs). Any AnandTech reader should know for a fact that these numbers are meaningless, but just in case you need some reasoning for why, let's look at the facts.

    First and foremost, a floating point operation can be anything; it can be adding two floating point numbers together, or it can be performing a dot product on two floating point numbers, it can even be just calculating the complement of a fp number. Anything that is executed on a FPU is fair game to be called a floating point operation.

    Secondly, both floating point power numbers refer to the whole system, CPU and GPU. Obviously a GPU's floating point processing power doesn't mean anything if you're trying to run general purpose code on it and vice versa. As we've seen from the graphics market, characterizing GPU performance in terms of generic floating point operations per second is far from the full performance story.

    Third, when a manufacturer is talking about peak floating point performance there are a few things that they aren't taking into account. Being able to process billions of operations per second depends on actually being able to have that many floating point operations to work on. That means that you have to have enough bandwidth to keep the FPUs fed, no mispredicted branches, no cache misses and the right structure of code to make sure that all of the FPUs can be fed at all times so they can execute at their peak rates. We already know that's not the case as game developers have already told us that the Xenon CPU isn't even in the same realm of performance as the Pentium 4 or Athlon 64. Not to mention that the requirements for hitting peak theoretical performance are always ridiculous; caches are only so big and thus there will come a time where a request to main memory is needed, and you can expect that request to be fulfilled in a few hundred clock cycles, where no floating point operations will be happening at all.

    So while there may be some extreme cases where the Xenon CPU can hit its peak performance, it sure isn't happening in any real world code.

    The Cell processor is no different; given that its PPE is identical to one of the PowerPC cores in Xenon, it must derive its floating point performance superiority from its array of SPEs. So what's the issue with 218 GFLOPs number (2 TFLOPs for the whole system)? Well, from what we've heard, game developers are finding that they can't use the SPEs for a lot of tasks. So in the end, it doesn't matter what peak theoretical performance of Cell's SPE array is, if those SPEs aren't being used all the time.


    Don't stare directly at the flops, you may start believing that they matter.

    Another way to look at this comparison of flops is to look at integer add latencies on the Pentium 4 vs. the Athlon 64. The Pentium 4 has two double pumped ALUs, each capable of performing two add operations per clock, that's a total of 4 add operations per clock; so we could say that a 3.8GHz Pentium 4 can perform 15.2 billion operations per second. The Athlon 64 has three ALUs each capable of executing an add every clock; so a 2.8GHz Athlon 64 can perform 8.4 billion operations per second. By this silly console marketing logic, the Pentium 4 would be almost twice as fast as the Athlon 64, and a multi-core Pentium 4 would be faster than a multi-core Athlon 64. Any AnandTech reader should know that's hardly the case. No code is composed entirely of add instructions, and even if it were, eventually the Pentium 4 and Athlon 64 will have to go out to main memory for data, and when they do, the Athlon 64 has a much lower latency access to memory than the P4. In the end, despite what these horribly concocted numbers may lead you to believe, they say absolutely nothing about performance. The exact same situation exists with the CPUs of the next-generation consoles; don't fall for it.
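    The peak-ALU arithmetic in the quoted passage can be sketched in a few lines of Python (the clocks and ALU widths are the article's stated figures, not measurements):

```python
# Peak integer-add arithmetic from the quoted Anandtech passage.
def peak_adds_billion(adds_per_clock: int, clock_ghz: float) -> float:
    """Peak integer adds per second, in billions (adds/clock * GHz)."""
    return adds_per_clock * clock_ghz

p4  = peak_adds_billion(4, 3.8)   # two double-pumped ALUs -> 15.2 billion/s
a64 = peak_adds_billion(3, 2.8)   # three ALUs -> 8.4 billion/s

# The ~1.8x "advantage" says nothing about real performance, as the
# article argues: memory latency and instruction mix dominate.
print(f"P4: {p4:.1f}B adds/s, A64: {a64:.1f}B adds/s")
```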
     
  8. The GameMaster

    Newcomer

    Joined:
    Feb 9, 2005
    Messages:
    109
    Likes Received:
    1
    I believe I can provide that evidence for him though my numbers do not prove his statement...

    *nVidia Geforce 7800GTX (512MB version) > 24 pixel pipelines with 2 full shader units per pipeline, each providing 4 components per cycle (Vec3+Scalar), and 8 vertex pipelines with 1 full shader unit per pipeline, each providing 5 components per cycle (Vec4+Scalar), at a clock rate of 550MHz.
    *ATI Radeon X1800XT > 16 pixel pipelines with 1 full shader unit providing 4 components per cycle (Vec3+Scalar) and 1 partial shader unit per pipeline providing 1 component per cycle, and 8 vertex pipelines with 1 full shader unit per pipeline providing 5 components per cycle (Vec4+Scalar), at a clock rate of 625MHz.

    Each component is 2 FLOPS.

    *nVidia Geforce 7800GTX (512MB version)
    24 pixel pipes * 2 units each * 8 FLOPs per unit * 550MHz = 211.2 GFLOPs
    8 vertex pipes * 1 unit each * 10 FLOPs per unit * 550MHz = 44 GFLOPs
    TOTAL: 255.2 GFLOPs

    *ATI Radeon x1800XT
    16 pixel pipes * 1 unit each * 8 FLOPs per unit * 625MHz = 80 GFLOPs
    16 pixel pipes * 1 partial unit * 2 FLOPs per unit * 625MHz = 20 GFLOPs
    8 vertex pipes * 1 unit each * 10 FLOPs per unit * 625MHz = 50 GFLOPs
    TOTAL: 150 GFLOPs

    Well... it's not quite the 2.5 times that Mintmaster indicated... but it certainly has a lot greater theoretical floating-point potential than the Radeon card, and greater theoretical shader performance too. The truth of the matter, though, is that actual performance in games does not reflect this difference... so while FLOPs and shader performance do matter, they are not the end-all performance metric. Just one of many things to look at in conjunction with others...
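    The tallies above reduce to a one-line formula; a minimal Python sketch, using the pipeline counts, per-unit FLOP rates, and clocks as stated in the post:

```python
# Theoretical peak GFLOPS, per the breakdown in the post above.
def gflops(pipes: int, units: int, flops_per_unit: int, clock_mhz: int) -> float:
    """pipes * units/pipe * FLOPs-per-unit-per-clock * clock(MHz), in GFLOPs."""
    return pipes * units * flops_per_unit * clock_mhz / 1000.0

# GeForce 7800 GTX 512: pixel units + vertex units
gtx512 = gflops(24, 2, 8, 550) + gflops(8, 1, 10, 550)   # 211.2 + 44 = 255.2

# Radeon X1800 XT: full pixel units + partial pixel units + vertex units
x1800xt = (gflops(16, 1, 8, 625)     # 80
           + gflops(16, 1, 2, 625)   # 20
           + gflops(8, 1, 10, 625))  # 50 -> 150 total

print(f"GTX512: {gtx512} GFLOPs, X1800XT: {x1800xt} GFLOPs")
```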
     
  9. fulcizombie

    Regular Banned

    Joined:
    Mar 14, 2005
    Messages:
    413
    Likes Received:
    14
    Pretty good? Yes. Something that will blow away the games (GoW, Too Human, Mass Effect, etc.) that will be available for the Xbox 360 at that time? No.
     
  10. Nicked

    Regular

    Joined:
    Jun 15, 2005
    Messages:
    688
    Likes Received:
    9
    Well, it was further along (framerate-wise) than PDZ was at E3. So you can expect a jump like the one from PDZ's shitty E3 version to the good-looking launch version. Also, the engine that I-8 is built on is going to be insane.
    Insomniac + Naughty Dog + SCEA Santa Monica? The makers of arguably this gen's greatest technical engine plus another talented studio means you can expect a great engine leading to a great-looking game.
    I doubt I-8 is going to be anything other than mediocre gameplay/story/art-wise though.
     
  11. Phil

    Phil wipEout bastard
    Veteran

    Joined:
    Nov 19, 2002
    Messages:
    4,786
    Likes Received:
    377
    Location:
    127.0.0.1
    While I agree that floating-point performance is not the end-all performance metric, your PC-game example doesn't take into account that the software you're using as a benchmark doesn't utilize the cards' full potential and is limited by many other factors as well. We all expect a closed box to be much better utilized, which also includes using all the dedicated performance available, since your software can afford to target a single entity.
     
  12. SubD

    Newcomer

    Joined:
    Sep 18, 2005
    Messages:
    60
    Likes Received:
    2
    And which specific PS3 developers are you referring to?
     
  13. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Yep, and they're MADD-capable.

    Nope. They can do vec3+scalar too, but they're not MADD-capable, i.e. 3+1 FLOPs/cycle.

    Yep.

    Yep.

    Nope. The 2nd PS ALU would be 40 GFlops, for a net 170 GFlops.

    GTX512 has around 50% more GFlops (32-bit programmable) than the 1800XT. But nowhere near 2.5x more, even with total system TFLOPS...

    Or component ops,

    GTX512 ~ 24x4x2 + 8x5 ~ 232x0.55GHz ~ 128 GigaComponent ops/sec

    1800XT ~ 16x4x2 + 8x5 ~ 168x0.625GHz ~ 105 GigaComponent ops/sec

    ...around 22% more.

    Nope. Nowhere near. Closer to 50% more for 32bit programmable flops.

    They may not reflect the 50% flop difference or the 22% component-op difference, but there are benched games that approach this difference.

    Yep, it's NEVER the sole metric, NEVER was, and NEVER will be. But it's NOT irrelevant, as some people make out with sweeping generalisations made without interpreting the numbers correctly.
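    The component-op comparison above can be sketched in Python (vec3+scalar = 4 components per pixel ALU, vec4+scalar = 5 per vertex ALU; counts and clocks as given in the post):

```python
# Peak shader component operations per second, per the post's figures.
def gcomp_per_sec(components_per_clock: int, clock_ghz: float) -> float:
    """Components per clock * clock (GHz) = billions of component ops/sec."""
    return components_per_clock * clock_ghz

gtx512  = gcomp_per_sec(24 * 4 * 2 + 8 * 5, 0.550)   # 232/clk -> ~127.6
x1800xt = gcomp_per_sec(16 * 4 * 2 + 8 * 5, 0.625)   # 168/clk -> 105.0

advantage = gtx512 / x1800xt - 1   # ~0.22, i.e. the ~22% quoted above
print(f"GTX512 advantage: {advantage:.0%}")
```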
     
    #33 j^aws, Dec 13, 2005
    Last edited by a moderator: Dec 13, 2005
  14. The GameMaster

    Newcomer

    Joined:
    Feb 9, 2005
    Messages:
    109
    Likes Received:
    1
    Actually... the first shader unit in each pixel pipeline is in fact a partial shader unit.

    The pixel pipelines in the R520 are almost identical to the pixel pipelines in the R420 (with the exception of the texture units) in terms of shader arrangement. ATI quoted the Radeon x850XT (clock rate of 540MHz) as having a capacity of 43 billion shader component operations per second, while nVidia claimed the Geforce 6800 Ultra (clock rate of 400MHz) had 51.2 billion shader component operations per second. Why could nVidia claim higher theoretical shader performance even with a lower clock speed than ATI's? Because the first shader unit in each pixel pipeline in the Radeon series is a partial unit, not a full unit: it provides 1 component per cycle instead of the 4 components that the full shader units in the Radeon and Geforce GPUs provide. Breaking down the numbers, it comes out to this...

    The full shader unit, providing 4 shader components per cycle, gives the Radeon x850XT at 540MHz 34.6 billion shader components per second, and the partial unit, providing 1 shader component per cycle, gives it another 8.6 billion shader components per second. Combined, that's 5 shader components per pixel pipeline per cycle, which at 540MHz is 43.2 billion shader components per second (the stated amount). Compare the nVidia Geforce 6800 Ultra, which had 2 full shader units providing a total of 8 component operations per cycle; at 400MHz that's the stated 51.2 billion shader components per second. So yes, the shader numbers should look like this...

    GTX512 ~ 24x4x2 + 8x5 ~ 232x550MHz ~ 128 billion component ops/sec
    x1800XT ~ (16x4x1)+(16x1x1) + 8x5 ~ 120x625MHz ~ 75 billion component ops/sec
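    The x850XT/6800 Ultra figures cited above can be checked the same way (per-pipe component counts and clocks as stated in the post):

```python
# Shader component rates for the older parts, using the post's figures:
# pipes * components-per-clock-per-pipe * clock (GHz) = billions/sec.
x850xt_full    = 16 * 4 * 0.540   # full units: ~34.6 Gcomp/s
x850xt_partial = 16 * 1 * 0.540   # partial units: ~8.6 Gcomp/s
x850xt_total   = x850xt_full + x850xt_partial   # 43.2 (ATI's quoted ~43)

nv40_ultra = 16 * 8 * 0.400       # 51.2 Gcomp/s (nVidia's quoted figure)
print(f"x850XT: {x850xt_total:.1f}, 6800 Ultra: {nv40_ultra:.1f}")
```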
     
  15. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    The texture arrangement on R520 is actually the same as R300/R420; the representation has changed, but the layout is the same. What has changed is that there is better control of those units via a more flexible scheduler (which also means that chips like RV530 can increase the parallel ALUs while "sharing" texture units). The primary change to the ALUs is the additional branch unit (which, along with the separate texture address processor, people seem to forget in the FLOP counting).

    BTW - your component vs. instruction counting is all to cock.
     
  16. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Yep. I'm fully aware. One has MADD capability and the other doesn't.

    You need to clarify this post. Are you talking about shader ops/sec (aka shader instructions/sec) OR component ops/sec?

    Just to reiterate, we were talking about 32-bit programmable FLOPS.

    You need to clarify your metric. They can include mixed 16-bit and 32-bit instructions/flops etc. to attain different numbers. Heck, this is what this thread discussed earlier with mixed programmable and fixed-function flops. You can also have mixed 16-bit and 32-bit flops too...

    Nope.

    See Dave's article for R520:

    http://www.beyond3d.com/reviews/ati/r520/index.php?p=03

    You need to clarify your metric, because it seems like you're confusing instructions/sec with component ops/sec. Also, we were discussing the GTX512 and 1800XT.


    Nope.

    See Dave's article linked above for R520:

    • ALU 1
      - 1 Vec3 ADD + Input Modifier
      - 1 Scalar ADD + Input Modifier
    • ALU 2
      - 1 Vec3 ADD/MUL/MADD
      - 1 Scalar ADD/MUL/MADD

    So,

    x1800XT ~ (16x4x1)+(16x4x1) + 8x5 ~ 168x625MHz ~ 105 billion component ops/sec

    and

    GTX512 ~ 24x4x2 + 8x5 ~ 232x550MHz ~ 128 billion component ops/sec
     
    #36 j^aws, Dec 13, 2005
    Last edited by a moderator: Dec 13, 2005
  17. c0_re

    Banned

    Joined:
    Jun 27, 2005
    Messages:
    468
    Likes Received:
    4
    Location:
    Minneapolis
    Hmmm, were you not there during this year's E3? I can't believe people actually buy into Sony's campaign of PR bullshit (especially after hearing it some 8 years in a row now).

    Yeah, the RSX is more powerful than a brand-new $600 video card; if you believe that, I have some ocean-front property in Arizona to sell ya. If Sony wasn't so cheap, we would all already have PS3s. There's nothing in the PS3 that is more technically advanced than what's already in the 360. Now, I'm not saying that the PS3 isn't potentially more powerful than the 360; I'm just saying the technology inside isn't anything bleeding-edge in comparison to the 360, unless you're talking about the difficulty of developing on Cell.
     
    #37 c0_re, Dec 13, 2005
    Last edited by a moderator: Dec 13, 2005
  18. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    When did they say it was?

    The manufacturing process isn't any more advanced. There is a more significant transistor-count difference on the CPU side, though. And technology isn't just a function of silicon; there is an intellectual contribution there which you can't just ignore (although it is, of course, harder to quantify). I'd agree that power and where you are relative to "the edge" aren't necessarily tied at the hip, though.
     
  19. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Flops should never be the sole metric, and the more metrics you have, the better your judgment. IMHO, 32-bit component ops/sec gives a better idea. The equivalent for the above comparison:

    P4 = 2x3.8GHz ~ 7.6 GigaComponents/sec
    A64 = 3x2.8 GHz ~ 8.4 GigaComponents/sec
     
  20. Eleazar

    Newcomer

    Joined:
    Nov 21, 2005
    Messages:
    95
    Likes Received:
    5
    Location:
    USA
    I am not referring to any PS3 developers, since it was not I who wrote this. Did I not state clearly enough that Anandtech wrote the article? I personally can believe that, out of all the PS3 developers out there, some might have said something like this. Maybe they're X360-biased and only develop on the PS3 because they have to, I don't know. Or maybe it is a viewpoint a fair number of PS3 developers share, who knows. You can decide whether it is true or not. The main point, however, was to show that FLOPS are not an accurate measurement for today's processors and GPUs. The AMD and P4 scenario exemplifies this. As Jaws pointed out, there are much more accurate ways to determine the speed of a processor. He also rightfully said that the more metrics you use, the better your judgement.

    Look, if you want to be fooled by the marketing hype of both Sony and MS, go right on ahead. But if Sony and MS want to prove one system is more powerful than the other to someone with more than half a brain, then they are going to have to use a better metric than just FLOPS. The numbers that MS and Sony gave us are arbitrary at best; anyone who says differently doesn't understand what FLOPS are, what they measure, how they can be measured (there are different ways FLOPS are measured; the article lists some), and the complete uselessness of FLOPS as a sole metric. What makes it worse is that these measurements are peak FLOPS.

    Lastly, this Wikipedia article http://en.wikipedia.org/wiki/Flops explains in a different way what Anandtech said, in case anybody is still foolish enough to think a FLOPS measurement of PS3 and X360 performance is anything but worthless as a benchmark for real-world gaming performance.

    As Anandtech said, "Don't stare directly at the flops, you may start believing that they matter."

    I would like to clarify that my post is focused on dealing directly with the peak FLOPS measurements used to benchmark the performance of the X360 and PS3 systems. I say this in light of the possibility of people taking my post out of context.
     