Is the RSX even based on the 7800?

Bill said:
It increased their power like 20% on a per clock, per pipe basis from NV40. It's sort of been the secret to their success.
Um, wanna give some evidence?

In Shadermark, it's on average about 3% faster than NV40. In Rightmark3D, in which 6 of 8 shaders are similar lighting shaders with lots of math, it's 9% faster on average.

The best improvement per clock per pipe is 22% in the 3 light phong shader. A bit OT, but RV530 gets a 2.8x increase in that shader over RV515, just to show you how math limited it is. G70's biggest improvements are in HDR data loading and storing.

No, the secret to their success is much simpler: more pipes. That ATI can even come close to the 512MB GTX in many of today's most shader-intensive games, while being heavily deficient in texture rate, shader rate, and bandwidth, should show you that FLOPS ratings are meaningless between companies. The 7800 512MB has over 2.5 times the FLOPS rating of the X1800XT.
 
fulcizombie said:
You don't say....

It's a good thing then that I-8 was the least impressive ps3 video and doesn't look special at all.

Granted the E3 one wasn't the best of the bunch, but at least it was real time and still looked next-gen. Also, they re-did that E3 vid too, and it has a way better framerate and just looks THAT much better.

Not sure which one you've seen.

http://media.ps3.ign.com/media/748/748483/vids_1.html E3 vid, nice clean one too, looks pretty good to me.
 
Last edited by a moderator:
Eleazar said:
FLOPS are arbitrary and do not reflect real world performance.

You're right, just as bandwidth, clock rate, number of execution units, amount of cache, etc., have no reflection on the power of the system.

It makes you wonder: what does determine the power of a system?

If FLOPS are not part of it, then nothing else is either. Specs are meaningless.

Give me a 1 MHz GPU any day, doing just 1000 FLOPS; after all, that's all the power you need.
 
Last edited by a moderator:
Guilty Bystander said:
That's not true.
The GTX does:
5 FMACs x 2 FLOPs/FMAC x 56 ALUs x 550MHz / 1000 = 308 GFLOP/s of programmable FLOPS.

Pixel units work on vec3+scalar and VS units work on vec4+scalar. You haven't made that differentiation, which would net you 255 32-bit GFLOPs.
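To make the two counting methods explicit, here is a quick Python sketch using the figures quoted in this thread (forum numbers, not official specs): the flat 56-ALU count versus splitting the pixel ALUs (vec3+scalar, 4 components) from the vertex ALUs (vec4+scalar, 5 components).

Code:
# G70/GTX-512 peak programmable FLOPS, counted two ways.
# All figures are the ones quoted in this thread, not official specs.
CLOCK_GHZ = 0.550        # 550 MHz core clock
FLOPS_PER_MADD = 2       # one multiply + one add per component

# Flat count: 56 ALUs treated uniformly as 5-component MADD units
flat = 5 * FLOPS_PER_MADD * 56 * CLOCK_GHZ
print(f"flat count: {flat:.0f} GFLOPs")                # ~308 GFLOPs

# Differentiated count: pixel ALUs are vec3+scalar (4 comps),
# vertex ALUs are vec4+scalar (5 comps)
pixel  = 24 * 2 * 4 * FLOPS_PER_MADD * CLOCK_GHZ       # ~211.2 GFLOPs
vertex = 8 * 1 * 5 * FLOPS_PER_MADD * CLOCK_GHZ        # ~44 GFLOPs
print(f"differentiated: {pixel + vertex:.1f} GFLOPs")  # ~255.2 GFLOPs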
 
weaksauce said:
Uhm, well, a 300-horsepower car is more powerful than the 200, even if it doesn't reach the same top speed.

But I don't know, it's more powerful in the way that it has higher clocks and maybe more pipes, but it has lower memory bandwidth.
Also, you can't really compare it to a PC card. Even if they are equal, it will perform better in a console, eh.. :)

If you had actually read my post you would have understood what I said. I said faster. Vauxhall, Opel, and Lotus all have cars around the 200 hp mark that actually beat cars with 300 hp in terms of speed, meaning that in a race the Vauxhall, Opel, and Lotus would win. This is due to many factors, the main one being the car's weight: because they are so light, 200 horsepower is enough to make them go fast. I also believe this is the base HP of the engine, not including things such as a turbocharger, supercharger, or other such add-ons. You get the point, though: horsepower is not the only thing that makes a car go fast, and that was the whole point.

TFLOPS, whether you want to accept it or not, are not indicative of real-world performance, and here is some proof:
(This is a section taken from one of Anandtech's articles)

What about all those Flops?

The one statement that we heard over and over again was that Microsoft was sold on the peak theoretical performance of the Xenon CPU. Ever since the announcement of the Xbox 360 and PS3 hardware, people have been set on comparing Microsoft's figure of 1 trillion floating point operations per second to Sony's figure of 2 trillion floating point operations per second (TFLOPs). Any AnandTech reader should know for a fact that these numbers are meaningless, but just in case you need some reasoning for why, let's look at the facts.

First and foremost, a floating point operation can be anything; it can be adding two floating point numbers together, or it can be performing a dot product on two floating point numbers, it can even be just calculating the complement of a fp number. Anything that is executed on a FPU is fair game to be called a floating point operation.

Secondly, both floating point power numbers refer to the whole system, CPU and GPU. Obviously a GPU's floating point processing power doesn't mean anything if you're trying to run general purpose code on it and vice versa. As we've seen from the graphics market, characterizing GPU performance in terms of generic floating point operations per second is far from the full performance story.

Third, when a manufacturer is talking about peak floating point performance there are a few things that they aren't taking into account. Being able to process billions of operations per second depends on actually being able to have that many floating point operations to work on. That means that you have to have enough bandwidth to keep the FPUs fed, no mispredicted branches, no cache misses and the right structure of code to make sure that all of the FPUs can be fed at all times so they can execute at their peak rates. We already know that's not the case as game developers have already told us that the Xenon CPU isn't even in the same realm of performance as the Pentium 4 or Athlon 64. Not to mention that the requirements for hitting peak theoretical performance are always ridiculous; caches are only so big and thus there will come a time where a request to main memory is needed, and you can expect that request to be fulfilled in a few hundred clock cycles, where no floating point operations will be happening at all.

So while there may be some extreme cases where the Xenon CPU can hit its peak performance, it sure isn't happening in any real world code.

The Cell processor is no different; given that its PPE is identical to one of the PowerPC cores in Xenon, it must derive its floating point performance superiority from its array of SPEs. So what's the issue with the 218 GFLOPs number (2 TFLOPs for the whole system)? Well, from what we've heard, game developers are finding that they can't use the SPEs for a lot of tasks. So in the end, it doesn't matter what the peak theoretical performance of Cell's SPE array is, if those SPEs aren't being used all the time.


Don't stare directly at the flops, you may start believing that they matter.

Another way to look at this comparison of flops is to look at integer add latencies on the Pentium 4 vs. the Athlon 64. The Pentium 4 has two double pumped ALUs, each capable of performing two add operations per clock, that's a total of 4 add operations per clock; so we could say that a 3.8GHz Pentium 4 can perform 15.2 billion operations per second. The Athlon 64 has three ALUs each capable of executing an add every clock; so a 2.8GHz Athlon 64 can perform 8.4 billion operations per second. By this silly console marketing logic, the Pentium 4 would be almost twice as fast as the Athlon 64, and a multi-core Pentium 4 would be faster than a multi-core Athlon 64. Any AnandTech reader should know that's hardly the case. No code is composed entirely of add instructions, and even if it were, eventually the Pentium 4 and Athlon 64 will have to go out to main memory for data, and when they do, the Athlon 64 has a much lower latency access to memory than the P4. In the end, despite what these horribly concocted numbers may lead you to believe, they say absolutely nothing about performance. The exact same situation exists with the CPUs of the next-generation consoles; don't fall for it.
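As a quick sanity check on the article's arithmetic, here is a throwaway Python sketch using the clock speeds and ALU counts Anand quotes; it only reproduces the "marketing math", which is exactly the point being made.

Code:
# Peak integer-add throughput, counted the "marketing" way.
p4_adds_per_clock  = 2 * 2   # two double-pumped ALUs, two adds each per clock
a64_adds_per_clock = 3       # three ALUs, one add each per clock

p4_peak  = p4_adds_per_clock * 3.8    # GHz -> billions of adds/sec
a64_peak = a64_adds_per_clock * 2.8

print(f"Pentium 4 peak: {p4_peak:.1f} billion adds/sec")   # 15.2
print(f"Athlon 64 peak: {a64_peak:.1f} billion adds/sec")  # 8.4
# Real-world performance ordering is roughly the opposite, which is the point.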
 
Jaws said:
Wanna give some evidence too?

I believe I can provide that evidence for him, though my numbers do not prove his statement...

*nVidia Geforce 7800GTX (512MB version) > 24 pixel pipelines with 2 full shader units per pipeline, each providing 4 components per cycle (Vec3+Scalar), and 8 vertex pipelines with 1 full shader unit per pipeline, each providing 5 components per cycle (Vec4+Scalar), at a clock rate of 550Mhz.
*ATI Radeon X1800XT > 16 pixel pipelines with 1 full shader unit providing 4 components per cycle (Vec3+Scalar) plus 1 partial shader unit per pipeline providing 1 component per cycle, and 8 vertex pipelines with 1 full shader unit per pipeline, each providing 5 components per cycle (Vec4+Scalar), at a clock rate of 625Mhz.

Each component is 2 FLOPS.

*nVidia Geforce 7800GTX (512MB version)
24 pixel pipes*2 units each*8FLOPS per unit*550Mhz = 211.2GFLOPs
8 vertex pipes*1 unit each*10FLOPs per unit*550Mhz = 44GFLOPs
TOTAL: 255.2GFLOPs

*ATI Radeon x1800XT
16 pixel pipes*1 unit each*8FLOPs per unit*625Mhz = 80GFLOPs
16 pixel pipes*1 partial unit*2FLOPs per unit*625Mhz = 20GFLOPs
8 vertex pipes*1 unit each*10FLOPs per unit*625Mhz = 50GFLOPs
TOTAL: 150GFLOPs

Well... it's not quite the 2.5 times that Mintmaster indicated... but it certainly has a lot greater theoretical floating-point potential than the Radeon card, and also greater theoretical shader performance. The truth of the matter, though, is that actual performance in games does not reflect this difference... so while FLOPs and shader performance do matter, they are not the end-all performance metric. Just one of many things to look at in conjunction with others...
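To make the per-pipe assumptions above explicit, here is a short Python sketch of the same counting, with the unit widths as parameters; the partial-unit assumption for the X1800 XT is the one Jaws corrects later in the thread.

Code:
# Peak shader GFLOPs from pipe counts, using the figures in the post above.
def gflops(pipes, units_per_pipe, flops_per_unit, clock_ghz):
    """pipes x ALUs/pipe x FLOPs per ALU per clock x clock (GHz) = GFLOPs"""
    return pipes * units_per_pipe * flops_per_unit * clock_ghz

# GeForce 7800 GTX 512 at 550 MHz
gtx_ps = gflops(24, 2, 8, 0.550)    # vec3+scalar MADD = 4 comps x 2 FLOPs
gtx_vs = gflops(8, 1, 10, 0.550)    # vec4+scalar MADD = 5 comps x 2 FLOPs
print(f"7800 GTX 512: {gtx_ps + gtx_vs:.1f} GFLOPs")   # ~255.2

# Radeon X1800 XT at 625 MHz (partial unit counted as a single ADD component,
# the assumption corrected later in the thread)
xt_full = gflops(16, 1, 8, 0.625)
xt_mini = gflops(16, 1, 2, 0.625)
xt_vs   = gflops(8, 1, 10, 0.625)
print(f"X1800 XT: {xt_full + xt_mini + xt_vs:.0f} GFLOPs")  # ~150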
 
Synergy34 said:
Granted the E3 one wasn't the best of the bunch, but at least it was real time and still looked next-gen. Also, they re-did that E3 vid too, and it has a way better framerate and just looks THAT much better.

Not sure which one you've seen.

http://media.ps3.ign.com/media/748/748483/vids_1.html E3 vid, nice clean one too, looks pretty good to me.
Pretty good, yes... something that will blow away the games (GoW, Too Human, Mass Effect, etc.) that will be available for the Xbox 360 at that time, no.
 
fulcizombie said:
Pretty good, yes... something that will blow away the games (GoW, Too Human, Mass Effect, etc.) that will be available for the Xbox 360 at that time, no.
Well, it was further along (framerate-wise) than PDZ was at E3, so you can expect a jump like PDZ's shitty E3 version to its good-looking launch version. Also, the engine that I-8 is built on is going to be insane.
Insomniac + Naughty Dog + SCEA Santa Monica? The makers of arguably this gen's greatest technical engine plus another talented studio means you can expect a great engine leading to a great-looking game.
I doubt I-8 is going to be anything other than mediocre gameplay/story/art-wise, though.
 
The GameMaster said:
Well... it's not quite the 2.5 times that Mintmaster indicated... but it certainly has a lot greater theoretical floating-point potential than the Radeon card, and also greater theoretical shader performance. The truth of the matter, though, is that actual performance in games does not reflect this difference... so while FLOPs and shader performance do matter, they are not the end-all performance metric. Just one of many things to look at in conjunction with others...

While I agree that floating-point performance is not the end-all performance metric, your example from PC games doesn't take into account that the software you're using as a benchmark in this case doesn't utilize the cards' full potential and is limited by many other factors as well. We all expect a closed box to be utilized much more fully, and that includes using all of the dedicated performance available, since your software can afford to target a single entity.
 
Eleazar said:
Well, from what we've heard, game developers are finding that they can't use the SPEs for a lot of tasks. So in the end, it doesn't matter what peak theoretical performance of Cell's SPE array is, if those SPEs aren't being used all the time.

And which specific PS3 developers are you referring to?
 
The GameMaster said:
...
*ATI Radeon 1800XT > 16 pixel pipelines with 1 full shader unit providing 4 components per cycle (Vec3+scalar)

Yep, and they're MADD-capable.

The GameMaster said:
...and 1 partial shader unit per pipeline providing 1 component per cycle...

Nope. They can do vec3+scalar too, but they're not MADD-capable, i.e. 3+1 FLOPs/cycle.

The GameMaster said:
...and 8 vertex pipelines with 1 full shader unit per pipeline providing 5 components per cycle each (Vec4+scalar) and a clock rate of 625Mhz...

Yep.

The GameMaster said:
Each component is 2 FLOPS.

*nVidia Geforce 7800GTX (512MB version)
24 pixel pipes*2 units each*8FLOPS per unit*550Mhz = 211.2GFLOPs
8 vertex pipes*1 unit each*10FLOPs per unit*550Mhz = 44GFLOPs
TOTAL: 255.2GFLOPs

Yep.

The GameMaster said:
*ATI Radeon x1800XT
16 pixel pipes*1 unit each*8FLOPs per unit*625Mhz = 80GFLOPs
16 pixel pipes*1 partial unit*2FLOPs per unit*625Mhz = 20GFLOPs
8 vertex pipes*1 unit each*10FLOPs per unit*625Mhz = 50GFLOPs
TOTAL: 150GFLOPs

Nope. The 2nd PS ALU would be 40 GFlops, net 170 GFlops.

GTX512 has around 50% more GFlops (32bit programmable) than the 1800 XT. But nowhere near MORE than 2.5x, even with total system TFLOPs...

Or component ops,

GTX512 ~ 24x4x2 + 8x5 ~ 232x0.55GHz ~ 128 GigaComponent ops/sec

1800XT ~ 16x4x2 + 8x5 ~ 168x0.625GHz ~ 105 GigaComponent ops/sec

...around 22% more.
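To spell out that recount, here is a throwaway Python sketch; the ADD-only versus MADD split follows the R520 ALU layout cited below, and the 255.2 figure is the GTX 512 total from earlier in the thread.

Code:
# X1800 XT FLOP recount with both pixel ALUs as vec3+scalar
# (ALU 1 is ADD-only, ALU 2 is MADD-capable).
CLK = 0.625  # GHz
ps_add  = 16 * 4 * 1 * CLK   # ADD-only ALU: 4 comps x 1 FLOP  = 40 GFLOPs
ps_madd = 16 * 4 * 2 * CLK   # MADD ALU:     4 comps x 2 FLOPs = 80 GFLOPs
vs      = 8 * 5 * 2 * CLK    # vec4+scalar MADD                = 50 GFLOPs
x1800xt = ps_add + ps_madd + vs
print(f"X1800 XT: {x1800xt:.0f} GFLOPs")             # ~170
print(f"GTX 512 / X1800 XT: {255.2 / x1800xt:.2f}")  # ~1.5x, i.e. ~50% more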

The GameMaster said:
Well... it's not quite the 2.5 times that Mintmaster indicated...

Nope. Nowhere near. Closer to 50% more for 32bit programmable flops.

The GameMaster said:
...
Truth of the matter is though the actual performance in games do not reflect this difference...

They may not reflect 50% flop difference or 22% comp. op difference, but there are games benched that approach this difference.

The GameMaster said:
...so while FLOPs and shader performance does matter it is not the end-all performance metric. Just one of many smaller things to look at in conjunction with other things...

Yep, it's NEVER the sole metric, and NEVER was, and NEVER will be. But it's NOT irrelevant like some people make out with sweeping generalisations, without interpreting the numbers correctly.
 
Last edited by a moderator:
Jaws said:
Nope. The 2nd PS ALU would be 40 GFlops, net 170 GFlops.

GTX512 has around 50% more GFlops (32bit programmable) than the 1800 XT. But nowhere near MORE than 2.5x, even with total system TFLOPs...

Or component ops,

GTX512 ~ 24x4x2 + 8x5 ~ 232x0.55GHz ~ 128 GigaComponent ops/sec

1800XT ~ 16x4x2 + 8x5 ~ 168x0.625GHz ~ 105 GigaComponent ops/sec

...around 22% more.

Actually... the first shader unit in each pixel pipeline is in fact a partial shader unit.

[Attached image: ps.gif (pixel shader pipeline diagram)]


The pixel pipeline in the R520 is almost identical to the pixel pipeline in the R420 (with the exception of the texture units) in terms of shader arrangement. ATI quoted the Radeon x850XT (clock rate of 540Mhz) as having a capacity of 43 billion shader component operations per second, while nVidia claimed 51.2 billion shader component operations per second for the Geforce 6800 Ultra (clock rate of 400Mhz). Why can nVidia claim higher theoretical shader performance even though its clock speed is lower than ATI's? Because the first shader unit in each pixel pipeline in the Radeon series is a partial unit, not a full unit, providing 1 component per cycle instead of the 4 components that the full shader units in the Radeon and Geforce GPUs provide. Breaking down the numbers it comes out to this...

The full shader unit, providing 4 shader components per cycle, on the Radeon x850XT at 540Mhz delivers about 34.6 billion shader components per second, and the partial unit, providing 1 shader component per cycle, delivers about 8.6 billion shader components per second at 540Mhz. Combined that is 5 shader components per pixel pipeline per cycle, which at 540Mhz works out to 43 billion shader components per second (the stated amount). Compare this to the nVidia Geforce 6800 Ultra, which had 2 full shader units providing a total of 8 component operations per cycle and at 400Mhz hits the stated 51.2 billion shader components per second. So yes, the shader numbers should look like this...

GTX512 ~ 24x4x2 + 8x5 ~ 232x550Mhz ~ 128 billion component ops/sec
x1800XT ~ (16x4x1)+(16x1x1) + 8x5 ~ 120x625Mhz ~ 75 billion component ops/sec
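A quick check on the X850 XT / 6800 Ultra figures being cited here (a sketch using the component counts as stated above; the X850 full-unit term works out to about 34.6, and the combined total to about 43).

Code:
# Component ops/sec for the previous-generation parts, as counted above.
x850_full    = 16 * 4 * 0.540   # full unit: vec3+scalar, 4 comps -> ~34.6 Gcomps/s
x850_partial = 16 * 1 * 0.540   # partial unit counted as 1 comp  -> ~8.6 Gcomps/s
print(f"X850 XT:    {x850_full + x850_partial:.1f} Gcomponent-ops/sec")  # ~43.2

u6800 = 16 * 8 * 0.400          # 2 full units x 4 comps each
print(f"6800 Ultra: {u6800:.1f} Gcomponent-ops/sec")                     # 51.2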
 
The texture arrangement on R520 is actually the same as R300/R420; the representation has changed, but the layout is the same - what has changed is that there is better control of those units via a more flexible scheduler (which also means that chips like RV530 can increase the parallel ALUs while "sharing" texture units). The primary changes to the ALUs are the additional branch unit and the separate texture address processor, both of which people seem to forget in the FLOP counting.

BTW - your component vs instructions counting is all to cock.
 
The GameMaster said:
Actually... the first shader unit in each pixel pipeline is in fact a partial shader unit.

Yep. I'm fully aware. One has madd capability and the other doesn't.

The GameMaster said:
[Attached image: ps.gif (pixel shader pipeline diagram)]


The pixel pipeline in the R520 is almost identical to the pixel pipeline in the R420 (with the exception of the texture units) in terms of shader arrangement. ATI quoted the Radeon x850XT (clock rate of 540Mhz) as having a capacity of 43 billion shader component operations per second, while nVidia claimed 51.2 billion shader component operations per second for the Geforce 6800 Ultra (clock rate of 400Mhz)...

You need to clarify this post. Are you talking about shader ops/sec (aka shader instructions/sec) OR component ops/sec?

Just to reiterate, we were talking about 32-bit, programmable FLOPS.

The GameMaster said:
Why can nVidia claim higher theoretical shader performance even though its clock speed is lower than ATI's?

You need to clarify your metric. They can include mixed 16-bit and 32-bit instructions/FLOPs etc. to attain different numbers. Heck, this is what this thread discussed earlier with mixed programmable and fixed-function FLOPs; you can also have mixed 16-bit and 32-bit FLOPs...

The GameMaster said:
Because the first shader unit in each pixel pipeline in the Radeon series is a partial unit, not a full unit, providing 1 component per cycle instead of the 4 components that the full shader units in the Radeon and Geforce GPUs provide. Breaking down the numbers it comes out to this.

Nope.

See Daves article for R520,

http://www.beyond3d.com/reviews/ati/r520/index.php?p=03

article said:
• ALU 1
- 1 Vec3 ADD + Input Modifier
- 1 Scalar ADD + Input Modifier
• ALU 2
- 1 Vec3 ADD/MULL/MADD
- 1 Scalar ADD/MULL/MADD

The GameMaster said:
The full shader unit, providing 4 shader components per cycle, on the Radeon x850XT at 540Mhz delivers about 34.6 billion shader components per second, and the partial unit, providing 1 shader component per cycle, delivers about 8.6 billion shader components per second at 540Mhz. Combined that is 5 shader components per pixel pipeline per cycle, which at 540Mhz works out to 43 billion shader components per second (the stated amount). Compare this to the nVidia Geforce 6800 Ultra, which had 2 full shader units providing a total of 8 component operations per cycle and at 400Mhz hits the stated 51.2 billion shader components per second. So yes, the shader numbers should look like this...

You need to clarify your metric, because it seems like you're confusing instructions/sec with component ops/sec. Also, we were discussing the GTX512 and 1800XT.


The GameMaster said:
...
x1800XT ~ (16x4x1)+(16x1x1) + 8x5 ~ 120x625Mhz ~ 75 billion component ops/sec

Nope.

See Daves article linked above for R520

• ALU 1
- 1 Vec3 ADD + Input Modifier
- 1 Scalar ADD + Input Modifier
• ALU 2
- 1 Vec3 ADD/MULL/MADD
- 1 Scalar ADD/MULL/MADD

So,

x1800XT ~ (16x4x1)+(16x4x1) + 8x5 ~ 168x625Mhz ~ 105 billion component ops/sec

and

GTX512 ~ 24x4x2 + 8x5 ~ 232x550Mhz ~ 128 billion component ops/sec
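Side by side, the corrected component-op totals (a minimal sketch; the only change from the earlier breakdown is the second pixel ALU counted at 4 components rather than 1).

Code:
# Corrected component-op totals per the R520 ALU breakdown above.
gtx512  = (24 * 4 * 2 + 8 * 5) * 0.550   # 232 comps/clock at 550 MHz
x1800xt = (16 * 4 * 2 + 8 * 5) * 0.625   # 168 comps/clock at 625 MHz
print(f"GTX 512:  {gtx512:.0f} Gcomponent-ops/sec")    # ~128
print(f"X1800 XT: {x1800xt:.0f} Gcomponent-ops/sec")   # ~105 (GTX ~22% higher)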
 
Last edited by a moderator:
BenQ said:
Hmmmm. I find it hard to believe that Sony would lie THAT much about the RSX. They are claiming essentially 5x more FLOPS for the RSX than the 7800 GTX.

Hmmm, were you not there during this year's E3? I can't believe people actually buy into Sony's campaign of PR bullshit (especially after hearing it for some 8 years in a row now).

Yeah, the RSX is more powerful than a brand-new $600 video card; if you believe that, I have some oceanfront property in Arizona to sell ya. If Sony weren't so cheap we would all already have PS3s. There's nothing in the PS3 that is more technically advanced than what's already in the 360. Now I'm not saying that the PS3 isn't potentially more powerful than the 360; I'm just saying the technology inside isn't anything bleeding edge in comparison to the 360, unless you're talking about the difficulty of developing on Cell.
 
Last edited by a moderator:
c0_re said:
Hmmm, were you not there during this year's E3? I can't believe people actually buy into Sony's campaign of PR bullshit (especially after hearing it for some 8 years in a row now).

Yeah, the RSX is more powerful than a brand-new $600 video card; if you believe that, I have some oceanfront property in Arizona to sell ya

When did they say it was?

c0_re said:
If Sony weren't so cheap we would all already have PS3s. There's nothing in the PS3 that is more technically advanced than what's already in the 360. Now I'm not saying that the PS3 isn't potentially more powerful than the 360; I'm just saying the technology inside isn't anything bleeding edge in comparison to the 360, unless you're talking about the difficulty of developing on Cell.

The manufacturing process isn't any more advanced. There is a more significant transistor-count difference on the CPU side, though. And technology isn't just a function of silicon; there is an intellectual contribution there which you can't just ignore (although it's, of course, harder to quantify). I'd agree that power and where you are relative to "the edge" aren't necessarily tied at the hip, though.
 
Eleazar from Anandtech article said:
...
Another way to look at this comparison of flops is to look at integer add latencies on the Pentium 4 vs. the Athlon 64. The Pentium 4 has two double pumped ALUs, each capable of performing two add operations per clock, that's a total of 4 add operations per clock; so we could say that a 3.8GHz Pentium 4 can perform 15.2 billion operations per second. The Athlon 64 has three ALUs each capable of executing an add every clock; so a 2.8GHz Athlon 64 can perform 8.4 billion operations per second. By this silly console marketing logic, the Pentium 4 would be almost twice as fast as the Athlon 64, and a multi-core Pentium 4 would be faster than a multi-core Athlon 64. Any AnandTech reader should know that's hardly the case. No code is composed entirely of add instructions, and even if it were, eventually the Pentium 4 and Athlon 64 will have to go out to main memory for data, and when they do, the Athlon 64 has a much lower latency access to memory than the P4. In the end, despite what these horribly concocted numbers may lead you to believe, they say absolutely nothing about performance. The exact same situation exists with the CPUs of the next-generation consoles; don't fall for it.

Flops should never be the sole metric, and the more metrics you have, the better your judgment. IMHO, 32-bit component ops/sec gives a better idea. The equivalent for the above comparison:

P4 = 2x3.8GHz ~ 7.6 GigaComponents/sec
A64 = 3x2.8 GHz ~ 8.4 GigaComponents/sec
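The same comparison as a two-line Python sketch (one component op per ALU per clock), next to the peak-add figures from the article.

Code:
# Component ops/sec: one op per ALU per clock, per the metric suggested above.
p4  = 2 * 3.8   # two simple ALUs at 3.8 GHz -> 7.6 Gcomponents/sec
a64 = 3 * 2.8   # three ALUs at 2.8 GHz      -> 8.4 Gcomponents/sec
print(f"P4:  {p4:.1f} Gcomponents/sec")
print(f"A64: {a64:.1f} Gcomponents/sec")
# Contrast with the article's 15.2 vs 8.4 "peak adds/sec", which points the
# other way; this ordering is closer to how the two chips actually behave.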
 
SubD said:
And which specific PS3 developers are you referring to?

I am not referring to any PS3 developers, since it was not I who wrote this. Did I not state clearly enough that Anandtech wrote the article? I personally am able to believe that out of all the PS3 developers out there, there could be some who might have said something like this. Maybe they're Xbox 360-biased and only develop on the PS3 because they have to; I don't know. Or maybe it is a viewpoint a fair number of PS3 developers share; who knows. You can decide whether it is true or not. The main point, however, was to show that FLOPS are not an accurate measurement for today's processors and GPUs. The AMD and P4 scenario exemplifies this. As Jaws pointed out, there are much more accurate ways to determine the speed of a processor. He also rightly said that the more metrics you use, the better your judgement.

Look, if you want to be fooled by the marketing hype of both Sony and MS, go right on ahead. But if Sony and MS want to prove that one system is more powerful than the other to someone with more than half a brain, then they are going to have to use a better metric than just FLOPS. The numbers that MS and Sony gave us are arbitrary at best; anyone who says differently doesn't understand what FLOPS are, what they measure, how they can be measured (there are different ways FLOPS are measured; the article lists some), and the complete uselessness of FLOPS as a sole metric. What makes it worse is that these measurements are peak FLOPS.

Lastly, take a look at this Wikipedia article: http://en.wikipedia.org/wiki/Flops. It explains in a different way what Anandtech said, in case anybody is still foolish enough to think a FLOPS measurement of PS3 and X360 performance is anything but worthless as a benchmark for real-world gaming performance.

As Anandtech said, "Don't stare directly at the flops, you may start believing that they matter."

I would like to clarify that my post focuses directly on the peak FLOPS measurements used to benchmark the performance of the X360 and PS3 systems. I say this in light of the possibility of people taking my post out of context.
 