PS3 GPU 2x more powerful than X360 GPU?

bbot

NV50 versus R500.

Sony claims that the total FLOPS for PS3 is 2 TFLOPS while the total FLOPS for X360 is 1 TFLOPS. Since we know that most of those FLOPS come from the GPU, the question naturally arises: is NV50, which uses a traditional pipeline, better than R500, which uses unified shaders? Dave B?
 
I think we need actual info on the workings of the chip. Simply telling us how many transistors it has doesn't tell us how well they come together. Any comparisons between XB360 and PS3 GPUs should wait until we have tech specs, not marketing waffle.
 
For a chip to be 2X more powerful than one coming out less than 6 months before, there would need to be some serious extra cash poured into the project.
 
Yes, the designs are different enough that a direct comparison is kind of tough. From what I can tell, I'd say that they are overall pretty similar, but each with different pros / cons.

1) nvidia's has higher raw fill rate
2) ATI's has (essentially) 2X AA "for free" at 720p resolution.

So in the end, the "effective" fill rates at the most common resolution and AA (which I think MS got right) may be similar, or actually in favor of ATI's chip.
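
A crude way to put numbers on that (the ROP counts, clocks, and the AA cost model here are all guesses pulled from this thread, not confirmed specs):

Code:
# Back-of-envelope "effective" fill rate at 720p with AA.
# RSX: assumed 550MHz x 16 pixels/clock; R500: 500MHz x 8 ROPs (the "4 gigapixel" figure).
rsx_raw  = 0.550 * 16   # ~8.8 Gpix/s raw
r500_raw = 0.500 * 8    # ~4.0 Gpix/s raw

for aa in (2, 4):
    rsx_eff  = rsx_raw / aa   # assume extra AA samples eat fill/bandwidth on a conventional part
    r500_eff = r500_raw       # assume the eDRAM really does make the AA cost ~free
    print(f"{aa}x AA: RSX ~{rsx_eff:.1f} Gpix/s vs R500 ~{r500_eff:.1f} Gpix/s")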

1) nVidia's has higher total shader ops per second, but segmented into X vertex ops, and Y pixel shader ops.
2) ATI's has lower shader ops ("Z"), but you should have the flexibility to balance that power where needed (vertex vs. pixel).

So in the end, if nVidia's hard-wired "balance" turns out to be a good pick where most games sit, it may typically end up effectively faster in shader power.
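
A toy way to see that trade-off (the unit counts and workload mixes below are invented for illustration, not real specs):

Code:
# Utilisation of a hard-wired vertex/pixel split for different workload mixes;
# a unified array would ideally sit near 100% for any mix.
def fixed_pool_utilisation(vs_units, ps_units, vs_work, ps_work):
    frame_time = max(vs_work / vs_units, ps_work / ps_units)  # the slower pool gates the frame
    return (vs_work + ps_work) / ((vs_units + ps_units) * frame_time)

for vs_work, ps_work in [(10, 90), (25, 75), (60, 40)]:
    u = fixed_pool_utilisation(8, 24, vs_work, ps_work)
    print(f"VS:PS workload {vs_work}:{ps_work} -> fixed 8+24 split runs at {u:.0%}")

When the workload happens to match the hard-wired split (25:75 here), the fixed design loses nothing; when it doesn't, units sit idle.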


You might say it this way (speaking in gross generalizations): ATI has "hard wired" fill rate and flexible shading capability. nVidia has "hard wired" shaders, and flexible fill rate capability.

For consoles, I would suspect that ATI's approach is a better fit, but only time will tell.
 
london-boy said:
For a chip to be 2X more powerful than one coming out less than 6 months before, there would need to be some serious extra cash poured into the project.

Or just moved from 110 nm to 90nm...
 
nAo said:
Joe DeFuria said:
1) nvidia's has higher raw fill rate
How do you know?

Good question...I thought I read it somewhere, but it may just be my deductive reasoning based on:

1) Assumption that it is more or less G70 tech shrunk to 90nm.

I don't think G70 would have any fewer than 16 "fill rate" pipes (outputting 16 pixels per clock, minimum).

Could be wrong, though. :)
 
Doesn't the 100+ billion shader ops vs. the 48 billion count for anything? Or is that not a directly comparable metric?
 
I think XB360 will have an advantage in shader processing, but I think PS3 RSX will have the advantage in global illumination and shadows. Most GI and shadowing algorithms are fillrate bound, not shader bound; indeed, they resemble deferred rendering, with a number of passes which calculate visibility, occlusion, or intersection information, followed by a final pass which colors everything in.

The XB360 is a little low in this regard (4 gigapixels/s), and may fare worse with the UE3 engine if it makes heavy use of shadow buffers or stencil shadows.
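
To get a feel for how quickly shadow passes eat fill rate, here's a rough sketch (the overdraw figures and light count are pure guesses, just to show the scaling):

Code:
# Fill cost of a shadow-heavy 720p/60 frame, in Gpixels/s.
w, h, fps        = 1280, 720, 60
base_overdraw    = 3    # ordinary colour/depth passes
lights           = 4    # shadow-casting lights
stencil_overdraw = 10   # assumed stencil-volume fill per light

pixels = w * h * (base_overdraw + lights * stencil_overdraw)
print(f"~{pixels * fps / 1e9:.1f} Gpix/s")   # ~2.4 Gpix/s on these guesses - a big bite out of a 4 Gpix/s budget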

This much resembles the argument of having a pipeline with 2 TMUs vs. 1. Depending on the workload, you either fare better or worse, but having more pipelines allows more flexibility.

I like the XB360's bold risk-taking with a new architecture, unified shaders, and eDRAM, but that doesn't necessarily mean it will win, even if it seems elegant or "cool" to a geek's sense of aesthetics. I would have liked XB360 better if it had 16 ROPs.

I think the biggest benefit of the eDRAM may turn out to be the raw speed of dependent texture lookups, if you can manage to store those textures in eDRAM.
 
I think the big difference between the two architectures is that one of them will never stop due to texturing - and the other one will be stopping all the time.

A big part of ATI's architecture is that dependent texture fetches cause no stalls. As soon as the dependency arises, the pipeline's next instruction is on a different thread.

Multi-thread graphic processing system patent said:
[0006] As such, there is a need for a sequencing system for providing for the processing of multi-command threads that supports an unlimited number of dependent texture fetches.
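
A toy model of that "park the thread on a dependent fetch" idea (purely illustrative - not how either chip actually schedules work, and the latency and program are made up):

Code:
from collections import deque

FETCH_LATENCY = 100   # assumed texture fetch latency, in cycles

def alu_utilisation(threads):
    # threads: list of instruction lists, each op either "ALU" or "TEX" (a dependent fetch)
    ready   = deque(range(len(threads)))
    pc      = [0] * len(threads)
    waiting = {}                     # thread id -> cycle its fetch completes
    cycle = busy = 0
    while ready or waiting:
        cycle += 1
        for t in [t for t, c in waiting.items() if c <= cycle]:
            del waiting[t]           # fetch done, thread is runnable again
            ready.append(t)
        if not ready:
            continue                 # every thread is parked on a fetch: the ALU idles
        t = ready.popleft()
        op = threads[t][pc[t]]
        pc[t] += 1
        busy += 1
        if op == "TEX" and pc[t] < len(threads[t]):
            waiting[t] = cycle + FETCH_LATENCY   # switch away instead of stalling
        elif pc[t] < len(threads[t]):
            ready.append(t)
    return busy / cycle

prog = ["ALU", "TEX", "ALU", "ALU"]
print(f" 1 thread : {alu_utilisation([list(prog)]):.0%} ALU utilisation")
print(f"64 threads: {alu_utilisation([list(prog) for _ in range(64)]):.0%} ALU utilisation")

With one thread the ALU spends almost all its time waiting; with enough threads in flight it stays busy, which is the whole point of the scheme the patent describes.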
Jawed
 
Joe DeFuria said:
Yes, the designs are different enough that a direct comparison is kind of tough. From what I can tell, I'd say that they are overall pretty similar, but each with different pros / cons.

1) nvidia's has higher raw fill rate
2) ATI's has (essentially) 2X AA "for free" at 720p resolution.

So in the end, the "effective" fill rates at the most common resolution and AA (which I think MS got right) may be similar, or actually in favor of ATI's chip.

1) nVidia's has higher total shader ops per second, but segmented into X vertex ops, and Y pixel shader ops.
2) ATI's has lower shader ops ("Z"), but you should have the flexibility to balance that power where needed (vertex vs. pixel).

So in the end, if nVidia's hard-wired "balance" turns out to be a good pick where most games sit, it may typically end up effectively faster in shader power.


You might say it this way (speaking in gross generalizations): ATI has "hard wired" fill rate and flexible shading capability. nVidia has "hard wired" shaders, and flexible fill rate capability.

For consoles, I would suspect that ATI's approach is a better fit, but only time will tell.

I thought MS said the ATI GPU had 4x MSAA not 2x? (my bold)
 
I'd wager this is the same chip as the G70.

Something that is bothering me about the PS3 marketing material is that the GPU is stated as having 1.8 TeraFlops.

Didn't they market the 6800 Ultra as having 100 GFlops???
Now 2x a 6800U would make this 200 GFlops, not 1800 GFlops.

Something is whack here.

Silly marketing?
 
One of the questions is just how custom Sony/nVidia have made the RSX; from all the info, the R500 at least seems a lot more custom-made, from what we can tell.

One thing I can't remember is whether MS has put up anything about the connection between the CPU and GPU, how fast it is, etc.

That's one thing that from the current view seems very good on PS3: the very fast "partnership" between Cell and the RSX. Whether this is any indication of how custom-made the RSX is, or whether we can draw any conclusions from it, I don't know.
 
Jawed said:
I think the big difference between the two architectures is that one of them will never stop due to texturing - and the other one will be stopping all the time.

Well, NV's ability to hide dependent fetch latency has been pretty good in comparison to the R3xx architecture.

A big part of ATI's architecture is that dependent texture fetches cause no stalls. As soon as the dependency arises, the pipeline's next instruction is on a different thread.

This is no different for NV4x or R3xx. You really think today's cards stall for the hundreds of cycles it takes to fetch a texel? No, they move on to another pixel and another instruction.

The real advantage the XB360 has in dependent texturing is eDRAM latency.
 
Jawed said:
A big part of ATI's architecture is that dependent texture fetches cause no stalls. As soon as the dependency arises, the pipeline's next instruction is on a different thread.
Why do you think the NV3x (and the NV4x, to a lesser extent) had heavy performance drops when you used too many registers? It allowed dependent texture fetches to stall your pipeline significantly less. Sure, that required extra storage for the desired result, but so does "waiting" for threads in the R500. ATI's system is an awful lot more complicated, more flexible and more elegant, and it certainly has advantages - but when it comes to dependent texturing, I wouldn't expect an unbelievable speedup. Of course, I could be wrong.
As for the NV40 - certain NVIDIA marketing and for-developer documents have it at 1 TeraFlop. Considering it was at 400MHz and this chip is at 550MHz, if it had 24 pipelines instead of 16 (and everything was multiplied by identical proportions), it would have 1.00 x (24/16) x (550/400) = 2.0625 TeraFlops. Considering everything but PS/VS/triangle setup/etc. most likely wasn't increased by such an amount, the number of 1.8 TeraFlops does seem very possible to me (assuming the 1 TeraFlop number isn't "creative" - which it most likely is, at least to a certain extent), although it would have been rounded to exactly what was needed to get a total of 2 TeraFlops.
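
Spelling that scaling out (these are marketing FLOPs, not measured throughput, and the 24-pipe/550MHz RSX configuration is an assumption):

Code:
nv40_tflops = 1.0               # NVIDIA's own marketing figure for NV40
nv40_pipes, nv40_mhz = 16, 400
rsx_pipes,  rsx_mhz  = 24, 550  # assumed G70-style configuration

scaled = nv40_tflops * (rsx_pipes / nv40_pipes) * (rsx_mhz / nv40_mhz)
print(f"{scaled:.4f} TeraFlops")   # 2.0625 - so a rounded-down 1.8 is not unreasonable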

As for my personal opinion on the RSX, I first believe that people complaining about the memory bandwidth should look at the numbers again. It's two separate 256MB blocks. That means you can add another effective 128-bit bus if you go through the CELL interface to access that memory (which it will, according to docs on AnandTech). That means that yes, the bandwidth will be limited, but not catastrophically. If the CPU isn't using more than 5GB/s of memory bandwidth, the RSX would have nearly a full 20GB/s more to play with. That isn't very likely at all in practice, but a good 5-10GB/s extra on average seems very possible to me. And considering the resolutions would be lower, and the focus would be on pixel shading instead, this seems just plain fine to me.
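
Putting rough numbers on that (22.4 and 25.6 GB/s are the announced bandwidth figures as I recall them; the CPU's share is a guess):

Code:
gddr3_bw = 22.4   # GB/s, RSX's local 256MB GDDR3 pool
xdr_bw   = 25.6   # GB/s, the 256MB XDR pool behind CELL
cpu_use  = 5.0    # assumed CPU traffic on the XDR side

print(f"RSX budget: {gddr3_bw} GB/s local + up to {xdr_bw - cpu_use:.1f} GB/s borrowed through CELL")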

Uttar
 
overclocked said:
One of the questions is just how custom Sony/nVidia have made the RSX; from all the info, the R500 at least seems a lot more custom-made, from what we can tell.

NVIDIA's license agreement with Sony allows them to use existing technology in this upcoming generation of PC parts too, whereas ATI's agreement with Microsoft is the opposite - they cannot base R500/520 on the same architecture. Thus, R500 is custom made for Xbox 360, whereas RSX is likely to be very similar to G70, or whatever NVIDIA's next generation part is called.

I get the impression that R520 will be a more conventional GPU with vertex shaders and pixel pipelines, rather than the unified shader approach that has been used in R500.
 
I think this is a very worthwhile discussion. On paper it looks like no competition:

300M transistors vs. 150M transistors.

Game over... right?

But we know hardly anything about the chips, and what we do know is worth a moment of pause. The RSX does 128-bit pixel precision, and Nvidia really hit on the difference between 32-bit HDR effects and 128-bit HDR effects. But think back to FarCry--HDR at 1024x768 was well below 60fps--with no anti-aliasing. Of the 55GB/s the RSX shares with CELL, it gets ~22GB/s of that for the framebuffer. The 6800U with 35GB/s of bandwidth chokes at 1024x768 with HDR and no AA.

So can we realistically expect 1080p at 60fps with HDR? What about AA? It is possible... just a question.
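
Here is a back-of-envelope on the framebuffer traffic for FP16 HDR at 1080p/60 (the overdraw and read/write factors are my guesses):

Code:
w, h, fps = 1920, 1080, 60
bpp       = 8 + 4    # bytes per sample: FP16 colour + Z/stencil
overdraw  = 3        # assumed average overdraw
rw        = 2        # read + write per blended/tested sample

for aa in (1, 2, 4):
    gbps = w * h * aa * fps * bpp * overdraw * rw / 1e9
    print(f"{aa}x AA: ~{gbps:.0f} GB/s of framebuffer traffic")

On those guesses, plain HDR (~9 GB/s) fits within ~22GB/s, but AA multiplies the traffic and pushes past the local pool quickly.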

But I cringe at the thought of a feature that takes up a lot of silicon being unusable. Call me pessimistic, but Nvidia has a long history of introducing features that demand more than the system can realistically deliver (fyi, I am running a 6800GT right now, I am not anti-Nvidia, just being honest about their track record).

This also relates to the memory design: the GPU having access to two pools sounds nice, but it sounds like yet another balancing act. To compete with the 23GB/s + eDRAM X360 design it was necessary, but accessing one large fast pool and a fast framebuffer, versus accessing two separate pools--one for the framebuffer and content and the other for whatever you cannot fit in--seems like it could be a recipe for headaches. It is nice that the bandwidth is there, but I think it closes up the flexibility of the system. What if doing 218GFLOPs requires access to more than 256MB of memory for a certain game (like the satellite photo + height map demo)? Or, being more realistic, if CELL is doing a ton of physics, AA, and other intense CPU tasks, it may require the bandwidth of the XDR, and that means the GPU has the 23GB/s for all its graphical needs plus the framebuffer. While both systems will require some balancing, it looks like PS3 gives fewer options (at least fewer options without jumping through hoops).

Another question is whether the video encoding/decoding hardware was left on the chip. It is pretty useless with CELL, but that would be another area where the transistor count would be inflated, rendering a comparison on transistor budget useless.

The next question is configuration. With tweaks of SM 3.0 features and the inclusion of 128-bit pixel precision cutting into the 88M transistor difference between NV40 and G70, the guesses of a 24 PS pipe part seem realistic (I have heard 10-14 VS, but I am not sure anyone knows yet).

And with a bump from 400MHz/16PS for the NV40 to 550MHz/24PS for the G70, that would look like 2x the performance. I know this is not scientific, but it should be ballparkish: 400 x 16 = 6,400; 550 x 24 = 13,200. Add in the tweaks the G70 has and it is easy to see how it should be at least 2x as fast as the NV40.

That is impressive--but it is exactly what our PCs will be outputting come fall/spring. Of course the PS3 has a better CPU for gaming and is a closed box.

But if the above is true of the RSX/G70, I am not sure the R500 is going to be underpowered. Again, we do not know much about the R500's featureset, and since it is a new architecture there are a lot of questions about how it compares to traditional parts.

But there seems to be a significant gap in philosophy. First, it seems pretty clear the R500 is completely designed to be a console part. The fill rate is not over-inflated for its needs and it does not seem to be a legacy part. And Jawed has summarized it well in another post: the R500 will perform like an 8 pipe part in non-shader-intensive games, BUT in PS-heavy games it will be like a 48 pipeline part (of course we do not know how the quality of those "pipes" compares to the traditional GPUs we are currently used to).

R500 in this regard looks very clean and very flexible.

The other area is the eDRAM. 256GB/s of effective bandwidth (64GB/s real?) should smooth out a lot of hiccups. Having your framebuffer as a separate pool means less drain on the main memory. I have wondered if HDR-like effects are possible with such a small framebuffer though (but from what I have heard the GPU can write to the system memory also--can it tile the information from the framebuffer to the main memory?).
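
A quick check on what fits in the eDRAM, assuming the usual 10MB figure and these buffer formats (none of this is confirmed):

Code:
EDRAM_MB = 10
w, h = 1280, 720

for label, colour_bytes, aa in [("32-bit colour, no AA", 4, 1),
                                ("32-bit colour, 2x AA", 4, 2),
                                ("FP16 colour,   no AA", 8, 1)]:
    size_mb = w * h * aa * (colour_bytes + 4) / (1024 * 1024)   # +4 bytes of Z/stencil per sample
    verdict = "fits" if size_mb <= EDRAM_MB else "needs tiling to main memory"
    print(f"{label}: {size_mb:.1f} MB -> {verdict}")

So even without AA, an FP16 target would spill just past 10MB on these assumptions, which is exactly why the tiling question matters.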

So on sheer size the RSX looks to own, but the few details we have seem to give a good reason to pause (at least they do for me). I think we very well may see that each design has pros and cons. Just like desktop GPUs, they may excel in different game situations.

Then again, one may just flat outperform the other. We do not know yet, but it will be fun finding out more about these chips. I am especially curious about the FEATURES of these chips: tessellation, advanced displacement mapping features, what flow/branch control improvements there have been, what precision the R500 is rendering at, etc.
 