How much work must the SPUs do to compensate for the RSX's lack of power?

Status
Not open for further replies.
Yeah, sorry, I meant that. You can do both at the same time; that's why it is so efficient. If a scene demands more pixel shading, more ALUs will automatically be dedicated to that, but there will still be ALUs left for vertex shading, both at the same time (from what I understand), while on the G70 architecture there will always be ALUs sitting idle. If a scene is more pixel heavy, more ALUs are going to do pixel shading, and vice versa. Here is how ATI described it:

G70 architecture

http://www.elitebastards.com/pic.php?picid=/hanners/ati/dx10/dx10-14.jpg

Xenos

http://www.elitebastards.com/pic.php?picid=/hanners/ati/dx10/dx10-16.jpg

We won't discuss the 360 and PS3 GPUs anymore since Shifty said no comparison posts :)

And that's where the SPUs come in and make RSX MUCH more efficient, right? How much more efficient (and in what ways) does culling alone on the SPUs make the RSX?

How much vertex processing is one SPU capable of?
 
Hm, I wonder if PS3 devs only optimize for the Cell nowadays?
Isn't it at least as worthwhile to find tricks and optimizations for poor ol' RSX? We only talk about the Cell... is RSX maxed out or what?

Check out G70 and G71. Also, IIRC the Metro 2033 dev talked about RSX use in KZ2, since he has been experienced with the G70 architecture from its birth. Seems like the HW are twins with some genetic mutations.
 
Yeah, sorry, I meant that. You can do both at the same time; that's why it is so efficient. If a scene demands more pixel shading, more ALUs will automatically be dedicated to that, but there will still be ALUs left for vertex shading, both at the same time (from what I understand), while on the G70 architecture there will always be ALUs sitting idle. If a scene is more pixel heavy, more ALUs are going to do pixel shading, and vice versa. Here is how ATI described it
You're right, and for PS3 you just have to make sure you use all the pixel shaders and use the SPUs for their extra vertex shading power. UC2 does it and, except for the lack of MLAA, it's still the best looking console game this gen. It looks noticeably better than anything else out there.
 
It's so hard to judge consoles. I don't think an exceptional exclusive game is indicative of anything more than the developer being very talented. And the graphics quality is still highly subjective because a lot of a game's visual impression comes from the craftiness of the artists and whether it works well for you.

One of the best console analyses that I've seen is DRS's thread on making normal mapped and dynamically-lit/shadowed Quake 1 for Wii. No bullshit "40% power!!" stuff, just practical results and comments on it.
 
For SPU triangle culling, it depends on the scene and application.

For example, from the 2007 slides on Edge:
http://forum.beyond3d.com/showpost.php?p=956489&postcount=196

Pipeline:
* 1st stage of pipeline decompresses vertices. Accepts vertex arrays interleaved with data. Separates data into tables of floats. Supports all native RSX formats. Also preserves the ability to have RSX process data directly.

* 2nd stage decompresses index data. Indexed triangle lists (because this is the best for RSX).
Optimizing for the mini cache on the RSX is often the most important factor to consider when constructing index data.
Index data is highly compressible. 6.5x more triangles in the same amount of space with index compression.

* (Optional stage) Blend shapes/additive vertex blending.
* (Optional stage) 4-bone matrix palette skinning. Most teams do skinning on SPU. 2 very large benefits:
1. Reduce the length of the vertex program.
2. Save time in RSX reading the vertices and weights. 30-70% speed boost over RSX.

* (Optional stage) Triangle culling. SPU culls so that it only sends triangles which can be rendered by RSX. Multi-sampling creates some complications, but we can still cull "pretty good".
Overall performance improvement (from culling) in a balanced scene is 10-20%
Reduces the pressure to create an LOD technology in your projects

* Final stage, prepare data the RSX will use. Convert everything into RSX accepted formats
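To make the optional culling stage a bit more concrete, here is a minimal sketch of what "only send triangles which can be rendered" could look like. This is not Edge's actual code; the function names, the counter-clockwise front-face convention, and the 2D screen-space input are all assumptions made for illustration. It drops back-facing, zero-area, and fully off-screen triangles from an indexed triangle list.

```python
def signed_area(a, b, c):
    """Twice the signed area of a 2D triangle; positive means counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def cull_triangles(vertices, indices, width, height):
    """Return the indices of triangles worth sending on (hypothetical helper).

    vertices: list of (x, y) screen-space positions (assumed already projected).
    indices:  flat indexed triangle list, three indices per triangle.
    """
    kept = []
    for i in range(0, len(indices) - 2, 3):
        tri = indices[i:i + 3]
        a, b, c = (vertices[j] for j in tri)

        # Back-facing or degenerate (zero-area) triangles never produce pixels.
        if signed_area(a, b, c) <= 0:
            continue

        # Triangles whose bounding box lies entirely off-screen are dropped too.
        xs = (a[0], b[0], c[0])
        ys = (a[1], b[1], c[1])
        if max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height:
            continue

        kept.extend(tri)
    return kept


verts = [(0, 0), (10, 0), (0, 10), (-20, -20), (-10, -20), (-20, -10)]
tris = [0, 1, 2,   # front-facing, on-screen: kept
        0, 2, 1,   # same triangle with reversed winding: culled
        3, 4, 5]   # front-facing but off-screen: culled
visible = cull_triangles(verts, tris, 1280, 720)
```

A real implementation would also have to handle the multi-sampling complications the slides mention (tiny triangles that miss every sample point), which this sketch ignores.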

Geometry Processing on SPUs:
* A lot of data
* Double buffering is simple but takes up a lot of space.
* Edge uses a single or ring buffer JIT strategy.
* SPU generates data in same frame RSX consumes it.
* RSX almost never waits on SPU - In the rare event it does, the correct synchronization will take place.

Test case examples:
* In general 1 SPU can process 750k triangles per frame at 60FPS while hopefully culling 60% of the triangles.
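For scale, the quoted test case works out as follows. This is just arithmetic on the figures from the slides (750k triangles per frame, 60 FPS, 60% culled); the variable names are mine.

```python
TRIANGLES_PER_FRAME = 750_000   # processed by one SPU, per the 2007 Edge slides
FRAMES_PER_SECOND = 60
CULL_PERCENT = 60               # the "hopefully" culled share from the slides

# Input rate one SPU sustains over a second.
triangles_per_second = TRIANGLES_PER_FRAME * FRAMES_PER_SECOND

# Triangles that survive culling and still reach RSX each frame.
surviving_per_frame = TRIANGLES_PER_FRAME * (100 - CULL_PERCENT) // 100

print(triangles_per_second)  # 45000000
print(surviving_per_frame)   # 300000
```

So one SPU chews through about 45 million triangles per second and, at that cull rate, hands RSX roughly 300k of the 750k per frame.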

Not sure if they have improved it further since 2007.
 
Also all the 60fps titles that also look good: Forza 3, MW1-2, Rage. Or games pushing 4xAA like Blur, Joker's sport games, Sebbi's Trials HD.

Stuff that's complicated and hard to do technically won't necessarily look the best at a first glance.
And I'm not saying that Epic's stuff isn't good, it's just that their work shouldn't be the measure of what the 360 can do.
Not to mention that I recall something about deferred shadow rendering being the actual reason for Gears' and UE3's lack of AA in general, and not the architecture itself.

I disagree with Forza 3, Turn 10 are not standout from a tech perspective.
The in-game car models are quite low poly and nothing like what you see in photomode/showroom, or in GT5 for that matter (AND only 6 cars on track).

And Forza 2 was an absolute dog with hideous shader aliasing, crap texture filtering etc. I would say pretty much every other racing studio outdid Turn 10 on 360; Bizarre, Criterion, Codemasters, Black Rock etc.

Epic is definitely one of the better devs on 360, Gears 2 looked amazing, perhaps much of the credit must go to their extremely talented artists, but the underlying tech is solid.

Other than Remedy, I also think Rare are one of the best 360 only devs, Banjo Kazooie 3 looks stunning, so does Viva Pinata.

But, we'll never know what the 360 is capable of, because there's no one interested in pushing it - if MS isn't, why would the third parties?
 
Could you name some games though that you think are doing something extra special technically?

Trials HD is the first one that comes to mind where they explicitly state some of the things they used it for. Other devs have also leveraged it for things other than AA while some use it for AA...

From an interview with Sebbi...

http://www.eurogamer.net/articles/digitalfoundry-tech-interview-trials-hd?page=2

eDRAM gives the system huge extra render target bandwidth for operations like these, and makes Xbox 360 a very suitable platform for deferred rendering techniques.

The anti-aliasing hardware inside the eDRAM is one of the most important performance advantages of the platform. With the anti-aliasing hardware we could speed up our soft shadowing algorithm dramatically, and we could replace lots of usually pixel shader-heavy post-processing steps (blurring and downsampling) with cheaper alternatives. These hardware specific optimisations required down-to-the-metal code, but in the end I must say that the eDRAM hardware was a key feature in making our game run at constant 60FPS.

The inclusion of eDRAM gives developers more than just the ability to have fast AA.

Regards,
SB
 
Epic is definitely one of the better devs on 360, Gears 2 looked amazing, perhaps much of the credit must go to their extremely talented artists, but the underlying tech is solid.

Other than Remedy, I also think Rare are one of the best 360 only devs, Banjo Kazooie 3 looks stunning, so does Viva Pinata.

But, we'll never know what the 360 is capable of, because there's no one interested in pushing it - if MS isn't, why would the third parties?

Totally agree with this. I think Gears of War 2 looks fantastic and should be considered top tier in the graphics department. Whether it's the art or not, all I know is that looking at Gears is pleasant. Very pleasant.


For SPU triangle culling, it depends on the scene and application.

For example, from the 2007 slides on Edge:
http://forum.beyond3d.com/showpost.php?p=956489&postcount=196

In general 1 SPU can process 750k triangles per frame at 60FPS while hopefully culling 60% of the triangles.

Not sure if they have improved it further since 2007.

Slightly OT, but what is the difference between vertices and triangles? Triangle = 3 vertices? Of course, as triangles and vertices approach a high number, they reach a one-to-one ratio.
Looking at the specs for xenos,
the Maximum vertex count: 6.0 billion vertices per second while
the Maximum polygon count: 500 million triangles per second. Shouldn't they be close together?
 
GPUs use triangle strips, fans, vertex lists and other stuff to speed up processing.
Usually triangles are connected together and to calculate the next one, you already have two of its vertices processed. There are a few things that can break mesh continuity, which is why it's important to have good artists who can optimize their models/UVs well.
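A quick sketch of why connected triangles are cheaper, under the simplifying assumption of one unbroken strip (the helper names are made up for illustration): a plain triangle list needs 3 vertices per triangle, while a strip of n triangles needs only n + 2, so the ratio approaches one vertex per triangle as n grows.

```python
def vertices_for_list(n_triangles):
    """Unshared triangle list: every triangle carries its own 3 vertices."""
    return 3 * n_triangles


def vertices_for_strip(n_triangles):
    """Single unbroken strip: 3 vertices for the first triangle, 1 for each one after."""
    return n_triangles + 2 if n_triangles > 0 else 0


def strip_to_triangles(strip):
    """Expand a strip of vertex indices into triangles with consistent winding."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if i % 2:          # every second triangle is flipped to keep the same facing
            a, b = b, a
        triangles.append((a, b, c))
    return triangles


expanded = strip_to_triangles([0, 1, 2, 3, 4])
# expanded == [(0, 1, 2), (2, 1, 3), (2, 3, 4)]: 3 triangles from 5 vertices
```

Every break in mesh continuity (a UV seam, a hard edge) starts a new strip and costs extra vertices, which is the point about well-optimized models.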
 
Again, as good as Gears 2 or Mass Effect 2 looks, it still doesn't mean that the reason is the cutting edge nature of the underlying technology. If you take a look at the elements of the engine and the general performance it delivers (number of characters, scene complexity, shadow/lighting quality, effects etc) you'll find that it isn't really that special or fast.
 
I disagree with Forza 3, Turn 10 are not standout from a tech perspective.
The in-game car models are quite low poly and nothing like what you see in photomode/showroom, or in GT5 for that matter (AND only 6 cars on track).

You know that applies to almost all racing games, including GT5. There is a polygon count thread with screenshots of in-game cars where one can see they are nowhere near 200-400k but more around 40-60k, which is quite a standard polygon count for most racing games this gen for in-game playable visuals.
 
You know that applies to almost all racing games, including GT5. There is a polygon count thread with screenshots of in-game cars where one can see they are nowhere near 200-400k but more around 40-60k, which is quite a standard polygon count for most racing games this gen for in-game playable visuals.
Where did you get those numbers? I haven't found any comparison screens in the polygon count topic.
BTW, I've seen many comparisons, and cars that are close to you look exactly the same as in the showroom. Distant cars are LOD'ed for sure, but you can't see the LOD transition when you're playing, and for sure cars close to you have the highest LOD possible.
In Forza 3 even cars in Photomode are LOD'ed. Only the player car is not LOD'ed, and only 4/8 cars have self-shadowing in Photomode [in different quality].
 
the Maximum vertex count: 6.0 billion vertices per second while
the Maximum polygon count: 500 million triangles per second. Shouldn't they be close together?

Yes, they use 3 vertices = 1 triangle to arrive at the peak numbers. These theoretical numbers are calculated by looking only at the raw computation power inside Xenos:

6.0 billion vertices per second = (48 shader pipelines × 4 shader ops per cycle × 500 MHz) / 16
500 million triangles per second = ((48 shader pipelines × 500 MHz) / 16) / 3 vertices per triangle

In practice, software/algorithms are more complex, and other bottlenecks usually prevent the GPU from reaching its peak.
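The two peak figures can be reproduced directly from the formulas as written. The constant names below are mine, and the divisor of 16 is taken from the formulas as-is, without assuming what it stands for.

```python
SHADER_PIPELINES = 48
OPS_PER_CYCLE = 4
CLOCK_HZ = 500_000_000   # 500 MHz
DIVISOR = 16             # taken as-is from the quoted formulas
VERTICES_PER_TRIANGLE = 3

peak_vertices = SHADER_PIPELINES * OPS_PER_CYCLE * CLOCK_HZ // DIVISOR
peak_triangles = SHADER_PIPELINES * CLOCK_HZ // DIVISOR // VERTICES_PER_TRIANGLE

print(peak_vertices)                    # 6000000000
print(peak_triangles)                   # 500000000
print(peak_vertices // peak_triangles)  # 12
```

Note the gap between the two numbers is 12x rather than 3x: the vertex figure counts 4 shader ops per cycle per pipeline, while the triangle figure does not include that factor.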


The SPU culling number in the Edge presentation is based on real world performance. Although the theoretical peak of RSX is in the hundreds of millions of ops, at the end of the day the SPU could do culling while RSX is busy/weak, and also run MLAA. Architecture-wise, this is all thanks to the flexible SPU cores, plus fast shared access to memory between the RSX and Cell. OTOH, the GPU should be better at other graphics tasks that are embarrassingly parallel and SIMD-like.

That's why the parallel GPU + CPU use complements each other. :cool:

In reality, the programmers made everything work (by taking advantage of the architecture).
 

It appears the answer to the thread title is "not much".
 
It appears the answer to the thread title is "not much".

Don't think so, otherwise people wouldn't bother to get the SPUs involved at all. I'd say a 5 to 10% gain in vertex processing would already be worth some SPU time, as it'd help with those pesky tiny bottlenecks that can cause framerate fluctuation; but if they dedicate an entire array of them, even for a short time, then it presents a significant gain, indicating the lack of RSX's ability to deal with the issue on its own.
 
We have some very detailed stats on the various spu jobs for Killzone, remember? I'll look it up, I should have an image of that myself somewhere.

I think one advantage of SPU use here is that for 1280x720 resolutions you should ideally be able to bring the actual stuff that needs to be drawn by the GPU down to a maximum of 0.9 million triangles, as on consoles I think you're doing a bad job if you want to have more visible triangles than pixels in each frame. I think Uncharted 2 managed to bring it down/up to 750,000 per frame.
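The 0.9 million ceiling is simply the pixel count of a 720p frame. A quick check, using only the numbers from the post (the Uncharted 2 figure is the poster's recollection, not a measured value):

```python
WIDTH, HEIGHT = 1280, 720
UC2_TRIANGLES_PER_FRAME = 750_000   # figure recalled in the post above

pixels_per_frame = WIDTH * HEIGHT
triangles_per_pixel = UC2_TRIANGLES_PER_FRAME / pixels_per_frame

print(pixels_per_frame)               # 921600
print(round(triangles_per_pixel, 2))  # 0.81
```

So 750,000 triangles per frame would sit at roughly 0.8 triangles per pixel, just under the one-triangle-per-pixel rule of thumb.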
 