Why have Vertex Shaders on the CPU instead of the VPU?

DeanoC said:
Often different games (and even different parts of the same game) have very different load balances. So having a CPU do vertex work may be right in certain situations, which is why an architecture that can reuse processing elements is so handy.

A real situation might be a game featuring an army. The main portion of the game needs to render thousands of relatively simple people every frame; the VPU vertex unit is good at this kind of 'brute force' work, the pixel shaders are simple, but the CPU is busy doing all the AI and game code etc. A perfect place for a 'hardware' vertex shader. Now cut to a close-up cutscene: we want awesome detail, skin shaders, hair, cool lighting etc. This has heavy pixel and vertex work but needs little CPU power. So use a CPU core or two to do the vertex work and use all the GPU's power for pixel work.


Of course this is even easier if you have things like HLSL, where the same code can be switched from CPU to GPU with relatively little work.

Designing an architecture to allow load balancing is a good thing. Even if 90% of games use it in the 'standard' configuration (CPU doing light graphics work), there will be some that achieve better results, especially when you consider ports where it may not be possible to redo the entire game to soak up CPU power; at least you might get some better graphics by using a spare core for something (procedural textures, cool lighting, raytracing, etc.).
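
Just to make Deano's load-balancing point concrete, here's a rough C++ sketch (all the types and the submit_* functions are made-up placeholders, not any real API): the same batch either goes straight to the hardware vertex unit, or gets pre-transformed on a spare core first.

Code:
#include <thread>
#include <vector>

struct Vertex  { float pos[3]; float nrm[3]; };
struct Matrix4 { float m[4][4]; };

// Hypothetical stand-ins for the real submission paths.
void submit_to_vpu(const std::vector<Vertex>&, const Matrix4&) { /* hardware T&L path */ }
void submit_pretransformed(const std::vector<Vertex>&)         { /* raw screen-space path */ }

// CPU path: pre-transform positions on a spare core (w handling elided).
void transform_on_cpu(std::vector<Vertex>& batch, const Matrix4& mvp)
{
    for (Vertex& v : batch) {
        float o[3];
        for (int r = 0; r < 3; ++r)
            o[r] = mvp.m[r][0] * v.pos[0] + mvp.m[r][1] * v.pos[1]
                 + mvp.m[r][2] * v.pos[2] + mvp.m[r][3];
        v.pos[0] = o[0]; v.pos[1] = o[1]; v.pos[2] = o[2];
    }
}

// Pick per scene: gameplay keeps the CPU for AI, cutscenes borrow a core
// so the graphics chip can spend everything on pixels.
void draw_batch(std::vector<Vertex>& batch, const Matrix4& mvp, bool cpu_has_spare_core)
{
    if (cpu_has_spare_core) {
        std::thread worker([&] { transform_on_cpu(batch, mvp); });
        worker.join();   // a real engine would overlap this, not block on it
        submit_pretransformed(batch);
    } else {
        submit_to_vpu(batch, mvp);
    }
}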


So if a programmer wants, he can use the CPU to crunch typical code like AI (utilizing hyper-threading) and have it calculate physics along with other general game code, while the VPU splits its resources among vertex and pixel shader work.

Then if another programmer is working on a game with little need for AI and generic game code, say a fighter like Virtua Fighter 5, most of the CPU's resources can be concentrated on vertex shaders while the VPU works almost solely on pixel shaders, soaking up every bit of bandwidth available in order to produce high-quality rendered pixels.
 
Fafalada said:
Bohdy said:
The EE does all of the T&L as well as special vector processing, while the Gekko was designed to do just the latter, i.e. the role of the vertex shaders.
Actually, the vertex shader role is doing the entire T&L. The VUs on the EE do more than that, though (clipping and culling, among other things).

Really? I always thought that vertex shaders were mainly complementary to the fixed-function T&L pipeline and were only used for some special rendering (like what Deano was saying).

Anyway, my point was just that the PS2 relegates all the work to the CPU, not just certain features that may be more useful on the CPU than on the VPU (unlike the GC)
 
Really? I always thought that vertex shaders were mainly complementary to the fixed-function T&L pipeline and were only used for some special rendering (like what Deano was saying).
When used, the VS performs the entire T&L function; that's all I was getting at. Also worth noting is that, IIRC, in the newer hardware the fixed pipeline is just a set of predefined shader programs.
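
To illustrate what such a predefined program amounts to, here's a rough sketch of the math in C++ (nobody's actual microcode, and simplified to one directional light): the whole classic fixed path for one vertex is a matrix transform plus a canned lighting equation, which is exactly the kind of thing a vertex program replaces.

Code:
#include <algorithm>  // std::max

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };

static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// The entire classic fixed-function T&L for one vertex: transform to clip
// space, then ambient + N.L diffuse. The matrix is row-major 4x4;
// light_dir and nrm are assumed normalized.
void fixed_tnl(const float mvp[4][4], Vec3 light_dir,
               Vec3 ambient, Vec3 diffuse,
               Vec3 pos, Vec3 nrm,
               Vec4* out_pos, Vec3* out_col)
{
    const float in[4] = { pos.x, pos.y, pos.z, 1.0f };
    float o[4];
    for (int r = 0; r < 4; ++r)
        o[r] = mvp[r][0]*in[0] + mvp[r][1]*in[1]
             + mvp[r][2]*in[2] + mvp[r][3]*in[3];
    *out_pos = { o[0], o[1], o[2], o[3] };

    const float ndl = std::max(0.0f, dot(nrm, light_dir));
    *out_col = { ambient.x + diffuse.x*ndl,
                 ambient.y + diffuse.y*ndl,
                 ambient.z + diffuse.z*ndl };
}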

Anyway, my point was just that the PS2 relegates all the work to the CPU, not just certain features that may be more useful on the CPU instead of the VPU (unlike the GC)
I don't want to nag here, but I can't agree with that. If you classify the whole EE as the CPU just because it's one physical IC, then the new PS2 models perform everything, including rasterization, on the "CPU".
Truth is that the typical PS2 pipeline relegates the work in much the same fashion as the GC, and consequently also the Xbox, for that matter. The chief difference here is that low-latency memory makes it "easy" for the GC CPU to mess around with graphics data, whereas on the R5900 you have to do some work to get really efficient stream processing.
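
Roughly what "efficient stream processing" means there, sketched in C++ rather than VU code (the dma_* calls are invented stand-ins for the real kick/sync primitives, stubbed as plain copies so the sketch runs anywhere): double-buffer the data so the core crunches one chunk while the next streams in.

Code:
#include <algorithm>
#include <cstddef>
#include <cstring>

struct Vertex { float pos[4]; };
enum { CHUNK = 64 };   // whatever fits in scratchpad/VU memory

// Invented stand-ins for the real DMA kick/sync primitives.
static void dma_start(Vertex* dst, const Vertex* src, std::size_t n)
{ std::memcpy(dst, src, n * sizeof(Vertex)); }
static void dma_wait() {}

static void process_chunk(Vertex*, std::size_t) { /* transform in local memory */ }

// Classic double-buffer: upload chunk i+1 while crunching chunk i, so the
// core never stalls on main-memory latency.
void stream_transform(const Vertex* src, std::size_t count)
{
    static Vertex buf[2][CHUNK];
    int cur = 0;
    std::size_t done = 0;
    std::size_t pending = std::min<std::size_t>(CHUNK, count);

    if (count == 0) return;
    dma_start(buf[cur], src, pending);
    while (done < count) {
        dma_wait();                           // chunk 'cur' has landed
        const std::size_t n = pending;
        const std::size_t next = done + n;
        if (next < count) {                   // kick the next upload early
            pending = std::min<std::size_t>(CHUNK, count - next);
            dma_start(buf[cur ^ 1], src + next, pending);
        }
        process_chunk(buf[cur], n);           // overlaps the DMA above
        done = next;
        cur ^= 1;
    }
}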
 
Fafalada said:
Ah, but let's hypothesize a bit more from there...

Say the Visualizer is just a relatively simple but extraordinarily fast triangle renderer built for the sole purpose of doing micropolygons, and the BE provides the entire pool of unified shader resources.

Didn't we talk about this before? I think we already did; I remember proposing that the Pixel Engines could be as simple as the GS, relatively speaking, and I know for a fact we've talked about micropolygon rendering. Truth be told, this is exactly what I was expecting: something along the lines of what's now a modified and parallel PSP backend. Which is why I kept arguing about the whole "massive front-end computation with a simple back-end rasterizer that does the iterative stuff like filtering" thing.

EDIT: Kinda like this post? Directly following, you then stated it much better. Actually, V3 beat us to it ;) At least in the oldest thread the board still has.

Faf said:
Frankly, I'd find that more elegant than juggling APUs that may have differing feature sets across multiple chips, and also preferable to the usual GPU-CPU split across differing architectures.

I always assumed they'd have equivalent functionality and that this type of architecture would intrinsically be able to overcome the GPU-CPU split.
 
Bohdy, I had told you that the vertex shaders did all the T&L six months ago (or more); it's done to get any fixed-function features off the GPU. Let's hope you remember it this time. ;)
 
Vince said:
Didn't we talk about this before? I think we already did, I remember proposing that the Pixel Engines could be as simple as the GS, relatively speaking; and I know for a fact we've talked about micropolygon rendering.
Granted, but the assumption back then was the patent-like Visualizer with a large pool of APU shader resources of its own. And there's nothing wrong with that, except that it stands a bigger chance of enforcing the very split between geometry/fragment/CPU processing you are advocating against.
The APU feature-set debate closely relates to that: questions like how you will efficiently fetch texture data to them. Actually, that's one of the issues that would need to be answered anyhow if you wanted to do the micropoly stuff.

V3 said:
Even if a Reyes pipeline is better compared to the typical SGI pipeline, will the next generation have enough grunt to take advantage of what a Reyes pipeline offers? Enough grunt to offer something better than can be achieved with the typical pipeline?
I'd expect some give and take, but assuming they could adequately solve some things, like the one I mentioned above, I think it's doable.
It's probably better if it didn't happen, though; the debates between the pixel vs. poly people are bad enough around here as it is.
 
Fafalada said:
Granted, but the assumption back then was the Patent-like Visualizer with a large pool of APU shader resource of its own. And there's nothing wrong with that, except that it stands a bigger chance to enforce the very split between geometry/fragment/cpu processing you are advocating against.

Very true, I never thought about it like that; it's a really good comment.

I would posit that since our rendering scheme is best suited by a very fast, very parallel, almost minimalist rasterizer, if you didn't put the APUs on the Visualizer you would be incurring a net loss of potential computational resources. Potentiality is a very powerful concept IMHO, and I would ask what you'd fill that area with if it wasn't used for computational constructs?

Yet more eDRAM? The GS logic count is <10M transistors, and they'll be cramming a billion [logic] transistors into an IC within 2-5 years. I think a strong case can be made for flexibility, and nobody is arbitrarily telling one what to use the APUs for; in fact, I do believe the ambiguity and flexibility we've discussed is its greatest asset. But this has yet to be seen, I suppose.
 
Fafalada said:
Actually that's one of the issues that would need to be answered anyhow if you wanted to do the micropoly stuff.
That is THE question I've been asking myself since the CELL patents were discovered. Even if the APUs employ some kind of texture fetch instruction in the ISA (or will we just code our texture fetches with the standard read-from-memory instructions??!), how will the APUs hide latency in this regard? A lot of questions arise here..
Maybe it's possible to make a complete shift in how meshes+textures are stored/represented, one that would almost entirely avoid random accesses to textures.

ciao,
Marco
 
nAo said:
how will the APUs hide latency in this regard? A lot of questions arise here.
Pipelining does the job for every other graphics part in existence :p It would likely have to do it here too.
Anyway, APUs don't have external memory mapping; they use the TLB in the DMA controller for that. A TXLD instruction would have to be part of DMA chains too, which kinda plays into the hands of pipelined operation.

My issue is more with how the mechanism for triggering DMA transfers will work in the first place (it's an equally important question for other types of memory accesses too); writing most of the mechanism by hand would likely be too daunting for most people.
Of course there's still a bunch of other things to address, like how and where you do caching of filtered texels and blah blah, but I don't think any of that has much of a place aboard the APUs themselves at this moment.
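
Something like this is the pipelined flavour I have in mind, as a C++ sketch (the chain format is pure invention, nothing from the patents): derive the texel addresses for a whole batch of fragments up front, chain them into one transfer, and shade the previous batch while it's in flight.

Code:
#include <cstddef>
#include <cstdint>
#include <vector>

struct Fragment { float u, v; };   // normalized texture coordinates

// One link of a completely invented DMA chain: gather 'bytes' from
// 'src_addr' into the local buffer at 'dst_offset'.
struct ChainEntry {
    std::uint32_t src_addr;
    std::uint32_t dst_offset;
    std::uint32_t bytes;
};

// Turn a batch of (u,v)s into chain links, so the texel reads become one
// chained transfer instead of per-pixel random stalls.
std::vector<ChainEntry> build_texld_chain(const Fragment* frags, std::size_t n,
                                          std::uint32_t tex_base,
                                          std::uint32_t tex_w, std::uint32_t tex_h,
                                          std::uint32_t texel_bytes)
{
    std::vector<ChainEntry> chain;
    chain.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Nearest-neighbour addressing only; filtering would gather a 2x2
        // footprint per fragment instead.
        const std::uint32_t x = (std::uint32_t)(frags[i].u * (tex_w - 1));
        const std::uint32_t y = (std::uint32_t)(frags[i].v * (tex_h - 1));
        chain.push_back({ tex_base + (y * tex_w + x) * texel_bytes,
                          (std::uint32_t)(i * texel_bytes),
                          texel_bytes });
    }
    return chain;   // kick this for batch i+1, shade batch i meanwhile
}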

Vince said:
I would posit that since our rendering scheme is best suited by a very fast, very parallel, almost minimalist rasterizer, if you didn't put the APUs on the Visualizer you would be incurring a net loss of potential computational resources. Potentiality is a very powerful concept IMHO, and I would ask what you'd fill that area with if it wasn't used for computational constructs?
You can always add more PEs across multiple chips if you have too much money to spend :LOL:
The reason why I think the rasterizer logic would be best left alone on a separate IC here is that its only mission is to draw tons of colored polys with Z-buffering. Let the chip clock as high as possible, and keep it small.
Yeah, by that I am also implying that there's no texture fetching logic etc. on the rasterizer itself, because it doesn't actually need any (which should explain better why I think it would be bad news for these forums if that kind of architecture materialized :p).
 
Yeah, by that I am also implying that there's no texture fetching logic etc. on the rasterizer itself, because it doesn't actually need any (which should explain better why I think it would be bad news for these forums if that kind of architecture materialized ).

Yeah, you know you'd have such a ball in that case... :p Likewise you could also say the same about the streaming ray-tracer model (which solves a lot of shadow problems as well)...
 
Fafalada said:
Yeah, by that I am also implying that there's no texture fetching logic etc. on the rasterizer itself, because it doesn't actually need any (which should explain better why I think it would be bad news for these forums if that kind of architecture materialized :p).

Well, you definitely took my thinking just a bit farther; I didn't think you wanted such a set-piece architecture :)

Also, multichip isn't exactly a "solution" I'd consider ideal. heh.
 
Archie said:
Yeah, you know you'd have such a ball in that case... Likewise you could also say the same about the streaming ray-tracer model (which solves a lot of shadow problems as well)...
I need to refresh my memory on that one a bit; do you think it could fit within the same model too? :p I mean, the whole point is that how you interact with the APUs is not constrained, so we're not limited to Reyes alone.
Anyway, just admit it, you like the idea too! 8)
Insisting that the rasterizer should texture is just legacy support for the old pipeline, which we no longer need after the switch. :LOL:

Vince said:
Well, you definitely took my thinking just a bit farther; I didn't think you wanted such a set-piece architecture
It's much less set-piece than the "traditional" rasterizer architecture style we use today. You have completely arbitrary allocation of resources; the only limit you have left is the predefined primitive type in the rasterizer.
And if you take that away, you're doing software rendering all the way... Clearly that's pushing it way too far, plus it doesn't have the near-term benefits akin to what something like Reyes might offer.

Also, multichip isn't exactly a "solution" I'd consider ideal. heh.
Hey, it works for Microsoft...
Besides, we don't have to stay with a single rasterizer either; they are small and cheap, so pair them up with PEs and make each pair into a layer node. (OK, now I know I'm pushing things :p)
 
Faf said:
Yeah, by that I am also implying that there's no texture fetching logic etc. on the rasterizer itself, because it doesn't actually need any (which should explain better why I think it would be bad news for these forums if that kind of architecture materialized).

i'm unable to follow you here, why would it be bad news?
 
Faf said:
It's much less set-piece than the "traditional" rasterizer architecture style we use today. You have completely arbitrary allocation of resources; the only limit you have left is the predefined primitive type in the rasterizer.

Perhaps; it's an interesting trade-off as opposed to what I suggested. I kinda like it, but I can just imagine the public outcry by *some* about the developers' tender condition. I like the layer-node concept, although I was under the assumption that the patent hinted (quite strongly) that you'd use a layering technique regardless in the creation of the final output: that instead of actually forming hardwired nodes, you'd create virtual ones which funneled down into the 4 Pixel Engines, and then it would be analogous to what you stated.

Fafalada said:
Insisting that the rasterizer should texture is just legacy support for the old pipeline, which we no longer need after the switch. :LOL:

I'm just glad I didn't say it... I'd have everyone and their mother trying to ban me for insulting the ATI employees. ;)
 
darkblu said:
Faf said:
Yeah, by that I am also implying that there's no texture fetching logic etc. on the rasterizer itself, because it doesn't actually need any (which should explain better why I think it would be bad news for these forums if that kind of architecture materialized).

i'm unable to follow you here, why would it be bad news?


Errr.. *imagines the TEXTURZ R EVRYFIN people commenting on a system that doesn't do textures (since it doesn't need them)*...... *runs*
 
london-boy said:
darkblu said:
Faf said:
Yeah, by that I am also implying that there's no texture fetching logic etc. on the rasterizer itself, because it doesn't actually need any (which should explain better why I think it would be bad news for these forums if that kind of architecture materialized).

i'm unable to follow you here, why would it be bad news?


Errr.. *imagines the TEXTURZ R EVRYFIN people commenting on a system that doesn't do textures (since it doesn't need them)*...... *runs*

c'mon now, a hasty individual could accuse the ps2 camp of getting en garde even before getting attacked here..

i personally think there's nothing blasphemous about it if the next console rasteriser from sony turns out to be a completely texture-less architecture. there's nothing 'ultimate' in textures - they're just another mirror in the smoke'n'mirrors set known as realistic image synthesis; sooner or later they're bound to be replaced by a newer, shinier mirror. i, for one, wouldn't mind at all if another, less bandwidth-hungry technique supersedes textures (at least in their pre-defined form).
 
darkblu said:
c'mon now, a hasty individual could accuse the ps2 camp of getting en garde even before getting attacked here..

i personally think there's nothing blasphemous about it if the next console rasteriser from sony turns out to be a completely texture-less architecture. there's nothing 'ultimate' in textures - they're just another mirror in the smoke'n'mirrors set known as realistic image synthesis; sooner or later they're bound to be replaced by a newer, shinier mirror. i, for one, wouldn't mind at all if another, less bandwidth-hungry technique supersedes textures (at least in their pre-defined form).

Oh trust me, i'm with u and Faf on that one, but we have to see if the technology will be there to provide IQ on par with texture-based graphics. If it is, great for me (i'm the one who's been hailing Displacement Mapping as the second coming around here; true or not, that's my opinion), but if it isn't, then we will have loads of people around here going "PS3 games look washed out!!" as they do now with PS2, ignoring everything else that's going on in the engine apart from textures.
I've been around long enough to know that...
 
Fafalada said:
Pipelining does the job for every other graphics part in existence :p It would likely have to do it here too.

Do you expect the APUs to have the 100+ stages of a GPU pixel pipe?
I don't :) and GPUs multithread pixels too.
I believe MT was never mentioned in the CELL patents.
On the PS2 VUs I had a hard time filling all the spare instruction slots while transforming vertices, even when transforming 4 vertices at the same time.. so I don't see the same thing working well with texture fetches, even with a deeper pipeline.
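
The interleaving trick I mean looks roughly like this in C++ (on the VUs the same idea plays out at the instruction level, where the four independent chains fill the FMAC latency slots; a texture fetch is far longer than that):

Code:
// Transforming 4 independent vertices per iteration: each multiply-add
// chain is independent, so a latency-prone pipeline always has work from
// the other three chains to issue while one result is still in flight.
void transform4(const float m[4][4], const float (*in)[4],
                float (*out)[4], int count)     // assumes count % 4 == 0
{
    for (int i = 0; i < count; i += 4) {
        for (int r = 0; r < 4; ++r) {
            // Four independent dot products per output row.
            out[i+0][r] = m[r][0]*in[i+0][0] + m[r][1]*in[i+0][1]
                        + m[r][2]*in[i+0][2] + m[r][3]*in[i+0][3];
            out[i+1][r] = m[r][0]*in[i+1][0] + m[r][1]*in[i+1][1]
                        + m[r][2]*in[i+1][2] + m[r][3]*in[i+1][3];
            out[i+2][r] = m[r][0]*in[i+2][0] + m[r][1]*in[i+2][1]
                        + m[r][2]*in[i+2][2] + m[r][3]*in[i+2][3];
            out[i+3][r] = m[r][0]*in[i+3][0] + m[r][1]*in[i+3][1]
                        + m[r][2]*in[i+3][2] + m[r][3]*in[i+3][3];
        }
    }
}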

Anyway, APUs don't have external memory mapping; they use the TLB in the DMA controller for that. A TXLD instruction would have to be part of DMA chains too, which kinda plays into the hands of pipelined operation.
Umh.. expand on this please :) I can't see how accessing an 'external device' would be a good thing during a texture fetch.
Obviously it's possible.. as GPUs have shown..

My issue is more with how the mechanism for triggering DMA transfers will work in the first place (it's an equally important question for other types of memory accesses too); writing most of the mechanism by hand would likely be too daunting for most people.
Of course there's still a bunch of other things to address, like how and where you do caching of filtered texels and blah blah, but I don't think any of that has much of a place aboard the APUs themselves at this moment.
It would be a nightmare.
I'm very curious to know how Sony will address this problem.

ciao,
Marco
 
darkblu said:
i, for one, wouldn't mind at all if another, less bandwidth-hungry technique supersedes textures (at least in their pre-defined form).
There are many ways to represent textured meshes, and the current 'canonical' way is not well suited to multimillion-polygon models.
Multiresolution representations are the way to go, imho.. so textures would be replaced with just some pretty streamlined and compressed bit sequence ;)
Bye-bye random accesses to memory in order to fetch texels.. :devilish:
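
A toy C++ sketch of what I mean (the stream format is invented): bake the surface color into the geometry stream itself, and 'texturing' degenerates into a strictly sequential walk that DMA loves.

Code:
#include <cstddef>
#include <cstdint>

// Invented stream format: geometry and its surface color interleaved in
// exactly the order the renderer consumes them.
struct StreamVertex {
    float        pos[3];
    std::uint8_t rgba[4];   // pre-sampled surface color per micro-vertex
};

// Hypothetical rasterizer submit; a real one would push to a GIF-like port.
static void emit_micropoly(const StreamVertex&, const StreamVertex&,
                           const StreamVertex&) {}

// Consumption is a pure linear walk: perfectly prefetchable, and not a
// single random texture address anywhere.
void draw_stream(const StreamVertex* verts, std::size_t tri_count)
{
    for (std::size_t t = 0; t < tri_count; ++t)
        emit_micropoly(verts[3*t + 0], verts[3*t + 1], verts[3*t + 2]);
}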

ciao,
Marco
 
Do you expect the APUs to have the 100+ stages of a GPU pixel pipe?

Also, if they're taking real-time processing over networks seriously, they need many pipeline stages to hide latency.
 