SPE vs. GPU Vertex shader

Most advanced virtual displacement techniques are so costly (a high number of pixel shader instructions) that I'd prefer to spend the same processing power on something like real displacement mapping.
Edges would be antialiased too ;)
 
nAo said:
Simon F said:
I'm not quite sure I understand what you are saying.
If a processor switches threads each clock cycle, it could prefetch data from a live-register temporary store into the real registers. You wouldn't need more real registers than the maximum number of registers you can address within a single instruction.
Obviously I'm not saying that current hw works that way, but it seems it could, at least from a theoretical point of view.
If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
 
Simon F said:
If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
Obviously it behaves as a register, but I wouldn't call it a register because it doesn't hold a single data entity; it would serve a lot of different registers or even processors.
I would call it a registers repository ;)
It could be 'big', slow and distant from the 'real' registers, but since the hw can know in advance which data will be needed, everything could be prefetched several clock cycles before the data are actually needed.
I'm not a hardware designer, you know Simon, I'm just speculating..feel free to correct me ;)
 
nAo said:
Most advanced virtual displacement techniques are so costly (a high number of pixel shader instructions) that I'd prefer to spend the same processing power on something like real displacement mapping.
Edges would be antialiased too ;)

Real polygons >>>>> faking Detail.

But isn't the Z-buffer already at its limit? I mean, if you model tires including the tread profile, for example, won't there be a lot of Z-fighting ("flashy" polygons) because of the lack of depth precision?
 
Npl said:
nAo said:
Most advanced virtual displacement techniques are so costly (a high number of pixel shader instructions) that I'd prefer to spend the same processing power on something like real displacement mapping.
Edges would be antialiased too ;)

Real polygons >>>>> faking Detail.

But isn't the Z-buffer already at its limit? I mean, if you model tires including the tread profile, for example, won't there be a lot of Z-fighting ("flashy" polygons) because of the lack of depth precision?

If you started doing that the Zbuffer would be only one of your problems.

Polygon aliasing would pretty much make it look like pooh, and you're going to need more than 4xAA to fix that.
 
ERP said:
Polygon aliasing would pretty much make it look like pooh, and you're going to need more than 4xAA to fix that.
DM without dynamic LOD is suicide.
 
_phil_ said:
Yes, that's why I said that.
It sort of surprises me that you say that, because like I said, under most circumstances it is a very effective technique, especially considering the pretty minimal performance hit.

Also, it doesn't work for edges/borders, like any 'fake geometry' method.
Define "doesn't work". Of course it works, though not as well for a completely flat surface. Still, stuff a player actually studies closely typically tends to be at more or less right angles to the camera, so I don't think it's much of a problem.
 
Basically put, I don't see any reason to use the SPEs for vertex processing unless one of four things is true --

1. The process you're trying to do is offline, so it doesn't really matter that you don't have the speed of the GPU, but using the fast CPU to solve it at least helps shave off seconds or even minutes.

2. You're trying to do something with your verts that is either not possible in hardware, or possible but incredibly stupid to do on the GPU (e.g. subdivision).

3. Your vertex shaders are long and complex enough that the vertex processing becomes a bottleneck, and you know there are lots of free CPU cycles to fill up. This is why the occasional PC game nowadays uses software skinning. Next-gen chips will be more oriented towards speeding up shader execution over fillrate, though, so it may take a lot to become vs-limited.

4. You're writing a software engine anyway for some reason. e.g. because damn GPUs only render tris.
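Reason 3's software skinning can be sketched in a few lines. This is a purely illustrative toy (the function names and matrices are made up, and a real engine would skin normals and use SIMD), not any particular engine's implementation:

```python
# Toy sketch of CPU-side linear-blend skinning: each vertex is
# transformed by every influencing bone matrix, and the results are
# blended by the per-bone weights. Matrices are 3x4 row-major affine
# transforms; all names here are illustrative.

def transform(m, v):
    """Apply a 3x4 affine matrix to a 3D point."""
    return [sum(m[r][c] * v[c] for c in range(3)) + m[r][3] for r in range(3)]

def skin_vertex(v, bones, weights):
    """Blend the bone-transformed positions by their weights."""
    out = [0.0, 0.0, 0.0]
    for m, w in zip(bones, weights):
        p = transform(m, v)
        for i in range(3):
            out[i] += w * p[i]
    return out

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shifted  = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]  # translate +2 on x
print(skin_vertex([1.0, 0.0, 0.0], [identity, shifted], [0.5, 0.5]))
# halfway between the two bone transforms: [2.0, 0.0, 0.0]
```

Moving exactly this loop off the GPU only pays when the vertex stage is the bottleneck and CPU cycles are going spare, which is the point of item 3.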
 
ShootMyMonkey said:
Basically put, I don't see any reason to use the SPEs for vertex processing unless one of five things is true --

5) You need the vertices anyway for collision detection or physics, and the only alternative to just doing them on the CPU would be transferring CPU (creating/sending vertices) -> GPU (vertex processing) -> CPU (physics) -> GPU (possible further vertex processing) - which I doubt would be faster.
 
Define "doesn't work",

Its 'realism' breaks past a pretty low angular freedom, and the stretched UVs will remind you that you are in a game. It's not a big step past bump mapping (which sucks in most cases).
Too much stretched technicality to fake things, IMO; most developers don't even know how to make a good texture junction/transition between a wall and the ground, whatever technical paradigm they possess (RE4 is a master lesson in this regard).
 
nAo said:
Simon F said:
If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
Obviously it behaves as a register, but I wouldn't call it a register because it doesn't hold a single data entity; it would serve a lot of different registers or even processors.
I would call it a registers repository ;)
A lot of modern CPUs (generally those with only a few registers in the ISA) use something called "register renaming" whereby there is a larger set of storage elements and the "registers" of the instruction are assigned on-the-fly to those storage elements. In fact, the storage for "Register X" in one instruction may be completely different to that for "Register X" a couple of instructions later. Those are still "registers" but they don't fit the description you've given.
It could be 'big', slow and distant from the 'real' registers, but since the hw can know in advance which data will be needed, everything could be prefetched several clock cycles before the data are actually needed.
The architecture's storage can't afford to be slow. VS instructions, for example, require between 1 and 3 sources and always one destination. That implies that they have to be fast (in terms of R/W bandwidth), in which case they are, to all intents and purposes, the registers of the ISA.
I'm not a hardware designer, you know Simon, I'm just speculating..feel free to correct me ;)
Well, I'm not either, but I have picked up a few things along the way :)
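Register renaming as Simon describes it can be sketched with a toy rename table: the ISA names are mapped on the fly to a larger pool of physical storage elements, so "Register X" in one instruction may live in a different physical slot than "Register X" a few instructions later. This is an illustrative simplification (a real renamer also frees slots when instructions retire, which is omitted here):

```python
# Toy register renaming: each architectural write allocates a fresh
# physical slot, so the same ISA name maps to different storage over time.

class RenameTable:
    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # pool of physical slots
        self.map = {}                          # ISA name -> current slot

    def write(self, isa_reg):
        """A write to an ISA register claims a new physical slot."""
        slot = self.free.pop(0)
        self.map[isa_reg] = slot
        return slot

    def read(self, isa_reg):
        """A read sees whatever slot the last write assigned."""
        return self.map[isa_reg]

rt = RenameTable(8)
a = rt.write("r0")   # first write of r0
b = rt.write("r0")   # a later write of r0 lands in a different slot
print(a, b, a != b)  # same architectural name, two physical locations
```

This is why a renamed "register" doesn't hold a single data entity over time either, yet is still unambiguously a register of the ISA.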
 
Simon F said:
nAo said:
Simon F said:
If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
Obviously it behaves as a register, but I wouldn't call it a register because it doesn't hold a single data entity; it would serve a lot of different registers or even processors.
I would call it a registers repository ;)
A lot of modern CPUs (generally those with only a few registers in the ISA) use something called "register renaming" whereby there is a larger set of storage elements and the "registers" of the instruction are assigned on-the-fly to those storage elements. In fact, the storage for "Register X" in one instruction may be completely different to that for "Register X" a couple of instructions later. Those are still "registers" but they don't fit the description you've given.
It could be 'big', slow and distant from the 'real' registers, but since the hw can know in advance which data will be needed, everything could be prefetched several clock cycles before the data are actually needed.
The architecture's storage can't afford to be slow. VS instructions, for example, require between 1 and 3 sources and always one destination. That implies that they have to be fast (in terms of R/W bandwidth), in which case they are, to all intents and purposes, the registers of the ISA.
I'm not a hardware designer, you know Simon, I'm just speculating..feel free to correct me ;)
Well, I'm not either, but I have picked up a few things along the way :)

Simon, normally register renaming provides more registers than the ISA specifies: on GPUs like NV30 we saw a performance hit when more than a certain number of registers was used. All those registers were exposed to the application (shader program).
 
nAo said:
DM without dynamic LOD is suicide.
Obviously - but that alone won't solve aliasing issues; if you point sample your DMs, it'll still alias badly. The question is what kind of filtering you can use/afford on your displacement map lookups (obviously it depends on whether it's a GPU or CPU based implementation as well).

ShootMyMonkey said:
Basically put, I don't see any reason to use the SPEs for vertex processing unless one of four things is true --
Actually, I see very good reasons to use a couple of SPEs as an integral part of graphics data processing (not full vertex processing, but also not JUST vertex processing either).
That would include variations of 2 and 4 on your list, although not "literally" (i.e. I'm not actually thinking about software rendering, but the second part of 4 could apply nonetheless).
 
Panajev2001a said:
Simon, normally register renaming provides more registers than the ISA specifies: on GPUs like NV30 we saw a performance hit when more than a certain number of registers was used. All those registers were exposed to the application (shader program).
I don't mean to be rude, but I think you're missing the point. I suspect there are lots of storage locations inside their PS - let's say a total of N 32bit (==2xN 16bit) locations. I suspect, however, that performance when texturing improves when you can have more pixels in flight, in order to hide texture fetch latency.

If a shader program uses a peak of M storage locations then you would have (at best) an upper bound of floor(N/M) pixels in flight, thus "smaller storage" programs should do better than "larger" ones.
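The floor(N/M) bound above is simple to compute; the numbers in this sketch are made up, but they show why a shader that peaks at fewer storage locations leaves the hardware more latency-hiding headroom:

```python
# Occupancy bound: N storage locations shared by all pixels in flight,
# M locations peak usage per pixel => at most floor(N/M) pixels resident.
from math import floor

def pixels_in_flight(total_slots, slots_per_pixel):
    """Upper bound on concurrently resident pixels."""
    return floor(total_slots / slots_per_pixel)

print(pixels_in_flight(256, 4))   # small shader: 64 pixels in flight
print(pixels_in_flight(256, 10))  # register-hungry shader: only 25
```

Fewer pixels in flight means fewer independent texture fetches to overlap, which is the performance cliff observed on NV30-class parts when shaders used too many registers.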
 
5) You need the vertices anyway for collision detection or physics, and the only alternative to just doing them on the CPU would be transferring CPU (creating/sending vertices) -> GPU (vertex processing) -> CPU (physics) -> GPU (possible further vertex processing) - which I doubt would be faster.
Why would you do collision against the render geometry? Typically the collision geometry is separate from the render geometry anyway. Any collision object made up of tris would be a LOT lower-poly than the render geometry. If they're not, you need to fire your level designers... or at least have them put to death. Typically, vertices used for physics simulations are also never skinned (no real point).

That would include variations of 2 and 4 on your list, although not "literally" (ie. I'm not actually thinking about software rendering but the second part of 4 could apply nonetheless).
The second part of number 4 is one that always bugs me personally. I have so many uses for rendering true quads or rendering convex n-gons explicitly. The problem is that when you triangulate them, the texture slopes and color slopes become common for a whole triangle, and each triangle gets a separate set, which causes all sorts of crazy stretching and warping because of locally varied slopes. To properly render true n-gons, you need to explicitly compute interpolation for each scanline. About the only way to do this with hardware is to scan-convert all the polys yourself and have the GPU rasterize the scanlines. For perspective-correction reasons, though, you may have to play with the z-coords of each line you render.
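The per-scanline interpolation described above can be sketched for a convex polygon: walk the edges, find where they cross the scanline, and interpolate the attribute at the left and right crossings instead of using a single per-triangle slope. This is an illustrative toy (one scalar attribute; a real rasterizer would do this for u/v, color, and 1/z for perspective correction):

```python
# For a convex polygon, each scanline crosses exactly two edges.
# Interpolating the attribute along those edges gives per-scanline
# endpoints, avoiding the per-triangle slope discontinuities that
# triangulation introduces.

def scanline_span(poly, y):
    """poly: list of (x, y, attr) vertices of a convex polygon.
    Returns (x_left, attr_left, x_right, attr_right) at scanline y."""
    crossings = []
    n = len(poly)
    for i in range(n):
        (x0, y0, a0), (x1, y1, a1) = poly[i], poly[(i + 1) % n]
        if (y0 <= y < y1) or (y1 <= y < y0):       # edge crosses scanline
            t = (y - y0) / (y1 - y0)
            crossings.append((x0 + t * (x1 - x0), a0 + t * (a1 - a0)))
    (xl, al), (xr, ar) = sorted(crossings)[:2]     # convex => two crossings
    return xl, al, xr, ar

# A quad with attribute 0 on its left edge and 1 on its right edge:
quad = [(0, 0, 0.0), (4, 0, 1.0), (4, 4, 1.0), (0, 4, 0.0)]
print(scanline_span(quad, 2))  # (0.0, 0.0, 4.0, 1.0)
```

The GPU would then only rasterize the precomputed spans, which matches the "scan-convert yourself, let the hardware fill scanlines" approach in the post.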
 
ShootMyMonkey said:
Why would you do collision against the render geometry? Typically the collision geometry is separate from the render geometry anyway. Any collision object made up of tris would be a LOT lower-poly than the render geometry. If they're not, you need to fire your level designers... or at least have them put to death. Typically, vertices used for physics simulations are also never skinned (no real point).

It's inevitable when you actually place decals on geometry (you don't want bullet holes in thin air), and it's more accurate when you're calculating the impact of physics... Not that there isn't a need for bounding volumes, but those are really just the "first pass".
And just because it's "typical" now, after PCs evolved into two separate entities (CPU = gameplay, GPU = graphics, each with separate assets), doesn't mean this is the better approach, nor that next-gen consoles won't be capable of different and better approaches.
 
Simon F said:
Panajev2001a said:
Simon, normally register renaming provides more registers than the ISA specifies: on GPUs like NV30 we saw a performance hit when more than a certain number of registers was used. All those registers were exposed to the application (shader program).
I don't mean to be rude, but I think you're missing the point. I suspect there are lots of storage locations inside their PS - let's say N 32bit (==2xN 16bit) locations. I suspect, however, that performance when texturing improves when you can have more pixels in flight, in order to hide texture fetch latency.

If a shader program uses a peak of M storage locations then you would have (at best) an upper bound of floor(N/M) pixels in flight, thus "smaller storage" programs should do better than "larger" ones.

I did miss the point there; thanks for the correction, and sorry for the comment - it went out before I gave it some more thought :(.
 
Panajev2001a said:
I did miss the point there; thanks for the correction, and sorry for the comment - it went out before I gave it some more thought :(.

*kicks Pana and runs away*
 