SPE vs. GPU Vertex shader

Most advanced virtual displacement techniques are so costly (a high number of pixel shader instructions) that I'd prefer to spend the same processing power on something like real displacement mapping.
Edges would be antialiased too ;)
 
nAo said:
Simon F said:
I'm not quite sure I understand what you are saying.
If a processor switches threads each clock cycle, it could prefetch data from a live-register temporary store into the real registers. You wouldn't need more real registers than the maximum number of registers you can address within a single instruction.
Obviously I'm not saying that current hw works that way, but it seems it could, at least from a theoretical point of view.
If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
 
Simon F said:
If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
Obviously it behaves as a register, but I wouldn't call it a register because it doesn't hold a single data entity; it would serve a lot of different registers or even processors.
I would call it a registers repository ;)
It could be 'big', slow and distant from the 'real' registers, but since the hw can know in advance which data will be needed, everything could be prefetched several clock cycles before the data are actually needed.
I'm not a hardware designer, you know Simon, I'm just speculating..feel free to correct me ;)
 
nAo said:
Most advanced virtual displacement techniques are so costly (a high number of pixel shader instructions) that I'd prefer to spend the same processing power on something like real displacement mapping.
Edges would be antialiased too ;)

Real polygons >>>>> faking Detail.

But isn't the Z-buffer already at its limit? I mean, if you model tires including the tread profile, for example, won't there be a lot of Z-fighting ("flashy" polygons) because of the lack of depth precision?
 
Npl said:
nAo said:
Most advanced virtual displacement techniques are so costly (a high number of pixel shader instructions) that I'd prefer to spend the same processing power on something like real displacement mapping.
Edges would be antialiased too ;)

Real polygons >>>>> faking Detail.

But isn't the Z-buffer already at its limit? I mean, if you model tires including the tread profile, for example, won't there be a lot of Z-fighting ("flashy" polygons) because of the lack of depth precision?

If you started doing that the Zbuffer would be only one of your problems.

Polygon aliasing would pretty much make it look like pooh, and you're going to need more than 4xAA to fix that.
 
ERP said:
Polygon aliasing would pretty much make it look like pooh, and you're going to need more than 4xAA to fix that.
DM without dynamic LOD is suicide.
 
_phil_ said:
Yes, that's why I said that.
It sort of surprises me that you say that, because like I said, under most circumstances it is a very effective technique, especially considering the pretty minimal performance hit.

Also, it doesn't work for edges/borders, like any 'fake geometry' method.
Define "doesn't work". Of course it works, though not as well for a completely flat surface. Still, stuff a player actually studies closely typically tends to be at more or less right angles to the camera, so I don't think it's much of a problem.
 
Basically put, I don't see any reason to use the SPEs for vertex processing unless one of four things is true --

1. The process you're trying to do is offline, so it doesn't really matter that you don't have the speed of the GPU, but using the fast CPU to solve it at least helps shave off seconds or even minutes.

2. You're trying to do something with your verts that is either not possible in hardware, or possible but incredibly stupid to do on the GPU (e.g. subdivision).

3. Your vertex shaders are long and complex enough that the vertex processing becomes a bottleneck, and you know there are lots of free CPU cycles to fill up. This is why the occasional PC game nowadays uses software skinning. Next-gen chips will be more oriented towards speeding up shader execution over fillrate, though, so it may take a lot to become vs-limited.

4. You're writing a software engine anyway for some reason. e.g. because damn GPUs only render tris.
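Reason 3's software skinning can be sketched in a few lines. This is a purely illustrative toy (the function names and matrices are made up, and a real engine would skin normals and use SIMD), not any particular engine's implementation:

```python
# Toy sketch of CPU-side linear-blend skinning: each vertex is
# transformed by every influencing bone matrix, and the results are
# blended by the per-bone weights. Matrices are 3x4 row-major affine
# transforms; all names here are illustrative.

def transform(m, v):
    """Apply a 3x4 affine matrix to a 3D point."""
    return [sum(m[r][c] * v[c] for c in range(3)) + m[r][3] for r in range(3)]

def skin_vertex(v, bones, weights):
    """Blend the bone-transformed positions by their weights."""
    out = [0.0, 0.0, 0.0]
    for m, w in zip(bones, weights):
        p = transform(m, v)
        for i in range(3):
            out[i] += w * p[i]
    return out

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shifted  = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]  # translate +2 on x
print(skin_vertex([1.0, 0.0, 0.0], [identity, shifted], [0.5, 0.5]))
# halfway between the two bone transforms: [2.0, 0.0, 0.0]
```

Moving exactly this loop off the GPU only pays when the vertex stage is the bottleneck and CPU cycles are going spare, which is the point of item 3.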
 
ShootMyMonkey said:
Basically put, I don't see any reason to use the SPEs for vertex processing unless one of five things is true --

5) You need the vertices anyway for collision detection or physics, and the only alternative to just doing them on the CPU would be transferring CPU (creating/sending vertices) -> GPU (vertex processing) -> CPU (physics) -> GPU (possible further vertex processing) - which I doubt would be faster.
 
Define "doesn't work",

Its 'realism' breaks past a pretty low angular freedom, and the stretched UVs will remind you that you are in a game. It's not a big step past bump mapping (which sucks in most cases).
Too much stretched technicality to fake things, IMO; most developers don't even know how to make a good texture junction/transition between a wall and the ground, whatever technical paradigm they possess (RE4 is a master lesson in this regard).
 
nAo said:
Simon F said:
If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
Obviously it behaves as a register, but I wouldn't call it a register because it doesn't hold a single data entity; it would serve a lot of different registers or even processors.
I would call it a registers repository ;)
A lot of modern CPUs (generally those with only a few registers in the ISA) use something called "register renaming" whereby there is a larger set of storage elements and the "registers" of the instruction are assigned on-the-fly to those storage elements. In fact, the storage for "Register X" in one instruction may be completely different to that for "Register X" a couple of instructions later. Those are still "registers" but they don't fit the description you've given.
It could be 'big', slow and distant from the 'real' registers, but since the hw can know in advance which data will be needed, everything could be prefetched several clock cycles before the data are actually needed.
The architecture's storage can't afford to be slow. VS instructions, for example, require between 1 and 3 sources and always one destination. That implies that they have to be fast (in terms of R/W bandwidth), in which case they are, to all intents and purposes, the registers of the ISA.
I'm not a hardware designer, you know Simon, I'm just speculating..feel free to correct me ;)
Well, I'm not either, but I have picked up a few things along the way :)
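Register renaming as Simon describes it can be sketched with a toy rename table: the ISA names are mapped on the fly to a larger pool of physical storage elements, so "Register X" in one instruction may live in a different physical slot than "Register X" a few instructions later. This is an illustrative simplification (a real renamer also frees slots when instructions retire, which is omitted here):

```python
# Toy register renaming: each architectural write allocates a fresh
# physical slot, so the same ISA name maps to different storage over time.

class RenameTable:
    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # pool of physical slots
        self.map = {}                          # ISA name -> current slot

    def write(self, isa_reg):
        """A write to an ISA register claims a new physical slot."""
        slot = self.free.pop(0)
        self.map[isa_reg] = slot
        return slot

    def read(self, isa_reg):
        """A read sees whatever slot the last write assigned."""
        return self.map[isa_reg]

rt = RenameTable(8)
a = rt.write("r0")   # first write of r0
b = rt.write("r0")   # a later write of r0 lands in a different slot
print(a, b, a != b)  # same architectural name, two physical locations
```

This is why a renamed "register" doesn't hold a single data entity over time either, yet is still unambiguously a register of the ISA.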
 
Simon F said:
nAo said:
Simon F said:
If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
Obviously it behaves as a register, but I wouldn't call it a register because it doesn't hold a single data entity; it would serve a lot of different registers or even processors.
I would call it a registers repository ;)
A lot of modern CPUs (generally those with only a few registers in the ISA) use something called "register renaming" whereby there is a larger set of storage elements and the "registers" of the instruction are assigned on-the-fly to those storage elements. In fact, the storage for "Register X" in one instruction may be completely different to that for "Register X" a couple of instructions later. Those are still "registers" but they don't fit the description you've given.
It could be 'big', slow and distant from the 'real' registers, but since the hw can know in advance which data will be needed, everything could be prefetched several clock cycles before the data are actually needed.
The architecture's storage can't afford to be slow. VS instructions, for example, require between 1 and 3 sources and always one destination. That implies that they have to be fast (in terms of R/W bandwidth), in which case they are, to all intents and purposes, the registers of the ISA.
I'm not a hardware designer, you know Simon, I'm just speculating..feel free to correct me ;)
Well, I'm not either, but I have picked up a few things along the way :)

Simon, normally register renaming provides more registers than the ISA specifies: on GPUs like NV30 we saw a performance hit when more than a certain number of registers was used. All those registers were exposed to the application (shader program).
 
nAo said:
DM without dynamic LOD is suicide.
Obviously - but that alone won't solve aliasing issues; if you point sample your DMs, it'll still alias badly. The question is what kind of filtering you can use/afford on your displacement map lookups (obviously it depends on whether it's a GPU or CPU based implementation as well).

ShootMyMonkey said:
Basically put, I don't see any reason to use the SPEs for vertex processing unless one of four things is true --
Actually, I see very good reasons to use a couple of SPEs as an integral part of graphics data processing (not full vertex processing, but also not JUST vertex processing either).
That would include variations of 2 and 4 on your list, although not "literally" (i.e. I'm not actually thinking about software rendering, but the second part of 4 could apply nonetheless).
 
Panajev2001a said:
Simon, normally register renaming provides more registers than the ISA specifies: on GPUs like NV30 we saw a performance hit when more than a certain number of registers was used. All those registers were exposed to the application (shader program).
I don't mean to be rude, but I think you're missing the point. I suspect there are lots of storage locations inside their PS - let's say a total of N 32bit (==2xN 16bit) locations. I suspect, however, that performance when texturing improves when you can have more pixels in flight, in order to hide texture fetch latency.

If a shader program uses a peak of M storage locations then you would have (at best) an upper bound of floor(N/M) pixels in flight, thus "smaller storage" programs should do better than "larger" ones.
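The floor(N/M) bound above is simple to compute; the numbers in this sketch are made up, but they show why a shader that peaks at fewer storage locations leaves the hardware more latency-hiding headroom:

```python
# Occupancy bound: N storage locations shared by all pixels in flight,
# M locations peak usage per pixel => at most floor(N/M) pixels resident.
from math import floor

def pixels_in_flight(total_slots, slots_per_pixel):
    """Upper bound on concurrently resident pixels."""
    return floor(total_slots / slots_per_pixel)

print(pixels_in_flight(256, 4))   # small shader: 64 pixels in flight
print(pixels_in_flight(256, 10))  # register-hungry shader: only 25
```

Fewer pixels in flight means fewer independent texture fetches to overlap, which is the performance cliff observed on NV30-class parts when shaders used too many registers.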
 
5) You need the vertices anyway for collision detection or physics, and the only alternative to just doing them on the CPU would be transferring CPU (creating/sending vertices) -> GPU (vertex processing) -> CPU (physics) -> GPU (possible further vertex processing) - which I doubt would be faster.
Why would you do collision against the render geometry? Typically the collision geometry is separate from the render geometry anyway. Any collision object made up of tris would be a LOT lower-poly than the render geometry. If they're not, you need to fire your level designers... or at least have them put to death. Typically, vertices used for physics simulations are also never skinned (no real point).

That would include variations of 2 and 4 on your list, although not "literally" (ie. I'm not actually thinking about software rendering but the second part of 4 could apply nonetheless).
The second part of number 4 is one that always bugs me personally. I have so many uses for rendering true quads or rendering convex n-gons explicitly. The problem is that when you triangulate them, the texture slopes and color slopes become common for a whole triangle, and each triangle gets a separate set, which causes all sorts of crazy stretching and warping because of locally varied slopes. To properly render true n-gons, you need to explicitly compute interpolation for each scanline. About the only way to do this with hardware is to scan-convert all the polys yourself and have the GPU rasterize the scanlines. For perspective-correction reasons, though, you may have to play with the z-coords of each line you render.
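The per-scanline interpolation described above can be sketched for a convex polygon: walk the edges, find where they cross the scanline, and interpolate the attribute at the left and right crossings instead of using a single per-triangle slope. This is an illustrative toy (one scalar attribute; a real rasterizer would do this for u/v, color, and 1/z for perspective correction):

```python
# For a convex polygon, each scanline crosses exactly two edges.
# Interpolating the attribute along those edges gives per-scanline
# endpoints, avoiding the per-triangle slope discontinuities that
# triangulation introduces.

def scanline_span(poly, y):
    """poly: list of (x, y, attr) vertices of a convex polygon.
    Returns (x_left, attr_left, x_right, attr_right) at scanline y."""
    crossings = []
    n = len(poly)
    for i in range(n):
        (x0, y0, a0), (x1, y1, a1) = poly[i], poly[(i + 1) % n]
        if (y0 <= y < y1) or (y1 <= y < y0):       # edge crosses scanline
            t = (y - y0) / (y1 - y0)
            crossings.append((x0 + t * (x1 - x0), a0 + t * (a1 - a0)))
    (xl, al), (xr, ar) = sorted(crossings)[:2]     # convex => two crossings
    return xl, al, xr, ar

# A quad with attribute 0 on its left edge and 1 on its right edge:
quad = [(0, 0, 0.0), (4, 0, 1.0), (4, 4, 1.0), (0, 4, 0.0)]
print(scanline_span(quad, 2))  # (0.0, 0.0, 4.0, 1.0)
```

The GPU would then only rasterize the precomputed spans, which matches the "scan-convert yourself, let the hardware fill scanlines" approach in the post.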
 
ShootMyMonkey said:
Why would you do collision against the render geometry? Typically the collision geometry is separate from the render geometry anyway. Any collision object made up of tris would be a LOT lower-poly than the render geometry. If they're not, you need to fire your level designers... or at least have them put to death. Typically, vertices used for physics simulations are also never skinned (no real point).

It's inevitable when you actually place decals on geometry (you don't want bullet holes in thin air), and it's more accurate when you're calculating the impact of physics... Not that there isn't a need for bounding volumes, but those are really just the "first pass".
And just because it's "typical" now, after PCs evolved into two separate entities (CPU = gameplay, GPU = graphics, each with separate assets), doesn't mean this is the better approach, nor that next-gen consoles won't be capable of different and better approaches.
 
Simon F said:
Panajev2001a said:
Simon, normally register renaming provides more registers than the ISA specifies: on GPUs like NV30 we saw a performance hit when more than a certain number of registers was used. All those registers were exposed to the application (shader program).
I don't mean to be rude, but I think you're missing the point. I suspect there are lots of storage locations inside their PS - let's say N 32bit (==2xN 16bit) locations. I suspect, however, that performance when texturing improves when you can have more pixels in flight, in order to hide texture fetch latency.

If a shader program uses a peak of M storage locations then you would have (at best) an upper bound of floor(N/M) pixels in flight, thus "smaller storage" programs should do better than "larger" ones.

I did miss the point there; thanks for the correction, and sorry for the comment - it went out before I gave it some more thought :(.
 
Panajev2001a said:
I did miss the point there; thanks for the correction, and sorry for the comment - it went out before I gave it some more thought :(.

*kicks Pana and runs away*
 