If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
nAo said: If a processor (or processors) switches threads each clock cycle, it could prefetch data from a temporary store of live registers into real registers. You wouldn't need more real registers than the maximum number of registers you can address within a single instruction.
Simon F said: I'm not quite sure I understand what you are saying.
Obviously I'm not saying that current hw works that way, but it seems it could, at least from a theoretical point of view.
Obviously it behaves as a register, but I wouldn't call it a register because it doesn't hold a single data entity; it would serve a lot of different registers or even processors.
Simon F said: If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
nAo said: Most advanced virtual displacement techniques are so costly (a high number of pixel shader instructions) that I'd prefer to spend the same processing power on something like real displacement mapping.
Edges would be antialiased too.
Npl said:
Real polygons >>>>> faking Detail.
But isn't the Z-buffer already at its limit? I mean, if you for example model tires including the profile (hope that's valid English), won't there be a lot of z-conflicting, "flashy" polygons because of the lack of granularity?
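One rough way to put numbers on the granularity worry is to estimate the smallest eye-space distance a fixed-point Z-buffer can resolve at a given depth. The sketch below assumes a conventional perspective depth mapping to [0, 1] and a 24-bit integer buffer; the function names and the near/far plane values are hypothetical and not taken from any hardware discussed in this thread.

```python
def depth_value(z, near, far):
    """Map eye-space distance z to a [0, 1] depth value (assumed perspective mapping)."""
    return (far / (far - near)) * (1.0 - near / z)


def depth_step(z, near, far, bits=24):
    """Approximate smallest eye-space z difference resolvable at distance z."""
    # Slope of the depth mapping: d(depth)/dz = far * near / ((far - near) * z^2)
    slope = far * near / ((far - near) * z * z)
    # One quantization step of an integer buffer with the given bit count
    quantum = 1.0 / (2 ** bits - 1)
    return quantum / slope


if __name__ == "__main__":
    near, far = 0.1, 1000.0  # hypothetical clip planes, in metres
    for z in (1.0, 10.0, 100.0):
        print(f"z = {z:6.1f} m -> resolvable step ~ {depth_step(z, near, far):.3e} m")
```

Under these assumptions the resolvable step grows with the square of the distance, so whether small modelled detail z-fights depends heavily on how far away it is and on where the near plane sits.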
DM without dynamic LOD is suicide
ERP said: Polygon aliasing would pretty much make it look like pooh, and you're gonna need more than 4xAA to fix that.
It sort of surprises me that you say that, because like I said, under most circumstances it is a very effective technique, especially considering the pretty minimal performance hit.
_phil_ said: Yes, that's why I said that.
Define "doesn't work". Of course it works, though not as well for a completely flat surface. Still, stuff a player actually studies closely typically tends to be at more or less right angles to the camera, so I don't think it's much of a problem.
Also, it doesn't work for edges/borders, like any 'faking geometry' method.
ShootMyMonkey said: Basically put, I don't see any reason to use the SPEs for vertex processing unless one of five things is true --
A lot of modern CPUs (generally those with only a few registers in the ISA) use something called "register renaming", whereby there is a larger set of storage elements and the "registers" of the instruction are assigned on the fly to those storage elements. In fact, the storage for "Register X" in one instruction may be completely different from that for "Register X" a couple of instructions later. Those are still "registers", but they don't fit the description you've given.
nAo said: Obviously it behaves as a register, but I wouldn't call it a register because it doesn't hold a single data entity; it would serve a lot of different registers or even processors.
Simon F said: If this "other store" is able to support 3 reads and 1 write per instruction, then you might as well call it the "real registers" because it'd be behaving as such.
I would call it a register repository.
The architecture's storage can't afford to be slow. VS instructions, for example, require between 1 and 3 sources and always one destination. That implies that the storage elements have to be fast (in terms of R/W bandwidth), in which case they are, to all intents and purposes, the registers of the ISA.
nAo said: It could be 'big', slow and distant from the 'real' registers, but since the hw can know in advance which data will be needed, everything could be prefetched several clock cycles before the data are actually needed.
Well, I'm not either, but I have picked up a few things along the way.
nAo said: I'm not a hardware designer, you know, Simon. I'm just speculating, so feel free to correct me.
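To make the register renaming idea above concrete, here is a minimal sketch of a rename table, assuming a simple free list of physical registers. The class and method names are hypothetical, and the model deliberately ignores how a real core returns physical registers to the free list at retirement.

```python
class RenameTable:
    """Toy mapping from architectural registers to a larger physical register file."""

    def __init__(self, num_arch, num_phys):
        if num_phys < num_arch:
            raise ValueError("need at least as many physical as architectural registers")
        # Initially architectural register i lives in physical register i.
        self.mapping = list(range(num_arch))
        # Remaining physical registers are free for renaming.
        self.free = list(range(num_arch, num_phys))

    def rename(self, dst, srcs):
        """Rename one instruction; returns (physical dst, list of physical srcs)."""
        # Sources read whatever storage currently holds each architectural register.
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            # Assumption: a real core would stall here rather than fail.
            raise RuntimeError("no free physical register")
        # The destination gets fresh storage, so the same name can point elsewhere later.
        phys_dst = self.free.pop(0)
        self.mapping[dst] = phys_dst
        return phys_dst, phys_srcs


if __name__ == "__main__":
    table = RenameTable(num_arch=4, num_phys=8)
    print(table.rename(dst=1, srcs=[2, 3]))  # R1 = R2 op R3
    print(table.rename(dst=1, srcs=[1, 2]))  # R1 = R1 op R2
```

In this toy model the two writes to R1 land in different physical storage, and the second instruction reads the storage that the first one wrote, which is the behaviour described in the post.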
Obviously - but that alone won't solve aliasing issues: if you point sample your DMs it'll still alias badly. The question is what kind of filtering you can use/afford on your displacement map lookups (which obviously depends on whether it's a GPU or CPU based implementation as well).
nAo said: DM without dynamic LOD is suicide
Actually I see very good reasons to use a couple of SPEs as an integral part of graphic data processing (not full vertex processing, but also not JUST vertex processing either).
ShootMyMonkey said: Basically put, I don't see any reason to use the SPEs for vertex processing unless one of four things is true --
I don't mean to be rude, but I think you're missing the point. I suspect there are lots of storage locations inside their PS - let's say a total of N 32-bit (== 2xN 16-bit) locations. I suspect that performance when texturing, however, improves when you can have more pixels in flight, in order to hide texture fetch latency.
Panajev2001a said: Simon, normally register renaming has more registers than the ISA specifies: on GPUs like NV30 we saw a performance hit when more than a certain number of registers was used. All those registers were exposed to the application (shader program).
Why would you do collision against the render geometry? Typically the collision geometry is separate from the render geometry anyway. Any collision object made up of tris would be a LOT lower-poly than the render geometry. If they're not, you need to fire your level designers... or at least have them put to death. Typically, vertices used for physics simulations are also never skinned (no real point).
5) You need the vertices anyway for collision detection or physics, and the only alternative to just doing them on the CPU would be transferring CPU (creating/sending vertices) -> GPU (vertex processing) -> CPU (physics) -> GPU (possible further vertex processing), which I doubt would be faster.
The second part of number 4 is one that always bugs me personally. I have so many uses for rendering true quads or rendering convex n-gons explicitly. The problem is that when you triangulate them, the texture slopes and color slopes all become constant across a whole triangle, and each triangle gets its own, which causes all sorts of crazy stretching and warping because of locally varied slopes. To properly render true n-gons, you need to explicitly compute interpolation for each scanline. About the only way to do this with hardware is to scan-convert all the polys yourself and just have the GPU rasterize scanlines. For perspective correction reasons, though, you may have to play with the z-coords of each line you render.
That would include variations of 2 and 4 on your list, although not "literally" (i.e. I'm not actually thinking about software rendering, but the second part of 4 could apply nonetheless).
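The warping complaint above can be illustrated with a small sketch that compares per-scanline interpolation across a quad against the two possible triangulations of the same quad. It assumes an axis-aligned unit square with one scalar attribute per corner (on such a square, per-scanline interpolation coincides with bilinear interpolation); all function names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t


def scanline_quad(v00, v10, v11, v01, x, y):
    """Interpolate along the left and right edges, then across the scanline at height y."""
    left = lerp(v00, v01, y)
    right = lerp(v10, v11, y)
    return lerp(left, right, x)


def triangulated_quad(v00, v10, v11, v01, x, y, diagonal="00-11"):
    """Interpolate with the quad split into two triangles along the given diagonal."""
    if diagonal == "00-11":
        if x >= y:  # triangle (0,0), (1,0), (1,1)
            return v00 + x * (v10 - v00) + y * (v11 - v10)
        # triangle (0,0), (0,1), (1,1)
        return v00 + y * (v01 - v00) + x * (v11 - v01)
    if x + y <= 1.0:  # triangle (0,0), (1,0), (0,1)
        return v00 + x * (v10 - v00) + y * (v01 - v00)
    # triangle (1,1), (1,0), (0,1)
    return v11 + (1.0 - x) * (v01 - v11) + (1.0 - y) * (v10 - v11)


if __name__ == "__main__":
    corners = (0.0, 0.0, 1.0, 0.0)  # v00, v10, v11, v01: only one corner is "lit"
    print(scanline_quad(*corners, 0.5, 0.5))
    print(triangulated_quad(*corners, 0.5, 0.5, diagonal="00-11"))
    print(triangulated_quad(*corners, 0.5, 0.5, diagonal="10-01"))
```

With these corner values the three methods disagree at the centre of the quad, and the triangulated result depends on which diagonal was chosen, which is one way to see where the stretching comes from.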
Simon F said: I don't mean to be rude, but I think you're missing the point. I suspect there are lots of storage locations inside their PS - let's say N 32-bit (== 2xN 16-bit) locations. I suspect that performance when texturing, however, improves when you can have more pixels in flight, in order to hide texture fetch latency.
Panajev2001a said: Simon, normally register renaming has more registers than the ISA specifies: on GPUs like NV30 we saw a performance hit when more than a certain number of registers was used. All those registers were exposed to the application (shader program).
If a shader program uses a peak of M storage locations then you would have (at best) an upper bound of floor(N/M) pixels in flight, thus "smaller storage" programs should do better than "larger" ones.
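The floor(N/M) bound is easy to tabulate. The sketch below uses a hypothetical pool size purely for illustration; it is not a figure for NV30 or any other real part, and the function name is made up here.

```python
def max_pixels_in_flight(total_locations, peak_per_pixel):
    """Upper bound on pixels in flight: floor(N / M)."""
    if peak_per_pixel <= 0:
        raise ValueError("peak_per_pixel must be positive")
    return total_locations // peak_per_pixel


if __name__ == "__main__":
    N = 256  # hypothetical number of 32-bit storage locations
    for M in (2, 4, 8, 16):
        print(f"shader peak M = {M:2d} -> at most {max_pixels_in_flight(N, M)} pixels in flight")
```

Doubling a shader's peak storage use halves the bound, which leaves fewer pixels available to cover texture fetch latency.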
Panajev2001a said: I did miss the point there. Thanks for the correction, and sorry for the comment that went out before I gave it some more thought.
MrSingh said:
*kicks Pana and runs away*