Why have Vertex Shaders on the CPU instead of the VPU?

OK, I'm a little confused here by all sorts of people claiming the next paradigm of 3D graphics will be "texture-less", and how all these things like displacement mapping and stuff will be the next best thing... OK, 2 questions: have you people done any 3D modelling, and if you have, no offence intended (really), but are you on CRACK? Textures ain't going anywhere, at least for the next 10 years or so, and I would suggest many years past that. Textures aren't mutually exclusive with all these other fun methods like displacement mapping... Au contraire, displacement mapping, like anything else, looks pretty dull without our good friend the texture map. In fact, high-end modelling and (yes, even raytracing) software is still 100% dependent on the good ol' texture map, and trust me when I say there are NO major movements away from it. We just keep going higher and higher resolution, and adding more extra passes on top of it.

And while I'm at it, to all those people who still believe there's a chance next-gen HW will do raytracing, let me just say that you REALLY don't want that. Hey, even CURRENT generation HW can do raytracing; but the bottom line is, it will ALWAYS be several thousand times slower than the traditional rendering pipeline, whether it's HW-accelerated or not. Even with some really nifty sorting algorithms, you still have to solve the ray/plane intersection problem once for every possible hit, which, forgetting the calculation complexity, is a HUGE memory bandwidth issue, requiring possibly thousands of triangle lookups per pixel at worst, and probably at least a couple in a best-case scenario. YOU DON'T WANT RAYTRACING. The traditional pipeline will look MUCH better than any HW raytracing solutions probably for the next 10 years or so. Which is why Pixar only raytraces the parts of their images that require it (any Pixar guys, feel free to back me up on that)... 90% of what you see on the screen in any Pixar movie is a traditional pipeline (maybe with the odd NURBS surface and lots of really sweet shaders).

Thank you. This has been an enjoyable rant.
 
OK, I'm a little confused here by all sorts of people claiming the next paradigm of 3D graphics will be "texture-less"

I think what they mean is the rasterizer part will be "TMU-less".

With texture sampling done elsewhere.
 
nAo said:
Do you expect APUs to have the 100+ stages of a GPU pixel pipe? On the PS2 VUs I had a hard time filling all the spare instruction slots while transforming vertices, even when transforming 4 vertices at the same time... so I don't see the same thing working well with texture fetches too, even with a deeper pipeline.
You are going about this the opposite way I was thinking - the fetches within an APU program are only within local memory - by this time you already need to have whatever external data present, be it texels or something else. The pipeline only handles the local fetch latency, and the external fetch latency is hidden by the fact that you process batches of primitive data, not one at a time.
That's the only way I see this working that actually fits the outline of the APU specification. Attaching a texel cache and fetching logic to all APUs is well out of spec of the patent :p
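A rough sketch of the batching idea, just to make it concrete (dma_get/dma_wait/process_batch, the tags and the buffer size are hypothetical stand-ins, nothing from the patent):

Code:
#include <stddef.h>

#define BATCH 256                       /* vertices per batch                */

typedef struct { float x, y, z, w; } Vertex;

/* Hypothetical DMA primitives: start an asynchronous copy into local
 * memory, and block until the transfer with a given tag has completed. */
extern void dma_get(void *dst, const void *src, size_t bytes, int tag);
extern void dma_wait(int tag);
extern void process_batch(Vertex *v, size_t count);

void transform_stream(const Vertex *external, size_t total)
{
    static Vertex local[2][BATCH];      /* double buffer in local memory     */
    size_t offset = 0;
    size_t n = (total < BATCH) ? total : BATCH;
    int cur = 0;

    dma_get(local[cur], external, n * sizeof(Vertex), cur);

    while (n > 0) {
        size_t next_off = offset + n;
        size_t next_n = (total - next_off < BATCH) ? total - next_off : BATCH;
        int next = cur ^ 1;

        if (next_n > 0)                 /* kick off the following batch      */
            dma_get(local[next], external + next_off,
                    next_n * sizeof(Vertex), next);

        dma_wait(cur);                  /* only the current batch's latency  */
        process_batch(local[cur], n);

        offset = next_off;
        n = next_n;
        cur = next;
    }
}

The external fetch for batch k+1 overlaps the processing of batch k, which is all I mean by hiding the external latency behind batch granularity.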

Umh... expand on this please, I can't see how accessing an 'external device' would be a good thing during a texture fetch.
Obviously this is possible... as GPUs have shown...
Erh, unless your entire system is composed of eDRAM, you're always accessing an external 'device' during any kind of fetch that isn't already cached, no?
On the APU, that translates to DMA transfers when data is not in APU local memory. The question remains what the mechanism is that controls how accesses to external memory are handled. We both agree it would be unreasonable to manually control all of it.

It does seem that a streaming friendly "texture" format like what you mentioned would help a lot. I am worried that in the worst case they might not provide means for APUs to access texture memory at all though.

Vince said:
I like the layer node concept, although I was under the assumption that the patent hinted (quite strongly) that you'd use a layering technique regardless in the creation of a final output. That instead of actually forming hardwired nodes, you'd create virtual ones which funnelled down into the 4 Pixel Engines, and then it would be analogous to what you stated.
Yeah the patent has multiple CRTCs there that kinda imply this anyhow. I mainly mentioned nodes in reference to making more sense out of a possible multichip solution.
 
OK, I'm a little confused here by all sorts of people claiming the next paradigm of 3D graphics will be "texture-less"
That's not what we were claiming at all.
Not that I'm too surprised some people only got that out of the whole debate.

V3 already summarized the point I was making - move texture logic away from pixel rendering.
Second thing is what nAo brought up - to change the format of texture data into a different representation. Technically this does mean you would no longer use textures as such, but the graphic information represented by them stays more or less the same.
 
Also, if they're taking real time processing over networks seriously, they need many pipeline stages, to hide latency.

ha, real-time game / game-graphics processing over current broadband is about as likely as raytracing on Nintendo 64 :LOL:
 
Fafalada said:
You are going about this the opposite way I was thinking - the fetches within an APU program are only within local memory - by this time you already need to have whatever external data present, be it texels or something else. The pipeline only handles the local fetch latency, and the external fetch latency is hidden by the fact that you process batches of primitive data, not one at a time.
Ok..that makes much more sense :)
But I see a lot of problems here trying to determine which parts of a texture should be fetched in advance with stuff like procedural texture mapping coordinates or even dependent reads. It would work fine for 'simple mapping' but imho would be a mess for other stuff like embm (if not bounded) BRDFs/IBL and so on..


That's the only way I see this working that actually fits the outline of the APU specification. Attaching a texel cache and fetching logic to all APUs is well out of spec of the patent :p
In fact I was thinking more along the lines of APUs placing texture requests on the pixel engines... those pretty obscure black boxes we read about in the patents. That would mean no (general) texture sampling in the BE (though it could still be possible and doable like you envisioned), just on the APUs sitting in the Visualizer.

Erh, unless your entire system is composed of eDRAM, you're always accessing an external 'device' during any kind of fetch that isn't already cached, no?
Of course! I just misunderstood what you wrote. I thought you were saying 'more latency' -> 'the better'.

We both agree it would be unreasonable to manually control all of it.
What about founding a hospice for ex-PS1/PS2/PS3 coders who develop some form of psychosis in the not so distant future? ;)


It does seem that a streaming friendly "texture" format like what you mentioned would help a lot. I am worried that in the worst case they might not provide means for APUs to access texture memory at all though.
Well..maybe we're speculating too much and we have just to wait final specs or another patent application ;)

ciao,
Marco
 
freq said:
YOU DON'T WANT RAYTRACING. The traditional pipeline will look MUCH better than any HW raytracing solutions probably for the next 10 years or so..
This is not entirely true.
In multimillion-polygon scenes you might prefer ray tracing for the first hit versus scan conversion. An old mantra says 'scan conversion is O(N) with a low constant setup cost but RT is O(ln(N)) with a high constant setup cost'. It's hard to find the right N>N0 where RT clearly wins over scan conversion, but it's not hard to picture cases where RT is a much better choice.
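Just to spell the mantra out (a, b, s and P below are purely illustrative constants, nothing measured): with N triangles, P primary rays and a one-off acceleration structure cost s,

C_{\text{scan}}(N) \approx a\,N \qquad \text{vs.} \qquad C_{\text{rt}}(N) \approx s + b\,P\,\log_2 N

so first-hit RT only wins once a\,N > s + b\,P\,\log_2 N, and where that N0 sits depends entirely on how small the per-ray traversal constant b can be made relative to the per-triangle setup constant a.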
Sidenote: this Toshiba patent is really nice..;)

ciao,
Marco
 
I've been under the impression that Photon Mapping held much more promise for high quality realtime lighting compared to ray tracing.
 
PC-Engine said:
Brimstone said:
I've been under the impression that Photon Mapping held much more promise for high quality realtime lighting compared to ray tracing.

Sure, but isn't PM based on RT? ;)

I thought PM and RT shared many similar concepts.
 
PM is an extension of RT. It's more realistic (refraction etc.); however, it's also more complex and slower than RT.

http://graphics.stanford.edu/papers/photongfx/

There's a small video clip of realtime PM on current GPUs. If you watch the video you'll notice how slow, simple and low-res the demo is. It'll take many more GPU generations before we get complex scenes with 60 fps photon mapping. What we'll be able to do with GPUs released a year from now, like HDR, will look really close to RT.
 
Photon mapping is essentially an optimisation technique for Monte Carlo ray tracing. If you compare it with "regular" Monte Carlo ray tracing, it is considerably FASTER, not slower. Of course, photon mapping is slower than regular Whitted ray tracing, but that doesn't do global illumination, only hard shadows and reflections and refraction.

Standard Whitted ray tracing will probably offer very little benefit over the current hw rendering approach for the near future. When you start doing LOTS of secondary rays, i.e. global illumination, ray tracing starts to make sense.
 
V3 said:
OK, I'm a little confused here by all sorts of people claiming the next paradigm of 3D graphics will be "texture-less"

I think what they mean is the rasterizer part will be "TMU-less".

With texture sampling done elsewhere.

OK, that makes a little more sense, but it's still not a good idea. On modern TMUs, the texturing HW can act pretty much as a very fast input to shader HW, which is always desirable. (Plus I don't see the standard texture pass going anywhere anytime soon.) So depending on your definitions, this paradigm is either already here or will never be here.
 
nAo said:
freq said:
YOU DON'T WANT RAYTRACING. The traditional pipeline will look MUCH better than any HW raytracing solutions probably for the next 10 years or so..
This is not entirely true.
In multimillion-polygon scenes you might prefer ray tracing for the first hit versus scan conversion. An old mantra says 'scan conversion is O(N) with a low constant setup cost but RT is O(ln(N)) with a high constant setup cost'. It's hard to find the right N>N0 where RT clearly wins over scan conversion, but it's not hard to picture cases where RT is a much better choice.
Sidenote: this Toshiba patent is really nice..;)

ciao,
Marco

But the setup cost for raytracing is only constant if you're doing it once... I only scanned through the Toshiba patent, but it looks pretty damn much like just subdividing the scene and using that to speed up hits, along with swapping scene info in/out of a localized memory. See, this is real neat and it does speed up raytracin' something fine, until you decide you want to start moving stuff around and then things get REAL costly. Plus, only using first-hit doesn't really give you much of an incentive to use raytracing, does it? I mean, doesn't sticking to that really mean you give up all the advantages of raytracing? (I know you didn't mean it like that, I'm just pointing out that like many rendering techniques, it only looks like an advantage in very limited circumstances.)

There's actually a demo out that uses a method very similar to what you described in software; it's pretty impressive, but it's still damn slow. And the world is very static. And it certainly does come to a crawl as soon as you move into a more geometrically complicated area. I mean, I can see realtime raytracing maybe hitting certain games in the way it's used by CG houses (specific objects are raytraced, everything else scanline'd), but I think realtime raytracing is still far enough away that I wouldn't be designing a console system around it anytime soon.
 
nAo said:
But I see a lot of problems here trying to determine which parts of a texture should be fetched in advance with stuff like procedural texture mapping coordinates or even dependent reads. It would work fine for 'simple mapping' but imho would be a mess for other stuff like embm (if not bounded) BRDFs/IBL and so on...
Very true. The only way I see those kinds of situations working reasonably well is to pipeline those processes across multiple APUs (with outputs from one working as inputs to the next) - create your own 100-stage pipe :?

In fact I was thinking more along the lines of APUs placing texture requests on the pixel engines... those pretty obscure black boxes we read about in the patents. That would mean no (general) texture sampling in the BE (though it could still be possible and doable like you envisioned), just on the APUs sitting in the Visualizer.
I know, but isn't this prone to the same latency issue you brought up in the first place? Those pixel engines can't predict things any faster than anything else can. Or do you have some idea in mind how to work around it?
Anyway, the problem with this is still that the patent is pretty adamant on all APUs having the same ISA too...
If those requests are embedded in DMA you could circumvent that stipulation, however, and also allow off-chip APUs to get access to the same data, though likely at a slower speed.
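Purely to picture the mechanism (TexelRequest, dma_put_list and the field layout below are made up for illustration, none of it is from the patent), it could look something like this:

Code:
#include <stdint.h>

/* One sampling request the APU queues instead of touching the texture itself. */
typedef struct {
    uint32_t texture_id;     /* which texture to sample                  */
    uint16_t u, v;           /* fixed-point texture coordinates          */
    uint32_t reply_offset;   /* local-memory address for the result      */
} TexelRequest;

/* Hypothetical primitives: hand a request list to the DMA controller and
 * wait on a tag once the results are actually needed. */
extern void dma_put_list(const TexelRequest *list, int count, int tag);
extern void dma_wait(int tag);
extern void do_independent_work(void);

void fetch_texels(TexelRequest *list, int count, int tag)
{
    dma_put_list(list, count, tag);  /* hand the batch to the fetch unit  */
    do_independent_work();           /* hide the round-trip latency       */
    dma_wait(tag);                   /* results are now in local memory   */
}

Whatever unit services the list (pixel engines, a dedicated fetcher, another APU) is then free to change without touching the APU ISA.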

What about founding a hospice for ex-PS1/PS2/PS3 coders who develop some form of psychosis in the not so distant future?
Heheh, maybe Sony should include that as part of developer support, free hospitalization. Actually Archie's idea about bundling hookers with Devkits could work just as well, increase performance and relieve stress!

freq said:
OK, that makes a little more sense, but it's still not a good idea. On modern TMUs, the texturing HW can act pretty much as a very fast input to shader HW, which is always desirable
The rasterizer I suggested when starting this discussion tangent doesn't have any shader hw either, that was kinda the whole point.
 
freq said:
I only scanned through the Toshiba patent, but it looks pretty damn much like just subdividing the scene and using that to speed up hits, along with swapping scene info in/out of a localized memory.
Well, every standard modern 3D engine does that.

See, this is real neat and it does speed up raytracin' something fine, until you decide you want to start moving stuff around and then things get REAL costly.

No, you just update the tree-like structure and move a bunch of pointers and OOBBs.
Each node doesn't have to exactly contain or fit the geometry.
In a standard 3D game 99% of the geometry is static, so you lay down the tree in preprocessing tools and do the rest in realtime.
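Roughly the kind of refit pass I mean, using plain AABBs for simplicity (the Aabb/Node layout and object_bounds() are just an illustration, nothing from the Toshiba patent):

Code:
#include <stddef.h>

typedef struct { float min[3], max[3]; } Aabb;

/* Binary tree node; leaves reference a scene object, internal nodes
 * just bound their two children. */
typedef struct Node {
    Aabb         bounds;
    struct Node *parent;
    struct Node *child[2];
    int          object;               /* valid only on leaves */
} Node;

extern Aabb object_bounds(int object); /* hypothetical scene query */

static Aabb aabb_union(Aabb a, Aabb b)
{
    for (int i = 0; i < 3; ++i) {
        if (b.min[i] < a.min[i]) a.min[i] = b.min[i];
        if (b.max[i] > a.max[i]) a.max[i] = b.max[i];
    }
    return a;
}

/* Call once per moved object: refresh its leaf box and recompute the boxes
 * on the path up to the root. Static geometry is never touched. */
void refit_from_leaf(Node *leaf)
{
    leaf->bounds = object_bounds(leaf->object);
    for (Node *n = leaf->parent; n != NULL; n = n->parent)
        n->bounds = aabb_union(n->child[0]->bounds, n->child[1]->bounds);
}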

Plus, only using first-hit doesn't really give you much of an incentive to use raytracing, does it?
It depends. If next generation games have to render 1-10 Mpolys per frame with heavy shading in fully, continuously LODed scenes... then yeah, RT might be a big incentive ;) (imho)

I mean, I can see realtime raytracing maybe hitting certain games in the way it's used by CG houses (specific objects are raytraced, everything else scanline'd), but I think realtime raytracing is still far enough away that I wouldn't be designing a console system around it anytime soon.
Maybe RT will skip this upcoming generation, or maybe not; who really knows? :)

ciao,
Marco
 
nAo said:
freq said:
I only scanned through the Toshiba patent, but it looks pretty damn much like just subdividing the scene and using that to speed up hits, along with swapping scene info in/out of a localized memory.
Well, every standard modern 3D engine does that.

Well yes, I'm more than aware of this, but the point in this case is that for a raytracer to work, every last polygon it intends to draw has to be part of the search tree, which includes all characters, projectiles, and everything else you can think of. Otherwise, they have to be drawn using regular scanline rendering, which is what some here are apparently trying to avoid.
See, this is real neat and it does speed up raytracin' something fine, until you decide you want to start moving stuff around and then things get REAL costly.

No, you just update the tree-like structure and move a bunch of pointers and OOBBs.
Each node doesn't have to exactly contain or fit the geometry.
In a standard 3D game 99% of the geometry is static, so you lay down the tree in preprocessing tools and do the rest in realtime.

Yes, I'm aware of this, but like I said, every moving object has to become part of the tree, and the overhead starts to increase. This is the problem I'm getting at.
Plus, only using first-hit doesn't really give you much of an incentive to use raytracing, does it?
It depends. If next generation games have to render 1-10 Mpolys per frame with heavy shading in fully, continuously LODed scenes... then yeah, RT might be a big incentive ;) (imho)

Even with 10 million polygons per scene, in my own experimentation with raytracing, you're still taking a massive speed hit on the calculation front, and a huge memory bandwidth hit. It's like the people who were saying voxels were the way to go and would be replacing polygons soon (Don't quote me on this but I believe even a certain person with UNREAL programming skills claimed that at one point.)


I mean, I can see realtime raytracing maybe hitting certain games in the way it's used by CG houses (specific objects are raytraced, everything else scanline'd), but I think realtime raytracing is still far enough away that I wouldn't be designing a console system around it anytime soon.
Maybe RT will skip this upcoming generation, or maybe not, who really do know? :)

ciao,
Marco
 
Fafalada said:
Very true. The only way I see those kinds of situations working reasonably well is to pipeline those processes across multiple APUs (with outputs from one working as inputs to the next) - create your own 100-stage pipe :?
It could be an idea ;) But it wouldn't be efficient regarding the bandwidth 'wasted' storing and reading intermediate data between APUs...
Even if it seems from the CELL patents that such a technique could be easily implemented...

I know, but isn't this prone to the same latency issue you brought up in the first place? Those pixel engines can't predict things any faster than anything else can. Or do you have some idea in mind how to work around it?
Pixel engines would (wishfully) fetch texels from eDRAM, so latency could be greatly reduced. In the patent the 4 PUs in the Visualizer have half the APUs of the PUs that sit in the BE. Four APUs' worth of die size devoted to just a single 'standard' (GS-like) scan conversion engine seems like overkill to me... there should be something more in those pixel pipes...
Ok... that could just be some more eDRAM...

Anyway, the problem with this is still that the patent is pretty adamant on all APUs having the same ISA too...
If those requests are embedded in DMA you could circumvent that stipulation, however, and also allow off-chip APUs to get access to the same data, though likely at a slower speed.
yeah..you're right.

ciao,
Marco
 
freq said:
Yes, I'm aware of this, but like I said, every moving object has to become part of the tree, and the overhead starts to increase. This is the problem I'm getting at.
It's a problem only if the number of moving objects is 'big'.
And that's not the case in the majority of videogames.
It's like the people who were saying voxels were the way to go and would be replacing polygons soon (Don't quote me on this but I believe even a certain person with UNREAL programming skills claimed that at one point.)
Unfortunately I'm not that smart (like the UNREAL guy...), and I don't think scan conversion is doomed, but let me restate that some form of RT will, imho, be a much more viable option for real-time rendering in the future than it was in the past.

ciao,
Marco
 
nAo said:
Sidenote: this Toshiba patent is really nice..;)
The first claims of that patent application were extremely vague, as was the abstract/intro. Would you mind summarising what is novel about it?
 