HSR vs Tile-Based rendering?

Chalnoth said:
That's still a driver-level optimization technique. It has nothing to do with the scene buffer, and has completely different performance characteristics due to the fact that it's stored in system memory, and has no overflow problems (overflow in this case will typically just mean that something's processing too quickly, so a stall there won't affect much of anything).
It's not an optimization, it's how they all work. I suggest you look at some Linux drivers and the homebrew Dreamcast docs to see how a TBR and an IMR (the wrong name, of course) work exactly the same from the app's point of view. But how the actual triangles render is irrelevant to what you think you're arguing about.

The Tile Acceleration Buffer (IIRC, I think that's the name; it's been a while) is what you're talking about with regard to deferment. This is the captured scene in the classic circa-1998 terms. It's the buffer that gets fiddled with by the chip (exactly how is NDA'ed) so that triangle A goes to tile 1, triangle B goes to tiles 1 and 2, and triangle C goes to tile 2.
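In rough software terms, the binning amounts to something like the sketch below. It's only an illustration of the idea: the struct names, the tile layout, and the simple bounding-box test are mine, since the actual scheme is (as I said) NDA'ed.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };
struct Triangle { Vec2 v[3]; uint32_t id; };

// One tile's bin: the list of every triangle that touches that tile.
struct Tile { std::vector<uint32_t> triangleIds; };

// Bin each screen-space triangle into every tile its bounding box overlaps.
// Real hardware bins more tightly (edge tests, per-tile pointer lists), but
// the result is the same: tile 1 sees A and B, tile 2 sees B and C, and so on.
void binTriangles(const std::vector<Triangle>& tris, std::vector<Tile>& tiles,
                  int tilesX, int tilesY, int tileSize)
{
    for (const Triangle& t : tris) {
        const float minX = std::min({t.v[0].x, t.v[1].x, t.v[2].x});
        const float maxX = std::max({t.v[0].x, t.v[1].x, t.v[2].x});
        const float minY = std::min({t.v[0].y, t.v[1].y, t.v[2].y});
        const float maxY = std::max({t.v[0].y, t.v[1].y, t.v[2].y});

        const int x0 = std::clamp(int(minX) / tileSize, 0, tilesX - 1);
        const int x1 = std::clamp(int(maxX) / tileSize, 0, tilesX - 1);
        const int y0 = std::clamp(int(minY) / tileSize, 0, tilesY - 1);
        const int y1 = std::clamp(int(maxY) / tileSize, 0, tilesY - 1);

        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                tiles[y * tilesX + x].triangleIds.push_back(t.id);
    }
}
```

Whether those per-tile lists live in VRAM or in system RAM doesn't change the binning itself, only where the lists are stored and how big they're allowed to grow.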

Why do you think a TBR couldn't use system RAM (for the TAB) like an IMR?
Some TBRs do indeed use VRAM, and of course that places a limit on how many polygons per tile, but that's not a hardware limitation, just an optimisation; there is no reason why they couldn't do it all over system RAM. Especially if you don't have the AGP write-back problem...

As I pointed out, I know of at least one 'tiled' renderer (I'm very careful not to call it a TBR...) that doesn't use a TAB.

You seem to be thinking that TBRs are restricted to doing things exactly the same way as the PowerVR Series 1; they have evolved a lot since then.
They're a lot more popular than just the usual PowerVR and Intel suspects would suggest; expect to see hybrids taking over the world soon...

Chalnoth said:
Anyway, I'd be highly surprised if this buffer really was larger than a few dozen draw calls in size.

You're just showing your inexperience of how the hardware actually works here.
Think about why a dozen draw calls would reduce parallel processing between CPU and GPU to a minimal level (think pipeline bubbles; for example, the CPU running physics while the GPU renders a character).
 
DeanoC said:
Why do you think a TBR couldn't use system RAM (for the TAB) like an IMR?
Some TBRs do indeed use VRAM, and of course that places a limit on how many polygons per tile, but that's not a hardware limitation, just an optimisation; there is no reason why they couldn't do it all over system RAM. Especially if you don't have the AGP write-back problem...
If you're going to do any hardware-accelerated vertex processing, you have to store the scene buffer in video memory, and you also have to store an entire frame, not part of one, unless you want to use an external z-buffer.

Not doing the above may work if you're talking about something other than the PC. But I'm really only worried about the PC architecture.

Think about why a dozen draw calls would reduce parallel processing between CPU and GPU to a minimal level (think pipeline bubbles; for example, the CPU running physics while the GPU renders a character).
I guess you're right. You'd want the CPU to dump all of its draw calls at once, then get back to working on preparing the next frame. I guess I never knew whether this parallelization was done at the driver or application level.

It's certainly possible, for instance, to have a program store the information for the previous frame before sending the draw calls, then send the draw calls in a separate thread from the thread that's preparing the current frame.
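Something like this minimal sketch, for instance, where submitToDriver() is purely a hypothetical stand-in for the real driver/API submission path:

```cpp
#include <thread>
#include <utility>
#include <vector>

// Hypothetical recorded draw call; in reality this would carry API state,
// vertex/index buffers, shaders, etc.
struct DrawCall { int meshId; int materialId; };

// Stand-in for the real driver/API submission (Draw*, Present). The point is
// only that it runs on its own thread, so the GPU's work on frame N overlaps
// the CPU's preparation of frame N+1.
void submitToDriver(std::vector<DrawCall> frame)
{
    (void)frame; // issue the actual draw calls here
}

int main()
{
    std::thread submitThread;
    std::vector<DrawCall> pending; // the frame the CPU is currently building

    for (int frame = 0; frame < 100; ++frame) {
        // CPU work for this frame (physics, animation, visibility) fills
        // 'pending' with draw calls.
        pending.push_back({frame, 0});

        // Hand the whole batch over in one go, then immediately start
        // preparing the next frame on this thread.
        if (submitThread.joinable()) submitThread.join();
        submitThread = std::thread(submitToDriver, std::move(pending));
        pending.clear(); // moved-from; reuse it for the next frame
    }
    if (submitThread.joinable()) submitThread.join();
}
```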
 
Chalnoth said:
If you're going to do any hardware-accelerated vertex processing, you have to store the scene buffer in video memory, and you also have to store an entire frame, not part of one, unless you want to use an external z-buffer.
All modern TBRs can output and input z-buffers at begin/end scene. If the conditions are right they can save the bandwidth and not do it, but either way it's not a problem (it's a linear block read/write).

Assuming you have consumed all video RAM and have to spill to system RAM, that just costs a TBR some more bandwidth.

The point in both cases is that if the corner cases kick in (every architecture has something it's strong at and something it's not), a modern TBR can continue without affecting visual quality.
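In software terms, that z-buffer save/restore is nothing more exotic than a per-tile block copy, roughly like the sketch below (the 32x32 tile size and the float depth format are assumptions for illustration):

```cpp
#include <cstring>

constexpr int TILE = 32; // assumed tile dimensions, for illustration only

// Write one tile's on-chip depth values out to the full-resolution z-buffer
// in (video or system) memory. Each row is contiguous, so the traffic is a
// predictable linear block write rather than scattered accesses.
void storeTileDepth(const float* onChipTile, float* zBuffer, int zPitch,
                    int tileX, int tileY)
{
    for (int row = 0; row < TILE; ++row)
        std::memcpy(zBuffer + (tileY * TILE + row) * zPitch + tileX * TILE,
                    onChipTile + row * TILE, TILE * sizeof(float));
}

// Read it back in at the start of the next partial scene; same linear pattern.
void loadTileDepth(float* onChipTile, const float* zBuffer, int zPitch,
                   int tileX, int tileY)
{
    for (int row = 0; row < TILE; ++row)
        std::memcpy(onChipTile + row * TILE,
                    zBuffer + (tileY * TILE + row) * zPitch + tileX * TILE,
                    TILE * sizeof(float));
}
```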
 
DeanoC said:
All modern TBRs can output and input z-buffers at begin/end scene. If the conditions are right they can save the bandwidth and not do it, but either way it's not a problem (it's a linear block read/write).

Assuming you have consumed all video RAM and have to spill to system RAM, that just costs a TBR some more bandwidth.

The point in both cases is that if the corner cases kick in (every architecture has something it's strong at and something it's not), a modern TBR can continue without affecting visual quality.
I was never worried so much about visual quality. Doing the above would cost a massive amount of performance. Think about it. In such a situation, if FSAA is used, not only would you have to output the z-buffer, but you'd also have to output the framebuffer at full resolution. That's a monstrous difference in memory size/bandwidth usage, and I don't see any way to recover from such a difference gracefully.

The only way I can think of to do deferred rendering adequately is to expect not to be able to cache the entire scene, and to purposely make the scene buffer small. This may limit your maximum performance, but there won't be any pathological scenarios anymore that would slaughter performance.
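Roughly along these lines, as a sketch (the capacity and what flush() actually does are placeholders for illustration):

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for a post-transform, screen-space triangle.
struct Triangle { float x[3], y[3], z[3]; };

// A deliberately small scene buffer: when it fills, render what's there (a
// partial render), reusing the saved z-buffer so occlusion still resolves
// correctly across flushes, then keep accepting geometry.
class SceneBuffer {
public:
    explicit SceneBuffer(std::size_t capacity) : capacity_(capacity) {}

    void add(const Triangle& t) {
        if (tris_.size() == capacity_)
            flush(); // partial render instead of overflowing
        tris_.push_back(t);
    }

    void endScene() { flush(); } // render whatever is left

private:
    void flush() {
        // Stand-in for: bin tris_ into tiles, load each tile's saved Z,
        // rasterise, and write Z/colour back out.
        tris_.clear();
    }

    std::size_t capacity_;
    std::vector<Triangle> tris_;
};
```

The smaller the buffer, the more often you pay the flush cost, but that cost is bounded and predictable rather than a pathological cliff.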
 
Chalnoth said:
...I don't see any way to recover from such a difference gracefully.

But you haven't invested millions of dollars and a team of engineers in finding a solution, have you?
 
No, but nVidia and ATI have such teams, and have not gone for deferred rendering. nVidia even owns some deferred rendering IP. There's a reason for this.

Besides, that's a copout of an argument if I ever heard one. If you have an idea of how to get around the massive performance drop that would be incurred from a buffer overflow, please, post it. Otherwise...
 
Chalnoth said:
No, but nVidia and ATI have such teams, and have not gone for deferred rendering. nVidia even owns some deferred rendering IP. There's a reason for this.

Yeah... they have such teams... but have they ever had their teams work on a deferred renderer? No, because they have already invested so much in improving IMRs that it would be a waste to change to a radically different rendering approach until they have to in order to stay competitive.

As for nVidia's deferred rendering IP... just because they didn't choose to use it doesn't mean that was the right choice; they have made FX's... err... mistakes, you know.
 
And with the heavy competition that you have now between ATI and nVidia, if TBDR was the "wonder drug" for rendering that some seem to think it is, you can be certain that both companies will have given it serious consideration. If either company really believed that TBDR could be so much better than IMR's, they'd be scrambling over one another to be the first to put out a TBDR.
 
Chalnoth said:
And with the heavy competition that you have now between ATI and nVidia, if TBDR was the "wonder drug" for rendering that some seem to think it is, you can be certain that both companies will have given it serious consideration. If either company really believed that TBDR could be so much better than IMR's, they'd be scrambling over one another to be the first to put out a TBDR.

Hey, who ever said it had to be a "wonder drug"? I think that IMRs are eventually going to hit a wall similar to the one (some) CPUs are hitting: there are limits to how many transistors you can squeeze in and how much power you can draw. When you hit that wall, you have to shift thinking from "make it bigger" to "make it smarter". There's too much to lose; going with such a different technology is a much larger risk, and they won't do it until they have to.
 
See? That's just it. I just don't think TBDRs are necessarily a smarter approach to rendering. I think that the techniques that have been started for IMRs can be continued and result in similar efficiency.

And, of course, all computing is going to hit a huge wall very soon, and to breach that we're going to have to seek out radically different hardware technologies, such as quantum computing or somesuch.
 
Chalnoth said:
And with the heavy competition that you have now between ATI and nVidia, if TBDR was the "wonder drug" for rendering that some seem to think it is, you can be certain that both companies will have given it serious consideration. If either company really believed that TBDR could be so much better than IMR's, they'd be scrambling over one another to be the first to put out a TBDR.

Heavy competition could also mean that both of these companies would rather take the "safe route" instead of placing all their bets on a TBDR. But who knows, maybe the safe route is also the slower one in the end?
 
The safe route would also mean not risking it all on faster processes, as nVidia did with the NV30.
 
Chalnoth said:
The safe route would also mean not risking it all on faster processes, as nVidia did with the NV30.

Though that had worked well for them before, so it's understandable that they tried. And it's not really "risking it all" when it comes to a faster process; at least, it's not comparable to completely scrapping a design late in the design phase.
 
And then there's the other example of nVidia's NV1 and NV2 using quadric patches instead of triangles as the basic primitive.
 
Chalnoth said:
And then there's the other example of nVidia's NV1 and NV2 using quadric patches instead of triangles as the basic primitive.

They weren't really a big company then, though. There's a lot more at stake now.
 
Bjorn said:
They weren't really a big company then, though. There's a lot more at stake now.
No, there was a lot more at stake then. For a small company, one wrong move could have sunk the entire company (and it nearly did). These days, a wrong move is bad for the company, but it won't kill it.
 
Chalnoth said:
No, but nVidia and ATI have such teams, and have not gone for deferred rendering. nVidia even owns some deferred rendering IP. There's a reason for this.
You keep swapping between the terms deferred and tiled, so I'm not sure which bit of tech you don't like. Neither is connected to the other in any way except that PowerVR supported both.

Deferring the pixel shader until the z-determination has finished saves pixel shader operations. PowerVR does it completely in hardware, whereas most IMRs use a z-prepass and hierarchical Z to achieve the same thing.
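For comparison, the IMR version of that deferral is the familiar two-pass z-prepass; here is a minimal OpenGL-flavoured sketch, with drawScene() as a stand-in for whatever issues the scene's geometry:

```cpp
#include <GL/gl.h>

// Stand-in for whatever issues the scene's geometry; it is drawn twice.
static void drawScene() { /* glDrawElements(...), etc. */ }

// Lay depth down first, then shade only the fragments that survive the
// depth test, so occluded pixels never run the expensive pixel shader.
void renderWithZPrepass()
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // Pass 1: depth only. Colour writes off, trivial shading.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawScene();

    // Pass 2: full shading, but only where depth matches the prepass.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_EQUAL);
    drawScene();
}
```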

Tiled is a system where the entire framebuffer doesn't fit into rasterisable memory. It requires the ability to 'cut' the scene into bits; there are many ways of doing this, some requiring storage in VRAM, some not.

Tiling is less of a concern on conventional PCs; they usually rasterise into VRAM, so the entire framebuffer fits within one tile. However, on most consoles, where fillrate is more of a concern, a very small but fast framebuffer is kept. If you want to support high resolutions and/or multiple render targets, you're going to have to support tiling in one form or another.

So, given that NVIDIA and ATI both have non-PC chips, are you really sure that no current or near-future chip supports tiling?
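To put rough numbers on the console case: with a small fast framebuffer, the number of tiles needed is just arithmetic. The sizes below (720p, 4 bytes of colour plus 4 bytes of depth per sample, 4x multisampling, a hypothetical 10 MB of fast framebuffer memory) are purely illustrative:

```cpp
#include <cstdio>

// How many tiles does a render target need if the fast on-chip framebuffer
// only holds fastMemBytes? Illustrative arithmetic only; real hardware adds
// alignment and tile-shape constraints on top of this.
int tilesNeeded(int width, int height, int bytesPerSample, int samples,
                long fastMemBytes)
{
    const long total = (long)width * height * bytesPerSample * samples;
    return (int)((total + fastMemBytes - 1) / fastMemBytes); // round up
}

int main()
{
    std::printf("%d tiles\n",
                tilesNeeded(1280, 720, 8, 4, 10 * 1024 * 1024));
    // Prints "3 tiles": three passes over the scene's geometry, each one
    // rendering a strip of the final framebuffer.
}
```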
 
Chalnoth said:
See? That's just it. I just don't think TBDRs are necessarily a smarter approach to rendering. I think that the techniques that have been started for IMRs can be continued and result in similar efficiency.

And, of course, all computing is going to hit a huge wall very soon, and to breach that we're going to have to seek out radically different hardware technologies, such as quantum computing or somesuch.

You can't be sure of the quantum computing phase shift, if it indeed takes place in the not-too-distant future. The only advantage of quantum computers over conventional hardware (the feature touted by most scientists and specialists these days) is the drastic speed-up on certain complicated tasks that can be computed more efficiently than on current-generation hardware. I recall reading an Intel-published paper some time ago in which a multitude of architectures were put forward to revise the current trend, and quantum architectures were definitely not the "favourites" among them (molecular computers might turn out to be a better choice).
 
DeanoC said:
Chalnoth said:
No, but nVidia and ATI have such teams, and have not gone for deferred rendering. nVidia even owns some deferred rendering IP. There's a reason for this.
You keep swapping between the terms deferred and tiled, so I'm not sure which bit of tech you don't like. Neither is connected to the other in any way except that PowerVR supported both.

Deferring the pixel shader until the z-determination has finished saves pixel shader operations. PowerVR does it completely in hardware, whereas most IMRs use a z-prepass and hierarchical Z to achieve the same thing.

Tiled is a system where the entire framebuffer doesn't fit into rasterisable memory. It requires the ability to 'cut' the scene into bits; there are many ways of doing this, some requiring storage in VRAM, some not.

Tiling is less of a concern on conventional PCs; they usually rasterise into VRAM, so the entire framebuffer fits within one tile. However, on most consoles, where fillrate is more of a concern, a very small but fast framebuffer is kept. If you want to support high resolutions and/or multiple render targets, you're going to have to support tiling in one form or another.

So, given that NVIDIA and ATI both have non-PC chips, are you really sure that no current or near-future chip supports tiling?

I've done this throughout the thread as well. I don't want to speak for anyone else, but there have been enough references to TBDR to assume that's what's being discussed, along with the worst-case scenarios of the deferred element.
 
If the entire industry moved to tiling, the scene buffer would be a non-issue before it became a problem. Scenegraph traversal has to move to the GPU sooner or later ... once it is there, and it takes tiling into account, you can work with a scene buffer of any size (it would just be a cache, to prevent having to transform geometry whose bounding volume straddles tile boundaries too many times).

Even without that, you can transform a tiler from a deferred renderer into a batched renderer and still retain a bounded-size scene buffer.
 