If NV30 uses tile-based rendering, will Ati convert too?

BoardBonobo said:
And do you suspect that nVidia may have implemented a similar form of system, so they may be able to reach that 48GB/sec bandwidth mark?

Well, when the 48GB/sec figure was first mooted, TBR techniques were put forward as how it could be achieved on a 128-bit bus.

Now we have evidence from elsewhere (a surprising source at that) that it can be done.
 
So is this Gigapixel at work? Or if it is a similar process to the Wildcat would that mean nVidia have adapted and expanded the Gigapixel tech?
 
BoardBonobo,

Assume you have an architecture that deals efficiently with overdraw (insert any architecture naming scheme here), and a total bandwidth of 16GB/s on paper.

Multiply that by an average overdraw factor of 2.5 and you get 40GB/sec of peak theoretical memory bandwidth.

The way I've been understanding it, you'd need a vga (no bandwidth-saving techniques) to have 40GB/s on paper to match that efficiency. It does not mean that it equals 40GB/s of real bandwidth in the strict sense.
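
Spelled out (purely illustrative, nothing NV30-specific; the figures are just the two above):

```python
# Purely illustrative: what the on-paper figure looks like if overdraw
# is dealt with efficiently. Both numbers are only the ones quoted above.
real_bandwidth_gbs = 16.0   # on-paper bandwidth, GB/s
avg_overdraw       = 2.5    # assumed average overdraw factor

effective_bandwidth_gbs = real_bandwidth_gbs * avg_overdraw
print(f"{effective_bandwidth_gbs:.0f} GB/s effective")   # -> 40 GB/s effective
```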
 
BoardBonobo said:
So is this Gigapixel at work? Or if it is a similar process to the Wildcat would that mean nVidia have adapted and expanded the Gigapixel tech?

I'd guesstimate the second ;)
 
Ailuros said:
BoardBonobo,

Assume you have an architecture that deals efficiently with overdraw (insert any architecture naming scheme here), and a total bandwidth of 16GB/s on paper.

Multiply that by an average overdraw factor of 2.5 and you get 40GB/sec of peak theoretical memory bandwidth.

The way I've been understanding it, you'd need a vga (no bandwidth-saving techniques) to have 40GB/s on paper to match that efficiency. It does not mean that it equals 40GB/s of real bandwidth in the strict sense.

Sure, I get that. Though I work out the real bandwidth to be just under 14GB/sec, giving 37GB/sec with an OD of 2.5. And if the 128/256-bit memory controller rumours are both true, then you have a low-end card with a real bandwidth of 14(37)GB/sec and a high-end card with a real bandwidth of 29(74)GB/sec.

Not quite sure I followed the last para though: vga? Very Good Architecture, Video Graphics Array? ;) The DFR allows you to move just over twice the amount of data for the same amount of bandwidth as you would use without any bandwidth-saving techniques?
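
For what it's worth, here's roughly how I'm getting the raw figures (the 450 MHz DDR clock below is purely my assumption, not a confirmed spec):

```python
# Rough back-of-envelope only; 450 MHz DDR is an assumed clock, not a spec.
def raw_bandwidth_gbs(bus_width_bits, ddr_clock_mhz):
    """Raw bandwidth in GB/s: bus width in bytes x 2 transfers/clock (DDR) x clock."""
    return bus_width_bits / 8 * 2 * ddr_clock_mhz / 1000

for bits in (128, 256):
    raw = raw_bandwidth_gbs(bits, 450)
    print(f"{bits}-bit: {raw:.1f} GB/s raw, {raw * 2.5:.0f} GB/s with OD 2.5")
# 128-bit: 14.4 GB/s raw, 36 GB/s with OD 2.5
# 256-bit: 28.8 GB/s raw, 72 GB/s with OD 2.5
```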
 
Randell said:
BoardBonobo said:
And do you suspect that nVidia may have implemented a similar form of system, so they may be able to reach that 48GB/sec bandwidth mark?

Well, when the 48GB/sec figure was first mooted, TBR techniques were put forward as how it could be achieved on a 128-bit bus.

Now we have evidence from elsewhere (a surprising source at that) that it can be done.

Do you mean the comment from JohnH and myself about the QBM-style memory interface for a graphics card (with the QBM chip within the 3D chip)?
 
I think this 48GB/sec effective bandwidth isn't what y'all think it is.

Wildcat VP970
  Memory: 128 MB 256-bit DDR
  Displays: Independent Dual-Head VGA+DVI-I
  Performance: 255M Vertices/Sec, 42G AA Samples/Sec
  Value: Ultimate Visual Processing Performance
  Segment: CAD, DCC Simulation
  ESP (US$): $1,199

Wildcat VP870
  Memory: 128 MB 256-bit DDR
  Displays: Independent Dual-Head VGA+DVI-I
  Performance: 188M Vertices/Sec, 35G AA Samples/Sec
  Value: Powerful, Versatile Productivity
  Segment: CAD, DCC Simulation
  ESP (US$): $599

Wildcat VP760
  Memory: 64 MB 256-bit DDR
  Displays: Independent Dual-Head VGA+DVI-I
  Performance: 165M Vertices/Sec, 23G AA Samples/Sec
  Value: Affordable, CAD-optimized Performance
  Segment: CAD, Low-end DCC
  ESP (US$): $449

Wildcat VP560
  Memory: 64 MB 128-bit DDR
  Displays: Independent Dual-Head DVI-I+DVI-I
  Performance: 100M Vertices/Sec, 18G AA Samples/Sec
  Value: Entry-level Dual Display Workstation Graphics
  Segment: CAD, Web Graphics, Desktop Publishing
  ESP (US$): $249

I think nVidia's effective bandwidth falls under the same area as the AA Samples/Sec figures I bolded above, i.e. effective antialiasing bandwidth, much like when the GeForce 3 came out and they were touting effective fillrates. Just a guess, though.
 
The word Effective should automatically set off everyone's alarm bells.

4x compression on a 128-bit bus will indeed give you 48 GB/s of effective bandwidth, but all you need is 12 GB/s of real bandwidth.

That is a bit simplistic, but you should get the idea.
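
Or, if you want the arithmetic spelled out (toy numbers only):

```python
# Toy arithmetic: compression makes a narrow bus look wider than it is.
real_bandwidth_gbs = 12.0   # what the 128-bit bus actually delivers
compression_ratio  = 4.0    # assumed average compression ratio

effective_bandwidth_gbs = real_bandwidth_gbs * compression_ratio
print(f"{effective_bandwidth_gbs:.0f} GB/s effective")   # -> 48 GB/s effective
```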

-Colourless
 
mboeller said:
Randell said:
BoardBonobo said:
And do you suspect that nVidia may have implemented a similar form of system, so they may be able to reach that 48GB/sec bandwidth mark?

Well, when the 48GB/sec figure was first mooted, TBR techniques were put forward as how it could be achieved on a 128-bit bus.

Now we have evidence from elsewhere (a surprising source at that) that it can be done.

Do you mean the comment from JohnH and myself about the QBM-style memory interface for a graphics card (with the QBM chip within the 3D chip)?

I meant what Clashman posted just below me.
 
LeStoffer said:
Humus said:
Why would you need all vertex data for a tiler? You can split it up into tiles as you render the triangle.

Yes, but my point is that you cannot start to render the first tile before you have all the vertex data for that frame. Only then can you sort out which tile each rendered triangle goes into (let's call it display lists), and thus you're already one step away from an IMR and closer to a DR.

We're talking about a tiler that's not a deferred renderer, right? What you're describing is a full deferred renderer afaics. A tiler simply means it renders to tiles, rather than say scanlines. You don't need any more information than the actual triangle, no other part of the scene whatsoever, to be able to do that.
 
Humus said:
We're talking about a tiler that's not a deferred renderer, right? What you're describing is a full deferred renderer afaics.

Right. I might be making a tiler more complicated than it really has to be. If so: :oops:

Humus said:
A tiler simply means it renders to tiles, rather than say scanlines. You don't need any more information than the actual triangle, no other part of the scene whatsoever, to be able to do that.

Okay, this might be where I'm all wrong, so please try to follow my drift here: in a normal IMR the pixel pipeline just renders the polygon as it comes along, no matter where in the frame buffer the polygon belongs. Easy enough. But in a tiler architecture I would assume that you render one tile (part of the frame buffer) first, then move on to the next tile, then the next and so forth. If this is true, you cannot finish rendering the first tile before you have rendered all the polygons that belong there – thus you have to wait until all the polygons are ‘in place’ before you can start to render that first tile.

Yes, I realize that I must have misunderstood something important, so please correct me, Humus!
 
Tiled framebuffers (where you split the frame into small tiles and then render a triangle to one tile before proceeding to the next tile, continuing with the same triangle until you have rendered all affected tiles, and only then switch to the next triangle) have been present since the Voodoo1 - AFAIK, all Voodoo, GeForce and Radeon series cards support tiled framebuffers. Which, IMO, has nothing to do with tile-based rendering.
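
To make the distinction concrete, here's a rough Python sketch of the two traversal orders (illustrative only; tiles and triangles are reduced to axis-aligned bounding boxes, which is enough to show the ordering difference):

```python
def overlaps(bbox, tile):
    x0, y0, x1, y1 = bbox
    tx0, ty0, tx1, ty1 = tile
    return x0 < tx1 and x1 > tx0 and y0 < ty1 and y1 > ty0

def tiled_framebuffer_order(tri_bboxes, tiles):
    # "Tiled framebuffer": still immediate-mode. Each triangle is finished
    # in every tile it touches before the next triangle starts.
    return [(i, t) for i, bbox in enumerate(tri_bboxes)
                   for t, tile in enumerate(tiles) if overlaps(bbox, tile)]

def tile_based_order(tri_bboxes, tiles):
    # Tile-based (deferred): bin the whole frame's triangles per tile first,
    # then visit one tile to completion before moving on - which is why you
    # need all the geometry up front.
    bins = [[] for _ in tiles]
    for i, bbox in enumerate(tri_bboxes):
        for t, tile in enumerate(tiles):
            if overlaps(bbox, tile):
                bins[t].append(i)
    return [(i, t) for t, b in enumerate(bins) for i in b]

tiles = [(0, 0, 8, 8), (8, 0, 16, 8)]   # two 8x8 screen tiles
tris  = [(2, 2, 14, 6), (1, 1, 5, 5)]   # triangle bounding boxes
print(tiled_framebuffer_order(tris, tiles))  # [(0, 0), (0, 1), (1, 0)]  triangle-major
print(tile_based_order(tris, tiles))         # [(0, 0), (1, 0), (0, 1)]  tile-major
```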
 
Some sort of deferred rendering would make sense for NV30, since nVidia is touting computational efficiency. If you've got a 1024-instruction pixel shader, things will be way faster if you don't draw that pixel in the first place. No matter how much logic you devote to executing multiple shader ops in parallel per cycle, on long shaders simply not executing the shader at all will trump all the other performance optimizations.
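
A quick back-of-envelope on that (all the numbers below are made up, just to show the scale of the win):

```python
# Made-up numbers, just to show why "don't shade it at all" dominates.
shader_len  = 1024   # instructions per shaded pixel
overdraw    = 2.5    # assumed average opaque overdraw
ops_per_clk = 8      # hypothetical parallel shader ops per cycle

imr_cycles_per_pixel      = shader_len * overdraw / ops_per_clk  # shades hidden pixels too
deferred_cycles_per_pixel = shader_len / ops_per_clk             # shades each visible pixel once

print(imr_cycles_per_pixel, deferred_cycles_per_pixel)   # 320.0 vs 128.0 cycles
```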
 
Speculation mode on...


Assuming NV30 is some sort of deferred rendering architecture, could that possibly shed any light on NVIDIA's intentions for the 16 TMUs?

Would 4*4 be better than 8*2 in a deferred renderer?
 
ATI's documentation claimed their HyperZ III virtually eliminated overdraw using an early Z-test. Are there any significant benefits to tile-based architectures other than overdraw reduction? Because if not, this debate might be moot.
 
The one big benefit to tilers other than overdraw reduction is reduction of framebuffer traffic/memory usage, which are probably unimportant for the performance of 1000-instruction pixel shaders.

On the other hand, how many developers actually make use of front-to-back rendering? Especially given that IMRs are so critically dependent on front-to-back rendering to eliminate overdraw.
 
Personally I think it is partly a myth that fillrate/bandwidth will get less important ... your shading model can only be so complex without any actual information on what is being reflected by a pixel, whether it is being lit, whether it is inside some volume and to what extent, etc. ... and those extra sources of information to be used in those huge pixel shaders have to come from somewhere before they can be used, which will take both fillrate (to produce) and bandwidth (to produce and retrieve). A whole heap of extra storage too, incidentally.

Marco

PS. I don't consider procedural textures all that relevant ...
 
arjan de lumens said:
The one big benefit to tilers other than overdraw reduction is reduction of framebuffer traffic/memory usage, which are probably unimportant for the performance of 1000-instruction pixel shaders.

On the other hand, how many developers actually make use of front-to-back rendering? Especially given that IMRs are so critically dependent on front-to-back rendering to eliminate overdraw.

If you are going to use a very long shader, then with a traditional IMR you can render the depth pass first so that you guarantee an overdraw of one with the complex shader. This can speed things up immensely without the need for front-to-back sorting of subsequent data.

Of course, if the shader generates the depth buffer value, then you would need to run this pass with just the depth-generating part of the shader.

If the whole of the long shader is required to generate the depth value (that's a seriously complicated depth shader!) then you are, unfortunately, SOOL, but so is a tiler, since it would also need to run the whole shader on every triangle to generate the depth (the triangles are no longer necessarily planar in Z)... :LOL:
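
A toy way to count the saving from the depth-only pass (illustrative Python; "fragments" here are just (pixel, depth) pairs, nothing hardware-specific):

```python
# Toy model: count long-shader invocations on an IMR with and without
# a depth-only prepass. Fragments arrive in submission order.
fragments = [
    (0, 0.9), (0, 0.6), (0, 0.3),   # pixel 0: drawn back to front (worst case)
    (1, 0.5), (1, 0.2),
]

# Without a prepass: LESS depth test in submission order, shade every
# fragment that passes at the time it arrives.
zbuf, shaded_no_prepass = {}, 0
for pixel, depth in fragments:
    if depth < zbuf.get(pixel, float("inf")):
        zbuf[pixel] = depth
        shaded_no_prepass += 1

# With a prepass: pass 1 is depth-only (cheap), pass 2 shades only the
# fragments that match the final nearest depth - at most once per pixel.
nearest = {}
for pixel, depth in fragments:
    nearest[pixel] = min(depth, nearest.get(pixel, float("inf")))
shaded_prepass = sum(depth == nearest[pixel] for pixel, depth in fragments)

print(shaded_no_prepass, shaded_prepass)   # -> 5 vs 2 long-shader invocations
```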
 
GetStuff said:
Speculation mode on...


Assuming NV30 is some sort of deferred rendering architecture, could that possibly shed any light on NVIDIA's intentions for the 16 TMUs?

Would 4*4 be better than 8*2 in a deferred renderer?

IIRC it's in fact the opposite - with a TBR architecture it is more likely to be 1 TMU, as multiple textures can be applied in a single pass - Kyro could do 8, I believe. Can't remember the reason why, though I'm sure it's in this site's article on TBRs :)
 
ATI's documentation claimed their HyperZ III virtually eliminated overdraw using an early Z-test.

I haven't seen any evidence that that's true, and personally I very much doubt it.

Are there any significant benefits to tile-based architectures other than overdraw reduction?

Yes, a lot of other advantages; I mentioned a few of them above.
 