If NV30 uses tile-based rendering, will Ati convert too?

BoardBonobo said:
And do you suspect that nVidia may have implemented a similar form of system, so they may be able to reach that 48GB/sec bandwidth mark?

Well, when the 48GB/sec figure was first mooted, TBR techniques were put forward as how it could be achieved on a 128-bit bus.

Now we have evidence from elsewhere (a surprising source at that) that it can be done.
 
So is this Gigapixel at work? Or if it is a similar process to the Wildcat would that mean nVidia have adapted and expanded the Gigapixel tech?
 
BoardBonobo,

Assume you have an architecture that deals efficiently with overdraw (insert any architecture naming scheme here), and a total bandwidth of 16GB/s on paper.

Multiply that by an average overdraw factor of 2.5 and you get 40GB/sec of peak theoretical memory bandwidth.

The way I've been understanding it, you'd need a vga (no bandwidth-saving techniques) to have 40GB/s on paper to match that efficiency. It does not mean that it equals 40GB/s of real bandwidth in the strict sense.
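
Spelled out (purely illustrative, nothing NV30-specific; the figures are just the two above):

```python
# Purely illustrative: what the on-paper figure looks like if overdraw
# is dealt with efficiently. Both numbers are only the ones quoted above.
real_bandwidth_gbs = 16.0   # on-paper bandwidth, GB/s
avg_overdraw       = 2.5    # assumed average overdraw factor

effective_bandwidth_gbs = real_bandwidth_gbs * avg_overdraw
print(f"{effective_bandwidth_gbs:.0f} GB/s effective")   # -> 40 GB/s effective
```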
 
BoardBonobo said:
So is this Gigapixel at work? Or if it is a similar process to the Wildcat would that mean nVidia have adapted and expanded the Gigapixel tech?

I'd guesstimate the second ;)
 
Ailuros said:
BoardBonobo,

Assume you have an architecture that deals efficiently with overdraw (insert any architecture naming scheme here), and a total bandwidth of 16GB/s on paper.

Multiply that by an average overdraw factor of 2.5 and you get 40GB/sec of peak theoretical memory bandwidth.

The way I've been understanding it, you'd need a vga (no bandwidth-saving techniques) to have 40GB/s on paper to match that efficiency. It does not mean that it equals 40GB/s of real bandwidth in the strict sense.

Sure, I get that. Though I work out the real bandwidth to be just under 14GB/sec, giving 37GB/sec with an OD of 2.5. And if the 128/256-bit memory controller rumours are both true, then you have a low-end card with a real bandwidth of 14(37)GB/sec and a high-end card with a real bandwidth of 29(74)GB/sec.

Not quite sure I followed the last para though: vga? Very Good Architecture, Video Graphics Array? ;) The DFR allows you to move just over twice the amount of data for the same amount of bandwidth as you would use without any bandwidth-saving techniques?
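
For what it's worth, here's roughly how I'm getting the raw figures (the 450 MHz DDR clock below is purely my assumption, not a confirmed spec):

```python
# Rough back-of-envelope only; 450 MHz DDR is an assumed clock, not a spec.
def raw_bandwidth_gbs(bus_width_bits, ddr_clock_mhz):
    """Raw bandwidth in GB/s: bus width in bytes x 2 transfers/clock (DDR) x clock."""
    return bus_width_bits / 8 * 2 * ddr_clock_mhz / 1000

for bits in (128, 256):
    raw = raw_bandwidth_gbs(bits, 450)
    print(f"{bits}-bit: {raw:.1f} GB/s raw, {raw * 2.5:.0f} GB/s with OD 2.5")
# 128-bit: 14.4 GB/s raw, 36 GB/s with OD 2.5
# 256-bit: 28.8 GB/s raw, 72 GB/s with OD 2.5
```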
 
Randell said:
BoardBonobo said:
And do you suspect that nVidia may have implemented a similar form of system, so they may be able to reach that 48GB/sec bandwidth mark?

Well, when the 48GB/sec figure was first mooted, TBR techniques were put forward as how it could be achieved on a 128-bit bus.

Now we have evidence from elsewhere (a surprising source at that) that it can be done.

Do you mean the comment from JohnH and myself about the QBM-style memory interface for a graphics card (with the QBM chip within the 3D chip)?
 
I think this 48GB/sec effective bandwidth isn't what y'all think it is.

Wildcat VP970
  Memory: 128 MB 256-bit DDR
  Displays: Independent Dual-Head VGA+DVI-I
  Performance: 255M Vertices/Sec, 42G AA Samples/Sec
  Value: Ultimate Visual Processing Performance
  Segment: CAD, DCC Simulation
  ESP (US$): $1,199

Wildcat VP870
  Memory: 128 MB 256-bit DDR
  Displays: Independent Dual-Head VGA+DVI-I
  Performance: 188M Vertices/Sec, 35G AA Samples/Sec
  Value: Powerful, Versatile Productivity
  Segment: CAD, DCC Simulation
  ESP (US$): $599

Wildcat VP760
  Memory: 64 MB 256-bit DDR
  Displays: Independent Dual-Head VGA+DVI-I
  Performance: 165M Vertices/Sec, 23G AA Samples/Sec
  Value: Affordable, CAD-optimized Performance
  Segment: CAD, Low-end DCC
  ESP (US$): $449

Wildcat VP560
  Memory: 64 MB 128-bit DDR
  Displays: Independent Dual-Head DVI-I+DVI-I
  Performance: 100M Vertices/Sec, 18G AA Samples/Sec
  Value: Entry-level Dual Display Workstation Graphics
  Segment: CAD, Web Graphics, Desktop Publishing
  ESP (US$): $249

I think nVidia's effective bandwidth falls under the same area as the AA Samples/Sec figures I bolded above, i.e. effective antialiasing bandwidth, much like when the GeForce 3 came out and they were touting effective fillrates. Just a guess, though.
 
The word Effective should automatically set off everyone's alarm bells.

4x compression on a 128-bit bus will indeed give you 48 GB/s of effective bandwidth, but all you need is 12 GB/s of real bandwidth.

That is a bit simplistic, but you should get the idea.
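
Or, if you want the arithmetic spelled out (toy numbers only):

```python
# Toy arithmetic: compression makes a narrow bus look wider than it is.
real_bandwidth_gbs = 12.0   # what the 128-bit bus actually delivers
compression_ratio  = 4.0    # assumed average compression ratio

effective_bandwidth_gbs = real_bandwidth_gbs * compression_ratio
print(f"{effective_bandwidth_gbs:.0f} GB/s effective")   # -> 48 GB/s effective
```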

-Colourless
 
mboeller said:
Randell said:
BoardBonobo said:
And do you suspect that nVidia may have implemented a similar form of system, so they may be able to reach that 48GB/sec bandwidth mark?

Well, when the 48GB/sec figure was first mooted, TBR techniques were put forward as how it could be achieved on a 128-bit bus.

Now we have evidence from elsewhere (a surprising source at that) that it can be done.

Do you mean the comment from JohnH and myself about the QBM-style memory interface for a graphics card (with the QBM chip within the 3D chip)?

I meant what Clashman posted just below me.
 
LeStoffer said:
Humus said:
Why would you need all vertex data for a tiler? You can split it up into tiles as you render the triangle.

Yes, but my point is that you cannot start to render the first tile before you have all the vertex data for that frame. Only then can you sort out which tile each rendered triangle goes into (let's call it display lists), and thus you're already one step away from an IMR and closer to a DR.

We're talking about a tiler that's not a deferred renderer, right? What you're describing is a full deferred renderer afaics. A tiler simply means it renders to tiles, rather than say scanlines. You don't need any more information than the actual triangle, no other part of the scene whatsoever, to be able to do that.
 
Humus said:
We're talking about a tiler that's not a deferred renderer, right? What you're describing is a full deferred renderer afaics.

Right. I might be making a tiler more complicated than it really has to be. If so: :oops:

Humus said:
A tiler simply means it renders to tiles, rather than say scanlines. You don't need any more information than the actual triangle, no other part of the scene whatsoever, to be able to do that.

Okay, this might be where I'm all wrong, so please try to follow my drift here: in a normal IMR the pixel pipeline just renders the polygon as it comes along, no matter where in the frame buffer the polygon belongs. Easy enough. But in a tiler architecture I would assume that you render one tile (part of the frame buffer) first, then move on to the next tile, then the next and so forth. If this is true, you cannot finish rendering the first tile before you have rendered all the polygons that belong there – thus you have to wait until all the polygons are ‘in place’ before you can start to render that first tile.

Yes, I realize that I must have misunderstood something important, so please correct me, Humus!
 
Tiled framebuffers (where you split the frame into small tiles and then render a triangle to one tile before proceeding to the next tile, continuing with the same triangle until you have rendered all affected tiles, and only then switch to the next triangle) have been present since the Voodoo1 - AFAIK, all Voodoo, GeForce and Radeon series cards support tiled framebuffers. Which, IMO, has nothing to do with tile-based rendering.
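
To make the distinction concrete, here's a rough Python sketch of the two traversal orders (illustrative only; tiles and triangles are reduced to axis-aligned bounding boxes, which is enough to show the ordering difference):

```python
def overlaps(bbox, tile):
    x0, y0, x1, y1 = bbox
    tx0, ty0, tx1, ty1 = tile
    return x0 < tx1 and x1 > tx0 and y0 < ty1 and y1 > ty0

def tiled_framebuffer_order(tri_bboxes, tiles):
    # "Tiled framebuffer": still immediate-mode. Each triangle is finished
    # in every tile it touches before the next triangle starts.
    return [(i, t) for i, bbox in enumerate(tri_bboxes)
                   for t, tile in enumerate(tiles) if overlaps(bbox, tile)]

def tile_based_order(tri_bboxes, tiles):
    # Tile-based (deferred): bin the whole frame's triangles per tile first,
    # then visit one tile to completion before moving on - which is why you
    # need all the geometry up front.
    bins = [[] for _ in tiles]
    for i, bbox in enumerate(tri_bboxes):
        for t, tile in enumerate(tiles):
            if overlaps(bbox, tile):
                bins[t].append(i)
    return [(i, t) for t, b in enumerate(bins) for i in b]

tiles = [(0, 0, 8, 8), (8, 0, 16, 8)]   # two 8x8 screen tiles
tris  = [(2, 2, 14, 6), (1, 1, 5, 5)]   # triangle bounding boxes
print(tiled_framebuffer_order(tris, tiles))  # [(0, 0), (0, 1), (1, 0)]  triangle-major
print(tile_based_order(tris, tiles))         # [(0, 0), (1, 0), (0, 1)]  tile-major
```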
 
Some sort of deferred rendering would make sense for NV30, since nVidia is touting computational efficiency. If you've got a 1024-instruction pixel shader, things will be way faster if you don't draw that pixel in the first place. No matter how much logic you devote to executing multiple shader ops in parallel per cycle, on long shaders simply not executing the shader at all will trump all the other performance optimizations.
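
A quick back-of-envelope on that (all the numbers below are made up, just to show the scale of the win):

```python
# Made-up numbers, just to show why "don't shade it at all" dominates.
shader_len  = 1024   # instructions per shaded pixel
overdraw    = 2.5    # assumed average opaque overdraw
ops_per_clk = 8      # hypothetical parallel shader ops per cycle

imr_cycles_per_pixel      = shader_len * overdraw / ops_per_clk  # shades hidden pixels too
deferred_cycles_per_pixel = shader_len / ops_per_clk             # shades each visible pixel once

print(imr_cycles_per_pixel, deferred_cycles_per_pixel)   # 320.0 vs 128.0 cycles
```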
 
Speculation mode on...


Assuming NV30 is some sort of deferred rendering architecture, could that possibly shed any light on NVIDIA's intentions for the 16 TMUs?

Would 4*4 be better than 8*2 in a deferred renderer?
 
ATI's documentation claimed their HyperZ III virtually eliminated overdraw using an early Z-test. Are there any significant benefits to tile-based architectures other than overdraw reduction? Because if not, this debate might be moot.
 
The one big benefit to tilers other than overdraw reduction is reduction of framebuffer traffic/memory usage, which are probably unimportant for the performance of 1000-instruction pixel shaders.

On the other hand, how many developers actually make use of front-to-back rendering? Especially given that IMRs are so critically dependent on front-to-back rendering to eliminate overdraw.
 
Personally I think it is partly a myth that fillrate/bandwidth will get less important ... your shading model can only be so complex without any actual information on what is being reflected by a pixel, whether it is being lit, whether it is inside some volume and to what extent, etc. ... and those extra sources of information to be used in those huge pixel shaders have to come from somewhere before they can be used, which will take both fillrate (to produce) and bandwidth (to produce and retrieve). A whole heap of extra storage too, incidentally.

Marco

PS. I don't consider procedural textures all that relevant ...
 
arjan de lumens said:
The one big benefit to tilers other than overdraw reduction is reduction of framebuffer traffic/memory usage, which are probably unimportant for the performance of 1000-instruction pixel shaders.

On the other hand, how many developers actually make use of front-to-back rendering? Especially given that IMRs are so critically dependent on front-to-back rendering to eliminate overdraw.

If you are going to use a very long shader, then with a traditional IMR you can render the depth pass first so that you guarantee an overdraw of one with the complex shader. This can speed things up immensely without the need for front-to-back sorting of subsequent data.

Of course, if the shader generates the depth buffer value, then you would need to run this pass with just the depth-generating part of the shader.

If the whole of the long shader is required to generate the depth value (that's a seriously complicated depth shader!) then you are, unfortunately, SOOL, but so is a tiler, since it would also need to run the whole shader on every triangle to generate the depth (the triangles are no longer necessarily planar in Z)... :LOL:
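
A toy way to count the saving from the depth-only pass (illustrative Python; "fragments" here are just (pixel, depth) pairs, nothing hardware-specific):

```python
# Toy model: count long-shader invocations on an IMR with and without
# a depth-only prepass. Fragments arrive in submission order.
fragments = [
    (0, 0.9), (0, 0.6), (0, 0.3),   # pixel 0: drawn back to front (worst case)
    (1, 0.5), (1, 0.2),
]

# Without a prepass: LESS depth test in submission order, shade every
# fragment that passes at the time it arrives.
zbuf, shaded_no_prepass = {}, 0
for pixel, depth in fragments:
    if depth < zbuf.get(pixel, float("inf")):
        zbuf[pixel] = depth
        shaded_no_prepass += 1

# With a prepass: pass 1 is depth-only (cheap), pass 2 shades only the
# fragments that match the final nearest depth - at most once per pixel.
nearest = {}
for pixel, depth in fragments:
    nearest[pixel] = min(depth, nearest.get(pixel, float("inf")))
shaded_prepass = sum(depth == nearest[pixel] for pixel, depth in fragments)

print(shaded_no_prepass, shaded_prepass)   # -> 5 vs 2 long-shader invocations
```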
 
GetStuff said:
Speculation mode on...


Assuming NV30 is some sort of deferred rendering architecture, could that possibly shed any light on NVIDIA's intentions for the 16 TMUs?

Would 4*4 be better than 8*2 in a deferred renderer?

IIRC it's in fact the opposite - with a TBR architecture it is more likely to be 1 TMU, as multiple textures can be applied in a single pass - Kyro could do 8, I believe. Can't remember the reason why, though I'm sure it's in this site's article on TBRs :)
 
ATI's documentation claimed their HyperZ III virtually eliminated overdraw using an early Z-test.

I haven't seen any evidence that that's true, and personally I very much doubt it.

Are there any significant benefits to tile-based architectures other than overdraw reduction?

Yes, a lot of other advantages; I mentioned a few of them above.
 