X1800/7800GT AA comparisons

Status
Not open for further replies.
I have to say I'm more intrigued to know if the ultra-threaded dispatch processor (or whatever it's called) is programmable and what effects we might see from that.

And what kind of relationship exists between the UTDP and the MC? There must be a fair degree of symbiosis there.

Jawed
 
sireric said:
The MC also looks at the DRAM activity and settings, and since it can "look" into the future for all clients, it can be told different algorithms and parameters to help it decide how to best make use of the available BW.
The MC can look into the future? Could you please provide its algorithms to all elevator manufacturing companies? It would be great if an elevator is already there when you press the button.

Thanks in advance.
 
ferro said:
The MC can look into the future? Could you please provide its algorithms to all elevator manufacturing companies? It would be great if an elevator is already there when you press the button.

Thanks in advance.

They probably could make elevators more efficient using prediction.
 
Jawed said:
I have to say I'm more intrigued to know if the ultra-threaded dispatch processor (or whatever it's called) is programmable and what effects we might see from that.

And what kind of relationship exists between the UTDP and the MC? There must be a fair degree of symbiosis there.

Jawed

The whole thing is one system -- all units depend on each other to work correctly. The MC requires the clients to have lots of latency tolerance so that it can establish a huge number of outstanding requests and pick and choose the best ones to maximize memory bandwidth (massive simplification).

However, texture ends up being an MC client but also has the shader dependent on it. Consequently, if the MC wants high latency, the shader has to be designed to deal with that.

There are two reasonable ways to deal with that. You can have large batches of pixels, in which case you hide the latency of fetches, more or less, just by doing the same thing over and over on many pixels before going to the next thing. This would be an architecture that, say, executes the same pixel shader instruction on thousands of pixels. This works well to hide latency, and is somewhat cheap, area-wise. However, it suffers granularity loss, since it has to work in large batches. This would make for a good SM2-type part.

The new way is to make small batches, but have lots of them. So you execute one instruction on a small batch (say 16 pixels), then switch to another instruction and batch until the data for the first one returns. You need to have lots of live threads in this type of architecture, and you need lots of resources (i.e. area) for it to properly hide latency. But its advantage is that it rules from a granularity standpoint, and branching (a prime feature of SM3) works perfectly. That's what we did for the R5xx. I believe the first architecture is more popular with others.
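The interleaving described here can be sketched with a toy scheduler model. All numbers below (thread counts, fetch latency, instructions per batch) are illustrative only, not R5xx parameters:

```python
# Toy model of latency hiding by interleaving many small batches ("threads").
# Numbers are illustrative, not actual hardware parameters.

def idle_cycles(num_threads, fetch_latency, instrs_per_thread=8):
    """Each thread alternates: issue 1 instruction, then wait on a fetch.
    One instruction issues per cycle if any thread is ready."""
    ready_at = [0] * num_threads               # cycle at which each thread can next issue
    remaining = [instrs_per_thread] * num_threads
    cycle = idle = 0
    while any(r > 0 for r in remaining):
        runnable = [t for t in range(num_threads)
                    if remaining[t] > 0 and ready_at[t] <= cycle]
        if runnable:
            t = runnable[0]                    # issue one instruction for this batch
            remaining[t] -= 1
            ready_at[t] = cycle + fetch_latency  # its next instruction waits on memory
        else:
            idle += 1                          # nothing ready: ALU bubble
        cycle += 1
    return idle

for n in (1, 4, 16, 64):
    print(n, "threads ->", idle_cycles(n, fetch_latency=16), "idle cycles")
```

With one batch in flight the ALU stalls on every fetch; once there are enough live batches to cover the fetch latency, the idle cycles drop to zero -- which is why this style needs lots of on-chip state (area) for thread contexts.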

In the end, the whole thing works together. To achieve high memory bandwidth, you need an efficient memory controller design with windowed requests, and clients capable of dealing with long latencies. We did all this, and made our control units very programmable on top -- since we knew that tuning would be difficult and that we'd need the flexibility to achieve high efficiency (we did not trust that we would get it right with the first set of settings/programs :)). It also allows us to experiment and try new things, so that we'll be more ready for the future.
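The "windowed requests" idea can be illustrated with a toy arbiter that looks over a small window of outstanding requests and prefers ones hitting the currently open DRAM row. The costs and window size are made up for illustration and are not ATI's actual policy:

```python
# Toy DRAM arbiter: given a window of outstanding requests, the controller can
# reorder to favor the currently open row. Cycle costs are illustrative only.

ROW_HIT, ROW_MISS = 1, 5   # hypothetical cost of an open-row hit vs. a row switch

def service_cost(requests, window=1):
    """requests: list of DRAM row ids. window=1 degenerates to FIFO order."""
    pending = list(requests)
    open_row = None
    cost = 0
    while pending:
        view = pending[:window]
        # prefer a request to the open row if one is visible in the window
        hit = next((i for i, r in enumerate(view) if r == open_row), None)
        i = hit if hit is not None else 0
        r = pending.pop(i)
        cost += ROW_HIT if r == open_row else ROW_MISS
        open_row = r
    return cost

reqs = [0, 1, 0, 1, 0, 1, 0, 1]   # two clients ping-ponging between rows
print("FIFO cost:", service_cost(reqs, window=1))
print("windowed cost:", service_cost(reqs, window=4))
```

In FIFO order every request pays a row switch; with a deeper window the arbiter batches same-row requests together. This is also why latency-tolerant clients matter: reordering necessarily delays some requests.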

Edit: corrected some terrible grammar.
 
trinibwoy said:
Are you testing on an XL as well?
By the time I get some spare time to do the XL, R600 will be a $5 part on the low end. I need about 256 hours in a day, in both directions. Or something.
 
ERK said:
BTW, my boss would NEVER let me talk about that kind of info. :oops:

Dunno. Nobody's fired me yet. Perhaps tomorrow.

However, our plan is to push to open up the architectures and make things a little more visible, a la CPUs. We've been working on the GPGPU aspect of things, which can be applied to physics as well; and they want open, low-level access. The idea is not to create yet another API for physics, and two for graphics, and another for something else, etc. -- we'd rather have a thin layer (required for cross-generation compatibility) through which lots of things, be they physics or DNA matching or whatever, can run -- to access the parallel computer sitting there. But to make a thin layer like that, you need to open up your architectures a lot more.

Though, some of the GPUBench stuff will tell you all that I've told you, and lots more. It can tell you batch sizes, how you deal with GPRs, ALUs, etc...
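One plausible way a microbenchmark suite can infer batch size is to measure the cost of a branch whose outcome alternates with some spatial period: once the pattern is finer than the batch, every batch pays for both paths. A toy model of that effect (not GPUBench's actual code; all numbers are illustrative):

```python
# Toy illustration of inferring SIMD batch size from branch divergence cost.
# A batch executes both branch paths if its pixels disagree (predicated SIMD).

def shading_cost(num_pixels, batch, period, path_cost=10):
    """Pixels take branch A or B depending on (x // period) % 2."""
    cost = 0
    for start in range(0, num_pixels, batch):
        paths = {(x // period) % 2
                 for x in range(start, min(start + batch, num_pixels))}
        cost += len(paths) * path_cost   # a divergent batch pays for both paths
    return cost

# Cost doubles once the branch pattern becomes finer than the batch width:
for period in (64, 32, 16, 8):
    print("period", period, "-> cost", shading_cost(1024, batch=16, period=period))
```

The period at which the measured cost jumps brackets the hardware's batch size -- the "granularity loss" of the large-batch design in concrete form.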
 
The real difference is in the AF to me. ATi's AF looks MUCH better in those screenies from the [H] review. It's not even close.
 
fallguy said:
The real difference is in the AF to me. ATi's AF looks MUCH better in those screenies from the [H] review. It's not even close.

I don't know if it's *the* real difference, but I'm happy I got one of my key guys (Tony the texture god) to redesign the AF. We listened (did not always agree) to the criticisms, and we wanted to improve that specifically. Personally, I always run with AA & AF, so I like the quality. I'm glad that the driver guys agreed with me, and pushed to expose this to users for launch.
 
sireric said:
I don't know if it's *the* real difference, but I'm happy I got one of my key guys (Tony the texture god) to redesign the AF. We listened (did not always agree) to the criticisms, and we wanted to improve that specifically. Personally, I always run with AA & AF, so I like the quality. I'm glad that the driver guys agreed with me, and pushed to expose this to users for launch.

So the "pure" AF isn't only software -- there were some hardware redesign elements to that vs. R3/4? Much of an "ouchie" transistor-wise, or a "good" that mainly flowed from other elements like the MC and threader? What a cool business card -- "Tony the Texture God". :LOL:

Edit: What I'm after is the hardware elements, if any, behind why there appears to be such a small performance hit for the "rotation independent" AF -- congrats on that, btw. I'd have been happy if y'all had provided opt-free AF with a bigger hit as an option; what came out on that score was better than I'd hoped, even when I thought I was being Pollyanna-ish.
 
sireric said:
Dunno. Nobody's fired me yet. Perhaps tomorrow.

However, our plan is to push to open up the architectures and make things a little more visible, a la CPUs. We've been working on the GPGPU aspect of things, which can be applied to physics as well; and they want open, low-level access. The idea is not to create yet another API for physics, and two for graphics, and another for something else, etc. -- we'd rather have a thin layer (required for cross-generation compatibility) through which lots of things, be they physics or DNA matching or whatever, can run -- to access the parallel computer sitting there. But to make a thin layer like that, you need to open up your architectures a lot more.

Though, some of the GPUBench stuff will tell you all that I've told you, and lots more. It can tell you batch sizes, how you deal with GPRs, ALUs, etc...

We heard that a feature called scatter for random memory access was added, what other features have been added for general purpose processing?
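For readers unfamiliar with the terms: gather reads from computed addresses (which pixel shaders already do via dependent texture fetches), while scatter writes to computed addresses -- what a histogram, sort, or sparse update needs. A minimal sketch of the distinction (illustrative only, not ATI's interface):

```python
# Gather vs. scatter in one screen: the difference is which side of the
# assignment the computed index lands on.

def gather(src, indices):
    """out[k] = src[indices[k]] -- read from computed addresses."""
    return [src[i] for i in indices]

def scatter(dst, indices, values):
    """dst[indices[k]] = values[k] -- write to computed addresses."""
    for i, v in zip(indices, values):
        dst[i] = v
    return dst

data = [10, 20, 30, 40]
print(gather(data, [3, 0, 2]))                 # [40, 10, 30]
print(scatter([0] * 4, [3, 0, 2], [1, 2, 3]))  # [2, 0, 3, 1]
```

Gather maps naturally onto texture units; scatter is the harder one for a GPU because write addresses are no longer fixed by pixel position.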
 
fallguy said:
The real difference is in the AF to me. ATi's AF looks MUCH better in those screenies from the [H] review. It's not even close.
Well, duh, because they focused on off-angle surfaces. It really upsets me that nobody focused on off-angle surfaces back when the Radeon 9700 was released and the GeForce4 Ti cards were still beating the pants off of it in anisotropic filtering quality.
 
Chalnoth said:
Well, duh, because they focused on off-angle surfaces. It really upsets me that nobody focused on off-angle surfaces back when the Radeon 9700 was released and the GeForce4 Ti cards were still beating the pants off of it in anisotropic filtering quality.

Go halvsies on therapy with an anger management specialist? I'll do the first 1/2 hour on NV PR and you can have the second 1/2 hr on the unfairness of the history of AF quality.
 
Isn't the danger of opening up low-level GPU access a reduction of abstraction, and therefore less freedom of GPU implementation technique in the future? Do we really want developers depending on low-level details of GPU implementations that should be subject to change, and will not always be relevant to rendering?


Rather than expose internal GPU workings, I think the better approach is to expose high-level APIs for physics and certain problems in the GPGPU space, and then let the driver do the translation work if it can. But exposing the GPU as a general-purpose computation device, and promoting GPGPU performance in PR, is dangerous, I think. GPGPU performance should be secondary to rendering performance, and should not come at its expense.
 
A small related question: besides the X1000 (R520-based) series of hardware, what other hardware has the programmable memory controller? (I'm sure it probably won't get the same performance boosts the X1000 series will get.)

It's fascinating how this seems to be a key thing ATI continues to work on... efficiency.
 
Deathlike2 said:
A small related question.. besides the X1000 (R520 based) series of hardware, what other hardware has the programmable memory controller? (I'm sure that it probably won't get the same amount of performance boosts the X1000 series will get)

The X800/R420 series has a programmable memory controller, but it's not as programmable or expansive as the X1000 series'.
 
ferro said:
The MC can look into the future? Could you please provide its algorithms to all elevator manufacturing companies? It would be great if an elevator is already there when you press the button.

Thanks in advance.

Sure, if you don't mind stopping your internal ticking the moment you press the button and entering "elevator time", where you "tick" based on the availability of the elevator. And having 511 frozen people, all with a finger on the button, may be a problem too, when you try to press it to enter elevator time.

Now seriously: may I press the question about Linux support asked a few pages ago? IIRC, I read many moons ago that the Linux driver code tree wasn't quite based on the Windows code -- is this right?
 