Recent Radeon X1K Memory Controller Improvements in OpenGL with AA

trinibwoy said:
Are you testing on an XL as well?
By the time I get some spare time to do an XL, R600 will be a $5 part on the low end. I need about 256 hours in a day, in both directions. Or something.
 
ERK said:
BTW, my boss would NEVER let me talk about that kind of info. :oops:

Dunno. Nobody's fired me yet. Perhaps tomorrow.

However, our plan is to push to open up the architectures and make things a little more visible, a la CPU. We've been working the GPGPU aspect of things, which can be applied to physics as well; and they want open, low level access. The idea is not to create yet another API for physics, and 2 for graphics, and another for something else, etc... -- We'd rather have a thin layer (required for cross generation compatibility) where lots of things, be they physics or DNA matching or whatever, can be used -- to access the parallel computer sitting there. But to make a thin layer like that, you need to open up your architectures a lot more.

Though, some of the GPUBench stuff will tell you all that I've told you, and lots more. It can tell you batch sizes, how you deal with GPRs, ALUs, etc...
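As a thought experiment, the "thin layer" idea above might look something like this: a single generic dispatch entry point shared by any data-parallel workload, rather than one API per problem domain. Everything here (class and method names included) is invented for illustration and is not ATI's actual interface:

```python
# Hypothetical thin compute layer: one dispatch entry point for any
# data-parallel workload, instead of one API per problem domain.
class ComputeDevice:
    def dispatch(self, kernel, streams):
        # A real driver would compile the kernel for the current GPU
        # generation (the "cross generation compatibility" part);
        # here we just map it over the input streams on the CPU.
        return [kernel(*args) for args in zip(*streams)]

dev = ComputeDevice()

# Physics and sequence matching share the same entry point:
forces = dev.dispatch(lambda m, a: m * a, [[1.0, 2.0], [9.8, 9.8]])
matches = dev.dispatch(lambda a, b: a == b, [list("GATC"), list("GTTC")])
print(forces)   # [9.8, 19.6]
print(matches)  # [True, False, True, True]
```

The point of the sketch is only that the layer stays thin: it abstracts the hardware generation, not the problem domain.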
 
The real difference is in the AF to me. ATi's AF looks MUCH better, in those screenies from the [H] review. It's not even close.
 
fallguy said:
The real difference is in the AF to me. ATi's AF looks MUCH better, in those screenies from the [H] review. It's not even close.

I don't know if it's *the* real difference, but I'm happy I got one of my key guys (Tony the texture god) to redesign the AF. We listened (did not always agree) to the criticisms, and we wanted to improve that specifically. Personally, I always run with AA & AF, so I like the quality. I'm glad that the driver guys agreed with me, and pushed to expose this to users for launch.
 
sireric said:
I don't know if it's *the* real difference, but I'm happy I got one of my key guys (Tony the texture god) to redesign the AF. We listened (did not always agree) to the criticisms, and we wanted to improve that specifically. Personally, I always run with AA & AF, so I like the quality. I'm glad that the driver guys agreed with me, and pushed to expose this to users for launch.

So the "pure" AF isn't only software --there were some hardware redesign elements to that vs R3/4? Much of an "ouchie" transistor-wise, or a "good" that mainly flowed from other elements like the MC and threader? What a cool business card --"Tony the Texture God". :LOL:

Edit: What I'm after is the hardware elements, if any, specific to why there appears to be such a small performance hit for the "rotation independent" AF --congrats on that, btw --I'd have been happy if y'all had provided opt-free with a bigger hit as an option; what came out on that score was better than I'd hoped when I thought I was being pollyanna-ish.
 
sireric said:
Dunno. Nobody's fired me yet. Perhaps tomorrow.

However, our plan is to push to open up the architectures and make things a little more visible, a la CPU. We've been working the GPGPU aspect of things, which can be applied to physics as well; and they want open, low level access. The idea is not to create yet another API for physics, and 2 for graphics, and another for something else, etc... -- We'd rather have a thin layer (required for cross generation compatibility) where lots of things, be they physics or DNA matching or whatever, can be used -- to access the parallel computer sitting there. But to make a thin layer like that, you need to open up your architectures a lot more.

Though, some of the GPUBench stuff will tell you all that I've told you, and lots more. It can tell you batch sizes, how you deal with GPRs, ALUs, etc...

We heard that a feature called scatter for random memory access was added, what other features have been added for general purpose processing?
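For context on the question: "gather" (computed reads) is what fragment shaders could already do via dependent texture fetches, while "scatter" (computed writes) is the new capability being asked about. A minimal CPU-side sketch of the two access patterns, purely illustrative:

```python
# Gather: output position is fixed, read address is computed per element.
# Fragment shaders could already do this via dependent texture fetches.
def gather(src, indices):
    return [src[i] for i in indices]

# Scatter: write address is computed per element. Without hardware
# scatter, GPGPU code had to emulate it with extra passes (e.g. by
# rendering points at computed positions).
def scatter(values, indices, size, fill=0):
    dst = [fill] * size
    for v, i in zip(values, indices):
        dst[i] = v
    return dst

print(gather([10, 20, 30, 40], [3, 0, 2]))   # [40, 10, 30]
print(scatter([10, 20, 30], [2, 0, 3], 4))   # [20, 0, 10, 30]
```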
 
fallguy said:
The real difference is in the AF to me. ATi's AF looks MUCH better, in those screenies from the [H] review. It's not even close.
Well, duh, because they focused on off-angle surfaces. It really upsets me that nobody focused on off-angle surfaces back when the Radeon 9700 was released and the GeForce4 Ti cards were still beating the pants off of it in anisotropic filtering quality.
 
Chalnoth said:
Well, duh, because they focused on off-angle surfaces. It really upsets me that nobody focused on off-angle surfaces back when the Radeon 9700 was released and the GeForce4 Ti cards were still beating the pants off of it in anisotropic filtering quality.

Go halvsies on therapy with an anger management specialist? I'll do the first 1/2 hour on NV PR and you can have the second 1/2 hr on the unfairness of the history of AF quality.
 
Isn't the danger of opening up low level GPU access a reduction of abstraction, and therefore less freedom of GPU implementation technique in the future? Do we really want developers depending on low level details of GPU implementation that should be subject to change, and will not always be relevant to rendering?


Rather than expose internal GPU workings, I think the better approach is to expose high level APIs for Physics, and certain problems in GPGPU space and then let the driver do the translation work if it can. But exposing the GPU as a general purpose computation device, and promoting performance on GPGPU in PR I think is dangerous. GPGPU performance should be secondary to rendering performance, and should not come at its expense.
 
A small related question.. besides the X1000 (R520 based) series of hardware, what other hardware has the programmable memory controller? (I'm sure that it probably won't get the same amount of performance boosts the X1000 series will get)

It's fascinating how this seems to be a key thing ATI continues to work on... efficiency.
 
Deathlike2 said:
A small related question.. besides the X1000 (R520 based) series of hardware, what other hardware has the programmable memory controller? (I'm sure that it probably won't get the same amount of performance boosts the X1000 series will get)

The X800/R420 series has a programmable memory controller but it's not as programmable or expansive as the X1000 series' programmable memory controller.
 
ferro said:
The MC can look into the future? Could you please provide its algorithms to all elevator manufacturing companies? It would be great if an elevator is already there when you press the button.

Thanks in advance.

Sure, if you don't mind stopping your internal ticking the moment you press the button and entering "elevator time", where you "tick" based on the availability of the elevator. And having 511 frozen people all with a finger on the button may be a problem, too, when you try to press it to enter elevator time.

Now seriously: may I insist on the question about Linux support of this asked a few pages ago? IIRC I read many moons ago that the Linux driver code tree wasn't quite based on the Windows code -- is this right?
 
I should've thought through my response more.

There's a lot more "forward thinking" going on with ATI.

There obviously was a reason to have a programmable memory controller. Given that faster memory is always expensive and in demand, there needed to be a way to get the most out of what you have. Ideally you'd have a fixed, fully optimized (but non-programmable) memory controller that got the most out of your transistors. The reality is that a programmable controller lets you pick up general gains through analysis (probably not as good as a dedicated solution for any one case, but better overall).

Besides competition driving innovation, it's not surprising that after NVidia introduced transparency AA, ATI had something to respond with... which even extends to an older generation, because of what they thought was a good idea back then. I'm not saying NVidia is doing a terrible job, but looking at the changes between ATI's and NVidia's individual architectures, someone is doing a better job of researching ways to get more out of hardware. NVidia's transparency AA is great, no doubt about it. However, it's fixed in hardware, unlike ATI's programmable AA, where you can do many more things with it (it might not be as fast as dedicated hardware, but it can be done, and still be usable).

You could say it's similar to writing a better driver to benefit all prior hardware... but it's also a case of adding new driver features based on an architectural design that lets you go beyond the limitations of previous architectures.
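The fixed-versus-programmable transparency AA point can be illustrated with a toy model: instead of a single binary alpha test per pixel, test alpha at several sample positions and shade by the fraction that pass. The sample counts and positions below are made up; a "programmable" scheme is simply one where a profile could change them:

```python
# Toy transparency AA: rather than one binary alpha test per pixel,
# test alpha at several sample positions inside the pixel and shade
# by the fraction that pass, smoothing alpha-tested edges (fences,
# foliage). A programmable scheme could vary the count/positions.
def coverage(alpha_at, samples, threshold=0.5):
    hits = sum(1 for s in samples if alpha_at(s) >= threshold)
    return hits / len(samples)

alpha = lambda x: x  # pretend alpha ramps across the pixel (an edge)

print(coverage(alpha, [0.5]))                         # 1.0 -> hard edge
print(coverage(alpha, [0.125, 0.375, 0.625, 0.875]))  # 0.5 -> blended edge
```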
 
DemoCoder said:
Isn't the danger of opening up low level GPU access a reduction of abstraction, and therefore less freedom of GPU implementation technique in the future? Do we really want developers depending on low level details of GPU implementation that should be subject to change, and will not always be relevant to rendering?


Rather than expose internal GPU workings, I think the better approach is to expose high level APIs for Physics, and certain problems in GPGPU space and then let the driver do the translation work if it can. But exposing the GPU as a general purpose computation device, and promoting performance on GPGPU in PR I think is dangerous. GPGPU performance should be secondary to rendering performance, and should not come at its expense.

I'm not saying replace the gfx APIs -- just trying to limit the proliferation of new ones. What if the physics API doesn't allow for all physical phenomena to be done? Do you create a new API for that? What if signal processing wants to be done and you only have collision hooks?

At the end, I fear the same thing regarding low level of detail. But I fear the extreme work in having lots of new specialized APIs too. I'd like a reasonably low level API that allows more "to the metal" performance, but that abstracts some of the quirks of programming a given architecture. I don't really know the answer either. It's a new area we are continuing to explore, but we are listening and talking to that community.
 
BRiT said:
The X800/R420 series has a programmable memory controller but it's not as programmable or expansive as the X1000 series' programmable memory controller.

Yes it does. And we've done quite a bit of tuning on our X700/x800 products for that MC. But, most of all, we've learned a lot and used that knowledge as a foundation for a new design.
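The details of the MC's programmability aren't public, but the general idea of per-application tuning of arbitration between competing clients can be sketched with a toy weighted arbiter (the client names and weights below are invented; they stand in for whatever parameters a real driver profile would tune):

```python
# Toy model of a tunable memory-controller arbiter: several clients
# (texture fetch, color write, z, ...) compete for memory cycles,
# and a per-application profile reweights who wins.
def arbitrate(requests, weights):
    """requests: {client: pending-request count}; weights: {client: priority}.
    Grant the client whose backlog, scaled by its weight, is largest."""
    return max(requests, key=lambda c: requests[c] * weights[c])

requests = {"texture": 6, "color": 3, "z": 5}
default_profile = {"texture": 1.0, "color": 1.0, "z": 1.0}
z_heavy_profile = {"texture": 1.0, "color": 1.0, "z": 1.5}

print(arbitrate(requests, default_profile))  # texture (deepest queue)
print(arbitrate(requests, z_heavy_profile))  # z (boosted by the profile)
```

Per-game profiles shipped in the driver would then amount to shipping better weight sets as workloads are analyzed, without touching the hardware.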
 
So if you are going more low level, then would it be safe to assume that you will make available utilities to query the new performance info and, more importantly, create your own profiles to tweak the MC and other new features? Or will sophisticated profile creation be an ATI in-house only thing?
 
sireric said:
I'm not saying replace the gfx APIs -- just trying to limit the proliferation of new ones. What if the physics API doesn't allow for all physical phenomena to be done? Do you create a new API for that? What if signal processing wants to be done and you only have collision hooks?

Well, it's a valid concern, but one you encounter when designing any API, which is why APIs evolve over time. Almost by definition, APIs present a restricted interface to an underlying resource that limits the way it can be manipulated. This applies not just to device drivers/graphics APIs, but also to TCP/IP sockets, databases, and many other successful APIs.

I think it's much easier to handle the case of revising APIs over time than the case of legacy software tied to a "to the metal" API, which imposes severe constraints on GPU IHVs. Since GPU production is much more expensive than software, I'd rather have a proliferation of API versions than GPUs locked into a low level API which ties their hands because software depends on it.

At the end, I fear the same thing regarding low level of detail. But I fear the extreme work in having lots of new specialized APIs too. I'd like a reasonably low level API that allows more "to the metal" performance, but that abstracts some of the quirks of programming a given architecture. I don't really know the answer either. It's a new area we are continuing to explore, but we are listening and talking to that community.

I fear the requirement that GPUs have to run physics, and therefore, we must modify APIs to make them "closer to the metal" so that any conceivable calculation can be implemented. I'd rather keep the GPU paradigm to rendering and stream based calculations and leave it at that. If it turns out that Physics doesn't run well under DX9/DX10, well, then so be it. I care about the use case that rendering runs well.
 
Here are some Doom 3 results for the new patch on an X1800 XL running our "Turkey Baster" test in Ultra quality:

Code:
X1800 XL   640x480  800x600  1024x768  1280x1024  1600x1200
4x         130.7    109.7    83.7      58.3       42.5
4x (Patch) 135.0    119.2    92.6      67.0       49.9
% Increase 3.3%     8.7%     10.6%     14.9%      17.4%
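The "% Increase" row follows directly from the two frame-rate rows; a quick check:

```python
# Recompute the "% Increase" row from the two frame-rate rows above.
base    = [130.7, 109.7, 83.7, 58.3, 42.5]   # 4x
patched = [135.0, 119.2, 92.6, 67.0, 49.9]   # 4x (Patch)

gains = [round((p - b) / b * 100, 1) for b, p in zip(base, patched)]
print(gains)  # [3.3, 8.7, 10.6, 14.9, 17.4]
```

Note how the gain grows with resolution, consistent with the workload becoming more memory-bandwidth bound.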
 