"Slowing Down the Process Migration"

Yep, well, the physics demands that things slow down. The absolute limit for these processes is around 10 nm, and the slowdown comes from approaching that limit: there will have to be a whole lot of engineering work to get process sizes much closer to 10 nm than they currently are.

Anyway, since graphics cards have so much higher densities of functional units than CPUs, I really have a hard time believing they can ever approach CPU frequencies. But we'll see. With process technologies slowing down, nVidia and ATI will need to seek out other means to improve performance.
 
Well, I seriously doubt we will see 3.4 GHz graphics on the 130 nm process, but I think that 800 MHz for a high end part is not beyond the realm of plausibility.

Chalnoth is correct though: GPUs are more about functional units and small amounts of cache, while CPUs are all about cache around a single functional unit (in gross terms, of course). Still, even with the amounts of cache the P4 and A64 have, it is not an easy task to get memory banks to run that fast (unless I am totally smoking crack here).

The Intrinsity Fast-14 stuff looks really interesting though; if it gives even half of the performance it promises, these companies could really put out a screaming chip.
 
Nice write-up, Josh. I'd like to see what ATi makes of Intrinsity's work, too.

One Q regarding this quote, though:

Many think that the max feasible transistor count for a mass produced product is around 250 million transistors for the 130 nm process. This does put a ceiling on the number of transistors engineers can reasonably include in any design. This will make engineers look for more efficient means to do more with fewer transistors. A current example of this is what ATI has done with their anisotropic filtering algorithms (which decrease the number of transistors needed to adequately filter the scene).
Does ATi actually save transistors with their angle-dependent AF, or just speed (meaning texture accesses)? If anything, wouldn't incorporating this angle-dependency in hardware require perhaps a handful more transistors beyond the standard TMUs? Similarly, didn't ATi's RV350+ MIP-map filtering "optimisations" require extra transistors beyond what R300-R360 have for texturing, rather than swapping some TMU logic to save processing time?

I always thought taking intelligent short-cuts would require adding extra transistors to a working GPU, which is why it took nV so (relatively) long to implement AF optimisations.

An obvious tangent Q (for a n00b like me) would be: are bilinear and trilinear hardwired into the GPU, or are the two processes/access patterns simply specified in software?
 
I'm not sure, Pete.

Perhaps they mean they can save the transistors they would have used to make aniso faster by using this tech?
 
Sort of like going from one-cycle tri to two-cycle tri from NV10 to NV15, a half-hardware and half-software compromise? Meaning, rather than implementing one-cycle tri or faster AF solely in hardware, you break it down into smaller, repeatable chunks.

If AF becomes standard going forward, though, it seems both ATi and nV will eventually have to dedicate more transistors to it. The same appears true of AA, though perhaps Longhorn's rumored 200dpi display support would push AA down the "must-have" list.
 
Pete said:
Does ATi actually save transistors with their angle-dependent AF, or just speed (meaning texture accesses)?
Yep. The algorithm that leads to the angle dependency is a rougher approximation to the proper formula than was used by nVidia with the NV2x and NV3x. The angle dependency is a byproduct of saving transistors, basically.

The performance increase is also a byproduct. In fact, you would get a performance increase over the proper formula from any approximation, since any approximation will sometimes lead to sampling less (and thus to a lower LOD) than the proper formula, but never more (since that would waste performance with little to no improvement in image quality).
 
Pete said:
If AF becomes standard going forward, though, it seems both ATi and nV will eventually have to dedicate more transistors to it. The same appears true of AA, though perhaps Longhorn's rumored 200dpi display support would push AA down the "must-have" list.
Not especially. As time moves forward, I expect texture accesses to be less and less frequent with respect to math operations. As long as we don't see drastically higher capacity for math ops than texture accesses going forward, anisotropic filtering performance will sort itself out.
 
From what I have been told, NVIDIA did use a lot of extra "hardware" to do the better filtering on the GeForce FX line and have it perform as fast as it did. ATI on the other hand used a less precise way of doing filtering, and used an algorithm that will still make the scene look good. So from this standpoint it isn't that ATI put aside a larger transistor budget to optimise the output, but rather they were doing less work on the scene, so the performance was faster, and they didn't need as many transistors to get the work done.

I do not know the exact details, so this is something that you would have to hit up SirEric on!
 
Chalnoth said:
Pete said:
If AF becomes standard going forward, though, it seems both ATi and nV will eventually have to dedicate more transistors to it. The same appears true of AA, though perhaps Longhorn's rumored 200dpi display support would push AA down the "must-have" list.
Not especially. As time moves forward, I expect texture accesses to be less and less frequent with respect to math operations. As long as we don't see drastically higher capacity for math ops than texture accesses going forward, anisotropic filtering performance will sort itself out.


I don't think we need much more than what we currently have in terms of fillrate. With the high-end cards we can already turn on high AF and FSAA levels at larger resolutions.

I think the next few gens are going to concentrate on shader power.

Then hopefully display tech will take another jump forward and they will start up with the fillrate again.


Is that what you're thinking too?
 
I'm thinking there hasn't been anything particularly interesting in the display market, and it is high time we got some nice advances. While LCDs have gone down in price and are becoming faster and sharper, they still have a long way to go to match CRTs in most aspects.

The Viewsonic P225F that I use was designed and first introduced over 3 years ago, and it is still one of the top monitors out there. I think it is definitely high time that we get some real advances in the CRT/LCD market! With this latest generation of graphics cards, playing at 1600x1200 is now very common. My wish would be a monitor that had a max of 4000x3000 rez (so you really don't need to use AA anymore...).
 
JoshMST said:
My wish would be a monitor that had a max of 4000x3000 rez (so you really don't need to use AA anymore...).

Well, stupidly high-res LCDs *are* available of course (i.e. 3k x 2k), but they are also stupidly expensive, and of course a single DVI channel doesn't have enough bandwidth to drive them (thereby requiring multiple DVI outputs on graphics cards).
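Just to put rough numbers on the bandwidth point (a back-of-the-envelope sketch; the 165 MHz single-link pixel clock limit and the ~20% blanking overhead are assumptions for illustration, Python used purely as a calculator):

```python
def required_pixel_clock_mhz(width, height, refresh_hz, blanking_overhead=0.20):
    """Approximate pixel clock (MHz) needed to drive a display mode."""
    return width * height * refresh_hz * (1.0 + blanking_overhead) / 1e6

SINGLE_LINK_DVI_MHZ = 165.0  # assumed single-link TMDS pixel clock ceiling

for mode in [(1600, 1200, 60), (3072, 2048, 60)]:
    clock = required_pixel_clock_mhz(*mode)
    print(f"{mode[0]}x{mode[1]}@{mode[2]}Hz: ~{clock:.0f} MHz pixel clock, "
          f"~{clock / SINGLE_LINK_DVI_MHZ:.1f}x a single DVI link")
```

1600x1200@60 squeaks under a single link, while a 3k x 2k panel at 60 Hz would need roughly three links' worth of bandwidth.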

The problem with LCDs is the manufacturing process and associated yields, isn't it? Yield falls roughly as 1/(number of pixels), so unless people are willing to accept more dead/bright pixels, the price will climb accordingly (all other factors ignored!).
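One toy way to model that (the per-subpixel defect probability here is purely made up, just to show how fast the odds of a perfect panel drop as the pixel count grows):

```python
import math

def clean_panel_probability(width, height, subpixels=3, p_defect=1e-7):
    """Chance that every subpixel on the panel is defect-free."""
    n = width * height * subpixels
    return math.exp(-p_defect * n)   # (1 - p)^n is ~exp(-p*n) for tiny p

for w, h in [(1280, 1024), (1600, 1200), (3072, 2048)]:
    print(f"{w}x{h}: P(zero dead subpixels) ~ {clean_panel_probability(w, h):.1%}")
```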

Add to that the fact that CRTs are a very old technology, so manufacturers have had decades more experience in bulk manufacture than they have with LCDs, and it's not hard to see why LCDs are so far behind.

Some of the new manufacturing processes I've read about look interesting, e.g. inkjet-printing of LCDs, etc.

What I wonder is whether LCD will *ever* reach the market penetration currently enjoyed by CRT before it's replaced by something else (e.g. OLED), or whether it's destined to have been a "nearly" technology.
 
Chalnoth said:
Yep. The algorithm that leads to the angle dependency is a rougher approximation to the proper formula than was used by nVidia with the NV2x and NV3x. The angle dependency is a byproduct of saving transistors, basically.

The performance increase is also a byproduct. In fact, you would get a performance increase over the proper formula from any approximation, since any approximation will sometimes lead to sampling less (and thus to a lower LOD) than the proper formula, but never more (since that would waste performance with little to no improvement in image quality).

Heh, I'm curious what you mean by 'proper formula', since there isn't any official formula for anisotropic filtering the way there is for trilinear filtering. Really, to me it seems the proper formula would be a true integration, and all feasible anisotropic filtering methods are just various "let's sample more" schemes. But I sure wouldn't call any particular one proper, since there are several different ways of deciding where those extra samples should go (hell, I'm all for a method that uses Monte Carlo integration with Perlin noise for choosing the sampling).

It seemed to me ATI's method revolved around having several different fixed sampling patterns for different angles. Then whichever angle is closest gets chosen, and those samplings are used instead of computing out the "proper" sampling coordinates. So it should definitely result in a transistor saving over doing the math to compute the sampling positions.
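Something like this rough sketch of the idea, anyway (all of it speculation on my part about what the hardware actually does; the 8-direction table and the footprint model are just for illustration):

```python
import math

# Hypothetical table of 8 fixed sampling directions (0, 22.5, 45, ... degrees).
FIXED_ANGLES = [i * math.pi / 8 for i in range(8)]

def exact_major_axis(dudx, dvdx, dudy, dvdy):
    """'Proper' approach: take the longer of the two screen-space texture
    gradients as the line along which the extra aniso samples are placed."""
    px = math.hypot(dudx, dvdx)
    py = math.hypot(dudy, dvdy)
    return (dudx, dvdx) if px >= py else (dudy, dvdy)

def table_major_axis(dudx, dvdx, dudy, dvdy):
    """Cheaper variant: snap the major axis to the nearest precomputed
    direction, so the sample offsets can come straight out of a small table.
    (Hardware would presumably use simple component comparisons rather than
    the atan2 used here just to keep the sketch short.)"""
    ax, ay = exact_major_axis(dudx, dvdx, dudy, dvdy)
    angle = math.atan2(ay, ax) % math.pi
    snapped = min(FIXED_ANGLES,
                  key=lambda a: min(abs(a - angle), math.pi - abs(a - angle)))
    length = math.hypot(ax, ay)
    return (length * math.cos(snapped), length * math.sin(snapped))
```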
 
Cryect said:
Heh, I'm curious what you mean by 'proper formula', since there isn't any official formula for anisotropic filtering the way there is for trilinear filtering.
Yes, there is. There is a proper formula for the degree of anisotropy to use. What is up in the air for anisotropic filtering is how many samples to take and in what pattern to take them. But the degree is calculated by a simple formula that is almost identical to the MIP map degree selection algorithm.

Just look up the anisotropic filtering extension in OpenGL to see what I'm talking about.
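Roughly, the degree selection there boils down to something like the following (paraphrasing from memory, so check the actual EXT_texture_filter_anisotropic spec text for the authoritative equations; Python just for illustration):

```python
import math

def aniso_degree_and_lod(dudx, dvdx, dudy, dvdy, max_aniso=16):
    """Degree of anisotropy and the adjusted LOD, as I recall the extension
    describing them."""
    px = math.hypot(dudx, dvdx)              # footprint extent along screen x
    py = math.hypot(dudy, dvdy)              # footprint extent along screen y
    p_max, p_min = max(px, py), min(px, py)
    n = min(math.ceil(p_max / max(p_min, 1e-12)), max_aniso)  # samples to take
    lod = math.log2(max(p_max / n, 1e-12))   # same log2 as MIP selection, but
    return n, lod                            # divided by the sample count
```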
 
Chalnoth said:
Cryect said:
Heh, I'm curious what you mean by 'proper formula', since there isn't any official formula for anisotropic filtering the way there is for trilinear filtering.
Yes, there is. There is a proper formula for the degree of anisotropy to use. What is up in the air for anisotropic filtering is how many samples to take and in what pattern to take them. But the degree is calculated by a simple formula that is almost identical to the MIP map degree selection algorithm.

Just look up the anisotropic filtering extension in OpenGL to see what I'm talking about.

What am I looking for?
 
Well, in the LCD vein, would it ever be possible to do transplants? Like, patch panels together so that even if there are a few bad pixels you can just put in another section.

I mean, I think this would be a worthwhile expenditure of research assets.
 
I think that ATI/nVidia must start customizing their transistors and cells more, and they are probably doing this because they see the problems appearing with smaller nodes.
I remember reading the speculation about the R300 before it was launched.
Statements like "110M transistors over 250 MHz is impossible on 150 nm." The R360 overclocks in most cases to over 430 MHz on 150 nm :oops:
 
Yes, ATI did an amazing job with their R3x0 series of chips, and it also really showed how solid TSMC's 150 nm process was. Their overall execution of the R300 and later chips really forced many to look at NVIDIA and consider, "What exactly were you guys thinking?"
 