What gives ATI's graphics architecture its advantages?

no-X · Apr 5, 2010

larrabee said:
Why isn't ATI a lot faster than Nvidia with their huge advantage in shading power, texturing, and fillrate? From an architectural perspective, Nvidia's shaders are way behind, but in games like Crysis or Stalker they do fine. Something's not adding up.

Bandwidth isn't better, and the number of ROPs is lower, too. VRAM capacity can also be limiting in specific situations.

GZ007 · Apr 5, 2010

larrabee said:
Why isn't ATI a lot faster than Nvidia with their huge advantage in shading power, texturing, and fillrate? From an architectural perspective, Nvidia's shaders are way behind, but in games like Crysis or Stalker they do fine. Something's not adding up.

They have quite an advantage if you consider 50% fewer transistors and 60% power consumption.

trinibwoy · Apr 5, 2010

GZ007 said:
They have quite an advantage if you consider 50% fewer transistors and 60% power consumption.

Fewer transistors and lower power consumption don't explain why they aren't as fast as the theoreticals would imply.

Xenus · Apr 5, 2010

Indeed, one would say that AMD should be looking at their ALUs and the rest of the chip to see what is eating the huge advantage they have there, since it fails completely to show up in all but a few mostly theoretical test cases.

Dave Baumann · Apr 5, 2010

Rarely are you entirely bound by any one thing, so the question is what brings the bang for the buck. While you may say that it "rarely shows up", you should also consider that it is enabling these higher performances relative to die size and power by minimising the parts of the rendering that are bound by these elements (i.e. the bang for the buck is a very big win in this area).

trinibwoy · Apr 5, 2010

Theoretically yes, but given the measured performance it means that Nvidia's stuff is significantly faster at some other aspect of the workload in order to arrive at performance parity in the end. If we assume that Cypress benefits proportionately from its flops advantage, where does Fermi catch up? Wouldn't it be easier to assume that Fermi only catches up because raw flops don't account for much of the workload hence Cypress doesn't run away with the crown?
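The point about raw flops only covering part of the workload can be put into numbers with an Amdahl-style estimate. This is only a rough sketch: the function name and the example fractions are made up for illustration and are not measurements of Cypress or Fermi.

```python
def overall_speedup(alu_fraction, alu_speedup):
    """Amdahl-style estimate of whole-frame speedup.

    alu_fraction: share of frame time that is ALU-bound (0..1, hypothetical).
    alu_speedup:  how much faster the ALU-bound part runs.
    Only the ALU-bound share gets faster; the rest of the frame is unchanged.
    """
    if not 0.0 <= alu_fraction <= 1.0:
        raise ValueError("alu_fraction must be between 0 and 1")
    if alu_speedup <= 0:
        raise ValueError("alu_speedup must be positive")
    return 1.0 / ((1.0 - alu_fraction) + alu_fraction / alu_speedup)


# Hypothetical shares of frame time spent ALU-bound, with a 2x flops advantage.
for fraction in (0.1, 0.3, 0.5, 1.0):
    print(f"ALU-bound share {fraction:.0%}: {overall_speedup(fraction, 2.0):.2f}x overall")
```

If only a small share of the frame is ALU-bound, even a 2x flops advantage moves the overall result by a few percent, which is consistent with the "raw flops don't account for much of the workload" reading.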

Dave Baumann · Apr 5, 2010

Across a large cross section of reviews and games (and removing obvious framebuffer limitations) at the moment Fermi averages at 13% faster than Cypress at 2560x1600 (all AA/AF combinations) - yet it has >300% more triangle rate, 24% more fillrate, 270% more Z-fillrate and 16% more bandwidth.

trinibwoy · Apr 5, 2010

Well we know triangle rate doesn't count for squat in current titles so that's a non-issue. So Nvidia's Z fillrate compensates for ATi's flop throughput? Maybe. You know it would really be nice if there was a site out there that went above and Beyond3D to do this sort of analysis. If only....

mczak · Apr 5, 2010

Dave Baumann said:
Across a large cross section of reviews and games (and removing obvious framebuffer limitations) at the moment Fermi averages at 13% faster than Cypress at 2560x1600 (all AA/AF combinations) - yet it has >300% more triangle rate, 24% more fillrate, 270% more Z-fillrate and 16% more bandwidth.

Well, color fillrate isn't really higher, not for single-cycle operations at least, as that's limited by the 32 (30) pixel rasterization limit (why the theoretically higher color fillrate fails to show up in multi-cycle ROP operations too is a mystery yet to be solved). By the same argument, z-fillrate is "only" 180% higher in theory too. Maybe the higher z-fillrate (not only in theory but also measured) does indeed help a bit in the real world, though I'm not convinced (as the z-fillrate is _really_ high).

sethk · Apr 5, 2010

This reminds me in many ways of the whole deeply pipelined uarch that Intel had with Netburst that was compared to AMD's very different design at the time.

Netburst, with its very high clock rates and multiple execution units, should have been much faster than AMD's architecture at the time - it had a significantly higher peak theoretical computation rate, and in the right benchmarks it was very fast. But in real-world applications (and usually games), the IPC and overall execution speed were higher on the Athlons of the time.

Some of it was the very high penalty of branch prediction misses, some of it was the memory latency, but over time, Intel kept adding cache, improving their branch prediction, raising clock speeds, etc. until towards the end of the Netburst timeline, they were generally faster across the board. Of course the total redesign of the Core 2 architecture was such a quantum leap in so many ways that AMD is still catching up.

Ironically, in this example, Netburst reminds me of ATI's architecture. Like all comparisons, there are some major dissimilarities of course, and arguments could be made that nVidia is more like Netburst with the power draw and heat, etc.

DavidGraham · Apr 6, 2010

rpg.314 said:
I think the real issue is that even if you throw out the 5x ILP from VLIW, AMD still had (last gen) a ~30% advantage over nv in flops/mm for their alu's.

Does that mean that ATI ALUs require 30% fewer transistors than Nvidia ALUs to do the same work in a given clock cycle?

For example: a single ATI ALU is composed of 2000 transistors and can do a single FP32 MAD/MUL operation every clock, while a single Nvidia ALU is composed of 2600 transistors and can do just the same thing.

If true, does that take into account the fact that Nvidia has to add more transistors to achieve higher clocks?

I am a little confused by the last point: is it really necessary to add more transistors to achieve higher clocks? I thought clocks were a matter of voltage and heat; other than that, any chip could be clocked higher with the right cooling (for example, the HD5870 already reaches 1300MHz on LN2).

So why should we add more transistors to each Nvidia ALU to achieve higher clocks again?

mczak · Apr 6, 2010

DavidGraham said:
So why should we add more transistors to each Nvidia ALU to achieve higher clocks again?

Because you have several gates in series for a single clock. The minimum cycle time is roughly a (linear) function of how long these chains are, so the max clock you can get drops as the chains get longer (though I'm not really up to date here, I've no idea what the length actually is for these modern chips).
Generally, you can make them shorter by using more gates in parallel, but unfortunately that will increase the gate and hence transistor count very significantly.
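
To make the gate-chain argument concrete, here is a toy model of how the number of gates in series limits the clock. The per-gate delay and the fixed overhead (latch setup, clock skew) are invented illustration values, not figures for any real chip or process.

```python
def max_clock_mhz(gates_in_series, gate_delay_ps=25.0, overhead_ps=100.0):
    """Toy estimate of max clock from logic depth.

    Cycle time = (gates in series * per-gate delay) + fixed overhead.
    Both delay values are hypothetical placeholders.
    """
    if gates_in_series <= 0:
        raise ValueError("gates_in_series must be positive")
    cycle_ps = gates_in_series * gate_delay_ps + overhead_ps
    return 1e6 / cycle_ps  # a 1 ps cycle corresponds to 1e6 MHz


# Halving the chain length does not quite double the clock,
# because the fixed overhead per cycle stays the same.
for depth in (40, 20):
    print(f"{depth} gates in series -> about {max_clock_mhz(depth):.0f} MHz")
```

Shortening the chain is where the extra transistors come in: more parallel logic and more pipeline stages (each with its own latches) are what it takes to get from the longer chain to the shorter one.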

dkanter · Apr 6, 2010

Good question OP.

I'd recommend reading my Fermi article (http://www.realworldtech.com/page.cfm?ArticleID=RWT093009110932) and thinking about which features are useful for graphics and which are not.

Generally speaking, NV made a decision to invest resources in what they consider to be general purpose workloads (i.e. HPC). In some cases, those resources also help graphics (caches) - in some cases, they are totally useless for graphics (e.g. DP).

ATI generally focuses only on graphics and has not put nearly the area/power into HPC workloads and programmability...because they want to focus on gaining market share in graphics.

As part of that, NV's architecture makes it easier to achieve high utilization of resources. In essence, NV spends area/power on control logic and features which do not improve raw throughput, but instead improve average utilization. In some sense, NV made a more 'robust' and 'flexible' microarchitecture.

ATI instead decided to focus on a microarchitecture which matched very well with graphics workloads (i.e. vec4 works fine), but was not very 'robust' or 'flexible', and instead had extremely good peak performance.

Consequently, ATI and NV both have good utilization for graphics due to the explicitly parallel nature of the workload.

However, the utilization falls off substantially for ATI on non-graphics workloads (not to mention, programming is a pain). So their performance there will generally be inferior to NV's.
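
A rough way to see why utilization drops on a 5-wide VLIW design is to pack operations into bundles and count the filled slots. The sketch below is a simplified greedy packer written for illustration only: it ignores the real slot restrictions of ATI's hardware, and the two example workloads are hypothetical.

```python
def pack_vliw(deps, width=5):
    """Greedily pack ops 0..n-1 into bundles of at most `width` slots.

    deps[i] is the set of op indices that op i depends on. An op can only
    issue in a bundle after every op it depends on has already issued.
    """
    done = set()
    bundles = []
    remaining = list(range(len(deps)))
    while remaining:
        ready = [op for op in remaining if deps[op] <= done]
        if not ready:
            raise ValueError("dependency cycle")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


def utilization(bundles, width=5):
    """Fraction of issue slots that hold a real operation."""
    if not bundles:
        return 0.0
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# Graphics-like case: a vec4 operation plus one independent scalar op.
vec4_plus_scalar = [set(), set(), set(), set(), set()]
# Serial case: each op needs the result of the previous one.
dependent_chain = [set(), {0}, {1}, {2}, {3}]

print(utilization(pack_vliw(vec4_plus_scalar)))  # all five slots filled
print(utilization(pack_vliw(dependent_chain)))   # one slot per bundle
```

A scalar design pays for its scheduling hardware in area, but it does not leave slots empty when the code is a chain of dependent operations.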

In essence, it's a case of what each company is optimizing for. ATI is focused on the workloads of the present, and NV on the workloads of the future.

There are some other issues as well - ATI genuinely has better physical design and implementation than NV. And they have more experience with TSMC 40nm, and GDDR5. All those combined give them a much more compact and power efficient product.

DK

Ethatron · Apr 6, 2010

trinibwoy said:
If we assume that Cypress benefits proportionately from its flops advantage, where does Fermi catch up?

~1400MHz vs. 850 MHz ALU clock.
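
For reference, the clock difference can be folded into a peak-throughput estimate. The clocks are the ones quoted above; the ALU counts (1600 for the HD 5870, 480 for the GTX 480) are the commonly published specs and do not come from this thread, and counting a MAD as 2 flops per ALU per clock is the usual convention.

```python
def peak_gflops(alus, clock_mhz, flops_per_alu_per_clock=2):
    """Theoretical peak: ALUs * flops per clock (a MAD counts as 2) * clock."""
    return alus * flops_per_alu_per_clock * clock_mhz / 1000.0


# ALU counts are assumed from published specs, not taken from the thread.
cypress = peak_gflops(1600, 850)   # HD 5870
fermi = peak_gflops(480, 1400)     # GTX 480

print(f"Cypress: {cypress:.0f} GFLOPS, Fermi: {fermi:.0f} GFLOPS, "
      f"ratio {cypress / fermi:.2f}x")
```

So the higher ALU clock narrows the gap from the raw ALU-count ratio, but on paper Cypress still has roughly twice the peak flops.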

aaronspink · Apr 6, 2010

DavidGraham said:
So why should we add more transistors to each Nvidia ALU to achieve higher clocks again?

larger transistors, parallel logic, more pipelining, etc.

Mintmaster · Apr 6, 2010

trinibwoy said:
If we assume that Cypress benefits proportionately from its flops advantage, where does Fermi catch up?

That's a rather silly premise to begin with.

It's not proportional by any means. 5x the SIMD width of a scalar design brings maybe 1.5-2x overall game performance, but that's pretty damn good considering that going from 1x to 5x only adds maybe 20% to the transistor cost and 10% to the board cost.

Cost efficiency is the only thing that matters. Efficiency as a percentage of theoreticals is rather useless.
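
As a quick back-of-the-envelope check, the rough figures above (1.5-2x game performance for about 20% more transistors and 10% more board cost) work out like this. The function name is just for illustration, and the inputs are the ballpark numbers from this post, not measured data.

```python
def perf_per_cost(speedup, extra_cost):
    """Performance gained per unit of cost, relative to the narrower design.

    speedup:    overall performance multiplier (e.g. 1.5 for 1.5x).
    extra_cost: fractional cost increase (e.g. 0.20 for +20%).
    """
    return speedup / (1.0 + extra_cost)


for speedup in (1.5, 2.0):
    per_transistor = perf_per_cost(speedup, 0.20)
    per_board = perf_per_cost(speedup, 0.10)
    print(f"{speedup}x perf: {per_transistor:.2f}x per transistor, "
          f"{per_board:.2f}x per board dollar")
```

Even at the low end of the range, the wider design comes out ahead per transistor and per board dollar, while its efficiency as a percentage of theoretical peak looks poor.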

DemoCoder · Apr 6, 2010

The 1.5-2x advantage almost never materializes, however, and it depends on the workload. If, for example, hardly anyone was doing any kind of heavy shading, then the most cost-efficient card would have very little ALU power, and would stock up on TMUs, ROPs, bandwidth, and setup. If, on the other hand, the workloads were really dependent upon CPU-like performance, then the optimal card would look something like a many-cored x86, or a machine like Sun's canceled Rock CPU.

I wouldn't actually say Fermi's architecture is bad; it's just that from a cost-benefit analysis summing over all current workloads, it doesn't look cost-effective. NVidia may be ok with that, because they might have their sights on trying to alter the market into one where their card becomes cost-effective.

I guess the question is, is "embarrassingly parallel" maxed out, and is the problem space that needs more SMP-style inter-communication and non-coherent workloads a highly valuable and necessary one going forward?

Mint, you neatly summarized the issue in the Console forums a few years ago, noting that, on a per-pixel basis, both the XB360 and G7x GPUs could do an enormous number of operations for each pixel drawn, but that in practice, they never get even close to that. I have to say that, despite enormous boosts in theoretical flops and pixel-power in recent years, games don't look all that much more impressive to me. We seem to be getting diminishing returns. Game budgets are skyrocketing (tens of millions), FLOPS have gone through the roof, but nothing has really blown me away in the last few years.

seahawk · Apr 6, 2010

ATI builds GPUs that are realistic. They stay away from the boundaries of the process and they execute perfectly. The architecture is perfect for the typical gamer and still very powerful as GPGPU.

NV is overconfident. They ignore the boundaries of the fabs. The GPU is way too focused on GPGPU, which has no value to the gamer. Therefore their chips are late to the market, hot, and energy hungry. I predict that the whole GF10X line-up will be a disaster. GF104 will need as much power as Cypress to barely win against Juniper.

Currently AMD has won. And that is before the Islands show up which will be the final nail in the NV coffin.

So in short:

One firm is full of arrogance, the other cares for their customers.

DemoCoder · Apr 6, 2010

AMD's execution has not been perfect; let's take off the rose-colored glasses. Both AMD and NVidia are corporations, and what they care about is making money. The way you make money is by convincing customers you have the best product. Sometimes to do this, you take risks and make big bets, and sometimes it works, and sometimes it doesn't. It is the nature of capitalism: the bigger the bet, the more potential for disruption and the bigger the potential win, but also the potential loss.

This seems to work much like politics. When the economy or ruling party is successful, we tend to ascribe grandiose properties to them, and when the economy is bad, or they're fscking up, we tend to be overly harsh. If you look at it dispassionately, NVidia is still in business, and they made a competitive card from a performance perspective, though it needs to be refined. If the NV3x didn't put them out of business, Fermi won't; sorry to spoil the melodrama. AMD might gain back marketshare, but to count NVidia out would be suicidal.

Harison · Apr 6, 2010

DemoCoder said:
AMD might gain back marketshare, but to count NVidia out would be suicidal.

Exactly, that's what NV did after G80 - rested on its laurels and milked the same architecture over and over again till AMD caught up and became the leader. On the other hand, it seems AMD borrowed the tick-tock strategy from Intel, and won't be sleeping on NV anytime soon.
