AMD: R9xx Speculation

Intel's calls so far have been pretty good, from doing 45nm without immersion lithography to going gate-last for their metal gates.
It's not that I'm saying they can't do it ... I'm saying their reasons for staying with bulk have never been the same reasons everyone else stuck with it. The jump to FinFETs rather than SOI might be perfectly in line with their own reasoning, where the inertia of cell-design know-how just isn't that big a factor in the costs, but fabless semiconductor companies would probably rather pay a little more for FD-SOI than abandon planar transistors.
 
Let's hope it won't be the size of the titans named above. :p
Another 1-2 inches on the PCB and the cards will rip the PCIe slots right out.
That is, if the card isn't already jammed into the HDD bays of the SUPER XXL mega case you had to buy last time :devilish:

Oh please, even the HD5970 fits in every single case that follows the ATX spec. If the HDD bays are in the way, the case isn't following the ATX spec.
 
It's not that I'm saying they can't do it ... I'm saying their reasons for staying with bulk have never been the same reasons everyone else stuck with it. The jump to FinFETs rather than SOI might be perfectly in line with their own reasoning, where the inertia of cell-design know-how just isn't that big a factor in the costs, but fabless semiconductor companies would probably rather pay a little more for FD-SOI than abandon planar transistors.
Intel's engineering resources and wealth could make the decision to abandon planar unique to them.

In other decisions, another of their concerns has been manufacturability, something that appears to have taken higher priority than designer comfort.
The gate-first/gate-last debate appears to be resolving itself on that basis.
 
One important question arising from GloFo's complete separation from AMD is where production of the ATI GPUs will in fact end up. Without that ownership stake, AMD no longer has any compelling reason to produce its GPUs at GloFo's fabs, aside from whatever technical merits GloFo has over TSMC.

So where will the ATI GPUs end up? Are they going to stay with TSMC, or would they be better off splitting production in some proportion between the two, or migrating fully to GloFo? Although that would essentially put all their eggs in one basket.
 
On a financial note, it would depend on what ATIC buys to get control.
It may not buy all of AMD's stock in GF, just all of AMD's controlling interest.
I'm not an accountant, so I don't know if that's possible.

If AMD can keep some percentage of the stock, and then GF becomes profitable, the dividend payments would offset the hit AMD takes in margins from GF's own profit-taking.

Perversely, the more successful GF becomes, and the smaller a share of its overall business AMD represents, the more the foundry penalty lessens for AMD.
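
A toy calculation of that mechanism, with purely hypothetical figures, might look like this:

```python
# Toy model of the "foundry penalty vs. dividend" argument above.
# Every number here is hypothetical; the point is only the mechanism.

amd_stake = 0.30           # fraction of GF equity AMD hypothetically retains
penalty = 100.0            # GF's profit taken on AMD wafers (the foundry penalty), $M
gf_other_profit = 50.0     # GF's profit from all other customers, $M

dividend = amd_stake * (penalty + gf_other_profit)   # assume profits are paid out
print(f"Net foundry penalty: {penalty - dividend:.0f} $M")   # 55 $M

# If GF's third-party business grows while AMD's wafer volume stays flat,
# the dividend grows but the penalty doesn't, so the net penalty shrinks.
gf_other_profit = 400.0
dividend = amd_stake * (penalty + gf_other_profit)
print(f"Net foundry penalty with a bigger GF: {penalty - dividend:.0f} $M")   # -50 $M
```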
 
How big a job, in terms of time, is designing a chip for another manufacturer's lines instead of the intended one?
Since Chartered is apparently now part of GF, and as far as I know they have a working 32nm bulk process, could the 32nm designs planned for TSMC's now-cancelled process have been shifted over to Chartered/GF's process instead?
 
Oh please, even the HD5970 fits in every single case that follows the ATX spec. If the HDD bays are in the way, the case isn't following the ATX spec.

They could take two inches off the PCB's length and increase the height by half (or even double it; it would still fit in any case). They would gain more area that way, and no one would mind a PCB even an inch higher than the back plate. And of course they could use a bigger, slower-spinning fan.
 
They could take two inches off the PCB's length and increase the height by half (or even double it; it would still fit in any case). They would gain more area that way, and no one would mind a PCB even an inch higher than the back plate. And of course they could use a bigger, slower-spinning fan.

Perhaps, perhaps not; it's not all just about mm². But regardless, if anyone is to blame, it's the case manufacturers ignoring the ATX spec they're supposedly following (the cases are sold as ATX whether they follow the spec or not).
 
My guess is that HectonXX (too damn hard to pronounce/spell/remember... :mad: ) will be a "simple" shrink to 28 nm to get a feel for the new process before a brand new architecture is launched. Even if they do add clusters to Cypress's replacement, I am expecting Cypress's successor to have less area (~250 mm²) than Cypress, since IMHO Cypress is too big to fit in the sweet spot.

However, nv will chase the 28 nm node like crazy as well, so expect a shrink of GF100 within ~6 months of GF100's launch.
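
A rough sanity check on that area guess, assuming Cypress at about 334 mm² and ideal optical scaling (which real designs never quite reach):

```python
# Ideal-scaling estimate for a 40 nm -> 28 nm shrink of Cypress.
# Cypress is taken as roughly 334 mm^2; real designs scale worse than the
# ideal (28/40)^2 factor, so treat these as optimistic lower bounds.

cypress_area = 334.0                  # mm^2 on 40 nm (approximate)
ideal_scale = (28.0 / 40.0) ** 2      # ~0.49

shrunk = cypress_area * ideal_scale
print(f"Straight shrink: ~{shrunk:.0f} mm^2")                  # ~164 mm^2

# Even with ~50% more shader clusters added on top of the shrink, the part
# would land near the ~250 mm^2 sweet spot mentioned above.
print(f"Shrink + 50% more units: ~{shrunk * 1.5:.0f} mm^2")    # ~245 mm^2
```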

At any rate, I have put together some efficiency charts for DP

http://picasaweb.google.com/rpg.314/GPUPics#5429684576081615058

and SP

http://picasaweb.google.com/rpg.314/GPUPics#5429684576081615058

Assumptions:

GF100 clock: 1.5 GHz for SP and 1.3 GHz for DP (Teslas are clocked lower); LRB clock: 2 GHz for both SP and DP.

GF100 area 570 mm², LRB area 600 mm².

GF100 power 250 W for SP and 225 W for DP; LRB power 250 W for both.

GF100 has 14 SMs for DP and 16 for SP; LRB has 32 cores for the 600 mm² version.

Needless to say, flops aren't everything, but I feel this is a useful pointer nonetheless.
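
For what it's worth, here is a quick sketch of the peak figures those assumptions imply; the per-unit rates (32 FMA lanes per GF100 SM, a 16-wide FMA vector unit per LRB core, half-rate DP for both) are my own additional assumptions, not from the charts:

```python
# Peak-FLOPS and efficiency figures implied by the assumptions above.

def peak_gflops(units, lanes_per_unit, clock_ghz, rate=1.0):
    # 2 flops per lane per clock assumes fused multiply-add
    return units * lanes_per_unit * 2 * clock_ghz * rate

gf100_sp = peak_gflops(16, 32, 1.5)            # 1536 GFLOPS
gf100_dp = peak_gflops(14, 32, 1.3, rate=0.5)  # ~582 GFLOPS
lrb_sp   = peak_gflops(32, 16, 2.0)            # 2048 GFLOPS
lrb_dp   = peak_gflops(32, 16, 2.0, rate=0.5)  # 1024 GFLOPS

for name, sp, dp, area, w_sp, w_dp in (
    ("GF100", gf100_sp, gf100_dp, 570.0, 250.0, 225.0),
    ("LRB",   lrb_sp,   lrb_dp,   600.0, 250.0, 250.0),
):
    print(f"{name}: SP {sp:.0f} GFLOPS ({sp / w_sp:.1f}/W, {sp / area:.1f}/mm^2), "
          f"DP {dp:.0f} GFLOPS ({dp / w_dp:.1f}/W, {dp / area:.1f}/mm^2)")
```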
 
My guess is that HectonXX (too damn hard to pronounce/spell/remember... :mad: ) will be a "simple" shrink to 28 nm to get a feel for the new process before a brand new architecture is launched. Even if they do add clusters to Cypress's replacement, I am expecting Cypress's successor to have less area (~250 mm²) than Cypress, since IMHO Cypress is too big to fit in the sweet spot.

However, nv will chase the 28 nm node like crazy as well, so expect a shrink of GF100 within ~6 months of GF100's launch.

At any rate, I have put together some efficiency charts for DP

http://picasaweb.google.com/rpg.314/GPUPics#5429684576081615058

and SP

http://picasaweb.google.com/rpg.314/GPUPics#5429684576081615058

Assumptions:

GF100 clock: 1.5 GHz for SP and 1.3 GHz for DP (Teslas are clocked lower); LRB clock: 2 GHz for both SP and DP.

GF100 area 570 mm², LRB area 600 mm².

GF100 power 250 W for SP and 225 W for DP; LRB power 250 W for both.

GF100 has 14 SMs for DP and 16 for SP; LRB has 32 cores for the 600 mm² version.

Needless to say, flops aren't everything, but I feel this is a useful pointer nonetheless.

Interesting, thanks. It's odd how efficient LRB is in terms of DP and SP, and yet they didn't bring it out; you'd think they could have found a niche. It must have had some major issue for them to cancel it (for now, anyway); I've never really heard an explanation of why.

Cypress's effectiveness pretty much says to me they will probably never abandon the 5D design.
 
At any rate, I have put together some efficiency charts for DP

http://picasaweb.google.com/rpg.314/GPUPics#5429684576081615058

and SP

http://picasaweb.google.com/rpg.314/GPUPics#5429684576081615058

Assumptions:

GF100 clock: 1.5 GHz for SP and 1.3 GHz for DP (Teslas are clocked lower); LRB clock: 2 GHz for both SP and DP.

GF100 area 570 mm², LRB area 600 mm².

GF100 power 250 W for SP and 225 W for DP; LRB power 250 W for both.

GF100 has 14 SMs for DP and 16 for SP; LRB has 32 cores for the 600 mm² version.

Needless to say, flops aren't everything, but I feel this is a useful pointer nonetheless.
Wasn't Larrabee rumored to be a 300W+ part? At least the demonstrated version featured a triple-slot cooling solution.
And are we really sure Larrabee does (or would have done) DP at half rate? I found it a bit suspicious that Intel was specifically not talking DP with Larrabee, only SP, in everything besides the very early presentations. No performance figures for DP were given; all they said was that the vector unit can work with DP data. But maybe the DP instructions only issue at half rate? That would save quite a bit.
 
Wasn't Larrabee rumored to be a 300W+ part? At least the demonstrated version featured a triple-slot cooling solution.
And are we really sure Larrabee does (or would have done) DP at half rate? I found it a bit suspicious that Intel was specifically not talking DP with Larrabee, only SP, in everything besides the very early presentations. No performance figures for DP were given; all they said was that the vector unit can work with DP data. But maybe the DP instructions only issue at half rate? That would save quite a bit.

I am assuming that DP is half rate in this pic. However, it might not be true for the first iteration.

It might have been a 300W+ part, I dunno.

Interesting, thanks. It's odd how efficient LRB is in terms of DP and SP, and yet they didn't bring it out; you'd think they could have found a niche. It must have had some major issue for them to cancel it (for now, anyway); I've never really heard an explanation of why.

LRB1, made on 45 nm, the cancelled part, was comparable to a GTX 285. GF100, its closest competitor, should be ~80% faster. Since GF100 is on 40 nm, let's call it 65% ahead. Clearly, if LRB has to catch up to within say 10-15% of GF100, it needs major tweaks or a redesign; merely throwing more cores at it won't help. Not to mention that as GPUs shed their fixed-function hardware they have room for growth beyond what Moore's law permits, while LRB is pretty much already at that ceiling.
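
Putting those rough multipliers together (nothing here is measured, it just restates the post's numbers):

```python
# Rough gap arithmetic using the multipliers quoted above; purely illustrative.

lrb1  = 1.00                 # LRB1 taken as roughly GTX 285 class
gf100 = 1.80                 # GF100 assumed ~80% faster than a GTX 285
gf100_node_adjusted = 1.65   # crediting GF100's 40 nm node advantage

# "Catching up to within 10-15% of GF100" means reaching ~85-90% of it.
for fraction in (0.85, 0.90):
    uplift = fraction * gf100_node_adjusted / lrb1 - 1.0
    print(f"To reach {fraction:.0%} of GF100, LRB1 needs ~{uplift:.0%} more performance")
```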
 
One of the problems with LRB1 was the performance segment they chose. For their first GPU and working concept, they could really have chosen a much smaller GPU in the HD 4600 / 9600 GT range.
They had only integrated graphics at laughable performance levels, and they wanted to jump right into the high end. How naive :rolleyes:
Even HD 4600 / 9600 GT performance would have been a milestone compared to their GMA series, and they could at least have sold it. And in terms of architecture, drivers, and real-world performance, a low-end GPU would have had far fewer problems while providing the same valuable experience. Intel sometimes has these megalomaniac ambitions.

With x86 they can only chase ATI/Nvidia in rasterization and never catch up. The biggest mistake with LRB1 was the expectation and design goal of being faster than the latest from the competition. :cry:
 
No, because clocks can only take you so far, and high-clock-rate circuits take up more area than identical but lower-clocked circuits. Distributed geometry setup is the future, since you can have more than 4 setup units per GPU if you want.
Why is it so important? If I want e.g. 60 FPS, then the geometry load is identical regardless of resolution, AA/AF/texture/PS settings, die size/price segment, etc. I really don't understand why it should be good to have higher geometry performance on a $500 GPU than on a $300 GPU.
 
Why is it so important? If I want e.g. 60 FPS, then the geometry load is identical regardless of resolution, AA/AF/texture/PS settings, die size/price segment, etc. I really don't understand why it should be good to have higher geometry performance on a $500 GPU than on a $300 GPU.

My point was not about cost. My point was that clocks scale more slowly than unit counts on GPUs, so at some point in the near future (with NI, probably) it will be more effective to parallelize the tessellator/setup/raster than to rely on higher clock speeds.
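
A small illustration of both points, using made-up triangle counts and clocks:

```python
# Two quick calculations behind the setup discussion. All triangle counts
# and clocks below are hypothetical, chosen only to show the shape of the
# argument.

# (1) Required setup rate depends on triangles per frame and target FPS,
#     not on resolution.
tris_per_frame = 10_000_000      # a heavily tessellated scene (hypothetical)
target_fps = 60
required = tris_per_frame * target_fps
print(f"Required: {required / 1e6:.0f} M tris/s, at 1440x900 or 1920x1200 alike")

# (2) Peak setup rate at 1 triangle per clock per unit: adding units scales
#     it linearly, while clock speed has far less headroom.
base_clock = 850e6
for units, clock in ((1, base_clock), (1, 1.2 * base_clock), (2, base_clock), (4, base_clock)):
    print(f"{units} unit(s) @ {clock / 1e6:.0f} MHz -> {units * clock / 1e6:.0f} M tris/s peak")
```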
 
Why is it so important? If I want e.g. 60 FPS, then the geometry load is identical regardless of resolution, AA/AF/texture/PS settings, die size/price segment, etc.
Per frame there are many render passes. Some render passes, e.g. shadow buffer generation, don't need to do pixel shading. So you want to spend the minimum time doing these passes.

Some passes involve rasterisation, e.g. shadow buffer generation, but other passes just involve geometry, e.g. computing adaptive tessellation factors.

Anything that requires rasterisation can be bottlenecked either by rasterisation or by setup (if not earlier in the pipeline).

Jawed
 
Jawed: Maybe I didn't describe my idea properly. Imagine two systems:

1. mainstream GPU, resolution e.g. 1440*900
2. high-end GPU, resolution e.g. 1920*1200

Both systems will achieve 60 FPS at these settings. Isn't the triangle setup load identical in both situations? If so, what's the benefit of having twice-as-fast setup on the high-end GPU?


rpg.314: I understand. I think increasing the clock speed has been sufficient up to now. Now it's necessary to increase the triangle rate significantly because of tessellation. But developing a distributed geometry system would only be advantageous if the demand for geometry performance were to increase significantly every year or half-year. As I understand it, the increase in geometry performance is a one-shot job because of tessellation. Further n-fold increases aren't needed; additional polygons wouldn't affect image quality. That's why I see no reason to develop a scalable system.
 