AMD: R9xx Speculation

rpg.314 · Mar 12, 2010

HVAC said:
How about Fusion???

Fusion is SOI. There's no 28nm SOI at GlobalFoundries.

FrameBuffer · Mar 12, 2010

rpg.314 said:
After ~2 yrs of ass kicking, nv won't let AMD have a time-to-new-node advantage. Fermi2 will be on 28nm, it is most certainly slotted for 4q10 (ofc, delays are another matter) and it will be >550mm².

Doubtful...

rpg.314 · Mar 12, 2010

My understanding of nv says so. That company is too driven by 'halo'.

Alexko · Mar 12, 2010

rpg.314 said:
After ~2 yrs of ass kicking, nv won't let AMD have a time-to-new-node advantage. Fermi2 will be on 28nm, it is most certainly slotted for 4q10 (ofc, delays are another matter) and it will be >550mm².

That was the plan for 40nm. We all know how well that turned out...

rpg.314 · Mar 12, 2010

Alexko said:
That was the plan for 40nm. We all know how well that turned out...

The strategy was right, the execution....

Alexko · Mar 12, 2010

rpg.314 said:
The strategy was right, the execution....

Well that's the crux of the issue, isn't it? Any plan is only as good as your ability to execute it. And since physical design clearly isn't NVIDIA's strong suit...

rpg.314 · Mar 12, 2010

Alexko said:
Well that's the crux of the issue, isn't it? Any plan is only as good as your ability to execute it. And since physical design clearly isn't NVIDIA's strong suit...

Well, they'll just have to improve upon it. They'll do a GF100b X2 if they must, but they aren't gonna lose the halo. One hopes that with all the Fermi heavy lifting done, they can pull off 28nm in a shorter time.

Bouncing Zabaglione Bros. · Mar 12, 2010

rpg.314 said:
My understanding of nv says so. That company is too driven by 'halo'.

The problem is that a cutting-edge halo part needs a cutting-edge process, and these processes are rarely ready at first for such large and complex chips. That's why we've seen ATI go to a new process with a less complex chip (i.e. not their top-of-the-range part) to understand the process and its limits before putting its top chip on that process.

AMD seem to be designing the best product they can reliably produce on a given process. Nvidia seem to make a rod for their own back by going for huge chips, new designs, new processes, etc. all at the same time. That's bitten Nvidia on the ass and they've been unable to execute. It doesn't matter how clever and innovative Nvidia's designs are if they can't be made until a new process has had a year to have all the problems shaken out of it.

In that time AMD has had several quarters of being a generation ahead of Nvidia, and is moving on to the next process while Nvidia is only just getting their halo part out the door in small numbers.

There's only so far Nvidia can go on rebadging old chips while failing to execute on their new chips. And that's before I start asking where their mainstream DX11 chips are.

neliz · Mar 13, 2010

Let's try this... RV970 = 2400 SP, 48 ROPs, 256-bit: basically 1.5 times Cypress in units, yet the underlying architecture will be much better suited to today's workloads, and it would actually scale better than RV870 because of the improved setup and geometry etc.

Still taping out this year (or maybe it already did?!) and appearing near the end of the year, conveniently a bit before Fermi2.
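A quick back-of-the-envelope check of the "1.5 times Cypress" figure, as a sketch only: it assumes Cypress's shipping configuration (1600 stream processors, 32 ROPs, 256-bit bus) and simply scales the unit counts by one factor while keeping the bus width, as in the guess above. The helper `scale_config` is hypothetical.

```python
# Cypress (HD 5870) unit counts; the scaled result is pure speculation.
CYPRESS = {"stream_processors": 1600, "rops": 32, "bus_width_bits": 256}

def scale_config(base, factor, keep=("bus_width_bits",)):
    """Hypothetical helper: scale every unit count by `factor`,
    leaving the keys listed in `keep` (e.g. memory bus width) unchanged."""
    return {k: (v if k in keep else int(v * factor)) for k, v in base.items()}

rv970_guess = scale_config(CYPRESS, 1.5)
print(rv970_guess)  # {'stream_processors': 2400, 'rops': 48, 'bus_width_bits': 256}
```

The 2400 SP and 48 ROPs in the guess are exactly Cypress × 1.5; only the bus stays at 256 bits.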

fellix · Mar 13, 2010

I would say 2560 ALUs, a perfect fit for 32 SIMD layout.

ROPs -- why bother keeping such amount of FF hardware anymore?
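fellix's 2560 falls out directly if each SIMD keeps Cypress's organisation of 16 VLIW5 units (80 ALUs per SIMD). That per-SIMD layout is an assumption carried over from Cypress, not anything AMD has announced for the next generation:

```python
# Assumption: each SIMD keeps Cypress's layout of 16 VLIW5 units.
LANES_PER_SIMD = 16
ALUS_PER_LANE = 5  # VLIW5

def alu_count(simds):
    """Total ALUs ("SPs") for a given number of Cypress-style SIMDs."""
    return simds * LANES_PER_SIMD * ALUS_PER_LANE

print(alu_count(20))  # Cypress: 1600
print(alu_count(32))  # fellix's guess: 2560
```

By the same arithmetic, neliz's 2400 SP corresponds to 30 such SIMDs.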

neliz · Mar 13, 2010

fellix said:
ROPs -- why bother keeping such amount of FF hardware anymore?

Okay... Okay... 40 ROPs then.

Ailuros · Mar 15, 2010

fellix said:
I would say 2560 ALUs, a perfect fit for 32 SIMD layout.
ROPs -- why bother keeping such amount of FF hardware anymore?

Honest question: any particular reason why 32 SIMDs would be better than 16 SIMDs?

neliz said:
Okay... Okay... 40 ROPs then.

Neliz, if you're trying to keep reasoning similar to Evergreen for next-generation parts, and you'd have (as you suggested) something like 3 blocks of 10 clusters with hypothetically 1 raster unit for each, you could think of either 8 or 16 pixels/clock per raster unit. 8*3=24 sounds too little, and your original hypothesis of 16*3=48 makes more sense for that theoretical constellation.

However, fellix is obviously thinking of something quite different; if I understood him correctly, he's speculating on 32 SIMDs, meaning 512 Vec5 ALUs divided over 16 clusters. If I take that one step further and break those into two 8-cluster blocks, each with its own raster unit (and possibly trisetup?), you can easily go with 16 pixels/raster unit and a final 32 ROPs for the entire chip.
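The ROP totals in the two scenarios above are just raster units × pixels per clock. This sketch assumes, as the posts do, one raster unit per cluster block and a ROP count sized to match the combined raster output; `rop_total` is a hypothetical helper name.

```python
def rop_total(raster_units, pixels_per_clock):
    # Assumption (from the posts above): one raster unit per cluster block,
    # and the ROP count matches the combined raster output per clock.
    return raster_units * pixels_per_clock

scenarios = {
    "3 blocks x 8 px/clk": rop_total(3, 8),    # neliz-style layout, narrow rasterizers
    "3 blocks x 16 px/clk": rop_total(3, 16),  # neliz's original 48 ROPs
    "2 blocks x 16 px/clk": rop_total(2, 16),  # fellix-style 2 x 8-cluster layout
}
for name, rops in scenarios.items():
    print(f"{name}: {rops} ROPs")
```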

One question for that scenario would be whether 32 SIMDs is a better idea than 16 SIMDs, and the next would be whether to go with 4 or 8 TMUs per cluster. With 4 you break even with GF100 in TMU count; with 8 you have twice as many. TMUs are FF hardware too.
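The TMU comparison, sketched under the 16-cluster assumption above and using full GF100's 64 texture units (16 SMs × 4); `tmu_total` is a hypothetical helper:

```python
GF100_TMUS = 64  # full GF100: 16 SMs x 4 texture units each

def tmu_total(clusters, tmus_per_cluster):
    """Total texture units for a hypothetical cluster layout."""
    return clusters * tmus_per_cluster

for per_cluster in (4, 8):
    total = tmu_total(16, per_cluster)
    print(f"{per_cluster} TMUs/cluster: {total} TMUs ({total / GF100_TMUS:.1f}x GF100)")
```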

rpg.314 · Mar 15, 2010

neliz said:
Let's try this... RV970 = 2400 SP, 48 ROPs, 256-bit: basically 1.5 times Cypress in units, yet the underlying architecture will be much better suited to today's workloads, and it would actually scale better than RV870 because of the improved setup and geometry etc.

Still taping out this year (or maybe it already did?!) and appearing near the end of the year, conveniently a bit before Fermi2.

Let's see, new architecture+new process+much better geometry handling+"more suited to modern workloads"=Fuckmi.

Ailuros · Mar 15, 2010

rpg.314 said:
Let's see, new architecture+new process+much better geometry handling+"more suited to modern workloads"=F***mi.

Not directed at you, but sweet irony, hm? Until recently the PolyMorph engine was a "sw solution", then it turned into something dreadfully overhyped that only matters in unrealistic over-tessellated scenarios, and now in our speculation we're looking for something that will hypothetically compete with it more adequately?

Let's first see the thing tested by independent sources in real-time game scenarios as well as in theoretical synthetics; then it'll be easy to see whether AMD really needed, or will need, a more adequate solution than they currently have.

rpg.314 · Mar 15, 2010

neliz said:
Let's try this... RV970 = 2400 SP, 48 ROPs, 256-bit: basically 1.5 times Cypress in units, yet the underlying architecture will be much better suited to today's workloads, and it would actually scale better than RV870 because of the improved setup and geometry etc.

Still taping out this year (or maybe it already did?!) and appearing near the end of the year, conveniently a bit before Fermi2.

My guess would be that the number of SIMDs (or whatever the latest granularity of ALUs is) will be a multiple of 4, if only to make die lego easier for the cheaper parts.

Personally, I am expecting 28-32 SIMDs for RV970.
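Combining the "multiple of 4" rule with the 28-32 SIMD expectation leaves only two candidates. The ALU totals below assume Cypress-style SIMDs of 16 VLIW5 units (80 ALUs each), which is an assumption rather than a known RV970 figure; `candidate_configs` is a hypothetical helper.

```python
ALUS_PER_SIMD = 16 * 5  # assumption: Cypress-style 16 x VLIW5 SIMDs

def candidate_configs(lo, hi, step=4):
    """SIMD counts in [lo, hi] that are multiples of `step`, paired with ALU totals."""
    return [(n, n * ALUS_PER_SIMD) for n in range(lo, hi + 1) if n % step == 0]

print(candidate_configs(28, 32))  # [(28, 2240), (32, 2560)]
```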

rpg.314 · Mar 15, 2010

Ailuros said:
Not directed at you, but sweet irony, hm? Until recently the PolyMorph engine was a "sw solution", then it turned into something dreadfully overhyped that only matters in unrealistic over-tessellated scenarios, and now in our speculation we're looking for something that will hypothetically compete with it more adequately?

Let's first see the thing tested by independent sources in real-time game scenarios as well as in theoretical synthetics; then it'll be easy to see whether AMD really needed, or will need, a more adequate solution than they currently have.

Nah, I'm just wondering whether those factors will conspire to create a Fermi-like situation.

Regarding Fermi's tessellation, it does seem over-provisioned, especially for something that is going to see only baby usage for most of its life.

w0mbat · Mar 15, 2010

So, what's the best guess on NI's process? 32nm? Or are you guys betting on 28, or still 40? Because the roadmap still shows 32nm. How easy is it to change the process size?

rpg.314 · Mar 15, 2010

w0mbat said:
So, what's the best guess on NI's process? 32nm? Or are you guys betting on 28, or still 40? Because the roadmap still shows 32nm. How easy is it to change the process size?

http://www.semiaccurate.com/forums/showpost.php?p=29055&postcount=20

Ailuros · Mar 15, 2010

w0mbat said:
So, what's the best guess on NI's process? 32nm? Or are you guys betting on 28, or still 40? Because the roadmap still shows 32nm. How easy is it to change the process size?

Why would you specifically ask such a question? *shrugs*

AlexV · Mar 15, 2010

Ailuros said:
Honest question: any particular reason why 32 SIMDs would be better than 16 SIMDs?

Considering Cypress is 20, how would that work?
