AMD: R9xx Speculation

rjc · Mar 30, 2010

rjc said:
I translated the above as: "A to note recently tapeout whole series of RV9xx all are 40nm, doesn't exist 28nm hardware to strengthen TES, to increase Blu-ray decoding quality, 3DMark Vantage not much increase"

Is the part in bold correct "tapeout whole series"? Supposedly there are 3 chips in Hectoncheires(?sp aarrggh)

Just following up previous post...it's spelt "Hecatonchires", of which there are 3 members: Briareus, Cottus and Gyges.

Interestingly Briareus ~ "sea goat" has another name "Aigaion", or "Aegean" in English.

Looking at the NI codenames: Mykonos, Ibiza and Cozumel. Mykonos is an island in the Aegean Sea, so it would be nice if the two chips were somehow related; that would make things a bit easier to remember.

Sadly, I suspect AMD's intention is the opposite.

Edit: Misremembered the strings that Gipsel found in the driver: it should be Kauai, not Mykonos, so there is no obvious relationship between the two series.

jaredpace · Mar 30, 2010

Could someone please explain how ATI designs are more dense with transistors but use less power than Nvidia designs? Is that considered an example of superior engineering? I would assume it is, but how do they do it?

Also,
N.I. = rv1070 (new design, new node, next year), and
Hecatonchires = rv970 (refresh, same node, this year)
?

aaronspink · Mar 30, 2010

jaredpace said:
Could someone please explain how ATI designs are more dense with transistors but use less power than Nvidia designs? Is that considered an example of superior engineering? I would assume it is, but how do they do it?

It comes down essentially to overhead per ALU. Nvidia effectively has a control overhead per 16 ALUs (8 in G80), iirc. ATI has a control overhead per 80 ALUs. This has an impact not only in the pipeline but also in the register file and any bypass networks.
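To put rough numbers on that, here is a back-of-the-envelope sketch. The group sizes (16 and 80 ALUs per control block) are the ones quoted above; the ALU totals (1600 for Cypress, 480 for a GTX 480) and the idea that every control block costs about the same are simplifying assumptions for illustration, not measured figures.

```python
def control_blocks(total_alus: int, alus_per_control: int) -> int:
    """Number of control blocks if each one drives a fixed-size group of ALUs."""
    if total_alus % alus_per_control != 0:
        raise ValueError("total_alus must be a multiple of alus_per_control")
    return total_alus // alus_per_control


# Group sizes from the post; ALU totals are assumptions for illustration.
ati_blocks = control_blocks(total_alus=1600, alus_per_control=80)
nv_blocks = control_blocks(total_alus=480, alus_per_control=16)

# Share of one control block carried by each ALU (smaller is cheaper).
ati_share = 1 / 80
nv_share = 1 / 16

print(f"ATI: {ati_blocks} control blocks, {ati_share:.4f} of a block per ALU")
print(f"Nvidia: {nv_blocks} control blocks, {nv_share:.4f} of a block per ALU")
print(f"per-ALU control share ratio: {nv_share / ati_share:.1f}x")
```

Under these assumptions each Nvidia ALU carries about five times the control share of an ATI ALU, which is the amortization effect the post describes. It says nothing about how large one control block actually is on either design.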

John021 · Mar 30, 2010

Charlie on Southern Islands

rpg.314 · Mar 30, 2010

John021 said:
Charlie on Southern Islands

Fits Lixian's theory to a T.

@TSMC, what is up with you guys? Is cancelling/fucking-up processes the new black?

rpg.314 · Mar 30, 2010

BTW, do the contracts include penalties should a fab fuck up a process, screwing customer roadmaps?

3dilettante · Mar 30, 2010

The memory controllers and miscellaneous doodads would seem to fit the somewhat nebulous uncore (not sure what counts as core in a Cypress-style GPU).
The L2 cache seems tied closely enough to the memory controllers, so that might change.

The shader arrays are tightly linked to the TMU, and LDS, and the GDS has to interface with them. That seems to cap what can be done about those.
The scheduling hardware is also tightly linked to the current SIMD structure and instruction support.
ROPs interact with memory and shader writeback, but what point is there in modifying them if the shaders that feed them don't change?

Perhaps whatever is on the other side of the setup engine can be fiddled with. The setup engine feeds into the scheduler, so it is one step removed from the shaders. Would that count as uncore, or at least notcore? (the latter not being a serious word)

rpg.314 · Mar 30, 2010

3dilettante said:
The memory controllers and miscellaneous doodads would seem to fit the somewhat nebulous uncore (not sure what counts as core in a Cypress-style GPU).
The L2 cache seems tied closely enough to the memory controllers, so that might change.

The shader arrays are tightly linked to the TMU, and LDS, and the GDS has to interface with them. That seems to cap what can be done about those.
The scheduling hardware is also tightly linked to the current SIMD structure and instruction support.
ROPs interact with memory and shader writeback, but what point is there in modifying them if the shaders that feed them don't change?

Perhaps whatever is on the other side of the setup engine can be fiddled with. The setup engine feeds into the scheduler, so it is one step removed from the shaders. Would that count as uncore, or at least notcore? (the latter not being a serious word)

Exactly what Lixian said. :smile: Setup/raster + tess + cache. And since these were the architectural advantages of Fermi, it seems that without a B1 the 6870 should be able to beat a 480.

Squilliam · Mar 30, 2010

rpg.314 said:
Exactly what Lixian said. :smile: Setup/raster + tess + cache. And since these were the architectural advantages of Fermi, it seems that without a B1 the 6870 should be able to beat a 480.

I guess then that might add credence to the idea that there will be a B1 variant of G100? (If the above is true).

However early information can often be false information and if ATI are going through GloFo then the regular rumour mill doesn't exactly apply here.

rpg.314 · Mar 31, 2010

I wonder what will happen to the leaks from ATi side should they switch to GF. Are the fab workers in Germany/NY any less loose lipped?

Squilliam · Mar 31, 2010

rpg.314 said:
I wonder what will happen to the leaks from ATi side should they switch to GF. Are the fab workers in Germany/NY any less loose lipped?

I would say that the standard rumour mill isn't developed as much in Germany as compared to Taiwan etc because there hasn't been any reason to really point our noses in that direction for rumours and tidbits.

dizietsma · Mar 31, 2010

With TSMC making a mess of 40nm, and 28nm causing concerns, it does make you wonder whether AMD and Nvidia will be thinking more and more about GlobalFoundries or another fab.

rpg.314 · Mar 31, 2010

dizietsma said:
With TSMC making a mess of 40nm, and 28nm causing concerns, it does make you wonder whether AMD and Nvidia will be thinking more and more about GlobalFoundries or another fab.

http://www.semiaccurate.com/2009/08/18/nvidia-takes-huge-risk/

Betting people say that TSMC will eat the costs and Nvidia will be a GlobalFoundries customer at the earliest opportunity anyway.

http://www.semiaccurate.com/2010/03/30/atis-next-generation-outed/

The lead off parts were due to come on TSMC's 28nm process, which is set for Q1/2011, followed by derivatives on GlobalFoundries' 28nm process.

After 40/32/28, can't say I blame nv/ati. :???:

GZ007 · Mar 31, 2010

3dilettante said:
The memory controllers and miscellaneous doodads would seem to fit the somewhat nebulous uncore (not sure what counts as core in a Cypress-style GPU).
The L2 cache seems tied closely enough to the memory controllers, so that might change.

The shader arrays are tightly linked to the TMU, and LDS, and the GDS has to interface with them. That seems to cap what can be done about those.
The scheduling hardware is also tightly linked to the current SIMD structure and instruction support.
ROPs interact with memory and shader writeback, but what point is there in modifying them if the shaders that feed them don't change?

Perhaps whatever is on the other side of the setup engine can be fiddled with. The setup engine feeds into the scheduler, so it is one step removed from the shaders. Would that count as uncore, or at least notcore? (the latter not being a serious word)

They could also run the ALUs on a separate, faster clock like Nvidia did long ago. ROPs and TMUs are likely bandwidth bound, but ALUs usually are not.
It's quite hard to raise the clock to 1.2 GHz for the whole chip, but not if you just raise the SP clock to 1.2 GHz.
Something like 900 MHz for the whole GPU and 1.2 GHz for the SPs (a 1.3333 multiplier). Just this mild frequency increase could give the same 1600 SPs nearly 40% more performance for the same die area.
But there are surely some penalties for keeping two clock domains on the chip.
Edit: Actually, what is the disadvantage of keeping two or more clock domains on the chip?
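The arithmetic behind that estimate can be sketched as follows. The 900 MHz and 1.2 GHz figures come from the post; the 850 MHz baseline is an assumption (the stock HD 5870 engine clock), and the sketch assumes peak ALU throughput scales linearly with shader clock, which real workloads will not fully achieve.

```python
def shader_clock_gain(baseline_mhz: float, shader_mhz: float) -> float:
    """Fractional gain in peak ALU throughput when only the shader clock rises.

    Assumes throughput scales linearly with shader clock and that the
    ALU count (1600 SPs in the post) stays fixed.
    """
    return shader_mhz / baseline_mhz - 1.0


CORE_MHZ = 900.0     # proposed clock for the rest of the chip (from the post)
SHADER_MHZ = 1200.0  # proposed SP clock (from the post)
STOCK_MHZ = 850.0    # assumption: stock HD 5870 engine clock

print(f"multiplier: {SHADER_MHZ / CORE_MHZ:.4f}")
print(f"gain vs 900 MHz core: {shader_clock_gain(CORE_MHZ, SHADER_MHZ):.1%}")
print(f"gain vs 850 MHz stock: {shader_clock_gain(STOCK_MHZ, SHADER_MHZ):.1%}")
```

Against a 900 MHz core the gain is about 33%. The "nearly 40%" figure only works out if the comparison is against the 850 MHz stock clock, where it comes to roughly 41%.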

hkultala · Mar 31, 2010

GZ007 said:
They could also run the ALUs on a separate, faster clock like Nvidia did long ago. ROPs and TMUs are likely bandwidth bound, but ALUs usually are not.
It's quite hard to raise the clock to 1.2 GHz for the whole chip, but not if you just raise the SP clock to 1.2 GHz.
Something like 900 MHz for the whole GPU and 1.2 GHz for the SPs (a 1.3333 multiplier). Just this mild frequency increase could give the same 1600 SPs nearly 40% more performance for the same die area.
But there are surely some penalties for keeping two clock domains on the chip.
Edit: Actually, what is the disadvantage of keeping two or more clock domains on the chip?

1) Having multiple clocks that are not multiples of each other is problematic for data integrity; it needs additional buffers and data integrity logic.

2) These increase the latency of data crossing the clock domain boundary.

3) Distributing clock signals becomes more complex when there are multiple different clock signals.

4) Control logic becomes more complex when not everything can be counted in simple clock cycles.

GZ007 · Mar 31, 2010

hkultala said:
1) Having multiple clocks that are not multiples of each other is problematic for data integrity; it needs additional buffers and data integrity logic.

2) These increase the latency of data crossing the clock domain boundary.

3) Distributing clock signals becomes more complex when there are multiple different clock signals.

4) Control logic becomes more complex when not everything can be counted in simple clock cycles.

The 9xxx Nvidias had a 2.5 multiplier. My 1.3333 was a bad example. But of course keeping the scalar SPs in sync could be much easier than the vector ones.

eastmen · Mar 31, 2010

It's looking more and more like the 5870 is bandwidth limited at high resolutions, going by the 2GB edition benchmarks.

Do you guys think ATI will go with a wider bus for the refresh?

hkultala · Mar 31, 2010

GZ007 said:
The 9xxx Nvidias had a 2.5 multiplier. My 1.3333 was a bad example. But of course keeping the scalar SPs in sync could be much easier than the vector ones.

2.5 is also a fractional multiplier; it's equally bad (or might be worse, as it requires multiplication/division by 5, while 4/3 requires multiplication/division by only 3 and 4).

2 is an easy one. 4 is an easy one. 8 is an easy one. 3, 5 and 6 are a bit more difficult, but easier than 2.5 or 1.33.
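One way to see the difference between integer and fractional multipliers is to count how many cycles pass before the rising edges of the two clocks line up again. The sketch below is a simplified model, assuming two ideal clocks that start in phase with the fast clock running at exactly `multiplier` times the slow one; `realign_period` is a name made up for this illustration, and the model ignores skew, jitter and the actual divider circuits.

```python
from fractions import Fraction


def realign_period(multiplier: Fraction) -> tuple:
    """Return (slow_cycles, fast_cycles) between coinciding rising edges.

    With the multiplier written as p/q in lowest terms, the fast clock
    completes p cycles in the time the slow clock completes q cycles,
    so the edges coincide once every q slow cycles.
    """
    m = Fraction(multiplier)
    return m.denominator, m.numerator


for name, mult in [
    ("2", Fraction(2)),
    ("4", Fraction(4)),
    ("3", Fraction(3)),
    ("2.5", Fraction(5, 2)),
    ("1.333...", Fraction(4, 3)),
]:
    slow, fast = realign_period(mult)
    print(f"x{name}: edges realign every {slow} slow / {fast} fast cycles")
```

With an integer multiplier every slow-clock edge coincides with a fast-clock edge, so data can cross on a fixed cycle. With 5/2 or 4/3 the edges only coincide every 2 or 3 slow cycles, and the phase relationship differs on the cycles in between, which is where the extra buffering and control logic comes in. This model only separates integer from fractional ratios; it does not capture why 3, 5 or 6 would be harder than powers of two.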

Mindfury · Mar 31, 2010

eastmen said:
It's looking more and more like the 5870 is bandwidth limited at high resolutions, going by the 2GB edition benchmarks.

Do you guys think ATI will go with a wider bus for the refresh?

According to the B3D benchmark, the 5870 is not bandwidth limited.

I'm pretty sure ATI won't use a weird bus width like 384-bit. A 512-bit bus would cost too much die space. I think they will stay with a 256-bit bus this year.

GZ007 · Mar 31, 2010

Mindfury said:
According to the B3D benchmark, the 5870 is not bandwidth limited.

I'm pretty sure ATI won't use a weird bus width like 384-bit. A 512-bit bus would cost too much die space. I think they will stay with a 256-bit bus this year.

If the GPU was designed for a 256-bit bus and memory clocks around 4.8 GHz, then just increasing the clocks won't show much of an increase. You have the same 32 ROPs for the same 8*32-bit controllers and buffers.
As the caches now deliver several hundred GB/s, and the GPU is designed for a given bus width, bandwidth, ROPs and buffers, you won't see much difference from memory clock changes alone. At least that's my theory.

I just want to say that you could see much more of a difference with 384-bit if the whole GPU were designed around it.
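For reference, the raw numbers behind the bus-width discussion can be sketched like this. The 256-bit, 384-bit and 512-bit widths and the 4.8 GHz figure come from the posts above; treating 4.8 GHz as an effective data rate of 4.8 Gbit/s per pin is an assumption, and the result is theoretical peak bandwidth only, not what the ROPs and memory controllers can actually make use of.

```python
def peak_bandwidth_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_bits: total memory bus width in bits.
    data_rate_gbps: effective per-pin data rate in Gbit/s (assumed 4.8 here).
    """
    return bus_bits / 8 * data_rate_gbps


DATA_RATE_GBPS = 4.8  # assumption: "4.8GHz" read as effective Gbit/s per pin

for bus_bits in (256, 384, 512):
    controllers = bus_bits // 32  # 32-bit channels, as in "8*32-bit"
    bw = peak_bandwidth_gbs(bus_bits, DATA_RATE_GBPS)
    print(f"{bus_bits}-bit bus: {controllers} x 32-bit channels, {bw:.1f} GB/s peak")
```

A 384-bit bus at the same data rate gives 50% more peak bandwidth than 256-bit, but as the post argues, that only helps if the number of controllers, ROPs and buffers is scaled up to match.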
