AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within couple months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
1 package/heatsink contact for that huge-arse card we saw. "Dual-core".

I looked at the nApolean quote and tried to translate it:

单 single / 芯 chip / 双核 dual-core / 是 is / 没错 correct,
但 but / 你 you / 如果 if / 拆开 take apart / 散热器 heatsink / 看 look / 的话 (if-clause marker),
就是 it is / 一个 one / 核心 core,
字面上 literally / 大家 everybody / 还是 still / 要 should / 说 say / 清楚 clearly / 点 a bit / 好 better,
免得 so as to avoid / 误会 misunderstanding

Put together: "Single chip, dual-core is correct, but if you take the heatsink off and look, it's one core. Literally speaking, everybody should still spell that out a bit more clearly, to avoid misunderstanding."

To me the above is saying that it is a single core consisting internally of 2 cores, i.e. a single piece of silicon.
 
Green?

Everybody knows that something painted red goes faster, painted green draws less energy, painted blue has lower temperature and painted black or white makes the owner cooler.

If rumors about leakage on TSMC's 40nm process are true, the choice is green.

:LOL::devilish::oops:
 
The AMD-green 7 in your signature for a while yesterday was hardly innocuous.

Jawed

Yeah, and as soon as I discovered that Charlie was linking to my posts to prove something, I deleted it. That's why he thinks it's fake, so maybe I'm his hero now :rolleyes:
 
To me the above is saying that it is a single core consisting internally of 2 cores, i.e. a single piece of silicon.
Couldn't it simply be saying it's an X2 board, like HD4870X2?

For all we know the Juniper and Juniper X2 boards are what AMD is launching this year, and next year the 256-bit chip arrives. Or 384-bit. Whatever.

Jawed
 
Yeah, and as soon as I discovered that Charlie was linking to my posts to prove something, I deleted it. That's why he thinks it's fake, so maybe I'm his hero now :rolleyes:

Actually it was proven in the forum over at S|A; he was just pointing at you as one of the suspects for creating this false slide.
 
I looked at the nApolean quote and tried to translate it:

To me the above is saying that it is a single core consisting internally of 2 cores, i.e. a single piece of silicon.

What on earth would be the point of doing something like that on a "single piece of silicon"? I mean, you'd be duplicating hardware you only need one of (for example UVD), when you could just make a bigger "traditional GPU" with a bit of extra space to play around with.
 
Actually it was proven in the forum over at S|A; he was just pointing at you as one of the suspects for creating this false slide.

Ah, ok...
I read the posting in the S|A forum but there's nothing more than some strange conclusions. Whatever.
 
What on earth would be the point of doing something like that on "single piece of silicon"? I mean, you'd be duplicating hardware you only need to have one in there (for example UVD), while you could just make bigger "traditional GPU" with a bit more extra space to play around with.

UVD and other low-bandwidth sections already hang off of a lower bandwidth ring bus in RV770.
In a scenario where there are multiple graphics partitions, there could conceivably be an uncore ring bus shared across the chip that extra units can be tied to.

If the ring bus is not beefed up, it would be insufficient for any duties other than sharing miscellaneous fixed-function blocks, and probably not enough to do multi-GPU rendering (or at least nothing more intense than what is already done).
 
Couldn't it simply be saying it's an X2 board, like HD4870X2?

They seem to use a different word to describe X2 boards - 胶水, which translates as 'glue', I think.

Also just today:
R800就是2个RV870在一个PCB上,并不是封装在一起,希望能帮一些YY过头的朋友从悬崖上拉回来,所以说ATI方面最强单GPU很明显是RV870
Roughly translated: R800 is just two RV870s on one PCB, not packaged together. Hopefully that pulls back from the cliff edge a few friends who have been fantasizing ('YY') too hard; in other words, the strongest single GPU on ATI's side is clearly RV870.
 
UVD and other low-bandwidth sections already hang off of a lower bandwidth ring bus in RV770.
In a scenario where there are multiple graphics partitions, there could conceivably be an uncore ring bus shared across the chip that extra units can be tied to.

If the ring bus is not beefed up, it would be insufficient for any duties other than sharing miscellaneous fixed-function blocks, and probably not enough to do multi-GPU rendering (or at least nothing more intense than what is already done).

But what would the big point of it be?
I mean, which seems more sensible in your eyes:
2x the SPs, TUs, RBEs, UVDs, PCIe interfaces, memory controllers, etc.
Or
2x the SPs, TUs, RBEs and 1x the rest, like we traditionally do?
 
That question has been around ever since HD3870X2 turned up.

You could argue that RV770 was much too large for the sweet spot strategy, in comparison with RV670. Ideally Juniper would be 256-bit...

Jawed
 
But what would the big point of it be?
I mean, which seems more sensible in your eyes:
2x the SPs, TUs, RBEs, UVDs, PCIe interfaces, memory controllers, etc.
Or
2x the SPs, TUs, RBEs and 1x the rest, like we traditionally do?

The ring bus would not be duplicated, so it would be 2x of the primary GPU elements (SPs, TUs, RBEs; edit: I'm not sure about these: *mem controllers, command processors?*), and the rest would remain the same.
The extra fixed-function blocks are already somewhat separate and wouldn't need to be duplicated. The ring bus would just need an extra stop for the extra GPU.

As for why it would make sense, there could be other reasons.
It saves a lot on design effort, as at its most rudimentary it is just crossfire on a chip with two smaller GPUs pasted onto the same uncore.

The other is that perhaps the pieces of global hardware on GPUs are not scaling very well, and might become serious problems on a single monster GPU:
the crossbar interfaces from the memory controllers to the SIMDs, the shared memory network, the command processor and the triangle setup section might become overly complex if forced to scale up to feed a single graphics cluster with 2x the needed bandwidth and setup rate.
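
To put a rough number on that last point (the channel and SIMD counts below are just RV770-ish guesses, nothing official): a full crossbar needs a path for every (memory channel, client) pair, so doubling both sides quadruples the wiring, while two partitions each keeping their own crossbar only doubles it.

Code:
# Back-of-the-envelope: paths in a full crossbar between m memory
# channels and n SIMD clients scale as m * n.
def crossbar_paths(channels, simds):
    return channels * simds

base    = crossbar_paths(4, 10)       # RV770-ish: 4 channels, 10 SIMDs
monster = crossbar_paths(8, 20)       # one monolithic GPU with 2x everything
split   = 2 * crossbar_paths(4, 10)   # two partitions, a crossbar each

print(base, monster, split)           # 40 160 80 -> 4x vs 2x the paths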
 
I don't see why ATI would want to put everything on a single piece of Si, period. The whole point of the sweet spot strategy is to reduce die size and use multiple dice where demanded by the market.

If you make a big piece of Si, then you should just make a big GPU like NV. ATI's strategy doesn't involve large pieces of Si, AFAICT. And that's probably good.

DK
 
The ring bus would not be duplicated, so it would be 2x of the primary GPU elements (SPs, TUs, RBEs; edit: I'm not sure about these: *mem controllers, command processors?*), and the rest would remain the same.
The extra fixed-function blocks are already somewhat separate and wouldn't need to be duplicated. The ring bus would just need an extra stop for the extra GPU.

As for why it would make sense, there could be other reasons.
It saves a lot on design effort, as at its most rudimentary it is just crossfire on a chip with two smaller GPUs pasted onto the same uncore.

The other is that perhaps the pieces of global hardware on GPUs are not scaling very well, and might become serious problems on a single monster GPU:
the crossbar interfaces from the memory controllers to the SIMDs, the shared memory network, the command processor and the triangle setup section might become overly complex if forced to scale up to feed a single graphics cluster with 2x the needed bandwidth and setup rate.

But considering how "modular" today's GPUs are, would it really make any significant difference design-wise, and would it possibly hamper performance?
 
I think it is very likely something's getting lost in translation and we are looking at an MCM i.e. multiple dies per package rather than a monolithic die.
 
Yeah, I read that translation as something like "You'd have to take off the heatsink/heatspreader to realise that it's not one piece of silicon".

If they really do have a 4-chip MCM X2 setup, then having redundant bits that you only need one of per GPU (and which are too big to have redundancy for on one chip) is possibly quite a good thing in terms of yield.

A 4-chip card could be made with 3 chips having a dead UVD, for example.
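
The arithmetic on that is simple enough to sketch (the per-chip UVD yield below is completely made up; it's only the shape of the argument that matters):

Code:
# If UVD comes out functional on a fraction q of otherwise-good chips,
# a 4-chip card that only needs one working UVD fails only if all
# four are dead: P(ok) = 1 - (1 - q)^4.
q = 0.90                          # assumed per-chip UVD yield (made up)
card_ok = 1 - (1 - q) ** 4
print(f"{card_ok:.2%}")           # 99.99%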
 
Wouldn't it be possible to achieve the same benefits as an MCM approach to multiple dice, without the problems (the large number of off-die pin-outs, high packaging costs, etc.), by keeping all the dice on the same piece of silicon? The benefits of the MCM approach are a single die design for multiple performance markets, with complete software transparency and single-chip style scalability (as opposed to Crossfire/SLI style scalability), achieved by treating multiple dice as if they were one larger die.

The MCM approach would be to design a base die with high-bandwidth off-die communications, cut up the wafer into individual dice, and package them in a single MCM package with cross-die communications.

In the single-silicon approach, you would again design a single base die with high-bandwidth communications (but these communications would stay on the die, a big advantage). You would then cut the wafer into dice that each consist of 2 base dice with high-bandwidth communications between them. If one of the base dice tested out bad, you would cut that die in 2, disable the comm link, and use the good die for the lower-performance chip. What's more, if you need more of the lower-performance chips, you just cut all the dice on the wafer in 2, since all the different performance chips would use the same wafers (they are just cut differently).

This is similar to the old idea of building a larger die, then disabling bad shaders and selling those dice as lower-performance chips. However, the above multiple-dice approach has significant advantages: 1) if any portion of an individual base die fails (not just the shaders), that base die can be discarded without discarding the other base die, which significantly reduces wasted silicon; 2) the same wafer design can be used for all levels of performance chips without wasted silicon.

The same approach could be used for 4x, 6x, 8x or any other scale of multiple dice. For instance, for 4x, you would build in communications between 2x2 squares of dice, and cut the wafer into 2x2 squares by default. If one die of a 2x2 group is bad, you cut it out, and sell a 2x chip and a 1x chip. If two are bad, you cut and sell either one 2x chip or two 1x chips, depending on the configuration. If 3 are bad, you cut and sell a 1x chip. All with no more wasted silicon than if all the dice were for the lowest-performance 1x chip, or than if MCM packaging were used.
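
For what it's worth, the claimed yield advantage is easy to put numbers on. A sketch assuming an independent per-die yield p (a free parameter, not a real figure), per 1000 fabbed pairs:

Code:
# Cut-into-pairs scheme: both dice good -> one 2x chip; exactly one
# good -> cut the pair and salvage a 1x chip; both bad -> scrap.
def pair_cut(p, pairs=1000):
    two_x = pairs * p * p
    one_x = pairs * 2 * p * (1 - p)
    scrap = pairs * (1 - p) ** 2
    return round(two_x), round(one_x), round(scrap)

for p in (0.9, 0.7, 0.5):
    print(p, pair_cut(p))
# 0.9 -> (810, 180, 10): all 1800 good dice end up in products.
# The 2x2 case is the same binomial logic with n = 4 dice per group.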
 
I've been thinking about the same kind of thing, e.g. fabbing as pairs, then cutting into either singles or pairs. Has this ever been done before? Decades ago there was the concept of wafer-scale integration, but that doesn't seem to have gone anywhere.

http://en.wikipedia.org/wiki/Wafer-scale_integration

Wouldn't such irregular cutting be quite expensive? How much testing can be done before cutting?

Also, binning pairs of chips could be quite a problem - it could lower the ceiling on achievable clock speeds substantially. Though you could argue this is ideal meat for a refresh pie, as the process matures and you simply bin for higher dual-die clocks in 6 months' time.
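
That binning worry is easy to illustrate with a toy Monte Carlo (the clock distribution is invented; the only real point is that a fused pair bins at the slower die's clock):

Code:
import random

# Toy model: each die's max stable clock ~ N(750, 50) MHz (invented).
# A dual-die part can only be binned at min(die_a, die_b).
random.seed(1)
singles = [random.gauss(750, 50) for _ in range(100_000)]
pairs   = [min(random.gauss(750, 50), random.gauss(750, 50))
           for _ in range(100_000)]

print(sum(singles) / len(singles))   # ~750 MHz
print(sum(pairs) / len(pairs))       # ~722 MHz: the pair ceiling drops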

Jawed
 
... If one of the base die tested out bad, then you would cut that die in 2, disable the comm link, and use the good die for the lower performance chip. ...
It's a fun idea to play with, but you can't make selective cuts on a wafer. You're using a slightly higher-precision version of your father's circular saw. Great for making long straight cuts from one end all the way to the other. Not so great for making an additional cut somewhere in the middle.
 