AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks: 1 vote (0.6%)
  • Within a month: 5 votes (3.2%)
  • Within a couple of months: 28 votes (18.1%)
  • Very late this year: 52 votes (33.5%)
  • Not until next year: 69 votes (44.5%)

  • Total voters: 155
  • Poll closed.

How do you come to this conclusion based on one, poor image of the die? Do you even know what die was shown?

Idle speculation is one thing, but idle speculation followed by statements such as "And i said that it does seem great but it obviously helps ATi with this Rxxx generation" is pointless.

-FUDie

omg Did you read thread subject :LOL:

You don't read my posts, right? Well, I'll try to explain myself for the 3rd time for smarties like you.

Looking at how many problems just 800 SPs have with power distribution, and at the patch they issued as a cap-ring on RV790, I'd say all of this is just cheap design, and to add more building blocks they need to leave their "taifun flood" concept and rearrange the die a little bit. It's a hell of a lot cheaper than a cap-ring on the die ;)
And this cheap checkerboard solution will yet again be just enough for what looks like fewer than 1500 SP units.



I was stating that the grainy visual evidence indicated that the situation was not as simple as you claimed, and some evidence might potentially go against it.
I also stated that devoting such a large fraction of die area to cache, as your claim asserts, would be very unusual for a GPU.

How unusual? RV770 also had a huge cache, just spread differently, and G200 has a huge cache in that cross surrounded by 10 cores.

Possibly, but is this heat spread out evenly?
There will be two hot and two cold corners of the chip.
RV770 might be somewhat cooler on the edges, and in the one section with other logic, but why would this be less even than what you've claimed?

That's what I was trying to point out. There's a better and cheaper quick solution than the cap-ring that they needed to patch the R700 series. But somehow I don't believe that these hot-spots, as you call them, would cause any problem for them, because they have one large hot-spot in the center these days ;) And yes, that's why I point out that nVidia's G200 seems like a really greener solution from the design standpoint, but not perfect ofc. And that's why I mention Elpida's solution as a great one in the industry in recent years. Better performing, maybe just for memory blocks, than weird Prescott fragmented design. Yep Prescott has cache on one side but it was fragmented on die-shot unlike other classic CPU cache designs.

And I said that it does seem great but it obviously helps ATi with this Rxxx generation ("It's not a lot in fact but seems it works well for AMD")

I'm not clear what this part means.

It doesn't mean anything when you cut it this way ;) "this Rxxx" = the RV870-based generation, whatever they call it.

I don't think Nvidia had the time to change their design once RV770's specs were known to them.
The chips were released pretty close to one another, and GT200 had some amount of delay. The details for the design would have been set in stone for a long time, probably in the year prior to release. It might be more, the process takes 4-5 years these days.

Not such a long time ... maybe a few months, but it's better for them to just add two more cores (uneven placement of building blocks on the pretty symmetrical G200 die). Anyway, who added parts after whom ... it's just speculative, but G200 shouldn't be 584mm2, but around 500mm2 (472mm2 iirc, initial web specs-ulations in Jan-Feb 2008). And yep, it's harder for nVidia than for ATi to add two more processing cores, in my opinion too, as I stated somewhat before, as they have lighter SP units.
 
/off:
weird Prescott fragmented design. Yep Prescott has cache on one side but it was fragmented on die-shot unlike other classic CPU cache designs.
In fact, the Willamette/Northwood layout is way more "fragmented" than Prescott's one. The latter has been additionally streamlined by a significant margin, to achieve better clock signal distribution across the IC blocks.
 
Charlie says 4 DX11 GPUs from ATI in or before October.

No, he says cards, which is in line with what you would expect from the expected 2 GPUs (dual highend, highend, crippled highend (ddr3?), mainstream). Actually I would expect 5 (2 cards on the mainstream GPU). Of course it could also, as rpg says, be just one GPU.
 
How unusual? RV770 also had a huge cache, just spread differently, and G200 has a huge cache in that cross surrounded by 10 cores.
Are you using the words "cache" and "huge", the way they are commonly used?
GT200 has 256 KiB of L2 and 24x10 KiB of L1 texture cache. About half a meg of cache throughout the entire chip.
AMD's texture caches are possibly over half a meg.

There is much more capacity in the register files within the SIMDs for both chips.
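For anyone who wants the "about half a meg" figure spelled out, here is a minimal sketch that just adds up the GT200 numbers quoted above. The variable and function names are made up for illustration, and the figures are taken from this post as given, not from any vendor document.

```python
# GT200 cache figures as quoted in the post above.
L2_KIB = 256            # 256 KiB of L2
L1_TEX_KIB = 24 * 10    # "24x10 KiB" of L1 texture cache

def total_cache_kib(l2_kib, l1_kib):
    """Sum the L2 and L1 texture cache capacities, in KiB."""
    return l2_kib + l1_kib

total = total_cache_kib(L2_KIB, L1_TEX_KIB)
print(f"{total} KiB = {total / 1024:.2f} MiB")
```

That comes to 496 KiB, a little under half a MiB for the whole chip.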

That's what I was trying to point out. There's a better and cheaper quick solution than the cap-ring that they needed to patch the R700 series. But somehow I don't believe that these hot-spots, as you call them, would cause any problem for them, because they have one large hot-spot in the center these days ;) And yes, that's why I point out that nVidia's G200 seems like a really greener solution from the design standpoint, but not perfect ofc.
The decap ring would have been used for signal integrity issues, and whatever effect it had on hot-spots seems to be of secondary importance, as the thermals for RV790 appear proportionate to the increased clocks.

And that's why I mention Elpida's solution as a great one in the industry in recent years. Better performing, maybe just for memory blocks, than weird Prescott fragmented design. Yep Prescott has cache on one side but it was fragmented on die-shot unlike other classic CPU cache designs.
I don't see any significant fragmentation in the die shots I googled.

Not such a long time ... maybe a few months, but it's better for them to just add two more cores (uneven placement of building blocks on the pretty symmetrical G200 die). Anyway, who added parts after whom ... it's just speculative, but G200 shouldn't be 584mm2, but around 500mm2 (472mm2 iirc, initial web specs-ulations in Jan-Feb 2008). And yep, it's harder for nVidia than for ATi to add two more processing cores, in my opinion too, as I stated somewhat before, as they have lighter SP units.
I ran across commentary that somebody at Nvidia admitted that they shaved off 2 clusters to arrive at 30 due to reticle limits.
Since digital folks love their powers of 2, it would seem 32 would have been a prettier target than growing two clusters from 28.

It would be like how AMD initially planned for 8 SIMDs, before upping the count to 10 for RV770.
 
I ran across commentary that somebody at Nvidia admitted that they shaved off 2 clusters to arrive at 30 due to reticle limits.
Since digital folks love their powers of 2, it would seem 32 would have been a prettier target than growing two clusters from 28.
If this is true, it would also have meant a different ALU:TEX ratio for the hardware, as GT200's clusters contain 3 multiprocessors.

So, either 2:1 (like G92) giving 16 clusters, hence 128 TMUs - very unlikely I'd say. Or 4:1 giving 8 clusters and 64 TMUs (same as G92).

Would deleting 4 quad TMUs and 2 clusters but with 2 added multiprocessors have fitted? Each cluster would have increased in internal complexity to cope with the routing amongst 4 multiprocessors.

On the other hand there would have been less routing twixt clusters and ROP/MC partitions: 8:8 instead of 10:8.

Jawed
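A rough sketch of the cluster arithmetic in this post, for anyone following along. It assumes 8 TMUs per cluster, which is what the 16 clusters / 128 TMUs and 8 clusters / 64 TMUs figures above imply. The helper name is hypothetical, and this only illustrates the counting, not how the hardware is actually laid out.

```python
TMUS_PER_CLUSTER = 8  # assumption inferred from 128 TMUs / 16 clusters above

def cluster_config(total_sms, sms_per_cluster):
    """Return (clusters, tmus), or None if the multiprocessors don't divide evenly."""
    if total_sms % sms_per_cluster != 0:
        return None
    clusters = total_sms // sms_per_cluster
    return clusters, clusters * TMUS_PER_CLUSTER

# GT200 as shipped: 30 multiprocessors, 3 per cluster.
print("30 SMs, 3 per cluster:", cluster_config(30, 3))

# The hypothetical 32-multiprocessor part at different ALU:TEX ratios.
for sms_per_cluster in (2, 3, 4):
    print(f"32 SMs, {sms_per_cluster} per cluster:", cluster_config(32, sms_per_cluster))
```

With 32 multiprocessors, only the 2-per-cluster and 4-per-cluster arrangements divide evenly, so keeping 3 per cluster would not work without some other change.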
 
If this is true, it would also have meant a different ALU:TEX ratio for the hardware, as GT200's clusters contain 3 multiprocessors.
They could add the extra TMUs to keep the ratio the same.
Looking at the layout of the chip, putting in the extra clusters and not expanding the hardware area situated between them would just leave empty die space.
 
They could add the extra TMUs to keep the ratio the same.
Looking at the layout of the chip, putting in the extra clusters and not expanding the hardware area situated between them would just leave empty die space.
I don't understand what you're suggesting. What configuration with 32 multiprocessors has 3:1 ALU:TEX?

When you said 30 clusters earlier ("shaved off 2 clusters to arrive at 30"), I presume you meant 30 multiprocessors. There's only 10 clusters.

Jawed
 
I don't understand what you're suggesting. What configuration with 32 multiprocessors has 3:1 ALU:TEX?

When you said 30 clusters earlier ("shaved off 2 clusters to arrive at 30"), I presume you meant 30 multiprocessors. There's only 10 clusters.

Sorry, I got my head mixed up juggling the usages of "core" and "cluster".

The realworldtech piece said it was originally intended for there to be 32 SMs, which was what I was calling clusters because I was trying not to use the word "core" or "SIMD" to (fail to) avoid adding even more term confusion.
 
I am not sure if it was already posted but I couldn't find it. Here are some speculations from BSN based on German ATI-forum:
The Evergreen chip should feature 1200 cores, divided into 12 SIMD groups with 100 cores each [20 "5D" units], while RV770 was based on 10 SIMD groups with 80 cores each [16 "5D" units, each consisting of one "fat" core and four simpler ones].
Thus, it is logical to conclude that when it comes to execution cores, not much happened architecturally - ATI's engineers increased the number of registers and other demanding architectural tasks in order to comply with Shader Model 5.0 and DirectX 11 Compute Shaders.
The core is surrounded by 48 texture memory units, meaning ATI is continuing to increase the ROP:Core:TMU ratio.
For the first time, ATI is shipping a part with 32 ROP [Rasterizing OPeration] units, meaning the chip is able to output 32 pixels in a single clock.
 
That's one of the possibilities that was discussed previously based on earlier rumors.

One wrinkle is that for sane batch sizes, 20 lanes and 32 ROPs do not match up.
20 lanes with two 16 ROP banks can work with an increase in the time it takes for a ROP section to output a batch.
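To make the mismatch concrete, here is a small sketch under a loudly stated assumption: a batch covers 4 work-items per SIMD lane, as with the 16-lane, 64-wide batches on R6xx/R7xx parts. The rumored chip may not work this way at all. The function names are invented for illustration, and each ROP is assumed to output one pixel per clock.

```python
from fractions import Fraction

# Assumption: 4 work-items per lane per batch (16 lanes x 4 = 64 on R6xx/R7xx).
ITEMS_PER_LANE = 4

def batch_size(lanes):
    """Batch size for a SIMD with the given number of lanes."""
    return lanes * ITEMS_PER_LANE

def rop_clocks(batch, rops):
    """Clocks for `rops` ROPs to output one batch at one pixel per ROP per clock."""
    return Fraction(batch, rops)

for lanes in (16, 20):
    batch = batch_size(lanes)
    for rops in (32, 16):
        clocks = rop_clocks(batch, rops)
        fit = "whole" if clocks.denominator == 1 else "fractional"
        print(f"{lanes} lanes, batch {batch}, {rops} ROPs: {clocks} clocks ({fit})")
```

Under that assumption a 20-lane SIMD gives an 80-wide batch, which does not divide evenly across 32 ROPs but does across a 16-ROP bank, at 5 clocks per batch instead of the 4 a 64-wide batch would take.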

The die shot seems to indicate that certain things have been shuffled around at a higher level than SIMD organization.
 
Because branch granularity isn't the most important thing? It costs a lot of transistors, but performance is affected only in specific cases. ATi needs primarily to increase quantity of ROPs and arithmetic power, which is used for more and more purposes.
 
Hmmm... what happened to the 24ALUs per SIMD, 10 SIMDS, 8TMUs per SIMD rumor?

Re-looking at the numbers though, a 32ALU SIMD makes more sense to me.
Leaves room for a 10 and 12SIMD refresh if need be, hopefully on 28nm.

RV830(~180mm2)- 960ALUs in 6SIMDs, 48TMUs, to go with the 16ROPs and 128bit bus.
RV870(~240mm2)- 1280ALUs in 8SIMDs, 64TMUs, to go with the 32ROPs and 256bit bus.
RV810(~80mm2)- 320ALUs in 4SIMDs (or would RV730's 8SIMDs work better?), 16 or 32TMUs, with 8ROPs on a 64bit bus. Hate doing spec on lowend because there are too many different variants that might work.
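A quick sanity check of the per-SIMD numbers in the guesses above, as a sketch only. It assumes 5 ALUs per "5D" unit and takes the 16-TMU option for RV810. The dictionary and helper are made up for illustration, and all figures are the speculation from this post, not confirmed specs.

```python
ALUS_PER_UNIT = 5  # one "5D" unit = 5 ALUs

# Speculated configurations from the post above (RV810 with the 16-TMU option).
lineup = {
    "RV830": {"alus": 960, "simds": 6, "tmus": 48},
    "RV870": {"alus": 1280, "simds": 8, "tmus": 64},
    "RV810": {"alus": 320, "simds": 4, "tmus": 16},
}

def per_simd(spec):
    """Return (ALUs per SIMD, 5D units per SIMD, TMUs per SIMD)."""
    alus = spec["alus"] // spec["simds"]
    return alus, alus // ALUS_PER_UNIT, spec["tmus"] // spec["simds"]

for name, spec in lineup.items():
    alus, units, tmus = per_simd(spec)
    print(f"{name}: {alus} ALUs/SIMD ({units} 5D units), {tmus} TMUs/SIMD")
```

RV830 and RV870 both come out at 32 5D units and 8 TMUs per SIMD, while the RV810 guess lands on 16 units and 4 TMUs per SIMD.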

RV830 replaces RV770/RV790 in the under $150 range.
Lower RV870 takes the $200-$250 range.
RV870 takes the $300-$400 range.
RV740 keeps the under $100 while RV810 replaces RV730 in the under $70 range.
 
I'm personally hoping for an up to 2x performance increase over RV770, which under normal conditions is one of the targets IHVs set for each new generation.

So irrespective of the unit counts, I'm far more interested in what architectural changes they've gone through per unit, first and above all. I just don't buy the supposed RV770+DX11 compatibility rumors that float around.
 