AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 0.6%
  • Within a month

    Votes: 5 3.2%
  • Within a couple of months

    Votes: 28 18.1%
  • Very late this year

    Votes: 52 33.5%
  • Not until next year

    Votes: 69 44.5%

  • Total voters
    155
  • Poll closed.
The blocks change colour depending on the angle of the light, so you can't use this picture's colour difference as the basis for anything.

If you compare lots of different areas of the wafer you'll start to discern structure in the die. Suffice to say that the "four large blocks" turn out to be different sizes and shapes with smaller blocks around the edges (i.e. the die no longer looks so simple).

Jawed
 
www.nordichardware.com has some more info:

http://www.nordichardware.com/news,9486.html

"AMD showed live DirectX 11 hardware at Computex, although with rigged FPS gage to hide the actual performance. The hardware was performance level, not high-end,"

"AMD is very confident about its coming DirectX 11 lineup and even though it used the same performance level hardware, and not high-end, it claimed it was the fastest single-GPU graphics card on the market. They said that the upcoming launch will be even more of a surprise than when they shocked the world with the 800sp RV770 GPU, still vague but very intriguing. "

/Kef

Very interesting, but when they say it isn't the "high end hardware", that doesn't necessarily mean it isn't the top-end single-chip solution. They could still be referring to the X2 variant as the high end. If that's the case, then it's nice to know the fastest single-chip RV870 will be faster than the GTX 285, but it's not particularly unexpected.

If, on the other hand, this isn't the fastest single-chip RV8xx, well, then that's excellent news :D
 
Very interesting, but when they say it isn't the "high end hardware", that doesn't necessarily mean it isn't the top-end single-chip solution. They could still be referring to the X2 variant as the high end. If that's the case, then it's nice to know the fastest single-chip RV870 will be faster than the GTX 285, but it's not particularly unexpected.

If, on the other hand, this isn't the fastest single-chip RV8xx, well, then that's excellent news :D

It's the "They said that the upcoming launch will be even more of a surprise than when they shocked the world with the 800sp RV770 GPU" that intrigues me. ;)
 
The downside of die shots is that natural light shots don't look particularly interesting, and the prettiest shots involve polarized light and digital editing.

Applying what is seen there to a one-off photo in iffy ambient light might not be the best.

If the visible color regions are on the logic side of the chip, they can point to the boundaries between areas of high regularity and density, like the SIMDs, and less regimented units.

In RV770, the SIMDs took up the center of the die.
The die shots of the latest 40nm Nvidia chips had an interesting visual parallel, with two groups of SIMDs, which did not line up symmetrically, but were off the center line with respect to each other.

Scale up the SIMD size to the proportions AMD uses, and push the SIMDs a little further to the sides and we'd get a similar arrangement to the blurry Cypress die shot.

Perhaps AMD no longer surrounds the SIMD hardware with a ring of other logic.
It might also indicate other bifurcations, like two ROP banks, and two SIMD banks.
 
I'm holding out for no RBEs :p

With 2000 ALU lanes you have ~60 ALU cycles to do blending, Z/stencil testing and MSAA sample manipulation/resolve to match the performance of 32 RBEs.

Or, 2000 scalar operations per clock compared with 32 colour blends and 128 MSAA samples tested/updated.
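A minimal Python sketch of that budget arithmetic, assuming the post's round numbers: 2000 ALU lanes, 32 RBEs, and 128 MSAA samples per clock (i.e. 4 samples per RBE). It only divides per-clock ALU throughput by per-clock RBE throughput; latency, scheduling overhead and memory bandwidth are ignored and would all eat into the budget.

```python
def alu_budget(alu_lanes, rbes, samples_per_rbe=4):
    """ALU ops available per clock for each pixel and each MSAA sample
    if the ALUs had to emulate the RBEs' blending and Z/stencil work."""
    per_pixel = alu_lanes / rbes                       # ops per colour blend
    per_sample = alu_lanes / (rbes * samples_per_rbe)  # ops per MSAA sample
    return per_pixel, per_sample

print(alu_budget(2000, 32))  # (62.5, 15.625)
```

So "~60 cycles" per blended pixel shrinks to roughly 16 ALU ops per sample once 4x MSAA is counted.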

Broad guess: I reckon the RBEs in RV770 are about 25% of the die (the 10 clusters are 40%, the 160 five-wide ALUs, i.e. 800 lanes, are 30%+, and the physical part of the memory IO is about 14%). That suggests the RBEs are equivalent in area to ~640 ALU lanes. So in the next GPU, 32 RBEs would be equivalent in area to ~1280 ALU lanes.

Add that to the 2000 :p

Let's call it 40 clusters (each of 80 ALUs) in round numbers, 10 clusters per 64-bit MC.
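The area estimate can be reproduced as a short Python sketch. Assumptions, all taken from or implied by the post: RV770 has 800 ALU lanes; the RBEs take 25% of the die; "30%+" for the ALUs is read as 31.25%, the share that gives the post's ~640 figure (a flat 30% would give ~667); RV770's RBE throughput counts as 16 pixels/clock, so 32 means doubling; and RBE area scales linearly with count.

```python
def rbe_area_estimate(rv770_lanes=800, rbe_share=0.25, alu_share=0.3125,
                      rv770_rbes=16, next_rbes=32, next_lanes=2000,
                      lanes_per_cluster=80):
    # RV770's RBE area expressed in "ALU lane equivalents".
    rbe_lanes_rv770 = rv770_lanes * rbe_share / alu_share
    # Assume RBE area scales linearly with RBE count in the next GPU.
    rbe_lanes_next = rbe_lanes_rv770 * next_rbes / rv770_rbes
    # Spend the RBE area on ALUs instead ("add that to the 2000").
    total_lanes = next_lanes + rbe_lanes_next
    clusters = total_lanes / lanes_per_cluster
    return rbe_lanes_rv770, rbe_lanes_next, total_lanes, clusters

rv770_eq, next_eq, total, clusters = rbe_area_estimate()
print(rv770_eq, next_eq, total, clusters)  # 640.0 1280.0 3280.0 41.0

# Rounding to 40 clusters at 10 clusters per 64-bit MC implies 4 MCs.
mcs = 40 // 10
print(mcs, mcs * 64)  # 4 256
```

41 clusters is what "call it 40 in round numbers" rounds off, and 10 clusters per 64-bit MC then works out to a 256-bit bus.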

Jawed
 
The color variations would seem to preclude ditching RBEs.

Unless something's particularly wacky, the color variation would indicate a physical difference in the arrangement and density of the silicon in that area.

Unless AMD threw two completely different SIMD designs into the die, it would follow that the physically regular SIMDs would have similar colors. We're left with maybe half the die that is the same color, and by extension only half the die for ALUs.

If there are no RBEs and only half the die is SIMDs, that's a really terrific ALU density for the SIMDs, and a lot of something else to fit in the other half.
 
You still need other stuff on the die, e.g. control and data routing plus texturing - and render target (de)compression.

But I agree that while the wafer shot initially presents "4 large blocks" it turns out each of them is smaller - and actually they don't look like they're all the same size.

Jawed
 
A possible RBE/IMC blocks location:

[Image: die shot annotated with possible RBE/IMC block locations]


As you can see, there are three types of logic block, each replicated four times around the perimeter, which suggests that this "complex" is the RBE. I guess the green ones are the IMCs, judging by their more even distribution across the die?!
 
The other theoretically heavyweight unit that's out in that part of the die is the shader export - the unit that buffers the data produced by each "stage" of shading (e.g. vertices) and routes them on for further processing. The fragment data produced by pixel shading is the most arduous in terms of volume, I guess, but vertices seem "more clumpy" I suppose.

MCs have their own buffering of some kind (re-ordering, coalescing) and perhaps the L2s are closely associated with them. The RBEs have colour, Z and stencil buffer caches. Then there's the (de)compression hardware with associated tag tables and there's also hierarchical-Z support.

Jawed
 
The color variations would seem to preclude ditching RBEs.

Unless something's particularly wacky, the color variation would indicate a physical difference in the arrangement and density of the silicon in that area.


It seems everyone has forgotten the Prescott die. It looked like a pile of ultra-fragmented shit, and Intel's stated reason was to improve power dissipation in an overly complex, deeply pipelined chip meant to scale to high clocks from the cool Northwood core. Well, we all know where that ended up :D

Anyway, this so-called pattern looks to me like a simple checkerboard solution (maybe for easy future build-ups around it): L2 cache plus however many pipelines in one block, and the exact same combination rotated 180° in the lower part of the die. That's just to spread power dissipation more evenly over the die and to lose the big cap ring needed to provide extra juice for a maybe overwhelmed infant architecture. It's not a lot, in fact, but it seems to work well for AMD. I'd like to see more rotating and navigating in R600-descended cores, like Elpida's weird DRAM architecture, but that's maybe too much to ask.
 
Very interesting, but when they say it isn't the "high end hardware", that doesn't necessarily mean it isn't the top-end single-chip solution. They could still be referring to the X2 variant as the high end. If that's the case, then it's nice to know the fastest single-chip RV870 will be faster than the GTX 285, but it's not particularly unexpected.

If, on the other hand, this isn't the fastest single-chip RV8xx, well, then that's excellent news :D

Yet they still seem to achieve it with a lower transistor budget than G200b and a much smaller die (even shrinking the 55nm G200b to 40nm would still produce a die bigger than 182 mm²).
Now we need some power numbers! I say it will be less than RV770 :smile:
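The die-size claim can be checked with ideal optical-shrink arithmetic, where area scales with the square of the feature-size ratio. A minimal Python sketch; the 470 mm² figure for 55nm G200b is an assumption (a commonly reported size), not something stated in this thread. The break-even value does not depend on it: any 55nm die larger than about 344 mm² stays above 182 mm² even after an ideal shrink to 40nm.

```python
def ideal_shrink(area_mm2, from_nm=55.0, to_nm=40.0):
    """Ideal optical shrink: area scales with (to_nm / from_nm) squared."""
    return area_mm2 * (to_nm / from_nm) ** 2

G200B_MM2 = 470.0  # assumption: commonly reported 55nm GT200b die size
RV870_MM2 = 182.0  # die size quoted in the post

shrunk = ideal_shrink(G200B_MM2)
# Largest 55nm die that would still fit in 182 mm^2 after an ideal shrink.
break_even = RV870_MM2 / (40.0 / 55.0) ** 2

print(f"G200b at 40nm (ideal): {shrunk:.0f} mm^2")     # 249
print(f"55nm break-even size:  {break_even:.0f} mm^2")  # 344
```

Real shrinks typically do worse than this ideal figure, since I/O pads and analog circuitry scale poorly, so the conclusion only gets stronger.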
 
It seems everyone has forgotten the Prescott die. It looked like a pile of ultra-fragmented shit, and Intel's stated reason was to improve power dissipation in an overly complex, deeply pipelined chip meant to scale to high clocks from the cool Northwood core. Well, we all know where that ended up :D
I would argue that a high-performance x86 CPU from years ago is less relevant than recent GPUs, which have much more in common in process, design targets, and engineering resources.

Anyway, this so-called pattern looks to me like a simple checkerboard solution (maybe for easy future build-ups around it): L2 cache plus however many pipelines in one block, and the exact same combination rotated 180° in the lower part of the die.
That pattern may not be that simple, as even in the blurry shots we have, there is evidence of variation between the sections.

GPUs have a lot of other units that need to be on-die, and I am not certain devoting half of the die to an L2 for a GPU would yield improvements in workloads that so far have gotten by fine without so much cache.
 
Yet they still seem to achieve it with a lower transistor budget than G200b and a much smaller die (even shrinking the 55nm G200b to 40nm would still produce a die bigger than 182 mm²).
Now we need some power numbers! I say it will be less than RV770 :smile:

That will indeed be quite an achievement if true. And if AMD gets the big lead on NV that is rumored, then we could literally be looking at a full DX11 GPU that's faster than the GTX 285 and comes in at a relatively mainstream price!

The 285 will be rendered instantly obsolete or forced to nosedive in price (with a knock-on effect on the 275 and 260 as well).

That's if AMD chooses to hit NV on the price front as well, of course. They may simply choose to charge more for RV870 given its superior performance and features.

I'm certainly considering picking one up if it's at the right price.
 
That pattern may not be that simple, as even in the blurry shots we have, there is evidence of variation between the sections.

You totally missed the idea in my post that you replied to. Seems intentional.

I was referring to a power-dissipation improvement from that checkerboard die layout, not to the inner complexity of the chip itself. Prescott even had its L2 cache dispersed, AFAIR, so I'll try to sketch what I was talking about.

+--------+--------+
| L2+ROP |  SPU   |
+--------+--------+
|  SPU   | L2+ROP |
+--------+--------+

So power dissipation is spread more evenly over the die. And I said that it doesn't seem like much, but it obviously helps ATi with this Rxxx generation ("It's not a lot in fact but seems it works well for AMD").

GT200 has a pretty advanced dispersion of its hot cores around the outer edges, with two more cores simply added when they figured out that RV770 has 10 SIMD clusters instead of the anticipated 8. And it seems from this that the cores dissipate much more energy than all that cross-like cache in the center of the chip.


That's if AMD chooses to hit NV on the price front as well, of course. They may simply choose to charge more for RV870 given its superior performance and features.

I'm certainly considering picking one up if it's at the right price.


What's the right price? :D


I'm somewhat doubtful about the availability or the right price of RV870, considering TSMC is still yapping about how it will improve 40nm yields while rapidly developing the 28nm High-K Low Power process announced for Q1 next year. And consider how low ATi priced its only year-old child, just to fill the budget for overly expensive 40nm, if they still pay for all the crap TSMC produces and not just for adequately working chips.

OTOH, the 9700 Pro wasn't cheaper than the Ti 4800 products either, while delivering astonishing performance improvements over them.
 
I was referring to a power-dissipation improvement from that checkerboard die layout, not to the inner complexity of the chip itself. Prescott even had its L2 cache dispersed, AFAIR, so I'll try to sketch what I was talking about.

+--------+--------+
| L2+ROP |  SPU   |
+--------+--------+
|  SPU   | L2+ROP |
+--------+--------+

So power dissipation is spread more evenly over the die. And I said that it doesn't seem like much, but it obviously helps ATi with this Rxxx generation ("It's not a lot in fact but seems it works well for AMD").
How do you come to this conclusion based on one poor image of the die? Do you even know which die was shown?

Idle speculation is one thing, but idle speculation followed by statements such as "And i said that it' does seem great but it obviously helps ATi with this Rxxx generation" is pointless.

-FUDie
 
You totally missed the idea in my post that you replied to. Seems intentional.
I was stating that the grainy visual evidence indicated that the situation was not as simple as you claimed, and some evidence might potentially go against it.
I also stated that devoting such a large fraction of die area to cache, as your claim asserts, would be very unusual for a GPU.

I didn't debate the merits of the claim because I wasn't sure that the evidence showed what you claimed.

I was referring to a power-dissipation improvement from that checkerboard die layout, not to the inner complexity of the chip itself. Prescott even had its L2 cache dispersed, AFAIR, so I'll try to sketch what I was talking about.
Prescott did not distribute the L2 throughout the die. It was localized to one side.

+--------+--------+
| L2+ROP |  SPU   |
+--------+--------+
|  SPU   | L2+ROP |
+--------+--------+

So power dissipation is spread more evenly over the die.
Possibly, but is this heat spread out evenly?
There will be two hot and two cold corners of the chip.
RV770 might be somewhat cooler on the edges, and in the one section with other logic, but why would this be less even than what you've claimed?
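The "two hot and two cold corners" question can be explored with a toy lumped thermal model. This Python sketch is purely illustrative: the grid size, the relative power densities (HOT, COOL), the lateral conductance k and the vertical conductance to the heat sink g are all hypothetical values, and the "central band" layout is an invented stand-in for a centered-SIMD arrangement, not a model of any real die. With insulated die edges, summing the cell equations shows the mean temperature rise is fixed at (mean power)/g, so the layout only changes the hot spot and the spread, which is exactly what the corner argument is about.

```python
# Toy lumped thermal model of a die; all values are hypothetical, not measured.
HOT, COOL = 1.0, 0.25  # relative power density of SIMD ("SPU") vs. L2+ROP blocks
N = 12                 # grid cells per side (divisible by 4)

def checkerboard(n):
    """Quadrant checkerboard from the sketch: L2+ROP | SPU over SPU | L2+ROP."""
    h = n // 2
    return [[HOT if (r < h) != (c < h) else COOL for c in range(n)] for r in range(n)]

def central_band(n):
    """Hypothetical alternative: hot SIMDs in a band across the middle half of the die."""
    q = n // 4
    return [[HOT if q <= r < n - q else COOL for c in range(n)] for r in range(n)]

def neighbours(n):
    """4-connected neighbours inside the grid (die edges treated as insulated)."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(r, c): [(r + dr, c + dc) for dr, dc in steps
                     if 0 <= r + dr < n and 0 <= c + dc < n]
            for r in range(n) for c in range(n)}

def steady_state(power, k=1.0, g=0.05, iters=2000):
    """Jacobi iteration for P_i + k * sum_j (T_j - T_i) - g * T_i = 0.

    k is the lateral conductance between neighbouring cells, g the vertical
    conductance to the heat sink; T is the temperature rise above the sink.
    """
    n = len(power)
    nb = neighbours(n)
    t = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        t = [[(power[r][c] + k * sum(t[a][b] for a, b in nb[(r, c)]))
              / (g + k * len(nb[(r, c)]))
              for c in range(n)] for r in range(n)]
    return t

def summary(t):
    cells = [x for row in t for x in row]
    return max(cells), min(cells), sum(cells) / len(cells)

for name, layout in (("checkerboard", checkerboard), ("central band", central_band)):
    t_max, t_min, t_mean = summary(steady_state(layout(N)))
    print(f"{name:12s} max={t_max:.2f} min={t_min:.2f} "
          f"spread={t_max - t_min:.2f} mean={t_mean:.2f}")
```

Both layouts dissipate the same total power, so which one comes out with the lower peak depends entirely on the chosen numbers. A real comparison would need measured power maps and a proper thermal solver.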

And I said that it doesn't seem like much, but it obviously helps ATi with this Rxxx generation ("It's not a lot in fact but seems it works well for AMD").
I'm not clear what this part means.

GT200 has a pretty advanced dispersion of its hot cores around the outer edges, with two more cores simply added when they figured out that RV770 has 10 SIMD clusters instead of the anticipated 8. And it seems from this that the cores dissipate much more energy than all that cross-like cache in the center of the chip.
I don't think Nvidia had time to change its design once RV770's specs were known to them.
The chips were released pretty close to one another, and GT200 had some amount of delay. The details of the design would have been set in stone long before, probably in the year prior to release. It might be even longer; the whole process takes 4-5 years these days.

The cross-like center area of the GT200 is also not all cache, or even mostly cache.
Much of that area is other forms of logic, with little islands of SRAM.
 
I guess the HD58XX will stay true to form in that AMD won't charge nV-like amounts of money ($650) for a high-end card at release, but prices should stay at a comfortable $300-$350 during the months when they are the only DX11 parts on the market.
 