AMD: R8xx Speculation

Jawed · Jul 26, 2009

Nice, so that seems to confirm the 181mm² chip is 128-bit, as it matches the first version of the tessellation video that was hastily withdrawn, the video with "EG BROADWAY".

But the mass production stuff is misleading there, I reckon. If these are the same as the desktop chips, then the chips have to be in production for desktop anyway. And laptops will follow later. The supposed early sighting of RV740 was because AMD was trying to get it into laptops because of their longer design cycles.

<50% performance gain over RV740? That would confirm 128-bit I think (as opposed to 192-bit or 256-bit), with maybe slightly faster memory. I'm not sure what the fastest laptop RV740 memory is, though. But the clocks appear to top out at 650MHz. 750MHz would be 15%. 800 ALU lanes instead of 640 would be 25%. The two combined is <50%.
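As a rough sanity check on that arithmetic, here is a minimal Python sketch. It assumes performance scales linearly with both core clock and ALU lane count, which is an idealisation and not something the post claims; the function name `combined_gain` is made up for illustration.

```python
# Idealised scaling estimate from the figures quoted in the post:
# 650 MHz -> 750 MHz core clock, 640 -> 800 ALU lanes.
# Assumption: performance scales linearly with each factor.

def combined_gain(clock_old_mhz, clock_new_mhz, lanes_old, lanes_new):
    """Return (clock ratio, lane ratio, combined ratio)."""
    clock_ratio = clock_new_mhz / clock_old_mhz
    lane_ratio = lanes_new / lanes_old
    return clock_ratio, lane_ratio, clock_ratio * lane_ratio

clock_ratio, lane_ratio, total = combined_gain(650, 750, 640, 800)

print(f"clock:    +{(clock_ratio - 1) * 100:.1f}%")  # +15.4%
print(f"lanes:    +{(lane_ratio - 1) * 100:.1f}%")   # +25.0%
print(f"combined: +{(total - 1) * 100:.1f}%")        # +44.2%
```

The two gains multiply rather than add, so the combined figure comes out at roughly 44%, still under the 50% ceiling.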

TDP in laptop is confusing: is that the chip or the chip + memory?

What are M2 and S3? The type of module? "MXM 2" and "soldered 3"?

Jawed

Vincent · Jul 26, 2009

Jawed said:
<50% performance gain over RV740? That would confirm 128-bit I think (as opposed to 192-bit or 256-bit), with maybe slightly faster memory. I'm not sure what the fastest laptop RV740 memory is, though. But the clocks appear to top out at 650MHz. 750MHz would be 15%. 800 ALU lanes instead of 640 would be 25%. The two combined is <50%.

Or more TMUs ???

w0mbat · Jul 26, 2009

They write 135% to 145% faster and not 135% to 145% of RV740. Read the original version.
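The two readings differ by a full 1x in the resulting multiplier. A small Python sketch of the difference (the function names are hypothetical, used only to label the two interpretations):

```python
# "N% faster" means baseline plus N%; "N% of" means N% of the baseline.

def multiplier_if_faster(percent):
    """Speed relative to baseline when something is 'percent % faster'."""
    return 1 + percent / 100

def multiplier_if_of(percent):
    """Speed relative to baseline when something is 'percent % of' it."""
    return percent / 100

for p in (135, 145):
    print(f"{p}% faster -> {multiplier_if_faster(p):.2f}x, "
          f"{p}% of -> {multiplier_if_of(p):.2f}x")
```

So "135% to 145% faster" would mean 2.35x to 2.45x the RV740 figure, while "135% to 145% of" would mean only 1.35x to 1.45x.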

BTW: RV8xx yields are better than RV740 @ NH

Jawed · Jul 26, 2009

But, well, I don't believe >100% faster for a 181mm² chip compared with RV740.

So maybe RV710 has been replaced by something very serious (pad limited, what could they do?). Or someone's just sloppy with percentages?

Jawed

no-X · Jul 26, 2009

As for the 180mm² GPU - 3 possibilities were mentioned:

1. 128bit - we know that 100mm² is sufficient for a 128bit bus (GDDR3). A GDDR5 interface could require a bit more space, so let's say 120mm² (my guess). We'd expect that the rest of the space could be used for some interconnection (MCM).

2. 192bit - the die space of this GPU seems to be sufficient for a 192bit bus, but this kind of decision wouldn't be typical for ATi. However, in this case the number of ROPs could be increased to 24. As we saw, the performance difference between HD4730 and HD4770 is quite significant in some games, and the only difference between these two products is the ROPs (8 vs. 16). Maybe the performance impact of ROPs was a bit underrated recently. As somebody mentioned, the tessellator will produce more edges, which would require more ROPs to keep MSAA performance at an acceptable level. The other possibility of course is eliminating ROPs and emulating their functionality via SPs. Anyway, there will definitely be no MCM interface in this case.

3. 256bit - the smallest 256bit GPUs are RV670 (192 mm²) and Parhelia (180 mm²). I'm not sure I can consider Parhelia a good example, because its memory controller was simple and didn't support GDDR5. I think it wouldn't be possible to cram a 256bit GDDR5 controller into a 180mm² GPU... but that's only my opinion.

Let's assume that the GPU has no MCM interface and the bus width is 192bit. I have no idea how complex the DX11 implementation will be, but I'll assume that it will take as much die space as the sideport on RV790. I'll also assume that the increased number of ROPs and the decreased width of the memory bus will compensate for each other in terms of die space. The situation would be quite simple then:

1 SIMD (SPs + texturing logic) of RV7xx takes about 9.3 mm². 6 additional SIMDs would take 56 mm² on 55 nm. RV790 is 288 mm², so an additional 56 mm² would make it 344 mm². A linear 40nm "shrink" would be 182 mm².
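The estimate above can be reproduced in a few lines of Python. This is only a sketch under the post's own assumptions: the per-SIMD area and RV790 die size are the figures quoted, and the "linear shrink" is taken to mean area scaling with the square of the feature-size ratio, which is an ideal case (I/O and analogue blocks typically shrink less). The helper name `linear_shrink` is invented for illustration.

```python
# Die-area estimate using the figures quoted in the post.
SIMD_AREA_55NM_MM2 = 9.3   # one RV7xx SIMD (SPs + texturing logic)
RV790_AREA_MM2 = 288.0     # RV790 die size at 55 nm
EXTRA_SIMDS = 6            # 10 -> 16 SIMDs

def linear_shrink(area_mm2, old_nm, new_nm):
    """Ideal optical shrink: area scales with the square of the node ratio."""
    return area_mm2 * (new_nm / old_nm) ** 2

extra_area = SIMD_AREA_55NM_MM2 * EXTRA_SIMDS
total_55nm = RV790_AREA_MM2 + extra_area
total_40nm = linear_shrink(total_55nm, old_nm=55, new_nm=40)

print(f"extra SIMD area at 55 nm: {extra_area:.1f} mm²")  # 55.8
print(f"hypothetical 55 nm die:   {total_55nm:.1f} mm²")  # 343.8
print(f"ideal 40 nm shrink:       {total_40nm:.1f} mm²")  # 181.8
```

Without the intermediate rounding the result is about 181.8 mm², which matches the post's 182 mm² and the rumoured 181mm² die.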

RV790 is 57% faster than RV740 (1680x1050 4xAA/16xAF, ComputerBase.de). 60% more texturing and math power plus 50% more ROP power could get into the +130% ballpark.

Someone could disagree because of the bandwidth. I presume that this (mainstream) part will be targeted only at 4x MSAA (not 8x or higher). We shouldn't forget that the 512MB HD4770 is capable of delivering 83% of the performance of the 512MB HD4870 with only 45% of its bandwidth (again, 1680x1050 4xAA/16xAF, ComputerBase.de). It should be possible to create a product which would be +120% faster at this bandwidth (enough to get into the +130% ballpark). 1GB of video memory affects performance by 17% (based on the same test, again).

I think it's possible to create a 180mm² 40nm GPU which would perform at least twice as fast as RV740 (16 SIMDs, 24 ROPs, 192bit bus, 1GB of 2200MHz GDDR5), but I have no idea how close these specs are to ATi's next-gen GPU.

w0mbat · Jul 26, 2009

Jawed said:
But, well, I don't believe >100% faster for a 181mm² chip compared with RV740.

So maybe RV710 has been replaced by something very serious (pad limited, what could they do?). Or someone's just sloppy with percentages?

Jawed

Me neither, but that's what they are writing. Maybe it's true, maybe a typo, or maybe just made up.

mboeller · Jul 27, 2009

w0mbat said:
They write 135% to 145% faster and not 135% to 145% of RV740. Read the original version.

BTW: RV8xx yields are better than RV740 @ NH

On Chipbell you can now find the same information but with a twist. It seems to be 135-145% higher performance per watt not 135-145% higher performance.

Link: http://translate.google.de/translat...&extra=page%3D1&sl=zh-CN&tl=en&hl=de&ie=UTF-8

rjc · Jul 27, 2009

mboeller said:
On Chipbell you can now find the same information but with a twist. It seems to be 135-145% higher performance per watt not 135-145% higher performance.

Link: http://translate.google.de/translat...&extra=page%3D1&sl=zh-CN&tl=en&hl=de&ie=UTF-8

I think the original post had "Per Watt" as well.

Note support for GDDR5 right down to the lowest chip. Also DDR3 across the range. I don't think GDDR3 is going to be around for too much longer (too expensive, and worse power characteristics too, I think).

Finally, re the flood of products coming: as Nvidia said, TSMC is in the process of upgrading 40nm capacity, so it will be a free-for-all by about September or so.

Because of the above, you can see AMD is coming with lower-volume products first (i.e. High End and Performance), which don't take too many wafers, while limited capacity is in effect. After, say, September/October, when capacity comes online, they can start their mainstream and entry-level products, which take lots of wafers.

It's quite interesting that, as with the R7xx series launch, AMD must have figured out methods to make their design process more modular, so they can quickly migrate their designs across to different market segments.

Pressure · Jul 27, 2009

Since the market is heading mobile, I could actually see the reason why they would want to ship their mobile offerings first.

Last year was the first year notebooks sold more than desktops, so if they have a large number of design wins it only makes sense.

hoom · Jul 27, 2009

Note support for GDDR5 right down to the lowest chip

That would save some die space. GDDR5-only memory controllers have to be simpler than ones that do GDDR3 (& other formats) too?

trinibwoy · Jul 27, 2009

hoom said:
That would save some die space. GDDR5-only memory controllers have to be simpler than ones that do GDDR3 (& other formats) too?

It's not exclusively GDDR5.

Jawed · Jul 27, 2009

I have made a small area breakdown spreadsheet for RV7xx GPUs, based on a high resolution die photo:

http://www.cupidity.f9.co.uk/RV7xxAreaBreakdown.xls

Note I have guessed which are the 4 MCs: the patches I've labelled "C".

I can't quite identify a fourth "B", which is a shame as A and B seem to go together as a pair.

I'm assuming that there is a fourth B, so have counted the area of A+B as RBEs (including colour/z/stencil buffer caches) + L2s.

The naive scenario I have included in there is called "Juniper is RV740 with extra clusters and no D3D11-specific changes". That scenario is 16 clusters, 128-bit, no sideport - it has room to spare. Obviously we know there are D3D11-specific changes (and others):

enhanced tessellator
HS and DS slots required in scheduler
LDS is 32KB
texture filtering is precisely defined (could be expensive? return to big TUs?)
16KB burst fetch mode

RV740 also has room to spare. Refinements to the spreadsheet welcome.

Jawed

mczak · Jul 27, 2009

rjc said:
Note support for GDDR5 right down to the lowest chip. Also DDR3 across the range. I don't think GDDR3 is going to be around for too much longer (too expensive, and worse power characteristics too, I think).

You're probably right. GDDR3 has worse power characteristics than DDR3 as far as I can tell, and DDR3 is almost certainly always cheaper. In fact it seems 512Mbit DDR3 parts are now even available for graphics cards, with frequencies up to 1.0GHz (without them you couldn't build 128-bit 512MB parts, since DDR3 chips only come in 16-bit wide flavors max, though I'm not sure this is even desired any longer). That's still slower than GDDR3 (up to 1.3GHz now), but the difference isn't that huge, and if you require more memory bandwidth it might now be more effective to use a narrower bus with GDDR5 memory. Which, btw, now seems to be available at 1.5GHz/6Gbps (not that this speed grade would be something for low-end chips, but that's more than twice the bandwidth per pin compared to the fastest GDDR3 parts available).
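The per-pin and capacity figures in that post can be checked with a short Python sketch. The assumptions here are mine: DDR3 and GDDR3 are treated as moving 2 bits per pin per quoted clock, and GDDR5 as moving 4 bits per pin per quoted clock (consistent with the "1.5GHz/6Gbps" pairing in the post). The function names are hypothetical.

```python
# Per-pin data rates and board capacity from the figures in the post.

def per_pin_gbps(clock_ghz, transfers_per_clock):
    """Data rate per pin in Gbps for a given quoted clock."""
    return clock_ghz * transfers_per_clock

ddr3 = per_pin_gbps(1.0, 2)    # assumed double data rate
gddr3 = per_pin_gbps(1.3, 2)   # assumed double data rate
gddr5 = per_pin_gbps(1.5, 4)   # assumed 4 transfers per quoted clock

def board_config(bus_width_bits, chip_width_bits, chip_density_mbit):
    """Return (chip count, total capacity in MB) for one chip per channel slice."""
    chips = bus_width_bits // chip_width_bits
    return chips, chips * chip_density_mbit // 8

print(f"DDR3 {ddr3:.1f} Gbps, GDDR3 {gddr3:.1f} Gbps, GDDR5 {gddr5:.1f} Gbps per pin")
print(f"GDDR5 vs GDDR3 per pin: {gddr5 / gddr3:.2f}x")
print("128-bit bus, x16 512Mbit chips:", board_config(128, 16, 512))
print("128-bit bus, x16 1Gbit chips:  ", board_config(128, 16, 1024))
```

With x16 chips a 128-bit bus needs eight devices, so 512Mbit parts give 512MB and 1Gbit parts would force 1GB, which is the point about needing the smaller density.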

Jawed · Jul 27, 2009

no-X said:
As for the 180mm² GPU - 3 possibilities were mentioned:

1. 128bit - we know that 100mm² is sufficient for a 128bit bus (GDDR3). A GDDR5 interface could require a bit more space, so let's say 120mm² (my guess). We'd expect that the rest of the space could be used for some interconnection (MCM).

Maybe it's possible to turn the I/O areas through 90 degrees so that they take less perimeter?...

2. 192bit - the die space of this GPU seems to be sufficient for a 192bit bus, but this kind of decision wouldn't be typical for ATi. However, in this case the number of ROPs could be increased to 24. As we saw, the performance difference between HD4730 and HD4770 is quite significant in some games, and the only difference between these two products is the ROPs (8 vs. 16). Maybe the performance impact of ROPs was a bit underrated recently. As somebody mentioned, the tessellator will produce more edges, which would require more ROPs to keep MSAA performance at an acceptable level. The other possibility of course is eliminating ROPs and emulating their functionality via SPs. Anyway, there will definitely be no MCM interface in this case.

My dodgy spreadsheet indicates a 10-cluster, 192-bit, 24 RBE Juniper would leave ~20mm² for D3D11 improvements :)

3. 256bit - the smallest 256bit GPUs are RV670 (192 mm²) and Parhelia (180 mm²). I'm not sure I can consider Parhelia a good example, because its memory controller was simple and didn't support GDDR5. I think it wouldn't be possible to cram a 256bit GDDR5 controller into a 180mm² GPU... but that's only my opinion.

If that was 32 RBEs then my spreadsheet indicates no, no space left for anything major for D3D11.

One of the problems with my spreadsheet is not knowing if there's a cap ring. It also assumes that RV740 has no sideport.

Someone could disagree because of the bandwidth. I presume that this (mainstream) part will be targeted only at 4x MSAA (not 8x or higher). We shouldn't forget that the 512MB HD4770 is capable of delivering 83% of the performance of the 512MB HD4870 with only 45% of its bandwidth (again, 1680x1050 4xAA/16xAF, ComputerBase.de).

Yes, RV740 is deceptively good, and a high RBE:bandwidth ratio is better than I dared hope for.

I just hope the 256-bit Evergreen GPU has 32 RBEs.

Jawed

Kaotik · Jul 27, 2009

Jawed said:
Maybe it's possible to turn the I/O areas through 90 degrees so that they take less perimeter?...

My dodgy spreadsheet indicates a 10-cluster, 192-bit, 24 RBE Juniper would leave ~20mm² for D3D11 improvements :)

If that was 32 RBEs then my spreadsheet indicates no, no space left for anything major for D3D11.

One of the problems with my spreadsheet is not knowing if there's a cap ring. It also assumes that RV740 has no sideport.

Yes, RV740 is deceptively good, and a high RBE:bandwidth ratio is better than I dared hope for. I just hope the 256-bit Evergreen GPU has 32 RBEs.

Jawed

This is something that I can't really understand: do we have any real reason to presume that it uses a similar architecture to RV7xx?
I mean both overall architecture and unit-wise.

For example, look at RV6xx and RV7xx: did anyone expect them to ditch the ringbus so quickly, or believe how much smaller they could make the shader units?

There could be several similarly big changes in the Evergreen generation, or even bigger ones.

Jawed · Jul 27, 2009

Charlie v Theo (actually, everyone v Theo)

http://www.semiaccurate.com/2009/07/27/plagiarism-rampant-it-journalism/

Forget that, what about this:

[...] Hemlock.", is not a low end part, is the highest end part. Hemlock is the code for the dual Cypress board, likely to be called 5870X2. The low end parts are Cedar and Redwood, mid-range is Juniper, and the high end single chip is Cypress. Hemlock is above that, a dual Cypress/X2 board, not the low end.
[...]
I also confirmed that there is no disinformation campaign floating around, no leaked slides with the wrong info, just the correct information. More to the point, Sylvie was the only person to officially get the names so far.

neliz · Jul 27, 2009

Lol, I heard that about HemRock too today, but not from Charlie. It was named as a hypothetical HD5900 card...

Jawed · Jul 27, 2009

Kaotik said:
There could be several similarly big changes in the Evergreen generation, or even bigger ones.

Sure. Feel free to speculate in that direction!

e.g. Texture filtering in D3D11 is strictly defined, the reference rasteriser is meant to be absolute in this I believe.

Though R600 doesn't show "highest possible quality" texture-filtering, I've been wondering if R600's fundamental architecture for texture filtering, which works in the fp16 domain, was ATI's step towards strictly defined texture filtering. Perhaps R600 can do D3D11-strict texture filtering (hmm, doubtful, I know).

If Evergreen is the return of fp16 TUs, then that's a lot of die space... RV770's conversion to old style TUs supposedly increased TU performance by 70% per mm².

Jawed

trinibwoy · Jul 27, 2009

Jawed said:
I've been wondering if R600's fundamental architecture for texture filtering, which works in the fp16 domain, was ATI's step towards strictly defined texture filtering.

Sorry if this is obvious, but why would DX11's stricter requirements mandate full-speed FP16? DirectX guidelines are all about quality, not performance, right?

fellix · Jul 27, 2009

R600 had additional point samplers in there, not just the wider bilerp lanes. As for a possible native FP16 texturing implementation in R800, it would be justified if game devs are lured by the new FP compression formats and start using HDR texturing more extensively all around.

