Llano IGP vs SNB IGP vs IVB IGP

AnarchX · Oct 29, 2010

How do you think they will compare?

Llano:
- 32nm
- 400SPs (5D VLIW) @ up to 600MHz
- dual-channel DDR3 @ ~ 1.6Gbps
- mid 2011

Intel Graphics HD 200:
- 32nm
- 12 EUs (4D MADDs?), doubled throughput over last generation, 4 TMUs, clocks up to 1.35GHz
- Direct3D 10.1 support, OpenCL, DirectCompute
- connected to 8MiB LL-cache
- dual-channel DDR3 @ ~ 1.6Gbps
- early 2011

Ivy Bridge Graphics:
- 22nm
- 16 EUs according to Intel
- Direct3D 11 support
- stacked DRAM?
- early 2012

Chabi · Oct 29, 2010

SNB IGP OpenCL compatible?

AnarchX · Oct 29, 2010

This graphics core does, however, add antialiasing support so that it can move to DirectX 10.1. It also supports OpenGL 3.1 and, more interestingly, OpenCL. DirectCompute in version 4.1 is also on the menu.

http://www.hardware.fr/articles/803-5/idf-2010-atom-sandy-bridge-honneur.html

mczak · Oct 29, 2010

AnarchX said:
Intel Graphics HD 200:
- 12 EUs (4D MADDs?), doubled throughput over last generation, 4 TMUs, clocks up to 1.35GHz

The EUs still can't do MAD. They can, however, do MAC (with a special accumulator reg), and, in contrast to the last generation, enable/disable accumulator update per instruction, which might make it easier to exploit this. Earlier EUs were 4D physical, 8D logical (well, they had a 4D mode, but such a 4D instruction still took 2 cycles), so it's possible (but I don't know) that they are 8D physical now (which would explain the "double throughput", but maybe that quote was meant to describe something else).
I'm quite sure there were 8 TMUs even for i965 already (though not sure what they could do per clock), and I certainly wouldn't expect SNB to have fewer (in theory, it could have more: since it appears some versions will have 6 EUs and others 12 EUs, it's possible, at least on paper, that the TMU block isn't shared).
In any case, texture fillrate should be quite good even with 8 TMUs (possibly approaching Llano levels), with the caveat I've no idea about FP16 etc. For flops, if that's 4D units, you're looking at ~120GFlops if you count that MAC as 2 ops. If that's 8D units, well then that's twice that which would begin to look nearly comparable to Llano.
So for Ivy Bridge, if that basically doubles SNB graphics performance, that could be quite a challenge for Llano. Though of course there's a lot more to graphics performance than just ALUs/TMUs - one area where Intel was very weak was what AMD initially named HyperZ, things like early-z (though Intel can do this now), z buffer compression etc. to save bandwidth. I think though SNB improves this quite a bit, and the 8MB cache could give it a huge advantage in some situations since these chips are quite a bit bandwidth-challenged.
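
The flops estimate above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch, assuming peak rate = units x lanes x ops per lane per clock x clock. The function name `peak_gflops` is made up for illustration, and the 4D/8D EU widths are the guesses from the post, not confirmed figures.

```python
def peak_gflops(units, lanes, ops_per_lane, clock_ghz):
    """Theoretical peak in GFLOPS, assuming every lane is busy every clock."""
    return units * lanes * ops_per_lane * clock_ghz

# SNB: 12 EUs at up to 1.35 GHz, counting a MAC as 2 ops (figures from the thread)
snb_4d = peak_gflops(12, 4, 2, 1.35)  # if the EUs are 4D physical (a guess)
snb_8d = peak_gflops(12, 8, 2, 1.35)  # if the EUs are 8D physical (a guess)

# Llano as listed in the opening post: 400 SPs at up to 0.6 GHz, MAD = 2 ops
llano = peak_gflops(400, 1, 2, 0.6)

print(f"SNB 4D: {snb_4d:.1f}  SNB 8D: {snb_8d:.1f}  Llano: {llano:.1f}")
```

With these inputs the 4D case comes out at about 130 GFLOPS (the same ballpark as the ~120 quoted), the 8D case at about 260, and Llano at 480. These are paper peaks only and say nothing about TMUs, bandwidth or drivers.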

AnarchX · Nov 9, 2010

Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg

- still 32nm
- probably L3-Cache connection for IGP
- probably increased die-size (Thuban level ~300mm²) which should allow to increase SIMDs from 6 to 10 (800SPs @ 5D, 640SPs @4D)
- probably mid 2012 release
- Komodo probably with 3 memory channels or GDDR5 sideport

chavvdarrr · Nov 9, 2010

AnarchX said:
Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg

- still 32nm
- probably L3-Cache connection for IGP
- probably increased die-size (Thuban level ~300mm²) which should allow to increase SIMDs from 6 to 10 (800SPs @ 5D, 640SPs @4D)
- probably mid 2012 release
- Komodo probably with 3 memory channels or GDDR5 sideport

I had a feeling that Zacate has 2 SIMDs with 80SPs total

AnarchX · Nov 10, 2010

chavvdarrr said:
I had a feeling that Zacate has 2 SIMDs with 80SPs total

The topic is about higher performance APUs/CPU-IGP-chips: Llano IGP vs SNB IGP vs IVB IGP.

hkultala · Nov 10, 2010

AnarchX said:
Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg

- still 32nm

Yes, of course.

- probably L3-Cache connection for IGP

I see nothing suggesting this.

- probably increased die-size (Thuban level ~300mm²) which should allow to increase SIMDs from 6 to 10 (800SPs @ 5D, 640SPs @4D)

.. except that Llano will not have 6 but 3 SIMD cores (240 ALUs).
And I don't expect them to increase die size much; it would be too costly to manufacture.

My estimate is increase from 3(*80) to 4(*64)

- probably mid 2012 release
- Komodo probably with 3 memory channels or GDDR5 sideport

AMD has never used non-power-of-2 memory buses before. I don't expect them to do it with Komodo either.
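
For reference, the SIMD counts being argued over map to SP totals like this. A minimal sketch, assuming 80 SPs per 5D SIMD and 64 SPs per 4D SIMD as used in the posts above; `total_sps` is just an illustrative helper name.

```python
# SPs per SIMD as used in the thread: 80 for 5D VLIW, 64 for 4D VLIW
SPS_PER_SIMD = {"5D": 80, "4D": 64}

def total_sps(simds, vliw):
    """Total shader processors for a given SIMD count and VLIW width."""
    return simds * SPS_PER_SIMD[vliw]

configs = {
    "Llano, 3 SIMDs 5D": total_sps(3, "5D"),
    "Llano, 6 SIMDs 5D": total_sps(6, "5D"),
    "Trinity guess, 4 SIMDs 4D": total_sps(4, "4D"),
    "Trinity guess, 10 SIMDs 5D": total_sps(10, "5D"),
    "Trinity guess, 10 SIMDs 4D": total_sps(10, "4D"),
}

for name, sps in configs.items():
    print(f"{name}: {sps} SPs")
```

So the 3(*80) to 4(*64) estimate is a step from 240 to 256 SPs, while the 6 to 10 SIMD guess would mean 480 going to 800 (5D) or 640 (4D).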

hkultala · Nov 10, 2010

AnarchX said:
- Komodo probably with 3 memory channels or GDDR5 sideport

And there won't be a sideport in a chip which does not contain a GPU.

AMD's PDF document for the investor day:

http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9Njk3NDJ8Q2hpbGRJRD0tMXxUeXBlPTM=&t=1

AMD nov 9 pdf said:
“Komodo”
Market: Server and Performance Desktops
What is it? “Komodo” is AMD’s next generation CPU and is primarily intended for
servers and high-performance desktops. “Komodo” will feature next-generation
“Bulldozer” CPU cores and, in desktop PC platforms, is designed to couple with
DirectX® 11 GPUs to provide enthusiast-level system performance.
Planned for introduction: 2012

caveman-jim · Nov 10, 2010

"designed to couple with" doesn't prove the existence of sideport.

keritto · Nov 11, 2010

AnarchX said:
Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg

Komodo is listed as a CPU, and you should differentiate it from Llano and NG-Trinity, as can be seen in the slides.

Komodo is a CPU, and I'm guesstimating that it will probably be augmented with a GPU similar to the one used in the Ontario/Zacate APUs: up to 80SPs (5D-VLIW), but more probably 64SPs "3rd Gen DX11" 4D-VLIW, with the TMU:ROP setup otherwise unchanged from O/Z. My guess is that Komodo will address the lack of IGPs in the new chipsets and also make it more comparable to Intel's SB. And it will be socket compatible with Zambezi (AM3r2).

As for Trinity APU as it's in slides 2-4 BD cores, i in fact hope for 4-6 BD cores and "3rd Gen DX11" (SI) with maybe some minor upgrade from 480SPs 5D (EG/"NI" shaders) in Llano to 640SPs 4D (SI shaders). But then maybe AMD will stay to 2-4 BD cores just so they could add up necessary 4MB of L3 cache to it instead of extra 2 BD cores.

Trinity
2-4BD cores (4MB L2 cache)
4MB L3 cache
640SP (4D DX11 gen3)
sFM1/sFS1

or better (?)
4-6BD cores (6MB L2 cache)
no L3 cache
640SP (4D DX11 gen3)
sFM1/sFS1

The second solution would certainly need less work to adapt the Llano-style APU design to the Trinity design.

And does the GPU really benefit from an additional 4MB L3 instead of the already large 6MB L2 (total for six BDv1 cores) available in the HPC case? For most 3D/gaming work, Llano and probably Trinity will rely on cheap 128-bit DDR3 1866MHz memory, giving about 30GB/s of bandwidth in total (shared with the CPU), which is probably good enough for budget dual-display 1080p noAA/noAF gaming (considering the touted 640SPs), or single-display 1080p 2AA/16AF?
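
The ~30GB/s figure follows from bus width times transfer rate. A rough sketch, assuming "1866MHz" means 1866 million transfers per second (DDR3-1866) and that dual-channel means 2 x 64 bits; `peak_bandwidth_gbs` is an illustrative name, not anything from a real tool.

```python
def peak_bandwidth_gbs(bus_width_bits, mega_transfers_per_sec):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9

# 128-bit (dual-channel) DDR3-1866, as assumed in the post above
ddr3_1866 = peak_bandwidth_gbs(128, 1866)

# 128-bit DDR3-1600, the ~1.6Gbps figure from the opening post
ddr3_1600 = peak_bandwidth_gbs(128, 1600)

print(f"DDR3-1866: {ddr3_1866:.1f} GB/s, DDR3-1600: {ddr3_1600:.1f} GB/s")
```

That gives roughly 29.9 GB/s for DDR3-1866 and 25.6 GB/s for DDR3-1600, both peak numbers that the IGP has to share with the CPU cores.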

hkultala · Nov 11, 2010

keritto said:
As for Trinity APU as it's in slides 2-4 BD cores, i in fact hope for 4-6 BD cores and "3rd Gen DX11" (SI) with maybe some minor upgrade from 480SPs 5D (EG/"NI" shaders) in Llano to 640SPs 4D (SI shaders). But then maybe AMD will stay to 2-4 BD cores just so they could add up necessary 4MB of L3 cache to it instead of extra 2 BD cores.

More than 4 Bulldozer cores/2 Bulldozer modules would make it too big.
It's still manufactured at 32nm, and it's not a high-end product, so it must not be too big/too expensive to manufacture.

And I don't see L3 cache as "necessary thing" for this market segment. With 2*2 MB L2 cache there is already plenty of cache.

mczak · Nov 11, 2010

keritto said:
As for Trinity APU as it's in slides 2-4 BD cores, i in fact hope for 4-6 BD cores and "3rd Gen DX11" (SI) with maybe some minor upgrade from 480SPs 5D (EG/"NI" shaders) in Llano to 640SPs 4D (SI shaders).

I really don't see the 480SPs in Llano - not with the flop numbers AMD quoted. More like 240SP IMHO.

hkultala said:
And I don't see L3 cache as "necessary thing" for this market segment. With 2*2 MB L2 cache there is already plenty of cache.

Well, the advantage of L3 is that you can use it for graphics too - L2 being exclusive to the cpu cores. This also probably means you can make the L2 cache attached to the ROPs smaller if you've got shared L3 and it's still faster (as the gpu l2 cache wasn't that large). Clearly, for Phenom II / Athlon II the L3 cache did not really help THAT much - but that balance should shift towards the solution with L3 cache in terms of performance benefits / area if you can also use it for the graphic core. It might require some changes to the MC/graphic core though, which might be something AMD isn't willing to do (as they couldn't just use basically unchanged discrete gpu cores).

hkultala · Nov 11, 2010

mczak said:
Well, the advantage of L3 is that you can use it for graphics too - L2 being exclusive to the cpu cores.

What makes this an advantage?

hkultala · Nov 11, 2010

mczak said:
I really don't see the 480SPs in Llano - not with the flop numbers AMD quoted. More like 240SP IMHO.

Yep.

And the size of the GPU part of the chip also seems to indicate it has 240 shader ALUs, not 480.

Alexko · Nov 11, 2010

mczak said:
I really don't see the 480SPs in Llano - not with the flop numbers AMD quoted. More like 240SP IMHO.

They said 500+ GFLOPS. That sounds to me like 480SPs @ ~550MHz or maybe 400SPs @ ~630MHz.

240SPs at ~1040MHz just doesn't seem realistic, power-wise.

That GPU-part looks to be around 100mm², which is close to Redwood's size, but on 32nm.

mczak · Nov 12, 2010

Alexko said:
They said 500+ GFLOPS. That sounds to me like 480SPs @ ~550MHz or maybe 400SPs @ ~630MHz.

240SPs at ~1040MHz just doesn't seem realistic, power-wise.

The quote was 400-500 GFlops. And from how it was worded, it was for the whole chip. Which leaves 300-400Gflops for the GPU. With 240SPs that gives you 625-830Mhz. Sounds doable to me.

That GPU-part looks to be around 100mm², which is close to Redwood's size, but on 32nm.

You are right it looks quite big.
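
The clock figures in this exchange all come from solving the same peak-flops formula for the clock. A minimal sketch, assuming 2 flops per SP per clock (one MAD); `required_clock_mhz` is a made-up helper and the SP counts are the candidates floated in the thread, not confirmed specs.

```python
def required_clock_mhz(target_gflops, sps, ops_per_sp=2):
    """Clock in MHz needed to reach target_gflops with the given SP count."""
    return target_gflops * 1000 / (sps * ops_per_sp)

# GPU-only targets discussed: 300-400 GFLOPS (whole-chip reading)
# and 500 GFLOPS (GPU-only reading)
targets = (300, 400, 500)
candidates = (240, 400, 480)

table = {
    sps: [required_clock_mhz(t, sps) for t in targets]
    for sps in candidates
}

for sps, clocks in table.items():
    row = ", ".join(f"{t} GFLOPS -> {c:.0f} MHz" for t, c in zip(targets, clocks))
    print(f"{sps} SPs: {row}")
```

For 240 SPs this gives 625 MHz and about 833 MHz for the 300-400 GFLOPS range, and about 1042 MHz for 500 GFLOPS, which is where both sides of the argument get their numbers.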

Alexko · Nov 12, 2010

mczak said:
The quote was 400-500 GFlops. And from how it was worded, it was for the whole chip. Which leaves 300-400Gflops for the GPU. With 240SPs that gives you 625-830Mhz. Sounds doable to me.

You are right it looks quite big.

There was another comment during analyst day, where the guy said 500+ GFLOPS, worded in a way that makes me think it was just for the GPU. I don't have time right now, but I'll try to find it and link it later today.

mczak · Nov 12, 2010

Alexko said:
There was another comment during analyst day, where the guy said 500+ GFLOPS, worded in a way that makes me think it was just for the GPU. I don't have time right now, but I'll try to find it and link it later today.

Even with 500+ gflops for the gpu, shouldn't 400 SPs be more than sufficient? That would only need 625Mhz. Shouldn't the 32nm SOI process actually allow clock increases over 40nm bulk? Granted the structure doesn't really look like that. But it would be strange imho if there would be so many simds (hence increasing cost) but then they'd be clocked so low.

Alexko · Nov 12, 2010

mczak said:
Even with 500+ gflops for the gpu, shouldn't 400 SPs be more than sufficient? That would only need 625Mhz. Shouldn't the 32nm SOI process actually allow clock increases over 40nm bulk? Granted the structure doesn't really look like that. But it would be strange imho if there would be so many simds (hence increasing cost) but then they'd be clocked so low.

400 SPs seems plausible, but 240 doesn't, IMO.

I can't find a free transcript for Tuesday's analyst day, but I think the quote in question was during the Client platforms breakout session, for which the webcast is still available.