Llano IGP vs SNB IGP vs IVB IGP

TKK · Jan 8, 2011

AnarchX said:
So Intel may counter Llano with a clock upgrade?

I agree with Alexko, highly unlikely.
Not really feasible in the mobile space, where power efficiency is so important, and I expect the desktop version of Llano to be clocked quite a bit higher than the mobile version, so it wouldn't help much in the desktop space, either.

It will simply be like this:
- People who know a bit and want the fastest CPU with a somewhat acceptable IGP (or don't care about the IGP at all) will buy SB.
- People who know a bit and just want a fast enough CPU with decent IGP will buy Llano.
- And those who don't know anything will base their decision either on brand recognition or price.

Conclusion: Intel wouldn't gain anything by brute-forcing their IGPs to significantly higher clocks, so they won't do it.

AnarchX · Jan 8, 2011

According to Intel, HD Graphics 3000 delivers 105-125 GFLOPs SP MAD: http://software.intel.com/file/33239 page 11

mczak · Jan 8, 2011

AnarchX said:
According to Intel, HD Graphics 3000 delivers 105-125 GFLOPs SP MAD: http://software.intel.com/file/33239 page 11

12 (EU) * 4 (physical width of EU) * 2 (mul + accumulate) * 1.3 GHz.
Strange, I don't see where the quoted doubling (per clock) over the previous generation comes from. Maybe Intel counts differently for the new gen.
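
For reference, the arithmetic above can be written out as a small Python sketch. The function name `peak_gflops` and its parameters are made up for illustration, and the 1.1 GHz clock used for the low end of Intel's 105-125 range is an assumption, not something taken from the slides.

```python
def peak_gflops(eus, lanes_per_eu, flops_per_lane, clock_ghz):
    """Theoretical peak single-precision GFLOPs: units * width * ops * clock."""
    return eus * lanes_per_eu * flops_per_lane * clock_ghz


# Figures from the post: 12 EUs, 4-wide, mul + accumulate = 2 flops, 1.3 GHz.
high = peak_gflops(12, 4, 2, 1.3)
# Assumed lower clock (1.1 GHz), to see where the low end of the range might come from.
low = peak_gflops(12, 4, 2, 1.1)

print(f"{low:.1f} - {high:.1f} GFLOPs")
```

With those inputs the two ends come out at about 105.6 and 124.8 GFLOPs, which at least lines up with the quoted range without needing any per-clock doubling.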

EduardoS · Jan 8, 2011

Or is counting fps instead of flops...

mczak · Jan 13, 2011

Blazkowicz said:
Maybe the reviewer used a 5450 with GDDR3 in the first link, and a 5450 with DDR2 was used in yours.

No, actually it looks like it was just specific to COD:MW2. Computerbase has their full review up for SB graphics (http://www.computerbase.de/artikel/grafikkarten/2011/test-sandy-bridge-grafik/12/) and indeed in every other title the AA hit is roughly comparable to the HD 5450. Could be either the driver or something the game is doing that SB doesn't like, but generally the AA implementation seems fine.

Tridam · Jan 14, 2011

mczak said:
12 (EU) * 4 (physical width of EU) * 2 (mul + accumulate) * 1.3 GHz.
Strange, I don't see where the quoted doubling (per clock) over the previous generation comes from. Maybe Intel counts differently for the new gen.

AFAIK previous Intel IGPs can't do single-cycle MAD.

mczak · Jan 14, 2011

Tridam said:
AFAIK previous Intel IGPs can't do single-cycle MAD.

Neither can the current one...
Unless you count the accumulator (whose usefulness has improved, but not in any way that would change the theoretical peak flop rate, as far as I can tell).
IIRC the docs said you couldn't do a back-to-back accumulator write followed by an accumulator read without stalling (which might no longer be the case, dunno), but for a constant stream of MACs this should not affect the peak flop rate.

I.S.T. · Jan 18, 2011

http://www.lostcircuits.com/mambo//index.php?option=com_content&task=view&id=99&Itemid=1

rpg.314 · Jan 22, 2011

Regarding the Ivy Bridge GPU, we know it will be DX11, so:

a) How would Intel implement local memory? Dedicated RAM (a la Cayman/Fermi) or unified with the rest of the cache hierarchy like LRB? 64 KB of L1 seems doable at a somewhat low frequency (~1.1 GHz for full turbo).

b) Considering the jumps Intel has made with SB, I would expect a unified CPU/GPU address space.

ltcommander.data · Jan 25, 2011

AnarchX said:
According to Intel, HD Graphics 3000 delivers 105-125 GFLOPs SP MAD: http://software.intel.com/file/33239 page 11

So Intel's pretty clear on Sandy Bridge's IGP supporting Compute Shader 4.x, presumably meaning both CS4.0 and CS4.1, mentioning it on pg 14 and pg 16. Has anyone actually tested this capability yet, since I don't remember seeing it in any reviews?

http://arstechnica.com/apple/news/2010/12/apple-may-drop-nvidia-for-sandy-bridges-igp-next-year.ars
http://www.realworldtech.com/page.cfm?ArticleID=RWT120710035639

And what does this say about Sandy Bridge IGP OpenCL support? Chris Foresman and David Kanter were pretty adamant that Sandy Bridge's IGP couldn't/wouldn't support OpenCL, with Foresman going so far as to say Sandy Bridge couldn't do any GPGPU at all, which now appears incorrect given the CS4.x support. If I'm not mistaken, all GPUs that have Compute Shader support (CS4.0 in the G80 and up, CS4.1 in the RV770 and up) have also supported OpenCL. Is this generally true, that CS4.x and OpenCL have enough overlap that Intel should be able to support OpenCL as well?

Axel · Jan 27, 2011

mczak said:
Neither can the current one...

http://intellinuxgraphics.org/IHD_OS_Vol4_Part2_July_28_10.pdf
lists mad as opcode 0x5b (page 154), and has full description in 8.3.25 (p 211)

AnarchX · Jan 28, 2011

Fudo says that IVB features "only" 16 EUs: http://www.fudzilla.com/graphics/item/21658-16-graphics-eus-in-ivy-bridge

Could it be possible that Intel goes from MACs to MADDs?

HD Graphics 3000 @ 1.35 GHz: ~130 GFLOPs
IVB Graphics @ ~1.5 GHz: ~380 GFLOPs?
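
Put differently, those two figures imply roughly a doubling of flops per EU per clock. A quick Python sketch of the arithmetic (the helper name `flops_per_eu_per_clock` is made up for illustration, and the IVB inputs are just the rumored EU count and the guessed clock and GFLOPs from above):

```python
def flops_per_eu_per_clock(gflops, eus, clock_ghz):
    """Back out how many flops each EU must retire per clock for a given peak."""
    return gflops / (eus * clock_ghz)


snb = flops_per_eu_per_clock(130, 12, 1.35)  # HD Graphics 3000 figures
ivb = flops_per_eu_per_clock(380, 16, 1.5)   # rumored 16 EUs, guessed clock and GFLOPs

print(f"SNB: {snb:.2f}, IVB: {ivb:.2f}, ratio: {ivb / snb:.2f}")
```

That works out to about 8 flops per EU per clock for SNB versus just under 16 for the IVB guess, so the ~380 figure only holds if per-EU throughput doubles on top of the extra EUs and clock.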

mczak · Jan 28, 2011

AnarchX said:
Fudo says that IVB features "only" 16 EUs: http://www.fudzilla.com/graphics/item/21658-16-graphics-eus-in-ivy-bridge

Could it be possible that Intel goes from MACs to MADDs?

HD Graphics 3000 @ 1.35 GHz: ~130 GFLOPs
IVB Graphics @ ~1.5 GHz: ~380 GFLOPs?

I've got some doubts they will support normal MADD, but either way Intel obviously counts it the same, so it wouldn't improve flops.
I was wondering though with the earlier rumors about 24 EUs and now 16 EUs, maybe both rumors are true? Some chips with 24 and some with 16, instead of the current 12 or 6?

rpg.314 · Jan 28, 2011

AnarchX said:
Fudo says that IVB features "only" 16 EUs: http://www.fudzilla.com/graphics/item/21658-16-graphics-eus-in-ivy-bridge

Could it be possible that Intel goes from MACs to MADDs?

HD Graphics 3000 @ 1.35 GHz: ~130 GFLOPs
IVB Graphics @ ~1.5 GHz: ~380 GFLOPs?

I don't know how accurate this is, but as a general rule of thumb - and paraphrasing Arun - people leak stuff to Fudzilla only when they want to deny it.

DavidC · Jan 29, 2011

mczak said:
I've got some doubts they will support normal MADD, but either way Intel obviously counts it the same, so it wouldn't improve flops.
I was wondering though with the earlier rumors about 24 EUs and now 16 EUs, maybe both rumors are true? Some chips with 24 and some with 16, instead of the current 12 or 6?

Maybe not. Earlier rumors of G35 had EUs at 16, but it came with 8. They only increased EU count by 20-30% each generation so 16 for IVB isn't surprising.

BTW, about throughput: I don't know how they got the flops number, but the claim of doubled throughput might have been about geometry processing (VS/T&L) performance. Read the developer guide for Sandy Bridge HD Graphics.

3dcgi · Jan 29, 2011

AnarchX said:
Fudo says that IVB features "only" 16 EUs: http://www.fudzilla.com/graphics/item/21658-16-graphics-eus-in-ivy-bridge

Could it be possible that Intel goes from MACs to MADDs?

HD Graphics 3000 @ 1.35 GHz: ~130 GFLOPs
IVB Graphics @ ~1.5 GHz: ~380 GFLOPs?

What is your distinction between MAC and MADD? To me they are two names for the same instruction.

rpg.314 · Jan 29, 2011

3dcgi said:
What is your distinction between MAC and MADD? To me they are two names for the same instruction.

Sorta like FMA3 and FMA4, IMO.

mczak · Jan 29, 2011

rpg.314 said:
Sorta like FMA3 and FMA4, IMO.

Yes, though not quite the same. IIRC FMA3 has several versions so you can choose which of the source regs is also used as the destination (as you can specify 3 operands in total). The Intel IGP, however, will always use the accumulator reg as the last source operand.
The manual (from Ironlake) says:
"The mac instruction takes component-wise multiplication of <src0> and <src1>, adds the results with the corresponding accumulator values, and then stores the final results in <dst>."

rpg.314 · Jan 29, 2011

mczak said:
Yes, though not quite the same. IIRC FMA3 has several versions so you can choose which of the source regs is also used as the destination (as you can specify 3 operands in total). The Intel IGP, however, will always use the accumulator reg as the last source operand.

That doesn't seem like much of a distinction. You can always change the order of your operands.

mczak · Jan 29, 2011

rpg.314 said:
That doesn't seem like much of a distinction. You can always change the order of your operands.

How so? In a * b + c there's only one operand which gets added to the multiplication result.
Also, you need to get the operand into the accumulator reg first. Luckily you can do that "for free": instructions can implicitly update the accumulator reg (or they can do it explicitly). You don't need to do that with FMA3 (so for FMA3, the hw still has to be able to fetch 3 normal regs, which is not the case for the IGP mac).
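
To make the difference concrete, here is a toy Python sketch. It is only an illustration of the operand handling described above, not real ISA semantics: `ToyEU`, `mov_to_acc` and `fma3` are made-up names, everything is scalar rather than component-wise, and rounding behaviour (fused vs. unfused) is ignored.

```python
class ToyEU:
    """Toy scalar model of an EU with one implicit accumulator (hypothetical)."""

    def __init__(self):
        self.acc = 0.0

    def mov_to_acc(self, value):
        # Explicit accumulator load, standing in for the "free" implicit update.
        self.acc = value

    def mac(self, src0, src1):
        # dst = src0 * src1 + acc: only two regular register reads.
        return src0 * src1 + self.acc


def fma3(a, b, c):
    # Three-operand form: a, b and c all come from regular registers.
    return a * b + c


eu = ToyEU()
a, b, c = 2.0, 3.0, 4.0
eu.mov_to_acc(c)  # c has to be in the accumulator before the mac is issued
assert eu.mac(a, b) == fma3(a, b, c) == 10.0
```

The results match, but the mac path only works because something put `c` into the accumulator beforehand, while the three-operand form reads all three values directly.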