Clarkdale IGP benchmark

compres · Dec 19, 2009

Pardon my ignorance: what is a MathBox?

mczak · Dec 19, 2009

compres said:
Pardon my ignorance: what is a MathBox?

"MathBox", also called the "extended math unit", is a unit inside the Gen4 Intel IGPs (i965, G35, G45, Ironlake) that performs all the complicated arithmetic instructions: rcp, rsq, sin, cos, pow, log, ... It does pretty much the same things as NVIDIA's SFU (though that one also does mul and attribute interpolation) or AMD's T unit (though that one can also do almost everything their other 4 units can). However, the way this is achieved is quite different: in contrast to AMD's or NVIDIA's solution, the MathBox is a completely separate unit, shared by several shader units, which can't access the register file of the shader units. Instead it's invoked by passing messages to it (similar to how the sampler is invoked, actually, though the sampler is shared by all units, not just by some).
If you want to learn more about it, Intel has published complete docs for it, or you can look at the open source Linux driver.

jalyst · Dec 21, 2009

This is really quite impressive all things considered....
I hope it doesn't spell the 'beginning of the end' for the low-end graphics from AMD/nV.

DavidC · Dec 22, 2009

mczak, on that diagram there are a total of 12 EUs which are split into 3 rows. Each row says "3X Mathbox".

Now where have you got that info about the hw clip unit from? You are talking about Ironlake yes? Does that mean that the G45 doesn't have hw clip?

Even back in G965 they talked about Early Z. Were they using all software for that then?

codedivine · Dec 22, 2009

Add to this the rumor that it will support video encoding through GPGPU (for as yet unknown software), then it sounds very interesting.

mczak · Dec 22, 2009

DavidC said:
mczak, on that diagram there are a total of 12 EUs which are split into 3 rows. Each row says "3X Mathbox".

I must have missed that... linky?
Though if each row has indeed 3 Mathbox units, that would be a total of 9 Mathbox units then I guess. In that case I wonder though how that works (how does the hw decide to which unit the messages should go?)

Now where have you got that info about the hw clip unit from? You are talking about Ironlake yes? Does that mean that the G45 doesn't have hw clip?

Yes, I'm talking about Ironlake. There are some pieces in the open source Linux driver; in particular there's a BRW_CLIPMODE_KERNEL_CLIP bit which older chips didn't have, and I assume this does the clipping in fixed function hardware.
In older chips the hardware did handle some of the clipping, in particular the hardware did clip determination and trivial accept/reject - for other cases it spawned a new thread which handled the actual clipping. So this was still hardware, but handled on the execution units.
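The split described above (clip determination plus trivial accept/reject, with everything else handed to a clip thread) can be sketched with the generic textbook outcode test. This is not taken from the Intel docs or driver: the function names, the bit layout, and the OpenGL-style clip volume (-w <= x, y, z <= w) are all assumptions for illustration.

```python
# Generic outcode-based clip determination (textbook technique, NOT Intel's
# actual hardware logic). Assumes OpenGL-style clip space: -w <= x, y, z <= w.

def outcode(vertex):
    """Return a bitmask with one bit per clip plane the vertex lies outside of."""
    x, y, z, w = vertex
    code = 0
    if x < -w: code |= 1
    if x > w:  code |= 2
    if y < -w: code |= 4
    if y > w:  code |= 8
    if z < -w: code |= 16
    if z > w:  code |= 32
    return code

def classify(triangle):
    """Classify a triangle as 'accept', 'reject', or 'clip'."""
    codes = [outcode(v) for v in triangle]
    if (codes[0] | codes[1] | codes[2]) == 0:
        return "accept"   # every vertex inside: trivial accept
    if (codes[0] & codes[1] & codes[2]) != 0:
        return "reject"   # all vertices outside the same plane: trivial reject
    return "clip"         # straddles a plane: needs the actual clipping work

inside = [(0.0, 0.0, 0.0, 1.0), (0.5, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0)]
outside = [(2.0, 0.0, 0.0, 1.0), (3.0, 0.5, 0.0, 1.0), (2.5, -0.5, 0.0, 1.0)]
straddling = [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0)]
print(classify(inside), classify(outside), classify(straddling))
```

Only the "clip" case is the expensive one; per the post, that is the part older chips ran as a thread on the execution units.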

Even back in G965 they talked about Early Z. Were they using all software for that then?

No, why would they? That's after clipping anyway.

DavidC · Dec 23, 2009

It's from IDF Fall 2009 presentation

Filename: SF09_ARCS001_FIN
Presentation name: Desktop Platform Design Overview for Intel Microarchitecture(Nehalem) Based Platforms

Page 32

DavidC · Dec 23, 2009

Man, how can a reviewer be so stupid? The latest review Zhao Ping did with H55 is claiming it's slower because of the chipset differences! Because somehow the H57 chipset, which is a modified I/O Hub, affects GPU performance, right?

No, it's 'cause you OVERCLOCKED the damn thing!

http://translate.googleusercontent....gle.ca&usg=ALkJrhi8p3118OsDcPcv_f6EqBiHoeisQQ

1/3 as fast as the original!

Although there's a possibility that it's running at only 466MHz and the driver is slowing it down a bit (around 20%), that won't make up for 3x.

mczak · Dec 23, 2009

DavidC said:
No, it's 'cause you OVERCLOCKED the damn thing!

http://translate.googleusercontent....gle.ca&usg=ALkJrhi8p3118OsDcPcv_f6EqBiHoeisQQ

Hmm, I guess this means the original also had the GPU overclocked? It looks like the reference clock was raised from 133 to 200MHz. Maybe the multiplier for the GPU stays the same, which would mean it's running at 800MHz instead of 533MHz (that is, ~10% above the level of a Core i3).

1/3 as fast as the original!

It is actually 1/2 the performance almost spot on.
There's only one benchmark where it fell well below half of the original's performance, World in Conflict. And I write this off as an outlier, since the performance drop doesn't make sense at all for going to 1920x1080. Maybe it's because the box had only 2GB of RAM and the driver limited the amount of memory it would allocate for graphics, or something like that.

Although there's a possibility that it's running at only 466MHz and the driver is slowing it down a bit (around 20%), that won't make up for 3x.

Where did you get that 466MHz? If it's running at that, it would explain almost all the difference, if the original indeed ran at 800MHz.
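The clock arithmetic in this post is easy to check. A minimal sketch, assuming the GPU clock is a fixed integer multiple of the reference clock (which is only a guess in this thread); the function names are made up for illustration.

```python
# Assumption: GPU clock = integer multiplier * reference clock, and the
# multiplier does not change when the reference clock is raised.
STOCK_REF_MHZ = 133
OC_REF_MHZ = 200
STOCK_GPU_MHZ = 533
PRERELEASE_GPU_MHZ = 466  # clock reported in the BIOS screenshot

def gpu_multiplier(gpu_mhz, ref_mhz):
    """Nearest integer multiplier that yields gpu_mhz from ref_mhz."""
    return round(gpu_mhz / ref_mhz)

def scaled_gpu_clock(gpu_mhz, ref_mhz, new_ref_mhz):
    """GPU clock after changing the reference clock with a fixed multiplier."""
    return gpu_multiplier(gpu_mhz, ref_mhz) * new_ref_mhz

overclocked = scaled_gpu_clock(STOCK_GPU_MHZ, STOCK_REF_MHZ, OC_REF_MHZ)
ratio = overclocked / PRERELEASE_GPU_MHZ
print(overclocked, round(ratio, 2))
```

Under that assumption the overclocked part runs at 800MHz, about 1.7x the 466MHz figure, so clock speed alone would cover most of a 2x gap.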

DavidC · Dec 23, 2009

Hard to conclude even the 900MHz version will be anywhere near what it showed. If you look at my link on the last page, there's one with the Core i5 661, which is the 900MHz IGP review. It's what, 20-30% faster than the stock 785G?

The 466MHz clock is from yet another screenshot. It's from Cooaler, pretty widely known guy from Xtremesystems. The BIOS screenie says a 466MHz clock on a Core i3 540, with frequency adjustments in increments of 33MHz to maximum of 600MHz. It's plausible that some pre-release versions are running at that clock.

Now the weird thing is the 3DMark06 scores. 1200? That's what G45 can do. The HKEPC review with a driver that barely recognized Clarkdale did 1600, with a GPU score 2x.

Still can't see what's missing that allows it to outperform the 210 and 4350 even with the overclock. The driver that allows CPU assist should be responsible for ~20%, going by the same site. The i5 661 should then increase its lead to 40-50% over 785G, which is not enough to overtake the discrete parts. Now where's the remaining 30-40% then?

Maybe the conclusion by Zhao isn't flawed and there is something that makes H57 faster than H55 for IGPs.

Internal QPI clocks:
Pentium G6950/Core i3: 4.8GT/s
Core i5: 6.4GT/s

You should know what the G45/Ironlake is able to allocate for DVMT. It's 1795MB with 4GB memory and 780MB with 2GB. I don't think it should run out even with 780MB.

AnarchX · Dec 23, 2009

Is the IGP not directly connected to the IMC? That would mean 25.6GB/s with DDR3-1600, which even when shared with the CPU is a lot more than the 6.4GB/s on a G210/HD4350 with 64-bit DDR2-800.
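For reference, both bandwidth figures follow from transfer rate times bus width. A quick sketch of that arithmetic, assuming dual-channel DDR3-1600 with 64-bit channels for the IGP and a single 64-bit DDR2-800 bus for the discrete cards; the function name is made up for illustration, and these are theoretical peaks, not measured numbers.

```python
def peak_bandwidth_gbs(transfers_mt_s, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits // 8
    megabytes_per_second = transfers_mt_s * bytes_per_transfer * channels
    return megabytes_per_second / 1000

igp_shared = peak_bandwidth_gbs(1600, 64, channels=2)  # dual-channel DDR3-1600
discrete = peak_bandwidth_gbs(800, 64)                 # 64-bit DDR2-800
print(igp_shared, discrete, igp_shared / discrete)
```

On paper that is a 4x advantage, although the IGP has to share its figure with the CPU.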

mczak · Dec 23, 2009

DavidC said:
Still can't see what's missing that allows it to outperform the 210 and 4350 even with the overclock. The driver that allows CPU assist should be responsible for ~20%, going by the same site. The i5 661 should then increase its lead to 40-50% over 785G, which is not enough to overtake the discrete parts. Now where's the remaining 30-40% then?

Well, the review using the Core i5 661 is using DDR3-1333 RAM instead of DDR3-1600. The bandwidth-saving features of G45 are rather weak in comparison to other chips (read: nonexistent, except for early-z), so that could possibly explain some of the difference, as I'd expect it to be somewhat bandwidth limited (despite having a huge bandwidth advantage over the 785G/G210/4350 when using dual-channel RAM). Also, the 4350 they tested seems to be a slow one; in other reviews it got about a 10% higher 3DMark06 score, so I don't know what's up with that. Add it up and not much is missing.

Maybe the conclusion by Zhao isn't flawed and there is something that makes H57 faster than H55 for IGPs.

It just doesn't make sense. Display outputs too slow or what?

You should know what the G45/Ironlake is able to allocate for DVMT. It's 1795MB with 4GB memory and 780MB with 2GB. I don't think it should run out even with 780MB.

I don't know if the driver still has the same limits. And it doesn't really matter, the point was that you can dismiss the result at that resolution as bogus, even if it's for another reason.

DavidC · Dec 23, 2009

AnarchX said:
Is the IGP not directly connected to the IMC? That would mean 25.6GB/s with DDR3-1600, which even when shared with the CPU is a lot more than the 6.4GB/s on a G210/HD4350 with 64-bit DDR2-800.

Ok, I own a G965 now with Core 2 Duo E6600 as the CPU, but for a brief time I had a Celeron D 320(Prescott-based, 2.4GHz).

I did a few tests with various configs. DDR2-800 SC/DDR2-800 DC/DDR2-667 DC

On the Core 2, there wasn't anything too unusual. It was pretty bandwidth sensitive though. Something like 10% gain from DDR2-667 to DDR2-800. Some user reviews of 4500MHD on laptops showed 40-50% difference from single to dual channel. Tells how much 4500MHD advanced.

Now on the Celeron, it was like capped or something. Dual channel didn't bring anything more than 10%. I've read somewhere putting a Celeron in slows memory down to 533MHz. Whatever. DC should have done something though, like on the Core 2, which did WAY more than 10%.

Is there a possibility that the IGP is artificially handicapped on the lower value chips? That definitely seems true. What if most of the memory access is done through the FSB rather than a direct connection? It would guarantee Celerons to be slower. I'm guessing with claims of being "DMA" maybe few operations can do direct access, but not most of them.

If that's true, Clarkdale's config with QPI might open things up a lot.

DavidC · Dec 23, 2009

mczak said:
Well, the review using the Core i5 661 is using DDR3-1333 RAM instead of DDR3-1600. The bandwidth-saving features of G45 are rather weak in comparison to other chips (read: nonexistent, except for early-z),

*snip*

Yea, you could read my above post which says 10% increase from DDR2-667 to DDR2-800. Should be similar at DDR3-1333 to DDR3-1600. Probably even less on Ironlake, no?

Ironlake has Hierarchical Z, which I assume is an advancement over Early Z, and there's a possible fixed function clip unit (even the presentations say much faster clip).

The assumption the review with Core i5 661 has slower driver is just that, an assumption.

The same DDR2 64-bit 4350 is doing 50-80% better than 780G: http://www.anandtech.com/showdoc.aspx?i=3420&p=4

Still, to do yet another 20% over that? You could see the articles Zhao is putting up in defense of his original article: http://translate.googleusercontent....gle.ca&usg=ALkJrhhruVo9Hl7dItFak2QgQJ_SQyCucw

What's the real cause that took it from sometimes being 1/5 of 785G with G45 to being a lot faster?

mczak · Dec 23, 2009

DavidC said:
Now on the Celeron, it was like capped or something. Dual channel didn't bring anything more than 10%. I've read somewhere putting a Celeron in slows memory down to 533MHz. Whatever.

I think there were limitations on the memory clock depending on the FSB, so that could be true.

DC should have done something though, like on the Core 2, which did WAY more than 10%.

Indeed. Dunno why it wouldn't do anything.

Is there a possibility that the IGP is artificially handicapped on the lower value chips? That definitely seems true. What if most of the memory access is done through the FSB rather than a direct connection? It would guarantee Celerons to be slower. I'm guessing with claims of being "DMA" maybe few operations can do direct access, but not most of them.

I can't follow you there; that doesn't really make a lot of sense. The IGP living in the chipset always has a direct connection to memory.
If you got a driver though which offloads vertex processing to the cpu, maybe you're just cpu limited. Celeron D aren't exactly that fast...

Ironlake has Hierarchical Z, which I assume is an advancement over Early Z, and there's a possible fixed function clip unit (even the presentations say much faster clip).

Oh Hierarchical Z that's nice (in fact early z doesn't save any bandwidth, just shader cycles). Compared to the competition still not a lot (still missing z/color buffer compression) but it's a start.
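To illustrate why hierarchical Z saves bandwidth where plain early-z does not: a coarse per-tile depth bound lets whole tiles be rejected without reading the per-pixel depth buffer. The sketch below is the generic idea only, not how Ironlake implements it; the tile size, the "less than" depth test, and all names are assumptions.

```python
# Generic hierarchical-Z idea (NOT Ironlake's implementation).
# Assumptions: depth test is "less than", tiles are TILE x TILE pixels.
TILE = 4

def build_tile_max(depth, tile=TILE):
    """Coarse buffer holding the farthest (max) depth of each tile."""
    height, width = len(depth), len(depth[0])
    return [
        [
            max(
                depth[y][x]
                for y in range(ty, min(ty + tile, height))
                for x in range(tx, min(tx + tile, width))
            )
            for tx in range(0, width, tile)
        ]
        for ty in range(0, height, tile)
    ]

def tile_rejected(tile_max, tile_y, tile_x, prim_min_z):
    """True if no fragment of the primitive can pass the depth test in this tile.

    If the primitive's nearest depth is not in front of the farthest stored
    depth, the per-pixel depth values never need to be fetched for this tile.
    """
    return prim_min_z >= tile_max[tile_y][tile_x]

# 4x8 depth buffer: left tile holds a near surface (0.2), right tile is cleared (1.0).
depth_buffer = [[0.2] * 4 + [1.0] * 4 for _ in range(4)]
coarse = build_tile_max(depth_buffer)
print(coarse, tile_rejected(coarse, 0, 0, 0.5), tile_rejected(coarse, 0, 1, 0.5))
```

A primitive at depth 0.5 is thrown away for the left tile using only the coarse value, while the right tile still needs the normal per-pixel test.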

What's the real cause that took it from sometimes being 1/5 of 785G with G45 to being a lot faster?

It wasn't really 1/5 all the time. Don't forget the G45 could already achieve a score of over 1000 in 3dmark06 in ideal conditions - that's only roughly a factor 2 increase there for Ironlake.
So I guess it really is just a combination of all factors (hw clip unit, vastly more MathBox units, 20% more execution units, hierarchical-z, faster RAM and probably drivers too) that contributes to the increase. I'm still not quite sure how to interpret that "3x Mathbox", but some apps could see a huge increase from that alone if they use a lot of these functions, and others likely no increase at all.

OpenGL guy · Dec 23, 2009

mczak said:
Oh Hierarchical Z that's nice (in fact early z doesn't save any bandwidth, just shader cycles). Compared to the competition still not a lot (still missing z/color buffer compression) but it's a start.

Saving shader cycles can save bandwidth

mczak · Dec 23, 2009

OpenGL guy said:
Saving shader cycles can save bandwidth

Oh yes right for the texture lookups. Still, I don't really consider that a bandwidth saving feature, since you should save more in execution resources than bandwidth.
Though regarding hierarchical-z, I can't see anything in the open source driver - I'd have thought there'd be some new bits for that.

DavidC · Dec 24, 2009

The problem with 3DMark06 was that it started a trend of "GPU" benchmarks that relied on help from the CPU more than usual.

That's why 3DMark05 was used for general comparisons. Looking at just the GPU portion is ok, but that's harder to find than the final score.

On 3DMark05, 780G scored almost 3000 with right setups and G45 did 1800. However on 06, it would be 1500 on the 780G vs 1200 on the G45.

G45 might not have looked that bad when looking across many apps and games, but there were outliers that killed it: http://www.anandtech.com/mb/showdoc.aspx?i=3432&p=4

Even within the same game, certain settings would make it drastically worse. Fog, for example.

Tahir2 · Jan 3, 2010

From the chart, we can see that the H55 definitely runs better than the predecessors. When compared to GF9300, it is still slightly behind.

http://en.ocworkbench.com/tech/h55-with-pentium-g6950-performance-detailed-against-g45-g41/

Story found at Fudzilla..

That's comparing an E6550 and X4500HD / GF9300 with the G6950 at 533MHz.

jalyst · Jan 4, 2010

all things considered still pretty impressive imo...
still not that great for gaming, but the gap for video decode is tiny now.
