Sandy Bridge preview

OpenCL is supported by every graphics core AMD and NVIDIA have sold in the last 2-3 years.
Sure, but to this day stores are also still selling GeForce FX 5200s... That's of course the other extreme, but the point is that there's a wide variety of graphics hardware, both in performance and features (even among gamers, over 20% are still stuck with DX9 hardware). Computing things on the CPU is far more reliable. SSE2 support is practically 100% and they all have enough juice to run a wide range of Windows applications.

Regardless of whether or not Intel's IGPs support OpenCL today, it's not really going to help adoption of hardware OpenCL. It's still going to take at least half a decade before the majority of installed GPUs have the features and performance that make OpenCL worth the trouble.

Also, OpenCL 1.0 is rubbish compared to OpenCL 1.1, and even in the latter too many valuable features are still optional extensions. It's hardly a mature API and still needs to settle. And the baseline GPU architectures have a long way to go before they meet and exceed the capabilities of CPUs.
So you're saying Intel GPUs should not support OpenCL because it's not currently supported by Intel's GPUs? Quite circular logic.
No, I'm saying there is no need to support hardware OpenCL yet because they have a robust alternative. The average GPGPU application only runs faster than an optimized software solution when running on a high-end GPU. GPUs (in particular low-end) need to widen the gap much further, but it doesn't look like that's going to happen. All GPUs need to have double-precision support, debugging features and a cached unified addressing space before application developers will touch OpenCL with a ten-foot pole. And adding these features to low-end GPUs means there's a lower transistor budget for raw performance. CPUs already have all the programmability features, and will increase their computing performance with AVX and FMA. This convergence doesn't work in OpenCL's favor.
Quad-core Sandy Bridge may theoretically do 200 SP GFLOPS when running at >3 GHz, but most laptops will have dual-core chips running at 2.5 GHz. We are talking about 80 theoretical SP GFLOPS then.

NVIDIA's new integrated chip (found in 13" MacBooks and the Mac mini) can do about 150 GFLOPS, and does it with much lower power consumption. And that's still based on last-generation tech, not Fermi.
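The CPU figures above can be reproduced with a quick back-of-the-envelope calculation; this is a sketch, assuming each Sandy Bridge core can issue one 8-wide AVX add and one 8-wide AVX multiply per cycle (16 SP FLOPs per core per clock):

```python
# Rough peak single-precision GFLOPS for a Sandy Bridge-class CPU,
# assuming 16 SP FLOPs per core per clock (8-wide AVX add + 8-wide AVX mul).
def peak_sp_gflops(cores, ghz, flops_per_cycle_per_core=16):
    return cores * ghz * flops_per_cycle_per_core

print(peak_sp_gflops(4, 3.2))  # quad-core at 3.2 GHz -> ~205, the "200" above
print(peak_sp_gflops(2, 2.5))  # dual-core at 2.5 GHz -> 80.0
```

The same formula applied to a mobile dual-core at 2.5 GHz lands exactly on the 80 GFLOPS figure quoted above.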
That's great, but those laptops are brand spanking new so they don't represent the average system in use today. But even then, twice the (SP) GFLOPS is not nearly enough to convince the average developer to invest in OpenCL programming.

Case in point: dual-core CPUs have been sold for half a decade now, but multi-threaded application development is only barely starting to become mainstream. And threading is child's play compared to making use of OpenCL (and achieving a speedup). The only application developers who invest a bit more in performance are game developers. And they need these low-end GPUs to do graphics, not to waste any GFLOPS on generic computing!
The major reason Intel doesn't do OpenCL drivers is that Intel doesn't want OpenCL to succeed. As long as most software can't execute its DSP-style code on the GPU, people need Intel's bigger, more powerful, more expensive CPUs to run their video and image processing software.
Nonsense. Intel hasn't invested in OpenCL yet because it simply is not ready for the masses. Developers are in no rush to adopt it while the performance isn't leaps and bounds ahead, developing (and maintaining) optimized code is a huge pain, and there are no guarantees. From a management point of view the ROI and risks just don't balance out. Also, OpenCL development typically has to happen in-house to ensure optimal integration, while there are plenty of off-the-shelf libraries which use SSE under the hood and can easily be extended to use AVX.

There's no reason for Intel to boycott OpenCL either. If the market were really interested in it, they would simply sell more CPUs with a bigger integrated GPU, or discrete Larrabee-based cards.

But there simply is no real demand for OpenCL on low-end GPUs.
 
It's not that anyone wouldn't want to; it's simply that you can't run SB processors on older motherboards. Intel moved the clock generator on-chip with Sandy Bridge, which requires a new motherboard design. Intel sure is making a killing on the chipsets though. They're selling southbridges, which used to cost $5, for the price of a northbridge (i.e. $40).
The southbridges are ridiculously more complex than they used to be. I work on validating them, and let me tell you, this is definitely not your father's southbridge.
 
The southbridges are ridiculously more complex than they used to be. I work on validating them, and let me tell you, this is definitely not your father's southbridge.

Offhand, it's difficult to see what has changed. It's not like southbridges have grown extra functionality over the years. They have typically added more of the same, plus the occasional refresh.
 
Offhand, it's difficult to see what has changed. It's not like southbridges have grown extra functionality over the years. They have typically added more of the same, plus the occasional refresh.
I can only speak for what I've seen in the last few months. From the research I've had to do to ramp up, the new generation of SBs has integrated several external components directly into the chip. I'm going to be very vague, but let's just say that the SB has much more functionality than previous generations.
 
SSE2 support is practically 100% and they all have enough juice to run a wide range of Windows applications.
And that too, took ~8 years to happen.
GPUs (in particular low-end) need to widen the gap much further, but it doesn't look like that's going to happen.
Yes and no. Core count per $ has been growing much more slowly than Moore's law, and the reason is not hard to see: typical CPU-based apps scale much less with additional cores than typical GPU apps do. Single-threaded IPC still rules the CPU world.

All GPUs need to have double-precision support, debugging features and a cached unified addressing space before application developers will touch OpenCL with a ten-foot pole.
Yes.
And adding these features to low-end GPUs means there's a lower transistor budget for raw performance.
Not if the majority of the growth in transistor budget is given to the GPU part of the die.

CPUs already have all the programmability features, and will increase their computing performance with AVX and FMA. This convergence doesn't work in OpenCL's favor.
And AVX etc. will take similarly long to filter down, because AVX won't be in the majority of the installed base. Broadly speaking, this is the reason Intel gave for dropping FMA from Sandy Bridge.
 
the southbridges are ridiculously more complex than they used to be. I work on validating it and let me tell you that this is definitely not your fathers southbridge.

I can only speak for what I've seen in the last few months. From the research I've had to do to ramp up, the new generation of SBs has integrated several external components directly into the chip. I'm going to be very vague, but let's just say that the SB has much more functionality than previous generations.

I'm not talking about a southbridge today vs. a southbridge 10 years back. I'm specifically talking about Core 2 SBs vs. Nehalem SBs.

From what I can remember there was no real change from ICH9 to ICH10. If we consider P55, from what I see there is virtually no change compared to ICH10. H55 of course has the FDI, and I definitely agree that there is more validation involved there. But how does that justify the price of $40 compared to $5? You can buy G31 motherboards, which have a NB+SB (including onboard graphics), that cost a lot less than H55 motherboards.

The Sandy Bridge southbridge brings us 2 SATA 6 Gbps ports and a built-in clock generator. And I read something about a power controller for the DVD-ROM drive. Any other details you can divulge?
 
I'm not talking about a southbridge today vs. a southbridge 10 years back. I'm specifically talking about Core 2 SBs vs. Nehalem SBs.

From what I can remember there was no real change from ICH9 to ICH10. If we consider P55, from what I see there is virtually no change compared to ICH10. H55 of course has the FDI, and I definitely agree that there is more validation involved there. But how does that justify the price of $40 compared to $5? You can buy G31 motherboards, which have a NB+SB (including onboard graphics), that cost a lot less than H55 motherboards.

The Sandy Bridge southbridge brings us 2 SATA 6 Gbps ports and a built-in clock generator. And I read something about a power controller for the DVD-ROM drive. Any other details you can divulge?
Actually, at this point I get worried about what I can and can't say; I'm too new to know the boundaries. The big changes started with ICH10, but have continued with this new generation of SB. By the way, we no longer refer to them as ICH#; they have code names, but I'm not sure if they are public knowledge. Tomorrow I'll ask some of my co-workers if I can list the big ICH10 changes that might have contributed to the price change. If you have any other questions about the SB, let me know and I'll ask around.
 
And that too, took ~8 years to happen.
Indeed, but there was always an easy fallback solution and it was a relatively small investment with guaranteed performance.

With OpenCL you really have to rearchitect your code and it's hard to predict whether you'll even achieve a speedup at all, especially on low-end hardware. The fallback solution of running OpenCL on the CPU is also likely much slower than software tailored to your algorithms. And nobody likes to maintain multiple vastly different code paths.
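A minimal sketch of that maintenance burden (the function names here are hypothetical, and the device path is just a stub standing in for a real OpenCL kernel launch):

```python
# Hypothetical sketch of the multiple-code-path problem: the application has
# to ship both a device path and a tuned CPU fallback, and keep them in sync.
def saxpy_cpu(a, x, y):
    # plain CPU path, standing in for a hand-tuned SSE implementation
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_opencl(a, x, y):
    # stand-in for an OpenCL kernel launch; real code would build the kernel,
    # copy buffers to the device, and read the result back
    raise RuntimeError("no capable OpenCL device")

def saxpy(a, x, y):
    try:
        return saxpy_opencl(a, x, y)
    except RuntimeError:
        return saxpy_cpu(a, x, y)  # the "vastly different" fallback path

print(saxpy(2.0, [1.0, 2.0], [3.0, 4.0]))  # falls back to the CPU: [5.0, 8.0]
```

Every algorithm you offload needs both halves of this pair written, optimized, and tested, which is exactly the double maintenance cost described above.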
Yes and no. Core count per $ has been growing much more slowly than Moore's law, and the reason is not hard to see: typical CPU-based apps scale much less with additional cores than typical GPU apps do. Single-threaded IPC still rules the CPU world.
These are just growing pains. Applications and libraries are slowly but steadily becoming multi-threaded. Today's graduates also have a much better notion of threading than those of a few years ago. Once quad-core is fully mainstream (which will take a couple more years), it becomes hard to ignore the potential. And once we're truly in the multi-core era there is no turning back, and extra cores will definitely be welcomed.

OpenCL on the other hand is just one API. Eventually this API will go away so you can program GPU cores in the same language as the CPU cores. But at that point these cores will be so much alike that it really doesn't make any sense to keep them separate.

Besides, the type of applications that would potentially benefit from OpenCL is a subset of the applications that would benefit from additional CPU cores. People who don't ask for more CPU cores will definitely not ask for more GPU cores.

It's also not just about core count per $. The low-end Core i3 beats a Core 2 Duo by about 50% for multi-threaded applications. Sandy Bridge will increase per-core IPC even further, and with AVX the floating-point performance doubles without touching the core count. These things give CPUs plenty of time to overcome the multi-threaded development inertia.
Not if the majority of the growth in transistor budget is given to the GPU part of the die.
The market at which these chips are targeted is only interested in "adequate" graphics. IGPs have always been as cheap as possible. They're not going to invest more in them than absolutely necessary to run Windows and casual games.

Making the GPU part of the chip OpenCL-capable and powerful enough to compensate for the inefficiency (while it still has to do graphics too) will require far more transistors than AVX, FMA, or even additional CPU cores. And a lot more applications can benefit from additional cores than from OpenCL. So it's pretty clear how the growing transistor budget will be spent. The roadmaps tout AVX, FMA and more cores, not powerful IGPs (just non-crap IGPs).
And AVX etc. will take similarly long to filter down, because AVX won't be in the majority of the installed base. Broadly speaking, this is the reason Intel gave for dropping FMA from Sandy Bridge.
Yes, but taking advantage of AVX where it's supported will be relatively straightforward. And once FMA is supported, that will also be exploited with minor software adjustments. So even though it will take years to filter down, it can and probably will be used pretty early on.
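Those "minor software adjustments" usually amount to a one-time dispatch decision at startup. A toy sketch, assuming the runtime can report which extensions the CPU supports (all names here are made up, not any real API):

```python
# Illustrative runtime ISA dispatch: pick the widest vector extension the
# machine reports, fall back to scalar code otherwise. Kernel names are
# hypothetical placeholders for real optimized implementations.
KERNELS = {"avx": "dot_avx", "sse2": "dot_sse2", "scalar": "dot_scalar"}

def pick_kernel(supported_isas):
    # prefer the newest extension actually present on this machine
    for isa in ("avx", "sse2"):
        if isa in supported_isas:
            return KERNELS[isa]
    return KERNELS["scalar"]

print(pick_kernel({"sse2"}))         # older installed base -> dot_sse2
print(pick_kernel({"sse2", "avx"}))  # new hardware -> dot_avx
```

The same binary then serves the whole installed base, which is why AVX can be exploited early even while most machines lack it.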

Making use of OpenCL on the other hand is a big step that won't give much if any benefit on low-end hardware. And for hardware that doesn't support it at all you'll have to provide a vastly different fallback path. So it will take about half a decade for it to become universally supported, but there's no guarantee that the performance will have increased faster than that of the CPU. Everything points to the opposite...
 
Yes, this is true, but at some point you will reach the limits of the shader ALUs. It seems unlikely that if you wanted to double the performance of the IGP, you'd suddenly go back to half the shader units and compensate for that loss with improvements in other parts. The architecture is still very similar, after all.

Intel says that each EU in Sandy Bridge is roughly equal to 2 EUs in previous generations when counting the enhancements. They also mention a 4-20x improvement on transcendental instructions...

With the claims earlier on of 10x improvement over 65nm graphics, and today's 25x over 2006, they are on track to be something like 4x over the current GMA HD.
 
Intel says that each EU in Sandy Bridge is roughly equal to 2 EUs in previous generations when counting the enhancements. They also mention a 4-20x improvement on transcendental instructions...
I can't see how the EUs could be twice as fast, since they still look the same, with mostly the same capabilities. Now, certainly other parts of the chip could have improved (I think the current GMAs could also be limited by message passing, for instance), so maybe Intel really meant that overall, per EU, it would be twice as fast. If that's really the case, quite an achievement.
Now, transcendentals being hugely faster is quite possible. This uses the funky MathBox. There are probably more MathBox units (EUs are divided into rows and columns, and there is one MathBox per row, so that can be adjusted), plus they could be faster too (sin/cos are specified at 5-12 clocks per element, typically 6 clocks, though this is for G45; Intel hasn't released the Ironlake datasheet yet). So a fourfold improvement sounds easily doable.
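A toy throughput model of that estimate, using the G45-style figure of ~6 clocks per element from above; the "new" parameters are purely hypothetical, just to show how a 4x gain could arise from doubling the MathBox count and halving the per-element latency:

```python
# Transcendental throughput = MathBox units / clocks per element.
# Old: one MathBox per row at ~6 clocks/element (G45-era figure).
# New: hypothetical 2 MathBoxes at 3 clocks/element.
def elems_per_clock(mathbox_units, clocks_per_elem):
    return mathbox_units / clocks_per_elem

old = elems_per_clock(mathbox_units=1, clocks_per_elem=6)
new = elems_per_clock(mathbox_units=2, clocks_per_elem=3)  # hypothetical
print(new / old)  # -> 4.0, the low end of Intel's claimed 4-20x range
```

Any combination of more units and fewer clocks multiplies out the same way, which is why the 4x figure "sounds easily doable".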
With the claims earlier on of 10x improvement over 65nm graphics, and today's 25x over 2006, they are on track to be something like 4x over the current GMA HD.
I still have my doubts about this, but in any case I think most people would be happy with 2x.
 
Actually, there IS a quite easy way Intel could make the EUs twice as fast without changing the ISA much, which I overlooked: those EUs are physically four-wide and logically 8-wide (they also have programming modes for 4- and 16-wide, but the 4-wide mode explicitly says no useful work is done in the 2nd clock, so you basically only get half the peak throughput). So Intel could make them physically 8-wide. This wouldn't be visible in the ISA, though I guess the purpose of them being logically twice as wide as physically was also to hide latencies: either the EUs got a lot more complex so they still hide latencies, or the driver would now have to schedule instructions so it doesn't stall.
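A quick sketch of the arithmetic behind that, under the stated assumption of a 4-wide physical datapath executing 8-wide logical instructions:

```python
import math

# An N-wide logical instruction on a P-lane datapath takes ceil(N / P) clocks,
# so widening the physical datapath from 4 to 8 lanes halves the clock count
# per 8-wide instruction, doubling peak throughput with no ISA change.
def clocks_per_instruction(logical_width, physical_width):
    return math.ceil(logical_width / physical_width)

print(clocks_per_instruction(8, 4))  # current EUs -> 2 clocks
print(clocks_per_instruction(8, 8))  # hypothetical 8-wide EU -> 1 clock
```

The 4-wide mode's wasted second clock also falls out of this model: ceil(4/4) is 1 clock of useful work, but the instruction still occupies the 2-clock logical slot.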
 
What about the fact that the GPU can share the L3 cache? Apart from the other improvements would this also result in an increase in performance?
 
What about the fact that the GPU can share the L3 cache? Apart from the other improvements would this also result in an increase in performance?
Certainly (I think Anand mentions a 5x increase for data accesses hitting the L3 cache). That'll help the ROPs, but it won't make the EUs faster.
Interestingly, according to some newly released information (I've seen it at AnandTech), transcendentals are now handled in the EU. Looks like bye-bye MathBox (strange that I didn't see that in the driver; maybe Intel hasn't contributed that code yet...).
 
Yeah, I don't see that mentioned anywhere either. So on the CPU side, SB is more of the same, just better. The turbo, graphics and video features are pretty exciting though. GPU decode might have just heard its death knell.

4.9 GHz on air?
 