Interesting:
They call them Iris extensions. Looks like this is indeed Iris-exclusive.

One could understand why some extensions would be exclusive – Iris may be the only part with enough performance to use the effects enabled by these extensions – but surely this is not relevant for the extensions dealing with direct access to CPU/GPU memory?
2x MSAA might not be that good at removing edge aliasing, but it's better than nothing when there isn't enough performance for 4x MSAA. This is certainly often the case on lower-end graphics.

I would have agreed in the past, but these days post-AA methods do a better job in almost all cases than 2x MSAA and with less of a performance hit. While I'd still argue that 4x has advantages in reducing flicker/crawling over post-AA, 2x really doesn't do a good enough job for me to make the same argument.
The guide says that the driver performs fewer optimizations, so I suspect that some older games may have regressions.

It's certainly possible, although I don't think I've seen a case where that has caused Haswell to regress over Ivy Bridge. The reality of constrained platforms with shared power budgets between CPU/GPU is that you either penalize *all* applications with complex driver logic, or only poorly coded ones by keeping the driver as thin as possible. Typically AMD/NVIDIA build a very heavy driver because they consider any amount of CPU performance spent making the GPU faster effectively "free", and since tech sites typically benchmark with the fastest CPUs available, it further supports that notion. Now with integrated CPU/GPUs, using more CPU time in the driver can reduce the speed/frequency of the GPU due to the shared power budget, so even if your GPU is the bottleneck, optimizing the CPU/driver can make things faster.
Interesting:
GT1 becomes much more powerful. Not only does it gain 66% more EUs, the EUs seem to be as powerful as in GT2 (in Ivy and Sandy, GT1 EUs had fewer threads per EU than GT2 EUs). IMHO, Haswell's GT1 should be at least as good as HD Graphics 3000.
Also, some performance regressions may be possible with the rearchitected driver. The guide says that the driver performs fewer optimizations, so I suspect that some older games may have regressions. ("One goal of this driver is to spend less time doing complex analysis and reordering of rendering commands in the driver, as these tasks are better suited to the application itself, which has the context to do these optimizations optimally. As much as possible, the new driver will submit rendering commands from the application to the hardware with minimal manipulation.")
The driver release notes say this:

Several optimizations to reduce the system power consumption have been added in this driver. These optimizations might result in a minor performance drop in some games while yielding power savings. The user can choose to override these power optimizations by selecting the “Maximum Performance” power mode in the Intel control panel or “High Performance” mode in the Windows power setting.

So the user can choose. On a desktop HD 4000 there is no driver regression; it runs faster overall.

Those notes apply to different power-saving features than the general rearchitecture of the user-mode driver, though. It is still possible to have regressions due to games that use particularly awful render command streams, but like I said, I haven't actually seen any major regressions to date with the 15.31 drivers.
Such "smart AA" methods might use 2x MSAA as one of their components.

Sure, and like I said, I'm not arguing that it's "useless". It just looks waaay worse than 4x, which is why most GPUs are optimized for that. If you're going to pay for MSAA and a resolve pass at all, you might as well pay for 4x IMHO.
paran, since you're doing testing, could you try Crimson Skies? Here's the demo in case you don't have it:
http://www.microsoft.com/games/crimsonskies/downloads.aspx
Since it's now a 'launched' product, for anyone who's interested: http://www.youtube.com/watch?v=kZQg_FIZuQ4
99% sure that capture was at 1080p, Medium detail, and the Intel effects enabled - the menu text looks horrible below 1080p in GRID 2 for some reason, but I haven't checked on the release build. The effects are quite subtle and I didn't notice a performance difference with them on or off.
One thing that might be of use to some people: on lower-end parts I found it useful to switch GRID 2 down to 30Hz and keep V-sync on. It looked DREADFUL on parts that could only run it at 40-50fps, as the motion wasn't fluid, so you're better off limiting it to a frame rate you can hit all the time.
Which Iris Pro is this, 47W, 55W or 65W?
Intel says traditional GPU architecture needs to fundamentally change in order to compete with the graphics power of Haswell and its new PixelSync API extension. But do you see gaming effects such as order-independent transparency as important enough to really warrant such a shift in design?
“AMD presented on real-time concurrent per-pixel linked lists construction at the Game Developer Conference in 2010. We then released the Mecha demo that took advantage of this technique to implement pixel-perfect Order-Independent Transparency (OIT).
Finally we recently worked closely with Crystal Dynamics to integrate TressFX Hair technology into Tomb Raider where each transparent hair fragment is accurately sorted against each other to produce the best-looking and most physically-correct hair implementation seen in a video game to date.
While interesting in itself, functionality such as enabled by “PixelSync” is obviously not a prerequisite to implementing robust and efficient OIT solutions on compute-efficient GPUs – as demonstrated in Tomb Raider.
In fact, serialization of access into read/write buffers (called “Unordered Access Views” in DirectX® 11 jargon) can be done at the API level via a mutex-based solution – AMD already has such functionality running and you can expect the results of this work to be published in the near future.
So to answer your question: game developers will certainly not be relying on “PixelSync” to design the nature of their OIT algorithms.”
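For anyone who wants to picture the per-pixel linked list construction AMD is describing above, here is a minimal sketch. It is not AMD's or Intel's code: it is written as a CUDA kernel instead of an HLSL pixel shader, one thread stands in for one transparent fragment, and the Fragment/Node structs and buffer names are made up for illustration. A separate resolve pass (not shown) would walk each pixel's list, sort the stored fragments by depth, and blend them back to front.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// One stored transparent fragment in the shared node pool.
struct Node {
    float4   color;  // RGBA of this fragment
    float    depth;  // used by the resolve pass to sort the list
    uint32_t next;   // index of the next node in this pixel's list (0xFFFFFFFF = end)
};

// Hypothetical input: transparent fragments already flattened into an array.
struct Fragment {
    float4   color;
    float    depth;
    uint32_t pixelIndex;  // which screen pixel this fragment covers
};

// headPtrs has one entry per pixel and is cleared to 0xFFFFFFFF each frame;
// *nodeCounter is cleared to 0 each frame.
__global__ void buildPerPixelLists(const Fragment* fragments, int fragmentCount,
                                   uint32_t* headPtrs, Node* nodePool,
                                   uint32_t poolCapacity, uint32_t* nodeCounter)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= fragmentCount) return;

    Fragment f = fragments[i];

    // Allocate a node from the global pool.
    uint32_t nodeIndex = atomicAdd(nodeCounter, 1u);
    if (nodeIndex >= poolCapacity) return;  // pool exhausted: drop the fragment

    nodePool[nodeIndex].color = f.color;
    nodePool[nodeIndex].depth = f.depth;

    // Atomically splice this node in as the new head of the pixel's list.
    // The exchange returns the previous head, which becomes our 'next' link,
    // so concurrent fragments hitting the same pixel never lose each other.
    uint32_t previousHead = atomicExch(&headPtrs[f.pixelIndex], nodeIndex);
    nodePool[nodeIndex].next = previousHead;
}
```

The append path needs no locking: the atomic exchange alone keeps the lists consistent, which is why this construction maps to plain DX11 atomics. The costs show up elsewhere, in the unbounded per-pixel storage and in the depth sort during the resolve pass.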
When he says PixelSync is not necessary for OIT, is he wrong?

No, that's the implementation I linked earlier - the pure DX implementations of AVSM and AOIT use what he's talking about. However, it's very slow, so his statement that it's "robust and efficient" is a borderline lie. Codemasters chose not to have it as a fallback because it's just unreasonably expensive. Take a look at the benchmarks for Tomb Raider TressFX on a 200W discrete GPU and tell me the overhead is reasonable.
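For context on the "mutex-based" serialization of UAV access that the AMD quote mentions and that the reply above calls expensive: one way to express that idea in software is a per-pixel spinlock guarding a small, fixed-size fragment buffer that is updated in place. The sketch below is hypothetical (CUDA again, made-up names, an arbitrary budget of four fragments per pixel) and only illustrates the mechanism. A lock like this serializes overlapping fragments in whatever order threads happen to win it, whereas PixelSync-style hardware ordering also guarantees primitive order.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical per-pixel storage: a tiny fixed budget of fragments that is
// updated in place, instead of the unbounded linked list in the sketch above.
struct PixelFragments {
    float4   color[4];
    float    depth[4];
    uint32_t count;
};

// locks has one word per pixel and is cleared to 0 each frame.
__global__ void insertWithPerPixelLock(const float4* fragColor, const float* fragDepth,
                                       const uint32_t* fragPixel, int fragmentCount,
                                       PixelFragments* perPixel, uint32_t* locks)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= fragmentCount) return;

    uint32_t p = fragPixel[i];

    // Spin until this thread owns pixel p's lock. Doing the work inside the
    // branch that wins the compare-and-swap, and releasing before looping
    // again, avoids the classic intra-warp deadlock of a naive spinlock.
    bool done = false;
    while (!done) {
        if (atomicCAS(&locks[p], 0u, 1u) == 0u) {
            PixelFragments& pf = perPixel[p];
            if (pf.count < 4u) {            // simplest policy: store until full
                pf.color[pf.count] = fragColor[i];
                pf.depth[pf.count] = fragDepth[i];
                pf.count++;
            }
            // A real technique would merge/compress fragments here rather than
            // silently dropping them once the budget is full.
            __threadfence();                // publish the writes before unlocking
            atomicExch(&locks[p], 0u);      // release the lock
            done = true;
        }
    }
}
```

Every overlapping fragment takes the lock in turn and the losers spin while they wait, which gives some intuition for why software fallbacks along these lines carry the overhead being argued about here.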