NVIDIA Kepler speculation thread

You may not find these tests valuable, but they are the ones used in the industry to compare hardware for computing.
I think this is SHOC:

http://forum.beyond3d.com/showthread.php?t=57420

The final paragraph of section 5.2 of the 2010 report still seems to hold:

Performance is nearly identical for all Level Zero benchmarks. However, when nontrivial device kernels are used, OpenCL begins to trail CUDA performance significantly (especially in FFT). This demonstrates the immaturity of the OpenCL compiler and has dramatic consequences for potential application developers.

Now, some of that is apparently because of CUDA-specific libraries that aren't available for the OpenCL version (e.g. FFT).

Another factor is transcendentals: OpenCL is fairly strict about their precision, while CUDA provides multiple precision levels, and that chart doesn't say which CUDA precision was used. CUDA's maximum-precision requirements are stricter than OpenCL's, i.e. CUDA should be no faster than OpenCL if set to maximum precision.
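
For illustration, here is a minimal CUDA sketch of those precision levels (the kernel name and buffers are invented for the example): sinf() is the accurate library routine, __sinf() is the fast hardware approximation, and compiling with nvcc -use_fast_math silently substitutes the fast versions everywhere, which is exactly the kind of setting a benchmark chart ought to disclose.

    // Illustrative only: CUDA's two precision levels for transcendentals.
    // sinf() is the accurate software implementation; __sinf() maps to the
    // fast SFU hardware approximation. Building with -use_fast_math turns
    // every sinf() into __sinf() behind your back.
    __global__ void transcendentals(const float* in, float* out_precise,
                                    float* out_fast, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            out_precise[i] = sinf(in[i]);   // accurate, slower
            out_fast[i]    = __sinf(in[i]); // fast, reduced precision
        }
    }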
 
I don't think this is due to the driver, but to the updated version of LuxMark... they have changed a few small things, and that is why the score decreased on the 580.

Nope, NVIDIA cut the performance of many OpenCL applications (including LuxMark) by about 50% with the release of OpenCL 1.1/CUDA 4.0 support. You can just install an older (i.e. OpenCL 1.0) driver version and verify the problem yourself. There are also a couple of articles about this problem available on the net.

This may be related to the introduction of a new compiler back-end in CUDA 4.0.

Anyway, after several months, this problem has not yet been addressed :cry:

P.S. While this problem is important when comparing NVIDIA vs. AMD, it doesn't matter when comparing the 580 vs. the 680: the 580 seems faster in most OpenCL and CUDA applications, no matter the version of the driver.
 
They're all hand-tuned kernels. That's what "reasonable care" amounts to.


Using the TEX cache is a hack?

More aggressive speculative fetches than older GPUs? That's not a hack, since they were already doing this.
Aila's code does only ray traversal. A real-world ray tracer would do more. Some level of speculation might be fine, but excessive speculation to suit just one product is unlikely to be present in a non-synthetic, vendor-neutral app.
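
To make "speculative fetch" concrete, here is a toy CUDA sketch (the node layout, names, and the stand-in box test are all invented for illustration; this is not Aila's actual kernel). Both child nodes are loaded before the tests that decide which one is needed, so the two loads overlap in flight and their latency is hidden, at the cost of extra bandwidth and cache pressure:

    // Toy sketch of a speculative child fetch in BVH traversal.
    // Invented node layout; not Aila's actual code.
    struct Node { float4 loHi0, loHi1; int left, right; };

    // Stand-in for a real ray/box intersection test (deliberately trivial).
    __device__ bool boxTest(const Node& n) { return n.loHi0.x < n.loHi1.x; }

    __device__ void traverseStep(const Node* nodes, int cur,
                                 bool& wantLeft, bool& wantRight)
    {
        Node n = nodes[cur];
        // Speculative: issue BOTH child loads before the box tests that
        // decide which child is needed, so the loads overlap in flight.
        Node l = nodes[n.left];
        Node r = nodes[n.right];
        wantLeft  = boxTest(l);
        wantRight = boxTest(r);
    }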

In fact, all of this points directly at the kind of tool-based optimisation Dally was talking about. The speculative stuff especially so, since that's something an SM scheduler just can't grok.
Obviously this is something beyond an SM scheduler.

Seems you haven't seen Dally's presentation.
I have. Tool-based optimization is great, but an architecture and its tools have to make third-party code, written with some performance portability across vendors in mind, run fast. I don't think Kepler's tools are capable of the kinds of changes that Aila made yet.

Dally's presentation looks ambitious. We haven't seen that kind of compiler wizardry anywhere, least of all from C-ish compilers. I am not sure what he is selling is even possible in a CUDA compiler, with all the pointers and side effects and whatnot. They'll probably have to switch to a different language or dump a lot more proprietary extensions into CUDA to make it more declarative.
 
I don't think it is, vis-a-vis the 580. The back end of the compiler is being made a scapegoat for the 680 by some, with no good reason.
Make up your mind:
I think that would be GK110. 104 doesn't seem to fit the bill for that description to me.
ALU scheduling in GK104 is predictable. It is not in Fermi and prior. That makes it more amenable to the tools NVidia is developing.
 
Aila's code does only ray traversal. A real-world ray tracer would do more.
Everything else is easier in rendering.

Some level of speculation might be fine, but excessive speculation to suit just one product is unlikely to be present in a non-synthetic, vendor-neutral app.
Why wouldn't speculation suit other highly parallel machines?

Tool-based optimization is great, but an architecture and its tools have to make third-party code, written with some performance portability across vendors in mind, run fast.
If other tools are lagging in the absolute performance they produce then...

I don't think Kepler's tools are capable of the kinds of changes that Aila made yet.
Kepler is the basis of several years' worth of tools development; I'm sure it'll get better.

Dally's presentation looks ambitious. We haven't seen that kind of compiler wizardry anywhere, least of all from C-ish compilers.
This isn't your grandma's auto-vectorisation.

I am not sure what he is selling is even possible in a CUDA compiler, with all the pointers and side effects and whatnot. They'll probably have to switch to a different language or dump a lot more proprietary extensions into CUDA to make it more declarative.
That's what he's selling: declarative abstractions that are meaningful to their tools, instead of asking programmers to peer into manuals and try to make sense of directed tests.

I don't know what language they will end up with. He's pretty clear that things have to change.
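
As a rough analogy (using NVIDIA's existing Thrust library, not whatever Dally's tools become), a declarative formulation states what to compute and leaves the mapping onto the machine entirely to the toolchain:

    // Declarative style with Thrust, shown only as an analogy for the kind
    // of abstraction Dally describes: we state the computation; the
    // toolchain decides grid sizes, thread counts and memory traffic.
    #include <thrust/device_vector.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <cstdio>

    struct square
    {
        __host__ __device__ float operator()(float x) const { return x * x; }
    };

    int main()
    {
        thrust::device_vector<float> v(1 << 20, 2.0f);
        // Sum of squares: no explicit kernel, launch config or shared memory.
        float ssq = thrust::transform_reduce(v.begin(), v.end(),
                                             square(), 0.0f,
                                             thrust::plus<float>());
        std::printf("sum of squares = %f\n", ssq);
        return 0;
    }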
 
If Nvidia persists in not giving OpenCL any attention and tries to slow the project down (they are part of the Khronos Group), I think they could be making a big mistake. They have the hardware, and I don't think the adoption of OpenCL (in the current situation) would change anything for Tesla.

Sure it would. If they actually got actively involved and turned CL into something worthwhile, as opposed to a headless-chicken sort of mess, they'd forfeit an exclusive competitive advantage they have, namely CUDA. What would the gain be? I think NV will care the day a significant customer (not students working on a paper, mind you, but someone ready to dump a great amount of money on their doorstep) doesn't come back to their loving CUDAfied arms after trying to deal with CL. I don't think this has happened yet, and I doubt it will happen soon. Also, guiding and fixing CL would be a pretty significant investment for them... why not put the money into CUDA for professional apps, and let Microsoft sort out consumer/desktop with C++ AMP?
 
Why would nVidia embrace the lethargic design-by-committee model of OpenCL? All that would accomplish is slowing down the pace at which they can roll out new features to customers. At some point OpenCL will have sufficient market penetration to fight off compelling proprietary features, but we're not there yet.
 
The EVGA GTX 680 was in stock again for under a minute (or perhaps Newegg is just slow with the auto-notification, but it certainly wasn't there for long). So there is some stock moving, though not very much, and certainly far less availability than the 7970 had after its first week; it's hard to tell what the real volume is, other than not high. So does that speak to issues with NV's yields, or is it just that AMD had a bit of extra lead time on shipping? (Probably too soon to tell.)
 
A nice blog post by Steve Scott. It's actually a bit OT (more suitable for the LB thread), but I've just seen his comment about the release date of the next Kepler (the big one):
Of course, our next generation Kepler GPU will be shipping later this year, and you can expect some impressive enhancements of both the programming environment and performance on that platform. We'll be able to have a grounded conversation about programming for performance at that time.
blogs.nvidia.com/2012/04/no-free-lunch-for-intel-mic-or-gpus/
 
The EVGA GTX 680 was in stock again for under a minute (or perhaps Newegg is just slow with the auto-notification, but it certainly wasn't there for long). So there is some stock moving, though not very much, and certainly far less availability than the 7970 had after its first week; it's hard to tell what the real volume is, other than not high. So does that speak to issues with NV's yields, or is it just that AMD had a bit of extra lead time on shipping? (Probably too soon to tell.)

Too soon to tell. The only conclusion that can be drawn with any absolute certainty is that demand is greater than supply. :)

It is at least a sign that supply isn't dead if they are getting a little bit of stock in every 1-2 days, even if it only lasts 30-60 seconds before it's sold out. That doesn't mean it's great, or even good, or even marginal. Just not dead. :p

It does make me wonder, however, whether demand in the US is really so much greater than in Europe, or whether Nvidia just misallocated stock, with too much going to Europe and too little going to the US. Although if the world economic crunch is hitting Europe harder than the US, that might also go a little way towards explaining why the stock situation appears to be a bit better in Europe.

Regards,
SB
 
Make up your mind:

ALU scheduling in GK104 is predictable. It is not in Fermi and prior. That makes it more amenable to the tools NVidia is developing.
I think we are talking at cross purposes here. You are speaking of the compiler's ability to schedule hardware threads (or play a role in it). I am speaking of the compiler's code generation and instruction scheduling within a single thread.

My understanding of Kepler is that it relies on the compiler for scheduling to the extent that GCN does. The compiler can figure out when to deschedule (or stall for a sync). I don't think the compiler has any role in deciding which thread to switch to, which is why I don't think tools have much to do with Kepler as yet.
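
To illustrate the distinction with a made-up kernel: within a single thread, a static-scheduling compiler can reorder independent instructions, e.g. hoisting loads above their uses so their latencies overlap, but it has no say in which warp the hardware picks next.

    // Made-up example of scheduling *within* a single thread: both
    // independent loads are issued back to back, so their memory latencies
    // overlap before either result is consumed. This is the compiler's job
    // on a statically scheduled machine; choosing which warp runs next is
    // still the hardware's.
    __global__ void scheduled(const float* a, const float* b,
                              float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float x = a[i];   // load 1 issued
            float y = b[i];   // load 2 issued while load 1 is in flight
            out[i] = x * y;   // first use: latencies already overlapped
        }
    }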
 