> I think that would be GK110. 104 doesn't seem to fit the bill for that description to me.
Pay attention, I said "tools".
> You can find those tests not valuable, but those tests are the ones that are used in the industry for comparing hardware for computing.
I think this is SHOC:
> Performance is nearly identical for all Level Zero benchmarks. However, when nontrivial device kernels are used, OpenCL begins to trail CUDA performance significantly (especially in FFT). This demonstrates the immaturity of the OpenCL compiler and has dramatic consequences for potential application developers.
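For context: SHOC's Level Zero tests are raw-speed microbenchmarks (peak FLOPS, memory bandwidth, bus transfer), so there is essentially nothing for the compiler to get wrong, and CUDA and OpenCL land on the same numbers; the gap only opens once real kernels like FFT need actual code generation. A minimal sketch of what a Level Zero-style peak-FLOPS test boils down to (illustrative, not SHOC's actual code):

```cuda
// Sketch of a Level Zero-style peak-FLOPS microbenchmark in the
// spirit of SHOC's MaxFlops test (illustrative, not the SHOC source).
#include <cuda_runtime.h>

__global__ void maxflops(float *out, int iters)
{
    float a = threadIdx.x * 0.001f + 1.0f;
    float b = 0.5f;
    // A long chain of multiply-adds with no memory traffic: the
    // result depends only on raw ALU throughput, not on how well the
    // compiler schedules around loads -- which is why both runtimes
    // score identically here.
    for (int i = 0; i < iters; ++i) {
        a = a * b + 0.0001f;
        b = b * a + 0.0001f;
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b;
}

int main()
{
    float *out;
    cudaMalloc(&out, 256 * 256 * sizeof(float));
    maxflops<<<256, 256>>>(out, 1 << 16);  // time this region for GFLOP/s
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```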
I don't think this is due to the driver, but to the updated version of Luxmark... they have changed some little things, and this is why the score decreased on the 580.
Which part of "GK104's performance is more dependent upon compiler/tools" do you not understand?
> They're all hand-tuned kernels. That's what "reasonable care" amounts to.
Aila's code does just ray traversal. A real world ray tracer would do more. Some level of speculation might be fine, but excessive speculation to suit just one product is unlikely to be present in a non-synthetic vendor neutral app.
Using TEX cache is a hack?
More-intense speculative fetches than older GPUs? That's not a hack, since they were doing this already.
Obviously this is something beyond an SM scheduler. In fact, all of this points directly at the kind of tool-based optimisation that Dally was talking about. Speculative stuff especially so, since that's something that an SM scheduler just can't grok.
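On the TEX cache point: pulling read-only data through the texture path is a documented feature of Fermi and Kepler rather than an undocumented trick, which is presumably why "hack" is being disputed. A minimal sketch of the pattern, using the legacy texture-reference API of that era (names are illustrative, not Aila's actual code):

```cuda
// Sketch of routing read-only node data through the TEX cache via the
// legacy texture-reference API (CUDA 4.x era). Names are illustrative.
#include <cuda_runtime.h>

texture<float4, cudaTextureType1D, cudaReadModeElementType> nodeTex;

__global__ void fetchNodes(float4 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(nodeTex, i);  // served by the TEX cache,
                                          // bypassing the L1 data path
}

int main()
{
    const int n = 1024;
    float4 *nodes, *out;
    cudaMalloc(&nodes, n * sizeof(float4));
    cudaMalloc(&out,   n * sizeof(float4));
    cudaBindTexture(0, nodeTex, nodes, n * sizeof(float4));  // bind linear memory
    fetchNodes<<<(n + 255) / 256, 256>>>(out, n);
    cudaDeviceSynchronize();
    return 0;
}
```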
> Seems you haven't seen Dally's presentation.
I have. Tool-based optimization is great, but an architecture and its tools have to make third-party code - written with some performance portability across vendors in mind - run fast. I don't think Kepler's tools are capable of the kinds of changes that Aila made yet.
Make up your mind:
> I don't think it is, vis-a-vis 580. The backend of the compiler is being made a scapegoat for 680 by some, with no good reason.
> I think that would be GK110. 104 doesn't seem to fit the bill for that description to me.
ALU scheduling in GK104 is predictable. It is not in Fermi and prior. That makes it more amenable to the tools NVidia is developing.
> Aila's code does just ray traversal. A real world ray tracer would do more.
Everything else is easier in rendering.
> Some level of speculation might be fine, but excessive speculation to suit just one product is unlikely to be present in a non-synthetic vendor neutral app.
Why wouldn't speculation suit other highly parallel machines?
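Concretely, the speculation being debated is the "speculative traversal" from Aila and Laine's paper: a ray that has already reached a leaf keeps walking inner nodes purely so its warp stays converged. A toy skeleton of the control flow it modifies (node layout and both hit tests are stand-ins, not their code):

```cuda
// Toy "while-while" traversal skeleton in the spirit of Aila & Laine,
// "Understanding the Efficiency of Ray Traversal on GPUs" (2009).
// Node layout and both hit tests are illustrative stubs.
struct Node { int left, right; };   // child >= 0: inner node index; child < 0: leaf, id = ~child

#define SENTINEL 0x76543210         // stack-bottom marker

__device__ bool boxHit(int child, int ray)      // stub AABB test
{
    return (((unsigned)child * 2654435761u ^ (unsigned)ray) & 3u) != 0u;
}

__device__ float leafHit(int leafId, float t)   // stub primitive test
{
    return fminf(t, (float)(leafId % 97 + 1));
}

__global__ void trace(const Node *nodes, float *hitT, int nRays)
{
    int ray = blockIdx.x * blockDim.x + threadIdx.x;
    if (ray >= nRays) return;

    int stack[32];
    int sp = 0;
    stack[sp++] = SENTINEL;
    int   node = 0;                 // root
    float t    = 1e30f;

    while (node != SENTINEL) {
        // Traversal loop: inner nodes only. The speculative variant
        // turns this per-thread condition into a warp vote (__any),
        // so a thread that already holds a leaf keeps traversing with
        // its warp instead of idling -- a win on NVidia's warps, but
        // exactly the single-product tuning a portable kernel avoids.
        while (node >= 0 && node != SENTINEL) {
            Node n = nodes[node];
            bool hitL = boxHit(n.left, ray);
            bool hitR = boxHit(n.right, ray);
            if (!hitL && !hitR) {
                node = stack[--sp];                       // both miss: pop
            } else {
                node = hitL ? n.left : n.right;           // descend near child
                if (hitL && hitR) stack[sp++] = n.right;  // push far child
            }
        }
        // Leaf phase: negative encodings are queued leaves.
        if (node < 0) {
            t = leafHit(~node, t);
            node = stack[--sp];
        }
    }
    hitT[ray] = t;
}
```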
> Tool-based optimization is great, but an architecture and its tools have to make third-party code - written with some performance portability across vendors in mind - run fast.
If other tools are lagging in the absolute performance they produce then...
> I don't think Kepler's tools are capable of the kinds of changes that Aila made yet.
Kepler is the basis of several years' worth of tools development; I'm sure it'll get better.
> Dally's presentation looks ambitious. We haven't seen that kind of compiler wizardry anywhere, least of all from C-ish compilers.
This isn't your grandma's auto-vectorisation.
> I am not sure what he is selling is even possible in a CUDA compiler, with all the pointers and side effects and what not. They'll probably have to switch to a different language or dump a lot more proprietary extensions into CUDA to make it more declarative.
That's what he's selling: declarative abstractions that are meaningful to their tools, instead of asking programmers to peer into manuals and try to make sense of directed tests.
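For what it's worth, CUDA already carries a small amount of declarative machinery in this direction: const plus __restrict__ qualifiers are promises about aliasing and mutability that the compiler cannot infer from raw pointer code. Dally's pitch goes much further, but it is the same kind of mechanism. A trivial example (standard CUDA C, nothing hypothetical here):

```cuda
// const + __restrict__ declare "no aliasing, read-only" facts the
// compiler cannot prove from bare pointers, freeing it to hoist and
// reorder the loads from x across the stores to y.
__global__ void saxpy(int n, float a,
                      const float * __restrict__ x,
                      float       * __restrict__ y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```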
> ALU scheduling in GK104 is predictable. It is not in Fermi and prior. That makes it more amenable to the tools NVidia is developing.
Are there any more details?
If Nvidia persists in not giving OpenCL attention and tries to slow down the project (they are part of the Khronos group), I think they could be making a big mistake. They have the hardware, and I don't think the adoption of OpenCL (in the current situation) will change anything for Tesla.
> Are there any more details?
I don't know of anything very specific, I'm afraid.
blogs.nvidia.com/2012/04/no-free-lunch-for-intel-mic-or-gpus/
> Of course, our next generation Kepler GPU will be shipping later this year, and you can expect some impressive enhancements of both the programming environment and performance on that platform. We'll be able to have a grounded conversation about programming for performance at that time.
EVGA GTX 680 was in stock again for under a minute (or perhaps newegg is just slow with the auto notification, but it certainly wasn't there for long). So there is some stock moving, but not very much, and certainly far less availability than the 7970 had after its first week (it's hard to tell what the real volume is, other than not high). So does that speak to issues with NV's yields, or did AMD just have some extra lead time on shipping? (Probably too soon to tell.)
> EVGA GTX 680 was in stock again for under a minute (or perhaps newegg is just slow with the auto notification, but it certainly wasn't there for long). So there is some stock moving
I got Auto-Notify messages for four different GTX680 cards today, but they were gone very fast.
> Make up your mind:
> ALU scheduling in GK104 is predictable. It is not in Fermi and prior. That makes it more amenable to the tools NVidia is developing.
I think we are talking at cross purposes here. You are speaking of the compiler's ability to schedule hw threads (or play a role in it). I am speaking of the compiler's code generation and instruction scheduling within a single thread.
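To make the within-thread point concrete: with Fermi's hardware scoreboarding gone, GK104's compiler has to order instructions and cover ALU latency statically, so code generation lives or dies on finding independent operations. A hedged sketch of what that looks like at the source level (illustrative only, not NVidia's toolchain at work):

```cuda
// Four independent accumulator chains give a static scheduler
// instructions it can interleave to hide ALU latency; a single
// accumulator would serialize every multiply-add. On Fermi the
// hardware scoreboard recovers some of this at run time; on GK104
// the compiler must find it up front.
__global__ void dotChunks(const float *a, const float *b,
                          float *partial, int n)
{
    int i      = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
    int j;
    for (j = i; j + 3 * stride < n; j += 4 * stride) {
        s0 += a[j             ] * b[j             ];
        s1 += a[j +     stride] * b[j +     stride];
        s2 += a[j + 2 * stride] * b[j + 2 * stride];
        s3 += a[j + 3 * stride] * b[j + 3 * stride];
    }
    for (; j < n; j += stride)        // tail elements
        s0 += a[j] * b[j];
    partial[i] = (s0 + s1) + (s2 + s3);  // per-thread partial dot product
}
```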