But of course, if MIC's cache were as programmable/manageable as GK110's L1 cache, the situation could change.
Bigger scenes mean it's even more important to have better cache utilization.
Even if the cache thrashes, it is no more costly than a straight load/store to memory, since power is mostly burnt going off-die.
There has been a fair amount of work on reordering rays for better cache utilization with real scenes. Nvidia's paper showed they were able to save >90% of bandwidth with their prototype design for highly incoherent scenes.
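For a rough flavour of what "reordering rays" means in practice (this is just a toy sketch, nothing like Nvidia's actual prototype; the struct, key layout, and scene bounds are all made-up assumptions):

#include <algorithm>
#include <cstdint>
#include <vector>

// Toy ray-reordering sketch: sort rays by a coarse key built from their
// quantized direction and origin, so rays likely to traverse similar parts
// of the acceleration structure run back-to-back and share cache lines.
struct Ray { float ox, oy, oz, dx, dy, dz; };

static uint32_t coherenceKey(const Ray& r) {
    // Clamp-and-quantize a value into 'bits' bits.
    auto q = [](float v, float lo, float hi, int bits) {
        float t = (v - lo) / (hi - lo);
        t = t < 0.f ? 0.f : (t > 1.f ? 1.f : t);
        return uint32_t(t * ((1u << bits) - 1));
    };
    // Direction first (4 bits per axis), then origin; the [-100, 100]
    // scene bounds are an arbitrary assumption for this sketch.
    uint32_t key = 0;
    key = (key << 4) | q(r.dx,   -1.f,   1.f, 4);
    key = (key << 4) | q(r.dy,   -1.f,   1.f, 4);
    key = (key << 4) | q(r.dz,   -1.f,   1.f, 4);
    key = (key << 4) | q(r.ox, -100.f, 100.f, 4);
    key = (key << 4) | q(r.oy, -100.f, 100.f, 4);
    key = (key << 4) | q(r.oz, -100.f, 100.f, 4);
    return key;
}

void reorderRays(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(),
              [](const Ray& a, const Ray& b) { return coherenceKey(a) < coherenceKey(b); });
}

The trade-off debated below is exactly this: the sort costs time and bandwidth too, and only pays off if traversal becomes coherent enough.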
Not with directories. MIC uses them a lot.
The larger the scene, the higher the divergence even between nearby rays since the acceleration structure goes deeper.
I haven't seen that paper.
I know there's been a lot of work on ray bundling, but I haven't heard much about it actually having that much benefit, given the overhead of sorting the rays.
Bits are free. Communication is expensive.

Directories have similar latency to traditional caches. You just need them since traditional caches don't scale past a handful of cores. The downside is that you need a fair bit of extra data for each cache line, a bit or two per core, hence MIC would have 64 or so bits of extra overhead for each cache line.
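To put rough numbers on that (the layout below is just a generic full-map directory assumption, a presence bit per core plus a few state bits, not MIC's actual format):

#include <cstdio>

int main() {
    const int cores         = 60;      // MIC-class core count, round number
    const int presence_bits = cores;   // one sharer/presence bit per core (assumed layout)
    const int state_bits    = 4;       // a few bits of coherence state (assumed)
    const int line_bits     = 64 * 8;  // 64-byte cache line

    const int dir_bits = presence_bits + state_bits;  // lands around the ~64 bits mentioned above
    printf("directory overhead: %d bits per %d-bit line (%.1f%%)\n",
           dir_bits, line_bits, 100.0 * dir_bits / line_bits);
    return 0;
}

So the storage tax works out to roughly an eighth of the line, which is the "bits are free" side of the trade.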
But it still generates heat and adds largely useless area to the die that could otherwise be used in better ways to improve performance. Not to mention that larger caches and more cache levels can themselves hurt performance when the miss rate is high, since each level adds its own lookup and addressing latency.
I remember Sandy Bridge/Ivy Bridge's L3 cache has a latency of 40-50 CPU cycles, and L1/L2 together has 20-30 cycles, whilst the latency of memory is "merely" 150-200 CPU cycles, so if your cache contributes little besides heat, die size and cache misses, the situation gets pretty ugly.
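A quick back-of-the-envelope using those quoted latencies (the hit rates below are made-up assumptions, and the model simply charges each access the latency of the level that services it):

#include <cstdio>

int main() {
    // Rough latencies quoted above, in CPU cycles.
    const double l1_l2 = 25.0;   // ~20-30 cycles for L1/L2 combined
    const double l3    = 45.0;   // ~40-50 cycles
    const double mem   = 175.0;  // ~150-200 cycles

    // Assumed hit distributions (fraction of accesses serviced by each level).
    const double amat_friendly = 0.90 * l1_l2 + 0.08 * l3 + 0.02 * mem;  // ~30 cycles
    const double amat_hostile  = 0.30 * l1_l2 + 0.20 * l3 + 0.50 * mem;  // ~104 cycles

    printf("cache-friendly workload: ~%.0f cycles average\n", amat_friendly);
    printf("cache-hostile workload : ~%.0f cycles average\n", amat_hostile);
    return 0;
}

With a friendly hit pattern the hierarchy is a huge win (~30 vs ~175 cycles); once half the accesses go all the way to memory, the average climbs past 100 cycles and the cache has mostly bought you heat and die area, which is the ugly case being described.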
GK110 has 1.5MB of global L2 cache and 64KB of L1 cache per SMX, and its L1 is manageable and programmable (although I hope they improve how it is accessed). It also has a large register file, and coupled with its short pipeline and its reliance on parallelism to hide latency, I think it may be the better design route for HPC-scale multi-threaded applications.
Anyway, the release date of MIC is near and anyone can try it; I think some will be disappointed, just like me.
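For what "manageable and programmable" means in practice on GK110: the 64KB per SMX is split between L1 and software-managed shared memory, and CUDA lets you both choose the split and stage data explicitly. A toy kernel (names and sizes here are just illustrative):

#include <cuda_runtime.h>
#include <cstdio>

// Toy kernel: reverses each 256-element block of the input by staging it in
// __shared__ memory, i.e. the explicitly managed part of the SMX's 64KB.
// Assumes n is a multiple of the 256-thread block size.
__global__ void reverseWithinBlock(const float* in, float* out) {
    __shared__ float tile[256];                      // software-managed on-chip storage
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];                       // each thread stages one element
    __syncthreads();                                 // whole tile now visible to the block
    out[i] = tile[blockDim.x - 1 - threadIdx.x];     // read another thread's element on-chip
}

int main() {
    // Choose how the 64KB per SMX is split: prefer the larger (48KB) L1 configuration.
    cudaFuncSetCacheConfig(reverseWithinBlock, cudaFuncCachePreferL1);

    const int n = 1 << 20;
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    reverseWithinBlock<<<n / 256, 256>>>(in, out);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(in);
    cudaFree(out);
    return 0;
}

Whether MIC's caches end up exposing that kind of explicit control is exactly the open question raised earlier.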
Bits are free. Communication is expensive.
I think one of the problems with MIC is that #pragmas are not a good model for robust vectorization. SPMD models need to become more widely adopted.
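As a toy illustration of the difference (not MIC-specific code; the function names are just placeholders):

// Style 1: annotate a scalar loop and hope the compiler vectorizes it.
void saxpy_pragma(int n, float a, const float* x, float* y) {
#pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Style 2: SPMD - write the per-element program once; mapping it onto
// SIMD lanes/threads is part of the execution model, not a compiler hint.
__global__ void saxpy_spmd(int n, float a, const float* x, float* y) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

The pragma version compiles either way and can still quietly fall back to scalar code when the compiler gives up on the loop, which is the robustness problem being pointed at.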
Then don't snoop. That's what directories are for.

Exactly, that's the purpose of using a directory system, since snoop traffic scales as O(n^2). I'm just pointing out that for a CPU system with only a few cores the communication may be cheaper than the bits.
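Roughly the message-count picture behind that (the sharer count is an illustrative assumption):

#include <cstdio>

int main() {
    const int assumed_sharers = 2;  // typical lines have few sharers (assumption)
    for (int n = 4; n <= 64; n *= 4) {
        const int snoop_msgs = n - 1;               // broadcast a probe to every other core
        const int dir_msgs   = 1 + assumed_sharers; // directory lookup + targeted forwards
        printf("%2d cores: snoop ~%2d msgs per miss, directory ~%d msgs per miss\n",
               n, snoop_msgs, dir_msgs);
    }
    return 0;
}

With every core generating misses, broadcast traffic grows roughly as n^2 while the directory stays near-constant per miss; at 4 cores the difference is small enough that skipping the directory bits can win.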
http://www.techpowerup.com/181601/N...0-based-GeForce-Graphics-Card-for-Summer.html
Looks like Nvidia will be releasing a 13-SMX, 2496 CUDA core GK110 card with 5GB of GDDR5 on a 320-bit bus, to slot under TITAN.
The proof: the 7790, where they didn't cut any existing prices but instead cut the performance and lowered the price point.
We're off-topic here, but since you ask:
http://www.techpowerup.com/181257/AMD-Radeon-HD-7790-quot-Bonaire-quot-Detailed-Some-More.html
AMD appears to have a gaping hole in its product stack, between the ~$110 Radeon HD 7770 and ~$170 Radeon HD 7850 1GB, which needs filling. NVIDIA's ~$150 GeForce GTX 650 Ti appears to be getting cozy in that gap. AMD plans to address this $110~$170 gap not by lowering price of the HD 7850 1GB, but by introducing an entirely new SKU.
Why on earth would they cut the price of a card that is over 30% faster just to fight at a certain price point?
That's what I'm saying. Welcome to the next-gen AMD card with Titan performance at its former price tag.