Nvidia BigK GK110 Kepler Speculation Thread

Bigger scenes mean it's even more important to have better cache utilization.
Even if the cache thrashes, it is no more costly than a straight load/store to memory, since power is mostly burnt going off-die.

The larger the scene, the higher the divergence even between nearby rays since the acceleration structure goes deeper.

There has been a fair amount of work on reordering rays for better cache utilization with real scenes. Nvidia's paper showed they were able to save >90% of bandwidth with their prototype design for highly incoherent scenes.

I haven't seen that paper.

I know there's been a lot of work on ray bundling, but I haven't heard much about it actually having that much benefit, given the overhead of sorting the rays.

Not with directories. MIC uses them a lot.

Directories have similar latency to traditional caches. You just need them because traditional snoop-based coherence doesn't scale past a handful of cores. The downside is that you need a fair bit of extra data for each cache line, roughly a bit or two per core, hence MIC would have 64 or more bits of extra overhead for each cache line.
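To make the overhead concrete, here is a minimal sketch of what a full-map directory entry could look like for a MIC-class part with ~60 cores; the layout, field names and sizes are my own illustration, not Intel's actual design:

[code]
#include <bitset>
#include <cstdint>

// Hypothetical full-map directory entry kept alongside each cache line.
// The one-presence-bit-per-core vector is where the "64 or so bits" of
// per-line overhead comes from.
constexpr int kNumCores = 64;

struct DirectoryEntry {
    std::bitset<kNumCores> sharers; // which cores hold a copy of the line
    uint8_t state;                  // e.g. Invalid / Shared / Modified
    uint8_t owner;                  // core holding the dirty copy, if any
};

// On a write, invalidations are sent only to cores whose sharer bit is set,
// instead of being broadcast to every core as with snooping.
[/code]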
 
The larger the scene, the higher the divergence even between nearby rays since the acceleration structure goes deeper.



I haven't seen that paper.

I know there's been a lot of work on ray bundling, but I haven't heard much about it actually having that much benefit, given the overhead of sorting the rays.

Paper
http://dl.acm.org/citation.cfm?id=1921497&dl=ACM&coll=DL&CFID=188652385&CFTOKEN=88988771

Presentation
http://www.highperformancegraphics..../RayTracing_II/HPG2010_RayTracing_II_Aila.pdf

Directories have similar latency to traditional caches. You just need them because traditional snoop-based coherence doesn't scale past a handful of cores. The downside is that you need a fair bit of extra data for each cache line, roughly a bit or two per core, hence MIC would have 64 or more bits of extra overhead for each cache line.
Bits are free. Communication is expensive.
 
But it still generates heat and adds largely useless area to the die that could otherwise be put to better use improving performance. Not to mention that larger caches and more cache levels can themselves hurt performance when the miss rate is high, since each level adds its own lookup and addressing latency.

I remember Sandy Bridge/Ivy Bridge's L3 cache has a latency of 40-50 CPU cycles, and L1/L2 together account for 20-30 cycles, whilst the latency of memory is "merely" 150-200 CPU cycles. So if your caches contribute little besides heat, die size and cache misses, the situation becomes pretty ugly.
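Plugging those numbers into a back-of-envelope average access time shows how ugly it gets; the hit rates below are made up purely to illustrate the point:

[code]
#include <cstdio>

int main() {
    const double l2_lat  = 25.0;   // ~20-30 cycles to hit in L1/L2 (as above)
    const double l3_lat  = 45.0;   // ~40-50 cycles to hit in L3
    const double mem_lat = 175.0;  // ~150-200 cycles to go out to DRAM

    // Average access cost given hit rates in L1/L2 and, for the rest, in L3.
    auto avg = [&](double h_l2, double h_l3) {
        return h_l2 * l2_lat
             + (1.0 - h_l2) * (h_l3 * l3_lat + (1.0 - h_l3) * mem_lat);
    };

    printf("caches helping   (90%% / 90%% hits): ~%.0f cycles\n", avg(0.9, 0.9)); // ~28
    printf("caches thrashing (10%% / 10%% hits): ~%.0f cycles\n", avg(0.1, 0.1)); // ~148
    return 0;
}
[/code]

In the thrashing case you end up barely better than raw DRAM latency while still paying for the cache's area and power.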

GK110 has 1.5MB of global L2 cache and 64K of L1 cache per core, and its L1 cache is manageable and programmable (although I hope they could improve access to it). It also has a large number of registers, and coupled with the fact that it has a short pipeline and is designed to use parallelism to hide latency, I think it may be the better design approach for HPC-scale multi-threaded applications.
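As a rough illustration of the latency-hiding point, using the publicly quoted GK110 SMX figures (the per-thread register budget is my own assumption):

[code]
#include <cstdio>

int main() {
    const int regs_per_smx    = 65536; // 64K 32-bit registers per SMX (256KB)
    const int regs_per_thread = 32;    // assumed register budget per thread
    const int max_threads_smx = 2048;  // architectural limit per SMX

    int by_regs  = regs_per_smx / regs_per_thread;
    int resident = by_regs < max_threads_smx ? by_regs : max_threads_smx;

    // With a couple of thousand threads resident, the warp scheduler can
    // simply switch to another warp while one waits hundreds of cycles on a
    // memory access, rather than relying on a deep cache hierarchy.
    printf("threads resident per SMX: %d\n", resident); // 2048
    return 0;
}
[/code]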

Anyway, the release date of MIC is near and anyone will be able to try it; I think some will be disappointed, just like me.

That depends on how useful the cache is. Like nv showed, cache can be very useful for ray tracing.

I think one of the problems with MIC is that #pragmas are not a good model for robust vectorization. SPMD models need to become more widely adopted.
 

Please give the version *not* behind a paywall.:p
http://www.tml.tkk.fi/~timo/publications/aila2010hpg_paper.pdf

Does the paper mention the cost of sorting the rays? I didn't see it, but I don't have time to do more than skim the paper at the moment. I did notice them mention that with their treelet approach, sorting the rays no longer had a noticeable effect.

Anyhow, eventually we will run into a bandwidth wall for ray tracing, but apparently not yet - http://www.tml.tkk.fi/~timo/publications/aila2012hpg_techrep.pdf. By the same authors and everything.
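For what it's worth, my reading of the treelet idea is roughly the sketch below; the types, the treelet mapping and the trivial scheduler are all placeholders, not the paper's actual hardware design:

[code]
#include <cstdint>
#include <queue>
#include <vector>

struct Ray    { float org[3], dir[3], tmax; };
struct RayRef { uint32_t ray_id; uint32_t node; };   // where this ray resumes

constexpr uint32_t kNumTreelets = 1024;              // illustrative count

// Map a BVH node to the cache-sized treelet that owns it (stubbed out).
uint32_t treelet_of(uint32_t node) { return node % kNumTreelets; }

// Advance a ray inside one treelet; return true when it has finished,
// false when it crosses into another treelet (stubbed: always finishes).
bool traverse_within_treelet(Ray&, RayRef&) { return true; }

void trace_batched(std::vector<Ray>& rays) {
    std::vector<std::queue<RayRef>> queues(kNumTreelets);

    // Seed every ray at the root treelet.
    for (uint32_t i = 0; i < rays.size(); ++i)
        queues[treelet_of(0)].push({i, 0});

    // Drain one treelet's queue at a time: its nodes are fetched from DRAM
    // once and reused by every queued ray while they sit in cache, instead
    // of each incoherent ray streaming the whole tree on its own.
    bool work_left = true;
    while (work_left) {
        work_left = false;
        for (auto& q : queues) {
            while (!q.empty()) {
                RayRef r = q.front(); q.pop();
                if (!traverse_within_treelet(rays[r.ray_id], r)) {
                    queues[treelet_of(r.node)].push(r);  // resume elsewhere later
                    work_left = true;
                }
            }
        }
    }
}
[/code]

If I read it right, that per-treelet queueing is also why up-front ray sorting stops having a noticeable effect.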

Bits are free. Communication is expensive.

Exactly the purpose of using a directory system, since snoop traffic scales as O(n^2): each of n cores broadcasts its misses to the other n-1. I'm just pointing out that for a CPU system with only a few cores the communication may be cheaper than the bits.

I think one of the problems with MIC is that #pragmas are not a good model for robust vectorization. SPMD models need to become more widely adopted.

This is the biggest problem I have with MIC. My experience is that SIMT is quite easy to use, whereas explicit SIMD is hellish, and things like OpenMP (I think that's the right one...) are a mess for transferring data between CPU and GPU. Much of the time it's just better to put the serial stuff on the GPU rather than eat the cost of shuffling data across the PCIe bus, since you're likely to get at least Pentium 3-level performance, and the serial stuff tends to be bookkeeping and not particularly expensive.
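A toy contrast of the two models, just to show what I mean (the function names are made up, and the SPMD half is written as plain C++ to stand in for what a CUDA/ISPC kernel body would look like):

[code]
#include <cmath>
#include <cstddef>

// Pragma-style: you write a scalar loop and hope the compiler proves it safe
// to vectorize; aliasing or control flow it can't analyse silently kills it.
void saxpy_pragma(float a, const float* x, float* y, std::size_t n) {
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// SPMD/SIMT-style: you write the per-element "program" and the framework
// (CUDA, ISPC, an OpenCL work-item, ...) instantiates it across lanes,
// handling divergent branches with masks rather than giving up.
inline void saxpy_one(float a, const float* x, float* y, std::size_t i) {
    if (std::isfinite(x[i]))          // divergent branch: fine under SPMD
        y[i] = a * x[i] + y[i];
}

void saxpy_spmd(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)  // on a GPU this loop is implicit:
        saxpy_one(a, x, y, i);           // each thread/lane gets one i
}
[/code]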
 
Exactly the purpose of using a directory system, since snoop traffic scales as O(n^2): each of n cores broadcasts its misses to the other n-1. I'm just pointing out that for a CPU system with only a few cores the communication may be cheaper than the bits.
Then don't snoop. That's what directories are for.
 
Maybe NVIDIA should stop releasing pointless cards and start making cheaper cards with the same performance. Pretty much EVERY single card from NVIDIA is overpriced compared to similarly performing Radeons. Every time I buy a new gfx card I look at both and see that all the GeForces are too expensive for what they offer.
 
There is a price disparity, yes, but I wouldn't go that far... Titan is a showoff product mostly, a dick enhancer or a way to REALLY future proof your system.
 
Well, without new AMD cards on the market or coming soon, they are free to set whatever price they want.

We were expecting to see them release something like that, and I'm sure people tempted by Titan but not its price can jump on it (still too highly priced for the performance difference over the current offerings).

I'm a bit worried about the rest of the "future" Nvidia lineup.

Titan is 33% faster than a 680 and 25-27% faster than a 7970 GHz. If this card comes in 10-15% below Titan, that's roughly 0.85-0.90 x 1.33, i.e. about 1.13-1.20x a 680, which doesn't leave much margin for a 680 refresh.
And with such a high price, I don't see them starting a GTX 700 series with performance close to the 680 at a $550 price point.
 
You are so right. :( That is exactly what I am afraid of.

AMD will follow and say: "Alright, you want 'Titanic' performance, then we will charge you 900-ish € as well."

The proof: the 7790, where they didn't cut any existing prices but instead cut the performance and lowered the price of the new part to match.

Which of course sucks very badly and means ever-growing prices.
 
The tech is slowing down too: Moore's law is said to have stretched to three years instead of two or 1.5, new processes cost dearly and lack cost-per-transistor reductions, and memory has stagnated - we've had GDDR5 on 256-bit or 384-bit buses as the standard for quite a few years.

So, the times aren't really great. The only solution is to wait longer than in the past for new generations.
 
We are off-topic here, but since you ask:

http://www.techpowerup.com/181257/AMD-Radeon-HD-7790-quot-Bonaire-quot-Detailed-Some-More.html

AMD appears to have a gaping hole in its product stack, between the ~$110 Radeon HD 7770 and ~$170 Radeon HD 7850 1GB, which needs filling. NVIDIA's ~$150 GeForce GTX 650 Ti appears to be getting cozy in that gap. AMD plans to address this $110~$170 gap not by lowering price of the HD 7850 1GB, but by introducing an entirely new SKU.
 
We are off-topic here, but since you ask:

http://www.techpowerup.com/181257/AMD-Radeon-HD-7790-quot-Bonaire-quot-Detailed-Some-More.html

AMD appears to have a gaping hole in its product stack, between the ~$110 Radeon HD 7770 and ~$170 Radeon HD 7850 1GB, which needs filling. NVIDIA's ~$150 GeForce GTX 650 Ti appears to be getting cozy in that gap. AMD plans to address this $110~$170 gap not by lowering price of the HD 7850 1GB, but by introducing an entirely new SKU.

Why on earth would they cut the price of a card that's over 30% faster just to fight at a certain price point, when they also have a cheaper-to-build chip available for that point? (Which is around ~20% faster than the 650 Ti based on preliminary benchmarks.)
 
That's what I'm saying. Welcome to the next-gen AMD card with Titan performance at its former price tag. :rolleyes:

Huh?
They're introducing a new SKU at a similar price to the 650 Ti with around 20% more performance; that's nothing like what you're saying.
 