But of course, if MIC's cache were as programmable/manageable as GK110's L1 cache, the situation could change.
Bigger scenes mean it's even more important to have better cache utilization.
Even if the cache thrashes, it is no more costly than a straight load/store to memory, since power is mostly burnt going off-die.
There has been a fair amount of work on reordering rays for better cache utilization with real scenes. Nvidia's paper showed they were able to save >90% of bandwidth with their prototype design for highly incoherent scenes.
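For a rough flavour of what "reordering rays" means in practice (this is just a toy sketch, nothing like Nvidia's actual prototype; the struct, key layout, and scene bounds are all made-up assumptions):

#include <algorithm>
#include <cstdint>
#include <vector>

// Toy ray-reordering sketch: sort rays by a coarse key built from their
// quantized direction and origin, so rays likely to traverse similar parts
// of the acceleration structure run back-to-back and share cache lines.
struct Ray { float ox, oy, oz, dx, dy, dz; };

static uint32_t coherenceKey(const Ray& r) {
    // Clamp-and-quantize a value into 'bits' bits.
    auto q = [](float v, float lo, float hi, int bits) {
        float t = (v - lo) / (hi - lo);
        t = t < 0.f ? 0.f : (t > 1.f ? 1.f : t);
        return uint32_t(t * ((1u << bits) - 1));
    };
    // Direction first (4 bits per axis), then origin; the [-100, 100]
    // scene bounds are an arbitrary assumption for this sketch.
    uint32_t key = 0;
    key = (key << 4) | q(r.dx,   -1.f,   1.f, 4);
    key = (key << 4) | q(r.dy,   -1.f,   1.f, 4);
    key = (key << 4) | q(r.dz,   -1.f,   1.f, 4);
    key = (key << 4) | q(r.ox, -100.f, 100.f, 4);
    key = (key << 4) | q(r.oy, -100.f, 100.f, 4);
    key = (key << 4) | q(r.oz, -100.f, 100.f, 4);
    return key;
}

void reorderRays(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(),
              [](const Ray& a, const Ray& b) { return coherenceKey(a) < coherenceKey(b); });
}

The trade-off debated below is exactly this: the sort costs time and bandwidth too, and only pays off if traversal becomes coherent enough.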
Not with directories. MIC uses them a lot.
The larger the scene, the higher the divergence even between nearby rays since the acceleration structure goes deeper.
I haven't seen that paper.
I know there's been a lot of work on ray bundling, but I haven't heard much about it actually having that much benefit, given the overhead of sorting the rays.
Bits are free. Communication is expensive.

Directories have similar latency to traditional caches. You just need them since traditional caches don't scale past a handful of cores. The downside is that you need a fair bit of extra data for each cache line, a bit or two per core, hence MIC would have 64 or so bits of extra overhead for each cache line.
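To put rough numbers on that (the layout below is just a generic full-map directory assumption, a presence bit per core plus a few state bits, not MIC's actual format):

#include <cstdio>

int main() {
    const int cores         = 60;      // MIC-class core count, round number
    const int presence_bits = cores;   // one sharer/presence bit per core (assumed layout)
    const int state_bits    = 4;       // a few bits of coherence state (assumed)
    const int line_bits     = 64 * 8;  // 64-byte cache line

    const int dir_bits = presence_bits + state_bits;  // lands around the ~64 bits mentioned above
    printf("directory overhead: %d bits per %d-bit line (%.1f%%)\n",
           dir_bits, line_bits, 100.0 * dir_bits / line_bits);
    return 0;
}

So the storage tax works out to roughly an eighth of the line, which is the "bits are free" side of the trade.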
But it still generates heat and adds largely useless area to the die that could otherwise be used in better ways to improve performance. Not to mention that larger caches and more cache levels can themselves hurt performance when the miss rate is high, since each level adds its own lookup and addressing latency.
I remember Sandy Bridge/Ivy Bridge's L3 cache has a latency of 40-50 CPU cycles, and L1/L2 together has 20-30 cycles, whilst the latency of memory is "merely" 150-200 CPU cycles, so if your cache contributes little besides heat, die size and cache misses, the situation gets pretty ugly.
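A quick back-of-the-envelope using those quoted latencies (the hit rates below are made-up assumptions, and the model simply charges each access the latency of the level that services it):

#include <cstdio>

int main() {
    // Rough latencies quoted above, in CPU cycles.
    const double l1_l2 = 25.0;   // ~20-30 cycles for L1/L2 combined
    const double l3    = 45.0;   // ~40-50 cycles
    const double mem   = 175.0;  // ~150-200 cycles

    // Assumed hit distributions (fraction of accesses serviced by each level).
    const double amat_friendly = 0.90 * l1_l2 + 0.08 * l3 + 0.02 * mem;  // ~30 cycles
    const double amat_hostile  = 0.30 * l1_l2 + 0.20 * l3 + 0.50 * mem;  // ~104 cycles

    printf("cache-friendly workload: ~%.0f cycles average\n", amat_friendly);
    printf("cache-hostile workload : ~%.0f cycles average\n", amat_hostile);
    return 0;
}

With a friendly hit pattern the hierarchy is a huge win (~30 vs ~175 cycles); once half the accesses go all the way to memory, the average climbs past 100 cycles and the cache has mostly bought you heat and die area, which is the ugly case being described.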
GK110 has 1.5MB of global L2 cache and 64KB of L1 cache per SMX, and its L1 is manageable and programmable (although I hope they improve how it is accessed). It also has a large register file, and coupled with its short pipeline and its reliance on parallelism to hide latency, I think it may be the better design route for HPC-scale multi-threaded applications.
Anyway, the release date of MIC is near and anyone can try it; I think some will be disappointed, just like me.
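For what "manageable and programmable" means in practice on GK110: the 64KB per SMX is split between L1 and software-managed shared memory, and CUDA lets you both choose the split and stage data explicitly. A toy kernel (names and sizes here are just illustrative):

#include <cuda_runtime.h>
#include <cstdio>

// Toy kernel: reverses each 256-element block of the input by staging it in
// __shared__ memory, i.e. the explicitly managed part of the SMX's 64KB.
// Assumes n is a multiple of the 256-thread block size.
__global__ void reverseWithinBlock(const float* in, float* out) {
    __shared__ float tile[256];                      // software-managed on-chip storage
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[i];                       // each thread stages one element
    __syncthreads();                                 // whole tile now visible to the block
    out[i] = tile[blockDim.x - 1 - threadIdx.x];     // read another thread's element on-chip
}

int main() {
    // Choose how the 64KB per SMX is split: prefer the larger (48KB) L1 configuration.
    cudaFuncSetCacheConfig(reverseWithinBlock, cudaFuncCachePreferL1);

    const int n = 1 << 20;
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    reverseWithinBlock<<<n / 256, 256>>>(in, out);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(in);
    cudaFree(out);
    return 0;
}

Whether MIC's caches end up exposing that kind of explicit control is exactly the open question raised earlier.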
Bits are free. Communication is expensive.
I think one of the problems with MIC is that #pragmas are not a good model for robust vectorization. SPMD models need to become more widely adopted.
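As a toy illustration of the difference (not MIC-specific code; the function names are just placeholders):

// Style 1: annotate a scalar loop and hope the compiler vectorizes it.
void saxpy_pragma(int n, float a, const float* x, float* y) {
#pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Style 2: SPMD - write the per-element program once; mapping it onto
// SIMD lanes/threads is part of the execution model, not a compiler hint.
__global__ void saxpy_spmd(int n, float a, const float* x, float* y) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

The pragma version compiles either way and can still quietly fall back to scalar code when the compiler gives up on the loop, which is the robustness problem being pointed at.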
Then don't snoop. That's what directories are for.

Exactly, that's the purpose of using a directory system, since snoop traffic scales as O(n^2). I'm just pointing out that for a CPU system with only a few cores the communication may be cheaper than the bits.
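Roughly the message-count picture behind that (the sharer count is an illustrative assumption):

#include <cstdio>

int main() {
    const int assumed_sharers = 2;  // typical lines have few sharers (assumption)
    for (int n = 4; n <= 64; n *= 4) {
        const int snoop_msgs = n - 1;               // broadcast a probe to every other core
        const int dir_msgs   = 1 + assumed_sharers; // directory lookup + targeted forwards
        printf("%2d cores: snoop ~%2d msgs per miss, directory ~%d msgs per miss\n",
               n, snoop_msgs, dir_msgs);
    }
    return 0;
}

With every core generating misses, broadcast traffic grows roughly as n^2 while the directory stays near-constant per miss; at 4 cores the difference is small enough that skipping the directory bits can win.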
http://www.techpowerup.com/181601/N...0-based-GeForce-Graphics-Card-for-Summer.html
Looks like Nvidia will be releasing a 13-SMX, 2496 CUDA core GK110 card with 5GB of GDDR5 on a 320-bit bus, to slot under TITAN.
The proof: the 7790, where they didn't cut any existing prices but instead cut the performance and lowered the price point.
We're off-topic here, but since you ask:
http://www.techpowerup.com/181257/AMD-Radeon-HD-7790-quot-Bonaire-quot-Detailed-Some-More.html
AMD appears to have a gaping hole in its product stack, between the ~$110 Radeon HD 7770 and ~$170 Radeon HD 7850 1GB, which needs filling. NVIDIA's ~$150 GeForce GTX 650 Ti appears to be getting cozy in that gap. AMD plans to address this $110~$170 gap not by lowering price of the HD 7850 1GB, but by introducing an entirely new SKU.
Why on earth would they cut the price of a card that is over 30% faster just to fight at a certain price point?
That's what I'm saying. Welcome to the next-gen AMD card with Titan performance at its former price tag.