Llano's GPU is an Evergreen derivative.
I know, but at one point or another GPU functionality, at least for low-end GPU power, will have to move to SOI. Fusion is not that far away.
Is it too far out of the box to think that the graphics functionality will come from the NI architecture?
For one thing, they have high current power gating on SOI (I don't think anyone does it in bulk except for Intel).
Their global atomics are already cached.
While I understand why and how to unify the reg file and local memory, or local memory and L1 cache, I can't understand why the L1 cache and reg file should be unified. What's the advantage? How to do it?
e) Actually a wish: unification of at least 2 out of the {reg file, local memory, L1 cache} pools. All 3 unified will be a Christmas gift. :)
Sorry, I must have missed something, didn't LRB1 have separate reg files and caches?
How to do it? Well, the only answer to that I have right now is a variant of LRB1's memory system.
One question though: how could they cope with simultaneous accesses to L1, LDS and the register file, which should happen quite often?
The advantage is flexibility and higher overall utilization. It will make the performance cliffs much more gradual. I'd prefer to have L1 and local mem unified, though I guess reg file and local mem is an easier target.
How to do it? Well, the only answer to that I have right now is a variant of LRB1's memory system.
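To put rough numbers on the utilization argument above, here is a toy occupancy model. It is only a sketch: the pool sizes, the per-work-group demands, and the function names are all made up for illustration and do not describe any real GPU. It counts how many work groups fit on a core when registers and local memory are separate pools versus one pool of the same total size.

```python
# Toy model only: every size here is a hypothetical figure, not real hardware data.

def groups_split(reg_kb, lds_kb, reg_pool_kb=64, lds_pool_kb=64):
    """Work groups that fit when registers and local memory are separate pools."""
    # The scarcer of the two resources limits how many groups are resident.
    return min(reg_pool_kb // reg_kb, lds_pool_kb // lds_kb)

def groups_unified(reg_kb, lds_kb, pool_kb=128):
    """Work groups that fit when both demands are carved out of one pool."""
    return pool_kb // (reg_kb + lds_kb)

def sweep(lds_kb=4, reg_range=range(8, 41, 8)):
    """Grow register demand per group and record (reg_kb, split, unified)."""
    return [(r, groups_split(r, lds_kb), groups_unified(r, lds_kb))
            for r in reg_range]

if __name__ == "__main__":
    for reg_kb, split, unified in sweep():
        print(f"regs={reg_kb:2d} KB  split={split:2d}  unified={unified:2d}")
```

In this model the unified pool never holds fewer groups than the split one, because the local memory a register-heavy kernel leaves unused can be spent on registers instead. How much of that survives in real hardware (banking, port counts, allocation granularity) is exactly the open question in this thread.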
Can you expand a bit more on "The key to GPRs in ATI is that their data is private to the owning ALU"?
The key to GPRs in ATI is that their data is private to the owning ALU - though there's a bus that connects the GPRs to the TUs, LDS and import/export blocks. That owning-ALU constraint makes the high throughput of the GPRs possible. Full-speed GPR data from arbitrary locations in the register file to arbitrary ALUs would be a nightmare.
http://www.research.ibm.com/people/h/hind/pldi08-tutorial_files/GPGPU.pdf
Can you expand a bit more on "The key to GPRs in ATI is that their data is private to the owning ALU"?
No. Fermi has a load-store ISA, and it doesn't need to do simultaneous access to any of the three memory pools.
One question though: how could they cope with simultaneous accesses to L1, LDS and the register file, which should happen quite often?
LRB's register file is too small (32 float16 registers per hw thread) to hold the context for the entire work group. The context is really stored in the per-core cache.
Sorry, I must have missed something, didn't LRB1 have separate reg files and caches?
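A quick back-of-the-envelope check of the register-file point above. This is a sketch under stated assumptions: it reads "float16" as a 16-wide vector of 32-bit floats, and the work-group shape (256 work-items with 16 live floats each) is a hypothetical example, not a figure from this thread.

```python
# Assumption: "float16" means a 16-lane vector of 32-bit floats.
VECTOR_REGS = 32        # registers per hw thread, as quoted in the post
LANES = 16              # floats per register (assumed reading of "float16")
BYTES_PER_FLOAT = 4

def regfile_bytes_per_thread():
    """Register file capacity of one hardware thread, in bytes."""
    return VECTOR_REGS * LANES * BYTES_PER_FLOAT

def workgroup_context_bytes(work_items, live_floats_per_item):
    """Live register state of a whole work group, in bytes."""
    return work_items * live_floats_per_item * BYTES_PER_FLOAT

if __name__ == "__main__":
    capacity = regfile_bytes_per_thread()
    # Hypothetical work group: 256 work-items, 16 live floats each.
    needed = workgroup_context_bytes(256, 16)
    print(f"register file: {capacity} B, work-group context: {needed} B")
    print("fits in registers:", needed <= capacity)
```

With these assumed numbers the work-group context is several times larger than one thread's register file, which is consistent with the claim that the context has to live in the per-core cache.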
Err... differential signaling on GDDR5 coming?