Some fun at the perimeter:

Anyone seen any TMUs?

Jawed

Damn, must make a plea to the planners and engineers to put a "cookie monster" in our ASICs!
Groovy.

SANTA CLARA, Calif. — Sep. 30, 2009 — Oak Ridge National Laboratory (ORNL) announced plans today for a new supercomputer that will use NVIDIA®’s next-generation CUDA™ GPU architecture, codenamed “Fermi”. Used to pursue research in areas such as energy and climate change, ORNL’s supercomputer is expected to be 10 times more powerful than today’s fastest supercomputer.
Jeff Nichols, ORNL associate lab director for Computing and Computational Sciences, joined NVIDIA co-founder and CEO Jen-Hsun Huang on stage during his keynote at NVIDIA’s GPU Technology Conference. He told the audience of 1,400 researchers and developers that “Fermi” would enable substantial scientific breakthroughs that would be impossible without the new technology.
“This would be the first co-processing architecture that Oak Ridge has deployed for open science, and we are extremely excited about the opportunities it creates to solve huge scientific challenges,” Nichols said. “With the help of NVIDIA technology, Oak Ridge proposes to create a computing platform that will deliver exascale computing within ten years.”
ORNL also announced that it will create the Hybrid Multicore Consortium. The goal of this consortium is to work with the developers of major scientific codes to prepare those applications to run on the next generation of supercomputers built using GPUs.
“The first two generations of the CUDA GPU architecture enabled NVIDIA to make real in-roads into the scientific computing space, delivering dramatic performance increases across a broad spectrum of applications,” said Bill Dally, chief scientist at NVIDIA. “The ‘Fermi’ architecture is a true engine of science and with the support of national research facilities such as ORNL, the possibilities are endless.”
"Damn, must make a plea to the planners and engineers to put a 'cookie monster' in our ASICs!"

No, no... the right words are: give me a wallpaper-sized RV870 die shot, now!!!1!1one
"How so?"

Oh, I obviously meant 'execution' as in 'execution units'; the scheduling hardware is still very much there and busy. If you do need a large amount of both FP (especially at DP) and cheap INT stuff, then the 'total' overhead is much larger than it 'needs' to be, but not too awful (I'll admit to not fully knowing what the branching hardware can do on its own, if much of anything). This is not an unusual case, although, as I said, it is less relevant to the many cases where you've got more MUL/ADDs than MADDs.
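As a rough illustration of the workload mix in question, here's a minimal CUDA sketch (kernel names and constants are my own, purely illustrative): one kernel that is essentially all fused multiply-adds, and one that interleaves independent integer work with FP accumulation, which is the case where being able to feed the INT and FP pipes concurrently would matter most.

// Minimal CUDA sketch contrasting the two workload shapes discussed above.
// Kernel names and constants are illustrative, not from any real codebase.
#include <cstdio>
#include <cuda_runtime.h>

// MADD-heavy: almost every FP operation maps to a multiply-add.
__global__ void madd_heavy(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = in[i];
    float acc = 0.0f;
    for (int k = 0; k < 64; ++k)
        acc = acc * 1.0009765625f + x;   // compiles to MAD/FMA on the FP pipe
    out[i] = acc;
}

// Mixed INT+FP: independent integer work (a cheap hash) interleaved with FP
// accumulation - the case where co-issuing INT and FP units would matter.
__global__ void mixed_int_fp(float *out, const unsigned *keys, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned h = keys[i];
    float acc = 0.0f;
    for (int k = 0; k < 64; ++k) {
        h ^= h << 13; h ^= h >> 17; h ^= h << 5;        // cheap INT (xorshift)
        acc += (float)(h & 0xFFFF) * 1.52587890625e-5f;  // FP scale + accumulate
    }
    out[i] = acc;
}

int main()
{
    const int n = 1 << 20;
    float *out, *in; unsigned *keys;
    cudaMalloc(&out, n * sizeof(float));
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&keys, n * sizeof(unsigned));
    cudaMemset(in, 0, n * sizeof(float));
    cudaMemset(keys, 0x5A, n * sizeof(unsigned));
    madd_heavy<<<(n + 255) / 256, 256>>>(out, in, n);
    mixed_int_fp<<<(n + 255) / 256, 256>>>(out, keys, n);
    cudaDeviceSynchronize();
    printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(out); cudaFree(in); cudaFree(keys);
    return 0;
}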
Arun said: "You just can't have your cake and eat it too."

You need to give the Nvidia engineers a little more credit.
"You need to give the Nvidia engineers a little more credit."

Wait, are you implying there's something I'm missing about the architecture? I assume you can't say, but if not and you're just saying I should be more enthusiastic, then don't get me wrong! This is a very, very impressive solution for HPC, and from that point of view it's also a very exciting architecture with lots of nice things. The dual-scheduler approach isn't what I was expecting, but it's definitely elegant. All this doesn't mean it's the best architecture for all possible purposes (nothing could ever be), and I was just pointing out one potential case where its weaknesses might be especially pronounced *if* I understood the architecture correctly. Here's hoping I didn't...
"I am waiting to see how many ppl will start to complain about the introduction of some sort of a semi-coherent cache."

L2 is coherent with itself; there is a single L2 for each memory bus ... there can only ever be one copy.
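In practice that means inter-block communication in CUDA goes through global memory backed by that single L2 (for example via atomics), rather than relying on any coherence between the per-SM L1s. A minimal sketch, assuming a Fermi-class device (compute capability 2.0 for the float atomicAdd); names are illustrative:

// Minimal CUDA sketch: combine per-block results through a global atomic.
// The cross-block traffic is resolved at the single L2 (the point of
// coherence), not by keeping per-SM L1 caches coherent with each other.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void block_sum(const float *in, float *global_sum, int n)
{
    __shared__ float partial[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Reduce within the block in shared memory (on-chip, per-SM).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }

    // Combine across blocks with a global atomic (requires sm_20+ for float).
    if (threadIdx.x == 0)
        atomicAdd(global_sum, partial[0]);
}

int main()
{
    const int n = 1 << 20;
    float *in, *sum;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&sum, sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));
    cudaMemset(sum, 0, sizeof(float));
    block_sum<<<(n + 255) / 256, 256>>>(in, sum, n);
    float h = 0.0f;
    cudaMemcpy(&h, sum, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", h);
    cudaFree(in); cudaFree(sum);
    return 0;
}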
So the 16 SMs are on the "north" and "south" sides of the chip, with the PCI-e and GDDR5 interfaces along the borders. Any guesses as to what's in the center? Especially the very center. Scheduling?
In graphics I expect the 32-bit integer units, with all those bit-manipulation capabilities, will be doing texturing while the floating-point units are doing shader arithmetic - unless of course you have some integer shader math to do, in which case that'll get its turn.
dnavas: I don't know if they're separate like that, but one GT200 diagram at the Tesla Editor Day clearly indicated separate INT units, and then an engineer told me outright that it was only marketing when I asked. Maybe it's the same this time around, or maybe it isn't. Heck, maybe that's what Bob is implying here! (i.e. there are cases where the units can actually both be used at the same time)
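To make the kind of split being speculated about concrete, here is a purely illustrative CUDA sketch of a software bilinear fetch, where the addressing is integer/bit-manipulation work and the filtering is FP work. This is only a software analogy under my own assumptions, not a claim about how Fermi's texture hardware is actually organised; all names are made up.

// Software bilinear fetch: integer units handle address math/bit ops,
// floating-point units handle the filtering arithmetic.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void bilinear_sample(const float *texels, int width_log2, int height,
                                const float *u, const float *v, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int width = 1 << width_log2;

    // FP part: normalized coords -> texel space, split into base + weights.
    float x = u[i] * (width - 1);
    float y = v[i] * (height - 1);
    int x0 = (int)x, y0 = (int)y;
    float fx = x - x0, fy = y - y0;

    // INT part: clamp and form linear addresses with shifts and ORs
    // (power-of-two width lets the row offset be a shift).
    int x1 = min(x0 + 1, width - 1);
    int y1 = min(y0 + 1, height - 1);
    int a00 = (y0 << width_log2) | x0;
    int a10 = (y0 << width_log2) | x1;
    int a01 = (y1 << width_log2) | x0;
    int a11 = (y1 << width_log2) | x1;

    // FP part again: bilinear weighting of the four fetched texels.
    float t0 = texels[a00] * (1.0f - fx) + texels[a10] * fx;
    float t1 = texels[a01] * (1.0f - fx) + texels[a11] * fx;
    out[i] = t0 * (1.0f - fy) + t1 * fy;
}

int main()
{
    const int n = 1024, width_log2 = 8, height = 256;
    const int width = 1 << width_log2;
    float *tex, *u, *v, *out;
    cudaMalloc(&tex, width * height * sizeof(float));
    cudaMalloc(&u, n * sizeof(float));
    cudaMalloc(&v, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(tex, 0, width * height * sizeof(float));
    cudaMemset(u, 0, n * sizeof(float));
    cudaMemset(v, 0, n * sizeof(float));
    bilinear_sample<<<(n + 255) / 256, 256>>>(tex, width_log2, height, u, v, out, n);
    cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}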
"I think the tessellator is a software pipe with very little hardware support, too."

Does D3D11 compliance require hardware support? Sorry for the dumb question, Rys.