I think that's mostly a result of a larger output buffer, but fundamentally it works the same. That's more of a tweak than an architectural change. I was referring to the geometry shader, not the overall performance.
GT200 has an improved geometry shader, its performance is way higher than G80's, the register file size is doubled, etc. G80 is Compute Capability 1.0, GT200 is Compute Capability 1.3. GT200 fully decodes H.264 in hardware, while G80 didn't. It's not just more units; it's very different internally almost everywhere.
http://www.anandtech.com/show/2549/5
Also, GT2xx GPUs can do things that G8x/G9x GPUs can't, double-precision (FP64) arithmetic being the obvious example.
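For concreteness, here's a minimal host-side sketch (my own illustration, not from any post above) that uses the CUDA runtime API to report each device's compute capability and whether it predates the FP64 support introduced with CC 1.3; a G80 board reports 1.0, a GT200 board reports 1.3.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
        // Double-precision arithmetic is only available from CC 1.3 onwards.
        bool hasFp64 = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
        printf("  native FP64: %s\n", hasFp64 ? "yes" : "no");
    }
    return 0;
}
```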
High-level changes rather come in an NV30 -> G80 -> Kepler -> etc. cadence (with Fermi being somewhat of an in-between case). Everything between G80 and Fermi was based on a similar architectural backbone. In hindsight, where are the "high-level improvements" at the competition? Does GCN sound to you like something they'll overhaul all that soon?
Wasn't GT200 just a G80 with more units and a handful of FP64 units added to the mix under that very same reasoning?
By "high-level changes" I mean "improvements that change enough stuff to be obvious on the simplistic block diagrams that IHVs are willing to disclose". By this definition, all recent NV architectures featured high-level changes.
GCN has already seen big front-end changes. What will happen over the next year is an open question.
By "high-level changes" I mean "improvements that change enough stuff to be obvious on the simplistic block diagrams that IHVs are willing to disclose". By this definition, all recent NV architectures featured high-level changes
Changes to the interconnect, which I understand to mean the network(s) by which different elements on the chip communicate. This sounds like it's a crossbar in current NVidia chips, but as far as I know, there are topologies (e.g. flattened butterfly) with better energy-delay products.
I'm a hardware layman as well. I was thinking of this link (NoCs are discussed at the end). But it turns out I misremembered, and the comparison is between a mesh and a flattened butterfly. So, err... ignore me. Perhaps someone with actual HW design experience can set me straight?
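To make the hop-count side of that comparison concrete, here's a back-of-envelope, host-only sketch (mine, assuming uniform random traffic on a k x k network of routers): a 2D mesh routes dimension by dimension, while a flattened butterfly gives every router a direct link to all routers in its row and column, so any packet needs at most two hops. Fewer router traversals per packet is one reason the flattened butterfly can come out ahead on energy-delay.

```cuda
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    const int k = (argc > 1) ? atoi(argv[1]) : 8;    // k x k nodes
    double meshHops = 0.0, fbflyHops = 0.0;
    const double pairs = (double)k * k * k * k;      // all (src, dst) pairs

    for (int sx = 0; sx < k; ++sx)
      for (int sy = 0; sy < k; ++sy)
        for (int dx = 0; dx < k; ++dx)
          for (int dy = 0; dy < k; ++dy) {
              // Mesh: one hop per step along each dimension.
              meshHops += std::abs(sx - dx) + std::abs(sy - dy);
              // Flattened butterfly: direct links within a row/column,
              // so at most one hop per dimension.
              fbflyHops += (sx != dx) + (sy != dy);
          }

    printf("k=%d: mesh avg hops = %.2f, flattened butterfly avg hops = %.2f\n",
           k, meshHops / pairs, fbflyHops / pairs);
    return 0;
}
```

For an 8x8 network this gives roughly 5.25 average hops for the mesh versus 1.75 for the flattened butterfly; actual energy and delay also depend on router radix and wire length, which this toy model ignores.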
Judging by the CUDA-core density of GM107, it's not hard to expect that a 20 nm GM110 could accommodate ~6500 CUDA cores (that's based on the assumption that they don't try to fit nonsensical Denver/ARM cores into their high-end Maxwells).
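For what it's worth, that extrapolation is easy to write down; the sketch below is mine, with every input a labeled assumption (GM107's public 640 cores on a ~148 mm^2 28 nm die being the only hard numbers). It just scales GM107's core density by an ideal 28 nm -> 20 nm shrink and an assumed big-die area budget. Since GM107's small die carries a disproportionate share of uncore, the naive result is a lower-bound ballpark rather than a prediction.

```cuda
#include <cstdio>

int main() {
    // GM107 (GTX 750 Ti) public specs: 640 CUDA cores on a ~148 mm^2 28 nm die.
    const double gm107Cores   = 640.0;
    const double gm107AreaMM2 = 148.0;

    // Assumptions for a hypothetical 20 nm GM110:
    const double dieAreaMM2 = 550.0;                           // big-die budget (assumed)
    const double areaScale  = (28.0 / 20.0) * (28.0 / 20.0);   // ideal 28 nm -> 20 nm shrink

    const double coresPerMM2 = gm107Cores / gm107AreaMM2;      // ~4.3 cores/mm^2 at 28 nm
    const double estimate    = coresPerMM2 * areaScale * dieAreaMM2;

    // GM107's die is dominated by uncore (display, video, memory I/O), so a
    // large die should pack SMs denser than this naive scaling suggests;
    // treat the printed figure as a conservative ballpark.
    printf("naive estimate: %.0f CUDA cores\n", estimate);
    return 0;
}
```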
This is the approach that Broadcom chose for its Videocore technology, which is fully controllable through a clearly documented Remote Procedure Call (RPC) interface. This allowed the Raspberry Pi Foundation to provide a binary-driver-free Linux-based OS on their products [34].
Why would a Denver 64-bit custom ARM core be nonsense in the very high end?
It would seem like that is where it will make the most sense.
Even if the number of CUDA cores is reduced, that is more than made up for (and probably greatly exceeded) by replacing an Intel CPU with another GM110.