Hardware.fr's analysis is up (still in French):
http://www.hardware.fr/articles/782-1/nvidia-geforce-gf100-revolution-geometrique.html
That obviously cannot be done with DX gather ... at a guess, they will simply advocate calling Sample multiple times with varying offsets and point-sample filtering; they have 16 LSUs after all.
Sampler will do jittered-offset for Gather4 (no idea how; the texture-space offset is constant per call)
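To make the distinction concrete, here is a minimal CPU-side sketch in Python. It is a toy model and not the Direct3D API: `point_sample`, `gather4`, and `jittered_fetch4` are hypothetical helper names, the texture is a plain list of rows, and the order of the four returned texels is an arbitrary choice of this sketch. It only illustrates why a gather with one shared offset per call differs from four point samples that each carry their own offset.

```python
def point_sample(texture, x, y):
    """Fetch one texel with clamp-to-edge addressing and no filtering."""
    height = len(texture)
    width = len(texture[0])
    xc = min(max(x, 0), width - 1)
    yc = min(max(y, 0), height - 1)
    return texture[yc][xc]


def gather4(texture, x, y, offset=(0, 0)):
    """Toy gather: the 2x2 footprint at (x, y), moved by ONE shared offset."""
    ox, oy = offset
    # Footprint order is an assumption of this sketch, not a spec quote.
    footprint = [(0, 1), (1, 1), (1, 0), (0, 0)]
    return [point_sample(texture, x + ox + dx, y + oy + dy)
            for dx, dy in footprint]


def jittered_fetch4(texture, x, y, offsets):
    """Four independent point samples, each with its own (dx, dy) offset."""
    return [point_sample(texture, x + dx, y + dy) for dx, dy in offsets]


# 4x4 texture whose texel value is row * 4 + column.
texture = [[row * 4 + col for col in range(4)] for row in range(4)]

regular = gather4(texture, 1, 1)
jittered = jittered_fetch4(texture, 1, 1, [(0, 0), (2, 0), (-1, 1), (1, 2)])
```

In this model the gather always returns a contiguous 2x2 block, while the four-sample version can land anywhere, at the cost of four fetches instead of one.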
Maybe it's just a common case that they can detect in the shader code and apply the hardware solution (possibly rewriting the shader code on the fly)?
Would have to be SM5.1 then ... unless Microsoft introduces a new shader instruction without bumping the version.
So I just read the summaries and what-not of Fermi. I'm not the most technical person on the GPU side, more of a CPU person, but from looking at the slides, Nvidia have built a quad-core GPU, no? Each core has 128 CUDA cores and a rasterizing engine.
This diagram looks eerily similar to early dual/quad-core CPU designs from AMD and later Intel. So this chip is going to be massively parallel, right? Is this the best path for Nvidia to take? From experience with parallelism in the CPU space, it has taken 6-7 years for even some of the more basic software to take advantage of having >1 core. Then again, I suppose the same questions were raised when AMD went multi-core with Athlon.
Anyway, that is my limited understanding of what Nvidia are trying to achieve; it's probably misguided and wrong, so please correct me if that is the case. If I'm right, the design paves the way for very easy scalability: they can basically cut the chip in half and get 50% of the performance at 50% of the die size, or increase the number of 'cores' to 6 in the next iteration (32nm), along with other general architectural improvements and efficiency gains. They could finally stop increasing the die size but still get massive performance increases, similar to what Intel/AMD do in the CPU space.
What is the real thinking? Am I completely wrong?
Sure, but maybe there is some sort of "magic code" (i.e. a code sequence that the driver recognizes) and Nvidia is instructing developers how to exploit that, or maybe it's more of a game-by-game optimization thing where Nvidia enables support for "important" games by changing shaders (think 3DMark optimization).
You really want it to be deterministic though, and not have the driver guess you wanted jittered samples when you really didn't.
Nathan, all GPU architectures have been relatively modular for a while; this isn't really anything new.
I guess that was the plan, though they can also scale by the number of SMs within each GPC.
Basically what I am trying to find out is whether Nvidia could ship this with just a single GPC in a tiny package for notebooks or embedded devices, or increase the GPC count to 6/8 in the next gen.
Well, if that was the plan and Nvidia delayed in order to execute properly, then surely it has been worth it, as it sets up the next 2-3 generations for them with easy linear increases in power just by increasing the number of GPCs and raising the clock speeds as the process node matures.
And how is that? If we take the lower of the die-size estimates, 550mm2, and estimate that a half GF100 would be 0.6 times that size (scaling down is not linear), then we end up with exactly the size of Cypress, 330mm2. Pairing that with 128-bit GDDR5, I don't see how it would be competitive with the 285 or 5850. It should be faster than the 5770 though.
In terms of scalability I don't really get why Fermi should be easier than GT200. Having all the fixed-function stuff outside of the SIMDs certainly didn't hamper AMD's ability to quickly scale Cypress downward. I always figured that GT200's biggest problem was that its derivatives would be uncompetitive with G92 and its derivatives, and that has now been shown to be true (on 40nm; 55nm would've been a disaster). One potential bright spot is that a half-Fermi should be smaller than Cypress and, based on early numbers, looks like it would be competitive with the GTX 285 / HD 5850, especially in newer titles.
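The die-size arithmetic in the reply above can be spelled out as a quick Python sketch. The 550mm2 figure and the 0.6 area ratio are the poster's own estimates, not measured values, and `scaled_die_area` is a hypothetical helper name used only for illustration.

```python
def scaled_die_area(full_area_mm2, area_ratio):
    """Area of a cut-down chip given the full-chip area and an area ratio."""
    return full_area_mm2 * area_ratio


FULL_GF100_MM2 = 550.0   # lower of the die-size estimates quoted in the thread
HALF_CHIP_RATIO = 0.6    # poster's guess: halving the units does not halve the area

half_area = scaled_die_area(FULL_GF100_MM2, HALF_CHIP_RATIO)
naive_half_area = scaled_die_area(FULL_GF100_MM2, 0.5)

# Area that does not shrink with the unit count under these assumptions.
non_scaling_overhead = half_area - naive_half_area

print(f"half GF100 estimate: {half_area:.0f} mm2")
print(f"naive linear half:   {naive_half_area:.0f} mm2")
print(f"overhead:            {non_scaling_overhead:.0f} mm2")
```

Under these assumptions the half chip lands at roughly the 330mm2 the post cites for Cypress, about 55mm2 above what a purely linear cut would give.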