According to a moderator on the well-known PCinlife forum, who also has die shots and other information in his hands, GTX 280 will offer two times 8800 Ultra performance.
And 3 times its floating-point performance? So maybe the missing MUL is still missing...
We chatted with a lot of knowledgeable people in the industry and we've learned some quite interesting truths about the GT200 chip. Our developer friends called it a brute-force chip, without much in the way of brains.
Putting 240 Shader units in a chip that is basically reminiscent of the G80 and G92 design will naturally make things faster. More Shader units at faster clocks will always make your card faster, especially at higher resolutions.
The 65nm G92 has 128 Shaders while GT200 has 240, or almost twice as many. The die size of GT200 is much bigger than G92's, and that is how you get the fastest chip around.
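The "brute force" argument above is just multiplication: peak shader throughput scales with SP count times shader clock. A minimal sketch, with the caveat that the clocks used here are placeholder assumptions (GT200's shader clock was not public at the time); only the SP counts come from the article:

```python
# Rough sketch of the shader-count scaling argument. Clocks are assumed
# for illustration only; the SP counts (128 vs 240) are from the article.
def mad_gflops(num_sps, shader_clock_ghz, flops_per_sp_per_clock=2):
    """Peak MAD throughput: each SP retires one multiply-add (2 flops) per clock."""
    return num_sps * shader_clock_ghz * flops_per_sp_per_clock

g92_gflops = mad_gflops(128, 1.625)    # assumed G92-class shader clock
gt200_gflops = mad_gflops(240, 1.625)  # same clock assumed; only SP count grows

print(f"SP ratio: {240 / 128:.3f}x")   # 1.875x, i.e. "almost twice as many"
print(f"Throughput ratio: {gt200_gflops / g92_gflops:.3f}x")
```

At equal clocks the throughput ratio is exactly the SP ratio, 1.875x, which is why "more units, bigger die" is enough to take the performance crown.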
Our developer friends added that Nvidia's last real innovation was G80, and that G92 is simply a die shrink of the same idea. You can look at GT200 as a G92 with 240 Shaders.
This results in GT200 running quite hot, but it will compensate with sheer power, so let's just hope that Nvidia's yields on such a huge chip (rumoured to be bigger than 550mm²) will be acceptable.
Just like any other American company, Nvidia plans to continue ripping the Europeans off, and we believe that Europeans have actually got used to it. While Geforce GTX 260, the slower of the two GT200-based cards, will end up selling for about $450 in the USA, we heard that in European countries loyal to the Euro it should end up costing between €400 and €450.
Today €400 converts to $620 while €450 converts to $697.50, which is a huge difference from the suggested US prices.
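The article's conversions imply an exchange rate of about $1.55 per euro (mid-2008 levels). That rate is an assumption inferred from the article's own numbers, but it reproduces both quoted figures:

```python
# Checking the quoted EUR->USD conversions. The rate is an assumption
# inferred from the article's own numbers ($620 / €400 = 1.55).
USD_PER_EUR = 1.55

def eur_to_usd(eur, rate=USD_PER_EUR):
    return eur * rate

print(f"€400 -> ${eur_to_usd(400):.2f}")  # $620.00, as quoted
print(f"€450 -> ${eur_to_usd(450):.2f}")  # $697.50, as quoted
```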
Geforce GTX 260 will be clocked lower on both memory and GPU, but it should still end up as the runner-up to the fastest thing around.
Geforce GTX 280 will end up with a similar price difference, and we expect a price of around $600 in the USA and about €550 to €600 in Europe.
...You heard that right: the successor to the GT200 chip has already taped out, and it too will be nothing special. Documents seen by the INQ indicate that this one is called, wait for it, the GT200b. It is nothing more than a 55nm shrink of the GT200. Don't expect miracles, but do expect the name to change.
...The GT200 is about six months late, blew out its die size estimates and missed clock targets by a lot. ATI didn't. This means that a GT260 board will cost about 50 per cent more than an R770 for equivalent performance. The GT280 will be about 25 per cent faster but cost more than twice as much. A month or so after the 770 comes the 700, basically two 770s on a slab. This will crush the GT280 in just about every conceivable benchmark and likely cost less.
...The GT200b will be out in late summer or early fall, instantly obsoleting the GT200. Anyone buying the 65nm version will end up with a lemon, a slow, hot and expensive lemon.
What are they going to do? Emails seen by the INQ indicate they are going to play the usual PR games to take advantage of sites that don't bother checking up on the 'facts' fed to them. They plan to release the GT200 in absurdly limited quantities, and only four AIBs are going to initially get parts.
The documents talk about "Improved Dual Issue"... so make of it what you will.... Also mentioned are "2x Registers" and "3x ROP blending performance".
Thanks. Is today die-shot day?
OK, I will contribute a G80 die shot that even babies will understand.
Jawed, it's a little easier to discern the blocks with this picture, no?
One thing that I'm a bit dubious about is the way that a cluster is split into multiprocessor and TMU. There are parts of a cluster that aren't either, as far as I can tell, relating to scheduling/instruction issue.
I'm curious about the batch size (number of elements in a hardware thread) on GT200. G80 has an underlying batch size of 16, but I have the impression that it'll be 32 in GT200. I wonder if this leads to some simplification of the multiprocessors, or at least in the scoreboarding/scheduling/instruction-issuing.

And I almost forgot to say that I saw shared memory per multiprocessor gets doubled in GT200 vs G80 (32KB vs 16KB). And remember there are 30 multiprocessors vs 16 in G80.
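Taken together, those rumoured figures imply a big jump in on-chip resources. A quick tally, using only the numbers quoted in the post (30 SMs at 32KB each vs 16 SMs at 16KB; the 8-SPs-per-SM layout is the known G80 arrangement, assumed carried over):

```python
# Totting up the rumoured per-multiprocessor resources quoted in the post.
# 8 SPs per multiprocessor is the G80 arrangement, assumed to carry over.
g80 =   {"multiprocessors": 16, "shared_mem_kb": 16, "sps_per_mp": 8}
gt200 = {"multiprocessors": 30, "shared_mem_kb": 32, "sps_per_mp": 8}

def totals(chip):
    return {
        "total_sps": chip["multiprocessors"] * chip["sps_per_mp"],
        "total_shared_kb": chip["multiprocessors"] * chip["shared_mem_kb"],
    }

print(totals(g80))    # 128 SPs, 256 KB of shared memory chip-wide
print(totals(gt200))  # 240 SPs, 960 KB -- nearly 4x the shared memory
```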
I'm sure there'll be a lot of CUDA-using people jumping for joy over the register file increase :smile:
No, that's part of the SMs. There are indeed things which are only present at the cluster level (I think constant cache is one of them, but I can't remember right now) and there's definitely some basic scheduling there too. However, it's probably fair to say that a significant majority of it is related to texturing or fetches in general.
Of course the missing MUL was a serious hit to NVidia's claims for the efficiency of their ALUs. If it routinely achieves 2/3 of the "headline" GFLOPs rating, then it's better to just pretend it doesn't exist, which is why we have G80 as 346 GFLOPs, not 518 GFLOPs, etc.

Sounds like they just fixed the MUL issue by adding register space and made a marketing issue out of it. I guess it could be viewed as discovered since it wasn't really available for general use, even though they counted it towards performance.
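Where those two G80 figures come from: the 518 GFLOPs headline counts a dual-issued MUL alongside the MAD (3 flops per SP per clock), while the 346 GFLOPs figure counts the MAD alone (2 flops per SP per clock), at the 8800 GTX's 1.35 GHz shader clock:

```python
# The "headline" vs "realistic" G80 GFLOPs figures discussed above.
SPS = 128
SHADER_CLOCK_GHZ = 1.35  # 8800 GTX shader clock

mad_only     = SPS * SHADER_CLOCK_GHZ * 2  # MAD alone -> the "346 GFLOPs" figure
mad_plus_mul = SPS * SHADER_CLOCK_GHZ * 3  # MAD + MUL -> the "518 GFLOPs" figure

print(f"MAD only: {mad_only:.1f} GFLOPs, with MUL: {mad_plus_mul:.1f} GFLOPs")
print(f"Ratio: {mad_only / mad_plus_mul:.3f}")  # the 2/3 mentioned above
```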
OK. One thing I'm still unclear about is whether NVidia's architecture has dedicated point-samplers in addition to the "TMU"s, or whether in fact the modular configuration of texture fetching/filtering (e.g. addressing unit, fetching, filtering) allows them to run all fetches through the same samplers. It seems likely to be the latter.
What's the current blending rate?
Further, it can natively blend pixels in integer and floating-point formats, including FP32, at rates that diminish somewhat with the bandwidth available through the ROPs: INT8 and FP16 run at full speed (measured) and FP32 at half speed. From empirical testing, each pair of ROPs shares a single blender, so 12 blends per cycle.
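A back-of-envelope throughput from those figures, assuming G80's 24 ROPs and its 575 MHz core clock for the ROP domain (both assumptions on my part; the one-blender-per-ROP-pair figure is from the post):

```python
# Back-of-envelope blend throughput. ROP count and clock are assumptions
# (G80-class: 24 ROPs at the 575 MHz core clock); the one blender per
# ROP pair (12 blends/cycle) is the empirically measured figure above.
ROPS = 24
BLENDERS = ROPS // 2      # one blender per ROP pair -> 12 blends per cycle
ROP_CLOCK_GHZ = 0.575

int8_fp16_rate = BLENDERS * ROP_CLOCK_GHZ  # Gblends/s at full speed
fp32_rate = int8_fp16_rate / 2             # FP32 blends at half speed

print(f"INT8/FP16: {int8_fp16_rate:.2f} Gblends/s, FP32: {fp32_rate:.2f} Gblends/s")
```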
I dare to propose that the batch size is untouched from the G80, i.e. an additional SIMD array of eight SPs per cluster just increases the threading parallelism.
So now GT200 has a kind of triple instruction issue, so to speak.
I wonder how this would impact the interpolation rate compared to G80/G92.