NVIDIA GT200 Rumours & Speculation Thread

Status
Not open for further replies.
For all we know, each TF does single-cycle FP16 filtering now, like R6xx, and we'll get free 4xAF INT8 or free 2xAF FP16 due to the 2:1 ratio.
That would only get you free 2xAF INT8 (like free 2xAF FP16); otherwise that would be 160 TMUs with half-rate FP16 :).
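The "free AF" argument comes down to how many bilinear operations the filter units can perform for each address the address units generate. A minimal sketch, assuming the speculated 40 TA / 80 TF layout and assuming one address can feed several bilinear operations (as for trilinear or 2xAF); the function name and parameters are illustrative only:

```python
def free_bilinear_per_address(ta_units: int, tf_units: int, tf_cycles_per_bilinear: int) -> float:
    """Bilinear ops the TFs can do per address the TAs generate.

    2.0 means every address gets two bilinear ops "for free" (trilinear or 2xAF);
    4.0 would mean free 4xAF.
    tf_cycles_per_bilinear: 1 for a single-cycle format, 2 for a half-rate one.
    """
    return (tf_units / tf_cycles_per_bilinear) / ta_units


# Speculated GT200 layout from the thread: 40 TA, 80 TF.
print(free_bilinear_per_address(40, 80, 1))   # 2.0: INT8, or FP16 if single-cycle
# Hypothetical 160 TFs with half-rate FP16:
print(free_bilinear_per_address(40, 160, 2))  # 2.0: free 2x for FP16
print(free_bilinear_per_address(40, 160, 1))  # 4.0: free 4x for INT8
```

In other words, free 4xAF INT8 at 40 TAs would take the equivalent of 160 INT8 filter units.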

In that scenario, your theoretical INT8 bilinear performance gets cut in half, but FP16 and AF performance get a shot in the arm relative to G92.
But if the TF units are single-cycle FP16, that's all the more reason why you need more TA units: now not only INT8 but also FP16 bilinear performance would be cut in half (though memory bandwidth will probably become the limit well before that anyway). That said, I wouldn't be at all surprised if the TF units are unchanged and still single-cycle INT8 only.
 

I don't think you're doing the math right.

FP16 bilinear G92: 32 units * 1.0 full speed * 675 MHz = 21600 MT/s (all 64 TFs used)
FP16 bilinear GT200 (speculated): 40 units * 1.0 full speed * 600 MHz = 24000 MT/s (only 40 TFs used)

FP16 trilinear G92: 32 units * 0.5 half speed * 675 MHz = 10800 MT/s (all 64 TFs used)
FP16 trilinear GT200 (speculated): 40 units * 1.0 full speed * 600 MHz = 24000 MT/s (all 80 TFs used)

So FP16 bilinear goes up slightly and trilinear more than doubles.

The halving of INT8 bilinear would be irrelevant since G92 doesn't have the bandwidth to get up there anyway.
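The numbers above can be reproduced with a simple bottleneck model: each filtered sample takes one TA operation plus one or more TF cycles, and throughput is whatever the scarcer unit allows. A sketch under stated assumptions: the GT200 figures (40 TA / 80 TF at 600 MHz, single-cycle FP16) are the speculated ones from this post, and treating trilinear as one address plus two bilinear TF operations is a simplification; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TexturePipe:
    """Simplified texture pipeline: TA units make addresses, TF units filter."""
    name: str
    ta_units: int
    tf_units: int
    clock_mhz: int
    fp16_single_cycle: bool  # can one TF do an FP16 bilinear op per clock?


def tf_cycles_per_sample(pipe: TexturePipe, bilinear_ops: int, fp16: bool) -> int:
    """TF cycles one filtered sample costs (bilinear_ops: 1 = bilinear, 2 = trilinear)."""
    half_rate = fp16 and not pipe.fp16_single_cycle
    return bilinear_ops * (2 if half_rate else 1)


def samples_per_clock(pipe: TexturePipe, bilinear_ops: int, fp16: bool) -> float:
    """Assumes one TA op per sample; the scarcer resource sets the rate."""
    return min(pipe.ta_units, pipe.tf_units / tf_cycles_per_sample(pipe, bilinear_ops, fp16))


def rate_mts(pipe: TexturePipe, bilinear_ops: int, fp16: bool) -> float:
    """Filtered samples per second, in millions (MT/s)."""
    return samples_per_clock(pipe, bilinear_ops, fp16) * pipe.clock_mhz


def tfs_busy(pipe: TexturePipe, bilinear_ops: int, fp16: bool) -> float:
    """How many TF units are actually doing work each clock."""
    return samples_per_clock(pipe, bilinear_ops, fp16) * tf_cycles_per_sample(pipe, bilinear_ops, fp16)


G92 = TexturePipe("G92", 32, 64, 675, fp16_single_cycle=False)
GT200 = TexturePipe("GT200 (speculated)", 40, 80, 600, fp16_single_cycle=True)

for pipe in (G92, GT200):
    for label, ops in (("bilinear", 1), ("trilinear", 2)):
        print(f"{pipe.name} FP16 {label}: {rate_mts(pipe, ops, True):.0f} MT/s, "
              f"{tfs_busy(pipe, ops, True):.0f} TFs busy")
```

Calling `tfs_busy(GT200, 1, False)` gives the INT8 bilinear case: 40 samples per clock with only 40 of the 80 TFs busy, i.e. half of what the filters alone could do.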
 
Well, by "cut in half" I meant relative to what the 80 (single-cycle FP16) TFs could do, not in comparison to G92. So yes, the math is correct (I did say bilinear; you're quite right that it's irrelevant for AF or trilinear).

The halving of INT8 bilinear would be irrelevant since G92 doesn't have the bandwidth to get up there anyway.
I don't think so. INT8 very often means compressed textures, which need next to nothing in terms of memory bandwidth: for 48 GT/s you'd only ever need 48 GB/s with DXT5, or half that with DXT1, assuming semi-sane conditions (mipmaps, high cache hit rate, etc.).
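The 48 GB/s and 24 GB/s figures follow from the DXT block sizes (DXT1 stores a 4x4 block in 8 bytes, DXT5 in 16 bytes). A minimal sketch, assuming roughly one texel per filtered sample has to come from DRAM rather than the texture cache (the "semi-sane conditions" caveat), and taking 48 GT/s as 80 TFs at 600 MHz, which is presumably where that figure comes from; the helper name is hypothetical:

```python
# Bytes per texel: DXT1 packs a 4x4 block into 8 bytes, DXT5 into 16 bytes.
BYTES_PER_TEXEL = {
    "RGBA8 (uncompressed)": 4.0,
    "DXT5": 16 / 16,
    "DXT1": 8 / 16,
}


def dram_gb_per_s(gtexels_per_s: float, fmt: str, new_texels_per_sample: float = 1.0) -> float:
    """DRAM traffic for a given filtered-texel rate.

    new_texels_per_sample approximates how many texels per sample must be
    fetched from DRAM; ~1.0 is an assumption for mipmapped, cache-friendly access.
    """
    return gtexels_per_s * new_texels_per_sample * BYTES_PER_TEXEL[fmt]


peak_gts = 80 * 600 / 1000  # 80 TFs at 600 MHz (speculated GT200) -> 48.0 GT/s
for fmt in BYTES_PER_TEXEL:
    print(f"{fmt}: {dram_gb_per_s(peak_gts, fmt):.0f} GB/s")
```

The uncompressed RGBA8 line is there for contrast: the same texel rate would need four times the DXT5 bandwidth.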
 

How many INT8 textures are only bilinearly filtered nowadays?
 
Dunno. I am waiting for AMD and NVIDIA to come to their senses and offer 1:1 TA:TF ratios again though :).
More TF than TA is useful for trilinear and AF.
More TA than TF is good for point sampling, vertex texture fetch, etc. (and I was told point sampling is indeed rather common nowadays for table lookups).
The same number of TA and TF units is good for bilinear: the golden compromise :).
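The trade-off in these three lines can be checked with a toy throughput model. A sketch under loud assumptions: the per-sample costs are simplified (point sampling is assumed to need no filtering work, trilinear/2xAF is one address plus two bilinear ops), and the three 96-unit layouts are hypothetical, not any real chip:

```python
# Per-sample cost as (TA ops, TF ops); simplified, illustrative figures.
# Point sampling is assumed to need no filtering work at all.
WORKLOADS = {
    "point / vertex fetch": (1, 0),
    "bilinear": (1, 1),
    "trilinear / 2xAF": (1, 2),
}

# Three hypothetical layouts, each with 96 units in total.
LAYOUTS = {"1:2 TA:TF": (32, 64), "1:1 TA:TF": (48, 48), "2:1 TA:TF": (64, 32)}


def samples_per_clock(ta: int, tf: int, ta_cost: int, tf_cost: int) -> float:
    """Filtered samples per clock, limited by whichever unit type runs out first."""
    limits = [ta / ta_cost]
    if tf_cost:  # workloads with no filtering work are TA-bound only
        limits.append(tf / tf_cost)
    return min(limits)


for workload, cost in WORKLOADS.items():
    rates = {name: samples_per_clock(*units, *cost) for name, units in LAYOUTS.items()}
    best = max(rates, key=rates.get)
    print(f"{workload}: {rates} -> best: {best}")
```

Under these assumptions the 1:2 layout wins trilinear, 2:1 wins point sampling, and 1:1 wins bilinear while never coming last on the other two workloads.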
 
What about CUDA? Wouldn't you starve the ALUs a bit if only 1 out of every 6 of them had an address unit available at any given time? I know CUDA is supposed to be massively math-bound, but you'd still have to keep those ALUs fed.
 

Pardon my ignorant question, but aren't memory requests made per SIMD multi-processor array, not per SP?
 
http://www.fudzilla.com/index.php?option=com_content&task=view&id=7743&Itemid=1

If the information we were told is correct, then Nvidia is having some problems ramping up the production of the GT200 series and this sounds all too familiar when it comes to new Nvidia products.

We’ve heard that even large distributors have a hard time getting more than a handful of cards and that is only if they place large orders of products that Nvidia’s partners have overstock of.
So, between being forced to wait for R700 and GT200 being in short supply, enthusiasts look like they won't have a happy summer...

Jawed
 
How many INT8 textures are only bilinearly filtered nowadays?
The question you should be asking is how often INT8 fetches wind up being only bilinearly filtered, and the answer is most of them.

All pixels whose textures are undergoing magnification need only one bilinear sample, and at high resolution the percentage of such texture fetches goes up. Also, if AF is disabled or not needed, there are the "brilinear" optimizations (which are very sensible if not taken beyond, say, 50%) that make many texture fetches need only one bilinear sample.
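Both cases can be illustrated by counting how many bilinear samples a fetch needs at a given level of detail. A minimal sketch, assuming a simplified brilinear scheme where the single-sample region is split evenly around each mip level and "50%" means half of each mip interval gets one bilinear sample; actual driver implementations are not described here and may differ, and the function name is hypothetical:

```python
import math


def bilinear_samples_needed(lod: float, bilinear_share: float) -> int:
    """Bilinear samples one fetch needs under a simplified 'brilinear' scheme.

    lod: level of detail; <= 0 means magnification (base level only).
    bilinear_share: fraction of each mip interval filtered with a single
    bilinear sample (0.0 = full trilinear, 0.5 = the '50%' case).
    """
    if lod <= 0.0:
        return 1  # magnification: one bilinear sample from the base level
    frac = lod - math.floor(lod)
    band = 1.0 - bilinear_share  # width of the region that still blends two levels
    lo, hi = 0.5 - band / 2, 0.5 + band / 2
    # Outside [lo, hi] the blend weight snaps to 0 or 1, so one mip level suffices.
    return 1 if frac <= lo or frac >= hi else 2


# Evenly spread LODs: the first fifth are magnified, the rest minified.
lods = [i / 1000 for i in range(-1000, 4000)]
for share in (0.0, 0.5):
    single = sum(bilinear_samples_needed(l, share) == 1 for l in lods) / len(lods)
    print(f"bilinear_share={share}: {single:.0%} of fetches need one bilinear sample")
```

With full trilinear only the magnified fetches (plus the rare exact mip hits) get away with one sample; at the 50% setting about half of the minified ones do too.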
 

In that case, my vote is for a 1:1 TA:TF ratio in GT200, although I think somebody (Lukfi, maybe?) hinted earlier that this wasn't the case.
 
They always do these things. Why do you think Dell was still selling the HD2xxx series until recently? When the HD2xxx was being replaced with the HD3xxx, Dell still had the X1xxx on the plate :???:

Never mind, they are still selling the 2xxx series:

http://configure.us.dell.com/dellstore/config.aspx?c=us&cs=19&l=en&oc=DXCWPP4&s=dhs

I don't think it's fair to say Dell is selling the "2xxx series" when they happen to feature a single model from that family of products: the single most popular, and cheapest, model at that.
 