NVIDIA GF100 & Friends speculation

Well, the ALU/TEX ratio will be increasing dramatically from GT200, per comments on the EVGA forum. So yes, the number of texture units will be going down (presumably to 64).
 
Guys, please remember the Charlie love/hate thread is that way ------->

To the person who linked Charlie's article in this thread upstream, if I could come across the internet and donkey punch you in your twig 'n berries right now, I would. Sheesh.
 
Well, the article was good for a laugh, I guess. I know a few others and I enjoyed it. I guess he can believe what he wants to believe. Performance will speak for itself.
 
It's just slides with info that has been going around for the last 3 weeks. There are no reviewer benchmarks, just slides from NV that say what they say the performance will be. It's more a waste of time. If reviewers had cards to work with and published their findings, that would be one thing, but this is just highly controlled info with no real independent reviewer data.
 

Yea, the lack of consumer-related info on things like different board configs, clock speeds, price, power gobbling, etc., is disappointing, but NV is most likely still getting a handle on this beast's yields.
 
Interesting:
There are 64 texture units, compared to the GTX 285’s 80, but the Texture Units have been moved inside the Third Generation Streaming Multiprocessors (SM) for improved efficiency and clock speed. In fact, the Texture Units will run at a higher clock speed than the core GPU clock.

So, 64 texture units but they are effectively clocked twice as high. Not too shabby.

At least that's how I read it; are they clocked at the shader hotclock?
 
Well, it's past 9pm; anyone have anything interesting to report?

A few days ago on chiphell, tomsmith (post #66, talking about Fellix's numbers posted here previously):
nVidia 的人已经默认这是真的,不过说频率应该还有一点空间,这应该是650M 448SP 的默认性能。512SP 的需要第二阶段放出,6月前应该没有了。

"Nvidia's people have tacitly acknowledged that this is real, though they say there should still be a little headroom on frequency; this should be the default performance of the 650 MHz, 448SP part. The 512SP part needs to wait for a second-phase release and probably won't be out before June."

i.e., current cards at 650 MHz/448 shaders are at the previously quoted performance figures, with scope to increase frequency a little. The 512-shader product will require a retape.

So a long wait still..... :cry:
 
Interesting:

So, 64 texture units but they are effectively clocked twice as high. Not too shabby.

At least that's how I read it; are they clocked at the shader hotclock?

I don't know how the clock domains are going to work out, only that the Texture Units do not operate at the GPU/ROP clock; they will run higher. Whether they match the shader clock, I do not know yet. There is no information on clock speed domains.
 
No, that's not it. The TMUs are clocked at half the speed of the CUDA cores. If the CCs are at 1.5 GHz, then the TMUs are at 750 MHz. The GTX 285's TMUs are at 600 MHz. I think the CC target speed is 1.5 GHz, but I'm hearing it might be 1.2 GHz.
 
I wonder how the 448 shader thing works with a raster engine per 4 SMs. Does one have a lot of free time? Or do two have a little free time? Seems like they would more likely go for 384, disabling an entire unit.
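
For what it's worth, the possible splits can be enumerated with a rough sketch. It assumes 32 ALUs per SM and 4 raster engines with at most 4 SMs each (figures taken from this thread, not confirmed specs), and the function name `sm_splits` is made up for illustration:

```python
from itertools import combinations_with_replacement

ALUS_PER_SM = 32      # assumption: 32 ALUs per SM, as quoted in the thread
RASTER_ENGINES = 4    # assumption: one raster engine per group of 4 SMs
SMS_PER_ENGINE = 4

def sm_splits(shader_count):
    """List every way the enabled SMs could be spread over the raster engines."""
    sms = shader_count // ALUS_PER_SM
    splits = set()
    # Each engine keeps between 0 and 4 SMs enabled; order of engines is ignored.
    for combo in combinations_with_replacement(range(SMS_PER_ENGINE + 1), RASTER_ENGINES):
        if sum(combo) == sms:
            splits.add(tuple(sorted(combo, reverse=True)))
    return sorted(splits, reverse=True)

for shaders in (512, 448, 384):
    print(shaders, sm_splits(shaders))
```

Under those assumptions a 448 SP part has 14 SMs, so the only options are one engine with two SMs off (4/4/4/2) or two engines with one SM off each (4/4/3/3). Which one NVIDIA would pick is not something the thread settles.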
 
No, that's not it. The TMUs are clocked at half the speed of the CUDA cores. If the CCs are at 1.5 GHz, then the TMUs are at 750 MHz. The GTX 285's TMUs are at 600 MHz. I think the CC target speed is 1.5 GHz, but I'm hearing it might be 1.2 GHz.

I see, not as good as I thought, but still better than just 64 TMUs at the core clock... maybe. Didn't that one article out of China say they were going with a 2:1 locked clock for core/shaders now, though?
 
4 triangle setup units and 16 fixed-function DX11 tessellation stages. At 600 MHz, that would be 2.4 billion tris/second. Each tessellator can produce an amplification of up to 64X, and there are 32 ALUs per unit for domain shading. By doing the work now of finally parallelizing the last bits of fixed-function pipeline hardware, it looks like they have more future flexibility for scaling. It's almost true multicore now. It might be that the ratios of setup to tessellator to ALU could be revised in the future (e.g. 2 SMs per raster engine, or 8).

I actually read this as the exact opposite of Charlie. The hard work was splitting up the previously monolithic units. Adjusting where you locate these units and the ratio of them is easier. The idea that they can't scale this up in the future seems like FUD.
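
The 2.4 billion figure above is just units times clock. A quick sketch, assuming each setup unit retires one triangle per clock (that rate is implied by the arithmetic in the post, not a confirmed spec; `setup_rate_gtris` is a made-up helper name):

```python
def setup_rate_gtris(setup_units, clock_mhz, tris_per_clock=1.0):
    """Peak triangle setup rate in billions of triangles per second."""
    # Assumption: every setup unit finishes `tris_per_clock` triangles each cycle.
    return setup_units * clock_mhz * tris_per_clock / 1000.0

print(setup_rate_gtris(4, 600))   # four parallel setup units at 600 MHz
print(setup_rate_gtris(1, 600))   # a single monolithic setup unit at the same clock
```

This is a peak number only; it says nothing about how often the units are actually kept busy.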
 

It reads like scaling it down has been made extremely easy. They could use GF100 for the 448 and 512SP parts, maybe even a 384 part; redo the layout and use it for 256 and 192 binned parts; then rearrange again and have 1 skew layout for 128 and a layout for a 64. Obviously the top to bottom would look like this:

Single GPUs
Highend
512SP
448SP
384SP(Use this for dual GPU after you remove 128SP and rearrange the die package)
Midrange
256SP and 192SP
Mainstream and entry lvl
128SP and 64SP

Nvidia, please end this senseless need to only give entry lvl people 16SPs and call them gaming cards. Let's be reasonable for once and make the bottom end 64SPs.
 
Interesting:

So, 64 texture units but they are effectively clocked twice as high. Not too shabby.

At least that's how I read it; are they clocked at the shader hotclock?
http://anandtech.com/video/showdoc.aspx?i=3721&p=2
"Finally, texture units are now tied to the shader clock and not the core clock. They run at 1/2 the shader clock speed of GF100."

I wonder how much more texture performance it's going to have than GT200. It can't be much if it is indeed half the hot clock and only 64 units.
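
A back-of-the-envelope sketch of the peak texel rates being discussed. It assumes one texel per TMU per clock and uses the clocks quoted earlier in the thread (600 MHz for the GTX 285's TMUs, 1.5 or 1.2 GHz for the GF100 hot clock), none of which are verified here; `texel_rate_gtexels` is a made-up helper name:

```python
def texel_rate_gtexels(tmus, tmu_clock_mhz):
    """Peak texture rate in gigatexels per second, one texel per TMU per clock."""
    return tmus * tmu_clock_mhz / 1000.0

# Assumption: GTX 285 TMU clock as quoted in the thread, not a checked spec.
gtx285 = texel_rate_gtexels(80, 600)

for hot_clock_mhz in (1500, 1200):
    # Per the AnandTech quote, GF100 TMUs run at half the shader clock.
    gf100 = texel_rate_gtexels(64, hot_clock_mhz / 2)
    print(hot_clock_mhz, gf100, round(gf100 / gtx285, 2))
```

With these inputs the peak rate lands at or below the GTX 285 figure, so any real gain would have to come from efficiency rather than raw rate. That part cannot be estimated from the slides.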
 
It reads like scaling it down has been made extremely easy. They could use GF100 for the 448 and 512SP parts, maybe even a 384 part; redo the layout and use it for 256 and 192 binned parts; then rearrange again and have 1 skew layout for 128 and a layout for a 64. Obviously the top to bottom would look like this:

Single GPUs
Highend
512SP
448SP
384SP(Use this for dual GPU after you remove 128SP and rearrange the die package)
Midrange
256SP and 192SP
Mainstream and entry lvl
128SP and 64SP

Yes, but don't they always say it's going to be easy scaling it down?
 
I think Charlie was saying that tessellation performance is tied directly to shader utilization, so on lower parts tessellation could become a bottleneck, whereas on an ATI card the tessellation performance is constant across the whole product line.

Apart from that, in general I agree it seems to be "easy" to scale each part as needed. Apart from setup, didn't AMD go down this path with R600 and then revert to a more fixed configuration for RV770? I wonder what the trade-offs are.

Edit: yes, I know ATI had more fixed-function parts.
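
To put rough numbers on the scaling worry, here is a sketch of how the tessellator count would shrink across the speculative lineup posted earlier. It assumes one tessellation stage per SM and 32 ALUs per SM (16 stages on the full 512SP chip, per the thread); the lineup itself is forum guesswork, and `tessellator_share` is a made-up name:

```python
ALUS_PER_SM = 32      # assumption from the thread
FULL_CHIP_SMS = 16    # assumption: 16 SMs, one tessellation stage each

def tessellator_share(shader_count):
    """Return (tessellators, fraction of the full chip) for a cut-down part."""
    sms = shader_count // ALUS_PER_SM
    return sms, sms / FULL_CHIP_SMS

for sp in (512, 448, 384, 256, 192, 128, 64):
    units, share = tessellator_share(sp)
    print(f"{sp:>3} SP: {units:>2} tessellators, {share:.0%} of the full chip")
```

This only counts units. It does not model shader utilization or clocks, so it shows the shape of the concern rather than actual performance.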
 
this is just highly controlled info with no real independent reviewer data.

Which is what we've always known it was going to be. Reviewers can't have cards until Nvidia does ;) What they're trying to do is impressive but it's still gotta be put to the test. I'm really curious to know how they maintain triangle order across the chip with geometry processing so widely distributed.
 