NVIDIA GF100 & Friends speculation

As long as you don't want D3D11 compute, yeah.

Not to mention PhysX, Photoshop, transcoding, etc. Disabling or removing DP is fine, but I'm not sure NVIDIA would want that much differentiation between GF100 and the derivatives in terms of general compute performance.
 
nApoleon at chiphell just posted


Interesting: a die size below 300mm2 seems to contradict cfcnc and edison, who previously said it was somewhere in the 65nm G92 region or a bit above (i.e. 330-350mm2).

Re the rose-colored glasses comment earlier: that was about the claim that the chip is initially going up against the 5830 in cut-down form in order to protect sales of the GTX470. You would have to be on drugs to believe that. The chip is starting in crippled form principally to get initial yields up to a money-making level; not enough full versions can be made yet.

If GF104 is in fact 384SPs, and around Cypress size, maybe they could do a GF104x2 with it? Maybe it was the plan after all, and it was delayed...
 
If GF104 is in fact 384SPs, and around Cypress size, maybe they could do a GF104x2 with it? Maybe it was the plan after all, and it was delayed...

Riddle me this: if GF100 has 512 SPs and only 480 survive, then how many SPs does GF104 have left if the original design has 384?

Another factor is that it also isn't allowed to get too close to the 470.
 
Riddle me this: if GF100 has 512 SPs and only 480 survive, then how many SPs does GF104 have left if the original design has 384?

Another factor is that it also isn't allowed to get too close to the 470.

Well that depends on the reason why only 480 survived from the 512. If it was yields, and the chip is smaller, why would it repeat on GF104? But if you know why only 480 out of 512 survived, please enlighten me :D

About not being allowed to be close to 470, is that a hint? :p
I mean, could it be close to it? Is the chip that good? If yes, then my speculation some posts back about it being the new 8800GT, rather than the 9600GT, might have some grounds? :D
 
I would not be surprised if the GF104 retail products have only 256SPs and a power efficiency deficit versus comparably specced cards.
 
I would not be surprised if the GF104 retail products have only 256SPs and a power efficiency deficit versus comparably specced cards.

GF100 has 128TMUs, right?
Can you explain why nVidia would not sell a 3xxmm^2 die with 8 memory chips, less power consumption and the speed of a GTX470 for 349$?
 
The 384SP line has GTX465, which is GF100 cut down, not GF104

That seems to be wrong on two counts:

1- GTX465 seems to have 352 SP, not 384;
2- One page back there is a link and translation where it says GF104 is a 384 SPs chip, with 2 model versions: 256 SP/192 bit and 384 SP/256 bit.
 
If the DP is implemented the way Aaron suggested a while back then it could only make a notable difference to die size if the int MUL is also deleted. Doing that would hurt compute stuff.
Jawed
Who needs int mul in compute, especially in gf100's amounts?
 
GF100 has 128TMUs, right?
Can you explain why nVidia would not sell a 3xxmm^2 die with 8 memory chips, less power consumption and the speed of a GTX470 for 349$?

I think his keyword is "RETAIL". IMO, he is implying that GF104 will be castrated for line-up reasons (so it doesn't compete with the GTX470). That way its performance per watt will be lower than it would be if it were not castrated.
 
GF104 will have no DP and missing most (if not all) of GF100's cache structure. In that sense it's more like GT212 + DX11 than a GF100 variant.
Thanks, I'll put it in my sig to show the precision of your "sources" next time :smile:
My point is that the deletion of caches in GF104 is no more than a ridiculous rumor because, as I suppose, they play the role of FIFOs between some stages of the pipeline and replace the ex-shader_buffer; they cannot just be deleted without radical changes to the architecture itself :LOL:
 
Thanks, I'll put it in my sig to show the precision of your "sources" next time :smile:
My point is that the deletion of caches in GF104 is no more than a ridiculous rumor because, as I suppose, they play the role of FIFOs between some stages of the pipeline and replace the ex-shader_buffer; they cannot just be deleted without radical changes to the architecture itself :LOL:

Well, that's one more reason to justify his claim, isn't it? He's clearly saying that it looks more like GT212 + DX11 than a GF100 variant. Nowhere is he saying it's still GF100-based but without caches.
 
Who needs int mul in compute, especially in gf100's amounts?
You need it for addressing. Something that compute kernels tend to do.

A simple % and / combination on ATI:

Code:
il_cs_2_0
dcl_cb cb0[1]
udiv r1.x, cb0[0].x, cb0[0].y
umod r1.y, cb0[0].x, cb0[0].y
mov r0.x, cb0[0].w
mov g[r0.x], r1
end

results in quite a few MULs:

Code:
00 ALU: ADDR(32) CNT(27) KCACHE0(CB0:0-15) 
      0  x: LSHL        R1.x,  KC0[0].w,  (0x00000002, 2.802596929e-45f).x      
         t: RCP_UINT__EG  T0.y,  KC0[0].y      
      1  t: MULLO_UINT  T0.w,  KC0[0].y,  PS0      
      2  x: SUB_INT     ____,  0.0f,  PS1      
         t: MULHI_UINT  T0.z,  KC0[0].y,  T0.y      
      3  y: CNDE_INT    ____,  PS2,  PV2.x,  T0.w      
      4  t: MULHI_UINT  ____,  PV3.y,  T0.y      
      5  x: ADD_INT     ____,  T0.y,  PS4      
         z: SUB_INT     ____,  T0.y,  PS4      
      6  y: CNDE_INT    ____,  T0.z,  PV5.x,  PV5.z      
      7  t: MULHI_UINT  T0.w,  PV6.y,  KC0[0].x      
      8  x: ADD_INT     T1.x,  -1,  PS7      
         w: ADD_INT     T1.w,  PS7,  1      
         t: MULLO_UINT  ____,  PS7,  KC0[0].y      
      9  x: SUB_INT     T0.x,  KC0[0].x,  PS8      
         y: SETGE_UINT  T1.y,  KC0[0].x,  PS8      
     10  y: SUB_INT     T0.y,  PV9.x,  KC0[0].y      
         z: SETGE_UINT  ____,  PV9.x,  KC0[0].y      
     11  x: AND_INT     ____,  T1.y,  PV10.z      
     12  x: CNDE_INT    T0.x,  PV11.x,  T0.x,  T0.y      
         y: CNDE_INT    ____,  PV11.x,  T0.w,  T1.w      
     13  z: CNDE_INT    ____,  T1.y,  T1.x,  PV12.y      
         w: ADD_INT     ____,  KC0[0].y,  PV12.x      
     14  x: CNDE_INT    R0.x,  KC0[0].y,  -1,  PV13.z      
         z: CNDE_INT    ____,  T1.y,  PV13.w,  T0.x      
     15  y: CNDE_INT    R0.y,  KC0[0].y,  -1,  PV14.z      
01 MEM_EXPORT_WRITE_IND: DWORD_PTR[0+R1.x], R0, ELEM_SIZE(3)  VPM 
END_OF_PROGRAM
I have no idea how this sort of thing compiles on NVidia at the machine level.
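
For readers who don't want to trace the listing slot by slot, here is a rough C sketch of the general idea it appears to follow: take a reciprocal estimate, multiply, then patch up the quotient and remainder. It is not a transcription of the ISA above. The names rcp_uint, mulhi_u32 and udivmod_rcp are made up for illustration, and rcp_uint uses an ordinary C division as a stand-in for whatever RCP_UINT really computes in hardware.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware reciprocal estimate such as RCP_UINT:
   floor(2^32 / d). Assumes d >= 2 so that the result fits in 32 bits. */
static uint32_t rcp_uint(uint32_t d) {
    return (uint32_t)((UINT64_C(1) << 32) / d);
}

/* High 32 bits of a 32x32 product, the role MULHI_UINT plays in the listing. */
static uint32_t mulhi_u32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Quotient and remainder built from multiplies plus a small correction. */
static void udivmod_rcp(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r) {
    if (d == 0) {               /* the listing appears to select -1 for both outputs */
        *q = UINT32_MAX;
        *r = UINT32_MAX;
        return;
    }
    if (d == 1) {               /* 2^32 / 1 does not fit in 32 bits */
        *q = n;
        *r = 0;
        return;
    }
    uint32_t est = mulhi_u32(n, rcp_uint(d)); /* never above the true quotient */
    uint32_t rem = n - est * d;               /* one low multiply and a subtract */
    while (rem >= d) {                        /* estimate can fall short by one */
        est += 1;
        rem -= d;
    }
    *q = est;
    *r = rem;
}
```

Even this stripped-down version needs one high multiply and one low multiply per division, which is the point being made about integer MULs in addressing code.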

Simple 2D array addressing requires an int MUL.

Clearly if the programmer can guarantee that only 24-bit MULs are required, it's less stressful.
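
To make the addressing point concrete, a minimal C sketch follows. The function names index_2d, index_2d_mul24 and mul24 are hypothetical, and mul24 is only a software model of a 24-bit multiplier (low 24 bits of each operand in, low 32 bits of the product out), not a description of any particular GPU instruction.

```c
#include <stdint.h>

/* Software model of a 24-bit multiply: only the low 24 bits of each
   operand are used, and the low 32 bits of the product are returned. */
static uint32_t mul24(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)(a & 0xFFFFFFu) * (b & 0xFFFFFFu));
}

/* Row-major 2D addressing: one full integer multiply per element. */
static uint32_t index_2d(uint32_t row, uint32_t col, uint32_t width) {
    return row * width + col;
}

/* Same addressing with the cheaper multiply. Only valid if the caller
   guarantees that row and width both fit in 24 bits. */
static uint32_t index_2d_mul24(uint32_t row, uint32_t col, uint32_t width) {
    return mul24(row, width) + col;
}
```

The two functions agree as long as row and width stay below 2^24; past that, the 24-bit version silently drops the upper bits, which is why the guarantee has to come from the programmer.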

Throughput of 32-bit MULs doesn't have to be particularly high of course, so then it's a question of emulating 32-bit MUL with 24-bit MUL or finding some other cheap way to do it if the DP is removed.

There is stuff that's DP-only, e.g. sub-normal capable adder. So there are degrees of what could be cut in disabling DP.

Jawed
 
Which would fit nicely with the goals of the GF104, IMHO.
Not really, because they also make cut-down versions of their professional products. Or rather, they make professional versions of their cut-down products as well as the high-end.
 
Well, that's one more reason to justify his claim, isn't it? He's clearly saying that it looks more like GT212 + DX11 than a GF100 variant. Nowhere is he saying it's still GF100-based but without caches.
That is completely ridiculous. This would, if it were true, have nVidia designing not one, but two DX11 architectures, instead of just designing one DX11 architecture and scaling it down.
 
Well, that's one more reason to justify his claim, isn't it? He's clearly saying that it looks more like GT212 + DX11 than a GF100 variant. Nowhere is he saying it's still GF100-based but without caches.

Why would nVidia make two different DX11 designs? Fermi is the basis of further chips.
He has no real information. I'm still waiting on his claim that GF100 will not be available for months after launch.

Oh, you haven't heard? The chip is going back to TSMC after the "launch" for its first full re-spin!

Hence the no-to-limited availability etc. until June/July. It is also the basis for Charlie's "handful" of chips for partners: A3 will be here for launch and B1 will be the "shipping" product.
 
He has no real information. I'm still waiting on his claim that GF100 will not be available for months after launch.
It is available, but if the rumors and analyst speculation are correct, there are fewer chips out there in total now than ATI had Cypresses in launch week; the demand just isn't as high.
 
You need it for addressing. Something that compute kernels tend to do.
One 32-bit int mul is doable in four 24-bit int muls. That's not the point. The point is that even in dxcs, the ratio of int mul to spfp mul is too low to justify the matched spfp:int mul rate.

Broadly speaking, int32 mul will be needed for load/store operations, while spfp for alu ops. So spfp:int should be higher than 1:1.
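
As a rough illustration of building a 32-bit multiply from narrower ones, here is a C sketch. It assumes a hypothetical mul24 primitive (modelled in software below) and splits each operand into 16-bit halves, so every partial product fits comfortably in a 24-bit multiplier. With this split, three multiplies give the low 32 bits; a fourth (a_hi * b_hi) would only be needed for the high word. It is not a claim about how any real compiler lowers the operation.

```c
#include <stdint.h>

/* Software model of a 24-bit multiply: low 24 bits of each operand in,
   low 32 bits of the product out. Hypothetical primitive for illustration. */
static uint32_t mul24(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)(a & 0xFFFFFFu) * (b & 0xFFFFFFu));
}

/* Low 32 bits of a 32x32 product, assembled from 16-bit halves. */
static uint32_t mul32_lo(uint32_t a, uint32_t b) {
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t low   = mul24(a_lo, b_lo);                      /* bits 0..31  */
    uint32_t cross = mul24(a_hi, b_lo) + mul24(a_lo, b_hi);  /* bits 16..   */

    /* a_hi * b_hi only contributes from bit 32 upward, so it is dropped. */
    return low + (cross << 16);
}
```

Each emulated multiply costs three narrow multiplies plus shifts and adds, so the approach only makes sense if 32-bit integer multiplies are as rare relative to spfp work as the post argues.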
 