Plus as Big-K special sauce:
- one dedicated physx processor per SMK
- a broken and unfixable design
*SCNR*
Don't be so mean... do you want fries with that?
Well there's this set of rumors/speculation from 3DCenter (translated) saying 3072 CCs, so that would use lots of transistors. According to that rumor, GK110 seems close to an overall doubled GK104 in terms of basic specs.
First way I read it: 3072 ALUs
-> 6x GPCs (à 512 SPs)
--> 4 SMK to each GPC, 128 ALUs/SMK
--> each SMK has
---> 4 groups of 32 ALUs
----> two groups share a quad TMU
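To sanity-check the breakdown above, here is a minimal sketch that just multiplies the rumoured figures out. All constant and function names are made up for illustration, and the TMU total rests on my own assumption that a "quad TMU" means four texture units; none of this is confirmed hardware data.

```python
# Rumoured GK110 layout from the post above (speculation, not confirmed specs).
GPCS = 6                 # graphics processing clusters
SMK_PER_GPC = 4          # SMK blocks per GPC
GROUPS_PER_SMK = 4       # ALU groups per SMK
ALUS_PER_GROUP = 32      # ALUs per group
GROUPS_PER_QUAD_TMU = 2  # "two groups share a quad TMU"
TMUS_PER_QUAD = 4        # assumption: a quad TMU is four texture units


def rumoured_totals():
    """Multiply the per-level counts out into chip-wide totals."""
    alus_per_smk = GROUPS_PER_SMK * ALUS_PER_GROUP   # 128
    alus_per_gpc = SMK_PER_GPC * alus_per_smk        # 512
    total_smk = GPCS * SMK_PER_GPC                   # 24
    total_alus = GPCS * alus_per_gpc                 # 3072
    tmus_per_smk = (GROUPS_PER_SMK // GROUPS_PER_QUAD_TMU) * TMUS_PER_QUAD  # 8
    total_tmus = total_smk * tmus_per_smk            # 192
    return {
        "alus_per_smk": alus_per_smk,
        "alus_per_gpc": alus_per_gpc,
        "total_smk": total_smk,
        "total_alus": total_alus,
        "tmus_per_smk": tmus_per_smk,
        "total_tmus": total_tmus,
    }


if __name__ == "__main__":
    for name, value in rumoured_totals().items():
        print(f"{name}: {value}")
```

The per-GPC and chip-wide ALU counts match the 512 and 3072 quoted in the post; the TMU figure only holds if the quad-TMU assumption does.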
GF114 wasn't anywhere close to as stripped from GPGPU capabilities as GK104 is.
Again, at a much higher die area AND transistor count difference between performance (GK104) and high end (GK110). I don't know if my math is broken, but I see a huge difference between 1.95/3.0b (365/530 mm2) for GF114/GF110 and 3.54/7.0b (294/550 mm2) for GK104/GK110, but it's obviously just me.
There are far more things GK110 needs to add over GK104 than 110 had over 114, just for the GPGPU speed.
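For anyone who wants to check that math, a quick sketch of the ratios using only the figures quoted in the post. The GK110 numbers are rumoured, the pairing of each figure with a chip is my reading of the post, and the dictionary and function names are invented for this example.

```python
# (performance chip, high-end chip) figures as quoted in the post above.
# GK110 values are rumoured; chip labels are an interpretation of the post.
QUOTED = {
    "GF114 -> GF110": {"transistors_b": (1.95, 3.0), "die_mm2": (365, 530)},
    "GK104 -> GK110": {"transistors_b": (3.54, 7.0), "die_mm2": (294, 550)},
}


def scaling_factors(quoted=QUOTED):
    """Return high-end / performance ratios for each metric of each pair."""
    return {
        pair: {metric: high / perf for metric, (perf, high) in metrics.items()}
        for pair, metrics in quoted.items()
    }


if __name__ == "__main__":
    for pair, ratios in scaling_factors().items():
        parts = ", ".join(f"{m} x{r:.2f}" for m, r in ratios.items())
        print(f"{pair}: {parts}")
```

On these numbers the Fermi step is roughly 1.5x in both transistors and area, while the rumoured Kepler step is close to 2x in transistors and just under 1.9x in area, which is the gap the post is pointing at.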
Disregarding the DP performance, there must be some "special sauce" GK104 is missing, since in so many cases it's getting beaten left and right by the 580 on SP workloads too, despite matching or beating the 580 on most if not all theoretical metrics.
The non-GPGPU professional segment really likes more polygons. The typical professional GPU app uses very simple shading, but extremely complex geometry. So the distributed geometry might be there exactly because they want to scale it way up in the high-end chip.
Does it make sense to scale up the number of polymorph units (which seems tied to the SMs), rasterizers (tied to the number of GPCs), or TMUs versus GK104?
What about the same 4 GPC, 128 TMU, 8 SMX setup, but with 256 ALUs per SMX - something like 4x (1 scheduler, 1 vec32 SP ALU, 1 vec32 DP ALU, 1 vec32 LD/ST unit)?
Probably inner bandwidth is also a factor I guess?
Cache!
L2 cache? GK104 is definitely not BW starved in there.
I doubt -- since the RF is doubled, the L1 size should be less of a problem. Sharing data will be a tight job, for those kernels that rely more on the LDS, though.
...but my guess is that L1 and registers are the main culprits.
Disregarding the DP performance, there must be some "special sauce" GK104 is missing, since in so many cases it's getting beaten left and right by the 580 on SP workloads too, despite matching or beating the 580 on most if not all theoretical metrics.
That isn't true.
CarstenS said: Absolutely, since they're absolutely free in terms of transistor count, being only names for marketing slides.