No GPGPU focus for GK104?
I wonder why they left the LDS/L1 combo size unchanged?
Cleaned up for thy geekish pleasure:
The command processor in the middle is strangely smeared. Looks like NV doesn't want anyone to peek in the gigathread factory.
Fermi packed everything that was not part of the SMs right into the middle section of the die. That was a major reason for the dense wire cluttering that plagued the Fermi design. Now, in Kepler, we see a pretty radical redistribution of all the blocks away from each other. Setup and thread dispatching, memory controllers & ROPs, and misc. I/O are all placed apart, each in its own area.
The bar on the right edge is strange. It can't be ROPs or a memory controller, as the pads are on the other edges and there's no point in isolating them here. The layout is Fermi-ish, so it has to be something new. PhysX hw.
PS:
If they didn't make a similar mistake in those slides as during the Fermi presentation, the total register space is the same as with GF100/GF110 (2 MB), so really tiny compared to Tahiti (8 MB). I have a hard time believing that number, considering the similarity of the ALU count of GK104 and Tahiti. I would expect double the value given in that slide (4 MB), i.e. 512 kB per SMX or 128 kB per scheduler.
But also the local memory/L1 is quite small (still 64kB) considering how many threads/workgroups on one SMX have to share it.
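A quick sanity check on the register numbers above, as a minimal sketch. The 8 SMX and 4 schedulers per SMX are not stated outright in the post; they are simply what "4 MB, i.e. 512 kB per SMX or 128 kB per scheduler" implies, and `register_budget` is a made-up helper name.

```python
def register_budget(total_mib, smx_count=8, schedulers_per_smx=4):
    """Split a total register file size (MB) into per-SMX and per-scheduler kB.

    smx_count and schedulers_per_smx are assumptions inferred from the
    figures in the post (4 MB / 512 kB = 8, 512 kB / 128 kB = 4).
    """
    total_kib = total_mib * 1024
    per_smx_kib = total_kib // smx_count
    per_scheduler_kib = per_smx_kib // schedulers_per_smx
    return per_smx_kib, per_scheduler_kib


if __name__ == "__main__":
    # 2 MB is the slide value, 4 MB is the value the post expected.
    for label, total_mib in (("slide", 2), ("expected", 4)):
        per_smx, per_sched = register_budget(total_mib)
        print(f"{label}: {total_mib} MB total, "
              f"{per_smx} kB per SMX, {per_sched} kB per scheduler")
    # Tahiti's 8 MB compared with the 2 MB from the slide.
    print("Tahiti / GK104 (slide):", 8 // 2)
```

With the slide's 2 MB this comes out at 256 kB per SMX and 64 kB per scheduler, which is half of what the post expects.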
That is an 8-way scheduler (or dual 4-way). My last paragraph with the "quarter warps" was describing exactly this.
It would basically result in a scheduling with a granularity of just 8 work items.
I thought most games and desktops are using a HW cursor, that is: rendered independently from the main frame buffer. So you could still move the cursor at full frame rate while rendering the frame at half.
jimbo75 said:
That's pretty much exactly as I see it as well. Don't get me wrong - if it is optional and it works then I think it's fair enough for Nvidia to be exploring these options. But yeah... turbo and "lower fps side screens" makes me wonder a lot about how it's gonna make them look better in benchmarks.
Interesting point on the mouse, I guess they'd need the cursor to be decoupled from fps in most games for that to work ok.
@rpg14 - yeah on the face of it I agree. Regarding the registers only... off the top of my head, the only thing that comes to mind is that if that register file caching mechanism Dally et al. wrote about isn't really a cache, but is compiler visible (which I believe is one setup they explored in the paper), then perhaps it can be used to reduce the number of registers required in the main RF for typical kernels.
Shouldn't lower frame rate be fine for a racing game? Is it really more distracting to have half the rate on the side than lower frame rates across the board?
Probably worthy of healthy experimentation.
I seriously doubt board power is determined by profiling.
I can only think of some sort of game profiling for 3DMark11.
It was only broken in Charlie's and his disciples' minds.
So it looks like Kepler is still very much Fermi, just with things laid out a little differently and a DDR speed increase. The unfixable wasn't all that broken after all.
Binning?
For those wishing for a 7970 1 GHz (1.2 GHz) edition to compete against the GTX 680 I really don't see the purpose.
I seriously doubt board power is determined by profiling.
For it to switch so quickly it really must be hardware based.
It was only broken in Charlie's and his disciples' minds.
For those wishing for a 7970 1 GHz (1.2 GHz) edition to compete against the GTX 680 I really don't see the purpose.
-----------------
Let's say AMD produces a factory-overclocked 1.2 GHz 7970 to outperform the GTX 680.
Wouldn't the TDP of 250 watts have to be raised?
If so, to what number?
If it's 300 watts, isn't that dual 8-pin power then?
Now what about pricing?
It would have to be higher than the current $549, and that means an even lower number of units sold.
And since AMD owners claim they could overclock the 7970 to that same 1.2 GHz, why would they ever buy the factory-overclocked 1.2 GHz version?
And what about the thermals that a 1.2 GHz card would produce? That would mean more/bigger/faster fans and lots of noise.
And after all this, all Nvidia has to do is either release a GTX 685 with higher clocks or release the upcoming GK110.
So all in all I do not see AMD releasing a factory-overclocked 7970.
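On the TDP and connector question in the post above, here is a rough sketch of the power budget arithmetic. It assumes the usual PCI Express limits of 75 W from the slot, 75 W per 6-pin and 150 W per 8-pin connector; `board_power_limit` is just an illustrative helper, and none of this says anything about what a 1.2 GHz card would actually draw.

```python
# Assumed per-source limits (watts) from the PCI Express power rules.
SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150


def board_power_limit(six_pin=0, eight_pin=0):
    """Maximum in-spec board power for a given set of auxiliary connectors."""
    return SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


if __name__ == "__main__":
    configs = {
        "6-pin + 6-pin": board_power_limit(six_pin=2),
        "6-pin + 8-pin": board_power_limit(six_pin=1, eight_pin=1),
        "8-pin + 8-pin": board_power_limit(eight_pin=2),
    }
    for name, watts in configs.items():
        print(f"{name}: up to {watts} W")
```

Under those assumptions a 6-pin plus 8-pin layout already covers 300 W, and dual 8-pin would only be needed beyond that.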
I'm seeing this as orthogonal to vsync: left and right at 1/2 speed by frame doubling. I know nothing about how our brain would deal with this and I take your word for it that it would probably be terrible. But I'd still like to see an actual realization of the idea to make sure. Sounds like a nice student project for an Unreal Engine hack.
Silent_Buddha said:
And god forbid if you drop down to 30 FPS and the side monitors were suddenly chugging along at 15 FPS. It's not like there'd be the option to have the side monitors suddenly match the primary monitor in framerate as supposedly they'd be doing this to try to maintain a high framerate on the primary monitor.
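To make the "1/2 speed by frame doubling" idea concrete, here is a toy sketch, not anything Nvidia has described: the center screen gets a fresh frame every tick, while the side screens render only on even ticks and show that same image again on the next one. All function names are made up for illustration.

```python
def side_frame_schedule(n_frames):
    """For each display tick, return (center_frame_id, side_frame_id).

    The side screens re-present their last rendered frame on odd ticks.
    """
    return [(tick, tick - tick % 2) for tick in range(n_frames)]


def side_fps(center_fps):
    """Effective update rate of the side screens under frame doubling."""
    return center_fps / 2


def render_count(n_frames, side_screens=2):
    """Views rendered in total: center every tick, sides every other tick."""
    side_renders = (n_frames + 1) // 2
    return n_frames + side_screens * side_renders


if __name__ == "__main__":
    for center_id, side_id in side_frame_schedule(6):
        print(f"tick {center_id}: center shows {center_id}, sides show {side_id}")
    print("sides at 60 fps center:", side_fps(60))
    print("sides at 30 fps center:", side_fps(30))
    print("views rendered over 60 ticks:", render_count(60), "instead of", 3 * 60)
```

The 30 fps case is the one Silent_Buddha is worried about: the side screens would be updating at 15 fps.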
In this context, 'binning' is the opposite of 'not a higher price'. Yes, it's a die with the same logic and area, but binning is by its very definition excluding inadequate dies from the total pool. How could that not lead to price differences? (Yes, I leave the door open to keeping the top price the same and lowering the price of the rejects, but that means that AMD would make less money.)
kalelovil said:
... should be possible using binning. ...
I don't see why it would have to be higher priced.
Please tell me exactly how many of these magic 7970 dies can be produced vs. the standard 7970 dies per wafer?
Lowering the voltage by 5-7% while raising the clock to ~1075 MHz should be possible using binning.