NVIDIA Maxwell Speculation Thread

Isn't this a curious time to be investing in tiling as a bandwidth-saving measure, with on-package/stacked DRAMs on the horizon?

They have a large cache, probably mostly for GPGPU, so they may as well use it.

In any case, I suspect the tile-based part is more about dealing with small numbers of triangles at a time - if they can merge these in the rasterizer stage, then they can get better pixel shader efficiency, especially for highly tessellated geometry.
 
On-package memory is a nice one-time boost. But the laws of physics unfortunately make transferring data expensive relative to computing on it. Even with on-package memory, we need all the bandwidth savings we can get.


Why not do both?

That seems like a win-win.


Yeah it's all good. Just seems like something they should've done a long time ago. Maybe Arun's comment on L2 cache size requirements has some merit.

I'm pretty cynical in general though when it comes to developers harnessing hardware capabilities to produce better looking games. Hopefully all of these efficiency gains are useful for more than just running mediocre looking games at 4K resolutions.

Tiling doesn't suck only when NV uses it :p

Who said tiling sucked? :D
 
Who said tiling sucked? :D

I'm not sure if I can still find the ancient footnotes for the ULP GeForce in Tegra that stated that tiling is fine for anything up to DX9, but sub-optimal for anything above it. My former post was obviously a joke, and NV's notes might have had some merit (there are probably countless ways to "tile" to start with), but in hindsight I'd love to know what exactly GK20A does in K1 in that regard :devilish:
 
That's probably true for fully deferred tilers. They don't do so well with more complex geometry. Granted triangle counts don't seem to be increasing dramatically even with the advent of tessellation (concrete barriers aside).
 
That's probably true for fully deferred tilers. They don't do so well with more complex geometry. Granted triangle counts don't seem to be increasing dramatically even with the advent of tessellation (concrete barriers aside).

TBDRs will have to prove otherwise; however, it was rather meant in the sense that the majority of other ULP SoC GPU solutions were tile based, while anything on Tegra's roadmap up to that point probably wasn't.

In hindsight, once you start binning geometry even in small portions, there's buffering going on for it. Obviously you can't just fill the bucket; you also have to empty it once in a while. And that constant geometry buffering going on in the background for all sorts of tilers could pose a problem, or that's what NV seemed to imply back then.

So now they probably have a tiny bucket which doesn't get filled all that often, but they've implemented part of something that was deemed to be problematic? :???:
 
but they've implemented part of something that was deemed to be problematic? :???:

Personally, I'm happy when someone changes their mind and sees things my way. This happens all the time in technology, especially over long time periods. I haven't read the quote you're referring to, but I imagine it's pretty old.

People and institutions change their mind all the time. It's a good thing.
 
That's probably true for fully deferred tilers. They don't do so well with more complex geometry. Granted triangle counts don't seem to be increasing dramatically even with the advent of tessellation (concrete barriers aside).

Tessellation doesn't increase geometry stream out for TBDRs.
 
You can't obviously just fill the bucket, you'd also have to empty it once in a while.


My understanding is you have to submit all geometry and fill the bucket completely in one pass for a true TBDR implementation to work. How else will you do accurate HSR?

Tessellation doesn't increase geometry stream out for TBDRs.


Doesn't all geometry amplification and displacement have to be completed before the final HSR pass?
 
My understanding is you have to submit all geometry and fill the bucket completely in one pass for a true TBDR implementation to work. How else will you do accurate HSR?




Doesn't all geometry amplification and displacement have to be completed before the final HSR pass?
Tilers have a limit to how much geometry they can buffer, so while the ideal implementation would bucket an entire frame, this doesn't happen when there's a lot of geometry. Thus HSR might not be perfect.

You also don't need to bin after tessellation if you want to minimize geometry storage and can tolerate less accurate HSR. Another cost of binning before tessellation is that patches will often cover multiple bins, and you might need to re-tessellate. If you don't have an efficient way to tessellate part of a patch, there will be a lot of duplicated work.

Edit: I forgot you mentioned displacement. That would make HSR prior to tessellation difficult. You could however sort during binning so z-buffering is likely to throw out a lot of the work.
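Just to make the "filling and emptying the bucket" point concrete, here's a tiny toy sketch in Python (purely illustrative, not how any real binner is built): a binner with a fixed parameter-buffer limit that has to flush partway through a frame once the limit is hit, which is exactly the point where per-frame HSR stops being exact, since triangles submitted after the flush can't be tested against the ones already rendered.

```python
# Toy sketch of a tile binner with a bounded parameter buffer.
# Illustrative only: real hardware binners differ in every detail.
from collections import defaultdict

TILE_W, TILE_H = 32, 32     # assumed tile size
BUFFER_LIMIT = 4096         # max triangles the binner can hold before flushing

class ToyBinner:
    def __init__(self, screen_w, screen_h):
        self.tiles_x = (screen_w + TILE_W - 1) // TILE_W
        self.tiles_y = (screen_h + TILE_H - 1) // TILE_H
        self.bins = defaultdict(list)   # (tile_x, tile_y) -> list of triangle ids
        self.buffered = 0
        self.flushes = 0

    def submit(self, tri_id, bbox):
        """Bin one post-transform triangle by its screen-space bounding box."""
        x0, y0, x1, y1 = bbox
        for ty in range(max(0, y0 // TILE_H), min(self.tiles_y, y1 // TILE_H + 1)):
            for tx in range(max(0, x0 // TILE_W), min(self.tiles_x, x1 // TILE_W + 1)):
                self.bins[(tx, ty)].append(tri_id)
        self.buffered += 1
        if self.buffered >= BUFFER_LIMIT:
            self.flush()    # "empty the bucket": render what's binned so far

    def flush(self):
        # A real tiler would rasterize/shade each bin here; HSR is only exact
        # within the geometry seen before this flush, not across flushes.
        self.flushes += 1
        self.bins.clear()
        self.buffered = 0
```

With a full frame's worth of geometry, a small buffer flushes many times per frame, which is the situation being described above.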
 
64 ROPs? Interesting. A typo or some crazy tiling architecture that relies heavily on L2? Fancy compression like Tonga?

Might be right. It's the same proportions of shaders, TMUs, and ROPs as the 750 Ti.

Now the question is whether they'll be able to take it to 1 GHz like the 750 Ti, or if it'll be closer to 900 MHz.
 
Might be right. It's the same proportions of shaders, TMUs, and ROPs as the 750 Ti.

Now the question is whether they'll be able to take it to 1 GHz like the 750 Ti, or if it'll be closer to 900 MHz.

It's probably not right. It's probably 32 ROPs. A 256-bit bus would mean double the memory controllers of GM107's 128-bit bus. So unless Nvidia reworked the memory controllers and, for the first time in ages, changed the corresponding number of ROPs per controller, it's likely 32 ROPs. The TMU and core counts are likely accurate though, because GK104 and GF114 both had 4x the core and texture units of their respective smaller brothers.
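For what it's worth, the ratio argument is simple arithmetic; a quick sketch assuming GM204 keeps GM107's ratio of ROPs per 64-bit memory controller (that ratio carrying over is the assumption here, not a confirmed spec):

```python
# Back-of-envelope ROP estimate, assuming the GM107 ratio of ROPs per
# 64-bit memory controller carries over (an assumption, not a spec).
GM107_BUS_BITS, GM107_ROPS = 128, 16
rops_per_controller = GM107_ROPS // (GM107_BUS_BITS // 64)   # 8 ROPs per 64-bit MC

RUMORED_BUS_BITS = 256
estimated_rops = (RUMORED_BUS_BITS // 64) * rops_per_controller
print(estimated_rops)   # 32 - a 64 ROP figure would break the historical ratio
```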
 
I strongly suspect they just used whatever rumored numbers were floating around (might be worth checking the rumored/"possible" specs in the Computerbase.de article or on Videocardz). Aside from the ROPs, the other specs look in line with what we've heard so far.

But well, the launch isn't far off now (18/09).
 
Just a clueless retailer that needed something to fill the lines.

I hope you're wrong :D

It's probably not right. It's probably 32 ROPs. A 256-bit bus would mean double the memory controllers of GM107's 128-bit bus. So unless Nvidia reworked the memory controllers and, for the first time in ages, changed the corresponding number of ROPs per controller, it's likely 32 ROPs. The TMU and core counts are likely accurate though, because GK104 and GF114 both had 4x the core and texture units of their respective smaller brothers.

Any chance that second-generation Maxwell has this kind of evolution in the memory controllers compared to GM107?
 