NVIDIA GF100 & Friends speculation

Or is moving data into/out of a central geometry hub not a big deal assuming all the cache bits are in place?
That's the key question. It's why I raised GDS, as all SIMDs share that for communicating with TS, as far as I can tell. So is GDS a bottleneck? Is centralised TS the bottleneck? Would centralised parallel setup remain a bottleneck?

Dunno.

One of the other questions that Heaven raises and Ethatron mentions, something I can't work out, is what's the impact on pixel shading efficiency due to sub-pixel triangles? As resolution increases, fewer and fewer triangles are sub-pixel sized (assuming Heaven just hits "max tessellation" without taking account of pixels-per-triangle, which seems to be the case). Increasing resolution actually increases efficiency of pixel shading (or if you prefer, lowers the relative pain of zero-fragment triangles).
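
A rough back-of-envelope sketch of the resolution point, not a measurement of Heaven: it assumes the tessellation factor is fixed regardless of resolution, that the visible triangles tile the screen evenly, and that there is no overdraw. The triangle count (1,000,000) and the function name are made up for illustration.

```python
def pixels_per_triangle(width, height, visible_triangles):
    """Average screen area per triangle, in pixels.

    Assumes triangles tile the screen evenly with no overdraw,
    which is a simplification for illustration only.
    """
    return (width * height) / visible_triangles


# Hypothetical triangle count, held constant across resolutions
# (i.e. tessellation ignores pixels-per-triangle).
TRIANGLES = 1_000_000

for width, height in [(320, 200), (1280, 1024), (1920, 1200), (2560, 1600)]:
    area = pixels_per_triangle(width, height, TRIANGLES)
    kind = "sub-pixel" if area < 1.0 else "pixel-sized or larger"
    print(f"{width}x{height}: {area:.3f} px/triangle ({kind})")
```

Under those assumptions the average triangle only grows past one pixel somewhere above 1280x1024, which is the sense in which higher resolution lowers the relative cost of zero-fragment triangles.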

Well, I can't work out what's actually happening. I wonder if the NVidia GDC10 Booth presentation that the Unigine guys are giving will shed any light.

Jawed
 
Getting 80% of TSMC's 40 nm production would be amazing for NV. It seems to fit the rumour that NV is leaning on TSMC to dilute the advantage of AMD's RV870 family.

Tegra 2 and ION2 are 40nm TSMC too. I can imagine that the vast majority of the 80% is Tegra 2 and ION2 ;).
 
So how do things work at TSMC? A company says "I want 80% of your lines tied up from May - July" and they just go "OK!" without any regard for other customers?
 
I found the claim of 80% of TSMC's 40nm production being NV absolutely ridiculous, considering that ATI & nV aren't the only ones taking parts of the capacity, and that it's just absolutely impossible that ATI would have full lineups of mobile and desktop chips being produced at sufficient quantities (perhaps excluding Cypress itself) with under 20% of the capacity.
 
It's not perfectly related to sub-pixel polys/high overdraw induced by tessellation.

If it were an efficiency loss, framerate would drop as you lower resolution substantially. I checked, and that is not the case all the way down to 320x200: framerate increases, even if only minimally. (When using a GPU capable of 200fps and making it stall for 30ms, halving effective rendering time won't have much of an impact, basically 35ms -> 32.5ms total frame time.)
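
The arithmetic in that parenthesis can be spelled out as follows. This is only a sketch of the reasoning: it assumes the stall and the rendering work simply add up (no overlap), and the function name is mine.

```python
def frame_time_ms(render_ms, stall_ms):
    """Total frame time if the stall and the rendering work do not overlap."""
    return render_ms + stall_ms


render_ms = 1000.0 / 200.0   # a GPU capable of 200fps needs 5 ms per frame
stall_ms = 30.0              # fixed stall that does not depend on resolution

before = frame_time_ms(render_ms, stall_ms)        # full rendering cost
after = frame_time_ms(render_ms / 2.0, stall_ms)   # rendering cost halved

print(f"{before} ms -> {after} ms")
print(f"{1000.0 / before:.1f} fps -> {1000.0 / after:.1f} fps")
```

So halving the rendering work moves the total from 35 ms to 32.5 ms, roughly 28.6 fps to 30.8 fps, because the fixed 30 ms stall dominates.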

There's an enormous overhead on Evergreen GPUs when tessellation is on, but apparently entirely due to poor dispatch... which could explain why GF100 has no fewer than 1 tessellation unit per SM and why LDS conflicts appear on Cypress.

I think a workaround for this behaviour could be to dedicate some of the SIMDs to tessellation as that should avoid conflicts.

Hopefully, AlexV or someone else will be able to dig more into it.

But when we render to, let's say, a 1x1 output, doesn't the architecture become as slow as the slowest participating component? This 1 pixel has to be gone over 100k times in a strict deterministic order, which makes at least one component serial in nature and removes any chance to exploit parallelism.
I also see a lot of opportunity to develop different, unique approaches to face that problem (let's say with Catalyst AI on and the ordering restrictions thrown away). The AA case is, I think, even more attractive for a custom technique.
The difference between Tess on/off is that this approach cannot be invoked at the front of the process (entering the pipelines) but must reorganize the hardware in the middle of it, while it is at full steam.

Maybe I sound naive; it's just that I'm trying to learn from all of your expertise, and I like pathological (worst) cases to understand the implications. Black-box reverse engineering never was easy. :)
 
How many times have we seen a redesign like GF100? New shader cores, cache hierarchy with L2 and L1 cache, unified r/w L2 cache, new front-end with 4 setup-engines, new geometry setup (OoO execution), TMUs in the shader core, every SM has a dual dispatcher...
That's a big step for nVidia.

I won't argue with that, and it's interesting whether and how the OoO execution contributed to some of the delay. The changes are really big, but my point was rather that each new architecture combines redesign and overhaul with innovation. It's not a question of either new architecture or overhaul, as the discussion seemed to frame it.

Boring point though. I know.
 
So, if you think that nothing is a new architecture, you definitely think that RV670 -> RV770 isn't a new architecture, correct?

Which was my point. Not sure why you quoted me, when you were somewhat agreeing with me :)

On the contrary.

See post above.

I rather think a new architecture implements both redesign and new solutions. I was just being picky about the either/or usage, which in my mind is wrong.

But that may be just me.
 
I found the claim of 80% of TSMC's 40nm production being NV absolutely ridiculous, considering that ATI & nV aren't the only ones taking parts of the capacity, and that it's just absolutely impossible that ATI would have full lineups of mobile and desktop chips being produced at sufficient quantities (perhaps excluding Cypress itself) with under 20% of the capacity.

It's not just ridiculous, it's ludicrous.
 
It was the banner that got me! :LOL:

br.jpg


Shiny sparkly! :D
 
Unlike, say, the G84, it's gonna live forever!

But it still only comes with a one year warranty. :D

Btw the Fame lyrics are scarily apt in this circumstance.

Baby look at me
And tell me what you see
You ain't seen the best of me yet
Give me time I'll make you forget the rest

I got more in me
And you can set it free
I can catch the moon in my hands
Don't you know who I am

Remember my name
Fermi

I'm gonna live forever
I'm gonna learn how to fly
High

I feel it coming together
People will see me and cry
Fermi

I'm gonna make it to heaven
Light up the sky like a flame
Fermi

I'm gonna live forever
Baby remember my name

Remember
Remember
Remember
Remember
 