I highly doubt much is going to be done about geometry performance for SI. I think it's far more likely that NI might do something with regards to that as it's supposed to feature more radical changes (when both are compared to Evergreen).
> Conceptually, isn't distributed setup/raster another way of doing TBR? Or at least a beginning of a migration towards TBR?

I wouldn't say so necessarily. It's just able to lessen a serial bottleneck, which would also greatly benefit any TB(D)R.
> What I've been pondering on lately is the following: Everyone seems to be assuming that AMD is going to rectify their geometry performance and best/match/come close to what Fermi is offering. There are a few arguments that make me wonder if this is really a priority at AMD's.
>
> First, they seemed to be quite content with the tessellation performance when designing their top three chips with the same FF hardware. And arguably it doesn't seem like a major bottleneck in currently shipping DX11 titles (yes, actual games, that is).

The setup-rasteriser architecture is explicitly large-triangle friendly, not small-triangle friendly. It needs a complete overhaul for future scaling. While it appears adequate for games, that's mostly because it's early days, I reckon. And a lack of analysis.

> Second, Nvidia was long rumored to be doing "soft tessellation", implying rather lackluster performance. Now, obviously I don't know if AMD itself was misled by that too, but since even SemiAccurate and, before that, The Inquirer kept trumpeting how abysmally GF100 was going to perform when faced with DX11 workloads, I'd consider the possibility at least.

I'm waiting for a decent analysis of the behaviour of the Fermi architecture here.

> Third, Nvidia's geometry performance wasn't really something to boast about before Fermi. IIRC, before the new architecture Nvidia's chips were capable of 0.5 drawn triangles per clock, whereas at least higher-end Radeons could achieve a theoretical ratio of 1.0. This also doesn't really point in the direction that the Santa Clarans were about to invest really heavily in this area.

That difference didn't really cost NVidia anything. I think that's partly because ATI rasterisation performance in games is poor these days and partly because NVidia had a Z-rate advantage - though not necessarily one being used very well.

> Fourth, according to Nvidia, the distributed tri-setup and raster grew the whole GF100 chip by 10 percent. Now, that's probably marketing, but I tend to believe that it wouldn't be quite as cheap as single-digit square millimeters to incorporate that feat. Talking of which, they seemed to be quite proud of having succeeded at all, so it's probably no minor task you can throw into a largely defined chip.

How big is the setup-rasteriser in GT200? Also, how much of that growth was caused by improved early-Z culling, screen-space tiling, etc.? In other words, 10% isn't very meaningful.

> The question I am asking is how likely it is that AMD is willing to invest major resources into a feature today mainly used for Unigine, Stone Giant and some SDK samples. I really cannot assess how much upcoming games are going to stress tessellation performance, but of the few currently available DX11 games, I think Battleforge and BF: Bad Company 2 don't use tessellation at all.

Evergreen had features axed in order to launch on time. Sure, any chip gets features axed, in theory, to launch on time.

Jawed
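For scale, the 0.5 vs. 1.0 triangles-per-clock ratios work out as follows (a back-of-the-envelope sketch; the clock speeds are my own illustrative assumptions, not figures from the thread):

```python
# Theoretical peak triangle setup throughput = rate (tris/clock) * clock.
# The clocks below are assumed values for a pre-Fermi GeForce and a
# high-end Radeon of that era, purely for illustration.

def setup_rate(tris_per_clock: float, clock_mhz: float) -> float:
    """Peak triangles per second for a given setup rate and clock."""
    return tris_per_clock * clock_mhz * 1e6

geforce = setup_rate(0.5, 600.0)   # 0.5 tri/clk at an assumed 600 MHz
radeon = setup_rate(1.0, 850.0)    # 1.0 tri/clk at an assumed 850 MHz

print(f"GeForce: {geforce / 1e6:.0f} Mtri/s")  # 300 Mtri/s
print(f"Radeon:  {radeon / 1e6:.0f} Mtri/s")   # 850 Mtri/s
```

Even granting the assumed clocks, that's nearly a 3x gap in theoretical setup rate, which makes the point that it apparently didn't cost NVidia anything in games all the more striking.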
> Flight simulators need to use double precision because the size of the earth is large enough that single precision is only accurate to +-1 meter or so at one earth radius from the origin. This is fine for mountains and such, but what about simulating a country airstrip, where the runway is almost-but-not-quite level? A bump that's merely an inch high is quite noticeable when you're rolling over it at 100 MPH during takeoff...

It's very unlikely that existing flight simulators do that, because you couldn't render with DP precision until very recently.
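The precision claim checks out, by the way: the gap between adjacent representable float32 values at one earth radius really is about half a meter. A quick check using numpy:

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean earth radius in meters

# np.spacing gives the distance to the next representable value (the ULP)
# at a given magnitude, for the given floating-point type.
sp32 = float(np.spacing(np.float32(EARTH_RADIUS_M)))  # single precision
sp64 = float(np.spacing(np.float64(EARTH_RADIUS_M)))  # double precision

print(f"float32 spacing at one earth radius: {sp32} m")  # 0.5 m
print(f"float64 spacing at one earth radius: {sp64} m")  # ~9.3e-10 m
```

So in single precision, world-space coordinates one earth radius from the origin quantise to 0.5 m steps, far too coarse to represent an inch-high bump; in double precision the spacing is sub-nanometre.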
What exactly are you basing this on?
> The setup-rasteriser architecture is explicitly large-triangle friendly, not small-triangle friendly. It needs a complete overhaul for future scaling. While it appears adequate for games - that's mostly because it's early days, I reckon. And lack of analysis.

Tell me, did we ever leave "the early days" of DX10's Geometry Shader? (I don't have to spell out the analogies here, do I?)
> I'm waiting for a decent analysis of the behaviour of Fermi architecture here.

Apparently, people associate bad-ass slowness with "in software", and that's what I was talking about - not whether or not Fermi might use transistors for other stuff than the tessellation stage.
> That difference didn't really cost NVidia anything. I think that's partly because rasterisation performance in games is poor these days and partly because NVidia had a Z rate advantage - though not necessarily being used very well.

Frankly, I have no idea what you're saying here. Which difference? Whose raster performance is poor? And which Z rates are poorly utilised?
> How big is the setup-rasteriser in GT200? Also, how much of that growth was caused by improved early-Z culling, screen-space tiling, etc.? In other words, 10% isn't very meaningful.

Because merely multiplying the setup/rasteriser units doesn't get you anywhere if you don't also reinforce the necessary infrastructure... And to do it properly, you'll have to walk the painful way, I guess.
> I think there are 4 key areas of change in R700->Evergreen:

You forgot one very important key change: the number of units. Granted, it's a rather obvious thing, but if you have a performance, cost and yield target, you also have to factor in exactly how many of the engineers' dreams you can incorporate into the new design in order to meet those goals.
> Most MatLab users are students who run it on laptops. How many MatLab users are free to both buy and configure their systems as they please, and of those who do (I do), how many would choose to configure them with a top-end video card (I wouldn't)? And how do those numbers compare to the number of people buying graphics cards to play games?

It's a chicken-and-egg circle.
It makes sense to optimize your product to fit your market - it allows higher performance/lower power draw at lower cost, benefiting customers while still allowing healthier margins and greater market flexibility.
Thread generation is still bound by rasterisation rate, it seems.
Jawed
69k assuming an average die size mix of ~250mm2 and 90% average yield.
78k assuming an average die size mix of ~250mm2 and 80% average yield.
90k assuming an average die size mix of ~250mm2 and 70% average yield.
104k assuming an average die size mix of ~250mm2 and 60% average yield.
125k assuming an average die size mix of ~250mm2 and 50% average yield.
156k assuming an average die size mix of ~250mm2 and 40% average yield.
207k assuming an average die size mix of ~250mm2 and 30% average yield.
So pick a yield and scale by what you assume the average die size is. I've already factored in a 10% wafer trim factor (basically unusable space on the wafer which is a combination of geometric issues and min spacing issues).
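For reference, the arithmetic behind those figures looks roughly like this (a sketch of my own; the 300mm wafer size is an assumption, and the ~15.9M good-die target is my back-calculation from the counts listed above, not a stated figure):

```python
import math

def good_dies_per_wafer(die_mm2: float, yield_frac: float,
                        wafer_diameter_mm: float = 300.0,
                        trim_frac: float = 0.10) -> float:
    """Approximate good dies per wafer: usable wafer area divided by die
    area, scaled by yield. trim_frac models the ~10% of wafer area lost
    to edge geometry and minimum spacing."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    usable_area = wafer_area * (1.0 - trim_frac)
    return (usable_area / die_mm2) * yield_frac

def wafers_needed(units: int, die_mm2: float, yield_frac: float) -> float:
    """Wafers required to produce a given number of good dies."""
    return units / good_dies_per_wafer(die_mm2, yield_frac)

# ~250mm2 dies: about 229 good dies/wafer at 90% yield, 127 at 50%.
print(round(good_dies_per_wafer(250.0, 0.9)))
print(round(good_dies_per_wafer(250.0, 0.5)))

# A fixed target of ~15.9M good dies reproduces the listed counts:
print(round(wafers_needed(15_900_000, 250.0, 0.9) / 1000))  # ~69k wafers
print(round(wafers_needed(15_900_000, 250.0, 0.5) / 1000))  # ~125k wafers
```

The listed wafer counts all correspond to the same fixed number of good dies divided by good dies per wafer, which is why each count times its yield comes out roughly constant.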
> Anyways, I had the same question and they said: 10% more compared to an approach analogous to GT200/RV790.

Not sure what NVIDIA would know about the "RV790" size / approach. The simple fact of the matter is that NVIDIA had to make a larger change to their architecture to support tessellation simply because they didn't have it before, whereas it has been ingrained in our designs for multiple generations.
I'm not really sure (I don't have any real knowledge in that area), but do video encoders use DP or SP?
> I'm not really sure (I don't have any real knowledge in that area), but do video encoders use DP or SP?

Video decode/encode is almost exclusively integer-ops based, and 8-bit and 16-bit ops dominate there.
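To illustrate: the inner loop of block-based motion estimation, which dominates encoder runtime, is a sum of absolute differences over 8-bit pixel samples, with no floating point involved at all. A minimal sketch using numpy (the blocks are made-up data, not from any real codec):

```python
import numpy as np

def sad(block_a: np.ndarray, block_b: np.ndarray) -> int:
    """Sum of absolute differences between two 8-bit pixel blocks -
    the core cost metric in block-based motion estimation."""
    # Widen to 16-bit so the subtraction can't wrap around in uint8.
    diff = block_a.astype(np.int16) - block_b.astype(np.int16)
    return int(np.abs(diff).sum())

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)  # 16x16 macroblock
b = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)

print(sad(a, a))      # 0 - identical blocks are a perfect match
print(sad(a, b) > 0)  # mismatched blocks have a positive cost
```

Everything here fits comfortably in 8- and 16-bit integer lanes, which is exactly why narrow integer throughput, not DP or even SP float rate, is what matters for this workload.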
Patents which I've linked and discussed + very strong recommendations from AMD devrel not to tessellate below 8-fragment triangles + the stunningly awful performance in non-game tests.
The instant a triangle falls below 4 quads in size and occupies only one screen space tile, one rasteriser is idle. This kills performance on z-prepass and shadow buffer rendering.
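That idle-rasteriser effect can be captured in a toy model (my own sketch, not NVidia's actual design): with rasterisers statically assigned to screen-space tiles, a triangle contained in a single tile can only ever occupy one rasteriser, leaving the rest idle for that triangle. The tile size and rasteriser count below are assumptions for illustration.

```python
import math

TILE_SIZE = 16        # assumed screen-space tile edge in pixels
NUM_RASTERISERS = 4   # e.g. GF100's four raster engines

def tiles_touched(width_px: float, height_px: float) -> int:
    """Upper bound on screen tiles a triangle's bounding box overlaps."""
    return math.ceil(width_px / TILE_SIZE) * math.ceil(height_px / TILE_SIZE)

def max_busy_rasterisers(width_px: float, height_px: float) -> int:
    """At most one rasteriser per touched tile can work on the triangle."""
    return min(NUM_RASTERISERS, tiles_touched(width_px, height_px))

print(max_busy_rasterisers(64, 64))  # large triangle: all 4 can be busy
print(max_busy_rasterisers(8, 8))    # sub-tile triangle: only 1 busy
```

With heavy tessellation producing sub-tile triangles, each triangle keeps at most one rasteriser busy, which is exactly the scenario that hurts z-prepass and shadow-buffer rendering.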
Tessellation performance?
http://www.hitechlegion.com/reviews/video-cards/4742-evga-geforce-gtx-460-768mb-video-card?start=18
http://www.hitechlegion.com/reviews/video-cards/3177?start=17
See the SubD11 sample result at the bottom of those pages. This is something that AMD demonstrated running on Juniper over 1 year ago. I bet the guys at NVidia had a chuckle when they saw the performance.
NVidia's water demo is comparatively kind
Jawed
> Not sure what NVIDIA would know about the "RV790" size / approach. The simple fact of the matter is that NVIDIA had to make a larger change to their architecture to support tessellation simply because they didn't have it before, whereas it has been ingrained in our designs for multiple generations.

Yes, of course. And of course their 0.5-triangles-drawn-per-clock approach would have made them look REALLY bad in tessellated workloads. OTOH, their knowledge of your chips is - IMHO - second only to your own (and vice versa). Both companies should have tools for analyzing chips that normal people like us can only dream of.