NVIDIA GF100 & Friends speculation

Would this be taking into account that the rasterizers would be running at 1/2 shader clock, while the L2/ROPs (according to Anandtech) run in their own domain, possibly the remnant of the non-hot clock domain of GT200?

If Nvidia had hit the high end of its target clocks, the throughput of the rasterizers would have matched up more closely with the 48 ROPs, if they stayed at the rather sedate global clocks of the preceding architectures.
Aren't we looking at less than 20% here due to clocks?

Perhaps the overhead for keeping 4 rasterizer blocks properly consistent would have been too high if they had used something wider than 2x4.
I'm guessing that's a rasterisation granularity issue. I'm guessing that with 2x16 rasterisation an entire hardware thread is populated with fragments for "one triangle" even if the triangle only occupies 4 fragments - though I was under the impression, historically, that NVidia didn't have that problem and could pack multiple triangles' fragments into a hardware thread (respecting pixel quad boundaries). Can I be bothered to rummage through patents...

Of all the hardware costs possible, I wouldn't think the cost of just having additional scan/raster units per block would be as significant as other things Nvidia has splurged on.
For what it's worth, the architecture really looks to me a lot like 4 GPUs that just happen to have a common command processor, L2, ROPs and memory and general gubbins.

Jawed
 
The "serial" 5870 also has 2 rasterizers, which can operate in parallel for non-overlapping tris (should be rather common when tess is turned on) fed by a single setup unit.
I can't work out HD5870's setup/rasterisation configuration. If anything it would appear that it only accelerates triangles that span screen-space tile boundaries - quite the opposite from being good for small triangles produced by tessellation :???:

Jawed
 
The whole discussion is about buying a GF100 card right now.
Everybody has two options: waiting for GF100 or buying a new card in the next few days. After the announcement of the "graphics side" of GF100 I will wait. But that's only my opinion.
But it's sounding like waiting for a GF100 will be a lot longer than "the next days"...much more like "the next months". ;)
 
How? If DS is a bottleneck, wouldn't that leave a lot of setup throughput unused as too few tris are coming into the setup pipe?
Of course.

I was responding to the misapprehension here:

Hang on, isn't triangle setup outside both DS and TS? Higher tess factors will surely increase time spent in both. So I can't see how DS or TS affect tri-setup.
It's quite clear that DS/TS can affect setup, i.e. if either is the bottleneck then setup isn't.

Jawed
 
Maybe this card is designed for a future of micropolygon rendering? Seems like even with 2 giant 30" displays this card could handle about 10 polygons/pixel @ 30fps if I did my math right.
Nope, you didn't do your math right, because those are just theoretical poly rates. Did you see the Froblins demo? The papers about it state around 8 million polys per frame, and if you just look at the demo's wireframe you don't see a fully white screen there ;). Then let's count how many of them need to meet the poor TS: 850M tris per second / 8M tris per frame = ~106 FPS, and you're there. Then if we try something not so easy as the toads, let's say rendering to a cube map, which a lot of games use, you meet the TS earlier, maybe at 15 FPS or somewhat higher. Or if you do a Z prepass in some complex scenes like in Crysis, you have a high chance of meeting the TS too.
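For what it's worth, here's that back-of-the-envelope setup-limit argument as a tiny sketch (Python; the 850M tris/s setup rate, 8M tris/frame and the 6x cube-map multiplier are just the round numbers assumed above, not measured figures):

```python
# Setup-limited FPS: peak setup rate divided by triangles submitted per frame.
def setup_limited_fps(setup_rate_tris_per_s, tris_per_frame, geometry_passes=1):
    # geometry_passes models extra passes over the same geometry,
    # e.g. ~6 for cube-map rendering or an extra pass for a Z prepass.
    return setup_rate_tris_per_s / (tris_per_frame * geometry_passes)

print(setup_limited_fps(850e6, 8e6))                      # ~106 FPS
print(setup_limited_fps(850e6, 8e6, geometry_passes=6))   # cube map: ~18 FPS
```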
 
so how about the micropolygon style rendering? Either my question was so retarded as to not deserve an answer....or it was lost in the bickering.. :cry:

Edit: Thanks Oleg for the response. The triangle rate for the GF100 is somewhere around 700 MHz * 4 = 2.8 GT/s... @ 30fps that is about 2.8*10^9 / 30 = 93.3 million triangles per frame. On a single 2560x1600 display that is 93,333,333 / 4,096,000 = 22.8 triangles/pixel. I have no idea what the "real world" values will be vs. theoretical... but at least that is some place to start. I'm also not too sure about how good the tessellation will be at making sure each pixel has at least 1 full triangle to make the micropolygon-style rendering work properly?!
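In case anyone wants to poke at those assumptions, the whole calculation fits in a few lines (the 4 tris/clock at ~700 MHz figure is just the speculated GF100 setup rate from this thread, not a confirmed spec):

```python
# Theoretical triangles per pixel per frame from a peak setup rate.
def tris_per_pixel(setup_clock_hz, tris_per_clock, fps, width, height):
    tris_per_frame = setup_clock_hz * tris_per_clock / fps
    return tris_per_frame / (width * height)

# Speculated GF100: 4 setup/raster units at ~700 MHz, 30 fps, one 2560x1600 display.
print(tris_per_pixel(700e6, 4, 30, 2560, 1600))       # ~22.8 tris/pixel
# Two 30" displays halve it to ~11 tris/pixel, still "micropolygon" territory.
print(tris_per_pixel(700e6, 4, 30, 2 * 2560, 1600))
```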
 
I do honestly wonder about the utilization of AMD's stream processors. I like that we do have two very interesting and differing architectures from both NVIDIA and AMD, which makes for good times debating their merits... but do we have any hard numbers about my above-stated question? I don't think I have seen anything yet about the HD 5000 series and percent utilization of the smaller units in both regular workloads, and under tessellation (my concern there is that the tessellator is working as a bottleneck, and therefore the other functional units are underutilized). While on the other hand... under pervasive tessellation usage in apps, would the NV GF100 be doing a majority of the work with their CUDA cores in tessellation/geometry work and thereby diminishing pixel shading/post processing work?

I think this is gonna be a fun spring trying to find out!
 
Sure, Fermi may have some problems such as power, scalability, yields, whatever, but Nvidia has done something that's fundamentally creative and extended the concept of GPUs in so doing. It may take them another generation to perfect it, but, without efforts like this, neither they nor ATI nor Intel would be as good tomorrow as they now will be.
Thanks, that sums up my view of things quite nicely.
(Sorry guys, but because of the language barrier I'm not always able to express my views properly.)

In fact, it could very well be the case.
So you're basically saying that NV's engineers don't know what they're doing?

Add to that the possible unbalanced derivation of the architecture.
It's unbalanced right now because quite often you get a higher triangle rate in the mid-range than in the high end. What Fermi does is solve this imbalance.

How do these setup engines/rasterizers work when you disable half a GPC's SIMD units or a quarter of 2 of them?
Less work isn't a problem, so I think they work just fine. Do you have a reason to believe they don't?

That raises at least as many questions as it answers about "will it be faster?".
If you want to raise some questions no one can stop you. But not all questions are smart, you know.

Can I play the benchmark or otherwise gain enjoyment from it in any way? Does it push the hardware to its absolute peak, giving me insight into how my games will run in some way? Or is it just a pretty tech demo with a framerate counter with very little secondary meaning?
Unigine is as close to a real DX11 engine with heavy tessellation as possible right now. It's not some kind of a synthetic benchmark. And what's interesting is that it was developed on AMD's DX11 hardware. Sure you can say that it's not a game and thus it's irrelevant. But then everything's irrelevant beyond what we have now in games. Cypress' DX11 is irrelevant too. And most of today's games run just fine even on an RV770 because these are console ports made for 5-year-old hardware. No reason to buy GF100 or Cypress for them. So let's talk about things that matter then?

OK, I'll play. I think you're (sometimes hilariously) biased :devilish:
As always you may think whatever you like. But me not crying in disappointment over the GF100 graphics architecture doesn't make me biased, sorry. (And the opposite does, actually.)

Did you get those performance deltas for GF100 from a comprehensive review using a multitude of theoretical and game benchmarks, ideally from games you're interested in, at the resolutions you want to run at, using the IQ settings that you need as a minimum to enjoy the graphics fidelity, from an outlet that you can trust as much as is humanly possible to give you an accurate view of real-world performance? Or did they come from NVIDIA?
I've seen enough vendor-provided benchmarks to know what to expect in the real world judging from them. A vendor can pick results but he can't lie. So it's a matter of painting the whole picture from the information made available to us. Sure a proper review is necessary, but just to prove that your guess was right or wrong. And for the last 5 years my guess was wrong only once -- with RV770.

Cypress has GF100 licked in some non-subtle ways, by big margins. You might not enjoy a modern Radeon architecturally (I struggle sometimes, so it's cool, you're in good company) but it's hard to argue with their raw single precision numbers in Cypress, big ROP performance and that large dollop of sampling and filtering.
Numbers are irrelevant, it's how you use them. You're asking me if I've seen a proper review of GF100 and then you're saying that Cypress is winning by the numbers. That's a contradiction. You need to test the sample yourself, just as I need to see a review before making any assumptions just from the number of units. But we have more than that already. We have some performance numbers. And from what I'm seeing here people are saying "oh well 64 TMUs are less than 80 -- that's settled then, it's worse already". Yeah, well, 240 SPs are less than 800 and 40 TMUs are less than 80, but that didn't mean much in the GT200 vs RV770 battle, did it?

No they haven't. Where's my clocks!
You don't know the planned delta? =)

Yeah, why did you bother?
Did I? I'm sorry my English isn't very good.

I'd like to see a post proving that, especially considering I used one for about a year.
I'm using a 5850 right now. How's that for a revelation?
 
Aren't we looking at less than 20% here due to clocks?
I was going by the speculation that the upper bound of the shader clocks was initially hoped to be in the 1.7 GHz range, the actual clocks that are achievable with actual silicon notwithstanding.

If the base clock that the L2 and ROPs had was around 600 MHz, the rasterization/ROP throughput would be balanced.
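Rough numbers for that balance, as a sketch: everything here (8 pixels/clock per rasterizer, rasterizers at half of a hoped-for 1.7 GHz hot clock, 48 ROPs at a ~600 MHz base clock) is assumed rather than confirmed:

```python
# Compare rasterizer vs ROP pixel throughput under assumed clocks and widths.
def gpixels_per_s(units, pixels_per_clock, clock_hz):
    return units * pixels_per_clock * clock_hz / 1e9

raster = gpixels_per_s(4, 8, 1.7e9 / 2)   # 4 rasterizers, 8 px/clk, half hot clock -> ~27.2
rops   = gpixels_per_s(48, 1, 600e6)      # 48 ROPs, 1 px/clk, base clock -> ~28.8
print(raster, rops)   # roughly balanced, had those clocks been hit
```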
 
As an aside, I've been wondering for years why ATI's 1 triangle per hardware thread isn't a disaster of epic proportions - imagine a 1 fragment triangle leaving 15 quads out of the 16 in a hardware thread doing nothing. Maybe it is an epic disaster, and we're now seeing that in tessellated scenes. It'd be nice to find out, for sure, exactly what ATI's doing here - since I'm not 100% on the 1 triangle per hardware thread thing.

Jawed

Heh, could be why AMD says it's not setup limited... So essentially, if the pixel shader takes more than 20 cycles (thread interleaving doesn't matter here), Cypress can't be setup limited (if that's the case).
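A minimal sketch of that argument, assuming 1 tri/clock setup, 20 SIMDs on Cypress, and one triangle per wavefront as per the post above (none of which is confirmed):

```python
# With one triangle per wavefront, setup can feed at most 1 wavefront/clock.
# 20 SIMDs each spending ps_cycles clocks on a wavefront retire 20/ps_cycles
# wavefronts per clock, so once ps_cycles > 20 the shader core is the bottleneck.
def bottleneck(ps_cycles_per_wavefront, num_simds=20, setup_tris_per_clock=1):
    shader_rate = num_simds / ps_cycles_per_wavefront   # wavefronts retired per clock
    return "setup" if shader_rate > setup_tris_per_clock else "shader core"

print(bottleneck(10))   # cheap pixel shader -> setup limited
print(bottleneck(40))   # >20 cycles -> shader core limited, setup keeps up
```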

Btw, regarding the shader load with high tessellation, don't forget the additional number of fragments on multisampled targets when we have more triangles covering the same area.
 
Unigine is as close to a real DX11 engine with heavy tessellation as possible right now. It's not some kind of a synthetic benchmark. And what's interesting is that it was developed on AMD's DX11 hardware. Sure you can say that it's not a game and thus it's irrelevant. *chopped out a bit* So let's talk about things that matter then?
My point is that we haven't been able to run Unigine ourselves yet on real hardware. Never in my entire time doing this have I ever used a pre-release benchmark from a hardware vendor to make a decision about real-world perf. Nor theoretical really. Yet you're calling it already for NV (even if you're eventually right, it pays to wait until you're 100% sure).

As always you may think whatever you like. But me not crying in disappointment over the GF100 graphics architecture doesn't make me biased, sorry. (And the opposite does, actually.)
It appears to be a pretty great graphics architecture from where I'm sitting; crying would be a bit silly. I'm not miffed that you like it, it's how readily you (and others) do (and in the other direction with ATI hardware and its fanboys).

I've seen enough vendor-provided benchmarks to know what to expect in a real world judging from them. A vendor can pick results but he can't lie. So it's a matter of painting the whole picture from the information made avialable to us. Sure a proper review is neccessary but just to prove that your guess was right or wrong. And for the last 5 years my guess was wrong only once -- with RV770.
Cool, I respect that point of view (and nice one calling it on RV770, I don't think I was as enamoured until I had a chance to test one).

Numbers are irrelevant, it's how you use them. You're asking me if I've seen a proper review of GF100 and then you're saying that Cypress is winning by the numbers. That's a contradiction. You need to test the sample yourself, just as I need to see a review before making any assumptions just from the number of units.
Hook, line and sinker ;) That's my point, we need more data.

You don't know the planned delta? =)
Nope, but then I don't think anyone does outside of NVIDIA.

Did I? I'm sorry my English isn't very good.
It's infinitely better than my command of your native tongue, and you express yourself just fine in English.

I'm using a 5850 right now. How's that for a revelation?
Actually spat my coffee out :LOL:

This little head-to-head sums up what bugs me about this entire thread. The fanboys aren't scared of just unzipping and plonking it on the table, despite only having a tiny part of the big picture to hand and only their preset personal feelings about a hardware vendor to fill in the rest.

We're going to take a really dim view of it in the future, and this is the last thread that'll go this badly from a balanced discussion point of view. Keep it sane, unpolarised, impersonal (sorry for having a bit of a go, Degustator, it was to make a wider point), technical and on-topic from now on folks (and thanks to those in the arch thread keeping it level there). I'll cheerfully close the thread otherwise.
 
A few days ago on chiphell, tomsmith (post #66, talking about Fellix's numbers posted here previously):


i.e. current cards at 650 MHz/448 shaders are at the previously quoted performance figures, with scope to increase frequency a little. A 512-shader product will require a retape.

So a long wait still..... :cry:

NV was telling the AIBs the same thing at CES.

-Charlie
 
Btw, regarding the shader load with high tessellation, don't forget the additional number of fragments on multisampled targets when we have more triangles covering the same area.
So, basically... everything they could gain from this is better efficiency with sub-pixel triangles, which would require more ALU throughput than GF100 will ever have in order to show a significant lead (10 to 15fps in a game is not a significant difference, 40 to 60 generally is).

If we consider that it has lower texturing throughput too, and even some compressed ROP/texture filtering issues, everything depends on the computing architecture's efficiency (and the TWIMTBP program for teaching devs how to use their GPU even if it hurts almost all other GPUs, but that's another issue).

As for the missing SMs in some of the GPCs, it's still not clear that it would have no effect on performance, as that would act like different GPUs working together on the same frame - a headache in prospect?
 
ATi had a killer product with HD5870 and I didn't see any benches 2 months before launch.
I think they are still finalizing clocks and drivers.

Had you been here:
http://www.semiaccurate.com/2009/06/03/ati-shows-working-dx11-chips/

you would have seen benches and numbers ~4 months before launch. If they trusted you would keep your mouth shut, you would have seen a version of this:
http://www.semiaccurate.com/2009/06/09/ati-evergreen-code-names-explained/
with numbers, and other demos. I know I did, as did several others. I also know a half dozen people personally outside of DAAMIT that had cards by that time, so they could bench anything they wanted on them.

NV hasn't given out cards to AIBs yet, doesn't have a clue what clock bins they will end up with, and is praying that power is within reason right now. ATI on the other hand shipped a month earlier than promised.

-Charlie
 
So, basically... everything they could gain from this is better efficiency with sub-pixel triangles, which would require more ALU throughput than GF100 will ever have in order to show a significant lead (10 to 15fps in a game is not a significant difference, 40 to 60 generally is).

If we consider that it has lower texturing throughput too, and even some compressed ROP/texture filtering issues, everything depends on the computing architecture's efficiency (and the TWIMTBP program for teaching devs how to use their GPU even if it hurts almost all other GPUs, but that's another issue).

As for the missing SMs in some of the GPCs, it's still not clear that it would have no effect on performance, as that would act like different GPUs working together on the same frame - a headache in prospect?
You've made almost the entirety of that post up. Why does it require more ALU throughput? The end game isn't really sub-pixel polygons for real-time rendering, IMHO. It's 1 triangle per unit your eye can resolve. Let's call that a pixel for sake of argument, because most people will probably claim they can see the individual pixels on their display, but they probably can't see the subpixel (and shouldn't). 10-15fps is 50%. 40-60 generally is a big difference? Compared to the bigger difference earlier in your post? Did you mean a difference the user will appreciate more? Urgh.

What compressed ROP/texture filtering issues? Computing architecture efficiency is right there in front of you. It's a scalar, highly-efficient graphics architecture. Has been for three years.

Missing SMs on what GeForce product? Of course it'll affect performance. It won't act like different GPUs working together on the same frame at all. That's not how the parallel nature of graphics works in this instance.

Your contribution to this thread is far from productive, please take some time out from it to consider how you post when you come back :smile:
 
Couldn't care less: http://forum.beyond3d.com/showpost.php?p=1181502&postcount=157

What was it you just wrote upstream about those who accuse others of being biased?
OK, I'm sorry, it looks like my memory was wrong and you weren't negative towards G80 and GT200 before their release.
So here is an answer:

There's so much double-talk in this post my head is spinning. It's hardly inconceivable that a 5870 might offer 75-80% of the performance of Fermi in most 2010 games for ~60% of the cost come this March or April. It will also be easier to attain, and probably run cooler and quieter.
Price and performance have nothing in common. A 5670 costs $100 and a 5970 costs $700 -- is it 7x faster? No. So does that mean that everyone should go and buy a 5670? Nope. Price is what you're ready to pay for a product, and with graphics cards, performance in today's games is not the only factor in pricing. So if the 5870 has 75-80% of the performance of a GF380 and costs 60% of a GF380, that's because the GF380 has some other benefits to a buyer beyond performance alone. I've already described some of these benefits. Surely if you don't think they're important then you're better off buying a 5870 -- IF you're OK with its performance, because deltas aren't absolute numbers. If enough people think the same, NV will be forced to drop prices. So that'll be solved one way or another, and I don't see any reason to talk much about it.

Your "faith" that NV will "set right prices" is kind of amusing too; NV will have target margins they need to hit.
You're at least hitting something by selling cards instead of not selling them at all. A good example is GT200's price history. They're selling one at $150 now and are making a profit as a company. In any possible situation GF100 shouldn't be worse than GT200 was relative to the competition.

They might be forced to adjust for lowered margins due to the competitive landscape, but they can only go so low, and my biggest concern for Fermi is that Charlie, despite all his obvious biases, might not be too far off when writing that TSMC's 40nm process might not be the best for this chip. If yields are initially as bad as can be reasonably expected, don't you think that's gonna impact NV mgmt's initial pricing for Fermi parts?
As I've said it's better to sell at a loss than not to sell at all. The pricing will be competitive or the products won't be on the market at all.

Like you wrote, I don't plan on multi-display gaming anytime soon, my single 30" works just fine. So a 5870 is plenty fast for me, and unless Fermi can offer more of a performance gap over it than 20-25%, its price had better reflect that relative performance for this consumer. Otherwise, the better value for me, considering the pace at which I upgrade anyways, would be a 5870. But of course I'm waiting to see how yields, clock speeds, power, pricing, etc., work out before I decide to buy.
A 5870 isn't fast enough for me on my 24" 1920x1200, so I don't really understand how it is fast enough for you on a 30" display. Fermi's key points are not only performance but features as well. So it's really a question of whether you care about those features (PhysX, CUDA, 3D Vision etc). If you do then you don't really have a choice. If you don't then, well, you need to judge from a performance PoV. For me PhysX is more of a killer feature than DX11 for the moment, so I don't really have much choice (well, I could wait for a Fermi mid-range GPU and use it as a dedicated PhysX accelerator, but why would I want to do that instead of simply buying a GF100 card?).

This little head-to-head sums up what bugs me about this entire thread. The fanboys aren't scared of just unzipping and plonking it on the table, despite only having a tiny part of the big picture to hand and only their preset personal feelings about a hardware vendor to fill in the rest.
Point taken.
However I'm kinda hoping that I have a bigger picture in view than what's publicly available right now -)
I don't know how it'll end up with the whole Fermi line-up, but with GF100 I'm 90% sure that I have a pretty good understanding of what (and when) to expect from the final products.
That's why I'm saying that it's strange to see anyone disappointed with the GF100 graphics architecture info. It looks like people are disappointed with not knowing fps numbers, but somehow that translates into disappointment with the whole GF100 architecture. So in general I'd say that those who aren't impressed by charts and graphs from the whitepaper should simply wait a month or two and they'll get their game performance numbers. For me, getting GTX285+130% with MSAA 8x in HAWX is enough already not to be disappointed with the provided info. As for the rest of the performance numbers -- well, it's just a matter of time now.
 