NVIDIA GF100 & Friends speculation

Discussion in 'Architecture and Products' started by Arty, Oct 1, 2009.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Aren't we looking at less than 20% here due to clocks?

    I'm guessing that's a rasterisation granularity issue. I'm guessing that with 2x16 rasterisation an entire hardware thread is populated with fragments for "one triangle" even if the triangle only occupies 4 fragments - though I was under the impression, historically, that NVidia didn't have that problem and could pack multiple triangles' fragments into a hardware thread (respecting pixel quad boundaries). Can I be bothered to rummage through patents...

    For what it's worth, the architecture really looks to me a lot like 4 GPUs that just happen to have a common command processor, L2, ROPs and memory and general gubbins.

    Jawed
     
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I can't work out HD5870's setup/rasterisation configuration. If anything it would appear that it only accelerates triangles that span screen-space tile boundaries - quite the opposite from being good for small triangles produced by tessellation :???:

    Jawed
     
  3. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,532
    Location:
    Winfield, IN USA
    But it's sounding like waiting for a GF100 will be a lot longer than "the next days"...much more like "the next months". ;)
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Of course.

    I was responding to the misapprehension here:

    It's quite clear that DS/TS can affect setup, i.e. if either is the bottleneck then setup isn't.

    Jawed
     
  5. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    805
    Likes Received:
    1,634
    Nope, you didn't do your math right, because those are just theoretical polys. Did you see the Froblins demo? The papers about it state 8 million polys per frame; just look at the demo's wireframe first and you won't see a fully white screen there :wink:. Then let's count how many of them it takes to hit the poor TS: 850 mln tris per sec / 8 mln tris per frame = 106 FPS, and you are there. Then if we try something not as easy as toads, let's say rendering to a cube map, which a lot of games use, you hit the TS earlier, maybe at 15 FPS or somewhat higher; or if you do a z-prepass in some complex scenes, like in Crysis, you have a high chance of hitting the TS limit.
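For anyone who wants to replay that arithmetic, here is a minimal Python sketch. The 850 mln tris/sec and 8 mln tris/frame figures are the ones quoted in the post; the function name and the `passes` parameter are made up for illustration, and treating a cube-map render as six full geometry passes is an assumption, not something the post states.

```python
def fps_ceiling(tris_per_second: float, tris_per_frame: float, passes: int = 1) -> float:
    """Upper bound on frame rate if triangle throughput is the only limit.

    `passes` is a hypothetical multiplier for how many times the scene
    geometry is submitted per frame (assumption: 6 for a cube map).
    """
    return tris_per_second / (tris_per_frame * passes)


# Figures quoted in the post: 850 mln tris/sec, 8 mln tris/frame.
single_pass_fps = fps_ceiling(850e6, 8e6)

# Assumption: a cube-map render resubmits all geometry once per face.
cube_map_fps = fps_ceiling(850e6, 8e6, passes=6)

print(f"single pass: {single_pass_fps:.2f} FPS")
print(f"six passes:  {cube_map_fps:.2f} FPS")
```

Under that six-pass assumption the ceiling lands a bit under 18 FPS, which is in the same ballpark as the "15 or somewhat higher" guess above.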
     
  6. CNCAddict

    Regular

    Joined:
    Aug 14, 2005
    Messages:
    290
    Likes Received:
    2
    so how about the micropolygon style rendering? Either my question was so retarded as to not deserve an answer....or it was lost in the bickering.. :cry:

    Edit: Thanks Oleg for the response. The triangle rate for the GF100 is somewhere around 700 MHz * 4 = 2.8 GT/s... @ 30 fps that is about 2.8*10^9 / 30 = 93.3 million triangles per frame. On a single 2560x1600 display that is 93,333,333/4,096,000 = 22.8 triangles/pixel. I have no idea what the "real world" values will be vs. theoretical... but at least that is some place to start. I'm also not too sure how good the tessellation will be at making sure each pixel has at least 1 full triangle to make the micropolygon-style rendering work properly?!
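The same back-of-envelope estimate as a small Python sketch, in case anyone wants to plug in other clocks or resolutions. The 700 MHz clock, 4 triangles per clock, 30 fps and 2560x1600 are the guesses from the post, not confirmed specs; the function name is just for illustration.

```python
def triangles_per_pixel(clock_hz: float, tris_per_clock: int,
                        fps: float, width: int, height: int) -> float:
    """Theoretical peak triangles available per pixel per frame."""
    tris_per_second = clock_hz * tris_per_clock   # peak setup rate
    tris_per_frame = tris_per_second / fps        # budget for one frame
    pixels = width * height                       # pixels on the display
    return tris_per_frame / pixels


# Values assumed in the post: 700 MHz, 4 tris/clock, 30 fps, 2560x1600.
pixels = 2560 * 1600
ratio = triangles_per_pixel(700e6, 4, 30, 2560, 1600)

print(f"pixels on screen: {pixels:,}")
print(f"triangles per pixel: {ratio:.1f}")
```

This is a theoretical peak only; it says nothing about how evenly the tessellator spreads those triangles across the screen.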
     
    #826 CNCAddict, Jan 18, 2010
    Last edited by a moderator: Jan 18, 2010
  7. JoshMST

    Regular

    Joined:
    Sep 2, 2002
    Messages:
    467
    Likes Received:
    25
    I do honestly wonder about utilization of AMD's stream processors. I like that we do have two very interesting and differing architectures from both NVIDIA and AMD, which makes for good times debating their merits... but do we have any hard numbers on that? I don't think I have seen anything yet about the HD 5000 series and percent utilization of the smaller units, both in regular workloads and under tessellation (my concern there is that the tessellator is working as a bottleneck, and therefore the other functional units are underutilized). On the other hand... under pervasive tessellation usage in apps, would the NV GF100 be doing a majority of the work with their CUDA cores in tessellation/geometry work, thereby diminishing pixel shading/post processing work?

    I think this is gonna be a fun spring trying to find out!
     
  8. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,244
    Likes Received:
    3,408
    Thanks, that sums up my view of things quite nicely.
    (Sorry guys, but because of the language barrier I'm not always able to express my views properly.)

    So you're basically saying that NV's engineers don't know what they're doing?

    It's unbalanced right now because quite often you get a higher triangle rate in the middle end than in the high end. What Fermi does is solve this imbalance.

    Less work isn't a problem so I think they work just fine. You have a reason to believe they're not?

    If you want to raise some questions no one can stop you. But not all questions are smart, you know.

    Unigine is as close to a real DX11 engine with heavy tessellation as possible right now. It's not some kind of a synthetic benchmark. And what's interesting is that it was developed on AMD's DX11 hardware. Sure you can say that it's not a game and thus it's irrelevant. But then everything's irrelevant beyond what we have now in games. Cypress' DX11 is irrelevant too. And most of today's games run just fine even on an RV770 because these are console ports made for 5-year old hardware. No reason to buy GF100 or Cypress for them. So let's talk about things that matter then?

    As always you may think whatever you like. But me not crying in disappointment over GF100 graphics architecture doesn't make me biased, sorry. (And the opposite does, actually.)

    I've seen enough vendor-provided benchmarks to know what to expect in the real world judging from them. A vendor can pick results but he can't lie. So it's a matter of painting the whole picture from the information made available to us. Sure, a proper review is necessary, but just to prove that your guess was right or wrong. And for the last 5 years my guess was wrong only once -- with RV770.

    Numbers are irrelevant, it's how you use them. You're asking me if I've seen a proper review of GF100 and then you're saying that Cypress is winning by the numbers. That's a contradiction. You need to test the sample yourself just as I need to see a review before making any assumptions just from the number of units. But we have more than that already. We have some performance numbers. And from what I'm seeing here people are saying "oh well 64 TMUs are less than 80 -- that's settled then, it's worse already". Yeah, well, 240 SPs are less than 800 and 40 TMUs are less than 80 but that didn't mean much in the GT200 vs RV770 battle, did it?

    You don't know the planned delta? =)

    Did I? I'm sorry my English isn't very good.

    I'm using 5850 right now. How's that for a revelation?
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I was going by the speculation that the upper bound of the shader clocks was initially hoped to be in the 1.7 GHz range, the clocks that are actually achievable with real silicon notwithstanding.

    If the base clock that the L2 and ROPs had was around 600 MHz, the rasterization/ROP throughput would be balanced.
     
  10. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    746
    Likes Received:
    41
    Location:
    Copenhagen
    Heh, could be why AMD says it's not setup limited... So essentially, if the pixel shader takes more than 20 cycles (thread interleaving doesn't matter here), Cypress can't be setup limited (if that's the case).

    Btw, regarding the shader load with high tessellation, don't forget the additional number of fragments on multisampled targets when we have more triangles covering the same area.
     
  11. John Reynolds

    John Reynolds Ecce homo
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    4,491
    Likes Received:
    267
    Location:
    Westeros
    Couldn't care less: http://forum.beyond3d.com/showpost.php?p=1181502&postcount=157

    What was it you just wrote upstream about those who accuse others of being biased?

    Hopefully Aliens vs. Predator comes out this spring; along with Dirt 2, that'll be two titles with DX11 + some, albeit fairly limited, tessellation use for performance comparisons.
     
  12. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    My point is that we haven't been able to run Unigine ourselves yet on real hardware. Never in my entire time doing this have I ever used a pre-release benchmark from a hardware vendor to make a decision about real-world perf. Nor theoretical really. Yet you're calling it already for NV (even if you're eventually right, it pays to wait until you're 100% sure).

    It appears to be a pretty great graphics architecture from where I'm sitting, crying would be a bit silly. I'm not miffed you like it, it's how readily you (and others) do (and in the other direction with ATI hardware and its fanboys).

    Cool, I respect that point of view (and nice one calling it on RV770, I don't think I was as enamoured until I had a chance to test one).

    Hook, line and sinker :wink: That's my point, we need more data.

    Nope, but then I don't think anyone does outside of NVIDIA.

    It's infinitely better than my command of your native tongue, and you express yourself just fine in English.

    Actually spat my coffee out :lol:

    This little head-to-head sums up what bugs me about this entire thread. The fanboys aren't scared of just unzipping and plonking it on the table, despite only having a tiny part of the big picture to hand and only their preset personal feelings about a hardware vendor to fill in the rest.

    We're going to take a really dim view of it in the future, and this is the last thread that'll go this badly from a balanced discussion point of view. Keep it sane, unpolarised, impersonal (sorry for having a bit of a go, Degustator, it was to make a wider point), technical and on-topic from now on folks (and thanks to those in the arch thread keeping it level there). I'll cheerfully close the thread otherwise.
     
  13. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    Which nicely explains why Nvidia is deathly afraid of releasing it.

    -Charlie
     
  14. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    NV was telling the AIBs the same thing at CES.

    -Charlie
     
  15. PSU-failure

    Newcomer

    Joined:
    May 3, 2007
    Messages:
    249
    Likes Received:
    0
    So, basically... everything they could gain from this is better efficiency with sub-pixel triangles, which would require more ALU throughput than what GF100 will ever have to show a significant lead (10 to 15fps in a game is not a significant difference, 40 to 60 generally is).

    If we consider it has lower texturing throughput too and even some compressed ROP/texture filtering issues, all depends on the computing architecture efficiency (and TWIMTBP program for teaching devs how to use their GPU even if it hurts almost all other GPUs, but that's another issue).

    As for the missing SMs in some of the GPCs, it's still not clear it could have no effect on performance, as that would act like different GPUs working together on the same frame, headache in perspective?
     
  16. Mize

    Mize 3dfx Fan
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,079
    Likes Received:
    1,149
    Location:
    Cincinnati, Ohio USA
    Unigine Heaven isn't a synthetic benchmark?
    How's that?? What level are you on?
     
  17. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    Had you been here:
    http://www.semiaccurate.com/2009/06/03/ati-shows-working-dx11-chips/

    you would have seen benches and numbers ~4 months before launch. If they trusted you to keep your mouth shut, you would have seen a version of this:
    http://www.semiaccurate.com/2009/06/09/ati-evergreen-code-names-explained/
    with numbers, and other demos. I know I did, as did several others. I also know a half dozen people personally outside of DAAMIT that had cards by that time, so they could bench anything they wanted on them.

    NV hasn't given out cards to AIBs yet, don't have a clue what clock bins they will end up with, and are praying that power is within reason right now. ATI on the other hand shipped a month earlier than promised.

    -Charlie
     
  18. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    You've made almost the entirety of that post up. Why does it require more ALU throughput? The end game isn't really sub-pixel polygons for real-time rendering, IMHO. It's 1 triangle per unit your eye can resolve. Let's call that a pixel for sake of argument, because most people will probably claim they can see the individual pixels on their display, but they probably can't see the subpixel (and shouldn't). 10-15fps is 50%. 40-60 generally is a big difference? Compared to the bigger difference earlier in your post? Did you mean a difference the user will appreciate more? Urgh.
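To make the relative-vs-absolute point concrete, a quick Python sketch. It reads "10 to 15" and "40 to 60" as before/after frame rates, which is an interpretation of the quoted post; the function name is illustrative only.

```python
def fps_delta(fps_before: float, fps_after: float) -> dict:
    """Express one frame-rate change three ways."""
    return {
        "relative_gain": (fps_after - fps_before) / fps_before,   # fraction
        "absolute_gain_fps": fps_after - fps_before,               # frames/s
        "frame_time_saved_ms": 1000 / fps_before - 1000 / fps_after,
    }


low_end = fps_delta(10, 15)    # the "not significant" case
high_end = fps_delta(40, 60)   # the "significant" case

for name, d in (("10 -> 15", low_end), ("40 -> 60", high_end)):
    print(name,
          f"{d['relative_gain']:.0%}",
          f"{d['absolute_gain_fps']:.0f} fps",
          f"{d['frame_time_saved_ms']:.1f} ms")
```

Both cases are the same 50% relative gain; the 10 to 15 case actually saves more time per frame, even though the 40 to 60 case adds more frames per second.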

    What compressed ROP/texture filtering issues? Computing architecture efficiency is right there in front of you. It's a scalar, highly-efficient graphics architecture. Has been for three years.

    Missing SMs on what GeForce product? Of course it'll affect performance. It won't act like different GPUs working together on the same frame at all. That's not how the parallel nature of graphics works in this instance.

    Your contribution to this thread is far from productive, please take some time out from it to consider how you post when you come back :smile:
     
  19. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    I said keep it impersonal :smile:
     
  20. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,244
    Likes Received:
    3,408
    OK, I'm sorry, it looks like my memory was wrong and you weren't negative about G80 and GT200 before their release.
    So here is an answer:

    Price and performance have nothing in common. The 5670 costs $100 and the 5970 costs $700 -- is it 7x faster? No. So does that mean that everyone should go and buy a 5670? Nope. Price is what you're ready to pay for a product, and wrt graphics cards, performance in today's games is not the only factor in pricing. So if the 5870 has 75-80% of the performance of GF380 and costs 60% of GF380, then that's because GF380 has some other benefits to a buyer beyond performance alone. I've already described some of these benefits. Surely if you don't think they're important then you're better off buying a 5870 -- IF you're OK with its performance, because deltas aren't absolute numbers. If enough people think the same, NV will be forced to drop prices. So that'll be solved one way or another, and I don't see any reason to talk much about it.
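The performance-per-dollar side of that hypothetical can be written out as a short Python sketch. The 75-80% performance and 60% price figures are the "if" scenario from the post, not measured numbers, and "GF380" is the speculative product name used in the thread; the function name is illustrative.

```python
def relative_perf_per_dollar(rel_perf: float, rel_price: float) -> float:
    """Perf-per-dollar of card A relative to card B.

    rel_perf and rel_price are A's performance and price expressed
    as fractions of B's (e.g. 0.75 means 75% of B).
    """
    return rel_perf / rel_price


# Hypothetical scenario from the post: 5870 at 75-80% of GF380's
# performance for 60% of its price.
low = relative_perf_per_dollar(0.75, 0.60)
high = relative_perf_per_dollar(0.80, 0.60)

# Speed-up the $700 5970 would need over the $100 5670 to match it per dollar.
break_even_speedup = 700 / 100

print(f"5870 vs GF380 perf per dollar: {low:.2f}x to {high:.2f}x")
print(f"5970 break-even speed-up over 5670: {break_even_speedup:.0f}x")
```

In that scenario the cheaper card is ahead on performance per dollar alone, so any premium would have to be justified by whatever else the buyer values.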

    You're hitting at least something selling cards instead of not selling them at all. A good example is the GT200 price history. They're selling one at $150 now and are making a profit as a company. In any possible situation GF100 shouldn't be worse than GT200 in comparison to the competition.

    As I've said it's better to sell at a loss than not to sell at all. The pricing will be competitive or the products won't be on the market at all.

    5870 isn't fast enough for me on my 24" 1920x1200 so I don't really understand how it is fast enough for you on a 30" display. Fermi's key points are not only performance but features as well. So it's really a question of you caring about those features (PhysX, CUDA, 3D Vision etc). If you do then you don't really have a choice. If you don't then, well, you need to judge from a performance pov. For me PhysX is a more killer feature than DX11 for the moment so I don't really have much choice (well I could wait for a Fermi middle end GPU and use it as a dedicated PhysX accelerator but why would I want to do something like that instead of simply buying a GF100 card?).

    Point taken.
    However I'm kinda hoping that I have a bigger picture in view than what's publicly available right now -)
    I don't know how it'll end up with the whole Fermi line-up but with GF100 I'm 90% sure that I have a pretty good understanding of what (and when) to expect from the final products.
    That's why I'm saying that it's strange to see anyone disappointed with the GF100 graphics architecture info. It looks like people are disappointed with not knowing fps numbers but somehow that translates into disappointment over the whole GF100 architecture. So in general I'd say that those who aren't impressed by charts and graphs from the whitepaper should simply wait a month or two and they'll get their games performance numbers. For me getting GTX285+130% with MSAA 8x in HAWX is enough already not to be disappointed with the provided info. As for the rest of the performance numbers -- well, it's just a matter of time now.
     
    #840 DegustatoR, Jan 18, 2010
    Last edited by a moderator: Jan 18, 2010