Nvidia GT300 core: Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 20, 2008.

Thread Status:
Not open for further replies.
  1. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    I wasn't aware of that. :(

    Given that it was filed only after G80's launch (and thus has not been issued yet), I doubt this particular assumption.
     
  2. ChrisRay

    ChrisRay R.I.P. 1983-
    Veteran

    Joined:
    Nov 25, 2002
    Messages:
    2,234
    Likes Received:
    26
    If I remember correctly, two of the game tests that used pixel shader 1.1 were actually using 1.4 on any card that supported it. So GT2 and GT3 were actually 1.4 shaders, unless those were replaced. 1.1 shaders were not always faster than 1.4 shaders on the FX cards; it really depended on register usage.
     
  3. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    Restrictions mostly at the hardware level.

    Not grim IMO, but rather it shows what will become important. For example, note how the BG/P OS doesn't do disk-backed memory: pages are always physically pinned, so the DMA engine has low latency and the CPU doesn't touch pages during communication. What I gather from all of it is that eventually the hardware is going to consist of cores plus an interconnect that provides dedicated hardware support for the most important parallel communication patterns, so that the cores aren't involved in communication which is latency bound. Things like CPUs manually doing all the work on interrupts (preemption) just aren't going to scale ... nor is ALUs doing atomic operations on shared queues between cores ... etc. I think all this goes away at some point in favor of dedicated hardware and a different model of general-purpose computing.

    My little brother (James Lottes, different last name) worked at Argonne in the MCS Division on tough scaling issues for Bluegene (until he decided to go back to get his PhD this year; now he works there on/off). An interesting paper related to the issues of scaling algorithms in interconnect-limited cases, http://www.iop.org/EJ/article/1742-6596/125/1/012076/jpconf8_125_012076.pdf?request-id=12293745-5238-4326-9be2-43b91b4c4753, covers how they adjust data-exchange strategies for the problem to lower network latency.

    If you haven't read this PTX simulator paper, http://www.ece.ubc.ca/~aamodt/papers/gpgpusim.ispass09.pdf, you might find it interesting. Their results showed performance more sensitive to interconnection network bisection bandwidth rather than latency. They also added a cache in their simulation, which indeed helped some of the apps, but also reduced the performance of a lot of them.
     
  4. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    You are correct that they were ps1.4 on hardware that supported it.
    And I'm not entirely sure, but I vaguely recall that nVidia may have reported ps1.1 capability in those tests because it ran faster than ps1.4 on the FX.
     
  5. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    3DMark 2001 is called a DX8 test, but it doesn't test real DX8 capabilities at all:

    You can run tests 1-3 in full quality on any DX6-compatible graphics card. No effect will be missing. The only advantage of a DX7/DX8 graphics card in these tests is hardware-accelerated geometry.

    Test 4 uses PS1.1 on the lake surface, which is shown for 15-20% of the testing time - that's the only DX8-exclusive effect which can reflect DX8 performance in the score.

    The score is calculated via this formula: (total low-detail FPS * 10) + (total high-detail FPS + nature FPS) * 20

    Here are the results of a DX8 graphics card: (107.1 + 98.6 + 103.2)*10 + (41.4 + 67.3 + 46.9 + 29.4)*20 = 6789 3DMarks

    The last value (29.4) is the framerate in the Nature test. Imagine the graphics card were so crappy at pixel shading that performance in the PS/lake scenes dropped to zero. We know the lake scenes take about 18% of the test time, so it's easy to work out what the framerate would be: 29.4 * 0.82 = 24.1 FPS

    Plugging that into the 3DMark formula, the graphics card would score 6683 3DMarks. So this "DX8 benchmark" shows a 1.5% difference between a fast DX8 graphics card and a graphics card with zero DX8 performance.

    Do you understand now why I rate 3DMark 2001 as a DX6 test? :wink:
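    For what it's worth, the scoring arithmetic above checks out; a short script with the weights and per-test framerates quoted in the post (the function name is just mine):

    ```python
    # 3DMark 2001 score formula as quoted above:
    # score = (sum of low-detail FPS) * 10 + (sum of high-detail FPS, incl. Nature) * 20

    def score_3dmark2001(low_detail_fps, high_detail_fps):
        """Compute the 3DMark 2001 score from per-test framerates."""
        return sum(low_detail_fps) * 10 + sum(high_detail_fps) * 20

    # Framerates of the DX8 card from the post.
    low = [107.1, 98.6, 103.2]
    high = [41.4, 67.3, 46.9, 29.4]   # last entry is the Nature test

    print(round(score_3dmark2001(low, high)))          # 6789

    # Hypothetical card with zero pixel-shader performance: the lake scenes
    # (~18% of the Nature test) contribute nothing, so Nature drops to
    # 29.4 * 0.82 = ~24.1 FPS.
    high_zero_ps = [41.4, 67.3, 46.9, 29.4 * 0.82]
    print(round(score_3dmark2001(low, high_zero_ps)))  # 6683
    ```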

    As for a GeForce 2 scoring near 10k in 3DM01 - are you sure? A 10k score was typical for a GeForce 4 Ti...
    You don't need a non-TnL GPU to prove my point. Just switch to SW TnL in 3DMark. For the majority of DX7 TnL cards, SW TnL on a 2GHz+ CPU will score slightly better. The 8-lights test score will be about twice as high with SW TnL.

    The real performance advantage of the GF2 wasn't hidden in the TnL engine, but in the 4x2 configuration. The competition was 4x1, 2x2, or 2x3 - the GF2 simply offered almost double the fill rate...
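    The fill-rate point can be made concrete with a quick calculation (the 200 MHz clock is my assumption for a GF2 GTS-class card, not from the post):

    ```python
    def fill_rate_mtexels(core_mhz, pipelines, tmus_per_pipe):
        """Peak texel fill rate in Mtexels/s: clock x pipes x TMUs per pipe."""
        return core_mhz * pipelines * tmus_per_pipe

    # GF2 in its 4x2 configuration vs. a 4x1 competitor at the same clock:
    print(fill_rate_mtexels(200, 4, 2))  # 1600
    print(fill_rate_mtexels(200, 4, 1))  # 800
    ```

    At equal clocks, the 4x2 layout simply doubles multitexturing fill rate over 4x1, which matches the "almost double fill-rate" claim above.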
     
  6. Unknown Soldier

    Veteran

    Joined:
    Jul 28, 2002
    Messages:
    4,047
    Likes Received:
    1,670
    I'd thank you but the forums don't use thanks.

    Oh wait! 'Thanks'

    :D

    US
     
  7. XMAN26

    Banned

    Joined:
    Feb 17, 2003
    Messages:
    702
    Likes Received:
    1

    I'm sorry, but I disagree with you, and for fun and giggles I will put together a P4 2.8GHz HT FSB800 machine and use a GF4/3 or 2MX (depending on what I can find stashed away) and will post the numbers from 2k1. And I will guarantee that SW T&L will not be faster than hardware, except maybe for the 2MX.
     
  8. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,455
    Likes Received:
    471
    :???:
     
  9. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    For me the most important physics improvement: HAIR. When will we have proper hair physics?
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    The provisional filing date was a year earlier. I'm not even sure what value there is in a comparison of patent application filing date and launch date for a technology.

    Jawed
     
  11. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Normally, you file your patent as soon as you're done with your work and do not wait until all the other execution stages + marketing are done as well.

    But since the provisional filing was a year earlier, which I did not notice, this is moot anyway.
     
  12. KonKort

    Newcomer

    Joined:
    Dec 29, 2008
    Messages:
    89
    Likes Received:
    0
    Location:
    Germany, Ennepetal
    Nvidia's G300 has taped out. It is currently running well at the A1 stepping.
    The GDDR5 memory it uses clocks higher than 1,000 MHz, so you can expect bandwidth higher than 256 GB/s.

    Source: Hardware-Infos
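    That 256 GB/s figure implies a 512-bit memory bus (my assumption; the post only gives the memory clock). GDDR5 transfers four bits per pin per base clock, so:

    ```python
    def gddr5_bandwidth_gbps(mem_clock_mhz, bus_width_bits):
        """Peak bandwidth in GB/s: GDDR5 moves 4 bits per pin per base clock."""
        transfers_per_sec = mem_clock_mhz * 1e6 * 4   # quad data rate
        return transfers_per_sec * bus_width_bits / 8 / 1e9

    # 1,000 MHz GDDR5 on an assumed 512-bit bus:
    print(gddr5_bandwidth_gbps(1000, 512))  # 256.0
    ```

    Anything above a 1,000 MHz memory clock on that bus width would push past 256 GB/s, as the post says.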
     
  13. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    That (doubling bandwidth to ~280ish GByte/sec.) would IMO only be necessary if they've really decided to ditch the FF-ROPs (thus also removing quite a bit of compression/decompression hardware) and are doing all this stuff in the shader ALUs.

    If I am not mistaken, the scheduler/scoreboarding stuff could also be simplified quite a lot with this step, since each pixel/thread is effectively "fire and forget", once it's left for the shader core. If there's geometry stuff to be done, it can be re-queued from VRAM.
     
  14. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    Noooo, R600 all over again, noooo!

    Seriously, if that is the case, I hope they have a real shader-based AA resolve solution this time, or what amounts to the same thing: lots of flops!
     
  15. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    I know NVIDIA's design decisions haven't always impressed everyone lately, but I hope you're not suggesting they replaced all their engineers with drunk monkeys?
     
  16. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    @Love_in_Rio: First of all, I think it could make a difference whether you plan your architecture around this "feature"/"economization" or have to bolt it on afterwards.

    Second: please look at what Edge-Detect AA costs you on an HD 4890. I've just had time to run Deep Freeze from 3DMark 06 (at least it uses HDR rendering) at 1680x1050:

    1x MSAA: 72.2 fps
    4x MSAA: 53.2 fps
    8x MSAA: 42.3 fps
    4x + EDAA: 47.3 fps

    Nice, isn't it?
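    The relative cost of each mode can be read straight off those framerates (the numbers are the ones quoted above; the script is just a convenience):

    ```python
    # HD 4890 framerates (fps) in Deep Freeze, as quoted in the post.
    results = {"1x MSAA": 72.2, "4x MSAA": 53.2, "8x MSAA": 42.3, "4x + EDAA": 47.3}

    base = results["1x MSAA"]
    for mode, fps in results.items():
        drop = (1 - fps / base) * 100  # performance loss relative to 1x MSAA
        print(f"{mode}: {fps} fps ({drop:.0f}% slower than 1x)")
    ```

    Edge-detect on top of 4x costs roughly another 11% on top of plain 4x MSAA in this run, while still staying well ahead of 8x.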

    @Arun:
    At least shader-based AA seems feasible IMO. What else would you suggest one could need that amount of bandwidth for? We're talking about doubling again! If it's at all true, that is.
     
  17. XMAN26

    Banned

    Joined:
    Feb 17, 2003
    Messages:
    702
    Likes Received:
    1

    Would GF2/4MXs be fine by you then? It's not like the T&L engine stopped being fixed-function on the 3/4s. And you claimed SW T&L on a 2GHz+ proc would be faster than on the majority of DX7-capable hardware. GF3/4s are capable of DX7, or did they stop supporting it when they became DX8-capable? Something tells me they didn't.
     
  18. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
  19. Lukfi

    Regular

    Joined:
    Apr 27, 2008
    Messages:
    423
    Likes Received:
    0
    Location:
    Prague, Czech Republic
    Today's DX10/DX10.1 cards obviously support DX7 as well, yet you probably wouldn't call them "DX7 hardware" ;)
     
  20. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    But doesn't RV760 use RBEs to resolve AA?
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.