Nvidia GT300 core: Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 20, 2008.

  1. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    I wasn't trying to argue that it's bad. I personally think it's a great thing (assuming the textures are actually analysed for quality beforehand). I've always enabled AI on my ATi cards. Performance increase with little or no visible differences, great.
    I wasn't even trying to argue that ATi is the only one who does this (although the ATi fanboys here of course assumed that for trolling's sake)...
    I was just pointing out that there are documented cases of this happening. One of the first occurrences was with 3DMark2000, I believe... we noticed that some video cards scored better on the fillrate tests than their theoretical specs allowed. Those video cards introduced texture compression, and further investigation showed that the driver forced texture compression on during the fillrate tests.

    Bottom line is just that if you design a benchmark where you assume a certain pixelformat, and the driver does something else, your bandwidth calculations will be inflated, and won't represent the actual hardware capabilities.
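
    A back-of-the-envelope sketch of that inflation (all numbers hypothetical, plain host-side C++): if the test is bandwidth-limited and the driver quietly swaps the assumed 32-bit texels for a roughly 4-bit-per-texel compressed format, converting the measured texel rate back to bandwidth with the original assumption reports far more bandwidth than the bus can actually deliver.

    Code:
    // Hypothetical numbers only: a bandwidth-limited fillrate test that assumes
    // uncompressed 32-bit texels while the driver silently samples a DXT1 copy.
    #include <cstdio>

    int main() {
        const double bus_bandwidth = 8.0e9; // bytes/s the bus can really deliver (assumed)
        const double assumed_bytes = 4.0;   // benchmark assumes 32-bit (4-byte) texels
        const double actual_bytes  = 0.5;   // DXT1 is 4 bits (0.5 bytes) per texel

        // Texel rate the card sustains when purely bandwidth-limited (inflated by compression).
        double measured_texel_rate = bus_bandwidth / actual_bytes;

        // The benchmark converts that rate back to "bandwidth" using its own assumption.
        double reported_bandwidth = measured_texel_rate * assumed_bytes;

        printf("real bus bandwidth : %.1f GB/s\n", bus_bandwidth / 1e9);
        printf("reported bandwidth : %.1f GB/s (%.0fx inflated)\n",
               reported_bandwidth / 1e9, reported_bandwidth / bus_bandwidth);
        return 0;
    }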
     
  2. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,987
    Likes Received:
    6,236
    The problem with that is, if it does it in all applications, then by not benchmarking its default behavior you are in fact not representing the real-world hardware capabilities.

    Granted, you wouldn't be representing the theoretical capabilities of the cards.

    The problem arises when you start significantly altering the quality of what's presented in a commonly accepted "bad" way.

    For example, altering an image by compressing textures such that the compression is noticeable is generally agreed to be bad.

    On the other hand, altering textures/edges through anti-aliasing (and transparency anti-aliasing) is generally considered good, even though it noticeably alters what is presented.

    I think the problem comes in when people misunderstand whether a benchmark is supposed to be gauging the "Real World" performance of a card in certain tests or whether it's supposed to be gauging the "Theoretical" performance of a card in certain tests.

    And then add in having to determine whether any optimizations are overall good or overall bad...

    If someone can make an optimization that is unnoticeable to the naked eye, then awesome. If someone can make an optimization that actually increases IQ (a touchy one, since it's subject to personal taste), then awesome. If someone makes an optimization that noticeably degrades quality, though... uh, yeah...

    Regards,
    SB
     
  3. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    That is exactly what started this discussion.
    Someone referred to benchmark results, claiming that ATi had nearly the same level of performance as nVidia with only about half the physical texturing hardware.
    Then someone else remarked that because of AI, you aren't measuring the physical hardware.
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I was under the impression that ATI does selective-texture angle-dependent optimisations. e.g. normal maps will have lower quality aniso than "uppermost" albedo textures.

    I hadn't heard of what you were talking about so was curious to hear more.

    Jawed
     
  5. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    AI does a number of things, including optimizations for texture filtering and shader replacement.
    [MOD: TONE IT DOWN]
    There's nothing secret about AI, since it can now be disabled by the user. Basically it's just the same stuff that was considered 'cheating' before the user had any choice. I believe it is documented somewhere, either in the control panel help, or on ATi's site.

    Your question wasn't focusing on the technique though, but rather on applications. So this explanation makes no sense whatsoever.
    Also, if you've never heard of it, you must not have been around very long. As I said, it's very old, going back to around the 3DMark2000 days.
     
    #505 Scali, Apr 10, 2009
    Last edited by a moderator: Apr 14, 2009
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    You said they've got a two-year lead, and that's why their GPU is bigger. Bizarre.

    Interestingly, as far as I can tell folding@home cannot use GPUs for all projects because precision is a problem.

    A recent change in the ATI client was made to improve the precision, which slows it down. Very vague - I'm not trying to imply that double-precision calculations are being done, merely that precision is an issue.

    Additionally:

    http://www.brightsideofnews.com/new...ome-meets-the-power-of-graphics.aspx?pageid=1

    Jawed
     
  7. I.S.T.

    Veteran

    Joined:
    Feb 21, 2004
    Messages:
    3,174
    Likes Received:
    389
    OK, this is false. He has slammed the R600 more than once in the last few pages. Why would an ATI/AMD fanboy do that? The R600 wasn't GeForce FX-level bad, and that was a chip so bad no one can deny it.
     
  8. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    No I didn't.
    You could at least have the decency and respect to properly read my posts and not misrepresent them.
    What I said was:
    "In nVidia's case, the chip is larger because of the implementation they chose... We are now about to find out if this implementation is going to pay off or not, in GPGPU tasks."
    Now unless you want to go down the dead-end line of argument that nVidia's implementation is equal to ATi's, what I said makes perfect sense.

    There will always be isolated cases. Doesn't mean the majority of GPGPU software needs to be double-precision... or even that the majority of calculations in software like folding@home need to be double-precision.
    If you need double-precision in some places, but only spend a few % of the total processing time in those places, it still isn't going to be a significant advantage for Larrabee.
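
    To put rough, made-up numbers on that, an Amdahl-style estimate: even if the DP part ran 8x faster on other hardware, spending only 5% of the time there limits the overall gain to about 5%.

    Code:
    // Amdahl-style sketch with made-up numbers: only a small slice of the
    // runtime is double-precision bound, so a big DP advantage barely shows.
    #include <cstdio>

    int main() {
        const double dp_fraction = 0.05; // assumed: 5% of runtime needs double precision
        const double dp_speedup  = 8.0;  // assumed: rival hardware runs that part 8x faster

        double overall = 1.0 / ((1.0 - dp_fraction) + dp_fraction / dp_speedup);
        printf("overall speedup: %.2fx\n", overall); // ~1.05x
        return 0;
    }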

    So my point still stands.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Knowing the applications I could have gone and searched. Back in the 3DMark2000 days I was playing Quake3 and Counter Strike. In fact, I still do :shock: :lol:

    I think about a year ago I ran it for the first time because I wanted to see what it looked like and the web videos looked like crap.

    Jawed
     
  10. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
    Yes, and how about we get back on topic? If one wants to discuss optimizations present in current drivers, a new thread would be far more appropriate, wouldn't it?
     
  11. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    Or reverse this in the context of OpenCL, and label NVidia's shared memory as a software-managed L1 cache. Both serialize bank conflicts on scatter/gather to this "cache". The difference to me is that LRB has an L2 backing, perhaps less non-cached memory bandwidth (guessing here on that) and less latency hiding, while NV has perhaps more memory bandwidth and better latency hiding, and is thus better in the non-cached, bandwidth-limited cases (IMO the more important ones). This situation is like SPU programming on the PS3: good SPU (software-managed cache) practices map well to non-SPU code (i.e. processors with a cache). Often, writing code as if you had a software-managed cache is ideal for a cached CPU (a hint as to why LRB has cache-line management).

    In fact, if you were really crazy you could do software rendering into OpenCL shared memory with 1/8 micro tiles (the DX11 CS/OpenCL 32KB shared memory is only 1/8 the size of Larrabee's L2) :wink:
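
    For illustration only (the kernel and tile size are made up, not from any shipping code), a minimal CUDA sketch of the "shared memory as software-managed cache" idea: stage a tile once and let the whole block reuse it, the same way an SPU programmer would DMA a tile into local store.

    Code:
    // Illustrative CUDA kernel: shared memory used as a software-managed cache.
    // Each block stages TILE elements (plus a 1-element halo) from global memory
    // once, then every thread reuses the staged data for its neighbourhood reads.
    #define TILE 256  // launch with blockDim.x == TILE

    __global__ void box_filter_1d(const float* in, float* out, int n)
    {
        __shared__ float tile[TILE + 2];               // staged tile + halo

        int gid = blockIdx.x * TILE + threadIdx.x;     // element this thread owns

        // Cooperative "DMA" into the software-managed cache.
        tile[threadIdx.x + 1] = (gid < n) ? in[gid] : 0.0f;
        if (threadIdx.x == 0)                          // left halo
            tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
        if (threadIdx.x == TILE - 1)                   // right halo
            tile[TILE + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
        __syncthreads();

        // Compute entirely out of the tile: three shared reads, no extra global traffic.
        if (gid < n)
            out[gid] = (tile[threadIdx.x] + tile[threadIdx.x + 1] + tile[threadIdx.x + 2]) / 3.0f;
    }

    The same structure ports almost line for line to an OpenCL __local buffer, which is the point: the "cache" is whatever the software decides to stage.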
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I'm sorry, I did miss that you'd answered :oops:

    I agree - I was merely fleshing things out.

    Jawed
     
  13. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,435
    Likes Received:
    181
    Location:
    Chania
    Don't know if it has been mentioned before but Jon Olick's presentation at Siggraph08 had quite a few interesting points about next generation parallelism in games (after page 91):

    http://s08.idav.ucdavis.edu/olick-current-and-next-generation-parallelism-in-games.pdf

    Interesting dilemma: "duplicate the GPU into each core" or add a triangle sorting stage?

    Page 203/4 has an interesting performance prediction for next generation platforms.
     
  14. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,561
    Likes Received:
    601
    Location:
    New York
    Based on what metric? As far as GPGPU goes it's been a remarkable success compared to the non-existent competition.

    I'm not really sure why you think it's just bloat. What alternative architecture do you propose would have put them in a similar or better position than they are in today?

    I'm baffled as to how you can draw these conclusions with nothing to compare against. Where is AMD's scheduler-light architecture excelling, exactly?

    Yeah, that's unfortunate because that point renders these discussions moot. The fact that not even AMD has been able to produce something that highlights their architecture's strengths is pretty telling to me.

    I was referring to clause demarcation. That's done at compile time as well, no?

    Well I didn't mention VLIW. I thought we were talking about scheduling. And that doesn't answer the question. Where are the apps that prove out the viability of AMD's approach as a general compute solution? Is CUDA's success solely a function of Nvidia's dollar investment and marketing push or is there something to the technology too?
     
  15. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,561
    Likes Received:
    601
    Location:
    New York
    Yeah, I don't see a big difference here either. Larrabee doesn't get a free ride just because it has an L1. Only one cache line can be read per clock, so in order to avoid starving the ALUs, software is gonna have to manage data carefully to maximize aligned reads from "shared memory", whereas you get this for free with CUDA. Sure, Larrabee's L1/L2 will be bigger, but that's no guarantee at all that they'll be faster.
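
    As a concrete (and purely illustrative, period-appropriate) example of the kind of layout work that implies, here is the classic padded shared-memory transpose tile in CUDA: the +1 column keeps column-wise reads spread across the 16 banks instead of serializing on one, roughly the analogue of the alignment juggling Larrabee software would have to do against its cache lines.

    Code:
    // Illustrative only: 16x16 tile transpose for a square matrix whose side is a
    // multiple of TILE_DIM; launch with a 16x16 thread block. The +1 padding column
    // keeps the column-wise shared-memory reads free of bank conflicts.
    #define TILE_DIM 16

    __global__ void transpose_tile(const float* in, float* out, int width)
    {
        __shared__ float tile[TILE_DIM][TILE_DIM + 1];   // +1 column avoids bank conflicts

        int x = blockIdx.x * TILE_DIM + threadIdx.x;
        int y = blockIdx.y * TILE_DIM + threadIdx.y;

        // Coalesced row-wise load from global memory into the staged tile.
        tile[threadIdx.y][threadIdx.x] = in[y * width + x];
        __syncthreads();

        // Column-wise read out of shared memory; the padding spreads it across banks.
        int tx = blockIdx.y * TILE_DIM + threadIdx.x;
        int ty = blockIdx.x * TILE_DIM + threadIdx.y;
        out[ty * width + tx] = tile[threadIdx.x][threadIdx.y];
    }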
     
  16. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,435
    Likes Received:
    181
    Location:
    Chania
    Bigger compared to what?
     
  17. compres

    Regular

    Joined:
    Jun 16, 2003
    Messages:
    553
    Likes Received:
    3
    Location:
    Germany
    From what I have seen, double precision as a requirement is more the normal case than the isolated case. That's why x86/RISC SMP/MPI systems are still being used in spite of GPUs' higher throughput. Naturally there are other issues like the maturity of high-performance libraries and compilers, IEEE compliance, etc.

    DP is IMO a very strong advantage for ATI; the problem is the immaturity of CAL.
     
  18. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    The little read-only caches in GT200 and RV770.
     
  19. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,435
    Likes Received:
    181
    Location:
    Chania
    I'm trying to bounce the debate back to GT3x0 ;)
     
  20. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,561
    Likes Received:
    601
    Location:
    New York
    Hehe, nice try.

    Well, LRB is gonna have a 32KB L1 and 256KB L2 per core, right? GT200 has 16KB of shared memory per multiprocessor. It's really unlikely that GT300 is gonna expand on that in a big way if it sticks to the current CUDA model.
     