NV33 Rumours, if anyone is interested

Discussion in 'Architecture and Products' started by elroy, Feb 14, 2003.

  1. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    Internal cache on R200 is 2K. Double that for RV250/M9.

    MuFu.
     
  2. SpellSinger

    Newcomer

    Joined:
    Jan 10, 2003
    Messages:
    60
    Likes Received:
    0
    Does this mean NV31 is not DX9?

    Will ATI's RV350 not also be used for mobile just like RV250 and M9, RV200 and M7, RV100 and M6?

    I would bet ATI is more than ready for nVidia. They will not give up market share easily and nVidia has never met their power expectations. ATI kills them here.
     
  3. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    NV31 is fully DX9 compliant (in nVidia's eyes anyway). NV34 is "DX9-compatible", whatever that means...

    Yeah - it is virtually identical to M10, although the latter has some pretty neat mobile-specific technology.

    MuFu.
     
  4. Doomtrooper

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,328
    Likes Received:
    0
    Location:
    Ontario, Canada
    Yes and I hope that feature doesn't carry over to the Desktop part :cry:
     
  5. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Oh, now I finally understand why the 8500 was so darn slow compared to the GF4
    The GF4 is 40% cache, according to publicly released nVidia information. That's obviously a LOT more than 2K. A lot, lot more. It's even a lot more than 4K...


    Uttar
     
  6. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    40% cache?!

    The RV250 figure is from an internal document. I presume it refers to texture cache only.

    MuFu.
     
  7. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Oh, texture cache only? Then that's probably about the same as the GF4.

    But yes, all cache counted, the GF4 is 40% cache & 60% logic
    I could have my numbers wrong by 5% ( it's been a while since I've seen nVidia state it ), but probably not more.
    BTW, those figures are for a NV25. Should have been more precise, it's most likely quite different for a GF4 MX.

    Texture cache is really just a part of the overall cache. Here are several other caches used in a GPU:
    - Vertex Cache, Pixel Cache & Primitive Cache
    - Shader Caches ( that's temporary registers, instructions, ... )
    - AGP cache ( Not sure if that really exists: very few things talk about it. And if it does, it's probably fairly small. I'd love some more info about it. )

    A good reason many people don't see where all this cache goes is that they don't consider the Shader Caches. Many would tend to consider it all as logic. But it isn't :)

    It wouldn't surprise me if that shader cache is actually responsible for a good part of the NV30 transistor count increase over the R300. 1024 PS instructions gotta cost a lot...
    IMO, nVidia should have limited itself to about 512 instructions in the NV30. 1024 was kinda overkill... The R300 limit of 96 seems too little, however ( even Carmack says he already crossed that border several times when experimenting with stuff! )
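    As a rough sketch of what those instruction limits cost in on-chip storage - assuming, hypothetically, ~40 bits per instruction (the real NV30 encoding isn't public, so treat the absolute numbers as illustrative only):

    ```python
    # Back-of-the-envelope: storage needed to hold a maximum-length PS program.
    # ~40 bits/instruction is an ASSUMPTION, not a confirmed NV30 figure.
    BITS_PER_INSTR = 40

    def program_storage_bytes(max_instructions, bits_per_instr=BITS_PER_INSTR):
        """On-chip bytes needed to hold a program at the given instruction limit."""
        return max_instructions * bits_per_instr // 8

    for name, limit in [("96-instruction limit", 96),
                        ("512-instruction design", 512),
                        ("NV30 (1024)", 1024)]:
        print(f"{name}: {program_storage_bytes(limit)} bytes")
    ```

    Whatever the exact encoding, the storage scales linearly with the limit, so 1024 slots is roughly an order of magnitude more cache than the minimum spec requires.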

    Speculation: I think a good part of the cost of programmable architectures such as the GF3 is the cache. Because when it's programmable, you've got to cache what to do next, too.
    I'd love to know NV30 & NV17 cache ratio, so we could know the real effect of programmability on it.


    Uttar

    P.S. : I can already imagine people wondering why there isn't more texture cache...
    Well, the reason is simple. You know pretty much for *sure* that you ain't gonna use the same texture info on the other side of the triangle, and keeping useless info isn't optimal.
    Much larger texture cache wouldn't provide a performance benefit AFAIK.
     
  8. Heathen

    Regular

    Joined:
    Jul 6, 2002
    Messages:
    380
    Likes Received:
    0
    Instructions

    Thought the R300 limit was 160 instructions?
     
  9. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Re: Instructions

    Oopsy, my mistake. 96 is for PS2.0, and the R300 is very slightly above spec ( it's 160, as you say )

    But it's slightly more complex than that, too.
    The PS2.0 spec divides the instruction limit into texture & arithmetic, requiring 32 and 64 respectively.

    PS3.0 & the NV30 put it all in one huge pool.

    The R300, however, divides it further. From http://www.beyond3d.com/articles/nv30r300/index.php?p=6#ppp

    Sorry for the mistake. Anyway, Carmack was referring to the R300 instruction limit, so I guess he was talking of 160 instructions.


    Uttar
     
  10. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
    Re: Instructions

    I guess he's wanting for XXXX with unlimited instructions... ;)
     
  11. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Re: Instructions

    Eh, don't we all want that in secret? :D

    I don't see him saying he crossed the 1024 limit of the NV30, however.

    Uttar
     
  12. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    Re: Instructions

    It's 32 texture address ops, 64 vector ops and 64 scalar ops in parallel, but this ability to execute one vector op and one scalar op in parallel is not exposed in D3D, IIRC.
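    That breakdown also accounts exactly for the 160-instruction figure mentioned earlier in the thread:

    ```python
    # R300 pixel shader instruction budget, per the breakdown above.
    texture_address_ops = 32
    vector_ops = 64
    scalar_ops = 64  # co-issued alongside the vector ops; not exposed in D3D

    total = texture_address_ops + vector_ops + scalar_ops
    print(total)  # 160
    ```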

    I remember having read that Glaze3D was supposed to have 24 KiB of texture cache, 16 KiB for even mip levels and lightmaps, and 8 KiB for odd mip levels (or something like that)

    2 KiB is really a bit small for a chip that supports 6 textures per pass.

    btw, GFFX stores PS code in video memory AFAIK. There are no jumps, so access is predictable.
     
  13. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,517
    Likes Received:
    24,424
    Re: Instructions

    He probably got tired of waiting for the NV30 to finish executing it... :wink:
     
  14. MuFu

    MuFu Chief Spastic Baboon
    Veteran

    Joined:
    Jun 12, 2002
    Messages:
    2,258
    Likes Received:
    51
    Location:
    Location, Location with Kirstie Allsopp
    Re: Instructions

    I thought that too - since they doubled the cache going from R200 to RV250 then perhaps that figure refers to the allocation per mapping unit or per pipe.

    MuFu
     
  15. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Re: Instructions

    Hmm, well yeah, with 6 textures and 32-bit textures, it is too small.
    2048/6 = 341

    But then, there are 4 pipes...

    That's 85 bytes... Assuming 32-bit textures, that's 21 pixels.

    Now, that seems too little.
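    The chain of divisions above, as a quick sketch (all figures are the assumptions from this post: 2 KiB of texture cache, 6 simultaneous textures, 4 pipes, 4 bytes per 32-bit texel):

    ```python
    # Reproducing the texture cache arithmetic above.
    cache_bytes = 2048   # assumed 2 KiB total texture cache
    textures = 6         # simultaneous textures per pass
    pipes = 4            # pixel pipelines
    bytes_per_texel = 4  # 32-bit textures

    per_texture = cache_bytes // textures             # 341 bytes
    per_texture_per_pipe = per_texture // pipes       # 85 bytes
    texels = per_texture_per_pipe // bytes_per_texel  # 21 texels
    print(per_texture, per_texture_per_pipe, texels)  # 341 85 21
    ```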

    Triangles are NOT processed strictly on a per-line basis: multiple pixels are processed on a line, then you move to the next line, then to the right, then you again treat a part of the two lines.
    That's because, otherwise, you'd need a texture cache able to fit two full lines. With such a scheme, you save a lot of transistors and barely lose any memory bandwidth.

    But still, that would mean 10 pixels on a line being processed at the same time. That seems insufficient.

    But then again, in most cases, you won't use 6 textures.

    So, well, 30 pixels on a line when using 2 textures seems sufficient. It should give sufficient efficiency.
    And in the case of not using 32-bit textures but something like low quality DXTC at 8 bits per texel, it would be 120 pixels. That's nearly too much! :) Many games use 3 or 4 textures, so with DXTC and that, it should be "okay".

    Something I wonder, too, is whether the hardware can automatically determine how many pixels on a line are processed to get maximum efficiency based on texture cache size. That would matter a lot more. And in the case it can't, which is actually quite likely, it might get really bad ( near zero ) efficiency when using 6 textures...

    But even forgetting that problem, it would be "okay" - not much better.
    If I didn't do any of my calculations wrong, 4KB for the four pipes might very well be fine.
    But then again, more could always give a slight boost to performance. The real question is whether that boost is sufficient to justify the transistor count increase :)

    Another interesting factor is the decreasing size of triangles. I don't think texture cache efficiency is good ( if not automatically nil ) when keeping texture info from another triangle. So, with the decreasing size of polygons, could something like 20 pixels/line be sufficent in most situations?


    Uttar

    EDIT: Sounds like you are right: the GFFX *does* store all of its instructions in Video Memory. Sounds like that's a good reason for NV31 & NV34 to support 1024 instructions too.
    This would indeed prevent Dynamic Branching from working effectively in the PS, I guess. But could Static Branching still work well in the PS using that? I'd guess it could, but I might be wrong.
    But GFFX temp registers are still stored in cache. As are several other things used in shaders. And those things are more expensive than on the R300, because they're FP32 ( yes, although FP32 performance is bad and nVidia is trying to make DX9 drivers use FP16 everywhere, it sounds like they made everything with FP32 in mind - performance probably isn't on par with their expectations... )

    EDIT 2: After rethinking about it, I just don't understand how putting all of that in video memory makes sense...
    Let's imagine each instruction is 45 bits, just like in the case of the VS according to the B3D article. Or rather, let's imagine it is 40 bits, just to be conservative.
    Imagine an average of 20 instructions/pixel, and 1600x1200. All that at 60FPS.
    That's 12GB/s...

    Now, I just don't quite understand how that makes sense. There gotta be a misunderstanding somewhere. Unless nVidia found a way to defy mathematics, too! :D Woah, that's gotta need serious driver tuning.
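    A quick check of that estimate, using exactly the assumptions from this post (40 bits/instruction, 20 instructions/pixel, 1600x1200 at 60 FPS, every instruction re-fetched per pixel):

    ```python
    # Instruction-fetch bandwidth if PS programs were re-read from video memory
    # for every pixel, under the assumptions above.
    bits_per_instr = 40
    instrs_per_pixel = 20
    pixels_per_frame = 1600 * 1200
    fps = 60

    bytes_per_frame = pixels_per_frame * instrs_per_pixel * bits_per_instr / 8
    gb_per_sec = bytes_per_frame * fps / 1e9
    print(round(gb_per_sec, 2))  # 11.52, i.e. roughly the 12 GB/s quoted above
    ```

    So the arithmetic checks out; the implication is that any real implementation must fetch each program once and keep it on chip while rasterizing, not stream it per pixel.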
     
  16. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
    Re: Instructions

    of course, but perhaps Mr Carmack will have it sooner rather than later ;)
     
  17. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    Re: Instructions

    As the resident texture compression advocate, let me remind everyone that DXTC is not 'low quality'.

    In the vast majority of cases DXTC is visually indistinguishable from 32-bit textures, and in most cases the ability to have more textures is far better for image quality than any perceived degradation from conversion to DXTC - as long as the DXTC is applied intelligently, to the right textures (about 80-90% of them will be highly compressible).

    Put it this way: if you had the choice of 128M of 32-bit textures, or 128M of 20% 32-bit and 80% DXTC textures, there's no contest as to which would give better image quality.
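    To put rough numbers on that comparison - assuming the compressed portion is DXT1 at 4 bits per texel (8:1 versus 32-bit RGBA; DXT3/5 at 8 bits would halve the gain), with the 20/80 split from the post above:

    ```python
    # Texel capacity of a 128 MB budget: all 32-bit vs. 20% 32-bit + 80% DXT1.
    budget = 128 * 1024 * 1024  # bytes

    all_32bit_texels = budget / 4                         # 4 bytes per texel
    mixed_texels = 0.2 * budget / 4 + 0.8 * budget / 0.5  # DXT1: 0.5 bytes/texel

    print(mixed_texels / all_32bit_texels)  # roughly 6.6x the texel capacity
    ```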
     
  18. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Re: Instructions

    Okay, okay... Let me rephrase that.
    "And in the case of not using 32-bit textures but something like low per-pixel quality DXTC"
    IMO, DXTC has poor per-pixel quality. Where it shines is that it enables you to use bigger textures.
    And we're talking pixels here, you know :)


    Uttar
     
  19. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    :) I still disagree that DXTC is in any way 'low quality'.

    When I was first plugging S3TC our open challenge was for anyone to bring an image in, we'd compress it and then play spot the difference, with a pint bet that they couldn't. I won a lot of beer from that.

    Even won the 'Tank Girl on a Mandelbrot background' that I was really quite worried about when I first saw the image.

    Far too many people have only seen the results from low-quality DXTC compressors. The compressor is key. If you've got a good one, then once you've applied trilinear filtering you'll never spot the difference except on pathological cases (like the sky in Quake3).
     
  20. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    Interpolating in 24-bit color is also very important... ;)
     