modern GPUs and on-chip caches ?

Discussion in 'Architecture and Products' started by chavvdarrr, May 7, 2003.

  1. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    Assuming single-port SRAM (6 transistors per bit; add 2 extra transistors per bit for dual-port SRAMs) and about 1 transistor overhead per bit, 25 million transistors would be 25Mt/ (7*8 )t/byte = about 436 KiB.
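
    A rough sketch of that arithmetic in Python, using only the per-bit figures quoted above. The function and constant names are made up for illustration, and the 1-transistor-per-bit overhead is the same ballpark guess as in the post, not a measured value.

```python
# Rough SRAM budget: how many bytes fit in a given transistor count.
CELL_TRANSISTORS = 6   # single-port SRAM cell, per bit (figure from the post)
DUAL_PORT_EXTRA = 2    # extra transistors per bit for dual-port SRAM
OVERHEAD_PER_BIT = 1   # ballpark overhead per bit (assumption from the post)


def sram_bytes(transistors, dual_port=False):
    """Whole bytes of SRAM that fit in the given transistor budget."""
    per_bit = CELL_TRANSISTORS + OVERHEAD_PER_BIT
    if dual_port:
        per_bit += DUAL_PORT_EXTRA
    return transistors // (per_bit * 8)


def kib(num_bytes):
    """Convert bytes to KiB (1 KiB = 1024 bytes)."""
    return num_bytes / 1024


if __name__ == "__main__":
    budget = 25_000_000
    print(f"single-port: {kib(sram_bytes(budget)):.0f} KiB")
    print(f"dual-port:   {kib(sram_bytes(budget, dual_port=True)):.0f} KiB")
```

    With dual-port cells the same 25 million transistors buy noticeably less storage, since every bit costs 9 transistors instead of 7.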

    As for things other than texture maps to cache: what about -- framebuffer? Z/Stencil buffers? Vertex arrays? Also, with pixel shaders, you get a large number of pixels in flight * a large number of registers per pixel = a fairly large amount of SRAM as well.
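
    To put a number on the "pixels in flight * registers per pixel" product, here is a minimal sketch. Every figure in it (256 pixels in flight, 12 registers, 4 components of 4 bytes each) is a hypothetical placeholder, not a spec of any real chip.

```python
def register_file_bytes(pixels_in_flight, regs_per_pixel,
                        components=4, bytes_per_component=4):
    """Storage needed to hold every live shader register of every pixel in flight."""
    return pixels_in_flight * regs_per_pixel * components * bytes_per_component


if __name__ == "__main__":
    # Hypothetical example values, chosen only to show the scale of the product.
    size = register_file_bytes(pixels_in_flight=256, regs_per_pixel=12)
    print(f"{size} bytes = {size / 1024:.0f} KiB")
```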
     
  2. ram

    ram
    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    218
    Likes Received:
    0
    Location:
    Switzerland
    Could be. Does anyone have some facts about the transistor density of SRAM cells vs. logic gates?
     
  3. Hyp-X

    Hyp-X Irregular
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,170
    Likes Received:
    5
    Indexed primitives use a vertex buffer holding the vertices, and an index buffer containing indices that reference those vertices.
    During drawing, a vertex can be referenced multiple times (expect around 6 times), so it makes sense to keep a cache for them.
    This is especially important when the vertex buffer is in AGP memory, which is quite slow.
    Hence the pre-transformed vertex cache.

    Because the vertices are transformed on demand, those 6 occurrences of a vertex can mean that it gets transformed multiple times just to reach the same result.
    Hence the post-transformed vertex cache.
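
    The effect of a post-transform cache can be sketched with a small simulation. This assumes a simple FIFO replacement policy and a regular grid mesh; both are illustrative choices, and `count_transforms` and `grid_indices` are hypothetical helper names rather than anything from a real driver or API.

```python
from collections import deque


def count_transforms(indices, cache_size):
    """Count vertex transforms when a FIFO cache holds the last transformed vertices."""
    cache = deque(maxlen=cache_size)  # maxlen=0 behaves like having no cache
    transforms = 0
    for idx in indices:
        if idx in cache:
            continue            # hit: reuse the already transformed vertex
        transforms += 1         # miss: run the vertex through the transform
        cache.append(idx)       # the oldest entry drops out when the cache is full
    return transforms


def grid_indices(w, h):
    """Index buffer (triangle list) for a grid of w x h quads, row-major vertices."""
    out = []
    for y in range(h):
        for x in range(w):
            v0 = y * (w + 1) + x   # top-left corner of this quad
            v1 = v0 + 1            # top-right
            v2 = v0 + (w + 1)      # bottom-left
            v3 = v2 + 1            # bottom-right
            out += [v0, v1, v2, v1, v3, v2]
    return out


if __name__ == "__main__":
    indices = grid_indices(8, 8)
    print("indices:", len(indices), "unique vertices:", len(set(indices)))
    for size in (0, 4, 16, 1000):
        print(f"cache size {size:4d}: {count_transforms(indices, size)} transforms")
```

    In this grid each interior vertex is shared by 6 triangles, which matches the "around 6 times" figure. Without a cache every index costs a transform; with a large enough cache each vertex is transformed only once.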
     
  4. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    Color and depth caches are used to allow memory prefetching. This is important because the latency to local memory is a number of clocks: 25, 50, 100, etc., depending on the implementation.

    The values that are needed are requested and space is allocated in the cache. In the meantime, data in the pipeline is stored in fifos. By the time the pipeline data exits the fifos, the requested data from memory is waiting in the cache.

    This is why memory latency is relatively unimportant for graphics chips. Within reason, of course, because fifos do cost gates.
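
    The trade-off between fifo depth and latency can be sketched with a very idealized model: if only as many requests can be outstanding as the fifo has entries, throughput is capped at depth / latency. This ignores bandwidth limits and everything else a real design has to deal with, and the function names are made up for illustration.

```python
import math


def required_depth(latency_clocks, issue_rate=1.0):
    """Fifo entries needed to cover the memory latency at the given items per clock."""
    return math.ceil(latency_clocks * issue_rate)


def sustained_throughput(fifo_depth, latency_clocks, issue_rate=1.0):
    """Items per clock when at most fifo_depth requests can be in flight."""
    return min(issue_rate, fifo_depth / latency_clocks)


if __name__ == "__main__":
    for latency in (25, 50, 100):
        print(f"latency {latency:3d} clocks -> depth {required_depth(latency)} entries")
    # A fifo half as deep as the latency keeps the pipeline only half busy.
    print(sustained_throughput(fifo_depth=50, latency_clocks=100))
```

    Under this model, doubling the latency doubles the fifo storage needed to stay at full rate, which is the "fifos do cost gates" part.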
     
  5. asicnewbie

    Newcomer

    Joined:
    Jun 29, 2002
    Messages:
    116
    Likes Received:
    3
    But the funny thing is, I bet it's like an 'open secret.' Embedded RAMs, whether they be SRAM, 1T-SRAM, eDRAM, or flash, present *unmistakable* visual signatures on die photographs. Even an untrained observer (like myself) could easily identify a RAM block (larger than 1024 bytes) on a blow-up die photograph, thanks to its rectangular shape and regular (repetitive) structure. Once identified, it's a simple matter of taking a ruler or other straight-edge and tabulating the size of all such arrays. From here, one could deduce a very good estimate of the RAM's transistor count based on known public info, like the process lithography size and RAM technology (SRAM, 1T-SRAM, eDRAM.) So basically, companies who include die photographs in their PR kit are tacitly giving away this 'secret.'
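
    The ruler method boils down to dividing the measured array area by the area of one bit cell. A minimal sketch follows; the cell area and array efficiency in the example are placeholder assumptions for illustration, not figures from any design kit or datasheet.

```python
def estimate_sram_bits(width_mm, height_mm, cell_area_um2, array_efficiency):
    """Estimate bits in a RAM block from its measured size on a die photo.

    array_efficiency is the fraction of the block covered by bit cells
    (the rest being decoders, sense amps, and so on).
    """
    area_um2 = width_mm * height_mm * 1_000_000  # 1 mm^2 = 1e6 um^2
    return area_um2 * array_efficiency / cell_area_um2


if __name__ == "__main__":
    # Hypothetical measurement: a 2 mm x 1 mm block, with an assumed
    # 2.5 um^2 bit cell and half of the block area being actual cells.
    bits = estimate_sram_bits(2.0, 1.0, cell_area_um2=2.5, array_efficiency=0.5)
    print(f"about {bits / 8 / 1024:.0f} KiB, or {bits * 6 / 1e6:.1f}M transistors at 6T per bit")
```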

    The core (digital-logic) datapath will NOT have the same kind of regular structure. And analog/mixed-signal circuits (PLLs, RAMDACs) have very large feature sizes that easily rule them out as anything except analog blocks.

    If NVidia's using the standard-cell Artisan library (given to TSMC customers free of charge), the Artisan library includes an SRAM memory-macro compiler. The macro-compiler auto-generates EDA/tool-views for layout and routing. All the info needed to answer your question is in the design kit, but unfortunately the customer must sign an NDA to acquire the design kit (even though it doesn't cost anything.) The memory-compiler for 0.18u (which I've used) can generate a variety of SRAM configs - single-port (R/W), two-port (R + R/W), and true dual-port (R/W + R/W.)

    Asking about logic gate size is like asking about the length of an x86-instruction ... depends on the gate's function (OR, NOT, AND, etc.) and its drive-strength (X1, X2, X4, X8, etc.) And finally, standard-cell gates are laid out on a 'grid', so the gate's active area can be somewhat smaller than the grid's granularity. ATI has already said they use custom-design techniques in their digital-logic synthesis, meaning speed (or area) critical blocks are hand-designed (allowing the designer to defeat the grid restrictions.)
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    There are companies who sell reverse-engineered info on "popular" ICs ... I am pretty sure I have seen such info offered for NVIDIA chips, with some global info for free as a teaser, although I don't remember where.

    Anyway, competitors know as much about the chip as they are willing to pay for.
     
  7. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    It only takes a few hundred dollars (plus a chip) to take it to your local FA test lab and have it decapped. Then, as asicnewbie says, it's all a matter of measuring the thing. With the new bare-die flip-chip packaging, you might not even have to decap it (since it's not embedded in epoxy). Just pry the thing loose from the substrate/package and you can see it all.
     
  8. shaderman

    Newcomer

    Joined:
    Jan 3, 2003
    Messages:
    19
    Likes Received:
    0
    add one for SCAN.

    - SM
     
  9. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    Nope. Scan applies to D flip-flops, not SRAM cells - there is a rather big difference between them. Scan costs about 6-8 transistors per bit. For SRAMs, you usually need to add test logic that is rather different from standard scan.
     
  10. UberLord

    Newcomer

    Joined:
    Nov 21, 2002
    Messages:
    54
    Likes Received:
    0
    Thanks for those answers guys - I think I know what you both meant :oops:
     
  11. shaderman

    Newcomer

    Joined:
    Jan 3, 2003
    Messages:
    19
    Likes Received:
    0
    duh :)
     