Interesting question about the L1 texture cache size of G200 used in David's review

Discussion in 'Architecture and Products' started by littlehead, Sep 16, 2008.

  1. littlehead

    Newcomer

    Joined:
    Jun 24, 2008
    Messages:
    4
    Likes Received:
    0
    David mentioned that the L1 texture cache is 8 KB per TPC, so the total L1 on the chip should be 8 KB * 8 on G80 or 8 KB * 10 on G200.
    OK. If the L1 is private, as is usually true on the CPU side, then the maximum L1 capacity a single pixel-shading thread can access is 8 KB.
    So I used a pointer-chasing approach to verify this. The data looks quite interesting: the large L1 miss latency first appears just past 20 KB.
    The thread loops over a given working set, and latency rises sharply once the working set is larger than the cache.
    The data:
    Working set    Latency
    8192 bytes     0.174301
    12288 bytes    0.180389
    16384 bytes    0.187164
    20480 bytes    0.289429
    24576 bytes    58.757538
    28672 bytes    58.551575
    32768 bytes    59.707016
    36864 bytes    59.930389
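A minimal sketch of how the jump in these numbers could be located programmatically. The measurements are copied from the post; the function name `find_knee` and the 10x threshold are illustrative assumptions, and the latency unit is not stated in the post, so the values are treated as unitless.

```python
# (working-set size in bytes, reported latency), copied from the post.
measurements = [
    (8192, 0.174301),
    (12288, 0.180389),
    (16384, 0.187164),
    (20480, 0.289429),
    (24576, 58.757538),
    (28672, 58.551575),
    (32768, 59.707016),
    (36864, 59.930389),
]


def find_knee(samples, ratio=10.0):
    """Return (last_fast_size, first_slow_size) around the first latency jump.

    `ratio` is an arbitrary illustrative threshold: a step counts as a jump
    when latency grows by at least this factor between adjacent samples.
    """
    for (prev_size, prev_lat), (size, lat) in zip(samples, samples[1:]):
        if lat / prev_lat >= ratio:
            return prev_size, size
    return None  # no jump found in the sampled range


print(find_knee(measurements))  # (20480, 24576)
```

With 4 KB steps between samples, this only brackets the capacity: the cache appears to hold at least 20480 bytes, and a 24576-byte working set no longer fits.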


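    Here's the shader code:
    // Pointer chasing: the result of each fetch is used as the coordinate of the next fetch.
    ps.3.0;
    dcl_texcoord0 v0.xy;
    dcl_2d s0;
    dcl_2d s1;
    dcl_2d s2;
    dcl_2d s3;
    mov r1.xy, v0.xy;
    // Three nested loops; the iteration counts come from integer constants i0, i1, i2.
    rep i0;
    rep i1;
    rep i2;
    // Four dependent fetches per iteration: r1 is both the destination and the coordinate.
    texld r1.rgba , r1, s0;
    texld r1.rgba , r1, s1;
    texld r1.rgba , r1, s2;
    texld r1.rgba , r1, s3;
    endrep;
    endrep;
    endrep;
    // Write the final value to the output so the fetch chain is actually consumed.
    nrm r2, r1;
    mov oC0.rgba ,r2 ;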
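The post does not show how the textures were filled, so the following is only a CPU-side sketch of the pointer-chasing idea, not the poster's actual setup. It assumes each element stores the index of the next element to read, so every read depends on the previous one. The names `build_chain` and `walk`, the 16-byte texel size, and the stride of 7 are all hypothetical.

```python
from math import gcd


def build_chain(num_elements, stride):
    """Build next_index so that following it from 0 visits every element
    exactly once before returning to 0.

    Requires gcd(stride, num_elements) == 1, otherwise the chain would
    close early and cover only part of the working set.
    """
    if gcd(stride, num_elements) != 1:
        raise ValueError("stride must be coprime with num_elements")
    return [(i + stride) % num_elements for i in range(num_elements)]


def walk(next_index, steps, start=0):
    """Follow the chain for `steps` dependent reads; return the visited indices."""
    visited = []
    idx = start
    for _ in range(steps):
        idx = next_index[idx]  # the next read cannot start until this one finishes
        visited.append(idx)
    return visited


# Hypothetical sizing: an 8192-byte working set of 16-byte texels.
BYTES_PER_TEXEL = 16
num_texels = 8192 // BYTES_PER_TEXEL  # 512
chain = build_chain(num_texels, stride=7)
path = walk(chain, num_texels)
print(len(set(path)), path[-1])  # 512 0
```

Because every element of the working set is touched once per lap, repeating the walk keeps hitting the cache only while the whole set fits, which is the effect the measurement relies on.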
     
  2. Farhan

    Newcomer

    Joined:
    May 19, 2005
    Messages:
    152
    Likes Received:
    13
    Location:
    in the shade
    I think dkanter says it's 8KB per SM, or 24KB per TCP, which is correct as I understand it.
     
  3. littlehead

    Newcomer

    Joined:
    Jun 24, 2008
    Messages:
    4
    Likes Received:
    0
    It's TPC

    Based on the diagram drawn by David,
    the 8 KB L1 texture cache sits directly below the SM controller, which accepts requests from SM[0], SM[1], and SM[2]. So the 8 KB is shared by 3 SMs and wouldn't be 24 KB.
    I raised the question because 8 KB per cluster seems reasonable, but the data indicates otherwise. That's why I'm confused. :sad:
    Or did David draw the diagram wrong?
     
  4. Farhan

    Newcomer

    Joined:
    May 19, 2005
    Messages:
    152
    Likes Received:
    13
    Location:
    in the shade
    Have you looked at the updated diagram?
     
  5. dkanter

    Regular

    Joined:
    Jan 19, 2008
    Messages:
    360
    Likes Received:
    20
    To clarify - my first diagram was wrong and had each L1 Texture cache as 8KB shared by 2 or 3 SMs.

    That has now been corrected. Perhaps I should keep a change log in my articles, but I don't (yet).

    The L1 tex caches are located in the tex pipeline in each TPC. However, 8KB is allocated to each SM (statically). So G80 has 16KB per TPC (2 SMs), GT200 has 24KB per TPC (3 SMs).

    David
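As a rough check of this arithmetic against the measurements earlier in the thread, here is a small sketch. The SM counts per TPC are the ones implied by the 16KB and 24KB figures; the names `l1_per_tpc` and `brackets` are illustrative, and treating a capacity exactly equal to the first slow working set as "consistent" is an assumption, since a working set the same size as the cache may or may not fit in practice.

```python
KB = 1024
L1_PER_SM = 8 * KB                     # static allocation per SM, per the post
SMS_PER_TPC = {"G80": 2, "GT200": 3}   # implied by the 16KB / 24KB totals

# From the measurements earlier in the thread: 20480 bytes was still fast,
# 24576 bytes was the first slow working set.
LAST_FAST, FIRST_SLOW = 20480, 24576


def l1_per_tpc(chip):
    """Per-TPC L1 texture cache capacity in bytes."""
    return L1_PER_SM * SMS_PER_TPC[chip]


def brackets(capacity):
    """True if the capacity lies between the last fast and first slow sizes."""
    return LAST_FAST <= capacity <= FIRST_SLOW


for chip in SMS_PER_TPC:
    cap = l1_per_tpc(chip)
    print(chip, cap, brackets(cap))
# G80 16384 False
# GT200 24576 True
```

Under these assumptions, the 24KB figure is compatible with the measured jump, while a 16KB cache would not explain the fast result at 20480 bytes.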
     
  6. littlehead

    Newcomer

    Joined:
    Jun 24, 2008
    Messages:
    4
    Likes Received:
    0
    Why 8K per SM

    Why 8 KB per SM? So that it can be cut down on low-end GPUs of the same series?
    Based on my measurements, it seems G80 and G200 both use 20 KB per TPC. There's no L2 cache on low-end cards like the 8500GT.
    Correct me if I'm wrong.
     
  7. littlehead

    Newcomer

    Joined:
    Jun 24, 2008
    Messages:
    4
    Likes Received:
    0
    And the L2 cache size I measured for G80 is 192 KB.
     