Unified Shader Architecture: Point sampling in addition to Bilinear Texturing

Discussion in 'Architecture and Products' started by Jawed, Apr 29, 2006.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,493
    Likes Received:
    1,853
    Location:
    London
    In Xenos there's point-sampling functionality (primarily aimed at vertex fetching) in addition to the normal bilinear (or better) texture fetching/filtering.

    Since it's a unified architecture, these point-sampling units are available to any shader concurrently with the bilinear texturing units (erm, I presume they are!). What I'm wondering is, what's the impact going to be on the performance of pixel shading?

    It's my understanding that there's generally a fair amount of point-sampled texturing used in pixel shaders, to perform "look-ups". At the same time (not being a dev) I don't know the degree to which point sampling is typically used.

    So, will the ability of a pixel shader to perform bilinear (or better) texturing concurrently with point-sampled texturing make a significant performance difference? If there are 16 TMUs and 16 TPSs available (texture point samplers, for want of a better abbreviation - perhaps I should just stick to VTF, vertex texture fetch; even if it's a bit confusing because of the context), will this make a significant difference to the performance of pixel shading?

    One caveat, as I understand it from Xenos, is that the point-sampling units are really meant for 1D access. I don't know why this constraint exists, or whether it's enforced in any way - it might simply be about performance (e.g. half speed address calculation for 2D textures). I don't know whether it's reasonable to expect this constraint to carry over to future USAs, such as R600.

    Jawed
     
  2. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,342
    Likes Received:
    175
    Location:
    On the path to wisdom
    I doubt it's what I would call "a fair amount". Linear interpolation is often better than none at all. The most important exceptions are:
    - lookup textures that contain indices
    - shadow mapping on hardware without PCF
    - custom filter kernels (although some can take advantage of bilinear filtering)
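A minimal sketch of the first exception in the list above, assuming a hypothetical 1D lookup texture whose texels are indices. The helper names (`point_sample`, `linear_sample`) and the texel values are made up for illustration and are not any real API; the point is only that filtering between two indices produces a value that is not a valid index.

```python
import math

def point_sample(texels, u):
    """Nearest-texel fetch from a 1D texture, with u in [0, 1)."""
    n = len(texels)
    return texels[min(int(u * n), n - 1)]

def linear_sample(texels, u):
    """Linear filter between the two nearest texel centres, clamped at the edges."""
    n = len(texels)
    x = u * n - 0.5              # position measured in texel-centre units
    i0 = math.floor(x)           # left neighbour
    frac = x - i0                # blend weight towards the right neighbour
    a = texels[min(max(i0, 0), n - 1)]
    b = texels[min(max(i0 + 1, 0), n - 1)]
    return a * (1.0 - frac) + b * frac

# Hypothetical lookup texture: each texel is an index into some other table.
index_texture = [0, 7, 2, 5]
u = 0.5  # falls on the boundary between texel 1 and texel 2

nearest = point_sample(index_texture, u)    # one of the stored indices
filtered = linear_sample(index_texture, u)  # a blend of 7 and 2
print(nearest, filtered)
```

The filtered result sits halfway between two unrelated indices, which is why such textures want point sampling.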

    As for a significant performance difference: likely not for the majority of shaders. Probably only in the shadow-mapping-with-custom-filter case, which can be quite important of course.

    If the 1D-access constraint is true, they are really simple memory load units (with format conversion, and possibly wrap). Far cheaper than a full TMU, but also far less useful.
     
  3. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Shadow mapping is definitely something I was thinking about when reading about this feature in the B3D Xenos article. Imagine 8-16 jittered samples of a high-precision format, especially if selectively sampled near edges via shadowmap post-processing and dynamic branching during scene rendering. You could get very nice shadows, and your filtered texture units would be free for other uses.
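A rough CPU-side sketch of the jittered-sample idea, assuming a shadow map stored as a 2D list of depths and unfiltered (point) fetches only. The function name, the `radius` and `bias` parameters and the tiny test map are all hypothetical, and nothing here reflects how Xenos actually exposes its fetch units.

```python
import random

def jittered_shadow(shadow_map, x, y, receiver_depth,
                    samples=16, radius=1.5, bias=0.001, seed=0):
    """Fraction of jittered point samples in which the receiver is lit."""
    rng = random.Random(seed)    # fixed seed so the sketch is repeatable
    height, width = len(shadow_map), len(shadow_map[0])
    lit = 0
    for _ in range(samples):
        # Jitter the lookup position, then snap to the nearest texel (point sampling).
        sx = int(round(x + rng.uniform(-radius, radius)))
        sy = int(round(y + rng.uniform(-radius, radius)))
        sx = min(max(sx, 0), width - 1)
        sy = min(max(sy, 0), height - 1)
        # Depth comparison done "in the shader" rather than by fixed-function PCF.
        if receiver_depth - bias <= shadow_map[sy][sx]:
            lit += 1
    return lit / samples

# Hypothetical 8x8 map: an occluder at depth 0.2 covers the left half.
shadow_map = [[0.2] * 4 + [1.0] * 4 for _ in range(8)]
print(jittered_shadow(shadow_map, 3.5, 4, 0.5))
```

Near the occluder edge the result falls between 0 and 1, giving a soft transition; away from the edge every sample agrees, which is where skipping the extra samples via branching would pay off.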

    However, I heard that the point samplers have a different cache structure. So Jawed, you may be right about the performance implications.

    While on this topic, does anyone know Xenos' filtering abilities? FP10? FP16? I16?

    EDIT: God, I'm sorry Jawed. I knew it was you, just wrote it wrong.
     
    #3 Mintmaster, Apr 29, 2006
    Last edited by a moderator: Apr 29, 2006
  4. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Pretty sure it's capable of FP10 and FP16, not sure about I16.

    edit: Yeah, it's capable of all 3 ;)
     
  5. Jawed

    Hey! it's Jawed, not Jaws!

    Jawed
     
  6. Mintmaster

    Dude, Xenos supports 32-bit fixed point filtering? Judging by how FP16 is converted to 16.16 and then filtered, it certainly seems like it. That's awesome.

    From the sounds of it, they have special 8-bit filtering units that are capable of chaining together. The filtering weights are probably only 8 bits at most, so TMUs will need to multiply an 8-bit number by a 32-bit number, which I think can be done with four 8-bit MADD operations. Adding the four weighted samples together can also be pipelined this way if your adders have the necessary carry ports.
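The arithmetic behind that guess can be sketched as follows: split the 32-bit sample into four bytes, multiply each by the 8-bit weight, and accumulate the partial products at the right bit offsets. This only checks the maths; the function names are hypothetical and the actual Xenos datapath is not described anywhere in this thread.

```python
def mul_8x32(weight, value):
    """Multiply an 8-bit weight by a 32-bit value using four 8x8 multiply-adds."""
    if not (0 <= weight <= 0xFF and 0 <= value <= 0xFFFFFFFF):
        raise ValueError("weight must fit in 8 bits and value in 32 bits")
    acc = 0
    for lane in range(4):
        byte = (value >> (8 * lane)) & 0xFF   # one byte lane of the sample
        acc += (weight * byte) << (8 * lane)  # 8x8 product, shifted into place
    return acc

def filter_four(samples, weights):
    """Weighted sum of four 32-bit samples; the 8-bit weights must sum to 256."""
    total = sum(mul_8x32(w, s) for w, s in zip(weights, samples))
    return total >> 8                          # drop the 8 fractional weight bits

print(hex(mul_8x32(200, 0xDEADBEEF)))
print(hex(filter_four([0x10000, 0x30000, 0x10000, 0x30000], [64, 64, 64, 64])))
```

Each loop iteration corresponds to one pass through an 8-bit unit, so a quarter-speed (or quarter-width) 32-bit filter falls out of the same hardware, assuming the carries can be chained.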

    Why didn't they put this in the PC cards? Man, I would love to get my hands on an XB360 for coding. Too bad the dev kits cost an arm and a leg.

    BTW, apologies again for the name mixup.
     
  7. Jawed

    How much of this stuff is coming to D3D10? Perhaps you'll get a chance to play soon.

    Jawed
     
  8. Mintmaster

    I'm not too familiar with DX10 (I think JHoxley is the resident expert on that), but I'm hoping a good chunk of these features will be at least in ATI's DX10 parts. They will be leveraging Xenos tech, after all.

    I'm planning on getting an X1600 Pro to tide me over until then. Won't be much of a performance boost, if at all, over my 9800 Pro, but in terms of coding there's a lot of stuff I can do in the meantime.
     
  9. Razor1

    True, it pretty much is DX10 :wink:
     
  10. Jawed

    Some more stuff, which seems to be along precisely the lines you were talking about:

    [IMG]

    [IMG]

    [IMG]

    Jawed
     
  11. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    I haven't read up much on Xenos. But from the text above, I don't see anything saying that bilinear and point sampling units are available in parallel. I'd say the most likely case is that it's the same unit, but that it runs faster with point sampling. (Because it removes some arithmetic and internal bandwidth bottlenecks.)
     
  12. Jawed

    I think this is fairly explicit:

    I've emboldened the relevant bits.

    Jawed
     
  13. Mintmaster

    Actually I was just talking about how the hardware implementation of 32-bit fixed point filtering can be very cheap, provided you only need either a quarter the channels or a quarter the speed of normal filtering. Why didn't they do this for the PC parts? :sad:

    For shadow mapping I was talking about the point samplers, but those slides are referring to the filtered samplers. Weights are nice for fast bilinear PCF, but that's a rather crappy way of doing shadow mapping given recent developments. Does Xenos support Fetch4 as well?

    Getting access to the mip-level is neat, and quite useful. The fractional part there is probably the 3rd weight for blending between mipmaps that's used in trilinear filtering. I don't know how to calculate this in a pixel shader. Anyone know how?
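For what it's worth, the textbook way to get that number is from the screen-space derivatives of the texture coordinates (what `ddx`/`ddy` give you in HLSL on hardware that supports them). Here is a sketch of the usual isotropic formula on the CPU, with a hypothetical function name; real hardware may use approximations or account for anisotropy, so treat it as an assumption rather than what Xenos does.

```python
import math

def mip_lod(duv_dx, duv_dy, tex_size):
    """Return (mip level, trilinear blend fraction) from UV derivatives.

    duv_dx, duv_dy: change in (u, v) per one-pixel step in screen x and y.
    tex_size: (width, height) of mip level 0 in texels.
    """
    width, height = tex_size
    # Footprint of one pixel measured in level-0 texels along each screen axis.
    len_x = math.hypot(duv_dx[0] * width, duv_dx[1] * height)
    len_y = math.hypot(duv_dy[0] * width, duv_dy[1] * height)
    rho = max(len_x, len_y)
    lod = math.log2(rho) if rho > 1.0 else 0.0   # clamp magnification to level 0
    level = int(math.floor(lod))
    frac = lod - level                           # weight towards level + 1
    return level, frac

# A 256x256 texture where each pixel step covers 3 texels in u and in v.
print(mip_lod((3 / 256, 0.0), (0.0, 3 / 256), (256, 256)))
```

The integer part selects the finer of the two mip levels and the fractional part is the third weight used to blend towards the next coarser one.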

    In the last slide, it's rather odd they chose to divide by invTexSize rather than just multiply by TexSize. Maybe to save constants? Bah, I'm nitpicking.
     
    #13 Mintmaster, Apr 29, 2006
    Last edited by a moderator: Apr 29, 2006
  14. Jawed

    It seemed to me that the second slide uses point sampling on the tfetch2D instruction :???: If it does, I don't know which texture unit (TMU or VTF) would execute it, though...

    As for Fetch4:

    which seems to imply no, it's completely manual.

    Jawed
     
  15. Mintmaster

    Interesting. So it looks like instead of hardwiring the way an index buffer is used to retrieve vertex data, Xenos exposes it in the shader. Cool.

    Chances are it's just as you speculated: A 1D array lookup. I still think it could be used for fetching shadow map samples if you were clever, leaving the filtered samplers free for everything else. Not sure how cache friendly it would be, but it seems reasonable at first glance.
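One way to picture the "if you were clever" part, assuming the shadow map sits in memory as a plain row-major array (an assumption; tiled layouts would change the address maths): compute the 1D address yourself and fetch through the array-style path. Names and sizes below are hypothetical.

```python
def fetch_texel(buffer, width, x, y):
    """Fetch texel (x, y) from a row-major 2D image stored as a flat 1D array."""
    return buffer[y * width + x]

width, height = 8, 4
# Hypothetical depth buffer whose values equal their own linear address.
depth_buffer = [float(y * width + x) for y in range(height) for x in range(width)]

# Distance in elements between neighbouring texels in each direction.
horizontal_stride = 1
vertical_stride = width

print(fetch_texel(depth_buffer, width, 3, 2))
print(horizontal_stride, vertical_stride)
```

Horizontal neighbours are adjacent in memory while vertical neighbours are a full row apart, which is the cache-friendliness worry for a 2D sampling pattern pushed through a 1D fetch.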

    Oh well. Like I said before, PCF doesn't look that good anyway. I don't know of any other major uses.
     
  16. Basic

    Are you sure that they shouldn't be matched as:
    I.e. that those bolded lines are kind of the equivalent for VS and PS.
    And that both happen outside the unified part of the shader, aren't controlled from inside the shader, and the result of them just appears in the input registers of the respective shader. Thus, vertex textures and non filtered textures run through the 16 bilinear TMUs, but with filtering turned off.

    Another argument is that the lines explicitly talk about "vertex data" and "pixel shader", which doesn't fit in the unified shader architecture. So it's likely they describe something outside it.
     
  17. Mintmaster

    One more question I have about Xenos is regarding multisampling. Can you control the sampling positions? And can you get access to the unresolved MSAA buffer? If you could revert to a square grid, you'd get pseudo-high-res rendering for free. Great for shadow maps.

    I'm guessing no because there would have to be some synchronization with the eDRAM for it to do the Z interpolation. Not a deal breaker, but the eDRAM logic is pretty basic.
     
  18. Jawed

    Note the separate caches and that the texture pipes (TP) only process normal textures:

    [IMG]

    I'm thinking I shouldn't be calling it vertex texture fetch - argh, that implies something else. I should have just called it vertex fetch. Sorry Basic, I think that's the source of the confusion.

    Jawed
     
    #18 Jawed, Apr 30, 2006
    Last edited by a moderator: Apr 30, 2006
  19. Mintmaster

    I don't think so, because the second line you put in bold simply refers to the iterators. They just interpolate between already calculated values.

    What do you think of my theory above?



    Well, this is what's in the B3D article:
     
  20. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Yes, as the article states, they were described to me as independent units, both available simultaneously, if needed (it's perfectly feasible to do vertex texture lookups on one shader array while doing filtered texturing for PS in another). It's described as "Vertex Fetch Units" for want of a better description really; I think this is for any type of fixed point float texture sampling really.

    WRT the point on shadowmap sampling, though, isn't this done in a separate pass? [Edit: scratch that; generation of will be, sampling of won't].
     