A primer on the X360 shader ALUs as I understand it

Discussion in 'Console Technology' started by superguy, Mar 21, 2006.

  1. Zeross

    Regular

    Joined:
    Jun 3, 2002
    Messages:
    280
    Likes Received:
    11
    Location:
    France
    It's a way of organizing your data in memory before you send it to the SIMD unit. Look at this example to understand it better:

    Code:
    //AOS style: one struct per vertex, laid out x y z w x y z w ...
    
    struct coordinate
    {
         float x;
         float y;
         float z;
         float w;
    };
    struct coordinate vertices[NUMBER_OF_VERTICES];

    Code:
    //SOA style: one array per component, laid out x x x ... y y y ...
    
    struct coordinate
    {
        float x[NUMBER_OF_VERTICES];
        float y[NUMBER_OF_VERTICES];
        float z[NUMBER_OF_VERTICES];
        float w[NUMBER_OF_VERTICES];
    };
    
    struct coordinate vertices;
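    As you can see, AOS is more intuitive, but SOA is typically more efficient on SIMD hardware because it involves only vertical ops (the same operation applied lane by lane across four different vertices) and no horizontal ops (operations across the lanes of a single register, such as summing the products of a dot product).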
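
    To make the vertical/horizontal distinction concrete, here is a minimal plain-C sketch that computes a 3-component dot product of every vertex with a direction vector in both layouts. It uses no real VMX intrinsics: each 4-iteration inner loop merely stands in for one 4-wide SIMD instruction. The names vec4_aos, vertices_soa, dot3_aos, dot3_soa and NUM_VERTS are hypothetical, and NUM_VERTS is assumed to be a multiple of 4.

```c
#include <stddef.h>

/* Hypothetical size for illustration; must be a multiple of 4. */
#define NUM_VERTS 8

/* AOS: one struct per vertex, laid out x y z w x y z w ... */
typedef struct { float x, y, z, w; } vec4_aos;

/* SOA: one array per component, laid out x x x ... y y y ... */
typedef struct {
    float x[NUM_VERTS];
    float y[NUM_VERTS];
    float z[NUM_VERTS];
    float w[NUM_VERTS];
} vertices_soa;

/* AOS: each vertex occupies one 4-wide register, so after the lane-wise
   multiply the products must be summed across that register (horizontal). */
void dot3_aos(const vec4_aos *v, size_t n, const vec4_aos *l, float *out)
{
    for (size_t i = 0; i < n; ++i) {
        float p[4] = { v[i].x * l->x, v[i].y * l->y, v[i].z * l->z, 0.0f }; /* vertical multiply */
        out[i] = p[0] + p[1] + p[2] + p[3];                                   /* horizontal sum */
    }
}

/* SOA: each 4-wide "register" holds one component of 4 different vertices,
   so every step is the same op in all 4 lanes (vertical only). */
void dot3_soa(const vertices_soa *v, const vec4_aos *l, float *out)
{
    for (size_t i = 0; i < NUM_VERTS; i += 4) {
        float acc[4];
        for (size_t k = 0; k < 4; ++k) acc[k]  = v->x[i + k] * l->x; /* like a vector multiply */
        for (size_t k = 0; k < 4; ++k) acc[k] += v->y[i + k] * l->y; /* like a vector multiply-add */
        for (size_t k = 0; k < 4; ++k) acc[k] += v->z[i + k] * l->z; /* like a vector multiply-add */
        for (size_t k = 0; k < 4; ++k) out[i + k] = acc[k];          /* 4 results, no lane shuffling */
    }
}
```

    The AOS version has to sum across one register for every vertex, while the SOA version issues the same multiply or multiply-add in all four lanes and never moves data between lanes.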

    More likely the tessellation engine inside Xenos, but memexport can also be helpful for other things.
     
  2. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    thank you ;)
     
  3. Asher

    Regular

    Joined:
    Jul 1, 2005
    Messages:
    972
    Likes Received:
    8
    Location:
    Calgary, Alberta
    SOA = Struct of Arrays
    AOS = Array of Structs
     
  4. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,984
    Likes Received:
    846
    Location:
    Planet Earth.
    Debating an off-topic subject is not acceptable; proving it wrong in two lines is fine, but it's best to STAY ON TOPIC (or very close to it).
    This thread is good, I intend it to stay at this level of quality.

    [edit]
    Looks like I skipped two nasty posts in my first clean-up, sorry for that, people ^^
    (It might not have gone bad again otherwise <wishful thinking>)
     
  5. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Can someone give a rough, order-of-magnitude estimate of the performance difference we can expect between something like 4 VMX units (2 cores on this job, with the remaining one busy with other kinds of code) and the current number of SPEs used for this job?
     
    #85 liolio, Mar 23, 2006
    Last edited by a moderator: Mar 23, 2006
  6. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    There are 3 VMX units on the Xbox 360, not 6. Btw, what's the current number of SPEs used for this (?) job? You can use as many SPEs as you like; whether that makes sense is another matter.
     
  7. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Don't worry, you're absolutely right.

    I've never given the impression that I am, have I? My reliance on calculations and comparisons should tell you I'm making educated guesses.

    I worked at ATI a while ago, and later at a games startup (PC only at the time), but that was tragically cut short due to various factors. Very smart people doing very good work, and when the CEO was forced to work for an already established dev, he saw their advanced work mirror our 3D engine. Sigh...

    Now I'm in a very different field, but still love 3D graphics.
     
  8. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Indeed. You need a pretty thorough background in 3D graphics, both in software and hardware, to be able to do any meaningful comparison of GPUs, especially when they're so radically different.
     
  9. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Sorry for my poor English.
    OK, I'll try to do better.
    For that part of your answer:

    I was talking about the quite math-heavy calculations that Fafalada and you were discussing earlier.
    My question is more like: can this be done using Xenon's VMX units nearly as effectively as it is done by the SPEs? (I don't know if this is any clearer, lol.)

    For this:
    I've read the old Ars Technica article on Xenon. It seems optimized for AOS but can do SOA too, though AOS seems to make better use of its large number of registers. Which CPU is Fafalada talking about?

    For this part:
    I understand that Cell can use all of its SPEs for these kinds of calculations because the PPE is there, but I guess there is no need, and the SPEs do quite well.
    Anyway, I don't want you to waste your time; my knowledge of matrix math is old and poor. A simple answer will be good enough ;)
    Anyway, thanks for your first response ;)
     
    #89 liolio, Mar 23, 2006
    Last edited by a moderator: Mar 24, 2006
  10. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Deleting batches is something the CPU should do anyway, so that doesn't seem to be related to the VS to me. Except for rigid objects, all culling must be done post transform, so I don't see how you can save GPU cycles.

    Well, on second thought, I guess rigid transforms are pretty common for non-character batches, so you have a point. It could be as simple as a subtract and a dot product per triangle. Still, once you take into account all the index list manipulation, compares, etc., it would really surprise me if you could exceed RSX's setup rate without hogging most of the SPUs.
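
    A minimal sketch of that per-triangle test for a rigid batch, assuming a row-major orthonormal rotation R plus a translation t, and precomputed object-space face normals (all names here are hypothetical). The eye is moved into object space once via the inverse rigid transform, so each triangle then costs only one subtract and one dot product.

```c
#include <stdbool.h>

typedef struct { float x, y, z; } vec3;

static vec3 sub3(vec3 a, vec3 b) { vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }
static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Rigid transform (assumed): world = R * object + t, R orthonormal, row-major.
   Its inverse is object = R^T * (world - t), computed once per batch. */
vec3 eye_to_object_space(const float R[3][3], vec3 t, vec3 eye_world)
{
    vec3 d = sub3(eye_world, t);
    vec3 r = {
        R[0][0] * d.x + R[1][0] * d.y + R[2][0] * d.z,
        R[0][1] * d.x + R[1][1] * d.y + R[2][1] * d.z,
        R[0][2] * d.x + R[1][2] * d.y + R[2][2] * d.z,
    };
    return r;
}

/* Per-triangle test: one subtract and one dot product against the precomputed
   face normal. Returns true when the triangle faces away from the eye. */
bool is_backfacing(vec3 v0, vec3 face_normal, vec3 eye_obj)
{
    return dot3(sub3(eye_obj, v0), face_normal) <= 0.0f;
}
```

    This only covers the test itself; building the surviving index list is the extra work Mintmaster refers to.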

    I think nAo is right on this one. Not many people are going to use Cell for vertex shading or anything that RSX's geometry-related silicon can do. For stuff it can't do, like silhouette detection, custom tessellation routines, etc., it makes sense. I fully expect all vertex data to be held in XDR, and Cell would be great at data amplification if the decompression algorithm can't be mapped to the vertex shader's abilities.
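
    Silhouette detection, one of the jobs mentioned above that the vertex pipeline can't do on its own, can be sketched in two passes: classify each face as front- or back-facing relative to the eye, then mark an edge as a silhouette edge when exactly one of its two adjacent faces is front-facing. This assumes a precomputed edge list storing the two faces that share each edge; the edge struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Hypothetical edge record: indices of the two faces that share the edge. */
typedef struct { size_t face_a, face_b; } edge;

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static vec3 sub3(vec3 a, vec3 b) { vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }

/* Pass 1: a face is front-facing when the eye is on the positive side of its plane. */
void classify_faces(const vec3 *face_points, const vec3 *face_normals,
                    size_t num_faces, vec3 eye, bool *front)
{
    for (size_t f = 0; f < num_faces; ++f)
        front[f] = dot3(sub3(eye, face_points[f]), face_normals[f]) > 0.0f;
}

/* Pass 2: an edge is on the silhouette when exactly one adjacent face is
   front-facing. Writes silhouette edge indices to out and returns their count. */
size_t find_silhouette_edges(const edge *edges, size_t num_edges,
                             const bool *front, size_t *out)
{
    size_t count = 0;
    for (size_t i = 0; i < num_edges; ++i)
        if (front[edges[i].face_a] != front[edges[i].face_b])
            out[count++] = i;
    return count;
}
```

    Classifying each face once before walking the edges avoids recomputing the facing test for faces shared by several edges.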
     
  11. Fafalada

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,773
    Likes Received:
    49
    I wasn't talking about batches that big, and I was also thinking more in terms of generated geometry (procedural, HOS or some other form of tessellation), not statically stored stuff. Before this generation started, I was already a believer that we should abandon the concept of the latter altogether; it served its purpose and it's time for it to go away (granted, the PC market will not be ready for such a move for quite a while longer).
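
    As a minimal illustration of generated geometry (data amplification), the sketch below uniformly tessellates one triangle by splitting each edge into n segments, producing (n+1)(n+2)/2 vertices and n^2 triangles from three input vertices. This is a generic barycentric subdivision chosen for illustration, not any particular HOS scheme; the function names are hypothetical, n is assumed to be at least 1, and the caller is assumed to size the output arrays with the count helpers.

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Splitting each edge into n segments yields these counts. */
size_t tess_vertex_count(int n)   { return (size_t)(n + 1) * (size_t)(n + 2) / 2; }
size_t tess_triangle_count(int n) { return (size_t)n * (size_t)n; }

/* Vertices are stored row by row; row i (0..n) holds n - i + 1 vertices. */
static unsigned vidx(int n, int i, int j)
{
    return (unsigned)(i * (n + 1) - i * (i - 1) / 2 + j);
}

/* Writes tess_vertex_count(n) vertices and 3 * tess_triangle_count(n) indices,
   keeping the winding of the input triangle (a, b, c). Assumes n >= 1. */
void tessellate_triangle(vec3 a, vec3 b, vec3 c, int n, vec3 *verts, unsigned *indices)
{
    for (int i = 0; i <= n; ++i) {
        for (int j = 0; j <= n - i; ++j) {
            float s = (float)j / (float)n;   /* step along a->b */
            float t = (float)i / (float)n;   /* step along a->c */
            vec3 p = { a.x + (b.x - a.x) * s + (c.x - a.x) * t,
                       a.y + (b.y - a.y) * s + (c.y - a.y) * t,
                       a.z + (b.z - a.z) * s + (c.z - a.z) * t };
            verts[vidx(n, i, j)] = p;
        }
    }
    size_t k = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n - i; ++j) {
            /* "upright" sub-triangle */
            indices[k++] = vidx(n, i, j);
            indices[k++] = vidx(n, i, j + 1);
            indices[k++] = vidx(n, i + 1, j);
            if (j < n - i - 1) {   /* "inverted" sub-triangle filling the gap */
                indices[k++] = vidx(n, i, j + 1);
                indices[k++] = vidx(n, i + 1, j + 1);
                indices[k++] = vidx(n, i + 1, j);
            }
        }
    }
}
```

    Only three vertices go in, and the output grows quadratically with n, which is why generating such geometry close to the GPU rather than storing it is attractive.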

    Why? Unless I desperately need to do skinning/morphing on the GPU, there's basically no reason NOT to cull on the SPE side if I find it a performance benefit.

    Meh, the new console GPUs have a relatively low setup rate (compared to the last 10 years of consoles), so exceeding it is far less of a challenge than you might think.

    He is quite right, but his reasons for saying that are different from what is discussed in this thread. That aside, yeah, it's obvious that doing just the basic VS processing on them is a waste of SPEs when they are capable of so much more.
     
  12. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    Never a truer word spoken... CPU work is best spent on methods that reduce geometry before the vertex shader in the pipeline: things like packet-level triangle culling, HOS, etc.
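
    One possible reading of packet-level culling, sketched under assumptions: each packet of triangles carries a precomputed bounding sphere, and a packet is dropped before it ever reaches the vertex shader if its sphere lies entirely behind any frustum plane. The plane convention (inside when dot(n, p) + d >= 0, with n normalized) and the packet struct are hypothetical. The test is conservative, so packets that straddle a plane are kept.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Assumed convention: a point p is inside when dot(n, p) + d >= 0, n normalized. */
typedef struct { vec3 n; float d; } plane;

/* Hypothetical packet: bounding sphere plus its range in a shared index buffer. */
typedef struct { vec3 center; float radius; unsigned first_index, index_count; } packet;

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

bool packet_visible(const packet *p, const plane *planes, size_t num_planes)
{
    for (size_t i = 0; i < num_planes; ++i)
        if (dot3(planes[i].n, p->center) + planes[i].d < -p->radius)
            return false;   /* sphere lies entirely outside this plane */
    return true;            /* conservatively kept; may still be partly off-screen */
}

/* Compacts the packet list in place, keeping only visible packets; returns the new count. */
size_t cull_packets(packet *packets, size_t count, const plane *planes, size_t num_planes)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i)
        if (packet_visible(&packets[i], planes, num_planes))
            packets[kept++] = packets[i];
    return kept;
}
```

    Culling whole packets keeps the per-triangle index manipulation out of the common case; a finer per-triangle pass could still be run on the packets that survive.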
     
  13. Gholbine

    Regular

    Joined:
    Jun 19, 2005
    Messages:
    294
    Likes Received:
    1
    So can we safely assume that at least 1 of the SPEs will be dedicated to vertex processing to alleviate a potentially vertex-bound GPU?

    Does anyone think that the pipeline setup for the RSX has been altered for improved pixel shading power, while sacrificing vertex shading power, because of the assistance that the Cell can provide?
     
  14. Hardknock

    Veteran

    Joined:
    Jul 11, 2005
    Messages:
    2,203
    Likes Received:
    53
    I don't think you can reasonably make that statement, because the things you just listed are already happening on the Xbox 360. I remember reading that devs are using the XCPU for vertex work and leaving Xenos to handle pixels alone. Not sure if this was due to the rushed launch software or what. But the same means to an end can be used on the 360 as on the PS3.
     
  15. Mefisutoferesu

    Regular

    Joined:
    Jun 16, 2005
    Messages:
    717
    Likes Received:
    4
    I was under the impression that this was never confirmed. It was just NVIDIA PR talk to play down the unified shader (US) design in Xenos, claiming it wasn't all that hot due to load-balancing issues, or something along those lines. Anyway, I think it was just PR talk.
     
  16. AlBran

    AlBran Ferro-Fibrous
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,671
    Likes Received:
    5,761
    Location:
    ಠ_ಠ
    Maybe the lines between classifying a shader as pixel or vertex are blurry :?:

    Or maybe there just aren't that many vertex shader programs in these launch games, and this is exactly why ATI are so eager to move to a USA (unified shader architecture).

    ...brain...deeroiraitng 7 hours staring computer scnreen
     
  17. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Given that it's been available in OpenGL on NVIDIA hardware for a while now (EXT_pixel_buffer_object) and accelerated by hardware (not a software driver copy), I don't see why not. NVIDIA claims that the extension allows render-to-vertex-array to run in hardware (as opposed to past hacks, which resulted in copies).
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.