Xenos Chip Package

Discussion in 'Beyond3D News' started by Dave Baumann, Jun 16, 2005.

  1. Karma Police

    Regular

    Joined:
    Sep 2, 2004
    Messages:
    433
    Likes Received:
    6
    Location:
    192.168.2.1
    lolol :lol:

    Besides, my comp is my "baby", and I put as much mulla into my baby as I do my gf!

    ergo--->female
     
  2. Dave Glue

    Regular

    Joined:
    Apr 25, 2002
    Messages:
    634
    Likes Received:
    25
    So it's a true 256GB/sec connection now? I thought I just read in the monster thread that was more of a "calculated" bandwidth based on what you'd normally expect 4XAA to take?
     
  3. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    926
    Likes Received:
    39
    Location:
    On my rock
    32GB/sec true bandwidth BETWEEN the two chips. 256GB/sec true bandwidth INSIDE the eDram chip for AA amongst other things.
     
  4. DeathKnight

    Regular

    Joined:
    Jun 19, 2002
    Messages:
    744
    Likes Received:
    4
    Location:
    Cincinnati, OH
    Yes, this is the true bandwidth inside the daughter core between the logic and the dram :)
     
  5. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Remember the Pentium Pro?
    It had two dies on the same package, one for the CPU and one for L2. It was said to be very expensive because Intel couldn't test the dies separately and had to assemble them on the package first, thus multiplying the L2 die's yield by the CPU die's yield (or connecting the two dies was problematic).
    The insane, ultra expensive and rare Pentium Pro 1MB L2 even had two dies for L2 cache!

    I assume they can test the two dies separately on Xenos?
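To put numbers on the yield argument above, here is a minimal sketch. It assumes the dies can only be tested after assembly, so the fraction of good packages is the product of the individual die yields. The yield values are made up for illustration and are not real Pentium Pro or Xenos figures; `package_yield` is a hypothetical helper.

```python
def package_yield(die_yields):
    """Fraction of good packages when dies are only testable after assembly.

    Assumes die defects are independent, so the yields simply multiply.
    """
    result = 1.0
    for y in die_yields:
        result *= y
    return result


# Hypothetical yields, chosen only to illustrate the effect.
cpu_yield = 0.80
l2_yield = 0.90

two_die = package_yield([cpu_yield, l2_yield])               # CPU + one L2 die
three_die = package_yield([cpu_yield, l2_yield, l2_yield])   # CPU + two L2 dies

print(f"CPU + 1 L2 die:  {two_die:.3f}")
print(f"CPU + 2 L2 dies: {three_die:.3f}")
```

If the dies can be tested before packaging, only known-good dies get assembled and this multiplication no longer sets the cost, which is the point of the question about Xenos.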
     
  6. davepermen

    Regular

    Joined:
    Aug 27, 2003
    Messages:
    422
    Likes Received:
    2
    Location:
    Switzerland
    so the new p4 boards are father-boards.. ?
     
  7. Hanners

    Regular

    Joined:
    Jul 12, 2002
    Messages:
    816
    Likes Received:
    57
    Location:
    England
    Nah, they're more kinda lesbian boards. Or bisexual boards maybe...
     
  8. Dave Glue

    Regular

    Joined:
    Apr 25, 2002
    Messages:
    634
    Likes Received:
    25
    Ah, gotcha. Thanks.
     
  9. Farid

    Farid Artist formely known as Vysez
    Veteran Subscriber

    Joined:
    Mar 22, 2004
    Messages:
    3,844
    Likes Received:
    108
    Location:
    Paris, France
    Thanks for the clarification Dave.
     
  10. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    The official answer is: 232M for parent, 105M for daughter.
     
  11. tEd

    tEd Casual Member
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,095
    Likes Received:
    62
    Location:
    switzerland
    Thanks for the clarification.
     
  12. Wunderchu

    Regular

    Joined:
    Nov 6, 2003
    Messages:
    873
    Likes Received:
    3
    Location:
    Burnaby, B.C., Canada
    yeah, thanks :)
     
  13. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    8,478
    Likes Received:
    592
    Location:
    WI, USA
    So..........who hacked off the top of that Pentium Pro? :)
     
  14. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    so Xenos has something like 250+ Mtransistors devoted to logic..
    It doesn't seem the unified shading approach consumes many more transistors than the standard approach, since RSX should have a small edge in shading power and eight more ROPs with 300 Mtransistors.
     
  15. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    926
    Likes Received:
    39
    Location:
    On my rock
    How are you calculating that?
     
  16. Mulciber

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    413
    Likes Received:
    0
    Location:
    Houston
    well the RSX should have 48 alus in addition to its vertex pipes. plus it will be clocked higher.
     
  17. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    926
    Likes Received:
    39
    Location:
    On my rock
    48 alu's that are each capable of what and do any participate in texturing duties? In other words, the RSX is capable of how many mad + tex per clock?
     
  18. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    I have reasons to believe RSX pixel pipelines will be just an improved version of NV40 pixel pipelines.
    Excluding special function units, a current NV40 pixel pipe can do 12 flops per cycle (1 fmadd4 + 1 mul4), and I think an RSX pixel pipe will be able to perform 16 flops per clock, as nvidia should have extended the second ALU to handle fmadds too (not just muls anymore).
    It's not clear how we should count SFU flops, but I arbitrarily say SFUs give us another 4 flops per cycle ;) (and I think I'm being conservative here..)
    So we have 24 pixel pipelines * 20 flops = 480 flops per cycle, the same amount of programmable flops Xenos can handle.
    RSX has also 8 vertex shaders AFAIK -> 10*8 = 80 more flops per cycle
    Xenos has 16 simple TMUs (point filtering) + 16 complex TMUs (bilinear filtering); RSX, by contrast, has 8 simple TMUs + 24 complex TMUs, AFAIK.
    RSX has 16 ROPs and Xenos has 8 ROPs, but Xenos with its edram is likely to have a real world higher fill rate than RSX with MSAA on.
    At the end of the day we have 2 roughly comparable parts that expose a quite similar number of programmable ops per cycle. Even if RSX sports some bigger numbers, Xenos could be way faster than RSX IF it can be much more efficient than a standard GPU, as RSX should be (as I have already stated multiple times, I know current GPUs are very inefficient at keeping all their programmable units fed and running most of the time).

    ciao,
    Marco
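The arithmetic in the post above can be reproduced with a short sketch. All per-pipe figures are nAo's estimates as stated in the post (including the admittedly arbitrary SFU allowance), not confirmed specifications; the constant names are made up for illustration.

```python
# Flops per cycle for one pixel pipe, following the post's counting:
# an fmadd4 is 4 lanes * (1 mul + 1 add) = 8 flops, a mul4 is 4 flops.
FMADD4_FLOPS = 8
MUL4_FLOPS = 4
SFU_FLOPS = 4  # arbitrary allowance from the post

nv40_pipe = FMADD4_FLOPS + MUL4_FLOPS      # 12, SFUs excluded
rsx_pipe = FMADD4_FLOPS + FMADD4_FLOPS     # 16, assuming the second ALU gains fmadd
rsx_pipe_with_sfu = rsx_pipe + SFU_FLOPS   # 20

RSX_PIXEL_PIPES = 24
RSX_VERTEX_SHADERS = 8
FLOPS_PER_VERTEX_SHADER = 10               # figure used in the post

rsx_pixel_flops = RSX_PIXEL_PIPES * rsx_pipe_with_sfu               # 480
rsx_vertex_flops = RSX_VERTEX_SHADERS * FLOPS_PER_VERTEX_SHADER     # 80
rsx_total_flops = rsx_pixel_flops + rsx_vertex_flops                # 560

XENOS_FLOPS = 480  # programmable flops per cycle, taken as given from the post

print("RSX pixel flops/cycle: ", rsx_pixel_flops)
print("RSX vertex flops/cycle:", rsx_vertex_flops)
print("RSX total flops/cycle: ", rsx_total_flops)
print("Xenos flops/cycle:     ", XENOS_FLOPS)
```

Under these assumptions the RSX pixel pipes alone match the Xenos figure, and the vertex shaders add the extra 80 flops per cycle mentioned in the post.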
     
  19. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    926
    Likes Received:
    39
    Location:
    On my rock
    I mostly agree with your estimation of the pipeline arrangement. However, if the above is true, it will not be capable of 24 tex + 48 mad4 + 24 sfu as you suggest. More like 0 tex + 48 mad4 + 0 sfu, as use of the tex or sfu units would preclude a mad4 op.

    Does anyone know if the Xenos is capable of performing FP blending within the texture address processor, or, like ATI's PC parts, are they using pixel shaders for that? I'm inclined to think that it can't, because it probably would have been mentioned. But I can't seem to get an answer on this.
     
  20. The GameMaster

    Newcomer

    Joined:
    Feb 9, 2005
    Messages:
    109
    Likes Received:
    1
    About the ALU configuration in the NV4x and NV5x architectures... the NV5x GPUs are almost identical to the NV4x GPUs, with the exception being more pipelines... and a few minor improvements to its "UltraShadow" and "Intellisample" technologies.

    Anyway... on the NV4x/5x architectures each pixel shader pipeline actually consists of 2 4-way ALUs, however if you need to do a texture lookup then you lose half of those ALUs. So basically each pixel shader pipeline on the Geforce 6800GT can process 2 shader operations (Vec4) per cycle -OR- 1 shader operation (Vec4) and one texture lookup per cycle. So in total you have anywhere from 16-32 shader operations per cycle for all 16 pixel shader pipelines depending on the number of texture lookups you do... so say roughly 6.4 to 12.8 billion pixel shader operations per second assuming maximum efficiency. Or roughly... 16tex + 16mad4 -OR- 0tex + 32mad4 operations per cycle with the pixel shader pipelines. Vertex pipelines each have 1 ALU so the Geforce 6800 Ultra has 6 more ALUs for this... no special math needed here.

    On the Geforce 7800GTX (and the RSX if it is based on this) that would be 24 pixel pipelines and 8 vertex pipelines. Again it will likely have the same setup... so each shader pipeline can process 2 shader operations (Vec4) per cycle -OR- 1 shader operation (Vec4) and 1 texture lookup per cycle. So in total you would have anywhere from 24-48 shader operations per cycle for all 24 pixel shader pipelines depending on the number of texture lookups you do... so say roughly 10.3 to 20.6 billion shader operations per second assuming maximum efficiency. Or roughly 24tex + 24mad4 -OR- 0tex + 48mad4 operations per cycle with the pixel shader pipelines.

    As for XENOS... if I am not mistaken each pipeline consists of a single 16-way ALU that can process 2 shader operations (Vec4) AND 2 scalar operations per cycle. These pipelines can be used for either pixel shader or vertex programs and are said to have a very high rate of efficiency due to multithreading and interleaving. Texture lookups are done OUTSIDE of the pipelines so they do not consume ALUs on the shader pipelines, and it has been stated there are 16 texture pipelines that can process 16 UNFILTERED and 16 FILTERED texture samples per cycle, or a total of 32 texture samples per cycle combined.

    Anyway... 48 unified pipelines provide a maximum of 96 shader operations per cycle, but it can be anywhere from 0-96 shader operations per cycle (though ATI did say that if programmed for correctly it could be double this), or roughly 0-48 billion shader operations per second. Or roughly 32tex+96mad4+96scalar operations per cycle with the unified pipelines if I am interpreting this correctly. :shock:
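The per-cycle and per-second figures in this post can be checked with a small sketch. The clock speeds below are assumptions inferred by dividing the post's per-second numbers by its per-cycle numbers (400 MHz, 430 MHz and 500 MHz); the post itself does not state them. `shader_ops_range` is a hypothetical helper, and the model counts only Vec4 shader ops, ignoring the scalar ops and the "double this" remark.

```python
def shader_ops_range(pipes, min_ops_per_pipe, max_ops_per_pipe, clock_mhz):
    """Return (min_per_cycle, max_per_cycle, min_gops, max_gops).

    Gops is billions of Vec4 shader operations per second at peak efficiency.
    """
    min_cycle = pipes * min_ops_per_pipe
    max_cycle = pipes * max_ops_per_pipe
    to_gops = clock_mhz / 1000.0  # MHz -> billions of cycles per second
    return min_cycle, max_cycle, min_cycle * to_gops, max_cycle * to_gops


# NV4x-style pipe: 2 Vec4 ops, or 1 Vec4 op + 1 texture lookup, per cycle.
nv40 = shader_ops_range(pipes=16, min_ops_per_pipe=1, max_ops_per_pipe=2, clock_mhz=400)
g70 = shader_ops_range(pipes=24, min_ops_per_pipe=1, max_ops_per_pipe=2, clock_mhz=430)
# Xenos: texturing is outside the ALUs, but a unified pipe may do no shader work at all.
xenos = shader_ops_range(pipes=48, min_ops_per_pipe=0, max_ops_per_pipe=2, clock_mhz=500)

for name, r in (("6800 (16 pipes)", nv40), ("7800GTX (24 pipes)", g70), ("Xenos (48 pipes)", xenos)):
    print(f"{name}: {r[0]}-{r[1]} ops/cycle, {r[2]:.1f}-{r[3]:.1f} Gops/s")
```

With those assumed clocks the ranges match the post: 16-32, 24-48 and 0-96 operations per cycle.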

    In regards to your question... here is a quote from Dave's XENOS article that may answer that.

    Hope that answers your question... as while I do have a good idea of this, I am not exactly sure yet.
     