ISSCC 2005

Discussion in 'Console Technology' started by ChryZ, Jan 20, 2005.

  1. version

    Regular

    Joined:
    Jul 27, 2004
    Messages:
    452
    Likes Received:
    5
    OK if it is fully pipelined and works on 6 vertices at the same time, but the code is big:

    vload r1,r,r
    vload r2,r,r
    vload r3,r,r
    vload r4,r,r
    vload r5,r,r
    vload r6,r,r
    vmul r,r,r1
    vmadd r,r,r2
    vmadd r,r,r3
    vmadd r,r,r4
    vmadd r,r,r5
    vmadd r,r,r6
    vmul r,r,r1
    vmadd r,r,r2
    vmadd r,r,r3
    vmadd r,r,r4
    vmadd r,r,r5
    vmadd r,r,r6
    vmul r,r,r1
    vmadd r,r,r2
    vmadd r,r,r3
    vmadd r,r,r4
    vmadd r,r,r5
    vmadd r,r,r6
    vmul r,r,r1
    vmadd r,r,r2
    vmadd r,r,r3
    vmadd r,r,r4
    vmadd r,r,r5
    vmadd r,r,r6


    This is a matrix-vertex multiply on 6 vertices. It is a joke :D, but there is no stall.
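
To see why interleaving six vertices avoids stalls, here is a minimal Python sketch of an in-order pipe that issues one instruction per cycle. The 6-cycle result latency is an assumption (this thread only guesses 6-7 cycles for add and mul), and `count_cycles` is a hypothetical helper, not a model of any real SPU.

```python
LATENCY = 6  # assumed cycles before a vmul/vmadd result can be consumed


def count_cycles(num_vertices, steps=4, latency=LATENCY):
    """Cycles needed to issue every op on an in-order, single-issue pipe.

    Each vertex needs a chain of `steps` dependent ops (1 vmul + 3 vmadd).
    Ops are issued round-robin across vertices, as in the unrolled listing.
    """
    ready = [0] * num_vertices  # cycle at which each vertex's accumulator is ready
    cycle = 0                   # next free issue slot
    for _ in range(steps):
        for v in range(num_vertices):
            issue = max(cycle, ready[v])  # stall until the previous result is ready
            ready[v] = issue + latency
            cycle = issue + 1
    return cycle


def stall_cycles(num_vertices, steps=4, latency=LATENCY):
    """Issue slots wasted waiting on dependencies."""
    return count_cycles(num_vertices, steps, latency) - num_vertices * steps


if __name__ == "__main__":
    for n in (1, 3, 6):
        print(n, "vertices:", count_cycles(n), "cycles,", stall_cycles(n), "stalled")
```

Under these assumptions a single vertex wastes 15 of its 19 cycles waiting on its own results, while six interleaved vertices issue 24 instructions in 24 cycles with no stall.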
     
  2. MrFloopy

    Regular

    Joined:
    May 8, 2002
    Messages:
    300
    Likes Received:
    11
    Location:
    Adelaide, South Australia
    Yes, size vs. speed is the tradeoff in such a simple case. In more complicated cases, however, the size issue diminishes, because loop unrolling is no longer needed to hide data fetches.


    Anyway, this is getting OT, as these are general issues affecting programming on pretty much every processor available in the past 10 or so years, and are not unique to Cell.
     
  3. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    He is not loop unrolling for data fetches, or at least not any more than for the latency of every other instruction he used in there.
     
  4. Guden Oden

    Guden Oden Senior Member
    Legend

    Joined:
    Dec 20, 2003
    Messages:
    6,201
    Likes Received:
    91
    How would YOU know how much extra hardware might be needed to run both int and float ops simultaneously? :p The SPU is already 21 million transistors, that's what, 2/3 the size of the original AMD Athlon? Besides, local storage only delivers 128 bits of data per cycle anyway; if you don't have all your data in registers already, chances are very good you're not going to see any speedup from simultaneous execution. That's likely part of the reason why STI didn't make the bugger do simultaneous execution in the first place.

    You complain way too much. If you're going to keep whining like that, better stay away from the gaming business altogether... Go work for the Microsoft Office team instead. There, nobody is going to ask you to write high-performing tight code! :D
     
  5. version

    Regular

    Joined:
    Jul 27, 2004
    Messages:
    452
    Likes Received:
    5
    I mean, if add and mul are 6-7 cycles, divide is 30.
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    3DNow! does it in 2 iterations; 30 cycles is a bit of an overestimation IMO.
     
  7. version

    Regular

    Joined:
    Jul 27, 2004
    Messages:
    452
    Likes Received:
    5
    Yes, 2 cycles for an imprecise result, and more cycles with iterations (Newton-Raphson method).
     
  8. pc999

    Veteran

    Joined:
    Mar 13, 2004
    Messages:
    3,628
    Likes Received:
    31
    Location:
    Portugal
    In that case it is better to ask about performance per die size; they usually make real monsters in those markets.
     
  9. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Dunno if it is true, but a guy called KelleyCook on Ars Forums wrote that:
    EDIT: I googled around and it seems e500 doesn't support 4-sp vectors :?

    ciao,
    Marco
     
  10. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Actually I was mistaken, only 1 lookup and 1 iteration needed for single precision result with 3DNow! ... the iteration is just broken up in 2 steps because of the lack of FMA.

    Anyway as I said, 30 cycles is an overestimation.
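
The "one lookup plus one iteration" point follows from Newton-Raphson roughly doubling the number of correct bits per step. A minimal sketch in Python, assuming a seed accurate to about 14 bits (an illustrative figure, not one given in this thread). `seed_reciprocal` is a hypothetical stand-in for a hardware lookup table, and `refine` splits the iteration into two steps, as it would be without FMA.

```python
import math


def seed_reciprocal(a, bits=14):
    """Stand-in for a table lookup: 1/a with its mantissa cut to `bits` bits."""
    m, e = math.frexp(1.0 / a)   # 1/a = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)


def refine(a, x):
    """One Newton-Raphson step for f(x) = 1/x - a, i.e. x * (2 - a*x)."""
    t = 2.0 - a * x   # step 1: multiply and subtract
    return x * t      # step 2: multiply


def relative_error(a, x):
    """How far a*x is from 1, i.e. the relative error of x as an estimate of 1/a."""
    return abs(a * x - 1.0)


if __name__ == "__main__":
    a = 3.0
    x0 = seed_reciprocal(a)
    x1 = refine(a, x0)
    print("seed error:   ", relative_error(a, x0))
    print("refined error:", relative_error(a, x1))
```

If the seed has relative error e, one step leaves an error of e squared, so a seed good to roughly 13-14 bits ends up below the 2^-24 needed for a single-precision mantissa after a single iteration.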
     
  11. archie4oz

    archie4oz ea_spouse is H4WT!
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,608
    Likes Received:
    30
    Location:
    53:4F:4E:59
    The e500 doesn't support *any* vectors at all without SPE... However I think this person is confusing Motorola's SPE (their 2nd SIMD arch) for the SPE units in Cell...
     
  12. DudeMiester

    Regular

    Joined:
    Aug 10, 2004
    Messages:
    636
    Likes Received:
    10
    Location:
    San Francisco, CA
    This thread was an interesting read. Cell looks very nice; *crosses fingers* for high-quality real-time raytracing!
     
  13. AutomatedMech

    Newcomer

    Joined:
    Feb 7, 2005
    Messages:
    10
    Likes Received:
    0
    Edited by moderator




    Stop trolling the boards deadmeat. You have been banned in the past. You are not allowed to post here.
     
  14. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Ahh, ANALysts. So called because they have their head up.....

    10x? bah, I can get coprocessors today that do 100 Gflops+ with all the same restrictions and programming headaches.

    Aaron Spink
    speaking for myself inc.
     
  15. Deadmeat4

    Newcomer

    Joined:
    May 4, 2004
    Messages:
    27
    Likes Received:
    0
    ...

    Edited by moderator



    Strike 2
     
  16. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242
    Something that I asked about but never got a solid explanation for: since when has it been known that Cell does not have any eDRAM?

    What happened to the 16-64 MB of eDRAM that was supposed to be one of the major advantages of Cell?
     
  17. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    My guess would be they figured it was too difficult to manufacture the eDRAM at 65nm so they increased the LS and cache instead. Also the eDRAM was probably too slow? If you recall the eDRAM in EE+GS@90nm wasn't using 90nm. It was using 130nm IIRC.
     
  18. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242

    OK, I suppose that all makes sense. Well, Sony had better compensate for the lack of eDRAM with lots more external main memory.
    <that's the ram hugger in me talking> :lol:
     
  19. archie4oz

    archie4oz ea_spouse is H4WT!
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,608
    Likes Received:
    30
    Location:
    53:4F:4E:59
    eDRAM would be *much* easier than logic to deal with...

    Plus I dunno about the slow part... While eDRAM has a longer latency than SRAMs do, the much higher density offered by eDRAM means less wire delay than you get with SRAMs, which can almost offset the latency penalty suffered by eDRAMs...
     
  20. V3

    V3
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    3,304
    Likes Received:
    5
    Like I said before, the patent doesn't mention eDRAM. It was because of the 1024-bit bus that people assumed it was eDRAM.

    Though with just a single Cell, do you think 25GB/s of memory bandwidth is sufficient for PS3 to feed Cell and the NV GPU without eDRAM somewhere in the system?
     
