Anand talk R580

Discussion in 'Architecture and Products' started by Unknown Soldier, Oct 20, 2005.

  1. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,470
    Likes Received:
    1,686
    Location:
    Treading Water
    So buy it from the ones that do?

    You are never going to trip over boxes of these high-end products while walking into your local computer shop. I can't even purchase a 7800GTX256 anywhere locally and it's been out for 5 months.
     
  2. dizietsma

    Banned

    Joined:
    Mar 1, 2004
    Messages:
    1,172
    Likes Received:
    13
    Ati are still slightly lagging in Europe

    Gainward GeForce 7800GTX 512MB GDDR3, PCI-Express,"U/3550PCX XP 512MB", 550Mhz
    Our item no.: 315016 / 471846200-7548
    Availability: Not in stock. Unconfirmed 100 pcs 2005-11-22


    whereas XT are typically

    Retail
    Our item no.: 314207 / 4710810936876
    Availability: Not in stock. Unconfirmed 289 pcs 2005-12-01

    Still, almost 300 pieces is pretty good going.

    I think the PE might be 675/850 as a good first guess.
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    Yep, I like those numbers. 2006 is going to be extremely interesting. :smile:
     
  4. Martin Eddy

    Regular

    Joined:
    Oct 5, 2003
    Messages:
    491
    Likes Received:
    4
    Location:
    Australia,Brisbane
    How long until we see memory speeds in excess of 1000mhz (2000mhz effective)?

    Or will they simply go to 512 bit at 500 mhz?
     
  5. _xxx_

    Banned

    Joined:
    Aug 3, 2004
    Messages:
    5,008
    Likes Received:
    86
    Location:
    Stuttgart, Germany
    I'd like to see 512bit @1000 MHz :)

    I think memory will rather go serial, so I don't think we'll ever see anything more than 512 bit with the kind of memory we have now, if at all.
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I think GDDR3 is supposed to be good for 1GHz+. All the same, GDDR4 is close (6 months?) and that should start in the region of 1GHz.

    Jawed
     
  7. Putas

    Regular

    Joined:
    Nov 7, 2004
    Messages:
    737
    Likes Received:
    354
    Also expect first 1 GHz GPUs around that time, the race is on.
    512 bit bus... probably never.
     
  8. Khronus

    Newcomer

    Joined:
    Apr 15, 2004
    Messages:
    62
    Likes Received:
    2
    The core package would have to be a monster to have room for a 512bit bus!
     
  9. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    That was a typo. The context of my message is clear that I was arguing about simple pipeline/clock scalability and not architectural changes, just like you are trying to extrapolate the performance of a R580 by quadrupling the RV350.

    GTX256 / 6800 is 430/400Mhz * 24/16 = 61% theoretical increase (not counting slightly more memory), and this matches pretty well. GTX512/6800 = 550/400 * 24/16 = 2.06 which also maps pretty well to the benchmarks. If we carry this further, a hypothetical 32-pipe 90nm core @ 650Mhz would be 650/400 * 32/16 = 3.25 relative to 6800. Since a X1600XT seems on par with a 6800GS, we can do the analysis on that. 650/425 * 32/12 = 4.07 relative to 6800GS. Now, if you take a 12 "pipe" 1600XT and quadruple it, you get a 4.0 ratio.

    Thus, I don't think an R580 and a 32-pipe G7x clocked similarly will be a blowout, especially since it depends on very heavy ALU workload. We have to expect something less than best workloads for the R580 that will put the G7x in a bad light for any kind of smackdown to occur. If R580 is shipped anytime in the next 6 months, it will be dealing with a paltry few games that can really show off its advantages, so NVidia would be justified in shipping a G7x refresh stopgap to fill the spring/summer gap for a winter/fall G80 release.
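    The clock-times-pipes ratios quoted in this post can be rechecked with a few lines of Python. This is only a sketch of the same naive model (performance proportional to core clock multiplied by pipeline count); the function name and labels are made up for illustration, and the clocks and pipe counts are simply the ones quoted above.

```python
def naive_scaling(clock_mhz, pipes, base_clock_mhz, base_pipes):
    """Naive estimate: performance scales with core clock times pipeline count."""
    return (clock_mhz / base_clock_mhz) * (pipes / base_pipes)


# Figures as quoted in the post: (clock MHz, pipes, baseline clock MHz, baseline pipes).
cases = {
    "GTX256 vs 6800": (430, 24, 400, 16),
    "GTX512 vs 6800": (550, 24, 400, 16),
    "hypothetical 32-pipe @ 650 vs 6800": (650, 32, 400, 16),
    "hypothetical 32-pipe @ 650 vs 6800GS": (650, 32, 425, 12),
}

ratios = {label: naive_scaling(*args) for label, args in cases.items()}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}x")
```

    The model ignores memory bandwidth and architectural differences, which is the same simplification the post makes.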
     
  10. Turtle 1

    Newcomer

    Joined:
    Oct 21, 2005
    Messages:
    77
    Likes Received:
    0
    Location:
    Mapleview MN
    Even though I usually buy the top-of-the-line ATI card, it's not speed that I am after. Does the word graphics imply speed or visualization? OK, so we all know it implies visualization. That's why I buy ATI: it just looks the best and is always the fastest single card you can buy. The 7800GTX 512 came too early, which gives ATI the chance to bring out the X1800XT PE.
    So at the end of the year ATI will have the fastest card at the lowest price. Way to go, nvididiot.
    I have delivered 3 PCs with X1800XLs in them in the last week. All, and this is the amazing part, with Intel CPUs. People like Intel on water.
    Here's what the customer gets for his 8500 PCMark05 score.
    Intel 650 3.4 GHZ @ 4GHz @DDR2 533@DDR2 640@3-2-2-8 3Dmark05 8265 cost $230

    Intel660 3.6GHz@4.2GHZ@DDR2 533@DDR2@640@3-2-2-8 3Dmark05 8469 Cost $400

    Intel 670 3.8Ghz@4.4Ghz@DDR2 533@DDR2@640@3-2-2-8 3Dmark05 8703 Cost $600

    All these scores were attained with A X1800xl @ 625 core and stock memory all watercooled. These are the scores I got on these PC'S before they were sent to customers.

    Now I really don't see what the big deal is about a 7800GTX 512 PC getting a 3DMark score of 9400; that's with an AMD FX-57, a $1000 CPU.
    I will probably never know what an X1800XT PE gets, as I will never buy one. But I would guess in the 13000-point range in 3DMark05 @ 750 core x 800 memory with an Intel 3.8 @ 4.8GHz.

    Why would these people pay $5000 for a PC with a cheaper video card? Because it was a free card. That's right, I gave them the X1800XLs free, because they bought R600s today for $750. But they're not out, so I had to give them something. All are very happy. :shock: :shock:
     
  11. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    That's, uh, not true. The 7800 GTX 512 is available now, and is faster than anything ATI can offer. If angle-independent filtering is extremely important to you, obviously the R520 is going to be better, but other than that the G70 looks every bit as good as the R5xx. Better in some cases where you can enable supersampling AA (e.g. UT2004).
     
  12. Moloch

    Moloch God of Wicked Games
    Veteran

    Joined:
    Jun 20, 2002
    Messages:
    2,981
    Likes Received:
    72
    3dmark:lol:
    Ya that's great and all, but how about you play some real games at 1600x1200 with 4x FSAA and 16x AF and tell us that we don't need anything faster than an X1800XL.
     
  13. Turtle 1

    Newcomer

    Joined:
    Oct 21, 2005
    Messages:
    77
    Likes Received:
    0
    Location:
    Mapleview MN
    Oh, it's true, and the X1800XLs are out and the X1800XTs are out and the X1800XT PE will arrive before year's end. If you don't think shimmering matters, you're a fool who is just hanging on to a brand name. If nvidia looked better than ATI I would buy nvidia. The plain truth is ATI looks better.

    Now if some hardware site would O/C a 7800 GTX 512 and the X1800XT to their highest stable O/C, I believe ATI would come out on top. Now let's not flame about this, as I am sure that a hardware site will do just that within the next 2 weeks.
    Now let's talk paper launch: the 7800GTX 512 sold out in 2 days. More available next week, maybe. The only thing I know for a fact is that when they get restocked, look for prices at $800 and not the $750 they're selling for now.
    ATI was really late to the table and for all purposes lost this round. But at the end of the year ATI will have the fastest single card available in 2005, same as 2004, 2003, 2002. Need I say more? For you nvidia fans, I sell crying towels for $5. E-mail me and I will give you a free nvidia crying towel.
     
  14. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    You're not thinking about this from a compiler's viewpoint and the dependency tree. You don't need to have the whole function you're integrating in memory at once. You don't need to have the whole matrix in memory at once. More registers only save you redundant loads of the smallest matrix in the calculation, leading to a negligible advantage.

    For example, consider the vector output from V1*M1*M2.
    Dimensions:
    V1: 1 x 1,000
    M1: 1,000 x 1,000
    M2: 1,000 x 4

    Algorithm:
    Code:
    1. Clear r2-r6.
    
    2. Load the first 4 elements of V1 into r0.
    
    3. A set of 4 substeps:
     - Load the first 4 elements of column 1 of M1 into r1. r2.x += r0 dot r1. 
     - Load the first 4 elements of column 2 of M1 into r1. r2.y += r0 dot r1. 
     - Load the first 4 elements of column 3 of M1 into r1. r2.z += r0 dot r1. 
     - Load the first 4 elements of column 4 of M1 into r1. r2.w += r0 dot r1.
    
    4. Repeat steps 2-3, using the next 4 elements in step 2 and step 3. 
       Do this 249 more times until you do all 1,000 elements.
    
    5. A set of 4 substeps:
     - Load row 1 of M2 into r0. r3 += r2.x * r0. 
     - Load row 2 of M2 into r0. r4 += r2.y * r0. 
     - Load row 3 of M2 into r0. r5 += r2.z * r0. 
     - Load row 4 of M2 into r0. r6 += r2.w * r0.
    
    6. Clear r2, then repeat steps 2-5, using the next 4 columns in 
       step 3 and the next 4 rows in step 5. Do this 249 more times 
       until you do all 1,000 columns/rows.
    
    7. The final product is r3+r4+r5+r6.
    Voila! Over a million matrix elements, and it took only 7 registers.
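    For anyone who wants to check the scheme, here is a rough pure-Python sketch of the blocked algorithm with a straightforward reference to compare against. It assumes the dot-product accumulator (r2 in the steps) is cleared before each new group of 4 columns. The function names, the small test dimensions and the random data are made up for illustration, and Python lists stand in for the shader registers.

```python
import random


def blocked_v_m1_m2(v1, m1, m2, cols_per_pass=4, chunk=4):
    """Compute v1 * m1 * m2 while holding only a handful of small temporaries.

    v1 is a list of length K, m1 is K x N (list of rows), m2 is N x W (list of rows).
    """
    k, n, width = len(v1), len(m1[0]), len(m2[0])
    # r3..r6 in the post: one partial output vector per column slot (step 1).
    acc = [[0.0] * width for _ in range(cols_per_pass)]
    for c0 in range(0, n, cols_per_pass):
        cols = range(c0, min(c0 + cols_per_pass, n))
        r2 = [0.0] * cols_per_pass  # dot-product accumulators, cleared per column group
        for e0 in range(0, k, chunk):
            e1 = min(e0 + chunk, k)
            r0 = v1[e0:e1]  # step 2: load a chunk of V1
            for slot, c in enumerate(cols):
                r1 = [m1[e][c] for e in range(e0, e1)]  # step 3: chunk of one M1 column
                r2[slot] += sum(a * b for a, b in zip(r0, r1))
        for slot, c in enumerate(cols):
            row = m2[c]  # step 5: load one row of M2
            for j in range(width):
                acc[slot][j] += r2[slot] * row[j]
    # Step 7: the final product is the sum of the partial output vectors.
    return [sum(acc[s][j] for s in range(cols_per_pass)) for j in range(width)]


def reference_v_m1_m2(v1, m1, m2):
    """Unblocked version that keeps the whole intermediate vector around."""
    k, n = len(v1), len(m1[0])
    t = [sum(v1[e] * m1[e][c] for e in range(k)) for c in range(n)]
    return [sum(t[c] * m2[c][j] for c in range(n)) for j in range(len(m2[0]))]


rng = random.Random(0)
K, N = 16, 12  # small stand-ins for the 1,000 x 1,000 case
v1 = [rng.random() for _ in range(K)]
m1 = [[rng.random() for _ in range(N)] for _ in range(K)]
m2 = [[rng.random() for _ in range(4)] for _ in range(N)]

blocked = blocked_v_m1_m2(v1, m1, m2)
expected = reference_v_m1_m2(v1, m1, m2)
print(all(abs(a - b) < 1e-9 for a, b in zip(blocked, expected)))
```

    The temporaries per pass do not grow with the matrix size; only the loop counts do.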

    So there is a bit of redundancy in loading V1. You load V1 a total of 1,000/4 = 250 times. So if you're load limited, performance scales as 1/(1 + 1/n), i.e. computation time scales as 1 + 1/n, where n is the number of M1 columns handled per pass and 2+1.25n is the number of registers used. Jumping from n=4 (as above) to n=8 means you get 11% more performance. Meaningless. Furthermore, using more temporary registers will hurt latency hiding. Yes, that was more of an issue with NV30, but don't for a minute think that ATI and NVidia wasted transistors in putting enough FIFOs to keep 32 registers of data in flight with no reduction in latency hiding.
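    The 11% figure can be reproduced with a small sketch of the load-limited model. The assumption here, which is one reading of the post, is that each pass loads V1 once plus n columns of M1, so load traffic per column is proportional to 1 + 1/n. The function names are made up for illustration.

```python
def registers_used(n):
    """r0 and r1, plus n/4 vec4 dot-product accumulators, plus n output accumulators."""
    return 2 + 1.25 * n


def load_limited_throughput(n):
    """Relative throughput when loads dominate: each pass loads V1 once plus n columns."""
    return 1.0 / (1.0 + 1.0 / n)


# Gain from doubling the number of columns handled per pass, from n=4 to n=8.
gain = load_limited_throughput(8) / load_limited_throughput(4) - 1.0

print(f"registers at n=4: {registers_used(4):g}, at n=8: {registers_used(8):g}")
print(f"throughput gain from n=4 to n=8: {gain:.1%}")
```

    Under this model even n going to infinity only buys 25% over n=4, since throughput tops out at 1.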


    Believe me, I thought of matrix multiplication too, as that's seemingly the most obvious answer. However, you really don't need many temporaries in matrix multiplication. 2D integration is similar to matrix multiplication.

    In order to need lots of registers, you need a shader where you're doing many things in parallel and sharing data between those many things. Parallelism and interdependency all within a pixel shader? :???: I'm stumped.

    Maybe DemoCoder or Colourless can think of something.
     
  15. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    :lol: :lol: :lol:

    You know, you're not doing ATI a favour making BS claims like that. At the very best you might see 10k. Besides, 3DMark05 isn't really a good gauge of anything.
     
  16. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    I read a paper a long time ago on compiler register allocators that did an extremely large analysis of millions of lines of code and concluded that there is almost never a need for more than 7 registers.

    Extra registers are useful for time-space tradeoff if you want to eliminate common subexpressions. If you don't use the extra registers, you just have to recompute some values that get written over.

    The best way to think about it is to think about a stack machine architecture. If you look at Forth programs for example, the stack for any given method never gets more than a few elements deep, especially methods that leave only one value as a result.
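    As a toy illustration of the stack-depth point, here is a sketch in Python rather than Forth that walks a postfix expression and records how deep the operand stack gets. The helper name and the example expression are made up, and only binary operators are handled.

```python
BINARY_OPS = {"+", "-", "*", "/"}


def max_stack_depth(postfix_tokens):
    """Return the deepest the operand stack gets while evaluating a postfix expression."""
    depth = 0
    deepest = 0
    for token in postfix_tokens:
        if token in BINARY_OPS:
            depth -= 1  # pops two operands, pushes one result
        else:
            depth += 1  # pushes one operand
        deepest = max(deepest, depth)
    return deepest


# Postfix form of (a + b) * (c - d) / (e + f): six inputs, one result.
expression = "a b + c d - * e f + /".split()
deepest = max_stack_depth(expression)
print(deepest)
```

    Six inputs are consumed here, but far fewer values are ever live at once, because intermediate results are combined as soon as they are available.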

    For shaders, if you say have 10 expressions that reuse 5 normalized results (at the same time), the compiler can either a) stick those 5 normalized results into registers and reuse them in the 10 expressions, or b) it can recompute the subexpressions (normalize) each time they are needed.
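    The two options can be sketched in Python to make the trade-off concrete. This is only an illustration: the five input vectors are arbitrary, the "10 expressions" are taken to be the dot products of every pair of the five normalized vectors, and the counter just tallies how many times normalize runs.

```python
import math
from itertools import combinations


def normalize(v, counter):
    """Return v scaled to unit length, counting each call in counter[0]."""
    counter[0] += 1
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


vectors = [(1.0, 2.0, 3.0), (0.0, 1.0, 0.0), (2.0, 0.0, 1.0), (1.0, 1.0, 1.0), (3.0, 1.0, 2.0)]
pairs = list(combinations(range(len(vectors)), 2))  # 10 expressions over 5 inputs


def with_cse(counter):
    """Option a: keep all 5 normalized results live and reuse them."""
    normed = [normalize(v, counter) for v in vectors]
    return [dot(normed[i], normed[j]) for i, j in pairs]


def with_recompute(counter):
    """Option b: keep nothing live and renormalize at every use."""
    return [dot(normalize(vectors[i], counter), normalize(vectors[j], counter)) for i, j in pairs]


cse_calls, recompute_calls = [0], [0]
cse_result = with_cse(cse_calls)
recompute_result = with_recompute(recompute_calls)
print(cse_calls[0], recompute_calls[0])
```

    Both options give the same values; option a pays with five extra live temporaries, option b pays with four times as many normalize calls in this particular setup.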

    The challenge is to find a balance. For the NV30, the limitation of 2 FP32 registers or 4 FP16 registers (with a huge penalty for exceeding it) was too much, and aggressive non-CSE would be needed, but you are then burning extra cycles. But once you get to around 8 registers, you won't need many more except in pathological cases.

    The biggest benefit of the extra registers comes from programming simplicity. Assembly language is easier to write, and the register allocator is simpler in the compiler.
     
  17. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    It's not very logical to release the R520 XT-PE a month before R580, but... we have seen many revisions of R520:
    A11 - prototype
    A13 - prototype
    A14 - production version, some X1800XLs
    A15 - production version, X1800XTs and some X1800XLs

    Dave mentioned an A23 version, which is based on a new silicon revision, but it wasn't used on any board I've seen. So my question is: could the A23 revision be the XT-PE chip? And secondly, how much faster could this revised chip be than A14/15?
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    I loved Forth, back in the early 80s. Damn, that could tempt me back into programming...

    Jawed
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    As far as I can tell, the XTs that the reviewers had all OC'd to about 650-675 if they were lucky, and the XLs OC to about 600-625.

    But the real XTs that have been around for a week or so seem to go to 700-725.

    Jawed
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.