Anand talk R580

AlphaWolf · Nov 16, 2005

_xxx_ said:
Rare exceptions are possible, but still 99% of e-tailers don't have it.

So buy it from the ones that do?

You are never going to trip over boxes of these high-end products while walking into your local computer shop. I can't even purchase a 7800GTX256 anywhere locally and it's been out for 5 months.

dizietsma · Nov 16, 2005

ATI are still lagging slightly in Europe:

Gainward GeForce 7800GTX 512MB GDDR3, PCI-Express,"U/3550PCX XP 512MB", 550Mhz
Our item no.: 315016 / 471846200-7548
Availability: Not in stock. Unconfirmed: 100 pcs, 2005-11-22

whereas the XTs typically look like this:

Retail
Our item no.: 314207 / 4710810936876
Availability: Not in stock. Unconfirmed: 289 pcs, 2005-12-01

Still, almost 300 pieces is pretty good going.

I think the PE might be 675/850 as a good first guess.

trinibwoy · Nov 16, 2005

dizietsma said:
I think the PE might be 675/850 as a good first guess.

Yep, I like those numbers. 2006 is going to be extremely interesting. :smile:

Martin Eddy · Nov 16, 2005

How long until we see memory speeds in excess of 1000 MHz (2000 MHz effective)?

Or will they simply go to 512-bit at 500 MHz?

_xxx_ · Nov 16, 2005

madmartyau said:
How long until we see memory speeds in excess of 1000 MHz (2000 MHz effective)?

Or will they simply go to 512-bit at 500 MHz?

I'd like to see 512-bit @ 1000 MHz.

I think memory will go serial instead, so I don't think we'll ever see anything more than 512-bit with the kind of memory we have now, if at all.
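For reference, the trade-off being discussed (wider bus vs. faster memory) comes down to theoretical peak bandwidth, which is roughly bus width in bytes times transfers per second. A minimal sketch, assuming DDR-style memory that transfers twice per clock and ignoring real-world efficiency; the function name is illustrative:

```python
def peak_bandwidth_gbs(bus_bits: int, clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Assumes DDR-style signalling (2 transfers per clock) unless told otherwise.
    """
    bytes_per_transfer = bus_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# 256-bit @ 1000 MHz and 512-bit @ 500 MHz land on the same theoretical figure.
for bus, clock in [(256, 1000), (512, 500), (512, 1000)]:
    print(f"{bus}-bit @ {clock} MHz: {peak_bandwidth_gbs(bus, clock):.1f} GB/s")
```

This is why "512-bit at 500 MHz" is posed as an alternative to 1 GHz memory: on paper it buys the same bandwidth, at the cost of a much larger package and board.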

Jawed · Nov 16, 2005

madmartyau said:
How long until we see memory speeds in excess of 1000mhz (2000mhz effective)?

Or will they simply go to 512 bit at 500 mhz?

I think GDDR3 is supposed to be good for 1GHz+. All the same, GDDR4 is close (6 months?) and that should start in the region of 1GHz.

Jawed

Putas · Nov 16, 2005

Also expect the first 1 GHz GPUs around that time; the race is on.
512-bit bus... probably never.

Khronus · Nov 16, 2005

Putas said:
Also expect the first 1 GHz GPUs around that time; the race is on.
512-bit bus... probably never.

The core package would have to be a monster to have room for a 512-bit bus!

DemoCoder · Nov 16, 2005

Mintmaster said:
WTF am I supposed to think? You are clearly saying the 7800GTX 256MB is 100% faster than the 6800U (or GT) in ShaderMark. Go to Dave's 7800GTX 256MB review, and it's 60% faster, not 100%.

That was a typo. The context of my message makes it clear that I was arguing about simple pipeline/clock scalability and not architectural changes, just like you are trying to extrapolate the performance of an R580 by quadrupling the RV350.

GTX256 / 6800 is 430/400 MHz * 24/16 = 1.61, a 61% theoretical increase (not counting slightly more memory), and this matches pretty well. GTX512 / 6800 = 550/400 * 24/16 = 2.06, which also maps pretty well to the benchmarks. If we carry this further, a hypothetical 32-pipe 90nm core @ 650 MHz would be 650/400 * 32/16 = 3.25 relative to the 6800. Since an X1600XT seems on par with a 6800GS, we can do the analysis on that: 650/425 * 32/12 = 4.07 relative to the 6800GS. Now, if you take a 12-"pipe" X1600XT and quadruple it, you get a 4.0 ratio.
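The back-of-envelope model in the paragraph above can be sketched as follows: theoretical throughput is taken as core clock times pipeline count, relative to a baseline card. Like the post, it deliberately ignores memory bandwidth and architectural differences; the function name is illustrative and the clock/pipe figures are the ones quoted above.

```python
def relative_throughput(clock_mhz, pipes, base_clock_mhz, base_pipes):
    """Ratio of (clock * pipes) against a baseline; ignores memory and architecture."""
    return (clock_mhz / base_clock_mhz) * (pipes / base_pipes)


# (clock MHz, pipes, baseline clock MHz, baseline pipes), as quoted in the post
cases = {
    "GTX256 vs 6800": (430, 24, 400, 16),
    "GTX512 vs 6800": (550, 24, 400, 16),
    "32-pipe @ 650 vs 6800": (650, 32, 400, 16),
    "32-pipe @ 650 vs 6800GS": (650, 32, 425, 12),
}

for name, args in cases.items():
    print(f"{name}: {relative_throughput(*args):.2f}x")
```

The last case, about 4.08x, is what makes the comparison with a quadrupled 12-"pipe" part (exactly 4.0x) so close.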

Thus, I don't think an R580 and a 32-pipe G7x clocked similarly will be a blowout, especially since R580's advantage depends on a very heavy ALU workload. For any kind of smackdown to occur, we would have to see best-case workloads for the R580 that put the G7x in a bad light. If R580 ships anytime in the next 6 months, it will be dealing with a paltry few games that can really show off its advantages, so NVidia would be justified in shipping a G7x refresh as a stopgap to fill the spring/summer gap ahead of a fall/winter G80 release.

Turtle 1 · Nov 16, 2005

Even though I usually buy the top-of-the-line ATI card, it's not speed that I am after. Does the word "graphics" imply speed or visualization? OK, so we all know it implies visualization. That's why I buy ATI: it just looks the best and is always the fastest single card you can buy. The 7800 GTX 512 came too early, which gives ATI a chance to bring out the X1800 XT PE.
So at the end of the year ATI will have the fastest card at the lowest price. Way to go, nvididiot.
I have delivered 3 PCs with X1800 XLs in them in the last week, all of them (and this is the amazing part) with Intel CPUs. People like Intel on water.
Here's what the customer gets for his 8500 PCMark05 score:
Intel 650 (3.4 GHz @ 4.0 GHz), DDR2-533 @ DDR2-640, 3-2-2-8: 3DMark05 8265, cost $230

Intel 660 (3.6 GHz @ 4.2 GHz), DDR2-533 @ DDR2-640, 3-2-2-8: 3DMark05 8469, cost $400

Intel 670 (3.8 GHz @ 4.4 GHz), DDR2-533 @ DDR2-640, 3-2-2-8: 3DMark05 8703, cost $600

All these scores were attained with an X1800 XL at 625 MHz core and stock memory, all water-cooled. These are the scores I got on these PCs before they were sent to customers.

Now, I really don't see what the big deal is about a 7800 GTX 512 PC getting a 3DMark score of 9400; that's with an AMD FX-57, a $1000 CPU.
I will probably never know what an X1800 XT PE gets, as I will never buy one, but I would guess around 13000 points in 3DMark05 at 750 core x 800 memory with an Intel 3.8 GHz @ 4.8 GHz.

Why would these people pay $5000 for a PC with a cheaper video card? Because it was a free card. That's right, I gave them the X1800 XLs free, because they bought R600s today at $750. But those aren't out yet, so I had to give them something. All are very happy.

KimB · Nov 16, 2005

Turtle 1 said:
That's why I buy ATI: it just looks the best and is always the fastest single card you can buy. The 7800 GTX 512 came too early, which gives ATI a chance to bring out the X1800 XT PE.
So at the end of the year ATI will have the fastest card at the lowest price. Way to go, nvididiot.

That's, uh, not true. The 7800 GTX 512 is available now, and is faster than anything ATI can offer. If angle-independent filtering is extremely important to you, obviously the R520 is going to be better, but other than that the G70 looks every bit as good as the R5xx. It's better in some cases where you can enable supersampling AA (e.g. UT2004).

Moloch · Nov 16, 2005

3DMark

Ya, that's great and all, but how 'bout you play some real games at 1600x1200 with 4x FSAA and 16x AF and tell us that we don't need anything faster than an X1800 XL.

Turtle 1 · Nov 16, 2005

Oh, it's true, and the X1800 XLs are out, and the X1800 XTs are out, and the X1800 XT PE will arrive before year's end. If you don't think shimmering matters, you're a fool who is just hanging on to a brand name. If NVIDIA looked better than ATI, I would buy NVIDIA. The plain truth is ATI looks better.

Now, if some hardware site would overclock a 7800 GTX 512 and the X1800 XT to their highest stable overclocks, I believe ATI would come out on top. Now let's not flame about this, as I am sure that a hardware site will do just that within the next 2 weeks.
Now let's talk paper launch: the 7800 GTX 512 sold out in 2 days. More available next week, maybe. The only thing I know for a fact is that when they get restocked, you should look for prices around $800 and not the $750 they're selling for now.
ATI was really late to the table and for all intents and purposes lost this round. But at the end of the year ATI will have the fastest single card available in 2005, same as 2004, 2003 and 2002. Need I say more? For you NVIDIA fans, I sell crying towels for $5. E-mail me and I will give you a free NVIDIA crying towel.

Mintmaster · Nov 16, 2005

Chalnoth said:
Well, the output limitation really isn't all that much of an issue. It's not hard at all to think of a mathematical situation where it may become useful to make use of many registers, while still only having one output. One simple case would be integration. Another would be the product of a vector, a matrix and another vector (the result of which is just a number).

You're not thinking about this from a compiler's viewpoint and the dependency tree. You don't need to have the whole function you're integrating in memory at once. You don't need to have the whole matrix in memory at once. More registers only saves you redundant loads of the smallest matrix in the calculation, leading to a negligible advantage.

For example, consider the vector output from V1*M1*M2.
Dimensions:
V1: 1 x 1,000
M1: 1,000 x 1,000
M2: 1,000 x 4

Algorithm:

Code:

1. Clear r2-r6.

2. Load the first 4 elements of V1 into r0.

3. A set of 4 substeps:
 - Load the first 4 elements of column 1 of M1 into r1. r2.x += r0 dot r1. 
 - Load the first 4 elements of column 2 of M1 into r1. r2.y += r0 dot r1. 
 - Load the first 4 elements of column 3 of M1 into r1. r2.z += r0 dot r1. 
 - Load the first 4 elements of column 4 of M1 into r1. r2.w += r0 dot r1.

4. Repeat steps 2-3, using the next 4 elements in step 2 and step 3.
   Do 249 times until you do all 1,000 elements.

5. A set of 4 substeps:
 - Load row 1 of M2 into r0. r3 += r2.x * r0. 
 - Load row 2 of M2 into r0. r4 += r2.y * r0. 
 - Load row 3 of M2 into r0. r5 += r2.z * r0. 
 - Load row 4 of M2 into r0. r6 += r2.w * r0.

6. Clear r2, then repeat steps 2-5, using the next 4 columns in step 3
   and the next 4 rows in step 5. Do this 249 times until you do all
   1,000 columns/rows.

7. The final product is r3+r4+r5+r6.

Voila! Over a million matrix elements, and it took only 7 registers.
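The steps above can be checked with a small Python sketch. It models each register as a 4-element list (r0..r6) and compares the result with a straightforward V1*M1*M2. Two assumptions worth flagging: r2 has to be cleared before each new group of 4 columns, otherwise the previous group's dot products leak into step 5; and N = 16 is used instead of 1,000 only to keep the run short. Function names are illustrative.

```python
import random


def dot4(a, b):
    """4-wide dot product, standing in for a shader DP4."""
    return sum(x * y for x, y in zip(a, b))


def blocked_product(v1, m1, m2):
    """V1 (1xN) * M1 (NxN) * M2 (Nx4) using only seven 4-wide registers r0..r6.

    m1 is indexed m1[row][col]; N must be a multiple of 4.
    """
    n = len(v1)
    assert n % 4 == 0
    r = [[0.0] * 4 for _ in range(7)]               # step 1: clear registers
    for c in range(0, n, 4):                        # step 6: next 4 columns / rows
        r[2] = [0.0] * 4                            # clear partial dot products
        for k in range(0, n, 4):                    # steps 2-4: walk down V1
            r[0] = v1[k:k + 4]                      # step 2: 4 elements of V1
            for j in range(4):                      # step 3: 4 columns of M1
                r[1] = [m1[k + i][c + j] for i in range(4)]
                r[2][j] += dot4(r[0], r[1])
        for j in range(4):                          # step 5: 4 rows of M2
            r[0] = m2[c + j]
            r[3 + j] = [acc + r[2][j] * x for acc, x in zip(r[3 + j], r[0])]
    return [r[3][i] + r[4][i] + r[5][i] + r[6][i] for i in range(4)]  # step 7


def naive_product(v1, m1, m2):
    """Reference result: form V1*M1 in full, then multiply by M2."""
    n = len(v1)
    t = [sum(v1[i] * m1[i][j] for i in range(n)) for j in range(n)]
    return [sum(t[j] * m2[j][col] for j in range(n)) for col in range(4)]


random.seed(0)
N = 16
v1 = [random.random() for _ in range(N)]
m1 = [[random.random() for _ in range(N)] for _ in range(N)]
m2 = [[random.random() for _ in range(4)] for _ in range(N)]
print(blocked_product(v1, m1, m2))
print(naive_product(v1, m1, m2))
```

The two printed vectors should agree up to floating-point rounding, while the blocked version never holds more than seven 4-wide values at once.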

So there is a bit of redundancy in loading V1: you load V1 a total of 1,000/4 = 250 times. So if you're load limited, performance scales as 1/(1 + 1/n), where n is the number of M1 columns processed per pass over V1 and 2 + 1.25n is the number of registers used. Jumping from n=4 (as above) to n=8 means you get 11% more performance. Meaningless. Furthermore, using more temporary registers will hurt latency hiding. Yes, that was more of an issue with NV30, but don't for a minute think that ATI and NVidia wasted transistors putting in enough FIFOs to keep 32 registers of data in flight with no reduction in latency hiding.
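A quick sketch of the trade-off in the paragraph above, using the post's own formulas, with n taken as the number of M1 columns processed per pass over V1: r0 and r1 for loads, n/4 registers of partial dot products, and n accumulators. The load-limited model (one V1 load per n column loads) is the post's assumption, not a measurement.

```python
def registers_used(n):
    """r0 + r1 for loads, n/4 registers of partial dot products, n accumulators."""
    return 2 + 1.25 * n


def relative_performance(n):
    """Load-limited throughput: every n useful column loads cost one extra V1 load."""
    return 1 / (1 + 1 / n)


for n in (4, 8, 16):
    print(n, registers_used(n), round(relative_performance(n), 3))

gain = relative_performance(8) / relative_performance(4) - 1
print(f"n=4 -> n=8: {gain:.1%} faster for {registers_used(8) - registers_used(4):.0f} more registers")
```

Going from 7 to 12 registers buys exactly 1/9 (about 11%) more throughput, and each further doubling buys less.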

Believe me, I thought of matrix multiplication too, as that's seemingly the most obvious answer. However, you really don't need many temporaries in matrix multiplication. 2D integration is similar to matrix multiplication.

In order to need lots of registers, you need a shader where you're doing many things in parallel and sharing data between those many things. Parallelism and interdependency all within a pixel shader? :???:

I'm stumped.

Maybe DemoCoder or Colourless can think of something.

Mintmaster · Nov 16, 2005

Turtle 1 said:
I will probably never know what an X1800XTpe gets as I will never buy 1 But I would guess in the 13000 points in 3Dmark 05 2 750core x 800memory With an Intel 3.8@4.8GHz

You know, you're not doing ATI a favour making BS claims like that. At the very best you might see 10k. Besides, 3DMark05 isn't really a good gauge of anything.

DemoCoder · Nov 16, 2005

I read a paper a long time ago on compiler register allocators that did an extremely large analysis of millions of lines of code and concluded that there is almost never a need for more than 7 registers.

Extra registers are useful for time-space tradeoff if you want to eliminate common subexpressions. If you don't use the extra registers, you just have to recompute some values that get written over.

The best way to think about it is in terms of a stack machine architecture. If you look at Forth programs, for example, the stack for any given method never gets more than a few elements deep, especially methods that leave only one value as a result.
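The stack-depth intuition can be made concrete with Sethi-Ullman numbering, a classic textbook technique swapped in here for illustration (not necessarily what the paper mentioned above used). It counts the registers needed to evaluate an expression tree without spilling, the register-machine analogue of maximum stack depth. A minimal sketch, assuming binary operators and leaves that must each be loaded into a register; the `Node` type and helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Expression-tree node (hypothetical representation); leaves have no children."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def registers_needed(node: Node) -> int:
    """Sethi-Ullman number: minimum registers to evaluate the tree without spilling."""
    if node.left is None and node.right is None:
        return 1                                   # a leaf is loaded into one register
    if node.right is None:                         # unary op reuses its operand's register
        return registers_needed(node.left)
    left, right = registers_needed(node.left), registers_needed(node.right)
    # Evaluate the costlier side first; only a tie costs an extra register.
    return max(left, right) if left != right else left + 1


def leaf(name: str) -> Node:
    return Node(name)


def op(label: str, a: Node, b: Node) -> Node:
    return Node(label, a, b)


# A long running sum ((x0 + x1) + x2) + ... stays at 2 registers however long it gets.
chain = leaf("x0")
for i in range(1, 20):
    chain = op("+", chain, leaf(f"x{i}"))

# A 3-component dot product: (ax*bx + ay*by) + az*bz
dot3 = op("+",
          op("+", op("*", leaf("ax"), leaf("bx")), op("*", leaf("ay"), leaf("by"))),
          op("*", leaf("az"), leaf("bz")))

print(registers_needed(chain), registers_needed(dot3))
```

The count only grows when both sides of an operator are equally "deep", which is why typical single-result expressions stay within a handful of registers.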

For shaders, if you have, say, 10 expressions that reuse 5 normalized results (at the same time), the compiler can either a) stick those 5 normalized results into registers and reuse them in the 10 expressions, or b) recompute the subexpressions (normalize) each time they are needed.

The challenge is to find a balance. For the NV30, the limitation of 2 FP32 registers or 4 FP16 registers (with a huge penalty for exceeding it) was too much, and aggressive recomputation instead of CSE would be needed, but you then burn extra cycles. But once you get to around 8 registers, you won't need many more except in pathological cases.
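A toy cost model of that balance, under hypothetical assumptions: each of the 10 expressions uses all 5 normalized results, every cached result occupies one register, and anything not cached is renormalized at every use. The caching policy (most-reused results first) and the function name are illustrative, not how any particular shader compiler decides.

```python
def normalizations(uses_per_value, cached_registers):
    """Count normalize() evaluations when up to `cached_registers` shared results
    stay live in registers and the rest are recomputed at every use."""
    # Hypothetical policy: keep the most frequently reused results in registers.
    order = sorted(uses_per_value, reverse=True)
    total = 0
    for i, uses in enumerate(order):
        total += 1 if i < cached_registers else uses
    return total


uses = [10] * 5  # 5 normalized results, each used by all 10 expressions
for regs in range(7):
    print(f"{regs} registers for CSE -> {normalizations(uses, regs)} normalizations")
```

Each extra cached register saves nine normalizations here until all five results fit; beyond that, more registers buy nothing, which is the diminishing-returns point made above.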

The biggest benefit of the extra registers comes from programming simplicity: assembly language is easier to write, and the register allocator in the compiler is simpler.

no-X · Nov 16, 2005

ATI was quite clear they will be introducing a "PE version" of X1800XT to compete with 7800GTX 512.

It's not very logical to release an R520XT-PE a month before R580, but... we have seen many revisions of R520:
A11 - prototype
A13 - prototype
A14 - production version, some X1800XLs
A15 - production version, X1800XTs and some X1800XLs

Dave mentioned an A23 version, which is based on a new silicon revision, but it wasn't used on any board I've seen. So my question is: could the A23 revision be the XT-PE chip? And secondly, how much faster could this revised chip be than A14/A15?

Jawed · Nov 16, 2005

DemoCoder said:
If you look at Forth programs for example,

I loved Forth, back in the early 80s. Damn, that could tempt me back into programming...

Jawed

Jawed · Nov 16, 2005

no-X said:
It's not very logical to release an R520XT-PE a month before R580, but... we have seen many revisions of R520:
A11 - prototype
A13 - prototype
A14 - production version, some X1800XLs
A15 - production version, X1800XTs and some X1800XLs

Dave mentioned an A23 version, which is based on a new silicon revision, but it wasn't used on any board I've seen. So my question is: could the A23 revision be the XT-PE chip? And secondly, how much faster could this revised chip be than A14/A15?

As far as I can tell, the XTs that reviewers have overclocked reached about 650-675 MHz if they were lucky, and XLs about 600-625 MHz.

But the real XTs that have been around for a week or so seem to go to 700-725.

Jawed

Turtle 1 · Nov 16, 2005

I think 13000 is in line:
http://www.nforcershq.com/article4191.html
