View Full Version : The NEXT LAST R600 Rumours & Speculation Thread


Jawed
10-May-2007, 13:16
ADD, MUL, MAD - you name it.


Because that is what you'll get normally. If there's no macros involved for example.


Because sometimes it could be possible to schedule a scalar op in addition to the normal vec4-op.
OK, so you're changing the definition of an op depending on whether there's 1/2 or 5...

Jawed

satein
10-May-2007, 13:19
If you believe The Inquirer, not.

http://www.theinquirer.net/default.aspx?article=39526

It's quite a long article, but a couple of choice quotes:

It would be interesting to believe them, as I found a preview of the upcoming laptop with an HD2600 under test, which I posted here...
http://forum.beyond3d.com/showpost.php?p=984178&postcount=357

In short, this is the preview score related to the graphics system...

HP Pavilion HDX Entertainment Notebook PC
We expected to see high scores from the HDX, and we weren't disappointed. This preproduction model wouldn't run PCMark05, but its 3DMark03 score of 12,240 and 3DMark06 score of 4,002 mean this system can plow through graphics-intensive applications. Strangely, the HDX notched 31 fps on our F.E.A.R gaming test on autodetect, and 26 fps with the settings maxed out. (We suspect this subpar showing might be a driver issue related to Windows Vista.)

I believe that the delay on consumer RV610/630 is mainly due to the chips needing to go to laptop manufacturers...

NocturnDragon
10-May-2007, 13:25
Workstation cards have certain features activated over the regular consumer cards (and some OGL extensions), which do improve performance in certain 3D apps, like Max and Maya or other digital art programs.

Yes Razor, that's what I said. :P

I was asking what was the meaning of the comparison between the quadro and the other 2 cards. As they are in totally different segments with totally different prices.

neliz
10-May-2007, 13:28
I believe that the delay on consumer RV610/630 is mainly due to the chips needing to go to laptop manufacturers...

Indeed, there is nothing wrong with the performance of the 2600, it's a better choice than the 8600.

vertex_shader
10-May-2007, 13:29
If R600 sucks ATI would be the first to know so I doubt they were surprised by any of this. Hopefully it's just software issues at work here cause I can't fathom how that calibre of hardware could be having trouble with a significantly hobbled G80 part.

It can be software problems, but fixing these problems needs time, and that is what AMD doesn't have now.

Razor1
10-May-2007, 13:29
Yes Razor, that's what I said. :P

I was asking what was the meaning of the comparison between the quadro and the other 2 cards. As they are in totally different segments with totally different prices.

Ah sorry, misunderstood :oops:

vertex_shader
10-May-2007, 13:31
Indeed, there is nothing wrong with the performance of the 2600, it's a better choice than the 8600.

You mean AMD will hard launch the mobile parts next Monday?

CarstenS
10-May-2007, 13:31
OK, so you're changing the definition of an op depending on whether there's 1/2 or 5...

Jawed
Nope. Sorry for the confusion. An op is the issue of an ADD, MUL.... whatever over any number of scalars/vectors.

So a scalar-ADD and a Vec5-MUL are both 1 op.

If you've got a purely scalar architecture, you'd have one instruction per op. If you've got a vector-n-processor, you get n instructions per op. Or am I the one confusing things?
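
To make the counting concrete, here is a minimal Python sketch of that definition. The `Op` class and both helper functions are hypothetical names made up for illustration, not anything from a real shader compiler: an op is one issued ADD/MUL/MAD regardless of width, while a purely scalar machine would need one instruction per lane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    opcode: str  # e.g. "ADD", "MUL", "MAD"
    width: int   # scalar lanes covered by the op (1 = scalar, 5 = Vec5)


def count_ops(program):
    """Each issued operation counts once, whatever its width."""
    return len(program)


def count_scalar_instructions(program):
    """On a purely scalar architecture every lane needs its own instruction."""
    return sum(op.width for op in program)


program = [Op("ADD", 1), Op("MUL", 5)]
print(count_ops(program))                  # 2
print(count_scalar_instructions(program))  # 6
```

So the scalar ADD plus the Vec5 MUL are 2 ops either way, but 6 instructions if everything has to be issued one scalar at a time.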

Galduta
10-May-2007, 13:32
And these contradictory numbers?? Are they true?

http://gathering.tweakers.net/forum/list_messages/1217141/17


BENCHMARKS

FarCry 1.4 SM3 HDR [1280x1024]
- 8800 GTX: 123.3 fps
- 2900 XT: 119.3 fps
- 8800 GTS: 106.2 fps
- X1950XT: 85.8 fps

FarCry 1.4 SM3 HDR [1920x1200 - 4x AA 8x AF]
- 8800 GTX: 76.7 fps
- 2900 XT: 70.2 fps
- 8800 GTS: 54 fps
- X1950XT: 50.1 fps

Prey [1280x1024]
- 8800 GTX: 184.3 fps
- 2900 XT: 131.2 fps
- 8800 GTS: 127.3 fps
- X1950XT: 117.6 fps

Prey [1920x1200 - 4x AA 8x AF]
- 8800 GTX: 89.9 fps
- 2900 XT: 70.5 fps
- 8800 GTS: 65.5 fps
- X1950XT: 55 fps

Splinter Cell: Chaos Theory SM3 HDR [1280x1024]
- 2900 XT: 142.5 fps
- 8800 GTX: 131.7 fps
- X1950XT: 94.2 fps
- 8800 GTS: 94 fps

Splinter Cell: Chaos Theory SM3 HDR [1600x1200 - 4x AA 8x AF]
- 8800 GTX: 85.6 fps
- 2900 XT: 86.5 fps
- 8800 GTS: 68.5 fps
- X1950XT: 64.7 fps

X3: The Reunion [1280x1024]
- 2900 XT: 118.5 fps
- 8800 GTX: 99.2 fps
- X1950XT: 97 fps
- 8800 GTS: 88.1 fps

X3: The Reunion [1920x1200 - 4x AA 8x AF]
- 8800 GTX: 74 fps
- X1950XT: 65.4 fps
- 8800 GTS: 58.8 fps
- 2900 XT: 56.5 fps

Company of Heroes [1280x1024]
- 8800 GTX: 142.5 fps
- 2900 XT: 121.4 fps

Company of Heroes [1920x1200 - 4x AA 8x AF]
- 8800 GTX: 75.8 fps
- 2900 XT: 69.5 fps

S.T.A.L.K.E.R. [1280x1024] *
- 8800 GTX: 107.8 fps
- 2900 XT: 96.2 fps

S.T.A.L.K.E.R. [1920x1200 - 4x AA 8x AF] *
- 8800 GTX: 87 fps
- 2900 XT: 46.9 fps


*ohh yeahh a new nice DX 8 bench :evil:
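
For anyone who wants the gaps as percentages rather than raw fps, a small Python sketch follows. The function name `relative_gap` is my own, and only three of the result pairs listed above are copied in as examples; nothing here is from the original testers.

```python
def relative_gap(card_fps, reference_fps):
    """Percentage by which card_fps is above (+) or below (-) reference_fps."""
    return (card_fps - reference_fps) / reference_fps * 100.0


# (2900 XT fps, 8800 GTX fps) pairs taken from the list above.
results = {
    "FarCry 1.4 [1280x1024]": (119.3, 123.3),
    "X3: The Reunion [1280x1024]": (118.5, 99.2),
    "S.T.A.L.K.E.R. [1920x1200 - 4x AA 8x AF]": (46.9, 87.0),
}

for name, (xt_fps, gtx_fps) in results.items():
    print(f"{name}: 2900 XT vs 8800 GTX {relative_gap(xt_fps, gtx_fps):+.1f}%")
```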

trinibwoy
10-May-2007, 13:34
And these contradictory numbers??

http://gathering.tweakers.net/forum/list_messages/1217141/17

Was about to post those. They are much more reasonable. At least it's beating the GTS as expected.

neliz
10-May-2007, 13:36
You mean AMD will hard launch the mobile parts next Monday?

no.

neliz
10-May-2007, 13:40
Was about to post those. They are much more reasonable. At least it's beating the GTS as expected.

The guy is one of the editors at hardware.info
The 1950XTX numbers are with 1950XTX release drivers..

KTE
10-May-2007, 13:41
it's so true ! and talking about gaming performance, some food to mao5 :

source: nordichardware forum (http://www.nordichardware.com/forum/next-vt8428.html?postdays=0&postorder=asc&start=25)

Not necessarily a failure. Read the same thread onwards. ;)

Add this in there too.

R600 @ IT-review:

Those guys broke the NDA, that should tell you something. They do have the cards, that isn't a lie

4 NDA guys that I've personally spoken to said quite clearly those figures are like those of the older drivers, not the ones they are using.

They also said its a pre-release NDA breaking review. Take it with a brick of salt and wait for the real reviews before making conclusions or spreading false opinions.

Sampsa and Denny were both using older drivers for the info you've been given online so far (3 hours before now at least that I've seen). Drivers seem to actually affect performance quite a bit for this card, hence they are the key.

Yesterday there was supposed to be another driver released that fixes more issues. Thus even for pre-release that's more like a beta preview.

The test rig used was ASUS M2N32-SLI Premium + AMD X2 6000+ FWIW, since they make it quite ambiguous.

vertex_shader
10-May-2007, 13:47
And these contradictory numbers?? Are they true?

http://gathering.tweakers.net/forum/list_messages/1217141/17


The R600 sun is shining again now. Looks like I made some comments too early; the best solution is to wait until Monday before drawing deep conclusions :smile:

Cuthalu
10-May-2007, 13:50
I've heard A LOT about some phenomenal "new drivers" for a long, long time. How come "nda-breaking review" has "old" drivers when they should have the same drivers as other people with nda?

Maybe it's true, but I'm starting to be very skeptical about these drivers.

Geo
10-May-2007, 13:50
Will B3D have a review up on the day NDA lifts?

Well, we actually have a three part process for new architecture launches. Arch, IQ, Performance. Rys would obviously like to have all three up, but whether he will or not isn't entirely clear to me yet. At any rate, the timing will be much more compact than the G80 triumvirate, as there isn't a new site to write extending things.

dizietsma
10-May-2007, 13:53
Was about to post those. They are much more reasonable. At least it's beating the GTS as expected.

They look better to me as well. From the results it's rather uncanny how well it finds its own niche between the GTX and GTS, at least in current DX9 games. Perhaps it (more than?) matches the GTX in future games, if it is as forward looking as people have suggested and history tends to show?

If those numbers are true then I would guess that the performance of the 8800 GTS/GTX when it was released probably did catch ATI by surprise, and perhaps partly explains the delay in release rather than X, Y or Z reasons?? After all, if the 320MB/640MB GTS was faster than the 2900XT as well as the GTX, that would have been not very good at all, but now the 2900XT seems to provide another valid choice for the user contemplating the $ to performance ratio.

nelg
10-May-2007, 13:57
For me, one of the most interesting things that might come out of this launch is going to be B3D's review itself. The gauntlet Dave threw down in his excellent Xenos article sets a pretty high benchmark in its ability to explain a new and complex architecture. I trust that Rys et al will rise to the occasion though.

Razor1
10-May-2007, 13:58
I've heard A LOT about some phenomenal "new drivers" for a long, long time. How come "nda-breaking review" has "old" drivers when they should have the same drivers as other people with nda?

Maybe it's true, but I'm starting to be very skeptical about these drivers.


The latest driver was sent out yesterday.

Galduta
10-May-2007, 13:58
The R600 sun is shining again now. Looks like I made some comments too early; the best solution is to wait until Monday before drawing deep conclusions :smile:

I had the chance to test a 2900 XT for a small review, but I did not accept. But the suspense is killing me ;) ....The card is 15 km from my house

Geo
10-May-2007, 13:58
vr-zone's roadmap (aug. 2005)
http://www.vr-zone.com/?i=2612&s=1
So HEY!.. they missed one quarter with R600...



You did note the "F" in those, one hopes? As in Financial year? Which if I'm not mistaken, for the old pre-merger ATI ended (edited) August 31.

Example: http://media.corporate-ir.net/media_files/phoenix/client/10/105421/quarterresults/ATI_Q405.pdf

IbaneZ
10-May-2007, 14:00
The latest driver was sent out yesterday.

It's gonna be a busy weekend for the editors then. :smile:

Dalton Sleeper
10-May-2007, 14:02
Is STALKER that demanding, isn't there any card that can run it at max settings? SLI/X-Fire?

KTE
10-May-2007, 14:03
Oh and yeah. The 2x 6-pin work with the cards perfectly, even for overclocking according to guys I've spoken to. But even they did not see it safe nor wise to do so and are using the correct 8-pin PCIe.

The whole purpose of the 8-pin PCIe was to include 2 ground wires at present to make it safer, as the +12V current can potentially be double what it was before with the 6-pin, so it's definitely advised to not take a chance and use the correct 8-pin PCIe, since most PSU MFGs will make new revisions of PSUs to comply with the new specs, or alternatively dish out the adapters.

Since it's the new standard anyway, as partners deemed the power draw to increase thus initiating PCISIG to change the specs, it's only wise to switch over to it now when and IF you can.

And there's NO way that only 250W is pulled under max load by a total quad core system with one of those cards. Some people really don't know how to test their system power draw and end up throwing around bunk:

Jon Gerow aka Jonny Guru with his system testing:
http://www.jonnyguru.com/forums/showpost.php?p=20636&postcount=8
http://www.jonnyguru.com/forums/showpost.php?p=20644&postcount=14

795W pulled and 998W pulled VAC, 80% efficiency.
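
Reading those two figures as roughly 795 W delivered on the DC side for 998 W drawn at the wall (my interpretation; the post does not spell it out), the 80% figure checks out. A minimal Python sketch of the arithmetic, with hypothetical function names, plus what a 250 W DC load would mean at the wall at the same assumed efficiency:

```python
def psu_efficiency(dc_output_watts, ac_input_watts):
    """Fraction of the wall (AC) power that reaches the components as DC."""
    return dc_output_watts / ac_input_watts


def wall_draw(dc_load_watts, efficiency):
    """AC power pulled from the wall for a given DC load."""
    return dc_load_watts / efficiency


print(f"efficiency: {psu_efficiency(795, 998):.1%}")
print(f"250 W DC load at 80% efficiency: {wall_draw(250, 0.80):.1f} W at the wall")
```

The point for comparing power numbers is that a wall-socket reading and a DC-side reading of the same system differ by the PSU efficiency, so they cannot be compared directly.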

http://img208.imageshack.us/img208/1584/dsc01220uz2.jpg

http://img248.imageshack.us/img248/449/9983wib7.jpg


They even managed to pull over 1000W with that L1N64 -SLI WS + 2 FX74 + 2 G80GTX in SLI under benching load.
So it's not the pioneers of computer technology and those who engineer all we have that don't know what they're talking about when they suggest you get xxxW PSUs for that rig just in case, but it's the other way round ;)

Geo
10-May-2007, 14:09
If those numbers are true then I would guess that the performance of the 8800 GTS/GTX when it was released probably did catch ATI by surprise, and perhaps partly explains the delay in release rather than X, Y or Z reasons?? After all, if the 320MB/640MB GTS was faster than the 2900XT as well as the GTX, that would have been not very good at all, but now the 2900XT seems to provide another valid choice for the user contemplating the $ to performance ratio.


If one eliminates the "I'm not sure what he's smoking, but can I have a hit?" sites, and looks at the more sober sites, then it seems to me the target specs have been pretty consistent going back to last spring, and there's no evidence of a "G80-driven spec bump" in fall of last year.

neliz
10-May-2007, 14:17
You did note the "F" in those, one hopes? As in Financial year? Which if I'm not mistaken, for the old pre-merger ATI ended (edited) August 31.


yeah. .the 2005 report wasn't that swanky.. the 2006 doesn't mention financial quarters though.

dizietsma
10-May-2007, 14:19
If one eliminates the "I'm not sure what he's smoking, but can I have a hit?" sites, and looks at the more sober sites, then it seems to me the target specs have been pretty consistent going back to last spring, and there's no evidence of a "G80-driven spec bump" in fall of last year.

I was thinking more of number of revisions of core and speed attained than a specification change Geo. Granted, it is all speculation :).

Fornowagain
10-May-2007, 14:23
Link (http://forums.hardwarezone.com/showthread.php?t=1607936)

okey dokey. i can only consider myself one of those lucky guys to get my hands on some new toys (retail version) before it's launched next week for some testing.

however, i encounter some issues in running benchmarks on Win XP, thus have to do it on Windows Vista. Would feedback to ATI on this issue & see if there's any new or official drivers to solve it.

& of cos, not forgetting it's challenger... i used the 88gts 320MB from Asus as a comparison.

a teaser for all interested.

Test Platform

Intel E6600
Asus P5B Deluxe
2 x 1GB D9GMH KVR
Seagate SATA 80GB HDD
Windows Vista Home Premium

& let's go

8800GTS Stock
http://img513.imageshack.us/img513/3323/883d06defaultby2.jpg (http://imageshack.us)

ATI Stock
http://img155.imageshack.us/img155/7308/3d06defaulteditedrm3.jpg (http://imageshack.us)

ATI Overdrive MAX OC
http://img155.imageshack.us/img155/8237/3d06gmaxeditedbi9.jpg (http://imageshack.us)

vertex_shader
10-May-2007, 14:23
Barcelona and R600 (http://www.google.com/translate?u=http%3A%2F%2Fnews.mydrivers.com%2F1%2F82%2F82887.htm&langpair=zh%7Cen&hl=en&ie=UTF8)
http://img509.imageshack.us/img509/3342/barcelonar600xx6.jpg

chavvdarrr
10-May-2007, 14:25
about these "magic" drivers, I wonder if there will be reviews with renamed filenames.

Cuthalu
10-May-2007, 14:27
58C at idle. It might be possible to make it quite cool with underclocking and undervolting. :)

Bjorn
10-May-2007, 14:27
Was about to post those. They are much more reasonable. At least it's beating the GTS as expected.

Yep, looks a lot better. Still a big disappointment considering the available bandwidth but certainly not a new NV30.

Jawed
10-May-2007, 14:27
Or am I the one confusing things?
I don't know, it might be a red herring.

Jawed

neliz
10-May-2007, 14:27
Link (http://forums.hardwarezone.com/showthread.php?t=1607936)


Lol, doesn't this guy need to uninstall his ATI drivers before he can use his 8800? (Hint: ATI logo in the 8800 benchmark pictures.)

IbaneZ
10-May-2007, 14:28
And these contradictory numbers?? Are they true?

http://gathering.tweakers.net/forum/list_messages/1217141/17




Now that's better. Weird though how the R600 just chokes at high res in some games.

Link (http://forums.hardwarezone.com/showthread.php?t=1607936)





Finally, more synthetic benchies. I can't get enough! :smile:

Bjorn
10-May-2007, 14:29
about these "magic" drivers, I wonder if there will be reviews with renamed filenames.

I kinda like to see stuff like that. As long as we don't start to hear "they're cheating", even though there might not be any visible IQ changes.

Bjorn
10-May-2007, 14:30
Now that's better. Weird though how the R600 just chokes at high res in some games.

One or two benchmarks with weird numbers is to be expected with a new card.

Kocur
10-May-2007, 14:38
So, AMD/ATI were intentionally misleading people (that is, lying) when claiming that their drivers were in a 'superb' condition?

And they were lying about family launch?

If true, this is lame and unprofessional.

neliz
10-May-2007, 14:44
So, AMD/ATI were intentionally misleading people (that is, lying) when claiming that their drivers were in a 'superb' condition?

What, because some people who are not under NDA have problem running cards with old beta drivers?


And they were lying about family launch?
May 14th 2007.

Kaotik
10-May-2007, 14:44
So, AMD/ATI were intentionally misleading people (that is, lying) when claiming that their drivers were in a 'superb' condition?

And they were lying about family launch?

If true, this is lame and unprofessional.

It might mean "superb" in terms of stability & compatibility, and performance still coming up, too?
edit:
Also we don't know how old the drivers used in these benches really are

neliz
10-May-2007, 14:47
http://www.techpowerup.com/reviews/ATI/Radeon_HD_2900_XT/1

:D :D :D


* ATI's Radeon HD 2900 XT is priced at a mere $399 which is an extremely competitive price for being a highest-class performance product.



* Breathtaking $399
* Support for DirectX 10, Shader Model 4.0
* HDMI + Audio output included



* Not fastest GPU in the world as expected
* Fan sometimes noisy
* High power consumption


w1zzard is still uploading the graphs for 3dmark03 and beyond but the first 10 pages or so are free for your viewing pleasure..

and yes.. a lot of awkward results.. the XT gets its butt handed to it by a 1950Pro in low res Far Cry but is only 2 fps lower in Quake4 at 2048/15 versus a GTX (yes, faster than a 1900CF setup).

Can we say.. immature?

Kaotik
10-May-2007, 14:49
Macci tested A64 X2 6000+, 580X mobo, 2GB RAM, 4xHDD, R600 combo, it worked fine with Antec NeoHE 430W (tested by playing TDU for 2 hours)

For the Finnish people, this can be found in the same thread I linked earlier with what Sampsa said.

Kaotik
10-May-2007, 14:56
http://www.techpowerup.com/reviews/ATI/Radeon_HD_2900_XT/1

:D :D :D




w1zzard is still uploading the graphs for 3dmark03 and beyond but the first 10 pages or so are free for your viewing pleasure..

and yes.. a lot of awkward results.. the XT gets its butt handed to it by a 1950Pro in low res Far Cry but is only 2 fps lower in Quake4 at 2048/15 versus a GTX.

Can we say.. immature?

Yeah, it's strange how in some tests it's trailing the 8800GTX closely and in others ending up behind the 1900XTX, even the 1950 Pro.
And it's also strange how they speak of the 2900XT and yet in the graphs it's titled 2900XTX :???:

Cuthalu
10-May-2007, 14:56
What is this (from the "review"): "The overclocks are pretty nice. In the end the card runs totally stable at 732 MHz core (4.5%) and 508 MHz memory (27%)" :roll:

IbaneZ
10-May-2007, 14:56
http://www.techpowerup.com/reviews/ATI/Radeon_HD_2900_XT/1


WE SIGNED AN NDA FOR THIS - IF YOU LEAK WE WILL GET SUED

Ehh? Some kind of joke?

Galduta
10-May-2007, 14:58
http://www.hardspain.com/index.php?option=com_content&task=view&id=62&Itemid=2


http://www.imagehost123.uni.cc/uploads/81a60a656b.jpg

Kaotik
10-May-2007, 15:06
I'm inclined to believe the techpowerup thing is nothing but a damn joke, the "xtx" in graphs, "overclocked clocks" lower than retail etc :???:

neliz
10-May-2007, 15:10
Yeah, it's strange how in some tests it's trailing the 8800GTX closely and in others ending up behind the 1900XTX, even the 1950 Pro.
And it's also strange how they speak of the 2900XT and yet in the graphs it's titled 2900XTX :???:

Is W1zzard just teasing us?

fellix
10-May-2007, 15:12
Texture filtering features:

Bicubic filtering :shock:

dizietsma
10-May-2007, 15:13
It still seems odd seeing GDDR3 when the previous generation had GDDR4.

neliz
10-May-2007, 15:13
It still seems odd seeing GDDR3 when the previous generation had GDDR4.

Welcome 8800 GTX

ChrisRay
10-May-2007, 15:15
Welcome 8800 GTX

I don't entirely follow.

neliz
10-May-2007, 15:15
I don't entirely follow.

In reference to this gen using GDDR3 and last gen (1950XT) using GDDR4

Kaotik
10-May-2007, 15:21
In reference to this gen using GDDR3 and last gen (1950XT) using GDDR4

It's not entirely comparable, nVidia never moved to GDDR4 on last gen either.

Pressure
10-May-2007, 15:23
I'm inclined to believe the techpowerup thing is nothing but a damn joke, the "xtx" in graphs, "overclocked clocks" lower than retail etc :???:

We can only hope. I mean, getting lower performance in some games than a Radeon X1950GT is...well...sad for a new performance part.

ChrisRay
10-May-2007, 15:23
It's not entirely comparable, nVidia never moved to GDDR4 on last gen either.

That's what I was thinking. Nvidia has been satisfied with GDDR3.

IbaneZ
10-May-2007, 15:24
And there they pulled it.

I guess it wasn't meant to be seen, so to speak... :smile:

leoneazzurro
10-May-2007, 15:29
We can only hope. I mean, getting lower performance in some games than a Radeon X1950GT is...well...sad for a new performance part.

Only in Far Cry. In most of the other tests it performed slightly below the 8800 GTX. Anyway, if these numbers are real, Far Cry and some other scores point to very low maturity of R600's drivers (sometimes performing on par with a 1950 XTX, at other times besting a CF setup).

Cuthalu
10-May-2007, 15:30
Drivers are very old:
Drivers:
NVIDIA: 91.47
ATI: Catalyst 7.1

Edit: newer R600 drivers:
We tested the Radeon HD 2900 XT with ATI's 8.37 driver which is the official benchmarking driver for all R600 reviews.

leoneazzurro
10-May-2007, 15:33
Drivers are very old:

7.1 for X1950. he said somewhere in the article he used 8.37 for benching R600, and that this should be the reviewer's driver.
Hmm... there's maybe another release after that? I read something about a 8.37.4

Jawed
10-May-2007, 15:34
Notable items:

Re-Z :?:
Z Range optimisation - one of the patent applications described this, which is a way to improve Z-testing in the hierarchical-Z system, I believe
Memory read/write cache for improved stream output performance - not surprising, good to see the theory confirmed
Bicubic texture filtering - :lol: someone suggested this when I dug up a patent about texture filtering, seemed unbelievable :!:
PCI Express x16 bus (seems to imply it's not PCI Express 2.0)
http://forum.beyond3d.com/showpost.php?p=972191&postcount=2363

Andy suggested bicubic filtering a while back. This is the relevant patent:

http://forum.beyond3d.com/showpost.php?p=972154&postcount=2362

Jawed

Razor1
10-May-2007, 15:36
7.1 for X1950. he said somewhere in the article he used 8.37 for benching R600, and that this should be the reviewer's driver.
Hmm... there's maybe another release after that? I read something about a 8.37.4


From what I know 8.37.4 are the latest and the review drivers.

spidy
10-May-2007, 15:37
Only in Far Cry. In most of the other tests it performed slightly below the 8800 GTX. Anyway, if these numbers are real, Far Cry and some other scores point to very low maturity of R600's drivers (sometimes performing on par with a 1950 XTX, at other times besting a CF setup).
Slightly?

1600x1200 4xAA/16xAF
Prey: GTX 40% faster
Fear: GTX 26% faster
Quake 4: GTX 13% faster
X3: GTX 10% faster

Sorry, there is no way to call this "slightly".

IbaneZ
10-May-2007, 15:37
sigh .. are people bored and url-guessing sites all day?!

By this W1zzard dude who posted the review.

http://www.xtremesystems.org/forums/showpost.php?p=2182156&postcount=14

By the way, did anyone happen to see what system he used to bench the cards?

fellix
10-May-2007, 15:37
Notable items:

Bicubic texture filtering - :lol: someone suggested this when I dug up a patent about texture filtering, seemed unbelievable :!:
So, no hope for Lanczos, then. :lol:

Cuthalu
10-May-2007, 15:42
By the way, did anyone happen to see what system he used to bench the cards?
Yep:
CPU: AMD Athlon64 FX-60 @ 2900 MHz
(Toledo, 2x 1024 KB Cache)
Motherboard: Sapphire PC-A9RD580
ATI Radeon XPRESS 3200
Memory: 2x 1024MB G.Skill F1-4000BIU2-2GBHV CL3
Harddisk: WD Raptor 360GD 36 GB
Power Supply: OCZ GameXStream 700W
Software: Windows XP SP2

neliz
10-May-2007, 15:44
That's what I was thinking. Nvidia has been satisfied with GDDR3.

I meant that, if GDDR3 is enough for the GTX, why would it hurt the XT? The "Welcome 8800 GTX" was meant as "here's another product that's running perfectly fine on GDDR3".

leoneazzurro
10-May-2007, 15:44
Slightly?

1600x1200 4xAA/16xAF
Prey: GTX 40% faster
Fear: GTX 26% faster
Quake 4: GTX 13% faster
X3: GTX 10% faster

Sorry, there is no way to call this "slightly".

How much does a GTS score at these settings? (And remember that you are talking only about the worst score at every resolution; a fair comparison is an average. And at very high resolution + AA, there's a possibility that the XT is also limited by frame buffer size.)

Jawed
10-May-2007, 15:45
So, no hope for Lanczos, then. :lol:
I'm wondering what the performance will be like. Will it be usable?

Also, is it on top of trilinear/AF or is it instead of? That's my stupid question for today.

Jawed

IbaneZ
10-May-2007, 15:52
Yep:

CPU: AMD Athlon64 FX-60 @ 2900 MHz
(Toledo, 2x 1024 KB Cache)
Motherboard: Sapphire PC-A9RD580
ATI Radeon XPRESS 3200
Memory: 2x 1024MB G.Skill F1-4000BIU2-2GBHV CL3
Harddisk: WD Raptor 360GD 36 GB
Power Supply: OCZ GameXStream 700W
Software: Windows XP SP2

Thanks. :smile:

fellix
10-May-2007, 15:53
It's a planar image resampling (filtering), just like bilinear & etc., it's nothing to do with the higher order of texture interpolation, IMO.
It's costly, though -- quite a lot of MULs and trig ops per tap.

spidy
10-May-2007, 15:54
How much does a GTS score at these settings? (And remember that you are talking about the worst score at very high resolution + AA; there's a possibility that the XT is also limited by frame buffer size.)
The 8800 GTS isn't listed in the various tests. Very crappy imo, because the main competitor of the 2900 XT is the GTS, or perhaps the X1950XTX? :D We'll see. If these numbers are real, I dunno what ATI did the last months. The spec of the 2900 XT seems fine, but even the predecessor can compete with this new, high-tech card. Creepy...

Razor1
10-May-2007, 15:55
How much does a GTS score at these settings? (And remember that you are talking only about the worst score at every resolution; a fair comparison is an average. And at very high resolution + AA, there's a possibility that the XT is also limited by frame buffer size.)


The GTS is right around the 1950XTX, other than in games that have heavy shader usage, where it pulls ahead by quite a bit (Oblivion is a good example of this).

Jawed
10-May-2007, 15:56
It's a planar image filtering, just like bilinear & etc., it's nothing to do with the higher order of interpolation, IMO.
So, whenever a bilinear operation is performed, bicubic replaces the bilinear operation? So this works only for bilinear and trilinear filtering?

Jawed

nicolasb
10-May-2007, 15:58
Weird though how the R600 just chokes at high res in some games.
That's beginning to worry me a bit, too. You'd have thought the 512-bit bus would allow it to narrow the gap between it and the 8800GTX as the resolution/AA/AF goes up; but if anything it seems to fall even further behind. Could it simply be running out of memory when working at very high resolutions? Is 512MB not enough?

fellix
10-May-2007, 16:02
I wonder, if this bicubic resampling is a "full" implementation, does it [the hardware] support programmable coefficients -- say for a blurrier or sharper output? :wink:
It could be somewhat "locked" to a MIP level (AF?), gradually sharpening the output texel as the MIP level increases to aid the AF resampling. D'oh!
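
As a reference point for what "programmable coefficients" could mean, here is a minimal software sketch in Python of bicubic sampling using the standard cubic convolution kernel with one free parameter `a`. This is the textbook filter, not a description of how R600 does it; nothing in this thread says how (or whether) the hardware exposes such a parameter, and the function names and the edge-clamping choice are my own assumptions.

```python
import math


def cubic_weight(x, a=-0.5):
    """Cubic convolution kernel; a = -0.5 gives the Catmull-Rom case."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def bicubic_sample(image, x, y, a=-0.5):
    """Sample a 2D list of floats at texel coordinates (x, y) using 4x4 taps."""
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    tx, ty = x - x0, y - y0
    total = 0.0
    for j in range(-1, 3):
        wy = cubic_weight(ty - j, a)
        row = image[min(max(y0 + j, 0), height - 1)]  # clamp at the edges
        for i in range(-1, 3):
            wx = cubic_weight(tx - i, a)
            total += wx * wy * row[min(max(x0 + i, 0), width - 1)]
    return total


ramp = [[float(col) for col in range(8)] for _ in range(8)]
print(bicubic_sample(ramp, 3.5, 3.5))           # midway between texels 3 and 4
print(bicubic_sample(ramp, 3.5, 3.5, a=-1.0))   # same point, different coefficient
```

Each sample touches 16 texels instead of the 4 used by bilinear, which is where the cost comes from. Changing `a` reshapes the negative lobes of the kernel, which is one way a sharpness control could be expressed.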

leoneazzurro
10-May-2007, 16:08
The 8800 GTS isn't listed in the various tests. Very crappy imo, because the main competitor of the 2900 XT is the GTS, or perhaps the X1950XTX? :D We'll see. If these numbers are real, I dunno what ATI did the last months. The spec of the 2900 XT seems fine, but even the predecessor can compete with this new, high-tech card. Creepy...

Yes, what I see is that R600 is a card that on paper has way more power than the X1950XTX. 25-30% more shader power, texturing power unknown but likely to be at least 10-15% more (if they kept the same R580 units, but they say they are improved). Beefed up ROPs and the bandwidth to feed them.
So, if it performs on average on par with the X1950 XTX, it can only be that

1) ATI made some terrible mistake in designing R600, and there are bottlenecks and chip problems impairing the performance (i.e. trying to put too many features on it but missing some resource in one or more fundamental points), or
2) making a performance driver for R600 is really hard, with very big difficulties due to co-issue performance penalties, practically needing the driver to be optimized heavily for each game, or
3) both

And what makes me wonder were the "Preliminary" watermarks all around...

Razor1
10-May-2007, 16:10
hmm that sounds mighty familiar:!:

leoneazzurro
10-May-2007, 16:11
OK, THIS is undoubtedly a joke :D

http://www.techpowerup.com/reviews/ATI/Radeon_HD_2900_XT/1

Jawed
10-May-2007, 16:17
3) both
Or, both the TUs and RBEs are completely new, as well as the ring-bus being heavily revised - and they really haven't got on top of performance for these new parts.

I'm certainly not excusing them - they've had bloody ages to sort this stuff out.

Jawed

Rys
10-May-2007, 16:19
Can folks not reproduce W1zzard's article materials here before the NDA expires on Monday, please. While I know you want to chat about the hardware and perf, anything taken from the TPU piece shouldn't have been visible until Monday.

Fornowagain
10-May-2007, 16:20
OK, THIS is undoubtedly a joke :D

http://www.techpowerup.com/reviews/ATI/Radeon_HD_2900_XT/1

I'd buy one. 1Gb from 2 chips, amazing.

leoneazzurro
10-May-2007, 16:39
Or, both the TUs and RBEs are completely new, as well as the ring-bus being heavily revised - and they really haven't got on top of performance for these new parts.

I'm certainly not excusing them - they've had bloody ages to sort this stuff out.

Jawed

Yeah, they are quite new, but the theoretical numbers for R600 are so HIGH that seeing it near the X1950XTX in many tests leaves a strange feeling. It seems like the efficiency at the moment is much lower than R580's.

Geeforcer
10-May-2007, 17:02
What's that supposed to mean? The quadro has optimized drivers of those apps...
While neither the 8800 Ultra nor (that we know of at least, and I don't think they just unified the consumer drivers with the pro drivers) the R600 are optimized for those tasks.
It would be a fair comparison between the quadro and the new FireGL.

Err... unless I misunderstood him, Silent_Buddha asked how those scores related to workstation card performance.

Kaotik
10-May-2007, 17:18
http://www.hardocp.com/image.html?image=MTE3ODc1NjI4M25xVWV5U3JERFNfMV8yX2wuanBn

What's up with this? Only a single 6-pin connector, yet the computer at least appears to be running, based on the LED lighting :???:

Geo
10-May-2007, 17:18
I was thinking more of number of revisions of core and speed attained than a specification change Geo. Granted, it is all speculation :).

Well, I was considering a significant change in target clocks would qualify as a specification change. I haven't heard/seen any evidence that happened after we all ZOMGed over G80. Which is not to say they did or didn't hit their exact targets in the end. It's not all that unusual (in fact, it seems to me it happens more often than not) for both IHV's initial reach to exceed their launch grasp on that point (i.e. core clocks). But I still don't have any reason to think that what AMD is launching at isn't within 10% or so of their way back targets before anyone saw a G80.

NocturnDragon
10-May-2007, 17:24
Err... unless I misunderstood him, Silent_Buddha asked how those scores related to workstation card performance.


Missed that, sorry! This thread is getting too long! :P

Geo
10-May-2007, 17:29
Missed that, sorry! This thread is getting too long! :P

Just because it's coming up on twice as long as any previous thread ever you think it might be getting too long? Well, that's pretty narrow-minded, but okay.

We're going to enjoy locking this one, I grant you. :twisted:

Anon Lamer
10-May-2007, 17:39
If I may add some thoughts into this confusion. My guess is thus: The problem lies in shader scheduling. When asked to produce working, bug-free drivers the ATI driver techs opted for simple and naive algorithms that aren't efficient but are easy to debug and understand. The interesting thing about this is that there is a way to test it, provided by ATI themselves. I propose the following: run benchmark X with an X1800XT, an X1950XT and an R600XT. The X1800XT is texture and memory identical to the X1950XT (or at least it can be clocked identically) and the only difference will then be that the X1950XT has 3x the shader ALUs of the X1800XT. An X1900XT/X may substitute for the X1950XT provided it's clocked identically to the X1800XT.

If the X1800XT and the X1950XT have similar min/max scores, then the game has a low amount of shaders and the R600XT should do well. If the X1800XT clearly lags behind the X1950XT, then the game has a lot of shaders and the R600XT should stutter. The best would be if the benchmark could produce an fps-over-time plot.
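
A rough sketch of that comparison, for anyone who wants to try it. The function names and the 15% threshold are made up for illustration; the only idea taken from the post is that a big gap between the two cards at identical clocks points to a shader-heavy workload.

```python
def alu_scaling(fps_x1800xt: float, fps_x1950xt: float) -> float:
    """Speedup of the 3x-ALU card over the X1800XT (1.0 means identical)."""
    if fps_x1800xt <= 0 or fps_x1950xt <= 0:
        raise ValueError("fps values must be positive")
    return fps_x1950xt / fps_x1800xt


def classify_workload(fps_x1800xt: float, fps_x1950xt: float,
                      threshold: float = 1.15) -> str:
    """Label a benchmark as shader-heavy or shader-light.

    threshold is a hypothetical cut-off, not something ATI or the
    post specifies.
    """
    ratio = alu_scaling(fps_x1800xt, fps_x1950xt)
    return "shader-heavy" if ratio >= threshold else "shader-light"
```

Feeding it min and max fps separately would mimic the min/max comparison suggested above.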

aeryon
10-May-2007, 18:00
The latest driver was sent out yesterday.

and they still have buggy FSAA performance. A new driver is expected at the end of the week to correct the issue...

nicolasb
10-May-2007, 18:02
and they still have buggy FSAA performance. A new driver is expected at the end of the week to correct the issue...

How long do they need to get a driver that actually works? What have the driver guys been doing for the past seven months, playing ping pong? :shock:

satein
10-May-2007, 18:03
If I may add some thoughts into this confusion. My guess is thus: The problem lies in shader scheduling. When asked to produce working, bug-free drivers the ATI driver techs opted for simple and naive algorithms that aren't efficient but are easy to debug and understand. The interesting thing about this is that there is a way to test it, provided by ATI themselves. I propose the following: run benchmark X with an X1800XT, an X1950XT and an R600XT. The X1800XT is texture and memory identical to the X1950XT (or at least it can be clocked identically) and the only difference will then be that the X1950XT has 3x the shader ALUs of the X1800XT. An X1900XT/X may substitute for the X1950XT provided it's clocked identically to the X1800XT.

If the X1800XT and the X1950XT have similar min/max scores, then the game has a low amount of shaders and the R600XT should do well. If the X1800XT clearly lags behind the X1950XT, then the game has a lot of shaders and the R600XT should stutter. The best would be if the benchmark could produce an fps-over-time plot.

That reminds me of the [H] benchmark presentation :wink:

Anyway, may we see this kind of bench or comparison from the B3D too?

IbaneZ
10-May-2007, 18:03
and they still have buggy FSAA performance. A new driver is expected at the end of the week to correct the issue...

Shouldn't the drivers be tip top by now?

Strategic launch ey? Hmm... :???:

nyr
10-May-2007, 18:13
Has anyone tried running the nvidia DX10 demos on an R600 yet? Or are these locked to nvidia hardware somehow?

Frank
10-May-2007, 18:14
So, what do we think about the instruction scheduling?

1. It uses a VLIW for all ops issued to a 4+1+1 ALU block for each clock.
2. It issues a single instruction word each clock for each ALU.
3. It uses sequential instruction packing (ie. it issues a single instruction word for multiple sequential ops) for each ALU.
4. It issues one (or two) instruction word(s) each clock for each ALU block, but it has a small instruction cache and so can issue multiple ops in a single clock.
5. Like 4, but it issues blocks of instruction words for each batch.

Further:
A. Constants are part of the instruction word.
B. Constants are distributed separately.

Option 1 would fit the picture presented, but would use a significant amount of unneeded bandwidth from the ringbus. Those VLIWs would be quite wide, and they would require you to issue 6 separate instructions (inside the VLIW one) when a short one would do (like, with a conditional vec4 + clamp/modifier op). But it would ease the work of the compiler / scheduler.

Option 2 seems the logical thing to do, but then why group those ALUs, and it would be effectively just as bad as 1 in bandwidth requirements. Effectively, 1 and 2 are the same.

Option 3 is what I expect the G80 to do. It offers interesting possibilities for the R600, but would be hard to schedule. And then why not go fully scalar? I expect a bit of this, but very limited.

Option 4 would make things the most efficient. It is like 3, in that instructions can take multiple clocks and/or do two things sequentially (like first calculating and then clamping/modifying/masking), and it would go well with a model that uses slots to run batches for each thread. Ie: each thread has a fixed amount of instruction slots (say, 8) and a minimal amount of clocks (say, 4) for each batch to run. When done, it can switch to the next thread or continue the current one. A texture lookup would terminate the batch and leave a bubble if there were too few ops to fill the slots/clocks. But it would be very hard to implement, and would demand a very complex scheduler and compiler.

Option 5 is probably the way to go. It is like 4, but much better to manage: you only have to move complete blocks around, that hold all you need to execute a single run. Reasonably easy to implement. Then again, you would not be able to do more than would fit inside a single block, and you would still move too much data around if you're only going to do 4 vec4 MULs or such in a single run.

Options A and B both have their strengths and weaknesses as well. It would depend on the option above used what would make the most sense. For option 5, I would put all the constant data needed in the instruction block. And most likely all the other data (registers, flags and texture fetches) as well. It also makes it much easier to buffer all that in a very fast local buffer inside or next to the ALU block.


So, 1 would fit the available data the best, 4 would probably be the most efficient, and 5 would be almost as good as 4, but much easier to implement and schedule. And in a sense, it is just about the opposite of how (I think) the G80 does it.

Geeforcer
10-May-2007, 18:59
and they still have buggy FSAA performance. A new driver is expected at the end of the week to correct the issue...

Yeah..."We are not rushing to launch before our drivers are ready, unlike other companies"... etc.

Shtal
10-May-2007, 19:03
I hope history does not repeat itself.

The G80 (8800GTX) was introduced in Nov-2006, and the R600XT (HD2900XT) should be out this month, May-2007. That is a 6-month difference.

The R300 (9700Pro) was introduced in Aug 2002 and NV30 was more than 6 months late. I thought ATI would keep its lead over Nvidia in future products, since it was ahead in the development cycle. But to my surprise Nvidia recovered with NV40 and caught up with ATI in the development cycle.

So I hope ATI will recover with R700 in the same way and be ready in time to meet G90.

Julidz
10-May-2007, 19:07
Is R600 Vec4 + scalar, or Vec5 superscalar like The Inquirer said?


Or is it the same thing?

leoneazzurro
10-May-2007, 19:19
Is R600 Vec4 + scalar, or Vec5 superscalar like The Inquirer said?


Or is it the same thing?

No, it isn't the same thing. Each "shader unit" of R600 should be composed of 5 scalar MADD processors.

Morgoth the Dark Enemy
10-May-2007, 19:23
Yeah..."We are not rushing to launch before our drivers are ready, unlike other companies"... etc.

This tends to give new perspective on the whole gang-bang that has been going on with nV's Vista drivers, and G8x drivers in general, no? Considerably new architecture + considerably new OS = major birthing pains. And nV have had hardware available for a long time and the privilege of putzing with their drivers in the meanwhile... let's see how this turns out. It is definitely an interesting situation though.

PSU-failure
10-May-2007, 19:40
http://www.hardocp.com/image.html?image=MTE3ODc1NjI4M25xVWV5U3JERFNfMV8yX2wuanBn

What's up with this? Only a single 6-pin connector, yet the computer at least appears to be running, judging by the LED lighting :???:

Back to the first rumours, it was expected the R600 could work with either 1 8pins PEG power connector or 2 6pins...

Maybe that was true? :?:

Considering the drivers, don't forget that blaming these could simply be AMD's suggestion for those under NDA to not disclose anything.

leoneazzurro
10-May-2007, 19:45
Back to the first rumours, it was expected the R600 could work with either 1 8pins PEG power connector or 2 6pins...

Maybe that was true? :?:

Considering the drivers, don't forget that blaming these could simply be AMD's suggestion for those under NDA to not disclose anything.

Or because slot and card are PCIE 2.0

Kaotik
10-May-2007, 20:01
Back to the first rumours, it was expected the R600 could work with either 1 8pins PEG power connector or 2 6pins...

Maybe that was true? :?:

Considering the drivers, don't forget that blaming these could simply be AMD's suggestion for those under NDA to not disclose anything.

But in that pic, if you look closely, it has 1x 6pin connected, not 1x 8pin?

edit: How much can you draw power from the PCIe 2.0 slot to the video card?

Fornowagain
10-May-2007, 20:12
edit: How much can you draw power from the PCIe 2.0 slot to the video card?
150W on 2.0

FrameBuffer
10-May-2007, 20:13
But in that pic, if you look closely, it has 1x 6pin connected, not 1x 8pin?

edit: How much can you draw power from the PCIe 2.0 slot to the video card?

IIRC the R600 (HD 2900) is not a PCIe 2.0 part, whereas the RV6x0 parts are, and unless my memory fails me PCIe 2.0 allows double the power of PCIe 1.x (75W -> 150W) through the adoption of the 8-pin PCIe connector... EDIT: lol nm, got beaten to it already.
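
For the connector arithmetic, a small sketch. The ratings used here (75 W from the x16 slot, 75 W per 6-pin plug, 150 W per 8-pin plug) are the commonly quoted figures and are an assumption on my part, not something confirmed in this thread. They are delivery ceilings, not what a card actually draws.

```python
# Assumed connector ratings in watts (commonly quoted, not from the thread).
SLOT_W = 75        # PCIe x16 graphics slot
SIX_PIN_W = 75     # each 6-pin PEG plug
EIGHT_PIN_W = 150  # each 8-pin PEG plug


def power_ceiling(six_pin: int = 0, eight_pin: int = 0, slot: bool = True) -> int:
    """Upper bound on power a card can be fed, given its connectors."""
    if six_pin < 0 or eight_pin < 0:
        raise ValueError("connector counts cannot be negative")
    total = SLOT_W if slot else 0
    total += six_pin * SIX_PIN_W
    total += eight_pin * EIGHT_PIN_W
    return total
```

Under those assumptions, a card fed by the slot plus one 6-pin plug, as in the photo, tops out at 150 W, two 6-pin plugs give 225 W, and 6-pin plus 8-pin gives 300 W.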

_xxx_
10-May-2007, 20:27
Wow, slow down there. First off, the entire market does not consist of high-end solutions. In fact, that is where the minority of the money is made.


Oh, but that's what Joe will see on every mag title. While pretty much no one will put the low-end stuff there.

Galduta
10-May-2007, 20:55
http://www.nextgpu.com/forum/index.php?topic=17.435

Gigabyte GA-965P-DS3
-Intel Core 2 Duo E6300@3.26Ghz 466Mhzx7
-Team Xtreem 2x1GB D9@933Mhz Cas 5-5-5-15 1:1
HD 2900 xt

The CD drivers are the old 8.361
---------------------------------------------------------------------------
Fear bench 1600x1200, AF 8X, No AA no SS

71 MIN
111 AVERAGE
217 MAXIMUM

---------------------------------------------------------------

1600x1200, AF8X, AA4X, soft shadows ON *
Min:21
Med:58
Max:113

This may not be correct!!

1600x1200, AF8X, soft shadows ON

32
55
107

--------------------------------------------------------

One 8800 GTS STOCK clock
158.18 WHQL:

1600x1200, AF@8x, SS on
41
65
177

1600x1200, AA@4x, AF@8x, SS on:
21
38
69
---------------------------------------------------------------------------------------------------------------

Maybe more later, or tomorrow ;). For me the person is 100% reliable.

_xxx_
10-May-2007, 20:59
They look better to me as well. From the results it's rather uncanny how well it finds its own niche between the GTX and GTS, at least in current DX9 games. Perhaps it (more than?) matches the GTX in future games if it is as forward looking as people have suggested and history tends to show?

Considering the leaked slides, I'd try to describe it in the old terminology: look at it as an 80-pipe chip. There are 80 "fat" pipes going against 120 nV's 80 "single" or including the missing MUL "1.5x" (on average, ass-uming it's used half the time) pipes.

If the scheduling is good and the load balance is favorable for the R600 architecture (higher shader load for example, though that balance also depends on the batch size etc.), it'll gain perf compared to the GTX. The opposite case, high tex/filtering load and less shader load, will give nV the advantage of (best case for nV) 128:80 or 16:10 - which also coincidentally matches the alu-cluster sizes in the chips ;)

So the simplest factor affecting the performance will be if those "fat" pipes can be more often used for multiple ops than for single ops, as well as if the texturing is the bottleneck in the given situation.
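
To put a number on that, here is a back-of-the-envelope sketch. It takes the 80 "fat" pipes and 128 "single" pipes from the post above, ignores clock speed entirely (so no double pumping), and the function names are just for illustration.

```python
from fractions import Fraction


def pipe_ratio(single_pipes: int = 128, fat_pipes: int = 80) -> tuple:
    """Reduce the single:fat pipe ratio to lowest terms."""
    f = Fraction(single_pipes, fat_pipes)
    return f.numerator, f.denominator


def break_even_fill(single_pipes: int = 128, fat_pipes: int = 80,
                    single_factor: float = 1.0) -> Fraction:
    """Average ops per fat pipe per clock needed to match the single pipes.

    single_factor scales the single pipes, e.g. 1.5 for the
    "including the missing MUL" case. Clock differences are ignored.
    """
    return Fraction(single_pipes, fat_pipes) * Fraction(single_factor)
```

With the defaults, 128:80 reduces to 8:5 (the same as 16:10), so each fat pipe would need to average 1.6 ops per clock to break even, or 2.4 if the 1.5x figure is used.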

chavvdarrr
10-May-2007, 21:17
Considering the leaked slides, I'd try to describe it in the old terminology:

Do you take into account that NV pipes are double pumped?

All in all, making the same mistake twice in a row (too little TMU power) can't be coincidence.

_xxx_
10-May-2007, 21:20
I wonder, if this bicubic resampling is a "full" implementation, does it [the hardware] support programmable coefficients -- say for more blurry or more sharpen output? :wink:
It could be somewhat "locked" to a MIP level (AF?) and with gradually iteration to sharpen the output texel as the MIP level increases to aid the AF resampling. D'oh!

I think that rather has something to do with their new hybrid AA algorithms, the "wide/narrow tent" stuff.

_xxx_
10-May-2007, 21:34
Do you take into account that NV pipes are double pumped?

No, I think their simplicity makes up for most of that.
Well yeah, it was a mistake in the sense that they expected the market to rely much more on increased shader power, but obviously they were too early again. And by the time that begins to matter, these cards will be obsolete anyway IMO.

Silent_Buddha
10-May-2007, 21:46
Lol, doesn't this guy need to uninstall his ATI drivers before he can use his 8800? (hint, ATI logo in 8800 benchmark pictures.)

Nice catch Neliz, it looks like that guy probably has both the 8800 gts 320 and HD 2900 XT installed at the same time.

Then he probably just changes which is the Primary monitor (thus the one benched on) in Windows Display Manager.

Not sure how trustworthy his results will be then.

Regards,
SB

DemoCoder
10-May-2007, 22:03
I think that rather has something to do with their new hybrid AA algorithms, the "wide/narrow tent" stuff.

Weren't people claiming CFAA was done in the ROPs or scan-out/resolve HW? if the tent filter is being done by a texture unit, I don't see the advantage. A shader-based MSAA resolve pass isn't that expensive fillrate/shader wise, and one could implement arbitrary filter kernels to one's hearts content without wasting silicon on fixed-function bicubic support.

Unknown Soldier
11-May-2007, 00:19
Link (http://forums.hardwarezone.com/showthread.php?t=1607936)

8800GTS Stock
http://img513.imageshack.us/img513/3323/883d06defaultby2.jpg (http://imageshack.us)


Weird, My Gainward Bliss GTS 320 has higher SM2.0 and SM3.0 scores and at def. clocks for the GPU and CPU. My CPU is QX6600 so the score is higher.

3DMark06 - 9651
QX6600 - Default clock
Gainward Bliss GTS 320 - Default Clocks
Resolution is Def. - 1280x1024

---------

3DMark05 - 14080
QX6600 - Default clock
Gainward Bliss GTS 320 - Default Clocks
Resolution is Def. - 1024x768

Will test 3DMark05 at 1280x1024

US

Unknown Soldier
11-May-2007, 00:39
3DMark05 - 13161
QX6600 - Default clock
Gainward Bliss GTS 320 - Default Clocks
Resolution is Def. - 1280x1024

US

Jawed
11-May-2007, 03:50
So, what do we think about the instruction scheduling?
If you haven't already, I guess now is a good time to look at the CTM guide:

http://ati.amd.com/companyinfo/researcher/documents/ATI_CTM_Guide.pdf

since it at the very least provides inspiration and potential comparison!

1. It uses a VLIW for all ops issued to a 5+1+1 ALU block for each clock.
Not sure how you get 5+1+1, since it's MAD/SF+MAD+MAD+MAD+MAD+BR - 1+4+1 if you like.

I assume they're really entirely separate ALUs that are simply clocked in parallel. The width being, erm 4, 8, 16, whatever pixels/primitives/vertices. So:

16x MAD/SF
16x MAD
16x MAD
16x MAD
16x MAD
16x BR

if 16 pixels per clock are processed by the ALU pipeline.

2. It issues a single instruction word each clock for each ALU.
I'm going to assume that each instruction lasts 4 clocks, because that's what R5xx does. So the instruction decode and settling time for operand-fetch addressing and so on can be less than frenetic.

3. It uses sequential instruction packing (ie. it issues a single instruction word for multiple sequential ops) for each ALU.
4. It issues one (or two) instruction word(s) each clock for each ALU block, but it has a small instruction cache and so can issue multiple ops in a single clock.
5. Like 4, but it issues blocks of instruction words for each batch.
I think I saw somewhere 512-instruction slots. So if a program is longer than that, then the instructions are paged-in as needed, I presume. It's possible for a single batch to run a clause of code that's hundreds of instructions in length - a clause being bounded by texturing instructions (or a branch). Obviously, a clause can straddle instruction pages.

So I would guess that each instruction of the clause is fed to the ALU pipeline as it's needed, from the instruction cache. I presume that an instruction page fault causes the batch to be switched out of the pipeline, until the page is ready.

Since R5xx has an alternating pipeline where batch instructions are sequenced as AAAABBBB, that seems like a reasonable starting point for R600. If that's the case, you can see immediately that two different instruction pages could be used to feed the ALU pipeline. One batch might be a vertex shader and the other a pixel shader.
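
The AAAABBBB sequencing can be sketched like this. It is only an illustration: the function name and defaults are made up, and the 4-clock figure is carried over from R5xx as discussed above, not confirmed for R600.

```python
def issue_sequence(batches=("A", "B"), clocks_per_instruction=4,
                   instructions=1) -> str:
    """Return which batch owns the ALU pipeline on each clock.

    Each batch holds the pipeline for clocks_per_instruction clocks,
    then the next batch takes over, and the cycle repeats for every
    instruction issued.
    """
    sequence = []
    for _ in range(instructions):
        for batch in batches:
            sequence.extend([batch] * clocks_per_instruction)
    return "".join(sequence)
```

issue_sequence() gives AAAABBBB, and nothing in the interleaving stops batch A being a vertex shader while batch B is a pixel shader.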

R600's instruction scheduler might work by keeping all available batches on the same instruction page, whenever possible (subject to other resource hazards, queues filling up that sorta crap), so deferring instruction-page swaps until as late as possible. Essentially to minimise the number of swaps.
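
A toy model of the "minimise the swaps" idea. Everything here is guesswork for illustration: one resident page at a time, the 512-slot page size half-remembered above, and batches represented only by their current program counter.

```python
def count_page_swaps(pcs, page_size=512) -> int:
    """Count instruction-page swaps when batches run in the given order.

    pcs is a list of program counters, one per batch, in issue order.
    Assumes a single resident page, which is a simplification.
    """
    swaps = 0
    current_page = None
    for pc in pcs:
        page = pc // page_size
        if page != current_page:
            if current_page is not None:
                swaps += 1
            current_page = page
    return swaps


def page_friendly_order(pcs, page_size=512):
    """Group batches by instruction page, keeping order within a page."""
    return sorted(pcs, key=lambda pc: pc // page_size)
```

For batches sitting at instructions 10, 600, 20, 700 and 30, the naive order costs 4 swaps, while the grouped order costs 1. A real scheduler would also have to weigh the other resource hazards mentioned above.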

R5xx should be doing some kind of instruction page handling. R4xx may do too, since it can support hundreds of instructions (erm, can't remember how many for SM2.0b... 512?). So, instruction-page handling is prolly quite normal these days. Dunno if it really amounts to much for us armchair types.

But it adds an extra dimension to the batch scheduling problem. Another dimension to consider is that R600 prolly supports multiple concurrent render contexts. Xenos supports 8 and a patent application for R600 explicitly refers to eight when discussing memory management.

Further:
A. Constants are part of the instruction word.
I expect so, since R5xx supports this kind of inline constant.

B. Constants are distributed separately.
R5xx also has a constant store.

Since a constant store is a key concept in D3D10, it's quite clear that R600 will have a beefed-up version. D3D10 constants can be huge (multi-KB in size), formed as 4096-element structures. It's a whole new ballgame! I wonder if constants and register file actually share memory in some fashion, rather than each having a dedicated pool. But the R600 diagram seems to imply a dedicated store ...

(G80 has a 64KB constant cache shared by all clusters. I don't know if it's monolithic or distributed.)

One of the uses for the constant store in R5xx appears to be to hold vertex attributes, each attribute interpolated for each rasterised fragment (I'm skating on thin ice, admittedly). So that might be vertex colour, vertex normal, texture coordinates etc. It's quite costly. So R600 could do the same, with these interpolated attributes held in the constant store.

Although, if you look at the Xenos functional diagram, you'll see a block called Shader Pipe Interpolators, which appears to be doing on-demand attribute interpolation (as instructions are issued). So, ahem, maybe that's how R600 will work...

http://pcweb.mycom.co.jp/articles/2005/09/09/cedec1/images/012l.jpg

But one of the recent patents seemed to describe how attribute interpolation could be done in parallel with rasterisation, so I'm confused. The R600 diagram contains an SPI block. I trip up on this stuff I'm afraid.

Hmm, just had a thought, maybe the SPI block is actually just a programmable unit that the Sequencer controls in addition to the ALUs and TUs. Since, per vertex, the count and types of attributes that need to be interpolated varies, the quantity of work (and therefore duration of program) varies. Hmm...

Option 1 would fit the picture presented, but would use a significant amount of unneeded bandwidth from the ringbus. Those VLIWs would be quite wide, and they would require you to issue 6 separate instructions when one would do (like, with a conditional vec4 + clamp/modifier op). But it would ease the work of the compiler / scheduler.
Since each of the co-issue ALUs are almost certainly all separate (easiest to think of the BR pipe to see why), each of the six pipes always needs a dedicated instruction. e.g.:

SF+vec2+vec2+BR: RCP+MAD+MAD+ADD+ADD+LT

vec4+scalar+NOP: MAD+MAD+MAD+MAD+MAD+NOP
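
The same two bundles, written out as a sketch. The six-slot layout follows the examples above; the helper names and the idea of padding with explicit NOPs are assumptions for illustration, not a description of the real encoding.

```python
SLOTS = 6  # one SF/MAD pipe, four MAD pipes and the BR pipe, as described above


def make_bundle(ops):
    """Pad a list of co-issued ops with NOPs so every pipe gets an instruction."""
    if len(ops) > SLOTS:
        raise ValueError("too many ops for one bundle")
    return list(ops) + ["NOP"] * (SLOTS - len(ops))


def utilisation(bundle) -> float:
    """Fraction of slots doing real work."""
    busy = sum(1 for op in bundle if op != "NOP")
    return busy / len(bundle)
```

make_bundle(["MAD"] * 5) reproduces the vec4+scalar+NOP case, with five of the six slots busy.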

Option 2 seems the logical thing to do, but then why group those ALUs, and it would be effectively just as bad as 1 in bandwidth requirements. Effectively, 1 and 2 are the same.
I'm not sure what kind of bandwidth you're thinking of here, to be honest.

Option 3 is what I expect the G80 to do. It offers interesting possibilities for the R600, but would be hard to schedule. And then why not go fully scalar? I expect a bit of this, but very limited.

Option 4 would make things the most efficient. It is like 3, in that instructions can take multiple clocks and/or do two things sequentially (like first calculating and then clamping/modifying/masking), and it would go well with a model that uses slots to run batches for each thread. Ie: each thread has a fixed amount of instruction slots (say, 8) and a minimal amount of clocks (say, 4) for each batch to run. When done, it can switch to the next thread or continue the current one. A texture lookup would terminate the batch and leave a bubble if there were too few ops to fill the slots/clocks. But it would be very hard to implement, and would demand a very complex scheduler and compiler.
Since R300, texture operations automatically bound a clause of ALU instructions. So the scheduler (don't forget R300 has asynchronous texturing) issues a lump of instructions up to the point, known in advance, at which the texture operation is submitted.

So the only time a bubble should be incurred is when the shader unit has run out of batches to issue due to some horrid combination of texturing latency (e.g. with multiple levels of dependency) and/or dynamic branching.

So, normally, clauses of code will be fed into the ALU pipeline end-to-end, no bubbles.
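
A minimal sketch of that clause splitting. The opcode strings and the function name are made up; only the rule that a texture operation ends the current ALU clause comes from the discussion above (branches, which can also bound a clause, are left out to keep it short).

```python
def split_clauses(program, boundary="TEX"):
    """Split a flat op list into ALU clauses at each texture operation.

    The boundary ops themselves are not part of any ALU clause, and
    back-to-back texture ops do not produce empty clauses.
    """
    clauses = []
    current = []
    for op in program:
        if op == boundary:
            if current:
                clauses.append(current)
                current = []
        else:
            current.append(op)
    if current:
        clauses.append(current)
    return clauses
```

Each returned clause is a lump the scheduler could issue end-to-end before switching to another batch while the texture fetch is in flight.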

Option 5 is probably the way to go. It is like 4, but much better to manage: you only have to move complete blocks around, that hold all you need to execute a single run. Reasonably easy to implement. Then again, you would not be able to do more than would fit inside a single block, and you would still move too much data around if you're only going to do 4 vec4 MULs or such in a single run.
Not sure why there'd be "too much data".

Options A and B both have their strengths and weaknesses as well. It would depend on the option above used what would make the most sense. For option 5, I would put all the constant data needed in the instruction block. And most likely all the other data (registers, flags and texture fetches) as well. It also makes it much easier to buffer all that in a very fast local buffer inside or next to the ALU block.
The R600 diagram actually puts the constant cache alongside the instruction cache, which is a clear hint that they're running side-by-side as required by the shaders.

So, 1 would fit the available data the best, 4 would probably be the most efficient, and 5 would be almost as good as 4, but much easier to implement and schedule. And in a sense, it is just about the opposite of how (I think) the G80 does it.
Hope you don't mind the fact I've just rambled on, rather than trying to construct a meaningful scenario that answers your questions.

Jawed

Rangers
11-May-2007, 04:24
Yes, what I see is that R600 is a card that on paper has way more power than X1950XTX. 25-30% more shader power, texturing power unknown but likely to be at least 10-15% more (if they kept the same R580 units, but they say they are improved). Beefed up ROPS and the bandwidth to feed them.
So, if it performs on average on par with X1950 XTX, it can be only

1) ATI made some terrible mistake in designing R600, and there are bottlenecks and chip problems impairing the performance, (i.e. trying to put too many features on it but missing some resource in one or more fundamental points) or
2) making a performance driver for R600 is really hard, with very big difficulties due to co-issue performance penalties, practically needing the driver to be optimized heavily for each game or
3) both

And what makes me wonder were the "Preliminary" watermarks all around...

It doesn't perform like a X1950XTX..it performs more like 10-15% above that.

A 8800GTS is stronger than a X1950XTX, by a good deal in many cases. They're not equivalent..

I really just do not think R600 is shader bottlenecked..there's just no way that makes any sense. That's not where we should be looking..

BTW, for those speaking of ATI can rally with "good mid-range", I hate to tell you this, but 8800GTS which R600 competes with IS mid-range. They are $240, without Nvidia even trying to get the price down..

R600 IS mid range..

Further they're only going to cut TMU's from here to keep an arbitrary ratio..so it's going to be impossible for them to have a great low-mid part. Competitive, maybe, since 8600 is no great shakes..but it's virtually impossible for it to be great.

radeonic2
11-May-2007, 04:59
It doesn't perform like a X1950XTX..it performs more like 10-15% above that.

A 8800GTS is stronger than a X1950XTX, by a good deal in many cases. They're not equivalent..

I really just do not think R600 is shader bottlenecked..there's just no way that makes any sense. That's not where we should be looking..

BTW, for those speaking of ATI can rally with "good mid-range", I hate to tell you this, but 8800GTS which R600 competes with IS mid-range. They are $240, without Nvidia even trying to get the price down..

R600 IS mid range..

Further they're only going to cut TMU's from here to keep an arbitrary ratio..so it's going to be impossible for them to have a great low-mid part. Competitive, maybe, since 8600 is no great shakes..but it's virtually impossible for it to be great.

240???
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Description=8800GTS

Your continual ati bashing is becoming tiresome btw.
You should have your name changed to "doomsayerdaamnit" or some such :wink:

BRiT
11-May-2007, 05:51
240???
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Description=8800GTS


The 8800 GTS 320 Meg cards can be had for $240 with specials. The 640 Meg cards cost a bit more.

silent_guy
11-May-2007, 05:51
Option 1 would fit the picture presented, but would use a significant amount of unneeded bandwidth from the ringbus. Those VLIWs would be quite wide, and they would require you to issue 6 separate instructions (inside the VLIW one) when a short one would do (like, with a conditional vec4 + clamp/modifier op). But it would ease the work of the compiler / scheduler.
I don't see how the local instruction storage organization has any influence on bandwidth? My guess is that the combined instruction words are either 64 or 128 bits wide, fetched as such from external memory and stored together as 1 VLIW or as separate words, one for each ALU/BR, depending on the implementation. Assuming a 4 cycle rotation, instruction fetch scheduling and distribution to the decoders shouldn't be in the critical path, so my bet would be that the instruction words are stored together, since that's more area efficient.

I think I saw somewhere 512-instruction slots. So if a program is longer than that, then the instructions are paged-in as needed, I presume. It's possible for a single batch to run a clause of code that's hundreds of instructions in length - a clause being bounded by texturing instructions (or a branch). Obviously, a clause can straddle instruction pages.

Why not an L1 instruction cache instead of a paging mechanism? Probably as easy to implement as a paging mechanism (though I haven't thought through the consequences of multiple threads), and with less chance of running into freak performance pitfalls due to page straddling?

Rangers
11-May-2007, 06:05
240???
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Description=8800GTS

Your continual ati bashing is becoming tiresome btw.
You should have your name changed to "doomsayerdaamnit" or some such :wink:

I'm not ATI bashing, I'm R600 bashing... and not even that until the last few days, as it begins to become clear that, well, this product is lacking. I was hoping for a great performing part up until very recently. And I still can't believe it draws 220+ watts for that. That's the real kick in the pants here.

And yeah, I fibbed a bit on the 8800GTS, but only a bit. The cheapest on newegg was $260 after rebate, $280 before, with an average price of around $300. ZZF might well have something cheaper though.

nelg
11-May-2007, 06:24
200+ pages and the only conclusion I can draw is that Jawed has spent more time on the R600 than ATI.

tEd
11-May-2007, 06:30
200+ pages and the only conclusion I can draw is that Jawed has spent more time on the R600 than ATI.

:lol:

he's a machine.

Silent_Buddha
11-May-2007, 06:35
I'm not ATI bashing I'm R600 bashing..and not that until the last few days as it begins to become clear that well, this product is lacking. I was hoping for a great performing part up until very recently. And I still cant believe it draws 220+ watts for that. That's the real kick in the pants here.

And yeah, I fibbed a bit on the 8800GTS, but only a bit. The cheapest on newegg was $260 after rebate, $280 before, with an average price of around $300. ZZF might well have something cheaper though.

You may not think you're ATI bashing, but your choice of words and the tone they convey sure makes it seem like you do.

And apparently anyone that actually has a card and is under NDA and has actually tested the power draw... Well, none of them apparently have come even close to 220+ watts unless doing some very serious overclocking. From the little scraps that have come out it would seem that the GDDR3 version of the card draws around 175-190 watts when not overclocked. Although that would be an absolutely amazing piece of engineering if it's running at 220+ watts normally at 740 mhz and it can easily overclock another 100 mhz without drawing more than 5 more watts of power. ;)

And calling doom and gloom before anyone even remotely reputable has published a review?

By that same token, the 8800 GTX was an absolute and abject failure also, right? Since I seem to recall some "benches" (to be kind) that were floating around the net before it came out that didn't paint a rosy picture.

Am I saying R600 is going to be a huge success and kick the pants off the competition? Nope. Am I saying it's a failure before it's been properly reviewed based on some incredibly shoddy "benching" (to be kind) that's been posted? Nope.

Am I waiting until I can actually see a proper review done with proper drivers in a properly controlled environment before making a decision about what video card I will buy? You bet'cha.

Then again if neither company delivers a stable driver (stability first, performance second for me) in Vista 64, then neither will get my money.

That said, it's only 4 days until NDA expires. Think you can hold onto your britches and avoid the whole sky is falling routine until then? :grin: And if the R600 totally falls on its face and bursts into flames in some reviewer's hands, you can feel free to say you told me so and that the world is coming to an end.

Regards,
SB

Kaotik
11-May-2007, 07:02
I'm not ATI bashing, I'm R600 bashing... and not even that until the last few days, as it begins to become clear that, well, this product is lacking. I was hoping for a great performing part up until very recently. And I still can't believe it draws 220+ watts for that. That's the real kick in the pants here.

And yeah, I fibbed a bit on the 8800GTS, but only a bit. The cheapest on newegg was $260 after rebate, $280 before, with an average price of around $300. ZZF might well have something cheaper though.

If it draws 220+ watts, how come Macci can play TDU fine on an A64 X2 6000+, 580X mobo, 2GB RAM, 4xHDD and R600 with a 430W Antec NeoHE PSU, and Sampsa has so far gotten a max consumption of 298 watts for the whole system (though in that case the system specs are unknown)?

Evildeus
11-May-2007, 07:04
Well, the last pulled "review" seems on the mark. We can't speak of it since it broke the NDA, but we have a fairly good indication of its performance before the new magic drivers ;)

SugarCoat
11-May-2007, 07:19
I still can't believe it draws 220+ watts for that. That's the real kick in the pants here.


You're not thinking. That's a theoretical MAX that a PCIE slot and a 6pin+8pin can deliver in terms of power, NOT what the cards are actually drawing. Think for a moment: the conclusion you're coming to about power consumption is like someone saying the 8800GTX draws 300+ watts due to its twin 6-pin connectors when it in fact draws something like 135. Your conclusion, which others have come to as well, doesn't make any sense.

We can state with some confidence that it may very well draw less than an 8800GTX/Ultra, due to having a very close transistor count but also 256MB less onboard memory to feed than the flagship 8800s. It should come out to something like 110-125 watts unless it has horrific, and when I say horrific I mean run away screaming :runaway: , leakage.

memberSince97
11-May-2007, 07:26
I can't wait for a Homerun

Galduta
11-May-2007, 07:30
Stalker DX 9

HD 2900 XT C2 3200 mhz

Drivers 8.361 , Windows XP VGA at 743/743/1656Mhz:

1600 x 1200, all settings at maximum, AF 16x Q

Timedemo1: Average: 33,12, Min: 17,51, Max: 88,29,
Timedemo2: Average: 50,50, Min: 31,28, Max: 282,81,

8800 GTS C2 3200 mhz forceware 158.22

Timedemo1: Average: 35 Min: 13 Max: 88 *
Timedemo2: Average: 52 Min: 15 Max: 100 *

measured with fraps ?

Download bench1 (http://www.telefonica.net/web2/hitmaker/bench1.rar)

"demo_play bench1_timedemo"

Download bench2 (http://www.telefonica.net/web2/hitmaker/bench2.rar)

"demo_play bench2_timedemo"

Instructions

http://www.tweakguides.com/STALKER_10.html

the 2900 needs miraculous drivers

Resolution/CPU scaling on an 8800 GTS overclocked to 625/1900; these benches are not CPU-limited

http://www.nextgpu.com/forum/index.php?topic=17.msg14606#msg14606

HD 2900 XT overclock

Timedemo2:

1)800/800/1800Mhz: Av: 52,52, Min: 32,07, Max: 297,50,
2)840/840/2000Mhz: Av: 55,30, Min: 34,30, Max: 306,86,

The best overclock is at 855Mhz "with artifacts", temperature 78º

leoneazzurro
11-May-2007, 08:36
It doesn't perform like a X1950XTX..it performs more like 10-15% above that.

A 8800GTS is stronger than a X1950XTX, by a good deal in many cases. They're not equivalent..

I really just do not think R600 is shader bottlenecked..there's just no way that makes any sense. That's not where we should be looking..

BTW, for those speaking of ATI can rally with "good mid-range", I hate to tell you this, but 8800GTS which R600 competes with IS mid-range. They are $240, without Nvidia even trying to get the price down..

R600 IS mid range..

Further they're only going to cut TMU's from here to keep an arbitrary ratio..so it's going to be impossible for them to have a great low-mid part. Competitive, maybe, since 8600 is no great shakes..but it's virtually impossible for it to be great.

I was a LOT conservative on R600 numbers, anyway, and US architecture efficiency should be theoretically higher.

neliz
11-May-2007, 08:37
Stalker DX 9

HD 2900 XT C2 3200 mhz

Drivers 8.361 , Windows XP VGA at 743/743/1656Mhz:

1600 x 1200, all settings at maximum, AF 16x Q

Timedemo1: Average: 33,12, Min: 17,51, Max: 88,29,
Timedemo2: Average: 50,50, Min: 31,28, Max: 282,81,

8800 GTS C2 3200 mhz forceware 158.22

Timedemo1: Average: 35 Min: 13 Max: 88 *
Timedemo2: Average: 52 Min: 15 Max: 100 *


Hmm. .the minimum on the 2900 is twice as high on TD2, the MAx is three times as high.. but the average is still slower?
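A quick sketch of how that combination can happen. Average FPS is total frames over total time, so it is dominated by the typical frame, while min and max are set by a handful of outliers. The frame times below are invented for illustration (they are not the STALKER data), and the sketch assumes min/max are taken per frame, which may not be how the timedemo or Fraps reports them.

```python
def summarize(frame_times_ms):
    """Return (average, minimum, maximum) FPS for a list of per-frame times in ms."""
    total_seconds = sum(frame_times_ms) / 1000.0
    average_fps = len(frame_times_ms) / total_seconds
    # Instantaneous FPS of each individual frame.
    instantaneous = [1000.0 / t for t in frame_times_ms]
    return average_fps, min(instantaneous), max(instantaneous)

# Hypothetical card A: a few very fast frames, no bad hitches, slower typical frame.
card_a = [4.0] * 5 + [20.0] * 90 + [30.0] * 5
# Hypothetical card B: one bad hitch and a lower peak, but a quicker typical frame.
card_b = [10.0] * 5 + [19.0] * 94 + [60.0] * 1

a_avg, a_min, a_max = summarize(card_a)
b_avg, b_min, b_max = summarize(card_b)
print(f"A: avg {a_avg:.1f}, min {a_min:.1f}, max {a_max:.1f}")
print(f"B: avg {b_avg:.1f}, min {b_min:.1f}, max {b_max:.1f}")
```

Card A ends up with the higher minimum and the higher maximum but the lower average, simply because 90% of its frames take a millisecond longer.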


the 2900 needs miraculous drivers :roll:


I hate the rolleyes.. .latest driver is 8.734 come back with results on those and we'll talk..

w0mbat
11-May-2007, 08:37
[...]Drivers 8.361[...]

Who did this bench?

Skinner
11-May-2007, 08:42
Hmm. .the minimum on the 2900 is twice as high on TD2, the MAx is three times as high.. but the average is still slower?



.

Caught my eye too.

The FEAR bench without AA, which would average 111, is something to take note of too.

Jawed
11-May-2007, 09:09
Why not a L1 instruction cache instead of a paging mechanism?
I dunno! The diagram shows an instruction cache, so paging could be irrelevant. The hardware threading model seems key and I'm not sure what other devices you can compare a GPU against :???: Struggling for inspiration here.

Jawed

Galduta
11-May-2007, 10:54
Stalker DX 9
Win XP

HD 2900 XT C2 3200 mhz

Drivers 8.361 , Windows XP VGA at 743/743/1656Mhz:

1600 x 1200, all settings at maximum, AF 16x Q

Timedemo1: Average: 33,12, Min: 17,51, Max: 88,29,

8800 GTS C2 3200 mhz forceware 158.18

Timedemo1: Average: 35 Min: 13 Max: 88

8800 GTX C2 2750 mhz , Raptor, forceware 160.03 ( my system) ,

Timedemo1: Average: 46,6 Min: 19,6 Max: 95

Download bench1 (http://www.telefonica.net/web2/hitmaker/bench1.rar)

Unrar into the savegame folder, load the savegame and type in the console "demo_play bench1_timedemo"

Maybe it is true. The 2900 XT with its bus of 512 bits is going to compete with the 8800 GTS...

_xxx_
11-May-2007, 11:23
The 2900 XT with its bus of 512 bits is going to compete with the 8800 GTS...

Well, that's what ATI said themselves, so what's new there?

vertex_shader
11-May-2007, 11:25
Stalker DX 9
Win XP

HD 2900 XT C2 3200 mhz

Drivers 8.361 , Windows XP VGA at 743/743/1656Mhz:

1600 x 1200, all settings at maximum, AF 16x Q

Timedemo1: Average: 33,12, Min: 17,51, Max: 88,29,

8800 GTS C2 3200 mhz forceware 158.22

Timedemo1: Average: 35 Min: 13 Max: 88

8800 GTX C2 2750 mhz , Raptor, forceware 160.03 ( my system) ,

Timedemo1: Average: 46,6 Min: 19,6 Max: 95

Download bench1 (http://www.telefonica.net/web2/hitmaker/bench1.rar)

Unrar into the savegame folder, load the savegame and type in the console "demo_play bench1_timedemo"

Maybe it is true. The 2900 XT with its bus of 512 bits is going to compete with the 8800 GTS...

Use Vista :twisted:

Skinner
11-May-2007, 11:27
Stalker DX 9
Win XP

HD 2900 XT C2 3200 mhz

Drivers 8.361 , Windows XP VGA at 743/743/1656Mhz:

1600 x 1200, all settings at maximum, AF 16x Q

Timedemo1: Average: 33,12, Min: 17,51, Max: 88,29,

8800 GTS C2 3200 mhz forceware 158.22

Timedemo1: Average: 35 Min: 13 Max: 88

8800 GTX C2 2750 mhz , Raptor, forceware 160.03 ( my system) ,

Timedemo1: Average: 46,6 Min: 19,6 Max: 95

Download bench1 (http://www.telefonica.net/web2/hitmaker/bench1.rar)

Unrar into the savegame folder, load the savegame and type in the console "demo_play bench1_timedemo"

Maybe it is true. The 2900 XT with its bus of 512 bits is going to compete with the 8800 GTS...

Got 50.8 average at 1600x1200, 16x AF (max IQ ingame). FW 165.01 64-bit. HQ in CP, MultiTSAA (don't know if it makes a difference ingame), neg. LOD on allow

E6600@ 3 ghz, 8800GTX on default and 4 gb ram.

Galduta
11-May-2007, 11:35
Use Vista :twisted:

Is the 8.361 driver only for Vista? The 8800 GTS ran on XP with 158.18, and the 2900 ...


I have the 8.361 drivers on XP

Timedemo1: Average: 33,12, Min: 17,51, Max: 88,29, Mid: 35,51
Timedemo2: Average: 50,50, Min: 31,28, Max: 282,81, Mid: 56,67

:?:

Galduta
11-May-2007, 11:53
Well, that's what ATI said themselves, so what's new there?

Fair enough, but that rumor was first a Fuad rumor, and I didn't believe it at all ;). But it is a little disappointing to me: ATI not reaching Nvidia's level of performance is not good news for gamers. In many games, like COJ, Armed, SCDA etc., the 8800 GTX doesn't run well at high resolutions. The DX10 features? Yes, for running Call of Juarez at 15 fps

Sc4freak
11-May-2007, 12:05
This single thread constitutes approximately 3% of all posts on this forum.

Arty
11-May-2007, 12:17
Well, that's what ATI said themselves, so what's new there?
Is that a premature admission of losing your R600 MSRP bet? :lol:

(2900XT going up against 8800GTS rumor is accompanied with MSRP)

I'm just keeping track .. ;)

Arun
11-May-2007, 12:20
This single thread constitutes approximately 3% of all posts on this forum.

Not sure how you're counting that (*oops*), but I'd like to point out other such massive threads about upcoming GPU architectures are often moved to: http://forum.beyond3d.com/forumdisplay.php?f=51
Impressive either way, though! :)

vertex_shader
11-May-2007, 12:50
This single thread constitutes approximately 3% of all posts on this forum.

I hope no one opens a "beyond r600" thread, because that thread would never end :smile:

Silent_Buddha
11-May-2007, 13:11
8.361 driver is evidently what is shipping on the driver CD in retail packages.

Later drivers are ONLY available to people that have actually signed an NDA and thus have access to ATI. And apparently later driver revisions are "supposed" to be faster. Hopefully by faster, they don't mean just 1% faster. :P

In other words all these "leaked" benches are from people who either work at a retail shop and/or know someone that works at a retail shop. They've popped open a package and are playing around with it and obviously benchmarking with the older driver since they do not have access to newer driver revisions.

Regards,
SB

PatrickL
11-May-2007, 13:11
And how many posts in that thread are part of a FUD campaign?
Saw a French website showing a 2900 XT with two six-pin connectors attached and claiming that the card was using 270 watts. When I asked the guy how the card he showed could burn 270 watts when it could get 225 W at most, his answer was "I don't know, but my source told me so." And it was the main point of his article. I guess his source was not the right green....
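For reference, the ceiling being argued about is just a sum of connector limits. A minimal sketch, assuming the usual PCI Express figures of 75 W from the x16 slot, 75 W per 6-pin plug and 150 W per 8-pin plug; the function name is made up for illustration. These are delivery limits, not measured draw.

```python
SLOT_W = 75        # PCIe x16 slot
SIX_PIN_W = 75     # each 6-pin auxiliary connector
EIGHT_PIN_W = 150  # each 8-pin auxiliary connector

def max_board_power(six_pins=0, eight_pins=0, slot=True):
    """Upper bound on what the slot plus auxiliary connectors may deliver, in watts."""
    return (SLOT_W if slot else 0) + six_pins * SIX_PIN_W + eight_pins * EIGHT_PIN_W

# Card fed by the slot and two 6-pin plugs, as in the photo described above.
print(max_board_power(six_pins=2))
# Card fed by the slot, one 6-pin and one 8-pin plug.
print(max_board_power(six_pins=1, eight_pins=1))
```

A card wired with two 6-pin plugs therefore cannot be pulling 270 W within spec, whatever it actually draws.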

vertex_shader
11-May-2007, 13:32
8.361 driver is evidently what is shipping on the driver CD in retail packages.

Later drivers are ONLY available to people that have actually signed an NDA and thus have access to ATI. And apparently later driver revisions are "supposed" to be faster. Hopefully by faster, they don't mean just 1% faster. :P

In other words all these "leaked" benches are from people who either work at a retail shop and/or know someone that works at a retail shop. They've popped open a package and are playing around with it and obviously benchmarking with the older driver since they do not have access to newer driver revisions.

Regards,
SB

The problem is that silence "damages" AMD every day, and there are still some days left. Why did AMD think no one would get a retail card before the NDA expires and leak results with the driver from the retail package CD?
This is why a hard launch can suck: these leaks spread at light speed, and AMD can't do anything.

trinibwoy
11-May-2007, 14:14
200+ pages and the only conclusion I can draw is that Jawed has spent more time on the R600 than ATI.

:lol:

Kaotik
11-May-2007, 14:25
Not sure how you're counting that (*oops*), but I'd like to point out other such massive threads about upcoming GPU architectures are often moved to: http://forum.beyond3d.com/forumdisplay.php?f=51
Impressive either way, though! :)

Including the previous R600 threads :lol:

Geo
11-May-2007, 14:27
The 8800 GTS 320 Meg cards can be had for $240 with specials. The 640 Meg cards cost a bit more.

Or for free with a gun. But typically neither is a high-volume proposition.

w0mbat
11-May-2007, 14:42
8.361 driver is evidently what is shipping on the driver CD in retail packages.

Later drivers are ONLY available to people that have actually signed an NDA and thus have access to ATI. And apparently later driver revisions are "supposed" to be faster. Hopefully by faster, they don't mean just 1% faster. :P

In other words all these "leaked" benches are from people who either work at a retail shop and/or know someone that works at a retail shop. They've popped open a package and are playing around with it and obviously benchmarking with the older driver since they do not have access to newer driver revisions.

Regards,
SB

I haven't signed any NDA but I have the 8.374 driver.

Love_In_Rio
11-May-2007, 15:28
I haven't signed any NDA but I have the 8.374 driver.

and ? better ?

AnarchX
11-May-2007, 15:55
There is already 8.38 out. :wink:

8.37
http://img292.imageshack.us/img292/731/20070511038dac7bb1ed6b7ix5.th.jpg (http://img292.imageshack.us/my.php?image=20070511038dac7bb1ed6b7ix5.jpg)http://img292.imageshack.us/img292/7204/20070511aca766684a1a9efza5.th.jpg (http://img292.imageshack.us/my.php?image=20070511aca766684a1a9efza5.jpg)

8.38
http://img292.imageshack.us/img292/8273/20070511bbbda2bf981adfalk3.th.jpg (http://img292.imageshack.us/my.php?image=20070511bbbda2bf981adfalk3.jpg)http://img292.imageshack.us/img292/3/2007051118746e1b7f838f9ha6.th.jpg (http://img292.imageshack.us/my.php?image=2007051118746e1b7f838f9ha6.jpg)
http://chiphell.com/viewthread.php?tid=3948&extra=&page=3

trinibwoy
11-May-2007, 15:57
At that rate by the time they hit 8.40 Nvidia might have to reconsider their current pricing :grin:

vertex_shader
11-May-2007, 15:57
There is already 8.38 out. :wink:

8.37:
http://chiphell.com/attachments/month_0705/20070511_aca766684a1a9efca047L27ykGTMVyxa.jpg
8.38:
http://chiphell.com/attachments/month_0705/20070511_18746e1b7f838f9d13504eyw8cSyYw4L.jpg

http://chiphell.com/viewthread.php?tid=3948&extra=&page=3

AMD releases a new driver every day, so how can the reviewers finish reviews for Monday?
Some have finished their reviews already.

Kocur
11-May-2007, 16:03
Yes, but at that rate they will release 8.40 tomorrow in the morning and 8.50 on Monday :lol:.

satein
11-May-2007, 16:08
AMD releases a new driver every day, so how can the reviewers finish reviews for Monday?
Some have finished their reviews already.

Sounds as if AMD/ATi are using the reviewers as the last beta testers :lol: This would be another reason why the reviewers need more time to do their job.

Let's see which revision will be the last to make the reviews on Monday; probably within the next few days the reviewers will need to update their scores again and again :roll:

AnarchX
11-May-2007, 16:15
I wonder what they have done over the last months. It is a bit strange that a new driver which provides 20% or more (said on XS) comes out at a point when most of the good reviews are already finished. :???:

leoneazzurro
11-May-2007, 16:26
I wonder what they have done over the last months. It is a bit strange that a new driver which provides 20% or more (said on XS) comes out at a point when most of the good reviews are already finished. :???:

Maybe they were already finished some time ago and they kept them secret until now.
It's a bit of a paranoid attitude, IMHO :???:

AnarchX
11-May-2007, 16:34
http://img501.imageshack.us/img501/623/r600vsg80mp8.jpg (http://imageshack.us)
http://we.pcinlife.com/thread-763485-1-1.html
real source: http://bbs.cpcw.com/viewthread.php?tid=1150115

Bad DX10 performance...

Anarchist4000
11-May-2007, 16:34
They must have a lot of software engineers sitting around the office profiling games as fast as they can. Just how many programmable parts do they have on this card?

seahawk
11-May-2007, 16:37
I fear we are back to the time when every new driver meant that you also had to check closely for changes in the IQ.

Kaotik
11-May-2007, 16:39
Bad DX10 performance...
I wouldn't say that based on one DX10 demo apparently developed on GF8

AnarchX
11-May-2007, 16:42
I wouldn't say that based on one DX10 demo apparently developed on GF8

Crysis and UT3 are also in development on a GF8. ;)

leoneazzurro
11-May-2007, 16:43
http://img501.imageshack.us/img501/623/r600vsg80mp8.jpg (http://imageshack.us)
http://we.pcinlife.com/thread-763485-1-1.html
real source: http://bbs.cpcw.com/viewthread.php?tid=1150115

Bad DX10 performance...

It seems they used Catalyst 7.4 (no version of other drivers mentioned). Fake?

Anarchist4000
11-May-2007, 16:45
While I can't say for certain, that test is probably using a feature that is only available on Nvidia cards. Either that or ATI simply hasn't gotten around to doing very much along the lines of DX10 optimizations.

And I don't think it's IQ issues that are going to be showing up with these optimizations. They should be the memory controller tweaks we saw with R580. All they were doing was reordering requests and operations to run more efficiently. Which shouldn't have any impact on IQ.

Kaotik
11-May-2007, 16:47
Crysis and UT3 are also in development on a GF8. ;)

Game developers like Crytek and Epic surely have HD2900's there too, that demo was done by some forum user here I think?

Frank
11-May-2007, 17:13
I dunno! The diagram shows an instruction cache, so paging could be irrelevant. The hardware threading model seems key and I'm not sure what other devices you can compare a GPU against :???: Struggling for inspiration here.

Jawed
Some other things to think about: you can have many active threads, and with a lot of texture fetches or branches, you have to swap them out often. Including the state data, like registers, flags and masks/partial branch states. They're not going to simply run until a texture lookup / branch is hit (as you said, because they know that up front) and then swap the state data, operands and instructions with the next thread. They break it down in blocks (most likely), and only swap in a new instruction/data block, while storing the state data in a nearby buffer.

Also, if you can have 512 instructions (and I believe that was for SM2.0, I think they're now at a large number) for each thread/context, you're not going to swap all that out every time you switch threads. It would even be very hard to keep the whole program for each thread in local memory all the time.

And, if you know up front you need a texture fetch (and you do, but you might not know the coordinates), you do that first (calculating the coordinates, if needed, in the last run of that thread), and add that data to the block of data (including the instructions and (pointers to) the constants). And you only schedule blocks that contain all you need for the next run.

That would be something like this:

1. Compile the program to blocks of native code.
2. Upload them to the GPU (VRAM), including the metadata needed for the GPU to schedule them and fetch the data needed.
3. The GPU builds a thread table, prefetches the first blocks of instructions needed for each thread, fetches the needed data, calculates/stores the constants and adds it to the scheduler queue.
4. The scheduler keeps track of what ALU block is running what (for resource management and localizing data), and schedules blocks to the "instruction" queue to be executed. Most likely, that queue also holds the state data from the last run, or initial values for a new thread / context. And the operands needed.
5. After a thread/context finishes, runs out of allocated slots or has to wait for data, you store the state in a block, and buffer it.
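Steps 3 to 5 can be illustrated with a toy model: one ALU, several threads, and a thread gets parked whenever it issues a texture fetch. This is only a sketch of the general latency-hiding idea, not how R600 does it; the 8-cycle latency, the op names and the first-ready scheduling policy are all assumptions made up for the example.

```python
TEX_LATENCY = 8  # assumed texture fetch latency in cycles (placeholder value)

def run(programs, tex_latency=TEX_LATENCY):
    """Simulate one ALU shared by several threads; return (total cycles, ALU-busy cycles)."""
    # The "thread table": program, program counter, and the cycle the thread is ready again.
    threads = [{"ops": list(p), "pc": 0, "ready_at": 0} for p in programs]
    cycle = 0
    busy = 0
    while any(t["pc"] < len(t["ops"]) for t in threads):
        runnable = [t for t in threads
                    if t["pc"] < len(t["ops"]) and t["ready_at"] <= cycle]
        if runnable:
            t = runnable[0]  # simplest possible policy: first ready thread wins
            op = t["ops"][t["pc"]]
            t["pc"] += 1
            if op == "TEX":
                # Issue the fetch and park the thread until the data returns.
                t["ready_at"] = cycle + 1 + tex_latency
            else:
                busy += 1  # an ALU instruction does useful work this cycle
        cycle += 1
    return cycle, busy

program = ["ALU", "TEX", "ALU"]
print(run([program]))      # one thread: the ALU idles while the fetch is in flight
print(run([program] * 4))  # four threads: other threads fill most of the gap
```

With one thread the ALU does 2 cycles of work in 11; with four threads it does 8 in 17, which is the whole point of keeping many threads around to swap between.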

And that also raises the question of what they do with a branch: simply make both paths into new threads? Run the thread twice? And that raises the question: do they still have a fixed execution unit for each pixel target? If they don't, it makes a lot of sense to regroup threads after branches. And: do they have meta-instructions as well, for the general managing? Almost surely. I wouldn't be surprised if the actual instructions are different from what you would expect from shader sources, and that there are many more.

If you don't mind some rambling from me as well. ;)

Unknown Soldier
11-May-2007, 17:13
Just realised those tests I did yesterday, that my memory settings are all wrong.

It's currently running at 266x9

I need to change it to 400x6

Anyways, did a 10% OC on my CPU yesterday and got up to 2630Mhz without any problems.

My 3DMark scores went up by 200 or so.

US

CarstenS
11-May-2007, 17:15
While I can't say for certain that test is probably using a feature that is only available on Nvidia cards.

Most things should be unified - even features. DX10 has no caps bits anymore. At least I am not aware of anything in GF8 that exceeds DX10 in terms of feature set besides MSAA.

Razor1
11-May-2007, 17:19
Game developers like Crytek and Epic surely have HD2900's there too, that demo was done by some forum user here I think?


Yep they do have em

w0mbat
11-May-2007, 17:25
and ? better ?

No, just wanted to show that not all benches out there are fake or with old drivers.

Geeforcer
11-May-2007, 17:26
I wouldn't say that based on one DX10 demo apparently developed on GF8

That's what you get for being 7 months late and 3.5 million DX10 cards short, no?

Kaotik
11-May-2007, 17:28
That's what you get for being 7 months late and 3.5 million DX10 cards short, no?

Have they sold 3.5 million GF8800's? :shock:
Anyway, the point wasn't whether it's "justified for them getting that because they're late" or not, but that a single demo, coded by some individual using one card, doesn't necessarily give the right picture of other cards' DX10 performance.

Razor1
11-May-2007, 17:34
I wouldn't say that based on one DX10 demo apparently developed on GF8

Well Dx10 is supposed to get rid of that.........:wink:

_xxx_
11-May-2007, 17:36
Is that a premature admission of losing your R600 MSRP bet? :lol:

(2900XT going up against 8800GTS rumor is accompanied with MSRP)

I'm just keeping track .. ;)

Then also keep track of the fact that the bet was about a card with the same performance level as the GTX or even faster but priced at $399. That bet is already invalid, unless the reviews on Monday prove me wrong..

Kaotik
11-May-2007, 17:37
Well Dx10 is supposed to get rid of that.........:wink:

Feature-wise yes, but that doesn't mean that different cards wouldn't have different strengths and weaknesses

trinibwoy
11-May-2007, 17:40
Well Dx10 is supposed to get rid of that.........:wink:

That's not going to help if Andy went wild with dependent scalar ops. Though I haven't really seen any experienced developer speak to G80's practical advantage in that respect. Are there really enough one or two component instructions in current and upcoming shader algorithms to make a difference?

_xxx_
11-May-2007, 17:42
Weren't people claiming CFAA was done in the ROPs or scan-out/resolve HW? if the tent filter is being done by a texture unit, I don't see the advantage. A shader-based MSAA resolve pass isn't that expensive fillrate/shader wise, and one could implement arbitrary filter kernels to one's hearts content without wasting silicon on fixed-function bicubic support.

I haven't mentioned the texturing units, just that maybe that's a part of how it's calculated, wherever.

3dilettante
11-May-2007, 17:43
2. Upload them to the GPU (VRAM), including the metadata needed for the GPU to schedule them and fetch the data needed.

I've only skimmed the CTM spec, and it was a while ago, but I didn't see a section on metadata.


And that also raises the question what they do with a branch: simply make both paths into new threads?

At a low level, dynamic branches are still present. If there is divergent behavior between the shader units, execution is serialized and the branch is run both ways, with the incorrect outputs masked out.
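A small sketch of what "run both ways with the incorrect outputs masked out" means for a batch of pixels that share one instruction stream. The lists stand in for SIMD lanes; the function and its arguments are hypothetical names for illustration, not anything from the CTM spec.

```python
def run_branch(values, cond, then_fn, else_fn):
    """Execute a branch over a batch in lockstep; return (outputs, passes executed)."""
    mask = [cond(v) for v in values]  # per-lane branch outcome
    out = list(values)
    passes = 0
    if any(mask):
        passes += 1
        # Every lane executes the 'then' side; only lanes with mask set keep the result.
        then_results = [then_fn(v) for v in values]
        out = [t if m else o for t, m, o in zip(then_results, mask, out)]
    if not all(mask):
        passes += 1
        # Every lane executes the 'else' side; only lanes with mask clear keep the result.
        else_results = [else_fn(v) for v in values]
        out = [o if m else e for e, m, o in zip(else_results, mask, out)]
    return out, passes

is_even = lambda v: v % 2 == 0
print(run_branch([1, 2, 3, 4], is_even, lambda v: v * 10, lambda v: v + 100))
print(run_branch([2, 4], is_even, lambda v: v * 10, lambda v: v + 100))
```

A divergent batch pays for both sides of the branch; a batch where every lane agrees only pays for one.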

Geeforcer
11-May-2007, 17:47
Have they sold 3.5 million GF8800's? :shock:

I am pretty sure that's the number they gave for "DX10" cards, which presumably included 8400/8600 as well.

Kaotik
11-May-2007, 18:12
I am pretty sure that's the number they gave for "DX10" cards, which presumably included 8400/8600 as well.

Well 8400/8600 have been available only less than a month if I'm not mistaken, so they couldn't have sold too many of 'em yet?

Razor1
11-May-2007, 18:16
Well 8400/8600 have been available only less than a month if I'm not mistaken, so they couldn't have sold too many of 'em yet?


Dell, HP and others have pretty big deals with those cards:wink:

AnarchX
11-May-2007, 18:17
Well 8400/8600 have been available only less than a month if I'm not mistaken, so they couldn't have sold too many of 'em yet?

OEM deals could be the answer... :wink:

IbaneZ
11-May-2007, 18:18
http://www.fudzilla.com/index.php?option=com_content&task=view&id=892&Itemid=34

The 3.5 million comes from fudzilla. And it says shipped, not sold.

Sounds like a lot in such short time.

Arty
11-May-2007, 18:24
Then also keep track of the fact that the bet was about a card with the same performance level as the GTX or even faster but priced at $399. That bet is already invalid, unless the reviews on Monday prove me wrong..
Err.. (http://forum.beyond3d.com/showpost.php?p=975010&postcount=2603)
Yes. $50, via PayPal?

And we're talking about the XT and the average price in e-tail on launch day.
Yup, you didn't mention anything related to performance. And the bet is already invalid?

Oh, I do always keep my promises and pay my dues.
:lol: (http://forum.beyond3d.com/showpost.php?p=975070&postcount=2620)

ants
11-May-2007, 18:28
Didn't see this posted yet, sorry if a repost.

http://www.pcadvisor.co.uk/reviews/index.cfm?reviewid=834

PC Advisor semi review, no scores but a very odd conclusion...


Fantastic specs don't always add up to a lead in real-world applications, but ATI has been able to convert its technological advantages into some lethal framerates. It didn't beat the 640MB version of the 8800 GTS in all of our game tests, but it did come awfully close (see chart, below). The advantage is greatest at a resolution of 1,024x768, which suggests that the less detail you want to stack on, the better this card will prove.

Something is very broken it seems...

EDIT: I'm also seeing the R600 pop up in Canada but for about $550...

Geeforcer
11-May-2007, 18:34
The price comment (cheaper than GTS 640) seems strange, but then again, I don't know how much GTS 640 costs in UK.

Jawed
11-May-2007, 18:40
Some other things to think about: you can have many active threads, and with a lot of texture fetches or branches, you have to swap them out often. Including the state data, like registers, flags and masks/partial branch states.
There's no data movement, as such, though. The GPU merely needs to track the status of each thread (meaning batch). Flags and predicate registers are persistently located in batch-specific areas, regardless of whether a thread is in-ALU or waiting its turn. A copy will be brought into the ALU pipeline, but then everything is copied in to an ALU pipeline in order for it to function.

R600 may be different of course, because I'm talking mostly in terms of how I understand R5xx.

For example D3D10 requires a GPU to support 4096 temporary registers per pixel. It's an insane number which to me implies that at some point that degree of swapping requires registers to be shunted out to video memory. This is something that R5xx doesn't do - the hard limit on registers per pixel (128) is traded-off against the number of batches and that's the end of it.
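The trade-off between registers per pixel and batches in flight is just a division over a fixed register file. A rough sketch, where the register file size and the batch size are placeholder numbers chosen for round results, not real R5xx or R600 figures:

```python
def batches_in_flight(register_file_entries, pixels_per_batch, registers_per_pixel):
    """How many batches fit in a fixed-size register file (integer division)."""
    return register_file_entries // (pixels_per_batch * registers_per_pixel)

REGISTER_FILE_ENTRIES = 65536  # hypothetical register file size
PIXELS_PER_BATCH = 64          # hypothetical batch size

for regs in (4, 32, 128):
    n = batches_in_flight(REGISTER_FILE_ENTRIES, PIXELS_PER_BATCH, regs)
    print(f"{regs:3d} registers per pixel -> {n} batches in flight")
```

Fewer batches in flight means fewer candidates to swap in while a texture fetch is outstanding, which is why a shader that uses many registers hides latency worse.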

They're not going to simply run until a texture lookup / branch is hit (as you said, because they know that up front) and then swap the state data, operands and instructions with the next thread. They break it down in blocks (most likely), and only swap in a new instruction/data block, while storing the state data in a nearby buffer.
I think you'll find batch-swapping is as fine-grained as the code and population of batches requires. e.g. if you calculate a texture address, look-up that texture and then use that texture result to calculate another texture address, before looking up the second texture, there's no way to avoid those 4 fine-grained swap events. If you don't swap you're facing a monster amount of texturing latency.

Also, if you can have 512 instructions (and I believe that was for SM2.0, I think they're now at a large number)
SM2 allows 32 texture fetches and 64 ALU instructions. It's 2.0a/b and SM3 that went dramatically beyond this. (Though there's the f-buffer in R3xx and R4xx GPUs which meant you could run SM2 programs one after the other, whilst keeping the register values from one program into the next - erm, that's the way I understand it, anyway.)

for each thread/context, you're not going to swap all that out every time you switch threads. It would even be very hard to keep the whole program for each thread in local