The NEXT LAST R600 Rumours & Speculation Thread


Geo
01-Mar-2007, 03:56
Some history
The ATI R600 Rumours & Speculation Centrum (http://www.beyond3d.com/forum/showthread.php?t=34676)
The (inappropriately named) LAST R600 Rumours & Speculation Thread (http://forum.beyond3d.com/showthread.php?t=37205)
Huddy says R600 (http://www.beyond3d.com/forum/showthread.php?t=31049)
Xbit: This report is the standard for where "64 shaders" comes from. (http://www.xbitlabs.com/news/video/display/20060525104243.html)
B3D on 80nm. (http://www.beyond3d.com/forum/showthread.php?t=29612)
B3D on 512-bit to external memory (http://www.beyond3d.com/forum/showthread.php?t=36695)
B3D on Xenos heritage (http://www.beyond3d.com/forum/showthread.php?t=28611)
Tech Report suggesting Dave Orton said "96 shaders" for R600. (http://www.techreport.com/etc/2006q4/stream-computing/index.x?pg=1) However, actual quote was "next generation", which might leave the door open that he was referring to a 65nm refresh rather than R600.
Roughly 6 pages of talking about reported claimed die shots starting at #674 here (http://www.beyond3d.com/forum/showthread.php?t=34676&page=27)
Site claiming to have an R600 listing specs and "testing" results. (http://www.level505.com) Sober and considered opinion of B3D staff concerning the claimed specs: "Pttthhhppptt!"
DailyTech finding said site credible. (http://www.dailytech.com/article.aspx?newsid=5524)
CJ's leaked specs (http://www.beyond3d.com/forum/showpost.php?p=912896&postcount=641) discussing timeframes, prices and performance estimations on this very thread.
Henri Richard promises Q1 for R600 launch (http://www.beyond3d.com/forum/showthread.php?t=37931)
AMD sets Tech Day for R600 (http://www.beyond3d.com/forum/showthread.php?t=38248)
Beyond3D (http://www.beyond3d.com/forum/showthread.php?t=38927) and Xbit (http://www.xbitlabs.com/news/video/display/20070221023825.html) report R600 pushed to Q2.

Some spec and launch date information from the AMD 690G launch on 2/28/2007. (http://forum.beyond3d.com/showthread.php?p=938465#post938465)

Arty
01-Mar-2007, 04:01
The interesting bits from that EETimes article:
Separately, AMD gave one of the first public demos of the R600, its next-generation graphics controller that uses 320 multiply-accumulate units.
The company showed a Barcelona-based system using two 200W R600 graphics cards to hit a terabit/second benchmark.
AMD also demonstrated working versions of its next-generation graphics chip the R600 to be released by the end of June.
Release of the R600 has been delayed "a few weeks" so that AMD can roll out a full suite of graphics chips covering multiple market segments for the latest Microsoft DirectX 10 applications programming interface.

Farhan
01-Mar-2007, 04:18
It's teraflop, not terabit :)

http://www.reghardware.co.uk/2007/02/28/amd_690g_launch/
http://www.informationweek.com/news/showArticle.jhtml?articleID=197700288&subSection=Breaking+News
http://blogs.zdnet.com/Berlind/?p=363
http://content.zdnet.com/2346-10741_22-57089.html
http://blogs.zdnet.com/Berlind/?p=364

http://www.boincstats.com/stats/host_cpu_stats.php?pr=sah&st=0&or=10 - zomg Barcelona?

R300King!
01-Mar-2007, 04:26
Why the new thread? :rolleyes:

AMD gave one of the first public demos of the R600, its next-generation graphics controller that uses 320 multiply-accumulate units. The company showed a Barcelona-based system using two 200W R600 graphics cards to hit a terabit/second benchmark.

So doesn't this mean it's 320/2 = 160 units? If you divide the 160 by vec4 you get 40. :)

But not sure if any of this is even true.

INKster
01-Mar-2007, 04:29
So doesn't this mean it's 320/2 = 160 units? If you divide the 160 by vec4 you get 40. :)

But not sure if any of this is even true.

And if you multiply 40 by 2, you get 80. :wink:

R300King!
01-Mar-2007, 04:30
Also, that 1 Terabit(is that correct?)/sec processing power, does that include the CPUs?

The company showed a Barcelona-based system using two 200W R600 graphics cards to hit a terabit/second benchmark

R300King!
01-Mar-2007, 04:33
And if you multiply 40 by 2, you get 80. :wink:

Yeah, I know. :D I mean a single R600 chip will only have 160 or 40 vec4. Maybe that board was with 2 mid-range R600s with fewer shaders than the XTX version. Who knows? :)



OT: Have you seen Level505 recently? It's covered with ads. Way more than before. lol

Jawed
01-Mar-2007, 04:34
Aha, Barcelona/R600 :razz:

Jawed

SirPauly
01-Mar-2007, 04:36
Separately, AMD gave one of the first public demos of the R600, its next-generation graphics controller that uses 320 multiply-accumulate units. The company showed a Barcelona-based system using two 200W R600 graphics cards to hit a terabit/second benchmark.

Release of the R600 has been delayed "a few weeks" so that AMD can roll out a full suite of graphics chips covering multiple market segments for the latest Microsoft DirectX 10 applications programming interface. Rival Nvidia rolled out its high-end DX10 graphics controller, the GeForce 8800 last fall but has not filled out its product line with midrange and low-end parts based on it yet.

"As soon as AMD makes their DX10 announcements, I am sure we will hear about competing products from Nvidia," said McCarron.

In addition, AMD announced a new desktop chip set, the first from the ATI division since the merger last fall. The AMD 690 sports an ATI Radeon X1250 graphics core and a new video decode block. It is also the former ATI's first chip set to support the HDMI video interface with HDCP copy protection for high definition video.

Good news.

SirPauly
01-Mar-2007, 04:41
So does 320 Multiply Accumulate units = 80 shaders?

Sound_Card
01-Mar-2007, 04:42
Just when I was in a down mood with all this bad news of R600 springing up, this stuff pops up and smacks me in the face.

So indeed, AMD is going for a complete platform launch. :razz:

Natoma
01-Mar-2007, 04:43
So does 320 Multiply Accumulate units = 80 shaders?

Geo is collecting his wager winnings. ;)

Sound_Card
01-Mar-2007, 04:44
So does 320 Multiply Accumulate units = 80 shaders?


It could be some hybrid form, methinks. Like 160 scalar shaders and 40 vec4. Or I could be dead wrong and dumb.

Razor1
01-Mar-2007, 04:46
Well, if it's vec4+scalar, that would be 64 units :) If it's vec3+scalar then 80 units, at least that's what it sounds like to me.

Sound_Card
01-Mar-2007, 04:50
Well, if it's vec4+scalar, that would be 64 units :) If it's vec3+scalar then 80 units, at least that's what it sounds like to me.


64 vec4+scalar sounds very good to me.

Cuthalu
01-Mar-2007, 04:54
So, what's the release date gonna be; "a few weeks" + "by the end of June" = release in April, and June referring to the end of Q2, which was the final possible date of the previous Q2 time window?

Geo
01-Mar-2007, 05:01
http://www.beyond3d.com/#news39176

The bit about early rumors would be a reference to both Xbit reporting 64 shaders and ATI (at the time) reporting that they'd leveraged Xenos. Add in todays 320 and some "version 2" of unified hints, and you've got the reasoning we used for that last bit.

Sound_Card
01-Mar-2007, 05:01
So, what's the release date gonna be; "a few weeks" + "by the end of June" = release in April, and June referring to the end of Q2, which was the final possible date of the previous Q2 time window?


I thought that "delay of a few weeks" was referring to CeBIT.

pakotlar
01-Mar-2007, 05:14
http://www.beyond3d.com/#news39176

The bit about early rumors would be a reference to both Xbit reporting 64 shaders and ATI (at the time) reporting that they'd leveraged Xenos. Add in todays 320 and some "version 2" of unified hints, and you've got the reasoning we used for that last bit.

geo that would be pretty exciting if true. 64 vec4 + scalar wouldn't be too shabby at all if the arbiter was decently efficient.

edit: Am i missing something? What happened to plain old 80vec4? I didn't read the EEtimes article. Maybe I should check that.

Geo
01-Mar-2007, 05:25
So, what's the release date gonna be; "a few weeks" + "by the end of June" = release in April, and June referring to the end of Q2, which was the final possible date of the previous Q2 time window?

I don't think we heard anything today to change the opinion we offered on the front page last week that we're probably looking at late April or early May.

Cuthalu
01-Mar-2007, 05:26
What's the difference between vec4 + scalar and vec4? Is it better to have 64 vec4 + scalar than having 80 vec4, and if so, why?

Geo
01-Mar-2007, 05:39
Well, that they are referring to "320" would suggest that they might be all functionally scalar, even tho grouped as 64 5D (our guess) or 80 4D (certainly not impossible). To the degree the scheduling allows them to be treated as scalar, then which it is won't be all that important for most purposes. Scalar at all is the big thing, as vec4 will not be as efficient (tho you could get a lively argument going about how much control logic you have to add to make that happen in calculating the relative efficiency).
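
For anyone following the counting, here is a rough sketch of how the "320" figure splits into the groupings being discussed. It assumes 320 counts individual scalar MADD lanes, which is only the guess in this thread; the names `TOTAL_LANES` and `units_for_width` are made up for illustration.

```python
# Assumption (not a confirmed spec): "320 multiply-accumulate units"
# counts individual scalar lanes that are grouped into wider shader units.
TOTAL_LANES = 320

def units_for_width(lanes_per_unit: int) -> int:
    """Number of shader units if each unit groups `lanes_per_unit` lanes."""
    if TOTAL_LANES % lanes_per_unit != 0:
        raise ValueError("lanes do not divide evenly into units of this width")
    return TOTAL_LANES // lanes_per_unit

# 1 = fully scalar, 4 = vec4 (or vec3+scalar), 5 = vec4+scalar
groupings = {width: units_for_width(width) for width in (1, 4, 5)}
print(groupings)
```

The 5-wide grouping gives the 64 units from the early Xbit rumour, and the 4-wide grouping gives 80.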

Pete
01-Mar-2007, 05:50
Once again, Geo, I didn't think before posting. :oops: Vec4 + scalar lines up with Xenos, so 64 sounds right. A little harder to line up against G80, perhaps, with that rogue scalar, but it'll be an interesting fight, for sure.

(That scalar also makes for a nice "+25%" on top of 64 vec4s. Now, where have I heard "+25%" before? Am I just spinning my brain cells if I think of preemptive PR? :))

But functionally scalar, eh? That'd be a kick in the pants. I wonder if their unified v.2 would take that step.

Anywho, if Xbit was right about 64 shader units, then they're probably right about 16 texture units, and that may mean 16 ROPs. But 16 of NV's, or something more? It'd almost have to be more, seemingly, given all that bandwidth and if we can estimate the shader and so core clock from the 2 * R600 = teraflop figure.

Geo
01-Mar-2007, 05:57
What I want to see is if rwolf can make 320 ALUs and 500 mflops into something 2GHz-ish. :grin:

psurge
01-Mar-2007, 06:05
Geo - easy. They could operate at 2.4GHz but only have throughput of 1 madd every 3 cycles.
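
Taken at face value, the arithmetic behind that joke does land on roughly half a teraflop. A minimal sketch, assuming 320 lanes and 2 FLOPs per MADD (both taken from the discussion, neither confirmed):

```python
# All inputs are thread speculation, not confirmed specs.
LANES = 320            # assumed scalar MADD lanes
FLOPS_PER_MADD = 2     # one multiply plus one add
clock_ghz = 2.4        # the hypothetical clock from the post
cycles_per_madd = 3    # one MADD issued every third cycle

# Peak rate = lanes * FLOPs per op * clock, scaled down by the issue rate.
peak_gflops = LANES * FLOPS_PER_MADD * clock_ghz / cycles_per_madd
print(f"{peak_gflops:.1f} GFLOPs")
```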

Ailuros
01-Mar-2007, 06:07
64*9*0.8 = 461 * 2 = 922 ? Either the frequency is higher or I'm "stealing" 2 FLOPs from my speculative layman's math there. Or it should have read "nearly 1 Teraflop"....

To the degree the scheduling allows them to be treated as scalar, then which it is won't be all that important for most purposes.

No problem at all; but I put that one up as a nice reminder for all those that laughed or protested against the marketed "128 SPs" of G80. We can by now more safely say that it's in reality a 16*Vec8 ALU thingy (and that's open to corrections too); so if I understand the above correctly and it's truly 64 5D ALUs I could either think of 4*Vec16 or 8*Vec8.

Assuming roughly the same efficiency for the ALUs between the two architectures, the major difference so far seems to be the G80 ALU clock domain and R600's "phatter" units.

turtle
01-Mar-2007, 06:19
Well, according to this:

http://www.hardspell.com/doc/hardware/34620.html

A13 silicon seems final, and 'no less than 800mhz'. It also says the GDDR3 version of R600 is 12 layer, the GDDR4 being between 12-16 layer PCB, with the OEM card being 512MB and the retail card being 1GB (if I read it correctly.)

Seems Geo's assumption-based article could very well be right based on the '800mhz' number. :grin:

Article also mentions RV630 is also in AIB hands, and they are preparing cards based on it.

Hello massive family if not enthusiast (4x4 barcelona/crossfire-physics) platform launch?

Ailuros
01-Mar-2007, 06:24
Assuming my idiotic math above has any legs, they'd need roughly 870MHz to fully reach a hypothetical 500 GFLOP rate.
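
The same estimate as a small sketch, so the inputs are easy to vary. It assumes 64 units and 9 FLOPs per unit per clock, both taken straight from the speculative math above; `required_clock_mhz` is just an illustrative helper name.

```python
def peak_gflops(units: int, flops_per_unit_per_clock: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOPs for a given clock."""
    return units * flops_per_unit_per_clock * clock_ghz

def required_clock_mhz(target_gflops: float, units: int,
                       flops_per_unit_per_clock: int) -> float:
    """Clock (MHz) needed to reach `target_gflops` at peak rate."""
    return target_gflops * 1000.0 / (units * flops_per_unit_per_clock)

# Assumed inputs: 64 units, 9 FLOPs per unit per clock (speculation).
at_800mhz = peak_gflops(64, 9, 0.8)
clock_for_500 = required_clock_mhz(500, 64, 9)
print(f"{at_800mhz:.1f} GFLOPs at 800MHz, {clock_for_500:.0f}MHz for 500 GFLOPs")
```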

turtle
01-Mar-2007, 06:30
Hmm...Maybe that old "A12 hits 1ghz" rumor has some legs, eh?

Things certainly are starting to come together. :yes:

Rangers
01-Mar-2007, 06:52
The interesting bits from that EETimes article:
Separately, AMD gave one of the first public demos of the R600, its next-generation graphics controller that uses 320 multiply-accumulate units.
The company showed a Barcelona-based system using two 200W R600 graphics cards to hit a terabit/second benchmark.
AMD also demonstrated working versions of its next-generation graphics chip the R600 to be released by the end of June.
Release of the R600 has been delayed "a few weeks" so that AMD can roll out a full suite of graphics chips covering multiple market segments for the latest Microsoft DirectX 10 applications programming interface.

So again, why the hell did they delay it?

I honestly cant believe it. It seems like ATI just did it to lose.

The whole thing about to introduce a whole "suite" is just stupid, as neither Nvidia nor anybody else does that. You go high end first.

ChrisRay
01-Mar-2007, 07:00
SirPauly

http://www.3declipse.com/images/stor...1/8xqplane.png

http://www.3declipse.com/images/stor...16xaaplane.png

The x16 AA clearly isn't offering better smoothing than the X8Q..........and the reason why this shot matters is because it is close to a near horizontal with nice color contrasts to gauge image quality in a static shot. This is easier to notice with a moving screen.

Why are you picking out illustrations that don't cover the comparison as a whole? Comparing it to 4xAA and 8xAA will further explain the discrepancy shown here. But the loss of detail on tiny objects from 4x to 8x/16x 8x CSAA has the exact same problem. I chose that scene for specific reasons: to illustrate that near high-polygon edges will not benefit from it, and to show near distant objects in low resolutions. Nor will high levels of multisampling. It's more interesting to see what CSAA does to it in the very distant objects. Clearly I am familiar with that comparison, as I am the one who made it. And I will reiterate: it is still "not" comparable to 6xAA. Your trying to compare it to 6xAA is still a poor comparison. There are plenty of other comparisons I used which can illustrate the exact opposite. I certainly am not hiding the issue that CSAA isn't perfect. But it's still a huge upgrade to 4xAA in most circumstances at a minimal performance hit. To the point that 4xAA has become a largely irrelevant performance mode for G80 users.

Natoma
01-Mar-2007, 07:03
So again, why the hell did they delay it?

I honestly cant believe it. It seems like ATI just did it to lose.

The whole thing about to introduce a whole "suite" is just stupid, as neither Nvidia nor anybody else does that. You go high end first.

ATI doesn't exist anymore. It's AMD remember. That changes the approach significantly.

nelg
01-Mar-2007, 07:12
Perhaps AMD delayed the R600 to use the new family of Rx6XX cards to bolster the performance of Barcelona.

epicstruggle
01-Mar-2007, 07:14
Hmm...Maybe that old "A12 hits 1ghz" rumor has some legs, eh?

Things certainly are starting to come together. :yes:
I was thinking the same thing. We are going to have to revisit at least a few old rumors in the coming days.

Shtal
01-Mar-2007, 07:21
So again, why the hell did they delay it?

I honestly cant believe it. It seems like ATI just did it to lose.

The whole thing about to introduce a whole "suite" is just stupid, as neither Nvidia nor anybody else does that. You go high end first.

Just my own thoughts about delay!

1st = There is no solid DX10 driver for Vista. (Example like for G80)
2nd = There are no DX10 Vista games.
3rd = Probably to surprise Nvidia since they don't know what they are up against, because they have to adjust GF8900GTX to match R600.
4th = Probably there is little or no profit at all for high-end, so they need midrange graphics cards to make up the cost in order for overall profit gain.
5th = Not many people will upgrade their video cards right away (Example like Geo with his GF8800GTX :) )

Dave Baumann
01-Mar-2007, 07:24
http://biz.yahoo.com/bw/070301/20070228006340.html?.v=1

SAN FRANCISCO--(BUSINESS WIRE)--AMD (NYSE: AMD - News) today showcased a single-system, Accelerated Computing platform that breaks the teraflop computing barrier. Organizations are ultimately expected to be able to apply this technology to a wide range of scientific, medical, business and consumer computing applications. At a press event in San Francisco, AMD demonstrated a "Teraflop in a Box" system running a standard version of Microsoft® Windows® XP Professional that harnessed the power of AMD Opteron(TM) dual-core processor technology and two next-generation AMD R600 Stream Processors capable of performing more than 1 trillion floating-point calculations per second using a general "multiply-add" (MADD) calculation. This achievement represents a ten-fold performance increase over today's high-performance server platforms, which deliver approximately 100 billion calculations per second.

Anarchist4000
01-Mar-2007, 08:05
So again, why the hell did they delay it?

I honestly cant believe it. It seems like ATI just did it to lose.

The whole thing about to introduce a whole "suite" is just stupid, as neither Nvidia nor anybody else does that. You go high end first.

I'm still leaning towards the platform launch, not just the family launch. And nobody else does it because nobody else can. Nvidia last I checked doesn't make CPUs and Intel's discrete graphics market hasn't quite developed yet.

Assuming my idiotic math above has any legs, they'd need roughly 870MHz to fully reach a hypothetical 500 GFLOP rate.

64*5*3*.8=768GFLOPs * 2 = 1.536TFLOPs

A R600 Crossfire should fairly effectively destroy the TFLOP barrier. Also consider this, with G80's missing MUL a single R600 has more than twice the theoretical FP power.


In regards to the scheduling what if they just didn't bother with making it perfectly efficient as ALUs seem somewhat cheap going by the R520->R580 example.

1+1 = 1+2 = 1+3 = 1+4

If it doesn't branch you can really pack em in there. If it does branch you could look at it like 2 scalars. I can't think of how you'd end up with any shaders that had a greater than 50% scalar:vector ratio. Save the complexity of the scheduling and just add more ALUs and clockspeed.

What I want to see is if rwolf can make 320 ALUs and 500 mflops into something 2GHz-ish.

Simple... Inverted Clock Domains. :runaway:

turtle
01-Mar-2007, 08:21
Hmm. Dual-core 'city' CPU was used at the event instead of QC eh? Hint of things to come? :D

Can you comment on if that mysterious Opteron that showed up on BOINC is of the same breed or a hoax? That article certainly lends credence to the possibility of it being real...

Also, just out of curiosity, what MHZ number would be needed to hit 512GFLOPS using Geo's guesstimate on ALUs? ~890 (going by Al's math)?

Going by at least 1 TFLOPs though, Al's math (which I have no idea is correct) and assuming that Opteron was around 24-25 GFLOPs (which might be slightly off): 975/2 = 487.5 GFLOPs per card, and 487.5/(64*9) = ~846MHz.

That wouldn't quite be half a teraflop per card, but close.

I'mma guess it was running at 850mhz or greater. :razz:
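
Here is that back-of-the-envelope estimate as a sketch. Every input is a guess from this thread: 1 TFLOP for the whole box, about 25 GFLOPs from the CPU, two cards, and 64 units at 9 FLOPs per clock each. `gpu_clock_mhz` is a hypothetical helper name.

```python
def gpu_clock_mhz(system_gflops: float, cpu_gflops: float, num_gpus: int,
                  units: int, flops_per_unit_per_clock: int) -> float:
    """Implied GPU clock once the CPU's share is subtracted and the rest
    is split evenly across the cards (assumes peak-rate utilization)."""
    per_gpu_gflops = (system_gflops - cpu_gflops) / num_gpus
    return per_gpu_gflops * 1000.0 / (units * flops_per_unit_per_clock)

# Assumed inputs, none confirmed: 1000 GFLOPs system total, 25 GFLOPs CPU.
clock = gpu_clock_mhz(1000, 25, 2, 64, 9)
print(f"~{clock:.0f}MHz per card")
```

Since the press release says "more than" a teraflop, this is a lower bound under these assumptions.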

overclocked
01-Mar-2007, 08:27
If its Vec and Scalar units based could those be clocked differently with the vec slower and scalar faster?

Unknown Soldier
01-Mar-2007, 08:29
So now, speculation prices for this monster that will be out in a few weeks.

$600?

US

epicstruggle
01-Mar-2007, 08:53
So now, speculation prices for this monster that will be out in a few weeks.

$600?

US

IMO, $650. Although maybe AMD will try to be aggressive with their pricing since they are launching a whole family of cards and not high end only. It will be interesting to see what other differences this launch will have with previous ones.

epic

rwolf
01-Mar-2007, 09:02
What I want to see is if rwolf can make 320 ALUs and 500 mflops into something 2GHz-ish. :grin:

lol ... ummm, I think they are saving that for the refresh at 65nm. :wink:

R300King!
01-Mar-2007, 09:03
Can the 8800GTX in SLI do 1 TFLOP?

Also, does this include the CPU FLOPs?

How many GFLOPs can lets say an OC'd quadcore QX6700 do?

Joshua Luna
01-Mar-2007, 09:45
lol ... ummm, I think they are saving that for the refresh at 65nm. :wink:

Fast14 lives on :cool:

northfieldz
01-Mar-2007, 10:36
G80 can do it as well when SLi is enabled.

And I don't think the number is too good for ATi, especially considering R600 is more of a vector-machine.

Per B
01-Mar-2007, 10:59
Given the new positive signals both regarding R600 and Barcelona, could it simply be that AMD wants to be able to provide the first R600 (pre-)reviewers with Barcelona systems?!? If they don't, we can be pretty sure that the reviews will be performed on Core 2 Duo or Quad systems, and that doesn't look that good for AMD... it is still the CPU's that are the big thing for AMD.

So the R600 delay could simply be because AMD is on track with Barcelona, as well as the matching chip-sets!

vertex_shader
01-Mar-2007, 11:02
"ATI R600 and the next field demonstration engines GDC"
http://66.249.93.104/translate_c?hl=en&langpair=zh%7Cen&u=http://news.mydrivers.com/1/78/78496.htm

Evildeus
01-Mar-2007, 11:07
Microsoft® Windows® XP Professional

Hmmmm...

Hopefully, we will have a good GPU in may ;)

vertex_shader
01-Mar-2007, 11:33
Well, according to this:

http://www.hardspell.com/doc/hardware/34620.html

A13 silicon seems final, and 'no less than 800mhz'. It also says the GDDR3 version of R600 is 12 layer, the GDDR4 being between 12-16 layer PCB, with the OEM card being 512MB and the retail card being 1GB (if I read it correctly.)

Seems Geo's assumption-based article could very well be right based on the '800mhz' number. :grin:

Article also mentions RV630 is also in AIB hands, and they are preparing cards based on it.

Hello massive family if not enthusiast (4x4 barcelona/crossfire-physics) platform launch?

Now I can see the light at the end of the dark tunnel, from AMD's point of view.
The situation looks good for AMD now (if these news/rumours turn out to be true); maybe the time has come to buy some shares :smile:

So is it almost confirmed that Barcelona is the reason R600 was delayed?

icecold1983
01-Mar-2007, 11:37
isnt g80 like 330ish gflops? how would 2 in sli make 1 tflop

DemoCoder
01-Mar-2007, 11:38
64*5*3*.8=768GFLOPs * 2 = 1.536TFLOPs


Umm, where's this * 3 coming from? A MADD is 2 FLOPS, not 3. 64 * 5 * 2 * .8 = 512 GFLOPs is more like it.

Don't you think the press release would have said "1.5TFlops!" if your math was right?


A R600 Crossfire should fairly effectively destroy the TFLOP barrier. Also consider this, with G80's missing MUL a single R600 has more than twice the theoretical FP power.


Nope. A G80 with missing MUL = 518Gflops. Certainly, the G80 won't always be able to use the missing MUL every cycle, but neither will the R600 be able to use every SIMD slot of their VEC4 every cycle either unless it is a scalar design IMHO. The true efficiency will be hard to calculate for both, so comparing absolutely unrealistic peak rates is nonsense.
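
To make the FLOP counting explicit, a minimal sketch of the peak-rate arithmetic. The R600 inputs (320 lanes, a single MADD per lane per clock, 800MHz) are thread speculation; the G80 inputs are 128 lanes at the 1.35GHz shader clock used elsewhere in this thread. Peak numbers like these say nothing about achieved efficiency.

```python
# FLOPs credited per operation type.
FLOPS_PER_OP = {"MADD": 2, "MUL": 1, "ADD": 1}

def peak_gflops(lanes: int, ops_per_clock: list, clock_ghz: float) -> float:
    """Theoretical peak: lanes * FLOPs issued per lane per clock * clock."""
    flops_per_lane = sum(FLOPS_PER_OP[op] for op in ops_per_clock)
    return lanes * flops_per_lane * clock_ghz

r600_guess = peak_gflops(320, ["MADD"], 0.8)           # speculative
g80_with_mul = peak_gflops(128, ["MADD", "MUL"], 1.35)
g80_madd_only = peak_gflops(128, ["MADD"], 1.35)

for name, value in [("R600 (guess)", r600_guess),
                    ("G80 MADD+MUL", g80_with_mul),
                    ("G80 MADD only", g80_madd_only)]:
    print(f"{name}: {value:.1f} GFLOPs")
```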

CarstenS
01-Mar-2007, 11:39
64*9*0.8 = 461 * 2 = 922 ? Either the frequency is higher or I'm "stealing" 2 FLOPs from my speculative layman's math there. Or it should have read "nearly 1 Teraflop"....

Nope, but you need to read PR-Statements more carefully... ;)

"AMD demonstrated a "Teraflop in a Box" system running a standard version of Microsoft® Windows® XP Professional that harnessed the power of AMD Opteron(tm) dual-core processor technology and two next-generation AMD R600 Stream Processors capable of performing more than 1 trillion floating-point calculations per second using a general "multiply-add" (MADD) calculation."

My bold etc.

Assuming VR-Zone was correct and there really was a barcelona-cpu (which is based on "AMD Opteron(tm) dual-core processor technology " [note the "technology"]), you might end up with significantly less than 500 GFLOPs/sec. here. for single R600, obviously.

Anarchist4000
01-Mar-2007, 11:43
G80 can do it as well when SLi is enabled.

And I don't think the number is too good for ATi, especially considering R600 is more of a vector-machine.

One G80 = ~345GFLOPs ... Two G80's > 1000GFLOPs?

Yeah I'm not quite seeing it. Doubt they're getting 100% efficiency and perfect scaling for the SLI either.

I'm assuming the >1TFLOP mark was measured performance and not a purely theoretical number.

Umm, where's this * 3 coming from? A MADD is 2 FLOPS, not 3. 64 * 5 * 2 * .8 = 512 GFLOPs is more like it.

Don't you think the press release would have said "1.5TFlops!" if your math was right?

MADD=2, ADD=1

Assuming they're still using the MADD+ADD setup they've been using before.

64*5*3*0.8 = 768.0GFLOPs R600
128*3*1.35 = 518.4GFLOPs Normal
128*2*1.35 = 345.6GFLOPs Missing MUL

When measuring the GFLOPs you're running an operation that lines up best to the card so every ALU should be fully utilized most of the time. Best case scenario basically. So R600 should be capable of feeding all of those pipelines. This would be one of those cases where I'd expect R600 to thrash G80 just because of the design focus. It's somewhat meaningless in real world application but for the purpose of doing that many operation R600 is capable of a significant amount more than G80. We don't know by how much R600 broke the barrier but if it was measured performance and discounting FLOPs from the CPUs that's 66% efficiency(1 / 1.5TFLOPs) including the scaling hit for Crossfire. I guess it really comes down to if 1TFLOP was measured or theoretical performance.
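
A sketch of where that efficiency figure comes from, assuming the MADD+ADD setup (3 FLOPs per lane per clock), 64 5D units at 800MHz, two cards, and a measured 1 TFLOP with the CPU's contribution ignored. All of these are assumptions from this post, not known specs.

```python
# Speculative inputs: 64 units * 5 lanes, MADD+ADD = 3 FLOPs per lane
# per clock, 0.8 GHz core clock, two cards in Crossfire.
single_card_peak = 64 * 5 * 3 * 0.8      # GFLOPs
pair_peak = 2 * single_card_peak         # GFLOPs

measured = 1000.0                        # "more than 1 TFLOP", taken as exactly 1
implied_efficiency = measured / pair_peak

print(f"pair peak {pair_peak:.0f} GFLOPs, "
      f"implied efficiency {implied_efficiency:.1%}")
```

Using 1.536 rather than a rounded 1.5 TFLOPs puts the implied figure nearer 65% than 66%.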

Graham
01-Mar-2007, 12:14
The whole thing about to introduce a whole "suite" is just stupid, as neither Nvidia nor anybody else does that. You go high end first.

The major sales and major money come from the medium/low end. If you can launch these before added hype the high end generates fades away, then all the better. Provided the high end card does well, it can only have a positive effect on the lower end cards.

I was told geforce 8000 cards are fastest in the world... wait.. I can't afford it. Never mind.
or..
I was told radeon x2000 cards are the fastest in the world... awesome! they have one at my price point!

Lets hope if amd do release an entire platform in one hit top to bottom, they unify the naming schemes too... Like AMD X[series] [perf] [product]... AMD x2 300 graphics, AMD x2 400 cpu, AMD x2 200 platform,.. whatever. Something like that.

Arnold Beckenbauer
01-Mar-2007, 12:24
Anywho, if Xbit was right about 64 shader units, then they're probably right about 16 texture units, and that may mean 16 ROPs. But 16 of NV's, or something more? It'd almost have to be more, seemingly, given all that bandwidth and if we can estimate the shader and so core clock from the 2 * R600 = teraflop figure.

http://www.beyond3d.com/articles/xenos/index.php?p=06
If I understand the Xenos diagram correctly, Xenos' TMUs are not part of the three shader units; they are decoupled or something else.
Decoupled 24 TMUs and 64 5D-ALUs (four SIMD clusters à la Xenos?).
Or is it too crazy?

CarstenS
01-Mar-2007, 12:29
One G80 = ~345GFLOPs ... Two G80's > 1000GFLOPs?

Yeah I'm not quite seeing it. Doubt they're getting 100% efficiency and perfect scaling for the SLI either.

I'm assuming the >1TFLOP mark was measured performance and not a purely theoretical number.


Under Vista I'm getting about 93 percent MADD efficiency on G80 (~322 out of 346 GFLOPs/sec.).
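
For reference, that percentage is just measured throughput divided by theoretical peak. A minimal sketch using the two figures quoted above (`efficiency` is an illustrative helper name):

```python
def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Achieved fraction of theoretical peak throughput."""
    if peak_gflops <= 0:
        raise ValueError("peak must be positive")
    return measured_gflops / peak_gflops

# Figures quoted in the post: ~322 measured out of ~346 peak (MADD only).
g80_madd_efficiency = efficiency(322, 346)
print(f"{g80_madd_efficiency:.1%}")
```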

Anarchist4000
01-Mar-2007, 12:33
Well they've already got tons of X's in the names as well as an affinity for 4 digit numbers so does that count as unified? XL, XT, XTX, FX, X2, x64

But the entire platform launch does look rather appealing from a marketing perspective. Of course all the reviewers are gonna be mad because they get nailed with a massive workload all at once.

Under Vista I'm getting about 93 percent MADD efficiency on G80 (~322 out of 346 GFLOPs/sec.).

I'm assuming you aren't using SLI though and how exactly was it measured out of curiosity?

CarstenS
01-Mar-2007, 12:41
I'm assuming you aren't using SLI though and how exactly was it measured out of curiosity?
No, it was a single G80. Measured with official 100.65 Forceware, Vista x86 and v1.2.1 of GPUBench's "Scalar vs. Vector Instruction Issue" part.

On R580+ I am only getting close to 75 percent efficiency on MAD and only about 50 percent on ADD (Cat 7.1; curious note: scalar-split does not seem to work anymore in 7.1 drivers but vec4 results are in line with older drivers).

DemoCoder
01-Mar-2007, 12:43
MADD=2, ADD=1

Assuming they're still using the MADD+ADD setup they've been using before.


Unsubstantiated assumption. Xenos doesn't have that setup. And if they are trying to increase efficiency and density, as well as clocks, it's more likely that they've simplified the setup. It's harder to co-schedule dependent ALU instructions via ILP than to use TLP. Efficiency is worse. NVidia learned its lesson, and presumably ATI did too.

This would be one of those cases where I'd expect R600 to thrash G80 just because of the design focus.

Design focus? How do you consider Nvidia's scalar design not focused on efficiency, but then go on to make the assumption that ATI went the route of vectorized ALUs and therefore, efficiency was its design focus? There's way too much cheerleading in these assumptions.

To me, the R600's true efficiency is a big question mark if it is indeed a vectorized GPU, because it takes more HW and compiler magic to extract efficiency out of this setup.

Anarchist4000
01-Mar-2007, 12:52
For this case, benchmarking based on FLOPs, efficiency should be rather good under any condition. If they're using vectors they should be able to pack more into a given area. It's one of those tests where efficiency shouldn't be a significant factor as it would be high on both. Therefore the card with the higher theoretical power wins. "Design focus" probably wasn't the correct description. I meant this was a situation extremely well suited for a vector based design.

I'd agree with you that in terms of actual real world performance the efficiency of R600 will be the deciding factor.

DemoCoder
01-Mar-2007, 13:02
I don't get your efficiency measure. As far as benchmarking is concerned, efficiency = actual throughput/peak theoretical throughput. In this case, vectors lose out. Vectors may win on transistor density, but that's a different efficiency measure.

Without knowing what workload the benchmark consists of, you can't really make any claims as to real world efficiency. But it is well known that maximizing vectorization of code to match the underlying hardware vector architecture is a difficult problem. Unless you feed handcrafted *ideal* code to the vectorized units, it's unlikely you will get close to peak theoretical rates, unless you think the compiler performs voodoo levels of instruction scheduling.

It's simply easier to extract maximum efficiency and parallelism not having to worry about packing and co-issuing 5D operations. There's way more opportunities to screw up.
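
A deliberately naive toy model of that packing problem. It assumes every instruction occupies a whole unit for one cycle, with no co-issue, no compiler repacking and no dependencies, so it is a worst case for the wide unit rather than a prediction for any real chip. The instruction mix is invented for illustration.

```python
def lane_utilization(op_widths: list, lanes_per_unit: int) -> float:
    """Fraction of lanes doing useful work when each op takes one full
    issue slot on a unit that is `lanes_per_unit` wide."""
    if not op_widths:
        raise ValueError("need at least one op")
    if any(w < 1 or w > lanes_per_unit for w in op_widths):
        raise ValueError("each op must fit in one unit")
    useful = sum(op_widths)
    available = lanes_per_unit * len(op_widths)
    return useful / available

# Hypothetical shader: a vec4 op, a vec3 op, two scalars and a vec2 op.
mix = [4, 3, 1, 1, 2]

# 5-wide unit, one op per cycle: many lanes sit idle.
vector_util = lane_utilization(mix, lanes_per_unit=5)

# Scalar issue: the same work split into 11 one-component ops.
scalar_util = lane_utilization([1] * sum(mix), lanes_per_unit=1)

print(f"5-wide: {vector_util:.0%}, scalar: {scalar_util:.0%}")
```

Real hardware and compilers would recover some of that gap by co-issuing independent ops into the idle lanes; how much is exactly the open question here.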

Now, if you want to claim that ATI fed handcrafted and completely artificial workloads that extracted near peak FLOPs rates, well la-de-da, but the people looking to buy GPUs to run on real world workloads are more interested in how the chip's efficiency compares on a diverse set of workloads.

You know, the PlayStation 2 had amazing peak theoretical rates that one could hack custom and artificial benchmarks to reach. It isn't hard with eDRAM. In the real world, you needed the PS2 performance analyzer to get anywhere near sane efficiency levels.

Techno+
01-Mar-2007, 13:05
I believe that if Anarchist's speculation on 3 FLOPs/cycle is correct, then an R600 at 1 GHz can nearly achieve a TFLOP (64*5*3*1.0 = 960 GFLOPs).

nexus_alpha
01-Mar-2007, 13:06
Nope, but you need to read PR-Statements more carefully... ;)

"AMD demonstrated a "Teraflop in a Box" system running a standard version of Microsoft® Windows® XP Professional that harnessed the power of AMD Opteron(tm) dual-core processor technology and two next-generation AMD R600 Stream Processors capable of performing more than 1 trillion floating-point calculations per second using a general "multiply-add" (MADD) calculation."

My bold etc.

Assuming VR-Zone was correct and there really was a Barcelona CPU (which is based on "AMD Opteron(tm) dual-core processor technology" [note the "technology"]), you might end up with significantly less than 500 GFLOPs here.

capable of performing more than 1 trillion floating-point calculations per second using a general "multiply-add" (MADD) calculation."

DemoCoder
01-Mar-2007, 13:09
Reminds me of the old 3dfx commercial.

Ailuros
01-Mar-2007, 13:11
MADD=2, ADD=1

Assuming they're still using the MADD+ADD setup they've been using before.

64*5*3*0.8 = 768.0 GFLOPs R600
128*3*1.35 = 518.4 GFLOPs Normal
128*2*1.35 = 345.6 GFLOPs Missing MUL

You realize though that it could also be 1 4D MADD + 1D ADD don't you?
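
The three figures quoted above all follow one pattern: number of ALUs, times FLOPs per ALU per clock, times clock in GHz. A minimal sketch of that arithmetic, assuming the rumoured configurations from the quoted post (none of them confirmed specs); the name `peak_gflops` is just for illustration:

```python
def peak_gflops(alus: int, flops_per_alu_per_clock: int, clock_ghz: float) -> float:
    """Peak theoretical GFLOPs = ALUs * FLOPs per ALU per clock * clock (GHz)."""
    return alus * flops_per_alu_per_clock * clock_ghz


# MADD counts as 2 FLOPs, ADD or MUL as 1 (per the quoted post).
r600_guess = peak_gflops(64 * 5, 3, 0.8)   # 64 5D shaders, MADD+ADD, 800 MHz (speculative)
g80_normal = peak_gflops(128, 3, 1.35)     # 128 SPs, MADD+MUL, 1.35 GHz
g80_no_mul = peak_gflops(128, 2, 1.35)     # 128 SPs, MADD only

print(f"R600 (speculative): {r600_guess:.1f} GFLOPs")
print(f"G80 normal:         {g80_normal:.1f} GFLOPs")
print(f"G80 missing MUL:    {g80_no_mul:.1f} GFLOPs")
```

Swapping the per-ALU figure is all it takes to try the other layouts discussed here, such as a 4D MADD plus 1D ADD.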

nexus_alpha
01-Mar-2007, 13:12
capable of performing more than 1 trillion floating-point calculations per second using a general "multiply-add" (MADD) calculation."

Sorry didn't know the words would come out so big can't find the edit button

Anarchist4000
01-Mar-2007, 13:15
It's simply easier to extract maximum efficiency and parallelism not having to worry about packing and co-issuing 5D operations. There's way more opportunities to screw up.

Now, if you want to claim that ATI fed handcrafted and completely artificial workloads that extracted near peak FLOPs rates, well la-de-da, but the people looking to buy GPUs to run on real world workloads are more interested in how the chip's efficiency compares on a diverse set of workloads.

In the interest of running a benchmark that breaks the 1 TFLOP barrier I assumed any test performed by a company and demonstrated to the public would be handcrafted and show a best case scenario. Running a test based on conditional testing of random numbers would be rather insane for a company to actually demonstrate unless it was beneficial in some way to their hardware.

Rereading that article, I would say it looks more like a single MADD and not the MADD+ADD setup I was assuming. They would have mentioned that in the article unless they didn't understand what was happening. In the past I was under the impression that ATI used Vec3+1 with each unit being a MADD+ADD and the +1 having additional SF logic.

Razor1
01-Mar-2007, 13:45
Correct me if I'm wrong, but if these are Xenos-style ALUs, shouldn't it only be 8 + 2 FLOPs per MAD and co-issued ADD?

Oh sorry didn't see your post Democoder ;)

CarstenS
01-Mar-2007, 14:13
Sorry didn't know the words would come out so big can't find the edit button

So what? If you're referring to my 500 GFLOPs - that was for a single R600.

Geo
01-Mar-2007, 15:02
64*9*0.8 = 461 * 2 = 922 ? Either the frequency is higher or I'm "stealing" 2 FLOPs from my speculative layman's math there. Or it should have read "nearly 1 Teraflop"....



I assumed 10 rather than 9 to make the math work; in other words that 5th D scalar of the original Xenos config is now MADD like the others rather than ADD. Because otherwise I'd think the point of doing scalar in the first place would be more complicated if all the units aren't interchangeable. But then this is speculative, so perhaps you're right and I'm wrong.
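
The 9-versus-10 question can be checked with a quick sketch. It assumes the speculated 64 shaders at 800 MHz and two R600s in the "Teraflop in a Box" demo; the variable and function names are hypothetical:

```python
SHADERS = 64        # rumoured shader count (unconfirmed)
CLOCK_GHZ = 0.8     # speculated clock
CHIPS = 2           # two R600s in the demo system


def system_gflops(flops_per_shader_per_clock: int) -> float:
    """Peak GFLOPs for the whole two-chip system."""
    return SHADERS * flops_per_shader_per_clock * CLOCK_GHZ * CHIPS


# 9 FLOPs: Vec4 MADD (8) + scalar ADD (1), as in the original Xenos config.
# 10 FLOPs: Vec4 MADD (8) + scalar MADD (2), i.e. all five units equal.
xenos_style = system_gflops(9)
all_madd = system_gflops(10)

print(f"9 FLOPs/shader/clock:  {xenos_style:.1f} GFLOPs")
print(f"10 FLOPs/shader/clock: {all_madd:.1f} GFLOPs")
```

Under these assumptions only the 10-FLOP case clears 1 trillion operations per second at 800 MHz, which is why the math above needed 10 rather than 9.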

No problem at all; but I put that one up as a nice reminder for all those that laughed or protested against the marketed "128 SPs" of G80. We can by now more safely say that it's in reality a 16*Vec8 ALU thingy (and that's open to corrections too); so if I understand the above correctly and it's truly 64 5D ALUs I could either think of 4*Vec16 or 8*Vec8.

And it seems likely that G80 acts in quad mode for certain things too, so it's really about context, isn't it?

Geo
01-Mar-2007, 15:28
Oh, and if you look at what Wavey linked upstream:

two next-generation AMD R600 Stream Processors capable of performing more than 1 trillion floating-point calculations per second using a general "multiply-add" (MADD) calculation.

You might wonder why it's specifying MADD. :smile: But, really, if you make the scalar assumption (and there seem to be some people who think that's not in the bag yet, that tossing it out as "320" was just marketing), then wouldn't that nearly demand that the units be all the same capability?

Jawed
01-Mar-2007, 15:32
wouldn't that nearly demand that the units be all the same capability?
If you count G80's MAD FLOPs, entirely excluding the SF unit, then you're left counting 128 MADs per clock.

So R600's 320 MADs per clock may be excluding SF too.

Jawed

Geo
01-Mar-2007, 15:46
Err, come to think of it (he said, looking at his own report on the front page), "multiply-accumulate units" sounds an awful lot like MADD as reported by a reporter who isn't hip to the usual lingo. Doesn't it?

Razor1
01-Mar-2007, 15:50
so possibly no co issue?

Jawed
01-Mar-2007, 15:51
Err, come to think of it (he said, looking at his own report on the front page), "multiply-accumulate units" sounds an awful lot like MADD as reported by a reporter who isn't hip to the usual lingo. Doesn't it?
MAC is the generally accepted term for what we GPU people like to call MAD.

Jawed

Geo
01-Mar-2007, 16:03
so possibly no co issue?
Co-issue is poor man's scalar, isn't it? Or maybe it would be more accurate to say scalar is co-issue taken to its logical conclusion.

epicstruggle
01-Mar-2007, 16:12
OK, I'm getting confused. Could someone flesh out the different possibilities?

Geo
01-Mar-2007, 16:23
OK, I'm getting confused. Could someone flesh out the different possibilities?

That sounds like more effort than I want to engage in just now. :smile: But here's a start.

We got three new facts yesterday. Facts, because they were stated by AMD. They are:

1). 200W power usage
2). 320 "multiply-accumulate units"
3). 1/2 teraflop math power

Then you start conjuring from there, from what we know, think we know, etc from "our story so far" regarding R600 going back to early 2006 (tho, when talking about R600, if you wanted to you could really go back to just after the R300 launch, when Orton first enthused about something he called R400 at the time, which explains why X800 was based on "R420" instead. . .but I digress!).

Some of these things are linked upstream in post 1 of this thread. The ones I personally like to lean on would be Xbit's report that R600 was 64 shaders. And AMD's statement that they'd leveraged Xenos. Then mix in some "Version 2 of unified" statements.

Sound_Card
01-Mar-2007, 16:32
So someone at Stanford is already coding the GPGPU Folding@home client for R600?

Jawed
01-Mar-2007, 16:46
So someone at Stanford is already coding the GPGPU Folding@home client for R600?
It should just work on R600, since it's written in DX9 SM3.

Jawed

CarstenS
01-Mar-2007, 16:48
It should just work on R600, since it's written in DX9 SM3.

Jawed

So it's also working on GF6-8?

Jawed
01-Mar-2007, 16:52
So it's also working on GF6-8?
No. The performance of 6/7 plus the driver support meant Stanford gave up - it was a waste of time.

8 appears to be a low priority for NVidia, as NVidia is now marketing CUDA and wants to make the GPGPU word go away.

Jawed

Anarchist4000
01-Mar-2007, 16:58
CTM is based around D3D Assembly, so it shouldn't take much work to get working on R600. There could be some tweaking for efficiency, but R580 should be a subset of R600 in a manner of speaking. You could likely use the same vectorized code that was used before. I'm assuming CUDA has its own compiler to move it to different targets. G80 likely could execute the same code if the backend was set up for it.

CarstenS
01-Mar-2007, 16:58
No. The performance of 6/7 plus the driver support meant Stanford gave up - it was a waste of time.

8 appears to be a low priority for NVidia, as NVidia is now marketing CUDA and wants to make the GPGPU word go away.

Jawed
Doesn't make sense to me. If it's just working on R600 since it's written in DX9 SM3 - why would any GF6/7 and especially 8 refuse SM3-code?

Nakai
01-Mar-2007, 17:05
You realize though that it could also be 1 4D MADD + 1D ADD don't you?

But the R600 has 5D shaders. :roll:

I assumed 10 rather than 9 to make the math work; in other words that 5th D scalar of the original Xenos config is now MADD like the others rather than ADD. Because otherwise I'd think the point of doing scalar in the first place would be more complicated if all the units aren't interchangeable. But then this is speculative, so perhaps you're right and I'm wrong.

I don't know how big the extension to a full scalar MADD is, but the 9 FLOPs are very promising.
I think ATI will pump up the speed with higher clocks; maybe the R600 is clocked at 900 MHz?

Regards, Nakai

Tim Murray
01-Mar-2007, 17:05
Doesn't make sense to me. If it's just working on R600 since it's written in DX9 SM3 - why would any GF6/7 and especially 8 refuse SM3-code?
Brook team says it's a driver bug, NV says it's a Brook bug.

Jawed
01-Mar-2007, 17:07
Doesn't make sense to me. If it's just working on R600 since it's written in DX9 SM3 - why would any GF6/7 and especially 8 refuse SM3-code?
Because NVidia's drivers aren't so hot.

Bear in mind that some of AMD's drivers also break the DX9/SM3 functionality that F@H relies upon.

So, obviously, R600 running F@H is dependent upon the driver.

CTM and CUDA exist (at least partly) to obviate the driver problem.

Jawed

Jawed
01-Mar-2007, 17:08
NV says it's a Brook bug.
Really, where? As far as I know, NVidia retracted that statement.

http://forums.nvidia.com/index.php?showtopic=28868

Jawed

Anarchist4000
01-Mar-2007, 17:10
To clarify myself CTM uses D3D Assembly but not DirectX or the drivers. So while Nvidia could likely take the same code and compile it for their cards you can't simply move it to any system supporting DirectX9 and fire up the application.

Jawed
01-Mar-2007, 17:14
To clarify myself CTM uses D3D Assembly but not DirectX or the drivers. So while Nvidia could likely take the same code and compile it for their cards you can't simply move it to any system supporting DirectX9 and fire up the application.
F@H uses Brook. Brook is currently written using SM3 code to run on DX9 GPUs.

Supposedly there's a CTM implementation of Brook coming.

Stanford hasn't expressed much interest in writing a Brook-CTM or CTM version of F@H because that shuts the door on supporting NVidia cards.

Stanford wants to get G80 onboard, but the DX9 drivers are getting in the way.

Jawed

Anarchist4000
01-Mar-2007, 17:23
I was under the impression that they were using Brook to simply wrap up CTM. So when they were finished it'd just be a matter of changing a compiler option to generate the desired code. So the application would be coded in Brook and then ported to whatever platform they desired.

turtle
01-Mar-2007, 17:24
Here's another thought... I keep thinking of that 512 GFLOPs number, and how it could work. Not only because of the rumors with that number for over a year and the 'greater than 1/2 TFLOP' from Orton, but also now this 1 TFLOP exhibition, which seems to demonstrate it being higher than 1/2 TFLOP, and the 'no less than 800 MHz' report, which seems to imply it is the nearest roundish number that will cause it to be over 500 GFLOPs...

Taking that number (512 GFLOPs) and Geo's math (10*64), 800 MHz equals exactly 512 GFLOPs.

So currently, I think 10 is correct, and they have been aiming for 800 MHz all along. It just all seems to line up.
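
The "nearest roundish number" argument can be run backwards: given a target GFLOPs figure, solve for the clock that would be needed. A rough sketch, assuming the speculated 64 shaders; the function name is hypothetical:

```python
def required_clock_ghz(target_gflops: float, shaders: int, flops_per_shader_per_clock: int) -> float:
    """Clock (GHz) needed to reach a target peak GFLOPs figure."""
    return target_gflops / (shaders * flops_per_shader_per_clock)


# Speculated configuration: 64 shaders, 10 FLOPs per shader per clock.
clock_for_half_tflop = required_clock_ghz(500.0, 64, 10)
clock_for_512 = required_clock_ghz(512.0, 64, 10)

# The same target with the Xenos-style 9 FLOPs per shader per clock.
clock_for_half_tflop_9 = required_clock_ghz(500.0, 64, 9)

print(f"10 FLOPs, 500 GFLOPs: {clock_for_half_tflop * 1000:.1f} MHz")
print(f"10 FLOPs, 512 GFLOPs: {clock_for_512 * 1000:.1f} MHz")
print(f" 9 FLOPs, 500 GFLOPs: {clock_for_half_tflop_9 * 1000:.1f} MHz")
```

With 10 FLOPs per shader, anything above roughly 781 MHz clears half a teraflop, so 800 MHz is the first round clock that does; with 9 FLOPs the clock would need to be closer to 870 MHz.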

Geo
01-Mar-2007, 17:27
Brook team says it's a driver bug, NV says it's a Brook bug.

NV now acknowledges it's a driver bug and has filed an issue on it. So that's progress. What kind of priority it is getting, however. . . . I haven't heard anyone from NV step up and say. We pinged them on it last week, but they haven't responded.

Rys
01-Mar-2007, 17:35
But the R600 has 5D shaders. :roll:
That doesn't mean all of those ALUs are equal. In fact, in R600's case I'm rather assuming they're not.

Jawed
01-Mar-2007, 17:39
I was under the impression that they were using Brook to simply wrap up CTM. So when they were finished it'd just be a matter of changing a compiler option to generate the desired code. So the application would be coded in Brook and then ported to whatever platform they desired.
CTM is platform specific (in fact there's prolly one CTM for R5xx and another for R6xx) and CTM is newer than Brook and F@H on GPU.

CUDA has the notional advantage of not being GPU-specific. It has been created as a forward-looking, "higher-level" platform.

You can write code for CUDA, now, that will work without any changes on G90. There are two caveats as far as I can tell:

- it's possible to write "loose" CUDA that works on G80 now, but doesn't work on G90 because the programmer unwittingly used a "feature" of the execution model on G80 - e.g. it's possible to configure a block and thread sizing which forces a particular execution order on G80
- CUDA code written for G80 won't necessarily run at maximum performance on G90

I can't see how these caveats can be avoided, because the memory model and execution scheduling will always get "better". The same goes for CTM or GPGPU - newer GPUs will run code more effectively if the code is rewritten specifically for them.

Jawed

Nakai
01-Mar-2007, 17:43
That doesn't mean all of those ALUs are equal. In fact, in R600's case I'm rather assuming they're not.

The Xenos shaders are 5D, too, but they are Vec4 + scalar ADD. :wink:

2 R600s bring 1000 GFLOPs of raw MAD power. Maybe they are not Vec4 + scalar ADD, but Vec5 split into Vec4 + scalar MAD.

Regards, Nakai

Rys
01-Mar-2007, 17:48
2 R600s bring 1000 GFLOPs of raw MAD power. Maybe they are not Vec4 + scalar ADD, but Vec5 split into Vec4 + scalar MAD.
I'll bet good money that it's not. It's (very) likely that it's a completely scalar processing architecture, but again that doesn't mean all the ALUs are equal (since they can still be different even if they're all scalar and capable of a MAD).

Sound_Card
01-Mar-2007, 17:53
I wonder if these rumours of Barcelona coming in Q3 are even true? Could we see a joint launch of Barcelona and R600 in Q2?

nelg
01-Mar-2007, 17:55
Brook team says it's a driver bug, NV says it's a Brook bug.

Bugs everywhere are saying "sure, blame it all on us". They believe that it is Mike Huston and Jen-Hsun Huang problem.

Rys
01-Mar-2007, 17:58
I wonder if these rumours of Barcelona coming in Q3 are even true? Could we see a joint launch of Barcelona and R600 in Q2?
Well, the likely pairing is Agena + R600, Agena being the Athlon FX variant rather than Barcelona/Opteron. At this point I'm inclined to believe that's a decent possibility, given that I'm sure the FX product team are looking for Agena to be publicly available closely behind Barcelona and that the common thinking is that Barcelona is probably closer than Q3 (and close enough for me to touch recently :razz: ). So I wouldn't be surprised at this point if limited Agena samples were created just for the purposes of high-end R600 eval by press.

Sound_Card
01-Mar-2007, 18:03
Well, the likely pairing is Agena + R600, Agena being the Athlon FX variant rather than Barcelona/Opteron. At this point I'm inclined to believe that's a decent possibility, given that I'm sure the FX product team are looking for Agena to be publicly available closely behind Barcelona and that the common thinking is that Barcelona is probably closer than Q3 (and close enough for me to touch recently :razz: ). So I wouldn't be surprised at this point if limited Agena samples were created just for the purposes of high-end R600 eval by press.

I'm 100% sure that AMD will have some samples out with R600 to demonstrate the power of both chips.

This is going to be interesting for sure. I wonder how things will play out. I just hope June is not true; that's a little too late for my taste.

Tim Murray
01-Mar-2007, 18:05
Really, where? As far as I know, NVidia retracted that statement.

http://forums.nvidia.com/index.php?showtopic=28868

Jawed
Missed that update, thanks.

WaltC
01-Mar-2007, 18:06
So again, why the hell did they delay it?

I honestly cant believe it. It seems like ATI just did it to lose.

The whole thing about to introduce a whole "suite" is just stupid, as neither Nvidia nor anybody else does that. You go high end first.

Yes, it's apparent that AMD (ATi) is "marching to a different drummer" and going about things in the way that suits it instead of doing something cookie-cutter just like everybody else. What you might find strange about that I can't imagine. I think it's refreshing to see a little originality now and then. Really, market leaders and innovators don't spend their time copying what everybody else is doing because they are too busy forging a path of their own--and what usually happens is that after they embark on that path the cookie-cutter crowd falls all over itself trying to fall in line behind them (this sounds a bit cliché, I know, but there it is... ;))

Witness the sea change in nVidia's direction after the launch of R300, for instance. R300 went from being "the wrong direction for 3d gaming" according to nVidia press releases and interviews made in the year after the R300 launch, to being the essential platform blueprint for the design of nVidia's nV40. Then there's Intel, which went from ignoring the Athlon as though it did not exist, to pushing Prescott as an "Athlon killer" for an entire year or longer before Prescott launched-then-imploded, to an anti-64 bit desktop, Itanium-based PR campaign, all the way right up to the x86-64 Core 2--which is a lot more like the Athlon 64 in design than it is like Prescott, etc. And now that Barcelona is beginning to be unveiled ( http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2939 ), Intel is falling all over itself jabbering excitedly about its upcoming 45nm Core 2 production even at a time when the company is throwing everything it has got into ramping its 65nm Core 2 production capacities.

As to what you mean by "It seems like ATI just did it to lose" I'm afraid I haven't got the foggiest...;) Lose what, exactly? I mean, it's a little hard to "lose" before you get started, I think. Now, if after the company launches it looks like what they've launched isn't competitive with what the cookie-cutter crowd is doing right now, then we can revisit this issue and I'll probably agree with you. But at the moment it isn't clear to me that AMD/Ati has lost anything--yet. We'll know who is "winning" and "who is losing" in a few weeks, imo, with respect to R600.

icecold1983
01-Mar-2007, 18:06
That sounds like more effort than I want to engage in just now. :smile: But here's a start.

We got three new facts yesterday. Facts, because they were stated by AMD. They are:

1). 200W power usage
2). 320 "multiply-accumulate units"
3). 1/2 teraflop math power

Then you start conjuring from there, from what we know, think we know, etc from "our story so far" regarding R600 going back to early 2006 (tho, when talking about R600, if you wanted to you could really go back to just after the R300 launch, when Orton first enthused about something he called R400 at the time, which explains why X800 was based on "R420" instead. . .but I digress!).

Some of these things are linked upstream in post 1 of this thread. The ones I personally like to lean on would be Xbit's report that R600 was 64 shaders. And AMD's statement that they'd leveraged Xenos. Then mix in some "Version 2 of unified" statements.

How much faster do you personally expect R600 to end up being on average over G80 in situations that matter (i.e. 1920 x 1200 with 4/16)?

mhouston
01-Mar-2007, 18:15
Bugs everywhere are saying "sure, blame it all on us". They believe that it is Mike *Houston* and Jen-Hsun Huang problem.

It's always the bugs' fault. I think we should eradicate all bugs. I mean, honestly, what are they good for?

(Moving quickly away from an R600 thread discussion, but taking the points head on)

FAH hasn't moved to CTM yet, but since FAH is written in Brook, we can in theory retarget to anything we have a Brook backend for. (The Brook CTM backend is alpha, and will move to beta level once we get 2 bug fixes from AMD and I have time to clean up the code a little and add more sanity checking.) CTM is actually just an interface API, so that in theory shouldn't change between hardware. What will change is the assembly that hits the chip, which is generated by AMD's compiler (amucomp) that you can opt to use. So R5XX and R6XX may differ there, but we are going to compile from Brook->PS3/4->R5XX/R6XX anyway, so it's not an issue in that case.

CUDA is very much a superset of Brook. Besides a more robust compiler on the user side allowing for more flexible C/C++ expressions, CUDA adds the notion of shared memory which Brook has never dealt with (Sequoia is designed for complex memory hierarchies). Scatter is in the full Brook language spec, but is not in the GPU spec currently as it's not available in a neutral API (I really wanted it in DX10). In theory, if you were willing to not use the PDC on G80 and live with a subset of CUDA functionality, someone could write a source-to-source translator from Brook to CUDA. My preference would be to have access to what CUDA actually uses in the driver, but I can understand the reluctance to do so.

Anarchist4000
01-Mar-2007, 18:22
CTM is platform specific (in fact there's prolly one CTM for R5xx and another for R6xx) and CTM is newer than Brook and F@H on GPU.

From what I've seen that isn't the case. You can make programs using D3D Assembly or HLSL and they get turned into card-specific assembly at runtime. When a device is initialized it pulls an ID for it. When the command buffers are created they are compiled to the desired target. It'd be no different than writing a shader capable of running on hardware from many IHVs.

I'm sure support for R600 is only a matter of AMD adding support to their compiler. After all D3D Assembly is something all cards are capable of running and compiled by the driver at runtime. From the CTM programs I've written I haven't seen any reason that would prevent the code from running on R600. You could write all the functions in HLSL if you really wanted which should easily be portable to any platform.

CJ
01-Mar-2007, 18:30
Well it's kinda official now. AMD today told several people that they will launch a complete DX10 lineup in the middle of Q2. The middle of Q2 is probably May (at least it is in my book). And it looks like they'll have availability at launch.

Anarchist4000
01-Mar-2007, 18:49
Some additional R600 "prototype" pix.
http://content.zdnet.com/2346-10741_22-57089-1.html

Jawed
01-Mar-2007, 18:51
From what I've seen that isn't the case. You can make programs using D3D Assembly or HLSL and they get turned into card-specific assembly at runtime.
Oh, I thought CTM has no runtime component. I thought it was a purely compiled environment.

When a device is initialized it pulls an ID for it. When the command buffers are created they are compiled to the desired target. It'd be no different than writing a shader capable of running on hardware from many IHVs.
That's more flexible than I realised ... too long since I read the CTM documentation :oops:

Though I guess you still run the risk that CTM written originally for R5xx doesn't make the most effective use of R6xx or newer GPUs simply because of architecture.

Jawed

Geo
01-Mar-2007, 18:53
Witness the sea change in nVidia's direction after the launch of R300, for instance. R300 went from being "the wrong direction for 3d gaming" according to nVidia press releases and interviews made in the year after the R300 launch, to being the essential platform blueprint for the design of nVidia's nV40.

While "essential platform blueprint" is a bit over the top, I think, it is true that a leading web site <kaff> said this at the launch of NV40:

Jen-Hsun’s comments at the start of the recent NVIDIA Editors Day suggested that they had taken onboard the principles that ATI set forth with R300 of going for a very parallel architecture. As you look beyond the wide pipelined nature of NV40 and begin to look a little more at the pixel shader composition and the various quality options available you begin to see that this is not all NVIDIA have adopted. NV40 is not a particularly revolutionary architecture, but very evolutionary from many of the principles ATI delivered on some 20 months ago, combined with some of the better elements of NVIDIA’s previous architectures.

But that was then, and it might be time to let go of R300 vs NV30, for both sides. I mean, really. . .4.5 years isn't just a long time in the graphics world, it's ancient. And while we shouldn't forget history, neither should we overly fixate on one point over others.

INKster
01-Mar-2007, 18:54
And still no sign of retail reference card pics... :???:
Is an OEM version all they have to demonstrate at this point?

Geo
01-Mar-2007, 18:58
And still no sign of retail reference card pics... :???:
Is an OEM version all they have to demonstrate at this point?

Well, there was that one partial at VR-Zone, but yeah, I thought it curious they chose to show the OEM version yesterday.

Anarchist4000
01-Mar-2007, 19:11
Oh, I thought CTM has no runtime component. I thought it was a purely compiled environment.

That's true insofar as it's only an interface to the card itself. There are other utilities, such as the AMUComp that mhouston mentioned, that can build those compiled binaries for you. I don't think any of the documentation or libraries are public, though they do exist. But just like any shader, you can write one that performs better on one card versus another. And since you're using either fxc or amucomp to compile programs, a lot of optimizations are done for you. You still have to start with something that can be optimized, however.

IbaneZ
01-Mar-2007, 19:55
So it's late june now?

Crap.

Tim Murray
01-Mar-2007, 19:58
So it's late june now?

Crap.
no, it's "sometime in Q2." Q2 ends in June, so it will be out by the end of June, period

theystolemyname
01-Mar-2007, 20:10
http://content.zdnet.com/2346-10741_22-57089.html

http://i.zdnet.com/gallery/57090-525-204.jpg

IbaneZ
01-Mar-2007, 20:15
no, it's "sometime in Q2." Q2 ends in June, so it will be out by the end of June, period

Well, who gives a shite. Too late and too boring.

Short summer up here man, I'd rather spend it on the beach. ;)

Natoma
01-Mar-2007, 20:20
http://content.zdnet.com/2346-10741_22-57089.html

http://i.zdnet.com/gallery/57090-525-204.jpg

Already linked several times.

CJ
01-Mar-2007, 20:36
no, it's "sometime in Q2." Q2 ends in June, so it will be out by the end of June, period

Actually "sometime in Q2" officially changed to "middle of Q2" today. But that can mean anything, like having limited availability at launch and good availability in June or good availability from day 1.

Geo
01-Mar-2007, 20:38
Well, as I said, we're still sticking to what we said on the front page last week. And it sounds like others are coming around. :smile:

vertex_shader
01-Mar-2007, 20:50
Actually "sometime in Q2" officially changed to "middle of Q2" today. But that can mean anything, like having limited availability at launch and good availability in June or good availability from day 1.

So the reason given for the delay was a lie if no hard launch is coming :roll: . It will be a huge disappointment for me if, after all these delays, AMD can't make a hard launch with the R600 XTX and with midrange cards.

vertex_shader
01-Mar-2007, 20:54
Well it's kinda official now. AMD today told several people that they will launch a complete DX10 lineup in the middle of Q2. The middle of Q2 is probably May (at least it is in my book). And it looks like they'll have availability at launch.

May 15 12pm is the middle of Q2, and it's a Tuesday, ATi's favorite day for product launches :smile:
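
The calendar claim is easy to check. A small sketch, assuming Q2 means April 1 through June 30, 2007 (calendar quarters, not any fiscal calendar AMD might use):

```python
from datetime import datetime

# Q2 2007 as a half-open interval: [April 1 00:00, July 1 00:00)
q2_start = datetime(2007, 4, 1)
q2_end = datetime(2007, 7, 1)

# Exact midpoint of the quarter (91 days long, so 45.5 days in).
midpoint = q2_start + (q2_end - q2_start) / 2

# weekday(): Monday == 0, Tuesday == 1, ...
may_15_is_tuesday = datetime(2007, 5, 15).weekday() == 1

print("Q2 midpoint:", midpoint.isoformat())
print("May 15, 2007 is a Tuesday:", may_15_is_tuesday)
```

Under that assumption the exact midpoint works out to noon on May 16, so May 15 is within a day of the middle of the quarter, and it does fall on a Tuesday.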

3dilettante
01-Mar-2007, 21:01
If May 12 were the date, it would be a pretty good bet that it would not be an R600+Barcelona platform launch.

Barcelona's not expected until June from what I've read, and if AMD's insisting on a hard launch for R600, it would be kind of iffy if they went soft on Barcelona.

hoom
01-Mar-2007, 21:15
One thing I really hope to get from the launch of R600 is a good description of the evolution of the config from R400 -> R500 -> Xenon -> R600

Maybe they are waiting on Duke Nukem Forever for a R600 + Barcelona + DNF launch? :arrow: Exit stage right

trinibwoy
01-Mar-2007, 21:46
As to what you mean by "It seems like ATI just did it to lose" I'm afraid I haven't got the foggiest...;) Lose what, exactly? I mean, it's a little hard to "lose" before you get started, I think. Now, if after the company launches it looks like what they've launched isn't competitive with what the cookie-cutter crowd is doing right now, then we can revisit this issue and I'll probably agree with you. But at the moment it isn't clear to me that AMD/Ati has lost anything--yet. We'll know who is "winning" and "who is losing" in a few weeks, imo, with respect to R600.

Heh, Walt only you could twist a delayed product announcement by ATi into an innovative breath of fresh air. So shipping and selling products on time is now "cookie-cutter"? You're the breath of fresh air man - just made my day :lol:

This isn't a game where you fight only when you know you can win - how is R600 launching six months after G80 going to be any kind of victory? But alas, R600 could launch in 2009 and you'll still think it's the best strategic move ever!

trinibwoy
01-Mar-2007, 22:01
So are we sold on AMD delaying R600 intentionally just so it can launch with Barcelona? I have a really hard time grasping the strategic and economic advantages of such a move - can someone smarter than me help me out here?

I could guess that they're banking on stalling the market somewhat since they know a lot of people are waiting for R600. Now it would only make sense to do that if they were banking on these same people picking up a CPU while they're at it. Yet we have no word on the desktop variants so that plan sounds doubtful.

I would really like seeing some points on this that don't distill down to "it will be cool to launch all the cards together".

Geo
01-Mar-2007, 22:06
Well, Richard claimed to have actual market feedback on the point:

Launching Direct X 10 chipsets piecemeal was not welcome among AMD's OEMs, and would not give the best economic value for the firm, despite the clamour for new kit from hardcore gamers and computing enthusiasts, he added.

Which arguably begins to look like a sign of AMD changing ATI practices to focus more towards mainstream/volume corporate goals.

Edit: Which is actually different than your Barcelona point, on which I'm not sold. I was just addressing the "intentional" part.

Arty
01-Mar-2007, 22:08
My understanding of the situation is that AMD would never have had decent availability if they had stuck to their original schedule of launching in March. However, delaying the launch by a little allows them to have better availability, plus they could also afford to soft launch their mainstream.

I still find it hard to picture Barcelona into the equation. :???:

Sound_Card
01-Mar-2007, 22:22
how is R600 launching six months after G80 going to be any kind of victory?

and that all depends on performance.:wink:

Unknown Soldier
01-Mar-2007, 22:39
"ATI R600 and the next field demonstration engines GDC"
http://66.249.93.104/translate_c?hl=en&langpair=zh%7Cen&u=http://news.mydrivers.com/1/78/78496.htm

Kewl... looks like Gears of War. Hope it plays well and that there's inter-operability between PC, XBTS and PS3 for deathmatches. :D

US

Sound_Card
01-Mar-2007, 22:42
Uh, I think Barcelona is actually closer than we think....

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2939&p=2


http://images.anandtech.com/reviews/cpu/amd/barcelona/die.jpg

zealotonous
01-Mar-2007, 22:45
So are we sold on AMD delaying R600 intentionally just so it can launch with Barcelona? I have a really hard time grasping the strategic and economic advantages of such a move - can someone smarter than me help me out here?

I could guess that they're banking on stalling the market somewhat since they know a lot of people are waiting for R600. Now it would only make sense to do that if they were banking on these same people picking up a CPU while they're at it. Yet we have no word on the desktop variants so that plan sounds doubtful.

I would really like seeing some points on this that don't distill down to "it will be cool to launch all the cards together".

I agree that there is no strategic benefit to delaying this product to coincide with other product launches. Limited availability or mediocre performance, or a combination of both is my guess. Buying time allows one or both of those issues to be resolved. Now if R600 comes out and crushes G80, then I think it would be safe to say that availability was the issue.

Natoma
01-Mar-2007, 22:46
Well, Richard claimed to have actual market feedback on the point:



Which arguably begins to look like a sign of AMD changing ATI practices to focus more towards mainstream/volume corporate goals.

Edit: Which is actually different than your Barcelona point, on which I'm not sold. I was just addressing the "intentional" part.

I just want to point out for the record that when the news of the delay first came out, that was the first thing I mentioned. And some of you mods poo poo'd me into oblivion. :twisted:

* Natoma looks askance http://www.rage3d.com/board/images/smilies/bleh2.gif

Unknown Soldier
01-Mar-2007, 22:47
So AMD 'hold back' the R600 so that they can demonstrate Barcelona, and by the time the R600 is released, Barcelona will still not be on shelves?

Would be a stupid move, a damn stupid move imo. I think Barcelona will be released by the time R600 is, although not releasing R600 now is still a stupid move imo.

US

swaaye
01-Mar-2007, 22:56
Maybe a R600-based FireGL will launch too?

zealotonous
01-Mar-2007, 23:15
Uh, I think Barcelona is actually closer than we actually think....

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2939&p=2


http://images.anandtech.com/reviews/cpu/amd/barcelona/die.jpg

According to that AnandTech article, the Opteron part based on Barcelona will launch first (which is typical for AMD) around the middle of the year. The desktop chips including the FX will follow. Considering that the Opteron is their workstation or server based product, why would they delay an enthusiast part like the R600 to launch it alongside an Opteron? Wouldn't it make more sense for them to launch R600 alongside an FX based part?

Anyone know when the Barcelona based FX part will launch?

Arty
01-Mar-2007, 23:33
According to that AnandTech article, the Opteron part based on Barcelona will launch first (which is typical for AMD) around the middle of the year. The desktop chips including the FX will follow. Considering that the Opteron is their workstation or server based product, why would they delay an enthusiast part like the R600 to launch it alongside an Opteron? Wouldn't it make more sense for them to launch R600 alongside an FX based part?

Anyone know when the Barcelona based FX part will launch?
I'll quote Rys (http://www.beyond3d.com/forum/showpost.php?p=938869&postcount=100) from page 4 of this very thread:

Well, the likely pairing is Agena + R600, Agena being the Athlon FX variant rather than Barcelona/Opteron. At this point I'm inclined to believe that's a decent possibility, given that I'm sure the FX product team are looking for Agena to be publicly available closely behind Barcelona and that the common thinking is that Barcelona is probably closer than Q3 (and close enough for me to touch recently :razz: ). So I wouldn't be surprised at this point if limited Agena samples were created just for the purposes of high-end R600 eval by press.

flippin_waffles
02-Mar-2007, 00:10
http://www.syndrome-oc.net/articles.php?article=94

About 3/4 to 7/8 way through. By Philip Eisler: "I think you'll have to come to Amsterdam to find out". I thought it was cancelled?

[edit]
Spelled Amsterdam wrong.

Geo
02-Mar-2007, 00:17
Amsterdam was cancelled? :shock: Where will Euros go for their hookers and drugs now? :cool:

Okay, so a specific date was scratched off. Does that mean when rescheduled it won't be in Amsterdam?

trinibwoy
02-Mar-2007, 00:42
Which arguably begins to look like a sign of AMD changing ATI practices to focus more towards mainstream/volume corporate goals.

That I can definitely buy into. At my firm there is a distinct divide between short-term and long-term greedy. If AMD is trying to develop more effective procedures for getting its products to market and making lives easier for their bread and butter OEM partners then more power to them.

However, this still doesn't explain the sudden change in plans. If you're committing to a new corporate strategy you don't do so by making tactical changes at the last minute. So that still leaves a question unanswered - what made AMD postpone editor's day?

flippin_waffles
02-Mar-2007, 00:49
Amsterdam was cancelled? :shock: Where will Euros go for their hookers and drugs now? :cool:

Okay, so a specific date was scratched off. Does that mean when rescheduled it won't be in Amsterdam?


No it does not. But I think you know exactly what I mean (in regards to Amsterdam being cancelled). Sorry if my articulation wasn't up to your standards.... :lol:

DemoCoder
02-Mar-2007, 01:01
This isn't a game where you fight only when you know you can win - how is R600 launching six months after G80 going to be any kind of victory? But alas, R600 could launch in 2009 and you'll still think it's the best strategic move ever!

I bet Walt thinks 3DRealms Duke Nukem Forever is the greatest business strategy ever devised. :)

trinibwoy
02-Mar-2007, 01:22
Heh I get the sense that geo knows the real deal and he's getting really frustrated with all us headless chickens pecking around in the dark. He seems to have a low tolerance for the whiners and bitchers as of late :lol:

Geo
02-Mar-2007, 01:32
Believe me, no one dislikes it more than the staff does when either IHV makes a screwup. Because we're left to deal with the fallout. :smile:

Kaotik
02-Mar-2007, 01:51
Well, there was that one partial at VR-Zone, but yeah, I thought it curious they chose to show the OEM version yesterday.

Maybe a far-fetched idea, but could it be as simple as "they've seen the OEM card already in a million pics, we can still keep the retail form secret if we use them on demo machines"?

bdotobdot2
02-Mar-2007, 01:53
My apologies if this was mentioned, tough to read the whole thread at work...

For reference, how many flops would a top end AMD X2 6000+ and two x1950XTXs generate?

Rangers
02-Mar-2007, 02:52
So are we sold on AMD delaying R600 intentionally just so it can launch with Barcelona? I have a really hard time grasping the strategic and economic advantages of such a move - can someone smarter than me help me out here?

I could guess that they're banking on stalling the market somewhat since they know a lot of people are waiting for R600. Now it would only make sense to do that if they were banking on these same people picking up a CPU while they're at it. Yet we have no word on the desktop variants so that plan sounds doubtful.

I would really like seeing some points on this that don't distill down to "it will be cool to launch all the cards together".


I can't tell you my real feelings on this or I'll get in trouble.

Let's just say the sooner Intel gets into graphics the better, because ATI is not interested in competing, to the point of losing on purpose, imo.

I know that sounds stupid, but it's also very obvious. R600 was ready, and they chose to delay it for no reason at all, essentially. What other explanation is there?

I'm sick of this company. They might as well let Nvidia buy them already, it's what they've wanted all along.

It's going to suck having only one graphics IHV, but oh well. ATI has proved what they want, failure.

Your questions Trinibwoy are obvious ones. Why the hell did they do this. A bug, hardware problems? Those would make sense but they're clearly not the case. So it boggles the mind that they would delay, when they are already so late, it truly does.

There only seem to be two flimsy possible reasons. To launch with Barcelona, or to launch a full lineup at once. Both reasons are eminently stupid and ridiculous, and a true sign of an anti-competitive company. It's not even imaginable that Nvidia would do this, for example.

turtle
02-Mar-2007, 03:45
You're forgetting the most probable reason, the possibility of a hard vs. soft launch, and the time needed to create the chips for a hard launch exceeding the previous launch window. It may also allow them to do these other things at the same time by doing so; family launch, K10, better drivers, physics API/CTM apps beyond F@h, cheaper ram prices for cheaper production and perhaps cheaper prices...Who knows what else could happen in that time frame compared to launching in March.

I don't think it's a bad idea when MANY of their high-profile launches over the last several years have been soft, and they BADLY need to reverse the PR on their launches. Nvidia has done it, with the exception of the 7800GTX512, and it's done wonders for them. I think AMD is trying to do the same for ATi.

Cause (creating supply for hard launch) = effect (possible family launch...etc), rather than effect (late launch) = cause (waiting for Barcelona).

INKster
02-Mar-2007, 04:00
You're forgetting the most probable reason, the possibility of a hard vs. soft launch, and the time needed to create the chips for a hard launch exceeding the previous launch window. It may also allow them to do these other things at the same time by doing so; family launch, Barcelona, better drivers, physics API/CTM apps beyond F@h, cheaper ram prices for cheaper production and perhaps cheaper prices...Who knows what else could happen in that time frame compared to launching in March.

I don't think it's a bad idea when MANY of their high-profile launches over the last several years have been soft, and they BADLY need to reverse the PR on their launches. Nvidia has done it, with the exception of the 7800GTX512, and it's done wonders for them. I think AMD is trying to do the same for ATi.

Cause (creating supply for hard launch) = effect (possible family launch...etc), rather than effect (late launch) = cause (waiting for Barcelona).

And if they do that, what will happen over the next several months until 2008 ?
No launches ? No press ?

Will they leave G80 refresh to Nvidia's marketing alone ?
Will they leave Core 2's 45nm refresh to Intel ?

Too many answers are yet to be found in this line of thinking.


IMHO, the reasons for the delay were simple: high speed GDDR4 price and availability.

overclocked_enthusiasm
02-Mar-2007, 04:07
Think about this from a purely business standpoint from AMD. CPU sales account for about $1.35 billion per quarter. ATI sales are about $500 million per quarter total. Say about 60-70% of that is pure GPU sales so let's call it $300 million. Therefore, the CPU business drives 4x the amount of revenue that the GPU business does and brings in much fatter margins.

AMD need to do several things right now to regain traction against Intel:

1. Demonstrate they have the technology lead again.
2. Drive sales of their chipsets, CPUs and GPUs concurrently with a "total platform" approach.
3. Improve corporate gross margins at all costs to improve cash flow.
4. Regain that "top of mind awareness" that comes from having the technology lead and the hot product(s).

AMD needs to regain traction in servers, high end desktop and mobile to keep their company liquid. If Barcelona isn't it, AMD is screwed...totally screwed. So what do they do? They use EVERY tool they have to show Barcelona in the BEST possible light to get the BUZZ going again about AMD products. Launching R600 by itself would only prove that ATI produced a great chip...not what AMD has done to counter Intel.

AMD needs Barcelona to be another Opteron or they are in serious financial trouble...they know it and are betting the farm on Barcelona and using R600 to help the buzz. They MUST stimulate server and high end desktop again or their cashflow will remain negative with their product mix skewed to the low end SKUs.

By showing a Barcelona/R600 launch you get MAJOR buzz. Likely the fastest CPU/GPU combination on the planet. So instead of getting headlines that say "R600 fastest video card" you get "AMD powered computer sets new records". People can say in one breath...AMD has the fastest hardware on the planet. CPU, GPU and maybe chipset as well. AMD wants to sell CPUs first, followed by GPUs followed by chipsets to improve their gross margins.

In regards to OEMs, having multiple products available in quantity will help them with design wins and help them move product and maybe grab the coveted high end spots like XPS at Dell. Right now they are painted into the low margin low cost corner and it is crushing their margins.

So I can see why AMD is delaying R600 IF and ONLY if Barcelona desktop chips are not far behind. Hard launch R600, hard launch Barcelona servers and paper launch Barcelona desktop parts all on the same day. Otherwise what is the point and there must be a yield/technical issue or supplier availability issue that was delaying the finished product.

What I CANNOT explain is why it appears this was a last second decision? Maybe Q1 unit sales were in the toilet, Q2 was looking the same and they simply HAD to do this or risk major carnage. Or maybe someone had an epiphany???

EDIT* Typos

overclocked_enthusiasm
02-Mar-2007, 04:23
That Barcelona/R600 demo was no coincidence and if you read Henri's comments the past week or so I think they can see where they are going.

turtle
02-Mar-2007, 04:53
509Gflops?

http://hardocp.com/images/news/1172804747QRR8ewfqvI_1_1_l.jpg

What looks like retail R600's using a 4x4 chipset/Agena FX CPUs they used to demo the teraflop:

http://hardocp.com/images/news/1172804747QRR8ewfqvI_1_2_l.jpg


I'm still betting on 512GFLOPs, and them being clocked at 800MHz, but a few flops are needed for the display etc and are unavailable to Brook (F@H)
I guess it could also be a dual opteron board that supports crossfire...Maybe. I like the first option better. :)

http://hardocp.com/news.html?news=MjQ0MTcsLCxobmV3cywsLDE=

Cuthalu
02-Mar-2007, 05:04
Those look just as long as the OEM-version with the same fan placement. :-?

Scary powerful PSU as well.

Sobek
02-Mar-2007, 05:04
No internal/external CF bridge?

Oh and by the way, drool. :smile:

*edit* Thank god I have dual PSU's and a Stacker. I was getting worried for a second there :razz:

Arty
02-Mar-2007, 05:04
What I CANNOT explain is why it appears this was a last second decision? Maybe Q1 unit sales were in the toilet, Q2 was looking the same and they simply HAD to do this or risk major carnage. Or maybe someone had an epiphany???
How smooth was ATI's integration into AMD?

Those look just as long as the OEM-version with the same fan placement. :-?

Scary powerful PSU as well.
Because those are OEM cards.

As for the PSU, dual-cpu, dual-gpu system and it's obvious. :P

INKster
02-Mar-2007, 05:08
Did i read the article correctly ?
300W power consumption for a single GPU ??? 600W for a Crossfire Setup ??????
Jesus Christ ! (and i'm not even a religious man :D)


P.S.:
AMD's cable management skills suck.

Razor1
02-Mar-2007, 05:10
hmm anti christ? :lol: room will sure get hot enough!

Where the hell does 509 flops come from, what config lol

Natoma
02-Mar-2007, 05:10
509Gflops?

If Crossfire is 100% efficient in doubling R600, then yes. But that just doesn't happen. So I'm wagering it's actually much higher than 509 Gflops for an individual card.

Cuthalu
02-Mar-2007, 05:19
How could they possibly screw up thermal management so bad that with about the same amount of transistors as nVidia they can achieve double nVidia's power consumption? This is a false rumour, or ATi needs to get better engineers...

turtle
02-Mar-2007, 05:21
Where the hell does 509 flops come from, what config lol

Hey, it was a reasonable guess before. :razz:

It seems to me that test is only testing the R600's (surely I could be wrong)...So 1018/2 = 509 GFLOPS. If nothing else, it seems like they're all MADD...and darn close to the theoretical 100% if I'm right.

If Crossfire is 100% efficient in doubling R600, then yes. But that just doesn't happen. So I'm wagering it's actually much higher than 509 Gflops for an individual card.

Very good point, and you very well could be right. Does anyone know off-hand the difference in usable Flops in F@H with one x19xxxx vs 2x in crossfire?

Jawed
02-Mar-2007, 05:29
509Gflops?
At 2D, not 3D clocks?

Jawed

Sobek
02-Mar-2007, 05:30
Hmm, correct me if i'm wrong, but doesn't that giant cooler on the OEM card need 24w or so? If so, I think we can potentially say a single R600 would only need ~280w...sounds better than 300! :razz:

Jawed
02-Mar-2007, 05:32
If Crossfire is 100% efficient in doubling R600, then yes. But that just doesn't happen. So I'm wagering it's actually much higher than 509 Gflops for an individual card.
They'll be running as two independent cards. When you run the dual-GPU folding@home you have to turn off CrossFire.

Jawed

INKster
02-Mar-2007, 05:32
hmm anti christ? :lol: room will sure get hot enough!

Where the hell does 509 flops come from, what config lol

I'll tell you one thing.
No matter how powerful they are, i will not have another space-heater in my main rig if that 300W figure checks out.
I had a hard time swallowing 240W, but 300W is really pushing it over the edge.
That's basically double the power consumption of a loaded 8800 GTX, even though it's made on a more advanced process (80nm).

And if the performance doesn't completely "incinerate" the competition (no, 10 to 15% won't do in my book anymore, not with this much juice), ATI/AMD is in really big trouble.

Rangers
02-Mar-2007, 05:38
I'll tell you one thing.
No matter how powerful they are, i will not have another space-heater in my main rig if that 300W figure checks out.
I had a hard time swallowing 240W, but 300W is really pushing it over the edge.
That's basically double the power consumption of a loaded 8800 GTX, even though it's made on a more advanced process (80nm).

And if the performance doesn't completely "incinerate" the competition (no, 10 to 15% won't do in my book anymore, not with this much juice), ATI/AMD is in really big trouble.


Eh, if the performance was there, enthusiasts don't care about power consumption.

Otherwise SLI/Crossfire/QuadSLI wouldn't exist.

Believe me, if it's powerful enough there will be lines to get it. Go on Hardocp forums, that particular type of user, everybody has a monster SLI rig and switched to Core Duo the minute it came out. To those guys, a 1,000 watt PSU is a badge of honor.

INKster
02-Mar-2007, 05:41
Eh, if the performance was there, enthusiasts don't care about power consumption.

Otherwise SLI/Crossfire/QuadSLI wouldn't exist.

Believe me, if it's powerful enough there will be lines to get it. Go on Hardocp forums, that particular type of user, everybody has a monster SLI rig and switched to Core Duo the minute it came out. To those guys, a 1,000 watt PSU is a badge of honor.

I may be an enthusiast, but I'm not an insane one.

Arty
02-Mar-2007, 05:44
Eh, if the performance was there, enthusiasts don't care about power consumption.

Otherwise SLI/Crossfire/QuadSLI wouldn't exist.

Believe me, if it's powerful enough there will be lines to get it. Go on Hardocp forums, that particular type of user, everybody has a monster SLI rig and switched to Core Duo the minute it came out. To those guys, a 1,000 watt PSU is a badge of honor.
Exactly.

All my reasoning with enthusiasts has always fallen on deaf ears. So far the only thing they care about is who is the top dog.

Natoma
02-Mar-2007, 05:44
Did i read the article correctly ?
300W power consumption for a single GPU ??? 600W for a Crossfire Setup ??????
Jesus Christ ! (and i'm not even a religious man :D)


P.S.:
AMD's cable management skills suck.

How could they possibly screw up thermal management so bad that with about the same amount of transistors as nVidia they can achieve double nVidia's power consumption? This is a false rumour, or ATi needs to get better engineers...

Let's see. EETimes reported that each R600 uses 200W while [H] is reporting that they use 300W.

Hmmm. Who to believe. :cool:

Rangers
02-Mar-2007, 05:44
On R600 performance, it seems overclocking 8800GTX shader units has little effect compared to overclocking the core (think it was Firing Squad that determined this). Because it's everything but the shaders that is the bottleneck on current software.

So I don't know, if AMD is hinting at 320 scalar units with the 320 Mul comments, and they have a core that is at 800MHz plus while the G80, even overclocked, is maybe 675 tops, they could be sitting pretty.

Of course that's all total speculation.

Natoma
02-Mar-2007, 05:51
They'll be running as two independent cards. When you run the dual-GPU folding@home you have to turn off CrossFire.

Jawed

Ah I wasn't aware of that. How exactly does it calculate the final #? One card after the other or concurrently?

Geo
02-Mar-2007, 06:00
So I'm flipping thru the April Maximum PC tonight, and there's an article about the increasing requirements of PSUs. And, lo and behold, a pic of an engineering sample of an 8800GTX (so presumably from last year) with both a 6-pin and 8-pin power connector. Apparently the 8-pin was a late scratch from the production model. I thought, "Man, as many posts as we wasted freaking out over that on R600. . . "

The magazine says:

Power supply vendors tell me the design was yanked at the last minute out of fear that a tyro PC builder would jam the existing eight-pin plug that should go into the motherboard into the GPU. It's tough to do, but with enough force it's actually possible, and the result would be disastrous.

INKster
02-Mar-2007, 06:02
So I'm flipping thru the April Maximum PC tonight, and there's an article about the increasing requirements of PSUs. And, lo and behold, a pic of an engineering sample of an 8800GTX (so presumably from last year) with both a 6-pin and 8-pin power connector. Apparently the 8-pin was a late scratch from the production model. I thought, "Man, as many posts as we wasted freaking out over that on R600. . . "

The magazine says:

But Geo, if someone inserted the 8pin EATX plug on a R600, wouldn't that basically stop the CPU from getting any power and therefore prevent the whole system from even booting up ?

Farhan
02-Mar-2007, 06:07
But Geo, if someone inserted the 8pin EATX plug on a R600, wouldn't that basically stop the CPU from getting any power and therefore prevent the whole system from even booting up ?

When your $600 graphics card dramatically bursts into flames when you try to switch it on, I'm sure the system won't boot up.

Geo
02-Mar-2007, 06:09
But Geo, if someone inserted the 8pin EATX plug on a R600, wouldn't that basically stop the CPU from getting any power and therefore prevent the whole system from even booting up ?

Dunno. Maybe not before it fried the GPU. Just telling you what they said. . .

INKster
02-Mar-2007, 06:11
When your $600 graphics card dramatically bursts into flames when you try to switch it on, I'm sure the system won't boot up.

Which one gets power first. The CPU or the GPU ? ;)
And if there is no CPU, the system won't boot (i think)...

But anyway, the people who buy these things are less likely to make that kind of mistake, if only because there are so few of them. :razz:

Jawed
02-Mar-2007, 06:12
Ah I wasn't aware of that. How exactly does it calculate the final #? One card after the other or concurrently?
Normally there's more than one final number in data parallel applications! For folding, I guess the result is a set of positions and motion vectors for the atoms in the molecule.

Additionally, folding@home uses the two GPUs for two distinct work units. Work is not shared between the two GPUs.

You could design an application to spread itself over two or more GPUs. So, both cards will be computing equal shares of the application. The best performance comes when data does not need to be generated by one GPU and read by the other - or the sharing is minimal.

Jawed

Cuthalu
02-Mar-2007, 06:18
Kyle Bennett seems to be absolutely sure about that 300w:
http://www.hardforum.com/showpost.php?p=1030710530&postcount=35
http://www.hardforum.com/showpost.php?p=1030710546&postcount=39

Natoma
02-Mar-2007, 06:21
Normally there's more than one final number in data parallel applications! For folding, I guess the result is a set of positions and motion vectors for the atoms in the molecule.

Additionally, folding@home uses the two GPUs for two distinct work units. Work is not shared between the two GPUs.

You could design an application to spread itself over two or more GPUs. So, both cards will be computing equal shares of the application. The best performance comes when data does not need to be generated by one GPU and read by the other - or the sharing is minimal.

Jawed

Err, I was actually asking about the MAD test :oops:

Natoma
02-Mar-2007, 06:24
Kyle Bennett seems to be absolutely sure about that 300w:
http://www.hardforum.com/showpost.php?p=1030710530&postcount=35
http://www.hardforum.com/showpost.php?p=1030710546&postcount=39

Not to bring up old news, but wasn't he "absolutely sure" that NV30 would stomp all over R300 too?

Just sayin, EETimes and InformationWeek reported 200W. Not saying that those publications can't be incorrect, but I'd take their reporting as gospel LONG before I take Kyle's reporting as gospel.

turtle
02-Mar-2007, 06:24
I still think it requires 300w of connectors...but not 300w. IOW >225W and < 300W.

There is no way it's going to use every ounce of electricity from those connections, hence, 300w is BS.

Still, greater than 225W is a lot of juice, don't get me wrong, and having 2x8pin and 2x6pin connectors on a psu (or some mangled combination of connectors/adapters adding up to 450w of connections from the PSU not counting mobo power) is a tough pill to swallow.

None-the-less, we've been expecting this for some time AFAIK.

Arty
02-Mar-2007, 06:25
They also suggest that the final R600 board will be 13" long, which doesn't jibe either.

turtle
02-Mar-2007, 06:28
It doesn't...Yet those boards in the pic don't look like the OEM parts which are black.

*Hits forehead*

The Stream Processor version of R600 probably does look like that even for retail.

Who the hell knows...At least we've got a better idea of what's going on now. :razz:

DemoCoder
02-Mar-2007, 07:40
Err, I was actually asking about the MAD test :oops:

I think it's pretty clear that they are assigning two independent kernels to two different cards and that the MAD test is the absolute optimal workload, so I doubt crossfire has anything to do with performance. ALU tests are generally one big uber shader that writes out junk data at the end (or throws it away if you can stop the compiler from dead-code elimination). There is no sharing, no texturing, and framebuffer bandwidth is irrelevant. No z-writes either. No AA. No AF. It's just an ALU test, period.

So IMHO, the combined 1018 GFlop figure probably implies 509GFlop peak on a single card, ergo, it's probably 512GFlop theoretical.
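For anyone who wants to poke at that arithmetic, here is a rough sketch. The 1018 GFLOPS combined number is the one from the demo pic discussed above, the 512 GFLOPS peak is only this thread's guess, and the even split between the two cards is an assumption. The function names are made up for illustration.

```python
# Figures as discussed in the thread; none of these are confirmed specs.
COMBINED_GFLOPS = 1018.0      # reported for the two-card demo
CARDS = 2                     # assumed to contribute equally
THEORETICAL_GFLOPS = 512.0    # speculated per-card peak


def per_card(combined, cards):
    """Split a combined throughput figure evenly across the cards."""
    return combined / cards


def efficiency(measured, theoretical):
    """Fraction of the speculated peak that the measured figure reaches."""
    return measured / theoretical


measured = per_card(COMBINED_GFLOPS, CARDS)
print(f"per card: {measured:.0f} GFLOPS")
print(f"fraction of speculated peak: {efficiency(measured, THEORETICAL_GFLOPS):.1%}")
```

If the real peak turns out to be something other than 512, only THEORETICAL_GFLOPS needs changing.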

overclocked_enthusiasm
02-Mar-2007, 08:06
http://www.extremetech.com/article2/0,1697,2099613,00.asp
"Wednesday's meeting was also used to clear up what Richard characterized as a swath of misconceptions and rumors concerning the delayed R600, or what is to be AMD/ATI's first DirectX 10-capable graphics card.

"We pushed out the launch of the R600 and people thought it must be a silicon or software problem…it's got to be a bug," said Dave Orton, president and chief executive of ATI. "In fact, our mainstream chips are in 65nm and are coming out extremely fast. Because of that configuration, we have an interesting opportunity to come to market with a broader range of products," he explained.

"Instead of having them separate, we thought, let's line that up, so we delayed for several weeks," Orton continued, referring to the R600 family as a whole, which AMD now says will come out at the same time (a matter of weeks as opposed to months, according to Richard) instead of just the high-end version."


Not a whisper about Barcelona...

Anarchist4000
02-Mar-2007, 09:28
They probably got left out of something and are going on an evil tirade because of it. From an engineering standpoint their comments seem to hold little weight. I've yet to see a picture that actually has 3 power connectors on it and even then pulling 100% of the potential power is unlikely. Besides with the 200W reports from the event I'm assuming someone from AMD quoted that as the power draw.

It's possible one of the true SPs might get up there with 2GB of ram and the fan that sucks in small children but I think it would be rather obvious what you're looking at when you see those specs. Not to mention probably hear it in the next room.

CarstenS
02-Mar-2007, 09:35
NV now acknowledges it's a driver bug and have filed an issue on it. So that's progress. What kind of priority it is getting, however. . . . I haven't heard anyone from NV step up and say. We pinged them on it last week, but they haven't responded.

Thanks all for the clarification on that matter! :)

pakotlar
02-Mar-2007, 10:37
Kyle Bennett seems to be absolutely sure about that 300w:
http://www.hardforum.com/showpost.php?p=1030710530&postcount=35
http://www.hardforum.com/showpost.php?p=1030710546&postcount=39

Being wrong doesn't take confidence.
edit: That was meant to imply that just because Kyle sometimes shoots streamers out of his ass doesn't mean he's a firework.

Reputator
02-Mar-2007, 11:00
MADD=2, ADD=1

Assuming they're still using the MADD+ADD setup they've been using before.

64*5*3*0.8 = 768.0GFLOPs R600

That's incorrect. That's like saying there's a MADD and ADD for every component of each shader. There is actually one MADD in four components (8), plus 1 scalar ADD. If you used your math, the Xenos would have 360 GFLOPs peak. In reality it has 216.

So 64*9*0.8=512 like DemoCoder said.

nicolasb
02-Mar-2007, 11:00
As far as the 300W figure is concerned, that's calculated by adding together the maximum amount of power one can draw through all of the sources available to the card: slot, 6-pin connector, 8-pin connector. The fact that the card has this configuration suggests that it will, occasionally, use more power than a pair of 6-pin connectors plus the slot could provide - in other words that the peak power usage will be over 225W. However, if the peak usage occasionally hits 225W, the typical power usage will be lower.

There's also a strong rumour that the card will function quite happily with a pair of 6-pin connectors in all respects other than that it will disable overclocking in the driver settings. If that's correct then it means the card will achieve default clock-speeds and never exceed 225W even at peak consumption, but that overclocking may push the peak power above 225W.

All in all, a typical power output of 200W fits quite well with that, I think.
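The connector sums behind the 300W and 225W figures can be sketched like this. The per-source limits (75W from the slot, 75W per 6-pin, 150W per 8-pin) are the PCI Express figures; the function name and layout are just for illustration, and the result is a ceiling on what the connectors can deliver, not a measurement of what the card draws.

```python
# Per-source limits from the PCI Express power specifications.
SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150


def connector_budget(six_pins, eight_pins, slot=True):
    """Maximum power the connectors can deliver, not what the card draws."""
    total = SLOT_W if slot else 0
    total += six_pins * SIX_PIN_W
    total += eight_pins * EIGHT_PIN_W
    return total


print(connector_budget(1, 1))  # slot + 6-pin + 8-pin
print(connector_budget(2, 0))  # slot + two 6-pin
```

So a card specified above the two 6-pin ceiling needs the 8-pin, but that says nothing about how close to the higher ceiling it actually runs.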

As far as the delay is concerned, ATI/AMD's official line is that R600 was delayed so that it could be launched alongside the mid-range equivalents. I've read nothing from AMD or ATI that suggests a launch with Barcelona - it's simply about launching all R600 variants at the same time so as not to annoy their OEM partners. At least, that's the official story.

If this is true then ATI has insulted its enthusiast customers by holding back an enthusiast-level product when there was no technical need to. If they're lying - well, that's bad too. Either way, ATI has nothing to be proud of, here.

As far as the 1 Teraflops figure is concerned, this is clearly a marketing number. It is therefore possible (IMO) that it includes not just the GPUs but also the CPUs in the box they were demonstrating.

Ailuros
02-Mar-2007, 11:26
That's incorrect. That's like saying there's a MADD and ADD for every component of each shader. There is actually one MADD in four components (8), plus 1 scalar ADD. If you used your math, the Xenos would have 360 GFLOPs peak. In reality it has 216.

So 64*9*0.8=512 like DemoCoder said.

Minor nitpick: 64*9*0.8 = 461 and NOT 512.

It's either more FLOPs per ALU or the frequency is a lot higher than 800MHz.
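
The peak-FLOPs arithmetic being argued over here is just ALUs x flops per clock x clock speed. A minimal sketch follows, assuming Xenos has 48 ALUs at 500 MHz (public figures, not stated in the posts) and taking the rumoured 64 ALUs at 800 MHz for R600; the helper name is hypothetical.

```python
def peak_gflops(alus: int, flops_per_clock: int, clock_ghz: float) -> float:
    """Theoretical peak in GFLOPs: ALUs x flops per ALU per clock x clock in GHz."""
    return alus * flops_per_clock * clock_ghz


# vec4 MADD (4 lanes x 2 flops) plus one scalar ADD (1 flop) = 9 flops per clock.
VEC4_MADD_PLUS_SCALAR_ADD = 4 * 2 + 1

# A MADD and an ADD on each of 5 components = 5 x (2 + 1) = 15 flops per clock.
MADD_AND_ADD_PER_COMPONENT = 5 * (2 + 1)

# Xenos: 48 ALUs at 500 MHz is an assumption taken from public specs.
xenos = peak_gflops(48, VEC4_MADD_PLUS_SCALAR_ADD, 0.5)
xenos_overcount = peak_gflops(48, MADD_AND_ADD_PER_COMPONENT, 0.5)

# R600: 64 ALUs at 800 MHz are the rumoured figures discussed in this thread.
r600_nine_flops = peak_gflops(64, VEC4_MADD_PLUS_SCALAR_ADD, 0.8)

print(f"Xenos, 9 flops/clock:  {xenos:.1f} GFLOPs")
print(f"Xenos, 15 flops/clock: {xenos_overcount:.1f} GFLOPs")
print(f"R600,  9 flops/clock:  {r600_nine_flops:.1f} GFLOPs")
```

With 9 flops per clock the rumoured R600 configuration lands at roughly 461 GFLOPs, which is why the 512 figure needs either more flops per ALU or a higher clock.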

CarstenS
02-Mar-2007, 11:54
As far as the 1 Teraflops figure is concerned, this is clearly a marketing number. It is therefore possible (IMO) that it includes not just the GPUs but also the CPUs in the box they were demonstrating.

I thought the same, but the pics clearly state that Brook is only running two clients on the two R600s.

Anteru
02-Mar-2007, 12:00
Minor nitpick: 64*9*0.8 = 461 and NOT 512.

It's either more FLOPs per ALU or the frequency is a lot higher than 800MHz.

888 MHz or 10 operations per cycle ... maybe a 5D-MADD? (if the 64 are right ...)
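
Working backwards from the 512 GFLOPs figure gives the two options mentioned here. A small sketch, assuming 64 ALUs as rumoured; the function names are invented for the illustration.

```python
TARGET_GFLOPS = 512.0
ALUS = 64  # rumoured ALU count, not confirmed


def required_clock_ghz(target: float, alus: int, flops_per_clock: float) -> float:
    """Clock needed to reach the target at a fixed number of flops per clock."""
    return target / (alus * flops_per_clock)


def required_flops_per_clock(target: float, alus: int, clock_ghz: float) -> float:
    """Flops per ALU per clock needed to reach the target at a fixed clock."""
    return target / (alus * clock_ghz)


# Option 1: keep 9 flops per clock (vec4 MADD + scalar ADD) and raise the clock.
clock = required_clock_ghz(TARGET_GFLOPS, ALUS, 9)

# Option 2: keep 800 MHz and raise the work done per clock.
flops = required_flops_per_clock(TARGET_GFLOPS, ALUS, 0.8)

print(f"9 flops/clock needs about {clock * 1000:.0f} MHz")
print(f"800 MHz needs about {flops:.1f} flops/clock")
```

Ten flops per clock is what five MADD lanes (5 x 2) would deliver, hence the 5D-MADD guess.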

Anarchist4000
02-Mar-2007, 12:05
That's incorrect. That's like saying there's a MADD and ADD for every component of each shader. There is actually one MADD in four components (8), plus 1 scalar ADD. If you used your math, the Xenos would have 360 GFLOPs peak. In reality it has 216.

I was saying there is a MADD and ADD for every component. I already changed my assumption that it's using a MADD+ADD like they have in the past and went to 5 pure MADDs. Besides, how do we know they aren't actually using a MADD+ADD configuration like the R5x0 series? If they're all MADDs aren't we just looking at a really big Xenos with almost no other changes besides the ADD->MADD? They quoted 95% efficiency with Xenos, no clue why they'd set out to actually try and improve that. R580 is already capable of a good deal of the DX10 requirements so they wouldn't need very many radical changes to make that concept work. I'm still not entirely convinced they aren't using a Vec4+1 MADD+ADD setup. If they ran code that didn't use the ADDs we'd see exactly what we are now.

64*5*MADD*.8=512GFLOPs

Also HardOCP said R600 REQUIRED 300W. That's like saying a computer with a 1KW supply will suck 1000W out of your wall socket.

vertex_shader
02-Mar-2007, 12:31
Kyle Bennett seems to be absolutely sure about that 300W:
http://www.hardforum.com/showpost.php?p=1030710530&postcount=35
http://www.hardforum.com/showpost.php?p=1030710546&postcount=39

BS :smile:

trinibwoy
02-Mar-2007, 12:52
I'm still not entirely convinced they aren't using a Vec4+1 MADD+ADD setup. If they ran code that didn't use the ADDs we'd see exactly what we are now.

64*5*MADD*.8=512GFLOPs


If it was vec4+1 MADD+ADD and they ran a MADD test you would get 64*4*MADD*.8 ~ 410 GFLOPs.

IIRC G80's measured MADD issue rate was near peak but it's only 345 GFLOPS worth. So I'm assuming that R600 is MADD capable on all 320 thing-a-magigs.

Anarchist4000
02-Mar-2007, 13:04
[MADD+ADD][MADD+ADD][MADD+ADD][MADD+ADD] + [MADD+ADD]

This is a setup where all of the ADDs essentially drop out because they aren't being used. So it would effectively look like 64*5*MADD on the benchmark, while in reality it would be 64*5*[MADD+ADD]. Doubt that's what they're doing but I'm not going to rule it out yet.

vertex_shader
02-Mar-2007, 13:11
What looks like retail R600's using a 4x4 chipset/Agena FX CPUs they used to demo the teraflop:

It's not the retail cards. The motherboard is not ATX style: it has 2 CPU sockets and is very long. Check the PSU, it is long too; it looks like an Enermax Galaxy http://www.enermax.com/english/product_Display1.asp?PrID=60

Mariner
02-Mar-2007, 13:11
If this is true then ATI has insulted its enthusiast customers by holding back an enthusiast-level product when there was no technical need to. If they're lying - well, that's bad too. Either way, ATI has nothing to be proud of, here.


I don't know, it seems to me that it would be more insulting to customers to do the usual 'soft' release where just a few thousand cards go on the market initially with most consumers having to wait a month or two to get their hands on one. NVidia has been in a strong position for their past couple of launches so they've been able to delay launch long enough for a proper 'hard' launch with plenty of cards on the market.

If R600 is a compelling product which is 'better' than G80 and equal to the expected G80 refresh and if the R600 family mainstream/low-end parts outperform their G80 equivalents, the delay might work out very well indeed. A couple of 'ifs' in there of course!

I must say that I've always been somewhat bemused by the way the real money-makers (i.e. mainstream and low-end chips) are released so long after the high-end parts. I can see why the IHVs do it that way but surely it is better to release the whole family around the same time?

trinibwoy
02-Mar-2007, 13:24
[MADD+ADD][MADD+ADD][MADD+ADD][MADD+ADD] + [MADD+ADD]

This is a setup where all of the ADDs essentially drop out because they aren't being used. So it would effectively look like 64*5*MADD on the benchmark, while in reality it would be 64*5*[MADD+ADD]. Doubt that's what they're doing but I'm not going to rule it out yet.

Yeah that certainly wouldn't be indicative of Xenos heritage. It would also put a single R600 at > 750 GFLOPs. Pretty unlikely IMO.
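
The three ALU layouts discussed over the last few posts can be compared side by side. This is only a sketch under the thread's rumoured numbers (64 ALUs at 800 MHz); the class and field names are hypothetical, and a MADD is counted as 2 flops and an ADD as 1.

```python
from dataclasses import dataclass

ALUS = 64        # rumoured ALU count
CLOCK_GHZ = 0.8  # rumoured clock


@dataclass(frozen=True)
class LaneConfig:
    name: str
    madd_lanes: int  # lanes that can issue a MADD (2 flops) each clock
    add_lanes: int   # additional ADD units (1 flop) each clock

    def madd_only_gflops(self) -> float:
        """Rate seen by a test that issues only MADDs, leaving the ADDs idle."""
        return ALUS * self.madd_lanes * 2 * CLOCK_GHZ

    def peak_gflops(self) -> float:
        """Rate if every MADD and every ADD unit is busy each clock."""
        return ALUS * (self.madd_lanes * 2 + self.add_lanes) * CLOCK_GHZ


CONFIGS = [
    LaneConfig("vec4 MADD + scalar ADD", madd_lanes=4, add_lanes=1),
    LaneConfig("5 x MADD", madd_lanes=5, add_lanes=0),
    LaneConfig("5 x [MADD+ADD]", madd_lanes=5, add_lanes=5),
]

for cfg in CONFIGS:
    print(f"{cfg.name:24s} MADD-only {cfg.madd_only_gflops():6.1f}  "
          f"peak {cfg.peak_gflops():6.1f}")
```

A MADD-only test cannot distinguish the second and third layouts, since both show 512 GFLOPs, which is the point of the disagreement; only the third would reach the 750+ GFLOPs peak.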

trinibwoy
02-Mar-2007, 13:25
If R600 is a compelling product which is 'better' than G80 and equal to the expected G80 refresh and if the R600 family mainstream/low-end parts outperform their G80 equivalents, the delay might work out very well indeed. A couple of 'ifs' in there of course!

Wouldn't it have been better to just beat up on G80?

nicolasb
02-Mar-2007, 13:30
I don't know, it seems to me that it would be more insulting to customers to do the usual 'soft' release where just a few thousand cards go on the market initially with most consumers having to wait a month or two to get their hands on one.

But that's not what AMD/ATI are saying. Other people have speculated that they are lying, that the real reason for the delay was a lack of availability, and that the business of a simultaneous launch with mid-range products is just a cover-story. However, what AMD/ATI have said is that the only reason for holding up the launch was to launch high-end and mid-range products simultaneously; according to them it was not an availability issue but a purely political decision.

Razor1
02-Mar-2007, 14:38
Well, it's political and financial if they don't have enough at launch. They don't want another X800 XT PE; that launch and the subsequent quantities were disastrous. At least this way they could get enough out for just the handful of people that really want the highest of the highest, I suppose.

vertex_shader
02-Mar-2007, 14:39
But that's not what AMD/ATI are saying. Other people have speculated that they are lying, that the real reason for the delay was a lack of availability, and that the business of a simultaneous launch with mid-range products is just a cover-story. However, what AMD/ATI have said is that the only reason for holding up the launch was to launch high-end and mid-range products simultaneously; according to them it was not an availability issue but a purely political decision.

Here is another reason why the explanation for the delay coming from AMD doesn't sound true:
"AMD R600 delay causes partners to blub with dismay"
http://www.theinq.com/default.aspx?article=37961

If these things from AMD continue in the future, we will see Sapphire GeForce cards :smile:

Sound_Card
02-Mar-2007, 14:41
Whatever the reason is, I just hope they have a good performing chip to stay in the fight.

vertex_shader
02-Mar-2007, 14:55
Whatever the reason is, I just hope they have a good performing chip to stay in the fight.

I think most of the users (not fanboys) will be disappointed; with the card coming so much later, they expect something like an average 60% performance increase in DX9 over the 8800GTX.

Someone needs to make a poll here at Beyond3D. I will be disappointed if the R600XTX is not on average 20% faster in DX9 than the 8800GTX, doesn't have any new kickass IQ feature, or doesn't come out with a hard launch. I don't care about anything else, like power usage, size, ... :smile:

I don't expect a beast anymore, so maybe AMD will surprise me :smile:

Geo
02-Mar-2007, 15:00
I think most of the users (not fanboys) will be disappointed; with the card coming so much later, they expect something like an average 60% performance increase in DX9 over the 8800GTX.



So we should be disappointed in a 8800GTX refresh that isn't at least average 60% performance increase over 8800GTX too? My goodness such standards. Can you please refresh my memory on which refreshes in graphics history have met that standard?

trinibwoy
02-Mar-2007, 15:03
:lol:

vertex_shader
02-Mar-2007, 15:05
So we should be disappointed in a 8800GTX refresh that isn't at least 60% performance increase too? My goodness such standards.

Most of the users live in dreamland, and expect things that will never happen.
What's your vote: when will you be disappointed with R600?

Can you please refresh my memory on which refreshes in graphics history have met that standard?

What I wrote about the 60% was sarcasm from me :wink:
I know it has never happened, and never will, but I read many forums and users expect things like this... maybe they are nv fanboys who just don't know it yet :wink:

Razor1
02-Mar-2007, 15:05
hehe, with all the hype, 20% would surely be a disappointment for a 6 month wait, but it's a faster card god DAMMIT! :grin: Wish we knew what type of ALUs the R600 has............

epicstruggle
02-Mar-2007, 15:08
Most of the users live in dreamland, and expect things that will never happen.
What's your vote: when will you be disappointed with R600?
I will only look at numbers at the higher end of benchmarks. I plan on using this card with high res, and high AF/AA. Those are the numbers we need to see a big increase in. (and IQ needs to be great).

vertex_shader
02-Mar-2007, 15:12
hehe, with all the hype, 20% would surely be a disappointment for a 6 month wait, but it's a faster card god DAMMIT! :grin: Wish we knew what type of ALUs the R600 has............

The real deal is the DX10 performance, and for this we will need to wait a long time (I don't mean stupid DX10 benchmarks and PR garbage like FPS Creator X10, I mean games like Crysis, Hellgate London, ...).

Mariner
02-Mar-2007, 15:20
But that's not what AMD/ATI are saying. Other people have speculated that they are lying, that the real reason for the delay was a lack of availability, and that the business of a simultaneous launch with mid-range products is just a cover-story. However, what AMD/ATI have said is that the only reason for holding up the launch was to launch high-end and mid-range products simultaneously; according to them it was not an availability issue but a purely political decision.

Well, time will tell, won't it? If a whole family of R600 products is released in good quantities with almost immediate availability in a couple of months we'll be able to see how things work out with this launch.

Here's what Dave Orton said on the matter (combined quotes taken from the Extremetech article):

"We pushed out the launch of the R600 and people thought it must be a silicon or software problem…it's got to be a bug. In fact, our mainstream chips are in 65nm and are coming out extremely fast. Because of that configuration, we have an interesting opportunity to come to market with a broader range of products. Instead of having them separate, we thought, let's line that up, so we delayed for several weeks."

Sounds pretty clear to me. For those who assume he's lying, surely we have to assume he's telling the truth here? My knowledge of legal matters isn't exactly detailed but as an officer of the company he's leaving them wide open to a future shareholder lawsuit if he's lying (or so I believe)?

If I remember correctly, in the past, it's never quite worked out that either of the two big IHVs has released both high-end and mainstream parts at the same time (ignoring the lower-volume cut-down cards such as the Radeon 9500). Now that the 80nm process is pretty stable and the 65nm process is ramping, AMD/ATI have the opportunity to release both at the same time. The nearest we've been (that I can remember) was back when R300 was launched and RV350 was shown at the same time but still didn't come to the market for a few months afterwards.

Of course, it should be noted that ATI only have this (possible) opportunity because R600 is so late in the first place! :wink:

Geo
02-Mar-2007, 15:22
Most of the users live in dreamland, and expect things that will never happen.
What's your vote: when will you be disappointed with R600?

If it doesn't win more than it loses at 8xaa/16xaf at launch and the next few months past that I think it'd be fair to call it a disappointment. How much of a disappointment would depend on just how badly it fails to meet that standard.

If best IQ doesn't improve vs X1k, I'd be disappointed. . .but this is unlikely to happen, as they've nearly certainly at a minimum bumped up from 6x to 8x AA. I still have hopes they've got something else in their pocket on the IQ front, but no evidence to support it.

If they're not more competitive in the midrange at launch than the previous two generations I'd be disappointed as well. Three generations in a row of having your hat handed to you in the fat part of the market is not a good thing in the least, "halo" or not. I'm pointing at the $159-$249 range here.

vertex_shader
02-Mar-2007, 16:00
If they're not more competitive in the midrange at launch than the previous two generations I'd be disappointed as well. Three generations in a row of having your hat handed to you in the fat part of the market is not a good thing in the least, "halo" or not. I'm pointing at the $159-$249 range here.

After I read what Dave Orton said, "In fact, our mainstream chips are in 65nm and are coming out extremely fast.", it looks like the chance is here now for AMD to catch up in the mainstream if the performance is good. No more X700/X1600 series please.

trinibwoy
02-Mar-2007, 16:05
Heh, it will feel kinda strange recommending a mid-range ATi part but at least the X1650 series is preparing us for that somewhat :)

3dilettante
02-Mar-2007, 16:12
Sounds pretty clear to me. For those who assume he's lying, surely we have to assume he's telling the truth here? My knowledge of legal matters isn't exactly detailed but as an officer of the company he's leaving them wide open to a future shareholder lawsuit if he's lying (or so I believe)?


Difficulties in manufacturing aren't bugs, I didn't see a quote about that.
Did he mention how they're swimming in swimming pools filled with full-speed perfect R600 cores?

I wonder if anyone asked him, "so in other words, are you saying that if you wanted to do a full launch of R600 earlier, you could have?"

You can decide to do a full product line launch. It doesn't mean you were able to choose to not do a full product line launch.

zealotonous
02-Mar-2007, 16:14
I'll quote Rys (http://www.beyond3d.com/forum/showpost.php?p=938869&postcount=100) from page 4 of this very thread:

Whooshed right over my head. Thanks!

Voltron
02-Mar-2007, 16:16
After I read what Dave Orton said, "In fact, our mainstream chips are in 65nm and are coming out extremely fast.", it looks like the chance is here now for AMD to catch up in the mainstream if the performance is good. No more X700/X1600 series please.

The whole thing still doesn't add up to me.

If you have a great R600 part and you have scheduled a release (not to mention you told Wall Street on your last conference call that it is a Q1 product and you are very desperate to have Wall Street like you because you might need to raise some cash), why not just launch it?

Then in another month, when your great 65 nm parts are ready to get out the door, why not do a separate launch, so you really hammer home how great your whole family is.

That is - unless you are trying to muddle things or sweep things under the rug. Simultaneous hard launch of R600 and softish launch of mainstream. Happened with the X1000 family on the R520 release, didn't it?

It is just very weird when AMD tells Wall St. it will be Q1, has something scheduled, then pulls it. Maybe all along they were just overly aggressive with a production ramp or something.

But maybe not, maybe it is going to be an incredible family of products that will blow the industry away.

Mariner
02-Mar-2007, 16:28
You can decide to do a full product line launch. It doesn't mean you were able to choose to not do a full product line launch.

I take your point. It all sounds a bit Machiavellian for me though!

IbaneZ
02-Mar-2007, 16:34
Here is another reason why the explanation for the delay coming from AMD doesn't sound true:
"AMD R600 delay causes partners to blub with dismay"
http://www.theinq.com/default.aspx?article=37961

If these things from AMD continue in the future, we will see Sapphire GeForce cards :smile:

Well, the X1950 Pro is selling pretty good. Don't know why they're whining. :wink:

epicstruggle
02-Mar-2007, 16:37
Just a quick question, wasn't R600 rumored to have native HDMI (DisplayPort?) ports? The leaked ZDNet pics show regular DVI ports. Can we scratch that rumor for good?

vertex_shader
02-Mar-2007, 16:53
Well, the X1950 Pro is selling pretty good. Don't know why they're whining. :wink:
Yes, but from a future and business perspective CeBIT is a very good PR opportunity to tell the world what they have in their pocket; a closed-door presentation is not really what AIBs want :wink:

Geo
02-Mar-2007, 17:02
Just a quick question, wasn't R600 rumored to have native HDMI (DisplayPort?) ports? The leaked ZDNet pics show regular DVI ports. Can we scratch that rumor for good?

I look at 690G and think we're going to see pretty nice HDMI capability in X2k series, particularly from an OEM perspective. Probably more important in the RV6x series than R6x itself tho. I'm beginning to think these large R600 "OEM models" we've been seeing are in fact aimed at gpgpu/stream processing, so a great deal of variety in video out capabilities would not be so important and thus left off the board, even tho the chip would support them. That's speculative, however.

Edit: Come to think of it, that would help explain the. . . err. . .robust. . . cooling solution too. If these 12" are aimed at that kind of professional market and are intended to run 24/7/365 crunching gpgpu workloads that'd be a big plus.

3dilettante
02-Mar-2007, 17:55
If they want to go into the market with a GPGPU for HPC, I'd hope they'd want to use some kind of ECC on their RAM, and possibly more robust error checking and RAS on the chips.

Come to think of it, pretty much any ECC and RAS on a R600 board would put it way ahead of its predecessors.

Bouncing Zabaglione Bros.
02-Mar-2007, 17:59
Edit: Come to think of it, that would help explain the. . . err. . .robust. . . cooling solution too. If these 12" are aimed at that kind of professional market and are intended to run 24/7/365 crunching gpgpu workloads that'd be a big plus.

Didn't one of the folding guys write some time last year that they had melted some R580s just because they had them running at full tilt and maximum utilisation for months on end? If AMD are going to make a professional stream processor and stick two of them in a box for this market, they'd better not have any meltdown scenarios, so this does kind of make sense.

R300King!
02-Mar-2007, 18:13
With that metal plate on the back of that OEM board (obviously no memory), how much RAM do you think they can put on the front of that board? Only 512MB? 1GB? 2GB?

I also saw that video from that video interview going around. The guy on the right talked about the R600 needing more power because its performance is greater(or something like that). :-) He also said they'll be available in crossfire(2 boards on the MB) at release and later this year with even 4 cards on the motherboard.

How much power draw would 4 cards consume? he didn't say R600s but he said 4 cards on the MB. Maybe he meant R630s or something. Still, that's a hellofalot of powah. :D

Oh, and that same guy also said the R600 would be around the same size as the X1950XTX. He said it will fit in a conventional case. It's got some good tidbits in it. ;)

Anarchist4000
02-Mar-2007, 18:29
Yeah, but are the cards going to be dedicated SPs or a video/SP solution? If all you wanted was a dedicated SP it'd be cheaper to pull all the unnecessary parts off of it. And didn't someone say somewhere that the long ones were intended for Apple? Probably some souped-up graphics workstation for making art or editing video.

As for the Quadfire I can't think of any PSUs out there that could actually power that given even conservative power estimates. I don't see 800W being out of line.

Farhan
02-Mar-2007, 18:43
If they want to go into the market with a GPGPU for HPC, I'd hope they'd want to use some kind of ECC on their RAM, and possibly more robust error checking and RAS on the chips.

Come to think of it, pretty much any ECC and RAS on a R600 board would put it way ahead of its predecessors.

I am hoping for this as well. But somehow I doubt it will happen next generation (R600). Hopefully when they have double precision by the end of this year they will have enhanced RAS too.

turtle
02-Mar-2007, 18:56
, how much RAM do you think they can put on the front of that board? Only 512MB? 1GB? 2GB?

That all depends on densities. AFAIK, 64MB is the max for GDDR4 atm, but who knows, that could change, or may have already. Somehow they fit 1GB GDDR3 on the R580 SP, although I don't know in what config (front/back of card, memory chip density/amount of chips). I suppose it's possible we could see a SP version with 16 on the front, 16 on the back, and perhaps 128MB chips for 4096MB at some point...but surely not right away.

ATM, I think the consumer max is and will be 1GB (16x64MB = 1024MB) for some time, and on R600 it seems the chips are all on the front. Lower-end versions will probably be the same, but use 16x32MB configs, similar to the 640MB vs 320MB 8800GTS cards, which use 10 chips.
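
The capacity figures above are simply chip count times chip density. A minimal sketch using only the chip counts and densities quoted in this post; the labels and function name are made up for the example, and none of these configurations is confirmed.

```python
def board_memory_mb(chips: int, chip_mb: int) -> int:
    """Total board memory in MB for a given chip count and per-chip density."""
    return chips * chip_mb


# (chip count, MB per chip) pairs taken from the post above.
CONFIGS = {
    "R600 consumer, 16 x 64MB": (16, 64),
    "R600 lower-end, 16 x 32MB": (16, 32),
    "speculative SP board, 32 x 128MB": (32, 128),
    "8800GTS 640MB, 10 x 64MB": (10, 64),
    "8800GTS 320MB, 10 x 32MB": (10, 32),
}

totals = {name: board_memory_mb(*cfg) for name, cfg in CONFIGS.items()}

for name, total in totals.items():
    print(f"{name:34s} {total:5d} MB")
```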

Pete
02-Mar-2007, 19:40
Unknown Soldier, I believe that R600 was demoing Battlefield: Bad Company (http://www.beyond3d.com/forum/showthread.php?t=39157), not a UE3 game. I'm sure someone will keep an eye open to see if parts of B:BC, or specifically DICE's new rendering engine, are presented on an R600 at GDC (http://www.beyond3d.com/forum/showpost.php?p=939196&postcount=2).

IbaneZ, how well can we tell X1950s are selling? Steam's survey (http://steampowered.com/status/survey.html), while perhaps not representative of the whole market, shows them numbering slightly fewer than the newer and more expensive 8800s.

nicolasb
02-Mar-2007, 20:12
As for the Quadfire I can't think of any PSUs out there that could actually power that given even conservative power estimates. I don't see 800W being out of line.

They said "later in the year". I suspect that means after the jump to 65nm at the high end.

Reputator
02-Mar-2007, 20:31
Minor nitpick: 64*9*0.8 = 461 and NOT 512.

It's either more FLOPs per ALU or the frequency is a lot higher than 800MHz.

Woops, I was tired when I made that post. It came too late anyway, this thread is moving too fast for me. >_<

Yeah I probably got that number from full MADD, not Xenos, obviously.

Razor1
02-Mar-2007, 20:49
The real deal is the DX10 performance, and for this we will need to wait a long time (I don't mean stupid DX10 benchmarks and PR garbage like FPS Creator X10, I mean games like Crysis, Hellgate London, ...).


I really don't know why people think there will be a huge difference from Dx9 to Dx10 performance. The same things that make Dx9 faster will make Dx10 faster. Of course there are new features in Dx10 that might be a different issue, but why would AMD/ATi skimp on this? I don't think they will. It's comparative, though: nV might have skimped, but I don't think they did either. From Dx8 to Dx9 there was a fundamental change in shader calculations, going from fixed point to programmable shaders; this was a big change, and from Dx9 to Dx10 there is no fundamental change of this magnitude.

turtle
02-Mar-2007, 21:15
So...say the breakdown of this math is correct:

64*10
800mhz
__________
512 Gflops (64x10x.8=512)

What would the most likely breakdown of those ALUs in terms of flops and MADD/SIMD/Scalar/4D etc??

rwolf
02-Mar-2007, 21:19
I really don't know why people think there will be a huge difference from Dx9 to Dx10 performance. The same things that make Dx9 faster will make Dx10 faster. Of course there are new features in Dx10 that might be a different issue, but why would AMD/ATi skimp on this? I don't think they will. It's comparative, though: nV might have skimped, but I don't think they did either. From Dx8 to Dx9 there was a fundamental change in shader calculations, going from fixed point to programmable shaders; this was a big change, and from Dx9 to Dx10 there is no fundamental change of this magnitude.

- geometry shader and the ability to create polygons.
- no cap bits all features in hardware eliminating multiple code paths in games.
- virtual memory
- Substantially reduced API object overhead
- Unified instruction sets (HLSL 10)
- Shader model 4.0
- Standard Storage Formats

Razor1
02-Mar-2007, 21:39
- geometry shader and the ability to create polygons.
- no cap bits all features in hardware eliminating multiple code paths in games.
- virtual memory
- Substantially reduced API object overhead
- Unified instruction sets (HLSL 10)
- Shader model 4.0
- Standard Storage Formats


Where is the fundamental change in how graphics is programmed?

Cap elimination is great; it will actually reduce the amount of work for engineers, with no extra features for DX outside the spec.

Virtual memory: does this really have much to do with the overall speed of the graphics card? Do you think developers will really want to use virtual memory when making their games?

API object overhead reduction: I don't see where that really fits in; it's accessible to any DX10 graphics card.

HLSL10, SM 4.0 and geometry shaders: this is the only area that might be of any concern, which I mentioned. Anything that has been done so far by nV in the G80 which has increased Dx9 performance will also show up in Dx10 games. Now if we had something to compare to, that would be nice :)

Standard storage formats: this is accessible to all DX10 cards as well.

Anarchist4000
02-Mar-2007, 21:41
- geometry shader and the ability to create polygons.
- no cap bits all features in hardware eliminating multiple code paths in games.
- virtual memory
- Substantially reduced API object overhead
- Unified instruction sets (HLSL 10)
- Shader model 4.0
- Standard Storage Formats

While these really help to advance the software side of things they don't do a whole lot with the hardware itself. If a card supported DX9 you wouldn't have to check any caps as long as you were using fairly common formats. No programmer in his right mind would want to use virtual memory for anything. Reduced API overhead will help the cards work more efficiently but doesn't really require additional hardware. Unified instruction sets is a purely software thing. SM4.0 is, well, the same as the above comment. And most of the standard storage formats were already there.

The only things that come to mind hardware wise are the HDR+AA which would have been included anyways and the specified precision for the ALUs. Even geometry shaders are almost an extension of functionality that the cards already had. The big thing is that DX10 should be faster just because of the reduced overhead. The reason why it will kill your framerates is because developers can fairly easily run some really intense code, mainly due to the geometry shaders.

Frank
02-Mar-2007, 22:21
DX10 won't remove the need for multiple code paths either. Simply because an Intel integrated GPU is going to be a whole lot slower than an 8800.

Twinkie
02-Mar-2007, 22:44
http://www.hardocp.com/news.html?news=MjQ0MTcsLCxobmV3cywsLDE=

R600 cards are showing up in working order. ZDNet showed a picture of a lone OEM ATI R600 (http://www.hardocp.com/news.html?news=MjQ0MDYsLCxobmV3cywsLDE=) yesterday. Today for you we have pictures of two ATI R600 cards working in unison, but rather than graphics, they are pounding away showing off their prowess in a Teraflop Stream demo (MADD results shown below).

Certainly it is good to see ATI's next-gen graphics cards showing up even if it is not pushing pixels. We have been able to confirm some other R600 news that is not good. ATI is stating that a SINGLE R600 high end configuration will require 300 watts of power (+/-9%) and a DUAL R600 "CrossFire" high end configuration will require, as you might guess, 600 watts of power (+/-9%). Compare that to a single GeForce 8800 GTX that will pull 150 to 180 watts. Add in a CPU to that mix and you overtake most power supplies’ peak ratings on the retail shelves today.

Also many have been worried about the size of the new R600 video cards. ATI is stating that 345mm / 13.6" will be the required space to fit the cards in a case. (Yes, the retail version will be somewhat shorter, or at least that is what is told to us even though we have yet to see one.) This figure is the card's 335mm physical length plus 10mm to properly install the card. A GeForce 8800 GTX is just under 11 inches long while the 4 GPU 3dfx Voodoo 5 6000 comes in at 11.75 inches. Certainly if you have an R600, it looks like you will be able to claim having the longest in the room….


Just came across this. Might be a repost.

http://www.hardocp.com/images/news/1172804747QRR8ewfqvI_1_2_l.jpg

http://www.hardocp.com/images/news/1172804747QRR8ewfqvI_1_1_l.jpg

Natoma
02-Mar-2007, 22:50
http://www.hardocp.com/news.html?news=MjQ0MTcsLCxobmV3cywsLDE=

Just came across this. Might be a repost.

Yup, it is. http://www.beyond3d.com/forum/showthread.php?p=939315#post939315

:)

Twinkie
02-Mar-2007, 23:02
Yup, it is. http://www.beyond3d.com/forum/showthread.php?p=939315#post939315

:)

I knew it. 300W sounds B* IMO, but seriously, where are the leaked pics of the retail version? It's kind of strange that only the pics of the OEM version have hit the interweb.

TG01
02-Mar-2007, 23:17
No programmer in his right mind would want to use virtual memory for anything.

ehm.. because it's faster than accessing a harddrive perhaps..?

pakotlar
02-Mar-2007, 23:29
I knew it. 300W sounds B* IMO, but seriously, where are the leaked pics of the retail version? It's kind of strange that only the pics of the OEM version have hit the interweb.

Kyle's not the sharpest tool in the shed. If the card actually used 300W there's no way its power envelope would be 300W (6+8+PCIe). No engineer in the world would produce a product with 0 tolerance.
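
The zero-tolerance point can be checked against the numbers HardOCP quoted (300W, +/-9%). A rough sketch, assuming the usual 75W slot, 75W 6-pin and 150W 8-pin limits; the variable names are invented for the example.

```python
# Connector envelope in watts: slot + 6-pin + 8-pin (assumed PCI Express limits).
ENVELOPE_W = 75 + 75 + 150

# Figures as quoted in the HardOCP news item.
QUOTED_W = 300
TOLERANCE_PCT = 9

# Integer percentages keep the arithmetic exact.
low_w = QUOTED_W * (100 - TOLERANCE_PCT) / 100
high_w = QUOTED_W * (100 + TOLERANCE_PCT) / 100

# Negative headroom means the upper bound does not fit in the envelope.
headroom_w = ENVELOPE_W - high_w

print(f"quoted range: {low_w:.0f} W to {high_w:.0f} W")
print(f"envelope:     {ENVELOPE_W} W")
print(f"headroom at the upper bound: {headroom_w:.0f} W")
```

If the quoted figure were actual draw, the top of the range would exceed what the connectors can supply, which fits the reading that 300W is the envelope rather than the consumption.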

Entropy
02-Mar-2007, 23:31
No programmer in his right mind would want to use virtual memory for anything.
ehm.. because it's faster than accessing a harddrive perhaps..?
I think you had better explain exactly what you mean by this. :)

pakotlar
02-Mar-2007, 23:36
Yeah that certainly wouldn't be indicative of Xenos heritage. It would also put a single R600 at > 750 GFLOPs. Pretty unlikely IMO.

How sweet would that be though. And with ~140GB/s of bandwidth and 24 ROPs @ 800MHz, it wouldn't be nearly as bottlenecked as the G80.

TG01
03-Mar-2007, 00:15
I think you had better explain exactly what you mean by this. :)

DX10 virtual memory means accessing your physical memory directly from the GPU.
This can be used to preload all kinds of stuff so the GPU can access that data directly.

or I could be wrong (again) .. :)

Unknown Soldier
03-Mar-2007, 00:23
Unknown Soldier, I believe that R600 was demoing Battlefield: Bad Company (http://www.beyond3d.com/forum/showthread.php?t=39157), not a UE3 game. I'm sure someone will keep an eye open to see if parts of B:BC, or specifically DICE's new rendering engine, are presented on an R600 at GDC (http://www.beyond3d.com/forum/showpost.php?p=939196&postcount=2).


Hi Pete, I got that bit, thing is was the demo actually demo'd on the R600 or an Xbox?

Pete also note that since EA signed a contract with Epic to use the UE3 engine (http://www.tgdaily.com/2006/08/21/electronicarts_epic_unreal3engine_deal/), Battlefield: Bad Company's frostbite seems very much like the UE3(maybe an updated version?).

If it is a completely new engine, then WOW! EA has actually gone and surprised me.

US

Kaotik
03-Mar-2007, 00:51
Hi Pete, I got that bit, thing is was the demo actually demo'd on the R600 or an Xbox?

Pete also note that since EA signed a contract with Epic to use the UE3 engine (http://www.tgdaily.com/2006/08/21/electronicarts_epic_unreal3engine_deal/), Battlefield: Bad Company's frostbite seems very much like the UE3(maybe an updated version?).

If it is a completely new engine, then WOW! EA has actually gone and surprised me.

US

I believe it's their own, and thanks go rather to DICE than EA (to my understanding, they're still acting very much as their "own studio", similar to CryTek and so on, and not like the gazillion studios integrated 100% and now acting with no name of their own)

Unknown Soldier
03-Mar-2007, 01:45
And the Battlefield Bad Company Demo (http://www.gamershell.com/download_18061.shtml), was that actually run on the R600 or was it run on the Xbox360?

US

Rangers
03-Mar-2007, 01:48
Here is another reason why the explanation for the delay coming from AMD doesn't sound true:
"AMD R600 delay causes partners to blub with dismay"
http://www.theinq.com/default.aspx?article=37961

If these things from AMD continue in the future, we will see Sapphire GeForce cards :smile:

Ugh, not the inq for a source again.

What a bunch of FUD that site is!