The interesting bits from that EETimes article:
|
It's teraflop, not terabit :)
http://www.reghardware.co.uk/2007/02...d_690g_launch/ http://www.informationweek.com/news/...=Breaking+News http://blogs.zdnet.com/Berlind/?p=363 http://content.zdnet.com/2346-10741_22-57089.html http://blogs.zdnet.com/Berlind/?p=364 http://www.boincstats.com/stats/host...sah&st=0&or=10 - zomg Barcelona? |
Why the new thread? http://www.beyond3d.com/forum/images...n_rolleyes.gif
Quote:
But not sure if any of this is even true. |
Quote:
|
Also, that 1 Terabit(is that correct?)/sec processing power, does that include the CPUs?
Quote:
|
Quote:
OT Have you seen Level505 recently, it's covered with ads. Way more than before. lol |
Aha, Barcelona/R600 :razz:
Jawed |
Quote:
|
So does 320 Multiply Accumulate units = 80 shaders?
|
Just when I was in a down mood with all this bad news of R600 springing up, this stuff pops up and smacks me in the face.
So Indeed, AMD is going for a complete platform launch.:razz: |
Quote:
|
Quote:
It could be some hybrid form, methinks. Like 160 scalar shaders and 40 vec4. Or I could be dead wrong and dumb. |
Well, if it's vec4 + scalar, that would be 64 units :) If it's vec3 + scalar, then 80 units; at least that's what it sounds like to me.
|
Quote:
64 vec4 + scalar sounds very good to me. |
So, what's the release date gonna be; "a few weeks" + "by the end of June" = release in April, and June referring to the ending of Q2 which was the final possible date of previous Q2-timewindow?
|
http://www.beyond3d.com/#news39176
The bit about early rumors would be a reference to both Xbit reporting 64 shaders and ATI (at the time) reporting that they'd leveraged Xenos. Add in today's 320 and some "version 2" of unified hints, and you've got the reasoning we used for that last bit. |
Quote:
I thought that "delay of a few weeks" was referring to CeBIT. |
Quote:
edit: Am I missing something? What happened to plain old 80 vec4? I didn't read the EETimes article. Maybe I should check that. |
Quote:
|
What's the difference between vec4 + scalar and vec4? Is it better to have 64 vec4 + scalar than having 80 vec4, and if so, why?
|
Well, that they are referring to "320" would suggest that they might be all functionally scalar, even tho grouped as 64 5D (our guess) or 80 4D (certainly not impossible). To the degree the scheduling allows them to be treated as scalar, then which it is won't be all that important for most purposes. Scalar at all is the big thing, as vec4 will not be as efficient (tho you could get a lively argument going about how much control logic you have to add to make that happen in calculating the relative efficiency).
|
Once again, Geo, I didn't think before posting. :oops: Vec4 + scalar lines up with Xenos, so 64 sounds right. A little harder to line up against G80, perhaps, with that rogue scalar, but it'll be an interesting fight, for sure.
(That scalar also makes for a nice "+25%" on top of 64 vec4s. Now, where have I heard "+25%" before? Am I just spinning my brain cells if I think of preemptive PR? :)) But functionally scalar, eh? That'd be a kick in the pants. I wonder if their unified v.2 would take that step. Anywho, if Xbit was right about 64 shader units, then they're probably right about 16 texture units, and that may mean 16 ROPs. But 16 of NV's, or something more? It'd almost have to be more, seemingly, given all that bandwidth and if we can estimate the shader and so core clock from the 2 * R600 = teraflop figure. |
What I want to see is if rwolf can make 320 ALUs and 500 GFLOPs into something 2GHz-ish. :grin:
|
Geo - easy. They could operate at 2.4GHz but only have throughput of 1 madd every 3 cycles.
|
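A quick sanity check of the "2.4GHz, 1 MADD every 3 cycles" quip above, as a rough sketch. It assumes 320 MADD units and counts one MADD as 2 FLOPs; the function name and parameters are made up for illustration and say nothing about how R600 actually issues instructions.

```python
def issue_limited_gflops(alus, flops_per_op, clock_ghz, cycles_per_op=1):
    """Peak GFLOPs when each ALU completes one op every `cycles_per_op` clocks.

    All inputs are forum speculation, not confirmed hardware figures.
    """
    return alus * flops_per_op * clock_ghz / cycles_per_op


# 320 MADD units, 2 FLOPs per MADD, 2.4 GHz, one MADD every 3 cycles
slow_issue = issue_limited_gflops(320, 2, 2.4, cycles_per_op=3)
print(f"{slow_issue:.1f} GFLOPs")
```

Under those assumptions the result lands at roughly half a teraflop, which is why the joke works: a high clock with one-third issue rate gives the same peak as 320 MADDs per clock at 800MHz.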
64*9*0.8 = 461 * 2 = 922 ? Either the frequency is higher or I'm "stealing" 2 FLOPs from my speculative layman's math there. Or it should have read "nearly 1 Teraflop"....
Quote:
Assuming roughly the same efficiency for the ALUs between the two architectures, the major difference so far seems to be the G80 ALU clock domain and R600's "phatter" units. |
Well, according to this:
http://www.hardspell.com/doc/hardware/34620.html A13 silicon seems final, and 'no less than 800mhz'. It also says the GDDR3 version of R600 is 12 layer, the GDDR4 being between 12-16 layer PCB, with the OEM card being 512MB and the retail card being 1GB (if I read it correctly.) Seems Geo's assumption-based article could very well be right based on the '800mhz' number. :grin: Article also mentions RV630 is also in AIB hands, and they are preparing cards based on it. Hello massive family if not enthusiast (4x4 barcelona/crossfire-physics) platform launch? |
Assuming my idiotic math above has any legs, they'd need roughly 870MHz to fully reach a hypothetical 500 GFLOP rate.
|
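The back-of-envelope math in the posts above can be sketched like this. The 64 units and 9 FLOPs per unit per clock are simply the numbers used in the "64*9*0.8" post, not confirmed specs, and the helper names are hypothetical.

```python
UNITS = 64           # assumed shader units (vec4 + scalar guess)
FLOPS_PER_UNIT = 9   # per-clock figure taken from the post's own math


def guess_peak_gflops(clock_mhz):
    """Peak GFLOPs at a given clock under the assumptions above."""
    return UNITS * FLOPS_PER_UNIT * clock_mhz / 1000


def guess_required_clock_mhz(target_gflops):
    """Clock in MHz needed to reach `target_gflops` under the same assumptions."""
    return target_gflops * 1000 / (UNITS * FLOPS_PER_UNIT)


print(f"{guess_peak_gflops(800):.1f} GFLOPs at 800 MHz")
print(f"{guess_required_clock_mhz(500):.0f} MHz needed for 500 GFLOPs")
```

With these inputs, 800MHz gives about 461 GFLOPs per card, and 500 GFLOPs needs a clock in the high 860s, in line with the "roughly 870MHz" estimate.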
Hmm...Maybe that old "A12 hits 1ghz" rumor has some legs, eh?
Things certainly are starting to come together. :yes: |
Quote:
I honestly can't believe it. It seems like ATI just did it to lose. The whole idea of introducing a whole "suite" is just stupid, as neither Nvidia nor anybody else does that. You go high end first. |
Quote:
|
Quote:
|
Perhaps AMD delayed the R600 to use the new family of Rx6XX cards to bolster the performance of Barcelona.
|
Quote:
|
Quote:
1st = There is no solid DX10 driver for Vista (G80, for example). 2nd = There are no DX10 Vista games. 3rd = Probably to surprise Nvidia, since they don't know what they are up against, because they have to adjust GF8900GTX to match R600. 4th = There is probably little or no profit at all in the high end, so they need midrange graphics cards to make up the cost for an overall profit. 5th = Not many people will upgrade their video cards right away (for example, Geo with his GF8800GTX :) ) |
http://biz.yahoo.com/bw/070301/20070228006340.html?.v=1
Quote:
|
Quote:
Quote:
A R600 Crossfire should fairly effectively destroy the TFLOP barrier. Also consider this, with G80's missing MUL a single R600 has more than twice the theoretical FP power. In regards to the scheduling what if they just didn't bother with making it perfectly efficient as ALUs seem somewhat cheap going by the R520->R580 example. 1+1 = 1+2 = 1+3 = 1+4 If it doesn't branch you can really pack em in there. If it does branch you could look at it like 2 scalars. I can't think of how you'd end up with any shaders that had a greater than 50% scalar:vector ratio. Save the complexity of the scheduling and just add more ALUs and clockspeed. Quote:
|
Hmm. Dual-core 'city' CPU was used at the event instead of QC eh? Hint of things to come? :D
Can you comment on whether that mysterious Opteron that showed up on BOINC is of the same breed or a hoax? That article certainly lends credence to the possibility of it being real... Also, just out of curiosity, what MHz number would be needed to hit 512 GFLOPS using Geo's guesstimate on ALUs? ~890 (going by Al's math)? Going by at least 1 TFLOPS though, Al's math (which I have no idea is correct) and assuming that Opteron was around 24-25 GFLOPS (which might be slightly off): 975/2 = 487.5, and 487.5/(64*9) = 846 MHz. That wouldn't quite be half a teraflop per card, but close. I'mma guess it was running at 850 MHz or greater. :razz: |
If it's based on vec and scalar units, could those be clocked differently, with the vec units slower and the scalar units faster?
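Here is the same estimate with the CPU contribution taken out first, as a sketch only. The 25 GFLOPs for the Opteron and the 64 units at 9 FLOPs per clock are the guesses from the posts above; the function name and defaults are invented for illustration.

```python
def gpu_clock_for_system_target_mhz(system_gflops, cpu_gflops, gpus,
                                    units=64, flops_per_unit=9):
    """Per-GPU clock (MHz) needed so that CPU + GPUs reach `system_gflops`.

    Assumes perfect scaling across GPUs and full ALU utilization,
    which a real system would not achieve.
    """
    per_gpu_gflops = (system_gflops - cpu_gflops) / gpus
    return per_gpu_gflops * 1000 / (units * flops_per_unit)


# 1 TFLOP total, about 25 GFLOPs from the CPU, two R600s
with_cpu = gpu_clock_for_system_target_mhz(1000, 25, 2)
# 512 GFLOPs per card with no CPU help (1024 GFLOPs across two cards)
without_cpu = gpu_clock_for_system_target_mhz(1024, 0, 2)
print(f"{with_cpu:.0f} MHz with CPU help, {without_cpu:.0f} MHz without")
```

That reproduces the two figures in the post: about 846 MHz when the CPU is counted, and just under 890 MHz for a full 512 GFLOPs per card.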
|
So now, speculation prices for this monster that will be out in a few weeks.
$600? US |
Quote:
epic |
Quote:
|
Can the 8800GTX in SLI do 1 TFLOP?
Also, does this include the CPU FLOPs? How many GFLOPs can, let's say, an OC'd quad-core QX6700 do? |
Quote:
|
G80 can do it as well when SLi is enabled.
And I don't think the number is too good for ATi, especially considering R600 is more of a vector-machine. |
Given the new positive signals regarding both R600 and Barcelona, could it simply be that AMD wants to be able to provide the first R600 (pre-)reviewers with Barcelona systems?!? If they don't, we can be pretty sure that the reviews will be performed on Core 2 Duo or Quad systems, and that doesn't look that good for AMD... it is still the CPUs that are the big thing for AMD.
So the R600 delay could simply be because AMD is on track with Barcelona, as well as the matching chip-sets! |
"ATI R600 and the next field demonstration engines GDC"
http://66.249.93.104/translate_c?hl=...1/78/78496.htm |
Quote:
Hopefully, we will have a good GPU in May ;) |
Quote:
The situation looks good for AMD now (if these news/rumors turn out to be true), maybe the time has come to buy some shares :smile: So is it almost confirmed that Barcelona is the reason why R600 was delayed? |
Isn't G80 like 330ish GFLOPs? How would 2 in SLI make 1 TFLOP?
|
Quote:
Don't you think the press release would have said "1.5TFlops!" if your math was right? Quote:
|
Quote:
"AMD demonstrated a "Teraflop in a Box" system running a standard version of Microsoft® Windows® XP Professional that harnessed the power of AMD Opteron(tm) dual-core processor technology and two next-generation AMD R600 Stream Processors capable of performing more than 1 trillion floating-point calculations per second using a general "multiply-add" (MADD) calculation." My bold etc. Assuming VR-Zone was correct and there really was a barcelona-cpu (which is based on "AMD Opteron(tm) dual-core processor technology " [note the "technology"]), you might end up with significantly less than 500 GFLOPs/sec. here. for single R600, obviously. |
Quote:
Yeah I'm not quite seeing it. Doubt they're getting 100% efficiency and perfect scaling for the SLI either. I'm assuming the >1TFLOP mark was measured performance and not a purely theoretical number. Quote:
Assuming they're still using the MADD+ADD setup they've been using before:
64*5*3*0.8 = 768.0 GFLOPs (R600)
128*3*1.35 = 518.4 GFLOPs (G80, normal)
128*2*1.35 = 345.6 GFLOPs (G80, missing MUL)
When measuring the GFLOPs you're running an operation that lines up best to the card, so every ALU should be fully utilized most of the time. Best case scenario, basically. So R600 should be capable of feeding all of those pipelines. This would be one of those cases where I'd expect R600 to thrash G80 just because of the design focus. It's somewhat meaningless in real-world applications, but for the purpose of doing that many operations R600 is capable of significantly more than G80. We don't know by how much R600 broke the barrier, but if it was measured performance, and discounting FLOPs from the CPUs, that's 66% efficiency (1 / 1.5 TFLOPs) including the scaling hit for Crossfire. I guess it really comes down to whether 1 TFLOP was measured or theoretical performance. |
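The three peak figures and the efficiency estimate in the post above can be reproduced with a few lines. Everything here is the poster's speculation (64 units of 5 lanes with MADD+ADD at 0.8 GHz for R600; 128 ALUs at 1.35 GHz for G80); the dictionary keys and helper are only illustrative.

```python
def theoretical_gflops(lanes, flops_per_lane, clock_ghz):
    """Peak GFLOPs = lanes x FLOPs per lane per clock x clock in GHz."""
    return lanes * flops_per_lane * clock_ghz


peaks = {
    "R600 guess (MADD+ADD)": theoretical_gflops(64 * 5, 3, 0.8),
    "G80 (MADD+MUL)": theoretical_gflops(128, 3, 1.35),
    "G80 (missing MUL)": theoretical_gflops(128, 2, 1.35),
}
for name, gflops in peaks.items():
    print(f"{name}: {gflops:.1f} GFLOPs")

# If two R600s delivered a measured 1 TFLOP with no help from the CPU:
crossfire_peak = 2 * peaks["R600 guess (MADD+ADD)"]
crossfire_efficiency = 1000 / crossfire_peak
print(f"Implied efficiency: {crossfire_efficiency:.0%}")
```

The implied efficiency comes out at about 65%, matching the post's rounded 1 / 1.5 figure. It only means something if the 1 TFLOP was measured rather than theoretical.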
Quote:
I was told GeForce 8000 cards are the fastest in the world... wait.. I can't afford it. Never mind. or.. I was told Radeon X2000 cards are the fastest in the world... awesome! They have one at my price point! Let's hope that if AMD does release an entire platform in one hit, top to bottom, they unify the naming schemes too... Like AMD X[series] [perf] [product]... AMD x2 300 graphics, AMD x2 400 cpu, AMD x2 200 platform,.. whatever. Something like that. |
Quote:
If I understand the Xenos diagram correctly, Xenos' TMUs are not a part of the three shader units; they are decoupled or something else. Decoupled 24 TMUs and 64 5D ALUs (four SIMD clusters à la Xenos?). Or is it too crazy? |
Quote:
|
Well they've already got tons of X's in the names as well as an affinity for 4 digit numbers so does that count as unified? XL, XT, XTX, FX, X2, x64
But the entire platform launch does look rather appealing from a marketing perspective. Of course all the reviewers are gonna be mad because they get nailed with a massive workload all at once. Quote:
|
Quote:
On R580+ I am only getting close to 75 percent efficiency on MAD and only about 50 percent on ADD (Cat 7.1; curious note: scalar-split does not seem to work anymore in 7.1 drivers, but vec4 results are in line with older drivers). |
Quote:
Quote:
To me, the R600's true efficiency is a big question mark if it is indeed a vectorized GPU, because it takes more HW and compiler magic to extract efficiency out of this setup. |
For this case, benchmarking based on FLOPs, efficiency should be rather good under any condition. If they're using vectors they should be able to pack more into a given area. It's one of those tests where efficiency shouldn't be a significant factor, as it would be high on both. Therefore the card with the higher theoretical power wins. "Design focus" probably wasn't the correct description. I meant this was a situation extremely well suited for a vector-based design.
I'd agree with you that in terms of actual real world performance the efficiency of R600 will be the deciding factor. |
I don't get your efficiency measure. As far as benchmarking is concerned, efficiency = actual throughput/peak theoretical throughput. In this case, vectors lose out. Vectors may win on transistor density, but that's a different efficiency measure.
Without knowing what workload the benchmark consists of, you can't really make any claims as to real-world efficiency. But it is well known that maximizing vectorization of code to match the underlying hardware vector architecture is a difficult problem. Unless you feed handcrafted *ideal* code to the vectorized units, it's unlikely you will get close to peak theoretical rates, unless you think the compiler performs voodoo levels of instruction scheduling. It's simply easier to extract maximum efficiency and parallelism when you don't have to worry about packing and co-issuing 5D operations. There are way more opportunities to screw up. Now, if you want to claim that ATI fed handcrafted and completely artificial workloads that extracted near-peak FLOPs rates, well la-de-da, but the people looking to buy GPUs to run real-world workloads are more interested in how the chip's efficiency compares on a diverse set of workloads. You know, the PlayStation 2 had amazing peak theoretical rates that one could hack custom and artificial benchmarks to reach. It isn't hard with eDRAM. In the real world, you needed the PS2 performance analyzer to get anywhere near sane efficiency levels. |
I believe that if Anarchists's speculation on 3 FLOPs/cycle is correct, then an R600 at 1 GHz can achieve a TFLOP.
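To make the "efficiency = actual throughput / peak theoretical throughput" point concrete, here is a toy model, not a description of any real scheduler. It assumes each shader op touches 1 to 4 components, that ops are issued one after another with no packing across ops and no co-issue, and that an op wider than the unit takes extra cycles. The function name and the sample instruction mix are invented for illustration.

```python
def lane_utilization(op_widths, lanes):
    """Fraction of lane-cycles doing useful work in a naive in-order issue model.

    op_widths: number of components each op needs (for example 4 for a vec4 MAD).
    lanes: width of the execution unit (1 = scalar, 4 = vec4, 5 = vec4 + scalar).
    """
    # Each op occupies ceil(width / lanes) cycles; -(-a // b) is integer ceiling.
    cycles = sum(-(-width // lanes) for width in op_widths)
    useful_lane_cycles = sum(op_widths)
    return useful_lane_cycles / (lanes * cycles)


mix = [4, 3, 1, 2]  # hypothetical mix of vec4, vec3, scalar and vec2 ops
for lanes in (1, 4, 5):
    print(f"{lanes}-wide unit: {lane_utilization(mix, lanes):.1%} of peak")
```

In this model a scalar unit always runs at 100% of its peak, while the wider units idle whenever an op is narrower than the unit. A real compiler would claw some of that back by packing and co-issuing, which is exactly the part that is hard to get right.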
|
Quote:
|
Reminds me of the old 3dfx commercial.
|
Quote:
|
Quote:
|
Quote:
Rereading that article I would say it looks more like a single MADD and not the MADD+ADD setup I was assuming. They would have mentioned that in that article unless they didn't understand what was happening. In the past I was under the idea that ATI used Vec3+1 with each unit being a MADD+ADD and the +1 having additional SF logic. |
Correct me if I'm wrong, but if these are Xenos-style ALUs, shouldn't it only be 8 + 2 FLOPs per MAD and co-issued add?
Oh sorry didn't see your post Democoder ;) |
Quote:
|
Quote:
Quote:
|
Oh, and if you look at what Wavey linked upstream:
Quote:
|
Quote:
So R600's 320 MADs per clock may be excluding SF too. Jawed |
Err, come to think of it (he said, looking at his own report on the front page), "multiply-accumulate units" sounds an awful lot like MADD as reported by a reporter who isn't hip to the usual lingo. Doesn't it?
|
so possibly no co issue?
|
Quote:
Jawed |