AMD: R7xx Speculation

Status
Not open for further replies.
Thinking about redundancy (1 in 17) for the SIMDs and also thinking about VLIW, I have to admit I'm wondering if each column consists of distinct areas for X, Y, Z, W and T lanes in the SIMD. The reason I say this is because it seems unlikely to me that a redundancy mechanism would stretch over such a long distance (the height of a full column - 6-8mm?).

It seems more likely that each "blob" is one of the five VLIW lanes, so that redundancy would be within a small area.

That would make the big blob at the bottom of each column the transcendental/MAD ALU - which appears to be almost as big as two of the other units.

Jawed
 

CJ @ tweakers.net forum;
Call me a non-believer, but I'm not quite sold on that pic (and the 40 TUs which come with it).
Now, I'd certainly believe that the TUs no longer have the separate TA/fetch for point sampling. But it seems to me like changing the TUs to operate within a single shader array instead of across all arrays is quite an architectural change. Not that it would be impossible (they'd just have to run for multiple clocks to do a texture lookup within the array), but why change this? Also, the numbers we've seen so far don't seem to back that up - it certainly could be possible there's some bottleneck elsewhere (not memory bandwidth, but maybe attribute interpolation?), but I remain sceptical until proven wrong...
The ROPs also changed a bit and now could do twice the z tests per clock (which looks reasonable - though OTOH it could indicate that slide is just made up of rumours where everyone hoped this would be increased...). Though by looking at the differences, I noticed the R600 had "fog/alpha" units in the ROPs (not labeled in this picture but elsewhere) which are now gone. Weird, those "fog/alpha" units had no reason to be there in the ROPs in the first place (from the hardware point of view, alpha test is pretty much the same as a tex kill, and fog is a classic shader ALU (lerp) operation).
 
I don't agree, I know more than enough about the rest of the semiconductor industry to know that many engineers would gaze in awe if they truly managed to fit in 800 FPUs in only 250-270mm² given that it's far from 100% of the die, there's a bunch of extra control logic & large register files and, well, one or two extra things I can't mention that make it even more impressive.
Before you start thinking this is impossible, remember that Xenos' parent die fit 240 sp's in around 1/4 the transistors. Sure, RV770's SP's are more flexible and there's lots of other differences on the die, but it's a good indication of the density that ATI should have had a long time ago.
 
No, 6*10*5 SPs = 300 SPs at 2.67x the core clock = 800 SPs
EDIT: Just to make myself clear, it might also be 600 SPs @ 1GHz and much denser or whatever. Point remains though that I would be extremely surprised if there wasn't some shader clock trickery going on here. It's the only sane explanation for the astonishingly low die size, anyway...
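For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function names are made up for illustration, and the 6*10*5 layout is only the guess read off the die shot above, not a confirmed configuration. The idea is simply that throughput scales with unit count times clock, so fewer units at a higher shader clock can be quoted as a larger count at the core clock.

```python
# Hypothetical helpers for illustration; none of these names come from AMD.

def equivalent_sps(physical_sps: int, shader_clock_ratio: float) -> float:
    """SP count at core clock with the same throughput as
    physical_sps units running at shader_clock_ratio x the core clock."""
    return physical_sps * shader_clock_ratio


def required_ratio(target_sps: int, physical_sps: int) -> float:
    """Shader clock multiplier needed for physical_sps units to be
    counted as target_sps core-clock units."""
    return target_sps / physical_sps


physical = 6 * 10 * 5                      # guessed layout from the die shot
print(physical)                            # 300
print(equivalent_sps(physical, 2.67))      # roughly 801, i.e. the "800 SPs"
print(required_ratio(800, physical))       # roughly 2.67
print(required_ratio(800, 600))            # roughly 1.33 for the 600 SP variant
```

The 600 SP case only lines up with "1GHz" if the core clock is around 750MHz, which is an assumption on top of an assumption.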

Uhm, and what about Fast14? I read that creating a cpu or a gpu through this technology could lead to a much more dense chip... :smile:
 
I think this has more to do with using the second step of 55nm. Fast14, as far as I know, is more about faster transistors.
 
HD 4850 = [image: 800SP.gif]




Maybe RV700XT 1000SP+40TMUs and RV770PRO 800SP+32TMUs :D
Maybe... ;)



NO WAY THAT IS REAL-TIME! :oops:

But if so, then I'm just speechless, especially at the real-time reflections of everything, including the chick. Is it ray tracing based?
It should be, as the video and the ideas are part of the CINEMA 2.0 presentation.

Big question is, when is AMD going to put that one up.

Hell.. 2.0 is so much better than HD.

Go read this -> Rage3D


EDIT:
Rendered in real-time and interactive, this is a brief video from the first Cinema 2.0 demo, premiered by AMD in San Francisco on June 16, 2008. The interactive demo was rendered by a single PC equipped with two "RV770" codenamed graphics cards powered by an AMD Phenom™ X4 9850 Processor and AMD 790FX Chipset. The full demo shows cinema-quality digital images rendered in real-time with interactivity. Check back later this summer for a video of the full Ruby Cinema 2.0 demo.
 
Thinking about redundancy [...] which appears to be almost as big as two of the other units.
Heh, this die shot will hopefully be easier to dissect once we know more about the implementation details. Right now, all of us are making assumptions that very well might turn out to be false, so sadly the interpretation is guesswork at best.
Mintmaster said:
Before you start thinking this is impossible, remember that Xenos' parent die fit 240 sp's in around 1/4 the transistors. Sure, RV770's SP's are more flexible and there's lots of other differences on the die, but it's a good indication of the density that ATI should have had a long time ago.
I agree, but Xenos also had much cheaper TMUs than RV670 and the ROPs were on the eDRAM chip. Realistically, maybe the shader core was twice as dense as on RV670... However, it should also be pretty easy to see that it should be cheaper for a variety of reasons, so the difference in terms of efficiency isn't quite that large.

As for Fast14, that's what I was thinking of when I said it might be clocked 1.33x or even 2.67x faster (although that alone wouldn't be enough for the latter). Who knows, though...
 
Fast14 logic could be there, but instead of 1.5 teraflops at 1 GHz they chose 110 W and 1 teraflop at 625 MHz...
40 nm sounds great for these little soldiers. How many of these will the next Xbox have? :)
The only thing that remains from here, apart from improving things here and there, is the innovation of an effective multi-GPU system integration. Will we see it with HD 4870X2, or will we have to wait until R800...?
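A quick sanity check on those teraflop figures, as a sketch only. It assumes the rumoured 800 SPs discussed in this thread and the usual convention of counting one multiply-add per SP per clock as 2 FLOPs; the function name is hypothetical.

```python
# Peak-throughput estimate under stated assumptions:
#   - 800 SPs is the rumoured count from this thread, not a confirmed spec
#   - each SP retires one multiply-add (2 FLOPs) per clock

def peak_tflops(sps: int, clock_mhz: float, flops_per_sp_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPS for a given SP count and clock in MHz."""
    return sps * flops_per_sp_per_clock * clock_mhz * 1e6 / 1e12


print(peak_tflops(800, 625))    # 1.0
print(peak_tflops(800, 1000))   # 1.6
```

Under these assumptions 625 MHz gives exactly 1 teraflop, while 1 GHz would come out nearer 1.6 than 1.5.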
 
Not really.
I'm sure the GT200 architecture is still more efficient when it comes to putting its arithmetic units to some use; AMD's design probably uses less control logic, which allows them to devote more area to ALUs.
On the other hand, what AMD packed into this 'little' monster is astonishing. I wonder if this is an example of custom ALU/TMU designs done by hardcore CPU guys :)
 
Realistically, maybe the shader core was twice as dense as on RV670...
You'd be better off comparing Xenos and RV630/5 in terms of transistors allocated to functional blocks - after all RV630 has way more transistors than Xenos+EDRAM.

I seriously think you've got the wrong end of the stick when you talk about "density". There's a vast amount of "infrastructure" in R6xx that radically skews density comparisons across generations. The scaling from RV635 to RV670 shows this pretty clearly.

Anyway density and scaling using RV670 as a baseline seems even more futile than trying to decide whether this is 800/40 or something else.

Jawed
 
First real world RV770Pro CF results, I think. These cards seem damn fast, if results are true.

EDIT: already posted.
 
Wow!

HD 4870 CF should reach > 30 fps without a problem in Crysis at Very High, 1900x1200 with 4xAA.

This is the greatest thing that has happened to the 3D graphics industry since Voodoo 1, above all for the possibilities it can bring us in the future.
 
Puts a little more backing to that "4850 is 75% of a GTX280" statement..

It seems the 4850 CF result was obtained on a 3 GHz Phenom system with 790FX, whereas the computerbase results were on a 4 GHz Quad Penryn system. (Edit: I know it's hard to be CPU limited at those settings; anyway, I am wondering if there's something "secret" in the platform itself)
 