AMD: R7xx Speculation

@Jawed: Look at the original they're referring to; there are much bigger "mistakes" in it, and if you know how PowerPoint works you'd see that's just the normal result of adding a row.
 
Ooh, looks like quite a convincing argument for a fake.

The screw-up with the white/grey could reflect another rush job by AMD though - plenty of mistakes in previous slides.

Jawed

Just look at that real 2900XT slide and notice the "Giga FLOPS" mistake in the last row (all the others say GigaFLOPS where the last one says Giga FLOPs), which is quite a big mistake, and whoever made that mistake could very well have made the mistakes pointed out in the "fake" slide.
 
Just look at that real 2900XT slide and notice the "Giga FLOPS" mistake in the last row (all the others say GigaFLOPS where the last one says Giga FLOPs), which is quite a big mistake, and whoever made that mistake could very well have made the mistakes pointed out in the "fake" slide.
AMD or ATI have a history of mistakes, so I wouldn't be surprised. It's interesting that NVidia appears to have started making mistakes, but generally NVidia's slides seem to spend longer in QA...

Jawed
 
Or, as I said in that thread, it could be used to highlight that a mainstream card (4850 pricing rumors put it very close to where the 2600XT was) can provide many times the FLOPS performance of a high-priced quad core.

After all, these slides are likely used during PR presentations, with people explaining them as they go along, so aesthetics are but one part of it.
 
I'll offer one more piece... and this was posted in this very thread but dismissed as unlikely: that it went from 64 * 5D to 160 * 5D...

Remember that GPU-Z shot that showed RV770 at 256mm^2? People were saying it might be fake, since how could a database program know the die area unless it's programmed into the BIOS? Well, we all know that w1z often gets this kind of knowledge for GPU-Z before we do (and the GPU-Z shots of GTX280 all have info there as well that might not be stored on the card).

Anyway, on the techpowerup forums almost an entire month ago, w1z posted this:

http://forums.techpowerup.com/showpost.php?p=790539&postcount=36

see the yellow thingies inside the red shader blocks? the big question is how many of these are in the rv770 per red block.

if it's 5 then 800 is correct (160 * 5 = 800)
if it's 3 then 480 is correct (160 * 3 = 480)

He already hinted at 160 with that post, since at the time everyone was saying 96 * 5D for 480... now all of a sudden 160 * 5D looks very possible.
 
Holy crap. I think on this site someone actually HAD that idea that RV770 was 800 shaders. Who was that?

Arun.

I don't believe ATi will adopt CUDA and PhysX; I don't even believe what Fudzilla says about nVidia offering those technologies to ATi. A few weeks ago I spoke to an nVidia PR guy, and he told me CUDA and PhysX are two things their cards will have and the competition won't, giving nVidia a clear advantage. Now there's the question: would it be better for them to take the risk of keeping it to themselves, or will they play it safe and let the competition support it as well, so that no developers will be afraid of using it?

If it is not adopted, it will die at some point. Just like all the things AMD/ATI brought to the gaming market that were never adopted by nVIDIA.

ATI should have no desire to actually support this, seeing as it is something nVIDIA develops.
 
Ha, I've definitely been a strong voice against 800:32, while Arun seems to have been trying to caution "ignore the die size", though he was mostly trying to suggest that the die would be much bigger than 250mm2.

Jawed
 
Since R6xx was obviously texture- and z-fillrate-bound, and the math processing power has now increased to 2.5x what it was, shouldn't the texture fillrate have increased more than three times? I think it increased by less than 2x, as the texture units doubled but their clock rate actually went down.
I'm not quite sold on the 800 SPs. If it's really true, are we talking about "similar" 5D units (though everything else would be a big change architecture-wise) as on R6xx? What's the arrangement? 5 clusters x 32 (increasing batch size by a factor of 2), or 10 clusters x 16 (probably adding quite a bit of overhead due to increased thread dispatch / arbitration)?
Also, if there really are 800 SPs, despite ALUs "being relatively cheap", maybe other stuff could go to make room for those SPs? With so much general math power I'd say screw interpolators, and screw texture filtering units - a bilerp can be done in 3 lerps = 6 MADs, * 4 channels - so for 32 bilerps/clock you'd need 768 SPs. And you'd even get full-rate FP32 filtering this way (though it would probably still be half rate, as texture fetch might not be able to deliver that much data to the register file per clock), not to mention custom texture filtering...
Such changes would be quite a bit more than a more or less simple refresh part "everybody" assumed this would be, however.
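As a side note, here's a minimal sketch in C (my own illustration, not anything from the slides) of the bilerp-as-MADs counting above: three lerps per channel, each lerp being roughly two MAD-class ops, which is where the 768 SPs for 32 bilerps/clock comes from.

[code]
/* Rough illustration of the op counting above: a bilinear sample built
 * from three lerps per channel.  Each lerp is a + t*(b - a), i.e. about
 * 2 MAD-class ops, so 3 lerps * 2 ops * 4 channels = 24 ops per bilerp,
 * and 32 bilerps/clock works out to 32 * 24 = 768 ALU lanes. */
#include <stdio.h>

static float lerp(float a, float b, float t) { return a + t * (b - a); }

/* Bilinear filter of a 2x2 quad of 4-channel texels: 3 lerps per channel. */
static void bilerp4(const float t00[4], const float t10[4],
                    const float t01[4], const float t11[4],
                    float u, float v, float out[4])
{
    for (int c = 0; c < 4; ++c) {
        float top    = lerp(t00[c], t10[c], u);
        float bottom = lerp(t01[c], t11[c], u);
        out[c]       = lerp(top, bottom, v);
    }
}

int main(void)
{
    const float a[4] = {0, 0, 0, 1}, b[4] = {1, 0, 0, 1};
    const float c[4] = {0, 1, 0, 1}, d[4] = {1, 1, 0, 1};
    float o[4];
    bilerp4(a, b, c, d, 0.5f, 0.5f, o);
    printf("sample: %.2f %.2f %.2f %.2f\n", o[0], o[1], o[2], o[3]);
    printf("ops per bilerp ~ %d, lanes for 32 bilerps/clock ~ %d\n",
           3 * 2 * 4, 32 * 3 * 2 * 4);
    return 0;
}
[/code]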
 
AMD or ATI have a history of mistakes, so I wouldn't be surprised. It's interesting that NVidia appears to have started making mistakes, but generally NVidia's slides seem to spend longer in QA...

Jawed

It's a common technique for identifying who leaked the info. Different mistakes to different people, and you can figure out where the leak came from.
 
If it is not adopted, it will die at some point. Just like all the things AMD/ATI brought to the gaming market that were never adopted by nVIDIA.
I wouldn't be so sure about that. nVidia has much greater market share than ATi, they work with game devs, they have very dexterous marketing and they can push a new technology if they want to. It'll be a little harder, but they can do it.
ATI should have no desire to actually support this, seeing as it is something nVIDIA develops.
Both companies should have a desire to accept one standard and cooperate on developing it. But do you really believe nVidia would be willing to let ATi support CUDA and PhysX after they spent years developing the first and millions of dollars acquiring the second? Right now, nVidia has those technologies, ATi doesn't and the green team's marketing will use this to their advantage.
 
I'm not quite sold on the 800 SPs. If it's really true, are we talking about "similar" 5D units (though everything else would be a big change architecture-wise) as on R6xx? What's the arrangement? 5 clusters x 32 (increasing batch size by a factor of 2), or 10 clusters x 16 (probably adding quite a bit of overhead due to increased thread dispatch / arbitration)?
I have to admit I'm torn: a batch size of 128 with 5 SIMDs isn't good, yet 10 SIMDs seems unlikely due to the overheads. And this is a ~250mm2 die. I think even Aaron Spink would be impressed.

Also, if there really are 800 SPs, despite ALUs "being relatively cheap", maybe other stuff could go to make room for those SPs? With so much general math power I'd say screw interpolators,
Any idea how big they are? See:

Vertex data processing with multiple threads of execution

I don't know what the attribute throughput rate is. 16 per clock? It should be possible to work this out using GPUSA (since it reports when pixel shading is interpolator-bottlenecked).

and screw texture filtering units - a bilerp can be done in 3 lerps = 6 MADs, * 4 channels - so for 32 bilerps/clock you'd need 768 SPs. And you'd even get full-rate FP32 filtering this way (though it would probably still be half rate, as texture fetch might not be able to deliver that much data to the register file per clock), not to mention custom texture filtering...
Such changes would be quite a bit more than a more or less simple refresh part "everybody" assumed this would be, however.
Texture filtering is still a few years off :p

I have been wondering whether Z testing and alpha-blending could end up on the ALUs soon.

It'd be interesting to compare the workload of attribute interpolation and Z-test/alpha-blend.
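
For what it's worth, here's a rough back-of-envelope in C (my own assumed op counts, not anything from this thread): it just tallies MAD-equivalent ops per pixel for plane-equation attribute interpolation versus a typical Z-test plus alpha blend, to get a feel for how the two workloads might compare if both ran on the ALUs.

[code]
/* Back-of-envelope comparison of per-pixel ALU work (assumed op counts):
 * attribute interpolation vs. Z-test + alpha blend. */
#include <stdio.h>

int main(void)
{
    /* Assumption: plane-equation interpolation, a = a0 + x*dadx + y*dady,
     * i.e. 2 MADs per scalar channel, plus one shared 1/w for perspective
     * correction (counted here as ~4 MAD-equivalents). */
    int vec4_attributes = 8;                      /* e.g. 8 interpolated float4s */
    int interp_ops = vec4_attributes * 4 * 2 + 4;

    /* Assumption: classic SRCALPHA/INVSRCALPHA blend, one lerp per channel
     * (~2 MAD-equivalents each), plus one compare for the Z test. */
    int blend_ops = 4 * 2 + 1;

    printf("interpolation: ~%d ops/pixel, z-test + blend: ~%d ops/pixel\n",
           interp_ops, blend_ops);
    return 0;
}
[/code]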

Jawed
 
From the first time it appeared, I thought the chart was more than a bit fishy.

If something seems too good to be true, it usually is - it would be very, very easy to knock together a slide like that in Photoshop in just a few minutes. Personally, I very much doubt its authenticity.
 
If the ALU counts and die area numbers we have are correct, it looks like AMD can pack 6x as many ALUs as NVIDIA into the same area.
NVIDIA's ALUs run at a higher clock, but I really wouldn't expect them to be 5 or 6 times bigger than their AMD counterparts :)
Now, control logic is likely to be more complex on NVIDIA GPUs, and they also tend to pack more TMUs onto the same die than the competition, but again the disparity here seems to be unheard of.
Either AMD really removed big chunks of their fixed-function hardware (TMUs?), or I should just shut up and wait for some real numbers that make sense.
 
If the ALU counts and die area numbers we have are correct, it looks like AMD can pack 6x as many ALUs as NVIDIA into the same area.
NVIDIA's ALUs run at a higher clock, but I really wouldn't expect them to be 5 or 6 times bigger than their AMD counterparts :)
Now, control logic is likely to be more complex on NVIDIA GPUs, and they also tend to pack more TMUs onto the same die than the competition, but again the disparity here seems to be unheard of.
Either AMD really removed big chunks of their fixed-function hardware (TMUs?), or I should just shut up and wait for some real numbers that make sense.
:LOL: The whole thing is fascinating.

The actual computation portion of NVidia's ALUs, at least the MAD units, shouldn't be big - though the high clock rate presumably results in more stages. The multifunction interpolator/transcendental unit is clearly more costly than if it were just a transcendental unit.

NVidia appears to pay a heavy price in operand windowing/hardware-thread scoreboarding. The pay-back seems to consist of minimising the number of threads (per SIMD) that need to be in flight in order to hide TEX latency. Additionally it appears to simplify register file porting.

ATI has a very similar MAD:SF ratio (5:1 instead of 4:1 - there's still a bit of a mystery over whether some NVidia transcendentals run at half speed in D3D/OGL), appears to schedule hardware threads at a very high level, depends on considerably more threads in flight (per SIMD) and seems to have at least 3 read ports on the register file (perhaps 4?).

ATI also has a "fixed function" interpolator unit and it appears that ATI has a huge cache for the attribute data generated by this unit.

NVidia's 30 SIMDs with 240 elements retired per clock seem to cost a lot in scheduling, but can work with a low-cost register file (though it's perhaps doubled per element in GT200). ATI has chosen the utilisation overhead of VLIW instead of high-cost scheduling, plus the very high density of multiple register files (for the porting), plus an attribute cache, to retire only 160 elements per clock at a considerably lower clock.
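
Restating the per-clock numbers above as a trivial back-of-envelope (C again; clocks deliberately left out, since NVidia's hot clock is much higher and would shift the comparison):

[code]
/* Per-clock MAD lanes retired, using only the numbers quoted above. */
#include <stdio.h>

int main(void)
{
    int nv_elements_per_clock  = 240; /* 30 SIMDs, 240 scalar elements/clock */
    int ati_elements_per_clock = 160; /* 160 VLIW elements/clock...          */
    int ati_lanes_per_element  = 5;   /* ...of 5 ALU lanes each              */

    printf("NVidia: %d MAD lanes/clock\n", nv_elements_per_clock);
    printf("ATI:    %d MAD lanes/clock\n",
           ati_elements_per_clock * ati_lanes_per_element);
    return 0;
}
[/code]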

Jawed
 
I don't see Arun dancing around, which he should have been doing since he stuck his neck out on this one. Maybe we should go back to the 'creative math' (ALU) Ail was suggesting.
 
Got some inside info from a source who doesn't wish to be named.

HD 4870 X2 has 1024MB GDDR5, 2x 256bit memory interface, 1050MHz shader clock, 1800MHz memory clock. Core clock is defined by the AIB partners and ASUS will have the highest clocks.

While the Radeon 3870X2 relies on a PLX chip for communication between the GPUs, the 4870X2's GPUs will communicate with each other through the memory. Since the GDDR5 is clocked at 1800, the total bandwidth will be roughly 160 GB/s at 1 Gigabyte through the 256-bit bus, compared to the 8 GB/s of the PLX chip on the HD3870X2. Also, the 4870X2 WILL NOT have micro-stuttering.

http://forums.vr-zone.com/showthread.php?t=285719

Linked through the memory? Does anyone have an idea of what this could mean? :oops:
 