AMD: R7xx Speculation

Status
Not open for further replies.
Heh, I understand the enthusiasm, guys, but this thread is getting quite noisy... So please try to improve the content-per-post ratio a bit? Cheers! :D

I think that people (like myself) are genuinely surprised by a 4850 CF setup competing against a $649 card.
 
Hmm, I somehow fail to see why everybody seems so surprised by these scores???
You don't have to be surprised to be wowed. Are you surprised when you see a fantastic play from Kobe Bryant, Alexander Ovechkin or Tiger Woods?

It's just a very impressive turnaround for ATI. It's not every day that you see a near doubling of FLOPS per mm² and GT/s per mm² on the same process.
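To put rough numbers on the "near doubling", here is a back-of-the-envelope sketch. The specs are assumptions for illustration (approximately the publicly quoted SP counts, texture unit counts, core clocks and die areas for RV670 in the HD 3870 and RV770 in the HD 4870), and peak FLOPS assumes one multiply-add (2 FLOPs) per SP per clock.

```python
# Assumed specs, approximately the publicly quoted figures:
# name: (stream processors, texture units, core clock in MHz, die area in mm^2)
CHIPS = {
    "RV670 (HD 3870)": (320, 16, 775, 192),
    "RV770 (HD 4870)": (800, 40, 750, 256),
}


def density(sps: int, tmus: int, clock_mhz: float, area_mm2: float) -> tuple:
    """Return (peak GFLOPS per mm^2, peak GT/s per mm^2)."""
    gflops = sps * 2 * clock_mhz / 1000  # 1 multiply-add = 2 FLOPs per SP per clock
    gtexels = tmus * clock_mhz / 1000    # 1 texel per texture unit per clock
    return gflops / area_mm2, gtexels / area_mm2


old = density(*CHIPS["RV670 (HD 3870)"])
new = density(*CHIPS["RV770 (HD 4870)"])
print(f"FLOPS/mm^2 ratio: {new[0] / old[0]:.2f}")
print(f"GT/s per mm^2 ratio: {new[1] / old[1]:.2f}")
```

With these inputs both ratios come out at roughly 1.8x, which fits the "near doubling" description.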
 
The scorpion demo was made by the film director David Fincher (director of "Fight Club") on a computer. Was it a video game? An interactive movie? AMD wouldn't say. But it looked scarily real.

AMD Scorpion Demo Still -> link
 
Scaling is always easier when you have the luxury of a lower starting point.
Not that I don't find it interesting how great the improvements are, but they are somewhat magnified by the lower bar set by their predecessors.
 
AMD will actually make some money this year then? This is really good news. ;)

My 8800GT has been a good friend, but it's time to get rid of it. Two 4850s will be so sweet. The problem is that I have a P35 mobo, so they'll run at x16/x4 in CF. But from what I've read, the performance drop ain't that bad.
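For a sense of what x4 costs on paper, here is a quick sketch of raw link bandwidth, assuming both slots on the P35 board run at PCIe 1.x rates (2.5 GT/s per lane with 8b/10b encoding). It says nothing about how much bandwidth a game actually needs, so it bounds the difference rather than predicting the performance drop.

```python
# PCIe 1.x signalling (assumed for both P35 slots): 2.5 GT/s per lane, 8b/10b encoding.
GT_PER_S = 2.5e9
ENCODING_EFFICIENCY = 8 / 10  # 8 payload bits per 10 bits on the wire


def pcie1_bandwidth_mb_s(lanes: int) -> float:
    """Peak payload bandwidth in MB/s per direction for a PCIe 1.x link."""
    bits_per_s = GT_PER_S * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8 / 1e6


for lanes in (16, 4):
    print(f"x{lanes}: {pcie1_bandwidth_mb_s(lanes):.0f} MB/s per direction")
```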
 
A new shot of the die:
http://www.tweaktown.com/popImg.php?img=news_4850erly3.jpg

EDIT
On this new shot it's easy to count forty blocks. Would that confirm four vec5 ALUs per texture unit,
thus 160 vec5 ALUs and 40 TMUs for the whole chip?

EDIT
If some results hint at 32 TMUs, could that mean that 128 vec5 ALUs are enabled in the 4850, i.e. 640 SPs? (8 arrays out of ten enabled?)

EDIT
Given the chip's clock, something is off, as it would not match the teraflop figure.
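The arithmetic in these edits can be checked with a small sketch. The layout (10 arrays of 4 blocks, each block one texture unit plus four vec5 ALUs) is only the die-shot reading proposed above, and the 625 MHz clock is an assumption standing in for "the chip's clock"; peak FLOPS counts one multiply-add per SP per clock.

```python
# Hypothetical layout read off the die shot: 10 arrays of 4 blocks,
# each block = 1 texture unit + 4 vec5 ALUs.
ARRAYS_TOTAL = 10
BLOCKS_PER_ARRAY = 4
VEC5_PER_BLOCK = 4
LANES_PER_VEC5 = 5


def chip_config(arrays_enabled: int, clock_mhz: float) -> dict:
    blocks = arrays_enabled * BLOCKS_PER_ARRAY
    vec5 = blocks * VEC5_PER_BLOCK
    sps = vec5 * LANES_PER_VEC5
    return {
        "vec5": vec5,
        "sps": sps,
        "tmus": blocks,  # one texture unit per block in this reading
        "gflops": sps * 2 * clock_mhz / 1000,  # 1 MADD = 2 FLOPs per SP per clock
    }


def clock_for_target(sps: int, target_gflops: float) -> float:
    """Core clock in MHz needed for `sps` lanes to reach `target_gflops` peak."""
    return target_gflops * 1000 / (sps * 2)


full = chip_config(ARRAYS_TOTAL, 625)  # 625 MHz: assumed HD 4850 core clock
salvage = chip_config(8, 625)          # 8 of 10 arrays enabled
print(full)
print(salvage)
print(f"640 SPs need {clock_for_target(salvage['sps'], 1000):.0f} MHz for 1 TFLOPS")
```

Under these assumptions the full chip lands exactly on 1 TFLOPS, while a 640 SP part would need roughly 780 MHz to get there, which is the mismatch described above.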
 
That could be both a driver performance tuning nightmare (it's progressively harder to scale across 2, 3 or 4 GPUs, now imagine 6...) and a physical impossibility (unless the HD4870 X2 now has two Crossfire connectors, that is).
My bad, I misread the 5 TFLOPS post in this thread.

The cold, hard fact is that nothing will likely touch a standard three-way SLI setup with 3 GTX 280's for the remainder of 2008.
Correction: it's hot and heavy (power consumption), but I'm not sure it's a fact... :LOL:
 
[Image: die shot]


Ten rows by four elements/quads of SPs... ?!

The I/O PHY logic is quite hefty! :oops:
 
Does nobody want to speculate about the fact that in the 4850:
8 arrays out of 10 are enabled,
i.e. 640 SPs and 32 TMUs,
and
that Arun was (somewhat) right: there are different clock domains in the chip?
NO?
:LOL:
 
A new shot of the die:
http://www.tweaktown.com/popImg.php?img=news_4850erly3.jpg

EDIT
On this new shot it's easy to count forty blocks. Would that confirm four vec5 ALUs per texture unit,
thus 160 vec5 ALUs and 40 TMUs for the whole chip?
This doesn't really confirm anything. Maybe 10 clusters. The "four rows" could still be 4x4 vec5 ALUs. It doesn't confirm anything about TMUs (honestly, you can't really see what is what there...). The "traditional" RV670 TU arrangement would still make this 32 TUs (assuming 2 TU blocks per quad and TUs shared across clusters). Though if you interpret the stuff to the left of the "cluster block" as TUs, you could indeed maybe see 10 of them (presumably still quad blocks), which would indicate they are not shared across clusters after all, but that each cluster has its own set.
 
Does nobody want to speculate about the fact that in the 4850:
8 arrays out of 10 are enabled,
i.e. 640 SPs and 32 TMUs,
and
that Arun was (somewhat) right: there are different clock domains in the chip?
NO?
:LOL:

Hmm, maybe the 4850 is a 640 SP, 32 TMU salvage part like the GTX 260, and the 4870 is the full 800 SP, 40 TMU part?
 
Aren't ATI's SPs scalar processors? They may be arranged in groups of 5, but that does not make them "vec5".
It doesn't make them completely independent either, as you can 'only' schedule 5 instructions per clock on the same pixel.
NVIDIA's architecture is still way more flexible.
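A toy model of that scheduling constraint: within one pixel (or thread), only mutually independent operations can share a 5-wide issue slot, so a dependency chain leaves most slots empty. This is a simplified greedy bundler written for illustration, not AMD's actual shader compiler; it ignores register-port limits and the restrictions on the fifth (transcendental) slot.

```python
from typing import Dict, List


def pack_vliw(ops: List[str], deps: Dict[str, List[str]], width: int = 5) -> List[List[str]]:
    """Greedily pack ops (in program order) into bundles of at most `width` slots.

    An op may issue only after every op it depends on sits in an earlier bundle.
    """
    placed: Dict[str, int] = {}  # op name -> index of the bundle it went into
    bundles: List[List[str]] = []
    remaining = list(ops)
    while remaining:
        cycle = len(bundles)
        bundle: List[str] = []
        for op in remaining:
            if len(bundle) == width:
                break
            producers = deps.get(op, [])
            # Unplaced producers default to `cycle`, which is not < cycle.
            if all(placed.get(p, cycle) < cycle for p in producers):
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic or unknown dependency")
        for op in bundle:
            placed[op] = cycle
            remaining.remove(op)
        bundles.append(bundle)
    return bundles


def utilization(bundles: List[List[str]], width: int = 5) -> float:
    """Fraction of issue slots holding a useful op."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


independent = ["a", "b", "c", "d", "e"]
chain_deps = {"b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}

wide = pack_vliw(independent, {})          # five independent ops
narrow = pack_vliw(independent, chain_deps)  # the same ops as a serial chain
print(len(wide), utilization(wide))
print(len(narrow), utilization(narrow))
```

Five independent ops fit in one bundle, while the same five ops as a serial chain need five bundles and fill only a fifth of the slots.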
 
That could be both a driver performance tuning nightmare (it's progressively harder to scale across 2, 3 or 4 GPUs, now imagine 6...) and a physical impossibility (unless the HD4870 X2 now has two Crossfire connectors, that is).
The cold, hard fact is that nothing will likely touch a standard three-way SLI setup with 3 GTX 280's for the remainder of 2008.

Can I borrow your crystal ball?
 
It doesn't make them completely independent either, as you can 'only' schedule 5 instructions per clock on the same pixel.
NVIDIA's architecture is still way more flexible.

How do GPGPU workloads map to these architectures nowadays? I imagine there are a lot of highly data-parallel workloads that have very low ILP. That would be one area where NVIDIA could establish a foothold, as it essentially drops AMD's stuff to 1/5th of its theoretical throughput.
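The 1/5th figure follows from simple arithmetic: if a thread has no independent operations to pack, only one of the five lanes does useful work per issue. A rough sketch, using the publicly quoted HD 4870 and GTX 280 shader counts and clocks as assumed inputs and counting only multiply-adds (2 FLOPs per lane per clock; NVIDIA's co-issued MUL is ignored):

```python
def effective_gflops(lanes: int, clock_mhz: float,
                     avg_slots_filled: float, slots_per_issue: int) -> float:
    """Peak multiply-add GFLOPS scaled by the fraction of issue slots doing useful work."""
    if not 1 <= avg_slots_filled <= slots_per_issue:
        raise ValueError("avg_slots_filled must be between 1 and slots_per_issue")
    peak = lanes * 2 * clock_mhz / 1000  # 1 MADD = 2 FLOPs per lane per clock
    return peak * avg_slots_filled / slots_per_issue


# Assumed inputs: 800 SPs at 750 MHz (5-wide issue) vs 240 scalar SPs at 1296 MHz.
amd_best = effective_gflops(800, 750, 5, 5)   # every VLIW slot filled
amd_worst = effective_gflops(800, 750, 1, 5)  # no ILP: one slot of five used
nv = effective_gflops(240, 1296, 1, 1)        # scalar lanes do not depend on ILP
print(amd_best, amd_worst, nv)
```

Under these assumptions the no-ILP case drops from 1200 to 240 GFLOPS, below the roughly 622 GFLOPS of the scalar design, which is the kind of workload where the post expects NVIDIA to have an edge.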
 
Does nobody want to speculate about the fact that in the 4850:
8 arrays out of 10 are enabled,
i.e. 640 SPs and 32 TMUs?
Hmm, it might explain why they're easy to buy right now. Is the HD4870 going to be supply-constrained? If so, is it due to GDDR5 or yields?

Also, if SIMDs are "horizontal" now, losing a dud TU would only affect one SIMD, whereas in R6xx losing a TU affects all SIMDs.
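To make the salvage argument concrete, here is a toy binomial yield model. Everything in it is an assumption: each SIMD's ALU block and each TU block is independently defect-free with some probability, a die is sellable if at least 8 of its 10 SIMDs survive, defects in shared logic (memory controllers, PHYs and so on) are ignored, and, following the description of R6xx above, a dead TU in the "vertical" layout is treated as fatal because it touches every SIMD. The probabilities in the usage lines are made up.

```python
from math import comb


def p_at_least(k_min: int, n: int, p: float) -> float:
    """Binomial probability that at least k_min of n independent units are good."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def sellable_fraction(layout: str, p_simd_ok: float, p_tu_ok: float,
                      n: int = 10, k_min: int = 8) -> float:
    if layout == "horizontal":
        # Each SIMD owns its TU block: a bad TU only retires that one SIMD.
        return p_at_least(k_min, n, p_simd_ok * p_tu_ok)
    if layout == "vertical":
        # Toy assumption: a dead TU affects every SIMD, so the die is lost.
        return p_tu_ok**n * p_at_least(k_min, n, p_simd_ok)
    raise ValueError(f"unknown layout: {layout!r}")


# Made-up per-block probabilities, purely for illustration.
horiz = sellable_fraction("horizontal", 0.95, 0.98)
vert = sellable_fraction("vertical", 0.95, 0.98)
print(f"horizontal: {horiz:.3f}, vertical: {vert:.3f}")
```

In this model the horizontal layout recovers dies with a bad TU as 8- or 9-SIMD parts, so it sells more dies whenever TU defects are possible; with perfect TUs the two layouts come out identical.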

Jawed
 