AMD: R7xx Speculation

Slyne · May 28, 2008

annihilator said:
I believe you're talking about R770, not RV770. R770 = 4870x2 = 960SP.

Welcome to the forum.
And yes, he meant RV770. Please read the thread from page 1.

Edit: added smiley, seemed too dry otherwise.

Berek · May 28, 2008

trinibwoy said:
Suggested E-tail Price?

News articles around the web point to a retail launch at the end of June, but true availability (easy to buy or whatever) will be in Q3... July.

Arty · May 28, 2008

Arun said:
Well, yes, I'm just extremely skeptical of the 480 SPs stuff. What makes me even more dubious now is the supposed X2 delay. Seems like a perfectly phased rumour, given that it came slightly before you might have started to expect seeing board pic leaks or AIBs getting briefed... So yeah, call me crazy, but I'm still giving a 65%+ chance of RV770 being monolithic (350mm²+) *shrugs*. Might be usable for a X2 with some units disabled down the line though, who knows.

Once again, I could be horribly wrong here, but I wouldn't be surprised at all if everyone has been misled. This would hardly be unprecedented in the history of the industry either so I don't think it'd be horribly surprising...

I wish I could hop on your train

but knowing AMD I'll set myself up for disappointment...

edit - I'll go ahead and stick my neck out. I'm 99% sure that RV770 is 480SP and about 250mm².

Jawed · May 28, 2008

Arun said:
Well, yes, I'm just extremely skeptical of the 480 SPs stuff.

I dare say I'm pretty confident in the number Pande stated, I think he's leaked stuff before.

I can't shake off the feeling that ATI GPUs are destined to be stuck at 16 TUs and 16 RBEs for, well, forever. It's been 4 years now... Texturing performance will merely scale with clock.

What makes me even more dubious now is the supposed X2 delay. Seems like a perfectly phased rumour, given that it came slightly before you might have started to expect seeing board pic leaks or AIBs getting briefed...

X2 should be quite different from HD3870X2 if the reasonably firm rumours that it'll be "more like a single GPU" hold up - therefore a delay due to driver-quality seems likely. I'm not in the least surprised that X2 is a few months down the road.

So yeah, call me crazy, but I'm still giving a 65%+ chance of RV770 being monolithic (350mm²+) *shrugs*. Might be usable for a X2 with some units disabled down the line though, who knows.

Once again, I could be horribly wrong here, but I wouldn't be surprised at all if everyone has been misled. This would hardly be unprecedented in the history of the industry either so I don't think it'd be horribly surprising...

I think it's merely drowning in wishful thinking, a perennial state of affairs since R600's delay began.

Jawed

Razor1 · May 28, 2008

Arun, there are three strong points that go against that, though.

First, performance, unless it's severely bottlenecked somewhere or the chips that are out there are very underclocked or heavily disabled.

Second, the RV770 is based off the RV670; if the SPs are now 2 flops each, there's no more vec5, which would be a big change to its shader array.

Third, the pricing rumors and recent slides would all have to be fake.

*mass confusion*

Jawed · May 28, 2008

annihilator said:
Anyway, so the bottlenecking point of 3870's were thought to be their texture fill rates, yeah?

Number 2 after z fillrate in my opinion.

3870 has a fillrate of 775*16 = 12400MT/s and a processing rate of 775*320*2 = 496GFLOPS

4870 has a fillrate of 850*32 = 27200MT/s and a processing rate of 1050*480*2 = 1008GFLOPS
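The arithmetic behind those two lines can be reproduced with a minimal sketch. It assumes one texel per TU per clock and 2 flops (one MAD) per ALU lane per clock; the function names are illustrative only, and the 4870 inputs are the rumoured figures quoted above, not confirmed specs.

```python
def texel_rate_mtps(core_mhz, texture_units):
    """Texel fillrate in MT/s, assuming one texel per TU per clock."""
    return core_mhz * texture_units


def alu_rate_gflops(shader_mhz, alu_lanes, flops_per_lane=2):
    """Peak ALU rate in GFLOPS, assuming one MAD (2 flops) per lane per clock."""
    return shader_mhz * alu_lanes * flops_per_lane / 1000.0


# HD3870: 775 MHz core and shader clock, 16 TUs, 320 ALU lanes.
hd3870_tex = texel_rate_mtps(775, 16)
hd3870_alu = alu_rate_gflops(775, 320)

# HD4870 as rumoured in this thread: 850 MHz core, 32 TUs,
# 1050 MHz shader clock, 480 ALU lanes.
hd4870_tex = texel_rate_mtps(850, 32)
hd4870_alu = alu_rate_gflops(1050, 480)

print(f"HD3870: {hd3870_tex} MT/s, {hd3870_alu:.0f} GFLOPS")
print(f"HD4870: {hd4870_tex} MT/s, {hd4870_alu:.0f} GFLOPS")
print(f"TEX ratio: {hd4870_tex / hd3870_tex:.2f}x, "
      f"ALU ratio: {hd4870_alu / hd3870_alu:.2f}x")
```

Swapping in 16 TUs and a shared 850 MHz clock shows how far the numbers move under the more conservative guess.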

TU and ALU having different clocks is a seriously unlikely prospect - ATI's never done it and I don't see why RV770 would be the start. Plus I think the register file wouldn't be happy - but that's just another guess.

I've missed a lot of rumors about 4870's power, but it's probably doubled 3870;

HD3870 is about 115W I think, HD4870 appears to be up to 160W, so about 40% extra.
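As a quick check of that percentage, here is a minimal sketch using the two board power figures from this post. Both wattages are the poster's estimates, not measured values, and the helper name is made up for illustration.

```python
HD3870_WATTS = 115.0  # estimate from the post
HD4870_WATTS = 160.0  # rumoured upper bound from the post


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


extra_power = percent_increase(HD3870_WATTS, HD4870_WATTS)
print(f"Extra power: {extra_power:.0f}%")
```

That is well short of a doubling, which is the point being made.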

Jawed

Jawed · May 28, 2008

Razor1 said:
First, performance, unless it's severely bottlenecked somewhere or the chips that are out there are very underclocked or heavily disabled

All the indications are that RV770 will be significantly less than 2x the performance of RV670.

I believe that simple cherry-picking of 4xMSAA benchmarks with RV770 specified as 480 ALU lanes, 16 TUs, 16 RBEs (with 4x Z per clock) is all that AMD needs to claim the performance gains they're indicating.

Jawed

Jawed · May 28, 2008

ShaidarHaran said:
ATi has found the solution for this with R700 AKA 4870 x2. They've ditched NUMA all together and gone to a unified memory space for multiple GPUs. This will provide performance gains in many scenarios. You can count on it.

NUMA means that access time to memory varies depending upon address - which is normally a consequence of having distributed MCs.

I suppose you could say that the MCs dotted around R6xx's ring bus constitute a NUMA configuration.
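To make that definition concrete, here is a toy model in which the cost of an access depends only on which memory controller owns the address. Everything in it is hypothetical: the two-node layout, the 512MB per node, and the latency figures are placeholders, not numbers for R6xx or any real board.

```python
BYTES_PER_NODE = 512 * 2**20   # hypothetical: 512MB behind each MC
LOCAL_LATENCY_NS = 100         # hypothetical local access cost
REMOTE_LATENCY_NS = 160        # hypothetical cost of crossing to the other MC


def home_node(address):
    """Which node's memory controller owns this address (flat split, no interleave)."""
    return address // BYTES_PER_NODE


def access_latency_ns(requesting_node, address):
    """NUMA in one line: the latency is a function of the address being touched."""
    if home_node(address) == requesting_node:
        return LOCAL_LATENCY_NS
    return REMOTE_LATENCY_NS


# The address space is unified (both nodes can reach every address),
# but node 0 pays more for anything that lives behind node 1's MC.
print(access_latency_ns(0, 0x1000))
print(access_latency_ns(0, BYTES_PER_NODE + 0x1000))
```

In this sense a unified memory space and NUMA are not opposites: one address space can still have non-uniform access times.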

Jawed

Freak'n Big Panda · May 28, 2008

<snip>

Razor1 · May 28, 2008

I wouldn't say significantly less than 2x the performance of the RV670. It's less, but not that much less than double.

annihilator · May 28, 2008

Jawed said:
NUMA means that access time to memory varies depending upon address - which is normally a consequence of having distributed MCs.

I suppose you could say that the MCs dotted around R6xx's ring bus constitute a NUMA configuration.

Jawed

But doesn't NUMA mean that in a multi-processor system there are separate memory chips for each processor? Or is its definition simply that "the memory access time depends on the memory location"?

I've also heard that the 4870x2 will be a 1GB card (and effectively 1GB, not 512MB like 3870x2); which means there will be only one memory space but two memory buses for two GPUs.

Jawed · May 28, 2008

Razor1 said:
I wouldn't say significantly less than 2x the performance of the RV670. It's less, but not that much less than double.

With the MSAA/Z fillrate bottlenecks lifted (more than 2x faster than RV670), but with texturing performance only improved by clock rate (<40%) then it'll just be TEX limited the whole time - hence overall performance gain will be far short of 2x.
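A rough way to see why one slow stage caps the total is an Amdahl-style estimate. The sketch below assumes frame time is a simple sum of per-stage times (real GPUs overlap stages, so this is only a simplification), and the workload split is invented for illustration. The per-stage speedups follow the guesses in this thread: TEX by clock only, Z a bit over 2x, ALU by lane count and a shared clock.

```python
def overall_speedup(time_fractions, stage_speedups):
    """Amdahl-style estimate: each stage's share of frame time shrinks by its own speedup."""
    if abs(sum(time_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    new_time = sum(frac / stage_speedups[stage]
                   for stage, frac in time_fractions.items())
    return 1.0 / new_time


# Hypothetical share of RV670 frame time spent in each stage.
fractions = {"tex": 0.5, "z": 0.3, "alu": 0.2}

# Assumed RV770 gains, using the thread's guesses rather than confirmed specs.
speedups = {
    "tex": 850 / 775,                 # same 16 TUs, clock bump only
    "z": 2.2,                         # "more than 2x faster"
    "alu": (850 * 480) / (775 * 320)  # 480 lanes at a shared 850 MHz clock
}

estimate = overall_speedup(fractions, speedups)
print(f"Estimated overall speedup: {estimate:.2f}x")
```

Changing the TEX share in `fractions` is the interesting knob: the more texture-bound the workload, the closer the result sits to the bare clock increase.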

---

I have been wondering if an increase in ALU:TEX ratio is capable of improving TU utilisation. I also wonder if an increase in the number of threads in flight per SIMD might help TU utilisation. In other words I think there are mechanisms by which TEX throughput could scale better than expected simply because of threading inefficiencies in RV670.

Jawed

Jawed · May 28, 2008

nAo said:
IIRC Charlie wrote something about it a long time ago..

This is the other major hint that it's not merely a HD3870X2 kind of configuration:

http://www.hardforum.com/showpost.php?p=1031906993&postcount=28

Though there's no doubt the "impression" of something like the PLX chip in what appears to be HD4870X2's cooler doesn't bode well.

Jawed

Jawed · May 28, 2008

annihilator said:
But doesn't NUMA mean that in a multi-processor system there are separate memory chips for each processor? Or is its definition simply that "the memory access time depends on the memory location"?

http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access

I've also heard that the 4870x2 will be a 1GB card (and effectively 1GB, not 512MB like 3870x2); which means there will be only one memory space but two memory buses for two GPUs.

It's certainly expected.

I'm still unclear on the mechanism by which the two GPUs on HD3870X2 access memory - are they really locked-out of each other's space? Do all data transfers between them depend upon the CPU initiating PCI Express operations?

Jawed

annihilator · May 28, 2008

Jawed: If your guess is correct and 4870 is only 40% faster than 3870, I don't think AMD should even bother with an SRP of $350. And I don't understand why texturing performance scales only with core clock; from the Expreview chart it looks like texture units have doubled and it's the Z fillrate that only increases with core clock. What am I seeing wrong?

I agree that introducing an ALU:TEX ratio (which is bigger than 1) doesn't make sense for ATI, it should have been the other way around if they were to be asynchronous at all.

annihilator · May 28, 2008

I can't edit my posts :S Anyway, I wanted to say that whether it's the Z fillrate that only benefits from the clock increase or it's the texture fillrate, that would mean only a 40% increase in performance either way, since they were both bottlenecking the R600 family. But then the superbly increased (more than 2x) math processing rate is all a waste? Why would they do that?

Or are all of these speculations wrong, including the shader count and the clocks?

Jawed · May 28, 2008

annihilator said:
Jawed: If your guess is correct and 4870 is only 40% faster than 3870,

Put me down for 20-80% on average :smile:

I don't think AMD should even bother with an SRP of $350. And I don't understand why texturing performance scales only with core clock; from the Expreview chart it looks like texture units have doubled and it's the Z fillrate that only increases with core clock. What am I seeing wrong?

I'm going out on a limb, suggesting that RV770 is only a minor architectural change (4x Z per clock instead of 2x) with an increase in ALUs and clocks.

Jawed

Jawed · May 28, 2008

Copying-across from the other thread, via a low-bandwidth bridge:

MfA said:
Any moment now one of the mods is going to notice we now have two R7xx threads.

Before that happens though ... the most straightforward way to do better than crossfire/"SLI" is simply adding a bridge on the ring bus and connecting the chips through it, this requires minimum invasiveness in the rest of the design (everything else could be done in software). You would still need the bridge chip though.

This simple bridge is what I've been thinking too, but I don't understand the requirement for a bridge chip.

Come to think of it, my previous estimates of the needed bandwidth were inflated ... you would only need enough bandwidth for vertices and to copy finalized render buffers.

If the memory space is unified then the bandwidth associated with reading textures from the foreign MCs would have to be counted wouldn't it?

Jawed

Arty · May 28, 2008

Freak'n Big Panda said:
you guys are all going to shit bricks when rv770 is released

Fingers crossed!

Is this some insider opinion or just an opinion?

MfA · May 28, 2008

The chip still has to be set up and receive commands from the host; simply reading and writing on the ring bus won't let you do that.

You could just replicate the textures (except for the rendered ones).
