R700 Inter-GPU Connection Discussion

Kinda off topic on this, but interesting to see what AMD might have next in line. Kyle at [H] posted this

R800 will be a single GPU design and Bulldozer will do reverse hyperthreading.

What in the world is reverse hyperthreading?
 
Multiple cores appearing as one logical core. Personally I think that's BS, and if it isn't, why would they go that way for the CPU but back to monolithic for the GPU, even though that's what they have seemed to focus on?
 
The alleged benefit of reverse hyperthreading is auto-parallelization: taking a single monolithic software task and scaling its execution across multiple cores without any effort on the part of the software developers. This saves the developers from the tedious and error-prone manual parallelization process and from rewriting software around more complicated algorithms.

The reason why they're moving in seemingly different directions is the nature of each realm. Typical GPU tasks are inherently parallel behind the scenes (read: at the sub-pixel level). Typical CPU tasks are inherently sequential.
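
To make that contrast concrete, here's a minimal sketch of my own (not from anyone in the thread): the first loop's iterations are completely independent, so they can be spread across as many cores or shader units as you like, while the second carries a value from one iteration to the next and has to run in order.

```c
#include <stddef.h>

/* Inherently parallel: every pixel is computed independently, so the
 * iterations can be handed out to any number of execution units.
 * (delta is assumed to be >= 0 here, just to keep the sketch short.) */
void brighten(unsigned char *pixels, size_t n, int delta)
{
    for (size_t i = 0; i < n; i++) {
        int v = pixels[i] + delta;
        pixels[i] = (unsigned char)(v > 255 ? 255 : v);
    }
}

/* Inherently sequential: each iteration depends on the previous one
 * (a loop-carried dependence), so extra cores don't help without
 * restructuring the algorithm itself. */
double running_average(const double *samples, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc = acc * 0.9 + samples[i] * 0.1;
    return acc;
}
```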
 
I'm waiting to buy my 4870 until more info comes out about this card. If the multi-GPU issues are solved, I'm there. I just wish ATI would be more forthcoming with information, because I would imagine many people like me are waiting to find out if the R700 is really bringing home the bacon. I understand why they would want to wait, but it's not like Nvidia can pull a similar card out of a hat if ATI has truly solved the multi-GPU problem.
 
While shared memory seems increasingly unlikely, someone did mention that in this pic: http://www.ocxtreme.org/opb/hd4850/r700slide.JPG

The GPU-Z screenshot shows 1GB of memory being detected. Now obviously that's an ATI PR slide and who knows if they photoshopped anything, but a few tidbits are that the core clock is still 750 MHz (might be changed soon) and that the memory is showing 1GB.

No idea if GPU-Z actually detects the memory itself or if the memory values are in a database, but I figure that detecting memory is one of the easier things to do.
 
What does reverse HT really help with in the end, anyway?

Mitosis seems on the pessimistic extreme end of scaling. Ugh.

I do, however, believe in the R800/single chip rumour. In fact, I had one of them made when RV670 debuted. ;)
 
Supposedly "Reverse Hyperthreading" is a form of speculative execution, similar to what you see in (for example) the Itanium processor. If you encounter a branch in the code, which is dependent on a calculated value, then, in a conventional processor, you risk a pipeline stall: it has to try and predict which branch to take before the calculated value comes out of the pipeline, and, if it gets it wrong, you have to flush out the entire pipeline to start on the other branch.

With speculative execution the processor immediately starts to execute both branches at once. Once the calculated value comes out of the pipeline, it discards one branch and continues with the other without interruption.

This obviously uses up considerably more total CPU time, but, if you've got an otherwise single-threaded application it allows you to make use of what would otherwise be an idle second processor core - and the result is that you never experience any pipeline stalls due to branch prediction going wrong (and, effectively, you don't even need any branch prediction hardware any more).
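
A toy sketch of how that would look, using ordinary software threads to stand in for what would really be hardware contexts (every name here is made up purely for illustration; real hardware also couldn't let either path have visible side effects):

```c
#include <pthread.h>
#include <stdio.h>

struct path { long input; long result; long (*fn)(long); };

static long if_path(long x)   { return x * 3 + 1; }  /* work on the "taken" side     */
static long else_path(long x) { return x / 2; }      /* work on the "not taken" side */

static void *run_path(void *arg)
{
    struct path *p = arg;
    p->result = p->fn(p->input);
    return NULL;
}

/* Stand-in for the long-latency calculation the branch depends on. */
static long slow_condition(long x)
{
    for (volatile long i = 0; i < 10000000; i++) {}
    return x & 1;
}

int main(void)
{
    long x = 12345;
    struct path a = { x, 0, if_path };
    struct path b = { x, 0, else_path };
    pthread_t ta, tb;

    /* Eagerly start both sides of the branch before the condition is known. */
    pthread_create(&ta, NULL, run_path, &a);
    pthread_create(&tb, NULL, run_path, &b);

    long cond = slow_condition(x);   /* the "pipeline" finally produces the value */

    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    /* Keep the correct path's result and throw the other one away. */
    printf("result = %ld\n", cond ? a.result : b.result);
    return 0;
}
```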

Personally I've always been thoroughly sceptical about "Reverse Hyperthreading". Speculative execution works nicely in Itanium, but, like many other Itanium features, it is very strongly dependent on the compiler churning out code that makes use of the feature. The claim with "Reverse Hyperthreading" was that AMD chips would be able to do the same thing with legacy code without recompiling it. That sounds a lot less likely to me. I'm not sure there's ever been any evidence that "Reverse Hyperthreading" is a real feature, as opposed to one dreamed up in the fevered imaginations of especially rabid AMD fanboys.
 
While shared memory seems increasingly unlikely, someone did mention that in this pic: http://www.ocxtreme.org/opb/hd4850/r700slide.JPG

The GPU-Z screenshot shows 1GB of memory being detected. Now obviously that's an ATI PR slide and who knows if they photoshopped anything, but a few tidbits are that the core clock is still 750 MHz (might be changed soon) and that the memory is showing 1GB.

No idea if GPU-Z actually detects the memory itself or if the memory values are in a database, but I figure that detecting memory is one of the easier things to do.

The slide is legit.
 
With speculative execution the processor immediately starts to execute both branches at once. Once the calculated value comes out of the pipeline, it discards one branch and continues with the other without interruption.
The case you're discussing with Itanium is actually not unique. Compilers can leverage predicated instructions to unconditionally fold branch outcomes into a single code stream.
GPUs have predication as well.
x86 and other CPU ISAs without predication can do something similar with conditional moves. x86 could do more if it actually had more registers.
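
For instance (a hypothetical snippet, not from the thread), a compiler targeting x86 will usually lower a simple select like this to a cmov rather than a conditional branch, so there is nothing left for the predictor to get wrong:

```c
/* Typically compiles to something like cmp + cmovle on x86: no jump at all. */
int select_min(int a, int b)
{
    return (a < b) ? a : b;
}

/* The flip side, discussed below: the cmov can't complete until BOTH inputs
 * and the comparison are ready, an explicit data dependence that a correctly
 * predicted branch would not impose. */
```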

This is used carefully in compiled code because a succession of branches can explode the number of instructions that have to be coalesced into a single stream.

Such branch folding is actually something of a bad thing for OoO CPUs, as predication and conditional moves force an explicit data dependence that dynamic scheduling cannot break.

This obviously uses up considerably more total CPU time, but, if you've got an otherwise single-threaded application it allows you to make use of what would otherwise be an idle second processor core - and the result is that you never experience any pipeline stalls due to branch prediction going wrong (and, effectively, you don't even need any branch prediction hardware any more).
It would also be a waste more than 90% of the time.
That ~90% is the proportion of branches that branch prediction already gets right.
For those branches the CPU would be consuming twice the resources and twice the power for the same amount of work.
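
A rough back-of-envelope version of that argument; the ~90% accuracy figure comes from the post, everything else is an arbitrary unit purely for illustration:

```c
#include <stdio.h>

int main(void)
{
    double accuracy  = 0.90;  /* fraction of branches predicted correctly (figure from the post) */
    double path_work = 1.0;   /* work to execute one side of a branch, in arbitrary units        */

    /* Eager execution: both sides always run, so one side is always thrown away. */
    double eager_waste   = 1.0 * path_work;

    /* Prediction: work is only squashed on the ~10% of mispredictions, assuming
     * roughly one path's worth of work is in flight when that happens. */
    double predict_waste = (1.0 - accuracy) * path_work;

    printf("wasted work per branch, eager execution: %.2f\n", eager_waste);   /* 1.00 */
    printf("wasted work per branch, prediction:      %.2f\n", predict_waste); /* 0.10 */
    return 0;
}
```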

If this were implemented, I'd bet there would be a predictor structure that basically tracks branches the CPU keeps on mispredicting, and only then would it use such capability.
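
Something like a per-branch confidence counter, in other words. A sketch of what the bookkeeping might look like, with names and thresholds entirely made up (nothing here is based on anything AMD has actually described):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical per-branch entry: a 2-bit saturating counter that rises on
 * mispredictions and falls on correct predictions. */
struct branch_confidence {
    uint8_t mispredict_level;   /* 0..3 */
};

#define DUAL_PATH_THRESHOLD 3   /* arbitrary: only fork for chronically mispredicted branches */

/* Update once the real branch outcome is known. */
static void update(struct branch_confidence *bc, bool mispredicted)
{
    if (mispredicted) {
        if (bc->mispredict_level < 3) bc->mispredict_level++;
    } else {
        if (bc->mispredict_level > 0) bc->mispredict_level--;
    }
}

/* Consult at fetch time: execute both paths only for branches the
 * predictor keeps getting wrong. */
static bool should_dual_path(const struct branch_confidence *bc)
{
    return bc->mispredict_level >= DUAL_PATH_THRESHOLD;
}
```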

The claim with "Reverse Hyperthreading" was that AMD chips would be able to do the same thing with legacy code without recompiling it. That sounds a lot less likely to me. I'm not sure there's ever been any evidence that "Reverse Hyperthreading" is a real feature, as opposed to one dreamed up in the fevered imaginations of especially rabid AMD fanboys.

The possibly more realistic claims I've seen indicate a more modest sharing of units. I haven't seen claims about executing down both branch paths.

edit:
Back on topic:

It appears from the way the chips are aligned in the board pics that each RV770 has the sideband port along one side of the die.
Each chip is rotated 180 degrees from the other.
This fits with a two-lane bus of some kind, with each chip's in lane lining up with an out lane from the other.
It does seem to put a limit of 2 chips per board, at least for this implementation.
 
The slide is legit.

Yeah, the slide's legit. I'm just wondering whether GPU-Z detecting RAM is actually just GPU-Z looking it up in a database like other things, or whether it really is detecting 1GB for that GPU. So either the R700 is 2 x 1GB or 1GB shared... hm!
 
Kinda off topic on this, but interesting to see what AMD might have next in line. Kyle at [H] posted this

LOL, wrong on both counts. Kyle must have the same AMD sources as Fuad.

What in the world is reverse hyperthreading?

Something that doesn't exist, sadly. It's the "holy grail" method of extracting instruction level parallelism for multi-core CPUs. I started a discussion on RHT @ RWT at the time that Inq article came out and it was soundly trounced. Great idea, simply unrealistic to implement.

The idea was to create a control scheme by which a single thread could be parallelized across multiple homogeneous cores, presumably in an x86 CPU (hence the name).
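
Put another way, the hardware would have to do transparently, on an unmodified single-threaded binary, roughly what a developer does by hand today. A plain manual example of that kind of split (my own sketch, nothing to do with whatever AMD may or may not have planned):

```c
#include <pthread.h>
#include <stdio.h>

/* Manual parallelization: split one sequential sum across two cores
 * and combine the partial results at the end. */
struct slice { const int *data; size_t len; long sum; };

static void *partial_sum(void *arg)
{
    struct slice *s = arg;
    s->sum = 0;
    for (size_t i = 0; i < s->len; i++)
        s->sum += s->data[i];
    return NULL;
}

int main(void)
{
    int data[1000];
    for (int i = 0; i < 1000; i++) data[i] = i;

    struct slice lo = { data,       500, 0 };
    struct slice hi = { data + 500, 500, 0 };
    pthread_t t;

    pthread_create(&t, NULL, partial_sum, &hi);  /* second core */
    partial_sum(&lo);                            /* first core  */
    pthread_join(t, NULL);

    printf("sum = %ld\n", lo.sum + hi.sum);      /* 499500 */
    return 0;
}
```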
 
3dilettante/nicolasb:

WRT your RHT definition(s) and speculative execution/full branch traversal (multiple independent branch path execution)... jeez, that's a mouthful :p

Don't several in-order execution CPUs lacking predication hardware do this already? Cell, for instance.
 
Yeah, the slide's legit. I'm just wondering whether GPU-Z detecting RAM is actually just GPU-Z looking it up in a database like other things, or whether it really is detecting 1GB for that GPU. So either the R700 is 2 x 1GB or 1GB shared... hm!

I haven't used this pic in months!
burns_excellent.jpg

Moohoohoohaha! :devilish:

Come on, show me a shared memory architecture you mo fackies! Give consumers the biggest revolution in GPU performance - ever.
 
LOL, wrong on both counts. Kyle must have the same AMD sources as Fuad.

I don't think so. Kyle was probably the first to back up, like 6 months ago, the theory that R700 was not just CrossFire on a card, which now seems extremely likely.
I think that with that post he's hinting that R800, even if it is a multi-GPU card, will be seen by the system as a single-GPU card.
 
I don't think so. Kyle was probably the first to back up, like 6 months ago, the theory that R700 was not just CrossFire on a card, which now seems extremely likely.
I think that with that post he's hinting that R800, even if it is a multi-GPU card, will be seen by the system as a single-GPU card.

Or he's simply joking.
 