Anti-Hyperthreading and GPUs

epicstruggle

AMD recently mentioned that future chips might contain an Anti-Hyperthreading (AH) feature, essentially making a single-threaded app run across 2+ cores. My question: given GPUs' highly parallelized nature, would AH bring anything to the table, given recent events/advancements (SLI, CrossFire, Quad SLI) in the graphics world?

epic
 
epicstruggle said:
AMD recently mentioned that future chips might contain an Anti-Hyperthreading (AH) feature, essentially making a single-threaded app run across 2+ cores. My question: given GPUs' highly parallelized nature, would AH bring anything to the table, given recent events/advancements (SLI, CrossFire, Quad SLI) in the graphics world?

epic

I was actually thinking in the other direction: that AH is actually CPUs trying to take a page out of GPUs' book (the aforementioned SLI/CrossFire).
 
The most important thing would be free performance gains again: an ordinary app with no or only mediocre multithreading design would then run on both (all...) cores automagically, thus being up to twice (or 4x, etc.) as fast as on a same-clocked single CPU.

But while, yes, a similar thing could put single-threaded apps onto GPUs over more than one thread/pipeline, there is no real use for this. At least there doesn't seem to be any.
 
davepermen said:
The most important thing would be free performance gains again: an ordinary app with no or only mediocre multithreading design would then run on both (all...) cores automagically, thus being up to twice (or 4x, etc.) as fast as on a same-clocked single CPU.

First off, I think this report is all very fuzzy. That said:

I would think that most of those gains would be rather insignificant, since both AMD and Intel have already gone pretty far to exploit as much instruction-level parallelism out of legacy code as practical. I really doubt we'd see 2x gains for such code on a dual-core CPU using this rumored technology; even 50% seems unlikely. But I guess if you've got a single-threaded app using ~100% of one core while the other core sits idle, even a 15% speedup might seem worthwhile.
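As a rough sketch of my own (not anything from the report), this is the kind of dependency-bound loop where extra execution units buy you almost nothing:

/* Loop-carried dependency chain: each iteration needs the previous
   sum before its own add can start, so the loop is limited by adder
   latency, not by how many adders the core has (or could borrow
   from a second core). */
double serial_sum(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += a[i];   /* waits on the previous iteration's sum */
    return sum;
}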

One place it might pay off is if one core could borrow SIMD units from the other core to execute SSE1/2/3 instructions, especially for AMD, which, if I remember correctly, doesn't go as wide as Intel when executing SIMD code. That means there is probably a fair amount of single-threaded multimedia and game code out there that could run faster on an A64 that could somehow borrow SIMD units from the other core.
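For what I mean by SIMD-heavy single-threaded code, picture something like this (just an illustrative sketch, not taken from any real game or codec; assumes n is a multiple of 4):

#include <xmmintrin.h>   /* SSE intrinsics */

/* Two independent packed multiplies per iteration: with more SIMD
   units available (e.g. borrowed from the second core), both could
   issue in the same cycle instead of back to back. */
void scale_two(const float *a, const float *b,
               float *out1, float *out2, int n)
{
    __m128 k1 = _mm_set1_ps(1.5f);
    __m128 k2 = _mm_set1_ps(0.5f);
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out1 + i, _mm_mul_ps(va, k1));  /* independent ... */
        _mm_storeu_ps(out2 + i, _mm_mul_ps(vb, k2));  /* ... multiplies */
    }
}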

davepermen said:
But while, yes, a similar thing could put single-threaded apps onto GPUs over more than one thread/pipeline, there is no real use for this. At least there doesn't seem to be any.

GPUs are already highly parallel execution engines. If GPU designers have more transistors to work with, they'll just increase the number of functional units and beef up the infrastructure for keeping them fed. When they run out of transistors, they find ways to rope a whole other GPU into the mix (a la SLI and CrossFire).
 
epicstruggle said:
would physics calculations benefit from it?
Yes. Basically what you get is a CPU with double the number of execution units (compared to a regular single-core), capable of running two threads (instead of two separate sets of execution units, each capable of running its own thread).

So if you have, for example, a thread that is very floating-point intensive, and at a certain clock cycle there are two independent multiplications ready to be executed (these can be quite far apart in the instruction stream), they can both start in the same clock cycle. On a regular single-core the second multiplication would have had to wait one clock cycle to go to the multiplier pipeline.
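In code, the situation looks something like this (a hypothetical example of mine, not from the original explanation):

/* The two products share no operands, so a core with two FP
   multiplier pipes -- or one that can somehow use the second
   core's multiplier -- can start both in the same clock cycle;
   with a single multiplier pipe the second multiply waits a cycle. */
double sum_of_products(double x, double y, double u, double v)
{
    double p1 = x * y;   /* independent multiply #1 */
    double p2 = u * v;   /* independent multiply #2 */
    return p1 + p2;      /* only the final add depends on both */
}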

So theoretically it could double the speed of a single thread. And although it shares the execution units with a second thread, both can run at higher performance than on a regular dual-core if the usage of execution units doesn't overlap too much.

In practice, for applications that didn't get any specific optimizations, the results will be modest. I did some experiments with a processor emulator a while ago in which the number of execution units could be varied. Doubling the number of execution units (starting from a configuration that was already well balanced) increased performance by 20-60%, if I recall correctly. Optimized applications can reach around 80%. All single-threaded.
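To make the earlier point about execution-unit usage not overlapping too much concrete, here is a made-up pair of threads (my own illustration, not the emulator experiment): one does only floating-point multiplies, the other only integer ALU work, so they should share a doubled pool of execution units with little interference:

#include <pthread.h>
#include <stdio.h>

/* Thread 1: only floating-point multiplies. */
static void *fp_work(void *arg)
{
    double x = 1.0;
    for (long i = 0; i < 100000000L; ++i)
        x = x * 1.0000001 * 0.9999999;   /* FP multiplier only */
    *(double *)arg = x;
    return NULL;
}

/* Thread 2: only integer shifts/xors (integer ALU only). */
static void *int_work(void *arg)
{
    unsigned long h = 2463534242UL;
    for (long i = 0; i < 100000000L; ++i) {
        h ^= h << 13;
        h ^= h >> 7;
        h ^= h << 17;
    }
    *(unsigned long *)arg = h;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    double d;
    unsigned long u;
    pthread_create(&t1, NULL, fp_work, &d);
    pthread_create(&t2, NULL, int_work, &u);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%f %lu\n", d, u);   /* keep the results live */
    return 0;
}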
 
I thought it was Reverse-Hyperthreading, not Anti... Did it get lost in translation, or are they different things?
 