Prescott's HyperThreading 2 == 4 logical CPUs?

I doubt it.

If you run multiple threads the chances of getting instructions that aren't in the trace cache increase, which can reduce IPC.

Hyper Threading from what I know/understand of it simply polls threads on alternate cycles for instructions. IMO this scheme shows diminishing returns rather quickly.

Even with large caches, cache thrashing will be a big issue.
 
Saem said:
I doubt it.

If you run multiple threads the chances of getting instructions that aren't in the trace cache increase, which can reduce IPC.

Hyper Threading from what I know/understand of it simply polls threads on alternate cycles for instructions. IMO this scheme shows diminishing returns rather quickly.

Even with large caches, cache thrashing will be a big issue.

No... it doesn't poll alternate cycles. It actually issues instructions for two threads each cycle. :)
 
IMHO unless Prescott has more execution resources, the effect of 4 threads will be quite minimal most of the time. Of course, for latency-sensitive applications, more threads will help. However, this would make it more like an MTA than an SMT design. The PC memory architecture is not particularly well suited to an MTA.
 
No... it doesn't poll alternate cycles. It actually issues instructions for two threads each cycle.

Issuing and fetching instructions from two code streams is another matter altogether.
 
Tagrineth said:
No... it doesn't poll alternate cycles. It actually issues instructions for two threads each cycle. :)

I don't think it does even that to be frank. I believe it simply switches thread when it runs into a page fault and tries to run the alternate thread while waiting for data for the first one.

I don't see how the P4 could issue instructions for two separate threads within the same execution units. That does not sound feasible at all.


*G*
 
Grall said:
I don't think it does even that to be frank. I believe it simply switches thread when it runs into a page fault and tries to run the alternate thread while waiting for data for the first one.

I don't see how the P4 could issue instructions for two separate threads within the same execution units. That does not sound feasible at all.


*G*
Yet, Intel claims that the architecture is capable of issuing instructions from 2 threads at the same time. Just changing threads at page fault time buys you nothing over software threading, as you still need to take a context switch and actually handle the page fault in software. Are you perhaps thinking of cache misses? Even so, if you change active threads only at cache miss time, you still need to cancel lots of already-issued instructions and take a full ~20-cycle pipeline bubble. The P4 is an Out-Of-Order architecture, meaning that it could issue lots of instructions before, and even after, the cache miss is detected, even to the point of overlapping multiple misses. In OOO architectures, it is actually a fair bit cheaper/easier to execute instructions from multiple threads at the same time than to do the switch-on-event multithreading that you suggest.
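
To put rough numbers on that, here is a toy cycle-counting sketch in Python. It is not a model of the real P4: it assumes a single-issue core, a fixed miss latency, and traces made up of "x" (one cycle of work) and "M" (a cache miss). The function names, the trace format and the 10-cycle miss latency are all invented for illustration; only the ~20-cycle bubble comes from the discussion above.

```python
def smt_cycles(threads, miss_latency):
    """Toy SMT: each cycle, issue one op from any thread not waiting on a miss."""
    n = len(threads)
    pc = [0] * n          # next op index per thread
    ready_at = [0] * n    # cycle at which each thread's pending miss resolves
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        for t in range(n):
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                if threads[t][pc[t]] == "M":
                    # the missing thread stalls; other threads keep issuing
                    ready_at[t] = cycle + 1 + miss_latency
                pc[t] += 1
                break
        cycle += 1
    return max(cycle, *ready_at)


def switch_on_miss_cycles(threads, miss_latency, flush_penalty):
    """Toy switch-on-event: run one thread until it misses, flush, then switch."""
    n = len(threads)
    pc = [0] * n
    ready_at = [0] * n
    cycle = 0
    active = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        t = active
        if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
            op = threads[t][pc[t]]
            pc[t] += 1
            cycle += 1
            if op == "M":
                ready_at[t] = cycle + miss_latency
                cycle += flush_penalty      # pipeline bubble paid on every switch
                active = (t + 1) % n
        else:
            # active thread is finished or stalled: pick another runnable one
            runnable = [u for u in range(n)
                        if pc[u] < len(threads[u]) and ready_at[u] <= cycle]
            if runnable:
                active = runnable[0]
            else:
                cycle += 1                  # everyone is stalled, idle a cycle
    return max(cycle, *ready_at)


traces = ["xMxx", "xxxx"]   # hypothetical instruction traces for two threads
print(smt_cycles(traces, miss_latency=10))                              # 14
print(switch_on_miss_cycles(traces, miss_latency=10, flush_penalty=20)) # 28
```

With these made-up numbers the flush costs more than the miss it is trying to hide, so switching on the miss ends up slower than issuing from both threads. Set flush_penalty to 0 and the two schemes tie on this trace.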
 
I don't see how the P4 could issue instructions for two separate threads within the same execution units. That does not sound feasible at all.

From my understanding, once micro-ops are in the reorder buffer, the rest of the pipeline doesn't have to be much concerned with which thread an instruction came from. It is only important when the instructions are being retired, since that's where you have to know which thread the results of the operations belong to.
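
A minimal Python sketch of that idea, assuming each micro-op simply carries a thread tag. The Uop class, its fields and the register names are hypothetical and ignore real details such as register renaming. The execution step never looks at the tag; only retirement uses it to pick the right register file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uop:
    thread: int                 # logical CPU this micro-op was fetched for
    dest: str                   # destination register name (illustrative)
    a: int
    b: int
    result: Optional[int] = None


def execute(uop):
    # Shared execution unit: a plain add that never inspects uop.thread.
    uop.result = uop.a + uop.b


def run(rob):
    # Execute in an arbitrary (here: reversed) order to mimic out-of-order issue.
    for uop in reversed(rob):
        execute(uop)
    # Retire in program order; only here does the thread tag matter.
    regs = {}
    for uop in rob:
        regs.setdefault(uop.thread, {})[uop.dest] = uop.result
    return regs


rob = [Uop(0, "eax", 1, 2), Uop(1, "eax", 10, 20), Uop(0, "ebx", 3, 4)]
print(run(rob))  # {0: {'eax': 3, 'ebx': 7}, 1: {'eax': 30}}
```

Both threads write "eax" here, yet the results don't collide, because retirement files each one under its own thread.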
 