Using and in-order-core to perform out-of-order code search.

wildcardx · Jul 19, 2005

Is it possible to use an in-order-core to perform the task of finding code that can be run (out of order) at the hardware level or even using it run software to search for code and feed an IOC?

Why or why not?
And if so what are the limitations for each method?

ShootMyMonkey · Jul 19, 2005

The time taken up by such an algorithm run in software would probably cancel out any performance advantage. A more reasonable alternative would be a good compiler that re-orders the code such that instruction latencies are not so poorly hidden. Most any modern compiler does this, though if we'd never switched to OOOE (P6), they might have gone farther on this matter. Also, reordered instructions alone will help hide small stalls, but not big ones. OOO cores don't take code into the issue window out-of-order, but instead continue to issue non-dependent instructions when stalls occur such that instructions *complete* out of order (something that software OOOE won't really be able to do), so reordering instructions to reduce consecutive data dependencies is still necessary.

Moreover, you're talking about something that happens at the machine code level. It'd be damn annoying to manually write something that has to decode and determine the dependencies between instructions, which in general is not so different from emulating the processor you're anyway running on.

darkblu · Jul 19, 2005

yes, that'd be a property of virtual machines. don't know of any that do that atm, though.

Alpha_Spartan · Jul 19, 2005

With a good compiler, it's not necessary. Secondly, virtual machines have alot of overhead. That would do more harm than good.

Qroach · Jul 19, 2005

Perhaps dynamic recompilation would make more sense?

Sonic · Jul 20, 2005

This thread does not belong in this forum. Locked until moved to appropriate forum.

Rolf N · Jul 21, 2005

Alpha_Spartan said:
With a good compiler, it's not necessary.

Compiler decisions are static, which just cannot be as good as doing it dynamically at runtime.
Consider data dependent branches. A branch hint (which is the best a compiler can do anyway) that works well for one data set might be counterproductive for another. Further, while there is some coherency to conditional branch outcome more often than not, the path you're going to take most of the time may change over the lifetime of the application.

E.g.

Code:

while (1)
{
   float delta_t=time_since_last_frame();
   if (bullet_time_key_down()) delta_t*=0.25f;
   world->animate(delta_t);
}

The outcome of this branch doesn't just switch erratically. You'll much more likely have dozens to hundreds of frames in a row where it's taken, and then another run of many frames where it's not taken.
A dynamic branch predictor as implemented in current x86 CPUs takes just a few mispredicts to completely turn around and predict this branch correctly again. A static compiler "best guess" can't adapt at all.

Or take even cache hit vs cache miss on a simple memory access. You cannot statically schedule such things in an optimal fashion.

Aplha_Spartan said:
Secondly, virtual machines have alot of overhead. That would do more harm than good.

Depends on what kind of virtual machine you're referring to.
E.g. x86 on x86-type VMs have zero overhead for executing garden variety code. User level instructions can just execute as is. There is significant overhead though if the hosted code hits something that needs to be emulated (privileged resources such as hardware registers, descriptors etc).

nAo · Jul 21, 2005

zeckensack said:
A branch hint (which is the best a compiler can do anyway) that works well for one data set might be counterproductive for another.

If a branch hint can be dynamic and if a compiler has enough instructions to schedule ahead to make the hint effective you have win-win situation, the hint is never counterproductive.

Rolf N · Jul 21, 2005

nAo said:
zeckensack said:

A branch hint (which is the best a compiler can do anyway) that works well for one data set might be counterproductive for another.

Click to expand...

If a branch hint can be dynamic and if a compiler has enough instructions to schedule ahead to make the hint effective you have win-win situation, the hint is never counterproductive.

I'm not sure I understand what you're saying.
I only know of branch hints that are tied to a dynamic branch instruction. When the machine encounters the instruction, it continues issuing the instructions on the most likely branch, and (later) rolls back if that turned out wrong. I don't know any machines that a)leave any window for further instructions between a branch hint and the corresponding branch nor b)allow the hints themselves to change over time (without resorting to self-modifying code).

Of course you can combine compiler hints with dynamic hardware predictors, to the effect that the hint given by the compiler stacks with the predictor's current idea of the branch outcome. In that context I agree that there wouldn't be much of a misprediction problem. But then, that's not an entirely static model anymore. IMO the adaptive portion is much more important than the compiler hint. It doesn't hurt all that much to omit the hint if you have an adaptive hardware predictor.

ERP · Jul 21, 2005

zeckensack said:
nAo said:

zeckensack said:

A branch hint (which is the best a compiler can do anyway) that works well for one data set might be counterproductive for another.

Click to expand...

If a branch hint can be dynamic and if a compiler has enough instructions to schedule ahead to make the hint effective you have win-win situation, the hint is never counterproductive.

Click to expand...

I'm not sure I understand what you're saying.
I only know of branch hints that are tied to a dynamic branch instruction. When the machine encounters the instruction, it continues issuing the instructions on the most likely branch, and (later) rolls back if that turned out wrong. I don't know any machines that a)leave any window for further instructions between a branch hint and the corresponding branch nor b)allow the hints themselves to change over time (without resorting to self-modifying code).

Of course you can combine compiler hints with dynamic hardware predictors, to the effect that the hint given by the compiler stacks with the predictor's current idea of the branch outcome. In that context I agree that there wouldn't be much of a misprediction problem. But then, that's not an entirely static model anymore. IMO the adaptive portion is much more important than the compiler hint. It doesn't hurt all that much to omit the hint if you have an adaptive hardware predictor.

He's suggesting that the processor has a "hint" instruction that can be used before the branch to minimise the mispredict penalty.

I think branch prediction is actually the lesser of the evils for an in order core. Hiding cache and instruction latency is a much larger issue and it isn't necessarilly trivial. The mechanisms are understood, and I keep hearing "but a good compiler...." but I have yet to see a compiler do anything more than make a half assed attempt at it.

By that definition of "good compiler" all compilers suck.

Look at how long VCL takes to hide 4 cycle latency in trivial VU programs, and it still doesn't deal with the general case, put a branch inside a loop and it simply fails.

nAo · Jul 21, 2005

zeckensack said:
I'm not sure I understand what you're saying.
I only know of branch hints that are tied to a dynamic branch instruction.

I was talking about a hint instruction which takes as arguments dynamic values (registers, of course

) , so it can know in advance if a branch would be taken or not.
CELL's SPE processors should employ a mechanism like that.

nAo · Jul 21, 2005

ERP said:
By that definition of "good compiler" all compilers suck.

I admit I have zero experience in writing compilers but it doesn't seem to me an insurmountable problem to take advance of SPE's hint instruction, am I asking tuo much?

Look at how long VCL takes to hide 4 cycle latency in trivial VU programs, and it still doesn't deal with the general case, put a branch inside a loop and it simply fails.

Eheh..it takes AGES..I know

Even if branchy loops sucky code it's not entirely a VCL fault..it can't be done much better than that once you have a loop filled with instructions from iteration n-1th,nth,n+1th an do so on..
I try to avoid braches in my loops designing all my code to be composed of many branchless passes.

MfA · Jul 21, 2005

It probably just does an exhaustive search.

They need to start with something like CoSy this time, not gcc.

nAo · Jul 21, 2005

MfA said:
It probably just does an exhaustive search.

It does

They need to start with something like CoSy this time, not gcc.

Ehhe..VCL is just an advanced preprocessor anyway :?

ERP · Jul 21, 2005

nAo said:
MfA said:

It probably just does an exhaustive search.

Click to expand...

It does

They need to start with something like CoSy this time, not gcc.

Click to expand...

Ehhe..VCL is just an advanced preprocessor anyway :?

The only reason I brought up VCL is to demonstrate it's not a trivial problem despite first appearances. It has simple scheduling rules and short programs. Yet it can't solve the general case and it take forever to generate close to optimal solutions.

MfA · Jul 22, 2005

As long as the compiler can run in a distributed mode you can always just buy a cluster to speed up compilation

Rolf N · Jul 22, 2005

nAo said:
zeckensack said:

I'm not sure I understand what you're saying.
I only know of branch hints that are tied to a dynamic branch instruction.

Click to expand...

I was talking about a hint instruction which takes as arguments dynamic values (registers, of course ) , so it can know in advance if a branch would be taken or not.
CELL's SPE processors should employ a mechanism like that.

I still don't get it. To take the correct path, the branch predicate must be known. Likewise, to give the correct hint, the same (?) predicate must be known. I don't understand how that's beneficial. You're just introducing a second dependency on the branch condition. It must be computed anyway for the branch.

Can you whip up some (pseudo) code example to shed some light on the situation?

MfA · Jul 22, 2005

Nao seems to be mixing up split branches with branch hints, we dont know whether there will be dynamic branch hints AFAIK. Im not sure a dynamic branch hint would be usefull even.

What we do know is that it uses split branches (just like PowerPC). So as long as you have enough work to do after you know the predicate you can simply use an unconditional branch, no branch prediction necessary.

EDIT : Actually reading the ISSCC paper they say "it inserts branch hint instructions to identify branches and load the probable targets into the SMBTB". Hmmm that can mean a lot of things. On the face of it it sounds like it isnt really a split branch architecture as such, but instead has an instruction which both hints (probably statically) which way the next branch will go and what it's target is. Since the target is dynamic though, you would be able to use that type of instruction for a dynamic hint ... as the static hint use branch taken, and if during run time you think it wont be then store the address of the instruction after the branch in the BTB (assuming it's smart enough not to rewind even though the direction hint was wrong).

Anyway that is all irrelevant, the important thing is that if you have work which can be done in between the predicate being known and the branch then you will never waste time ... unlike with say the x86 (where you are at the mercy of the branch prediction).

Oh BTW, on the example ... never no way would the history for the branch still be available after rendering an entire frame

nAo · Jul 22, 2005

zeckensack said:
I still don't get it. To take the correct path, the branch predicate must be known. Likewise, to give the correct hint, the same (?) predicate must be known.

Right

I don't understand how that's beneficial. You're just introducing a second dependency on the branch condition. It must be computed anyway for the branch.

To tell the truth the 2 branch predicates are one, they're the same thing, of course.
It's beneficial to dynamically tell to the CPU which path will be taken if you can do it many cycles before that path will be taken and you have some useful work to schedule between the hint and its associated branch.
This way one can move the branch 'misprediction' penalty and thus it can be hidden if we have some useful work to do.

Can you whip up some (pseudo) code example to shed some light on the situation?

Sure:
//////////////////////////////////////////////////////////////
float fSide = Dot3(vLocalEye, vLocalNormal)

//Hint can be placed here..
//A branch will be taken in the future according fSide value

// all these matrix vector muls can be schedule after the hint and before the branch
vEye = mLocal2World*vLocalEye;
vNormal = mLocal2World*mLocalNormal;

// here we have a branch..
if (fSide >= 0.0f) vColor = ComputeFrontLighthing(vEye, vNormal, vLight)
else vColor = ComputeBackLighthing(vEye, vNormal, vLight)
//////////////////////////////////////////////////////////////

If vEye and vNormal computation is comprised of enough instructions to hide our processor pipeline length (+ hint instruction latency) we'd have no branch penalty at all, even if fSide is a completely random value.
I think no BPU can beat that

Using and in-order-core to perform out-of-order code search.

wildcardx

ShootMyMonkey

darkblu

Alpha_Spartan

Qroach

Sonic

Senior Member

Rolf N

Recurring Membmare

nAo

Nutella Nutellae

Rolf N

Recurring Membmare

ERP

nAo

Nutella Nutellae

nAo

Nutella Nutellae

MfA

nAo

Nutella Nutellae

ERP

MfA

Rolf N

Recurring Membmare

MfA

nAo

Nutella Nutellae

Similar threads