Post your code optimizations

Nick said:
And here's another minor reason:
Code:
int i = 0;
int j = i+++i;
I hope you don't write code like this, but can you guess the value of j?
That is a reason to group ambiguous-looking operations with parentheses. And for not using a variable twice in the same expression when using postincrement. But not one for not using preincrement, as it isn't even used here.
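The lexer takes the longest token it can ("maximal munch"), so `i+++i` reads as `(i++) + i`. Adding those parentheses helps readability, but the expression is still undefined because `i` is modified and read without an intervening sequence point. What follows is a minimal sketch of splitting it into separate statements. The helper names `old_plus_new` and `old_plus_old` are hypothetical, and which one matches the author's intent is a guess, since `i+++i` itself has no defined value.

```c
/* Hypothetical helper for the intent "old value of i plus new value of i". */
int old_plus_new(int *i)
{
    int old = (*i)++;   /* the increment completes at the end of this statement */
    return old + *i;    /* reads the already-incremented value */
}

/* Hypothetical helper for the intent "old value of i, doubled, then increment". */
int old_plus_old(int *i)
{
    int old = *i;
    (*i)++;             /* increment as a separate statement */
    return old + old;
}
```

Starting from `i = 0`, the first gives 1 and the second gives 0, and both leave `i` at 1.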
 
spookysys said:
Note:
foo = foo ^ (foo>>31)
Takes the absolute value (almost :) and compiles to 1 instruction: EOR with ASR.
Takes the absolute? Absolutely not (and that's even assuming that right-shifting a negative signed value is an arithmetic shift on your compiler and processor).

For a 2's complement number, surely you would need...
Code:
 foo = (foo ^ (foo>>31))+ (((unsigned int)foo)>>31)
..since -x == (~x)+1

Anyway, for something so common, isn't the C built-in "abs()" likely to do a good job?

(EDIT added missing unsigned)
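A small sketch comparing the one-instruction approximation with Simon F's corrected version. It assumes 32-bit two's complement and that `>>` on a negative signed value is an arithmetic shift (implementation-defined in C, though that is what GCC and Clang do on ARM and x86). The function names are hypothetical.

```c
#include <stdint.h>

/* spookysys's version: exact for x >= 0, one too small for x < 0,
 * because x ^ -1 == ~x == -x - 1. */
int32_t abs_approx(int32_t x)
{
    int32_t mask = x >> 31;   /* 0 for x >= 0, -1 (all ones) for x < 0 */
    return x ^ mask;
}

/* Simon F's correction: add the sign bit back, since -x == ~x + 1.
 * The result for INT32_MIN is not meaningful, just as with abs(). */
int32_t abs_corrected(int32_t x)
{
    int32_t mask = x >> 31;
    uint32_t sign = (uint32_t)x >> 31;   /* logical shift: 1 for x < 0, else 0 */
    return (int32_t)((uint32_t)(x ^ mask) + sign);
}
```

For example, `abs_approx(-5)` yields 4 while `abs_corrected(-5)` yields 5.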
 
Simon F said:
For a 2's complement number, surely you would need...
Code:
 foo = (foo ^ (foo>>31))+ (foo>>31)
..since -x == (~x)+1
But then shouldn't it be
Code:
 foo = (foo ^ (foo>>31)) - (foo>>31)
since (foo >> 31) is -1 if the MSB is replicated?
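A sketch of Xmas's variant under the same assumptions (32-bit two's complement, arithmetic `>>` on signed values). Because the mask is either 0 or -1, subtracting it adds 1 exactly when the input is negative, so no separate unsigned shift is needed. The function name is hypothetical.

```c
#include <stdint.h>

/* (x ^ mask) - mask: for x < 0 this is ~x - (-1) == ~x + 1 == -x;
 * for x >= 0 it is x ^ 0 - 0 == x. Overflows (undefined) for INT32_MIN. */
int32_t abs_mask_sub(int32_t x)
{
    int32_t mask = x >> 31;   /* 0 or -1 */
    return (x ^ mask) - mask;
}
```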
 
Xmas said:
But then shouldn't it be
Code:
 foo = (foo ^ (foo>>31)) - (foo>>31)
since (foo >> 31) is -1 if the MSB is replicated?
Damn... I forgot to add "unsigned" to the second one, but what you've got would save instructions.
 
Simon F said:
Takes the absolute? Absolutely not (and that's even assuming that right-shifting a negative signed value is an arithmetic shift on your compiler and processor).

For a 2's complement number, surely you would need...
Code:
 foo = (foo ^ (foo>>31))+ (((unsigned int)foo)>>31)
..since -x == (~x)+1

Anyway, for something so common, isn't the C built-in "abs()" likely to do a good job?

(EDIT added missing unsigned)

Note the "almost" in my sentence. Yes, it will be 1 off for negative numbers, but it is good enough by far. The point is it compiles to 1 instruction on the ARM7.
Look, it's a 10-cycle approximation algorithm.
It's 10 cycles on ARM.
 
spookysys said:
Note the "almost" in my sentence. Yes, it will be 1 off for negative numbers, but it is good enough by far. The point is it compiles to 1 instruction on the ARM7.
Look, it's a 10-cycle approximation algorithm.
It's 10 cycles on ARM.
Let me see if I'm understanding what you're saying here.

The full abs() function takes 10 cycles. But the approximation you posted to the abs() function only takes 1?

If this approximation is so much faster, why not overload the abs() function with one that uses that approximation, but corrects negative numbers by adding the value of the sign bit, as Simon F seems to be proposing? Surely that will take fewer than 9 instructions? (Though it may be better to reinterpret foo as an unsigned int via *(unsigned int*)&foo instead, as that shouldn't require any operations.)
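A sketch of the two ways of reading the sign bit mentioned above, assuming 32-bit two's complement. The pointer version follows the post's suggestion; accessing a signed object through its unsigned counterpart is permitted by C's aliasing rules, and `memcpy` is the more general idiom. On two's-complement targets the plain value conversion typically compiles to no instruction either, so both usually produce the same code. Function names are hypothetical.

```c
#include <stdint.h>

/* Value conversion: well defined for every input (negative x maps to
 * x + 2^32), then a logical shift leaves just the sign bit. */
uint32_t sign_bit_convert(int32_t x)
{
    return (uint32_t)x >> 31;
}

/* Reinterpreting the same bits through an unsigned pointer. */
uint32_t sign_bit_pun(int32_t x)
{
    return *(const uint32_t *)&x >> 31;
}
```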
 
spookysys said:
Note the "almost" in my sentence. Yes, it will be 1 off for negative numbers, but it is good enough by far. The point is it compiles to 1 instruction on the ARM7.
Look, it's a 10-cycle approximation algorithm.
It's 10 cycles on ARM.
I don't know the ARM instruction set but the one thing I remember was that it has conditional execution. Can't you test for "-" and conditionally negate?
 
Simon F said:
I don't know the ARM instruction set but the one thing I remember was that it has conditional execution. Can't you test for "-" and conditionally negate?
You can. (In fact there's no neg instruction in the ARM ISA, so you have to "conditionally subtract from zero".)
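An exact abs written as "negate only if negative", which is the shape a compiler could map onto ARM conditional execution. The assembly in the comment is only an illustration of the kind of sequence one might expect (compare, then a conditional reverse subtract from zero); actual output depends on the compiler and flags. The function name is hypothetical.

```c
#include <stdint.h>

/* On ARM (no NEG instruction) this might become something like:
 *     cmp   r0, #0
 *     rsblt r0, r0, #0     @ r0 = 0 - r0, only when r0 < 0
 * or the flags may come from an earlier instruction with an S suffix.
 * Overflows (undefined) for INT32_MIN, like abs(). */
int32_t abs_conditional(int32_t x)
{
    if (x < 0)
        x = 0 - x;   /* "subtract from zero" */
    return x;
}
```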
 
Mate Kovacs said:
You can. (In fact there's no neg instruction in the ARM ISA, so you have to "conditionally subtract from zero".)

Yeah, the dirty abs-approximation could most likely be replaced with some of the other instructions setting the negative flag, and a conditional RSB. That would IIRC be faster on an ARM11... But as this code is ARM7-targeted (and it's an approximation), I think this cheat is perfectly valid. It's a tad easier to schedule well.
 
You really don't want to understand what he's saying, do you? :D

The whole function fast_pythagoras is a 10-cycle approximation on ARM. And given the other approximations in there, it would be overkill to do a correct abs.
 
Basic said:
The whole function fast_pythagoras is a 10-cycle approximation on ARM. And given the other approximations in there, it would be overkill to do a correct abs.
I would assume that the abs() calculation discussion was not related at all to the Pythagoras approximation, but I suppose more approximation in the latter wouldn't be too bad.
 
RussSchultz said:
It won't, because it can't know what's going on inside your object.

If all of your members that are called within the loop are marked const, you could get away with it, but I doubt that your objects are like that.

The right thing to do is get the count prior to engaging in the loop, then iterate on that value.
Yes, most often the value is constant, but sometimes it's not. And I agree: I wasn't thinking when I said the compiler might optimize it.
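A sketch of the suggestion to read the count once before the loop. The `struct container` type and `container_count()` accessor are hypothetical stand-ins for the object under discussion. When such a call cannot be inlined or analysed (for example, it lives in another translation unit, or is virtual in C++), the compiler generally cannot prove the count is unchanged between iterations.

```c
#include <stddef.h>

/* Hypothetical container and accessor. */
struct container {
    const int *items;
    size_t n;
};

size_t container_count(const struct container *c)
{
    return c->n;
}

/* Count read once, before the loop. Only valid if the loop body
 * does not change the number of items. */
long sum_items(const struct container *c)
{
    long total = 0;
    size_t count = container_count(c);   /* hoisted out of the loop */
    for (size_t i = 0; i < count; ++i)
        total += c->items[i];
    return total;
}
```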
 
kusma said:
Yeah, the dirty abs-approximation could most likely be replaced with some of the other instructions setting the negative flag, and a conditional RSB. That would IIRC be faster on an ARM11... But as this code is ARM7-targeted (and it's an approximation), I think this cheat is perfectly valid. It's a tad easier to schedule well.

Exactly!
It does make sense. Really! ;-)
abs() is 2 cycles unless you schedule cleverly.
It's just a small (but fun) improvement.
 
Nick said:
No. Operator precedence is well defined.
While operator precedence is well defined, evaluation order is not. Basic is right, i++ + i is undefined (just like i++ * i++).
 
Xmas said:
While operator precedence is well defined, evaluation order is not. Basic is right, i++ + i is undefined (just like i++ * i++).
Evaluation order is well defined for C++. So i++ + i should evaluate the same for every compiler. I'm not 100% sure about i++ * i++ but it should be 0 when starting with i = 0 because i++ is only evaluated after the full statement. After the statement i is 2.
 
Nick said:
Evaluation order is well defined for C++. So i++ + i should evaluate the same for every compiler. I'm not 100% sure about i++ * i++ but it should be 0 when starting with i = 0 because i++ is only evaluated after the full statement. After the statement i is 2.
The one I found states that
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
And it does not make any notes on the order of evaluation for additive operators.

Furthermore, it states that
The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered. The order of evaluation of the postfix expression and the argument expression list is unspecified.
So the value of "i++ + i" is unspecified even if "i" is of some class with the "post ++" and "+" operators overloaded.
EDIT: corrected typo
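A sketch of the unspecified-order case with function calls. In `next_value() - next_value()` the two calls are not ordered relative to each other, so the result could be `1 - 2` or `2 - 1`; giving each call its own statement fixes the order. The counter and function names are hypothetical.

```c
/* Hypothetical counter with a side effect on each call. */
static int counter = 0;

int next_value(void)
{
    return ++counter;
}

/* Each call is a separate full expression, so "first" is always
 * evaluated before "second". */
int difference_in_order(void)
{
    int first = next_value();
    int second = next_value();
    return first - second;
}
```

Every call of `difference_in_order()` returns -1, whatever the compiler.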
 
IIRC, isn't the order of evaluation of floating point defined, though? Since FP arithmetic isn't associative, it would seem pretty important.
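A small illustration of why grouping matters for floating point. C and C++ parse `a + b + c` as `(a + b) + c`, and compilers keep that grouping for floating point unless told otherwise (for example with `-ffast-math`), because regrouping can change the result. This assumes IEEE 754 doubles with no excess precision (`FLT_EVAL_METHOD == 0`, as on typical x86-64 SSE and AArch64 builds); the function names are hypothetical.

```c
/* Left-to-right grouping, as the grammar gives for a + b + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Same operands, regrouped. */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With `a = 1e16`, `b = -1e16`, `c = 1.0`, the left grouping gives 1.0, while in the right grouping `-1e16 + 1.0` rounds back to `-1e16` (the spacing between doubles is 2 at that magnitude), so the sum is 0.0.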
 