I've done some analysis of these things every few years.
Long ago (in the eighties), it was definitely worthwhile. The only compiler that did a good job of optimizing at that time was Watcom. The compilers from Borland and Microsoft did stupid things like:
Code:
MOV AX, [var1]   ; load var1...
MOV BX, AX       ; ...copy it to BX...
MOV BX, [var2]   ; ...and immediately overwrite the copy (both moves wasted)
MOV AX, [var2]   ; load var2 a second time, now into AX
MOV CX, [var1]   ; load var1 a second time
ADD AX, CX       ; AX = var1 + var2; two instructions would have sufficed
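A decent optimizer collapses that to something like the following (assuming only the sum is needed; the register choice is mine):
Code:
MOV AX, [var2]
ADD AX, [var1]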
That was due to the machines not having much memory, which forced compilers to limit the number of passes as much as possible. But that changed drastically.
With Borland Pascal 7/8, and later with Delphi 4, the compiler did a *MUCH* better job at optimizing anything I could come up with than I could do myself. Rewriting the code to use different flow structures made no difference at all; the inner part of the resulting code was mostly the same anyway. And it was pretty hard to recognize how the generated code related to the code I wrote.
Unrolling loops, inlining functions, reordering whole instruction sequences, using very strange (and sometimes even undocumented) opcodes, predicated fast-outs, eliminating variables that were never used or only fed a condition, even generating small lookup tables of independent conditions: it did it all.
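To make two of those concrete, here is a minimal C sketch (my own toy code, not Delphi output): a variable that only feeds a condition, and a fixed-count loop that is a typical candidate for full unrolling.
Code:
#include <stdio.h>

static int sum_to_four(void) {
    int sum = 0;
    for (int i = 0; i < 4; i++)   /* fixed trip count: candidate for full
                                     unrolling, or folding to the constant 6 */
        sum += i;
    return sum;
}

int main(void) {
    int a = 3, b = 4;

    /* "diff" exists only to feed the condition; an optimizer eliminates
       it and compares a and b directly, with no store or reload. */
    int diff = a - b;
    if (diff < 0)
        puts("a < b");

    printf("%d\n", sum_to_four());
    return 0;
}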
Objects and data structures did make a difference, though. And although Delphi's memory manager did a much better job than the default Windows allocator or malloc, you could still see the individual objects being created. I think it was with Delphi 5 that they started allocating only the parts that were actually used, for local one-shot objects. You still get the full allocation if you use classes, though.
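As a rough C analogue of that difference (a made-up struct, not the Delphi mechanism): a local one-shot record costs nothing to create, while every heap object is a visible trip through the allocator.
Code:
#include <stdlib.h>
#include <stdio.h>

struct point { double x, y; };

int main(void) {
    /* Local one-shot record: lives in the stack frame, no allocator call. */
    struct point a = { 1.0, 2.0 };

    /* Heap object: every creation is a trip through the allocator,
       and every destruction a trip back. */
    struct point *b = malloc(sizeof *b);
    if (!b) return 1;
    b->x = 3.0; b->y = 4.0;

    printf("%f %f\n", a.x + b->x, a.y + b->y);
    free(b);
    return 0;
}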
I think the main difference in optimizing is where you do it. Do you optimize during the parsing step or during the code-generation step, and at what abstraction level? It may be that most C/C++ compilers that target multiple architectures do it during code generation, while compilers that only support a single target architecture can do it during the parsing stage.
When you do it at the parsing stage, you still have all the rich syntactical information that tells you what the programmer tried to accomplish. You lose that if you do it during the code-generation stage. In that case, it might make a (slight) difference how you write your code; when it's done during parsing, it doesn't really matter.
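That matches what I saw with flow structures above. A minimal C sketch (function names are mine): two different control structures with the same meaning, for which an optimizer working on the higher-level form typically emits the same inner loop, or even a closed-form multiply.
Code:
#include <stdio.h>

/* The same computation expressed with two different flow structures. */
static int sum_while(int n) {
    int total = 0, i = 0;
    while (i < n) {
        total += i;
        i++;
    }
    return total;
}

static int sum_for(int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

int main(void) {
    printf("%d %d\n", sum_while(10), sum_for(10));
    return 0;
}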
So, the way you define and handle your data structures makes a difference; almost nothing else does. The exception is interfacing with the API and, in general, with other libraries. While the libraries themselves are optimized, calling functions from different libraries is about the worst case for the optimizer, as only the local scope is taken into consideration. Then again, it won't matter very much; the most visible case is probably virtual functions.
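A sketch of that barrier in C, using a function pointer as the analogue of a virtual call (names are mine; note that within a single file a compiler may still see through this, while across a library boundary it cannot):
Code:
#include <stdio.h>

static int twice(int x) { return 2 * x; }

/* Inside apply(), the target of op is unknown, so the call is opaque:
   it cannot be inlined, and the optimizer must assume the worst about
   registers and memory across it (absent whole-program optimization). */
static int apply(int (*op)(int), int x) {
    return op(x);
}

int main(void) {
    int a = twice(21);        /* direct call: trivially inlined */
    int b = apply(twice, 21); /* indirect call through a pointer */
    printf("%d %d\n", a, b);
    return 0;
}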
As for garbage collection: while it protects against memory leaks, there is quite some overhead associated with it, and it runs in batches, often triggered by low-memory conditions or by the allocation of large blocks. So it tends to generate much more work than plain alloc/free would, while also increasing the total amount of memory in use, IMO.
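A toy C sketch of that batching behaviour (invented names; a real collector also traces reachability, which this does not): frees are parked and then done in one burst, so memory stays in use longer and the cost arrives in spikes.
Code:
#include <stdlib.h>

#define BATCH 1024
static void *pending[BATCH];
static int npending = 0;

/* Park a block instead of freeing it; release everything in one sweep
   once enough "pressure" has built up. */
static void deferred_free(void *p) {
    pending[npending++] = p;          /* block stays allocated for now */
    if (npending == BATCH) {          /* collection trigger */
        for (int i = 0; i < BATCH; i++)
            free(pending[i]);         /* the work arrives in one burst */
        npending = 0;
    }
}

int main(void) {
    for (int i = 0; i < 10000; i++)
        deferred_free(malloc(64));    /* blocks pile up between sweeps */
    for (int i = 0; i < npending; i++)
        free(pending[i]);             /* final sweep for the leftovers */
    return 0;
}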