So the myth of "programming to the metal" isn't real? It's not about coding with the most obscure functions of a chip, just about special memory management?
That's tough to accept. I always had that view of low level programming, like there was some mystery to it.
Even if you don't have low level access, there are still things you can do. For a C-style array foo[x][y], it's faster to step through foo[x][y+1] in the inner loop than foo[x+1][y]. The former touches neighbouring memory that is likely already in cache, while the latter jumps a whole row ahead and is more likely to hit main memory. (That's for row-major languages like C; column-major ones like Fortran are the opposite.)
So the more you get things working in registers and cache and the less you hit memory, the better. And if you can't manage that, hitting memory is still far better than hitting the hard drive.
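To make the loop-order point concrete, here is a minimal sketch in C. The names `ROWS`, `COLS`, `sum_row_major` and `sum_column_major` are made up for illustration, and the size of the speed difference depends on the machine and the array size.

```c
#include <stddef.h>

enum { ROWS = 512, COLS = 512 };  /* hypothetical sizes, picked for illustration */

/* Cache-friendly: the inner loop walks the LAST index, so consecutive
   iterations read adjacent addresses (C stores arrays row by row). */
long sum_row_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (size_t x = 0; x < ROWS; x++)
        for (size_t y = 0; y < COLS; y++)
            total += grid[x][y];
    return total;
}

/* Cache-unfriendly: the inner loop walks the FIRST index, so every
   step jumps COLS * sizeof(int) bytes ahead in memory. */
long sum_column_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (size_t y = 0; y < COLS; y++)
        for (size_t x = 0; x < ROWS; x++)
            total += grid[x][y];
    return total;
}
```

Both functions return the same sum; only the order in which memory is touched differs, which is where any timing gap would come from.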
There are also special instructions like SIMD, which can do the math for, say, 4 ints at once. So instead of performing an operation between two arrays one element at a time and incrementing by 1, you can load 4 ints from each array at once and increment by 4.
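A rough sketch of that "4 ints at a time" loop, assuming an x86 CPU with SSE2 (128-bit registers, so four 32-bit ints each). The function name `add_arrays` is made up; on other CPUs the `#if` guard drops back to the plain one-at-a-time loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */
#endif

/* out[i] = a[i] + b[i] for i in [0, n) */
void add_arrays(const int32_t *a, const int32_t *b, int32_t *out, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    /* Load 4 ints from each array, add them with one instruction, step by 4. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar loop: handles the leftover tail, or everything without SSE2. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Optimizing compilers will often vectorize a simple loop like this on their own, so hand-written intrinsics are mostly worth it when the compiler can't work it out.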
One way to look at it: it's really just about reducing wasted cycles and letting the faucet run with the fewest impediments. For example, it helps to work with sizes that are powers of 2, so that the processor only needs bit shifts for division and multiplication instead of costlier operations.
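Here is a small sketch of the power-of-2 trick, assuming unsigned indices. The chunk size of 64 and the names `CHUNK_SHIFT`, `chunk_index`, `chunk_offset` and `chunk_start` are invented for the example.

```c
#include <stdint.h>

#define CHUNK_SHIFT 6u                     /* hypothetical: 2^6 = 64 items per chunk */
#define CHUNK_SIZE  (1u << CHUNK_SHIFT)
#define CHUNK_MASK  (CHUNK_SIZE - 1u)

/* Which chunk does item i live in? Same result as i / 64. */
static inline uint32_t chunk_index(uint32_t i)  { return i >> CHUNK_SHIFT; }

/* Position inside that chunk. Same result as i % 64. */
static inline uint32_t chunk_offset(uint32_t i) { return i & CHUNK_MASK; }

/* First item of chunk c. Same result as c * 64. */
static inline uint32_t chunk_start(uint32_t c)  { return c << CHUNK_SHIFT; }
```

When the size is a compile-time constant, compilers generally make this substitution themselves, so the trick pays off mainly when the size is only known at run-time but guaranteed to be a power of 2.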
Reducing branches is a big one, but honestly I think this is generally crazy hard to do. I recall getting heavily penalized for creating objects at run-time, so I created all the objects beforehand and iterated through them to check whether each one was enabled; if it was, I ran the update and the render for that object. Too many for loops and too many if/else statements.
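The pre-created objects approach described above looks roughly like this in C. It's only a sketch: the `Entity` fields, `MAX_ENTITIES`, `spawn` and `update_all` are placeholder names rather than the code from the actual game, and rendering is left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_ENTITIES = 256 };  /* hypothetical pool size */

typedef struct {
    bool  enabled;
    float x, y, vx, vy;
} Entity;

/* Allocated once up front; static storage starts zeroed, so every slot is disabled. */
static Entity pool[MAX_ENTITIES];

/* Reuse the first disabled slot instead of allocating. Returns NULL if the pool is full. */
Entity *spawn(float x, float y, float vx, float vy) {
    for (size_t i = 0; i < MAX_ENTITIES; i++) {
        if (!pool[i].enabled) {
            pool[i] = (Entity){ true, x, y, vx, vy };
            return &pool[i];
        }
    }
    return NULL;
}

/* Per-frame pass: skip disabled slots, move the enabled ones. */
void update_all(float dt) {
    for (size_t i = 0; i < MAX_ENTITIES; i++) {
        if (!pool[i].enabled) continue;  /* the per-object branch mentioned above */
        pool[i].x += pool[i].vx * dt;
        pool[i].y += pool[i].vy * dt;
    }
}
```

"Destroying" an object is just setting `enabled` to false, which frees the slot for the next `spawn` call without touching the allocator.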
But honestly, thinking about it, there was probably a better way to do this, though even now I'm not sure what it would be; I'm just an amateur. I made particle effects the same way LOL: they were moving textures exploding in random directions.
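One commonly described alternative, offered here as a sketch and not as the definitive answer: keep the live particles packed at the front of the array and overwrite a dead one with the last live one, so the loop never visits an empty slot. The `Particle` fields, `MAX_PARTICLES`, `emit` and `update_particles` are hypothetical names.

```c
#include <stddef.h>

enum { MAX_PARTICLES = 512 };  /* hypothetical cap */

typedef struct { float x, y, vx, vy, life; } Particle;

static Particle particles[MAX_PARTICLES];
static size_t live_count = 0;  /* particles[0 .. live_count-1] are alive */

/* Append a particle to the live range. Returns 0 if the array is full. */
int emit(float x, float y, float vx, float vy, float life) {
    if (live_count == MAX_PARTICLES) return 0;
    particles[live_count++] = (Particle){ x, y, vx, vy, life };
    return 1;
}

/* Age and move every live particle; expired ones are replaced by the last live one. */
void update_particles(float dt) {
    size_t i = 0;
    while (i < live_count) {
        Particle *p = &particles[i];
        p->life -= dt;
        if (p->life <= 0.0f) {
            *p = particles[--live_count];  /* fill the hole; slot i is processed again */
        } else {
            p->x += p->vx * dt;
            p->y += p->vy * dt;
            i++;
        }
    }
}
```

The loop still branches on `life`, but it only touches live particles and there is no enabled flag to test. The trade-off is that particle order is not preserved, which usually doesn't matter for an explosion effect.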
Memory management is the biggest one, because each game is unique in the problems it's trying to solve, so how the API handles moving data around is a big deal. But as big a deal as memory management is, taking advantage of the hardware that's available is also a big deal.
Looking back, I wish I had understood more about hardware when I was working on the game. Regardless, it was still a great learning experience.