No, I'm really not convinced that the average programmer will master low-level multi-core programming in the future. Note that by "average programmer" I mean the bulk of desktop application programmers, and by low-level multi-core programming I mean understanding MESI, the ABA problem, read-write reordering, etc. I think it's comparable to assembly programming (or even lower level), something many just don't want to deal with. And they shouldn't have to...
Perhaps not now, but the average programmer of the future (? years from now) will have to. Even in single-processor machines, it hasn't been a good idea to stay single-threaded (or single-process) since the advent of time sharing. Eventually your single thread/process is going to block on IO (perhaps even just a virtual memory page read), and the CPU will go idle (when you could still be doing useful work).
Asynchronous IO is a different story. It's an easy-to-understand abstraction and, as you say, it's more related to time sharing than to multi-core. Furthermore, a millisecond is a small delay for asynchronous IO, while for the multi-core programming I'm talking about it's an eternity.
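To make the asynchronous IO point concrete, here is a minimal sketch in Python using `asyncio`. The `fake_read` coroutine and its 0.1 s delays are made up for illustration; `asyncio.sleep` stands in for a real disk or network wait. The three waits overlap on a single thread, so nothing here involves cores, caches, or memory ordering.

```python
import asyncio
import time


async def fake_read(name, delay):
    # Hypothetical stand-in for a real IO call; the thread is free while we wait.
    await asyncio.sleep(delay)
    return f"{name} done"


async def read_all():
    # Start all three waits at once and collect the results in argument order.
    return await asyncio.gather(
        fake_read("disk", 0.1),
        fake_read("net", 0.1),
        fake_read("db", 0.1),
    )


def timed_read_all():
    # Returns the results plus the wall-clock time the whole batch took.
    start = time.perf_counter()
    results = asyncio.run(read_all())
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_read_all()
    print(results, f"{elapsed:.2f}s")
```

The batch should take roughly as long as the slowest single wait rather than the sum of all three, which is the whole benefit, and the programmer never touches a lock.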
Again it's not the same thing we're talking about here. In a server, threads are largely independent, each just serving a different user. The only locking that happens is when threads access shared resources outside the process (e.g. databases). Also, these resources (or their drivers) are written by highly experienced programmers. The programmers of the services generally don't have to deal with the details of having multiple processors; they essentially write single-threaded code and pass data through highly abstracted messages.
While threaded (or multi-processing via fork) programming might be a relatively "new" topic for game developers, it has been a way of life for most "server" programmers for a long time.
The kind of multi-core programming that game, multimedia and driver developers have to deal with is a single application running as many threads as there are cores, striving to advance execution as fast as possible. They have to deal with things like spin loops doing no useful work and context switching having a significant overhead. There's a lot of fine-grained inter-thread communication that requires a deep understanding of hardware and OS mechanisms.
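A rough sketch of the spin-loop trade-off, again in Python. The function names `spin_wait` and `blocking_wait` are made up for this example, and Python's global interpreter lock means the numbers say nothing about native code; the point is only the structural difference between burning cycles on a flag and letting the OS park the thread.

```python
import threading
import time


def spin_wait(delay=0.05):
    """Poll a flag in a tight loop; return how many iterations were wasted."""
    ready = False
    spins = 0

    def waiter():
        nonlocal spins
        while not ready:  # busy loop: keeps the CPU occupied, does no useful work
            spins += 1

    t = threading.Thread(target=waiter)
    t.start()
    time.sleep(delay)  # the "producer" takes a while to finish
    ready = True       # publish the flag; the waiter sees it on its next check
    t.join()
    return spins


def blocking_wait(delay=0.05):
    """Sleep on an Event instead; the waiting thread is descheduled until set()."""
    ready = threading.Event()
    woke = []

    def waiter():
        # wait() returns True if the event was set before the timeout expired.
        woke.append(ready.wait(timeout=5))

    t = threading.Thread(target=waiter)
    t.start()
    time.sleep(delay)
    ready.set()
    t.join()
    return woke[0]


if __name__ == "__main__":
    print("wasted iterations while spinning:", spin_wait())
    print("woken by event:", blocking_wait())
```

Spinning reacts fastest but wastes a core for the whole wait; blocking wastes nothing but pays for a context switch on wake-up. Choosing between them per situation is exactly the kind of decision that needs knowledge of the hardware and the scheduler.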
Anyway, my main point was and still is that most programmers need frameworks to abstract locks-and-threads into something like dependencies-and-tasks.
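As a minimal sketch of what "dependencies-and-tasks" could look like from the application programmer's side, here is a toy scheduler built on Python's `concurrent.futures`. The `run_task_graph` function, its arguments, and the example task names are all hypothetical; a real framework would add work stealing, priorities, error handling and so on.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task_graph(tasks, deps, max_workers=4):
    """Run callables in dependency order.

    tasks: name -> callable taking the results of its dependencies, in order
    deps:  name -> list of task names it depends on (missing means none)
    """
    results = {}
    pending = {}            # future -> task name, for tasks currently running
    remaining = set(tasks)  # tasks not yet submitted
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or pending:
            # A task is ready once every one of its dependencies has a result.
            ready = [n for n in remaining
                     if all(d in results for d in deps.get(n, ()))]
            for name in ready:
                args = [results[d] for d in deps.get(name, ())]
                pending[pool.submit(tasks[name], *args)] = name
                remaining.discard(name)
            if not pending:
                raise ValueError("dependency cycle or unknown dependency")
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results[pending.pop(fut)] = fut.result()
    return results


def example():
    # "sort" and "total" both need "load" and may run in parallel with each other.
    tasks = {
        "load": lambda: [3, 1, 2],
        "sort": lambda xs: sorted(xs),
        "total": lambda xs: sum(xs),
        "report": lambda s, t: (s, t),
    }
    deps = {"sort": ["load"], "total": ["load"], "report": ["sort", "total"]}
    return run_task_graph(tasks, deps)


if __name__ == "__main__":
    print(example()["report"])
```

The caller only states what depends on what. All the locking and thread management lives inside the scheduler, where it can be written once by someone who does understand the low-level details.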