Texture streaming seems to be the culprit there.
Texture (and other data) streaming to GPU memory is a common cause of stuttering in PC games. This is mostly because DirectX abstracts resource management (Java / garbage collection syndrome). With abstracted resource management the GPU driver has no clue which textures you need in a certain level area. When you bind a resource to the GPU and it is not resident in GPU memory (you can't even query residency), the driver notices that it is missing and starts the texture upload, which very likely stalls frame rendering (if a big texture, or several small ones, is missing). Good manual texture management uses engine knowledge about the level design and moves textures to GPU memory just ahead of time, avoiding these stalls.
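A hypothetical sketch of that "just ahead of time" idea (none of these types or functions come from any real engine, they only illustrate the pattern): level-design metadata tells the streamer which textures the next area needs, so uploads start well before the first draw call that samples them.

```cpp
// Hypothetical engine-side streaming sketch (no real engine API): level areas carry an
// authored texture list, and trigger volumes kick off uploads ahead of first use.
#include <cstdint>
#include <unordered_set>
#include <vector>

using TextureId = std::uint32_t;

struct AreaTextureSet
{
    std::vector<TextureId> textures;   // authored per level area by the tools pipeline
};

class TextureStreamer
{
public:
    // Called when the player enters a trigger volume near the boundary of the next area.
    void PrefetchArea(const AreaTextureSet& area)
    {
        for (TextureId id : area.textures)
        {
            if (resident_.insert(id).second)
                BeginAsyncUpload(id);  // non-blocking background upload, so no draw call
        }                              // ever binds a texture that is still missing
    }

    // Called when an area is safely behind the player.
    void EvictArea(const AreaTextureSet& area)
    {
        for (TextureId id : area.textures)
        {
            if (resident_.erase(id))
                ReleaseGpuMemory(id);  // free GPU memory for the next area
        }
    }

private:
    void BeginAsyncUpload(TextureId) { /* kick off the actual GPU copy elsewhere */ }
    void ReleaseGpuMemory(TextureId) { /* return the memory to the texture pool */ }

    std::unordered_set<TextureId> resident_;
};
```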
With DX12, controlling the movement of data into GPU memory is finer grained and lower overhead, and less likely to interfere with other data being sent to the GPU.
Copy queues make the transfers cheaper and lower latency. But most importantly, the game engine can tell the GPU what data is needed instead of relying on driver-side black magic to guess it.
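A minimal sketch of what a dedicated copy queue looks like in D3D12 (the helper functions are mine; error handling, GetCopyableFootprints bookkeeping, and resource state management are omitted):

```cpp
// Minimal D3D12 copy-queue sketch: texture uploads run on the copy engine, independently of
// the graphics queue, and a fence tells the graphics queue when the data is ready.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12CommandQueue> CreateCopyQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;            // dedicated DMA/copy engine
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

// Record a texture upload on the copy queue and signal a fence when it completes.
// The graphics queue only waits on that fence in the frame that first samples the texture.
void StreamTexture(ID3D12CommandQueue* copyQueue,
                   ID3D12GraphicsCommandList* copyList,  // created with D3D12_COMMAND_LIST_TYPE_COPY
                   ID3D12Resource* uploadBuffer,         // CPU-visible staging data
                   ID3D12Resource* gpuTexture,           // destination in a default heap
                   ID3D12Fence* fence, UINT64 fenceValue)
{
    D3D12_TEXTURE_COPY_LOCATION dst = {};
    dst.pResource = gpuTexture;
    dst.Type = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX;
    dst.SubresourceIndex = 0;

    D3D12_TEXTURE_COPY_LOCATION src = {};
    src.pResource = uploadBuffer;
    src.Type = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT;
    // src.PlacedFootprint would be filled from GetCopyableFootprints() in real code.

    copyList->CopyTextureRegion(&dst, 0, 0, 0, &src, nullptr);
    copyList->Close();

    ID3D12CommandList* lists[] = { copyList };
    copyQueue->ExecuteCommandLists(1, lists);
    copyQueue->Signal(fence, fenceValue);   // graphics queue can Wait() on this fence
}
```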
What "code base"? It's not an enterprise with legacy code.
Big game engines have a huge amount of legacy code. Even the first party console studios don't rewrite their whole code base for every project. We are talking about code bases of several million lines here; it would not be commercially viable to rewrite it all during a single project.
Usually debugging generated code is a real nightmare.
Not if you only generate the small platform-specific command creation part, and if you employ techniques that make debugging easier. Some studios even use code generation to make runtime code editing faster and easier, improving iteration time. Used properly, code generation can make debugging easier instead of harder.
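Purely as an illustration (not any particular studio's generator output): keeping the generated encoders tiny and human-readable, and emitting #line directives that point the compiler and debugger back at the source description, goes a long way toward keeping generated code debuggable.

```cpp
// Hypothetical generated output (hand-written here for illustration): a small, readable
// encoder for one command packet type. The #line directive maps diagnostics back to the
// data description file the code was generated from. Opcode and file name are made up.
#include <cstdint>
#include <cstring>

struct DrawIndexedDesc      // description the generator consumed
{
    std::uint32_t indexCount;
    std::uint32_t startIndex;
    std::uint32_t baseVertex;
};

// --- BEGIN GENERATED CODE (from draw_commands.def) ---
#line 1 "draw_commands.def"
inline std::uint8_t* EncodeDrawIndexed(std::uint8_t* out, const DrawIndexedDesc& d)
{
    const std::uint32_t packet[4] = {
        0x0001u,            // hypothetical platform opcode for "draw indexed"
        d.indexCount,
        d.startIndex,
        d.baseVertex
    };
    std::memcpy(out, packet, sizeof(packet));
    return out + sizeof(packet);
}
// --- END GENERATED CODE ---
```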
That was also the case in the previous generation, but slightly differently: people who wrote good PS3 code also wrote good X360 or PC code.
SPUs certainly forced people to think about data access patterns and optimize the crap out of the data movement between memory and local store. This of course helps all cache-based architectures, especially ones like Xbox 360 that require manual cache prefetching to perform well. However, Xbox 360 VMX128 code needs a lot of special care to work well. SPUs do not suffer LHS (load-hit-store) stalls, for example, and they have lower instruction latencies. The compiler needs lots of independent work inside each VMX128 loop body to generate good code (and to utilize that huge pool of 128 vector registers). SPU code doesn't require that much unrolling and other tricks to perform as expected.
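A generic illustration of that unrolling (plain scalar C++, not real VMX128 code): several independent accumulators break the serial dependency chain so a long-latency pipeline stays busy instead of waiting on one result at a time.

```cpp
// Generic latency-hiding sketch: each partial sum depends only on itself, so a long-latency
// FPU pipeline can overlap the four chains instead of stalling on a single serial accumulator.
// Assumes n is a multiple of 4 for brevity.
float SumUnrolled4(const float* data, int n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    for (int i = 0; i < n; i += 4)
    {
        acc0 += data[i + 0];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```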
The current generation allows you to use exactly the same optimized CPU code on both platforms. This has never been possible before. When you optimize a loop with AVX intrinsics, that code can be used on both consoles. When you optimize a data set to fit the L1 and L2 caches better, it helps both platforms identically, since they both have Jaguar CPUs with the same caches (same size and associativity). When you optimize around CPU bottlenecks and quirks, both consoles can use the same code. This is a big improvement for cross-platform developers.
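For example, a loop hand-vectorized with AVX intrinsics runs unchanged on both consoles' Jaguar cores. A minimal sketch (separate mul and add rather than FMA, since Jaguar does not provide FMA; assumes n is a multiple of 8):

```cpp
// AVX dot product: write once, use on any Jaguar-class CPU (and modern PC CPUs).
#include <immintrin.h>

float DotProductAVX(const float* a, const float* b, int n)
{
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8)
    {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));   // no FMA on Jaguar
    }

    // Horizontal reduction of the 8 lanes down to a single float.
    __m128 lo   = _mm256_castps256_ps128(acc);
    __m128 hi   = _mm256_extractf128_ps(acc, 1);
    __m128 sum4 = _mm_add_ps(lo, hi);
    __m128 sum2 = _mm_add_ps(sum4, _mm_movehl_ps(sum4, sum4));
    __m128 sum1 = _mm_add_ss(sum2, _mm_shuffle_ps(sum2, sum2, 0x55));
    return _mm_cvtss_f32(sum1);
}
```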
On the GPU side, you can also optimize the shader code once (for GCN) and expect only minimal extra modifications per platform. On PS3 you had to be extra careful about the 32-bit ALUs, branching, interpolants, etc. The Xbox 360 GPU allowed more advanced techniques, but only if you had the time to fully rewrite your lighting (and similar) code for PS3. Some devs did lighting and post processing on SPUs (very different code indeed compared to Xbox 360 shader code).