This answer is still the equivalent of "swizzle, fuck yeah!" without telling us why this is useful.
I can answer that. GPUs tile 2D and 3D textures in a way that improves data cache locality. It usually follows some sort of Morton order (but with a limited tile size instead of being fully global): https://en.wikipedia.org/wiki/Z-order_curve. The default swizzle standardizes this order, making it possible to cook assets to this layout on disk, which makes it faster to stream data from CPU to GPU. It also makes it possible for GPUs of different brands to access 2D and 3D textures generated by each other more efficiently.
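To make that concrete, here is a minimal sketch (plain C++, not any vendor's actual swizzle) of a 2D Morton encode: interleaving the bits of x and y so texels that are neighbors in 2D land at nearby addresses. Real hardware applies this kind of pattern within fixed-size tiles rather than across the whole texture.

```cpp
#include <cstdint>

// Spread the low 16 bits of v so each bit lands in an even position:
// 0b0000abcd -> 0b0a0b0c0d (shown for 4 bits).
static uint32_t Part1By1(uint32_t v)
{
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// 2D Morton (Z-order) index: interleaves the bits of x and y, so points
// that are close in 2D map to nearby linear addresses (better cache hits).
uint32_t MortonEncode2D(uint32_t x, uint32_t y)
{
    return Part1By1(x) | (Part1By1(y) << 1);
}
```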
@sebbbi Curious about the effect of MSAA on L2 cache. MSAA performance seems relatively poor on Vega. Although I guess your renderer may not be compatible with MSAA?
UE4 deferred renderer doesn't support MSAA. It is fully designed around temporal antialiasing. Temporal antialiasing is also used to enable stochastic optimizations (remove noise) from screen space reflections, ambient occlusion and transparencies, among others. This makes these effects cheaper and better quality. I don't know how Vega behaves with MSAA, and frankly I don't care about MSAA anymore, as good temporal antialiasing is better, and as a bonus allows high quality temporal upsampling as well (saving 50%+ of frame cost with minimal IQ degradation).
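For context, the heart of temporal antialiasing is an exponential blend of the current frame into a history buffer. The sketch below shows only that accumulation step (it is not UE4's implementation; a real TAA also needs sub-pixel jitter, motion-vector reprojection of the history sample, and neighborhood clamping to reject stale history).

```cpp
// Minimal sketch of per-pixel temporal accumulation (illustrative only).
// Each frame contributes a small weight, so stochastic noise (e.g. from
// screen space reflections or AO) averages out over a few frames.
struct Color { float r, g, b; };

Color TemporalBlend(Color history, Color current, float alpha /* ~0.1 */)
{
    // Exponential moving average: result = lerp(history, current, alpha).
    return {
        history.r + alpha * (current.r - history.r),
        history.g + alpha * (current.g - history.g),
        history.b + alpha * (current.b - history.b),
    };
}
```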
How does the ray tracer in Claybook work at a layman/high level, and how does it interact with UE4?
We ray trace signed distance field (SDF) volumes. Our volumes are stored as a multiresolution (hierarchical) volume texture. It uses a hybrid sphere tracing / cone tracing algorithm that does empty space skipping with wide cones and then splits cones into single pixel rays on impact. Ray tracing runs on async compute during g-buffer rendering, and there's a full screen pixel shader pass that combines the result into the g-buffer at the end of the g-buffer pass. There's also a shadow ray trace pass (proper penumbra-widening soft shadows) that currently runs in a pixel shader, writing to UE's full screen shadow mask buffer.
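For readers unfamiliar with sphere tracing, here is a minimal single-ray sketch against an analytic SDF. Claybook's actual tracer is hierarchical, samples a volume texture, and starts from wide cones, none of which is shown here; the point is just that the SDF value at a point is a safe step size along the ray.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 Add(Vec3 a, Vec3 b)    { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 Scale(Vec3 v, float s) { return { v.x * s, v.y * s, v.z * s }; }

// Example SDF: a unit sphere at the origin. A stand-in for illustration;
// Claybook samples a multiresolution volume texture instead.
static float SceneSDF(Vec3 p)
{
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Basic sphere tracing: the SDF value at the current point is the distance
// to the nearest surface, so we can safely march by it. Repeat until we are
// close enough to call it a hit or run out of distance/iterations.
bool SphereTrace(Vec3 origin, Vec3 dir, float maxDist, float* hitT)
{
    float t = 0.0f;
    for (int i = 0; i < 128 && t < maxDist; ++i)
    {
        float d = SceneSDF(Add(origin, Scale(dir, t)));
        if (d < 0.001f) { *hitT = t; return true; }
        t += d;
    }
    return false;
}
```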
If the physics is running on the GPU, how much work is left for the CPU, and is it possible to do that work on the GPU too, even if it won't be as effective?
Physics runs 100% on the GPU. Every shape has 16k particles and we have lots of real-time 3D fluid interacting seamlessly with the deforming shapes, so running the physics on the CPU would not be possible.
Can AI code run on the GPU?
We have done some mass AI tests that run on the GPU. We also generated mass pathfinding data (a velocity field) for them on the GPU.
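The answer doesn't say how the velocity field is built, so the following is purely a hypothetical CPU illustration of one common approach (flow-field pathfinding): a BFS from the goal gives a distance per grid cell, and each cell's velocity points at its lowest-distance neighbor, so any number of agents can steer by just sampling the field.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical flow-field sketch (not necessarily Claybook's method).
struct Field {
    int w, h;
    std::vector<int> dist;      // steps to goal, -1 = unvisited/blocked
    std::vector<float> vx, vy;  // per-cell direction toward the goal
};

Field BuildFlowField(int w, int h, const std::vector<uint8_t>& walls,
                     int goalX, int goalY)
{
    Field f{ w, h, std::vector<int>(w * h, -1),
             std::vector<float>(w * h, 0.0f), std::vector<float>(w * h, 0.0f) };
    std::queue<int> open;
    f.dist[goalY * w + goalX] = 0;
    open.push(goalY * w + goalX);

    const int dx[4] = { 1, -1, 0, 0 }, dy[4] = { 0, 0, 1, -1 };
    while (!open.empty())  // BFS: distance-to-goal for every reachable cell
    {
        int c = open.front(); open.pop();
        int cx = c % w, cy = c / w;
        for (int n = 0; n < 4; ++n)
        {
            int nx = cx + dx[n], ny = cy + dy[n];
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
            int i = ny * w + nx;
            if (walls[i] || f.dist[i] >= 0) continue;
            f.dist[i] = f.dist[c] + 1;
            open.push(i);
        }
    }

    for (int y = 0; y < h; ++y)  // velocity = step toward cheapest neighbor
        for (int x = 0; x < w; ++x)
        {
            int i = y * w + x;
            if (f.dist[i] <= 0) continue;  // skip goal, walls, unreachable
            int best = i;
            for (int n = 0; n < 4; ++n)
            {
                int nx = x + dx[n], ny = y + dy[n];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                int j = ny * w + nx;
                if (f.dist[j] >= 0 && f.dist[j] < f.dist[best]) best = j;
            }
            f.vx[i] = float(best % w - x);
            f.vy[i] = float(best / w - y);
        }
    return f;
}
```

Each step of this is embarrassingly parallel over cells, which is what makes a GPU version natural for mass AI.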
Could your game take advantage of fp16 / Rapid Packed Math?
Yes. But Unreal Engine doesn't yet support fp16 on desktop, only on mobile. Their DX11 backend doesn't yet support the DX 11.2 API (which enables fp16 support).
How close are we to fully ray traced games?
Ray tracing is great for some forms of geometry, but less great for others. Branch coherency on the GPU makes heterogeneous ray tracing (= multiple levels of different acceleration structures and/or early-out tests) inefficient. Our multires SDF sidesteps this problem (it has a simple inner loop with no branches). Ray tracing is also awesome for shadows and AO. I expect games to start ray tracing shadows and AO before they start ray tracing the visible rays.
On PC we will support DX11 and DX12. Possibly Vulkan if UE desktop Vulkan backend gets ready in time, and if we have time to port all our customizations to Vulkan.
What's the most interesting thing about Vega?
The new virtual memory system is by far the most impressive achievement. AMD calls it HBCC, and it basically maps the whole system memory (DDR4) for GPU use. When the GPU touches a memory region, that piece of data is instantly transferred from DDR4 to HBM2 at page granularity. Games only touch a small percentage of GPU memory every frame, and the accessed data set changes slowly (because animation needs to be smooth to look like continuous movement). With this tech, 8 GB of fast HBM2 should behave similarly to 16+ GB of traditional GPU memory (or even more, depending on how big a portion of the data is high-res content that is only needed up close to a particular surface). I have plans to test this technology by ray tracing huge (32+ GB) volume textures when I have time. I was disappointed that not a single reviewer tested Vega with huge datasets versus traditional 8 GB GPUs and the 12 GB Titan X.
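A software analogy (purely illustrative, with no relation to AMD's actual hardware design) is an LRU page cache: fast memory holds only the pages touched recently, and a miss pulls one page in from the larger backing pool while evicting the coldest one. All names below are made up.

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>

// Toy model of page-granular caching: HBM2 plays the role of a small,
// fast cache over a much larger DDR4 pool. Only touched pages move.
class PageCache {
public:
    explicit PageCache(size_t capacityPages) : capacity_(capacityPages) {}

    // Called on every access; returns true if the page was already resident
    // (the common case, since the per-frame working set changes slowly).
    bool Touch(uint64_t pageId)
    {
        auto it = index_.find(pageId);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // move to front
            return true;
        }
        if (lru_.size() == capacity_) {   // full: evict the coldest page
            index_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(pageId);          // miss: "transfer" page from DDR4
        index_[pageId] = lru_.begin();
        return false;
    }

private:
    size_t capacity_;
    std::list<uint64_t> lru_;             // front = most recently used
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> index_;
};
```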
Does anything in Vega give you opportunities to get massive perf gains in old code?
No massive gains; iterative improvements mostly. The tiled rasterizer shows huge gains in some engineering apps with massive overdraw, but no old game behaves like this. Hopefully no future game behaves like this either (it's better to occlusion cull early in software). HBCC should be a huge win for very large datasets, but current games don't have any problems with traditional 6 GB GPUs, so an 8 GB GPU with HBCC doesn't show any benefits at the moment. Xbox One X with 12 GB of memory should accelerate GPU memory consumption (the 24 GB devkit is just sweet). Maybe next year we'll see some gains over traditional 8 GB GPUs.
HBCC should also reduce frame judder in DX11 games, since the GPU doesn't need to transfer whole resources on use. It can simply page on demand, causing much smaller data movement per frame -> fewer stalls. I would be interested to know whether this is already visible in current games. Do we see fewer fps spikes and better minimum fps compared to other AMD GPUs?
When will you talk about your engine tech?
I will write something after we ship the game.