The question is whether the shaders are really that simple on the Low setting (in TR). Shader ALUs per ROP: for the RX 480, 2304/32 = 72; for the GTX 1060, 1280/48 ≈ 26. So the shaders must run (with all internal HW overhead) in fewer than 72 cycles (RX 480) or 26 cycles (GTX 1060) for the card to become ROP bound (that is, bound by theoretical pixel fillrate). I can't say for sure if...
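The ratio in the post above can be checked with a few lines of Python. This is only a rough sketch: it assumes each ROP writes one pixel per clock and that ALUs and ROPs run at the same clock, and it ignores all scheduling overhead. The function name and the dictionary layout are made up for illustration.

```python
# Shader ALU and ROP counts as quoted in the post above.
GPUS = {
    "RX 480": {"alus": 2304, "rops": 32},
    "GTX 1060": {"alus": 1280, "rops": 48},
}

def alu_cycles_per_rop(alus: int, rops: int) -> float:
    """ALU lanes per ROP, read as a rough per-pixel shader cycle budget.

    Assumption: one pixel per ROP per clock, same clock for ALUs and ROPs.
    """
    return alus / rops

for name, spec in GPUS.items():
    budget = alu_cycles_per_rop(spec["alus"], spec["rops"])
    print(f"{name}: about {budget:.1f} ALU cycles per pixel before ROPs limit")
```

A shader cheaper than this budget would leave the ROPs as the limiting unit; anything more expensive keeps the card ALU bound.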
I just can't see anything wrong with the NVIDIA drivers in this particular scenario. The card with more TFLOPS has beaten the card with fewer TFLOPS at the Low detail game setting (where geometry processing may no longer be a bottleneck). It is rather strange that this is not always the case (I mean the High detail game...
The RX 480 is a 5.8 TFLOPS card, the GTX 1060 is 4.4 TFLOPS. Maybe it is just an effect of the lower geometry detail at Low settings in the game? At Very High both cards are bottlenecked by geometry processing (where the 1060 is slightly better), but on Low the Radeon shows much stronger pixel processing power.
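For reference, the TFLOPS figures above follow from the usual peak FP32 estimate of 2 operations (one fused multiply-add) per ALU per clock. The sketch below assumes the reference boost clocks of 1266 MHz (RX 480) and 1708 MHz (GTX 1060); those clocks are not stated in the post, and real cards may run at different speeds.

```python
def peak_tflops(shader_alus: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput, counting one FMA as 2 FLOPs."""
    return 2 * shader_alus * clock_mhz * 1e6 / 1e12

# Clocks are assumed reference boost clocks, not taken from the post.
rx480 = peak_tflops(2304, 1266)
gtx1060 = peak_tflops(1280, 1708)

print(f"RX 480:   {rx480:.1f} TFLOPS")
print(f"GTX 1060: {gtx1060:.1f} TFLOPS")
```

These are paper numbers only; they say nothing about how well either card keeps its ALUs busy in a given game.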
Anyway, is it...
Very good app. But I have some questions: you are rendering graphics to a 4k x 4k surface, which means that about 16M pixel shader invocations are in flight. As far as I can see, on NVIDIA HW compute is serialized after the graphics task, so maybe the graphics task has some kind of higher priority and can't be preempted...
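The 16M figure is just the pixel count of the render target. A quick check, assuming "4k" means 4096 and one pixel shader invocation per pixel (a single full-surface pass with no overdraw):

```python
# Assumption: "4k x 4k" means 4096 x 4096, one invocation per pixel.
width = height = 4096
invocations = width * height

print(f"{invocations:,} pixel shader invocations per full-surface pass")
```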