> OK wait, from what I understand of what you're saying, due to compilation other parts of the GPU aren't used optimally, and that leads to the ALUs not being utilized, well, they might be waiting on other parts of the GPU?
If I understood him correctly, he's saying the compiler leads to less-than-optimal ALU utilization. But he's also saying that other parts of the GPU slow the ALUs down in terms of game performance as well, and that has nothing to do with compilation.
> due to compilation other parts of the GPU aren't used optimally, and that leads to the ALUs not being utilized, well, they might be waiting on other parts of the GPU?
Like any other architecture, GCN has had plenty of issues since its release, and programmers have to work around them to get the best out of it. There are all kinds of pipeline bubbles and inefficiencies: low wavefront occupancy during simple passes like z-prepass and shadow pass, low command processor throughput in some cases, low primitive rate, low tessellation rate, low utilization with low triangle counts, and so on. Very broad architectures like GCN (broad wavefronts, plus several issue ports for each SIMD, LD/ST, BR, scalar instructions) suffer the most from such inefficiencies. So he probably meant that there are other things in the graphics pipeline that could limit performance to a greater extent than the compiler's loop-unrolling inefficiencies, and these could be the primary bottleneck for GCN performance.
> Are there any PC exclusive games/engines with any kind of graphics rendering innovation these days?
You mean like Dreams or The Tomorrow Children? Not that I can think of.
> None that I know of
Epic could have done something big using Unreal Engine 4, but they've sold their soul these days doing MOBA/tower defense shit on mobile platforms...
> and inefficiencies due to low wavefront occupancy during simple passes like z-prepass and shadow pass,
What does wavefront occupancy have to do with it being a z-prepass or shadow pass?
> What does wavefront occupancy have to do with it being a z-prepass or shadow pass?
It has nothing (literally) to do with the shadow pass, since vertex/pixel shaders are either too simple or nonexistent and CUs are mostly unused. With z-prepass, all geometry inefficiencies are doubled, which increases the chance of pipeline stalls due to tessellation, triangle setup, or other bottlenecks in the pipeline (= decreased CU occupancy). Objects could be batched in z-prepass though, so small objects are less likely to be an issue here.
> Many years writing shaders. Modern compilers are CPU centric. They have no concept of the threads-in-flight versus register allocation trade-off. AMD's compiler will allocate registers until only 1 hardware thread can be in flight.
I don't know all the trade-offs it makes, but AMD's compiler is aware of the need for latency hiding and tries to reduce register usage. It won't always use more registers to save ALUs.
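To put rough numbers on the trade-off being argued here, a minimal sketch follows. It assumes the commonly cited GCN limits of 256 VGPRs per SIMD lane and at most 10 resident wavefronts per SIMD, neither of which is stated in the thread, and it ignores allocation granularity as well as SGPR and LDS limits. The helper name and constants are hypothetical, not anything from AMD's compiler.

```python
# Assumed GCN limits (not from the thread): each SIMD lane has 256 VGPRs to
# share among resident wavefronts, and hardware caps a SIMD at 10 wavefronts.
VGPR_BUDGET = 256
MAX_WAVES_PER_SIMD = 10


def waves_per_simd(vgprs_per_thread, vgpr_budget=VGPR_BUDGET,
                   max_waves=MAX_WAVES_PER_SIMD):
    """Hypothetical helper: wavefronts that fit on one SIMD for a given
    per-thread VGPR count, ignoring allocation granularity."""
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    return min(max_waves, vgpr_budget // vgprs_per_thread)


# Show how quickly latency-hiding capacity drops as register use grows.
for vgprs in (24, 32, 64, 128, 256):
    print(f"{vgprs:3d} VGPRs/thread -> {waves_per_simd(vgprs)} wave(s) per SIMD")
```

Under these assumptions, "only 1 hardware thread in flight" corresponds to a shader using more than 128 VGPRs per thread, which is the case the quoted post complains about.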
> 3dilettante, the compute block you're thinking of might be MEC (MicroEngine Compute)? Each MEC block manages 4 "pipes", each supporting up to 8 "queues" (rings).
In this case, I had forgotten what fell under the DCE heading. The shared base number with Carrizo made me wonder whether some of the Carrizo-specific items in the GPU's memory handling could have carried over. Additional searching found clearer references to the display controller, and onQ's post noted that as well.
> I think somewhere along the line AMD talked about instruction caching having an effect on IPC.
The slide listing the defining features for 4th Gen GCN has an entry for instruction prefetch that would seem to agree with you.
I don't buy it. There would need to be multiple massive shaders/kernels trying to run on a set of CUs simultaneously to get even close to exhausting instruction cache. I believe 4 CUs share an instruction cache of 32KB.
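A quick back-of-the-envelope check of that capacity, as a sketch only: it takes the 32KB figure from the post above and assumes GCN instruction encodings of 4 or 8 bytes each (an assumption, and it ignores trailing literal constants). The function name is hypothetical.

```python
ICACHE_BYTES = 32 * 1024  # figure quoted in the post above


def instructions_that_fit(cache_bytes, instr_bytes):
    """Hypothetical helper: how many fixed-size instructions fit in the cache."""
    return cache_bytes // instr_bytes


# Assumed encoding sizes: 4-byte and 8-byte instructions.
for size in (4, 8):
    count = instructions_that_fit(ICACHE_BYTES, size)
    print(f"{size}-byte encodings: about {count} instructions")
```

So the shaders resident on the CUs sharing the cache would together need several thousand instructions before capacity alone became the problem.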
> It has nothing (literally) to do with the shadow pass, since vertex/pixel shaders are either too simple or nonexistent and CUs are mostly unused. With z-prepass, all geometry inefficiencies are doubled, which increases the chance of pipeline stalls due to tessellation, triangle setup, or other bottlenecks in the pipeline (= decreased CU occupancy). Objects could be batched in z-prepass though, so small objects are less likely to be an issue here.
Are you sure you don't mean utilization instead of occupancy?
> Another wish-improvement: root buffer size increase. 512-bytes (16 Dwords) is so little.
Yeah, I was wondering if they would implement the 64 Dword limit as well.
> Are you sure you don't mean utilization instead of occupancy?
I meant both: with pipeline stalls it's utilization, with SM stalls due to the lack of available wavefronts it's occupancy.
> I meant both: with pipeline stalls it's utilization, with SM stalls due to the lack of available wavefronts it's occupancy.
Isn't occupancy how many lanes in a wavefront are utilized?
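For what it's worth, the two terms are usually kept apart roughly as sketched below. This reflects common usage in GPU profiling tools rather than a definition given in this thread. The 64-lane wavefront width and the 10-wave cap per SIMD are assumed GCN figures, and both function names are hypothetical.

```python
WAVEFRONT_WIDTH = 64      # assumed GCN wavefront width
MAX_WAVES_PER_SIMD = 10   # assumed GCN cap on resident wavefronts per SIMD


def occupancy(resident_waves, max_waves=MAX_WAVES_PER_SIMD):
    """Fraction of wavefront slots on a SIMD that are filled."""
    return resident_waves / max_waves


def lane_utilization(active_lanes, width=WAVEFRONT_WIDTH):
    """Fraction of lanes inside one wavefront doing useful work."""
    return active_lanes / width


# A SIMD can be fully occupied while each wavefront wastes most of its lanes,
# for example when tiny triangles produce only a few live pixels per wave.
print(occupancy(10), lane_utilization(16))
```

In this usage, the number of lanes in a wavefront that do useful work is lane utilization, while occupancy counts how many wavefronts are resident to hide latency.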