DirectX 12: The future of it within the console gaming space (specifically the XB1)

That said, the cost of changing pipeline state objects may not have changed much, in which case depth sorting (instead of a depth pass?) might not be that good.
On one hand you may be able to remove the depth pass; on the other you still have to pay the cost of pipeline state object changes. (So it would also depend on how many different pipeline state objects you have; with physically based BSDFs, their number may drop significantly.)
Yeah, I left that out, but like you said it depends on how many shaders you are using.

I thought Front to Back was limited due to all the costly rays needed to calculate occlusion and transparency?
What rays? The z-buffer handles occlusion, and transparency is done back to front out of necessity.
 
Do you agree there is now leg room to draw front to back instead of being forced to batch? Do you think it's a worthwhile avenue of pursuit?
As Roderic said, you still need to be aware of changing pipeline state. The new APIs reduce CPU overhead, but there's still a hardware cost. You'll see the most benefit from a lot of draw calls when state changes are limited, so the hardware doesn't stall. What each architecture can tolerate will vary. In any situation, my opinion is that draw calls shouldn't be too small, because you want work to be amplified on the GPU side of the PCIe bus.
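To make the trade-off concrete, here is a minimal sketch of the two orderings being discussed. Everything in it is hypothetical and only for illustration: the `Draw` record, the integer `pso_id` standing in for a pipeline state object, and the four sample draws. It counts how many times the bound pipeline state would change under a pure front-to-back order versus an order grouped by state first and depth second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Draw:
    # Hypothetical draw record; pso_id stands in for a pipeline state object.
    name: str
    pso_id: int
    depth: float  # distance from the camera, smaller is nearer


def count_pso_changes(draws):
    """Count how many times the bound pipeline state changes (first bind included)."""
    changes = 0
    current = None
    for d in draws:
        if d.pso_id != current:
            changes += 1
            current = d.pso_id
    return changes


def front_to_back(draws):
    """Pure depth order: best for early depth rejection, ignores state."""
    return sorted(draws, key=lambda d: d.depth)


def by_state_then_depth(draws):
    """Group by pipeline state, then go front to back inside each group."""
    return sorted(draws, key=lambda d: (d.pso_id, d.depth))


# Made-up scene: two pipeline states interleaved in depth.
draws = [
    Draw("rock", 0, 1.0),
    Draw("tree", 1, 2.0),
    Draw("wall", 0, 3.0),
    Draw("bush", 1, 4.0),
]

print(count_pso_changes(front_to_back(draws)))        # state flips on every draw
print(count_pso_changes(by_state_then_depth(draws)))  # one bind per state
```

With few distinct pipeline states the two orderings converge, which is the point made above about physically based shading reducing the number of state objects.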
 
Why does the Xbox One have 48 ops/cycle on the CPU and 768 ops/cycle on the GPU?

You'd have to drill back into the discussion threads from earlier in the generation, but the breakdown for Jaguar is that each core has two integer pipes, two memory pipes, and two FP pipes behind its 2-wide front end.
The scheduler can, in peak scenarios, issue an operation to all six pipes, and with 8 cores that is 48. The sustained throughput is clamped by the front end, which even in the absence of a vast range of hazards (misses, branches, dependences, not having a 2:2:2 mix) cannot provide 6 ops per cycle. Throughput would generally only start to approach that peak if a stall causes a buildup of ops in the scheduler, after which the CPU races to empty the backlog.

The GPU is a case of 12 CUs with 4 16-wide SIMDs, or 12x64 = 768.
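The arithmetic behind those two headline numbers can be written out as a short sketch. It uses only the figures quoted in this post; the constant and function names are made up for illustration, and the front-end figure counts instructions, which is not quite the same unit as ops.

```python
# Figures as quoted in the post above; names are illustrative only.
JAGUAR_CORES = 8
INT_PIPES, MEM_PIPES, FP_PIPES = 2, 2, 2
FRONT_END_WIDTH = 2  # instructions per cycle per core

GPU_CUS = 12
SIMDS_PER_CU = 4
LANES_PER_SIMD = 16


def cpu_peak_ops_per_cycle():
    """Peak issue: every pipe on every core busy in the same cycle."""
    return (INT_PIPES + MEM_PIPES + FP_PIPES) * JAGUAR_CORES


def cpu_front_end_limit_per_cycle():
    """What the 2-wide front end can feed per cycle across all cores."""
    return FRONT_END_WIDTH * JAGUAR_CORES


def gpu_lanes():
    """SIMD lanes across the whole GPU: CUs x SIMDs x lanes."""
    return GPU_CUS * SIMDS_PER_CU * LANES_PER_SIMD


print(cpu_peak_ops_per_cycle())         # 48
print(cpu_front_end_limit_per_cycle())  # 16
print(gpu_lanes())                      # 768
```

The gap between 48 and 16 is the reason the 48 figure is a burst number rather than something the CPU can sustain.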
 
Well, got the Jaguar numbers from the answer here.

https://stackoverflow.com/questions...ndy-bridge-and-haswell-sse2-avx-avx2/15657772

He says - "8 SP FLOPs/cycle: 8-wide AVX addition every other cycle + 8-wide AVX multiplication every other cycle"

And for the XB1 it's 768 ALUs x ~850 MHz = ~652 G ops/s, x 2 FLOPs per cycle = 1.3 TFLOPS, right?
 
That wouldn't change what the chip could physically perform, and there are services the system partition would perform for the benefit of the game section. It's been a while since I've looked at this, but my recollection was that this was while they were discussing the SoC as an 8-core CPU.

If the reservation did exclude CPU capability, it could also be argued that the GPU's ops would need an asterisk thanks to the system time-slice it has to give up.
 
http://www.quantumbreak.com/windows10
 
Is 768 ALUs the same as 768 ops/cycle? Does that mean the PS4 can do 1152 ops/cycle given that it has 1152 ALUs?
 

Notice how I said per core for the CPU? I wasn't saying the whole XB1 GPU can only do 2 flops.
I was going by each ALU. I assume each can handle a multiply + add in one cycle? I don't know the specifics.
Anyway, maybe it's more accurate to say it can do 32 ops per vector unit or maybe 128 per compute unit, but I really think everyone knew what I was trying to say.
 
Yeah, I was just kidding. Clearly the whole of GCN can do more than 2 ops per cycle as written. ;) Although I wasn't sure what you were suggesting counts as two ops. I'd count it as 128 ops/clock per CU: 128 x 12 x 850 MHz = 1.3 TFLOPS. That's AMD's official line. You can work backwards to ops per ALU or VU if you want.
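For anyone who wants to work backwards as suggested, here is a small sketch of the peak-FLOPS arithmetic. It assumes each ALU lane retires one multiply-add per cycle, counted as 2 FLOPs, and uses the rounded 850 MHz clock from this thread; the function names are hypothetical.

```python
def flops_per_cycle(cus, lanes_per_cu=64, flops_per_lane=2):
    """Peak FLOPs per clock, assuming one multiply-add (2 FLOPs) per lane."""
    return cus * lanes_per_cu * flops_per_lane


def peak_tflops(cus, clock_mhz=850.0):
    """Peak TFLOPS at the given clock (850 MHz is the thread's rounded figure)."""
    return flops_per_cycle(cus) * clock_mhz * 1e6 / 1e12


print(flops_per_cycle(1))            # 128 per CU
print(flops_per_cycle(12))           # 1536 for 12 CUs
print(round(peak_tflops(12), 2))     # about 1.31
```

This is a theoretical ceiling only; it says nothing about what a real shader sustains.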
 
Is Quantum Break indicative of the quality of the next wave of games on the new APIs, or is it a best-case scenario that can't be matched in, for example, a racing game?
 

Well ... that's not exactly proof of DX12 allowing console level optimisation CPU side. i5-4460 or FX-6300 minimum (FX-6300 being six cores and about twice as fast as X1 CPU).

The recommended specs are laughable. Yes, we already know that faster PCs can run games better. Simply "recommending" the fastest PC parts you can name off the top of your head helps no one. #lazyspecs.

"Intel Core i7 4790, 4GHz or AMD equivalent"

Brilliant. Well done. Thanks for those useful recommended specs. Windows store users will now know they need the AMD equivalent of the i7 4790.

Perfect.

There isn't one.
 
I don't know, maybe they haven't put much thought into those specs; it wouldn't be the first game to do that.

And to put it into context, Alan Wake (that ran on the previous iteration of the same engine) had a PC port made in-house by Remedy that ran almost flawlessly on most systems at release. Best to wait for the final product before we start calling them names.
 