Next gen lighting technologies - voxelised, traced, and everything else *spawn*

It can, according to NVIDIA. They did this because they found that at least 30% of game code is integer; they separated INT from FP because they wanted to exploit the parallelism. They call it concurrent FP & INT.
Oops, so i confused their scalar path (which is new and like AMD's) with their also-new concurrent vector integer path (which RDNA lacks) - now i get it.
Thanks!

Then this becomes:
So Turing can do 4 ops at a time: vector int, vector float, Tensor, and scalar int
...i guess. (Edit: forgot SFU)
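
To picture the kind of mix NVIDIA is talking about, here is a trivial toy example of my own (plain C++ standing in for shader code, nothing from their material) - the integer index math and the FP shading math are independent, which is exactly what a separate INT pipe can overlap with the FP pipe:

Code:
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy example only (mine, not NVIDIA's): per-"thread" work that mixes
// integer ops (index math, wrapping) with FP ops (the actual shading).
// On Turing the INT pipe can chew through the integer part while the
// FP pipe handles the float part; older designs share one issue path.
float shadeTexel(const std::vector<float>& texture, uint32_t x, uint32_t y,
                 uint32_t pitch, float lightTerm)
{
    uint32_t index   = y * pitch + x;                                   // integer mul + add
    uint32_t wrapped = index % static_cast<uint32_t>(texture.size());   // integer wrap
    float texel = texture[wrapped];
    return texel * lightTerm + 0.05f;                                   // FP multiply-add
}

int main()
{
    std::vector<float> texture(256, 0.5f);
    float sum = 0.0f;
    for (uint32_t y = 0; y < 16; ++y)
        for (uint32_t x = 0; x < 16; ++x)
            sum += shadeTexel(texture, x, y, 16, 0.8f);
    std::printf("sum = %f\n", sum);
}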
 
They have parallel scalar / vector execution since GCN
Are you sure about that? I've read tons of GCN docs and I've never seen any explicit mention of parallel scalar / vector execution.
RDNA docs mention that pipelined SFU ops can be partially overlapped with FMA SIMD ops. The same docs also mention that just 2 waves can be launched per cycle per CU, so how can they overlap scalar ops with SIMD ops without losing a SIMD op?
 
Are you sure about that? I've read tons of GCN docs and I've never seen any explicit mention of parallel scalar / vector execution.
Found no such phrasing either after a quick search, but scalar has its own unit, so it must operate in parallel with the 4 SIMDs (of course each working on a different wave at a time, so per-wave program flow remains serial).

Illustrated here https://de.slideshare.net/DevCentralAMD/gs4106-the-amd-gcn-architecture-a-crash-course-by-layla-mah on slide 28 of 98.

There is also this (http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf):

Up to a maximum of 5 instructions can issue per cycle, not including “internal” instructions.
1 Vector Arithmetic Logic Unit (ALU)
1 Scalar ALU or Scalar Memory Read
1 Vector memory access (Read/Write/Atomic)
1 Branch/Message - s_branch and s_cbranch_
1 Local Data Share (LDS)
1 Export or Global Data Share (GDS)
1 Internal (s_nop, s_sleep, s_waitcnt, s_barrier, s_setprio)

Which would make some sense if issuing 5 instructions means the 4 SIMDs + scalar unit can be fed this way to have enough work all the time.

... but i'm not 100% sure - no hardware expert, and i guess i'm more confused about RDNA than you are :)
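
Purely as a toy model of how i read that quote (my own assumptions, nothing official): per cycle the scheduler may issue at most one instruction per category, each picked from a different wave, so a scalar op and a vector op can go out together without the scalar op stealing the vector slot. A minimal sketch:

Code:
#include <cstdio>
#include <string>
#include <vector>

// Toy model, my assumptions only: per cycle, issue at most one instruction
// per category, each taken from a *different* wave resident on the SIMD.
struct Wave {
    int id;
    std::vector<std::string> program; // instruction categories in program order
    size_t pc = 0;
};

int main()
{
    // Two waves: one wants a scalar op next, the other a vector op.
    std::vector<Wave> waves = {
        {0, {"SALU", "VALU", "VALU"}},
        {1, {"VALU", "VALU", "SALU"}},
    };

    for (int cycle = 0; cycle < 3; ++cycle) {
        std::vector<std::string> usedCategories;
        std::printf("cycle %d:", cycle);
        for (Wave& w : waves) {
            if (w.pc >= w.program.size())
                continue;
            const std::string& cat = w.program[w.pc];
            bool taken = false;
            for (const std::string& c : usedCategories)
                if (c == cat) taken = true;
            if (taken)
                continue;                     // category already issued this cycle
            usedCategories.push_back(cat);
            std::printf("  wave%d -> %s", w.id, cat.c_str());
            ++w.pc;
        }
        std::printf("\n");
    }
    // Prints a SALU and a VALU issuing in the same cycle from different waves,
    // so (in this model) the scalar op doesn't cost the SIMD a vector slot.
}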
 
Up to a maximum of 5 instructions can issue per cycle, not including “internal” instructions.
But there is no word on whether these instructions can be executed without losing execution cycles for the SIMDs.

Which would make some sense if issuing 5 instructions means the 4 SIMDs + scalar unit can be fed this way to have enough work all the time
A wave scheduler can issue just 1 wave per clk. Launching a scalar op requires selecting and issuing a wave.
 
But there is no word on whether these instructions can be executed without losing execution cycles for the SIMDs.
IIRC, you would only eventually lose SIMD cycles if you had multiple scalar instructions in a row. So one scalar op followed by a vector op is fine. But i never understood the technical reasons, and i assume even in this case the SIMDs can work on other waves if there are some in flight, as always.
A wave scheduler can issue just 1 wave per clk. Launching a scalar op requires selecting and issuing a wave.
Maybe this is just about scheduling the ALU SIMDs, but other units like scalar or memory have their own schedulers?
Guess the 16-wide SIMDs get the same instruction 4 times for a single wave, each having a latency of 4 cycles, while the scalar unit tries to execute 4 scalar ops from 4 other waves (rough numbers below).

That's really the point where i'm unsure, and i can't find a proper document to clarify.
But i think i had discussed such things quite often with other devs, some professionals, and the assumption that scalar and vector operate concurrently seemed to be common ground for everyone, IIRC. I never doubted this until now, but i may be wrong.
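
To put rough numbers on my guess above (back-of-the-envelope, my assumptions only): a wave is 64 work items and a GCN SIMD is 16 lanes wide, so one vector instruction occupies a SIMD for 64 / 16 = 4 cycles, and with 4 SIMDs per CU the scheduler only returns to a given SIMD every 4th cycle anyway:

Code:
#include <cstdio>

// Back-of-the-envelope, my assumptions only: a GCN wave is 64 work items and
// a SIMD is 16 lanes wide, so one vector instruction keeps a SIMD busy for
// 64 / 16 = 4 cycles. With 4 SIMDs per CU the scheduler services a different
// SIMD each cycle, and the in-between cycles are where other waves (and their
// scalar ops) get their turn.
int main()
{
    const int waveSize        = 64;
    const int simdWidth       = 16;
    const int simdsPerCU      = 4;
    const int cyclesPerVector = waveSize / simdWidth; // = 4

    std::printf("one vector instruction occupies a SIMD for %d cycles\n",
                cyclesPerVector);
    for (int cycle = 0; cycle < 8; ++cycle)
        std::printf("cycle %d: scheduler services SIMD%d\n",
                    cycle, cycle % simdsPerCU);
}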
I agree - if we look at it from a single wavefront, the vector cycle is lost with a scalar op, but the vector units stay saturated processing other wavefronts, so there is no real loss when looking at the entire workload?
 
But there is no word on whether these instructions can be executed without losing execution cycles for the SIMDs.


A wave scheduler can issue just 1 wave per clk. Launching a scalar op requires selecting and issuing a wave.
They are parallel. Each type of instruction is selected from a different wave in the same IB.
 
Honestly to me that seems more like a demo of why you don't need RTX at the moment, especially in a game where almost everything is static.
 
RT doesn't always have to be nicer to the eye; it enables developers to achieve more realistic graphics with less effort. RT in next-gen isn't going to be much different either.
 
Sony and MS think RT is important, at least.
It's a different story on console, as you can develop the game with RT without having to support a legacy set of lighting and shadows for those without any RT support.
 
It's a different story on console, as you can develop the game with RT without having to support a legacy set of lighting and shadows for those without any RT support.

I don't see how a next gen game is going to be so much different in the RT part, even if developed just for the PS5. All MS games are on PC too, so there you go.
And why do you think it's impossible for MS to cross-develop between Xbox and PC with RT as a requirement? RT GPUs will be fairly old by late 2020 - about 2.5 years on the market by then. People will have to adapt to SSDs sometime too, and 8-core CPUs, etc.

Like DF mentioned, the level of RT in Control might not even be attainable on the next gen consoles.
 
What about people who will have last gen consoles w/o RT support?
Since when have new console games worked on previous gen consoles?
Sure, there's some overlap when a new gen gets released and games are released for both gens, but that doesn't last too long.
 
Back-porting and porting to the Nintendo platform is going to be a real bitch.
 