Next gen lighting technologies - voxelised, traced, and everything else spawn

JoeJ · Oct 19, 2019

DavidGraham said:
It can, according to NVIDIA, they did this because they found that at least 30% of gamecode is Integer, they separated INT from FP because they wanted to exploit the parallelism. They call it concurrent FP & INT.

Oops, so i confused their scalar path (which is new and like AMD's) with their also new concurrent vector integer path (which RDNA lacks) - got it.
Thanks!

Then this becomes:

JoeJ said:
So Turing can do 4 ops at a time: vector int, vector float, Tensor, and scalar int

...i guess. (Edit: forgot SFU)

3dcgi · Oct 21, 2019

DavidGraham said:
Each CU in RDNA has:

32 SPs (IEEE 754 FP32 and INT32 vector ALUs)

1 SFU

1 INT32 scalar ALU

4 Texture units

1 scheduling and dispatch unit

units for cache read/writes

Each CU has 2 SIMD32, so 64 SPs, 2 SFU, 2 scalar ALU, 2 instruction dispatch units

Kaotik · Oct 21, 2019

nevermind

OlegSH · Oct 21, 2019

JoeJ said:
They have parallel scalar / vector execution since GCN

Are you sure about that? I've read tons of GCN docs and I've never seen any explicit mention of parallel scalar / vector execution.
RDNA docs mention that pipelined SFU ops can be partially overlapped with FMA SIMD ops. The same docs also mention that just 2 waves can be launched per cycle per CU, so how can they overlap scalar ops with SIMD without losing a SIMD op?

JoeJ · Oct 21, 2019

OlegSH said:
Are you sure about that? I've read tons of GCN docs and I've never seen any explicit mention of parallel scalar / vector execution.

Found no such phrasing either after a quick search, but scalar has its own unit, so it must operate in parallel with the 4 SIMDs (ofc. each working on another wave at a time, so program flow remains serial).

Illustrated here https://de.slideshare.net/DevCentralAMD/gs4106-the-amd-gcn-architecture-a-crash-course-by-layla-mah on slide 28 of 98.

There is also this (http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf):

Up to a maximum of 5 instructions can issue per cycle, not including “internal” instructions.
1 Vector Arithmetic Logic Unit (ALU)
1 Scalar ALU or Scalar Memory Read
1 Vector memory access (Read/Write/Atomic)
1 Branch/Message - s_branch and s_cbranch_
1 Local Data Share (LDS) – 1 Export or Global Data Share (GDS)
1 Internal (s_nop, s_sleep, s_waitcnt, s_barrier, s_setprio)

Which would make some sense if issuing 5 instructions means the 4 SIMDs + scalar unit can be fed this way to have enough work all the time.

... but i'm not 100% sure - no hardware expert, and i guess i'm more confused about RDNA than you are
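To make the quoted issue rule concrete, here is a minimal sketch of how such an arbiter could pick instructions in one cycle. It is a toy model and not AMD's actual hardware logic: the category names, the `pick_issue` function and the lowest-wave-id-first order are all assumptions made for illustration. The only things taken from the quote are the limit of 5, one instruction per category, and that internal instructions do not count.

```python
# Toy model of the quoted rule: per cycle, at most one instruction per
# category, at most five in total, internal instructions not counted.
# Category names and the pick order are illustrative assumptions.
MAX_ISSUE = 5

def pick_issue(next_instr):
    """next_instr maps wave id -> category of that wave's next instruction.

    Returns a dict of category -> wave id chosen for this cycle.
    """
    issued = {}
    for wave, category in sorted(next_instr.items()):
        if category == "internal":
            continue  # assumed to be handled outside the issue limit
        if category not in issued and len(issued) < MAX_ISSUE:
            issued[category] = wave
    return issued

# Six waves waiting, each with one pending instruction.
waves = {0: "valu", 1: "salu", 2: "valu", 3: "vmem", 4: "salu", 5: "lds"}
print(pick_issue(waves))
```

In this example wave 0 gets the vector ALU and wave 1 gets the scalar ALU in the same cycle, while waves 2 and 4 have to wait because their category is already taken. Each wave contributes at most one instruction, so every issued instruction comes from a different wave.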

OlegSH · Oct 21, 2019

JoeJ said:
Up to a maximum of 5 instructions can issue per cycle, not including “internal” instructions.

But there is no word on whether these instructions can be executed without losing execution cycles for SIMDs.

JoeJ said:
Which would make some sense if issuing 5 instructions means the 4 SIMDs + scalar unit can be fed this way to have enough work all the time

A wave scheduler can issue just 1 wave per clk. Launching a scalar op requires selecting and issuing a wave.

JoeJ · Oct 21, 2019

OlegSH said:
But there is no word on whether these instructions can be executed without losing execution cycles for SIMDs.

IIRC, you would only lose SIMD cycles if you had multiple scalar instructions in a row. So one scalar op followed by a vector op is fine. But i never understood the technical reasons, and i assume even in this case the SIMDs can work on other waves if there are some in flight, like always.

OlegSH said:
A wave scheduler can issue just 1 wave per clk. Launching a scalar op requires selecting and issuing a wave.

Maybe this is just about scheduling the ALU-SIMDs, but other units like scalar or memory have their own schedulers?
Guess the 16-wide-SIMDs get the same instruction 4 times for a single wave, each having latency of 4 cycles, while the scalar unit tries to execute 4 scalar ops from 4 other waves.

That's really the point where i'm unsure, and i can't find a proper document to clarify.
But i think i have discussed such things quite often with other devs, some of them professionals, and the assumption that scalar and vector operate concurrently seemed common to everyone, IIRC. I never doubted this until now, but i may be wrong.
I agree, if we look at it from a single wavefront the vector cycle is lost with a scalar op, but the vector units keep saturated with processing other wavefronts so there is no real loss when looking at the entire workload?
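A rough sketch of that argument, as a toy timing model rather than a description of the real hardware. Assumptions: 4 SIMD16 units per CU, a 64-wide wave occupying a SIMD for 64 / 16 = 4 cycles per vector op, the scheduler visiting one SIMD per cycle in round-robin order, and one scalar plus one vector instruction issuable per visit as long as they come from different waves. The function name and the `"s"` / `"v"` program encoding are made up for this example.

```python
NUM_SIMDS = 4      # SIMD16 units per CU (assumed GCN-like layout)
VALU_CYCLES = 4    # one 64-wide wave on a 16-wide SIMD: 64 / 16 cycles

def valu_utilization(waves_per_simd, program, cycles=4000):
    """Fraction of SIMD cycles spent on vector work in this toy model.

    program is the repeating instruction pattern every wave runs,
    e.g. ["s", "v"] for alternating scalar and vector ops.
    """
    # Program counter for every wave on every SIMD.
    pcs = [[0] * waves_per_simd for _ in range(NUM_SIMDS)]
    busy = 0
    for cycle in range(cycles):
        simd = cycle % NUM_SIMDS       # round robin: one SIMD per cycle
        issued = set()                 # categories issued in this visit
        for w in range(waves_per_simd):
            kind = program[pcs[simd][w] % len(program)]
            if kind not in issued:     # at most one instruction per category
                issued.add(kind)
                pcs[simd][w] += 1
                if kind == "v":
                    busy += VALU_CYCLES
    return busy / (cycles * NUM_SIMDS)

print(valu_utilization(1, ["s", "v"]))  # single wave per SIMD
print(valu_utilization(2, ["s", "v"]))  # two waves per SIMD
```

With a single wave per SIMD, every scalar op costs a vector slot and the vector units sit idle half of the time. With two waves per SIMD the waves fall out of step after the first visit, one issues its scalar op while the other issues its vector op, and the vector units stay almost fully busy. That matches the intuition above, but only under the stated assumptions.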

iamw · Oct 22, 2019

OlegSH said:
But there is no word on whether these instructions can be executed without losing execution cycles for SIMDs.

A wave scheduler can issue just 1 wave per clk. Launching a scalar op requires selecting and issuing a wave.

They are parallel. Each different type of instruction is selected from a different wave in the same IB.

JoeJ · Oct 22, 2019

Finally found some evidence / explanation here: https://anteru.net/blog/2018/even-more-compute-shaders/

PSman1700 · Oct 22, 2019

Should be available now on Steam.

Malo · Oct 22, 2019

Honestly to me that seems more like a demo of why you don't need RTX at the moment, especially in a game where almost everything is static.

PSman1700 · Oct 22, 2019

RT doesn't have to be always nicer to the eye, it enables developers to achieve more realistic graphics with less effort. RT in next-gen isn't going to be much different either.

Malo · Oct 22, 2019

PSman1700 said:
RT doesn't have to be always nicer to the eye, it enables developers to achieve more realistic graphics with less effort.

It's only less effort if you're only doing RT. That's years away.

JoeJ · Oct 22, 2019

Malo said:
It's only less effort if you're only doing RT. That's years away.

It's only less effort if you could do thousands of rays per pixel. That will never happen, because till then we'll find enough other stuff worth our effort.
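For a sense of scale, a quick back-of-the-envelope calculation of the ray budget. The 1920x1080 resolution and 60 fps are assumptions picked for the example, and the function name is made up.

```python
def rays_per_second(width, height, fps, rays_per_pixel):
    """Total rays that must be traced per second for a given budget."""
    return width * height * fps * rays_per_pixel

# Assumed target: 1080p at 60 fps, with different per-pixel ray counts.
for rpp in (1, 10, 1000):
    total = rays_per_second(1920, 1080, 60, rpp)
    print(f"{rpp:>5} rays/pixel -> {total / 1e9:.2f} billion rays/s")
```

At 1000 rays per pixel this comes out at roughly 124 billion rays per second, before any shading cost.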

PSman1700 · Oct 22, 2019

Malo said:
It's only less effort if you're only doing RT. That's years away.

Sony and MS think RT is important, at least.

Malo · Oct 22, 2019

PSman1700 said:
Sony and MS think RT is important, at least.

It's a different story on console as you can develop the game with RT not having to support a legacy set of lighting and shadows for those without any RT support.

Deleted member 2197 · Oct 22, 2019

Malo said:
It's a different story on console as you can develop the game with RT not having to support a legacy set of lighting and shadows for those without any RT support.

What about people who will have last gen consoles w/o RT support?

PSman1700 · Oct 22, 2019

Malo said:
It's a different story on console as you can develop the game with RT not having to support a legacy set of lighting and shadows for those without any RT support.

I don't see how a next gen game is going to be so much different in the RT part, even if developed for just the PS5 console. All MS games are PC too so there you go.
And why do you think it's impossible for MS to cross-develop between Xbox and PC with RT as a requirement? RT GPUs will be quite old by late 2020, 2.5 years by then. People will have to adopt SSDs too at some point, and 8-core CPUs etc.

Like DF mentioned, the level of RT in Control might not even be attainable on the next gen consoles.

Kaotik · Oct 22, 2019

pharma said:
What about people who will have last gen consoles w/o RT support?

Since when have new console games worked on previous gen consoles?
Sure, there's some overlap when a new gen gets released and games are released for both gens, but that doesn't last too long.

BRiT · Oct 22, 2019

Back-porting and porting to Nintendo platform is going to be a real bitch.
