AMD: Navi Speculation, Rumours and Discussion [2019-2020]

So Navi 10, 12 and 14. Is it confirmed that the 5700 is Navi 10? I'm guessing the Linux patches mostly address Navi 10, so it would be the first chip to release.

Because looking back at the rumors, there were three Navi chips: 20, 12 and 10.

[Image: gOr5Kgs_d.jpg — AdoredTV Navi chart]

Navi 12 had 40 CUs and was supposed to come in at Vega 56 performance.

wccftech had an article in November saying that the 40 CU chip was the first Navi, and it probably took its codename from the chart above.

Though the CU counts for Navi 10 in that chart vs. Navi 20 don't make sense, unless Navi 20 is like Vega 20 on an improved process. Or maybe it was confused with Vega 20 itself, looking at the performance for the given CU counts?

There were also four Navi variants from macOS:
The code for the update in question was spotted by Gigamaxx on the TonyMacx86 forum, which videocardz picked up on. The source code includes four Navi graphics processors: Navi 16, Navi 12, Navi 10 and Navi 9.

https://wccftech.com/four-amd-navi-gpu-variants-leaked-rumored-for-july-2019-launch/

Then we had a rumor four months back that the Navi release had been pushed back to October. Maybe that was confusing it with the next Navi's release?
 
TEXTURE PROCESSOR BASED RAY TRACING ACCELERATION METHOD AND SYSTEM
United States Patent Application 20190197761

http://www.freepatentsonline.com/20190197761.pdf

Seems very similar to RTX, as far as I can imagine how the latter works.
No fixed-function ray reordering / batching is mentioned. The flexibility options coming from interaction with general shaders likely boil down to what DXR offers: batch the rays as well as you can yourself in the ray generation shader.
(After all my initial critique, I'm quite fine with that. Reordering is likely something for the post-hybrid era, and without it the costs in die area make sense.)

I really wonder how they manage a stack per ray. Can one assume an upper bound on how large this stack has to be? And even if one can, that's a lot of memory and bandwidth.
Personally, I've always used a stackless approach on GPU. I must be missing something here...
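For what it's worth, the usual answer on the upper bound: with a binary BVH the live stack can never hold more entries than the tree is deep, and builders normally cap that depth, so a small fixed-size array per ray works. A minimal sketch of that kind of bounded-stack traversal (plain C++ with an invented node layout — not the patent's design, and all names are made up for illustration):

```cpp
// Sketch: iterative BVH traversal with a bounded per-ray stack.
// Invented data layout, NOT the patent's design.
#include <cstdint>

struct Node {
    float   bbox[6];       // AABB: min x/y/z, then max x/y/z
    int32_t left, right;   // child indices; negative means leaf (~index)
};

// Standard slab test against an axis-aligned bounding box.
inline bool intersectAABB(const float b[6], const float org[3], const float invDir[3]) {
    float tmin = 0.0f, tmax = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float t0 = (b[a]     - org[a]) * invDir[a];
        float t1 = (b[a + 3] - org[a]) * invDir[a];
        if (t0 > t1) { float t = t0; t0 = t1; t1 = t; }
        if (t0 > tmin) tmin = t0;
        if (t1 < tmax) tmax = t1;
        if (tmin > tmax) return false;
    }
    return true;
}

inline void recordLeafHit(int32_t /*leafIndex*/) { /* triangle tests would go here */ }

constexpr int kMaxDepth = 64;  // builder-enforced depth cap (assumption)

void traverse(const Node* nodes, const float org[3], const float invDir[3]) {
    int32_t stack[kMaxDepth];  // the per-ray stack in question
    int sp = 0;
    stack[sp++] = 0;           // start at the root
    while (sp > 0) {
        int32_t idx = stack[--sp];
        if (idx < 0) { recordLeafHit(~idx); continue; }  // leaf reached
        const Node& n = nodes[idx];
        if (!intersectAABB(n.bbox, org, invDir)) continue;
        // Pop-one/push-two keeps occupancy bounded by tree depth + 1.
        stack[sp++] = n.left;
        stack[sp++] = n.right;
    }
}

int main() {
    Node root{ {0, 0, 0, 1, 1, 1}, ~0, ~1 };  // root with two leaf children
    float org[3] = {0.5f, 0.5f, -1.0f}, invDir[3] = {1e30f, 1e30f, 1.0f};
    traverse(&root, org, invDir);
}
```

Even at a capped 64 entries, thousands of rays in flight multiply that into real memory and bandwidth, which is exactly why stackless and short-stack schemes stay attractive on GPUs.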
 
So Navi 10, 12 and 14. Is it confirmed that the 5700 is Navi 10? [...]

Using the AdoredTV chart as a reference is a really good idea :D
 
The API shaders at the top of the table are mapped to internal shader stages executed by the hardware. I haven't seen a similar listing for what is done internally by Nvidia.

The shaders that are compiled as primitive shaders are flagged as such, so the option to compile them normally still exists. The automatic primitive shader concepts first discussed by AMD focused on culling, and the automatic path worked by using dependence analysis of a shader to extract operations from a vertex or other shader and place them in an early culling phase ahead of the rest of the shader. If for some reason the compiler could not separate the position calculations from the rest of the shader, it wouldn't be compiled as a primitive shader. If there was shader code that mixed usage of position and attribute data, or perhaps a mode like transparency that prevents much culling from working, that may be a reason for the compiler to skip the primitive shader path and avoid redundant work.
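To make that split concrete, here's a rough CPU-side analogy (plain C++, not shader code, and not AMD's actual compiler output; computeClipPos, backfacing and shadeTriangles are invented names): the position work is hoisted out so triangles can be culled before any attribute work runs. If positions and attributes can't be separated like this, the fallback is the normal compile described above.

```cpp
// CPU-side analogy of the position/attribute split an automatic
// primitive shader performs. Invented names, illustration only.
#include <array>
#include <cstddef>
#include <vector>

struct Vertex   { std::array<float, 3> pos; std::array<float, 2> uv; };
struct VSOutput { std::array<float, 4> clipPos; std::array<float, 2> uv; };

// "Position part" extracted by dependence analysis: touches pos only.
std::array<float, 4> computeClipPos(const Vertex& v) {
    return { v.pos[0], v.pos[1], v.pos[2], 1.0f };  // stand-in for a full transform
}

// Placeholder cull test (real hardware adds frustum/small-triangle tests).
bool backfacing(const std::array<float, 4>& a,
                const std::array<float, 4>& b,
                const std::array<float, 4>& c) {
    float area = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    return area <= 0.0f;
}

std::vector<VSOutput> shadeTriangles(const std::vector<Vertex>& verts) {
    std::vector<VSOutput> out;
    for (std::size_t i = 0; i + 3 <= verts.size(); i += 3) {
        auto p0 = computeClipPos(verts[i]);
        auto p1 = computeClipPos(verts[i + 1]);
        auto p2 = computeClipPos(verts[i + 2]);
        if (backfacing(p0, p1, p2)) continue;  // culled before attribute work
        // "Attribute part" runs only for surviving triangles.
        out.push_back({ p0, verts[i].uv });
        out.push_back({ p1, verts[i + 1].uv });
        out.push_back({ p2, verts[i + 2].uv });
    }
    return out;
}

int main() {
    std::vector<Vertex> tri = { { {0, 0, 0}, {0, 0} },
                                { {1, 0, 0}, {1, 0} },
                                { {0, 1, 0}, {0, 1} } };
    return shadeTriangles(tri).empty() ? 1 : 0;  // front-facing: survives
}
```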

It's not clear if this new iteration of NGG has added features versus the concepts introduced in Vega.
If it's similar, then there are some differences from Nvidia's task and mesh shaders.
Nvidia's path is explicitly separate from the standard geometry pipeline with tessellation and other shaders, with the general argument that outside of certain cases they are more effective. Mesh shading is heavily focused on getting good throughput and efficiency by optimizing the representation and reuse of static geometry. Task shaders can perform a level of decision-making and early culling, being able to vary things like which LOD model the mesh shaders will use, or how many mesh shaders will be launched. There's a more arbitrary set of primitive types that can be fed into that pipeline, and the process exposes a more direct way to control what threads are launched to provide the necessary processing.

Primitive shaders exist within the standard geometry pipeline, which includes tessellation, vertex, and geometry shaders. They don't require the programmer to balance between pipeline types. There's no mention of the sort of reuse or optimization of static geometry, which points to more work being done every frame despite much of it not changing.
The decision-making of the shaders is more limited, since they are different ways of expressing the standard shader types. They can do the same things more efficiently or with more culling, not different things like changing which model is used or explicitly controlling the amount of thread expansion. The primitive types used seem to be standard formats rather than a more generalized set of inputs.
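As a toy illustration of that contrast, here is a CPU-side analogy of the task/mesh split (plain C++; all types and names are invented, not the real API). The task stage makes per-object decisions — pick an LOD, decide how many mesh workgroups to launch — that a primitive shader, being a re-expression of a fixed vertex/geometry stage, doesn't get to make:

```cpp
// Toy CPU-side analogy of the task/mesh split (invented names, not the
// real NVIDIA API). Task stage decides; mesh stage does fixed work.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Meshlet  { uint32_t firstTri, triCount; };
struct LodLevel { std::vector<Meshlet> meshlets; };
struct Object   { std::vector<LodLevel> lods; float distance; };

// "Task shader": decision-making before any geometry work is launched.
uint32_t taskStage(const Object& obj, uint32_t& lodOut) {
    lodOut = obj.distance < 10.f ? 0u : (obj.distance < 50.f ? 1u : 2u);
    if (lodOut >= obj.lods.size()) lodOut = uint32_t(obj.lods.size()) - 1;
    return uint32_t(obj.lods[lodOut].meshlets.size());  // mesh launches
}

// "Mesh shader": fixed work per meshlet of the already-chosen LOD.
void meshStage(const Object& obj, uint32_t lod, uint32_t meshletIdx) {
    const Meshlet& m = obj.lods[lod].meshlets[meshletIdx];
    std::printf("emit %u triangles starting at %u\n", m.triCount, m.firstTri);
}

int main() {
    // Three LODs of one meshlet each; object at medium distance.
    Object obj{ { { { {0, 64} } }, { { {0, 16} } }, { { {0, 4} } } }, 30.f };
    uint32_t lod = 0;
    uint32_t launches = taskStage(obj, lod);  // variable thread expansion
    for (uint32_t i = 0; i < launches; ++i) meshStage(obj, lod, i);
}
```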

That doesn't rule out some overlap or changes going forward. Presentations on task and mesh shading mention the possibility of adding culling elements to mesh shaders similar in concept to what AMD proposes, and AMD alluded to possible future uses of primitive shaders that might allow for more complex behavior. Possibly, the more generalized GS stage may hint at things becoming more flexible as far as what kind of data is passed through the pipeline and how it is mapped to threads in the shader array.



This seems consistent with the BVH texture instructions mentioned in an earlier LLVM commit. I've speculated about which facets of pre-Navi architectures best mapped to the BVH traversal process, and it seemed like it would benefit from hardware with its own independent scheduling and data paths that didn't work in lock-step with the SIMD execution path.
At the time I wondered if either the shared scalar path or texturing path could be evolved to handle this, and each had certain features that might be useful depending on the level of programmability or raw performance.
The texturing path already does a lot of automatic generation of memory accesses and internal scheduling for more complex filtering types, and already handles a memory-dominated workload.
The scalar path was shared hardware in past GPUs, is associated with sequencing decisions for a thread, and has its own data path. However, it was more heavily used and needed more execution capability at the time, and with Navi it has become less separate.
"Fixed function ray intersection engine in a texture processor" - question, this texture processor in amd nomenclature is tmu or some sort of CU ?
 
So... BVH calculations on ray-tracing-enabled Navi are done in the TMUs?

Won't this create a bottleneck in texture mapping throughput?
 
EXTREME-BANDWIDTH SCALABLE PERFORMANCE-PER-WATT GPU ARCHITECTURE
Family ID: 1000003133364
Appl. No.: 15/851476
Filed: December 21, 2017
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/PTO/search-bool.html&r=11&f=G&l=50&co1=AND&d=PG01&s1="Advanced+Micro+Devices".AANM.&OS=AANM/"Advanced+Micro+Devices"&RS=AANM/"Advanced+Micro+Devices"

AMD's attempt to integrate memory dies on top of APUs with a control die.
Not sure if the CPU can also use the memory die.

Seems like a method to stack an APU or GPU with HBM using TSVs.
I keep reminding myself of that patent that mentioned a method to dissipate the heat of a chip across the PCB with copper tubes.
Stacking HBM with an APU would be great to reduce costs, but the problem of dissipating the heat between the stacks should prevent it from happening.

Maybe this way they could do it.

[Image: GsqFUW7.png]
 
Like this

 
Probably the lack of ray intersection engines in the texture processor doesn't help.


Reading the patent, I have a hard time figuring out whether the ray intersection engine is a dedicated hardware engine or just a "concept"/step for calculations done by existing units.
 