AMD: Navi Speculation, Rumours and Discussion [2019-2020]

So Navi 10, 12 and 14. Is it confirmed that the 5700 is Navi 10? I'm guessing the Linux patches mostly address Navi 10, so it would be the first chip to release.

Because looking back at the rumors, there were three Navi chips: 20, 12 and 10.

[Image: gOr5Kgs_d.jpg — AdoredTV Navi chart]

Navi 12 had 40 CUs and was supposed to come in at Vega 56 performance.

wccftech had an article in November saying that the 40 CU chip was the first Navi, and it probably took its codename from the chart above.

Though the CU counts for Navi 10 in that chart vs. Navi 20 don't make sense, unless Navi 20 is like Vega 20 on an improved process. Or maybe it was confused with Vega 20 itself, looking at the performance for the given CU counts?

There were also four Navi variants from macOS:
The code for the update in question was spotted by Gigamaxx on the TonyMacx86 forum, which videocardz picked up on. The source code includes four Navi graphics processors: Navi 16, Navi 12, Navi 10 and Navi 9.

https://wccftech.com/four-amd-navi-gpu-variants-leaked-rumored-for-july-2019-launch/

Then we had a rumor four months back that the Navi release had been pushed back to October. Maybe that was confusing it with the next Navi's release?
 
TEXTURE PROCESSOR BASED RAY TRACING ACCELERATION METHOD AND SYSTEM
United States Patent Application 20190197761

http://www.freepatentsonline.com/20190197761.pdf

Seems very similar to RTX, as far as I can imagine how the latter works.
No fixed-function ray reordering / batching is mentioned. The flexibility options coming from interaction with general shaders likely boil down to what DXR offers: batch the rays as well as you can yourself in the ray generation shader.
(After all my initial critique, I'm quite fine with that. Reordering is likely something for the post-hybrid era, and without it the costs in die area make sense.)

I really wonder how they manage a stack per ray. Can one assume an upper bound on how large this stack has to be? And even if one can, that's a lot of memory and bandwidth.
Personally, I've always used a stackless approach on GPU. I must be missing something here...
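For what it's worth, the usual answer on the upper bound: with a binary BVH the live stack can never hold more entries than the tree is deep, and builders normally cap that depth, so a small fixed-size array per ray works. A minimal sketch of that kind of bounded-stack traversal (plain C++ with an invented node layout — not the patent's design, and all names are made up for illustration):

```cpp
// Sketch: iterative BVH traversal with a bounded per-ray stack.
// Invented data layout, NOT the patent's design.
#include <cstdint>

struct Node {
    float   bbox[6];       // AABB: min x/y/z, then max x/y/z
    int32_t left, right;   // child indices; negative means leaf (~index)
};

// Standard slab test against an axis-aligned bounding box.
inline bool intersectAABB(const float b[6], const float org[3], const float invDir[3]) {
    float tmin = 0.0f, tmax = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float t0 = (b[a]     - org[a]) * invDir[a];
        float t1 = (b[a + 3] - org[a]) * invDir[a];
        if (t0 > t1) { float t = t0; t0 = t1; t1 = t; }
        if (t0 > tmin) tmin = t0;
        if (t1 < tmax) tmax = t1;
        if (tmin > tmax) return false;
    }
    return true;
}

inline void recordLeafHit(int32_t /*leafIndex*/) { /* triangle tests would go here */ }

constexpr int kMaxDepth = 64;  // builder-enforced depth cap (assumption)

void traverse(const Node* nodes, const float org[3], const float invDir[3]) {
    int32_t stack[kMaxDepth];  // the per-ray stack in question
    int sp = 0;
    stack[sp++] = 0;           // start at the root
    while (sp > 0) {
        int32_t idx = stack[--sp];
        if (idx < 0) { recordLeafHit(~idx); continue; }  // leaf reached
        const Node& n = nodes[idx];
        if (!intersectAABB(n.bbox, org, invDir)) continue;
        // Pop-one/push-two keeps occupancy bounded by tree depth + 1.
        stack[sp++] = n.left;
        stack[sp++] = n.right;
    }
}

int main() {
    Node root{ {0, 0, 0, 1, 1, 1}, ~0, ~1 };  // root with two leaf children
    float org[3] = {0.5f, 0.5f, -1.0f}, invDir[3] = {1e30f, 1e30f, 1.0f};
    traverse(&root, org, invDir);
}
```

Even at a capped 64 entries, thousands of rays in flight multiply that into real memory and bandwidth, which is exactly why stackless and short-stack schemes stay attractive on GPUs.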
 
So Navi 10, 12 and 14. Is it confirmed that the 5700 is Navi 10? [...]

Using the AdoredTV chart as a reference is a really good idea :D
 
The API shaders at the top of the table are mapped to internal shader stages executed by the hardware. I haven't seen a similar listing for what is done internally by Nvidia.

The shaders that are compiled as primitive shaders are flagged as such, so the option to compile them normally still exists. The automatic primitive shader concepts first discussed by AMD focused on culling, and the automatic path worked by using dependence analysis of a shader to extract operations from a vertex or other shader and place them in an early culling phase ahead of the rest of the shader. If for some reason the compiler could not separate the position calculations from the rest of the shader, it wouldn't be compiled as a primitive shader. If there was shader code that mixed usage of position and attribute data, or perhaps a mode like transparency that prevents much culling from working, that may be a reason for the compiler to skip the primitive shader path and avoid redundant work.
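To make that split concrete, here's a rough CPU-side analogy (plain C++, not shader code, and not AMD's actual compiler output; computeClipPos, backfacing and shadeTriangles are invented names): the position work is hoisted out so triangles can be culled before any attribute work runs. If positions and attributes can't be separated like this, the fallback is the normal compile described above.

```cpp
// CPU-side analogy of the position/attribute split an automatic
// primitive shader performs. Invented names, illustration only.
#include <array>
#include <cstddef>
#include <vector>

struct Vertex   { std::array<float, 3> pos; std::array<float, 2> uv; };
struct VSOutput { std::array<float, 4> clipPos; std::array<float, 2> uv; };

// "Position part" extracted by dependence analysis: touches pos only.
std::array<float, 4> computeClipPos(const Vertex& v) {
    return { v.pos[0], v.pos[1], v.pos[2], 1.0f };  // stand-in for a full transform
}

// Placeholder cull test (real hardware adds frustum/small-triangle tests).
bool backfacing(const std::array<float, 4>& a,
                const std::array<float, 4>& b,
                const std::array<float, 4>& c) {
    float area = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    return area <= 0.0f;
}

std::vector<VSOutput> shadeTriangles(const std::vector<Vertex>& verts) {
    std::vector<VSOutput> out;
    for (std::size_t i = 0; i + 3 <= verts.size(); i += 3) {
        auto p0 = computeClipPos(verts[i]);
        auto p1 = computeClipPos(verts[i + 1]);
        auto p2 = computeClipPos(verts[i + 2]);
        if (backfacing(p0, p1, p2)) continue;  // culled before attribute work
        // "Attribute part" runs only for surviving triangles.
        out.push_back({ p0, verts[i].uv });
        out.push_back({ p1, verts[i + 1].uv });
        out.push_back({ p2, verts[i + 2].uv });
    }
    return out;
}

int main() {
    std::vector<Vertex> tri = { { {0, 0, 0}, {0, 0} },
                                { {1, 0, 0}, {1, 0} },
                                { {0, 1, 0}, {0, 1} } };
    return shadeTriangles(tri).empty() ? 1 : 0;  // front-facing: survives
}
```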

It's not clear if this new iteration of NGG has added features versus the concepts introduced in Vega.
If it's similar, then there are some differences from Nvidia's task and mesh shaders.
Nvidia's path is explicitly separate from the standard geometry pipeline with tessellation and other shaders, with the general argument that outside of certain cases they are more effective. Mesh shading is heavily focused on getting good throughput and efficiency by optimizing the representation and reuse of static geometry. Task shaders can perform a level of decision-making and early culling, being able to vary things like which LOD model the mesh shaders will use, or how many mesh shaders will be launched. There's a more arbitrary set of primitive types that can be fed into that pipeline, and the process exposes a more direct way to control what threads are launched to provide the necessary processing.

Primitive shaders exist within the standard geometry pipeline, which includes tessellation, vertex, and geometry shaders. They don't require the programmer to balance between pipeline types. There's no mention of the sort of reuse or optimization of static geometry, which points to more work being done every frame despite much of it not changing.
The decision-making of the shaders is more limited, since they are different ways of expressing the standard shader types. They can do the same things more efficiently or with more culling, not different things like changing which model is used or explicitly controlling the amount of thread expansion. The primitive types used seem to be standard formats rather than a more generalized set of inputs.
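As a toy illustration of that contrast, here is a CPU-side analogy of the task/mesh split (plain C++; all types and names are invented, not the real API). The task stage makes per-object decisions — pick an LOD, decide how many mesh workgroups to launch — that a primitive shader, being a re-expression of a fixed vertex/geometry stage, doesn't get to make:

```cpp
// Toy CPU-side analogy of the task/mesh split (invented names, not the
// real NVIDIA API). Task stage decides; mesh stage does fixed work.
#include <cstdint>
#include <cstdio>
#include <vector>

struct Meshlet  { uint32_t firstTri, triCount; };
struct LodLevel { std::vector<Meshlet> meshlets; };
struct Object   { std::vector<LodLevel> lods; float distance; };

// "Task shader": decision-making before any geometry work is launched.
uint32_t taskStage(const Object& obj, uint32_t& lodOut) {
    lodOut = obj.distance < 10.f ? 0u : (obj.distance < 50.f ? 1u : 2u);
    if (lodOut >= obj.lods.size()) lodOut = uint32_t(obj.lods.size()) - 1;
    return uint32_t(obj.lods[lodOut].meshlets.size());  // mesh launches
}

// "Mesh shader": fixed work per meshlet of the already-chosen LOD.
void meshStage(const Object& obj, uint32_t lod, uint32_t meshletIdx) {
    const Meshlet& m = obj.lods[lod].meshlets[meshletIdx];
    std::printf("emit %u triangles starting at %u\n", m.triCount, m.firstTri);
}

int main() {
    // Three LODs of one meshlet each; object at medium distance.
    Object obj{ { { { {0, 64} } }, { { {0, 16} } }, { { {0, 4} } } }, 30.f };
    uint32_t lod = 0;
    uint32_t launches = taskStage(obj, lod);  // variable thread expansion
    for (uint32_t i = 0; i < launches; ++i) meshStage(obj, lod, i);
}
```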

That doesn't rule out some overlap or changes going forward. Presentations on task and mesh shading mention the possibility of adding culling elements to mesh shaders similar in concept to what AMD proposes, and AMD alluded to possible future uses of primitive shaders that might allow for more complex behavior. Possibly, the more generalized GS stage may hint at things becoming more flexible as far as what kind of data is passed through the pipeline and how it is mapped to threads in the shader array.



This seems consistent with the BVH texture instructions mentioned in an earlier LLVM commit. I've speculated about which facets of pre-Navi architectures best mapped to the BVH traversal process, and it seemed like it would benefit from hardware with its own independent scheduling and data paths that didn't work in lock-step with the SIMD execution path.
At the time I wondered if either the shared scalar path or texturing path could be evolved to handle this, and each had certain features that might be useful depending on the level of programmability or raw performance.
The texturing path already does a lot of automatic generation of memory accesses and internal scheduling for more complex filtering types, and already handles a memory-dominated workload.
The scalar path was shared hardware in past GPUs, is associated with sequencing decisions for a thread, and has its own data path. However, it was more heavily used and needed more execution capability at the time, and with Navi it has become less separate.
"Fixed function ray intersection engine in a texture processor" - question, this texture processor in amd nomenclature is tmu or some sort of CU ?
 
So... BVH calculations on ray-tracing-enabled Navi are done in the TMUs?

Won't this create a bottleneck in texture mapping throughput?
 
EXTREME-BANDWIDTH SCALABLE PERFORMANCE-PER-WATT GPU ARCHITECTURE
Family ID: 1000003133364
Appl. No.: 15/851476
Filed: December 21, 2017
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/PTO/search-bool.html&r=11&f=G&l=50&co1=AND&d=PG01&s1="Advanced+Micro+Devices".AANM.&OS=AANM/"Advanced+Micro+Devices"&RS=AANM/"Advanced+Micro+Devices"

AMD's attempt to integrate memory dies on top of APUs with a control die.
Not sure if the CPU can also use the memory die.

Seems like a method to stack an APU or GPU with HBM using TSVs.
I keep reminding myself of that patent that mentioned a method to dissipate the heat of a chip across the PCB with copper tubes.
Stacking HBM with an APU would be great to reduce costs, but the problem of dissipating the heat between the stacks should prevent it from happening.

Maybe this way they could do it.

[Image: GsqFUW7.png]
 
Like this

 
Probably the lack of ray intersection engines in the texture processor doesn't help.


Reading the patent, I have a hard time figuring out whether the ray intersection engine is a dedicated hardware engine or just a "concept"/step for calculations done by existing units.
 