AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
A surprising leak came from Komachi on Twitter. Apparently Sapphire is preparing quite a few Radeon RX 5000 models:
RX 5950XT, RX 5950, RX 5900XT, RX 5900, RX 5850XT, RX 5850, RX 5800XT, RX 5800, RX 5750XT, RX 5750, RX 5700XT, RX 5700, RX 5650XT, RX 5650, RX 5600XT, RX 5600, RX 5550XT, RX 5550, RX 5500XT, RX 5500, RX 590XT, RX 590

https://videocardz.com/newz/sapphire-registers-radeon-rx-5950-5900-xt-rx-5850-5800-xt-series-at-eec
 
Yes, because high-margin products come first. Milan is a server CPU, so it's safe to assume that desktop parts and graphics will come later...
Just like Naples was released such a long time before Summit Ridge and Rome such a long time before Matisse? Oh wait...
 
Are we talking Calendar year or Fiscal year on those slides?
 
While Ryan probably knows a crapload more than most of us put together, that post in no way indicates he has actual solid information on any dates, or that AMD has told him how to read the roadmap.
No inside info. But by adding 2021, if AMD takes until 2021 to deliver the hardware, they will say that they delivered it on time as promised. With vague roadmaps you should always take the most conservative interpretation, because that's what the vendor will take (otherwise it wouldn't be vague). Then you can be pleasantly surprised if they beat it.

Also, GPU roadmaps are non-linear.
 
Ehm, what?
All 7+ stuff is 2020.
For what it’s worth, I caught a tidbit straight from TSMC regarding their 5nm process. They were on track for volume production as per previously, but added that the HP variant would be ready for volume production in the later part of next year.
Of course, they said nothing about who or what that volume production was for.
 
For what it’s worth, I caught a tidbit straight from TSMC regarding their 5nm process. They were on track for volume production as per previously, but added that the HP variant would be ready for volume production in the later part of next year.
Of course, they said nothing about who or what that volume production was for.
About 99.99% certainly one of the mobile SoC manufacturers, just like MediaTek is first with 7nm+ (which is already in volume production)
 
About 99.99% certainly one of the mobile SoC manufacturers, just like MediaTek is first with 7nm+ (which is already in volume production)
No.
They are the target market of the regular 5nm process that is already in risk production and that is scheduled for volume production start around March 2020. What was interesting here was the addition of a time for 5nm HP volume production, a process variant suitable for GPUs or desktop/server CPUs. There is a fairly limited number of companies interested in being on the leading edge adopting such a process.
 
No.
They are the target market of the regular 5nm process that is already in risk production and that is scheduled for volume production start around March 2020. What was interesting here was the addition of a time for 5nm HP volume production, a process variant suitable for GPUs or desktop/server CPUs. There is a fairly limited number of companies interested in being on the leading edge adopting such a process.
My guess is FPGAs, 4th gen EPYC, or AMD’s supercomputer custom parts. All high margin products. I believe 2021 was their target for the supercomputer.
 
I ran across a Phoronix article about commits for GFX10 enablement changes for Linux drivers.
https://www.phoronix.com/scan.php?page=news_item&px=RadeonSI-Navi-Merge-Pending
NGG does make an entrance, and primitive shaders are mentioned.
There's a fair bit of plumbing to support changes that can allow for shaders to be flagged as being compiled for NGG, with TES, VS, and GS among the types that can be flagged for being primitive shaders.
There's been an additional merging of stages to support various shader types, as the geometry shader stage has been generalized enough to slot in as the vertex shader:

https://gitlab.freedesktop.org/mesa...iffs#5b30d5f9f14cd7cb6cf8bc05f5869b422ec93c63
(from si_shader.h)
Code:
* API shaders           VS | TCS | TES | GS |pass| PS
* are compiled as:         |     |     |    |thru|
*                          |     |     |    |    |
* Only VS & PS:         VS |     |     |    |    | PS
* GFX6     - with GS:   ES |     |     | GS | VS | PS
*          - with tess: LS | HS  | VS  |    |    | PS
*          - with both: LS | HS  | ES  | GS | VS | PS
* GFX9     - with GS:   -> |     |     | GS | VS | PS
*          - with tess: -> | HS  | VS  |    |    | PS
*          - with both: -> | HS  | ->  | GS | VS | PS
*                          |     |     |    |    |
* NGG      - VS & PS:   GS |     |     |    |    | PS
* (GFX10+) - with GS:   -> |     |     | GS |    | PS
*          - with tess: -> | HS  | GS  |    |    | PS
*          - with both: -> | HS  | ->  | GS |    | PS
*
* -> = merged with the next stage
The number of stages in use doesn't necessarily change, based on what's enabled. In some places, there's a reduction of the last VS stage, though that was a pass-through in prior gens. Fully emulating certain features leverages the GDS and its ordering operations.
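As a rough sketch, the table above can be written as a small lookup function. The names here (`hw_gen`, `map_stages`, `count_hw_stages`, the `COL_*` slots) are made up for illustration and are not Mesa's actual identifiers; the logic only reproduces the rows of the si_shader.h comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names for this sketch; they are not Mesa's real enums. */
enum hw_gen { GEN_GFX6, GEN_GFX9, GEN_NGG };

/* One slot per column of the table: VS | TCS | TES | GS | pass-thru | PS */
enum { COL_VS, COL_TCS, COL_TES, COL_GS, COL_PASS, COL_PS, COL_COUNT };

/* Fill out[] with the hardware stage used for each API column.
 * "" = slot unused, "->" = merged with the next stage. */
void map_stages(enum hw_gen gen, bool tess, bool gs, const char *out[COL_COUNT])
{
    assert(out != NULL);
    bool merged = (gen != GEN_GFX6);      /* GFX9 and NGG merge stages */
    bool ngg = (gen == GEN_NGG);
    const char *last = ngg ? "GS" : "VS"; /* last geometry stage when there is no API GS */

    for (int i = 0; i < COL_COUNT; i++)
        out[i] = "";
    out[COL_PS] = "PS";

    if (tess) {
        out[COL_VS] = merged ? "->" : "LS";
        out[COL_TCS] = "HS";
        out[COL_TES] = gs ? (merged ? "->" : "ES") : last;
    } else {
        out[COL_VS] = gs ? (merged ? "->" : "ES") : last;
    }
    if (gs) {
        out[COL_GS] = "GS";
        if (!ngg)
            out[COL_PASS] = "VS"; /* pass-through VS, absent in the NGG rows */
    }
}

/* Count slots that hold a real hardware stage (not empty, not merged away). */
int count_hw_stages(const char *stages[COL_COUNT])
{
    int n = 0;
    for (int i = 0; i < COL_COUNT; i++)
        if (stages[i][0] != '\0' && strcmp(stages[i], "->") != 0)
            n++;
    return n;
}
```

Calling `map_stages(GEN_NGG, true, true, s)` reproduces the "with both" NGG row, and `count_hw_stages` makes it easy to compare how many stages each generation actually runs for the same API configuration.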

In DSBR-related news, there are a few minor GFX10-specific changes in the code that was introduced with Vega.

From some skimming, it also appears that there's at the very least a Navi 14:
(si_texture.c)
/* Stencil texturing with HTILE doesn't work
 * with mipmapping on Navi10-14. */
if ((sscreen->info.family == CHIP_NAVI10 ||
     sscreen->info.family == CHIP_NAVI12 ||
     sscreen->info.family == CHIP_NAVI14) &&
    base->last_level > 0)
        tex->htile_stencil_disabled = true;

On another note, there's a new addition to the LLVM GFX10 bug feature list, dealing with branch offsets of 0x3f being unsafe in some way and requiring a compiler workaround.
Given this seems to be one off from 64 and related to instruction fetch, and there's already a bug related to controlling the instruction prefetch, this may point to more intensive rework of the instruction pipeline's internals. Not sure how frequently this would come up, though the workarounds may be a bit kludgy.
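To make the 0x3f case concrete, here is a minimal sketch of how a compiler might detect it. It assumes the usual GCN-style branch encoding, where the offset is counted in dwords relative to the instruction after the branch. The function names and the "pad with one nop" fix are my own assumptions for illustration, not what LLVM actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNSAFE_BRANCH_OFFSET 0x3f /* the offset flagged as unsafe on GFX10 */

/* Encoded offset in dwords, relative to the instruction after the branch.
 * Positions are dword indices into the shader binary (assumed layout). */
int16_t branch_offset_dwords(int32_t branch_pos, int32_t target_pos)
{
    int32_t off = target_pos - (branch_pos + 1);
    assert(off >= INT16_MIN && off <= INT16_MAX); /* must fit a 16-bit field */
    return (int16_t)off;
}

/* True if this branch would be encoded with the unsafe offset. */
bool needs_offset_workaround(int32_t branch_pos, int32_t target_pos)
{
    return branch_offset_dwords(branch_pos, target_pos) == UNSAFE_BRANCH_OFFSET;
}

/* Hypothetical fix: number of one-dword nops to place right after the
 * branch, which pushes a forward target from 0x3f to 0x40. */
int nops_to_insert(int32_t branch_pos, int32_t target_pos)
{
    return needs_offset_workaround(branch_pos, target_pos) ? 1 : 0;
}
```

A real pass would have to recompute every other branch after padding, since inserting a dword shifts the offsets of any branch that spans the insertion point.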
 
My guess is FPGAs, 4th gen EPYC, or AMD’s supercomputer custom parts. All high margin products. I believe 2021 was their target for the supercomputer.
The really nice thing about having a pure play foundry at the cutting edge of lithographic (and packaging) technology is that their processes are open to everyone. (Even Intel!) If your product will likely be profitable, then off you go. And if you aren’t able to leverage the technology on offer, maybe your competitor is.
 
What's the difference to Nvidia's Turing? Are primitive shaders now always on? And what's the difference to Nvidia's mesh shaders?
 
No.
They are the target market of the regular 5nm process that is already in risk production and that is scheduled for volume production start around March 2020. What was interesting here was the addition of a time for 5nm HP volume production, a process variant suitable for GPUs or desktop/server CPUs. There is a fairly limited number of companies interested in being on the leading edge adopting such a process.
My guess is FPGAs, 4th gen EPYC, or AMD’s supercomputer custom parts. All high margin products. I believe 2021 was their target for the supercomputer.
Could Apple use 5 nm HP?
 
What's the difference to Nvidia's Turing? Are primitive shaders now always on? And what's the difference to Nvidia's mesh shaders?

The API shaders at the top of the table are mapped to internal shader stages executed by the hardware. I haven't seen a similar listing for what is done internally by Nvidia.

The shaders that are compiled as primitive shaders are flagged as such, so the option to compile them as normal still exists. The automatic primitive shader concepts first discussed by AMD focused on culling, and the automatic path worked by using dependence analysis of a shader to extract the position-related operations in a vertex or other shader and place them in an early culling phase ahead of the rest of the shader. If for some reason the compiler could not separate the position calculations from the rest of the shader, it wouldn't be compiled as a primitive shader. If there was shader code that mixed usage of position and attribute data, or perhaps if there was a mode like transparency that prevented a lot of culling from working, that may be a reason for the compiler to skip the primitive shader path and avoid redundant work.
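The "position first, cull, then everything else" idea can be sketched on the CPU like this. It is only an illustration under my own assumptions: 2D screen-space triangles, counter-clockwise front faces, and made-up names (`signed_area2`, `cull_primitives`). The real hardware path works on wavefronts and does more than back-face culling.

```c
#include <assert.h>
#include <stddef.h>

struct vec2 { float x, y; };
struct tri { struct vec2 p[3]; }; /* positions only, already in screen space */

/* Position-only work: twice the signed area of the triangle.
 * Positive means counter-clockwise, which this sketch treats as front-facing. */
float signed_area2(const struct tri *t)
{
    return (t->p[1].x - t->p[0].x) * (t->p[2].y - t->p[0].y)
         - (t->p[2].x - t->p[0].x) * (t->p[1].y - t->p[0].y);
}

/* Early culling phase: write the indices of surviving triangles to
 * survivors[] and return how many there are. Only these would go on to
 * the (more expensive) attribute part of the shader. */
size_t cull_primitives(const struct tri *tris, size_t n, size_t *survivors)
{
    assert(tris != NULL && survivors != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (signed_area2(&tris[i]) > 0.0f) /* drop back-facing and zero-area */
            survivors[kept++] = i;
    }
    return kept;
}
```

The split only pays off if `signed_area2` depends on nothing but positions, which is the same condition the compiler's dependence analysis has to establish before it can emit a primitive shader.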

It's not clear if this new iteration of NGG has added features versus the concepts introduced in Vega.
If it's similar, then there are some differences from Nvidia's task and mesh shaders.
Nvidia's path is explicitly separate from the standard geometry pipeline with tessellation and other shaders, with the general argument that outside of certain cases they are more effective. Mesh shading is heavily focused on getting good throughput and efficiency by optimizing the representation and reuse of static geometry. Task shaders can perform a level of decision making and advance culling by being able to vary things like what LOD model the mesh shaders will use, or how many mesh shaders will be launched. There's a more arbitrary set of primitive types that can be fed into that pipeline, and the process exposes a more direct way to control what threads are launched to provide the necessary processing.
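For a feel of the kind of decision a task shader can make, here is a small CPU-side sketch: pick an LOD from the distance, then decide how many mesh shader groups to launch. The structure, thresholds, and names are all hypothetical, and it assumes one mesh shader group per meshlet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LOD_COUNT 3

/* Hypothetical model description: meshlet count for each LOD, finest first. */
struct model { unsigned meshlets_per_lod[LOD_COUNT]; };

/* Pick an LOD index from the camera distance; thresholds are ascending. */
unsigned pick_lod(float distance, const float thresholds[LOD_COUNT - 1])
{
    unsigned lod = 0;
    while (lod < LOD_COUNT - 1 && distance >= thresholds[lod])
        lod++;
    return lod;
}

/* Task-shader-style decision: how many mesh shader groups to launch.
 * Returning 0 culls the whole object before any mesh work starts. */
unsigned mesh_groups_to_launch(const struct model *m, float distance,
                               const float thresholds[LOD_COUNT - 1],
                               bool visible)
{
    assert(m != NULL && thresholds != NULL);
    if (!visible)
        return 0;
    return m->meshlets_per_lod[pick_lod(distance, thresholds)];
}
```

This is the part that has no direct equivalent in the primitive shader path as described so far: the amount of downstream work is chosen by the shader itself rather than fixed by the draw call.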

Primitive shaders exist within the standard geometry pipeline, which includes tessellation, vertex, and geometry shaders. It's not going to require balancing between pipeline types by the programmer. There's no mention of the sort of reuse or optimization of static geometry, which points to more work being done every frame despite much of it not changing.
The decision-making of the shaders is more limited, since they are different ways of expressing the standard shader types. They can do the same things more efficiently or with more culling, but not do different things like change what model is used or explicitly control the amount of thread expansion. The primitive types used seem to be more standard formats rather than a more generalized set of inputs.

That doesn't rule out some overlap or changes going forward. Presentations on task and mesh shading mention the possibility of adding culling elements to mesh shaders similar in concept to what AMD proposes for primitive shaders, and AMD alluded to possible future uses of primitive shaders that might allow for more complex behavior. Possibly, the more generalized GS stage hints at things becoming more flexible as far as what kind of data is passed through the pipeline and how it is mapped to threads in the shader array.


TEXTURE PROCESSOR BASED RAY TRACING ACCELERATION METHOD AND SYSTEM
United States Patent Application 20190197761

http://www.freepatentsonline.com/20190197761.pdf
This seems consistent with the BVH texture instructions mentioned in an earlier LLVM commit. I've speculated about what facets of pre-Navi architectures best mapped to the BVH traversal process, and it seemed like traversal benefited from hardware that had its own independent scheduling and data paths that didn't work in lock-step with the SIMD execution path.
At the time I wondered if either the shared scalar path or texturing path could be evolved to handle this, and each had certain features that might be useful depending on the level of programmability or raw performance.
The texturing path already does a lot of automatic generation of memory accesses and internal scheduling for more complex filtering types, and already handles a memory-dominated workload.
The scalar path was shared hardware in past GPUs, is associated with sequencing decisions for a thread, and had its own data path. However, it was more heavily used and needed more execution capability at the time, and with Navi it's become less separate.
 