AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
I'm not entirely sure what you're trying to say as a reply to what I've posted.

The fact that BVH traversal acceleration unit may be located inside a "texture processor" in RDNA2 doesn't make it any more or less flexible than what Turing has - and please do note that we know nothing about the placement of RT h/w in Turing as NV hasn't actually disclosed this information.

The fact that hit evaluations are happening and data paths are used differently than they *seem* to be happening and used in Turing doesn't mean that Turing can't do this in a similar fashion - in fact, NV has already officially confirmed that Turing h/w will support DXR 1.1.
Yes sorry, I think I lost track of what I was writing when I was writing it. Probably needed a heavy handed edit there.
I was thinking along the lines of locality on the silicon. With the texture units playing a major role alongside the compute shaders, having them work together and reuse the same types of hardware could possibly benefit locality (though possibly not; reuse could also be detrimental).

This point here:
(3) In addition, by utilizing the texture processor infrastructure, large buffer for ray storage and BVH caching are eliminated that are typically required in a hardware ray tracing solution as the existing VGPRS and texture cache can be used in its place, which substantially saves area and complexity of the hardware solution

is the only (notable) reason I think it's in the TMU. It's cheaper to put it there.
 
Just to clarify - obviously the topology has changed (reorganisation into WGPs and resource allocation) but the individual units (all 2560 of them in Navi 10) are unchanged from GCN 5/Vega?

Will we see an AT deep dive come release of RDNA2?

I would settle for a deep dive on RDNA 1 :)
 
I hope this means a broad "re-baselining" of Apple on Navi. Vega didn't get much attention, letting the 500-series remain as the standard. But with its own API being a middle ground between OpenGL and Vulkan, and new graphics hardware, I just hope they can get developer interest, and ease of conversion, to the point where Mac finally becomes a viable gaming platform. An expensive and probably worse performing one compared to the competition, but compared to the current Apple? Yea. Anything is an improvement.

Be that as it may, VRS! Woo!
 
Residency map descriptors
Document Type and Number:
United States Patent 10540802
Filing Date: 01/31/2019
Publication Date: 01/21/2020

Abstract:
A processor receives a request to access one or more levels of a partially resident texture (PRT) resource. The levels represent a texture at different levels of detail (LOD) and the request includes normalized coordinates indicating a location in the texture. The processor accesses a texture descriptor that includes dimensions of a first level of the levels and one or more offsets between a reference level and one or more second levels that are associated with one or more residency maps that indicate texels that are resident in the PRT resource. The processor translates the normalized coordinates to texel coordinates in the one or more residency maps based on the offset and accesses, in response to the request, the one or more residency maps based on the texel coordinates to determine whether texture data indicated by the normalized coordinates is resident in the PRT resource
http://www.freepatentsonline.com/10540802.pdf
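
For anyone trying to picture the coordinate translation in that abstract, here is a minimal sketch of how I read it, not what the hardware actually does. The `TextureDescriptor` fields, the single `residency_offset` (counted in mip levels) and the 2D boolean residency map are all my own assumptions for illustration; the abstract only says the descriptor holds the first level's dimensions and one or more offsets.

```python
from dataclasses import dataclass


@dataclass
class TextureDescriptor:
    # Dimensions of the first (most detailed) level, as in the abstract.
    width: int
    height: int
    # Hypothetical: offset in mip levels from the reference level to the
    # level whose dimensions match the residency map.
    residency_offset: int


def residency_texel(desc, u, v):
    """Translate normalized (u, v) in [0, 1] to residency-map texel coords."""
    map_w = max(1, desc.width >> desc.residency_offset)
    map_h = max(1, desc.height >> desc.residency_offset)
    # Clamp so that u == 1.0 or v == 1.0 still lands on the last texel.
    x = min(int(u * map_w), map_w - 1)
    y = min(int(v * map_h), map_h - 1)
    return x, y


def is_resident(desc, residency_map, u, v):
    """Look up whether the data at (u, v) is resident (map is rows of bools)."""
    x, y = residency_texel(desc, u, v)
    return residency_map[y][x]


# Usage: a 1024x1024 texture with a 4x4 residency map (offset of 8 levels).
desc = TextureDescriptor(width=1024, height=1024, residency_offset=8)
resident = [[False] * 4 for _ in range(4)]
resident[1][2] = True
print(residency_texel(desc, 0.6, 0.3), is_resident(desc, resident, 0.6, 0.3))
```

The point is only that one small descriptor is enough to go from normalized coordinates to a residency lookup without touching the texture data itself.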

METHOD AND SYSTEM FOR PARTIAL WAVEFRONT MERGER
Document Type and Number:
United States Patent Application 20200019530
Filing Date: 07/23/2018
Publication Date: 01/16/2020

Abstract:
A method and system for partial wavefront merger is described. Vector processing machines employ the partial wavefront merger to merge partial wavefronts into one or more wavefronts. The system includes a partial wavefront manager and unified registers. The partial wavefront manager detects wavefronts in different single-instruction-multiple-data (“SIMD”) units which contain inactive work items and active work items (hereinafter referred to as “partial wavefronts”), moves the partial wavefronts into one or more SIMD unit(s) and merges the partial wavefronts into one or more wavefront(s). The unified register allows each active work item in the one or more merged wavefront(s) to access the previously allocated registers in the originating SIMD units. Consequently, the contents of the unified registers do not have to be copied to the SIMD unit(s) executing the one or more merged wavefront(s).
http://www.freepatentsonline.com/20200019530.pdf
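
A toy model of the merging idea, as far as I understand the abstract. Everything here is assumed for illustration: the tiny `WAVE_SIZE`, the `(simd_id, active_mask)` representation, and the packing order. Each merged lane keeps its `(simd_id, lane)` origin, which stands in for the unified registers letting a work item reach its original registers without a copy.

```python
WAVE_SIZE = 4  # tiny for illustration; real wavefronts are much wider


def merge_partial_wavefronts(wavefronts):
    """Pack the active lanes of partial wavefronts into as few wavefronts
    as possible.

    wavefronts: list of (simd_id, active_mask), active_mask a list of bools.
    Returns a list of merged wavefronts, each a list of (simd_id, lane)
    origins so registers can still be found where they were allocated.
    """
    active = [
        (simd_id, lane)
        for simd_id, mask in wavefronts
        for lane, is_active in enumerate(mask)
        if is_active
    ]
    return [active[i:i + WAVE_SIZE] for i in range(0, len(active), WAVE_SIZE)]


# Usage: three partial wavefronts holding five active work items in total.
partials = [
    (0, [True, False, False, True]),
    (1, [False, True, True, False]),
    (2, [True, False, False, False]),
]
for merged in merge_partial_wavefronts(partials):
    print(merged)
```

Three half-empty wavefronts become two, one of them full, so fewer issue slots are wasted on inactive lanes.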

PIXELATION OPTIMIZED DELTA COLOR COMPRESSION
Document Type and Number:
United States Patent Application 20200005514
Filing Date: 06/29/2018
Publication Date: 01/02/2020

Abstract:

A technique for compressing an original image is disclosed. According to the technique, an original image is obtained and a delta-encoded image is generated based on the original image. Next, a segregated image is generated based on the delta-encoded image and then the segregated image is compressed to produce a compressed image. The segregated image is generated because the segregated image may be compressed more efficiently than the original image and the delta image.
http://www.freepatentsonline.com/20200005514.pdf
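
To show why delta encoding helps before compression, here is a small sketch. It is not the patented scheme: the abstract does not say what the "segregated image" step actually does, so I skip it, and I use `zlib` as a stand-in for whatever compressor the hardware uses. The 8-bit single-channel row is also just an assumption.

```python
import zlib


def delta_encode(pixels: bytes) -> bytes:
    """Store each 8-bit value as its difference from the previous one."""
    out = bytearray()
    prev = 0
    for p in pixels:
        out.append((p - prev) & 0xFF)  # wrap around modulo 256
        prev = p
    return bytes(out)


def delta_decode(deltas: bytes) -> bytes:
    """Undo delta_encode by accumulating the differences."""
    out = bytearray()
    prev = 0
    for d in deltas:
        prev = (prev + d) & 0xFF
        out.append(prev)
    return bytes(out)


# Usage: a smooth gradient turns into a long run of identical deltas.
gradient = bytes(range(256))
deltas = delta_encode(gradient)
print(len(zlib.compress(gradient)), len(zlib.compress(deltas)))
```

On smooth content the deltas are highly repetitive, so a generic compressor does far better on them than on the raw values.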

TECHNIQUES FOR REDUCING SERIALIZATION IN DIVERGENT CONTROL FLOW
Document Type and Number:
United States Patent Application 20200004585
Filing Date: 06/29/2018
Publication Date: 01/02/2020

Abstract:

Techniques for executing shader programs with divergent control flow on a single instruction multiple data (“SIMD”) processor are disclosed. These techniques include detecting entry into a divergent section of a shader program and, for the work-items that enter the divergent section, placing a task entry into a task queue associated with the target of each work-item. The target is the destination, in code, of any particular work-item, and is also referred to as a code segment herein. The task queues store task entries for code segments generated by different (or the same) wavefronts. A command processor examines task lists and schedules wavefronts for execution by grouping together tasks in the same task list into wavefronts and launching those wavefronts. By grouping tasks from different wavefronts together for execution in the same front, serialization of execution is greatly reduced or eliminated.
http://www.freepatentsonline.com/20200004585.pdf
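
The task-queue regrouping can be sketched roughly like this. Again, this is only my reading of the abstract: the names `enqueue_divergent` and `schedule`, the string labels for code segments and the small `WAVE_SIZE` are all made up for the example.

```python
from collections import defaultdict

WAVE_SIZE = 4  # tiny for illustration


def enqueue_divergent(task_queues, wavefront_id, targets):
    """On entry to a divergent section, file each work-item under the code
    segment (target) it is about to execute."""
    for lane, target in enumerate(targets):
        task_queues[target].append((wavefront_id, lane))


def schedule(task_queues):
    """Group tasks that share a target into new wavefronts, regardless of
    which original wavefront they came from."""
    launches = []
    for target, tasks in task_queues.items():
        for i in range(0, len(tasks), WAVE_SIZE):
            launches.append((target, tasks[i:i + WAVE_SIZE]))
    return launches


# Usage: two wavefronts that each split half/half between two branches.
queues = defaultdict(list)
enqueue_divergent(queues, 0, ["then", "else", "then", "else"])
enqueue_divergent(queues, 1, ["else", "then", "then", "else"])
for target, tasks in schedule(queues):
    print(target, tasks)
```

Without regrouping, each of the two wavefronts would run both branches back to back with half its lanes idle; regrouped, each branch runs once with every lane busy.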

COOPERATIVE WORKGROUP SCHEDULING AND CONTEXT PREFETCHING
Document Type and Number:
United States Patent Application 20200004586
Filing Date: 06/29/2018
Publication Date: 01/02/2020

Abstract:
A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.
http://www.freepatentsonline.com/20200004586.pdf

Thanks to https://twitter.com/Underfox3
 
https://github.com/CLRX/CLRX-mirror/commit/a4c9fdfd191eda8fb206debe778dc9130caa3545

The Navi 12 pieces are starting to fall into place.
Navi 12 is GCN1.5.1 or "Navi 1.1"; it's a similar upgrade to Navi as Vega 20 (GCN1.4.1/5.1) was to Vega 10 (GCN1.4/5), adding support for Deep Learning instructions.
It also swaps GDDR6 for 2048-bit HBM2, but the CU count is the same as Navi 10 at 40 "old CUs" or 20 "Dual CUs".

Are there rumors about the launch date? I haven't seen anything more specific than "2020" so far.
 
I believe Intel said summer 2020 at some point.
I doubt Intel said anything about Navi 12's launch window. (If you meant to post in the Xe thread, they've only said 2020 for the discrete part; summer could refer to announcing Tiger Lake with integrated Xe.)

Are there rumors about the launch date? I haven't seen anything more specific than "2020" so far.
Hard to say, but I doubt it will ever be a retail product (at least any more than other Radeon Pros are).
 
In the context of half of RDNA1 being canned, you can understand why people were confused.
Huh? No RDNA chips have been canned.
There were always three: Navi 10, 12 and 14. Navi 10 and 14 have been released. Navi 12's later schedule is easy to understand, since it's actually RDNA"1.1", adding DL instructions over standard RDNA (like Vega 20 did over Vega 10) as well as switching GDDR6 for HBM2e.
 

With Arcturus on the way this never really made sense to me. Arcturus appears to be a huuuuge chip, apparently running at 1 GHz in a lab at the moment but no doubt destined to run much faster by release. And we know Vega is faster per mm² and per watt than RDNA in terms of compute anyway, so the two products seem to overlap, with one being clearly better.

Now if it was RDNA 2 I could see it: "look at our huge raytracing-enabled chip with 96 GB of HBM2E" is a pretty good pitch for VFX houses. You could easily fit most scenes that would show up in, say, a Netflix or HBO series in RAM, then hardware raytracing gets you 10x or more the render speed. But why AMD would have two DL HBM chips (rumored, at the very least) coming out in the same year is something I can't find the logic in.
 