AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Status
Not open for further replies.
Sure but they don't take a % of the asking price by Apple, they take a fixed amount negotiated far in advance which might be an ok amount but nothing close to the absurd profits Apple demands just because.
I'm pretty sure AMD didn't design an exclusive chip for Apple for free, which needs to be taken into account too, not just the price AMD gets for each chip.
 
Sure but they don't take a % of the asking price by Apple, they take a fixed amount negotiated far in advance which might be an ok amount but nothing close to the absurd profits Apple demands just because.
So we're comparing a percentage we don't really know to a percentage we don't really know?
 
A major quality of this release is that it demonstrates the folly of ascribing a power efficiency metric to an architecture based on a single implementation. The performance per Watt of this little fellow is pretty damn good.
One might hope that this is kept in mind once Ampere and RDNA2 products are compared.
 
A major quality of this release is that it demonstrates the folly of ascribing a power efficiency metric to an architecture based on a single implementation. The performance per Watt of this little fellow is pretty damn good.
One might hope that this is kept in mind once Ampere and RDNA2 products are compared.

I don't know why it should be given special consideration. It's just an underclocked/undervolted part like we've seen plenty of times before, from both vendors.

As for comparisons to Nvidia architectures at similar TDPs it seems to sit between:

Tesla P4 - 4.3 TFLOPS 50W
Tesla T4 - 8.1 TFLOPS (+equal INT) 70W

So it doesn't seem to be much better than Pascal and lags behind Turing, despite the node advantage, which is the same pattern we see on higher TDP products...
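
For anyone who wants to redo the arithmetic, here is a rough sketch of the perf/W comparison. The two Tesla rows are the figures quoted above; the Radeon Pro 5600M row (5.3 TFLOPS at 50W) is an assumption taken from public spec sheets, not from this thread, and all of these are peak paper numbers rather than sustained ones.

```python
# Peak FP32 TFLOPS and board power in watts.
# Tesla figures are as quoted in the post; the Radeon Pro 5600M row is an
# assumption from public spec sheets and is not stated in this thread.
parts = {
    "Tesla P4": (4.3, 50),
    "Tesla T4": (8.1, 70),
    "Radeon Pro 5600M (assumed)": (5.3, 50),
}


def gflops_per_watt(tflops, watts):
    """Convert peak TFLOPS and board power into GFLOPS per watt."""
    return tflops * 1000.0 / watts


efficiency = {name: gflops_per_watt(t, w) for name, (t, w) in parts.items()}

# Print from least to most efficient on paper.
for name, value in sorted(efficiency.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.0f} GFLOPS/W")
```

On these paper numbers the assumed 5600M figure lands between the two Tesla parts, which is all the comparison above claims.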
 
This launch of an HBM2-backed Apple-exclusive midrange part makes one wonder. Is HBM2 still expensive? I guess it is, since it has appeared only on "exclusive" or high-end products.

AMD's plans for HBM2-based mainstream Vega models sound so weird.
 
I don't know why it should be given special consideration. It's just an underclocked/undervolted part like we've seen plenty of times before, from both vendors.

As for comparisons to Nvidia architectures at similar TDPs it seems to sit between:

Tesla P4 - 4.3 TFLOPS 50W
Tesla T4 - 8.1 TFLOPS (+equal INT) 70W

So it doesn't seem to be much better than Pascal and lags behind Turing, despite the node advantage, which is the same pattern we see on higher TDP products...
I commented on what was going on at the RX 5700 XT release, and which IMO has been taking place with embarrassing frequency here: very strong language has been used to describe tiny percentages, and/or all conclusions have been drawn from a single data point. That would draw criticism if found in a middle school science project, and really has no place here.
(And incidentally, there is nothing that says that two different architectures on different processes will respond identically on a frequency/power curve; in fact, the differences have been shown in graphs repeatedly. So when you have a single data point, it makes sense to be very non-committal in your language about these things. And if you’re not, what does it say about your motives?)

As an aside, this implementation will differ in its performance profile as well, since it improves bandwidth/FLOP significantly over its siblings.
 
So what's the fundamental change in conclusion now that we have two datapoints each?
They aren’t terribly comparable though, are they?
The new Navi GPU is an alternate take. Still performance oriented, but with power draw as a much stronger design criterion. I would love to have something like this in a desktop computer that was like a half-height XSX. Or the Apple G4 Cube, if you will. Rather than having a desktop computer that is essentially the same as the one I built for Quake3 25 years ago, only now with 8 times the power draw. Yuck. Today's desktop computers feel like the American muscle cars of the sixties. Yeah, they are the Tyrannosaurus Rexes in terms of power, but they are still an evolutionary dead end.
 
Are there any Navi 10 die shots? An HBM2 controller would be visible if it's the same chip and not Navi 12.

The specifications for the new MacBook GPU match what is known about Navi12 (CU count, memory controller count) from benchmark/diagnostic leaks.
Navi12 is definitely a separate chip because it has a slightly different compiler target supporting more instructions than Navi10 (https://forum.beyond3d.com/threads/...nd-discussion-2019.61042/page-90#post-2103539).
 
They aren’t terribly comparable though, are they?
Well, we now have a take on Turing and Navi from running things near (or past) their respective sweet spots in the performance-oriented desktop versions, as well as from variants that have to cram as much compute and memory bandwidth as possible into a comparatively limited power envelope.

Personally, I'm not seeing fundamentally different conclusions architecture-wise. What's missing, though, is sustained performance numbers for the power-constrained chips. Without them, I can only derive that the T4 has about 50% more TFLOPS at 40-ish% higher power than the Radeon Pro 5600M. But that's probably a very short-term or rather a very serial boost number on the T4. Under load, I expect its frequencies to drop far more significantly than with the Radeon part.
 
I commented on what was going on at the RX 5700 XT release, and which IMO has been taking place with embarrassing frequency here: very strong language has been used to describe tiny percentages, and/or all conclusions have been drawn from a single data point. That would draw criticism if found in a middle school science project, and really has no place here.
(And incidentally, there is nothing that says that two different architectures on different processes will respond identically on a frequency/power curve; in fact, the differences have been shown in graphs repeatedly. So when you have a single data point, it makes sense to be very non-committal in your language about these things. And if you’re not, what does it say about your motives?)

As an aside, this implementation will differ in its performance profile as well, since it improves bandwidth/FLOP significantly over its siblings.

Yeah, I do understand your point, and I would have taken it more seriously if you hadn't ended your post with the sentence that you did. It will differ? Surely you do have all the data points to back that up?

Also "there is nothing that says"? Nothing? Really truly nothing? Like there's no science behind it or something?

See? Maybe it's not so much about how something is written as it is about understanding it in the context of a speculation thread. My two cents.

P.S. I do make an effort to always use non-committal language in my posts, but I understand when mistakes are made, and I make them myself. The fact that there's one very committed claim (and another which is also close) in your post about the need to be cautious with language speaks volumes IMO.
 
Some new patent applications specifically for RT from AMD

20200193681
MECHANISM FOR SUPPORTING DISCARD FUNCTIONALITY IN A RAY TRACING CONTEXT
Abstract
Described herein is a technique for performing ray tracing. According to this technique, instead of executing intersection and/or any hit shaders during traversal of an acceleration structure to determine the closest hit for a ray, an acceleration structure is fully traversed in an invocation of a shader program, and the closest intersection with a triangle is recorded in a data structure associated with the material of the triangle. Later, a scheduler launches waves by grouping together multiple data items associated with the same material. The rays processed by that wave are processed with a continuation ray, rather than the full original ray. A continuation ray starts from the previous point of intersection and extends in the direction of the original ray. These steps help counter divergence that would occur if a single shader program that inlined the intersection and any hit shaders were executed.
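
A minimal sketch of how I read that abstract, not AMD's actual implementation: closest hits are bucketed by material, waves are formed per material, and shading restarts from a continuation ray. The function names, the tuple layout and the wave size of 4 are all hypothetical and only there for illustration.

```python
from collections import defaultdict

WAVE_SIZE = 4  # hypothetical; chosen small so the grouping is easy to see


def record_closest_hits(hits):
    """Bucket (ray_id, material_id, t) closest-hit records by material."""
    buckets = defaultdict(list)
    for ray_id, material_id, t in hits:
        buckets[material_id].append((ray_id, t))
    return buckets


def schedule_waves(buckets, wave_size=WAVE_SIZE):
    """Form waves so that every item in a wave shares one material."""
    waves = []
    for material_id in sorted(buckets):
        items = buckets[material_id]
        for start in range(0, len(items), wave_size):
            waves.append((material_id, items[start:start + wave_size]))
    return waves


def continuation_ray(origin, direction, t):
    """Start at the previous hit point and keep the original direction."""
    hit_point = tuple(o + t * d for o, d in zip(origin, direction))
    return hit_point, direction


# Illustrative data: six rays hitting three materials in mixed order.
hits = [(0, 2, 1.0), (1, 0, 3.5), (2, 2, 0.5), (3, 1, 2.0), (4, 0, 4.0), (5, 2, 1.5)]
waves = schedule_waves(record_closest_hits(hits))
```

The point of the grouping is that each wave then runs a single material's shader, which is the divergence argument the abstract makes.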


20200193682
MERGED DATA PATH FOR TRIANGLE AND BOX INTERSECTION TEST IN RAY TRACING
Abstract
Described herein is a merged data path unit that has elements that are configurable to switch between different instruction types. The merged data path unit is a pipelined unit that has multiple stages. Between different stages lie multiplexor layers that are configurable to route data from functional blocks of a prior stage to a subsequent stage. The manner in which the multiplexor layers are configured for a particular stage is based on the instruction type executed at that stage. In some implementations, the functional blocks in different stages are also configurable by the control unit to change the operations performed. Further, in some implementations, the control unit has sideband storage that stores data that "skips stages." An example of a merged data path used for performing a ray-triangle intersection test and a ray-box intersection test is also described herein.

20200193683
ROBUST RAY-TRIANGLE INTERSECTION
Abstract
A technique for classifying a ray tracing intersection with a triangle edge or vertex avoids either rendering holes or multiple hits of the same ray for different triangles. The technique employs a tie-breaking scheme in which certain types of edges are classified as hits and certain types of edges are classified as misses. The test is performed in a coordinate space that comprises a projection into the viewspace of the ray, and thus where the ray direction has a non-zero magnitude in one axis (e.g., z) but a zero magnitude in the two other axes. In this coordinate space, edges are classified as one of top, bottom, left, and right, and an intersection on an edge counts as a hit if the intersection hits a top or left edge, but a miss if the intersection hits a bottom or right edge. Vertices are processed in a related manner.
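
This reads like the rasterizer top-left fill rule applied in ray space. Below is a rough sketch of that rule in 2D, assuming counter-clockwise triangles with y pointing up; the exact conventions in the patent may differ, and the helper names are mine.

```python
def edge_value(a, b, p):
    """Signed area test: positive when p lies to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def is_top_or_left(a, b):
    """Classify edge a->b of a counter-clockwise triangle (y up).

    A downward edge has the interior on its right-hand screen side, so it
    is a left edge; a horizontal edge running in -x is a top edge.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    return dy < 0 or (dy == 0 and dx < 0)


def hits(tri, p):
    """Point-in-triangle test with top-left tie-breaking on edges."""
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        e = edge_value(a, b, p)
        if e < 0:
            return False  # strictly outside this edge
        if e == 0 and not is_top_or_left(a, b):
            return False  # on a bottom or right edge: counts as a miss
    return True


# Two triangles sharing the diagonal of a square, both counter-clockwise.
tri_a = ((0, 0), (2, 0), (2, 2))
tri_b = ((0, 0), (2, 2), (0, 2))
on_shared_edge = (1, 1)
```

With this tie-break a point on the shared diagonal is claimed by exactly one of the two triangles, so there is neither a hole nor a double hit.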

20200193684
EFFICIENT DATA PATH FOR RAY TRIANGLE INTERSECTION
Abstract
Described herein is a technique for performing ray-triangle intersection without a floating point division unit. A division unit would be useful for a straightforward implementation of a certain type of ray-triangle intersection test that is useful in ray tracing operations. This certain type of ray-triangle intersection test includes a step that transforms the coordinate system into the viewspace of the ray, thereby reducing the problem of intersection to one of 2D triangle rasterization. However, a straightforward implementation of this transformation requires floating point division, as the transformation utilizes a shear operation to set the coordinate system such that the magnitudes of the ray direction on two of the axes are zero. Instead of using the most straightforward implementation of this transform, the technique described herein scales the entire coordinate system by the magnitude of the ray direction in the axis that is the denominator of the shear ratio, removing division.
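
A small sketch of why dropping the division is safe, assuming the ray direction has already been permuted so that z is the dominant axis and the vertices are given relative to the ray origin. Scaling every sheared coordinate by the z component of the direction multiplies each 2D edge function by that component squared, which is positive, so the signs used for the hit test are unchanged. The function names and sample values are hypothetical, and exact rationals are used only to make the comparison exact.

```python
from fractions import Fraction


def shear_with_division(v, d):
    """Straightforward shear into ray space: needs dx/dz and dy/dz."""
    sx, sy = Fraction(d[0]) / d[2], Fraction(d[1]) / d[2]
    return (v[0] - sx * v[2], v[1] - sy * v[2])


def shear_without_division(v, d):
    """Same transform with every coordinate scaled by dz: no division."""
    return (v[0] * d[2] - d[0] * v[2], v[1] * d[2] - d[1] * v[2])


def edge_functions(a, b, c):
    """2D edge functions of the projected triangle about the origin."""
    u = c[0] * b[1] - c[1] * b[0]
    v = a[0] * c[1] - a[1] * c[0]
    w = b[0] * a[1] - b[1] * a[0]
    return u, v, w


# Illustrative integer data: ray direction and origin-relative vertices.
d = (1, 2, -4)
tri = ((1, 0, 5), (-2, 3, 4), (0, -1, 6))

with_div = edge_functions(*(shear_with_division(v, d) for v in tri))
no_div = edge_functions(*(shear_without_division(v, d) for v in tri))
```

Each entry of `no_div` equals the matching entry of `with_div` times `d[2] ** 2`, so a sign-based inside test gives the same answer either way.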


20200193685
WATER TIGHT RAY TRIANGLE INTERSECTION WITHOUT RESORTING TO DOUBLE PRECISION
Abstract
Described herein is a technique for performing ray-triangle intersection test in a manner that produces watertight results. The technique involves translating the coordinates of the triangle such that the origin is at the origin of the ray. The technique involves projecting the coordinate system into the viewspace of the ray. The technique then involves calculating barycentric coordinates and interpolating the barycentric coordinates to get a time of intersect. The signs of the barycentric coordinates indicate whether a hit occurs. The above calculations are performed with a non-directed floating point rounding mode to provide watertightness. A non-directed rounding mode is one in which the mantissa of a rounded number is rounded in a manner that is not dependent on the sign of the number.
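
The steps in that abstract can be sketched roughly as below. This follows the general shape of published watertight ray-triangle tests (translate, shear into ray space, edge functions as unnormalized barycentrics, interpolate depth), and is not the patented data path. It assumes a non-zero ray direction and does not cull back faces. Python floats use round-to-nearest-even, which does not depend on sign, but this sketch makes no claim to reproduce the hardware's rounding behaviour.

```python
def intersect(origin, direction, tri):
    """Return (t, barycentrics) for a hit, or None for a miss."""
    # Use the dominant direction axis as z; swap x/y to keep the winding.
    kz = max(range(3), key=lambda i: abs(direction[i]))
    kx, ky = (kz + 1) % 3, (kz + 2) % 3
    if direction[kz] < 0:
        kx, ky = ky, kx
    sx = direction[kx] / direction[kz]
    sy = direction[ky] / direction[kz]
    sz = 1.0 / direction[kz]

    # Translate vertices so the ray origin is the coordinate origin, then
    # shear so the ray runs along +z in the new space.
    pts = []
    for vert in tri:
        rel = [vert[i] - origin[i] for i in range(3)]
        pts.append((rel[kx] - sx * rel[kz], rel[ky] - sy * rel[kz], sz * rel[kz]))
    a, b, c = pts

    # Unnormalized barycentric coordinates (2D edge functions).
    u = c[0] * b[1] - c[1] * b[0]
    v = a[0] * c[1] - a[1] * c[0]
    w = b[0] * a[1] - b[1] * a[0]

    # Mixed signs mean the origin lies outside the projected triangle.
    if (u < 0 or v < 0 or w < 0) and (u > 0 or v > 0 or w > 0):
        return None
    det = u + v + w
    if det == 0:
        return None  # degenerate or edge-on triangle

    # Interpolate the sheared z with the barycentrics to get the hit time.
    t = (u * a[2] + v * b[2] + w * c[2]) / det
    if t < 0:
        return None  # intersection is behind the ray origin
    return t, (u / det, v / det, w / det)


unit_tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = intersect((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), unit_tri)
```

Because neighbouring triangles evaluate a shared edge from the same two vertices, the sign test agrees on both sides of the edge, which is where the watertightness comes from.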
 
A few days of looking over the PS5 footage, and Pragmata is definitely the most interesting title in technical terms. There's so much raytracing that the entire image is a noisy flicker; we've got GI, reflections, and who knows what else. And there's plenty of time for denoising to be added, as it won't release for another two years. Still, it appears RDNA2 raytracing has some solid level of efficiency in implementation:

 