Deleted member 13524 (Guest):
Of which a decent chunk is then pocketed by Apple...
> Of which a decent chunk is then pocketed by Apple...

And OEMs don't take a share for the discrete graphics cards they assemble, certify, test and distribute?
> Apple just updates its webpage out of the blue and AMD says nothing.

They did say something: https://www.amd.com/en/press-releas...600m-mobile-gpu-brings-desktop-class-graphics

So it's a 40 CU / 20 WGP GPU clocked at 1 GHz with two very low-clocked HBM2 stacks, on a 50 W TDP.
> And OEMs don't take a share for the discrete graphics cards they assemble, certify, test and distribute?

Sure, but they don't take a percentage of Apple's asking price; they take a fixed amount negotiated far in advance, which might be an OK amount but nothing close to the absurd profits Apple demands just because.
> Sure but they don't take a % of the asking price by Apple, they take a fixed amount negotiated far in advance which might be an ok amount but nothing close to the absurd profits Apple demands just because.

I'm pretty sure AMD didn't design an exclusive chip for Apple for free, which needs to be taken into account too, not just the price AMD gets per chip.
> Sure but they don't take a % of the asking price by Apple, they take a fixed amount negotiated far in advance which might be an ok amount but nothing close to the absurd profits Apple demands just because.

So we're comparing a percentage we don't really know to a percentage we don't really know?
A major quality of this release is that it demonstrates the folly of ascribing a power-efficiency metric to an architecture based on a single implementation. The performance per watt of this little fellow is pretty damn good.
One might hope that this is kept in mind once Ampere and RDNA2 products are compared.
> Is HBM2 still expensive?

Yes, but demand is the big player here.
> ...since it has appeared only on "exclusive" or high-end products

Yeah, where volumes are small, so you don't have to fight all the HPC and networking horseshit over KGSD capacity.
> I don't know why it should be taken in special consideration. It's just an underclocked/undervolted part like we've seen plenty of times before, from both vendors.

I commented on what was going on at the RX 5700 XT release, and on what IMO has been taking place with embarrassing frequency here: very strong language used to describe tiny percentages, and/or conclusions drawn from a single data point. That would draw criticism if found in a middle school science project, and really has no place here.
As for comparisons to Nvidia architectures at similar TDPs, it seems to sit between:
Tesla P4 - 4.3 TFLOPS, 50 W
Tesla T4 - 8.1 TFLOPS (+ equal INT), 70 W
So it doesn't seem to be much better than Pascal and lags behind Turing, despite the node advantage, which is the same pattern we see on higher-TDP products...
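Plugging the thread's numbers into a quick perf/W calculation (the 5600M FLOPS figure below is my estimate from 40 CUs x 64 lanes x 2 FLOP/clock at the quoted 1 GHz, not an official figure):

```python
def gflops_per_watt(tflops, watts):
    """Simple performance-per-watt metric in GFLOPS/W."""
    return tflops * 1000 / watts

# 40 CU x 64 lanes x 2 FLOP/clk x 1 GHz ~= 5.12 TFLOPS (estimate), at the 50 W TDP
pro_5600m = gflops_per_watt(40 * 64 * 2 * 1.0e9 / 1e12, 50)
tesla_p4 = gflops_per_watt(4.3, 50)   # Pascal, 50 W
tesla_t4 = gflops_per_watt(8.1, 70)   # Turing, 70 W
```

This gives roughly 102 GFLOPS/W for the 5600M, versus 86 for the P4 and 116 for the T4, which is consistent with "sits between" above.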
> So what's the fundamental change in conclusion now that we have two datapoints each?

They aren't terribly comparable though, are they?
Are there any Navi 10 die shots? The HBM2 controller would be visible if it's the same chip and not Navi 12.
> They aren't terribly comparable though, are they?

Well, we now have a take on Turing and Navi from running things near (or past) their respective sweet spots in the performance-oriented desktop versions, as well as from variants that have to cram as much compute and memory bandwidth as possible into a comparatively limited power envelope.
> I commented on what was going on at rx5700xt release and which IMO has been taking place with embarrasing frequency here - that very strong language has been used to describe tiny percentages, and/or all conclusions drawn from a single data point. Which would draw critisism if found in a middle school science project, and really has no place here.
(And incidentally, there is nothing that says that two different architectures on different processes will respond identically on a frequency/power curve; in fact, the differences have been shown in graphs repeatedly. So when you have a single data point, it makes sense to be very non-committal in your language about these things. And if you're not, what does it say about your motives?)
As an aside, this implementation will differ in its performance profile as well, since it improves bandwidth/FLOP significantly over its siblings.
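The bandwidth/FLOP point can be sanity-checked with AMD's published figures (394 GB/s and 5.3 TFLOPS for the Pro 5600M, 448 GB/s and 9.75 TFLOPS for the desktop RX 5700 XT; these numbers are my look-ups, not figures from the thread):

```python
def bytes_per_flop(bw_gbs, tflops):
    """Memory bytes available per FLOP of peak compute."""
    return bw_gbs * 1e9 / (tflops * 1e12)

pro_5600m = bytes_per_flop(394, 5.3)    # two low-clocked HBM2 stacks
rx_5700xt = bytes_per_flop(448, 9.75)   # GDDR6 Navi 10
```

That works out to roughly 0.074 vs 0.046 bytes/FLOP, about 60% more bandwidth per unit of compute than its desktop sibling.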
Abstract
Described herein is a technique for performing ray tracing. According to this technique, instead of executing intersection and/or any hit shaders during traversal of an acceleration structure to determine the closest hit for a ray, an acceleration structure is fully traversed in an invocation of a shader program, and the closest intersection with a triangle is recorded in a data structure associated with the material of the triangle. Later, a scheduler launches waves by grouping together multiple data items associated with the same material. The rays processed by that wave are processed with a continuation ray, rather than the full original ray. A continuation ray starts from the previous point of intersection and extends in the direction of the original ray. These steps help counter divergence that would occur if a single shader program that inlined the intersection and any hit shaders were executed.
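A rough way to picture the scheme in this abstract, as a toy Python sketch. The scene representation, the wave size, and all names below are my own simplifications for illustration, not anything from the patent:

```python
from collections import defaultdict

WAVE_SIZE = 4  # arbitrary toy wave width

def closest_hit(ray, scene):
    """Stand-in for full acceleration-structure traversal: test everything,
    keep only the closest intersection (distance t plus the triangle's material)."""
    best = None
    for tri in scene:
        t = tri["hit"](ray)          # None, or parametric distance along the ray
        if t is not None and (best is None or t < best[0]):
            best = (t, tri["material"])
    return best

def schedule(rays, scene):
    # Phase 1: traverse fully, recording the closest hit in a per-material bucket.
    buckets = defaultdict(list)
    for origin, direction in rays:
        hit = closest_hit((origin, direction), scene)
        if hit:
            t, material = hit
            # Continuation ray: starts at the previous hit point, keeps the
            # original direction, instead of carrying the full original ray.
            cont_origin = tuple(o + t * d for o, d in zip(origin, direction))
            buckets[material].append(((cont_origin, direction), t))
    # Phase 2: launch waves that group data items sharing a material,
    # so the shading work in each wave is coherent.
    waves = []
    for material, items in buckets.items():
        for i in range(0, len(items), WAVE_SIZE):
            waves.append((material, items[i:i + WAVE_SIZE]))
    return waves

# Tiny two-material scene: "hit" callables play the role of intersection tests.
scene = [
    {"material": "metal", "hit": lambda ray: 2.0},
    {"material": "glass", "hit": lambda ray: 1.0 if ray[1][0] > 0 else None},
]
rays = [((0, 0, 0), (1, 0, 0)), ((0, 0, 0), (-1, 0, 0))]
waves = schedule(rays, scene)
```

Each resulting wave touches exactly one material, which is the divergence-countering property the abstract describes.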
Abstract
Described herein is a merged data path unit that has elements that are configurable to switch between different instruction types. The merged data path unit is a pipelined unit that has multiple stages. Between different stages lie multiplexor layers that are configurable to route data from functional blocks of a prior stage to a subsequent stage. The manner in which the multiplexor layers are configured for a particular stage is based on the instruction type executed at that stage. In some implementations, the functional blocks in different stages are also configurable by the control unit to change the operations performed. Further, in some implementations, the control unit has sideband storage that stores data that "skips stages." An example of a merged data path used for performing a ray-triangle intersection test and a ray-box intersection test is also described herein.
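The configurable routing idea can be caricatured in a few lines. The two "instruction types" and the functional-block names below are invented for illustration; the real unit is a hardware pipeline, not a lookup table:

```python
# Stage functional blocks the two instruction types share.
STAGE1_BLOCKS = {"mul": lambda a, b: a * b, "sub": lambda a, b: a - b}
STAGE2_BLOCKS = {"add": lambda a, b: a + b, "min": min}

# Per-instruction-type configuration: which block each stage uses, i.e. how
# the mux layer between the stages routes data (names are hypothetical).
CONFIG = {
    "ray_tri": {"s1": "mul", "s2": "add"},   # dot-product-style work
    "ray_box": {"s1": "sub", "s2": "min"},   # slab-test-style work
}

def run(op, a, b, c):
    """One pass through the merged two-stage data path for instruction type op."""
    cfg = CONFIG[op]
    s1 = STAGE1_BLOCKS[cfg["s1"]](a, b)       # stage 1 functional block
    # "Mux layer": route the stage-1 result plus operand c into stage 2.
    return STAGE2_BLOCKS[cfg["s2"]](s1, c)
```

The same pipeline thus services both a multiply-accumulate-shaped operation and a subtract-min-shaped one, switched purely by configuration, which is the gist of sharing one data path between ray-triangle and ray-box tests.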
Abstract
A technique for classifying a ray tracing intersection with a triangle edge or vertex avoids either rendering holes or multiple hits of the same ray for different triangles. The technique employs a tie-breaking scheme in which certain types of edges are classified as hits and certain types of edges are classified as misses. The test is performed in a coordinate space that comprises a projection into the viewspace of the ray, and thus where the ray direction has a non-zero magnitude in one axis (e.g., z) but a zero magnitude in the two other axes. In this coordinate space, edges are classified as one of top, bottom, left, and right, and an intersection on an edge counts as a hit if the intersection hits a top or left edge, but a miss if the intersection hits a bottom or right edge. Vertices are processed in a related manner.
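The tie-breaking scheme is essentially the classic top-left fill rule from 2D rasterization, which can be sketched as below. Conventions assumed here (mine, not the patent's): y axis up, counter-clockwise winding; the patent applies the analogous classification in the ray's projected coordinate space and also handles vertices:

```python
def edge(a, b, p):
    """Signed area of (a, b, p); > 0 when p is strictly left of directed edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def is_top_left(a, b):
    """Top edge: horizontal and pointing -x. Left edge: pointing -y (CCW, y-up)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return (dy == 0 and dx < 0) or dy < 0

def covered(tri, p):
    """Point-in-triangle with boundary ties broken by the top-left rule:
    an exact edge hit counts only on top or left edges."""
    for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
        e = edge(a, b, p)
        if e < 0 or (e == 0 and not is_top_left(a, b)):
            return False
    return True

t1 = ((0, 0), (4, 0), (0, 4))
t2 = ((4, 0), (4, 4), (0, 4))   # shares the diagonal edge with t1
p = (2, 2)                      # lies exactly on that shared edge
```

A point on the shared edge is claimed by exactly one of the two triangles, so there is neither a hole nor a double hit, which is exactly the property the abstract is after.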
Abstract
Described herein is a technique for performing ray-triangle intersection without a floating point division unit. A division unit would be useful for a straightforward implementation of a certain type of ray-triangle intersection test that is useful in ray tracing operations. This certain type of ray-triangle intersection test includes a step that transforms the coordinate system into the viewspace of the ray, thereby reducing the problem of intersection to one of 2D triangle rasterization. However, a straightforward implementation of this transformation requires floating point division, as the transformation utilizes a shear operation to set the coordinate system such that the magnitudes of the ray direction on two of the axes are zero. Instead of using the most straightforward implementation of this transform, the technique described herein scales the entire coordinate system by the magnitude of the ray direction in the axis that is the denominator of the shear ratio, removing division.
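The claimed equivalence is easy to check numerically: scaling the coordinate system by the denominator instead of dividing multiplies every edge function by that denominator squared, which never changes its sign. A sketch with made-up sample values (all names and numbers are mine; z is taken as the ray's major axis):

```python
def shear_divide(p, d):
    """Classic shear: divide by the ray direction's major-axis (z) component."""
    return (p[0] - d[0] / d[2] * p[2], p[1] - d[1] / d[2] * p[2])

def shear_scaled(p, d):
    """Division-free variant: scale the whole coordinate system by d_z instead."""
    return (p[0] * d[2] - d[0] * p[2], p[1] * d[2] - d[1] * p[2])

def barycentrics(a, b, c):
    """Edge functions evaluated at the 2D origin, where the sheared ray passes."""
    return (b[0] * c[1] - b[1] * c[0],
            c[0] * a[1] - c[1] * a[0],
            a[0] * b[1] - a[1] * b[0])

d = (1.0, -2.0, 4.0)                                           # ray direction
tri = [(-1.0, -1.0, 5.0), (2.0, -1.0, 6.0), (0.0, 2.0, 4.0)]   # ray-origin-relative

uvw_div = barycentrics(*(shear_divide(v, d) for v in tri))
uvw_scl = barycentrics(*(shear_scaled(v, d) for v in tri))
# The scaled path yields exactly d_z**2 times the divided path, so the signs
# that decide hit vs. miss are identical for any nonzero d_z, with no divide.
```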
Abstract
Described herein is a technique for performing ray-triangle intersection test in a manner that produces watertight results. The technique involves translating the coordinates of the triangle such that the origin is at the origin of the ray. The technique involves projecting the coordinate system into the viewspace of the ray. The technique then involves calculating barycentric coordinates and interpolating the barycentric coordinates to get a time of intersect. The signs of the barycentric coordinates indicate whether a hit occurs. The above calculations are performed with a non-directed floating point rounding mode to provide watertightness. A non-directed rounding mode is one in which the mantissa of a rounded number is rounded in a manner that is not dependent on the sign of the number.
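The steps in this abstract can be strung together as a minimal sketch (my simplifications: z is assumed to already be the ray's major axis, and Python's floats use IEEE round-to-nearest-even, a non-directed mode, by default):

```python
def intersect(origin, d, v0, v1, v2):
    """Translate, project into the ray's viewspace, take barycentric edge
    functions, and interpolate to get the hit time t (None on a miss)."""
    # 1. Translate the triangle so the ray origin is the coordinate origin.
    a, b, c = (tuple(v[i] - origin[i] for i in range(3)) for v in (v0, v1, v2))
    # 2. Shear so the ray direction becomes (0, 0, d_z); also pre-scale z by 1/d_z.
    kx, ky = d[0] / d[2], d[1] / d[2]
    pa, pb, pc = ((v[0] - kx * v[2], v[1] - ky * v[2], v[2] / d[2])
                  for v in (a, b, c))
    # 3. Signed edge functions at the 2D origin give unnormalised barycentrics;
    #    mixed signs mean the ray passes outside the triangle.
    u = pb[0] * pc[1] - pb[1] * pc[0]
    v = pc[0] * pa[1] - pc[1] * pa[0]
    w = pa[0] * pb[1] - pa[1] * pb[0]
    if (u < 0 or v < 0 or w < 0) and (u > 0 or v > 0 or w > 0):
        return None
    det = u + v + w
    if det == 0:
        return None                      # degenerate or edge-on triangle
    # 4. Interpolate the sheared z by the barycentrics to get the hit time.
    return (u * pa[2] + v * pb[2] + w * pc[2]) / det

# A ray along +z from the origin hits a triangle in the z = 5 plane at t = 5.
t = intersect((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1, -1, 5), (3, -1, 5), (-1, 3, 5))
```

The watertightness argument in the abstract is about the rounding behaviour of exactly these edge-function and interpolation steps.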
> Still, it appears RDNA2 raytracing has some solid level of efficiency in implementation

Yeah, it's a very green'n'mean setup there (same for Ampere, more or less); cheap denoising is the next frontier now.