So the diagrams in this document are interesting because the outputs of the Hull Shader and Tessellator are synchronisation/data-transfer workloads between two APDs (accelerated processing devices). This is the "gnarly" part of a multi-chiplet architecture, because downstream function blocks determine which chiplet will use which chunks of the output produced by the HS or TS. The routing of the work is "late" when screen space is used to decide how the workload is apportioned.
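To make the "late routing" point concrete, here is a toy Python model. It is not the scheme from the patent. The tile size, the checkerboard tile-to-chiplet mapping, and every name in it (`Primitive`, `owning_chiplets`, `route`) are illustrative assumptions. The point is that a chiplet cannot know which HS/TS output it needs until each primitive's screen-space footprint is known. A primitive that straddles tiles owned by different chiplets has to be delivered to all of them.

```python
from dataclasses import dataclass

# Hypothetical screen tile size in pixels; the source gives no real value.
TILE = 32


@dataclass
class Primitive:
    prim_id: int
    # Screen-space bounding box: inclusive min, exclusive max, in pixels.
    x0: int
    y0: int
    x1: int
    y1: int


def owning_chiplets(prim: Primitive, num_chiplets: int) -> set:
    """Return every chiplet whose tiles the primitive's bounding box touches.

    Assumed mapping: tile (tx, ty) belongs to chiplet (tx + ty) % num_chiplets,
    i.e. a checkerboard when num_chiplets == 2.
    """
    owners = set()
    for ty in range(prim.y0 // TILE, (prim.y1 - 1) // TILE + 1):
        for tx in range(prim.x0 // TILE, (prim.x1 - 1) // TILE + 1):
            owners.add((tx + ty) % num_chiplets)
    return owners


def route(prims, num_chiplets):
    """'Late' routing: HS/TS output is distributed only after screen positions exist."""
    queues = {c: [] for c in range(num_chiplets)}
    for p in prims:
        for c in sorted(owning_chiplets(p, num_chiplets)):
            queues[c].append(p.prim_id)
    return queues
```

In this sketch, a primitive covering pixels (20, 20) to (50, 40) spans four tiles split between both chiplets, so it lands in both queues. That duplicated delivery is the cross-chiplet data-transfer cost the post is describing.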
A primary processing unit includes queues configured to store commands prior to execution in corresponding pipelines. The primary processing unit also includes a first table configured to store entries indicating dependencies between commands that are to be executed on different ones of a plurality of processing units that include the primary processing unit and one or more secondary processing units. The primary processing unit also includes a scheduler configured to release commands in response to resolution of the dependencies. In some cases, a first one of the secondary processing units schedules the first command for execution in response to resolution of a dependency on a second command executing in a second one of the secondary processing units. The second one of the secondary processing units notifies the primary processing unit in response to completing execution of the second command.
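The dependency-tracking idea in the abstract above can be modelled roughly in Python. This is a hypothetical sketch, not the patented design. `PrimaryScheduler`, `submit`, `notify_complete`, and the unit names are invented for illustration. The `deps` dictionary stands in for the "first table", and secondary units are assumed to call `notify_complete` when they finish a command.

```python
from collections import defaultdict, deque


class PrimaryScheduler:
    """Toy model of a primary unit tracking cross-unit command dependencies."""

    def __init__(self):
        self.deps = {}                   # cmd -> set of unresolved prerequisites ("first table")
        self.waiters = defaultdict(set)  # prerequisite -> cmds waiting on it
        self.unit_of = {}                # cmd -> processing unit that will execute it
        self.done = set()                # cmds already reported complete
        self.released = deque()          # (unit, cmd) pairs released for execution

    def submit(self, cmd, unit, depends_on=()):
        """Queue `cmd` for `unit`; release it immediately if nothing is pending."""
        self.unit_of[cmd] = unit
        pending = set(depends_on) - self.done
        self.deps[cmd] = pending
        for d in pending:
            self.waiters[d].add(cmd)
        if not pending:
            self.released.append((unit, cmd))

    def notify_complete(self, cmd):
        """Called when any unit (e.g. a secondary) reports that `cmd` finished."""
        self.done.add(cmd)
        for w in sorted(self.waiters.pop(cmd, ())):
            self.deps[w].discard(cmd)
            if not self.deps[w]:
                self.released.append((self.unit_of[w], w))
```

Usage: submit `"B"` to `"secondary2"` and `"A"` to `"secondary1"` with `depends_on=["B"]`. Only `B` is released at first. `A` is released once `notify_complete("B")` reports that B has finished, which mirrors the abstract's secondary-to-primary notification.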
Systems, devices, and methods for direct memory access. A system direct memory access (SDMA) device disposed on a processor die sends a message which includes physical addresses of a source buffer and a destination buffer, and a size of a data transfer, to a data fabric device. The data fabric device sends an instruction which includes the physical addresses of the source and destination buffer, and the size of the data transfer, to first agent devices. Each of the first agent devices reads a portion of the source buffer from a memory device at the physical address of the source buffer. Each of the first agent devices sends the portion of the source buffer to one of second agent devices. Each of the second agent devices writes the portion of the source buffer to the destination buffer.
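A rough software model of the split transfer described above follows. Everything in it is an assumption made for illustration. Physical addresses are modelled as offsets into one `bytearray`, and the buffer is divided into equal contiguous portions, one per agent pair. The source and destination are assumed not to overlap. `split_transfer` and `sdma_copy` are hypothetical names.

```python
def split_transfer(size: int, num_agents: int):
    """Divide a `size`-byte transfer into contiguous (offset, length) chunks.

    The first `size % num_agents` agents each get one extra byte.
    """
    base, extra = divmod(size, num_agents)
    chunks, offset = [], 0
    for i in range(num_agents):
        length = base + (1 if i < extra else 0)
        chunks.append((offset, length))
        offset += length
    return chunks


def sdma_copy(memory: bytearray, src: int, dst: int, size: int, num_agents: int):
    """Model the data fabric fanning one (src, dst, size) request out to agent pairs.

    Assumes the source and destination ranges do not overlap.
    """
    for offset, length in split_transfer(size, num_agents):
        # "First agent": read its portion of the source buffer.
        portion = bytes(memory[src + offset : src + offset + length])
        # Paired "second agent": write that portion into the destination buffer.
        memory[dst + offset : dst + offset + length] = portion
```

The agents run sequentially here. In the described hardware, each read/write pair would proceed in parallel, which is where the bandwidth benefit would come from.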
BOUNDING VOLUME HIERARCHY TRAVERSAL
A technique for performing ray tracing operations is provided. The technique includes initiating bounding volume hierarchy traversal for a ray against geometry represented by a bounding volume hierarchy; identifying multiple nodes of the bounding volume hierarchy for concurrent intersection tests; and performing operations for the concurrent intersection tests concurrently.
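One way to picture "multiple nodes identified for concurrent intersection tests" is a stack-based traversal that pops a batch of nodes at once, as in the Python sketch below. This is an illustrative model, not the patented method. The batch `width`, the `Node` layout, and the slab-test helper are assumptions. The batch is evaluated in one pass to show that its tests are independent of each other and could therefore be issued concurrently in hardware.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    lo: Tuple[float, float, float]  # AABB min corner
    hi: Tuple[float, float, float]  # AABB max corner
    children: List["Node"] = field(default_factory=list)
    prim: Optional[int] = None      # primitive id if this node is a leaf


def hit_aabb(orig, direction, lo, hi) -> bool:
    """Slab test: does the ray (t >= 0) intersect the axis-aligned box?"""
    tmin, tmax = 0.0, float("inf")
    for o, d, a, b in zip(orig, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must already lie inside it.
            if o < a or o > b:
                return False
            continue
        t1, t2 = (a - o) / d, (b - o) / d
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax


def traverse(root: Node, orig, direction, width: int = 4) -> List[int]:
    """Return sorted ids of leaf primitives whose boxes the ray hits."""
    hits, stack = [], [root]
    while stack:
        # Identify up to `width` nodes whose intersection tests are independent.
        batch = [stack.pop() for _ in range(min(width, len(stack)))]
        # These tests share no state, so hardware could run them concurrently.
        results = [hit_aabb(orig, direction, n.lo, n.hi) for n in batch]
        for node, hit in zip(batch, results):
            if not hit:
                continue
            if node.prim is not None:
                hits.append(node.prim)
            else:
                stack.extend(node.children)
    return sorted(hits)
```

With `width=1`, this reduces to ordinary one-node-at-a-time traversal. A wider batch trades some wasted tests (nodes a sequential traversal might have culled) for more parallel work per step.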
120 per GCD, but they're 30WGP and you should count them as such.
Are we looking at chiplets with 90CUs or more?
No, those are discrete blobs attached to the MCDs.
256MB of Infinity Cache per chiplet?
Well, technically yes, but magic abounds here.
And in the middle of it all, 256bit GDDR6 sounds almost inadequate
So there's no LLC in the graphics core dies?
No, those are discrete blobs attached to the MCDs.
Some stuff about RDNA3 from known leakers has been appearing, most of it confirming what @Bondrewd has been saying.
Wccftech made a compilation of the tweets, but I'll leave them here.
Navi 31 really does look like a monster on the larger SKU.
Are we looking at chiplets with 90CUs or more? 256MB of Infinity Cache per chiplet?
Or maybe it's the same 128MB per chiplet, but with an optional 128MB V-cache underneath. The 120CU SKU has no V-cache, but the fully enabled 180CU version does.
And in the middle of it all, 256bit GDDR6 sounds almost inadequate... except, of course, for the massive amount of cache.
Regardless, these are exciting times ahead!
There's no "CU" anymore.
Each WGP has 4 CUs in RDNA3?
240 the old way?
So the 2*GCD part has 180 CUs but could actually go up to 240?
None of, yes.
So there's no LLC in the graphics core dies?
Graphics Compute Die?
Graphics Core Die?
Exciting times really. I recall the day chiplets came to CPUs, and shortly after, the leadership positions on price/performance against Intel reversed.
According to AMD, CCD is "Core Complex Die", so GCD should be "Graphic(s) Complex Die".
nnnnnnnnnope.
It's going to have significant price/performance.
Yeah, later (think late '23 timeline and all).
Curious to see if it can come to APU form factors in the future.
Greymon55 is basically Broly_X1.
If RDNA3 follows the patent we saw some time ago, the IC is on the package and also acts as a high-bandwidth interconnect between the modules (which are seen as a single GPU, so load-balancing signals should be passed in the same way). Of course there would be quite a lot of cache on the dies too, but it would not be "LLC".
It should be on MCD.
No, those are discrete blobs attached to the MCDs.