Quoting from the message:
“IF the scale is right...”
When there is an IF there is doubt, so no one is claiming it is correct. The point was just to show that AMD expects big gains, and an illustrative picture, if that is truly what it is (and we only have your opinion on that, although I could very well agree), could very well suggest that, even if not accurate.
But personally, I would not state that the image is a mere illustration without any intent of precision (even if only a prediction at the time), because I do not know that for a fact.
I would also not say that the chart being old makes it stupid. If it is misleading about what AMD expects from Navi (or at least expected), then it would be stupid. But the mere fact of being old... just makes it old.
I think the intention of the graphic may be more related to the context in which it came up: as marketing, and possibly aimed more at investors than at gamers.
It was intended to make people believe something--that AMD's graphics products had a timely path to significant progress and competitiveness.
Its lack of clarity about what it was measuring and which specific products represented the data points, along with the debatable placement on the Y and X axes, does not point to an intent to be accurate or informative.
Even at the time, attempts to analyze the implications of the chart and charitably guess which data points AMD was using without saying so did not really give a healthy picture versus the competition.
The 28nm data point could plausibly be counting some of the less impressive 28nm chips, like the early Tonga desktop cards (definitely not the HBM-based Fury Nano). The Polaris data point doesn't seem to reflect the initially troubled rollout of Polaris 10, and I've seen speculation that it would need to use a mobile Polaris SKU of some kind to get enough Y distance between the nodes. That's not comparing products with similar use cases, which I wouldn't say is evidence of an intent to inform.
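To make that concrete, here's a toy calculation of how the choice of SKUs stretches or shrinks the perf/W gap the chart depends on. The performance and power numbers below are made up for illustration; they are not AMD's figures or measured data.

```python
# Hypothetical relative-performance and board-power numbers, purely to
# illustrate how SKU choice moves the chart's Y axis. Not real data.
def perf_per_watt(relative_perf, board_power_w):
    return relative_perf / board_power_w

tonga_28nm  = perf_per_watt(1.00, 190)  # a middling 28nm desktop card
fury_nano   = perf_per_watt(1.60, 175)  # an efficient 28nm HBM part
polaris_dt  = perf_per_watt(1.15, 150)  # a desktop Polaris 10 board
polaris_mob = perf_per_watt(0.90, 80)   # a power-capped mobile Polaris SKU

print(polaris_dt / tonga_28nm)   # ~1.5x: a modest-looking gap
print(polaris_mob / tonga_28nm)  # ~2.1x: the kind of gap the chart needs
print(polaris_dt / fury_nano)    # <1.0x: the comparison marketing would avoid
```

Depending on which parts you assume for each node, the same hardware reads as anywhere from a regression to a doubling, which is exactly the ambiguity the chart leaves open.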
Even at the time the image came out, it was getting kind of rough reconciling the launch dates in reality versus the graphic, and it's basically meaningless on that axis now.
Overall, if the idea was to lead people to make a favorable prediction, that might be more what AMD wanted. It didn't offer enough to do much else, and what we've seen since has left little in evidence beyond the wishful thinking of the time.
Quoting from the message:
“Is this possible for the Next Gen?
Thanks for your time.”
It's a pretty simple set of boxes and labels, which doesn't say much about what they do or what can be expected of them.
One little thing is that the L1 I$ only has arrows feeding into it, which, by the conventions the diagram itself sets down, doesn't seem to indicate it can do anything.
As for whether this is possible for a future design: it's low-detail, and it can be argued that it's not impossible, so long as the definitions and behaviors of the various blocks are changed in undisclosed ways. Some interpretation that isn't outright broken could probably be dreamed up.
What has been omitted, like a decent portion of the geometry front end that would have been alongside the DSBR and primitive assembler, the workload distributor, shader launch hardware, etc., could be present but not drawn. Or is there an unspoken claim of software replacement?
The ratio of front-end hardware to compute is notably high per GCX versus what's been in most shader engines. It's not outright impossible, but from drivers and other programming discussions I think the minimum amount of geometry wavefront allocation would likely seriously impede pixel and compute progress.
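As a back-of-envelope sketch of why that allocation matters: GCN's documented limits are 10 wavefronts per SIMD and 4 SIMDs per CU; the GCX size and geometry reservation below are my guesses, not anything disclosed.

```python
# GCN occupancy limits (documented): 10 wavefronts per SIMD, 4 SIMDs per CU.
simds_per_cu, waves_per_simd = 4, 10
slots_per_cu = simds_per_cu * waves_per_simd        # 40 wave slots per CU

# Assumptions for illustration: a GCX with few CUs behind a full front end,
# and a guessed minimum geometry wave allocation needed to keep it fed.
cus_per_gcx = 5
geometry_waves_per_gcx = 60                         # assumed, not disclosed

reserved_per_cu = geometry_waves_per_gcx / cus_per_gcx      # 12 waves per CU
print(f"{reserved_per_cu / slots_per_cu:.0%} of wave slots on geometry")  # 30%
```

With the front end amortized over so few CUs, a reservation that would be minor across a full shader engine eats a large share of each CU's latency-hiding capacity before any pixel or compute work runs.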
There's a bidirectional arrow between each CU and its L1D, which is fine I guess, though I'm not sure what having that arrow between a CU and its own subunit tells us. Unless there's an unspoken claim about separating the L1 from the CU.
The RBE's in a strange place, going by what we know they do in current architectures. That doesn't rule out an unspoken claim about changing the behavior significantly, but where it sits is still odd. Current designs have the RBE on a dedicated data path that CUs arbitrate access to and put data on from their vector registers. Some patents that may concern themselves with a next-gen GPU still have the RBEs on some kind of vector export path. In this diagram, the L1s are in the way. It is possible to dream up a way to make this work, but we have little basis for it.
The diagram has the caches and RBE feeding into what seems to be an IF block, which again isn't impossible if a lot of changes were made to everything involved, but that's speculation with little basis and probably a host of problems.
I suppose that IF block is part of the data fabric block in the lower part of the diagram, which goes to the L2, which goes to the memory controller.
I would say that if any of these elements are similar to what we know now, there would likely be serious problems.
One issue is that while this diagram doesn't need to be accurate as to the exact widths of data paths or links, there's a significant number of connections in the upper half versus what becomes almost a flow chart in the bottom half.
I think that if we took the arrows literally, there'd be dozens of caches going into a fabric that then has one link to an L2.
I'd say that if we're extrapolating from current designs, the IF is not in a good place. The L1s are not coherent with one another, so a good portion of the fabric's capability is not wanted between the L1s and L2. The RBE is not coherent at all, and in Vega plugs into the L2 rather than into a coherent fabric. In this diagram, the bandwidths involved are unclear, though it seems like there would be serious contention for bandwidth within a GCX and a straw for data to get out of the L2.
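Taking the arrows literally and plugging in guessed counts and per-link bandwidths (every number here is my assumption; the diagram gives none), the fan-in versus that single L2 link looks something like this:

```python
# Guessed topology from the diagram; all numbers are assumptions.
gcx_count = 4
cus_per_gcx = 10      # one private L1D each, per the drawn arrows
extra_per_gcx = 2     # the shared L1 I$ and the RBE

clients = gcx_count * (cus_per_gcx + extra_per_gcx)
print(clients)                        # 48 clients feeding the fabric

# Assumed per-client and L2-link bandwidths in GB/s, purely illustrative.
per_client_bw = 64
l2_link_bw = 1024

demand = clients * per_client_bw      # ~3 TB/s of potential traffic
print(demand / l2_link_bw)            # ~3x oversubscription at the L2 straw
```

However charitable the per-link numbers, a fan-in of dozens against one drawn link to the L2 ends up oversubscribed unless most clients sit starved most of the time.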
The front-end hardware in a GCX may generate data relevant to other GCXs; unless there's an unspoken claim about how front-end communication works in this design, that traffic is now broadcast into that nest of L1s and data fabric blocks. The L2 also rather loses its place as a coherency point if the fabric is on the wrong side of it, and now there's no fabric linking the L2 to anything else, like other hardware blocks or the system at large.
IF in Vega links up a set of memory controllers and some number of L2 stops, and it's already noticeable in how much area it takes up. Linking up as many clients as there are CUs, RBEs, and other caches significantly overloads what we'd consider a known implementation of IF, unless the clients see much more starvation than they do currently.
Vega 10's IF section is not overwhelmingly large, but comparing Vega 10 to earlier non-IF GPUs, it seems to take up a lot more area than what linked the L2s to the memory controllers before. Vega 20 has a lot of non-GPU area, and possibly a strip all around the die that's just the IF mesh. This proposal could make the IF take up a lot of area, and barring some proposed change in the behavior of the sub-units, many of them have no use for most of its features.
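For a rough sense of scale, assume a crossbar-like cost where area grows with clients times destinations (a simplification; the real fabric is likelier a mesh of stops), and reuse the guessed client counts from above. The stop counts here are coarse assumptions, not die measurements.

```python
# Crude crossbar-style scaling model: cost grows with clients x destinations.
# All counts are assumptions for illustration, not measured from any die.
def xbar_cost(clients, destinations):
    return clients * destinations

vega_like = xbar_cost(clients=16, destinations=2)  # assumed L2 stops to MCs
proposal  = xbar_cost(clients=48, destinations=4)  # per-CU clients to fabric ports

print(proposal / vega_like)   # 6x the wiring/arbitration, before coherence logic
```

Even under that crude model, pushing the fabric down to per-CU granularity multiplies the interconnect cost several times over, for clients that mostly don't want coherent transport in the first place.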