The only thing I would add to this is that you can have a CU bottleneck as well. If you have a workload that is significantly larger than the number of available ALUs, you have a compute deficiency that can't be rectified by bandwidth.
So take an extreme example: 1 CU @ 78 GHz vs 36 CUs @ 2.3 GHz. The latter will outperform the former on the same bandwidth and memory setup, because the single CU has to make its read and write trips one after another, 36x as many per CU, where the 36 CUs can issue theirs in parallel. So while the bandwidth is available, there aren't enough CUs to take advantage of it.
So you could have 1 TB/s of bandwidth, but a single CU can only have so many requests in flight before its queues are full. Sure, it can process the data fast, but requesting and writing data is likely the slowest part of the process here, because latency becomes a bigger factor the more round trips you make. We traditionally hide latency by running more threads, but once again there is a limit to that as well.
I definitely don't think having a 36x faster front end on the graphics side will make up for the number of memory trips further down the pipeline.
The 1 CU will need to make 36 requests, back to back, to cover the work the 36 CUs can cover with 1 request each in parallel. You can eventually extrapolate this to other parts of the hardware over time.
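To put rough numbers on that intuition, here's a back-of-the-envelope sketch along the lines of Little's Law (every figure below is made up for illustration, not a real GPU spec): the bandwidth a part can actually consume is capped by how many bytes it can keep in flight divided by memory latency, no matter what the bus could theoretically deliver.

```python
# Little's Law sketch: achievable bandwidth ~= bytes in flight / latency.
# Every number here is a hypothetical illustration, not a real GPU spec.

CACHE_LINE_BYTES = 64          # size of one memory request
LATENCY_NS = 400               # assumed round-trip memory latency
PEAK_BANDWIDTH_GBS = 1000      # a "1 TB/s" bus

def achievable_gbs(num_cus, outstanding_per_cu):
    """Bandwidth the CUs can actually demand, capped by the bus."""
    bytes_in_flight = num_cus * outstanding_per_cu * CACHE_LINE_BYTES
    demand_gbs = bytes_in_flight / LATENCY_NS    # bytes per ns == GB/s
    return min(demand_gbs, PEAK_BANDWIDTH_GBS)

# One CU vs. 36 CUs on the same bus, same per-CU request queue depth.
print(achievable_gbs(num_cus=1,  outstanding_per_cu=64))   # ~10 GB/s
print(achievable_gbs(num_cus=36, outstanding_per_cu=64))   # ~369 GB/s
```

Note that clock speed doesn't show up in that cap at all: the lone CU runs out of outstanding requests long before it can touch the 1 TB/s, which is exactly the "limit to hiding latency with more threads" point above.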
The reason we don't see something like an 80 CU part show a huge lift over smaller configurations is likely that workloads just haven't been large enough for the smaller ALU counts, combined with smaller caches, to fall off a cliff.
It's not always linear, and more often than not things run very well until you hit the workload that breaks the camel's back, and from there performance gets progressively worse.
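As a toy illustration of that cliff (again, all numbers invented for the example, not a model of any real cache hierarchy): while the working set fits in cache, average access cost stays flat; once it spills out, the cost jumps toward full memory latency and keeps getting worse.

```python
# Toy model of the "breaks the camel's back" effect: average access time
# as the working set outgrows a fixed-size cache. All numbers invented.

CACHE_SIZE_MB = 4
HIT_NS, MISS_NS = 20, 400      # assumed cache-hit vs. memory latencies

def avg_access_ns(working_set_mb):
    # Crude assumption: uniform random access, so the hit rate is just
    # the fraction of the working set that fits in the cache.
    hit_rate = min(1.0, CACHE_SIZE_MB / working_set_mb)
    return hit_rate * HIT_NS + (1 - hit_rate) * MISS_NS

for mb in (1, 2, 4, 8, 16, 64):
    print(f"{mb:3d} MB working set -> {avg_access_ns(mb):6.1f} ns average access")
```

Up to 4 MB nothing changes, then the average cost is roughly 10x worse by the time the working set hits 8 MB. That's the kind of non-linear behavior that hides a CU or cache deficiency right up until a workload finally crosses the threshold.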
Yes. In a nutshell, many people don't realize that every architecture out there is going to be limited at X time by Y thing. Until such time as there are unlimited resources and capabilities on some piece of hardware, there will always be times when it's limited by some bit of its architecture.
It's just that when something is the fastest piece of hardware, you don't think about the times when it hits its limitations; it's human nature to assume the fastest thing on the market isn't limited. But anytime some part of the hardware is idling, it means another part of the hardware is operating at its limits, thus limiting the overall performance of the hardware at that point.
The practical holy grail of hardware isn't to design something with no limits, but to design something where each piece can be the limiting factor at some point. E.g., the HD 2900 XT wasn't a good design because its bandwidth was never a limitation in any real-world use, so those transistors could have been better spent on something else.
Now when comparing 2 pieces of hardware, the interesting thing is to tease out how one architecture might be limited by X feature in Y situations versus another architecture. Unfortunately, a lot of noise often comes in with partisan comments claiming this means one architecture is overall better than another, when that isn't necessarily the case at all.
Just because one system might be slightly better at RT doesn't suddenly make the other architecture not good. Just because one arch has a lower clock speed doesn't make it worse. Just because one arch has more CUs doesn't mean the other arch is bad. Etc., etc.
If someone can't acknowledge when their arch might have a limitation that another arch is less limited by, then there's no way they can fairly judge different architectures. Likewise, if they can never admit that an arch other than the one they like is better in some areas, the same problem arises.
Of course, in good technical discourse there will always be a back and forth about relative strengths and weaknesses, how those impact the overall performance of an arch, and even whether some feature is a weakness or a strength, with evidence provided by real-world applications after a product has spent enough time on the market.
It's unfortunate that I sometimes see too much of "X is better than Y because ..." built on limited data. It's still early in the product cycle. Each arch has been on the market for less than a year. Very little software has been written to utilize the features of either product. Yet some are already making claims that X is better at Y thing on such limited data.
That said, I do appreciate all the people that keep an open mind and attempt to steer the discourse into talking about why A might be better than B in S product doing X, Y, or Z thing. Is it the hardware? Is it the software? Is it the development environment? Is it something non-obvious? Is it the skill level of the developer? Is it the time spent on A or B arch? Etc.
It's also a little frustrating when someone keeps pointing out that X thing is true because of how they interpret Cerny's words, yet at the same time dismisses anything Andrew Goosen might have said about the arch he helped create. Likewise, going the other way around, pointing out things Goosen said while ignoring things that Cerny said.
Also, if you're going to go through a video frame by frame to find places where X arch is doing something better than Y arch, your argument will be stronger if you also point out the frames where Y arch is doing something better than X arch. Otherwise you'll often come out looking partisan, whether or not that's your intention. There have been a lot of screenshots posted here attempting to show that X arch is better or worse, only for someone else to come in later and show the opposite, depending on which frame or screenshot was cherry picked to prove some alleged superiority or inferiority. At the very least, while you're looking for proof that your arch is definitely better, try to make sure there isn't also evidence in the footage of your favorite arch doing the exact same thing.
People, it's not the end of the world if your system of preference is slightly worse or slightly better at this or that.
The fun is in looking at what is happening and trying to tease out any details we can from it.
Bleh, this turned out a lot longer than I intended. Perhaps a side-effect of my not wanting to put people on ignore.
Regards,
SB