Next Generation Hardware Speculation with a Technical Spin [2018]

Pretty sure data granularity is the fetch size needed to reach the full on-paper bandwidth. When you need to read small chunks randomly, the usable bandwidth depends on the access granularity, i.e. the I/O width times the prefetch length.

GDDR5: 32 bytes
GDDR5X: 64 bytes
GDDR6: 32 bytes
HBM: 32 bytes (edit: corrected; I originally listed 256 bytes)

That means in a worst-case benchmark where you read small blocks completely randomly, HBM at the 256-byte figure I originally listed would end up 8 times slower, and GDDR5X would be at half bandwidth. Obviously it doesn't happen like this in the real world, because of the cache.
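For illustration, a minimal Python sketch of that worst-case arithmetic, assuming every random request is 32 bytes and each access still transfers one full burst of the listed granularity; the pre-edit 256-byte HBM figure is included to show where the "8 times slower" number comes from, while the corrected 32-byte figure puts HBM on par with GDDR5/GDDR6:

[CODE=python]
# Toy worst-case model: random 32-byte reads with no cache reuse.
# Every read still costs one full burst of `granularity` bytes on the bus,
# so the useful fraction of peak bandwidth is request / granularity.

REQUEST = 32  # bytes actually needed per random read (assumption)

def useful_fraction(granularity_bytes, request_bytes=REQUEST):
    return min(request_bytes, granularity_bytes) / granularity_bytes

for name, gran in [("GDDR5", 32), ("GDDR5X", 64), ("GDDR6", 32), ("HBM, pre-edit 256 B figure", 256)]:
    print(f"{name:28s} {useful_fraction(gran):.3f} of peak")  # 1.0, 0.5, 1.0, 0.125
[/CODE]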
 
Sorry, the link is more supposition than I recalled, but the conclusions make sense to me: https://www.pcgamesn.com/amd-7nm-vega-not-for-gaming
Seems to me the conclusion is about high-end gaming cards, where the money is better in the pro sector. Nothing precludes a 7nm design for a mid-range part for a big customer.

Found this:
[Chart: JPR Q3 2016 GPU shipments, AMD vs. Nvidia]


So AMD is selling 4+ million GPUs a quarter (could be quite a bit more now), including lower-margin parts. Consoles could average ~8 million units per quarter across the two machines, so even one machine's volume would at least double AMD's GPU unit sales.
 
Pretty sure data granularity is the fetch size needed to reach the full on-paper bandwidth. When you need to read small chunks randomly, the usable bandwidth depends on the access granularity, i.e. the I/O width times the prefetch length.

GDDR5: 32 bytes
GDDR5X: 64 bytes
GDDR6: 32 bytes
HBM: 256 bytes

That means in a worst-case benchmark where you read small blocks completely randomly, HBM would end up 8 times slower and GDDR5X would be at half bandwidth. Obviously it doesn't happen like this in the real world, because of the cache.

Is it foolhardy to assume GPU workloads have insanely high spatial locality, so this wouldn't be an issue in practice?

Seems to me the conclusion is about high-end gaming cards, where the money is better in the pro sector. Nothing precludes a 7nm design for a mid-range part for a big customer.

Found this:
[Chart: JPR Q3 2016 GPU shipments, AMD vs. Nvidia]


So AMD is selling 4+ million GPUs a quarter (could be quite a bit more now), including lower-margin parts. Consoles could average ~8 million units per quarter across the two machines, so even one machine's volume would at least double AMD's GPU unit sales.

I'm sure a large buy would help matters. I also suspect an HBM-family solution would be based on Samsung's newer 10nm-class dies, which could help the cost and power factors (compared with the 20nm Aquabolt dies).
 
Is it foolhardy to assume GPU workloads have insanely high spatial locality, so this wouldn't be an issue in practice?
Yeah, but it needs more cache to increase the chances of hitting the rest of the fetch, and in an SoC it might impact the CPU workload in a more dramatic way. I don't know; there must be a point where large granularity becomes a bigger problem.
 
Yeah, but it needs more cache to increase the chances of hitting the rest of the fetch, and in an SoC it might impact the CPU workload in a more dramatic way. I don't know; there must be a point where large granularity becomes a bigger problem.

I know that Polaris increased GPU cache sizes, and with the advent of HBCC, the GPU side is trying to cover all of the memory usage scenarios the GPU could see. Would the CPU solution to this just be a bigger L3 to minimize misses?
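As a toy model (my own, with made-up numbers purely for illustration): treat each miss as dragging in one full burst of the access granularity, and ask how much bus traffic that generates per byte the workload actually touches. The tipping point the posts above are circling is simply where the bytes touched per fetch drop below the granularity:

[CODE=python]
# Bus traffic per useful byte, as a function of access granularity and of how
# much of each fetched burst gets consumed before eviction ("touched" bytes,
# a stand-in for spatial locality plus cache capacity). Values above 1.0 are
# bandwidth spent on bytes nobody reads.

def overfetch(granularity_bytes, touched_bytes):
    return granularity_bytes / min(touched_bytes, granularity_bytes)

for touched in (16, 32, 64, 128, 256):
    row = {g: overfetch(g, touched) for g in (32, 64, 256)}
    print(f"touched {touched:3d} B/fetch -> "
          f"32B: {row[32]:.1f}x  64B: {row[64]:.1f}x  256B: {row[256]:.1f}x")
[/CODE]

In this framing, a bigger L3 (or GPU L2) only helps to the extent that it keeps the rest of each fetched burst resident long enough for the "touched" number to go up.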
 
Compare that to being stuck with 12 memory chips around the SoC.
Hm, didn't the DDR3 for the original 'bone' take up a rather similar amount of board space?

Also, surely PCBs are not hugely expensive.

That means in a worst-case benchmark where you read small blocks completely randomly, HBM would end up 8 times slower.
Is that really the case if one also considers the huge number of memory channels offered by HBM, the number of simultaneously open pages, etc., and not just the burst length?
 
Hm, didn't the DDR3 for the original 'bone' take up a rather similar amount of board space?

Also, surely PCBs are not hugely expensive.


Is that really the case if one also considers the huge number of memory channels offered by HBM, the number of simultaneously open pages, etc., and not just the burst length?
... What gubbi said... :D

Obviously it's not a big deal in practice, that's what the cache is for... but I think Nvidia said they wanted the access granularity to at least match their cache lines, which are 32 bytes? I guess?

I can't find it.
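For a sense of scale on the channel-count point, here is a rough tally under commonly cited per-device figures, which are my assumptions rather than anything from the thread: one 32-bit channel per GDDR5 chip, two 16-bit channels per GDDR6 chip, and eight 128-bit channels per HBM2 stack (sixteen in pseudo-channel mode). The access granularity is ~32 bytes in every case, but the number of independent requests in flight differs a lot:

[CODE=python]
# Independent DRAM channels for a few plausible configurations (assumed
# per-device figures, see above). More channels means more random ~32-byte
# requests can be serviced concurrently, which is what offsets the
# burst-length-only picture.

configs = {
    "256-bit GDDR5 (8 chips, 1 x 32-bit ch each)":     8 * 1,
    "384-bit GDDR6 (12 chips, 2 x 16-bit ch each)":    12 * 2,
    "2 x HBM2 stacks (8 x 128-bit ch each)":           2 * 8,
    "2 x HBM2 stacks, pseudo-channel mode (16 each)":  2 * 16,
}
for name, channels in configs.items():
    print(f"{name:48s} {channels:3d} channels")
[/CODE]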
 
New Navi and post-Navi rumors from Wccf:

https://wccftech.com/rumor-amd-navi-mainstream-gpu-to-have-gtx-1080-class-performance/

Summary:
  • Navi mid-range will compete with the 1080 on performance. It is not aimed at the high end, since it's an early 7nm part.
  • Multi-die GPUs are coming; it's unclear whether that means Navi or post-Navi (Navi's stated focus is scalability). Inspired by Zen/Threadripper.
  • They link to the murky PS5 timeline mentioned in the Kotaku article, given the flux in AMD's plans.
 
There is always hope of a breakthrough in HBM integration cost. There was a lot of buzz about organic interposers costing a fraction of silicon ones, and the "low cost" HBM proposal was designed with that in mind.

At that point I want 4 stacks :runaway:

I've wondered about this a few times in relation to low-cost HBM and its stated bandwidth of 200 GB/s per stack. For it to be competitive with GDDR6 on a 384-bit bus, it requires 4 stacks.

Assuming the same capacity, would 4 stacks of low-cost HBM be cheaper than 2 stacks of HBM2 or HBM3?
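A quick sanity check of that four-stack figure, assuming 200 GB/s per low-cost HBM stack (the number quoted above) and GDDR6 at 14 or 16 Gbps per pin; those data rates are my assumption:

[CODE=python]
# Bandwidth behind "it requires 4 stacks" (assumed GDDR6 data rates).
GDDR6_BUS_BITS = 384
LOWCOST_HBM_GB_S_PER_STACK = 200  # figure quoted in the post

for gbps_per_pin in (14, 16):
    gddr6_gb_s = GDDR6_BUS_BITS * gbps_per_pin / 8        # 672 or 768 GB/s
    stacks = gddr6_gb_s / LOWCOST_HBM_GB_S_PER_STACK      # 3.36 or 3.84
    print(f"GDDR6 @ {gbps_per_pin} Gbps: {gddr6_gb_s:.0f} GB/s -> {stacks:.2f} stacks")
[/CODE]

Either data rate rounds up to four stacks.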
 
If Navi is 1080-level and tiny, then sticking 4 of those in the same package would be pretty damn impressive. Maybe 1 HBM stack per GPU?
 
New Navi and post-Navi rumors from Wccf:

https://wccftech.com/rumor-amd-navi-mainstream-gpu-to-have-gtx-1080-class-performance/

Summary:
  • Navi mid-range will compete with the 1080 on performance. It is not aimed at the high end, since it's an early 7nm part.
  • Multi-die GPUs are coming; it's unclear whether that means Navi or post-Navi (Navi's stated focus is scalability). Inspired by Zen/Threadripper.
  • They link to the murky PS5 timeline mentioned in the Kotaku article, given the flux in AMD's plans.

These rumors would actually make it more likely that the PS5 is Navi-based. Navi will be optimized for performance/watt and performance/mm² if it's targeting mainstream performance tiers. It will likely use GDDR6 as well (the "next-gen" memory).
 
When was the last time AMD delivered a new architecture with decent performance/watt? ... Don't hold your breath over Navi.
 
I've wondered about this a few times in relation to low-cost HBM and its stated bandwidth of 200 GB/s per stack. For it to be competitive with GDDR6 on a 384-bit bus, it requires 4 stacks.

Assuming the same capacity, would 4 stacks of low-cost HBM be cheaper than 2 stacks of HBM2 or HBM3?
No idea, they were very vague about cost... but from their presentation, many features of the low-cost version point to much lower integration risk, by more than a factor of two: it has half the width per stack (so a much lower number of bumps), the bump pitch is less aggressive (which supposedly allows an inexpensive organic interposer and better yield, because the alignment isn't as crazy), and it doesn't need an additional interface die on each stack.

Since the entire stack needs the TSVs and microbumps at each layer, there is a big gain. Two 8-hi HBM2/3 stacks would still carry much higher integration cost and yield risk than four 4-hi low-cost HBM stacks, because each die-to-die interface needs half the number of TSVs and microbumps, despite there being twice the stacks.
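A rough relative tally of that argument, treating the per-layer TSV/microbump count as proportional to the interface width; the 1024-bit HBM2/3 and 512-bit low-cost widths, and the "interfaces scale with stacked dies" simplification, are my assumptions for illustration (real TSV counts also include power, ground and redundancy):

[CODE=python]
# Purely illustrative: die-to-die interface "cost" =
# stacks x stacked dies x per-layer interface width.

def relative_interface_cost(stacks, dies_per_stack, interface_bits):
    return stacks * dies_per_stack * interface_bits

two_8hi_hbm2     = relative_interface_cost(2, 8, 1024)  # 16384
four_4hi_lowcost = relative_interface_cost(4, 4, 512)   #  8192
print(two_8hi_hbm2, four_4hi_lowcost, two_8hi_hbm2 / four_4hi_lowcost)  # ratio 2.0
[/CODE]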
 
If Navi is 1080-level and tiny, then sticking 4 of those in the same package would be pretty damn impressive. Maybe 1 HBM stack per GPU?
Problem is, "no one" could pay for them, because they're immediately sold out due to cryptocurrencies.
1080-level performance on a 7nm process would make sense for the mid-range, though. But that would mean the high-end card is "only" at 1080 Ti level of performance, which is not that much. And if AMD is using 7nm, Nvidia will too, so nothing is won there.

But I really think it is much too soon for a next-gen console. Only another mid-gen refresh would make sense if there is nothing revolutionary and new. Even if the next console has 12 TF, at 7nm you'd have power consumption like the Xbox One X, and you won't see that much difference because of diminishing returns. The difference over the previous generation's mid-gen refreshes is not big enough for a new gen.
Well, maybe we just get another mid-gen console that is simply called PS5, where games are fully compatible up and down across two "generations".
 
To me, 2019 is right for a PS5 from a business POV; if Navi is not there, it is technically not doable.
Sony needs better IP to build a system that is better overall than the XB1X. Today's AMD GPUs do not allow that. Sony has stuck to €399 at launch for its systems (at roughly break-even), and IMO that is a good choice, so they need more competitive silicon: more performance per watt, per mm², per GB/s, and thus per € overall.
 
The difference over the previous generation's mid-gen refreshes is not big enough for a new gen.
Well, I disagree; both the PS4 Pro and the XB1X bring a lot compared to the initial releases. It comes in many shades, though, and it does not show equally on both systems because of software: for example, had the PS3 been less exotic, Sony could have offered BC on the PS4, and the Pro would also allow for enhancements. People are still mostly playing on the base PS4 and XB1(S), so a new system with XB1X-level performance as its baseline is a clear leap forward.
As soon as Navi (or a competing offering) is available, there is a shot at a new round of products. IMO a new "PS4 Ultra" or "XB1 XYZ" would be pushing it; it is better to call it a new gen and keep the incremental approach going.
 
Snippets from the article...

  • Don't expect a 'stratospheric leap': the PS5 is a continuation of what's been done before, more visual comfort rather than a game-changing revolution. However, this has advantages for backwards compatibility and price.
  • Launch window: games currently being developed for the PS5 have been given an 18-month development goal, which points to late 2019 or early 2020.
  • Final dev kits aren't being shared with developers yet, at least not third-party devs.
  • Developers are working on powerful PCs with specs roughly equivalent to the PS5's.
 