Something without HBM2?
> Something without HBM2?
Vega is not an IMC.
> Why is everyone so convinced Navi is a multi chiplet solution?
Everything else AMD is releasing is, and research from AMD and Nvidia supports the idea. GPUs scale far more easily than CPUs.
> While I agree that binned rasterisation is a task that would be perfect for the base die of a PIM module ... vertex data is spread across all memory channels. There's no way to avoid having communication amongst PIMs in this case. And, to be frank, vertex data (pre-tessellation) is not a huge bandwidth monster.
Just treat each stack/PIM as an independent cache and duplicate vertex data with a paging mechanism from system memory. Same idea as HBCC, where only ~3% of the frame changes each iteration. Any modifications can be brute-forced from there with heavy frustum culling, something which primitive shaders have been suggested to be good at.
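Rough numbers for that paging idea, with purely illustrative assumptions (the pool size and frame rate are mine, the ~3% figure is from the post above):

```cpp
#include <cstdio>

int main() {
    // Illustrative assumptions: a 512 MiB vertex/index pool duplicated per
    // chiplet, ~3% of it touched per frame (the HBCC-style figure quoted
    // above), at 60 frames per second.
    const double pool_bytes   = 512.0 * 1024 * 1024;
    const double dirty_ratio  = 0.03;   // ~3% of the frame changes
    const double frames_per_s = 60.0;

    const double update_traffic = pool_bytes * dirty_ratio * frames_per_s;
    std::printf("Update traffic per chiplet: %.2f GB/s\n", update_traffic / 1e9);
    return 0;
}
```

That works out to roughly 1 GB/s per chiplet under those assumptions, which is noise next to the local bandwidth of a stack.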
> ... vertex data is spread across all memory channels.
That's the point... if we assume the application programmer optimized the order of vertices/indices for the post- and pre-transform caches, the majority of triangles can be formed with data local to one chiplet. This will most likely require more complicated vertex buffering, but the point is that most vertex data, the work associated with it, and the triangle data can be kept local.
> And, to be frank, vertex data (pre-tessellation) is not a huge bandwidth monster.
It's not really about the vertex data, though, although it is a bonus along with providing work for all chiplets. The real point is that doing things this way minimizes transfers that have to do with the output of the rasterizer. If you don't batch visibility all at once, overdraw would consume 'extra' bandwidth.
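To make the locality argument concrete, here is a toy sketch of my own (not anything AMD has described): given an index buffer and a hypothetical striping of the vertex buffer across chiplets, count how many triangles can be set up entirely from one chiplet's local data and how many straddle stripes.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical striping: vertices are distributed across chiplets in
// contiguous ranges of 'stripe' vertices each.
int ownerChiplet(uint32_t vertexIndex, uint32_t stripe, int numChiplets) {
    return static_cast<int>((vertexIndex / stripe) % numChiplets);
}

int main() {
    const int      numChiplets = 4;
    const uint32_t stripe      = 1024;   // vertices per stripe (assumed)
    // Toy index buffer; a real one comes from the application, ideally already
    // ordered for the pre/post-transform caches as discussed above.
    std::vector<uint32_t> indices = {0, 1, 2,   2, 1, 3,
                                     1023, 1024, 1025,   4096, 4097, 4098};

    int local = 0, straddling = 0;
    for (size_t i = 0; i + 2 < indices.size(); i += 3) {
        int a = ownerChiplet(indices[i],     stripe, numChiplets);
        int b = ownerChiplet(indices[i + 1], stripe, numChiplets);
        int c = ownerChiplet(indices[i + 2], stripe, numChiplets);
        if (a == b && b == c) ++local; else ++straddling;
    }
    std::printf("local: %d triangles, straddling: %d triangles\n", local, straddling);
    return 0;
}
```

With cache-optimized ordering most triangles should land in the "local" bucket; the stragglers are what would need the extra buffering or duplication mentioned above.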
What kind of additional latency do you guys expect from inter-chip(let) connections anyway? It's not like 3D rendering in Cinema 4D or Blender, where you have a nice sorting pass up front and then a lot of rendering happening in tiny tiles.
> Yays:
> 1 - Only relevant info we have about Navi is "scalability" and "next-gen memory". Next-gen memory can only be HBM3, HBM Low-Cost or GDDR6.
I haven't heard whether HBM low-cost is confirmed, since the last I saw Samsung was still shopping the idea around. HBM3 is apparently late 2019/2020, which I would need to reconcile with AMD's roadmap having Navi apparently earlier and Next Gen in that slot.
> 4 - Vega already has Infinity Fabric in it with no good given reason, so they could be testing the waters for implementing a high-speed inter-GPU bus.
Vega 20 supposedly has xGMI, which would be an off-package bus running over a PCIe physical interface. There can be use cases for that in compute nodes or servers, although if used in the client space it's potentially more able to accelerate resource moves prompted by the copy queue or AMD's existing transfer-over-PCIe capabilities.
> 5 - AMD doesn't have the R&D manpower and execution capability to release 4 distinct graphics chips every 18 months, so this could be their only chance at competing with nvidia on several fronts.
Assuming the Polaris and Carrizo refreshes are insignificant changes, and that the shrinks for the Xbox One S and PS4 Slim aren't big enough changes despite being different chips, there's Xbox One X, PS4 Pro, Vega 10, Raven Ridge, and the Intel custom chip.
> Nays:
> 1 - Infinity Fabric in Threadripper's/EPYC's current form doesn't provide enough bandwidth for a multi-chip GPU.
It technically could, but the sort of overhead AMD documented for EPYC would lead to more die area and power efficiency lost to the attempt than if they hadn't bothered.
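For a sense of scale behind that point and the reply, here is a quick comparison of an EPYC-class inter-die link against a GPU's local memory. Both figures are hedged ballparks (AMD quoted roughly 42 GB/s per inter-die Infinity Fabric link for EPYC, and Vega 10 has 484 GB/s of HBM2 bandwidth); nothing here is a statement about Navi.

```cpp
#include <cstdio>

int main() {
    // Hedged, published ballpark figures:
    const double if_link_gbs   = 42.0;   // per EPYC inter-die Infinity Fabric link
    const double local_hbm_gbs = 484.0;  // Vega 10 local HBM2 bandwidth

    // How many EPYC-class links would a GPU die need just to serve remote
    // traffic at the same rate as its own local memory?
    std::printf("EPYC-class IF links needed to match local HBM2: %.1f\n",
                local_hbm_gbs / if_link_gbs);
    return 0;
}
```

Closing that gap (or living with a fraction of it) is where the extra die area and power mentioned above would go.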
> 3 - Multi-chip GPU is probably really hard to make, and some like to think AMD doesn't do hard things. Ever.
Then there's the part when the head of RTG was asked about transparently integrated multiple GPUs, and he said he didn't want that.
> 4 - nvidia released a paper describing a multi-GPU interconnect that would be faster and consume less power-per-transferred-bit than Infinity Fabric, and some people think this is grounds for nvidia being the first in the market with a multi-chip GPU. Meaning erm.. Navi can't be first.
I think the more useful interpretation is that Nvidia gave a reasonable bare minimum for what has to be done for any such solution to be adequate (I think even that is optimistic for what people expect), and even then it only discussed things in terms of compute.
> That's an interesting patent, I wonder if that's for Navi:
> System and method for using virtual vector register files
The filing date is June 2016. There's usually a delay between filing and when a feature shows up in a product, if it does. For example, the hybrid rasterizer for Vega had an initial filing in March of 2013.
> Everything else AMD is releasing is, and research from AMD and Nvidia supports the idea.
The items that AMD discloses for GPU chiplets talk about them being paired with memory standards 2 generations beyond HBM2. Should I only take every other sentence AMD says as evidence and ignore the ones that contradict my desired outcome?
> Given the quality of Vega's rollout, I'll grant that it's apparently not able to roll them out very well.
One would expect that AMD prioritized its console contracts (i.e. PS4 Pro, Bone X) ahead of its own graphics line-up. Anecdotal evidence seems to back that assumption up...
> That's the point... if we assume the application programmer optimized the order of vertices/indices for the post- and pre-transform caches, the majority of triangles can be formed with data local to one chiplet.
My contention is with your original sentence where you say "all work" ([...] they localize all work up to and including rasterization to that chiplet). Now you're saying "not all work". So, uhuh, we agree.
> That's an interesting patent, I wonder if that's for Navi:
If we could have kernels with no pre-defined limit on the register allocation, oh boy that would be so sweet.
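For anyone wondering why that would be sweet: on GCN a kernel's VGPR allocation is fixed at compile time, and it directly caps how many waves a SIMD can keep resident. A quick sketch of that relationship using the standard GCN limits (256 VGPRs per work-item, at most 10 waves per SIMD); the patent title suggests a virtual register file would relax exactly this.

```cpp
#include <algorithm>
#include <cstdio>

// Occupancy limit from vector registers on a GCN SIMD: the register file
// holds 256 VGPRs per lane in total and at most 10 waves can be resident,
// so resident waves = min(10, 256 / VGPRs per work-item).
int wavesPerSimd(int vgprsPerThread) {
    return std::min(10, 256 / vgprsPerThread);
}

int main() {
    const int counts[] = {24, 32, 64, 84, 128, 256};
    for (int vgprs : counts)
        std::printf("%3d VGPRs -> %2d waves per SIMD\n", vgprs, wavesPerSimd(vgprs));
    return 0;
}
```

A register file that can spill transparently would let high-VGPR kernels run without hard-capping occupancy this way, which is presumably the appeal.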
> My contention is with your original sentence where you say "all work" ([...] they localize all work up to and including rasterization to that chiplet). Now you're saying "not all work". So, uhuh, we agree.
I guess I wasn't clear in my first post. I should have said "they localize all work that can be generated from the local data up to and including rasterization". But I thought my pointing out the NUMA thing and the striping methodology right before that statement alluded to that meaning. Sorry for the confusion.
> I haven't heard whether HBM low-cost is confirmed ...
Lower cost, assuming Intel's EMIB is analogous to not using an interposer.
> The items that AMD discloses for GPU chiplets talk about them being paired with memory standards 2 generations beyond HBM2. Should I only take every other sentence AMD says as evidence and ignore the ones that contradict my desired outcome?
Yes, because not all concepts in a research paper will make the final cut, and technology changes. To conserve energy, source and destination need to be close. Even for scaling it makes more sense to tightly couple them, then add the ability to share data on top of that. Limit coherence to only the data where it really matters, which isn't most textures and untransformed geometry.
> What would a multi-chip design look like? Would they build 2-3 complete engines on a die and combine 2-4 of these dies?
> Or maybe a front-end chip, then a shader chip, and finally a back-end chip?
Split out by shader engine makes the most sense, with 1, 2, and 4 SE parts. Two binning passes and a leap towards 16 SEs across an EPYC/Threadripper backplane may be doable.
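One way to picture the "split by shader engine" option is plain screen-space tiling, where each binned tile is owned by one SE chiplet. A toy mapping, purely my own illustration and not a description of AMD's binning:

```cpp
#include <cstdio>

// Toy screen-space assignment: the screen is cut into tiles and each tile is
// owned by one shader-engine chiplet, interleaved along the diagonal so that
// neighbouring tiles land on different chiplets and the load stays roughly even.
int tileOwner(int tileX, int tileY, int numChiplets) {
    return (tileX + tileY) % numChiplets;
}

int main() {
    const int numChiplets = 4;   // e.g. a hypothetical 4-SE part
    for (int y = 0; y < 4; ++y) {
        for (int x = 0; x < 8; ++x)
            std::printf("%d ", tileOwner(x, y, numChiplets));
        std::printf("\n");
    }
    return 0;
}
```

The binning pass(es) mentioned above would be what sorts primitives into those per-tile bins before the owning chiplet shades them.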
> Assuming the Polaris and Carrizo refreshes are insignificant changes, and that the shrinks for the Xbox One S and PS4 Slim aren't big enough changes despite being different chips, there's Xbox One X, PS4 Pro, Vega 10, Raven Ridge, and the Intel custom chip.
I don't think console chips count for 2 reasons:
1 - They're not developed solely by AMD, they're a joint venture between teams belonging to AMD and Sony/Microsoft.
2 - The teams who worked on PS4, PS4 Pro, XBone and Xbone X are probably working on the next gen already.
I would say the console shrinks and subsequent distinct SOCs are evidence that AMD can find the means to roll out more than 4, if it wants to.
> Then there's the part when the head of RTG was asked about transparently integrated multiple GPUs, and he said he didn't want that.
We've been through that before. Raja was specifically following up on a conversation about ending "Crossfire" (i.e. driver-ridden AFR that needs work per-game on AMD's side) and leaving multi-GPU in DX12 to game developers. Which is what they're progressively doing already.
> The things they do want to do, however, are actually hard and likely not realizable until after 2020.
Like I said. It's hard. And AMD can't do hard things.
> What AMD has offered is?
More than some ideas on a paper.
> Should I only take every other sentence AMD says as evidence and ignore the ones that contradict my desired outcome?
You mean this is not what you're doing? Trying to invalidate all facts that point to "Yes" in order to prove your opinion of a "No"?
> Lower cost, assuming Intel's EMIB is analogous to not using an interposer.
That would be different than what Samsung's variant is attempting. EMIB is structurally a small silicon bridge capable of having the same interconnect density as a silicon interposer. Samsung's reduced-cost memory drops the bus width so that it can avoid using silicon, and a memory standard captive to something only Intel seems to have would be questionable.
> The slide is very clear: Navi in 2018 with "Nextgen Memory", after Vega with HBM2.
> Perhaps it's HBM2.5 or HBM Low-cost, perhaps it's HBM2 using Intel's EMIB and given a different name, perhaps it's GDDR6 or perhaps it's HBM3 by SK-Hynix coming before Samsung's.
GDDR6 would be a next-generation memory, at least compared to GDDR5. If compared to HBM2, the possibilities seem limited in terms of an upgrade, like a form of HBM2 that lives up to its original specifications. The low-cost variant is a potential cost reduction, but would be somewhat inferior.
> I don't think console chips count for 2 reasons:
> 1 - They're not developed solely by AMD, they're a joint venture between teams belonging to AMD and Sony/Microsoft.
> 2 - The teams who worked on PS4, PS4 Pro, XBone and Xbone X are probably working on the next gen already.
That's a balance that exists by AMD's choice in priorities, and doesn't take into account how much IP cross-pollination is going on.
Polaris was "refreshed" in the last 18 months, which is what I wasn't counting. I considered the initial Polaris launch something of a borderline case, and had forgotten to add that to the count.Like you said, Polaris isn't a long way from a 14nm shrink of GFX8 architectures Tonga/Fiji.
> Carrizo is actually from 2015 but you probably meant Bristol Ridge which is practically Carrizo with Excavator v2 and the GPU was untouched.
That would be a refresh that I did not count; I'm not sure if the steppings changed from the end of one line to the start of the next.
> So in practice, what we got was Polaris 10/11 in 2016 and Vega 10/11 + Polaris 12 in 2017. I remember Raja saying 2 distinct GPUs per year was just about what RTG could do.
Actually, I had blanked on the other Polaris chips as well, so add two more.
> We've been through that before. Raja was specifically following up on a conversation about ending "Crossfire" (i.e. driver-ridden AFR that needs work per-game on AMD's side) and leaving multi-GPU in DX12 to game developers. Which is what they're progressively doing already.
Koduri's words concerning moving away from Crossfire were about its abstracting of multiple GPUs as if they were a single GPU. Going forward with the new APIs, the intention was to involve and invest developers in the explicit management of the individual GPUs.
What exactly did he state in that interview that makes one think he was talking about multi-chip GPUs?
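For reference, "leaving multi-GPU to the developers" under D3D12 looks roughly like this: the application enumerates adapters itself and creates an explicit device per GPU, instead of the driver pretending there is one big GPU. A minimal sketch with error handling trimmed (not AMD-specific, just the standard explicit multi-adapter pattern):

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
#include <vector>
#include <cstdio>

using Microsoft::WRL::ComPtr;

int main() {
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    // Explicit multi-adapter: one ID3D12Device per physical GPU. The
    // application decides how work is split and how results move between them.
    std::vector<ComPtr<ID3D12Device>> devices;
    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device))))
            devices.push_back(device);
    }
    std::printf("Created %zu explicit D3D12 devices\n", devices.size());
    return 0;
}
```

Whether those devices sit on one package or on separate cards is invisible at this level, which fits the "explicit management of the individual GPUs" reading above rather than a transparently fused multi-chip GPU.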
> Like I said. It's hard. And AMD can't do hard things.
The technologies at AMD and its chain of manufacturing partners do not show a reasonable path to implementing 3D and 2.5D active interposers and chiplets that scale up and down the stack in this decade, no. Nor does it seem like its competitors are realistically positioned to do any better, although some have on occasion expressed skepticism on steps even earlier in AMD's chain of improvements.
Is the "more" in this case slides for a CPU division product? I do not recall what the "more" is for AMD's GPUs, which I recall is rather vague and long-term.More than some ideas on a paper.
> You mean this is not what you're doing? Trying to invalidate all facts that point to "Yes" in order to prove your opinion of a "No"?
To quote someone's list: "2 - No official news or leaks about Navi have ever appeared that suggest it's a multi-chip solution."
> That would be different than what Samsung's variant is attempting. EMIB is structurally a small silicon bridge capable of having the same interconnect density as a silicon interposer. Samsung's reduced-cost memory drops the bus width so that it can avoid using silicon, and a memory standard captive to something only Intel seems to have would be questionable.
Different, but reduce the number of pins from HBM2 and you're looking at GDDR. The only difference may be placing the memory close enough to avoid large drivers in the ICs. It avoids the large silicon interposer still, but has the small bridge to retain a high level of IO. Sort of a middle ground if you will. Your thinking was my original thinking as well, but in hindsight we may have been wrong. At the very least they would seem to be competing technologies unless HBM3 is far lower bandwidth or involves interesting signaling.
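Putting rough numbers on the pin-count point. The HBM2 figures are the standard ones; the 512-bit width for the low-cost variant is the reported Samsung proposal, so treat the exact rates here as assumptions for illustration:

```cpp
#include <cstdio>

// Bandwidth per stack/device = bus width (bits) / 8 * per-pin data rate (Gb/s).
double bandwidthGBs(int busBits, double gbpsPerPin) {
    return busBits / 8.0 * gbpsPerPin;
}

int main() {
    // Standard HBM2: 1024-bit stack at ~2.0 Gb/s per pin.
    std::printf("HBM2         1024-bit @  2.0 Gb/s : %.0f GB/s\n", bandwidthGBs(1024, 2.0));
    // Reported low-cost proposal: roughly half the pins, so per-pin rates have
    // to climb toward GDDR-class speeds to stay in the same ballpark.
    std::printf("Low-cost (?)  512-bit @  3.2 Gb/s : %.0f GB/s\n", bandwidthGBs(512, 3.2));
    // A GDDR-class device for comparison: 32-bit chip at ~12 Gb/s per pin.
    std::printf("GDDR-class     32-bit @ 12.0 Gb/s : %.0f GB/s\n", bandwidthGBs(32, 12.0));
    return 0;
}
```

Which is the "reduce the pins and you're looking at GDDR" point: fewer wires only keep the bandwidth if each wire runs much faster, and the wide, slow wires were exactly what the interposer (or a bridge like EMIB) made cheap for HBM.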
> Different, but reduce the number of pins from HBM2 and you're looking at GDDR. The only difference may be placing the memory close enough to avoid large drivers in the ICs. ...
The reduced-cost HBM was something Samsung was still making inquiries about, gauging customer interest. I'm not thinking this as much as reading what Samsung stated.
> Then there's the part when the head of RTG was asked about transparently integrated multiple GPUs, and he said he didn't want that.
I can admit to some skepticism about AMD's chances of implementing this, because I think they've been saying they don't want to do that.
The things they do want to do, however, are actually hard and likely not realizable until after 2020.
> The filing date is June 2016. There's usually a delay between filing and when a feature shows up in a product, if it does. For example, the hybrid rasterizer for Vega had an initial filing in March of 2013.
Vega's development pipeline may have had some unusual stalls in it, so we may need to come back to this to see when Navi or its successor is finalized and whether this method appears in it.
> Okay, what would be your guess and expectations for the next Xbox, if Microsoft is targeting a late 2021 release (4 years after X1X, 8 years after XB1), in terms of AMD GPU architecture, number of CUs and memory bandwidth, and would HBM3 be feasible by then?
If the pattern from the current generation holds, whatever architecture is adopted by the console would be much closer to a new card launched in a similar time frame, possibly with a slight delay like with Bonaire. That's potentially something next gen to AMD's Next Gen. Navi hopefully shouldn't be the design under consideration by that point.
"Nextgen Memory" is DiRAM4 Memory, maybe Navi will have this memory architecture I thinkThe slide is very clear: Navi in 2018 with "Nextgen Memory", after Vega with HBM2
> "Nextgen Memory" is DiRAM4 memory; maybe Navi will have this memory architecture, I think.
> https://tezzaron.com/applications/diram4-3d-memory/
That's not a JEDEC standard.