AMD: Southern Islands (7*** series) Speculation/ Rumour Thread

For GPUs you want Super Wide I/O (an order of magnitude more than what's being targeted for mobile) with the ability to daisy-chain them through vertical stacking.
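For a sense of scale, here's a rough back-of-the-envelope comparison (the mobile numbers are the commonly quoted Wide I/O targets; the GPU-class width and per-pin rate are just assumptions for illustration):

```python
# Rough bandwidth comparison: mobile Wide I/O vs a hypothetical "super wide" GPU stack.
# Mobile figures are the commonly quoted JEDEC Wide I/O targets; the GPU-class
# width/speed below are assumptions for illustration only.

def bandwidth_gb_s(width_bits, gbps_per_pin):
    return width_bits * gbps_per_pin / 8  # GB/s

mobile = bandwidth_gb_s(512, 0.2)   # 512 bits at ~200 Mbps/pin -> ~12.8 GB/s
gpu    = bandwidth_gb_s(4096, 1.0)  # ~an order of magnitude wider, ~1 Gbps/pin -> 512 GB/s

print(f"Mobile Wide I/O : {mobile:.1f} GB/s")
print(f"GPU-class guess : {gpu:.1f} GB/s")
```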

BTW, I noticed that the Amkor presentation about TSVs is simply on AMD's website:
http://sites.amd.com/la/Documents/TFE2011_001AMC.pdf

It's from TFE 2011, although the Amkor presentation isn't linked ... I guess someone found the PDF anyway by guessing the file name (not me, I just googled).

This one from Hynix looks semi-encouraging, but targeting only a 65% bandwidth improvement over GDDR5 is lame ... sounds to me like that's mobile-targeted as well and simply repurposed for GPUs. If that's the best we can do electrically, I guess we will have to wait for silicon photonics for some real improvement.

Although I suspect a large part of that is the relative primitiveness of the interposer and TSV technology: they only seem to be used to replace the normal package substrate/PCB, not making use of the ability of the silicon interposer and the intermediate ICs in a stack to boost/buffer/switch the signals.
 
The slide after the one you're apparently quoting describes 1024-wide I/O at 1Gbps versus 32-wide at 7Gbps. That's a factor of 4-plus in favour of HBM (presuming that it's 1024 wide per memory module).

Amkor is describing 40 micrometre pitch now and a medium-term target of 20 micrometre pitch. For a "grid" of 32x32 I/Os interfacing with a module (or the GPU), the 40 micrometre pitch results in 1.24mm x 1.24mm of interposer area to support 1024 I/O pads.
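Working those two numbers through explicitly (a minimal sketch; the only inputs are the widths, per-pin rates and pad pitch quoted above):

```python
# Per-module bandwidth: 1024 pins at 1 Gbps each vs a 32-bit GDDR5 channel at 7 Gbps.
stacked_gbps = 1024 * 1.0           # 1024 Gbit/s ~= 128 GB/s
gddr5_gbps   = 32 * 7.0             #  224 Gbit/s ~=  28 GB/s
print(stacked_gbps / gddr5_gbps)    # ~4.6x in favour of the wide interface

# Interposer area for a 32x32 pad grid at 40 micrometre pitch
# (the span is 31 pitches between the first and last pad centres).
pitch_mm = 0.040
span_mm  = (32 - 1) * pitch_mm      # 1.24 mm per side
print(span_mm, span_mm ** 2)        # 1.24 mm, ~1.5 mm^2 for 1024 pads
```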

So 1024 seems to me merely "entry-level" for this tech. It should scale much, much higher, say 2 orders of magnitude. The actual limit would appear to be the mesh/ring/bus organisation within the GPU (or CPU), as buses seem to be about 1024 bits wide currently.

It's also interesting to see an "APU configuration" in the Amkor deck with purely vertical stacking, not side-by-side.
 
Misread the system performance bit, thought they were talking about system bandwidth being 65% higher.
 
If you're a GPU maker, what features of DRAM, as we currently know it, are desirable in a memory subsystem enabled by an interposer?

It seems to me that this is an opportunity to create something with the performance and granularity of "L4 cache" but sized as entire system memory.

In other words, why wouldn't a GPU manufacturer make the memory chips too? AMD and NVidia already know how to build caches...

Surely not DRAM caches.....
 
DRAM manufacturing firms or divisions are built to make money from designing and manufacturing cheap high-density memory, with large volumes and very high yields.
That said, many or most of them have a very hard time making money from it.
The DRAM industry is chronically in overproduction, so it could be very challenging for AMD or Nvidia to beat companies that barely charge enough to make any money.

Whatever memory they engineered could be faster or a better fit for their systems, assuming they accumulate expertise they currently lack. But the cost of setting up a DRAM division and the design cost would be amortized over a smaller market.
Also, unless another DRAM manufacturer feels like being a foundry for them (which means sharing what is already a very thin margin), AMD and Nvidia would have few options, since DRAM processes are different from TSMC's or GF's logic processes and may not deliver a good product or good enough yields.
 
DRAMs are crappy partly because their physical interfaces are very narrow. Without crappy interfaces there's a much bigger optimisation space to explore (compare SATA and PCI Express as interfaces for NAND-based storage). So if the memory makers don't want to play ball, then...

But seeing that Hynix, at least, is making some effort, it seems like it won't come to that.
 
The nice thing about the interposer is that the decision to use one is the logic manufacturer's, and they will be taking on the responsibility of paying for the medium that the interface must cross.
When the medium is an inexpensive PCB with long traces, the interface options are more limited, but if AMD or some other manufacturer wants to pay for an extra silicon die, more power to them.

Since stacked memory with TSVs is an apparent next step, the interposer may be an incremental change to what is already being researched, so that may also make things easier for memory manufacturers.
 
Premising that I don't fully understand the difference compared to a traditional substrate:

Could this be the key element for a modular GPU, as was speculated some years ago for ATI?
And "Crossfire on a chip"?
 
I know answers may be general guesstimates, but I am curious about some things regarding the silicon interposer (SI) and stacked memory -- manufacturing cost, benefits and sacrifices.

1. Memory Performance. It appears stacked memory requires lower-speed (per-pin) memory. Between the added benefit of stacking and the SI's ability to offer much wider I/O (could we be looking at 512-bit? 1024-bit? More?), what kind of memory performance are we looking at?

2. Memory Density. I haven't been following stacked memory, so what are the current densities offered per module compared to currently competing non-stacked products? What is the prognosis for 2012? If this is a far-off GPU, say late 2013 or 2014, what does the memory landscape look like?

3. What kind of chip-area-related constraints are there? An SI is going to need to fit the GPU + memory and anything related (the bus to the CPU and so on). Is the SI going to be sufficient for a mainstream GPU, or is it going to be constrained to the laptop-style market with smaller chips (e.g. sub-150mm^2) and few(er) memory modules?

4. The manufacturing cost of an SI (silicon interposer). If I am understanding this correctly, this isn't too far off, in concept, from how many laptops use a distinct PCB for the GPU/memory. I know my Dell XPS M1330 (spit!) had such a thing. But what is the cost comparison? Is there a net-neutral or positive trade-off from investing in the SI (more expensive) against the reduction in PCB complexity, expensive GDDR, etc.?

5. CPU link. What kind of connection would be expected to the CPU? Could something crazy be set up, e.g. the GPU and *system memory* on the SI and the CPU on the mainboard? Would something like this, a single very fast connection, allow for a fairly simple mainboard PCB but also more lines (bandwidth) for the CPU?

6. Multi-GPU... Multi-Chip (GPU/CPU). Would an SI offer a realistic solution for Crossfire/SLI cross traffic or will it not hold enough large chips? And what of the prospect of the CPU moving to the SI?

Obviously this could be a big hit in the laptop market as well as the midrange PC market where, if SI costs can be held in check, AMD could offer some pretty compelling hardware platforms. OK, we know Bulldozer bit it hard (although, oddly, my dad is very excited about 8 CPU cores at a modest investment... he didn't even know about any of the performance issues, so I guess in some circles AMD still has a lot of positive mojo), but could the SI be a game changer for AMD? Instead of marketing CPUs, they could use SIs to market the "platform" (GPU/CPU/memory) and chase the competition from that end?

Ditto consoles... let's say a certain high-profile game developer heard that a certain console manufacturer was considering multi-GPU console solutions. What kind of craziness can be fit onto an SI? Would a couple of 200mm^2 GPUs and 4GB of memory be too much? What of the CPU?

7. Let's get crazy: XDR2. Is there any indication of stacking with XDR modules? Let's take crazy pills: the AMD XDR rumor is one crazy pill. The fact that Sony's PS3 uses XDR and Sony may also be contracting GPU services is another crazy pill. XDR also having fewer pins per module, which may allow for even crazier potential bandwidths, is worth 2 more crazy pills. Did my question just go straight to the psych ward, or is XDR2, at least in theory, something that could be used downstream? Or are there reasons this is a non-starter?

Thanks for allowing me to ask all these questions. I just had my wisdom teeth surgically removed, so I felt the extra kick of crazy pills to come back and post some wild questions. Now don't all flood my PM box at once with those non-leak leaks :cool:
 
Premising that I don't fully understand the difference compared to a traditional substrate:
The interposer is an actual silicon chip that is manufactured in a similar way to the GPU on top. It is a big, dumb chip. It's manufactured with wider tolerances, an undetermined amount of redundancy, and will probably be made on a very mature process far behind the leading edge to keep costs down and yield up.
It's cheaper than the GPU silicon, but more expensive than the conventional substrate below it. On the plus side, it is a semiconductor crystal, which means it can handle speeds and wire densities far in excess of what is possible on a package substrate.

2. Memory Density. I haven't been following stacked memory, so what are the current densities offered per module compared to currently competing non-stacked products? What is the prognosis for 2012? If this is a far-off GPU, say late 2013 or 2014, what does the memory landscape look like?
I haven't seen actual mass-production TSV-stacked DRAM. I have seen articles about sampling stacked DRAM, with four stacks able to hold 1 GiB of memory.
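As a rough sanity check on that figure (assuming "four stacks" means a 4-high stack, and assuming 2 Gbit dies, which were mainstream at the time; both are my assumptions, not from the article):

```python
# Hypothetical density check for a 4-high stack of 2 Gbit DRAM dies (assumed values).
dies         = 4
gbit_per_die = 2                        # assumed; 2 Gbit DRAM dies were common in 2011
stack_gib    = dies * gbit_per_die / 8  # = 1.0 GiB per stack
# A 2 GiB graphics card would then need two such stacks, or a move to 4 Gbit dies.
print(stack_gib)
```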

3. What kind of chip-area-related constraints are there? An SI is going to need to fit the GPU + memory and anything related (the bus to the CPU and so on). Is the SI going to be sufficient for a mainstream GPU, or is it going to be constrained to the laptop-style market with smaller chips (e.g. sub-150mm^2) and few(er) memory modules?
The interposer is still a silicon chip. Its maximum size could be the same as for any other silicon chip: whatever the lithography equipment will allow. That could be the case, but I am not certain the limit is the same for interposers.
This may mean that only GPU dies small enough to leave room for the DRAM stacks can be used on top of an interposer.
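As a very rough floorplan budget, assuming the interposer really is capped near the usual lithographic reticle field (about 26mm x 33mm) and using guessed die and stack footprints purely for illustration:

```python
# Rough interposer floorplan budget. The ~26 x 33 mm reticle field is the usual
# lithography limit; the GPU and DRAM-stack footprints are guesses for illustration.
reticle_mm2 = 26 * 33                   # ~858 mm^2 of interposer, if that limit applies

gpu_mm2        = 350                    # a large 28 nm GPU die (assumed)
dram_stack_mm2 = 70                     # footprint of one stacked-DRAM module (assumed)
stacks         = 4

used_mm2 = gpu_mm2 + stacks * dram_stack_mm2
print(reticle_mm2, used_mm2, reticle_mm2 - used_mm2)   # leftover for routing/keep-out
```

With those guesses a single large GPU plus four stacks already consumes most of the field, which is why two big GPUs on one interposer looks tight.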

6. Multi-GPU... Multi-Chip (GPU/CPU). Would an SI offer a realistic solution for Crossfire/SLI cross traffic or will it not hold enough large chips? And what of the prospect of the CPU moving to the SI?
There may not be enough room for two GPUs on the interposer. The test chip in the SemiAccurate article doesn't have enough space for more than one.
 
They might be able to use proximity lithography for the interposers, in which case there won't be a reticle limit at all ... even if they need to use projection, it's much easier to relax the reticle limits for the larger processes.
 
DRAMs are crappy partly because their physical interfaces are very narrow.
I still feel the main problem with external DRAM isn't the low bandwidth but rather the high latency. Yes, you could increase performance with higher bandwidth (if it's higher both per dollar and per watt), but both CPU and GPU architectures are still designed with the primary goals of either preventing memory stalls or hiding them completely (respectively). The amount of logic you could save with 10x lower memory latency is quite astonishing IMO. And no, you can't save that logic with just an eDRAM framebuffer or a massive L3 cache, sadly. Certainly bandwidth is the more pressing concern, though, because even current architectures cannot continue to scale without ever more bandwidth.
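To put a number on how much in-flight state latency-hiding demands, here's a crude Little's-law sketch (all figures are assumptions chosen only to show the scaling):

```python
# Crude Little's-law sketch: bytes that must be in flight to sustain a given
# bandwidth at a given memory latency. All numbers are illustrative assumptions.

def inflight_bytes(latency_ns, bandwidth_gb_s):
    # ns * GB/s = bytes that must be outstanding to keep the pipe full
    return latency_ns * bandwidth_gb_s

bandwidth = 200                        # GB/s, a plausible high-end GPU figure (assumed)
for latency in (500, 50):              # ~today's DRAM latency vs a hypothetical 10x cut
    kib = inflight_bytes(latency, bandwidth) / 1024
    print(f"{latency} ns -> ~{kib:.0f} KiB outstanding")
```

Ten times less latency means ten times less outstanding work, which is where the savings in registers, queues and thread contexts would come from.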

As for Charlie's picture - he does have a 'HD8000' tag in there. I'm not confident that means anything but heh. Could be console, could be an APU, could be a high-end GPU, could be nothing more than a random prototype they made to test interposers in general... We really don't know anything at this point.
 
There may not be enough room for two GPUs on the interposer. The test chip in the SemiAccurate article doesn't have enough space for more than one.

Regarding the last question, my concern is that sharing an interposer doesn't mean the two GPUs share the same front-end, which is what would let them distribute work more easily.
 
AMD Delays Next Generation Radeon launch to Q1 2012
AMD are delaying the launch of their hotly anticipated next generation Radeon cards featuring Graphics Core Next Gen architecture to Q1 2012, despite previously targeting Q4 '11 for launch.

Rage3D's sources indicate that the problem is on the production side, not the design side, possibly indicating TSMC's 28nm process Q4 volume ramp up isn't enough to launch the new line up for Christmas.
:cry:
 
Most of the named chips (i.e. those having a chance this year) are not GCN anyway, so we may still see a 28nm launch this year, considering they specifically write "GCN chips". But that's probably not likely if the volume ramp is the problem.
 
Looks like they are going to miss this BF3 / MW3 upgrade cycle.

To be honest, I'm not convinced there really is such a thing, and even if there is, it's not going to be a single, sharply defined event, but something that happens over a few months. After all, not everyone is going to buy BF3/MW3 on launch day.

But if this piece of news is correct, they are going to miss Christmas. I think it may only apply to desktop SKUs, though.
 
An MW3 upgrade? What would that be anyway? *SCNR*

With BF3 it's another matter, though. But I think, Alexko, you're not entirely correct. The bulk of sales happens very close to launch, including, of course, all the pre-orders. Afterwards, the curve usually declines quite fast.
 