Wow, Price vs Perf right now in PCs is nice...

Indeed, faster speeds are easier (for now) than all the extra traces. I think the next big update will be with the Nehalem processors and associated platform. I have to assume that the integrated memory controller will facilitate newer, faster and bigger memory connections.
 
The enthusiast-level Nehalem sockets are likely to push motherboard prices up, with their support of three-channel DDR3.

As an aside, the Xeon MP 8-core Nehalem is planned to have quad-channel memory, though with FB-DIMMs.
 
Aww jyeah... Three channels of DDR3-1600 should give us about, eh, ~12-15GBps worth of bandwidth? FSB should fall out of the equation at some level with the IMC being onboard and all, but I wonder by how much?

I also wonder if we're going to see problems like AMD had with their first IMC-equipped processors, in that high overclocks got REALLY hard to obtain because (the rumor mill suggested) the IMC just wasn't up to the task.

I'm pretty sure that's all but resolved now for AMD; here's hoping Intel gets it right the first time.
 
I haven't been able to track down the pin count of Intel's FSB, though the number would be pretty high.

The three-channel socket has 1366 pins, which is a little less than 600 more than the socket with the FSB. How many of those go to power, ground, and the CSI links I don't know, but the memory pinout seems to be a net gain over the pins the FSB needed.
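For a quick sanity check on that figure (a sketch that assumes the FSB socket being compared is LGA 775, which the post doesn't actually name):

```python
# Rough pin-count comparison. The FSB socket is assumed to be LGA 775 here;
# the tri-channel Nehalem socket is the 1366-pin one mentioned above.
fsb_socket_pins = 775
imc_socket_pins = 1366

print(imc_socket_pins - fsb_socket_pins)  # 591 -- "a little less than 600 more"
```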
 
Doubling the number of channels would need a lot of motherboard traces.

Why? All the wiring is in place to communicate with all four slots; making it quad channel is just a case of talking to them at the same time instead of two by two, no?
 
All DIMMs on a channel share common lines.
If both DIMMs on a channel talk at the same time on shared lines, the result is electrical nonsense.

A single-channel setup with 4 slots has all 4 slots sharing enough data and control lines for one DIMM to be accessed at a time.

A dual-channel setup with 4 slots might share some power and ground, but 2 of its slots hang off one set of traces and the other 2 off another, electrically separate set.
That's double the routing for data and control.

DDR3 has a pinout of 240 pins per DIMM, though a lot of that is power and ground.
For simplicity, I'll point out that 64 of them are data lines.

For single-channel, that's 64 lines.
Dual channel is 128.
Quad channel is 256.
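To put rough numbers on the trace scaling (a quick sketch counting only the 64 data lines per channel, as above; command, address, and power/ground traces come on top of that):

```python
# Data-line trace count per memory configuration, counting only the
# 64 data lines each channel needs; control/address/power are extra.
DATA_LINES_PER_CHANNEL = 64

for channels in (1, 2, 3, 4):
    print(f"{channels}-channel: {channels * DATA_LINES_PER_CHANNEL} data traces")
# 1-channel: 64, 2-channel: 128, 3-channel: 192, 4-channel: 256
```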
 

That doesn't sound so bad if it's only a linear increase.
Especially since a laptop has room for dual-channel memory, I'd think they could do quad channel on a desktop with the extra space. I'd imagine the biggest cost would come from the larger chip package needed to accommodate all the extra pins.
 
The cost of the extra pads or pins per chip is one problem.

The other is the rise in engineering and material costs associated with handling skew and EMI from that many active signal lines at high clock speeds.

Routing 256 data lines in such a way that temperature variance, PCB and metal quality, electromagnetic interference from other sources and other data lines, and any number of other complex phenomena don't cause the memory to fail to hit timing margins is a non-trivial task.

Keeping things within margins would require more PCB layers, more complex routing, and higher quality or more expensive materials. All of these add to cost, and manufacturing complexity can also cause yield issues with board manufacturing.

Upping the manufacturing cost and engineering work for a product where margins are minuscule means eating costs for the sake of the CPU and chipset manufacturers, who will also be charging the board maker more for the larger chips with more pins.
 
Actually, I was thinking: why has no one made a board with quad-channel memory instead of dual?
You could always get an AMD two-socket motherboard for that :)
Aww jyeah... Three channels of DDR3-1600 should give us about, eh, ~12-15GBps worth of bandwidth?
Actually, the theoretical maximum should be around 40 GB/s. Real-world bandwidth obviously depends on the IMC's efficiency.
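For reference, the arithmetic behind that peak number (a quick sketch, assuming a 64-bit data path per channel at 1600 MT/s):

```python
# Theoretical peak bandwidth for triple-channel DDR3-1600.
transfers_per_sec = 1600e6   # DDR3-1600 -> 1600 MT/s per channel
bytes_per_transfer = 8       # 64-bit channel = 8 bytes per transfer
channels = 3

peak = channels * transfers_per_sec * bytes_per_transfer
print(peak / 1e9)            # 38.4 GB/s, i.e. roughly the ~40 GB/s figure
```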
 

On top of that, traces in a PCB are pretty noisy when you pack a ton of them together. Going to quad-channel memory would force you to have more layers in the board, and as soon as you get past 4-layer PCBs they get really, really expensive.

I have an old Alpha PC164 board that has a 256-bit memory interface. It takes 8 matching 72-pin SIMMs to even boot the thing. It's a 16-layer PCB to make all those connections.
 
Is there a need for quad channel? Is lack of memory bandwidth hurting performance, or would doubling bandwidth not add much?
 
In most desktop applications, memory bandwidth isn't usually a limiter.
For chips on an FSB, extra memory bandwidth makes only a slight difference once memory is already a little faster than the FSB.
Quad-channel is so far beyond diminishing returns that it would be indistinguishable from a dual-channel setup that is clocked a little higher than the FSB.
For chips with an IMC, the internal datapaths and IMC speed are a limit.

In a theoretical sense, a single Conroe core can initiate one 128-bit load per cycle, or 16 bytes.
With a quad-core at 3.0 GHz, that's 4 × 16 bytes × 3.0 GHz = 192 billion bytes per second.

That's when the chip is basically doing a loop with a single load instruction forever, and nothing else.

There are bandwidth-limited tasks, but nothing that extreme, and other bottlenecks usually get in the way first. Even if some task were limited solely by that 192-billion-byte ceiling (on some mythical Conroe quad-core tied directly to a DRAM controller with nothing in between), it isn't common enough to justify a mass-produced product that can handle it.
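A minimal sketch of that upper-bound arithmetic (assuming one 128-bit load per core per cycle at 3.0 GHz, as above):

```python
# Upper bound on demand-load bandwidth for a hypothetical quad-core Conroe
# issuing one 128-bit (16-byte) load every cycle on every core.
cores = 4
bytes_per_load = 16          # 128-bit load
clock_hz = 3.0e9

print(cores * bytes_per_load * clock_hz)  # 1.92e+11 -> 192 billion bytes/s
```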
 
My understanding of DDR2 is that its main feature is to multiplex two memory chips (presumably integrated on the same die or chip package?) to get twice the frequency at the cost of latency. And DDR3 is the same kind of trade-off taken one step further: multiplexing 4 memory chips.

If there existed a fast optical interconnect between RAM and CPU, couldn't you just mux a whole bunch of memory modules in the same fashion with only a little extra latency?
 
DDR2 widens the internal datapaths that link the core memory cells with the part of the chip that serves as a buffer to the external bus.

The core part is very difficult to clock fast, while the DRAM's interface is a smaller section that is easier to clock fast.
DDR2 allows the DRAM core to spit out double the number of bits per core clock cycle to feed the interface, which is clocked twice as high.
This prefetch has scaled thusly:
DDR - 2-bit prefetch
DDR2 - 4-bit prefetch
DDR3 - 8-bit prefetch

This is great for bandwidth, as the DRAM bus is faster.
It's also great for manufacturing, because the bulk of the DRAM chip is no longer clocked to the point that it hurts yields and binning.
The downside is latency at a given clock, and the longer prefetch stride doesn't help access patterns that wind up discarding the prefetched bits.
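As a rough illustration of how the prefetch depth lets a slow core feed a fast interface (a sketch assuming a 200 MHz DRAM core array in each generation; real parts vary):

```python
# data rate (MT/s) = DRAM core clock (MHz) x prefetch depth per data pin.
# All three examples assume the same 200 MHz core array clock.
generations = {
    "DDR-400":   {"core_mhz": 200, "prefetch": 2},
    "DDR2-800":  {"core_mhz": 200, "prefetch": 4},
    "DDR3-1600": {"core_mhz": 200, "prefetch": 8},
}

for name, g in generations.items():
    rate = g["core_mhz"] * g["prefetch"]
    print(f"{name}: {g['core_mhz']} MHz core x {g['prefetch']}n prefetch = {rate} MT/s")
```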
 
It's the same trade-off of latency for bandwidth. If accesses are truly random on a DRAM, the streaming bandwidth the wider prefetch brings winds up not being used, but the latency penalty remains. These days, RAM really doesn't like random access, the name notwithstanding.

For clarity on my earlier post, DDR2 and DDR3 do not multiplex chips. Their internal DRAM arrays themselves are time-multiplexed, because the prefetched bits can't all go out at the same time. Each DRAM chip's buffer has to pick one array's output at a time.

I don't know if you've seen this link, but it has a more in-depth explanation.

http://www.lostcircuits.com/memory/ddrii/
 