Shared RAM on NUMA-machines?

Nemesis77

Newcomer
The title might be a bit misleading, so allow me to clarify a bit :). If we look at the traditional Intel-architecture and compare it to a NUMA-system (like AMD64), the differences are obvious. On Intel, the RAM is attached to the northbridge, and each CPU is connected to that northbridge, and they share the RAM. On AMD-machines, the RAM is connected to the CPU, and each CPU can use the other CPUs' RAM as well. The northbridge is focused on handling PCIe and other tasks (for the sake of simplicity, I'm assuming a single-chip chipset).

Now, the benefits of the AMD-approach are obvious: the memory-latencies are lower, mem-bandwidth goes up as the number of CPUs increases, and the FSB can be dedicated to other tasks than accessing the RAM. But would there be any benefits if there was some RAM attached to the northbridge as well?
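To put the bandwidth-scaling argument in concrete terms, here is a minimal Python sketch. It is only a toy model under stated assumptions: the function names are made up for illustration, the bandwidth figure is a placeholder and not a measurement, and the NUMA case assumes the best case where every CPU mostly works out of its own local RAM.

```python
# Toy model: peak memory bandwidth with one shared northbridge controller
# versus one memory controller per CPU. All figures are hypothetical.

def shared_bandwidth(num_cpus, controller_gbs):
    """All CPUs share one controller, so the peak total does not grow."""
    if num_cpus < 1:
        raise ValueError("need at least one CPU")
    return controller_gbs


def numa_bandwidth(num_cpus, controller_gbs):
    """Each CPU brings its own controller, so the peak total grows linearly.

    Best case only: assumes each CPU mostly accesses its own local RAM.
    """
    if num_cpus < 1:
        raise ValueError("need at least one CPU")
    return num_cpus * controller_gbs


if __name__ == "__main__":
    PER_CONTROLLER_GBS = 6.4  # placeholder value, not a measured figure
    for cpus in (1, 2, 4):
        print(cpus,
              shared_bandwidth(cpus, PER_CONTROLLER_GBS),
              numa_bandwidth(cpus, PER_CONTROLLER_GBS))
```

In the shared case the per-CPU share actually shrinks as CPUs are added, which is the contention the CPU-attached design avoids.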

More details: What if, besides having RAM attached directly to the CPU, the northbridge would also have RAM-banks attached to it? What uses could that RAM serve?

- Texture-cache for the vid-card
- IO-buffer
- Shared RAM for the CPUs
- Some other things I can't think of right now ;)

Yes, the CPU-attached RAM could be used for textures right now. But when doing so, the data needs to go through the FSB, and the FSB could be used for other things. The CPU-attached RAM could then be dedicated to the CPU, while the vid-card would have RAM of its own at its disposal. I could imagine that the hi-end 3D-folks would love to have a few gigs of dedicated texture-RAM at their disposal ;).

And that RAM could be used by the CPU as well, effectively increasing the mem-bandwidth, since the CPU would have its own RAM at its disposal, plus the RAM attached to the other CPUs, plus the RAM attached to the northbridge. But, OTOH, the northbridge-RAM could be dedicated strictly to other uses, like textures and IO. In that case, could it run at a different speed than the CPU-attached RAM does, or would there be timing-issues?
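As a rough illustration of the "more pools, more bandwidth" idea, the sketch below adds up what one CPU could pull from its local RAM plus any remote pools. It is an optimistic toy model: the function name and all numbers are hypothetical, and it assumes every remote pool sits behind its own link, so pools never compete for the same wires.

```python
# Toy model: bandwidth one CPU could reach when streaming from several
# memory pools at once. Remote pools are capped by the link leading to them.

def total_bandwidth_for_cpu(local_gbs, remote_pools):
    """Sum local bandwidth and link-limited remote bandwidth.

    remote_pools is a list of (pool_gbs, link_gbs) pairs, one per remote
    pool (another CPU's RAM, or hypothetical northbridge-attached RAM).
    Assumes each remote pool has a dedicated link (optimistic).
    """
    total = local_gbs
    for pool_gbs, link_gbs in remote_pools:
        # A remote pool can never deliver more than its link carries.
        total += min(pool_gbs, link_gbs)
    return total


if __name__ == "__main__":
    # Placeholder figures: local RAM, another CPU's RAM, northbridge RAM.
    print(total_bandwidth_for_cpu(6.0, [(6.0, 4.0), (3.0, 8.0)]))
```

The point of the `min()` is that extra RAM only adds bandwidth up to what the link to it can carry. If the links are already busy with IO, the gain shrinks accordingly.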

Would this make any sense, or am I talking BS? I would guess that normal PCs wouldn't really benefit, but what about workstations?
 
The question you really need to ask yourself is whether or not the benefit of having it is worth the increased complexity. Think about the extra traces that will need to be run for this RAM, the necessity of having a memory controller present in the northbridge, synchronization issues between the different pools of RAM, the extra time needed to transfer data between CPU-RAM and the northbridge RAM... Ultimately I am afraid that it might actually decrease performance compared with simply having a larger pool of RAM in the current Athlon64 memory setup.
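One way to reason about that trade-off is a weighted-average latency estimate. The Python sketch below is a back-of-the-envelope model only: the function name is invented for illustration, and the latencies and access fractions are placeholders rather than measurements of any real system.

```python
# Toy model: average memory access latency when accesses are split
# across pools with different latencies. All numbers are hypothetical.

def average_latency_ns(fractions, latencies_ns):
    """Return the access-weighted mean latency over several memory pools.

    fractions[i] is the share of accesses that hit pool i (must sum to 1),
    latencies_ns[i] is the latency of pool i in nanoseconds.
    """
    if len(fractions) != len(latencies_ns):
        raise ValueError("need one latency per fraction")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(f * lat for f, lat in zip(fractions, latencies_ns))


if __name__ == "__main__":
    LOCAL_NS = 50.0         # placeholder: CPU-attached RAM
    NORTHBRIDGE_NS = 100.0  # placeholder: RAM behind the northbridge
    # Everything served from local RAM.
    print(average_latency_ns([1.0], [LOCAL_NS]))
    # A quarter of the accesses go to the northbridge pool instead.
    print(average_latency_ns([0.75, 0.25], [LOCAL_NS, NORTHBRIDGE_NS]))
```

With these placeholder figures, moving a quarter of the accesses to the slower pool raises the average from 50 ns to 62.5 ns. So the extra pool only pays off if it relieves a bandwidth bottleneck that outweighs the added latency.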

It's always good to brainstorm about these things though. Certainly going to a NUMA architecture has benefited AMD compared with the older northbridge memory controller setup.

Nite_Hawk
 
Nite_Hawk said:
The question you really need to ask yourself is whether or not the benefit of having it is worth the increased complexity. Think about the extra traces that will need to be run for this RAM, the necessity of having a memory controller present in the northbridge, synchronization issues between the different pools of RAM, the extra time needed to transfer data between CPU-RAM and the northbridge RAM... Ultimately I am afraid that it might actually decrease performance compared with simply having a larger pool of RAM in the current Athlon64 memory setup.

It might not be sensible for regular PCs (where cost would be an issue), but it might be for more expensive setups. Also, if the RAM was used solely for video, IO and the like, and not by the CPUs at all, then the problems of synchronization would largely go away. The idea would be that IO-activities and the like would use the northbridge-RAM, while the CPUs would work with the RAM that is attached to the CPUs.

I actually got the idea from the P.A. Semi processors, where the L2-cache on the CPU can be used for IO-operations as well. Since the A64 and most other CPUs don't handle IO (in the same sense as PWRficient does), some other solution is needed to mimic the behavior. And that gave birth to the idea of a dedicated RAM-pool for IO (and other tasks, like textures).
 