NVIDIA Maxwell Speculation Thread

Humm, very interesting. I don't see an author; did you write that, AnarchX?
Because NV GPUs can access memory in transactions as small as 32B, they can have better performance on applications with gather or scatter memory access patterns. For example, sorting. Making the memory access atom larger harms performance on these applications because the processor has to fetch extra data that is then unused.

In general, smaller memory access granularity is better but also more work to implement (and can cause overheads in the case where memory accesses are very regular).

It would be easy for NV to move to 64B accesses, but it would harm NV performance on the applications where the larger access atom is a worse match.
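To put toy numbers on that (purely illustrative, not NV specs): if a gather only needs one 4-byte element per transaction, a larger access atom just fetches more bytes that get thrown away.

```python
# Rough back-of-envelope: wasted DRAM traffic for a gather that touches
# one 4-byte element per memory transaction (illustrative numbers only).
element_bytes = 4
for atom in (32, 64):                   # memory access granularity in bytes
    wasted = atom - element_bytes       # bytes fetched but never used
    efficiency = element_bytes / atom   # useful fraction of each transaction
    print(f"{atom}B atom: {wasted}B wasted per access, "
          f"{efficiency:.0%} of the fetched data is actually used")
```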
 
Erm... this might be a newb question, but if GDDR5X changed so much from the old GDDR5, why didn't they change the per-chip pathway to 64bit?
Given the increased memory densities, IHVs/OEMs are more likely to be forced to use more chips than capacity alone would require, just to reach a desired bandwidth.

Putting it another way, is there any GDDRxx implementation that would require a 32bit path? Even the lowest-end GPUs use 64bit...
 
Aaaand it requires completely new memory controllers

Exactly, so why not create a new standard that uses a 64bit width channel per chip?
No matter how low-end the GPU is, I haven't seen a single use case for a memory controller that would need only 32bit. Total memory bus widths always come in multiples of 64bit: 64, 128, 192, 256, 384, 512.
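Quick sanity check of that, assuming each GDDR5 device hangs off its own 32-bit channel: every one of those familiar totals is just a small chip count times 32.

```python
# Total bus width is just (number of 32-bit GDDR5 channels) x 32.
# The familiar GPU bus widths all fall out of small chip counts.
per_chip_bits = 32
for chips in (2, 4, 6, 8, 12, 16):
    print(f"{chips:2d} chips x {per_chip_bits} bit = {chips * per_chip_bits} bit bus")
```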
 
I've never seen a die shot of a GDDR5 chip, but a 64 bit interface could run into issues of being pad limited, especially for smaller memory sizes.

Do we know what the "popular" GDDR5X chip will be, initially? If it ends up being something like 8 Gb, then would that sufficiently alleviate any issues about memory chip size being too small for the requisite interface width?
 
Do we know what the "popular" GDDR5X chip will be, initially? If it ends up being something like 8 Gb, then would that sufficiently alleviate any issues about memory chip size being too small for the requisite interface width?
AFAIK, we haven't even seen confirmation that GDDR5X will be used at all, let alone what chip sizes would be used.
 
Low end GPU with 2GB of GDDR5X on a 64bit bus interface (total) would be fine by me, though we might more likely see GDDR5 without the X on a 128bit bus instead.
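Rough numbers for those two configs (the data rates are assumptions, nothing announced: 10 Gbps for GDDR5X since that's at the low end of what's been floated, 7 Gbps for plain GDDR5):

```python
# Rough numbers for the two configs mentioned above (data rates are
# assumptions, not announced parts: 10 Gbps GDDR5X, 7 Gbps GDDR5).
def bandwidth_gbs(bus_bits, gbps_per_pin):
    return bus_bits * gbps_per_pin / 8          # GB/s

# 64-bit GDDR5X: two 32-bit chips; 8 Gb (1 GB) each gives 2 GB total
print("64-bit GDDR5X :", bandwidth_gbs(64, 10), "GB/s, 2 x 8Gb = 2 GB")
# 128-bit GDDR5: four 32-bit chips; 4 Gb (0.5 GB) each gives 2 GB total
print("128-bit GDDR5 :", bandwidth_gbs(128, 7), "GB/s, 4 x 4Gb = 2 GB")
```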

A "low end" desktop in 2017 I would quite be happy with :

Zen Quad-core CPU (i.e. an eight-core with half the die disabled)
GP107 graphics card
16GB DDR4
256GB or 512GB NVMe SSD.

This thing would fly, although performance-wise I assume most people reading this already have a better PC than that (barring the storage)
 
So far Nvidia has been using 32 bit memory controllers. Even if the smallest chips have a pair of them, you can still make the board a bit cheaper, smaller, etc., by utilizing only one of them with a single memory chip.
 
AFAIK, we haven't even seen confirmation that GDDR5X will be used at all, let alone what chip sizes would be used.

I understand that gddr5x implementations are speculation at this point.

However, Anandtech reported that the gddr5x standard would show up in 4Gb, 6Gb, 8Gb, 12Gb and 16Gb varieties based on the definition of the standard.

But then there have been reports that new gddr5 production from micron would be 8Gb chips rated for up to 8Gbps. As far as I know, a similar situation occurred when 4Gb gddr5 chips became popular and they also brought speeds up to 7Gbps.

Since new speeds and increased size seem to be implicitly linked (though I doubt it needs to be that way), I'm wondering if the first mass-produced (i.e. popularly used) gddr5x chips will show up in sizes of no less than 8Gb, and then they will increase from there as the technology matures.

And if gddr5x implementations tend to be narrower than historical "equivalent" gddr5 implementations (e.g. the 64-bit bus comment that someone just made), then denser chips might be helpful to meet raw vram capacity demands (and/or implicit marketing requirements, tbh).
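Putting rough numbers on that density point (the card configs below are hypothetical; the densities are the ones reported for the standard, plus today's 4Gb GDDR5):

```python
# If the bus gets narrower (fewer 32-bit chips), the per-chip density has
# to rise to keep the same total VRAM. Card configs here are hypothetical.
def vram_gb(chips, density_gbit):
    return chips * density_gbit / 8

print(vram_gb(chips=8, density_gbit=4))   # 256-bit card with 4Gb gddr5: 4 GB
print(vram_gb(chips=4, density_gbit=8))   # 128-bit card needs 8Gb chips for 4 GB
print(vram_gb(chips=2, density_gbit=16))  # 64-bit card needs 16Gb chips for 4 GB
```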

But as always, this is just speculation until we see a gddr5x card on the streets.
 
Here's a dumb question: what would be the advantages of a 64-bit wide GDDR5?

The world is more than GPUs and huge dies alone. E.g. Micron actively promotes GDDR5 for networking applications. If that market is anything like, say, set top boxes, they'll only want one DRAM with 32 bits maximum. There are plenty of applications where bandwidth is more important than memory size, where you want maximum bandwidth with a minimum number of IOs.

With 64 bits, your number of balls would go up a lot. That would mean a denser ball grid and a more expensive PCB. You'd also have less routing flexibility.

In addition, 64 bits would make clam shell mode more complicated, since you'd have to route the address to 2 chips amidst a lot more wires.

I can imagine that 64 IOs will also create more difficulties wrt power delivery?

All that for what? A minor savings in PCB space?
 
Here's a dumb question: what would be the advantages of a 64-bit wide GDDR5?

I suppose that it mathematically allows a GDDR5 module to transfer the same amount of data over a bus that is physically half as fast.
However, the challenges that scale with wire count, like routing, signal integrity, and timing between more wires, could be why that path wasn't chosen.
Keeping the link speed the same to get a bandwidth gain instead would hit those same scaling challenges, without the extra timing margin that a slower bus would provide.
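In other words, bandwidth is just bus width times per-pin rate, so a hypothetical 64-bit device at half the pin speed lands in the same place (7 Gbps below is just an example rate):

```python
# Bandwidth is just (bus width) x (per-pin data rate), so a hypothetical
# 64-bit GDDR5 device at half the pin speed moves the same data as a
# 32-bit device at full speed (7 Gbps used purely as an example rate).
def device_bandwidth_gbs(width_bits, gbps_per_pin):
    return width_bits * gbps_per_pin / 8

print(device_bandwidth_gbs(32, 7.0))   # 28.0 GB/s per 32-bit device
print(device_bandwidth_gbs(64, 3.5))   # 28.0 GB/s per 64-bit device at half speed
```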

Some other implementation-related issues could be things where it's nice to have separate channels and separate devices with 2x the power and current budget.
Bank activation restrictions went into why HBM2 included a pseudo-channel mode, and tFAW is a wall clock penalty for a whole channel due to physical limits of the device. Having two devices with their separate capacities to absorb activation cost in that scenario would be preferable to having one device that stalls. The same could go for refreshes, where a capacity-equivalent system could have one device maintaining refresh and using up command bandwidth for twice as much time, versus two devices able to do so independently. Commands can be queued and delayed up to a point, but a single device's queuing capacity is finite and might not be equivalent to the capacity of two separate devices.
There are various factors that could change the math, so that might be highly dependent on the hypothetical implementation of the 32-bit and 64-bit devices, and I am not sure how big an obstacle these might be overall.
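A toy illustration of the tFAW point (timing numbers are invented, not from any datasheet): each device only gets so many row activates per rolling window, so two independent devices have twice the activate budget of one device covering the same width.

```python
# Toy illustration of the tFAW argument: each device can only issue a
# fixed number of row activates per tFAW window, so two independent
# 32-bit devices have twice the activate budget of one 64-bit device.
# Timing numbers are made up for illustration, not from any datasheet.
tfaw_ns = 30.0            # rolling window length
activates_per_window = 4  # max ACTIVATEs a single device may issue in that window

def max_activates_per_us(num_devices):
    return num_devices * activates_per_window * (1000.0 / tfaw_ns)

print(max_activates_per_us(1))  # one hypothetical 64-bit device
print(max_activates_per_us(2))  # two 32-bit devices covering the same bus width
```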

Global power consumption might be better with fewer devices, but each module has a comparatively low ceiling on the power that can be delivered and dissipated in one spot; the issues above and their mitigation measures could push a single device into a hard delivery limit, or into needing a heatsink, where two devices would not.
 
A major part of power consumption in traditional DRAM is due to the active termination, a static current irrespective of dynamic behavior.
If you increase the number of pins and lower the speed, your perf/W will go down.
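A toy model of that (all numbers invented): if the static termination burn scales with the number of terminated data pins and doesn't drop when the pins run slower, then twice the pins at half the speed keeps the bandwidth but roughly doubles the static power.

```python
# Toy perf/W model for the termination argument: static termination power
# scales with the number of terminated data pins and does not drop when
# the pins run slower, so "twice the pins at half the speed" keeps the
# bandwidth but roughly doubles the static burn. All numbers invented.
static_mw_per_pin = 10.0     # assumed termination power per data pin

def perf_per_watt(pins, gbps_per_pin, dynamic_w):
    bandwidth_gbs = pins * gbps_per_pin / 8
    static_w = pins * static_mw_per_pin / 1000.0
    return bandwidth_gbs / (static_w + dynamic_w)

print(perf_per_watt(32, 7.0, dynamic_w=1.0))   # narrow, fast interface
print(perf_per_watt(64, 3.5, dynamic_w=1.0))   # wide, slow: same GB/s, worse GB/s per W
```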
 
A major part of power consumption in traditional DRAM is due to the active termination, a static current irrespective of dynamic behavior.
If you increase the number of pins and lower the speed, your perf/W will go down.

How would the static component behave relative to voltage? Cutting the Gbps in half relative to existing GDDR5 speed grades brings it into the range of 1.2V DDR4, although as noted that requires hand-waving everything else related to upping the number of signals by a little over 2/3rds, and comes without a significant future in higher bandwidths.

I did notice that GDDR5X is being compared to GDDR5 at 1.5V. Is there mention of a lower voltage offering? At release, GDDR5 disclosures mentioned a 1.35V option, which some "green" and networking offerings use.
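If you assume the static termination component behaves roughly like a fixed resistive load to the I/O rail (a big simplification on my part), the power would track the square of the signalling voltage, so the quoted voltage options would scale roughly like this:

```python
# Rough scaling if the static termination component behaves like a fixed
# resistive load to the I/O rail (a simplifying assumption): power then
# tracks the square of the signalling voltage.
base_v = 1.5
for v in (1.35, 1.2):
    print(f"{v} V: ~{(v / base_v) ** 2:.0%} of the termination power at {base_v} V")
```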
 