Here's a dumb question: what would be the advantages of a 64-bit wide GDDR5?
I suppose that, mathematically, it would let a single GDDR5 device transfer the same amount of data over a bus that is physically half as fast per pin.
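Just to sanity-check that arithmetic (the per-pin rates below are round made-up numbers, not real GDDR5 speed bins): peak bandwidth is just bus width times per-pin data rate, so doubling the width while halving the rate is a wash.

```python
# Toy numbers only: peak bandwidth = bus width x per-pin data rate.
# Doubling width while halving the per-pin rate gives the same result.

def peak_bandwidth_gbytes_per_s(bus_width_bits, data_rate_gbps_per_pin):
    return bus_width_bits * data_rate_gbps_per_pin / 8  # GB/s

print(peak_bandwidth_gbytes_per_s(32, 8))  # 32-bit device at 8 Gb/s/pin -> 32 GB/s
print(peak_bandwidth_gbytes_per_s(64, 4))  # 64-bit device at 4 Gb/s/pin -> 32 GB/s
```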
However, the challenges that scale with wire count, like routing, signal integrity, and timing skew across more wires, could be part of why that path wasn't chosen.
Keeping the link speed the same in order to get a bandwidth gain would run into those same scaling challenges, without the extra timing margin that a slower bus would provide.
Some other implementation-related issues are cases where it's nice to have separate channels and separate devices, with twice the aggregate power and current budget.
Bank activation restrictions were part of why HBM2 included a pseudo-channel mode, and tFAW is a wall-clock penalty for a whole channel imposed by physical limits of the device. Having two devices, each with its own capacity to absorb activation cost, would be preferable to having one device that stalls. The same could go for refreshes: a capacity-equivalent system could have one device spending command bandwidth on refresh for twice as long, versus two devices that can refresh independently. Commands can be queued and delayed up to a point, but a single device's queuing capacity is finite and might not match the combined capacity of two separate devices.
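To put a rough number on the tFAW point, here's a toy sketch (the tFAW value, window, and four-activates-per-window limit are illustrative, not taken from any GDDR5 datasheet): with the same total bus width, two independent 32-bit channels get two separate activation windows, so they can absorb roughly twice the activate rate before stalling.

```python
# Illustrative sketch (not datasheet values): tFAW caps ACT commands per
# channel, so two independent channels can absorb roughly twice the
# activation rate of one channel with the same total width.

def max_activates(num_channels, window_ns, tfaw_ns, acts_per_tfaw=4):
    """Rough sustained ceiling on ACT commands issuable in window_ns.

    Each channel may issue at most acts_per_tfaw activates per rolling
    tfaw_ns window, and channels are independent of each other.
    """
    per_channel = acts_per_tfaw * (window_ns / tfaw_ns)
    return num_channels * per_channel

WINDOW = 1000   # ns, arbitrary observation window
TFAW = 30       # ns, hypothetical four-activate window

one_wide = max_activates(1, WINDOW, TFAW)    # one 64-bit device/channel
two_narrow = max_activates(2, WINDOW, TFAW)  # two 32-bit devices/channels

print(f"one channel : {one_wide:.0f} activates per {WINDOW} ns")
print(f"two channels: {two_narrow:.0f} activates per {WINDOW} ns")
```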
There are various factors that could change the math, so a lot depends on the hypothetical implementations of the 32-bit and 64-bit devices, and I'm not sure how big an obstacle any of these would be overall.
Global power consumption might be better with fewer devices, but an individual package has a comparatively low ceiling on the power that can be delivered and dissipated in one spot. The various issues above and their mitigation measures could push a single wide device into a hard limit on power delivery, or into needing a heatsink, where two devices would not.
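And a toy example of the power-density point, with completely made-up wattages: even if one wide device saves a bit of total power through shared overhead, nearly all of that power now has to be delivered to and dissipated from a single package.

```python
# Made-up wattages, purely illustrative: a single 64-bit device might save
# some total power versus two 32-bit devices, but the per-package load
# nearly doubles, which is what runs into delivery/cooling limits.

p_32bit_device = 2.0                        # W per 32-bit device, hypothetical
p_64bit_device = 2 * p_32bit_device * 0.9   # assume 10% savings from shared overhead

print(f"two 32-bit devices: {2 * p_32bit_device:.1f} W total, {p_32bit_device:.1f} W per package")
print(f"one 64-bit device : {p_64bit_device:.1f} W total, {p_64bit_device:.1f} W per package")
```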