AMD: Volcanic Islands R1100/1200 (8***/9*** series) Speculation/Rumour Thread

 
Transferring a 5-screen Eyefinity framebuffer at 1440p/60 Hz requires over 4 GB/s. It's unlikely this will come at no performance hit whatsoever. Unless this bridge-less CrossFire supports only two cards, you could pretty much swamp the entire PCIe interface with framebuffer data alone, without even reaching 4K resolutions.

Frankly, I think this is a mistake.
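
For reference, here's the back-of-the-envelope arithmetic behind that "over 4 GB/s" figure, assuming 32-bit colour and an uncompressed per-frame transfer:

```python
# Rough estimate of raw framebuffer traffic for 5 x 2560x1440 @ 60 Hz.
# Assumes 4 bytes per pixel (32-bit colour) and uncompressed transfers;
# actual CrossFire traffic may differ.
width, height = 2560, 1440
bytes_per_pixel = 4
refresh_hz = 60
displays = 5

per_display = width * height * bytes_per_pixel * refresh_hz  # bytes/s
total = per_display * displays

print(f"per display: {per_display / 1e9:.2f} GB/s")  # 0.88 GB/s
print(f"5 displays:  {total / 1e9:.2f} GB/s")        # 4.42 GB/s, i.e. over 4 GB/s
```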
 
> Transferring a 5-screen Eyefinity framebuffer at 1440p/60 Hz requires over 4 GB/s. [...] Frankly, I think this is a mistake.

That's true for AFR modes. But what if they use more advanced methods, like what Lucid Hydra does with multi-GPU load balancing?
 
> Transferring a 5-screen Eyefinity framebuffer at 1440p/60 Hz requires over 4 GB/s. [...] Frankly, I think this is a mistake.

Those scenarios already use the PCIe bus, but in a relatively inefficient and unmanaged manner.

Secondly, take a look at PCIe version / lane-width scaling tests. Typical performance is not greatly affected by PCIe bandwidth (i.e. the traffic is fairly low); when it is, it's because texture/buffer sizes have overflowed local video memory and you start addressing system RAM, and the disparity between local bandwidth and PCIe bandwidth becomes rather catastrophic for game performance.
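
To illustrate how lopsided that disparity is, a rough comparison; the 288 GB/s VRAM figure is an assumed, representative high-end GDDR5 number, not one from this thread:

```python
# Illustrative gap between local VRAM bandwidth and the PCIe link.
# 288 GB/s is an assumed high-end GDDR5 figure (384-bit bus at 6 Gbps),
# not a number taken from this thread.
vram = 288.0          # GB/s, local video memory (assumed)
pcie3_x16 = 16.0      # GB/s, PCIe 3.0 x16, one direction, theoretical
pcie2_x16 = 8.0       # GB/s, PCIe 2.0 x16, one direction, theoretical

print(f"VRAM vs PCIe 3.0 x16: {vram / pcie3_x16:.0f}x")  # ~18x
print(f"VRAM vs PCIe 2.0 x16: {vram / pcie2_x16:.0f}x")  # ~36x
# Any working set that spills into system RAM pays roughly this penalty.
```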
 
> Difference between PCIe 2.0 and 3.0?

Already posted above; look at the tables. It's 32 GB/s vs 16 GB/s, well beyond what's needed for this.

*EDIT: Seems like the chart is giving total bidirectional bandwidth. Single-direction bandwidth would be half of that, so 16 GB/s vs 8 GB/s. And those are theoretical limits; the practical/actual limits might be lower -- perhaps too close for my liking to know whether there are limitations or not.
 
Ah, that chart is a bit misleading(?). In the total bandwidth column I think they're counting bidirectional bandwidth, so for PCIe 3 vs PCIe 2 it would be 16 GB/s vs 8 GB/s for a 16-lane setup in one direction.

When you take into account theoretical vs actual throughput, that makes this much closer than I'd like for this usage. Thanks for providing that data point, OpenGL Guy.

I've never seen anything beyond about 12.8 GB/s for a single directional transfer on PCIe gen 3.0 and 16 lanes.
 
> Ah, that chart is a bit misleading(?). [...] Thanks for providing that data point, OpenGL Guy.

What was the source for the chart? I ask because PCIe gen 2 had a lot of command overhead (like 20% of the bandwidth or something) and that was reduced with PCIe gen 3. In fact, I believe a significant part of the performance gain for PCIe gen 3 was the reduction in command overhead. (The PCIe gen 2 transfer rate is 5 GT/s vs. 8 GT/s for PCIe gen 3, so the rate did not double from gen 2 to gen 3.)

To put this in perspective, the peak bandwidth I recall seeing on PCIe gen 2 was around 6.2 GB/s for a single-direction transfer, compared to 12.8 GB/s on PCIe gen 3. We have done some testing with bidirectional transfers too, achieving around 20 GB/s. That result wasn't confirmed on other platforms, however, as it was mainly proof-of-concept.

According to the table, PCIe gen 2 peaked at 8 GB/s in a single direction (which matches my recollection), and if you take away what I recall the command overhead to be, you end up with around 6.4 GB/s, close to what I was seeing. Gen 3 peaks at 16 GB/s in a single direction, yet we (I tested AMD and Nvidia GPUs) only achieve around 12.8 GB/s, so it's possible some gen 3 tuning is needed.
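
Putting those measured and theoretical figures side by side:

```python
# Measured vs theoretical single-direction bandwidth on x16 links,
# using the figures quoted in this post.
cases = {
    "PCIe gen 2": (6.2, 8.0),    # (measured GB/s, theoretical GB/s)
    "PCIe gen 3": (12.8, 16.0),
}
for gen, (measured, peak) in cases.items():
    print(f"{gen}: {measured}/{peak} GB/s = {measured / peak:.0%} of peak")
# Both generations land at roughly 78-80% of their theoretical peak.
```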
 
> What was the source for the chart? [...] Gen 3 peaks at 16 GB/s in a single direction, yet we only achieve around 12.8 GB/s, so it's possible some gen 3 tuning is needed.

PCIe 1 & 2 used 8b/10b encoding, so 5 GT/s gives 4 Gbit/s per lane; I think that's where the 20% overhead number you've heard comes from, so the 8 GB/s bandwidth figure already takes encoding overhead into account. PCIe 3 uses 128b/130b encoding, with 8 GT/s giving ~7.9 Gbit/s per lane, a little short of a doubling over gen 2. AFAIK there was no reduction in packet overhead from gen 2 to gen 3, so if you see more than a doubling in bandwidth from gen 2 to gen 3, it is not from anything inherent in the gen 3 spec.
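
A quick sanity check of those encoding numbers:

```python
# Per-lane data rate after line encoding, and the resulting x16 link
# bandwidth (one direction, before packet/protocol overhead).
def lane_rate_gbit(transfer_gt, payload_bits, total_bits):
    """Effective data rate in Gbit/s after line encoding."""
    return transfer_gt * payload_bits / total_bits

gen2 = lane_rate_gbit(5.0, 8, 10)     # 8b/10b    -> 4.00 Gbit/s per lane
gen3 = lane_rate_gbit(8.0, 128, 130)  # 128b/130b -> ~7.88 Gbit/s per lane

for name, rate in [("gen 2", gen2), ("gen 3", gen3)]:
    print(f"PCIe {name}: {rate:.2f} Gbit/s/lane, x16 = {rate * 16 / 8:.2f} GB/s")
# gen 2: 8.00 GB/s; gen 3: 15.75 GB/s -- a little short of 2x, as described.
```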
 
The slide says "direct access to GPU display pipes."

Could this mean that PCIe writes don't go to DRAM and then get scanned out by the memory controller to be sent to the display unit, but that those writes bypass memory altogether and go straight from PCIe to the display unit?
 
A few of you need to remember that PCIe is a point-to-point topology; there isn't any issue with "flooding the bus".
 
And what exactly is the compositing block, separate memory for the frame buffer?
Maybe something that merges different video streams together?

E.g. if you do video decoding on just one GPU but render images on both, then you need to merge them somewhere?
 
> Transferring a 5-screen Eyefinity framebuffer at 1440p/60 Hz requires over 4 GB/s. [...] Frankly, I think this is a mistake.

PCIe 3.0 runs at 8 GT/s and on a x16 link provides ~16 GB/s of bandwidth. Repeated testing shows little to no difference between PCIe 3.0 and 2.0 in GPU performance. We have measured performance data across a wide range of PCIe speeds and widths, with all the data pointing to no measurable difference between PCIe 3.0 at x4 and PCIe 3.0 at x16. These same measurements have also been run in CF comparing PCIe 3.0 at x4/x8/x16, with no measurable effect.

It's also worth pointing out that the bridge interfaces have generally horrible bandwidth compared to the full-width PCIe interfaces. The bridge interfaces tend to be x1 interfaces and will run out of bandwidth well before the full PCIe interfaces will.

5x 4K displays will require ~10 GB/s, and PCIe 3.0 x16 can certainly support that level of bandwidth along with command-stream data from the CPU (about 3+ GB/s worth of it).

In other words, this is actually a way to provide more bandwidth to CF so it won't be bottlenecked by the bridge link in the future.
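
A rough check of that headroom claim, reusing the 4 bytes/pixel, 60 Hz assumptions from earlier; the 15.75 GB/s link figure is the post-encoding x16 number derived above:

```python
# Does PCIe 3.0 x16 cover 5 x 4K60 framebuffers plus command traffic?
# Assumes 3840x2160, 4 bytes/pixel, uncompressed; all figures are estimates.
per_4k60 = 3840 * 2160 * 4 * 60 / 1e9   # ~1.99 GB/s per display
framebuffers = 5 * per_4k60              # ~9.95 GB/s
commands = 3.0                           # GB/s, CPU command stream (as stated above)
link = 15.75                             # GB/s, PCIe 3.0 x16, one direction, post-encoding

total = framebuffers + commands
print(f"framebuffers: {framebuffers:.1f} GB/s, total: {total:.1f} GB/s")
print(f"headroom: {link - total:.1f} GB/s")  # ~2.8 GB/s to spare -- tight but workable
```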
 