AMD: Volcanic Islands R1100/1200 (8***/9*** series) Speculation/ Rumour Thread

What was the source for the chart? I ask because the PCIe gen 2 had a lot of command overhead (like 20% of the bandwidth or something) and that was reduced with PCIe gen 3. In fact, I believe a significant part of the performance gain for PCIe gen 3 was the improvements in reducing command overhead. (PCIe gen 2 command rate is 5 GT/s vs. 8 GT/s for PCIe gen 3, so the rate did not double on gen 3 vs. gen 2.)

Gotta be careful in what you are comparing. You are switching between link bandwidths and payload bandwidths, which aren't at all comparable between gen 2/3. Gen 2 used 8b/10b encoding, which meant that 5 GT/s was really only 4 Gb/s of payload per lane. Gen 3 uses 128b/130b link encoding, which means for all practical purposes GT/s = Gb/s.
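
If it helps, here's that encoding math as a quick Python sketch (illustrative only; it ignores the packet/protocol overhead that sits on top of the line encoding):

```python
# Per-lane payload bandwidth after link-encoding overhead (illustrative only;
# ignores TLP/DLLP packet overhead on top of the line encoding).

def payload_gb_per_s(rate_gt_s, payload_bits, encoded_bits):
    """Payload bandwidth per lane in GB/s for a given line rate and encoding."""
    payload_gbit_s = rate_gt_s * payload_bits / encoded_bits
    return payload_gbit_s / 8  # 8 bits per byte

gen2 = payload_gb_per_s(5.0, payload_bits=8, encoded_bits=10)      # 8b/10b
gen3 = payload_gb_per_s(8.0, payload_bits=128, encoded_bits=130)   # 128b/130b

print(f"PCIe 2.0: {gen2:.3f} GB/s per lane")  # ~0.500 GB/s
print(f"PCIe 3.0: {gen3:.3f} GB/s per lane")  # ~0.985 GB/s
```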

To put this in perspective, the peak bandwidth I recall seeing on PCIe gen 2 was around 6.2 GB/s for a unidirectional transfer, compared to 12.8 GB/s on PCIe gen 3. We have done some testing with bidirectional transfers too, achieving around 20 GB/s. That result wasn't confirmed on other platforms, however, as it was mainly proof-of-concept.

Some of the difference between PCIe 2 and PCIe 3 in sustained bandwidth is due to improvements in the actual controllers at both ends being able to send and receive at higher rates; this is mostly buffering, etc. Bandwidth delivered over PCIe is also largely dependent on transfer sizes.
 
The slide says "direct access to GPU display pipes."

Could this mean that PCIe writes don't go to DRAM and then get scanned out by the memory controller to be sent to the display unit, but that those writes bypass memory altogether and go straight from PCIe to the display unit?

Depends on what they've allowed for buffer depths, etc. If the buffering is sufficient and the QoS capabilities are sufficient, then it would be practical to simply have the display controller do DMA accesses on a just-in-time basis to the other card(s) to pull in the frame data required, without storing it in DRAM.

It is also important to point out that the main limiter is likely going to be the upstream PCIe controller's peer-to-peer bandwidth and latency characteristics. I'm sure AMD has tested this heavily on AMD, Intel, and PLX controllers, and all the PCIe root designers have had years now to work on peer-to-peer bandwidth and latency in their controllers, since it has been used in the high-end server market for years. Any controller that can handle Nvidia GPUDirect transfers at high data rates will work fine, which means at least any of the Extreme CPUs can handle it fine.
 
Transferring a 5-screen Eyefinity framebuffer at 1440p/60Hz requires over 4 GB/s. It's unlikely this will come at absolutely no performance hit whatsoever. Unless this bridge-less CrossFire supports only two cards, you could pretty much swamp the entire PCIe interface with framebuffer data alone, without even having to reach for 4K resolutions.
Wasn't the CrossFire bridge about 5 GB/s? (Or was that SLI?)
And that's probably peak; I'd think 4 GB/s of real-world throughput might not have been possible over it. PCIe should be able to handle those 4 GB/s.

edit: http://images.anandtech.com/reviews/video/ATI/4870X2/sideport.jpg
5 GB/s bidirectional, in addition to 5 GB/s bidirectional via PCIe 2.0

edit2: btw. how did you calculate the bandwidth requirement?
for 1440p@60Hz I get
2560 x 1440 * 4 bytes * 5 screens * 30 transfers/s -> ~2.06 GB/s
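
For reference, here's that arithmetic as a quick Python sketch, covering both the full 60 Hz case from the earlier post and the 30 transfers/s case (which I'm reading as AFR, where the second card only sends every other frame); it assumes 4 bytes per pixel and no compression:

```python
# 5-screen 1440p framebuffer transfer estimate, assuming 4 bytes per pixel
# and no compression of the transferred frames.

width, height, bytes_per_pixel, screens = 2560, 1440, 4, 5
frame_bytes = width * height * bytes_per_pixel * screens   # ~73.7 MB per composite frame

for transfers_per_s in (60, 30):   # 60 = every frame; 30 = AFR, only every other frame crosses the bus
    print(f"{transfers_per_s} transfers/s: {frame_bytes * transfers_per_s / 2**30:.2f} GiB/s")
# -> 60 transfers/s: 4.12 GiB/s
# -> 30 transfers/s: 2.06 GiB/s
```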
 
A few of you need to remember that PCIe is a point-to-point topology; there isn't any issue with "flooding the bus".
P2P or not, there's a possibility of flooding when you consider that all your data being transferred has to pass through the interface to the destination video card (the one with the monitor cable attached to it). That's the weak link in the chain, to say nothing of any possible limitations in internal bandwidth of the central hub.

We have measured performance across a wide range of PCIe speeds and widths, with all the data pointing to no measurable difference between PCIe 3.0 at x4 and PCIe 3.0 at x16. The same measurements have also been run in CF comparing PCIe 3.0 at x4/x8/x16, with no measurable effect.
What measurements: how many video cards running CrossFire, what screen resolution, what refresh rate? Go 120 Hz (which many hardcore gamers are very eager to do) and you again double bandwidth requirements over the current de facto standard. The bandwidth ceiling's gonna be flying towards your head at that rate.

Also consider that of Intel's current CPUs, only Sandy Bridge-E and Ivy Bridge-E offer full PCIe x16 interfaces when running more than one board. Few people buy such systems to game on, due to rather massive costs. That halves available bandwidth for CrossFire, i.e. a problem for 4K/60Hz at least.

The bridge interfaces tend to be x1 interfaces and will run out of bandwidth well before the full PCI-e interfaces will.
Really? That'd be extremely surprising. There are a lot of pins in those connectors, particularly on AMD cards; enough for at least four differential signalling links, I should think.
 
640K, who needs more?

Wikipedia says:
Capacity

Per lane (each direction):

  • v1.x: 250 MB/s (2.5 GT/s)
  • v2.x: 500 MB/s (5 GT/s)
  • v3.0: 985 MB/s (8 GT/s)
  • v4.0: 1969 MB/s (16 GT/s)
So, a 16-lane slot (each direction):

  • v1.x: 4 GB/s (40 GT/s)
  • v2.x: 8 GB/s (80 GT/s)
  • v3.0: 15.75 GB/s (128 GT/s)
  • v4.0: 31.51 GB/s (256 GT/s)
The PCIe 4.0 standard is ready now (effectively it will be introduced on H/Z-series motherboards with Intel chipsets, in combination with the Skylake processor architecture), but PCIe 3.0 at x8 electrical already has all the bandwidth needed for CrossFire without a discrete bridge.
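
As a rough sanity check of that claim, here are the per-lane figures above compared against the ~4.4 GB/s needed to stream a 5x1440p@60Hz framebuffer (illustrative only; this ignores protocol overhead and any other traffic sharing the link):

```python
# Per-direction link bandwidth vs. the ~4.4 GB/s needed to stream a
# 5x1440p@60Hz framebuffer (illustrative only; ignores protocol overhead
# and any other traffic sharing the link).

per_lane_gb_s = {"1.x": 0.250, "2.x": 0.500, "3.0": 0.985, "4.0": 1.969}
framebuffer_gb_s = 2560 * 1440 * 4 * 5 * 60 / 1e9   # ~4.42 GB/s

for gen, lane_bw in per_lane_gb_s.items():
    for lanes in (8, 16):
        link_bw = lane_bw * lanes
        verdict = "enough" if link_bw > framebuffer_gb_s else "short"
        print(f"PCIe {gen} x{lanes}: {link_bw:6.2f} GB/s -> {verdict}")
```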
 
P2P or not, there's a possibility of flooding when you consider that all your data being transferred has to pass through the interface to the destination video card (the one with the monitor cable attached to it). That's the weak link in the chain, to say nothing of any possible limitations in internal bandwidth of the central hub.

I buy this to some extent, as I've had an X38 chipset in the distant past where I could demonstrably put the PCI-E bus under enough duress that it would hard-lock the box. The challenge was an IOMeter test run against my PCI-E 2.0 x8 RAID card running four (or more) SSDs in RAID0, along with a simultaneous PCI-E traffic throughput test of my Radeon 5850 in PCI-E 2.0 x16 mode. The X38 chipset was supposedly capable of dual x16 simultaneously, which is why I bought it. Aaaaannnnddd... Nope!

However, with the PCI-E controller now being part of the Sandy Bridge and later processors directly (and some of the Nehalem line too, if I recall... the Socket 1156 Lynnfield parts?), I would be hard-pressed to imagine a case where the PCI-E bus could be similarly overwhelmed.
 
I would be hard-pressed to imagine a case where the PCI-E bus could be similarly overwhelmed.
I'm of course not claiming the system would hard-lock (that would IMO be faulty hardware at work), but everything has a limit. Especially in cost-sensitive consumer electronics, where there just isn't a need for 100% maximum I/O concurrency.

Now, it might be practically possible to simultaneously stream to and from every slot using every bidirectional PCIe link in point-to-point fashion, I'm not saying that is unthinkably un-possible, but I would not be at all surprised if you hit some kind of internal limit pretty quickly. There's gotta be a crossbar/router of some type in there, and it too will have a max capacity of some form.
 
If PCIe is so good for this why were SLI and Crossfire connectors needed at all? Or has the situation only been viable since PCIe 3.0?
 
If PCIe is so good for this why were SLI and Crossfire connectors needed at all? Or has the situation only been viable since PCIe 3.0?

If PCIe 3.0 now offers ~3x what any output needs, PCI Express 2.0 (x16/x16) would have been sufficient, but 1.0 wouldn't have been (nor PCIe 2.0 x8/x8).
 
If PCIe 3.0 now offers ~3x what any output needs, PCI Express 2.0 (x16/x16) would have been sufficient, but 1.0 wouldn't have been (nor PCIe 2.0 x8/x8).

At the time, GPUs didn't support the crazy Eyefinity resolutions that they do now.
 
If PCIe is so good for this why were SLI and Crossfire connectors needed at all? Or has the situation only been viable since PCIe 3.0?
Given the number of motherboards with crippled PCIe out there (32-lane connectors but only 8 active, etc.), doing it over PCIe only sounds like a recipe for consumer-support overload.

And AFAIK PCIe was never designed with QoS contracts in mind so everything is best effort. (But I may be totally wrong about that?)

Simple point to point is so much easier...

I assume these things have changed over the years and now it has come to a point where they don't expect too many problems.
 
http://www.techpowerup.com/191768/radeon-r9-290x-clock-speeds-surface-benchmarked.html



Radeon R9 290X is looking increasingly good on paper. Most of its rumored specifications and SEP pricing were reported late last week, but the ones that eluded us were clock speeds. A source that goes by the name Grant Kim, with access to a Radeon R9 290X sample, disclosed its clock speeds and ran a few tests for us. To begin with, the GPU core is clocked at 800 MHz. There is no dynamic-overclocking feature, but the chip can lower its clocks, taking load and temperatures into account. The memory is clocked at 1125 MHz (4.50 GHz GDDR5-effective). At that speed, the chip churns out 288 GB/s of memory bandwidth over its 512-bit wide memory interface. Those clock speeds were reported to us by the GPU-Z client, so we give it the benefit of the doubt, even if it goes against AMD's ">300 GB/s memory bandwidth" bullet point in its presentation.

The tests run on the card include frame rates and frame latencies for Aliens vs. Predator, Battlefield 3, Crysis 3, GRID 2, Tomb Raider (2013), RAGE, and TESV: Skyrim, in no-antialiasing, FXAA, and MSAA modes, at 5760 x 1080 resolution. An NVIDIA GeForce GTX TITAN was pitted against it, running the latest WHQL driver. We must remind you that at that resolution, AMD and NVIDIA GPUs tend to behave a little differently due to the way they handle multi-display, and so it may be an apples-to-coconuts comparison. In Tomb Raider (2013), the R9 290X romps ahead of the GTX TITAN, with higher average, maximum, and minimum frame rates in most tests.
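
For anyone who wants to check the quoted bandwidth number, here's the arithmetic as a quick sketch (assuming standard GDDR5 signalling at 4 bits per pin per command-clock cycle):

```python
# Checking the quoted 288 GB/s: GDDR5 moves 4 bits per pin per command-clock cycle,
# so effective data rate = memory clock x 4, and bandwidth = rate x bus width / 8.

memory_clock_mhz = 1125
bus_width_bits = 512

effective_gbps = memory_clock_mhz * 4 / 1000             # 4.5 Gbps per pin
bandwidth_gb_s = effective_gbps * bus_width_bits / 8      # GB/s
print(f"{bandwidth_gb_s:.0f} GB/s")                       # -> 288 GB/s

# AMD's ">300 GB/s" bullet point would need an effective rate above
# 300 * 8 / 512 = 4.69 Gbps, i.e. a memory clock above roughly 1170 MHz.
print(f"Clock needed for 300 GB/s: {300 * 8 / 512 / 4 * 1000:.0f} MHz")  # ~1172 MHz
```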
 
4500 GDDR5... really?!?! That seems completely nuts. Also, clocks are lower than expected. Not sure if I believe that link.

Could be an under-clocked sample from AMD. The final version of the card should be clocked higher, based on AMD's slide(s).

Or it's meant to show the validity of the core architecture at the same speed as Titan, while consuming less power and possibly having more features.

http://www.techpowerup.com/191768/radeon-r9-290x-clock-speeds-surface-benchmarked.html
 
4500 GDDR5... really?!?! That seems completely nuts. Also, clocks are lower than expected. Not sure if I believe that link.

Perhaps turbo clocks apply to memory as well as core clock now (is that feasible?), and these samples, rather than indicating the lack of turbo in the product, simply don't have it enabled yet in their firmware.
 
Perhaps turbo clocks apply to memory as well as core clock now (is that feasible?)

No, it takes time to retrain the GDDR5, so you can't change it in the middle of a frame.
The clocks aren't surprising for some random engineering sample, but they're clearly not what we'll see in the 290X per the officially released specs (bandwidth, triangle rate).
But even though AMD usually does better at very high resolutions, the performance seems too high for those clocks; it could be GPU-Z not detecting the boost, or it's just a fake.
 
Yeah, those clocks definitely don't jibe with either AMD's quoted triangle rates or bandwidth, so there's hopefully some mistake. Great to hear it's faster than Titan even at those speeds, though.
 