Tech Report: CrossFire DUAL SLAVE!

Hmmmm. Given that Super AA doesn't use the composite chip yet, I'm wondering if this render mode will actually work with Dual Slave operation.
 
WaltC said:
The reason why I might loosely label that particular aspect as akin to a dongle is this: the core logic ought to be able to seamlessly configure a dual-slot PCIe board without the need for manual "rerouters" or jumpers or any manual switches. The dual-slot PCIe config, remember, isn't custom or alien in any way to the PCIe spec advanced by Intel since the start. What nV's done here seems like deliberately configuring its nF chipset not to auto-switch transparently, and to *require* some kind of artificial, chipset-extraneous mechanism to do the switching, so that nV can call it proprietary and charge licensing fees for use of the circuitry. (Which makes sense considering that nV has apparently deliberately departed from the Intel spec in this area.) Again, of course, there's nothing dongle-ish about connecting the two cards to each other--it's what's happening at the chipset level on the motherboard that I'm talking about.

I.e., the manual switching mechanism is simply not needed, in my view; it's an artificially manufactured requirement for nF chipsets, there to make support for nV SLI proprietary and income-producing.
The switching card is neither against the spec, nor patented by NVIDIA, nor required for SLI, nor limited to SLI. So please tell me, in what way is this switching card proprietary?

And could you tell me why you think switching at the core logic level is less expensive?

WaltC said:
Heh...;) Yes, maybe we should abandon acronyms altogether. I mean, why write "IBM" when it's so much less annoying to write "International Business Machines"...? (Or is it?...;))

Seriously, acronyms are generally used to alleviate annoyance as opposed to creating it. If people understand that "X-Fire" means "Crossfire" (and in this context it's difficult to see how that might be misunderstood), what's the harm? Using lengthy marketing verbosity inside informal posts is what I call "annoying," but that's just me...;)
Hm, you seem to have missed the joke...
 
DaveBaumann said:
Hmmmm. Given that Super AA doesn't use the composite chip yet, I'm wondering if this render mode will actually work with Dual Slave operation.

Doesn't use it at all? I was aware that the data wasn't passed over the dongle, but didn't realise that the actual image assembly was done on-chip. Incidentally, if this is the case, is Super AA actually something that the compositor could do? All the other modes seem to involve simply filling blank spaces with data from the other card, whereas my understanding of Super AA has the "compositing" stage as one of the key stages of the process. Is this kind of blending functionality even possible with the compositor?
 
As I said, the compositor is a performance optimisation. From my discussion with ATI engineering, the graphics chips themselves could actually do everything by themselves, but to do that the data would need to be passed via the PCIe bus, which would reduce performance for all modes. Alternatively, they could make board-level changes, but that would render current boards useless. The compositor allows them maximum performance without making currently sold boards useless.

The compositor is a fully programmable device, so it can do Super AA blending; it just appears that ATI haven't had the time to code for it yet. They are likely to make that optimisation in time (so don't expect two slaves to always allow Super AA, even if it does work now).

Even though, from what I understand, the graphics chips themselves could do the image composition, I'm not sure that the separate compositor chip will be removed from future solutions, especially since it was hinted to me that, being a programmable device, it can achieve other things as well (such as stereoscopic output).
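For a rough sense of the traffic a PCIe-only approach would add, here's a back-of-the-envelope sketch in Python; the resolution, refresh rate and pixel size are illustrative assumptions, not ATI figures:

Code:
# The constant frame stream the slave would have to push over PCIe
# if there were no compositor. All numbers are illustrative assumptions.
width, height = 1600, 1200   # assumed display resolution
bytes_per_pixel = 4          # assumed 32-bit output pixels
fps = 60                     # assumed refresh rate

stream_mb_s = width * height * bytes_per_pixel * fps / 1e6
print(f"slave -> master frame stream: {stream_mb_s:.0f} MB/s")  # ~461 MB/s

# That traffic would contend with the command and texture uploads the
# bus already carries, which is one reading of "reduce performance for
# all modes".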
 
DaveBaumann said:
As I said, the compositor is a performance optimisation. From my discussion with ATI engineering, the graphics chips themselves could actually do everything by themselves, but to do that the data would need to be passed via the PCIe bus, which would reduce performance for all modes. Alternatively, they could make board-level changes, but that would render current boards useless. The compositor allows them maximum performance without making currently sold boards useless.

The compositor is a fully programmable device, so it can do Super AA blending; it just appears that ATI haven't had the time to code for it yet. They are likely to make that optimisation in time (so don't expect two slaves to always allow Super AA, even if it does work now).

Even though, from what I understand, the graphics chips themselves could do the image composition, I'm not sure that the separate compositor chip will be removed from future solutions, especially since it was hinted to me that, being a programmable device, it can achieve other things as well (such as stereoscopic output).

The Composition Engine is an FPGA chip; this makes it very flexible, but not necessarily fast. I am not sure Super AA will fit in this FPGA, because it needs gamma correction. That will eat many gates.

You can build multi-chip solutions with any card without additional hardware. It is even possible to do this at the API level, without additional driver support.

Stereoscopic output is a driver problem; the composition engine cannot help you with this.
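For context, here's a minimal sketch of what gamma-correct blending of two already-resolved 8-bit values involves; the 2.2 exponent is an assumption on my part, since the hardware's actual transfer function isn't public:

Code:
def gamma_correct_blend(a, b, gamma=2.2):
    """Average two 8-bit channel values in linear-light space."""
    # Decode to linear light, average, then re-encode. In fixed
    # hardware the two pow() steps are what cost gates: an FPGA would
    # need lookup tables or piecewise approximations for each direction.
    linear = ((a / 255.0) ** gamma + (b / 255.0) ** gamma) / 2.0
    return round(255.0 * linear ** (1.0 / gamma))

print(gamma_correct_blend(0, 255))  # 186, not the naive average of 128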
 
According to Raja, gamma correction isn't an issue. The stereoscopic output support was also Raja's suggestion.
 
DaveBaumann said:
The compositor is a fully programmable device, so it can do Super AA blending; it just appears that ATI haven't had the time to code for it yet. They are likely to make that optimisation in time (so don't expect two slaves to always allow Super AA, even if it does work now).
Wouldn't sending all subsample information from the slave to the compositor exceed the bandwidth of the single DVI link (495 MB/s, IIRC) connecting them? (at least at high res)
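The 495 MB/s figure checks out for single-link DVI, and a quick calculation shows the subsample stream really would swamp it; the resolution, sample count and refresh rate below are illustrative assumptions:

Code:
# Single-link DVI: 165 MHz pixel clock x 3 bytes per 24bpp pixel.
print(165e6 * 3 / 1e6)  # 495.0 MB/s

# Unresolved 6x subsample data at an assumed 1600x1200 @ 60 Hz,
# 32 bits per sample:
print(1600 * 1200 * 6 * 4 * 60 / 1e6)  # 2764.8 MB/s, several times the link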
 
You aren't sending subsample level data. Each graphics chip does its FSAA resolve and then the two display resolution images are blended on the master board.
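In sketch form, under the assumption of plain equal-weight averaging (the function names and sample values are mine, purely illustrative):

Code:
def resolve(samples):
    # Each GPU's own FSAA resolve: an equal-weight average of its samples.
    return sum(samples) / len(samples)

# Each card renders the same frame with an offset sample grid and
# resolves locally; only display-resolution pixels cross the DVI link.
master_pixel = resolve([0.20, 0.30, 0.25, 0.25])  # 4 samples on the master
slave_pixel = resolve([0.60, 0.50, 0.55, 0.55])   # 4 offset samples on the slave

# The master board then blends the two already-resolved pixels:
print((master_pixel + slave_pixel) / 2)  # one 8x-equivalent pixel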
 
From my discussion with ATI engineering, the graphics chips themselves could actually do everything by themselves, but to do that the data would need to be passed via the PCIe bus, which would reduce performance for all modes. Alternatively, they could make board-level changes, but that would render current boards useless. The compositor allows them maximum performance without making currently sold boards useless.

So, the "only way" to get data "into" the GPU is via PCI-e? I mean ATI could do what they wanted (keep existing boards, maintain performance), if they pumped the output from the slave card via DVI, "directly" into the GPU.

If the GPU could in fact do all of the blending operations itself, it seems like a waste to add a compositor chip. Again, this is unless there just is no way to get data "into" the GPU other than PCI-E.

If that's the case, I would expect ATI to eliminate the compositor down the road, and incorporate a more traditional interface directly into the master GPU.
 
DaveBaumann said:
You aren't sending subsample level data. Each graphics chip does its FSAA resolve and then the two display resolution images are blended on the master board.
Is that true for the other Crossfire AA modes, too?

Do I understand you correctly that every chip blends its own samples gamma-correctly, then sends them to the compositor, which then (gamma-correctly?) blends these input pixels to get the final pixel color?

Wouldn't that result in a lower quality AA output compared to a single GPU doing the same level of AA? (think: 8x Crossfire AA vs. a theoretical R3xx/R4xx with the ability to do 8x MSAA)
 
Joe DeFuria said:
So, the "only way" to get data "into" the GPU is via PCI-e? I mean ATI could do what they wanted (keep existing boards, maintain performance), if they pumped the output from the slave card via DVI, "directly" into the GPU.

If the GPU could in fact do all of the blending operations itself, it seems like a waste to add a compositor chip. Again, this is unless there just is no way to get data "into" the GPU other than PCI-E.

If that's the case, I would expect ATI to eliminate the compositor down the road, and incorporate a more traditional interface directly into the master GPU.
I doubt that the DVO port of the chip, where the external TMDS transmitter is usually attached and where I think the compositor now sits, was ever designed to take input and make it available to the rendering core, so this limitation is quite logical.

As for your suggestion, once you start incorporating more capable interfaces directly on the GPU die, you could simply go the whole (nvidian) way and add a direct GPU interlink.
 
It seems to me that AA filtering is associative, i.e. it doesn't matter if you make 8x by blending 1x eight times or 2x four times, or 4x twice.

It only matters that each separate AA sample is correctly located in the sparse-sampling grid.

Jawed
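That holds as long as every sample carries equal weight and the blending stays in one space; a quick numeric check of the claim (sample values are arbitrary):

Code:
# Eight arbitrary sample values for one pixel.
samples = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]

flat = sum(samples) / 8                                     # one chip doing 8x
nested = (sum(samples[:4]) / 4 + sum(samples[4:]) / 4) / 2  # two 4x passes, then blend
assert abs(flat - nested) < 1e-12
print(flat, nested)  # 0.5 0.5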
 
incurable said:
As for your suggestion, once you start incorporating more capable interfaces directly on the GPU die, you could simply go the whole (nvidian) way and add a direct GPU interlink.

Which they should...if the GPU already has the hardware it needs to do whatever blending....and IF a "direct GPU interlink" doesn't mean "you need practically identical cards."

I can clearly understand for this round (and perhaps the first R5xx round, depending on when ATI started really considering implementing dual-board setups) that they want to leverage already existing cards, and so you have this master-slave setup.

But going forward, I think the master / slave (having extra SKUs) is not the right way to go.

As long as you can still connect two "disparate" cards (not the same brand, or even the same pipe count), I would think ATI would want to get rid of the "master" SKUs.
 
Joe DeFuria said:
So, the "only way" to get data "into" the GPU is via PCI-e? I mean ATI could do what they wanted (keep existing boards, maintain performance), if they pumped the output from the slave card via DVI, "directly" into the GPU.

If the GPU could in fact do all of the blending operations itself, it seems like a waste to add a compositor chip. Again, this is unless there just is no way to get data "into" the GPU other than PCI-E.

If that's the case, I would expect ATI to eliminate the compositor down the road, and incorporate a more traditional interface directly into the master GPU.

PCIe is one way – and it could be the case that disused PCIe lanes could route data on future boards (I’m not sure about that). Of course, the other method to get data into a chip is to use the video input port (the input that’s used for Video-In), and this could be the method that NVIDIA are using.

Anyway, I tried to drill down on whether the compositor chip will be used in the future, and the reply was "it allows for more operations", so I really don't know if future solutions will use it or not.

incurable said:
DaveBaumann said:
You aren't sending subsample level data. Each graphics chip does its FSAA resolve and then the two display resolution images are blended on the master board.
Is that true for the other Crossfire AA modes, too?

Do I understand you correctly that every chip blends its own samples gamma-correctly, then sends them to the compositor, which then (gamma-correctly?) blends these input pixels to get the final pixel color?

Wouldn't that result in a lower quality AA output compared to a single GPU doing the same level of AA? (think: 8x Crossfire AA vs. a theoretical R3xx/R4xx with the ability to do 8x MSAA)

The individual graphics chips will always do their own FSAA resolve, but it's only the modes that go beyond the native capabilities of a single board that will use "Super AA". For 6x and below, each chip renders portions of the screen (tiles or split) at 6x or less and resolves them, and then the two images are joined by the compositor, so there is no change relative to how a single chip does it. Even with Super AA, because all the samples have the same weight, blending down on each board and then blending the resultant images produces the same value as a single chip doing this level of AA.
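A toy illustration of the two compositor operations being described here; the mode names and one-dimensional "frames" are made up for the example:

Code:
def composite(mode, master_img, slave_img):
    n = len(master_img)
    if mode == "split":
        # <= 6x: each card rendered its own portion of the screen at
        # full AA quality; the compositor just joins the pieces.
        return master_img[:n // 2] + slave_img[n // 2:]
    if mode == "super_aa":
        # Super AA: both cards rendered the whole frame with offset
        # sample grids; the compositor blends the resolved images.
        return [(m + s) / 2 for m, s in zip(master_img, slave_img)]

master = [0.2, 0.3, 0.4, 0.5]
slave = [0.6, 0.5, 0.4, 0.3]
print(composite("split", master, slave))     # [0.2, 0.3, 0.4, 0.3]
print(composite("super_aa", master, slave))  # [0.4, 0.4, 0.4, 0.4]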
 
Joe DeFuria said:
But going forward, I think the master / slave (having extra SKUs) is not the right way to go.

As long as you can still connect two "disparate" cards (not the same brand, or even the same pipe count), I would think ATI would want to get rid of the "master" SKUs.

I agree. I don't buy the "wasted" transistors story.

What about all the wasted transistors in the vertex shading hardware? It's commonly acknowledged that ATI cards are over-specified in this respect.

Jawed
 
incurable said:
Is that true for the other Crossfire AA modes, too?
Yes, each card does the AA downsampling for its "share" and outputs antialiased image data via DVI.

Do I understand you correctly that every chip blends its own samples gamma-correctly, then sends them to the compositor, which then (gamma-correctly?) blends these input pixels to get the final pixel color?

Wouldn't that result in a lower quality AA output compared to a single GPU doing the same level of AA? (think: 8x Crossfire AA vs. a theoretical R3xx/R4xx with the ability to do 8x MSAA)
If the blending of both frames is also done gamma-adjusted, there would only be a very, very slight difference due to precision loss, because the intermediate results are clamped to 8 bits.
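The size of that loss is easy to bound with toy numbers; the values below are chosen to straddle a rounding boundary and are illustrative, not measured:

Code:
def to8(x):
    # Quantise to an integer 0..255 code, as each board's output stage does.
    return min(255, max(0, round(x)))

# Two boards whose resolved values (in 8-bit units) land either side
# of a rounding boundary:
master_true, slave_true = 100.9, 101.9

two_stage = to8((to8(master_true) + to8(slave_true)) / 2)  # (101 + 102) / 2 -> 102
full_prec = to8((master_true + slave_true) / 2)            # 101.4 -> 101
print(two_stage - full_prec)  # 1: worst case is one code value (LSB)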
 
Demirug said:
The Composition Engine is an FPGA chip; this makes it very flexible, but not necessarily fast.
I'm not sure of the exact performance numbers for the Xilinx Spartan FPGA, but it does seem like a potential problem.

Joe DeFuria said:
But going forward, I think the master / slave (having extra SKUs) is not the right way to go.
I agree. It's a bolt-on kludge for the current gen. I'd prefer a >20-lane PCIe chipset solution, so that each PEG interface had a full 4 GB/s available for XFire comms. SLI2...?

Does the Sil61161/2 I/O pair on the master mean that composite res is limited to 1600x1200?
 
Xmas said:
Do I understand you correctly that every chip blends its own samples gamma-correctly, then sends them to the compositor, which then (gamma-correctly?) blends these input pixels to get the final pixel color?

Wouldn't that result in a lower quality AA output compared to a single GPU doing the same level of AA? (think: 8x Crossfire AA vs. a theoretical R3xx/R4xx with the ability to do 8x MSAA)
If the blending of both frames is also done gamma-adjusted, there would only be a very, very slight difference due to precision loss, because the intermediate results are clamped to 8 bits.
Thanks for the explanation, and not just to you, but to Jawed and Dave, too!

(I guess I should stop shooting from the hip merely based on a gut feeling and actually think about the stuff I'm trying to understand. ;))
 