> Each chip consisting of, say, 24 superscalar ALUs, 2 TUs, and 1 RBE?

But how to connect them? Three-digit GB/s numbers are not easy to reach on a package.
I do not think it will be more than two GPUs, working in an enhanced Crossfire mode.
> Each chip consisting of, say, 24 superscalar ALUs, 2 TUs, and 1 RBE?
> But how to connect them? Three-digit GB/s numbers are not easy to reach on a package.

I think a package is the perfect place to put "high bandwidth". I assume that a package can be made multi-layer.
The Windows Vista operating system will include native support for multiple graphics accelerators through an ATI-sponsored technology called Linked Adapter. Linked Adapter will treat multiple graphics accelerators as a single resource (GPU and memory) and, working together with parallel engine support, schedule the most efficient workload possible across the graphics processors and graphics memory pool to maximize performance.
> I must admit this is very far from my area of expertise, and at the risk of making a complete fool of myself: wouldn't it be possible to just hide the multiple cores, so that from a user/software point of view it would look like a normal one-chip solution, while on the inside it would be multiple cores? Sort of like replacing the Stream Processing Unit Clusters in the picture below with dedicated chips.

I think parts of it could be done that way, but I think it would be best if reads to the register file and caches didn't require jumps to other chips.
This is much like my original conception of R700, from when we first heard about it being multi-chip. But maybe, as an extension of your L2 sharing idea, we can think of the R700 as a single processor on multiple dies? Like a 386+387, a Voodoo 1, or a Pentium Pro (with one or two L2 dies). One of the dies would be the master and speak to the PCIe bus, and you'd effectively have a single GPU software-wise. The on-package interconnects would have to be really fast (how was that done on the Pentium Pro? Or with the L3 dies on POWER chips?)
Is that feasible, and could they still easily use a single die for midrange boards?
> That's why there's no guarantee that DX11/12 features implemented now in the R600 will actually be the optimal way to implement DX11/12 features when the market is actually ready for them. You might find ATI jettisoning the work they did in the R600 by the time the R800 rolls along.

Yeah, TruForm. Or the fog ALU.
> Or, MSAA resolve, which is no longer implemented in hardware.

This feature should definitely stay that way! Hardware MSAA resolve has to die; we need shader-based resolves, we need to know where sub-samples are, and we also need to know if a pixel has been fully compressed (all sub-samples are equal) or not.
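For concreteness, a shader-based resolve in D3D10-level HLSL might look roughly like the sketch below. This is a minimal box-filter version; the resource name and the 4x sample count are assumptions for illustration, and a real custom resolve would presumably do something smarter than a plain average (e.g. tone-map before filtering):

```hlsl
// Minimal sketch of a shader-based MSAA resolve (SM 4.0 HLSL).
// gColorMS and the 4x sample count are illustrative assumptions.
Texture2DMS<float4, 4> gColorMS;

float4 ResolvePS(float4 pos : SV_Position) : SV_Target
{
    int2 coord = int2(pos.xy);
    float4 sum = 0;

    // Fetch every sub-sample explicitly; a custom resolve is free to
    // weight or transform these however it likes before combining.
    [unroll]
    for (int s = 0; s < 4; ++s)
        sum += gColorMS.Load(coord, s);

    return sum / 4.0;   // plain box filter, purely for illustration
}
```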
> we also need to know if a pixel has been fully compressed (all sub-samples are equal) or not.

I haven't noticed any sign of this coming in D3D. Presumably it's right at the back of the queue behind all the other MSAA-related stuff.
> D3D10 can't support multiple views on the same resource at the moment, though, can it? So that'd be a way off I suppose.

It can support multiple surface views. The limitation is binding them to the pipe as input and output simultaneously, which isn't supported (for obvious reasons).
> I haven't noticed any sign of this coming in D3D. Presumably it's right at the back of the queue behind all the other MSAA-related stuff.

I don't see why it couldn't be exposed directly as an HLSL function, since querying MSAA sub-samples is also pretty "custom". Just a simple function that takes a location and returns a boolean.
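Something along these lines, say. To be clear, the method name below is purely hypothetical; no such intrinsic exists in D3D10 HLSL, this just sketches the shape of the query being proposed:

```hlsl
// Hypothetical sketch only: IsPixelCompressed() is NOT a real HLSL
// intrinsic; it stands in for the proposed "is this pixel fully
// compressed (all sub-samples equal)?" query.
Texture2DMS<float4, 8> gColorMS;

float4 EdgeAwareResolvePS(float4 pos : SV_Position) : SV_Target
{
    int2 coord = int2(pos.xy);

    // Imagined query: true when the hardware knows all sub-samples
    // of this pixel are identical.
    if (gColorMS.IsPixelCompressed(coord))      // hypothetical
        return gColorMS.Load(coord, 0);         // one fetch suffices

    // Otherwise fall back to a full per-sample resolve.
    float4 sum = 0;
    [unroll]
    for (int s = 0; s < 8; ++s)
        sum += gColorMS.Load(coord, s);
    return sum / 8.0;
}
```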
Could this be realised as an alternate resource view? e.g. two concurrent views of an MSAA'd buffer: all samples as one view and compression flag as another view (1 bit per "element", whoopee!). D3D10 can't support multiple views on the same resource at the moment, though, can it? So that'd be a way off I suppose.
> That said, upon further reflection (since our last discussion of this) I think checking whether the sub-sample depths are equal is quite sufficient for most cases (excepting perhaps some issues with EQUAL depth functions... not sure though).

Umh... since depth is the only supersampled information that you get while using multisampling, I think you really don't want to check it to understand whether your pixel has been "compressed" or not. But maybe you were referring to something else.
> I thought about this a bit more, and it seems to me that compression cannot *guarantee* you that a pixel's samples are not identical when it is not compressed. This is because the compression works at a tile level.

I call it compression, but I never referred to it as tile compression; I referred to it as pixel compression.
> Furthermore, it seems to me that if, for a reasonably sized tile, only *two* (or at least very few) samples over the entire tile were identical, then the compressed version of the tile might be larger than the uncompressed version of it! This is implementation dependent, of course, but it shouldn't be hard to see that it might happen.

Yes, if you only have a few samples per pixel, retrieving such a compression flag probably wouldn't be any faster than manually reading and checking the samples. But if you have many sub-samples per pixel (8 or more in the future), you really don't want to fetch all of them just to check whether they're all equal.
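The manual alternative being compared against looks something like this (a sketch; the resource name and 8x sample count are assumptions), and its cost clearly grows linearly with the sample count, which is exactly the point:

```hlsl
// Without a compression flag, classifying a pixel means fetching and
// comparing every sub-sample: N loads per pixel at Nx MSAA.
Texture2DMS<float4, 8> gColorMS;

bool AllSamplesEqual(int2 coord)
{
    float4 first = gColorMS.Load(coord, 0);
    [unroll]
    for (int s = 1; s < 8; ++s)
    {
        // Any differing sub-sample marks an edge pixel.
        if (any(gColorMS.Load(coord, s) != first))
            return false;
    }
    return true;    // interior pixel: one shading result suffices
}
```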
> Umh... since depth is the only supersampled information that you get while using multisampling, I think you really don't want to check it to understand whether your pixel has been "compressed" or not. But maybe you were referring to something else.

Right, yeah. I guess it wouldn't work properly if reading from the depth buffer (which apparently you can't do right now anyway). However, it should work properly with deferred rendering if you write out position/depth to the G-buffer, which you currently have to do anyway because of the aforementioned limitation on reading MSAA depth buffers (although Humus says this limitation is going away in DX 10.1; it will be interesting to see how the sampling interface is specified).
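In the deferred case that check would run against depth stored in the G-buffer rather than the depth buffer itself; roughly like this (a sketch, with the resource name and 4x sample count assumed):

```hlsl
// Depth written into the G-buffer as an ordinary multisampled render
// target during the geometry pass, then compared per pixel to classify
// edge vs. interior pixels for the lighting pass.
Texture2DMS<float, 4> gDepthGBuffer;

bool IsEdgePixel(int2 coord)
{
    float d0 = gDepthGBuffer.Load(coord, 0);
    [unroll]
    for (int s = 1; s < 4; ++s)
    {
        // Equal sub-sample depths are taken as interior/"compressed";
        // any difference flags a geometric edge needing per-sample work.
        if (gDepthGBuffer.Load(coord, s) != d0)
            return true;
    }
    return false;
}
```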
Hi,
after reading this thread I'm wondering why everyone assumes that R700 is a two-chip architecture. Why not a four-chip architecture?
IMHO it would be far more logical to have a 4-chip high-end implementation, a 2-chip midrange implementation, and a single-chip low-end/mainstream implementation of the same basic architecture.
Sounds rather 3dfx-ish to me.
If I remember correctly, back in the days of VSA-100, multi-chip solutions were criticised for not being able to compete on cost with larger single chip cards.
There will be a touch of irony if we now find the IHVs moving back to multi-chip solutions because they are more economically feasible to produce than the huge chips which have been needed to increase performance in recent times!