It can be true, but there are a few conditions:
1. Barts should be at least as fast as Cypress per clock. That would indicate 32 ROPs. How big would it be then?
2. Assuming per-clock gaming performance is the same, the Barts X2 would have to run at 870MHz to offer 20% more performance than Hemlock.
3. The TDP of Barts has to be significantly lower than the TDP of Cypress to stay under 300W at 870MHz in an X2 config.
I can't imagine two GPUs with 80 TMUs and 32 ROPs running at 870MHz staying within the 300W limit.
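For anyone who wants to check the numbers in points 2 and 3, here is a rough sketch of the arithmetic. It assumes performance scales linearly with core clock (same per-clock performance, as stated above) and that Hemlock runs at the HD 5970 reference clock of 725MHz, which is my assumption and not a figure from the post. The function names are made up for illustration, and the power split is deliberately naive.

```python
def required_clock_mhz(baseline_clock_mhz, gain_percent):
    """Clock needed for a given gain, assuming performance scales linearly with clock."""
    return baseline_clock_mhz * (100 + gain_percent) / 100


def per_gpu_budget_w(board_limit_w, gpu_count):
    """Naive even split of the board limit.

    Ignores memory, VRM losses and the bridge chip, so the real
    per-GPU headroom would be lower than this.
    """
    return board_limit_w / gpu_count


# Assumption: HD 5970 (Hemlock) reference core clock, not stated in the post.
HEMLOCK_CLOCK_MHZ = 725

print(required_clock_mhz(HEMLOCK_CLOCK_MHZ, 20))  # target clock for +20%
print(per_gpu_budget_w(300, 2))                   # upper bound per GPU on a 300W board
```

With those inputs the target works out to the 870MHz quoted above, and each GPU gets at most 150W before counting anything else on the board.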
Ad.1 You make a good point about 32 ROPs and the space. Given a few rumours I've seen floating around (AMD wanting to raise what is considered mainstream graphics, the 6770 being clocked around 700MHz, and the 6770 being at or close to 5850 performance), 32 ROPs seem necessary to accomplish that, especially when AA appears in (almost) all new games.
Ad.2 That 700MHz value sounds like either a false rumour or some deliberate crippling. If Juniper can be clocked up to 1GHz and Cypress to, let's say, 900MHz, then I don't see any reason why Barts wouldn't be able to clock similarly.
Ad.3 That's indeed a necessity, and I don't know how feasible it is. We can only hope the experience AMD has with 40nm graphics will allow them to pull off even more efficient designs than we see now.
Add the following to the mix: Sideport for better CrossFire scaling (anybody got an idea how much it could help?) and the possibility of installing GDDR5+ memory for more bandwidth, while single Barts cards will most likely keep regular GDDR5. Maybe overall there won't be such a need for high clocks?
I've been over this subject in a lot of detail, describing a scenario where all lanes work together to compute those functions that used to be performed by T.
Thanks, very interesting read.
Made me realize a few things I wasn't aware of. Those Rightmark shaders are going to be an interesting benchmark for the new architecture, since they are most likely the situations where we will see a slowdown.
I should add I'm a little sceptical over the feasibility of this (inner workings of the serial math operations give me pause for thought) - though not as sceptical as some people back then. Also there are other possibilities with 4 lanes.
And there's also the question of whether transcendental instructions need to be computed in a single cycle. Related to this is the fact that for the precision required by OpenCL, the conventional single-cycle transcendental unit is of very little use - a much more complex sequence of operations is required.
Taking more cycles to do a transcendental makes sense if it can be done within one medium-sized shader, without the help of supporting lanes. But only if the code is vectorized, so the transcendental is done in all 4 shaders at the same time. If not, what would happen? We could use the other lanes, but would have to wait for the transcendental to finish if we want to use the result. So a major challenge for the compiler there - totally different to Evergreen, right?
One example would be 32-bit integer multiplication. At the moment it can only be done by the T unit, but at the same time the other four ALUs can do something else.
If the 4 remaining ALUs now need to work together to accomplish a 32-bit multiplication, they can't do anything else in the same clock. So while the peak throughput of 32-bit integer multiplication stays the same, the throughput with a real instruction mix (with a lot of integer multiplications but also a bunch of other operations) may be quite a bit lower. In the extreme case it may be half the performance.
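To put rough numbers on that, here is a toy cycle-count model. It is only a sketch under strong assumptions: all operations are independent and pack perfectly, a 32-bit multiply takes one issue slot on the T unit in the 5-wide design, and in the hypothetical 4-lane design it ties up all four lanes for one cycle. Neither function describes how the real hardware or compiler schedules anything, and the names are invented for illustration.

```python
from math import ceil


def cycles_vliw5(n_mul, n_other):
    """5-wide bundle: only the T unit can do a 32-bit mul (one per cycle).

    The T unit can also take a simple op when it has no mul to do, so the
    bound is either the mul count or the total op count spread over 5 slots.
    """
    return max(n_mul, ceil((n_mul + n_other) / 5))


def cycles_4lane(n_mul, n_other):
    """Hypothetical 4-lane design: a mul occupies all 4 lanes for one cycle."""
    return n_mul + ceil(n_other / 4)


# Pure multiply stream: peak throughput is identical in this model.
print(cycles_vliw5(100, 0), cycles_4lane(100, 0))

# One mul per four simple ops: the 4-lane design needs twice the cycles.
print(cycles_vliw5(100, 400), cycles_4lane(100, 400))
```

The second case is the extreme one: the 5-wide design hides every multiply behind four other ops, while the 4-lane model has to spend a separate cycle on each multiply, so it takes twice as long.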
In the case of 32-bit mul I am expecting (or at least hoping) these new medium-sized shaders to deal with them on their own. So the other ALUs will be free to do other things. But things might be different for the harder functions...
8 Memory chips, bye bye 384-bit.
I think I am also seeing a 6-pin and an 8-pin power connector. Or am I misinterpreting the pin soldering points closest to us?
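For reference, a quick sketch of what those two observations would imply. The assumptions are mine, not from the photo: each GDDR5 chip sits on its own 32-bit channel (no clamshell mode), and the power figures are the usual PCI Express limits of 75W from the slot, 75W per 6-pin and 150W per 8-pin connector.

```python
# Assumption: one 32-bit channel per GDDR5 chip (no clamshell mode).
GDDR5_CHIP_WIDTH_BITS = 32

# Assumption: standard PCI Express power limits per source, in watts.
POWER_SOURCE_W = {"slot": 75, "6-pin": 75, "8-pin": 150}


def bus_width_bits(chip_count):
    """Memory bus width implied by the number of chips on the board."""
    return chip_count * GDDR5_CHIP_WIDTH_BITS


def board_power_limit_w(connectors):
    """Slot power plus whatever the auxiliary connectors can deliver."""
    return POWER_SOURCE_W["slot"] + sum(POWER_SOURCE_W[c] for c in connectors)


print(bus_width_bits(8))                        # 8 chips
print(bus_width_bits(12))                       # what a 384-bit bus would need
print(board_power_limit_w(["6-pin", "8-pin"]))  # 6-pin + 8-pin board
```

So 8 chips point to a 256-bit bus (384-bit would take 12), and a 6-pin plus 8-pin layout allows up to 300W for the whole board.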
Might be early silicon, but it doesn't look likely that Cayman can be made to run at low enough wattage to put two of them on a 300W board. So Barts X2 after all?