So does the 750 CL's paired singles allow what is effectively 2 x 32-bit int simd?
Thanks again.
So the performance difference for integer workloads is potentially even greater than for floating-point workloads. So much for NBench-based beard-stroking, then.
After 7+ years of vectorising as many performance-critical tasks as possible, I guess it makes sense that developers looked at the Wii U with its float-only paired singles and groaned.
One would hope the CPU die contains some mechanism to share data between CPU caches. Otherwise you'll be doing a round-trip through that not-particularly-awesome 12GB/s main RAM, which'll be hammered by GPU accesses at the same time.
There was at least one dev who gave an interesting insight into Wii U optimization: he basically said to go for L2 cache locality or go home. In aggregate, the Wii U has substantially more L2 cache than the Xbox 360. Sizing algorithms to fit a particular cache is often not very high on a typical developer's checklist, although in the case of the Xbox 360 it was probably pretty vital to be as cache-resident as possible, since main RAM latency was so bad. It could be that latency isn't that great on the Wii U either, what with it going through a memory controller on the GPU. So that means resizing your stuff to get a better hit rate in the Wii U's L2 if you can. But the weird 2MB + 512KB + 512KB local L2 structure is very different from a shared 1MB, and I'm sure that presents some challenges of its own (there could be a substantial penalty for sharing data structures between cores now; it could even be as bad as going out to memory, or worse, depending on how they did it).
I guess cash or cache, it all works the same.

Can you ever have too much cache?
> Can you ever have too much cache?

Yes. The more you have, the slower it is to access (higher latency). A balance has to be struck between quantity and speed. To gain the benefits of different quantity and speed combinations, different cache layers are used - instruction and data caches, L1, L2, and sometimes L3, each smaller and faster than the next (this concept extends to RAM and storage, all increasing capacity at a reduction in speed).
> Espresso is 1 x 2MB and 2 x 0.5 MB for a total of 3 MB.

Ah, right. I forgot; my bad.
> and so that you could run any task on any core with the same level of performance (greater flexibility in scheduling tasks).

I wonder what real performance difference there is between half and one meg of cache, considering, say, an Intel Core i-series CPU has only 256K of L2 (although paired with an L3) and manages quite well. Perhaps only on the order of a few percent in most cases? Making the cache four times larger for one of the three cores is odd though, but then again SO MUCH of the Wuu is just fricken odd, so better not get hung up on this particular oddity, eh?
> Yes. The more you have, the slower it is to access (higher latency). [...]

Cache size seems to affect power usage as well. (Reason why Nvidia proposed an additional 1KB L0 cache.)
> If the developer who swears that they only got acceptable performance by optimizing for Wii U's cache sizes is to be believed (sorry, I don't remember the source) then I doubt the difference between 512KB and 2MB is negligible.

Gamecube only had 256K of cache and ~2.6GB/s main RAM bandwidth (although with ridiculously low latency, at least from the GPU side, since that's where the memory controller is), and it did fine. 512K caches get a 90+ percent hit rate in the typical case, so I have a hard time seeing how the difference between 512K and 2MB would really be all that monstrous. There must be something else going on as well, methinks, if your source is correct.
> It makes you think, if half as much L2 cache would have been about as good then they really shouldn't have bothered going with eDRAM on this.

L2 wouldn't have helped GPU performance; without eDRAM the system would have choked on that pathetic 12GB/s main RAM bandwidth.
> But then again, I don't know if there was a great technical reason for this, they could have just been suckered into whatever IBM wanted to sell them.

You really think Nintendo's dumb enough to be suckered into a deal like this? They've been building consoles for thirty years; you'd think they would know better than THAT, at least.
> Ideally they wouldn't be getting IBM to fab this at all

Why not? IBM has an advanced eDRAM process, and they're the developer of the CPU core, so I would think they'd be ideal for Ninty to work with. Especially since they probably have spare fab capacity too and might be able to offer a good price, since they only make chips for the big-iron enterprise market, which isn't all that big despite the moniker.
> 720P 32-bit color + Z needs only 7.2MB regardless of the platform. No need to tile this rez on the 360, it fits in eDRAM just fine.

720P with FSAA is 14/28 MB for 2x/4x. I'm not sure what Megafenix is talking about, because it is the same everywhere, but you would have to tile with FSAA.