Wii U hardware discussion and investigation

What makes you think it even needs to? The 1 MB texture cache of the GC/Wii is an outdated design. Current GPUs texture dozens of times more effectively than those crappy old chips (and that's counting low, no doubt).

You have to keep low-level compatibility.

So it means that the parameters of Flipper can be used as a starting point for the new GPU.
 
You have to keep low-level compatibility.

So it means that the parameters of Flipper can be used as a starting point for the new GPU.
There is no low-level compatibility anyway. Any modern GPU is quite different from the GameCube/Wii GPU.
I think the easiest way is to simply ignore the old 3 MB eDRAM for the frame/Z buffer and textures and just place everything in normal RAM, letting the higher memory bandwidth together with the texture and ROP caches handle it. I have no doubt it will be fast enough. One should not make it more complicated than it is. But I wouldn't be completely surprised if the GPU again has some eDRAM at its disposal, making it a moot point of discussion either way.
 
There is no low-level compatibility anyway. Any modern GPU is quite different from the GameCube/Wii GPU.
I think the easiest way is to simply ignore the old 3 MB eDRAM for the frame/Z buffer and textures and just place everything in normal RAM, letting the higher memory bandwidth together with the texture and ROP caches handle it. I have no doubt it will be fast enough. One should not make it more complicated than it is. But I wouldn't be completely surprised if the GPU again has some eDRAM at its disposal, making it a moot point of discussion either way.

It's actually not quite that simple; it all depends on what was exposed through the APIs, and before anyone asks, I honestly don't remember much beyond the command list format being documented and having to manage dataflow to the GPU explicitly.
It could well be that the GPU has a legacy mode; it certainly wouldn't be the first piece of console hardware to do that. The fact that the GPU has eDRAM probably makes the texture cache a moot issue, though.
 
The trouble is that latency hasn't decreased as much as the speed has.
There's no need to match the latency of Wii memory accesses to successfully emulate the hardware. Programmers don't count clock cycles on a per-frame basis anymore when writing games; they haven't done that since the SNES era.

So the bandwidth of the eDRAM is 125 gigabytes/sec.
Lol, that math is seriously wacked.
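A quick back-of-the-envelope check shows why. Taking the bus widths claimed later in this thread (64-bit main memory, 128-bit frame buffer, 512-bit texture buffer) and Hollywood's roughly 243 MHz clock, all of which are rough figures rather than official specs:

```python
# Rough sanity check of the "125 GB/s" figure. The bus widths are the ones
# claimed in this thread and the ~243 MHz clock is Hollywood's; treat the
# result as an order-of-magnitude estimate only.

GPU_CLOCK_HZ = 243e6                 # Hollywood GPU clock (approx.)
BUS_WIDTH_BITS = 64 + 128 + 512      # main mem + frame buffer + texture buffer

bytes_per_clock = BUS_WIDTH_BITS / 8                  # 88 B/clock
bandwidth_gb_s = bytes_per_clock * GPU_CLOCK_HZ / 1e9

print(f"{bytes_per_clock:.0f} B/clock -> {bandwidth_gb_s:.1f} GB/s")
# ~21 GB/s at 243 MHz (about 43 GB/s even if the eDRAM runs at twice the
# GPU clock) -- nowhere near 125 GB/s.
```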
 
How can you emulate latency?
The numbers you've given say that the Wii's is slower than other eDRAMs, so it wouldn't be an issue. But nanoseconds of latency aren't a problem anyway; graphics hardware is designed to work around latency with caches. Graphics tasks tend to be linear memory accesses, so the delay in starting a read/write is immaterial next to the rate at which you can read/write.
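To put a number on it: for a streaming access, the start-up latency is amortised over the whole transfer. The figures below are made up purely for illustration, not Wii or Wii U specs:

```python
# Toy illustration: for a linear (streaming) access pattern, the one-off
# start-up latency is amortised over the whole transfer. All figures are
# made up for illustration; none are actual Wii/Wii U numbers.

LATENCY_NS = 4.0             # time before the first byte arrives
BANDWIDTH_GB_S = 20.0        # sustained rate once the transfer is streaming
TRANSFER_BYTES = 64 * 1024   # e.g. one 64 KB block of texture data

transfer_ns = TRANSFER_BYTES / (BANDWIDTH_GB_S * 1e9) * 1e9
total_ns = LATENCY_NS + transfer_ns

print(f"setup {LATENCY_NS} ns, transfer {transfer_ns:.0f} ns "
      f"-> latency is {LATENCY_NS / total_ns:.2%} of the total")
# setup 4.0 ns, transfer ~3277 ns -> latency is ~0.12% of the total time
```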
 
How can you emulate latency?
You simply don't. Why should you? All you care about is that you can complete the frame in the necessary time and that it renders correctly. You don't have to do it the exact same way the Wii did. So if access to the textures takes a bit longer (which is not clear at all), you can compensate by allowing many more concurrent texture accesses. We simply don't care how it is done, as long as it is fast enough in the end and doesn't break the rendering.
 
If you say that they will do two memory reads in one emulation cycle, then the expected bandwidth can drop to 62.2 GB/sec.

But I don't think that they can do it within one cycle. 4 ns is way too close to the 2 ns to be able to guarantee one read + one write within the emulation time frame 100% of the time.

Am I the only one thinking in terms of flow?
Just don't work on 4 pixels in parallel but on a few thousand, and the accesses can take almost ages while still being faster in the end. Current GPUs are masters of latency hiding compared to the ancient Hollywood in the Wii. You are thinking way too low-level.

In the distant past, latency had to be low; otherwise you would have stalls which reduced the throughput. Today, one can reach the (much higher) maximum throughput even with high latencies by exploiting more parallelism in the workload (of which there is plenty; that's why one can build such wide GPUs).
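A rough way to see it is Little's law applied to memory requests: the concurrency you need is just bandwidth times latency divided by the request size. The numbers below are illustrative, not specs of either machine:

```python
# Little's law sketch: to sustain a target bandwidth despite a given memory
# latency, you need enough requests in flight at once. All numbers below are
# illustrative, not specs of the Wii or Wii U.

def requests_in_flight(bandwidth_gb_s: float, latency_ns: float,
                       request_bytes: int = 64) -> float:
    """Concurrent requests needed = bandwidth * latency / request size."""
    bytes_in_flight = bandwidth_gb_s * 1e9 * latency_ns * 1e-9
    return bytes_in_flight / request_bytes

# Old, low-latency, low-bandwidth design: almost no concurrency needed.
print(requests_in_flight(bandwidth_gb_s=2.6, latency_ns=10))     # ~0.4

# Modern GPU: far higher latency, but thousands of threads easily keep
# hundreds of requests in flight, so the (much higher) throughput is reached.
print(requests_in_flight(bandwidth_gb_s=100.0, latency_ns=400))  # ~625
```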
 
Current GPUs have what, 15,000+ threads flying around the GPU at various stages of completion all at the same time, and one or a few thousand clocks of latency to graphics memory. It's not necessary, or even possible, to achieve the exact same latencies as in the Wii.
 
How can you emulate latency?
Agree on definitions of emulation and simulation and then start discussing back-compat. If you expect Nintendo to recreate the Wii HW with cycle-precision, then you're wrong. They care about the output looking good enough, not about everything running frame-perfect with the Wii. And since the Wii wasn't an open platform, there's a limited number of titles that you have to get working on Wii U, so you can afford hand-tweaking the Wii-to-Wii-U translation per application. Stop arguing that the HW has to be in any way, on any level, compatible with the Wii. It doesn't. The closer it is to the original, the easier it gets to translate the workload, but you can always throw more cycles at stuff (if you have some to spare). They don't have to be like MAME. And the more abstract the programming was on the Wii, the more leeway they have (I didn't code for the Wii, so I don't know).
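To make the per-application tweaking idea concrete, here's a purely illustrative sketch: the title IDs are real Wii game IDs used only as examples, and the option names are invented, not anything Nintendo actually ships:

```python
# Illustrative sketch only: a closed platform has a known, finite title list,
# so a BC layer can carry hand-tuned per-title overrides instead of being
# cycle-accurate for every possible program. Title IDs are real Wii IDs used
# as examples; the option names are invented for this sketch.

DEFAULTS = {
    "texture_cache_mode": "emulated",   # generic behaviour that works for most titles
    "efb_copies_to_ram": False,
}

PER_TITLE_TWEAKS = {
    "RMCE01": {"texture_cache_mode": "bypass"},   # hypothetical per-game override
    "RSBE01": {"efb_copies_to_ram": True},
}

def translation_profile(title_id: str) -> dict:
    """Merge the generic defaults with any hand-tuned overrides for this title."""
    return {**DEFAULTS, **PER_TITLE_TWEAKS.get(title_id, {})}

print(translation_profile("RMCE01"))
# {'texture_cache_mode': 'bypass', 'efb_copies_to_ram': False}
```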
 
VLIW5 had shipping configurations with 40 SPs per SIMD.
RV610 and RV620 had even just 20 SPs per SIMD (4 × 5 SPs) and 2 SIMDs (so 40 SPs in total). But the first VLIW5 iteration was quite a bit different from the later VLIW generations in this respect (the number of TMUs scaled with the size of the SIMDs, not with the number of SIMDs as it did later).
 
The above calculation (125 GB/sec bandwidth) is the bare minimum to be able to emulate the Wii without artefacts on the Wii U.
Xenos eDRAM is 256 GB/s. I doubt Wii U will be struggling for BW.

The GPU is good at hiding the latency of the slow main memory, but in the case of an emulator, as a rule of thumb, it has to be way faster than the emulated platform.
And when you have 2 ns latency instead of 4 ns, it is not way faster.
:???: 2 ns latency is faster than 4 ns latency. It's also immaterial. The reason you need emulating hardware to be faster than what it's emulating is to turn instructions for different hardware into equivalent instructions on the emulating hardware. When it comes to data access, you only need the emulating hardware to have access to the data to work on at a suitable speed. That data-moving system can be completely different architecturally.
 
The above calculation (125 GB/sec bandwidth) is the bare minimum to be able to emulate the Wii without artefacts on the Wii U.

As Shifty pointed out, if the Wii U comes with eDRAM, it will be faster than the eDRAM the Wii shipped with. Come on, Nintendo has specifically addressed backwards compatibility at the hardware design stage. It is not something they try to achieve in software as an afterthought. You can safely assume that it will work as well as can be expected, i.e. there will probably be corner cases in specific games where the lack of identical hardware causes glitches, but generally it will work just fine.

It's amazing that the console is about to go on sale and we still don't even have die shots of the main memory.
 
You can't forget that the Wii memory system is one big eDRAM.
So to be able to give backwards compatibility with 32 MB of eDRAM, they need 24 MB for the main memory (64-bit wide), 2 MB of frame buffer (128-bit wide) and 1 MB of texture buffer (512-bit wide).
The above calculation (125 GB/sec bandwidth) is the bare minimum to be able to emulate the Wii without artefacts on the Wii U.

Less than that would mean a Z-buffer comparison cannot happen at the same time as a texture read.

The GPU is good at hiding the latency of the slow main memory, but in the case of an emulator, as a rule of thumb, it has to be way faster than the emulated platform.
And when you have 2 ns latency instead of 4 ns, it is not way faster.
Wii's main memory bandwidth is really low by today's standards, so no problem at all.
One of AMD's quad TMUs already has a 512-bit interface to its L1 (64 bytes/cycle). That has you covered in the texture department. And 4 of AMD's color ROPs can access 256 bits/cycle from their caches (Z bandwidth comes on top of that).

That means a simple Cedar or Caicos (2 SIMDs, 8 TMUs, 4 ROPs) already has twice the bandwidth per clock available compared to what you estimated for the Wii GPU. And that while running at a significantly higher frequency.

There is no performance problem at all in emulating the rendering of the ancient Wii graphics on a more modern architecture.
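Putting the per-clock figures side by side, using the numbers quoted above (the Wii-side estimate is the earlier poster's, not an official spec):

```python
# Per-clock bandwidth comparison using the figures quoted in this thread.
# The Wii numbers are the earlier poster's estimate, not an official spec.

# Wii estimate: 64-bit main mem + 128-bit frame buffer + 512-bit texture buffer.
wii_bytes_per_clock = (64 + 128 + 512) / 8      # 88 B/clock

# Cedar/Caicos-class part: 2 quad-TMUs at 64 B/clock each (512-bit L1 path)
# plus 4 color ROPs sharing 256 bits/clock from their caches (Z not counted).
amd_bytes_per_clock = 2 * 64 + 256 / 8          # 160 B/clock

ratio = amd_bytes_per_clock / wii_bytes_per_clock
print(wii_bytes_per_clock, amd_bytes_per_clock, round(ratio, 2))
# 88.0 160.0 1.82 -- roughly twice the per-clock bandwidth, before even
# counting the much higher clock frequency.
```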
 