Wii U hardware discussion and investigation *rename

Is that actually confirmed or is that still speculation?


Confirmed/speculation, I guess. I mean, you can't believe everything, but that was the word around town. Even IGN (yes, I know, IGN) built their mock Wii U "dev kit" around a 4850; I believe that's what Nintendo was telling devs, that the final GPU would be about as powerful as the 4850. Now remember this: the E6760 scores around 5870 in 3DMark Vantage, which is higher than the HD 4850's score, despite the card running at only 35 watts. So take it for what it's worth, I guess.

That kind of invalidates the whole thing, doesn't it? Lol. I mean, you can't use an off-the-shelf 4850 in a PC and expect results comparable to a closed box with plenty of other variables, even if they had the same GPU. It just shows a clear lack of technical knowledge on their part. Not saying I know much about tech myself, though.


YES, exactly. They stated that before their test. Obviously whatever is inside the Wii U is custom and not the same as an off-the-shelf part.
 
I don't need to be told it's literally 3 single-core processors; please don't insult my intelligence :p You're making it sound as if threads are either dependent or they aren't, and you get either full parallel utilization or full serial utilization.

Go run some of your favorite PC games and log CPU utilization over some period of time.
Okay, in saying 'multithreaded engine' I wasn't thinking of a PC targeting a range of machines - should have said 'console multithreaded engine'. In a console where you know you've got three cores, I don't see why devs wouldn't keep them all busy as long as they have work to do. It'd all come down to dependencies as you say. I guess a dev can set us right there.
 
From the outside, GDDR3 and DDR3 behave very similarly, almost identically, from a logical perspective. At 800 MHz, both GDDR3 and DDR3 have a CL (CAS latency) of 11 cycles, while RCD (RAS-to-CAS delay) is 12 cycles for GDDR3 vs 11 cycles for DDR3. So the effective latency of a read is almost identical: 11+12 vs 11+11.

The main logical difference is that GDDR3 is a 4-bit fetch and DDR3 is an 8-bit fetch. This determines the minimum burst size you'll get for a read or write. This is important for pushing up to higher IO bandwidths but has almost nothing to do with latency. At their core, DRAMs have not been speeding up anywhere near as much as the IO speeds. To get more and more bandwidth the interface between IO and core has been made wider and wider. GDDR3 at 800 MHz runs the DRAM core at 400 MHz and fetches 4-bits in parallel. This gives you 1600 Mbit/sec. DDR3 at 800 MHz runs the DRAM core at 200 MHz and fetches 8-bits in parallel. It gives you the same 1600 Mbit/sec. In both cases it fetches bits in parallel at lower speed and then serializes them at a higher speed. The latency of making that fetch is roughly the same. This is also why CL in cycles goes up very quickly as IO speed goes up - the core is running much slower than the IO.
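
To put the numbers from this paragraph in one place, here's a quick sketch that just re-derives them; the timings are the ones quoted above, not datasheet values:

```cpp
// Re-derive the figures above: effective read latency in ns and per-pin data
// rate for 800 MHz GDDR3 vs 800 MHz DDR3, using the timings quoted in the post.
#include <cstdio>

int main() {
    const double io_clock_mhz = 800.0;
    const double cycle_ns     = 1000.0 / io_clock_mhz;   // 1.25 ns per IO cycle

    // GDDR3: CL 11 + tRCD 12, 4n prefetch -> core runs at half the IO data rate / 4
    // DDR3:  CL 11 + tRCD 11, 8n prefetch -> core runs at the IO data rate / 8
    struct Dram { const char* name; int cl, trcd, prefetch; };
    const Dram parts[] = { {"GDDR3", 11, 12, 4}, {"DDR3", 11, 11, 8} };

    for (const Dram& d : parts) {
        double latency_ns   = (d.cl + d.trcd) * cycle_ns;
        double core_mhz     = io_clock_mhz * 2.0 / d.prefetch;  // DDR: 2 bits per IO clock
        double per_pin_mbps = core_mhz * d.prefetch;            // 1600 Mbit/s either way
        std::printf("%-5s  CL+tRCD = %.2f ns   core = %.0f MHz   %.0f Mbit/s per pin\n",
                    d.name, latency_ns, core_mhz, per_pin_mbps);
    }
}
```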

The core structures of the DRAMs in all of these (DDR2, DDR3, DDR4, GDDR3, GDDR5) are the same. The differences are in the IO area. The wider the core interface, the higher you can push the IO speed. So DDR2 and GDDR3 are 4-bit, while DDR3 and GDDR5 are 8-bit. Then when you get to the electrical interface between two chips you expose a larger number of differences. GDDR3 uses a 1.8-2.0V IO with pull-up termination at the end point and a pseudo open drain style of signaling. It also uses single ended uni-directional strobes. It is a good interface for a controller chip and a couple DRAMs. DDR3 uses 1.35-1.5V for IO with mid-rail termination at the end points with termination turned on/off by a control pin. It has bi-directional differential strobes. It is better suited for interfaces with more DRAMs (like a DIMM).

GDDR3 and GDDR5 use signaling designed to go a lot faster. They wind up limited by both the DRAM core speed and the IO speed. DDR2 and DDR3 use signaling designed to handle more loads. They wind up limited by IO speed but not by DRAM core speed.

At this point if you are making something at the upper end of the GDDR3 speed range there is almost no reason to use GDDR3 over DDR3. They will have very similar performance and latency. Since GDDR3 is being phased out it is relatively expensive. DDR3 is available in huge quantities because it is the PC main memory, and this drives down prices. The one advantage GDDR3 has is that it comes in x32 packages. If you wanted to keep the PCB small, you might opt for GDDR3. This also works against you because the core of the DRAM remains the same size. 2 x16 DDR3 modules can give you twice the memory of 1 x32 GDDR3 module. If there were no new Xbox coming out in the next couple years, you would definitely see an updated 360 using DDR3 rather than GDDR3 just because of the relative price of the DRAMs.
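
And a rough back-of-envelope on the x32 vs x16 packaging point; the bus width and per-die capacity here are made-up example figures, just to illustrate the trade-off:

```cpp
// Back-of-envelope for the x32 vs x16 point: for the same bus width, narrower
// devices mean more packages and (for the same die capacity) more total memory.
// The 128-bit bus and 1 Gbit die are illustrative assumptions, not real SKUs.
#include <cstdio>

int main() {
    const int bus_width_bits    = 128;   // example GPU/console memory bus
    const int die_capacity_mbit = 1024;  // assume 1 Gbit per DRAM die

    const int gddr3_width = 32;          // GDDR3 comes in x32 packages
    const int ddr3_width  = 16;          // DDR3 commonly comes in x16 packages

    int gddr3_chips = bus_width_bits / gddr3_width;   // 4 packages
    int ddr3_chips  = bus_width_bits / ddr3_width;    // 8 packages

    std::printf("GDDR3 x32: %d chips -> %d MB total\n",
                gddr3_chips, gddr3_chips * die_capacity_mbit / 8);
    std::printf("DDR3  x16: %d chips -> %d MB total (twice the capacity, twice the packages)\n",
                ddr3_chips, ddr3_chips * die_capacity_mbit / 8);
}
```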

Thank you for the very informative post. It's much more detailed than the information I could quickly find on these technologies, which offered no explanation of why the latency would be worse.

I do however have a couple of questions..

1) The GDDR3 latency on Xbox 360 is awful (PS3's XDR latency is also very bad, but Rambus has always had higher-latency RAM), yet you seem to be saying that the technology has no latency disadvantage vs DDR3 (aside from a larger minimum burst size, which I agree barely affects latency). We can see that the absolute latency of main RAM on Xbox 360 (> 150 ns) is much worse than a high-end x86 desktop with DDR3. Would you say this is purely down to having to go through the GPU, along with a potentially inferior memory controller, and has nothing to do with the memory itself?
2) You don't mention GDDR5; do you have any input on what the latency (absolute, not in clock cycles) is like there vs DDR3? I've seen various references that suggest it's higher latency than DDR3, for instance on Xeon Phi, where it should be paired with a high-quality memory controller.
 
Go run some of your favorite PC games and log CPU utilization over some period of time. You will be hard pressed to find a game which sustains the same utilization over 3 or more threads. Game threading just doesn't tend to scale that evenly, even for fairly synchronously threaded loads. And 3 threads is kind of an awkward way to split loads for synchronous stuff.
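
If anyone actually wants to try this, here's a rough per-core logger (Linux only, parsing /proc/stat; on Windows you'd need PDH or similar):

```cpp
// Rough per-core CPU utilization logger (Linux only: parses /proc/stat).
// Run it alongside a game and watch whether the load really spreads evenly
// across 3+ cores. The one-second sampling interval is arbitrary.
#include <cctype>
#include <chrono>
#include <cstdio>
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <thread>

struct CpuTimes { unsigned long long idle = 0, total = 0; };

// Read per-core counters: lines look like "cpu0 user nice system idle iowait ...".
static std::map<std::string, CpuTimes> sample() {
    std::map<std::string, CpuTimes> out;
    std::ifstream stat("/proc/stat");
    std::string line;
    while (std::getline(stat, line)) {
        // Keep only "cpuN" lines; skip the aggregate "cpu" line and everything else.
        if (line.rfind("cpu", 0) != 0 || line.size() < 4 ||
            !std::isdigit(static_cast<unsigned char>(line[3])))
            continue;
        std::istringstream ss(line);
        std::string name;
        ss >> name;
        CpuTimes t;
        unsigned long long v;
        for (int field = 0; ss >> v; ++field) {
            t.total += v;
            if (field == 3 || field == 4) t.idle += v;  // idle + iowait count as idle
        }
        out[name] = t;
    }
    return out;
}

int main() {
    auto prev = sample();
    for (;;) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        auto cur = sample();
        for (const auto& kv : cur) {
            const CpuTimes& a = prev[kv.first];
            const CpuTimes& b = kv.second;
            double total = double(b.total - a.total);
            double busy  = total - double(b.idle - a.idle);
            std::printf("%s %5.1f%%  ", kv.first.c_str(),
                        total > 0 ? 100.0 * busy / total : 0.0);
        }
        std::printf("\n");
        prev = cur;
    }
}
```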

It's worth noting that the lack of command lists in PC gaming is a confound in this comparison. Even the best-threaded code can still be held back by the number of draw calls the CPU can process, leaving the secondary cores without work.

Crysis was a notable example of this: the 'object detail' setting could push enough draw calls that overall CPU utilization on the Core 2 architecture would actually drop rather than increase.
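
For reference, this is roughly what the D3D11 deferred-context/command-list path looks like: record calls on deferred contexts from worker threads, then play the recorded lists back on the immediate context. Whether the driver builds command lists natively (rather than the runtime emulating them) is exposed as a capability flag. Error handling and actual draw calls are omitted; this is just a sketch, not production code:

```cpp
// Minimal sketch of D3D11 deferred contexts / command lists (Windows, link d3d11.lib).
// Real code would create resources, bind state and issue real draws on the deferred
// context; here an empty list is recorded just to show the flow.
#include <d3d11.h>
#include <cstdio>

int main() {
    ID3D11Device*        device    = nullptr;
    ID3D11DeviceContext* immediate = nullptr;
    D3D_FEATURE_LEVEL    level;
    if (FAILED(D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                                 nullptr, 0, D3D11_SDK_VERSION,
                                 &device, &level, &immediate)))
        return 1;

    // Does the driver support command lists natively, or is the runtime emulating them?
    D3D11_FEATURE_DATA_THREADING caps = {};
    device->CheckFeatureSupport(D3D11_FEATURE_THREADING, &caps, sizeof(caps));
    std::printf("Driver command lists: %s\n", caps.DriverCommandLists ? "native" : "emulated");

    // A worker thread would do this: record work into a deferred context...
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);
    // deferred->Draw(...), deferred->DrawIndexed(...), state changes, etc. go here.
    ID3D11CommandList* cmdList = nullptr;
    deferred->FinishCommandList(FALSE, &cmdList);

    // ...and the render thread plays the recorded lists back in order.
    immediate->ExecuteCommandList(cmdList, TRUE);

    cmdList->Release();
    deferred->Release();
    immediate->Release();
    device->Release();
}
```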

Wasn't that what DX11 multithreaded rendering was supposed to help with? AFAIK, Civilization 5 is the only game that supports it, but it was shown to get better performance once Nvidia enabled it in their drivers.

Yes. AMD had yet to enable this the last time I checked.

Edit: It's 'sænskur víkingur' for the correct nominative ;)
 
Heh, it's not all roses for us either. I had problems finding the correct possessive for my niece's name on a Christmas card earlier today.

I felt stupid.
 
Isn't it possible to tear down the GPU to find out exactly what it is?
Like was done with the A6 processor in the iPhone 5.

I guess it's just that tech sites like AnandTech, iFixit, Chipworks, etc. are not as interested in the internals of the Wii U as in the latest iDevice and don't think the effort is worth it.
 
What can they do exactly, stare at a square that says "this is where the GPU is"? Or is it possible to carefully trim off the top and get an actual die shot?
 
Mario, even at 550 MHz, would be considerably faster than Xenos. Even with 1/4 of the GPU removed it would still be better.

The Wii U has shown nothing so far to indicate it is better than RV730 (or even as fast as that).
 
55 nm, "a bit unlikely"? It's a 40 nm chip, 137 mm², which should be somewhere around 1 billion transistors. The fact that developers haven't gotten the most out of it yet doesn't change that.
 
Thank you for the very informative post. It's much more detailed than the information I could quickly find on these technologies, which offered no explanation of why the latency would be worse.

I do however have a couple of questions..

1) The GDDR3 latency on Xbox 360 is awful (PS3's XDR latency is also very bad, but Rambus has always had higher-latency RAM), yet you seem to be saying that the technology has no latency disadvantage vs DDR3 (aside from a larger minimum burst size, which I agree barely affects latency). We can see that the absolute latency of main RAM on Xbox 360 (> 150 ns) is much worse than a high-end x86 desktop with DDR3. Would you say this is purely down to having to go through the GPU, along with a potentially inferior memory controller, and has nothing to do with the memory itself?
2) You don't mention GDDR5; do you have any input on what the latency (absolute, not in clock cycles) is like there vs DDR3? I've seen various references that suggest it's higher latency than DDR3, for instance on Xeon Phi, where it should be paired with a high-quality memory controller.

1) The latency of the DRAM itself is only a portion of the total latency. In the example of 800 MHz DDR3, the first piece of read data comes out of the DRAM 11+11 cycles after you send the first part of the read command. Add in 4 cycles for the rest of that burst and you get 26 cycles * 1.25ns = 32.5ns for a read. That's just for the command to the DRAM and the read return. The rest of the time is how long it takes a read request to get from the CPU to the memory controller to the DRAM and for read data to make the return trip. In a high end x86 desktop chip the memory controller is a part of the CPU. The path to/from the controller is made as short as possible and can run at high clock speeds. In the 360 the CPU is a separate chip from the GPU chip which has the memory attached to it. It takes additional time for the CPU to send the request from one chip to the other, for the request to go from that interface to the memory controller on the GPU and the reverse. This can add quite a bit of latency. Any system where the memory controller is on the same chip as the CPU can have much lower latency.
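
Putting rough numbers on that split (the 32.5 ns is the DRAM-side figure worked out above; the controller/interconnect figures are invented placeholders chosen only to roughly match the latencies being discussed, not measurements):

```cpp
// Illustrative breakdown of total read latency: the DRAM access itself vs the
// trip to/from the memory controller. The 32.5 ns DRAM figure is the one worked
// out above (26 cycles * 1.25 ns); the controller/interconnect numbers are
// placeholders just to show why an off-chip controller hurts.
#include <cstdio>

int main() {
    const double cycle_ns = 1.25;                       // 800 MHz DDR3 IO clock
    const double dram_ns  = (11 + 11 + 4) * cycle_ns;   // CL + tRCD + rest of burst = 32.5 ns

    const double on_die_controller_ns = 20.0;   // placeholder: request/return within one chip
    const double cross_chip_ns        = 120.0;  // placeholder: CPU -> bus -> GPU -> controller and back

    std::printf("Desktop (controller on CPU die): ~%.1f ns\n", dram_ns + on_die_controller_ns);
    std::printf("360-style (controller on GPU):   ~%.1f ns\n", dram_ns + cross_chip_ns);
}
```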

2) GDDR5 uses similar signaling to GDDR3. Pseudo open drain and pull up termination, but at lower voltage (1.2-1.5V rather than 1.8V). In order to push the interface faster there is additional overhead on the sending and receiving sides and logical changes to the interface. That overhead adds to the base latency. The DRAM core has roughly the same latency as DDR3 but the GDDR5 IO layer imposes that extra latency penalty. For that extra cost you gain the ability to send data a lot faster. As a result, GDDR5 latency is a bit higher than DDR3 latency in absolute terms, but it's not a huge difference.
 
Mario, even at 550 MHz, would be considerably faster than Xenos. Even with 1/4 of the GPU removed it would still be better.

The Wii U has shown nothing so far to indicate it is better than RV730 (or even as fast as that).

But it would actually cost more to do something like that than it would to shrink an RV730 to 40nm. There's absolutely no reason for it to be 55nm. I know how you plan to respond to this, so instead please give a reason.
 