Predict: The Next Generation Console Tech

It has been said that only one of those controllers with a screen can be attached to the Wii U at a time.

Isn't there some video of 4 people playing a split-screen game, each with a controller?

Say it streams a single image to all controllers, then each controller chops out the quadrant of the image it requires.

Then also, if the console is dual-headed, it could even send different images to each controller and have a different image on the screen.

Still, the cost of 4 controllers is going to be high (though somehow Nintendo seems to be able to sell their Wiimotes for a crazy price). I bet they get a cheap price on screens these days too.
 
Isn't there some video of 4 people playing a split-screen game, each with a controller?
I've only seen a 4-way split TV with regular Wiimotes plus a 5th person on the new controller. If you have any other footage with more than one of the new controllers attached, feel free to share :)
Say it streams a single image to all controllers, then each controller chops out the quadrant of the image it requires.
Well, some kind of broadcasting could work, I guess. Sending a separate stream with all the screens to each device would be an awful waste of bandwidth.
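
A minimal sketch of the "chop out your quadrant" idea, assuming the console broadcasts one combined 2x2 frame and each controller knows its player index (the function, layout and array shapes here are purely illustrative, not anything Nintendo has described):

```python
import numpy as np

def crop_quadrant(frame, player_index):
    """Cut one player's view out of a broadcast 2x2 split frame.

    frame: H x W x 3 array holding all four views tiled together.
    player_index: 0..3, row-major (0 = top-left, 3 = bottom-right).
    """
    h, w = frame.shape[0] // 2, frame.shape[1] // 2
    row, col = divmod(player_index, 2)
    return frame[row * h:(row + 1) * h, col * w:(col + 1) * w]

# Example: four 800x480 views tiled into one 1600x960 broadcast frame.
broadcast = np.zeros((960, 1600, 3), dtype=np.uint8)
player_2_view = crop_quadrant(broadcast, 2)   # bottom-left 480x800 slice
```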

Though assuming one screen is 800x480, a 30FPS uncompressed stream would need roughly 35MB/s of transfer speed (images + sound). It's probably possible to squeeze it down to a few MB/s, but having more than one controller around would "fill up" the air really fast.
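
Back-of-the-envelope version of that 35MB/s figure (assuming 24-bit colour and 16-bit stereo audio; those are my assumptions, nothing confirmed):

```python
# Rough uncompressed bandwidth for one 800x480 controller screen at 30 FPS.
width, height, fps = 800, 480, 30
bytes_per_pixel = 3                               # assuming 24-bit RGB

video = width * height * bytes_per_pixel * fps    # ~34.6 MB/s
audio = 44_100 * 2 * 2                            # 44.1kHz, 16-bit, stereo

print(f"{(video + audio) / 1e6:.1f} MB/s per controller, uncompressed")  # ~34.7
```

Four controllers would therefore be well over 100MB/s of raw pixels, so heavy compression (or a one-screen-controller limit) looks hard to avoid.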
Then also, if the console is dual-headed, it could even send different images to each controller and have a different image on the screen.
GPUs in PCs can handle rendering several different 3D windows in parallel just fine; I don't think it's any different in consoles.
 
I don't think IBM uses EDRAM to work around bandwidth constraints. It seems to be about cache density in the case of the POWER7 and a low-power cache in the PowerPC A2.
Here's a presentation about the PowerPC A2 where they explain what they do with the EDRAM in the L2.

As it is off-die, it is unlikely to reduce latency, so only bandwidth seems to be the motivation.

L3 EDRAM is done for different reasons. Server workloads are very different from gaming workloads.
 
a) Look at the prices of POWER7-based systems. There is a reason others aren't using it.
To be fair though, that has more to do with the overall system architecture (and lack of volume) than with the chip itself, even though it is clearly more expensive to produce than any PC part.

b) CPUs run latency-sensitive workloads, so enhancing bandwidth for such an architecture instead of for GPUs is counterintuitive.
I take it you have an interest in GPUs? CPUs are general-purpose computing devices, and they need as much bandwidth as they can get, preferably all at single-cycle latency. Since that isn't possible, compromises have to be made, and the POWER7 does what it can. It greatly reduced the latency of its L1 and L2 caches; the EDRAM L3 pool has higher latency than its predecessors', but on the other hand offers a much larger pool at 32MB. For main memory it has dual 256-bit buses for just over 100GB/s.
There are a large number of sites that buy these systems precisely because they offer a reasonable memory hierarchy.

The reason CPUs have wimpy memory bandwidth relative to their ALU capabilities is rather that bandwidth costs money, particularly when we are not talking about GPU-style amounts of RAM soldered to the PCB.
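
For what it's worth, the "just over 100GB/s" figure above falls out of simple arithmetic if you assume two 256-bit channels at DDR3-1600-class signalling rates; POWER7's real memory subsystem (buffered DIMMs and so on) is more involved than this sketch:

```python
# Peak bandwidth estimate for a dual 256-bit memory interface.
channels = 2
bus_width_bits = 256
transfers_per_sec = 1.6e9      # assuming 1600 MT/s signalling

bytes_per_transfer = channels * bus_width_bits // 8   # 64 bytes per transfer
peak_gb_s = bytes_per_transfer * transfers_per_sec / 1e9
print(f"~{peak_gb_s:.0f} GB/s peak")                   # ~102 GB/s
```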

c) I am hoping that it is a single-chip system with eDRAM mainly intended for the GPU. That would be cool. :)
Funny, because I hope just the opposite: separate CPU and GPU, probably talking to a single pool of RAM, since the large amount of L3 on the CPU helps decouple it from a main memory channel hogged by the GPU. If the GPU needs an embedded pool of RAM, then that is where it makes the most sense to put it, so if it's not a single CPU/GPU die (and there is nothing implying it is), I think we can assume that the EDRAM is intended primarily for the CPU.
 
I don't think IBM uses EDRAM to work around bandwidth constraints. It seems to be about cache density in the case of the POWER7 and a low-power cache in the PowerPC A2.
Here's a presentation about the PowerPC A2 where they explain what they do with the EDRAM in the L2.

The total 8MB of L2 looks tiny on that chip!

As for the derivative for consoles, I believe we can explain the L3. POWER7 has only 64K L1 and 256K L2, so it's similar to Sandy Bridge and Nehalem. Even the lowest-grade Sandy Bridge, the Pentium G620, has 3MB of L3; its gaming performance is high despite artificial crippling (disabled HT).
The Phenom II is quite a bit better at gaming than the Athlon X4 too, even if you can live without the L3 there.

As Entropy says, it's also useful if the GPU is using shared memory and you have to feed the 4-way SMT cores, so an 8MB L3 that takes little area feels very reasonable.
 
The total 8MB of L2 looks tiny on that chip!

As for the derivative for consoles, I believe we can explain the L3. POWER7 has only 64K L1 and 256K L2, so it's similar to Sandy Bridge and Nehalem. Even the lowest-grade Sandy Bridge, the Pentium G620, has 3MB of L3; its gaming performance is high despite artificial crippling (disabled HT).
The Phenom II is quite a bit better at gaming than the Athlon X4 too, even if you can live without the L3 there.

As Entropy says, it's also useful if the GPU is using shared memory and you have to feed the 4-way SMT cores, so an 8MB L3 that takes little area feels very reasonable.
I'm not stating that the CPU is a derivative of either the PowerPC A2 or the POWER7, just that IBM uses more and more DRAM in its processors. My belief is that the Wii U CPU is a "pure" custom chip; one may find resemblances to the A2 or the 7, but that's because the relevant choices made sense for the design.
An L2 made out of DRAM makes sense, as does the L3. The next POWER, imho, will use DRAM for the L2 as the A2 does (which is a newer design, by the way).
 
I don't know, L2 EDRAM feels very slow?

That A2 is pretty specific; it's made for embarrassingly parallel and lightweight tasks, such as web frontends with thousands of users (where you could use a Sun Niagara) and other networking workloads.

I imagine we will see a pretty straight derivative of POWER7, the way the PowerPC G5 was a POWER4 derivative. Xenon was much more custom, but there were timing and other constraints.
With more time and transistor budget, surely you can cut all the enterprise features and fat from the big chip, drop the core count to 4 or 3, and call it done; it's a very powerful and power-efficient design already.
 
Xenon was much more custom, but there were timing and other constraints.
With more time and transistor budget, surely you can cut all the enterprise features and fat from the big chip, drop the core count to 4 or 3, and call it done; it's a very powerful and power-efficient design already.

Xenon was derived from Cell's PPC
 
To be fair though, that has more to do with the overall system architecture (and lack of volume) than with the chip itself, even though it is clearly more expensive to produce than any PC part.
It also has a lot to do with IBM's disregard for cost in its target market.

I take it you have an interest in GPUs? CPUs are general-purpose computing devices, and they need as much bandwidth as they can get, preferably all at single-cycle latency. Since that isn't possible, compromises have to be made, and the POWER7 does what it can. It greatly reduced the latency of its L1 and L2 caches; the EDRAM L3 pool has higher latency than its predecessors', but on the other hand offers a much larger pool at 32MB. For main memory it has dual 256-bit buses for just over 100GB/s.
There are a large number of sites that buy these systems precisely because they offer a reasonable memory hierarchy.
They moved from SRAM to eDRAM because for L3, only size matters. They had no qualms about increasing the latency of L3. The process for CMOS eDRAM is pretty expensive.

The reason CPUs have wimpy memory bandwidth relative to their ALU capabilities is rather that bandwidth costs money, particularly when we are not talking about GPU-style amounts of RAM soldered to the PCB.
 
Performance with transparencies seems to make the Wii U demos suffer.


-----

I want to call it Wii Mu... like the Greek letter. >_>
 
The total 8MB of L2 looks tiny on that chip!

As for the derivative for consoles, I believe we can explain the L3. POWER7 has only 64K L1 and 256K L2, so it's similar to Sandy Bridge and Nehalem. Even the lowest-grade Sandy Bridge, the Pentium G620, has 3MB of L3; its gaming performance is high despite artificial crippling (disabled HT).
The Phenom II is quite a bit better at gaming than the Athlon X4 too, even if you can live without the L3 there.

As Entropy says, it's also useful if the GPU is using shared memory and you have to feed the 4-way SMT cores, so an 8MB L3 that takes little area feels very reasonable.

I used to have an Athlon II X4-equipped desktop alongside my current Phenom II X4-powered machine, and I could never really tell a difference in games. From online benchmarks, Far Cry 2 was one of a handful that actually saw some real gains from L3, but when both L3 and non-L3 equipped systems were running at 60+ FPS, it didn't really matter. Then again, these are PC games that have to be able to run with or without L3 available.
 
They won't be selling the controllers without the system, so there are no multiple-screen-controller scenarios to worry about.

Thanks for the info, because I was thinking that a scenario with up to 4 controllers with displays at the same time could maybe be a nightmare for the GPU, given (please correct me if I'm wrong) the expectation of something like a Radeon RV770 or even a Redwood "LE".
 
It has been said that only one of those controllers with a screen can be attached to the Wii U at a time.


Thank you for the information.

Please let me dream again... Is there any confirmation of the Wii U specs supposedly leaked recently on many forums/sites: a 4-core 3.5GHz POWER6, a 766MHz GPU, etc.?
 
Thanks for the info, because I was thinking that a scenario with up to 4 controllers with displays at the same time could maybe be a nightmare for the GPU, given (please correct me if I'm wrong) the expectation of something like a Radeon RV770 Redwood "LE".

What? "Redwood RV770"? Those are two completely different chips, ironically the older one (RV770) being vastly more powerful.
 
What? "Redwood RV770"? Those are two completely different chips, ironically the older one (RV770) being vastly more powerful.

My bad, typo, excuse me; see my previous posts* about expectations from RV770 to Redwood, ok? ;)

You've got a really interesting point, because the RV770 would be more powerful, but Redwood would have the advantage of DX11 (is any significant tessellation capacity expected for this GPU?).


* http://forum.beyond3d.com/showpost.php?p=1557842&postcount=6189
 
It also has a lot to do with IBM's disregard for cost in its target market.


They moved from SRAM to eDRAM because for L3, only size matters. They had no qualms about increasing the latency of L3. The process for CMOS eDRAM is pretty expensive.
Not only that; they explain that they also did it for power concerns:
Deep trench (DT) embedded DRAM (eDRAM) is used instead of SRAM in the four 2MB L2 caches. The 2MB L2 caches are implemented as 16 1Mb eDRAM macro instances, each composed of four 292Kb sub-arrays (264 WL × 1200 BL). The eDRAM cell measures 152×221nm² (0.0672µm²). This eDRAM implementation allows substantially larger cache sizes (3× > SRAM) with only ~20% of the AC and DC power [4]. Use of eDRAM provides a significant noise reduction benefit in the form of DT decoupling capacitance. DT provides at least a 25× improvement over thick-oxide dual-gate decoupling devices in capacitance per unit area. Coupled with a robust power distribution, DT decoupling reduced AC noise 30mV and reduced power more than 5W.
From what I get from the two papers linked on the wiki, they also use smaller "memory lines" (I guess that means cache lines?). I guess it's tied to the choice of DRAM vs SRAM, as the density gain makes up for the choice. In the end, IBM claims that their implementation (in the case of the A2, which is clocked pretty low) achieves the same result as SRAM in less than half the area for a fifth of the power.
The main problem is that it may not clock high enough to find its place in a POWER7, but rather in power-efficient throughput cores.
Depending on the clock speed of the Wii U CPU, it could make sense with regard to power consumption and to allow plenty of L3. It would not be the right choice for, say, the SnB L2.
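
A toy comparison using only the ratios from that quote (roughly 3x the density of SRAM at ~20% of the power); the SRAM baseline numbers below are placeholders purely for illustration:

```python
# Toy area/power comparison for an 8MB last-level cache, SRAM vs trench eDRAM.
cache_mb = 8
sram_mm2_per_mb = 2.0          # placeholder SRAM density, illustrative only
sram_w_per_mb = 0.25           # placeholder SRAM power, illustrative only

sram_area, sram_power = cache_mb * sram_mm2_per_mb, cache_mb * sram_w_per_mb
edram_area = sram_area / 3     # ~3x density claimed in the quoted paper
edram_power = sram_power * 0.2 # ~20% of the AC+DC power claimed in the paper

print(f"area:  {sram_area:.1f} mm^2 (SRAM) vs {edram_area:.1f} mm^2 (eDRAM)")
print(f"power: {sram_power:.1f} W (SRAM) vs {edram_power:.1f} W (eDRAM)")
```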

Anyway, I hope we will soon find out more about the innards of the Wii U. Honestly, I don't expect them to use EDRAM for the L2. IBM put a lot of work into the A2, and it's not a match for console needs.
 
I used to have an Athlon II X4-equipped desktop alongside my current Phenom II X4-powered machine, and I could never really tell a difference in games. From online benchmarks, Far Cry 2 was one of a handful that actually saw some real gains from L3, but when both L3 and non-L3 equipped systems were running at 60+ FPS, it didn't really matter. Then again, these are PC games that have to be able to run with or without L3 available.

That may be a case of +10-15% being more visible on bar charts than in actual framerate, then.
But it's also game-dependent.
And sure, the L3 got in there at least partly because it's needed by the Opteron (which is about the same CPU).
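
One way to see why a 10-15% gain shows up better on a bar chart than in play; this is just frame-time arithmetic at an assumed 60 FPS baseline:

```python
# Frame time saved by a 10-15% FPS gain at a 60 FPS baseline.
base_fps = 60.0
for gain in (0.10, 0.15):
    boosted = base_fps * (1 + gain)
    saved_ms = 1000 / base_fps - 1000 / boosted
    print(f"+{gain:.0%}: {base_fps:.0f} -> {boosted:.0f} FPS, "
          f"about {saved_ms:.1f} ms less per frame")
```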
 
That may be a case of +10-15% being more visible on bar charts than in actual framerate, then.
But it's also game-dependent.
And sure, the L3 got in there at least partly because it's needed by the Opteron (which is about the same CPU).

Well, I did a bit of research. GTA IV saw a small jump in FPS, and L4D2 especially benefited from L3. I still just don't see the need for L3 for anything gaming-related; I see it mostly being useful for practical stuff, and PC games are not practical, but they are a good indicator of practical overall performance (I guess :p).
 
Considering the size, barely bigger than the Wii, it's for sure a single-chip design with low power consumption. 3 CPU cores and a few R700 SIMDs seems most likely. As for RAM, I bet they went for 1GB of GDDR5.
 