Would GTA V be possible on Wii U?

Heck, Scarface: The World Is Yours was a pretty good open-world game on the Wii; honestly, it had more NPCs running around than Watch Dogs does.
But at what AI complexity? You can't just go by body count or on-screen results and conclude that games with a comparable amount of content require comparable workloads and should perform similarly on the same hardware. AI is one of those features that can expand limitlessly. Heck, we haven't got a computer with enough power to simulate one human brain yet, so clearly simulating a city is going to be an impossibly complex job - one so complex that the solutions actually used aren't the slightest bit realistic, but the approximations can always be improved to provide more realistic results.

Against the in-order 3.2GHz PPE in PlayStation 3?
Why are we ignoring the SPEs for AI? Ray tracing for line-of-sight and audio awareness, spatial searching for proximity tests, etc. are part of AI and great (or moderate) fits for SPEs.
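To make that concrete, here's a toy sketch (purely illustrative, not from any real engine) of the sort of batch proximity test I mean - one flat array in, one flat array out, the same maths applied to every element, which is exactly the shape of work SPEs (or any SIMD hardware) chew through before handing results back to the decision-making code:

```cpp
#include <cstddef>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Toy batch job: flag every NPC within 'radius' of the player.
// One flat input array, one flat output array, no branching on shared
// state -- the results feed into the per-agent decision code afterwards.
void proximity_pass(const Vec3* npcPos, std::size_t count,
                    Vec3 player, float radius, std::uint8_t* inRange)
{
    const float r2 = radius * radius;
    for (std::size_t i = 0; i < count; ++i)
    {
        const float dx = npcPos[i].x - player.x;
        const float dy = npcPos[i].y - player.y;
        const float dz = npcPos[i].z - player.z;
        inRange[i] = (dx * dx + dy * dy + dz * dz) <= r2 ? 1 : 0;
    }
}
```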
 
Why are we ignoring the SPEs for AI? Ray tracing for line-of-sight and audio awareness, spatial searching for proximity tests, etc. are part of AI and great (or moderate) fits for SPEs.

These calculations are necessary to feed into AI but aren't AI themselves. The core AI is taking all of these inputs (what can I see, what can I hear, where am I positioned in relation to the world, what is my health, what is my motivation/goal, what are my capabilities, what can I shoot, what should I avoid, where can I take cover, etc.) and making a decision about what the subject is going to do. The SPUs' strength doesn't lie in non-linear processing, and having to load each parallelised AI job with all of this data, yet still needing to react to other AIs whose data isn't accessible because it's on another SPU, makes this type of processing better suited to the PPE.

Not impossible but far from optimum.
 
AI agents wouldn't need to be aware of other AIs' decision making (in fact, that'd go beyond AI to precognition!). Each agent only needs to evaluate its own place. This is serial processing. And I'm sure clever algorithms can create fancy multi-dimensional datasets that encapsulate the state of play for key objects for fast evaluation, calculating them all individually and then creating an overarching representation.

But even if not, Wii U's maths powers are poop, and maths is important in AI - calculating distances, intercepts, collisions, mathematical weightings, etc. Cell may not be the best at churning through finite state machines, but Wii U certainly has its own fair share of AI shortcomings, such that it shouldn't be assumed Espresso can handle everything AI-wise that PS360 can.
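If it helps, here's roughly the kind of per-agent evaluation I'm picturing (a hypothetical sketch, not any shipping game's AI): every agent reads the same read-only snapshot of the world and scores a handful of options with simple weighted maths, so agents can be evaluated one after another without peeking into each other's decision making. The playerVisible input would come from exactly the sort of line-of-sight pass mentioned earlier.

```cpp
#include <algorithm>

// Read-only snapshot of what one agent can perceive this frame.
struct WorldSnapshot {
    float distanceToPlayer;
    float playerVisible;     // 0 or 1, fed in from a line-of-sight pass
    float nearestCoverDist;
};

struct AgentState {
    float health;            // 0..1
    float aggression;        // designer-tuned weighting, 0..1
};

enum class Action { Idle, Attack, TakeCover, Flee };

// Score a few candidate actions with weighted maths and pick the best.
// This is the "core AI" decision step, and it's mostly number crunching.
Action decide(const AgentState& me, const WorldSnapshot& w)
{
    const float attack = me.aggression * w.playerVisible / (1.0f + w.distanceToPlayer);
    const float cover  = (1.0f - me.health) / (1.0f + w.nearestCoverDist);
    const float flee   = (1.0f - me.health) * (1.0f - me.aggression);
    const float idle   = 0.05f;

    const float best = std::max({attack, cover, flee, idle});
    if (best == attack) return Action::Attack;
    if (best == cover)  return Action::TakeCover;
    if (best == flee)   return Action::Flee;
    return Action::Idle;
}
```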
 
AI agents wouldn't need to be aware of other AIs' decision making (in fact, that'd go beyond AI to precognition!). Each agent only needs to evaluate its own place. This is serial processing. And I'm sure clever algorithms can create fancy multi-dimensional datasets that encapsulate the state of play for key objects for fast evaluation, calculating them all individually and then creating an overarching representation.
You're right. AIs don't need to react to other AIs instantly; it'd be more realistic to leave it a frame or two to mimic human reactions.
 
AI agents wouldn't need to be aware of other AIs' decision making (in fact, that'd go beyond AI to precognition!). Each agent only needs to evaluate its own place. This is serial processing. And I'm sure clever algorithms can create fancy multi-dimensional datasets that encapsulate the state of play for key objects for fast evaluation, calculating them all individually and then creating an overarching representation.

But even if not, Wii U's maths powers are poop, and maths is important in AI - calculating distances, intercepts, collisions, mathematical weightings, etc. Cell may not be the best at churning through finite state machines, but Wii U certainly has its own fair share of AI shortcomings, such that it shouldn't be assumed Espresso can handle everything AI-wise that PS360 can.
You assume that Wii U's CPU can't do it and you probably ignore advantages that it has...

Espresso is an out-of-order-execution (OoOE) design; Xenon and Cell, by comparison, are in-order-execution designs.

"The key concept of OoOE processing is to allow the processor to avoid a class of stalls that occur when the data needed to perform an operation are unavailable. In the outline above, the OoOE processor avoids the stall that occurs in step (2) of the in-order processor when the instruction is not completely ready to be processed due to missing data.

OoOE processors fill these "slots" in time with other instructions that are ready, then re-order the results at the end to make it appear that the instructions were processed as normal. The way the instructions are ordered in the original computer code is known as program order, in the processor they are handled in data order, the order in which the data, operands, become available in the processor's registers. Fairly complex circuitry is needed to convert from one ordering to the other and maintain a logical ordering of the output; the processor itself runs the instructions in seemingly random order.

The benefit of OoOE processing grows as the instruction pipeline deepens and the speed difference between main memory (or cache memory) and the processor widens. On modern machines, the processor runs many times faster than the memory, so during the time an in-order processor spends waiting for data to arrive, it could have processed a large number of instructions."

http://en.wikipedia.org/wiki/Out-of-order_execution#Basic_concept
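A trivial example of what that means for game code (illustrative only, not tied to either CPU's actual scheduling):

```cpp
#include <cstddef>

// 'table' is assumed cold in cache, so (1) is likely a long miss.
int example(const int* table, std::size_t index, int x, int y, int z)
{
    int lookup = table[index];   // (1) load -- can stall for a long time on a cache miss

    int a = x * 3 + y;           // (2) independent of the load
    int b = (a << 2) ^ z;        // (3) independent of the load

    return lookup + b;           // (4) the first point that actually needs the loaded value
}
```

An in-order core has to sit at (1) until the miss is serviced before it can issue anything else; an out-of-order core can run (2) and (3) during the miss and only waits at (4) - the "filling the slots" the quote describes.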

Espresso has 3MB of L2 cache compared to Xenon's 1MB and Cell's 512KB
All three have roughly 64KB (32KB/32KB instruction/data) of L1 cache per core


Each Espresso core has its own L2 cache
Core 0 512KB - Core 1 2MB - Core 2 512KB
Each L2 cache is 4-way set-associative, compared to 2-way on GameCube/Wii
Each L2 cache is 2-sectored
The L1 cache is 8-way set-associative
6 execution units per core, for a total of 18 execution units

Sectored Cache
"As Raf suggests, the SPG manuals volumes 1 and 2 are great resources for gleaning details such as these, and at the risk of repeating some of the useful information he passed along, here's an overview of memorycaching (as opposed to some of the other caching done in the processor). Cache types fall in a spectrum from Direct Mapped (where every line in memory has its own cache line) to Fully Associative (where every cache line can hold locally the contents of every memory line). N-Way Set-Associative caches lie somewhere in the middle: each cache line can hold the contents of some set of memory lines, which are evenly dispersed and interleaved through memory, and as noted in the text Adrien quoted, reduces the number of cache lines the processor must examine in order to determine whether there was a hit.. Cache line size may vary per architecture as can the number of ways in the caches (and whether the caches are Write Back or Write Through and other esoteric features). Pentium 4 and Intel Xeon processors have a sectored L2 (and L3 if present) cache, which for all practical purposes means that if adjacent sector prefetch is enabled, a request for one cache line of an associated pair of cache lines (Intel implementations all use 128-byte/ 2 cache-line sectors) will also generate a prefetch for the other cache line in the pair on the speculation that it will be needed eventually anyway. This is one of at least four kinds of hardware prefetch supported in current processors. There are a few specialized cases where application of software prefetch (in the form of an actual instruction in the stream) can hide some memory latency by starting the fetch early, but generally it is better to let the machine figure out when to prefetch, since optimal conditions vary from architecture to architecture."
https://software.intel.com/en-us/forums/topic/302355
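In case the "4-way set-associative" and "2-sectored" terms above aren't clear, here's a minimal sketch of how an address gets split up - generic, assumed numbers (64-byte lines, 512KB of cache), not taken from any Espresso or Xenon documentation:

```cpp
#include <cstdint>
#include <cstdio>

// Generic N-way set-associative geometry -- illustrative numbers only.
constexpr std::uint32_t kLineSize  = 64;                         // bytes per cache line (assumed)
constexpr std::uint32_t kWays      = 4;                          // 4-way set-associative
constexpr std::uint32_t kCacheSize = 512 * 1024;                 // e.g. one 512KB L2 bank
constexpr std::uint32_t kNumSets   = kCacheSize / (kLineSize * kWays);

int main()
{
    const std::uint32_t address = 0x12345678;

    const std::uint32_t offset = address % kLineSize;               // byte within the line
    const std::uint32_t set    = (address / kLineSize) % kNumSets;  // which set to search
    const std::uint32_t tag    = address / (kLineSize * kNumSets);  // compared against the 4 tags in that set

    // "2-sectored": lines are paired, so touching one line of a pair can
    // also trigger a prefetch of its buddy.
    const std::uint32_t buddyLine = (address / kLineSize) ^ 1u;

    std::printf("offset=%u set=%u tag=%u buddy line=%u\n",
                (unsigned)offset, (unsigned)set, (unsigned)tag, (unsigned)buddyLine);
    return 0;
}
```

The point of the associativity is that a lookup only has to compare against the few tags in one set rather than every line in the cache; the sectoring means a fetch of one line of a pair can also prefetch its buddy.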

Xenon has a dynamically shared L2 cache between its cores
Evenly split, that would mean 341.3KB per core or 170.6KB per thread
The L2 cache is 8-way set-associative
The L1 instruction cache is 2-way set-associative
The L1 data cache is 4-way set-associative
5(?) execution units per core, for a total of 15(?) execution units

Cell likely has the same kind of dynamically shared L2 cache as Xenon
Evenly split, that would mean 170.6KB per core or 85.3KB per thread
The L2 cache is ?-way set-associative
The L1 instruction cache is ?-way set-associative
The L1 data cache is ?-way set-associative

...continuing...
Espresso has a 4-6 stage pipeline compared to Xenon's and Cell's 32-40 stage pipeline

"It's hard to reduce power consumption in a deeply pipelined processor. Xbox 360 CPU had the longest pipeline in history, approaching 40 stages. Microsoft had to cut this down to 13~15 pipes to reduce power consumption in order for the SOC to fit into a Roku like box, meaning it is basically a new CPU sharing instruction set, not a die shrunk version of old CPU"
http://www.psu.com/forums/showthrea...i-and-Durango(MisterXmedia-Being-Vindicated-)


"I believe if you program only against one main CPU (like we do for pretty much most emus), you would find that the PS3/Xenon CPUs in practice are only about 20% faster than the Wii CPU.

I've ported the same code over to enough platforms by now to state this with confidence - the PS3 and 360 at 3.2GHz are only (at best - I would stress) 20% faster than the 729Mhz out-of-order Wii CPU without multithreading (and multithreading isn't a be-all end-all solution and isn't a 'one size fits all' magic wand either). That's pretty pathetic considering the vast differences in clock speed, the increase in L2/L1 cache and other things considered - even for in-order CPUs, they shouldn't be this abysmally slow and should be totally leaving the Wii in the dust by at least 50/70% difference - but they don't."
http://gbatemp.net/threads/retroarch-a-new-multi-system-emulator.333126/page-7#post-4365165


http://www.avsforum.com/forum/141-xbox-area/758390-xbox-360-vs-ps3-processor-comparison.html

http://forums.macrumors.com/showpost.php?p=1633076&postcount=3

Wii U's CPU is an SMP design, which has its own minor advantages over other multiprocessing arrangements.

https://software.intel.com/en-us/bl...rence-between-multi-core-and-multi-processing

http://en.wikipedia.org/wiki/Symmetric_multiprocessing

It can handle it and do it better; it's painfully obvious despite you trying to convince/persuade everyone that it "apparently can't do it"... It's like saying a Core 2 Duo can't beat a Pentium D, which is two Pentium 4s duct-taped together.

You can play the SIMD and SPE card if you want as a last resort... It can be done on Wii U's GPU.

"Next you would think that the PS3 (just like the 360) would be able to segment the game control plus AI code into one core and the graphics rendering code into another core. However that is not possible! Since the total application code may be about 100 MB and the SPE only has 256KB of memory, only about 1/400 of the total code can fit in one SPE memory. Also since there isn't any branch prediction capabilities in an SPE, branching should be done as little as possible (although I believe that the complier can insert code to cause pre-fetches so there may not be a big issue with branching).

Therefore the developer has to find code that is less than 256KB (including needed data space) that will execute in parallel.

Even if code can be found that can be segmented, data between the PPE and the SPE has to be passed back and forth via DMA, which is very slow compared to passing a pointer to the data like on the 360.

If we assume that enough segmented code was found that could use all 6 SPE cores assigned to the game application, the developer would now try to balance the power among the cores. Like the 360, some or all of the cores may have a very low utilization. Adding more hardware threads is not possible since each core has only one hardware thread. Adding software threads probably will not work due to the memory constraint. So the only option is an overlay scheme where the PPE will transfer new code using DMA to the SPE when the last overlay finishes processing. This is very time-consuming, and code has to be found that does not overlap in the same time frame."

Wii U's GPU has the Wii GPU within it, and thus inherited its 1MB SRAM texture cache and 2.25MB framebuffer, which likely can serve a different role; otherwise it would be a waste of silicon for Nintendo.
 

You have quoted a lot of other people's opinions from 2006/7. I'm not really going to address the individual problems in this post because I feel there are people on this forum with far more knowledge who can do so. But I would suggest you fact-check things before you post them; just because someone said it on the internet doesn't make it so.
 
You have quoted a lot of other people's opinions from 2006/7. I'm not really going to address the individual problems in this post because I feel there are people on this forum with far more knowledge who can do so. But I would suggest you fact-check things before you post them; just because someone said it on the internet doesn't make it so.

Opinions of people that worked on the hardware, and they still have relevance today...

Just noticed Cell has only one thread per core...
 
Opinions of people that worked on the hardware, and they still have relevance today...

Just noticed Cell has only one thread per core...

The majority of your argument comes from the following three posts.

http://www.avsforum.com/forum/141-xbox-area/758390-xbox-360-vs-ps3-processor-comparison.html
http://forums.macrumors.com/showpost.php?p=1633076&postcount=3

and this

http://gbatemp.net/threads/retroarch-a-new-multi-system-emulator.333126/page-7#post-4365165

The first two don't even mention anything about the people posting having done any work whatsoever on either of the consoles, and the third doesn't give enough information to provide context. Whilst I have no doubt the 'only 20% faster than the Wii' claim is true in at least one case, I doubt it is true in the majority of cases.

Also, whoever wrote the information about the Cell's internal EIB speed being slow clearly has no idea what they are talking about; it has a peak bandwidth of 25.6GB/s per client (where each SPE and the PPE is a client, as are a bunch of other devices), and it is what data from external system RAM travels over to get into the Local Store / caches.

Can you explain why the Wii U's CPU approach is better than the PS3's approach, or say the Xbox 360's approach?
 
I'm bored and have some time, so I'll bite.

You can play the SIMD and SPE card if you want as a last resort
IIRC every console CPU since the PS2 has had SIMD, and SPEs are inherently a part of Cell's architecture; to not take them into account is reducing Cell to a dual-core 1.6GHz in-order dual-issue (I think) CPU - they are meant to be used by design.

"Next you would think that the PS3 (just like the 360) would be able to segment the game control plus AI code into one core and the graphics rendering code into another core. However that is not possible! Since the total application code may be about 100 MB and the SPE only has 256KB of memory, only about 1/400 of the total code can fit in one SPE memory. Also since there isn't any branch prediction capabilities in an SPE, branching should be done as little as possible (although I believe that the complier can insert code to cause pre-fetches so there may not be a big issue with branching).
In the history of "normal applications" I'm rather certain there has never been code 100MB in size or anywhere near that, and since all code exhibits locality, breaking large code into smaller pieces becomes an exercise in caching. In addition, if you've ever coded or looked into coding a game/3D engine, you would know the access patterns to large data sets don't always exhibit the problem you're pointing out. SPEs do have branch hint instructions.

Even if code can be found that can be segmented, data between the PPE and the SPE has to be passed back and forth via DMA, which is very slow compared to passing a pointer to the data like on the 360.
IIRC SPEs can initiate DMA transactions on their own and therefore traverse complex data structures on their own; I don't remember if there are "special ways" to get data from the PPU cache to an SPE LS without hitting memory first. If I'm not mistaken, software pipelining was used to mitigate the latency incurred by DMA transfers, but I could be wrong - don't really remember.
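On the software-pipelining point, the usual trick is double buffering - kick off the DMA for chunk N+1 and process chunk N while it's in flight. A rough sketch of the shape of it, with made-up dma_start/dma_wait helpers standing in for the real MFC calls (so treat it as pseudo-ish, not actual SDK usage):

```cpp
#include <cstddef>
#include <cstdint>

// Made-up stand-ins for the real MFC/DMA intrinsics -- declarations only.
void dma_start(void* localBuf, std::uint64_t mainMemAddr, std::size_t bytes, int tag);
void dma_wait(int tag);
void process(float* data, std::size_t count);    // whatever work this SPE job does

constexpr std::size_t kChunkFloats = 4096;        // two of these fit easily in 256KB of local store

void spe_job(std::uint64_t srcAddr, std::size_t totalChunks)
{
    static float bufA[kChunkFloats];
    static float bufB[kChunkFloats];
    float* current = bufA;
    float* next    = bufB;

    dma_start(current, srcAddr, sizeof(bufA), /*tag=*/0);  // fetch chunk 0

    for (std::size_t i = 0; i < totalChunks; ++i)
    {
        dma_wait(/*tag=*/0);                               // chunk i has landed in 'current'

        if (i + 1 < totalChunks)                           // start fetching chunk i+1 into 'next'
            dma_start(next, srcAddr + (i + 1) * sizeof(bufA), sizeof(bufA), /*tag=*/0);

        process(current, kChunkFloats);                    // work on chunk i while i+1 is in flight

        float* tmp = current;                              // swap buffers for the next iteration
        current = next;
        next = tmp;
    }
}
```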

Wii U's GPU has the Wii GPU within it, and thus inherited its 1MB SRAM texture cache and 2.25MB framebuffer, which likely can serve a different role; otherwise it would be a waste of silicon for Nintendo.
That is a brazen assumption on your part.

Anyway this is all kinda OT for this thread so I'll shut up now, sorry for the interruption.
 
I'm bored and have some time, so I'll bite.


IIRC every console CPU since the PS2 has had SIMD (You're assuming I didn't know), and SPEs are inherently a part of Cell's architecture; to not take them into account is reducing Cell to a dual-core 1.6GHz in-order dual-issue (I think) CPU - they are meant to be used by design. Cell is tri-core 3.2GHz... It's dual-issue.

In the history of "normal applications" I'm rather certain there has never been code 100MB in size or anywhere near that, and since all code exhibits locality, breaking large code into smaller pieces becomes an exercise in caching. In addition, if you've ever coded or looked into coding a game/3D engine, you would know the access patterns to large data sets don't always exhibit the problem you're pointing out. SPEs do have branch hint instructions. Really? Ok.


IIRC SPEs can initiate DMA transactions on their own and therefore traverse complex data structures on their own; I don't remember if there are "special ways" to get data from the PPU cache to an SPE LS without hitting memory first. If I'm not mistaken, software pipelining was used to mitigate the latency incurred by DMA transfers, but I could be wrong - don't really remember.


That is a brazen assumption on your part.

Anyway this is all kinda OT for this thread so I'll shut up now, sorry for the interruption.

"We use the eDRAM in the Wii U for the actual framebuffers, intermediate framebuffer captures, as a fast scratch memory for some CPU intense work and for other GPU memory writes."

http://hdwarriors.com/general-impression-of-wii-u-edram-explained-by-shinen/

They can access the 32MB pool on the GPU directly with the CPU, so it's probably possible to access the other two pools as well, if they're not already in use...
 
"We use the eDRAM in the Wii U for the actual framebuffers, intermediate framebuffer captures, as a fast scratch memory for some CPU intense work and for other GPU memory writes."

http://hdwarriors.com/general-impression-of-wii-u-edram-explained-by-shinen/

They can access the 32MB pool on the GPU directly with the CPU, so it's probably possible to access the other two pools as well, if they're not already in use...

The Cell has a single PPE at 3.2GHz and 8 SPEs at 3.2GHz. You could easily argue that each SPE counts as a core, even if they are cut down.
 
As well, each SPE (6 of which are available for general use) has 256KB of very fast local storage.

There's a lot of stuff I have issues with, above...

"I believe if you program only against one main CPU (like we do for pretty much most emus), you would find that the PS3/Xenon CPUs in practice are only about 20% faster than the Wii CPU.

Especially this... no way, however you spin it, does this make sense. Only if you drop the SPEs in CELL and program to one HW thread on CELL might this be correct. And even then, I am unsure. The Wii CPU is not just old. It's freakishly old. And additionally, it's not clocked high, either. It might be out of order, but the architecture surrounding it doesn't make OoOE the magic bullet to combat a CPU that is clocked nearly 5 times as high. And that is ignoring the fact that XeCPU has VMX128 and several other SIMD advances the Wii CPU completely lacks. Each SPE has 8 times the processing power of a single Broadway! EACH ONE. Even if you program it really badly, you have 6 of them PLUS the PPE...

If we're talking Wii U, the story is different. But even then, I'd not dismiss CELL. It helped the PS3 combat the XeGPU, which was hands down much faster than RSX, yet the game differences (later on, that is) didn't show this.
 
You assume that Wii U's CPU can't do it and you probably ignore advantages that it has...
You can't read (or you jumped on one post and didn't bother reading the thread). I had acknowledged Wii U has some strengths, but I was challenging the view that Cell and Xenon were lacking by pointing out that a good part of modern AI involves lots of maths, and that's an Espresso weakness.

You then go on to talk completely incoherently about code not fitting in SPE local store and such madness. Umm... no processor on the planet can fit 100MB of code on chip for the CPU to use (save maybe Intel's eDRAM chips!). All of them stream the code into caches measured in kilobytes.

You're possibly someone we've removed from this board before. Certainly you're someone being removed now as not being capable of contributing to sane discussion on the board.

I don't remember if there are "special ways" to get data from the PPU cache to an SPE LS without hitting memory first.
The ring bus, I believe. Yep.
 
IIRC every console CPU since the PS2 has had SIMD (You're assuming I didn't know), and SPEs are inherently a part of Cell's architecture; to not take them into account is reducing Cell to a dual-core 1.6GHz in-order dual-issue (I think) CPU - they are meant to be used by design. Cell is tri-core 3.2GHz... It's dual-issue.
I'm not assuming anything; you were talking about "pulling cards" - how can you pull a card that is in everyone's hand? Cell has one PPU @ 3.2GHz, and it handles two threads - IIRC they are scheduled in a static round-robin fashion - hence the 1.6GHz number.

"We use the eDRAM in the Wii U for the actual framebuffers, intermediate framebuffer captures, as a fast scratch memory for some CPU intense work and for other GPU memory writes."
Assuming your source is accurate, thank you, I learned something new; eDRAM/eSRAM for GPUs, as far as I know, has never been implemented as directly addressable by the CPU or any other bus master besides (maybe) DMA. You also mention the texture cache... I see no support for that claim in your link. However, I'd like to point out that you raise problems with Cell's local stores yet don't point out the problems with this access pattern - why?

The ring bus, I believe. Yep.
While everything is hooked up to the EIB, caches aren't normally directly addressable, so unless something special was built into the bus interface of the PPU... but I might be missing something with regard to cache coherence and DMA (a dirty bit causing the transfer to hit the cache). I looked through your article and couldn't find anything to clear that up.
 
What do you mean by 'hitting memory'? You can send a package from any core to any other core via the EIB without having to go out to and back from RAM.
 
What do you mean by 'hitting memory'? You can send a package from any core to any other core via the EIB without having to go out to and back from RAM.
Aren't all accesses to an SPU's local store done through DMA? If so, unless the DMA unit is cache coherent, you can't go straight from cache to local store; hence the cache line(s) would need to be flushed (used to update RAM - I'm not sure that is the right term) first. I'm unclear on this behaviour and never really bothered looking it up.

BTW - I looked up the Wii U CPU and according to Wikipedia it can retire four instructions per clock; I'm not sure how many of those are ALU ops though.
 
Aren't all accesses to an SPU's local store done through DMA? If so, unless the DMA unit is cache coherent, you can't go straight from cache to local store; hence the cache line(s) would need to be flushed (used to update RAM - I'm not sure that is the right term) first. I'm unclear on this behaviour and never really bothered looking it up.

BTW - I looked up the Wii U CPU and according to Wikipedia it can retire four instructions per clock; I'm not sure how many of those are ALU ops though.
Apparently the PPU can access SPU local store directly, but it is more efficient to use DMA.
https://books.google.fi/books?id=7g...wOp9YD4DQ&ved=0CDIQ6AEwAA#v=onepage&q&f=false
 
The only piece of that rant that is rooted in fact is the Shin'en quote about the eDRAM in the Wii U. Shin'en did say they can access the GPU's eDRAM directly with the CPU. Wii U is tightly engineered from that perspective. Shin'en did an interview with HD Warriors where they expressed a positive opinion of Wii U's memory setup, stating that Nintendo's engineering avoided typical stalls due to high memory latency. I don't know enough about coding to know just how much code is super memory-latency sensitive, but it does appear to be a positive for the Wii U.
 