Nintendo Switch Tech Speculation discussion

AlNom · Feb 22, 2017

Reservations can change as we all know...

Deleted member 13524 · Feb 22, 2017

AlNets said:
Reservations can change as we all know...

They can. My point was that if they could use the A53 module, they would most likely use it, even if only two of its cores.

Exophase · Feb 22, 2017

Lalaland said:
For the A53 cores to work alongside the A57s then NV would have had to solve their cache coherency issues with TX1 which would be a major customisation for Nintendo. If they had though why reserve any cores from the A57 cluster at all?

I don't believe nVidia said that there was a cache coherency limitation or issue with TX1. In fact, tech sites reported the opposite: that its interconnect supports cache coherency to make cluster switching more efficient. You may be thinking of the Exynos 5410, which was known to have a broken CCI.

But it was still made clear that TX1 only supports cluster-switching mode, i.e. one cluster active at a time. That can be a design limitation that doesn't involve caches or coherency at all. For instance, if the big and little cores are on the same power domain, there wouldn't be a way to power the two simultaneously while achieving appropriate independent DVFS, which would largely defeat the point. There could be other such resources, particularly in the interconnect, that are muxed between the two clusters and can't serve both simultaneously.
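To put rough numbers on the shared-power-domain point, here's a sketch using the standard CMOS approximation that dynamic power scales with C·V²·f. The voltage/frequency tables and relative capacitances are made-up illustrative values, not Tegra X1 operating points; the only point is that a shared rail drags the little cluster up to whatever voltage the big cluster needs.

```python
# Hypothetical DVFS tables: frequency (MHz) -> minimum stable voltage (V).
# Illustrative values only, NOT real Tegra X1 operating points.
A57_VF = {1020: 0.80, 1500: 0.90, 1900: 1.05}
A53_VF = {500: 0.70, 1000: 0.80, 1300: 0.90}

def dynamic_power(rel_cap, volts, mhz):
    """Relative dynamic power using P ~ C * V^2 * f (arbitrary units)."""
    return rel_cap * volts ** 2 * mhz

def total_power(a57_mhz, a53_mhz, shared_rail, a57_cap=1.0, a53_cap=0.3):
    v57, v53 = A57_VF[a57_mhz], A53_VF[a53_mhz]
    if shared_rail:
        # One voltage rail: both clusters must run at the higher voltage.
        v57 = v53 = max(v57, v53)
    return dynamic_power(a57_cap, v57, a57_mhz) + dynamic_power(a53_cap, v53, a53_mhz)

# Game maxing out the big cores while the OS idles along on the little ones.
independent = total_power(1900, 500, shared_rail=False)
shared = total_power(1900, 500, shared_rail=True)
print(f"independent rails: {independent:.1f}, shared rail: {shared:.1f}")
```

With these made-up numbers the A53 cluster's share grows by (1.05/0.70)² = 2.25x just because it has to sit on the A57's rail, while doing exactly the same work.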

Goodtwin · Feb 22, 2017

I looked through those developer documents, but never once did I come across a page showing the specs. Gaf has the specs chart in the original post, but I never came across that in the actual documents.

Anyone know what PS3 and 360 reserved for OS? I don't remember hearing about any reserved CPU cores or even time slice for those consoles.

Deleted member 11852 · Feb 22, 2017

I believe the 360 reserved around 32 MB and the PS3 around 120 MB near launch. The PS3's reservation dropped gradually over the console's lifetime, but it never got close to the 360's footprint. I really hope Nintendo have not reserved a ton of RAM for OS functions intended to run concurrently with the game.

Rikimaru · Feb 22, 2017

I doubt Nintendo will write video capture straight to NAND and wear it out. They'll reserve a good chunk of RAM for that.

function · Feb 22, 2017

Goodtwin said:
I looked through those developer documents, but never once did I come across a page showing the specs. Gaf has the specs chart in the original post, but I never came across that in the actual documents.

Anyone know what PS3 and 360 reserved for OS? I don't remember hearing about any reserved CPU cores or even time slice for those consoles.

I seem to recall that PS3 had one reserved SPU, and another that was available for games but that the OS could claim if it needed it. 5 SPUs were guaranteed for games.

On the 360, core 0 was 100% available for games, so I think that's where the rendering thread was normally run. Cores 1 & 2 had something like a 5% slice for the OS. XNA gave you cores 0 and 1 iirc, though I was a lazy bum and just threw all my shitty code on one thread.
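Taking the recollections in this thread at face value (core 0 fully free and roughly 5% OS slices on cores 1 and 2 on the 360; three of four A57 cores for games in the leaked Switch document), the game-available CPU time works out as below. The 5% figure is an estimate from memory, not an official number, and this only counts time slices, not how fast each core is.

```python
def game_core_equivalents(os_share_per_core):
    """Sum the fraction of each core's time left over for the game.

    os_share_per_core: the OS's share (0.0 to 1.0) of each core's time.
    """
    return sum(1.0 - share for share in os_share_per_core)

# Xbox 360 as recalled above: core 0 fully free, ~5% OS slice on cores 1 and 2.
xbox360 = game_core_equivalents([0.0, 0.05, 0.05])

# Rumoured Switch split: one of the four A57 cores reserved outright for the OS.
switch = game_core_equivalents([0.0, 0.0, 0.0, 1.0])

print(f"360: {xbox360:.2f} core-equivalents, Switch: {switch:.2f}")
```

That's roughly 2.9 versus 3 core-equivalents, so in raw time-slice terms a whole reserved core isn't far off what the 360 gave away in slices.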

Goodtwin · Feb 22, 2017

Rikimaru said:
I doubt Nintendo will write video capture to NAND murdering it. They'll reserve a good chunk for that.

Good point, and perhaps this is a reason to keep a CPU core reserved for the OS as well. I would assume video capture takes up some resources, and the OS probably maintains a connection with social media to allow quick uploads to Facebook and Twitter.

Lalaland · Feb 22, 2017

I'm not sure running the O/S on the A53s and games on the A57s, communicating via RAM, would even work. Correct me if I'm wrong, but software will be making OpenGL calls to the hardware, which involves the video driver, which would be on the A53 cluster and thus approximately eleventy billion years away in execution terms via RAM? Sure, button presses are relatively latency-insensitive, but any and all I/O would have to go via that path too.

Having had a rummage on the Nvidia dev boards, I can find no mention of why the A53s are disabled on the Jetson kit (the TX1 dev board), just that they are. I think I may have picked up cache coherency (cc) as the reason from a Twitter dev rant (that I can no longer find), but it could well be that the design isn't cc because other parts of it don't support concurrent operation of both clusters anyway.

BRiT · Feb 22, 2017

Goodtwin said:
Is the source suggesting only three A57 cores for games reputable? Looked like another fake document to me. Locking a full core for such a lightweight OS seems excessive.

It fits with Nintendo reserving 50% of the RAM on the Wii U for its OS.

3dilettante · Feb 22, 2017

Rikimaru said:
Reserving A57 if you have 4 A53 is very wasteful. 4 A53 could handle any OS task.

There are low-priority background tasks in the OS, but in other consoles there are also interfaces that abstract the hardware, secure APIs, encryption, and other measures for these DRM-heavy platforms. There's also the operating system's role in securing and managing the system reliably and responsively, and because software is customized for the platform, games can peg the cores in ways that crowd out an OS process. Alternatively, the heavyweight actions in some OS functions can make a shared core unreliable in the performance it can offer the game.

There's also a security aspect (at least in theory--see Sony), although the latest research shows just having a dedicated core may only inhibit certain exploits.

There are also other examples of where games do sometimes fall back to the OS. Naughty Dog mentions relying on the OS for some synchronization when its lightweight job system runs into something complicated.
It's not glamorous, but the OS or OS functions can be unavoidable, and they can be costly. Deciding to make them slower might keep all that optimized game code from showing as much improvement.
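Naughty Dog's actual job system isn't described here, so as a generic illustration of "try the cheap user-space path first, fall back to the OS when that fails", here's a hypothetical spin-then-block lock. Python's `threading.Lock` is itself OS-backed, so this models the control flow only, not the performance; `SpinThenBlockLock` and `spin_limit` are made-up names.

```python
import threading

class SpinThenBlockLock:
    """Hypothetical hybrid lock: a few cheap non-blocking attempts first,
    then a blocking acquire that lets the OS park the thread."""

    def __init__(self, spin_limit=100):
        self._lock = threading.Lock()
        self.spin_limit = spin_limit
        self.os_fallbacks = 0  # how often the fast path failed

    def acquire(self):
        for _ in range(self.spin_limit):
            if self._lock.acquire(blocking=False):
                return
        # Fast path failed: block in the OS until the holder releases.
        self._lock.acquire()
        self.os_fallbacks += 1  # safe: we hold the lock here

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
        return False


counter = 0
lock = SpinThenBlockLock(spin_limit=10)

def worker():
    global counter
    for _ in range(10_000):
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter, "increments,", lock.os_fallbacks, "OS fallbacks")
```

The fallback count varies from run to run; the point is that whenever it's non-zero, the game is paying for OS work, and that work runs wherever the OS lives.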

Deleted member 13524 · Feb 22, 2017

Goodtwin said:
I looked through those developer documents, but never once did I come across a page showing the specs. Gaf has the specs chart in the original post, but I never came across that in the actual documents.

If you downloaded the documents, it's in the "Overview" file, section 3.3 "Hardware Specifications".

Lalaland said:
I'm not sure running the O/S on A53 and games on A57 and communicating via RAM would even work, correct me if I'm wrong but software will be making OpenGL calls to the hardware, which involves the video driver, which would be on the A53 cluster and thus approximately eleventy billion years away in execution terms via RAM? Sure button presses are relatively latency insensitive but any and all I/O would have to go via that path also.

I doubt the Switch will use OpenGL given all the Vulkan attachments, and I don't see why communicating API calls to the hardware would take billions of years.

3dilettante said:
There are low-priority background tasks in the OS, but in other consoles there are also interfaces that abstract the hardware, secure APIs, encryption, and other measures for these DRM-heavy platforms. There's also the operating system's role in securing and managing the system reliably and responsively, and with software being customized for the platform the cores can be pegged in ways that can crowd an OS process. Alternately, the heavyweight actions in some OS functions can make the core unreliable in what performance it can offer the game.

There's also a security aspect (at least in theory--see Sony), although the latest research shows just having a dedicated core may only inhibit certain exploits.

There are also other examples of where games do sometimes fall back to the OS. Naughty Dog mentions relying on the OS for some synchronization when its lightweight job system runs into something complicated.
It's not glamorous, but the OS or OS functions can be unavoidable and they can be costly. Deciding to make them slower might not let all that optimized client code show as much improvement.

No one said having the little module (supposedly without cache coherency with the big cores) dedicated to the OS would be an optimal situation. Just that, when faced with a small number of very low-clocked CPU cores, games and game development in general could benefit from having that fourth big core free.

Goodtwin · Feb 22, 2017

ToTTenTranz said:
If you downloaded the documents, it's in the "Overview" file, section 3.3 "Hardware Specifications".

Not sure how I managed to miss that, but thank you. Seems odd they were still mentioning max clock speeds, and not the Eurogamer clock speeds. I suppose this may be older documentation, but why reference max clocks if your final hardware never stood a chance of hitting those speeds?

Deleted member 13524 · Feb 22, 2017

Goodtwin said:
Not sure how I managed to miss that, but thank you. Seems odd they were still mentioning max clock speeds, and not the Eurogamer clock speeds. I suppose this may be older documentation, but why reference max clocks if your final hardware never stood a chance of hitting those speeds?

It's just yet another inconsistency from the Eurogamer camp.
Imagine you're developing for a 2 GHz CPU and all of a sudden you're downgraded to half the CPU performance. Must be a bit unpleasant.

AlNom · Feb 22, 2017

Why mention a GPU at all if it's entirely TBD? :runaway:

Rikimaru · Feb 22, 2017

It looks like the column on the left was copy-pasted unedited from Nvidia's docs.

Lalaland · Feb 22, 2017

ToTTenTranz said:
I doubt the Switch will use OpenGL given all the Vulkan attachments, and I don't see why communicating API calls to the hardware would take billions of years.

Hyperbole is my greatest weakness. The problem inherent in a non-cc design is that it is hard to manage a shared memory location between two CPUs, and without that it is basically impossible to use two CPUs (or clusters of CPUs) together. In this example, with the O/S on a separate cluster (A53) from the game code (A57), we have numerous functions that we wouldn't want taking up game resources (disk I/O, network I/O, input I/O, display drivers, etc.) but that we do want to be able to interact with the same memory space as the game cluster (for loading, audio, button handling, etc.).

As an example, on loading a game I want the A53 disk I/O handler to read data from my media and deposit it into RAM so that my A57 game code can do 'things'. The problem arises in that neither cluster knows which bits of RAM the other is using (i.e. holding in its own cache) at any given time, so you could wind up with the game code wanting to flush a given set of data to disk but with no mechanism to tell the A53 disk I/O handler "I am done with mem range XXXX, flush now please". The only way to be sure that a given memory range is not also somewhere in the other cluster's CPU cache hierarchy is to completely halt all of that cluster's operations and flush its cache to RAM. Then my A53 cluster can read from memory XXXX and write it to disk, and when it is done it must itself completely halt, flush, and hand back over to the A57 cluster, which has to recover its cache and start all over again.
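Here's a toy model of that problem, assuming two write-back caches over shared RAM with no snooping between them (`NonCoherentCache` and `clean_and_invalidate` are made-up names for illustration). It shows the stale read, then the software-managed handoff where the writer cleans its cache and the reader invalidates its own before reading. Like the description above, it cleans and invalidates whole caches rather than individual address ranges.

```python
class NonCoherentCache:
    """Toy write-back cache for one cluster. No snooping: the other
    cluster never sees this cache's dirty lines until they are cleaned."""

    def __init__(self, ram):
        self.ram = ram          # shared backing store: address -> value
        self.lines = {}         # cached copies: address -> value
        self.dirty = set()

    def read(self, addr):
        if addr not in self.lines:        # miss: fill from RAM
            self.lines[addr] = self.ram.get(addr, 0)
        return self.lines[addr]           # hit: may be stale!

    def write(self, addr, value):
        self.lines[addr] = value          # write-back: RAM not updated yet
        self.dirty.add(addr)

    def clean_and_invalidate(self):
        """Write all dirty lines back to RAM, then drop every cached line."""
        for addr in self.dirty:
            self.ram[addr] = self.lines[addr]
        self.dirty.clear()
        self.lines.clear()


ram = {}
a57 = NonCoherentCache(ram)   # game cluster
a53 = NonCoherentCache(ram)   # hypothetical OS / disk I/O cluster

a53.read(0x1000)              # the OS cluster happens to have the line cached
a57.write(0x1000, 42)         # game writes save data into its own cache
stale = a53.read(0x1000)      # OS still sees its old copy: 0

# Software-managed handoff: the writer cleans, the reader invalidates.
a57.clean_and_invalidate()
a53.clean_and_invalidate()
fresh = a53.read(0x1000)      # refilled from RAM: 42
print(stale, fresh)           # 0 42
```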

Cache coherence is the answer to all this because it lets each cache check (snoop) whether its lines cover the area of RAM another core needs to interact with. Within each of the A53 and A57 clusters there is cache-coherency logic (in the cluster's shared L2 and snoop control), which is what lets four cores safely share a common pool of RAM. But because the clusters are seemingly not cc with each other (or have not been until now, at least), there is no way for them to both be active and sharing memory space.
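For contrast, a toy write-invalidate snooping scheme (a heavily simplified MSI-style protocol, not ARM's actual coherency protocol; `Bus` and `SnoopingCache` are made-up names). Every write invalidates the other caches' copies, and a read miss makes any cache holding a dirty copy write it back first, so nobody has to halt or flush anything explicitly.

```python
class Bus:
    """Toy shared interconnect: RAM plus the caches that snoop it."""
    def __init__(self):
        self.ram = {}
        self.caches = []

class SnoopingCache:
    """Toy write-back cache that snoops the other caches' traffic."""
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}      # address -> value (one value per "line")
        self.dirty = set()
        bus.caches.append(self)

    def _others(self):
        return [c for c in self.bus.caches if c is not self]

    def _snoop_read(self, addr):
        # Another cache missed on addr: write our dirty copy back so RAM is current.
        if addr in self.dirty:
            self.bus.ram[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def _snoop_write(self, addr):
        # Another cache is overwriting the whole line: drop our copy.
        self.lines.pop(addr, None)
        self.dirty.discard(addr)

    def read(self, addr):
        if addr not in self.lines:
            for other in self._others():
                other._snoop_read(addr)
            self.lines[addr] = self.bus.ram.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        for other in self._others():
            other._snoop_write(addr)
        self.lines[addr] = value
        self.dirty.add(addr)


bus = Bus()
a57 = SnoopingCache(bus)   # game cluster
a53 = SnoopingCache(bus)   # OS cluster

a53.read(0x1000)           # OS cluster caches the line (value 0)
a57.write(0x1000, 42)      # the write invalidates the A53's copy
print(a53.read(0x1000))    # miss; A57 writes its dirty line back first: 42
```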

The only way I could see this working would be if, when a game is not running, the A57 cluster is simply disabled and all O/S functions move to the A53 cores. On loading a game, the A53 cluster could be flushed and halted and the A57 cluster launched, with O/S functions then handled by the reserved A57 core. That seems complex and prone to failure to me. It's easier to just fuse off one cluster or the other, which is how TX1 has shipped thus far.
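That hand-over can be sketched as an ordered step list plus a check that both clusters are never live at the same time, which is the cluster-switching constraint. The step names and the `transition`/`simulate` helpers are hypothetical; real firmware would have many more steps (saving state, migrating interrupts, and so on).

```python
def transition(current, target):
    """Hypothetical ordered steps to move the OS from one cluster to the other.

    Clusters are just labels ("A53" or "A57"); each step is (action, cluster).
    """
    if current == target:
        return []
    return [
        ("flush caches", current),   # write back everything the old cluster holds
        ("halt", current),           # stop and power down the old cluster
        ("power up", target),        # only now bring the other cluster online
        ("move OS", target),         # on the A57s: onto the one reserved core
    ]

def simulate(current, target):
    """Apply the steps, checking that both clusters are never live together."""
    powered = {current}
    for action, cluster in transition(current, target):
        if action == "halt":
            powered.discard(cluster)
        elif action == "power up":
            powered.add(cluster)
        # Cluster-switching constraint: at most one cluster powered at a time.
        assert len(powered) <= 1, f"both clusters live: {powered}"
    return powered

print(simulate("A53", "A57"))   # game launch: OS moves to a reserved A57 core
print(simulate("A57", "A53"))   # back to the menu: the reverse trip
```

Every game launch and exit pays for a full flush, a cold cache, and a power transition, which is part of why it looks fragile compared to simply fusing off one cluster.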

3dilettante · Feb 22, 2017

ToTTenTranz said:
No one said having the little module (supposedly without cache coherency with the big cores) dedicated to the OS would be an optimal situation. Just that when faced with a small amount of very low clocked cpu cores, games and game development in general could benefit from having that fourth big core free.

It would be suboptimal even in a coherent setup, because some baseline elements of the system that games rely on are part of the OS and run in its dedicated share. The non-coherent case may not be practical or even possible to implement, since so much of the synchronization and memory management the OS is responsible for assumes coherent structures.
As an example of where coherence isn't maintained, look at how current APUs like Carrizo handle updating the translation tables in their IOMMU: everything stalls.

N00b · Feb 22, 2017

Maybe the A53 and A57 cores in the TX1 have different cache line sizes? There are other ARM CPUs whose cores have different cache line sizes, and that's known to cause problems; see http://www.mono-project.com/news/2016/09/12/arm64-icache/
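As I read the linked Mono write-up, the failure mode is a cache-flush loop that steps by a line size read on one core while the code can end up running on a core with smaller lines, so some lines are silently skipped. A sketch of why that happens; the 128/64-byte sizes and the helper names are illustrative assumptions, not taken from any specific chip.

```python
def align_down(addr, line):
    """Round addr down to a multiple of line (line must be a power of two)."""
    return addr & ~(line - 1)

def flush_addresses(start, length, step):
    """Addresses a maintenance loop touches when it walks [start, start + length)
    one assumed cache line (`step` bytes) at a time."""
    return list(range(align_down(start, step), start + length, step))

def missed_lines(start, length, assumed_line, actual_line):
    """Real cache lines (actual_line bytes each) that the loop never touches."""
    touched = {align_down(a, actual_line)
               for a in flush_addresses(start, length, assumed_line)}
    real = range(align_down(start, actual_line), start + length, actual_line)
    return [a for a in real if a not in touched]

# Line size read on a core with 128-byte lines, loop runs on a core with 64-byte lines.
print([hex(a) for a in missed_lines(0x1000, 256, assumed_line=128, actual_line=64)])
# Stepping by the smallest line size in the system touches every line.
print(missed_lines(0x1000, 256, assumed_line=64, actual_line=64))
```

The first call reports every other 64-byte line (0x1040 and 0x10c0) as missed; stepping by the smallest line size present in the system avoids that.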

bunnybug · Feb 22, 2017

Goodtwin said:
Is the source suggesting only three A57 cores for games reputable? Looked like another fake document to me. Locking a full core for such a lightweight OS seems excessive.

The document has been confirmed to be real by a developer on NeoGAF, and the specs are in it; it took me a while to find them as well.
