Nintendo Switch Technical discussion [SOC = Tegra X1]

Since this is B3D, I think a proper answer is deserved.

https://en.wikipedia.org/wiki/FLOPS

Each Cortex A57 (or any other ARMv8 CPU, apparently) is capable of doing 4 concurrent FP32 FMADD operations per cycle. Fused Multiply-Add counts as two operations (it's the same in GPUs), so there's 8 operations per cycle.
There are 4 cores at 1GHz, so 4*1G*8 = 32 GFLOPs.
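
As a quick sanity check of that arithmetic, here is a minimal Python sketch. The function name `cpu_peak_gflops` and its parameters are made up for illustration; the inputs (4 cores, 1 GHz, 4 FP32 FMADD lanes per cycle) are the figures quoted above, and the result is a theoretical peak, not something real game code would sustain.

```python
def cpu_peak_gflops(cores, clock_ghz, fma_lanes_per_cycle):
    """Theoretical peak FP32 throughput in GFLOPS (hypothetical helper).

    Each fused multiply-add is counted as two floating-point operations.
    """
    flops_per_cycle = fma_lanes_per_cycle * 2
    return cores * clock_ghz * flops_per_cycle


# Figures from the post: 4 cores, 1 GHz, 4 concurrent FP32 FMADDs per cycle.
all_cores = cpu_peak_gflops(cores=4, clock_ghz=1.0, fma_lanes_per_cycle=4)

# Same formula with one core set aside, e.g. if the OS reserves it.
game_cores = cpu_peak_gflops(cores=3, clock_ghz=1.0, fma_lanes_per_cycle=4)

print(f"4 cores: {all_cores:.0f} GFLOPS, 3 cores: {game_cores:.0f} GFLOPS")
```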


One could argue that 32GFLOPs is still a nice addition to a <200GFLOPs GPU, but 4 cores for (recent-ish) games are already pretty anemic, especially as one of them is reserved for the O.S.
I doubt devs are going to use the CPU's FPUs for graphics shading like they did with e.g. the Cell (which had a theoretical max throughput of ~230 GFLOPs BTW).
 
@ToTTenTranz thanks for putting together that info. My build finished before I had time to get into that in my response, and I didn't want to give an answer that would have been an overestimate. I also thought the GPU was more like 700 GFLOPS, but maybe that's just the FP16 number that I was thinking of.

¯\_(ツ)_/¯
 
Assuming that Digital Foundry's clock speeds are correct, it's probably (400FP32/800FP16) docked, and (200FP32/400FP16) undocked, rounded for sanity.

The 200GFLOPS number is the lowest possible to grab. The conditions under which it is appropriate are rarely given for obvious reasons.
 
You know the old saying, "it's better than nothing"? Well, I think Nintendo just proved with their mobile app that that is not always the case. This thing is getting unanimously panned across the internet. I downloaded it, and honestly cannot see our group bothering with it. It is easier and better for us to simply create a conference call like we did with the original Splatoon. Nintendo did so many things right with Switch, but their online features are potentially worse than the Wii's. We played COD Black Ops, MW3, and the Conduit all with integrated voice chat. Forcing the voice chat onto an app would be tolerable if it was a good app, but currently it is atrocious.
 
Yeah, hopefully the backlash is bad enough for them to put it as a higher priority. Not to mention that these features should be on the system as well.
 
Assuming that Digital Foundry's clock speeds are correct, it's probably (400FP32/800FP16) docked, and (200FP32/400FP16) undocked, rounded for sanity.

The 200GFLOPS number is the lowest possible to grab. The conditions under which it is appropriate are rarely given for obvious reasons.

You're forgetting the bottom 307 MHz clock speed.
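
To put numbers on the clock-speed point, a rough sketch follows. It assumes the Tegra X1's 256 CUDA cores and GPU clocks of 768 MHz docked and 384 or 307.2 MHz undocked, as reported by Digital Foundry; apart from the 307 MHz figure, none of these numbers are stated in this thread. FP16 is assumed to run at twice the FP32 rate, and the helper name is illustrative.

```python
CUDA_CORES = 256  # assumption: Tegra X1 GPU core count, not stated in the thread


def gpu_gflops(clock_mhz, fp16=False):
    """Theoretical peak GPU throughput in GFLOPS (hypothetical helper).

    One FMA per core per cycle counts as two operations; FP16 is
    assumed to double the FP32 rate.
    """
    fp32 = CUDA_CORES * 2 * clock_mhz / 1000.0
    return fp32 * 2 if fp16 else fp32


# Assumed clock steps (MHz), per Digital Foundry's reporting.
clocks = [("docked", 768.0), ("undocked, high", 384.0), ("undocked, low", 307.2)]

for label, mhz in clocks:
    fp32 = gpu_gflops(mhz)
    fp16 = gpu_gflops(mhz, fp16=True)
    print(f"{label}: {fp32:.0f} FP32 / {fp16:.0f} FP16 GFLOPS")
```

Under these assumptions the 400/200 figures correspond to the 768 and 384 MHz steps, while the 307.2 MHz step lands closer to 157 GFLOPS FP32.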
 
Regarding future upgrades to the Switch, it is in a good position insofar as lithographic process development goes. A really straightforward option would be to use Parker shrunk to 7 or 5nm, depending on when you want to release it. Twice as wide, and at those process nodes clock speeds upwards of a GHz undocked would yield a factor of five or so in terms of ALU capabilities. And of course, nVidia may not stand still in terms of architectural evolution. But making that thought experiment puts the finger on memory.
The simplest solution there would probably be to update the 128-bit interface of Parker to support LPDDR5, yielding a nominal bandwidth of just over 100GB/s.
This could be done as early as 2019. Whether that makes sense for Nintendo is another question. For instance, they could go for a mid-gen upgrade with lower power draw instead.
With the Switch, Nintendo can piggy-back on mobile tech development. They have options.
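
The "just over 100GB/s" figure for a 128-bit LPDDR5 interface can be checked with a small sketch. It assumes LPDDR5 at 6400 MT/s per pin, which is presumably the speed the estimate is based on. The comparison line assumes the current Switch uses a 64-bit LPDDR4 interface at 3200 MT/s, which is not stated in the thread. The helper name is illustrative.

```python
def bandwidth_gb_s(bus_width_bits, transfers_mt_s):
    """Nominal memory bandwidth in GB/s (hypothetical helper).

    bus_width_bits: total interface width in bits
    transfers_mt_s: data rate per pin in megatransfers per second
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_mt_s / 1000.0


# Assumption: LPDDR5 at 6400 MT/s on the 128-bit interface from the post.
lpddr5_128 = bandwidth_gb_s(128, 6400)

# Assumption: 64-bit LPDDR4 at 3200 MT/s as a stand-in for the current Switch.
lpddr4_64 = bandwidth_gb_s(64, 3200)

print(f"128-bit LPDDR5: {lpddr5_128:.1f} GB/s")
print(f"64-bit LPDDR4:  {lpddr4_64:.1f} GB/s")
print(f"ratio: {lpddr5_128 / lpddr4_64:.1f}x")
```

Under those assumptions the bandwidth gain is about 4x, slightly less than the factor of five in ALU capability estimated above, which is why memory is the part to watch.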
 
Each Cortex A57 (or any other ARMv8 CPU, apparently) is capable of doing 4 concurrent FP32 FMADD operations per cycle. Fused Multiply-Add counts as two operations (it's the same in GPUs), so there's 8 operations per cycle.
There are 4 cores at 1GHz, so 4*1G*8 = 32 GFLOPs.
Thank you for the answer! But tell me, please: is that enough for current games?
 
No - if you follow any of the discussions on these forums you will see how even the Jaguar CPUs are not enough to make advances in gameplay.

However, it has to be enough for current Switch games because there isn't any other option. But it's nowhere near enough for current higher-end PS4 or Xbox One games, which is why everyone has been saying that ports to Switch will be very tough to do and would have to use last-gen engines.
 
No - if you follow any of the discussions on these forums you will see how even the Jaguar CPUs are not enough to make advances in gameplay.
Yes, I remember that. But what advances are you talking about? In terms of gameplay, games haven't reached anything significantly beyond the PS2 era, IMO.

But it's nowhere near enough for current higher-end PS4 or Xbox One games, which is why everyone has been saying that ports to Switch will be very tough to do and would have to use last-gen engines.
What exactly do PS4 and Xbox One games have that is so special that the Switch couldn't run it? Zelda: Breath of the Wild is equal in terms of gameplay to any PS4 or Xbox One game.
And one more thing. Aren't there some problems on Switch precisely with old engines, which is why many now use new ones like Unreal Engine 4?
 
Any game could be ported with enough downgrades, time and budget, but most developers won't do that; maybe it's too much work or not worth the effort, who knows? Zelda is actually a good example: the game was targeting hardware 2-3x less powerful, and the Switch CPU is much better, yet at launch it struggled to run at a decent frame rate on Switch. Now imagine developers targeting much higher specs and trying to port to Switch. It's going to be a more difficult process; developers are likely to run into many bottlenecks and release something that doesn't run well because they ran out of time. Look at most 360/PS3 ports that were trying to do too much for the hardware by the end of last generation: the results were not good.
 
Thank you for the answer! But tell me, please: is that enough for current games?

That would mostly depend on what you define as a "current" game.

For example, the Witcher 3 will run on a crappy Intel IGP at around 20fps while not even looking that horribly bad. With some optimization a dev might be able to get 30fps without the game looking like something totally different from how it was intended to look.

Instead of asking whether Switch can run current games, it might be better to ask whether multiplatform games on Switch will be, at their core, the same games as the ones released on other platforms.

For example could switch run something like a gta6 with the same level of scale as ps4 or Xbox one but with scaled down graphics or would it have to become a different game?

I think at their core most current games would work on switch though it would depend on how much time and money publishers are willing to invest in it.

Personally I don't see that happening.
 
Instead of asking whether Switch can run current games, it might be better to ask whether multiplatform games on Switch will be, at their core, the same games as the ones released on other platforms.

For example could switch run something like a gta6 with the same level of scale as ps4 or Xbox one but with scaled down graphics or would it have to become a different game?
I think these are the thoughts people have been having when looking at game options on portables since forever.
 
In terms of gameplay, games haven't reached anything significantly beyond the PS2 era, IMO.

What exactly do PS4 and Xbox One games have that is so special that the Switch couldn't run it? Zelda: Breath of the Wild is equal in terms of gameplay to any PS4 or Xbox One game.
GTA5 has lots of simulated people and lots of cars and lots of other simulated systems running concurrently. The city feels alive. PS2 would never have been able to achieve that. Last-gen console versions had far fewer cars and people on the streets. You can't really simulate rush hour traffic without being able to simulate enough cars. Highways simply don't have enough cars in the last-gen version to cause traffic jams. The city feels less alive.

Haven't got experience from Switch, but 3x ARM cores are likely a significant downgrade compared to 7x x64 cores for this kind of highly parallel city simulation workload.
 
Sure, but GTA 5 also ran on PS3/X360 with identical gameplay. The Switch compares much more favorably to those consoles. The graphics are the only things that changed between last gen and current gen.

Regards,
SB
 
That would mostly depend on what you define as a "current" game.
PS4 Xbox One level.

Instead of asking whether Switch can run current games, it might be better to ask whether multiplatform games on Switch will be, at their core, the same games as the ones released on other platforms.
My question was: Is a 32 GFLOPS CPU enough to run all the gameplay stuff of current games?

GTA5 has lots of simulated people and lots of cars and lots of other simulated systems running concurrently. The city feels alive. PS2 would never have been able to achieve that. Last-gen console versions had far fewer cars and people on the streets. You can't really simulate rush hour traffic without being able to simulate enough cars. Highways simply don't have enough cars in the last-gen version to cause traffic jams. The city feels less alive.
OK, you are right. Let me put it a little differently. Yes, there are a lot more cars and people in GTA 5 than in GTA SA, but by a higher level I mean that people should behave more realistically and do many more different things, cars should drive in more complex ways, etc. We have more things happening on screen, but not much more complex things.
 
It's not necessarily the FLOPS that do the things you're talking about. It may be branching/branch prediction and memory latencies, for instance when traversing heuristics-based AI.
I'm afraid you need to give up the single figure of merit idea. CPUs do complex tasks.
 
Haven't got experience from Switch, but 3x ARM cores are likely a significant downgrade compared to 7x x64 cores for this kind of highly parallel city simulation workload.

I suppose the question is just how much more simulation can the 3x ARM cores handle compared to what the Xenon and Cell were able to accomplish with GTA V. Any speculation on your part?
 
It would be highly dependent on the code they are running.

Cell and Xenon are good at highly optimized SIMD code. Xenon = 3 cores at 3.2 GHz, four multiply-adds per cycle (76.8 GFLOP/s). That's a significantly higher theoretical peak than the 4x ARM cores on Switch can achieve. But obviously it can never reach this peak. You can't assume that multiply-add is the most common instruction (see Broadwell vs Ryzen SIMD benchmarks for further proof). Also, Xenon's vector pipelines were very long, so you had to unroll huge loops to reach good perf with it. Branching and indexing based on vector math results was horrible (~40 cycle stall to move data between register files). ARM NEON is a much better instruction set, and OoO and data prefetch help even in SIMD code.
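
For reference, the 76.8 GFLOP/s figure and its ratio to the Switch CPU peak can be reproduced with a short sketch. The Xenon inputs are the ones given above; the Switch inputs (4 cores, 1 GHz, 4 FMADD lanes) are the figures quoted earlier in the thread. The helper name is made up for illustration, and both results are paper peaks only.

```python
def simd_peak_gflops(cores, clock_ghz, fma_lanes_per_cycle):
    """Theoretical SIMD peak in GFLOPS, counting each multiply-add as 2 ops."""
    return cores * clock_ghz * fma_lanes_per_cycle * 2


# Xenon: 3 cores at 3.2 GHz, four multiply-adds per cycle (from the post).
xenon = simd_peak_gflops(cores=3, clock_ghz=3.2, fma_lanes_per_cycle=4)

# Switch CPU: 4 cores at 1 GHz, four FMADDs per cycle (from earlier in the thread).
switch = simd_peak_gflops(cores=4, clock_ghz=1.0, fma_lanes_per_cycle=4)

print(f"Xenon:  {xenon:.1f} GFLOPS")
print(f"Switch: {switch:.1f} GFLOPS")
print(f"Xenon / Switch: {xenon / switch:.1f}x")
```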

If you compare them in standard C/C++ game code, ARM and Jaguar both stomp over the old PPC cores. I remember that it was common consensus that the IPC of those PPC cores in generic code was around 0.2. So both Jaguar and ARM should be 5x+ faster per clock than those PPC cores (IIRC Jaguar average IPC was around 1.0 in some real-life code benchmark; this ARM core should be close). However, you can also write low-level optimized game code for PPC, so it all depends on how many resources you had to optimize and rewrite the code. Luckily those days are a thing of the past. I don't want to remember all those ugly hacks we had around the code base to make the code run "well enough". The most painful thing was that the CPU didn't have a data prefetcher. So you had to know around 2000 cycles in advance which memory regions your future code was going to access, and prefetch that data to cache. If you didn't do this, you would get 600 cycle stalls on memory loads. Those PPC cores couldn't even prefetch linear arrays.
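
A back-of-the-envelope sketch of what those numbers imply per core is below. The IPC values (0.2 for the PPC cores, roughly 1.0 for the ARM core) and the 600 and 2000 cycle figures come from the post; the 3.2 GHz and 1 GHz clocks are the ones mentioned earlier in the thread. Treating IPC times clock as throughput is a deliberate simplification, and the helper names are hypothetical.

```python
def instr_per_ns(ipc, clock_ghz):
    """Rough per-core instruction throughput (instructions per nanosecond)."""
    return ipc * clock_ghz


def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to wall-clock nanoseconds at a given clock."""
    return cycles / clock_ghz


ppc = instr_per_ns(ipc=0.2, clock_ghz=3.2)  # old PPC core, generic code
arm = instr_per_ns(ipc=1.0, clock_ghz=1.0)  # ARM core, IPC assumed close to Jaguar's

print(f"PPC core: {ppc:.2f} instr/ns, ARM core: {arm:.2f} instr/ns")
print(f"ARM advantage per core despite the lower clock: {arm / ppc:.2f}x")

# Cost of one missed load on the PPC cores, and the required prefetch lead time.
stall_ns = cycles_to_ns(600, 3.2)
lead_ns = cycles_to_ns(2000, 3.2)
lost_instr = 600 * 0.2  # instructions that could have retired during the stall

print(f"600-cycle stall: {stall_ns:.1f} ns (~{lost_instr:.0f} instructions of generic code)")
print(f"2000-cycle prefetch lead: {lead_ns:.0f} ns")
```

This only illustrates the generic-code case; hand-optimized SIMD code on the PPC cores could do far better, as noted above.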
 