Switch 2 Speculation

Ampere's FLOPs are inflated, mainly because the INT cores from the Turing architecture were expanded in Ampere to also run FP32 CUDA core code. This allows for much higher paper stats, but those INT cores did do INT work, and these expanded FP32 cores still have to do INT work too. So it's not 2TFLOPs in handheld; it's 1TFLOPs of dedicated FP32 + 1TFLOPs of shared FP32/INT CUDA core work, which will always involve some INT work outside of synthetic benchmarks.

That perspective on Ampere only makes sense when comparing against Turing. Maxwell (and other comparable architectures) also loses FP32 throughput in a mixed FP32/INT32 workload.
 
Absolutely, I'm aware of this. However, 2TFLOPs Ampere != 2TFLOPs RDNA. FLOPs are still the primary way GPU performance is compared on paper, and it's what most people have used in this thread, but I needed to undercut Drake's FLOP numbers because they should underperform vs RDNA FLOP for FLOP. It will gain some ground from being in a closed system, and who knows if custom hardware changes were made to improve performance. I just think it is important to say that 3.456TFLOPs is not a solid number when compared to the 4TFLOPs of RDNA in XBSS: it isn't 7/8ths of the raw performance, and it doesn't get as close to its theoretical number as Turing did... If Drake were 3.456TFLOPs of Turing, it would beat the 4TFLOPs of RDNA in XBSS. Hopefully the posts were useful to the thread for understanding Switch 2's performance, especially if those clocks are used.
 
Switch is closer to 360 and PS3 than to PS4 and Xbox One. Just getting certain 7th-gen games working on Switch is considered a "miracle port". PS4-level GPU power is without a doubt a generational leap. In addition, we can't forget that it would probably be pretty hard to be weaker than the Jaguar CPU inside those machines. It's fully possible the CPU is stronger, which would make the jump even more dramatic.
Yes, it's closer to 360/PS3, yet it's way better in many aspects (RAM amount being a super important one). And I'm quite sure we won't get PS4 levels of performance in portable mode. If so, while it's still a large jump in performance, it's hard for me to call this a true generational leap (where basically you should get x10 in a few key metrics). Slowing advances in process nodes + diminishing returns in IQ mean that real generational upgrades are possible only with very long lifecycles. But that's just me; I can live with Switch levels of performance for a few more years if that's what's needed to really feel the difference with Switch 2. I know not everyone shares this view.

Ok, all caught up. We know that if Switch 2 launches in 2023 or 2024, it will be T239, because 12 months ago Nvidia was hacked, and the NVN files (up to date as of Feb 2022) only supported 3 SoCs: T210 (Erista), T214 (Mariko), and T239 (Drake). As of 13 months ago, there was no other hardware that this updated custom Nvidia API for Nintendo was being built around. Switch 2 was also shown by public files to have engineering samples out in April 2022 and final silicon in August 2022. Through the Linux kernel, we also know the hardware is still being worked on, which wouldn't be the case for canceled hardware. It also wouldn't make sense for Nvidia to continue building T239 as a product if it wasn't a custom part for millions of devices, and realistically Nintendo is the only customer even being hinted at. We also have job postings at both Nvidia and Nintendo from 2020 through 2023 that mention DLSS, a next-gen console, and AI work...

Samsung 5nm is the worst node that Nvidia would go with for a launch at the end of this year or early next year, because 8nm production lines are already winding down from what I've been hearing, and Nintendo was kicked off 20nm and had to move to 12nm around 4 years after 20nm was introduced. Exynos 980 was introduced in September 2019 and was the first 8nm chip from Samsung, and 8nm is an enhanced version of Samsung's ancient 10nm technology. So for a release at the end of 2023 or in 2024, Samsung 5nm is pretty much the oldest node Nvidia could use without having to shrink the chip right away and have a new version out the following year at the absolute latest, which makes going with 8nm far too costly. Samsung 5nm also recently massively improved its production failure rates, making it cheaper than Drake on 8nm would be. In the end we are talking about something like ~150-190mm^2, which is reasonable at 5nm but would be twice that on 8nm, and completely unrealistic in terms of cost.
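To put the die-size point in rough numbers, here is a minimal sketch using the common first-order dies-per-wafer approximation. The 300mm wafer size and the formula are assumptions brought in for illustration (they are not from the post), and the result ignores yield and wafer price entirely.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300 mm wafer, not stated in the post


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """First-order estimate: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    whole_dies = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole_dies - edge_loss)


# The post's ~150-190 mm^2 estimate at 5nm, and "twice that" at 8nm.
for area_5nm in (150.0, 190.0):
    area_8nm = 2.0 * area_5nm
    print(
        f"{area_5nm:.0f} mm^2: {gross_dies_per_wafer(area_5nm)} dies/wafer, "
        f"{area_8nm:.0f} mm^2: {gross_dies_per_wafer(area_8nm)} dies/wafer"
    )
```

Under these assumptions, doubling the die area cuts the gross die count by a bit more than half, because the edge loss hurts bigger dies more. Yield losses would widen that gap further.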

I said that to say that your CPU clocks are too low. Nintendo used just under 2 watts for the original Switch's CPU, and Drake's CPU is also going to use around 2 watts. This is pretty clear IMO, because DLSS benefits the GPU's power consumption so much that they can get away with a "higher" CPU ratio. ~2w on Samsung 5nm should give around 2GHz for the 8-core A78C, which is the most efficient modern ARM CPU atm anyway. 2GHz on 7 cores for gaming should offer something around 70-85% of Steam Deck's CPU, because that Zen 2 CPU, while clocked higher, only has 4 physical cores, though 8 threads, much like A78C's 8 cores / 8 threads. An A78C core at 2GHz should offer around 66% of a Zen 2 core at 3.5GHz, which is where I ultimately get these numbers from. It's of course not an exact science, and custom, closed environments like the Switch do benefit from efficiency gains thanks to task focus and native ports, being able to code closer to the "metal" and whatnot, so that 70-85% number seems pretty safe.

Your GPU frequencies are about what I've been expecting for a long time: somewhere between 600-700MHz for the GPU in handheld and 1.1-1.2GHz docked. That is why that DLSS NVN test is so interesting to me, because it fits with the original Switch's power consumption. 4.2w for 660MHz (2.05TFLOPs) + ~2w for the CPU and 2-3w for the rest of the system falls right in line with the original Switch's ~5.5w TX1 SoC and 1.6-3.5w (the screen really sucked power) for the rest of the system. This left the original Switch with 7.1w-9w of power consumption, based on minimum / maximum brightness and other connectivity settings found in the original Switch menu from launch, while playing BotW. The above power consumption for Drake would be around 8w and maybe go as high as 10w, which the OG Switch does hit in handheld mode with more demanding games.
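The power budget above is just addition, but it is easier to check written out. A small sketch using only the wattages quoted in the post (the variable and function names are made up for illustration):

```python
# All wattages are the post's own estimates, not measurements.
SWITCH_SOC_W = 5.5           # original Switch TX1 SoC
SWITCH_REST_W = (1.6, 3.5)   # screen and rest of system, min/max brightness
DRAKE_GPU_W = 4.2            # GPU at 660MHz, from the DLSS NVN test figure
DRAKE_CPU_W = 2.0            # assumed CPU budget
DRAKE_REST_W = (2.0, 3.0)    # rest of system


def total_range(fixed_w, rest_range_w):
    """Return (low, high) system power for a fixed SoC draw plus a rest-of-system range."""
    low, high = rest_range_w
    return (fixed_w + low, fixed_w + high)


switch_w = total_range(SWITCH_SOC_W, SWITCH_REST_W)
drake_w = total_range(DRAKE_GPU_W + DRAKE_CPU_W, DRAKE_REST_W)
print(f"Original Switch: {switch_w[0]:.1f}-{switch_w[1]:.1f} W")
print(f"Drake estimate:  {drake_w[0]:.1f}-{drake_w[1]:.1f} W")
```

The Drake range comes out slightly above the original Switch's, which lines up with the "around 8w, maybe as high as 10w" estimate.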

Ampere's FLOPs are inflated, mainly because the INT cores from the Turing architecture were expanded in Ampere to also run FP32 CUDA core code. This allows for much higher paper stats, but those INT cores did do INT work, and these expanded FP32 cores still have to do INT work too. So it's not 2TFLOPs in handheld; it's 1TFLOPs of dedicated FP32 + 1TFLOPs of shared FP32/INT CUDA core work, which will always involve some INT work outside of synthetic benchmarks. Having said that, these numbers basically would give us:

Handheld
CPU
70-85% of Steam Deck / >50% of PS5's CPU performance
GPU (2TFLOPs): Greater than Steam Deck / PS4 before DLSS is used to free up performance, which pushes handheld performance up further, maybe to around 680M native-resolution performance? (Hard to say, but there could be trade-offs against PS4 Pro. Screen resolution might be fairly limited, though the small screen would also make lower graphical fidelity matter less, much like when Steam Deck falls short in some settings.)

Docked
CPU
Same CPU clocks/performance as handheld.
GPU (3.456TFLOPs): I think it makes the most sense to compare this with XBSS. It will have fewer raw FLOPs available, especially after INT work is drawn from the same FLOPs pool. However, thanks to DLSS's superior reconstruction, it should match up fairly well with XBSS, and it does offer some RT support with 12 RT cores, allowing it to have RT performance similar to the bigger-brother consoles, PS5 and XBSX. It really depends on the bottleneck here, because it could even have superior RT performance, especially since DLSS helps offset RT performance drops.
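For anyone wondering where the paper numbers come from, here is a rough sketch of the peak-FLOPs arithmetic. The SM count is inferred from the 12 RT cores (assuming one RT core and 128 FP32 lanes per Ampere SM), and the 1.125GHz docked clock is picked only because it reproduces the 3.456TFLOPs figure and sits inside the 1.1-1.2GHz range. Both are assumptions, not confirmed specs.

```python
SM_COUNT = 12               # assumption: one RT core per SM, from the "12 RT cores"
FP32_LANES_PER_SM = 128     # assumption: 64 dedicated FP32 + 64 shared FP32/INT32 lanes
FLOPS_PER_LANE_PER_CLOCK = 2  # a fused multiply-add counts as 2 FLOPs


def peak_tflops(clock_ghz, lanes):
    """Theoretical peak: lanes * 2 FLOPs * clock (GHz gives GFLOPs, so divide by 1000)."""
    return lanes * FLOPS_PER_LANE_PER_CLOCK * clock_ghz / 1000.0


total_lanes = SM_COUNT * FP32_LANES_PER_SM   # 1536 lanes in total
dedicated_lanes = total_lanes // 2           # the half that never does INT work

for label, clock_ghz in (("handheld", 0.660), ("docked", 1.125)):
    print(
        f"{label}: {peak_tflops(clock_ghz, total_lanes):.3f} TFLOPs on paper, "
        f"{peak_tflops(clock_ghz, dedicated_lanes):.3f} TFLOPs dedicated FP32"
    )
```

Half of each paper number comes from lanes that also have to handle INT work, which is the "1TFLOPs dedicated + 1TFLOPs shared" split described above.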

Switch 2 launching next year would be in Switch's 8th year on the market, making it the oldest console Nintendo will have ever replaced by that point, and I don't expect them to stop selling Switch. Since this new hardware is unlikely to be below $399, they will keep the V2 red box / non-OLED model on the market at $249, like the new Mario bundle without a Mario game would indicate (reports are that OLED models are slowing production, and the new bundle is a 2019 Mariko red box model, not OLED), along with Switch Lite or a replacement Switch Mini at $149 (sort of like how New 2DS XL launched at $149 in July 2017, 4 months after Switch).

This isn't a super powerful console, but it's a nice increase from Switch, especially on the CPU side. That comes from higher clocks, a core count matching modern development (of the last decade), and DLSS's black magic reconstruction tech. Without DLSS it would fall behind XBSS pretty easily; I'd say at best it would match PS4 Pro on that front, but with it, it could even exceed XBSS given a low enough render resolution, compromising some IQ.
I'll believe the 2GHz CPU mark when I see it. Nintendo is always very conservative with CPU performance for some reason. And the Switch 2 will have more RAM, more bandwidth, likely faster storage, and this rather large GPU, with no real option to get a much bigger battery.
By the way, do we actually have real-world numbers for the power consumption of 8 A78 cores on a 4nm node?
We know it's something like 2.5W@1.3GHz on Samsung 8nm. It will be better with TSMC 4 for sure... but how much better? I can see it being sub 2W for sure... @1.3GHz. That's my expectation: the 1.2-1.5GHz range at max. But time will tell :)
As for the GPU, I guess we need to keep a balanced system. For sure, hoping for more than 4TFLOPs with a slow 128-bit bus is a bit dumb. So that's indeed my optimistic expectation: ~2TF mobile and 3.5TF docked. But I'm fully prepared to get something lower, possibly way lower (1.2 mobile, double that for docked). Maybe if we get fast local storage and 12GB of RAM, it could be an acceptable tradeoff.
We'll see. But I'd say: keep your expectations low, because we got numbers from the bench test before, and they never materialized.
 
FLOPs are FLOPs. There is no difference. What you see on the PC is API and software. Ampere is far ahead of RDNA2 when it comes to compute per mm^2.
 

[Attached chart: A78-X1-crop-6_575px.png]
This chart shows A78 power consumption on TSMC 5nm (which is better than Samsung 5nm, though Samsung 4nm is closer). What's actually important here is that A77 consumes 1w at 2.6GHz, and that A78 at 2.1GHz consumes less than half of what A77 does at 2.3GHz. This means that dropping from 3GHz to 2GHz lands somewhere around 300mw per core, give or take 15%? 300mw times 7 cores is 2.1w. Now, is Drake going to be 2GHz? I don't know, but dropping the clocks much further doesn't actually gain anything over dropping entire cores, so the compromise is a higher clock, something in the neighborhood of 2GHz. It could be 1.8GHz; I use 2GHz as a rough estimate.
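Spelling out the per-core estimate with its error bar, as a quick sketch (the 300mw figure and the 15% tolerance are read off the chart by eye, so treat them as rough guesses):

```python
PER_CORE_W = 0.300   # rough per-core draw at ~2GHz, estimated from the chart
TOLERANCE = 0.15     # the "give or take 15%" from the post
GAME_CORES = 7       # cores assumed available to games

nominal_w = PER_CORE_W * GAME_CORES
low_w = nominal_w * (1.0 - TOLERANCE)
high_w = nominal_w * (1.0 + TOLERANCE)
print(f"7 cores at ~2GHz: {nominal_w:.2f} W (range {low_w:.2f}-{high_w:.2f} W)")
```

So even at the pessimistic end of that tolerance, the cluster stays under about 2.5w in this estimate.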
This isn't FLOPs, this is theoretical FLOPs... In the end it's really not important. If you want to compare exact performance between two architectures, you really have to run the software; no general measure will be accurate enough IMO. That's certainly outside the scope of my post, whose purpose was to explain why Turing and Ampere FLOPs give very different performance in general software.
 
This chart is nice and all, but it gives no real context as to how this A77@2.6GHz was measured at 1W. Average of your "typical" smartphone load? Or maxed CPU utilization like it will be on a console? It could be the latter, but well, it's a marketing slide....
As I said, I'll believe the 8xA78@2GHz when I see it. I just don't expect to see it with Switch 2, but I'd love to get a pleasant surprise.

And being honest here, this T239, even clocked low, is a good chip. I still feel insulted by the off-the-shelf, bog-standard X1 we got with Switch 1. At least the rumored specs of Drake are good and, most importantly, well balanced. Enough to warrant a "new generation" of console? Maybe for some, maybe not for others. But the rumored chip is good. Paired with lots of RAM and a moderately fast storage solution, I can see this new console lasting 10 years.
 
Tegra X1 was actually pretty great for March 2017; I mean, it certainly punches above its weight. I do agree that Drake makes sense even with lower clocks, but even if the CPU were to pull 3w, they could still comfortably go for it. I actually think we will see Nintendo use a 5000mAh battery in Drake. They use a 4315mAh battery in all Switch models, but density has improved, and 5000mAh is a very standard (and thus should be cheaper) battery. You can find it in $200 phones as well as flagship Samsung phones, so it's extremely widespread and easy to source enough of for production without worry. Drake will possibly last through this decade, so going with such a common battery helps future-proof production a bit.
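A quick back-of-the-envelope sketch of what that battery bump would mean. The 3.7V nominal cell voltage is an assumption (typical for single-cell lithium-ion), and the 8-10w draw is the handheld estimate from earlier in the thread, so this is illustration only.

```python
NOMINAL_CELL_V = 3.7  # assumption: typical single-cell Li-ion nominal voltage


def watt_hours(capacity_mah, voltage_v=NOMINAL_CELL_V):
    """Convert a battery's mAh rating to watt-hours at a nominal voltage."""
    return capacity_mah * voltage_v / 1000.0


def runtime_hours(capacity_mah, system_draw_w):
    """Idealized runtime: energy divided by a constant system draw."""
    return watt_hours(capacity_mah) / system_draw_w


for capacity_mah in (4315, 5000):
    print(
        f"{capacity_mah} mAh = {watt_hours(capacity_mah):.1f} Wh: "
        f"{runtime_hours(capacity_mah, 10.0):.2f} h at 10 W, "
        f"{runtime_hours(capacity_mah, 8.0):.2f} h at 8 W"
    )

print(f"Capacity gain: {5000 / 4315 - 1.0:.1%}")
```

That is roughly a 16% gain in capacity, helpful but not enough by itself to cover a much hungrier SoC.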

Typical CPU load in games is moderate at best, but I am not trying to guarantee any clocks, just looking at the available information.
 
Storage is a curious one. What's the cheapest thing Nintendo can go for these days and still have some sort of current-gen parity? Steam Deck is getting away with eMMC and SD cards for now, so maybe they'll do the same? There's not much in the way of games designed around current-gen console storage, though.
 
Turing is the exception, not Ampere. Turing has a second math pipeline that sits idle 70% of the time. Ampere uses these intra-compute-unit resources much better.
 
eMMC has a read speed of 400MB/s, which is actually fast enough for current-gen games IMO. UFS 3.1 is the hope, though, as it would put it at parity with current-gen consoles. It really just comes down to what Nintendo is planning on price and storage. There is a cool little device on the market called Retroid Pocket 3 Plus that is $149 USD and comes with 128GB of eMMC storage and performance somewhere near the TX1. Drake could realistically have 256GB of eMMC at 400MB/s, or UFS 3.1 at 2GB/s+ with either 128GB or 256GB capacity, considering Drake is probably a $399 product.
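To get a feel for the gap between those two options, here is a small sketch of raw sequential read time. The 4GB asset size is a made-up example, and real load times also depend on decompression, seek patterns, and CPU work, so this is an upper bound on what storage speed alone buys.

```python
EMMC_MB_PER_S = 400.0     # eMMC read speed quoted in the post
UFS31_MB_PER_S = 2000.0   # "2GB/s+" for UFS 3.1, taken at its lower bound


def read_seconds(size_gb, speed_mb_per_s):
    """Time to stream size_gb of data sequentially, using decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000.0 / speed_mb_per_s


EXAMPLE_LOAD_GB = 4.0  # hypothetical amount of data read for one level load

emmc_s = read_seconds(EXAMPLE_LOAD_GB, EMMC_MB_PER_S)
ufs_s = read_seconds(EXAMPLE_LOAD_GB, UFS31_MB_PER_S)
print(f"eMMC:    {emmc_s:.1f} s")
print(f"UFS 3.1: {ufs_s:.1f} s")
```

A 5x difference in raw throughput, but for a load of that size both land in the "seconds, not minutes" range.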
Yeah, I know all of this. I think my point did come across to most people; I wasn't trying to say that Turing was a better or more efficient architecture. Ampere just looks inefficient on paper because it takes something like 20TFLOPs (RTX 3070) to match what 14TFLOPs (RTX 2080 Ti) does. Of course we know why, but I was using TFLOPs to compare platforms and trying to explain to people why the numbers aren't 1:1.
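A toy model of why the paper numbers aren't 1:1, for anyone who wants to play with it. It assumes two issue paths per clock: on Ampere one FP32-only path plus one shared FP32/INT32 path, on Turing one FP32 path plus one INT32-only path. The "36 INT instructions per 100 FP" mix is a commonly quoted ballpark and is used here purely as an assumption. Real GPUs have many other bottlenecks this ignores.

```python
def ampere_fp32_fraction(int_per_100_fp):
    """Fraction of paper FP32 peak reached when INT work shares one of the two paths."""
    fp = 100.0
    integer = float(int_per_100_fp)
    # INT can only run on the shared path; FP can fill whatever slots are left on both.
    cycles = max(integer, (fp + integer) / 2.0)
    return fp / (2.0 * cycles)  # paper peak counts both paths as FP32


def turing_fp32_fraction(int_per_100_fp):
    """Fraction of paper FP32 peak reached when INT has its own separate path."""
    fp = 100.0
    integer = float(int_per_100_fp)
    cycles = max(fp, integer)
    return fp / cycles  # paper peak only counts the single FP32 path


ASSUMED_INT_PER_100_FP = 36  # assumption: ballpark instruction mix for games

print(f"Ampere: {ampere_fp32_fraction(ASSUMED_INT_PER_100_FP):.1%} of paper FP32 peak")
print(f"Turing: {turing_fp32_fraction(ASSUMED_INT_PER_100_FP):.1%} of paper FP32 peak")
print(f"RTX 2080 Ti / RTX 3070 paper ratio from the post: {14 / 20:.0%}")
```

With that mix the model puts Ampere somewhere in the low-to-mid 70s percent of its paper number while Turing stays at its full (smaller) paper number, which is in the same ballpark as the 14-vs-20TFLOPs comparison above.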
 

Those are a couple of interesting options. If they stick with carts, it does let them cheap out and only have enough internal storage for one AAA title, which is 256GB these days. The S struggles with more than a handful of AAA titles on its 512GB.
 
It's not like Switch 2 will be using the assets of the other consoles anyway.
 
It could use the same assets as the S, if it's as powerful as rumoured. That'd save a chunk of porting effort.
I don't expect similar assets, by virtue of devs going for smaller game sizes on Switch. Rather than building for 512GB/1TB Xboxes, the 256GB SD card is probably the most common option to tailor towards.
 
Yeah, eMMC certainly is fast enough, and it's the cheapest option. Plus, if you go faster, you also draw more power, so... eMMC would make even more sense now that hw decompression is a thing.
Well, you have to cut corners somewhere. If we get a larger battery, the nice OLED screen (possibly 1080p... I've read no rumors at all on this front), 12GB of RAM, a far better SoC and so on... I'd say it's unlikely we'll also get 256GB (or more) of super fast storage. I can see Nintendo offering 128GB with the (micro) SD card option still being there. That means the embedded local storage really wouldn't need to be super fast, as the fastest micro SD card you can buy today has a marketed read speed of 300MB/s (UHS-II tops out at 312MB/s, and the cards are not cheap at all).

I really think the best option (read: cheapest and lowest power consumption) would be just that: 128GB of "fast" eMMC + "fast" UHS-II micro SD card + hw decompression unit, with new SKUs having more local storage coming later.
Me being pessimistic, I can also see a 64GB eMMC / UHS-I micro SD / no hw decompression option. Sure, that's even cheaper :D
 
I sincerely hope they will stay with the 720p OLED screen of the current OLED model.
The number of games that run at native resolution on the current Switch is already low enough. That way, at least the OG Switch games will run at native res on Switch 2 ;)

I am also expecting no more than 128GB of onboard storage. I could see UFS 3.1 being used, which is quite common now in smartphones (UFS 4.0 will start this year AFAIK).
 
Well, the 720p screen is fine. Having a 1080p one would mostly help with text clarity. If the power draw of DLSS upscaling from 540-720p to 1080p is "light enough", it would be a win/win scenario. Yet the real-world power requirement of DLSS is not something that gets talked about much. Maybe because no one cared until now, or maybe because the power draw is not that small.

Regarding UFS 3.1, it would be great to have, as it's a better solution than eMMC in all aspects but cost.
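For scale, here is a tiny sketch of how many output pixels DLSS would have to produce per rendered pixel in those two cases. It assumes 16:9 resolutions and says nothing about the actual power cost, which is the open question.

```python
def pixel_count(height):
    """Pixel count of a 16:9 frame with the given height (e.g. 720 -> 1280x720)."""
    width = height * 16 // 9
    return width * height


OUTPUT_HEIGHT = 1080

for render_height in (540, 720):
    scale = pixel_count(OUTPUT_HEIGHT) / pixel_count(render_height)
    print(f"{render_height}p -> {OUTPUT_HEIGHT}p: {scale:.2f}x the pixels")
```

So 540p to 1080p means reconstructing 4 output pixels for every rendered one, while 720p to 1080p is a much gentler 2.25x.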
 