Nintendo Switch Tech Speculation discussion

Thx for the explanations sebbbi.

The other case can be an increase in details/effects at the same output resolution. Some PS4 Pro games do that. Anyway, I still believe most devs will stick to the undocked profile, even when docked. Simpler that way.
There are however passes that are resolution independent.

For example shadow map rendering. Shadow map rendering cost doesn't change when rendering resolution is increased. Unless you of course also increase shadow resolution accordingly. Higher rendering resolution generally tends to require slightly higher shadow map resolution to look good (low resolution hides low shadow map resolution).

Another example is GPGPU work that is resolution independent. Examples: GPU based occlusion culling, light culling, complex GPU animations (vegetation/wind, skinning), GPGPU based game logic / physics. These things need to be designed according to the lowest performance mode. Resolution scaling to half doesn't make these steps 2x faster -> execution time doubles at half clocks.
 
Sometimes it requires more work to get lower precision calculations to work (with zero image quality degradation), but so far I haven't encountered big problems in fitting my pixel shader code to FP16 (including lighting code). Console developers have a lot of FP16 pixel shader experience because of PS3. Basically all PS3 pixel shader code was running on FP16.

It is still very important to pack the data in memory as tightly as possible, as there is never bandwidth to spare. For example 16 bit (model space) vertex coordinates are still commonly used, the material textures are still DXT compressed (barely 8 bit quality) and the new HDR texture formats (BC6H) commonly used in cube maps have significantly less precision than a 16 bit float. All of these can be processed by 16 bit ALUs in the pixel shader with no major issues. The end result will still eventually be stored to an 8 bit per channel back buffer and displayed.
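To make the vertex packing concrete, here's a minimal sketch (a toy example, not taken from anyone's actual engine) of quantizing model-space positions to 16 bits against the mesh bounding box; the vertex shader would rescale them back using the stored AABB min/extent constants:

```cpp
// Toy sketch (not from any shipped engine): quantize model-space vertex
// positions to 16-bit unsigned integers relative to the mesh bounding box.
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

struct PackedVertex {           // 6 bytes of position instead of 12
    uint16_t x, y, z;
};

static uint16_t quantize(float v, float lo, float extent) {
    float t = (v - lo) / extent;               // normalize to [0,1]
    if (t < 0.f) t = 0.f;
    if (t > 1.f) t = 1.f;
    return static_cast<uint16_t>(t * 65535.f + 0.5f);
}

std::vector<PackedVertex> packPositions(const std::vector<Float3>& verts,
                                        Float3 aabbMin, Float3 aabbMax) {
    std::vector<PackedVertex> out;
    out.reserve(verts.size());
    for (const Float3& v : verts) {
        out.push_back({ quantize(v.x, aabbMin.x, aabbMax.x - aabbMin.x),
                        quantize(v.y, aabbMin.y, aabbMax.y - aabbMin.y),
                        quantize(v.z, aabbMin.z, aabbMax.z - aabbMin.z) });
    }
    return out;
}
```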

Could you give us some examples of operations done in pixel shaders that require higher than 16 bit float processing?

EDIT:
One example where 16 bit float processing is not enough: Exponential variance shadow mapping (EVSM) needs both 32 bit storage (32 bit float textures + 32 bit float filtering) and 32 bit float ALU processing.

However, EVSM is not universally possible on mobile platforms right now, as there's no standard support for 32 bit float filtering on mobile devices (OpenGL ES 3.0 only recently added support for 16 bit float filtering; 32 bit float filtering is not yet present). Obviously GPU manufacturers can have OpenGL ES extensions to add FP32 filtering support if their GPU supports it (as most GPUs should, since this has been a required feature in DirectX since 10.0).
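To put a number on the precision requirement: with a typical exponent like c = 40 the warped depth reaches e^40 ≈ 2.4e17, hopelessly outside FP16's ~65504 range, which is why the moments, their filtering and the ALU work all have to stay at 32 bit float. A rough CUDA sketch of the warp (my own simplification of the usual EVSM formulation, not anyone's shipping code):

```cpp
// Rough EVSM sketch: warp a linear [0,1] depth buffer into exponential
// moments. With c = 40, exp(c * 1.0) is ~2.4e17 -- far beyond FP16 range
// (max ~65504), so the moment texture, its filtering and the ALU math all
// need to be 32-bit float.
#include <cuda_runtime.h>

__global__ void evsmWarp(const float* __restrict__ depth,  // linear depth in [0,1]
                         float4* __restrict__ moments,     // EVSM4 output
                         int numTexels, float cPos, float cNeg)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numTexels) return;

    float d = depth[i] * 2.0f - 1.0f;          // remap to [-1,1] before warping
    float pos =  expf( cPos * d);              // positive exponential warp
    float neg = -expf(-cNeg * d);              // negative exponential warp

    // Store first and second moments of both warps (blurred/filtered later in FP32).
    moments[i] = make_float4(pos, pos * pos, neg, neg * neg);
}
```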

Lack of post processing effects in games is IMHO the biggest difference between mobile and console graphics. Mobile games tend to have zero post effects. FP16 is more than enough for post processing (DOF, bloom, motion blur, color correction, tone mapping). As FP16 makes post processing math 2x faster on Rogue (all the new iDevices), it will actually be a big thing towards enabling console quality graphics on mobile devices. Obviously FP16 is not enough alone, we also need to solve the bandwidth problem of post processing on mobiles. On chip solutions (like extending the tiling to support new things) would likely be the most power efficient answers.
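As a small illustration of how post processing maps onto double-rate FP16, here's a hypothetical CUDA sketch (sm_53, i.e. the Tegra X1 family, where the half2 intrinsics run at twice the FP32 rate) of an exposure + Reinhard-style tone map working on packed half2 pairs, two values per instruction:

```cpp
// Hypothetical sketch of FP16 post processing (not from any shipped engine):
// exposure + Reinhard-style tone map on packed half2 data. Each __hmul2 /
// __hadd2 / h2rcp handles two values per instruction, which is where the
// 2x FP16 rate on sm_53 GPUs comes from.
#include <cuda_fp16.h>

__global__ void tonemapFp16(const __half2* __restrict__ hdrIn,  // two packed values per element
                            __half2* __restrict__ ldrOut,
                            int numPairs, float exposure)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPairs) return;

    __half2 e   = __float2half2_rn(exposure);
    __half2 one = __float2half2_rn(1.0f);

    __half2 x = __hmul2(hdrIn[i], e);                 // exposure, 2 values at once
    __half2 y = __hmul2(x, h2rcp(__hadd2(one, x)));   // x / (1 + x), Reinhard curve

    ldrOut[i] = y;
}
```

Typical post-processing inputs (scene luminance after exposure, blur weights, color grading factors) sit comfortably inside FP16's range, which is why this kind of pass rarely needs FP32 at all.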

So it seems that this would apply to Switch in a big way. It's probably not inappropriate to reference the Tegra powering Switch in terms of FP16 performance, seeing as how most shaders can in fact benefit from FP16. Assuming Nintendo and Nvidia came up with solutions to the memory bandwidth issues, shader performance on Switch should greatly outperform Wii U/360/PS3.

Rumors are suggesting Ubisoft is giving it a go with bringing the new Assassin's Creed game to Switch. It will be interesting to see just how the developers tackle this task. The low hanging fruit would be lowering the resolution. Possibly even using checkerboard rendering. Games that target 1080p 60fps should be the most port friendly, seeing as how they can free up a ton of work by dropping the resolution and framerate to 720p 30fps. Reading through Unreal's mobile development information is pretty interesting. It makes me curious just how easy it is on other engines to implement static lighting and shadows. Reaching any sort of parity with the other consoles shouldn't even be on the table. Trying too hard to do so will just result in a game that has a terrible framerate. Better off taking the mobile route with reduced quality lighting and shadows and having a game that isn't a clunky mess to play.
 
So it seems that this would apply to Switch in a big way. It's probably not inappropriate to reference the Tegra powering Switch in terms of FP16 performance, seeing as how most shaders can in fact benefit from FP16. Assuming Nintendo and Nvidia came up with solutions to the memory bandwidth issues, shader performance on Switch should greatly outperform Wii U/360/PS3.
Yes, FP16 is nice for post processing. But bandwidth and texture samplers are also potential bottlenecks in post processing passes. But compute shader (groupshared memory based) techniques save bandwidth and reduce sampler cost drastically. You need a bit more ALU this way, but that's what double rate FP16 offers :)
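A rough sketch of the groupshared-memory idea in CUDA terms (__shared__ being the CUDA analogue of HLSL groupshared; a toy example, not production code): a horizontal blur where the thread block stages its span of the row plus the filter halo on-chip once, so every tap afterwards reads shared memory instead of DRAM or the texture samplers.

```cpp
// Toy "groupshared memory" post-process pass. Each 256-thread block stages
// its span of a row, plus RADIUS pixels of halo on each side, into on-chip
// memory once; the 2*RADIUS+1 filter taps then read shared memory instead of
// hitting DRAM / texture samplers per tap.
// Launch with grid((width + BLOCK - 1) / BLOCK, imageHeight) and block(BLOCK).
#include <cuda_runtime.h>

#define BLOCK  256
#define RADIUS 8

__global__ void blurRowShared(const float* __restrict__ in,
                              float* __restrict__ out,
                              int width)
{
    __shared__ float tile[BLOCK + 2 * RADIUS];

    int row = blockIdx.y;
    int x   = blockIdx.x * BLOCK + threadIdx.x;

    // Cooperative load: each thread loads its own pixel, the first RADIUS
    // threads additionally load the left and right halo (edge-clamped).
    tile[threadIdx.x + RADIUS] = in[row * width + min(max(x, 0), width - 1)];
    if (threadIdx.x < RADIUS) {
        tile[threadIdx.x] =
            in[row * width + min(max(x - RADIUS, 0), width - 1)];
        tile[threadIdx.x + RADIUS + BLOCK] =
            in[row * width + min(max(x + BLOCK, 0), width - 1)];
    }
    __syncthreads();

    if (x >= width) return;

    float sum = 0.0f;
    for (int t = -RADIUS; t <= RADIUS; ++t)   // all taps served from on-chip memory
        sum += tile[threadIdx.x + RADIUS + t];
    out[row * width + x] = sum / float(2 * RADIUS + 1);
}
```

The same staging idea extends to 2D tiles for separable or bilateral filters; the ALU cost per pixel goes up slightly, but the bandwidth and sampler pressure drop dramatically.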
Rumors are suggesting Ubisoft is giving it a go with bringing the new Assassin's Creed game to Switch. It will be interesting to see just how the developers tackle this task. The low hanging fruit would be lowering the resolution. Possibly even using checkerboard rendering. Games that target 1080p 60fps should be the most port friendly, seeing as how they can free up a ton of work by dropping the resolution and framerate to 720p 30fps.
Many Ubisoft console games are already using checkerboard rendering. Rainbow Six Siege was 60 fps 1080p with checkerboard. And Watch Dogs 2 was 30 fps 1080p with checkerboard. Ubisoft calls their checkerboard implementation "Temporal Filtering". I would assume they adapt this excellent technique to Assassin's Creed also at some point.
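For anyone unfamiliar with the technique, here is a toy sketch of the basic checkerboard idea (Ubisoft's temporal filtering is far more sophisticated, reprojecting with motion vectors and ID buffers): each frame shades only the pixels whose checker parity matches the frame index, and the resolve fills the other half from the previous resolved frame.

```cpp
// Toy checkerboard resolve (nothing like Ubisoft's actual "Temporal
// Filtering"). Each frame the renderer shades only pixels where
// (x + y + frame) is even, i.e. half the pixels; the resolve keeps freshly
// shaded pixels and fills the stale half from the previous resolved frame.
#include <cuda_runtime.h>

__global__ void checkerboardResolve(const float4* __restrict__ shadedHalf,   // valid on this frame's parity only
                                    const float4* __restrict__ prevResolved,
                                    float4* __restrict__ resolved,
                                    int width, int height, int frameIndex)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = y * width + x;
    bool shadedThisFrame = ((x + y + frameIndex) & 1) == 0;

    // Fresh pixels come from this frame's half-rate shading; the rest are
    // reused from last frame (a real implementation would reproject them
    // with motion vectors and reject mismatches).
    resolved[idx] = shadedThisFrame ? shadedHalf[idx] : prevResolved[idx];
}
```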
 
Many Ubisoft console games are already using checkerboard rendering. Rainbow Six Siege was 60 fps 1080p with checkerboard. And Watch Dogs 2 was 30 fps 1080p with checkerboard. Ubisoft calls their checkerboard implementation "Temporal Filtering". I would assume they adapt this excellent technique to Assassin's Creed also at some point.

There was a very good presentation from Ubisoft's Jalal El Mansouri about this at GDC 2016 - I can't post a link yet, as I'm too new, but if you search for "Rendering 'Rainbow Six | Siege'" you'll find the slide deck.

This whole thread has been a welcome breath of fresh air and actual technical discussion vs. the hype-trains going full steam ahead at other forums about how the Switch will run DOOM and Witcher 3 "because Vulkan." I'm seconding @Goodtwin's hope that Nintendo encourages their developers to prioritize smooth gameplay over trying in vain to reach visual parity with the Xbox One and PS4.
 
Very interesting that Ubisoft has already been using checkerboard rendering for a while now. I would have to think that this would be very beneficial for developers trying to port AAA games to Switch. 720p checkerboard rendering. I know some people will baulk at the idea seeing as how image quality suffers, but let's be real here: if you're looking for the highest quality visuals in these games as a consumer, you're going to play them on PC or PS4/X1.

I still have a feeling that Nintendo one way or another solved the bandwidth issue on the Tegra X1. Either by going with some eSRAM (a very Nintendo thing to do), or by going to a 128 bit bus. Not sure which is less expensive, but I doubt that after years of coming up with creative ways of avoiding memory stalls Nintendo will suddenly release hardware with such an obvious hurdle.

[Image: Cortex-A57 power consumption vs. frequency curve]


I was able to catch the effect of temperature on power consumption in these tests: over the period of a minute, power would continuously increase as the silicon heated up. After only 10 seconds of load the consumption would increase by 5%. Frequencies above 1600MHz especially suffer from this effect, as static power leakage seems to increase a whole lot in these states, so the measured power can't be attributed to dynamic power alone.

The battery savings mode of the phone also caps the frequency of the A57 cores at 1400MHz, allowing for a very reasonable maximum 3.3W cap on the big cores while still providing excellent performance.

The reduced clock speeds on the A57 probably should have been obvious. Four cores at 1.9GHz pull nearly 8 watts, and at clock speeds over 1.6GHz the chip gets hot and leakage starts to become a problem.

The good thing is that the A57s outperform the A15 cores by 15-30% on average, with certain benchmarks showing absolutely massive improvements in floating point performance.
 
People seem to have way too high expectations towards mobile devices.

I can't help but feel like this is an unfair statement, at least if it's directed to this forum/discussion.

Someone please correct me if I'm wrong, but I don't know of anyone here who said they were expecting a mobile device to match the Xbone in performance.
I know people expected the NX to at least match the Xbone in performance if it was a home console, but that isn't a stretch at all.
Even when it was announced that the NX was a hybrid, I don't remember people saying they were expecting the console to surpass the Xbone, because the SoC is still crammed inside an (albeit very thick) tablet format.



Xbox One S (slim) is ~60 watts in gameplay. A mobile device of this size has to consume less than 10 watts (including display). Even less, if we assume 4 hour battery life (100% gameplay). It would be physically impossible to design a device that has 6x+ higher perf/watt than a die-shrunk 20 nm console AP. Thus nobody should expect Nintendo Switch to match Xbox One S in performance.
(...)
1/3 perf of 60 watt console on a mobile device should be considered a good achievement, not a letdown.

@sebbbi aren't you mixing two different things here?
Unless the rumors are wrong and this part in the patents is unused, there are two power/performance profiles: one for docked and another for mobile.
You're saying 1/3rd the perf. of Xbone is good for a handheld and that would definitely be true, fantastic even. But that's not what the rumors are pointing at.

Eurogamer's clocks + 2 SM means the Switch is actually getting 12% the performance of the Xbone in handheld mode. 1/3rd the performance of Xbone is achieved in docked mode, where power consumption constraints shouldn't really be that much different and we'd be looking solely at heat dissipation constraints.

1/3rd the performance of Xbone in "home console mode" is a real letdown, IMO.


These specs would make Switch around 2x faster than last gen consoles when docked and around equal to last gen consoles when handheld. Don't get me wrong. These are fantastic specs for a handheld device. Carrying an Xbox 360 in your pocket is great. 720p was the most common last gen resolution.

They're really not great for a new handheld in tablet form.
The 2.5 year-old Shield Tablet K1 seems to get PS360 performance with a ~2 hour battery life, practically without throttling. Limit the A15 cores to ~1.4GHz and it may hit the 3.5 hour mark that console makers have been after for their handhelds.
More recent SoCs like the Snapdragon 820, Apple A9 or Exynos 8890 would probably beat the TK1 in gaming performance (even with throttling in mind for a tablet form factor) and if they were paired with a similar battery they'd last quite a bit longer. And they're not even gaming-oriented SoCs.

IMHO, there's really no valid perspective from which a 2 SM @ 300MHz GPU in a handheld looks fantastic in early 2017. It would be fantastic in 2013/14, very good in 2015 and passable in 2016. But it's not even good for 2017. If I had to guess, the 16FF Mediatek P30 for mid-to-low end handhelds will probably match that quite easily, and that too has a 2*32bit LPDDR4 controller.


It's fantastic if the console costs less than $200 and it gets all the 3DS devs onboard, but it might not be enough to justify the purchase of yet another platform. Plus, there's the chance that most 3DS owners are counting on something they can carry in a pocket, and parents may be wary of handing a tablet to their toddlers (whereas a clamshell format was much safer).


I guess someone with a Pixel C should just force the A57 cores to clock down to a fixed 1GHz, and see how long the tablet would last running the GFXBench battery test. Even better if they could also downclock the GPU to 768 and 300MHz as well.
 
Eurogamer's clocks + 2 SM means the Switch is actually getting 12% the performance of the Xbone in handheld mode. 1/3rd the performance of Xbone is achieved in docked mode, where power consumption constraints shouldn't really be that much different and we'd be looking solely at heat dissipation constraints.

FP16 performance puts Switch at 314 GFLOPS portable and 786 GFLOPS docked. Unreal Engine 4 uses FP16 extensively, so it's not a pointless marketing metric. Like sebbbi mentioned in the post I quoted, he wasn't finding many cases where his pixel shaders couldn't be done in FP16. I have always heard that Maxwell outperforms GCN on a per flop basis. So assuming there aren't debilitating memory bottlenecks, portable Switch is probably more like 25% of Xbox One, and 60% of Xbox One docked.
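For anyone who wants to sanity-check those figures, the arithmetic below assumes the rumoured 2 Maxwell SMs (128 cores each), Eurogamer's 307.2/768 MHz clocks, double-rate FP16, and the stock Xbox One GPU (768 shaders @ 853 MHz) as the reference. It's raw FLOPS only and says nothing about per-flop efficiency.

```cpp
// Back-of-the-envelope check of the numbers quoted above, under the rumoured
// configuration: 2 Maxwell SMs x 128 cores, FMA = 2 FLOPs/clock, 2x rate for
// packed FP16, 307.2 MHz portable / 768 MHz docked, Xbox One as reference.
#include <cstdio>

int main() {
    const double cores = 2 * 128, fma = 2.0, fp16 = 2.0;

    double portableFp32 = cores * fma * 0.3072;  // ~157 GFLOPS
    double dockedFp32   = cores * fma * 0.768;   // ~393 GFLOPS
    double xboxOneFp32  = 768 * fma * 0.853;     // ~1310 GFLOPS

    printf("Portable: %.0f GFLOPS FP32 / %.1f GFLOPS FP16 (%.0f%% of XB1 FP32)\n",
           portableFp32, portableFp32 * fp16, 100.0 * portableFp32 / xboxOneFp32);
    printf("Docked:   %.0f GFLOPS FP32 / %.1f GFLOPS FP16 (%.0f%% of XB1 FP32)\n",
           dockedFp32, dockedFp32 * fp16, 100.0 * dockedFp32 / xboxOneFp32);
    return 0;
}
```

That lines up with the figures used earlier in the thread: ~12% of Xbox One's FP32 throughput portable and ~30% docked, or 314.6/786.4 GFLOPS in pure FP16 terms.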
 
About Maxwell vs GCN, is that relevant in a console context? I mean, I guess the dev tools for consoles are optimized for the GCN architecture. It could be worse on Maxwell, if actual shaders & co are using GCN strengths...
 
So assuming there aren't debilitating memory bottlenecks, portable Switch is probably more like 25% of Xbox One, and 60% of Xbox One docked.

No way Maxwell 2.5 with 2*FP16 gets twice the performance-per-GFLOP of GCN2-ish in the Xbone.
IIRC sebbbi said around 70% of the pixel shaders in his games could use FP16 without noticeable loss in quality, but pixel shader performance is not the only bottleneck in the pipeline.
 
About Maxwell vs GCN, is that relevant in a console context? I mean, I guess the dev tools for consoles are optimized for the GCN architecture. It could be worse on Maxwell, if actual shaders & co are using GCN strengths...

Sony and Microsoft aren't going to be sharing much with Nintendo. The consoles have tools optimized for themselves, and at least in the PC space where GCN and Maxwell compete more directly it's a mix.
Nvidia is allegedly doing a lot of heavy lifting with the API and software stack for hardware they know intimately, but where that places them versus Sony or Microsoft is unclear.
 
Sony and Microsoft aren't going to be sharing much with Nintendo. The consoles have tools optimized for themselves, and at least in the PC space where GCN and Maxwell compete more directly it's a mix.
Nvidia is allegedly doing a lot of heavy lifting with the API and software stack for hardware they know intimately, but where that places them versus Sony or Microsoft is unclear.
Ahead, according to devs talking to Michael Pachter, who relayed that the Nintendo Switch is the easiest of the 3 to develop for. Also, in Doom, Maxwell still outperforms GCN by ~25% per flop. That game should be about as optimized on PC and consoles as it gets.
 
Sony and Microsoft aren't going to be sharing much with Nintendo. The consoles have tools optimized for themselves, and at least in the PC space where GCN and Maxwell compete more directly it's a mix.
Nvidia is allegedly doing a lot of heavy lifting with the API and software stack for hardware they know intimately, but where that places them versus Sony or Microsoft is unclear.

@Rootax's term was that it could be worse on Maxwell, so you're both agreeing on the unclear part.
 
It's fantastic if the console costs less than $200 and it gets all the 3DS devs onboard, but it might not be enough to justify the purchase of yet another platform. Plus, there's the chance that most 3DS owners are counting on something they can carry in a pocket, and parents may be wary of handing a tablet to their toddlers (whereas a clamshell format was much safer).

At least if you ruin the controllers, they can get replaced.

On another note, I wonder what it will actually do when online. Nvidia might provide security updates for 10 years; that wouldn't be out of the ordinary, at least from the graphics stack side of things. That's one difference from some random Qualcomm, Exynos or Mediatek part. Those ARM SoCs are supported for anywhere from six months to a couple of years, if that; nvidia comes from a more traditional computer/workstation industry where you don't cheap out on this.
Do we have an idea of the OS it runs? I would think either Linux or BSD, albeit as hidden plumbing that doesn't matter at all to the user. It can be updated over the years; a phone/tablet SoC would leave you stuck on the same Linux kernel version for years, until the console's demise.
 
No way Maxwell 2.5 with 2*FP16 gets twice the performance-per-GFLOP of GCN2-ish in the Xbone.
IIRC sebbbi said around 70% of the pixel shaders in his games could use FP16 without noticeable loss in quality, but pixel shader performance is not the only bottleneck in the pipeline.

Why wouldn't you get 2X throughput with FP16 for pixel shaders where it works? That's the point of going half precision: double the throughput for the same operations. GCN does seem to be better with GPU compute operations, but as far as graphics rendering goes, Maxwell outperforms the GCN architecture the PS4/X1 use. sebbbi mentioned that a lot of newer games are really pushing GPU compute, so this would be a substantial hurdle for those developers if trying to do a straight port. GPU compute is going to be very limited on Switch, seeing as how they have to limit that workload to the portable clock speed.

Who knows, maybe it will make more sense for some games to use older game engines from the 360/PS3. Assuming the Assassin's Creed Egypt rumor is legit, is there any reason the Switch version couldn't be constructed using the older engine? Much of the work for games is asset creation, so I don't know how much work it would take to plug the assets into an older engine. It's basically what Treyarch did with the COD games on Wii. Then there was Prince of Persia: The Forgotten Sands for Wii that was built from the ground up for Wii, so who knows how developers will decide to tackle it. Ubisoft seems to be taking on the task. EA, not so much.

Isn't the PS4/X1 GPU based on the GCN 1.1 architecture?
 
Yes, FP16 is nice for post processing. But bandwidth and texture samplers are also potential bottlenecks in post processing passes. But compute shader (groupshared memory based) techniques save bandwidth and reduce sampler cost drastically. You need a bit more ALU this way, but that's what double rate FP16 offers :)

So from what I understand of packed FP16, it only applies when you're performing the same op on both values - is that a non-issue in general for games, or is that basically the point of "groupshared" techniques?
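In CUDA terms (sm_53, the Tegra X1 family) the constraint looks roughly like the sketch below, which is my own toy example: one half2 instruction applies the same operation to both 16-bit lanes, so keeping pairs of like values packed together (two pixels' worth of a channel, the .xy and .zw of a vector, etc.) gets the 2x rate essentially for free, while per-lane divergent logic forces an unpack back to scalars. Groupshared memory is an orthogonal trick: it's about reusing data on-chip rather than pairing operations.

```cpp
// Illustration of the "same op on both values" constraint of packed FP16
// (CUDA, sm_53+). One __hfma2 performs a fused multiply-add on BOTH 16-bit
// lanes at once; when the two lanes genuinely need different operations you
// have to unpack to scalars and lose the 2x rate for that stretch of code.
#include <cuda_fp16.h>

__global__ void scaleBiasFp16(const __half2* __restrict__ in,
                              __half2* __restrict__ out,
                              int numPairs, float scale, float bias)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPairs) return;

    __half2 s = __float2half2_rn(scale);
    __half2 b = __float2half2_rn(bias);

    // Same scale-and-bias applied to both lanes in one instruction: full 2x rate.
    __half2 v = __hfma2(in[i], s, b);

    // If the lanes needed *different* math, you'd unpack and work per scalar:
    // __half lo = __low2half(v), hi = __high2half(v);  ...  v = __halves2half2(lo, hi);

    out[i] = v;
}
```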
 
Why wouldn't you get 2X throughput with FP16 for pixel shaders where it works?

Where it works, yes. Just not everywhere, as you suggested when you claimed Maxwell's performance-per-GFLOPs would be 2x GCN 2's.
 