Nintendo Switch Tech Speculation discussion

But where are you getting the projected clock speeds from? And where are you getting 4 x [8 Jags @ 1.75] from?

You might find it damn sad, but some actual, comprehensive answers would help prevent the sadness from spreading.

Honestly, I didn't know that Volta did 8 SP FLOPS per CUDA core per cycle, btw. Couldn't find anything to say that Xavier was shipping to customers in 2017 either.

Edit: Contrary to the image above suggesting Drive PX 2 draws 80 W, AnandTech seems pretty sure it draws much more, more like 250 W: http://www.anandtech.com/show/9903/nvidia-announces-drive-px-2-pascal-power-for-selfdriving-cars

"What isn’t in doubt though are the power requirements for PX 2. PX 2 will consume 250W of power – equivalent to today’s GTX 980 Ti and GTX Titan X cards – and will require liquid cooling. NVIDIA’s justification for the design, besides the fact that this much computing power is necessary, is that a liquid cooling system ensures that the PX 2 will receive sufficient cooling in all environmental conditions."

So at this time, I continue to be happy calling bullshit on that 20W figure.

Common sense would tell you what he's suggesting is BS. It's actually pretty sad he believed it; it's like he's living in a fantasy land and ignoring the laws of physics.
 
Edit: Contrary to the image above suggesting Drive PX 2 draws 80 W, AnandTech seems pretty sure it draws much more, more like 250 W: http://www.anandtech.com/show/9903/nvidia-announces-drive-px-2-pascal-power-for-selfdriving-cars

"What isn’t in doubt though are the power requirements for PX 2. PX 2 will consume 250W of power – equivalent to today’s GTX 980 Ti and GTX Titan X cards – and will require liquid cooling. NVIDIA’s justification for the design, besides the fact that this much computing power is necessary, is that a liquid cooling system ensures that the PX 2 will receive sufficient cooling in all environmental conditions."

So at this time, I continue to be happy calling bullshit on that 20W figure.
The AnandTech article predates the actual hardware. The 250 W figure is for a prototype that used Maxwell hardware. When nVidia discussed PX2 at Hot Chips later that year they gave the 80 W figure: http://wccftech.com/nvidia-tegra-parker-soc-hot-chips/
 
I came across some interesting news today on 3dcenter.org: http://www.3dcenter.org/news/2016-10. They compare the GTX 1050 Ti and RX 460 and conclude that the 1050 Ti delivers 67% more performance / W. That might be one of the reasons why Nintendo chose nVidia.

Now we know that the new XBox One S, which has an SOC made on 16nm FinFET, draws about 50 W when gaming. With that it achieves about 1.4 TFlops FP32. That's 28 GFlops / W.

Now let's speculate that a design with nVidia's more efficient graphics cores, way more efficient ARM CPU cores (in comparison to the Jaguar cores), more efficient (but slower) memory and general attention to power efficiency and other tricks (like better memory compression, a low-overhead API, etc.) might achieve twice the performance per Watt. That would be 56 GFlops / W.

A docked Switch might have a power consumption of about 15 W. That way the Switch might achieve 0.84 TFlops FP32 when docked or 60% of XB1 S FP32 performance. Not bad.
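
If anyone wants to poke at the assumptions, here's that back-of-the-envelope chain as a few lines of Python. Only the 2x efficiency multiplier and the 15 W docked budget are guesses; the XB1S figures are the ones above.

```python
# Back-of-the-envelope maths using the figures above; the 2x efficiency gain
# is the speculative part, everything else follows from it.
xb1s_power_w = 50.0                                    # XB1S draw while gaming
xb1s_tflops = 1.4                                      # XB1S FP32 throughput
xb1s_gflops_per_w = xb1s_tflops * 1000 / xb1s_power_w  # 28 GFLOPS/W

assumed_gain = 2.0                                     # speculative multiplier
switch_gflops_per_w = xb1s_gflops_per_w * assumed_gain # 56 GFLOPS/W

docked_power_w = 15.0                                  # assumed docked budget
switch_tflops = switch_gflops_per_w * docked_power_w / 1000

print(f"{switch_tflops:.2f} TFLOPS docked, "
      f"{switch_tflops / xb1s_tflops:.0%} of XB1S")    # 0.84 TFLOPS, 60%
```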

Please note that this is speculation and only meant to provide a rough estimate of what Switch might achieve in the best case. Don't hate me for it.

Since Tegra (and probably Switch) has 2x performance when using FP16, I wonder if Switch will use that a lot (where applicable) in order to provide even better performance, maybe even at the expense of image quality. Or maybe NVN can transparently switch from FP32 to FP16 in some cases when in mobile mode in order to save power.
 
Now let's speculate that a design with nVidia's more efficient graphics cores, way more efficient ARM CPU cores (in comparison to the Jaguar cores), more efficient (but slower) memory and general attention to power efficiency and other tricks (like better memory compression, a low-overhead API, etc.) might achieve twice the performance per Watt. That would be 56 GFlops / W.

A docked Switch might have a power consumption of about 15 W. That way the Switch might achieve 0.84 TFlops FP32 when docked or 60% of XB1 S FP32 performance. Not bad.

Please note that this is speculation and only meant to provide a rough estimate of what Switch might achieve in the best case. Don't hate me for it.
It's perhaps not a bad line of reasoning, but you'd need some decent numbers rather than pie-in-the-sky comparisons. How many more watts would Jaguar use than ARM in Switch? How many more watts does XB1's DDR3 use than the LPDDR4 expected in Switch? And discount things like 'low overhead API'. A low overhead API just means more graphics can be run, and typically means the hardware runs hotter. ;) DX12 is also plenty low enough. Maybe what you meant was 'lighter OS'?

Since Tegra (and probably Switch) has 2x performance when using FP16, I wonder if Switch will use that a lot (where applicable) in order to provide even better performance, maybe even at the expense of image quality. Or maybe NVN can transparently switch from FP32 to FP16 in some cases when in mobile mode in order to save power.
It's probably wrong to say 'Switch will use FP16 a lot' and instead say devs will use it, because it's a software choice.
 
It's perhaps not a bad line of reasoning, but you'd need some decent numbers rather than pie-in-the-sky comparisons. How many more watts would Jaguar use than ARM in Switch? How many more watts does XB1's DDR3 use than the LPDDR4 expected in Switch? And discount things like 'low overhead API'. A low overhead API just means more graphics can be run, and typically means the hardware runs hotter. ;) DX12 is also plenty low enough. Maybe what you meant was 'lighter OS'?

It's probably wrong to say 'Switch will use FP16 a lot' and instead say devs will use it, because it's a software choice.
From a quick look around the net, LPDDR3 uses about 15% less power than DDR3, and LPDDR4 uses 40% less power than LPDDR3. So that's 1 * 0.85 * 0.6 = 0.51; let's say LPDDR4 uses about 50% less power than DDR3 (ignoring bandwidth). Since Switch probably won't have more than 8 GB RAM, memory power consumption will be at least 50% lower, more if Switch has less memory. So my guesstimate works out here.
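
Spelled out, in case anyone wants to swap in different percentages:

```python
# The scaling chain from the figures above, relative to DDR3 = 1.0
# (rough vendor numbers; actual draw depends on capacity, bus width and load).
ddr3 = 1.00
lpddr3 = ddr3 * (1 - 0.15)    # ~15% below DDR3
lpddr4 = lpddr3 * (1 - 0.40)  # ~40% below LPDDR3
print(round(lpddr4, 2))       # 0.51 -> roughly half of DDR3's power
```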

Comparing Jaguar and ARM is much harder. The best I have found is this SOC shootout from notebookcheck.com: http://www.notebookcheck.net/SoC-Shootout-x86-vs-ARM.99496.0.html
It shows that the Tegra 4 has comparable CPU performance @ 4 W TDP (using 4 ARM Cortex-A15 cores) to the Jaguar-based AMD A4-5000 @ 15 W TDP, and both are on the 28nm node. XB1 has twice as many cores as the A4-5000, and Switch has newer, faster and more efficient cores (compared to Tegra 4). So I would guess that Switch CPU power consumption is at the very least 50% lower than XB1S (though it may not be able to match XB1S's total processing power); more likely it's even less, maybe below 30%.

You're right. "Lighter OS" is what I meant.

Regarding FP16: Of course it's a software choice, but that means developers need to actively use it. I'm sure we won't see that kind of optimization for every port. So I wonder if nVidia could provide functionality in its API layer that transparently converts selected FP32 render targets and shaders to FP16 without third-party developer intervention. Here is how it would work: before a game comes out, nVidia analyzes the game and provides a profile that tells the runtime what optimizations it can make. This profile is published along with the game or as a system update.
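
To make that concrete, here's a purely hypothetical Python sketch of what such a shipped profile and lookup could look like. None of these names are real NVN or driver APIs; it's just the shape of the idea.

```python
# Purely hypothetical sketch of the "demotion profile" idea above. A per-title
# profile lists which FP32 resources were found safe to run at FP16, and the
# runtime consults it when resources are created.
DEMOTION_PROFILE = {
    "title": "example-game",                     # placeholder identifier
    "mobile_only": True,                         # only demote when undocked
    "render_targets": {"bloom", "ssao"},         # buffers tolerant of FP16
    "shaders": {"blur_h", "blur_v", "tonemap"},  # passes profiled as FP16-safe
}

def pick_precision(name: str, kind: str, docked: bool) -> str:
    """Return 'fp16' if the shipped profile marks this resource as demotable."""
    if DEMOTION_PROFILE["mobile_only"] and docked:
        return "fp32"
    allowed = DEMOTION_PROFILE["render_targets" if kind == "render_target" else "shaders"]
    return "fp16" if name in allowed else "fp32"

print(pick_precision("ssao", "render_target", docked=False))     # fp16
print(pick_precision("gbuffer", "render_target", docked=False))  # fp32
```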
 
You can't use FP16 for everything. There's really a limit as to how much you can use it for. It is no magic silver bullet.
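
For anyone unfamiliar with half precision, a quick numpy illustration of the kind of limits involved:

```python
import numpy as np

# Why FP16 only works where you can tolerate it: ~3 decimal digits of
# precision and a maximum of 65504. Fine for colours and normals, risky for
# positions, depth and long accumulations.
print(float(np.finfo(np.float16).max))          # 65504.0, larger values go to inf
print(float(np.float16(2048) + np.float16(1)))  # 2048.0, the +1 is lost entirely
print(float(np.float16(0.1)))                   # 0.0999755859375, ~3 digits
```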
 
From a quick look around the net, LPDDR3 uses about 15% less power than DDR3, and LPDDR4 uses 40% less power than LPDDR3. So that's 1 * 0.85 * 0.6 = 0.51; let's say LPDDR4 uses about 50% less power than DDR3 (ignoring bandwidth). Since Switch probably won't have more than 8 GB RAM, memory power consumption will be at least 50% lower, more if Switch has less memory. So my guesstimate works out here.
50% lower than very little is still very little. The DDR3 in XB1 is only consuming a few watts, so negligible savings to be had.

Comparing Jaguar and ARM is much harder. The best I have found is this SOC shootout from notebookcheck.com: http://www.notebookcheck.net/SoC-Shootout-x86-vs-ARM.99496.0.html
It shows that the Tegra 4 has comparable CPU performance @ 4 W TDP (using 4 ARM Cortex-A15 cores) to the Jaguar-based AMD A4-5000 @ 15 W TDP and both are on the 28nm node
But these are SOCs, right? So that's 15W for the A4-5000 including GPU. I don't think there's any relevant data to be extracted here, but I think we're talking just a few watts savings at best again.

Regarding FP16: Of course it's a software choice, but that means developers need to actively use it. I'm sure we won't see that kind of optimization for every port. So I wonder if nVidia could provide functionality in its API layer that transparently converts selected FP32 render targets and shaders to FP16 without third-party developer intervention. Here is how it would work: before a game comes out, nVidia analyzes the game and provides a profile that tells the runtime what optimizations it can make. This profile is published along with the game or as a system update.
Ummmm, maybe. Would Nintendo pay for this for each title? Seems a stretch. I'm sure FP16 will get decent use, but benefits might be of the order of 15%. Or 3. Or 70. None of us really know! I doubt it's significant though.

Overall, definitely savings to be had with the nVidia GPU architecture. I think RAM and CPU efficiency gains won't amount to much. I'm no expert though. There are engineers here who could weigh in with some real data.
 
It's like GeForce FX is back. All hail FP16 partial precision! Tis the future!! Will AMD go back to being full-time FP24?!?! Stay tuned!

I'm imagining what a 20W 6" tablet might be like. Surface Pro 4 is 25W peak, 15W sustained. It can't sufficiently cool itself for 25W sustained even with a fan and that metal body as a heatsink. Toasty!
 
50% lower than very little is still very little. The DDR3 in XB1 is only consuming a few watts, so negligible savings to be had.
I disagree. The XB1S probably uses around 6-8 W for memory (16 x Samsung SEC 549 K4W4G1646E-BC1A gDDR3) plus memory controller when under load. That's between 12% and 16% of total power consumption for the XB1S. But we are targeting a system that is supposed to consume no more than 15 W when docked and maybe around 4-7 W when mobile. So if Switch only uses 4 GB, that would mean 1.5-2 W, or 10%-13% of the docked power budget going to memory, as opposed to the 40% to 53% an XB1S-style memory setup would eat out of 15 W. I would not call that negligible.
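
Putting those percentages in one place so they're easy to check (all of them are my estimates, not measurements):

```python
# Memory share of the total power budget for each case discussed above.
cases = [
    ("XB1S DDR3 share",              6.0, 8.0, 50.0),
    ("Switch LPDDR4 share, docked",  1.5, 2.0, 15.0),  # assumes ~4 GB LPDDR4
    ("XB1S-class memory in 15 W",    6.0, 8.0, 15.0),
]
for label, lo, hi, total in cases:
    print(f"{label}: {lo / total:.0%}-{hi / total:.0%}")
# XB1S DDR3 share: 12%-16%
# Switch LPDDR4 share, docked: 10%-13%
# XB1S-class memory in 15 W: 40%-53%
```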
But these are SOCs, right? So that's 15W for the A4-5000 including GPU. I don't think there's any relevant data to be extracted here, but I think we're talking just a few watts savings at best again.
Well a few watts is all it takes when you operate in the very low double-digit to single-digit world. Man, I'm glad you don't design mobile devices.
 
It's like GeForce FX is back. All hail FP16 partial precision! Tis the future!! Will AMD go back to being full-time FP24?!?! Stay tuned!

I'm imagining what a 20W 6" tablet might be like. Surface Pro 4 is 25W peak, 15W sustained. It can't sufficiently cool itself for 25W sustained even with a fan and that metal body as a heatsink. Toasty!
Not so fast. The Surface Pro 4 has a huge screen that consumes up to 4.8 watts (http://www.displaymate.com/Surface_Pro4_ShootOut_1.htm#Display_Power) and comes with a Samsung SSD that has a peak power consumption of 6.5 W (http://www.thessdreview.com/our-reviews/samsung-sm951-m-2-pcie-ssd-review-512gb/). I'm pretty sure Switch will have neither. And it won't be 20 W while not docked.
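
Rough arithmetic with those two figures; it ignores Wi-Fi, RAM, VRM losses and so on, so treat it as a ballpark only:

```python
# Rough Surface Pro 4 peak budget from the two linked figures above.
peak_w = 25.0       # Surface Pro 4 peak draw
display_w = 4.8     # DisplayMate max-brightness figure
ssd_peak_w = 6.5    # SM951 peak from the SSD review
print(peak_w - display_w - ssd_peak_w)  # ~13.7 W left for SoC and everything else
```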
 
I do agree that with a platform that will not consume more than 20 watts, saving 3-4 watts is significant. It's all relative, and just a couple of watts can be substantial on a mobile device.

 
I'd expect Xavier to have little more than 2X the GPU performance of Tegra Parker, and still not near the OG PS4 from 2013.

So you expect Nvidia to be all about wasting transistors for no good reason in the future? I mean 5 billion transistor GM204 is over 2x as fast as the PS4, but somehow a 7 billion transistor chip using a next gen architecture built @16FF+ will not even come close to PS4? Seems like a good plan.
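
For reference, the public FP32 figures behind that comparison (GTX 980 as the full GM204 part, launch PS4):

```python
# The rough FP32 ratio behind "over 2x as fast as the PS4".
gtx980_tflops = 4.6   # full GM204 part (~5.2B transistors)
ps4_tflops = 1.84     # launch PS4
print(round(gtx980_tflops / ps4_tflops, 1))  # 2.5
```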
 