Nintendo Switch Tech Speculation discussion

So you expect Nvidia to be all about wasting transistors for no good reason in the future? I mean 5 billion transistor GM204 is over 2x as fast as the PS4, but somehow a 7 billion transistor chip using a next gen architecture built @16FF+ will not even come close to PS4? Seems like a good plan.

That's not a very straight comparison given that Xavier has 8 "custom ARM" cores and if they're all anything like Denver they won't be tiny. It probably also has a fair amount of other peripherals that are needed in an SoC for its product market and irrelevant to a GPU. So a much smaller percentage of those transistors are going to be spent on GPU in Xavier than in GM204.
 
That's not a very straight comparison given that Xavier has 8 "custom ARM" cores and if they're all anything like Denver they won't be tiny. It probably also has a fair amount of other peripherals that are needed in an SoC for its product market and irrelevant to a GPU. So a much smaller percentage of those transistors are going to be spent on GPU in Xavier than in GM204.

So you're telling me that 8 Denver-like cores and a few components needed in a SoC are anywhere near 4-5 billion transistors? See... I find that extremely difficult to believe based on previous SoCs aimed at the same market, which in their entirety were just a fraction of that figure...

Considering the transistor count and the performance figures disclosed, it's a lot more logical to expect a high performance GPU with around 5 TFlops. Tesla P4 already offers 5.5 TF in a 50-75W power envelope and it's based on a chip that was not designed for such low power figures and uses GDDR5 instead of low power memory. A chip designed for best perf/w at around 20W from the get go (vs a chip arguably designed for best perf/w at >5x the power) could be able to hit that performance, more so when using a new GPU architecture that is said to be a lot more efficient. The only argument I've ever seen against such performance is the 512 core count, which is completely meaningless without knowing the nature of those cores.
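For scale, here is the back-of-the-envelope perf/W behind those numbers (a minimal sketch; the Tesla P4 figures are the ones quoted above, and the ~20 W target is this post's assumption):

```python
# Rough perf/W of the figures quoted above (illustrative only; not measured data).
tesla_p4_tflops = 5.5
for watts in (50, 75):                       # quoted Tesla P4 power envelope
    print(f"Tesla P4 @ {watts} W: {tesla_p4_tflops * 1e3 / watts:.0f} GFLOPS/W")
# -> roughly 73-110 GFLOPS/W

# The ~5 TFLOPS in ~20 W scenario argued for above would imply:
print(f"5 TF @ 20 W: {5.0 * 1e3 / 20:.0f} GFLOPS/W")   # 250 GFLOPS/W
```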

It's one thing to be cautious about PR claims, but this is being in complete denial...
 
So you're telling me that 8 Denver-like cores and a few components needed in a SoC are anywhere near 4-5 billion transistors? See... I find that extremely difficult to believe based on previous SoCs aimed at the same market, which in their entirety were just a fraction of that figure...

Considering the transistor count and the performance figures disclosed, it's a lot more logical to expect a high performance GPU with around 5 TFlops. Tesla P4 already offers 5.5 TF in a 50-75W power envelope and it's based on a chip that was not designed for such low power figures and uses GDDR5 instead of low power memory. A chip designed for best perf/w at around 20W from the get go (vs a chip arguably designed for best perf/w at >5x the power) could be able to hit that performance, more so when using a new GPU architecture that is said to be a lot more efficient. The only argument I've ever seen against such performance is the 512 core count, which is completely meaningless without knowing the nature of those cores.

It's one thing to be cautious about PR claims, but this is being in complete denial...

I didn't give any speculation on just how large Xavier's GPU is, I just said that you can't compare an SoC of this nature with a GPU. SoCs like A9X are pretty GPU heavy, but the GPU still only takes up about half of the die, and this is without having 8 CPU cores. Even nVidia's own marketing chip drawing shows the GPU taking up about half the SoC (https://blogs.nvidia.com/blog/2016/09/28/xavier/), but who knows how meaningful that is. I also don't think Xavier and GM204 are actually aimed at the same market.

I also disagree that the 512 CUDA core count is completely meaningless. nVidia would have to be pretty wildly changing their design philosophy for CUDA cores to have a substantially higher FLOP count. Which would kind of change the whole meaning behind the "CUDA" part.

And transistors alone are hardly an indicator of peak performance. More transistors can improve perf/W, especially for GPUs. That wouldn't make them wasted.
 
I didn't give any speculation on just how large Xavier's GPU is, I just said that you can't compare an SoC of this nature with a GPU. SoCs like A9X are pretty GPU heavy, but the GPU still only takes up about half of the die, and this is without having 8 CPU cores. Even nVidia's own marketing chip drawing shows the GPU taking up about half the SoC (https://blogs.nvidia.com/blog/2016/09/28/xavier/), but who knows how meaningful that is. I also don't think Xavier and GM204 are actually aimed at the same market.

Haha. So now we believe in those Nvidia drawings?? Anyway, I'm pretty sure that desktop chips also have many of the blocks present in the SoCs, like video processor, I/O, etc. It's not fair to compare the GPU block on a SoC, which is at least lacking the memory controller and L2 cache, to a GPU that includes memory controllers, L2 cache, I/O and everything else. If you look at a drawing of GM204 the "GPC"s also take around 50% of the GPU.

I chose a 5 billion transistor GPU for comparison, instead of a 7 billion transistor GPU like GP104, for a reason. Considering that 2 billion transistors is probably as much as the entire Tegra X1 (GPU included) and twice a Tegra K1, and the Apple A9X is around 3 billion transistors, I consider giving 2 billion for the CPU and "extra stuff" pretty fair. Even if it's not, and the GPU is in fact just 50% of the chip, does that really change my general point though? I don't think so. We are talking about at least 4x less performance per mm² than it should have either way...
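Spelled out, the transistor budget being assumed there (a rough tally; the 2 billion CPU/"extra stuff" split is this post's assumption, not a disclosed figure):

```python
# Transistor budget sketch, in billions. Only the 7.0 Xavier total and the
# ~5 billion GM204 figure are public; the split below is the assumption above.
xavier_total   = 7.0
cpu_and_extras = 2.0                         # assumed: 8 custom cores + SoC blocks
gpu_budget     = xavier_total - cpu_and_extras
print(gpu_budget)                            # ~5.0 billion, i.e. GM204-class

# Even under the more conservative "GPU is only half the die" reading:
print(xavier_total * 0.5)                    # 3.5 billion transistors for the GPU
```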

I also disagree that the 512 CUDA core count is completely meaningless. nVidia would have to be pretty wildly changing their design philosophy for CUDA cores to have a substantially higher FLOP count. Which would kind of change the whole meaning behind the "CUDA" part.

Why would it change the meaning behind the CUDA part? Wouldn't those still be designed to execute CUDA code? And why exactly wouldn't Nvidia have changed their design philosophy? You're sounding a lot like Charlie D. before the launch of G80...

And transistors alone are hardly an indicator of peak performance. More transistors can improve perf/W, especially for GPUs. That wouldn't make them wasted.

We are talking about a factor of 4x. I've never seen a precedent. Have you?
 
I disagree. The XB1S probably uses around 6-8 W for memory (16 x Samsung SEC 549 K4W4G1646E-BC1A gDDR3) plus memory controller when under load. That's between 12% and 16% of total power consumption for the XB1S. But we are targeting a system that is supposed to consume no more than 15 W when docked and maybe around 4-7 W when mobile. So if Switch only uses 4 GB that would mean 1.5-2 W or 10%-13% power needed for memory when in docked mode as opposed to 40% to 53%. I would not call that negligible.
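The percentages there follow from straightforward division; a quick sketch of the arithmetic (all wattage figures are this post's estimates, not measurements):

```python
# Arithmetic behind the percentages above (all wattages are estimates).
xb1s_total_w  = 50.0           # implied XB1S load draw (6 W / 12% = 8 W / 16%)
xb1s_mem_w    = (6.0, 8.0)     # estimated gDDR3 + memory controller draw
dock_budget_w = 15.0           # assumed docked Switch power budget
switch_mem_w  = (1.5, 2.0)     # assumed 4 GB LPDDR draw

print([f"{w / xb1s_total_w:.0%}" for w in xb1s_mem_w])     # ['12%', '16%']
print([f"{w / dock_budget_w:.0%}" for w in switch_mem_w])  # ['10%', '13%']
print([f"{w / dock_budget_w:.0%}" for w in xb1s_mem_w])    # ['40%', '53%']
```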
Well a few watts is all it takes when you operate in the very low double-digit to single-digit world. Man, I'm glad you don't design mobile devices.

Was there any indication it will have a significant overclock when docked?
I mean where is this coming from?

The majority of memory power consumption usually comes from the interface, not the amount of memory, so having 4GB instead of 8GB doesn't change the figure much. Unless the interface is also half as wide and half as fast. Which it probably is, though.
 
Besides 8 custom CPU cores, Xavier also includes a "custom vision accelerator", which I have seen near 0 information on, but presumably could account for non-trivial fractions of the transistor count, deep learning TOPS, and power budget. Given all those unknowns, I'm not sure how taking the 512 GPU core count to mean the usual 1024 fp32 / 2048 fp16 floating point operations per cycle is unreasonable or evidence of wasted transistors.

To get to > PS4 level TFLOP numbers for the GPU portion of Xavier, it seems like you have to assume those 512 cores are clocked really high (which seems at odds with the SoC's low power targets), or that each core does more than the usual 2 floating point operations (1 FMA) per cycle. It's a new architecture, so that can't be ruled out... but I would argue that what nVidia has historically called a "CUDA core" is a glorified fp32 FMA ALU. For them to call more than one of these a "CUDA core" would be a drastic change, and would basically mean that marketing could have claimed 2x or more of the core count, but didn't.
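For reference, a small sketch of the clock 512 conventional CUDA cores (one FMA, i.e. 2 FLOPs per cycle each) would need to reach a given FP32 target; the targets below are just illustrative:

```python
# Clock needed for 512 conventional CUDA cores (1 FMA = 2 FLOPs per cycle)
# to hit a given FP32 throughput. Targets below are illustrative.
def required_clock_ghz(target_tflops, cores=512, flops_per_cycle=2):
    return target_tflops * 1e12 / (cores * flops_per_cycle) / 1e9

print(required_clock_ghz(1.84))   # PS4-class FP32: ~1.8 GHz
print(required_clock_ghz(1.0))    # 1 TFLOPS FP32:  ~0.98 GHz
# Double-rate FP16 (as on Tegra X1) would halve these clocks for FP16 throughput.
```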
 
For all we know Xavier could have a much larger than usual amount of SRAM soaking up transistors. It wouldn't be that out of place. The video encode/decode is also pretty high end.

With the way nVidia is proposing these "deep learning" operations it at least sounds like they're pushing this for more applications than what their GPUs are optimized for. Maybe this is just marketing BS, or maybe they really did add a lot of stuff that goes beyond their typical GPU stuff.
 
Not so fast. The Surface Pro 4 has a huge screen that consumes up to 4.8 watts (http://www.displaymate.com/Surface_Pro4_ShootOut_1.htm#Display_Power). And it comes with a Samsung SSD that has a peak power consumption of 6.5 W (http://www.thessdreview.com/our-reviews/samsung-sm951-m-2-pcie-ssd-review-512gb/). I'm pretty sure Switch will have neither. And it won't be 20 W while not docked.
I'm only referring to the package power of the Skylake SoC. You can read the chip's various sensors with software. The tablet basically doesn't get warm and the fan is always off unless you are pegging that chip, in which case it gets very toasty and noisy. It can pull 25W for about 10 minutes, then gradually reduces to 15W as the skin temperature of the tablet increases. I read the skin temp limit is about 40C and I would say that's probably correct. You don't really want to be holding it when it's like that.
 
Let's see what the official nVidia website says:
Nintendo Switch is powered by the performance of the custom Tegra processor. The high-efficiency scalable processor includes an NVIDIA GPU based on the same architecture as the world’s top-performing GeForce gaming graphics cards.
If I'm not mistaken the world's top-performing GeForce gaming graphics cards are currently based on Pascal architecture. :runaway:
 
Let's see what the official nVidia website says:
If I'm not mistaken the world's top-performing GeForce gaming graphics cards are currently based on Pascal architecture. :runaway:

"The 980ti (Maxwell) is still the third fastest GPU in the world, which certainly qualifies as "top-performing GeForce gaming graphics cardS" (emphasis on the plural S). This quote means and says absolutely nothing. If anything it suggests it's NOT Pascal, else they would have used the more boisterous singular--top or fastest "card" in the world. Come to think of it, the plural vaugery is actually very out of character for Nvidia, they speak in absolutes (best, fastest, most advanced, etc...) In limiting the subset to Geforce cards and using the plural, I think this may very well be confirmation it's Maxwell. It makes no sense to have not to have used the singular if it were Pascal, Volta has no bearing on what they say or do in October of 2016. It doesn't even exist".
 
I disagree. The XB1S probably uses around 6-8 W for memory (16 x Samsung SEC 549 K4W4G1646E-BC1A gDDR3) plus memory controller when under load. That's between 12% and 16% of total power consumption for the XB1S. But we are targeting a system that is supposed to consume no more than 15 W when docked and maybe around 4-7 W when mobile. So if Switch only uses 4 GB that would mean 1.5-2 W or 10%-13% power needed for memory when in docked mode as opposed to 40% to 53%. I would not call that negligible.
Well a few watts is all it takes when you operate in the very low double-digit to single-digit world. Man, I'm glad you don't design mobile devices.
I'm talking in regard to your maths that you can get from 50 watts down to 15 by improvements in RAM and CPU. Savings from RAM are a few watts. Savings from CPU are a few watts. So for your argument, negligible changes. Okay, not quite negligible, but far from the large savings you are talking about finding.
 
Was there any indication it will have a significant overclock when docked?
I mean where is this coming from?
I think there are two sources, the more obvious one being the difference between the X1 in the Pixel C (mobile, without fan) and the Shield TV (stationary, small quiet fan). Since the Switch dev kit rumours indicate that it is a fan cooled Tegra X1, that describes essentially a Shield TV in performance and power draw. Either that is a placeholder for a cooler running FinFET SoC, or it indicates that it will drop clocks when mobile for battery life reasons, or both.
The Switch shown in the reveal trailer does seem to have vents for forced air cooling.
Thus, when the Switch is docked, it won't have to power the screen, it won't be limited by battery power, and it seems it will have access to forced air cooling. Ergo....
 
I'm talking in regard to your maths that you can get from 50 watts down to 15 by improvements in RAM and CPU. Savings from RAM are a few watts. Savings from CPU are a few watts. So for your argument, negligible changes. Okay, not quite negligible, but far from the large savings you are talking about finding.
Well, obviously you can't read very well, because I never said anything about bringing 50 W down to 15 W. I looked at the XB1S theoretical performance per watt (28 GFlops FP32 / W) and guesstimated how efficient Switch might end up because of nVidia's more power efficient GPU cores, more power efficient ARM CPU and more power efficient memory. I speculated that it could end up twice as efficient as XB1S (edit: per watt), i.e. 56 GFlops FP32 / W. Further, I speculated that a docked Switch might have a power consumption of 15 W, ending up with 15 * 56 = 840 GFlops, i.e. 0.84 TFlops FP32 or 60% of XB1S in that case.
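Written out, that guesstimate is simply (every input is the speculative figure above, not a known spec):

```python
# The guesstimate above, written out. Every input is speculative.
xb1s_gflops_per_w   = 1400 / 50              # ~28 GFLOPS FP32 per W (1.4 TF over ~50 W)
switch_gflops_per_w = xb1s_gflops_per_w * 2  # assumed 2x efficiency -> 56 GFLOPS/W
docked_watts        = 15                     # assumed docked power budget
docked_gflops       = switch_gflops_per_w * docked_watts
print(docked_gflops, docked_gflops / 1400)   # 840 GFLOPS, i.e. ~60% of XB1S
```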

However, you never provided any facts or hard data showing that my speculation was wrong. You just wrote something about a few watts being negligible in a system that consumes 15 W at most. Good job!

Lastly, let me quote myself:
Please note that this is speculation and only meant to provide a rough estimate of what Switch might achieve best case. Don't hate me for it.
 
http://www.neogaf.com/forum/showpost.php?p=222110731&postcount=1032

More rumors from Emily Rogers, a very reliable source, backing Eurogamer in saying it will be a Tegra X1.
Here we go again. She explicitly says it is not the Tegra X1, but something "similar", whatever the hell that might mean.
I think it is generally ill-advised to draw too specific conclusions from a devkit, but the way you misquote people to try to make definite claims crosses over into something else.
 
Here we go again. She explicitly says it is not the Tegra X1, but something "similar", whatever the hell that might mean.
I think it is generally ill-advised to draw too specific conclusions from a devkit, but the way you misquote people to try to make definite claims crosses over into something else.

Her exact quote is:

"I was told before that Nvidia's custom Tegra chip is pretty similar to Tegra X1. So these specs might not be farfetched."

I don't know how anybody can take that as not being a TX1; she also talks about the leaked specs not being farfetched. If she wanted to describe a Pascal/TX2 she would describe it as similar to the TX1 but more powerful and more efficient. You also have another insider backing the leaked specs on NeoGAF.
 
Regarding speculation into where a new Tegra chip could end up in terms of performance, the only thing I'd like to say is that the iPad Pro has been using a 128-bit LPDDR4 interface at 3200MHz in a fanless setting for a year now. This is also the memory interface of Parker. So such an interface, at even higher clocks (supplied by Samsung), is definitely possible.
Whether it is likely is another matter entirely.
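For context, the peak bandwidth such an interface provides (simple arithmetic; the 4266 MT/s case is just an illustrative "even higher clock"):

```python
# Peak bandwidth of a 128-bit LPDDR4 interface at a given transfer rate.
def bandwidth_gb_s(bus_bits, mega_transfers_per_s):
    return bus_bits / 8 * mega_transfers_per_s / 1000   # bytes/transfer * GT/s

print(bandwidth_gb_s(128, 3200))   # iPad Pro / Parker class: 51.2 GB/s
print(bandwidth_gb_s(128, 4266))   # an illustrative higher bin: ~68 GB/s
```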
 
Her exact quote is:

"I was told before that Nvidia's custom Tegra chip is pretty similar to Tegra X1. So these specs might not be farfetched."

I don't know how anybody can take that as not being a TX1; she also talks about the leaked specs not being farfetched. If she wanted to describe a Pascal/TX2 she would describe it as similar to the TX1 but more powerful and more efficient. You also have another insider backing the leaked specs on NeoGAF.
The leaked specs of the dev kit. DEV KIT! As opposed to FINAL PRODUCT!
 