Nintendo Switch Tech Speculation discussion

Discussion in 'Console Technology' started by Deleted member 13524, Oct 20, 2016.

  1. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    That's not a very direct comparison given that Xavier has 8 "custom ARM" cores, and if they're all anything like Denver they won't be tiny. It probably also has a fair amount of other peripherals that are needed in an SoC for its product market and irrelevant to a GPU. So a much smaller percentage of those transistors are going to be spent on GPU in Xavier than in GM204.
     
  2. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    So you're telling me that 8 Denver-like cores and a few components needed in an SoC add up to anywhere near 4-5 billion transistors? See... I find that extremely difficult to believe based on previous SoCs aimed at the same market, which in their entirety were just a fraction of that figure...

    Considering the transistor count and the performance figures disclosed, it's a lot more logical to expect a high performance GPU with around 5 TFlops. Tesla P4 already offers 5.5 TF in a 50-75W power envelope, and it's based on a chip that was not designed for such low power figures and uses GDDR5 instead of low power memory. A chip designed from the get-go for best perf/W at around 20W (vs a chip arguably designed for best perf/W at >5x the power) could hit that performance, more so when using a new GPU architecture that is said to be a lot more efficient. The only argument I've ever seen against such performance is the 512 core count, which is completely meaningless without knowing the nature of those cores.
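
    (A quick numerical sketch of that scaling argument; the ~20W GPU budget is the post's assumption, and linear scaling from P4 is only a baseline:)

```python
# Back-of-the-envelope perf/W scaling. Illustrative only: the 20 W
# budget and linear scaling are assumptions, not disclosed figures.
p4_tflops = 5.5          # Tesla P4 peak FP32
p4_watts = 75.0          # upper end of its 50-75 W envelope
xavier_gpu_watts = 20.0  # assumed GPU share of the power budget

perf_per_watt = p4_tflops / p4_watts                 # ~0.073 TFLOPS/W
linear_estimate = perf_per_watt * xavier_gpu_watts   # ~1.5 TFLOPS

# The post's claim is that a design targeted at ~20 W from the start,
# on a newer architecture, beats linear scaling. Hitting ~5 TFLOPS
# would need roughly this efficiency gain over P4:
print(f"linear baseline: {linear_estimate:.1f} TFLOPS")       # 1.5
print(f"gain needed for 5 TF: {5.0 / linear_estimate:.1f}x")  # ~3.4x
```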

    It's one thing to be cautious about PR claims, but this is complete denial...
     
  3. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    I didn't give any speculation on just how large Xavier's GPU is; I just said that you can't compare an SoC of this nature with a GPU. SoCs like A9X are pretty GPU heavy, but the GPU still only takes up about half of the die, and that's without having 8 CPU cores. Even nVidia's own marketing chip drawing shows the GPU taking up about half the SoC (https://blogs.nvidia.com/blog/2016/09/28/xavier/), but who knows how meaningful that is. I also don't think Xavier and GM204 are actually aimed at the same market.

    I also disagree that the 512 CUDA core count is completely meaningless. nVidia would have to change their design philosophy pretty wildly for CUDA cores to have a substantially higher FLOP count, which would kind of change the whole meaning behind the "CUDA" part.

    And transistors alone are hardly an indicator of peak performance. More transistors can improve perf/W, especially for GPUs. That wouldn't make them wasted.
     
  4. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    Haha. So now we believe in those Nvidia drawings?? Anyway, I'm pretty sure that desktop chips also have many of the blocks present in SoCs, like the video processor, I/O, etc. It's not fair to compare the GPU block on an SoC, which at the very least lacks the memory controller and L2 cache, to a full GPU that includes memory controllers, L2 cache, I/O and everything else. If you look at a die shot of GM204, the GPCs also take up around 50% of the chip.

    I chose a 5 billion transistor GPU for comparison, instead of a 7 billion transistor GPU like GP104, for a reason. Considering that 2 billion transistors is probably as much as the entire Tegra X1, GPU included, and twice Tegra K1, and Apple A9X is around 3 billion transistors, I consider giving 2 billion to the CPU and "extra stuff" pretty fair. Even if it's not, and the GPU is in fact just 50% of the chip, does that really change my general point, though? I don't think so. We are talking about at least 4x less performance per mm² than there should be either way...
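
    (Putting rough numbers on that factor-of-4 claim; a sketch where the Xavier GPU clock is purely an assumption:)

```python
# Rough FP32-per-transistor comparison. Transistor counts are the
# publicly cited figures; the Xavier clock is assumed for illustration.
gm204 = {"transistors": 5.2e9, "cores": 2048, "ghz": 1.126}  # GTX 980
xavier = {"transistors": 7.0e9, "cores": 512, "ghz": 1.3}    # clock assumed

def tflops(chip):
    # 2 FLOPs (one FMA) per CUDA core per cycle
    return chip["cores"] * 2 * chip["ghz"] / 1000

ratio = (tflops(gm204) / gm204["transistors"]) / (tflops(xavier) / xavier["transistors"])
print(f"GM204 ~{tflops(gm204):.1f} TF vs Xavier ~{tflops(xavier):.1f} TF")
print(f"FP32 per transistor: ~{ratio:.1f}x in GM204's favour")  # ~4.7x
```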

    Why would it change the meaning behind the CUDA part? Wouldn't those still be designed to execute CUDA code? And why exactly wouldn't Nvidia have changed their design philosophy? You're sounding a lot like Charlie D. before the launch of G80...

    We are talking about a factor of 4x. I've never seen a precedent. Have you?
     
  5. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    Was there any indication it will have a significant overclock when docked?
    I mean where is this coming from?

    The majority of memory power consumption usually comes from the interface, not the amount of memory; having 4GB instead of 8GB doesn't change the figure much. Unless it's half as wide and half as fast. Which it probably is, though.
     
    BRiT likes this.
  6. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    Besides 8 custom CPU cores, Xavier also includes a "custom vision accelerator", on which I have seen next to no information, but which presumably could account for non-trivial fractions of the transistor count, deep learning TOPS, and power budget. Given all those unknowns, I'm not sure how taking the 512 GPU core count to mean the usual 1024 fp32 / 2048 fp16 floating point operations per cycle is unreasonable or evidence of wasted transistors.

    To get to >PS4-level TFLOP numbers for the GPU portion of Xavier, it seems like you have to assume those 512 cores are clocked really high (which seems at odds with the SoC's low power targets), or that each core does more than the usual 2 floating point operations (1 FMA) per cycle. It's a new architecture, so that can't be ruled out... but I would argue that what NVidia has historically called a "CUDA core" is a glorified fp32 FMA ALU. For them to call more than one of these a "CUDA core" would be a drastic change, and would basically mean that marketing could have claimed 2x or more of the core count, but didn't.
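
    (That conventional arithmetic as a sketch; the clocks here are placeholders, not leaked figures:)

```python
# One "CUDA core" has historically meant one fp32 FMA per cycle,
# conventionally counted as 2 floating point operations.
def peak_gflops(cuda_cores, clock_ghz, flops_per_core_cycle=2):
    return cuda_cores * flops_per_core_cycle * clock_ghz

print(peak_gflops(512, 1.5))  # 1536.0 GFLOPS fp32 -- below PS4's ~1840
# Clock needed for 512 cores just to match PS4's ~1.84 TFLOPS:
print(1840 / (512 * 2))       # ~1.8 GHz, very high for a low-power SoC
```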
     
  7. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    For all we know Xavier could have a much larger than usual amount of SRAM soaking up transistors. It wouldn't be that out of place. The video encode/decode is also pretty high end.

    With the way nVidia is proposing these "deep learning" operations, it at least sounds like they're pushing this for more applications than what their GPUs are optimized for. Maybe this is just marketing BS, or maybe they really did add a lot of hardware that goes beyond their typical GPU designs.
     
  8. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,044
    Likes Received:
    1,116
    Location:
    WI, USA
    I'm only referring to the package power of the Skylake SoC. You can read the chip's various sensors with software. The tablet basically doesn't get warm and the fan is always off unless you are pegging that chip, in which case it gets very toasty and noisy. It can pull 25W for about 10 minutes, then gradually reduces to 15W as the skin temperature of the tablet increases. I've read that the skin temperature limit is about 40C, and I would say that's probably correct. You don't really want to be holding it when it's like that.
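
    (A minimal sketch of one way to read that package power in software, assuming a Linux machine with Intel's RAPL counters exposed through sysfs; Windows monitoring tools read the equivalent sensors:)

```python
import time

# Intel RAPL exposes a cumulative package energy counter in microjoules.
# The domain index can vary between machines.
RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_uj():
    with open(RAPL) as f:
        return int(f.read())

# Sample the counter over a second to estimate average package watts.
e0, t0 = read_uj(), time.time()
time.sleep(1.0)
e1, t1 = read_uj(), time.time()
print(f"package power: {(e1 - e0) / 1e6 / (t1 - t0):.1f} W")
```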
     
    #428 swaaye, Oct 30, 2016
    Last edited: Oct 30, 2016
  9. bunnybug

    Regular

    Joined:
    Oct 25, 2016
    Messages:
    280
    Likes Received:
    168
    BRiT likes this.
  10. cheapchips

    Veteran

    Joined:
    Feb 23, 2013
    Messages:
    2,493
    Likes Received:
    2,665
    Location:
    UK
  11. bunnybug

    Regular

    Joined:
    Oct 25, 2016
    Messages:
    280
    Likes Received:
    168
    Yeah, "custom" could mean any minor tweak.
    Custom usually means slightly modified, sometimes even weaker.
     
    BRiT likes this.
  12. N00b

    Regular

    Joined:
    Mar 11, 2005
    Messages:
    698
    Likes Received:
    114
    Let's see what the official nVidia website says:
    If I'm not mistaken the world's top-performing GeForce gaming graphics cards are currently based on Pascal architecture. :runaway:
     
  13. bunnybug

    Regular

    Joined:
    Oct 25, 2016
    Messages:
    280
    Likes Received:
    168
     
    Goodtwin likes this.
  14. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,104
    Likes Received:
    16,896
    Location:
    Under my bridge
    I'm talking about your maths claiming you can get from 50 watts down to 15 through improvements in RAM and CPU. Savings from RAM are a few watts. Savings from CPU are a few watts. So for your argument, negligible changes. Okay, not quite negligible, but far from the large savings you are talking about finding.
     
    BRiT likes this.
  15. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    I think there are two sources, the more obvious one being the difference between the X1 in the Pixel C (mobile, without fan) and the Shield TV (stationary, small quiet fan). Since the Switch dev kit rumours indicate that it is a fan-cooled Tegra X1, that describes essentially a Shield TV in performance and power draw. Either that is a placeholder for a cooler-running FinFET SoC, or it indicates that it will drop clocks when mobile for battery life reasons, or both.
    The Switch shown in the reveal trailer does seem to have vents for forced air cooling.
    Thus, when the Switch is docked, it won't have to power the screen, it won't be limited by battery power, and it seems it will have access to forced air cooling. Ergo....
     
    BRiT likes this.
  16. N00b

    Regular

    Joined:
    Mar 11, 2005
    Messages:
    698
    Likes Received:
    114
    Well, obviously you can't read very well, because I never said anything about bringing 50 W down to 15 W. I looked at the XB1S theoretical performance per watt (28 GFlops FP32 / W) and guesstimated how efficient Switch might end up given nVidia's more power efficient GPU cores, more power efficient ARM CPU and more power efficient memory. I speculated that it could end up twice as efficient as XB1S (edit: per watt), i.e. 56 GFlops FP32 / W. Further, I speculated that a docked Switch might have a power consumption of 15 W, ending up with 15 * 56 = 840 GFlops FP32, i.e. 0.84 TFlops or 60% of XB1S in that case.
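
    (That chain of assumptions, spelled out; every input is the post's own speculation:)

```python
# All figures are the post's speculative inputs, not measurements.
xb1s_gflops = 1400.0               # XB1S peak FP32
xb1s_eff = 28.0                    # GFLOPS/W, implying ~50 W (1400 / 28)

switch_eff = 2 * xb1s_eff          # assumed 2x efficiency: 56 GFLOPS/W
switch_watts = 15.0                # assumed docked power draw
switch_gflops = switch_eff * switch_watts

print(switch_gflops)                # 840.0 GFLOPS = 0.84 TFLOPS
print(switch_gflops / xb1s_gflops)  # 0.6 -> 60% of XB1S
```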

    However, you never provided any facts or hard data showing that my speculation was wrong. You just wrote something about a few watts being negligible in a system that consumes 15 W at most. Good job!

    Lastly, let me quote myself:
     
  17. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Here we go again. She explicitly says it is not the Tegra X1, but something "similar", whatever the hell that might mean.
    I think it is generally ill-advised to draw overly specific conclusions from a devkit, but the way you misquote people to try to make definite claims crosses over into something else.
     
  18. bunnybug

    Regular

    Joined:
    Oct 25, 2016
    Messages:
    280
    Likes Received:
    168
    Her exact quote is:

    "I was told before that Nvidia's custom Tegra chip is pretty similar to Tegra X1. So these specs might not be farfetched."

    I don't know how anybody can take that as not being a TX1, and she also says the leaked specs might not be farfetched. If she wanted to describe a Pascal/TX2, she would have described it as similar to the TX1 but more powerful and efficient. You also have another insider on NeoGAF backing the leaked specs.
     
    Goodtwin and BRiT like this.
  19. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Regarding speculation about where a new Tegra chip could end up in terms of performance, the only thing I'd like to say is that the iPad Pro has used a 128-bit LPDDR4 interface at 3200MHz, in a fanless setting, for a year now. This is also the memory interface of Parker. So such an interface, at even higher clocks (supplied by Samsung), is definitely possible.
    Whether it is likely is another matter entirely.
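
    (For context, the peak bandwidth such an interface implies; the higher-clocked figure below is just an example of the speculated faster part:)

```python
# Standard DDR bandwidth arithmetic: bus width in bytes x transfer rate.
def peak_bandwidth_gbs(bus_bits, mt_per_s):
    return bus_bits / 8 * mt_per_s / 1000

print(peak_bandwidth_gbs(128, 3200))  # 51.2 GB/s -- iPad Pro / Parker class
print(peak_bandwidth_gbs(128, 4266))  # 68.3 GB/s -- example of a faster part
```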
     
  20. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    The leaked specs of the dev kit. DEV KIT! As opposed to FINAL PRODUCT!
     