Nintendo Switch Tech Speculation discussion

16 MB is enough for 720p forward rendering with no MSAA: a 720p RGBA16F color buffer + 32-bit Z buffer = 11 MB. But the same cache is also used for texture sampling and geometry (vertex and index buffer loads), so a 16 MB cache will thrash for sure. A 32 MB cache would be much better. Intel chose 64 MB and 128 MB for their Crystalwell GT4e cache sizes. Intel integrated chips are roughly in the same ballpark in performance and are mostly used to play games at 720p, so the comparison is relevant.
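A quick back-of-the-envelope sketch of that 11 MB figure (assuming the buffer formats stated above, nothing else):

```python
# Render target footprint at 1280x720, RGBA16F colour + 32-bit depth.
width, height = 1280, 720
color_bytes = width * height * 8      # RGBA16F = 8 bytes/pixel -> ~7.0 MiB
depth_bytes = width * height * 4      # D32     = 4 bytes/pixel -> ~3.5 MiB
total_mib = (color_bytes + depth_bytes) / 2**20
print(round(total_mib, 1))            # ~10.5 MiB, i.e. the ~11 MB quoted above,
                                      # leaving only ~5 MiB of a 16 MiB cache for everything else
```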

A 16 MB cache would take less die space than Xbox One's 32 MB ESRAM scratchpad, but not dramatically less, as cache logic and tags take plenty of space. It would also be somewhat slower (latency-wise) than current sub-2 MB GPU L2 caches, which would likely affect performance a bit (global atomics at least). It would still be a huge perf boost assuming only 25.6 GB/s of memory bandwidth.

I hadn't thought about Intel's IGP as a point of comparison - thanks for bringing it up! However, I feel like the thrashing issue could be addressed without expanding capacity to match GT4e (or moving to a scratchpad), e.g. by allowing cache ways to be allocated based on usage - basically Broadwell's L3 cache partitioning (CAT) but for GPUs (http://danluu.com/intel-cat/ gives an overview, and section 17.17 in http://www.intel.com/content/dam/ww...s-software-developer-vol-3b-part-2-manual.pdf has the details).
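To make the way-allocation idea concrete, here's a minimal sketch of CAT-style capacity bitmasks applied to an imaginary 16 MiB, 16-way GPU LLC. The cache geometry, the masks, and the render-target vs. texture split are made-up numbers for illustration only, not anything known about actual hardware:

```python
# Hypothetical way-based partitioning in the spirit of Intel CAT,
# applied to an assumed 16 MiB, 16-way GPU last-level cache.
CACHE_MIB, WAYS = 16, 16
WAY_MIB = CACHE_MIB / WAYS            # 1 MiB per way

def partition_mib(way_mask: int) -> float:
    """Capacity reserved by a capacity bitmask (one bit per way), CAT-style."""
    return bin(way_mask).count("1") * WAY_MIB

rt_mask  = 0b1111111111100000         # 11 ways pinned for colour + depth targets
tex_mask = 0b0000000000011111         # 5 ways left for texture/vertex/index streams
print(partition_mib(rt_mask), partition_mib(tex_mask))   # 11.0 MiB vs 5.0 MiB
```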

I'm curious though - under the assumption that Switch is successful and becomes a specific optimization target for game devs, would you prefer a large cache (or scratchpad), or would you rather have more compute throughput and a smaller cache that requires tailored software to really shine?
 
There are multiple questions regarding the memory that I have yet to see answered. Is the Tegra X1 memory starved to begin with? I know 25 GB/s sounds low, but with Maxwell's compression techniques and the tiled rasterization, perhaps it's not an issue, especially when we are seeing clock speeds reduced. Assuming this is a bottleneck, is it more cost efficient to go with eSRAM than switching to a 128-bit bus like they did with Parker? Whatever is cheaper is likely the route Nintendo would go. I would have to think that Nintendo wants to make sure die shrinks are not held back because of memory this time. The quicker they can move to more cost-effective manufacturing processes, the quicker they can either make more money per unit or drop the price. Nintendo has historically gone with small pools of fast memory, so it wouldn't surprise me if there is 32 MB of eSRAM in there, but I haven't seen benchmarks on the Tegra X1 that show obvious bottlenecks with the memory. After all, Doom 3 BFG Edition runs at 1080p 60 fps on the Shield TV.
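For what it's worth, the 25.6 GB/s figure falls straight out of the X1's 64-bit LPDDR4 interface. A rough sketch of how the bus-width options compare, assuming LPDDR4-3200 in both cases (Parker's actual memory clock may differ):

```python
def lpddr4_peak_gbs(bus_bits: int, mts: int = 3200) -> float:
    """Peak bandwidth = transfers per second x bytes moved per transfer."""
    return mts * 1e6 * (bus_bits // 8) / 1e9

print(lpddr4_peak_gbs(64))    # 25.6 GB/s -> Tegra X1-style 64-bit bus
print(lpddr4_peak_gbs(128))   # 51.2 GB/s -> Parker-style 128-bit bus
```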

After the "reveal", I had the impression they use a 100% stock Tegra X1. At most there would be the clocks/power curve built into the firmware or microcode, unused parts like PCIe lanes fused off and that wouldn't count as a custom or different CPU at all.

Before the reveal, I assumed it was different, even if just so that it could be 5% more efficient or something.
In the old days, even if a console vendor used something incredibly plain like a Z80, it would be a custom version where they tried to remove a couple of CPU pins, added a timer or two, even removed and/or added a few CPU instructions.
Today, masks, testing, validation, etc. might cost enough zillions of dollars, the SoC doesn't need features added, designing a new package with a few fewer pins might not be worth it, and I guess there's a nice list of hardware bugs for the vanilla chip that makes it well known and well supported.
 
Well, NX should have about 1/4 to 1/5 of the PS4's shader/compute throughput and about 1/3 of its CPU. Bandwidth is about 1/7 going by the raw numbers, so comparatively that doesn't look great.
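Roughly where those ratios come from (the PS4 figures are the public specs; the NX numbers use the rumoured docked clocks, so treat them as assumptions):

```python
# PS4: 18 CUs x 64 ALUs x 2 ops/clk x 0.8 GHz; NX rumour: 2 Maxwell SMs (256 ALUs) x 2 ops/clk x 768 MHz.
ps4_gpu_tf = 18 * 64 * 2 * 0.8 / 1000      # ~1.84 TFLOPS
nx_gpu_tf  = 256 * 2 * 0.768 / 1000        # ~0.39 TFLOPS
ps4_bw, nx_bw = 176.0, 25.6                # GB/s (GDDR5 vs. rumoured 64-bit LPDDR4)
print(ps4_gpu_tf / nx_gpu_tf)              # ~4.7x -> "about 1/4 to 1/5"
print(ps4_bw / nx_bw)                      # ~6.9x -> "about 1/7"
```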

But it's highly likely that tiled rasterisation helps significantly - NX doesn't stand a chance at hitting peak pixel fill rates, but as folks in the GPU forums have explained, having more ROPs means more effective tiling, so it's possible that the 16 ROPs are helping BW-limited performance even in situations where there isn't a hope of hitting peak fill due to external BW.

My highly uneducated guess would be that main memory BW isn't a limitation in handheld mode - probably the primary focus of the system, going by Nintendo's handheld vs home HW and SW sales - but that it may become something developers need to consider carefully if they intend to scale their 720p game up to 1080p while docked.
Around 50% of modern game engine frame time goes to running compute shaders (lighting, post processing, AA, AO, reflections, etc). Maxwell's tiled rasterizer has zero impact on compute shaders. 25.6 GB/s is pretty low; as everybody knows, even Xbox One's 68 GB/s isn't that great, and the ESRAM is needed to reach good performance. But I am talking from the POV of down-porting current gen games to Switch. Switch certainly fares well against last gen consoles, and Maxwell's tiled rasterizer would certainly help older pixel + vertex shader based renderers. Too bad the last gen consoles already got their last big AAA releases a year ago, so easy ports between Xbox 360 and Switch are not an option anymore. Xbox One is significantly faster hardware, so a straightforward code port is not possible. Content also needs to be simplified.
I'm curious though - under the assumption that Switch is successful and becomes a specific optimization target for game devs, would you prefer a large cache (or scratchpad), or would you rather have more compute throughput and a smaller cache that requires tailored software to really shine?
My compute shader code is actually very bandwidth friendly, as I use LDS (groupshared memory) a lot. I personally would prefer more compute units (not just ALUs, as I also need samplers, groupshared memory and L1 caches).

I like small scratchpads (close to L1 size) because you don't need to manage their data much (load -> process, process, process -> store). Big shared scratchpads (several megabytes) are difficult to use perfectly. I would definitely prefer a big cache instead. A 32 MB LLC would be perfect; a cache this big would likely cut my BW usage by 90%. With a big cache like this, you'd want to split the scene into a few big tiles (4 quadrants for example) to ensure everything fits in the cache. Modern GPU culling techniques ensure practically zero geometry processing overhead when splitting the rendering workload into multiple smaller viewports. You could also tile post processing, as the cache would automatically load all touched lines (reading a wider area for bloom/DOF/motion blur/etc wouldn't be an issue).
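A minimal sketch of why a few big tiles would be enough for the working set; the resolution and formats here are example assumptions, not anything confirmed about the hardware:

```python
# Working-set size for an RGBA16F colour target + D32 depth, full frame vs. one quadrant.
def target_mib(w: int, h: int, bytes_per_pixel: int = 8 + 4) -> float:
    return w * h * bytes_per_pixel / 2**20

print(round(target_mib(1920, 1080), 1))   # ~23.7 MiB: a whole 1080p frame barely fits a 32 MiB LLC
print(round(target_mib(960, 540), 1))     # ~5.9 MiB: one quadrant leaves most of the cache
                                          # free for textures, geometry, and post-processing reads
```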
 
One thing I've been wondering is whether it would make sense to put a large amount of last-level cache into a hypothetical bespoke SoC. Something like 16MiB, which seems like it could soak up practically all frame and z-buffer traffic at 720p. Would that be a large perf/W win?
Like @sebbbi has explained, a small scratchpad memory would help the Switch quite a lot. 25 GB/s is not that bad if you couple it with fast on-chip memory - heck, just look at what was achieved on the original Xbox and its meagre ~6.4 GB/s. :smile2:

But tbh, I don't think we are ever going to see a console with a scratchpad in the future, and I mean ever (except perhaps the Switch, although NX rumours tend to be true and there is no mention of eDRAM or eSRAM).

Perhaps they just aren't as efficient anymore.
 
We have pretty much assumed that the A53 cores were removed, but is it actually more likely they will remain for OS tasks? Things like cross-game chat, downloading in the background, that sort of thing?

 
Nvidia has never shipped a Tegra with the little cores enabled. Also, you can't have all 8 cores enabled at the same time, only big or only LITTLE, so no background usage at all. If you have the 4 big cores running games, you can't have the 4 LITTLE cores running anything.
 
Nvidia has never shipped a Tegra with the little cores enabled. Also, you can't have all 8 cores enabled at the same time, only big or only LITTLE, so no background usage at all. If you have the 4 big cores running games, you can't have the 4 LITTLE cores running anything.
There are many ways to configure big.LITTLE (https://en.wikipedia.org/wiki/ARM_big.LITTLE). Cache coherency seems to be the deciding factor in what kind of operation modes can be supported.

It might be that both big and LITTLE clusters can be (theoretically at least) running at the same time, if you don't need cache coherency between them. Wouldn't be useful for standard software, but might work if there are two strongly separated OSs running on the same machine with no access to each other's memory (no shared cache lines or pages). But this is just speculation. I don't know if there are other roadblocks to enabling both big and LITTLE cores at the same time besides cache coherency. And any communication between the OSs would be ugly and slow.
 
There are many ways to configure big.LITTLE (https://en.wikipedia.org/wiki/ARM_big.LITTLE). Cache coherency seems to be the deciding factor in what kind of operation modes can be supported.
If we're discussing the vanilla TX1, there could be hardware switches that prevent both clusters from being powered on at the same time. In fact, if the SoC wasn't designed to support GTS/HMP from the start, there would be plenty of reasons to do so, such as preventing power viruses from doing more harm than necessary.


If Nintendo is reduced to running the CPU at only 1020 MHz, I don't see them enabling the 4 "little" cores.
OTOH, I guess a properly optimized Cortex A53 cluster at ~300 MHz could probably handle the OS perfectly while sipping very few mW.
 
But tbh, I don't think we are ever going to see a console with a scratchpad in the future, and I mean ever (except perhaps the Switch, although NX rumours tend to be true and there is no mention of eDRAM or eSRAM).
Perhaps they just aren't as efficient anymore.
Well the Tegra X1 doesn't contain a large unified pool of on-chip memory, so if the rumour that the DevKit is based on the X1 is correct then having no mention of such is only natural. It belongs in speculation about how a hypothetical custom SoC would be configured. If Nintendo is bold enough to commission a custom chip from nVidia, then clearly all other speculation based on the specifics of the X1 is suspect, other than possibly ballpark performance in fairly broad terms.
Designing a new SoC with a sufficiently large pool of on-chip memory is costly both from a design standpoint, and from a silicon area standpoint. If Nintendo feel they need higher memory system performance, wouldn't it be cheaper to just use Parker and its 128-bit LPDDR4 interface and be done with it, incurring no extra design cost whatsoever, and ensuring solid volumes on a chip that nVidia hopes to sell elsewhere? It makes no sense to design a new SoC that has roughly the same properties that the existing designs already offer. It has to hit another spot in terms of price or performance.
There is so much we just don't know now.

Something I've been thinking about, but haven't brought up in this thread, is the tidbit from Digitimes at the beginning of June, where their sources said the NX was delayed because "Nintendo wished to enhance the game console's video-game/handheld-game-integrated gaming experience and add virtual reality (VR) function into the system to gain advantages in the upcoming video game and mobile game competitions." Could that be SoC related? Or just a rearrangement of sensors and control options? VR is mentioned in the patent as well, however, so there might have been something to this.
 
My guess about the delay is that the games were not ready. Even Zelda is not certain to launch in March, and they communicate a lot about it.

Hardware-wise, "It makes no sense" could be Nintendo's motto for a few generations now, so anything (bad or "stupid") is possible, I guess.
 
My guess about the delay is that the games were not ready. Even Zelda is not certain to launch in March, and they communicate a lot about it.

Hardware-wise, "It makes no sense" could be Nintendo's motto for a few generations now, so anything (bad or "stupid") is possible, I guess.
I actually don't agree with that statement about Nintendo. Their designs make quite a bit of sense if you factor in hardware BC being an absolute requirement, along with designing to a price point while still having to make room for a unique hardware selling point AKA gimmick. What that will translate to with the Switch, where by all appearances they have dropped hardware BC, remains to be seen, but judging by the rumours and their introductory trailer they have clearly made a giant leap in terms of performance from the 3DS.

Speculation about a SoC with a substantial pool of on-chip memory requires Nintendo to foot the bill for a custom chip. Since the chip area will grow and they will get correspondingly fewer SoCs per wafer, combined with the design costs, it implies optimising for performance.
I still think going with an off-the-shelf solution (or possibly an optimised/cut-down Parker) is more likely, their historical preference for low-latency memory subsystems notwithstanding.
 
From the 3DS, yes. But it's their home console now too. They sell it (for now) like a Wii U done right. The funny thing is, it's not THAT much more powerful, it seems, especially in undocked mode if it's only an underclocked Tegra X1 in there. It's the Wii and Wii U again: an underpowered home console with a strange gimmick not a lot of people will care about, IMO. For me, being always underpowered is what makes no sense. If it's a little bit behind but in the same ballpark, like Xbox One vs PS4, it's OK, but being so far behind (and late), and at the same time not understanding why no devs want to back-port their AAA games... They fake the "it's the way we want it, we are not against MS and Sony" line; it's BS IMO. Nintendo sells consoles, MS and Sony too. The end. If there is a next home console after the Switch (my belief is it will bomb like the Wii U), I wonder if it will be more traditional, with Iwata now gone (RIP).
 
I think you underestimate the attraction of a good enough console having a screen you can take with you wherever. That is something new. Sure, we have had things like the Vita and various hacks with phones and tablets, but none of them were well supported and a hundred percent backed by the manufacturer. This will be Nintendo's one and only console.
They will give it their all and make it work. They can't afford another Wii U.

Has it been announced whether it has a touchscreen?
 
Nothing from Nintendo, but the same sources point to a 10-point capacitive touchscreen. It may not have been highlighted because it seems hard to come up with a meaningful gameplay mechanic that uses the touchscreen and could be easily translated to a pad controller when docked.
 
http://www.anandtech.com/show/9289/the-nvidia-shield-android-tv-review/9

I missed this in my prior searches, but Anandtech did do a thermal and power consumption test on the Nvidia Shield TV. While gaming, the Shield pulls 19.4 watts, and when they ran the T-Rex benchmark for two hours straight, the outside of the Shield case was 12 degrees Celsius over ambient. I wonder just how hot the actual Tegra X1 chip is running? Regardless, I think this helps make sense of the lower clocks for Switch (at least for me). In portable mode, the system can't pull more than 5 watts if they want decent battery life, and when docked the console needs to be able to operate in temps a lot higher than the comfy 22 degree Celsius ambient that Anandtech tested the Shield TV in. I say this because the Switch dock may be placed inside an entertainment center where temps will get far higher than your typical ambient. Nintendo is going to be a bit conservative, and I would bet that Nintendo expects the Switch to be able to run full tilt for hours in ambient temps higher than what most think is reasonable. Couple all this with a low-level API that allows for max chip utilization, and you're talking about a product that would get hotter than one running on the heavy Android API that the Shield TV uses. It's easy to see why the Tegra X1 hasn't seen more success in the mobile market; it really is power hungry and hot when running at high clock speeds.
 
Something I've been thinking about, but haven't brought up in this thread, is the tidbit from Digitimes at the beginning of June, where their sources said the NX was delayed because "Nintendo wished to enhance the game console's video-game/handheld-game-integrated gaming experience and add virtual reality (VR) function into the system to gain advantages in the upcoming video game and mobile game competitions." Could that be SoC related? Or just a rearrangement of sensors and control options? VR is mentioned in the patent as well, however, so there might have been something to this.

Inserting the console into a headset is actually in one of the latest patent submissions, but it would probably require the tablet to have a 1080p screen to be anywhere near usable.


I think you underestimate the attraction of a good enough console having a screen you can take with you wherever.
Having a screen you can take on the go is great, but the Switch offers nowhere near the same mobility as a 3DS or a Vita. The old handhelds could fit into a pocket but the tablet certainly won't.
 
Inserting the console into a headset is actually in one of the latest patent submissions, but it would probably require the tablet to have a 1080p screen to be anywhere near usable.
And not have a massive battery on the back adding crazy weight at the end of the lever. It's a nonsense application for the current Switch design, with obvious changes being required, e.g. make the battery pack removable and have it slot into the VR headset to transfer the weight, keeping the screen unit as light as possible. If Nintendo delayed NX to consider VR applications, they rejected the idea in the final design.
 
From the 3DS, yes. But it's their home console now too. They sell it (for now) like a Wii U done right. The funny thing is, it's not THAT much more powerful, it seems, especially in undocked mode if it's only an underclocked Tegra X1 in there. It's the Wii and Wii U again: an underpowered home console with a strange gimmick not a lot of people will care about, IMO. For me, being always underpowered is what makes no sense. If it's a little bit behind but in the same ballpark, like Xbox One vs PS4, it's OK, but being so far behind (and late), and at the same time not understanding why no devs want to back-port their AAA games... They fake the "it's the way we want it, we are not against MS and Sony" line; it's BS IMO. Nintendo sells consoles, MS and Sony too. The end. If there is a next home console after the Switch (my belief is it will bomb like the Wii U), I wonder if it will be more traditional, with Iwata now gone (RIP).
So... you take the performance of the handheld, for the user that doesn't care about that function, rather than the docked performance? Just thought I'd catch that illogical argument for you.

The docked performance is somewhere around 40% of the XB1, with the CPU more or less matching 4 PS4 CPU cores. This isn't a Wii U. The interesting thing about the design, assuming 4 A57 cores and a 2-SM Maxwell GPU, is that it should have a ton of thermal headroom even docked, so if they did run into a CPU hurdle, there really is no reason I can think of that would lock them out of a higher clock on the CPU.

We've seen it with the PSP CPU going from 2xx MHz to 3xx MHz when God of War was released. As for the GPU, it should be able to handle any game running on XB1 with the appropriate reductions.

Even without western 3rd-party support, it will see a far superior library to the Wii U's, simply because Nintendo won't be dividing their attention across two major platforms. If the Wii U had had the 3DS's entire library on top of what was there, it would not have suffered the droughts it had and would have sold far better.

The idea that it needs to "win" the console war is just schoolyard nonsense; Nintendo needs to have a platform that can make a profit. I can't tell you the future, but the Switch should be more successful than the Wii U and surpass it within the first two years regardless of 3rd-party support.
 