Nintendo Switch Tech Speculation discussion

One thing I've been wondering is whether it would make sense to put a large amount of last-level cache into a hypothetical bespoke SoC. Something like 16MiB, which seems like it could soak up practically all frame and z-buffer traffic at 720p. Would that be a large perf/W win?
It would be a possibility, but I can't really see large benefits given the cost in die area, and it would require nVidia to modify their architecture in untested ways. A wider bus, or even easier, adjusting memory system clocks/voltages downwards when downclocking the GPU for mobile use seems like a cheaper and more robust solution all around. No mobile chip has gone with large amounts of EDRAM/ESRAM; they just don't seem to need it, and both Apple's A#x chips and nVidia's Parker have elected to go with a 128-bit bus instead (although they need to accommodate higher resolutions).

Since this is the speculation thread, I've mulled the idea that Nintendo, going cheap, might consider a Parker derivative with a single SM, a 64-bit memory interface and unnecessary bits and bobs removed. It would be quite small and still reasonably performant on TSMC 16nm, while requiring minimal design effort. It doesn't seem to match the clocks and cooling system though. The suggestion of backporting the TX1 to 28nm is a better fit for those, but I can't really see that making sense given the volumes Nintendo probably projects for the Switch and the necessary die area. Wafers cost money, and they would incur a significant design cost compared to just taking either finished chips or using simple adaptations of existing designs. (Of course they could also go wider in their design (drives up costs a bit), and in that case something like embedded memory might be necessary to feed the GPU.)

We have indications of a pretty performant system though: the noisy, fan-cooled X1 devkit, the UE4 profiles, the Souls developer comment, and Todd Howard of Bethesda testifying to being extremely impressed, and he is hardly a Nintendo fanboy.
We'll see, but at least there is the potential for some interesting innards.
 
Come on, of course Tottentranz will be burned at the stake here if the innards of the Switch are a TX1 at the clocks Eurogamer reported. :D
He won't even get the benefit of being tossed in a lake to see if he floats first.
Ha! I'll be gone using one of these babies before any of you can catch me!

[image: c0AePyE.jpg]
Now that's wishful thinking! :p
Current rule of thumb is if the specs are passable to good, they'll talk about them. If they're bad, they'll say the games are what matter the most.
If the final specs are a TX1 equivalent at those clocks, they'll definitely try to sweep them under the rug.


No mobile chip has gone with large amounts of EDRAM/ESRAM; they just don't seem to need it, and both Apple's A#x chips and nVidia's Parker have elected to go with a 128-bit bus instead (although they need to accommodate higher resolutions).
There are other possibilities. The Vita used Wide I/O to achieve an unprecedented 12.8 GB/s for the VRAM (4x more than the fastest LPDDR2 at the time).
The VRAM chip alone had the same bandwidth as the A5X's 4×32-bit controller of the time, and I don't know if the GPU could still access the 6.4 GB/s of the main RAM too.

Nintendo SoCs without exquisite RAM implementations have actually been rare, but this is a design from nVidia...
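For a rough sense of where those figures come from, here's a quick back-of-envelope bandwidth calculation; the bus widths and clocks below are assumptions typical of the era, not confirmed Vita specifications:

```python
# Bandwidth = (bus width in bytes) x (transfers per second).
# Widths/clocks are assumptions for illustration, not confirmed specs.
def bandwidth_gbs(bus_bits: int, transfers_per_sec: float) -> float:
    return bus_bits / 8 * transfers_per_sec / 1e9

print(bandwidth_gbs(512, 200e6))  # Wide I/O: 4x128-bit channels, SDR 200 MHz -> 12.8 GB/s
print(bandwidth_gbs(32, 800e6))   # single 32-bit LPDDR2-800 channel          ->  3.2 GB/s
print(bandwidth_gbs(64, 800e6))   # 2x32-bit LPDDR2-800 (the 6.4 GB/s figure) ->  6.4 GB/s
```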
 
It would be a possibility, but I can't really see large benefits given the cost in die area, and it would require nVidia to modify their architecture in untested ways. A wider bus, or even easier, adjusting memory system clocks/voltages downwards when downclocking the GPU for mobile use seems like a cheaper and more robust solution all around. No mobile chip has gone with large amounts of EDRAM/ESRAM; they just don't seem to need it, and both Apple's A#x chips and nVidia's Parker have elected to go with a 128-bit bus instead (although they need to accommodate higher resolutions).

I don't know enough about HW design to know if it's a factor, but I was thinking a 128-bit bus might be an impediment to achieving a smaller die size in a cost-related shrink later on. Also, Parker seems targeted at higher power than Switch, and I would guess it's more acceptable to throttle on an iPad.

Still, you could argue that devs should just tile in software so as not to overflow whatever LLC ends up being present. That should yield most of the DRAM bandwidth/power savings of the cache-the-entire-frame-buffer idea, save on die size, and would be less likely to fall off a performance cliff when moving to 1080p/docked mode (see the rough sketch below). So I think you're right that a large LLC doesn't make much sense.
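To make that concrete, here's a minimal sketch of the arithmetic, assuming an RGBA16F colour target plus 32-bit depth and reserving half of a hypothetical 16 MiB LLC for textures and geometry; the numbers are purely illustrative, not a real renderer:

```python
# How many screen tiles are needed so one tile's render targets fit in
# a hypothetical last-level cache? All sizes below are assumptions.
def tiles_needed(width: int, height: int, bytes_per_pixel: int, cache_bytes: int) -> int:
    frame_bytes = width * height * bytes_per_pixel
    return -(-frame_bytes // cache_bytes)  # ceiling division

BYTES_PER_PIXEL = 8 + 4              # RGBA16F colour + 32-bit depth
CACHE_BUDGET = 16 * 1024**2 // 2     # leave half the 16 MiB LLC for textures/geometry

print(tiles_needed(1280, 720, BYTES_PER_PIXEL, CACHE_BUDGET))   # 720p  -> 2 tiles
print(tiles_needed(1920, 1080, BYTES_PER_PIXEL, CACHE_BUDGET))  # 1080p -> 3 tiles
```

The point is just that the tile count grows gracefully with resolution, which is why a software-tiled approach should degrade more gently than a cache-the-whole-frame scheme.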

Since this is the speculation thread, I've mulled the idea that Nintendo, going cheap, might consider a Parker derivative with a single SM, a 64-bit memory interface and unnecessary bits and bobs removed. It would be quite small and still reasonably performant on TSMC 16nm, while requiring minimal design effort. It doesn't seem to match the clocks and cooling system though. The suggestion of backporting the TX1 to 28nm is a better fit for those, but I can't really see that making sense given the volumes Nintendo probably projects for the Switch and the necessary die area. Wafers cost money, and they would incur a significant design cost compared to just taking either finished chips or using simple adaptations of existing designs. (Of course they could also go wider in their design (drives up costs a bit), and in that case something like embedded memory might be necessary to feed the GPU.)

We have indications of a pretty performant system though: the noisy, fan-cooled X1 devkit, the UE4 profiles, the Souls developer comment, and Todd Howard of Bethesda testifying to being extremely impressed, and he is hardly a Nintendo fanboy.
We'll see, but at least there is the potential for some interesting innards.

At this point I'm not sure what to believe - gotta wait for the presentation in Jan I guess :). I was initially imagining a wider / lower clock-speed Parker, with Denver 2 cores and automotive bits excised, but I'm not sure that squares with convincing Nintendo-is-cheap arguments in this thread.
 
At this point I'm not sure what to believe - gotta wait for the presentation in Jan I guess :). I was initially imagining a wider / lower clock-speed Parker, with Denver 2 cores and automotive bits excised, but I'm not sure that squares with convincing Nintendo-is-cheap arguments in this thread.
I'd say that such a suggestion would be the choice of the technocrati. It makes perfect technical sense (perhaps cutting it a bit close in terms of silicon availability), but it may not be the choice of a company that absolutely wants to avoid glitches.
It would be my present choice. I guess that makes it less likely to be the solution favoured by Nintendo back when decisions were actually made.
 
I like the arguments that it doesn't make sense, but did the Wii and Wii U make sense spec-wise? The Wii U didn't even have more GFLOPS than 7-year-old machines, and the Wii... come on now, making sense of Nintendo specs is not gonna happen; people didn't (or wouldn't) believe the leaked specs till both gens were over. As far as the Switch goes, these are nowhere near as bad, considering it's really a handheld and in the best-case scenario could be 1.5x more powerful.
 
One thing I've been wondering is whether it would make sense to put a large amount of last-level cache into a hypothetical bespoke SoC. Something like 16MiB, which seems like it could soak up practically all frame and z-buffer traffic at 720p. Would that be a large perf/W win?
16 MB is enough for 720p forward rendering with no MSAA: a 720p RGBA16F color buffer + 32b Z buffer = 11 MB. But the same cache is also used for texture sampling and geometry (vertex and index buffer loads). A 16 MB cache will thrash for sure. A 32 MB cache would be much better. Intel chose 64 MB and 128 MB for their Crystalwell GT4e cache sizes. Intel integrated chips are roughly in the same ballpark in performance and are mostly used to play games at 720p, so the comparison is relevant.

16 MB cache would take less die space than Xbox One's 32 MB ESRAM scratchpad, but not significantly so, as cache logic and tags take plenty of space. It would also be somewhat slower (latency) than current < 2 MB GPU L2 caches. Would likely affect performance a bit (at least global atomics). Would still be a huge perf boost assuming only 25.6 GB/s memory bandwidth.
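For reference, the 11 MB figure above is easy to reproduce; a minimal check, assuming one full-resolution RGBA16F colour target plus a 32-bit depth buffer:

```python
# 720p forward-rendering buffer footprint (colour + depth only, no MSAA).
w, h = 1280, 720
color_bytes = w * h * 8            # RGBA16F = 8 bytes per pixel
depth_bytes = w * h * 4            # 32-bit Z
total = color_bytes + depth_bytes
print(total / 1e6, total / 2**20)  # ~11.1 MB decimal, ~10.5 MiB
```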
 
I like the arguments that it doesn't make sense, but did the Wii and Wii U make sense spec-wise? The Wii U didn't even have more GFLOPS than 7-year-old machines, and the Wii... come on now, making sense of Nintendo specs is not gonna happen; people didn't (or wouldn't) believe the leaked specs till both gens were over. As far as the Switch goes, these are nowhere near as bad, considering it's really a handheld and in the best-case scenario could be 1.5x more powerful.

It's not bad, and also, it really has a ton of RAM.
More traditional cheaping-out would have been to provide half the RAM it has. There's so much RAM; even the iPhone 7 and iPad Pro 9.7" only have 2GB. The iPad Pro 12.9" and Switch have 4GB - that's right, Apple skimped on RAM when they made a $1000 iPad "Pro, but not really" variant.
 
There are multiple questions regarding the memory that I have yet to see answered. Is the Tegra X1 memory-starved to begin with? I know 25GB/s sounds low, but with Maxwell's compression techniques and the tiled rasterizing, perhaps it's not an issue, especially when we are seeing clock speeds reduced. Assuming this is a bottleneck, is it more cost-efficient to go with ESRAM than switching to a 128-bit bus like they did with Parker? Whatever is cheaper is likely the route Nintendo would go. I would have to think that Nintendo wants to make sure die shrinks are not held back because of memory this time. The quicker they can move to more cost-effective manufacturing processes, the quicker they can either make more money per unit or drop the price. Nintendo has historically gone with small pools of fast memory, so it wouldn't surprise me if there is 32MB of ESRAM in there, but I haven't seen benchmarks on the Tegra X1 that show obvious bottlenecks with the memory. After all, Doom BFG runs at 1080p 60fps on the Shield TV.
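As a very crude illustration of why raw framebuffer traffic alone isn't necessarily the problem, here's a back-of-envelope estimate; the overdraw factor and per-pixel byte counts are pure assumptions, compression is ignored, and real workloads add texture fetches, blending and multiple passes on top:

```python
# Rough ROP write/read traffic for a single 1080p60 colour+depth pass.
# Overdraw factor and byte counts are assumptions; compression is ignored.
w, h, fps, overdraw = 1920, 1080, 60, 3
bytes_per_pixel_touched = 4 + 4      # RGBA8 colour write + 32-bit depth access (simplified)
traffic_gbs = w * h * overdraw * bytes_per_pixel_touched * fps / 1e9
print(round(traffic_gbs, 1))         # ~3 GB/s, small next to 25.6 GB/s on paper
```

Texture sampling and CPU traffic will eat a large share of the remainder, which is where compression and tiled caching would have to earn their keep.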
 
Well, NX should have about 1/4 to 1/5 of the PS4's shader/compute and about 1/3-ish of the CPU. Bandwidth is 1/7, going by the raw numbers, so comparatively that doesn't look great.

But it's highly likely that tiled rasterisation helps significantly - NX doesn't stand a chance of hitting peak pixel fill rates, but as folks in the GPU forums have explained, having more ROPs means more effective tiling, so it's possible that the 16 ROPs are helping BW-limited performance even in situations where there isn't a hope of hitting peak fill due to external BW.

My highly uneducated guess would be that main memory BW isn't a limitation in handheld mode - probably the primary focus of the system, going by Nintendo's handheld vs home HW and SW sales - but that it may become something developers need to consider carefully if they intend to scale their 720p game up to 1080p while docked.
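Those ratios roughly check out if you plug in PS4 launch specs and the Eurogamer-reported Switch clocks (treat the latter as rumour, not confirmed hardware); a quick sanity check:

```python
# Crude ratio check: PS4 launch specs vs rumoured Switch/NX figures.
# The 768 MHz docked GPU clock and 1020 MHz CPU clock are the Eurogamer-reported
# numbers discussed in this thread, not confirmed specs.
ps4_flops = 1152 * 2 * 800e6        # 1152 shaders, FMA, 800 MHz  -> ~1.84 TFLOPS
nx_flops  = 256 * 2 * 768e6         # 256 CUDA cores, FMA, docked -> ~0.39 TFLOPS
print(ps4_flops / nx_flops)         # ~4.7x -> "1/4 to 1/5" of the shader/compute
print(176 / 25.6)                   # ~6.9x -> "about 1/7" of the bandwidth
print((8 * 1.6) / (4 * 1.02))       # ~3.1x -> "about 1/3" of the CPU (cores x clock, very crude)
```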
 
There are multiple questions regarding the memory that I have yet to see answered. Is the Tegra X1 memory starved to begin with?
Not for games running on it. These are either Android games dealing with typically lower BWs, or games targeting Shield TV BW.
I know 25GB/s sounds low, but with Maxwells compression techniques and the titled rasterizing, perhaps its not an issue...After all, Doom BFG runs 1080p 60fps on the Shield TV.
Which is comparable with the PS3 game. So in terms of achieving PS3 level graphics, the memory system should be okay.

Of course, if you're talking about ports from this gen, function's response is more informative.
 
Not for games running on it. These are either Android games dealing with typically lower BWs, or games targeting Shield TV BW.
Which is comparable with the PS3 game. So in terms of achieving PS3 level graphics, the memory system should be okay.

Of course, if you're talking about ports from this gen, function's response is more informative.
Doom BFG has low-res textures. So does its impressive lighting increase its memory bandwidth demands to the point that the overall memory bandwidth demands equal those of a lot of big-budget X360/PS3 titles?

edit: That 25.6 GB/s from the LPDDR4 would also see memory contention with the CPU. We both know that the PS3 GPU didn't have to share its memory bandwidth, and the X360 had EDRAM.
 
@Pixel

Are you saying the Shield version of Doom BFG used lower-resolution textures than the PS3/360 version? In Digital Foundry's review they made no mention of that, if it is indeed true. My understanding was that the version was pretty much an exact match for the PS3/360 build, except it rendered in 1080p for Shield TV and had a more stable framerate.
 
@Pixel

Are you saying the Shield version of Doom BFG used lower-resolution textures than the PS3/360 version? In Digital Foundry's review they made no mention of that, if it is indeed true. My understanding was that the version was pretty much an exact match for the PS3/360 build, except it rendered in 1080p for Shield TV and had a more stable framerate.
Sure, its textures are identical to the PS3/360 version; however, what I'm saying is that this is basically a remastered version of a game from 2004, and its in-game textures are noticeably lower resolution than most AAA games for PS3/Xbox 360.

Take other games ported to the Nvidia Shield Android TV (specs: Tegra X1 with a 256-core 2nd-gen Maxwell GPU clocked at 1GHz, 25.6 GB/s memory bandwidth):
Metal Gear Rising:
Lower shadow quality and sporadically lower texture quality compared to 360

Resident Evil 5:
Lower texture resolution compared to PS3/Xbox 360

Borderlands 2:
Well, that uses cel shading, which obviously reduces memory bandwidth requirements
 
LKD is backing up a rumor that Switch will be getting the next Assassin's Creed game, and that it will release alongside the other versions. One of the biggest concerns about the Switch's specs has been how they will affect third-party multiplats, but so far rumors are suggesting that AAA games are not off the table. Skyrim, Dark Souls 3 and now Assassin's Creed are all rumored to be on their way to Switch. Assuming these rumors come to fruition, is it out of the question for a downclocked TX1 to run these games? Are four A57 cores at 1GHz going to handle the next Assassin's Creed game? The GPU clock speeds have gotten all the attention, but really the CPU clock speeds were more alarming to me. Is it possible that A57 cores perform significantly better clock for clock than the Jaguar cores? It just doesn't make sense that Bethesda, who have historically shunned Nintendo hardware for not being capable enough, would suddenly be excited about hardware that is a cut-down Tegra X1. January 13 should tell us a lot, even without listed specs from Nintendo.
 
Are you sure it's not the 2D-ish Assassin's Creed Chronicles games, like AC China?
 
LKD is backing up a rumor that Switch will be getting the next Assassin's Creed game, and that it will release alongside the other versions. One of the biggest concerns about the Switch's specs has been how they will affect third-party multiplats, but so far rumors are suggesting that AAA games are not off the table. Skyrim, Dark Souls 3 and now Assassin's Creed are all rumored to be on their way to Switch. Assuming these rumors come to fruition, is it out of the question for a downclocked TX1 to run these games? Are four A57 cores at 1GHz going to handle the next Assassin's Creed game? The GPU clock speeds have gotten all the attention, but really the CPU clock speeds were more alarming to me. Is it possible that A57 cores perform significantly better clock for clock than the Jaguar cores? It just doesn't make sense that Bethesda, who have historically shunned Nintendo hardware for not being capable enough, would suddenly be excited about hardware that is a cut-down Tegra X1. January 13 should tell us a lot, even without listed specs from Nintendo.

Well, Skyrim ran on a PS3/Xbox 360, so there isn't a reason why it won't run on the Switch. Dark Souls 3 and AC have had 360/PS3 editions in the past, and they would be rendering at 720p or even 640p?

Also, I believe DF is correct on the specs, but who knows if that was just for the dev kits and the final hardware will have faster clocks? Who knows, it's only 2 more weeks.
 