Nintendo Switch Tech Speculation discussion

That's not how it works, nvidia bandwidth > amd (at least prior to vega) due to different memory compression.
Memory itself isn't compressed. What you mean is probably color compression, which AMD has used since Tonga and improved up to Polaris. nvidia's improved bandwidth efficiency over Polaris comes from Maxwell and Pascal using tile-based rendering.

Regardless, no matter how much better nvidia's bandwidth efficiency is in Maxwell 2.5, those 25GB/s are always going to be well behind the Wii U's eDRAM, which is probably at least in the 60GB/s duplex range, plus the 12GB/s to the main memory shared with the CPU.

Capcom recently talked about Nintendo providing an optional dev kit that includes a GPU capable of emulating the Switch on a PC. Something about making it possible to do all the work on the PC, which makes things much faster. This would make the Foxconn leak about the additional GPU plausible, but strictly for development kits.
While this could be possible, why not just bundle a GTX 1050 graphics card with a special BIOS and a driver that either enables 2 SMs for FP32 or up to 4 SMs using FP16 promoted to FP32, together with the appropriate clocks and memory allocation?
I also don't know why such a device would be attached to the whole console. The whole point of emulating the Switch on a PC would be not needing the Switch connected to it at all, yet the described upgrade connects to the devkit through a port in the back.
 
That's not how it works, nvidia bandwidth > amd (at least prior to vega) due to different memory compression.

It's also worth noting the draw distance is improved on Switch as well, with other improvements/changes here and there. It'll be interesting to see if the draw distance, or something other than resolution, is cut back in handheld mode. If that's the case, then that would explain why it's only 900p docked.

At 900p you're drawing more than 1.5x the pixels of 720p. The Wii U has 35+ GB/s (potentially rather more) just for framebuffers on very low latency eDRAM; the NX has 25.6 GB/s shared, with the CPU capable of taking most or all of that under extreme circumstances. Colour compression isn't a magic wand: AMD's DCC is supposedly up to a 30-40% improvement in throughput in its latest iteration. Maxwell also has tiled rasterization, but my understanding is that large quads arranged front to back [across the screen], rather than grouped into more tile-friendly arrangements, will reduce its efficiency.

Suffice it to say, I think there could easily be situations where the Wii U was less likely to be BW bound than the NX. Interestingly, this is going to be the first time since the N64 that Nintendo's console doesn't have a small pool of fast embedded RAM.
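For reference, here's the pixel-count arithmetic behind that "more than 1.5x" figure (just a quick check, nothing here comes from any leak):

```python
# Pixel counts behind the 720p / 900p / 1080p comparison above.
resolutions = {"720p": (1280, 720), "900p": (1600, 900), "1080p": (1920, 1080)}
base = 1280 * 720
for name, (w, h) in resolutions.items():
    px = w * h
    print(f"{name}: {px / 1e6:.2f} MPix ({px / base:.2f}x the pixels of 720p)")
# 900p comes out to ~1.56x the pixels of 720p, and 1080p to 2.25x.
```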
 
I thought the final verdict for Wii U eDRAM was 35GB/s? If the Wii U had an abundance of memory bandwidth, it's suspicious that so many games didn't use any AA. Perhaps the Wii U's low number of ROPs?

Also, didn't Vulkan introduce a way to use tiling with deferred rendering?

 
I thought the final verdict for Wii U eDRAM was 35GB/s? If the Wii U had an abundance of memory bandwidth, it's suspicious that so many games didn't use any AA. Perhaps the Wii U's low number of ROPs?

I remember 35 GB/s too, but there was some argument that it could potentially be around 70 (can't remember the reason given).

35 would seem the most likely fit from the die shot and indeed the games. But that's still a decent amount of BW for the fillrate if you think about APUs and the PS3.
 
I thought the final verdict for Wii U eDRAM was 35GB/s?
I was assuming the same 1024-bit width as the Xbone's ESRAM but with clocks at 550MHz instead of 853MHz.
What you guys are suggesting is a 512-bit width, but I haven't seen that spec anywhere.
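A quick sketch of where the 35 vs. 70 GB/s figures come from, using bandwidth = (bus width / 8) × clock; the 512-bit and 1024-bit widths are the assumptions being debated here, not confirmed specs:

```python
# Bandwidth from bus width and clock. The Wii U bus widths below are the
# assumptions under discussion, not confirmed specs.
def bandwidth_gbps(bus_bits, clock_mhz):
    return bus_bits / 8 * clock_mhz * 1e6 / 1e9

print(f"XB1 ESRAM, 1024-bit @ 853 MHz (per direction): {bandwidth_gbps(1024, 853):.1f} GB/s")
print(f"Wii U eDRAM, 1024-bit @ 550 MHz:               {bandwidth_gbps(1024, 550):.1f} GB/s")
print(f"Wii U eDRAM,  512-bit @ 550 MHz:               {bandwidth_gbps(512, 550):.1f} GB/s")
# ~109, ~70 and ~35 GB/s respectively, which is where the 35-vs-70 debate comes from.
```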


If the Wii U had an abundance of memory bandwidth, it's suspicious that so many games didn't use any AA.
AA doesn't just tax the bandwidth. Depending on the type of AA, it could tax the GPU compute units and/or fillrate.
Every AA method except maybe FXAA is really taxing on the GPU, though. FXAA seems to take away too much texture detail, especially if the starting resolution is low (e.g. 720p on a >40" TV), so I'd guess that's why Wii U developers avoided using it.
 
Yeah, the Wii U's eDRAM bandwidth has to be around 35 GB/s. I'm just saying the Switch will have more usable bandwidth than the Wii U, though yes, at this time it's looking like the Switch is more bandwidth limited than that console was. Time will tell; for me, once again, I want to see all the differences between Zelda in handheld mode and docked, and even then it's a launch title, so whatever issues it has might not be present in later games.

AA doesn't just tax the bandwidth. Depending on the type of AA, it could tax the GPU compute units and/or fillrate.
Every AA method except maybe FXAA is really taxing on the GPU, though. FXAA seems to take away too much texture detail, especially if the starting resolution is low (e.g. 720p on a >40" TV), so I'd guess that's why Wii U developers avoided using it.

SMAA and even MLAA are better than FXAA; they could've used that. I think the issue was that the games that lacked AA and AF were all 60fps and the system didn't have any more to spare. Pikmin 3 is a 30fps game and it had AA.
 
MLAA and FXAA are very similar in looks and performance (both are post-processing filters applied to a 2D frame). The biggest differentiator is that FXAA was developed by nvidia and MLAA was developed by AMD, really.
SMAA 1x is just an improved MLAA (again a post-processing 2D filter). The consequences of using it on a low-resolution render for a large screen are still there.
SMAA 2x or higher mixes post-processing with MSAA and/or TAA, so it's taxing on the GPU.

There are no miracles with post-processing methods. Texture detail will take a hit and 3D objects at a larger distance will blur.
I was using FXAA as a general term for post-processing methods, as they will all present problems on low-resolution renders.
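To make the "post filters trade edge smoothing for texture detail" point concrete, here's a deliberately naive sketch of a luma-edge blend in NumPy. It is not FXAA, MLAA or SMAA, just the general shape of this family of filters; the threshold and the 4-neighbour blur are arbitrary choices for illustration.

```python
import numpy as np

def toy_post_aa(rgb, threshold=0.1):
    """rgb: float array of shape (H, W, 3) with values in [0, 1]."""
    # Approximate perceived brightness per pixel.
    luma = rgb @ np.array([0.299, 0.587, 0.114])
    # 4-neighbour average of the colour buffer (image border left untouched).
    blur = rgb.copy()
    blur[1:-1, 1:-1] = (rgb[:-2, 1:-1] + rgb[2:, 1:-1] +
                        rgb[1:-1, :-2] + rgb[1:-1, 2:]) / 4
    # Local contrast: biggest luma difference against any of the 4 neighbours.
    contrast = np.zeros_like(luma)
    contrast[1:-1, 1:-1] = np.max(np.stack([
        np.abs(luma[1:-1, 1:-1] - luma[:-2, 1:-1]),
        np.abs(luma[1:-1, 1:-1] - luma[2:, 1:-1]),
        np.abs(luma[1:-1, 1:-1] - luma[1:-1, :-2]),
        np.abs(luma[1:-1, 1:-1] - luma[1:-1, 2:]),
    ]), axis=0)
    edge = contrast > threshold
    out = rgb.copy()
    out[edge] = blur[edge]   # geometry edges get smoothed, but so does any
    return out               # high-contrast texture detail that trips the test
```

Any high-contrast detail trips the same contrast test whether it's a geometry edge or fine texture, which is why these filters soften textures more at lower render resolutions: with fewer pixels per detail, more of the texture looks like an edge.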
 
FXAA is substantially blurrier than SMAA. SMAA 1x without the temporal component (which is what really causes blur, and ghosting as well) is still the best AA method available as far as the performance hit and coverage go, without the aforementioned side effects.

If it were between temporal AA and no AA, I'd choose no AA every time, and MSAA is too taxing and doesn't do anything about shader aliasing.
 
The model numbers on the LPDDR4 memory chips seem to point to 2GB 32-bit chips. Doesn't it seem odd to use two chips when they could have gotten a single 4GB 64-bit chip?


It's the same configuration as the Shield TV. Do yourself a favor and don't read into z0m3le's posts on NeoGAF; he latches onto any positive news without any real proof, and tries to discredit any negative news, no matter how much evidence there is to support it. He did the same in the Wii U spec thread on NeoGAF.
 
SMAA and even MLAA are better than FXAA; they could've used that. I think the issue was that the games that lacked AA and AF were all 60fps and the system didn't have any more to spare. Pikmin 3 is a 30fps game and it had AA.
Even FXAA was pretty expensive on last-gen consoles (it took 5% of our Xbox 360 frame time at 60 fps, see here http://www.eurogamer.net/articles/digitalfoundry-trials-evolution-tech-interview). SMAA is roughly 2x as expensive (~10% of frame time on last gen). FXAA needs an extra full-screen pass (multiple reads and a single write), and SMAA needs 3 passes. Tegra X1 has only 25.6 GB/s memory bandwidth. Xbox 360 had 21.6 GB/s, but it rendered to EDRAM and that nullified the backbuffer bandwidth cost. Only the resolve at the end of each pass required memory bandwidth (overdraw didn't cost any memory BW on Xbox 360). Tegra of course has larger GPU caches and delta color compression, so post AA should be slightly cheaper compared to Xbox 360.
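Putting those percentages into absolute terms for a 720p60 target (the frame-time shares come from the post above; the traffic estimate assumes roughly one read plus one write of a 4-byte 720p surface per pass and ignores caches and extra filter taps, so treat it as a lower bound):

```python
# Rough cost of full-screen post-AA passes at 720p / 60 fps. The 5% / 10%
# frame-time shares are from the post above; the traffic estimate assumes one
# read plus one write of a 4-byte 720p surface per pass (ignoring caches and
# extra filter taps), so it's a lower bound.
frame_ms = 1000 / 60
print(f"FXAA  (~5% of frame): {0.05 * frame_ms:.2f} ms per frame")
print(f"SMAA (~10% of frame): {0.10 * frame_ms:.2f} ms per frame")

surface_bytes = 1280 * 720 * 4            # ~3.7 MB per touch of a 720p RGBA8 target
per_pass = 2 * surface_bytes              # one read + one write
for name, passes in [("FXAA", 1), ("SMAA", 3)]:
    gbps = passes * per_pass * 60 / 1e9
    print(f"{name}: ~{gbps:.2f} GB/s of the 25.6 GB/s budget at 60 fps")
```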
 
Here's another leak coming from another Foxconn employee:

[image: Bf3P2qU.jpg]


According to the description (check the following pictures), the first set of values is the device at idle; then he runs that julia.sh benchmark in handheld mode, which returns 267 FPS and 375 GFLOPS FP32.

Then he docks it, measures 2143MHz on the CPU and 1005MHz on the GPU, and gets 806 FPS and 875.77 GFLOPS.

From these results, we would clearly be looking at a 4 SM / 512-core GPU, ~400MHz portable / ~1GHz docked, and a CPU going up to 2143MHz docked. The "516 Core" in the architecture description also hints at a 512-core GPU + 4-core CPU (a bullshit number, but something nvidia would put in their SoC microcode, referring to the sheer number of MADD ALUs).
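A rough sanity check of how those numbers point at 512 cores. The measured-vs-peak "efficiency" carried over from the docked run to the handheld run is my assumption, not something stated in the leak:

```python
# Rough sanity check of the leaked julia.sh numbers. All measured figures are
# from the leak; the assumption that the handheld run achieves a similar
# fraction of peak as the docked run is mine, not the leaker's.
def peak_gflops(cores, clock_mhz):
    return cores * 2 * clock_mhz / 1000   # FP32: one FMA (2 FLOPs) per core per clock

docked_peak = peak_gflops(512, 1005)      # ~1029 GFLOPS for 512 cores @ 1005 MHz
docked_eff = 875.77 / docked_peak         # ~0.85 achieved vs. peak, plausible for a real kernel

portable_peak = 375 / docked_eff          # ~440 GFLOPS if efficiency is similar undocked
portable_clock = portable_peak * 1000 / (512 * 2)   # ~430 MHz implied GPU clock

print(f"docked peak = {docked_peak:.0f} GFLOPS, achieved = {docked_eff:.0%} of peak")
print(f"implied portable GPU clock = {portable_clock:.0f} MHz")
```

Treating the benchmark as hitting a similar fraction of peak in both modes lands the portable GPU clock somewhere around 400-430 MHz, which is presumably where the ~400MHz figure comes from.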


Some rumors are pointing out that the leaked teardown refers to an old devkit with a TX1 and not the final production unit. The reason for that is the complete absence of Nintendo logos inside the console.


If this is truly just a TX1, it'll be awfully funny to watch nvidia laugh all the way to the bank after selling Nintendo their 2-year-old SoC while releasing their own Shield Console 2 around the same time frame, with vastly superior performance (and also using 4GB RAM, btw).




The model numbers on the LPDDR4 memory chips seem to point to 2GB 32-bit chips. Doesn't it seem odd to use two chips when they could have gotten a single 4GB 64-bit chip?

Probably because it's cheaper to buy two 16Gbit chips (I think they're all stacks now?) than a single 32Gbit one.
Also, the larger number of paths on a single 32Gbit stack could require a PCB with more layers, so while two chips take more PCB area, they could still be cheaper to implement.
 
It depends on the price Nintendo is paying. For all we know, Nintendo secured a very good deal on the TX1, and part of the deal is Nvidia API and tool support at no extra fee. Normally Nintendo has to spend good money just on the chip design. If the Tegra X1 pretty much fits the bill, it makes little sense to spend R&D money only to end up with a very similar chip.



 
The model numbers on the LPDDR4 memory chips seem to point to 2GB 32-bit chips. Doesn't it seem odd to use two chips when they could have gotten a single 4GB 64-bit chip?

The 2017 Nvidia Shield TV has two RAM chips. As for the Switch, I wouldn't be surprised if 2x 2GB LPDDR4 chips are cheaper than 1x 4GB LPDDR4 chip. The 2GB LPDDR4 part is likely much more common in devices nowadays and, because of a combination of higher availability and memory makers' pricing schemes, is somewhat cheaper at the moment.
 
So you're thinking Nintendo coupled a GPU with 2/3 of the XB1's performance with 25 GB/s RAM, a tenth of the XB1's available BW? Or is there eDRAM on there as well now? Stacked RAM? Or just hugely imbalanced?
I'm a Nintendo fanboy, and even I rolled my eyes at that one. I left the Foxconn clock speeds in the plausible column based on a possible move from 20nm to 16nm FinFET making such speeds possible, even if just barely. These new "leaked" specs are pretty out there.



 
So you're thinking Nintendo coupled a GPU with 2/3 of the XB1's performance with 25 GB/s RAM, a tenth of the XB1's available BW? Or is there eDRAM on there as well now? Stacked RAM? Or just hugely imbalanced?
I only described the specs presented in that SSH-ish console screenshot, which seems believable so far (if it's a fake, it's a very well done one).

What I think is that the TX1 is definitely the chip in the devkits up to July, and so are the 4GB of LPDDR4 at 1600MHz on a 2x32-bit memory controller. That leaked HTML is the real thing.
It's the only thing I'm >90% sure of.

What comes in the final production units is what I'm not sure about at all.
On one hand we have Eurogamer's specs and clock values (which according to themselves are all from July's devkits), and on the other hand we have the Foxconn leaker from November, who's shown a pristine track record in every single non-subjective claim he's made so far. Every number he saw and everything he measured has been proven right to a ridiculous degree. Stuff like "OMG this is so powerful it's gonna be more powerful than the iPhone 8" should be taken with a huge grain of salt, because he's obviously not an engineer nor an executive with access to what's actually inside the chip, and they're definitely not letting the assembly line workers load up Android and 3DMark.
But the actual numbers he's provided so far are ridiculously legit.
 
So you're thinking Nintendo coupled a GPU with 2/3 of the XB1's performance with 25 GB/s RAM, a tenth of the XB1's available BW? Or is there eDRAM on there as well now? Stacked RAM? Or just hugely imbalanced?

Well, it's not like low-end laptop GPUs have a lot of bandwidth to play with. A first-generation Maxwell GM107-based GTX 850M gets by with 32 GB/s of bandwidth for 1.1 teraflops of compute power. I know the CPU will take some of it as well, but judging from the x86 space it should be an order of magnitude less than the GPU side? Yes? No?

https://www.techpowerup.com/gpudb/2538/geforce-gtx-850m

Edit: A review of the DDR3 version shows that, while clearly behind the GDDR5 version, it still has acceptable performance in modern games at medium settings and 768p.

http://www.jagatreview.com/2015/03/...850m-ddr3-performa-kencang-di-kelas-tengah/4/

Unbalanced? Certainly. Unreasonably so? I don't think so.

Don't forget that the main native resolution targeted by the Switch is 720p, while the Xbox One and PS4 are/should be targeting 1080p.
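For a rough sense of how (im)balanced that is, here are compute-per-bandwidth ratios using the figures from this thread plus commonly quoted Xbox One numbers; the Switch line uses the leaked ~1 TFLOPS docked peak and the 25.6 GB/s LPDDR4 figure, so it's all ballpark:

```python
# FP32 GFLOPS per GB/s of main-memory bandwidth. Figures are ballpark numbers
# from the thread plus commonly quoted Xbox One specs, not measurements.
gpus = {
    #                      GFLOPS   GB/s (main memory)
    "GTX 850M (DDR3)":     (1100,   32.0),
    "Switch docked (?)":   (1000,   25.6),   # leaked ~512 cores @ ~1 GHz, shared LPDDR4
    "Xbox One":            (1310,   68.0),   # DDR3 only; ESRAM adds up to ~200 GB/s on top
}
for name, (gflops, bw) in gpus.items():
    print(f"{name:18s} {gflops / bw:5.1f} GFLOPS per GB/s")
# The docked Switch ratio (~39) is in the same league as a DDR3 laptop GPU (~34),
# while the Xbox One sits around 19 before even counting its ESRAM.
```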
 
Has Nintendo been known to miss its target renders for games? As far as I know, at least not in the same league as Watch Dogs or Killzone 2. If you compare the 2014 E3 target-render footage of Breath of the Wild for the Wii U to the 2016 E3 footage, there is a drop in visuals. I think this is a further hint at compromises that had to be made when creating identical builds of the game for the memory-bandwidth-limited Switch.

[image: z1v.jpg]


Besides foliage and object count, if you rewatch the video, texture quality has also taken a dive.
 
If you compare the 2014 Breath of the Wild Wii U footage to the 2016 footage, there is a drop in visuals. I think this is a further hint at compromises that had to be made when creating identical builds of the game for the memory-bandwidth-limited handheld system.

[image: z1v.jpg]

The first image looks more like concept art than in-game. Are you sure it was an actual in-game screen?

Edit: Never mind, I checked it myself. Nintendo said it was...
 
The first image looks more like concept art than in-game. Are you sure it was an actual in-game screen?
It's a screencap from the E3 2014 footage. That footage is likely a target render on a computer. But has Nintendo been known for having to downgrade target renders? Hasn't Nintendo been fairly responsible with this, unlike, let's say... Ubisoft?
 