Nintendo Switch Tech Speculation discussion

Actually I haven't seen this 1st/3rd party sizzle reel yet.

Look at the lighting and some of the textures, though those may not be indicative of the final product and may be from consoles/PC. But some are very impressive compared to last-gen consoles. Maybe the system does have a 128-bit memory subsystem. Still not convinced about 520 CUDA cores, for a variety of reasons.
 
If you compare the bandwidth with the PC GPU equivalent of the Xbox One GPU, the HD 7770, it's not 1/10th but rather 1/3rd, since that GPU has 72 GB/s of bandwidth.
You can't dismiss bandwidth as a bottleneck by pointing to parts with low bandwidth as proof it's not an issue. In bandwidth-heavy workloads, a lack of bandwidth impacts performance. There's a reason PS4 and XB1 went with the added cost of 150+ GB/s of BW to the GPU, and why more BW is added to faster GPUs. Nothing shows this more than mobile parts with very limited bandwidth.

Plus, where are you getting that 1/10th from? Are you adding the embedded RAM and system memory bandwidth together? As can be seen from the lower quality of the Xbox One versions of games vs the PS4 versions, that's hardly a fair way to count it. The system memory bandwidth alone stands at just 68 GB/s; Switch's 25 GB/s is roughly 40% of that.
This is a really dumb argument that needs to die. The GPU in XB1 can read and write some 250 billion bytes of data per second, and games use this. Switch has 1/10th the BW of XB1 and 1/6th the BW of PS4, which is going to impact rendering at TV resolutions. Bandwidth-saving measures do reduce the relative impact, of course. But then again, factoring in contention on the shared bus, the actual BW available to the GPU is going to be well below 25 GB/s. Exact relative ratios are impossible to pin down, but these specs are clearly going to have far less data per unit of compute.
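To put a rough number on that data-per-compute point, here's a quick back-of-the-envelope sketch in Python using the figures being thrown around in this thread. The Switch row is the rumoured spec, not anything confirmed, and the XB1 ESRAM figure is the commonly quoted peak:

```python
# Bandwidth per unit of compute, using thread-quoted figures (rumoured Switch specs assumed).
systems = {
    # name: (peak GFLOPs, GB/s available to the GPU)
    "XB1 (DDR3 + ESRAM)": (1310, 68 + 204),  # ESRAM peak read+write commonly quoted at ~204 GB/s
    "PS4 (GDDR5)":        (1840, 176),
    "Rumoured Switch":    (800, 25.6),       # 64-bit LPDDR4-3200 per the leak under discussion
}

for name, (gflops, gbps) in systems.items():
    print(f"{name:20s} {gbps / gflops * 1000:6.1f} MB/s per GFLOP")
```

Even being generous to the leak, that's roughly a 3-6x worse bandwidth-to-FLOP ratio than the current consoles.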

I think you are blowing the memory bandwidth issue out of proportion. This is a console targeted mostly at 720p. 25 GB/s of bandwidth is not ideal
Indeed. It's not a great design as described in this supposed leak. Of course, it's possible Nintendo designed a handheld and then just allowed 800 GF of GPU in a BW-limited situation because that's what was possible, but it'd be a badly balanced system that'd see the GPU limited in what it could do. How likely is it that Nintendo took a next-gen TX2 (which Nvidia aren't using in their own handheld) and coupled it with a last-gen memory system?
 
Actually I haven't seen this 1st/3rd party sizzle reel yet.

Look at the lighting and some of the textures, though those may not be indicative of the final product and may be from consoles/PC. But some are very impressive compared to last-gen consoles. Maybe the system does have a 128-bit memory subsystem. Still not convinced about 520 CUDA cores, for a variety of reasons.
Footage from other systems, except for Dragon Quest. Also, DQ drops frames a lot.
The first-party footage doesn't have AA or AF.
 
You can't dismiss bandwidth as a bottleneck by pointing to parts with low bandwidth as proof it's not an issue. In bandwidth-heavy workloads, a lack of bandwidth impacts performance. There's a reason PS4 and XB1 went with the added cost of 150+ GB/s of BW to the GPU, and why more BW is added to faster GPUs. Nothing shows this more than mobile parts with very limited bandwidth.

And yet the PS4 Pro has a huge 128% increase in FLOPs for a tiny 23% increase in memory bandwidth. My point? That the OG PS4's memory bandwidth was greatly overestimated relative to its actual needs. Otherwise, if it is like you say, they would surely have had to increase bandwidth more to avoid starving the new FLOPs capabilities.
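Just to show where those percentages come from, a quick sanity check using the commonly cited public figures (assumed here, not taken from this thread):

```python
# Sanity check of the quoted percentages; 1.84/4.2 TFLOPs and 176/218 GB/s are the
# commonly cited public figures for PS4 / PS4 Pro.
ps4_tf, pro_tf = 1.84, 4.20      # TFLOPs
ps4_bw, pro_bw = 176.0, 218.0    # GB/s

print(f"FLOPs increase:     {(pro_tf / ps4_tf - 1) * 100:.0f}%")             # ~128%
print(f"Bandwidth increase: {(pro_bw / ps4_bw - 1) * 100:.0f}%")             # ~24%
print(f"GB/s per TFLOP:     PS4 {ps4_bw / ps4_tf:.0f} vs Pro {pro_bw / pro_tf:.0f}")
```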

For the rest of your post, you seem to imply that I'm expecting image quality parity with Xbox One and PS4. I'm not; that would be foolish. Image quality will surely be a good few points below them. That is why I compared with low-end GPUs, because that is my expectation for it. Low-end GPUs are in the same ballpark as far as FLOPs are concerned, and they get by with low memory bandwidth. That contradicts your argument that an 800 GFLOPs GPU will be bottlenecked by 25 GB/s of bandwidth. It won't, for the most part.

I'll leave here again the link to a review of the DDR3 GM107 as proof of this. At medium settings it can run games like Battlefield 4, BioShock Infinite, Crysis 3, Metro: Last Light, etc. at 60 FPS or more.

http://www.jagatreview.com/2015/03/...850m-ddr3-performa-kencang-di-kelas-tengah/4/
 
Nintendo has done stupider things in the past (e.g. everything about Wii U).
Possible explanation for having a severely underpowered devkit? Nintendo wanted devs to be able to experiment with the console in its various modes (docked, undocked with the controllers attached and in hand, undocked on a table with the controllers detached) while getting a feel for battery life, the size of UI elements when two players are looking at the 6" screen from a distance, gameplay ideas, etc. And this would come before sheer power, at least for the first generation of games.
So yes, this would be a devkit showing the console's final physical form, for developers to think about gameplay ideas and functionality before thinking about the best graphics they could squeeze out of the system.

Does this sound believable coming from Nintendo?

You don't need to have the thing that's been leaked to allow developers any of this. You can throw a screen on anything. You can attach controllers to anything, especially with a concept that even includes detachable controllers. You can attach a battery to anything, and if it's okay to scale performance to try to hit the same battery life (which won't really work in practice), it's about as okay to scale the battery to do the same. That's if developers even think about this, because if they're doing a high-end game they're going to want to utilize the system as well as possible and let the battery life fall where it does based on how Nintendo specs it. They're going to count on Nintendo to design the battery to accommodate the best performance utilization, like with every other gaming handheld.

If you really need all of this in one package, you make something that's a lot bulkier than the final thing will be. The developers won't get quite the same experience as the end user using this, but it hardly makes a difference. Being able to determine what kinds of assets, shaders, resolution, logic, etc. they can actually utilize and what frame rate they can target are much, much more important considerations.

Focusing on delivering the mechanical design/ergonomics/fit first and the function second is incredibly backwards for a gaming device. Wasting huge amounts of time and engineering resources on this is insane, and no, I don't think "Nintendo is insane" is enough of an explanation; usually there's some clear rationalization behind their questionable compromises.

To avoid fillrate + bandwidth constraints while trying to measure compute performance?

For a test like this the bandwidth needed should more or less be proportional to the compute, at least until you start hitting resolution thresholds where the caches make a huge difference. Generally speaking going with half the resolution but double the framerate would result in about the same bandwidth usage.

But I digress: until anyone can explain why the test's reported ratio of what should be two constants (frame count and FLOP count) isn't itself a constant, I see no point in arguing about the other merits of the claims.
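To make the proportionality point concrete, here's a tiny sketch with hypothetical numbers, assuming a single 32-bit colour target written once per pixel with no overdraw or compression:

```python
# Framebuffer write traffic scales with pixels * fps, so halving the resolution while
# doubling the framerate leaves the traffic roughly unchanged.
def fb_write_gbps(width, height, fps, bytes_per_pixel=4):
    return width * height * fps * bytes_per_pixel / 1e9

print(fb_write_gbps(1920, 1080, 30))  # ~0.25 GB/s
print(fb_write_gbps(1920, 540, 60))   # half the pixels, double the rate -> same ~0.25 GB/s
```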
 
And yet the PS4 Pro has a huge 128% increase in FLOPs for a tiny 23% increase in memory bandwidth. My point? That the OG PS4's memory bandwidth was greatly overestimated relative to its actual needs. Otherwise, if it is like you say, they would surely have had to increase bandwidth more to avoid starving the new FLOPs capabilities.

That's quite an assumption.

PS4 can easily be bottlenecked by memory BW, as devs here have explained. The pro adds colour compression and the BW cost of sharing with the CPU won't have increased in line with GPU FLOPs. But it can surely be bottlenecked too.

Increasing BW comes at a cost - more or faster chips, wider or faster interface. Sony will have gone for the best bang-for-buck they can, but it doesn't mean there weren't compromises. The same is true particularly for budget GPUs. Scorpio will be 320+ GB/s, even though it's likely to have effective colour compression, tiled rasterization and a shared L2.

Mobile has additional power concerns that squeeze available BW far more than mains powered devices - it's one reason why TBDRs have been so successful in phones.

For the rest of your post, you seem to imply that I'm expecting image quality parity with Xbox One and PS4. I'm not; that would be foolish. Image quality will surely be a good few points below them. That is why I compared with low-end GPUs, because that is my expectation for it. Low-end GPUs are in the same ballpark as far as FLOPs are concerned, and they get by with low memory bandwidth. That contradicts your argument that an 800 GFLOPs GPU will be bottlenecked by 25 GB/s of bandwidth. It won't, for the most part.

If NX isn't bottlenecked by memory BW, it'll be because developers try to avoid the situations where it happens. So expect to see less of everything that requires reads and writes: less or non-existent aniso, optimised trilinear, hacked-away AA, lower-res or lower-accuracy buffers, pared-back post-processing (e.g. fewer previous buffers used for motion blur). Stuff like that.
 
Did I miss something? Why are people assuming that the hypothetical 800 GFLOPs SoC would be limited to 25 GB/s of BW? The real thing in the Switch is either a TX1 or that other something. If it's the other thing, unless I missed some crucial info, there's absolutely zero evidence pointing to 25 GB/s for that part...

EDIT: And considering the various SKUs that Nvidia has with similar GFLOP-to-BW ratios, such as the mentioned 850M or the GT 745, I don't think 25 GB/s would be a massive bottleneck in a system where games are going to be designed for the platform, unlike the aforementioned PC GPUs, which don't even meet the minimum requirements for modern games AFAIK, yet actually hold their own pretty well.
 
That's quite an assumption.

PS4 can easily be bottlenecked by memory BW, as devs here have explained. The pro adds colour compression and the BW cost of sharing with the CPU won't have increased in line with GPU FLOPs. But it can surely be bottlenecked too.

Fair enough. In the end, no matter how much memory bandwidth you have, there will always be ways of making a bottleneck out of it. Switch being a Maxwell 2 part means it also brings colour compression though.

If NX isn't bottlenecked by memory BW, it'll be because developers try to avoid the situations where it happens. So expect to see less of everything that requires reads and writes: less or non-existent aniso, optimised trilinear, hacked-away AA, lower-res or lower-accuracy buffers, pared-back post-processing (e.g. fewer previous buffers used for motion blur). Stuff like that.

Like I said, not expecting Xbox One / PS4 quality :)
 
EDIT: And considering the various SKUs that Nvidia has with similar GFLOP-to-BW ratios, such as the mentioned 850M or the GT 745, I don't think 25 GB/s would be a massive bottleneck in a system where games are going to be designed for the platform, unlike the aforementioned PC GPUs, which don't even meet the minimum requirements for modern games AFAIK, yet actually hold their own pretty well.

Switch will need a good chunk of BW to feed the CPU, don't forget.

Fair enough. In the end, no matter how much memory bandwidth you have, there will always be ways of making a bottleneck out of it. Switch being a Maxwell 2 part means it also brings colour compression though.

Like I said, not expecting Xbox One / PS4 quality :)

FWIW, I think once people have Switch in their hands they'll be very happy with what it's putting on screen, particularly in handheld mode where they won't be comparing it to PS4Bone!
 
Did I miss something? Why are people assuming that the hypothetical 800 GFLOPs SoC would be limited to 25 GB/s of BW? The real thing in the Switch is either a TX1 or that other something. If it's the other thing, unless I missed some crucial info, there's absolutely zero evidence pointing to 25 GB/s for that part...

EDIT: And considering the various SKUs that Nvidia has with similar GFLOP-to-BW ratios, such as the mentioned 850M or the GT 745, I don't think 25 GB/s would be a massive bottleneck in a system where games are going to be designed for the platform, unlike the aforementioned PC GPUs, which don't even meet the minimum requirements for modern games AFAIK, yet actually hold their own pretty well.

I blame the "elite mindframe" :D
They (without really pointing at anyone specific) are so used to top-of-the-line GTX x80 or x70 GPUs combined with very high settings that they really believe low-end GPUs are only useful for Office applications.
 
I'll leave here again the link to a review of the DDR3 GM107 as proof of this.

Remember that the GTX 850M doesn't have to share that 128-bit bus's bandwidth. Also note how the GDDR5-sporting GTX 750 Ti is roughly 20% faster by clock speed, but tends to have a ~45% lead in the 1080p benchmarks on that page.

Anyone who's built an AMD APU-based system can tell you how important memory bandwidth is, especially when you have to share it.
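For reference, the raw bandwidth gap between those two GM107 configurations, assuming the commonly listed memory clocks (so treat the numbers as approximate):

```python
# Peak memory bandwidth = bus width (in bytes) * effective memory clock.
def bandwidth_gbps(bus_width_bits, effective_mhz):
    return bus_width_bits / 8 * effective_mhz * 1e6 / 1e9

print(bandwidth_gbps(128, 2000))  # GTX 850M DDR3:    ~32 GB/s
print(bandwidth_gbps(128, 5400))  # GTX 750 Ti GDDR5: ~86 GB/s
```

That's nearly 2.7x the bandwidth for the 750 Ti, which lines up with the 1080p gap on that page being much larger than the clock-speed gap.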
 
Remember that the GTX 850M doesn't have to share that 128-bit bus's bandwidth. Also note how the GDDR5-sporting GTX 750 Ti is roughly 20% faster by clock speed, but tends to have a ~45% lead in the 1080p benchmarks on that page.

Anyone who's built an AMD APU-based system can tell you how important memory bandwidth is, especially when you have to share it.

If you check back my original post about these benchmarks on the previous page or so, you'll see that I did mention that. I just posted the link again because the first time around it looked like it was ignored. Concerning CPU usage of memory bandwidth, this is ARM we're talking about; they're not known for being able to use massive amounts of it. Last time I checked on Anandtech, the A57 used around 5 GB/s for 24576 KB transfers. It can use a maximum of 14 GB/s on very small 32 KB transfers, but I guess the Switch APIs could be built to optimise transfers as much as possible?
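Even taking those A57 numbers at face value, a hypothetical split of a single shared 25.6 GB/s bus (rumoured figure, and ignoring contention overhead, which would make things worse) leaves the GPU with noticeably less than the headline number:

```python
# Hypothetical split of a shared 25.6 GB/s bus (rumoured figure) between CPU and GPU.
total_bw = 25.6          # GB/s, rumoured LPDDR4 figure
cpu_bw_streaming = 5.0   # GB/s, large-transfer A57 figure cited above
cpu_bw_small = 14.0      # GB/s, worst case with small 32 KB transfers

print(total_bw - cpu_bw_streaming)  # ~20.6 GB/s left for the GPU
print(total_bw - cpu_bw_small)      # ~11.6 GB/s in the pathological case
```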
 
That's quite an assumption.

PS4 can easily be bottlenecked by memory BW, as devs here have explained. The pro adds colour compression and the BW cost of sharing with the CPU won't have increased in line with GPU FLOPs. But it can surely be bottlenecked too.

Increasing BW comes at a cost - more or faster chips, wider or faster interface. Sony will have gone for the best bang-for-buck they can,
In addition to this, PS4 Pro isn't an optimised new platform, but a cheap-and-cheerful halfway upgrade to an existing platform. Sony improved what they could at minimal cost to achieve a certain upgrade. The only reason the CPU and RAM are faster than PS4 is because Sony could upclock them a bit - they're certainly not a carefully selected set of performance parts balanced with the GPU.

For the rest of your post, you seem to imply that I'm expecting image quality parity with Xbox One and PS4. I'm not; that would be foolish. Image quality will surely be a good few points below them. That is why I compared with low-end GPUs, because that is my expectation for it. Low-end GPUs are in the same ballpark as far as FLOPs are concerned, and they get by with low memory bandwidth. That contradicts your argument that an 800 GFLOPs GPU will be bottlenecked by 25 GB/s of bandwidth. It won't, for the most part.
If the lack of bandwidth is preventing higher quality graphics, then it's a bottleneck! ;)

I'll leave here again the link to a review of the DDR3 GM107 as proof of this. At medium settings it can run games like Battlefield 4, BioShock Infinite, Crysis 3, Metro: Last Light, etc. at 60 FPS or more.

http://www.jagatreview.com/2015/03/...850m-ddr3-performa-kencang-di-kelas-tengah/4/
Only if you reduce quality. Now compare the 850M to similarly powerful parts with far more bandwidth in bandwidth-heavy operations.

I won't dispute that you could have this 800 GF part in there and that it would run games. The GPU will be hampered by a lack of bandwidth, though, and it's debatable how much improvement you'd get versus using a smaller, cheaper GPU on the same RAM.

Last time I checked on Anandtech, the A57 used around 5 GB/s for 24576 KB transfers.
5 GB/s probably isn't that unreasonable to consider as an impact. We also don't know what overhead that might have (consider the CPU impact on PS4 for comparison). Whatever it is, the GPU has less than 25 GB/s. It'd be the lowest ratio of BW to processing ever I think!
 
If the lack of bandwidth is preventing higher quality graphics, then it's a bottleneck! ;)

In that case, there is virtually zero hardware out there that is not bottlenecked :D
Come on, we have to look at use cases here in order to evaluate it objectively. Never mind what Marketing says: Switch is a handheld console first and a living room console second. For its use case, the hardware seems to be fine, especially taking power consumption into consideration!

Only if you reduce quality. Now compare the 850M to similarly powerful parts with far more bandwidth in bandwidth-heavy operations.

No shit Sherlock! :D
It seems you fit my definition of "elite mindframe" ;)
 
In that case, there is virtually zero hardware out there that is not bottlenecked :D

All bottlenecks are not created equal.

My old Sandy Bridge i5 with a GTX1050Ti - there's probably a CPU bottleneck in a few games, and a GPU bottleneck in most others, but it's reasonably well balanced.

A new XPS 8910 with a 6th gen i7 and a 64-bit DDR3, 384-Kepler-core GT730? That's not a bottleneck, that's a war crime.
 
the mark of a good troll is picking believable numbers.
Well then, at least if it turns out to be fake, it's a good troll and not a bad one. I'm fine with that.
Believable specs can at least be discussed. Unbelievable specs wouldn't find their way here. Not by my hand, at least.



Not seeing your answer. :???:
Sorry, poor last minute editing. I answered it above, not below lol.
Unless it's a question I missed?


I'd argue that goes the other way. Why would Nvidia give Nintendo the flagship part they want to sell their own hardware with? Switch will be in direct competition with Shield. Nvidia would be better off giving Nintendo the TX1 and using the TX2 themselves. Yet Nvidia aren't using the TX2 in their own handheld? Why?
Because a Nintendo-branded handheld is bound to sell a hundred times more than an Android/Nvidia-branded one, resulting in completely different levels of exposure, both to developers and to consumer mindshare.
Willingness to part with their latest and greatest is part of the reason why AMD has been so successful with their semi-custom parts in the console world. Nvidia would do well to follow suit here.



You don't need to have the thing that's been leaked to allow developers any of this. You can throw a screen on anything. You can attach controllers to anything, especially with a concept that even includes detachable controllers. You can attach a battery to anything,
You can do all that, but it's not the same.

If you really need all of this in one package, you make something that's a lot bulkier than the final thing will be. The developers won't get quite the same experience as the end user using this, but it hardly makes a difference. Being able to determine what kinds of assets, shaders, resolution, logic, etc. they can actually utilize and what frame rate they can target are much, much more important considerations.
(...)
Focusing on delivering the mechanical design/ergonomics/fit first and the function second is incredibly backwards for a gaming device. Wasting huge amounts of time and engineering resources on this is insane, and no, I don't think "Nintendo is insane" is enough of an explanation; usually there's some clear rationalization behind their questionable compromises.

This is you thinking like everything but an imaginative developer wanting to make something really different for an innovative platform.
Making the thing a lot bulkier and heavier, with less battery life, could hinder whatever gameplay innovations Nintendo wants developers to come up with. You may only come up with something because you're holding a device with the right weight and volume.

Regardless, what would you put in this bulkier and heavier devkit? An x86 CPU with a GM208? How long would that last with a 4,000 mAh (or even 10,000 mAh) battery, and how much would it cost to assemble?
Maybe the TX1 was indeed the best choice after all?


For a test like this the bandwidth needed should more or less be proportional to the compute, at least until you start hitting resolution thresholds where the caches make a huge difference. Generally speaking going with half the resolution but double the framerate would result in about the same bandwidth usage.

IMO for measuring GPGPU compute only, pixel count should be reduced as much as possible in order to reduce pixel shader cycles and bandwidth for the framebuffer. Otherwise you'll just get further away from the theoretical values.


But I digress: until anyone can explain why the test's reported ratio of what should be two constants (frame count and FLOP count) isn't itself a constant, I see no point in arguing about the other merits of the claims.
Why should the FPS-to-GFLOPs ratio be constant?
You can count how many compute operations are being done when generating fractals, but random fractal generation + viewpoint panning results in different times per frame.
I just ran that Julia benchmark on my PC about 3 times and my scores ranged between 510 and 550 FPS.




Did I miss something? Why are people assuming that the hypothetical 800 GFLOPs SoC would be limited to 25 GB/s of BW? The real thing in the Switch is either a TX1 or that other something. If it's the other thing, unless I missed some crucial info, there's absolutely zero evidence pointing to 25 GB/s for that part...
Exactly.
I also don't understand why the bandwidth is being questioned so much. If it's a different chip, it could (most probably would?) have a completely different memory subsystem.
The only thing I'm fighting here is all the rigidity about what could or could not be. At this point, I care very little about how it turns out in the end.
 
I think the memory bandwidth is an issue where people look at it as if it's completely stalling the operation. It's not a hard ceiling that brings things to a halt; it just slows down operations that max out the bandwidth. Lots of ops will be limited by something other than bandwidth. For example, let's say bandwidth-heavy operations take up 50 percent of frame time. A bandwidth limitation may result in those operations now taking 60-70 percent of frame time. It's always bad to have anything slowing down the rendering time, but it's manageable. The developer will simply have to make choices: either reduce bandwidth requirements, or speed up code in other areas to give the bandwidth-heavy operations enough time. My point is you could take a 400 GFLOP GPU with 25 GB/s and an 800 GFLOP GPU with 25 GB/s, and even though both are limited on bandwidth, the 800 GFLOP GPU will still be a lot faster, even though the more powerful GPU is even more severely starved for bandwidth. I think there has been a misconception that bottlenecks are a hard ceiling severely limiting framerate, when really they only slow down the operations that max them out.
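A toy frame-time model of that point, with all numbers made up purely for illustration: part of the frame is bandwidth-bound and part is compute-bound, so doubling the FLOPs still buys a big win even with bandwidth fixed.

```python
# Frame time = compute portion (scales with FLOPs) + bandwidth portion (fixed bus).
def frame_ms(compute_ms_at_400gf, bw_ms, gflops):
    return compute_ms_at_400gf * 400.0 / gflops + bw_ms

slow = frame_ms(compute_ms_at_400gf=16.0, bw_ms=8.0, gflops=400)  # 24 ms
fast = frame_ms(compute_ms_at_400gf=16.0, bw_ms=8.0, gflops=800)  # 16 ms
print(slow, fast, slow / fast)  # ~1.5x faster rather than 2x, but still a lot faster
```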

 
This is you thinking like everything but an imaginative developer wanting to make something really different for an innovative platform.
Making the thing a lot bulkier and heavier, with less battery life, could hinder whatever gameplay innovations Nintendo wants developers to come up with. You may only come up with something because you're holding a device with the right weight and volume.

The entire concept of the Switch is that you can use it in all sorts of configurations, including with a TV and with the controllers detached, and the big release gimmicky control games (1-2-Switch and ARMS) aren't even intended to be used with the thing in handheld configuration. So if this was Nintendo's goal for an early devkit, which is only really applicable to early release games, then they failed massively. I can't think of a single game Nintendo announced that would have benefited from this, let alone a launch or near-launch title!

Regardless, what would you put in this bulkier and heavier devkit? An x86 CPU with a GM208? How long would that last with a 4,000 mAh (or even 10,000 mAh) battery, and how much would it cost to assemble?
Maybe the TX1 was indeed the best choice after all?

It's been said like a billion times already, a TX1 that's not severely underclocked below what the final unit is supposed to be! Which is what this entire "slow Switch early devkit" idea is built around...

IMO for measuring GPGPU compute only, pixel count should be reduced as much as possible in order to reduce pixel shader cycles and bandwidth for the framebuffer. Otherwise you'll just get further away from the theoretical values.

Okay well have you looked at a Julia set shader? If you want ALU-only that's not a benchmark you would use.
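For anyone who hasn't looked: the core of a Julia shader is an escape-time loop run per pixel. Here's a rough Python sketch of that loop (not the linked Shadertoy code, just the same kind of inner iteration; the constant c is an arbitrary pick):

```python
# Escape-time iteration for one pixel of a Julia set: a handful of multiply-adds per
# iteration, all in the pixel shader, with only a single colour write at the end.
def julia_iterations(zx, zy, cx=-0.70, cy=0.27, max_iter=64):
    for i in range(max_iter):
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy  # z = z^2 + c
        if zx * zx + zy * zy > 4.0:                          # escaped the radius-2 disc
            return i
    return max_iter  # mapped to a colour in the real shader

print(julia_iterations(0.3, 0.5))
```

All of the per-pixel work lives in the fragment shader, so the pixel count and the measured compute aren't really separable here.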

Why should the FPS-to-GFLOPs ratio be constant?
You can count how many compute operations are being done when generating fractals, but random fractal generation + viewpoint panning results in different times per frame.
I just ran that Julia benchmark on my PC about 3 times and my scores ranged between 510 and 550 FPS.

FPS = frames / time
FLOP/s (FLOP_rate) = total FLOPs / time

FPS/FLOP_rate = (frames / time) / (FLOPs / time) = (frames / time) * (time / FLOPs) = frames / FLOPs

If frames or FLOPs is not a constant it means you have a non-deterministic benchmark which is pointless. If you have a benchmark that's random in a way that's not properly seeded and repeatable it's useless and tells us nothing. Can you find the random number generation in Julia set fractal code?

https://www.shadertoy.com/view/4d3SRl

I can't.

Even if the benchmark is deterministic the result won't be totally repeatable because of a ton of uncontrollable factors. But frames / FLOPs should be constant regardless of this.
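A trivial numeric illustration of that identity, with hypothetical numbers: run the same fixed workload at two different speeds and the FPS-to-FLOP-rate ratio doesn't move.

```python
# frames and flops are fixed by a deterministic benchmark; only the run time varies.
frames, flops = 1000, 5.0e12

for seconds in (2.0, 4.0):   # same work on faster vs slower hardware
    fps = frames / seconds
    flop_rate = flops / seconds
    print(fps / flop_rate)   # 2e-10 both times, i.e. frames / flops
```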
 