Nintendo Switch Tech Speculation discussion

Deleted member 13524 · Dec 22, 2016

function said:
http://hexus.net/tech/reviews/graphics/36509-nvidia-geforce-gtx-680-2gb-graphics-card/?page=4

Shaders, ROPs, TMUs, memory controller etc don't actually need to run at the same speed due to some fundamental law. If it were deemed worthwhile, ROPs could be made to operate at a different clock - for example 9 x some base clock to the shader cores 10 x.

Being able to fine tune where your power was spent on a highly power constrained system might be useful.

Coming up a technical method for somehow maybe who-knows make that spec a little bit believable doesn't make the whole spec list much more believable.

Furthermore, the twitter spec says 14.4 pixels per cycle. Cycle of what? You'd have to imagine the whole GPU is running at clock X and the ROPs alone are running at 0.9*X.
And you'd have to imagine that's somehow more power efficient than coupling TMUs with ROPs at the same clock. And you'd have to imagine this newly found power efficiency gained from having the ROPs at 0.9 the frequency of the TMUs was just now discovered by nvidia engineers for the Switch, and implemented only for the Switch because no Pascal GPU has that.

And I do stand on my opinion that Nintendo going super-cheap on making a TX1 "amplified" to 28nm while taking out the A53 module and dumbing-down on non-essential stuff like the video codec and ISP (essentially what @AlNets suggested) is more believable than all of a sudden nvidia engineers deciding the ROPs are more power-efficient at 90% the clocks of everything else, despite not applying it to any of the other products in their portfolio.

Ailuros said:
My sentence you quoted was just a fair warning that I'm not answering on your behalf, nothing else.

And I didn't take it as anything more than that. Thanks for the kind words

The "I'm not willing to place my own bet" part was directed at the strawman vultures I mentioned, a group of which you're definitely not a part of.

Ailuros · Dec 22, 2016

GPUs have had varying frequencies for eons now for the record and no there's no fancy tricks NV is supposed to pull on Tegra ULP GPUs with frequencies. What I have been hearing for years now in the background is that NV has been pulling quite a few impressive sw/compiler mostly related legitimate tricks, which I wouldn't say that it would be anything new for those that have somewhat a clue about graphics.

Other than that I'm positive that developers for Switch games will be animated wherever possible to also use FP16 related optimisations.

Lalaland · Dec 22, 2016

ToTTenTranz said:
Takashi Mochizuki is Wall Street Journal's technology correspondent from Tokyo. Let's go a little easier on the knee-jerk reactions.
I'd believe his sources over the friends of any twitter celebrity, but in this particular case he's simply quoting a report from a japanese analyst firm called Ace Research Institute.

Fair point, I'd seen he was repeating what the financial analyst firm had reported before posting but was unaware he was an actual reporter. I'm afraid I let the somewhat fevered reactions to his tweets on other forums colour my own response.

On the question of who else was available for the job other than Nvidia for the job I'm torn. I have strong feelings towards Nvidia's mobile chip being a 15W TDP part but their strengths in software and gaming as a domain are not to be discounted. The other vendors with ARM licenses (assuming PPC is off the table) are all mobile phone focused so their off the shelf parts are all targeted at "bursty" workloads, on top of that they mostly focus on OpenGL ES performance so I'm not sure I'd want to be their first customer for a fully programmable GPU and it's attendant libraries, documentation et al. In some ways this is a task highly suited to Nvidia's skill set even if their parts have traditionally eaten batteries for breakfast.

I've a huge soft spot for PowerVR though and TBR in general though, PCX1 for the win baby!

Ailuros · Dec 22, 2016

Add a D into TBR and I'll make you my king

function · Dec 22, 2016

ToTTenTranz said:
Furthermore, the twitter spec says 14.4 pixels per cycle. Cycle of what? You'd have to imagine the whole GPU is running at clock X and the ROPs alone are running at 0.9*X.
And you'd have to imagine that's somehow more power efficient than coupling TMUs with ROPs at the same clock. And you'd have to imagine this newly found power efficiency gained from having the ROPs at 0.9 the frequency of the TMUs was just now discovered by nvidia engineers for the Switch, and implemented only for the Switch because no Pascal GPU has that.

I'd assume the 14.4 "per cycle" refers to an average figure, and that the cycle in question is either that of the ROPs or of the shaders/TMUs they serve. It could be either outright speed or a bubble that prevents some or all of the ROPs from reading/writing on some cycles.

Regarding clocks, I've thought for a while that altering clocks of different sections of the GPU might allow for better performance within a given power budget. For example ROPs, geometry engines, shaders can all be bottlenecks and boosting the performance of a bottlenecked area. I assume the reason this hasn't happened is down to complexity.

Being able to boost, say, fill during shadow rendering or tessellation performance could lead to an overall improvement in frame times.

And I do stand on my opinion that Nintendo going super-cheap on making a TX1 "amplified" to 28nm while taking out the A53 module and dumbing-down on non-essential stuff like the video codec and ISP (essentially what @AlNets suggested) is more believable than all of a sudden nvidia engineers deciding the ROPs are more power-efficient at 90% the clocks of everything else, despite not applying it to any of the other products in their portfolio.

The ROPs would always be more efficient at .9 of 1. That part wouldn't be a discovery. The question is whether it would be worth implementing. Nintendo don't have the luxury of throttling, and will have profiled their software. They may have a particular view of what's most important to them.

It would be interesting to know if the existing X1 had a peak fill of 14.4 p/hz, but like the 970 Nvidia chose to market differently. These supposed leaked specs are meant for developers rather than consumers, so may be more accurate than customer facing PR stuff.

Deleted member 13524 · Dec 22, 2016

function said:
The ROPs would always be more efficient at .9 of 1. That part wouldn't be a discovery.

Source?

BRiT · Dec 22, 2016

I'll throw my wild ass guess to mean "OS Reservations of 10%" is how that 14.4 number was arrived at for Game Developers. Yes it doesn't make any sense, but that's just too much of an exact number for it to not be a silly human arbitrary set limit.

function · Dec 22, 2016

ToTTenTranz said:
Source?

Voltage scales with clock speed.

function · Dec 22, 2016

BRiT said:
I'll throw my wild ass guess to mean "OS Reservations of 10%" is how that 14.4 number was arrived at for Game Developers. Yes it doesn't make any sense, but that's just too much of an exact number for it to not be a silly human arbitrary set limit.

It's just bizarre that they'd reserve ROP time but nothing that included TMUs .... unless they just forgot that part. It does have a whiff of that "lets reserve 10%" thing though. I think that was the original GPU reservation on X1, and the original reserve on the 6 "gaming cores" on X1 too, come to think of it ....

It's also a really odd figure for someone who was making fake specs to pick. :/

Goodtwin · Dec 22, 2016

Shouldn't there be more concern over the Memory clock speeds Eurogamer listed? Tegra X1 specs showed LPDDR4-3200, but Eurogamer shows 1600Mhz and 1331Mhz as the memory speeds. Am I missing something here?

AlNom · Dec 22, 2016

Goodtwin said:
Shouldn't there be more concern over the Memory clock speeds Eurogamer listed? Tegra X1 specs showed LPDDR4-3200, but Eurogamer shows 1600Mhz and 1331Mhz as the memory speeds. Am I missing something here?

double data rate.

Deleted member 13524 · Dec 22, 2016

function said:
Voltage scales with clock speed.

I didn't ask for a terribly generic sentence about voltage and clock speeds. I asked for a source/article/study saying ROPs result in better performance-per-watt if they're clocked at 90% of the rest of the GPU as you stated so surely in this post, namely here:

function said:
The ROPs would always be more efficient at .9 of 1. That part wouldn't be a discovery.

Unless you're calling "efficiency" to absolute power consumption with no regard to performance. Which is a mistake.

Ailuros · Dec 22, 2016

For the record I know Monsieur Leadbetter for years now and he's always carefully investigating before he starts writing any similar writeup. Granted everyone makes mistakes, but Richard tends to take his job as seriously as everyone in the business should.

function · Dec 22, 2016

ToTTenTranz said:
I didn't ask for a terribly generic sentence about voltage and clock speeds. I asked for a source/article/study saying ROPs result in better performance-per-watt if they're clocked at 90% of the rest of the GPU as you stated so surely in this post, namely here:

That's not what I stated - you're strawmaning [edit: maybe not, I've seen your edit]. You quoted what I stated, which was that the ROPs would be more efficient at .9 of a clock of 1. Less power per unit of work done, assuming you scale voltage with clocks as modern processors try to do (and that all else remains equal, of course).

The issue of efficiency was one you introduced though. My point was that spending power where it does you most good could be a reason for differential clocks.

function · Dec 22, 2016

Ailuros said:
For the record I know Monsieur Leadbetter for years now and he's always carefully investigating before he starts writing any similar writeup. Granted everyone makes mistakes, but Richard tends to take his job as seriously as everyone in the business should.

Don't know the man, but I've followed his work since C&VG and Mean Machines!

Deleted member 13524 · Dec 22, 2016

The Eurogamer article is mostly alright, as they're only effectively "putting their hands on fire" regarding final clock speeds, which are believable. I see no reason why they wouldn't be as those would be just a small part of what will constitute performance and power consumption.

It's when it was suddenly assumed that the Switch's final specs were the lowest common denominator between eurogamer's clock speeds and the twitter leak that refers to dev kits that the skeptic in me started questioning the obvious.
All this despite the article itself pointing out many of the discrepancies and leaving final specs in the open:

We know how fast it runs, but what are the custom modifications that set apart the bespoke Tegra from the stock X1? While we're confident that our reporting on Switch's clock-speeds is accurate, all of the questions we have concerning the leaked spec remain unanswered. Those anomalies still seem odd, and details of the processor's customisations remain unknown at this time. Has Nintendo added a bunch of smaller tweaks or has it been a little more ambitious?

@Ailuros if you personally know him then let him know that Tegra X1 only supports either Cluster or In-Kernel switching between the A53 and A57 clusters, there's no HMP. So the A53 cluster would never work in combination with the A57 cluster, hence why they would be absent from the dev kit specs.

MrFox · Dec 22, 2016

ToTTenTranz said:
I don't even want to share what I have in mind.

[snip... victim argument]

Instead, I'll just broadcast my idea that the "Twitter Leak Specs + Eurogamer clocks make no sense" premise. Because it doesn't, regardless of what the outcome of this whole thing will be.

This is the speculation thread, we're supposed to be speculating and deconstructing each other's speculations.

If you give a higher number of SM, I will want to discuss the high wattage issues.
If you give a low wattage, I'll want to discuss how many SM this can reasonably feed.
If you think nvidia tripled the gflops per watt of X1, I'll want to discuss how it's possible.

What is it you are disagreeing with?

MrFox · Dec 22, 2016

"the venturebeat story actually corroborate an earlier leak on twitter which was essentially a tegra X1, exactly as we said back in July. We haven't been adding anything to that because the sources that we have are all saying Tegra X1 or a customized variant of it essentially."
-- Richard Leadbetter

Source: www.youtube.com/watch?v=PzS4LbH5nmA

So they are confident it's an X1 and they are confident about the clock. What kind of customization could be made if devs are esentially telling them it's an X1?

BRiT · Dec 22, 2016

MrFox said:
So they are confident it's an X1 and they are confident about the clock. What kind of customization could be made if devs are esentially telling them it's an X1?

Stamp the Nintendo Seal on it.

bunnybug · Dec 22, 2016

MrFox said:
"the venturebeat story actually corroborate an earlier leak on twitter which was essentially a tegra X1, exactly as we said back in July. We haven't been adding anything to that because the sources that we have are all saying Tegra X1 or a customized variant of it essentially."
-- Richard Leadbetter

Source: www.youtube.com/watch?v=PzS4LbH5nmA

So they are confident it's an X1 and they are confident about the clock. What kind of customization could be made if devs are esentially telling them it's an X1?

they also say customization will not add compute capability or make the cpu faster at 5:41 in the video. now I really don't see what's the point in arguing these specs when every source in the industry is backing them, and we don't even have one good source saying otherwise. with the switch being out so soon, these specs are all but confirmed if you follow the industry.

Nintendo Switch Tech Speculation discussion

Deleted member 13524

Guest

Ailuros

Epsilon plus three

Lalaland

Ailuros

Epsilon plus three

function

None functional

Deleted member 13524

Guest

BRiT

(>• •)>⌐■-■ (⌐■-■)

function

None functional

function

None functional

Goodtwin

AlNom

Moderator

Deleted member 13524

Guest

Ailuros

Epsilon plus three

function

None functional

function

None functional

Deleted member 13524

Guest

MrFox

Deludedly Fantastic

MrFox

Deludedly Fantastic

BRiT

(>• •)>⌐■-■ (⌐■-■)

bunnybug

Similar threads