Nintendo Switch Tech Speculation discussion

I don't know if CheapChips was joking, but I 100% agree that NV had a whole load of TX1s burning a hole in the warehouse/balance sheet. It has not been a successful chip for them; there's a reason you can't buy a Qualcomm handset or an Intel notebook, as the profit is in the chip, not in dealing with consumers. Prior to the Switch, the only third-party wins for the TX1 were one or two Asia-only handsets and the Google Pixel C, and a volume success it is not.

As with AMD and Jaguar/GCN in the consoles, NV will be hoping this provides a long-term revenue stream to support making the next thing a success. Although, with them blathering on about AI and in-car entertainment and navigation solutions, I'm guessing it's a ridiculously inefficient power hog too.
 
From a different thread, but... curious, if accurate. ;)
*cough*

http://electroiq.com/blog/2015/05/moores-law-to-keep-on-28nm/

[image: wafer cost comparison by process node]

Something seems off about the 14nm FD SOI cost there ^

edit:

Another source, from June 2016:

[image: wafer cost comparison chart, June 2016]
 
Assuming the 256 CUDA cores are stalling thanks to the minimal 25GB/s memory bandwidth, how much would you really expect a game's performance to increase by doubling it to 50GB/s? Maybe a 20-30% increase in framerate at most?

Hardly. Parker (TX2) has 2x the bandwidth, 50% higher GPU clocks and souped-up CPU cores, and it only gets 50% higher performance in public benchmarks.
If the ALUs were stalling that much in the TX1, then we'd see more than 50% performance scaling between TX1 and TX2 (since it has twice the bandwidth).
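A toy roofline model makes that inference concrete. A quick sketch (the peak numbers are my own assumptions pulled from public spec sheets, not measurements):

```python
# Toy roofline sketch: attainable throughput is capped either by compute
# or by memory bandwidth, whichever bites first. Assumed specs:
# TX1 ~512 GFLOPS FP32 / 25.6 GB/s; TX2 ~750 GFLOPS (50% higher clock)
# with roughly double the bandwidth.
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

for name, peak, bw in (("TX1", 512.0, 25.6), ("TX2", 750.0, 51.2)):
    for ai in (2, 8, 32):  # arithmetic intensity (FLOPs per byte fetched)
        print(f"{name} @ AI={ai:>2}: {attainable_gflops(peak, bw, ai):6.1f} GFLOPS")

# At low AI (bandwidth-bound) TX2 would show ~2x TX1; the observed
# ~35-50% gain lines up with the compute-bound (high AI) case instead.
```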
 

Where did you find benchmarks for Parker?
 
I was half joking about nVidia having an x1 stockpile. I would love for the Switch to have the best that nVidia can offer. At this point it feels unlikely, but who knows.

If I had Photoshop skills I'd do a pic of someone sticking little Nintendo labels on x1 chips. ;)
 
The benchmarks that @ToTTenTranz linked show Tegra Parker with a 35% advantage over the TX1 results, and even less on texturing, where it saw only a 17% advantage. That makes me believe the 25GB/s isn't crippling TX1 performance, and probably even less so with a lower-clocked processor.

Assuming there isn't a warehouse full of TX1s that Nintendo is purchasing, the wafer cost comparison linked above makes me think the idea of 28nm HPC+ isn't so crazy after all. It's cheaper than both 20nm and 16nm FinFET by quite a bit, and in a way it would make the lower-than-expected clock speeds make more sense. From Nintendo's perspective, the reduced cost may be worth the reduction in performance.
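Rough per-die arithmetic under that assumption (every price and die size below is a placeholder in the spirit of the charts above; real wafer pricing is NDA'd):

```python
import math

# Crude dies-per-wafer estimate on a 300mm wafer, with an edge-loss term.
def dies_per_wafer(die_mm2, wafer_d_mm=300.0):
    r = wafer_d_mm / 2.0
    return int(math.pi * r * r / die_mm2
               - math.pi * wafer_d_mm / math.sqrt(2.0 * die_mm2))

# Placeholder wafer prices; the 20nm/16FF die shrinks vs. 28nm.
for node, wafer_usd, die_mm2 in (("28nm HPC+", 2700, 100),
                                 ("20nm",      4200,  70),
                                 ("16nm FF",   5500,  70)):
    n = dies_per_wafer(die_mm2)
    print(f"{node:>9}: {n} dies/wafer, ~${wafer_usd / n:.2f}/die before yield")
```

Even with the smaller die, the higher wafer price leaves 28nm cheapest per chip in this toy comparison.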
 
But what about heat?
 

Reportedly, power efficiency and thermal performance aren't that much better on 20nm than on 28nm, which is why nVidia passed on the process completely for their GPUs, and AMD for both GPUs and APUs. The bigger advantage is reduced die size (transistor and gate area equivalent to 16FF, actually), but that is somewhat nullified by the difference in wafer price.

So if the SoC only has 4*A57 cores at 1GHz, 2 SMs at 300-768MHz (the clocks are so low they could be using area-optimized transistors everywhere) and a 2*32bit memory controller, it could make a lot more sense to use 28nm than 20nm.
For example, the 2 SMX / 64bit-memory GK208 only takes 80mm^2. Maxwell SMs are smaller than Kepler SMXs, so a Maxwell equivalent could be even smaller. Add a 4*A57 cluster plus some glue logic and we could be looking at a <100mm^2 SoC at 28nm. There's no need to go 20nm if the chip is already that small.
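A quick back-of-the-envelope of that area budget (every figure below is a loose guess, anchored only to the GK208 number above):

```python
# Hypothetical 28nm SoC area budget; all entries are assumptions scaled
# from GK208 (2 SMX + 64-bit MC in ~80 mm^2 at 28nm) and typical
# published A57 cluster sizes.
blocks_mm2 = {
    "2x Maxwell SM + L2":          45.0,  # assumed smaller than 2x Kepler SMX
    "2x 32-bit LPDDR controllers": 10.0,
    "4x A57 (area-optimized)":     15.0,
    "glue logic / IO / video":     25.0,
}
total = sum(blocks_mm2.values())
for name, area in blocks_mm2.items():
    print(f"{name:<30} {area:5.1f} mm^2")
print(f"{'total':<30} {total:5.1f} mm^2")  # ~95 mm^2, under the 100 mm^2 mark
```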


OTOH, nvidia maintaining production of the >2-year-old TX1 for the "new" Shield TV could point to them just slapping the same chip into the Switch, underclocking it and calling it a day.
 

IIRC from the early days of trying to make HTPC projects truly passive, CPUs that could be undervolted to reduce heat output were almost as hard to find as ones that could be overvolted for higher clock speeds. Is it that simple to just dial down the multiplier and then undervolt the chip, or are the two unrelated, in that you can lower the clock speed handily via multipliers but undervolting takes more work?

I'm working on the presumption that they've reduced the voltage on the TX1 to increase battery life, as I assume lower clocks at the same voltage would offer no or negligible energy savings over higher clocks at the same voltage?
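For what it's worth, first-order CMOS dynamic power goes as P ≈ C·V²·f, so dropping the clock alone scales power linearly but leaves energy-per-task roughly flat; the voltage cut is where the battery life comes from. A quick sketch with made-up operating points:

```python
# First-order dynamic power model, P ≈ C * V^2 * f. The operating points
# are invented for illustration, not real TX1 DVFS table entries.
def power(c_eff, volts, ghz):
    return c_eff * volts ** 2 * ghz

stock      = power(1.0, 1.10, 1.9)   # hypothetical stock point
underclock = power(1.0, 1.10, 1.0)   # clock down, same voltage
dvfs       = power(1.0, 0.90, 1.0)   # clock AND voltage down

print(f"stock {stock:.2f} | underclock-only {underclock:.2f} | "
      f"underclock+undervolt {dvfs:.2f} (arbitrary units)")
# Energy per fixed workload at constant V is ~C*V^2*(cycles), so the
# underclock-only point saves little energy per task; the undervolt does.
```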
 
How much die space do the A53 cores take up? I'm not sure how much other hardware could be scrapped in the TX1, assuming the Tegra chip is indeed custom, but I'd have to think it's worthwhile to scrap those A53 cores. The low clocks lend themselves nicely to the larger 28nm process. It's cheaper by a decent margin, and yields are superior. Really, I can't see a reason why Nintendo wouldn't choose the lower-cost option when power consumption and heat would be nearly identical to 20nm.
 
Is it that simple to just dial down the multiplier and then undervolt the chip, or are the two unrelated, in that you can lower the clock speed handily via multipliers but undervolting takes more work?
In ARM cores that were designed for handhelds, it should be no effort at all. There's most probably a pre-determined power state for 1GHz operation in the TX1, by design. In that case, all they have to do is disable the power-saving, race-to-sleep and other features through the OS kernel and simply fix the CPU frequency at 1GHz.
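On a stock Linux kernel (which the TX1's L4T software is based on), that amounts to little more than clamping the cpufreq limits via sysfs. A minimal sketch, assuming the standard cpufreq interface is exposed (whether Nintendo's OS works this way is pure speculation):

```python
# Pin every core to 1 GHz through the standard Linux cpufreq sysfs
# interface (needs root). These paths are the stock kernel mechanism a
# TX1 board would expose; the Switch itself may differ.
from pathlib import Path

TARGET_KHZ = "1000000"  # cpufreq speaks kHz; 1,000,000 kHz = 1 GHz

for cpufreq in Path("/sys/devices/system/cpu").glob("cpu[0-9]*/cpufreq"):
    # Clamping min == max leaves DVFS nowhere to go in either direction.
    (cpufreq / "scaling_max_freq").write_text(TARGET_KHZ)
    (cpufreq / "scaling_min_freq").write_text(TARGET_KHZ)
```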
@Nebuchadnezzar usually does an excellent job of evaluating that in Anandtech articles. I don't think they did it for the TX1, but here's an example of how much the A57 cluster consumes with 4 cores @ 1GHz in the Exynos 5433 (also 20nm):

[image: Exynos 5433 A57 cluster power, 4 cores @ 1GHz]


And if you want to see how much Nintendo/nvidia will be losing by being greedy as fuck if they haven't gone with 16FF and A72 cores:

[image: A72 cluster power figures at 16FF]


1.83W vs. ~1W. And since the A72 cores have substantially higher IPC than the A57, they could probably go with 900MHz and get the same performance (or clock the CPU at 1.7GHz for the same power consumption). So 4*A72 @ 16FF would probably consume less than half as much at the same performance as 4*A57, or deliver twice the performance at the same power consumption.
With the A73 the difference would be even larger, if you go by ARM's promises: the A73 is supposedly a lot smaller and consumes less power than the A72.
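The 900MHz figure falls straight out of the IPC assumption. A quick sanity check (the ~15% IPC uplift is my own guess; the wattages are the chart figures above):

```python
# Iso-performance clock and perf/W comparison, A57 @ 20nm vs A72 @ 16FF.
a57_w, a72_w = 1.83, 1.0   # 4-core cluster power @ ~1 GHz (chart figures)
ipc_uplift   = 1.15        # assumed A72-over-A57 IPC advantage

iso_perf_clock = 1.0 / ipc_uplift        # GHz needed to match A57 @ 1 GHz
print(f"A72 iso-performance clock: ~{iso_perf_clock * 1000:.0f} MHz")

perf_per_watt_ratio = (ipc_uplift / a72_w) / (1.0 / a57_w)
print(f"A72 perf/W advantage at 1 GHz: ~{perf_per_watt_ratio:.1f}x")
# -> ~870 MHz and ~2.1x perf/W, in line with the estimate above.
```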


How much die space do the A53 cores take up?
It's not linear, because the cores can use either area-optimized or performance (frequency)-optimized transistors. Except for those Chinese SoCs with dual 4*A53 clusters (where the higher-clocked cluster is visibly larger than the lower-clocked one), most LITTLE A53 clusters are designed for low frequencies, so they are area-optimized and end up significantly smaller than the big A57/A72 clusters. In SoCs like e.g. the Snapdragon 810, Exynos 5433 or TX1, my bet would be one A53 = ~1/4 A57 in area.

The easy answer would be: if you go by this comparison, dropping the 4x A53 would shave off a little more than 2mm^2, plus whatever glue logic they need:

[image: CPU core die area comparison]


I suspect the ISP and video codec hardware take a whole lot more die area than the A53 cluster.
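Putting rough numbers on that (the per-core area is my assumption from the comparison above; the TX1 die size is the commonly reported figure):

```python
# Sanity check on the "drop the A53s" savings. ~0.6 mm^2 per
# area-optimized A53 is assumed from the comparison above; ~120 mm^2 is
# the commonly reported TX1 die size.
a53_core_mm2 = 0.6
cluster_mm2 = 4 * a53_core_mm2 + 0.5   # four cores plus a little L2/glue
die_mm2 = 120.0

print(f"A53 cluster ≈ {cluster_mm2:.1f} mm^2 "
      f"({100 * cluster_mm2 / die_mm2:.1f}% of the TX1 die)")
# ~2-3 mm^2, i.e. a rounding error next to the ISP/video blocks.
```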
 