Nintendo announce: Nintendo NX

I'm still confused. ;) No-one's saying NX is going to use Drive PX2. The idea is NX can use a single Tegra SoC. Drive PX2 has two of these with a combined TF of 2.5. Ergo, one of these is 1.25 TF, which would be the ballpark for NX, no? Are you suggesting a discrete GPU coupled to this is NX? How'd that work in a handheld?
I agree no-one is saying it is using a Drive PX2, but all the recent posts have been because of the Drive PX2 and trying to translate that to the NX; my first post was in response to that.

As I also pointed out, there is only a marginal performance difference between the Tegra X1 and Tegra 'X2' when comparing the performance figures given by Nvidia for the Drive PX and PX2 platforms....
Dual Tegra X1 is 2.3 Teraflops in Drive PX, while dual Tegra 'X2' is 2.5 Teraflops in Drive PX2.
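To put rough per-SoC numbers on that, here is a quick sketch (just the naive halving of Nvidia's quoted Drive totals; it deliberately ignores the FP16/FP32 question that comes up later in the thread):
Code:
# Naive per-SoC ballpark from Nvidia's published Drive platform totals.
# Assumption: the quoted figure splits evenly across the two Tegra processors.
drive_px_dual_tx1_tflops = 2.3    # Drive PX, dual Tegra X1
drive_px2_dual_tx2_tflops = 2.5   # Drive PX2, dual Tegra 'X2'

print(f"Per Tegra X1:   ~{drive_px_dual_tx1_tflops / 2:.2f} TFLOPS")   # ~1.15
print(f"Per Tegra 'X2': ~{drive_px2_dual_tx2_tflops / 2:.2f} TFLOPS")  # ~1.25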

Yes, I keep mentioning that the Drive PX2 will not be a direct translation to a gaming/console Tegra solution, especially as the console will not be making much use of the ISPs/DSPs (maybe a bit for AR).
But my point again is that the latest Tegra 'X2' in Drive PX2 is designed around a discrete solution where before it was not. This does not mean it has to be a traditional full GPU chip, and we have no idea how the Tegra 'X2' is actually designed internally.
Why was it not possible to do a discrete GPU with the Drive PX like they do now with the PX2, and why are the performance figures for the Tegra X1 and 'X2' processors in the Drive platform nearly identical on Nvidia's website?
Quoting the Nvidia site to show there is only a marginal difference in Teraflops between both generations of the Tegra processor, which would logically mean the TX1 should also be good enough to be close to the Xbox One (in reality it is not).

DRIVE PX features dual NVIDIA Tegra® X1 processors and delivers 2.3 teraflops of performance.

Cheers
Edit:
Hence the importance of the discrete GPU integration in this context and how they improved over the previous Drive/Tegra platform.
 
I literally have no idea what you're saying. :oops::???: Can you explain your point regarding NX and what Tegra/X2/Drive means in relation to that? Are you saying NX will have more flops, or fewer, or use TX1, or a discrete GPU, or what??

As for TX1 being comparable to XB1, GPU flops is only part of the picture, as discussed earlier/elsewhere.
 
I literally have no idea what you're saying. :oops::???: Can you explain your point regarding NX and what Tegra/X2/Drive means in relation to that? Are you saying NX will have more flops, or fewer, or use TX1, or a discrete GPU, or what??

As for TX1 being comparable to XB1, GPU flops is only part of the picture, as discussed earlier/elsewhere.

The comparison between Tegra and XB1 is not just based upon GPU flops; it is pretty clear the Nvidia Shield Console with Tegra X1 is nowhere near the XB1, and some sources are suggesting that the NX will be only a little bit behind the XB1. These were the same sources, going back several months, that said it was not an x86/AMD solution.
But we can compare dual 'X2' to dual X1, and the two are very close to each other: 2.3 Tflops vs 2.5 Tflops, as reported by Nvidia for the Drive platform (the only data we can use so far).

I am saying one cannot just take the Tegra 'X2' processor on its own and use that as the basis of the Nintendo NX, for the reasons I have mentioned earlier, as there is a radical change in approach from Nvidia.
Also, as I pointed out, this focus on the Tegra 'X2' and its performance is academic because it has little performance gain over the X1; specifically, all of this is in the context of the Drive platform that others brought into this discussion.
Going by that logic, Nintendo might as well save cash and use the previous X1 with its comparable performance shown for the Drive platform (again, we have to come back to this, as the 'X2' presented so far is only for Drive PX2), and that goes against the sources saying it is a very current next-gen design (the same sources that said it was not x86/AMD).

Assumptions are being made about the design, but it is pretty clear that to improve over the previous Tegra X1 Drive platform it had to change and integrate a discrete GPU architecture.
How this impacts the internal design and integration we cannot say until further information is released. Maybe they can release a solution without a discrete aspect, but if Drive PX2 is the basis of any discussion, that can only be an assumption, as the discrete aspect is core to the design with this generation so far; this is compounded by the Tegra X1's performance being very close to the Tegra 'X2' on Nvidia's figures.
And to reiterate, my posts only came after the discussion started regarding the Drive PX2.

Of course this all blows up if it does turn out to be an AMD win (seems more and more unlikely) :)
Cheers
 
Interesting. They specifically mention strong single-thread performance. In light of the Nintendo NX, how do you think the first generation of Denver compared with the Jaguar cores in PS4/XB1? Could the second generation of Denver be more powerful single-thread wise?
EDIT - Could Denver face fewer problems in a gaming console than it did inside Android mobile devices, since software would be designed with it in mind from the beginning?
 
I also see that the latest information says there are only 256 Pascal Cuda cores inside the 'X2' SoC, which explains the performance similarity to the X1. Shame they did not go into detail about the GPUs, as their performance figures do not match the GP106 and are actually worse; to put it into perspective, the 1060 can still hit near its 3.8 Tflops at 1500 MHz using just 60 W (measured by Tom's Hardware), and the Drive PX2 has dual GPUs providing 5 Tflops in an automobile solution designed for greater power demands.
Still not convinced 256 Pascal Cuda cores per processor is enough, unless they go with 2x processors in the NX, or 1x processor and 1x smallest mobile Pascal GPU.
Does anyone know if the X1 Shield Console had 1 or 2 Tegra X1 processors? (I assume the dual setup was only for Drive PX.)
Cheers
 
That Tegra SoC TFLOP figure, though, is without considering the Pascal GPU capability; unlike the TX1 processor (which has 256 integrated Maxwell Cuda cores), it is not integral but discrete with the Drive PX2 Tegra processor.
So it would in theory be higher as it is CPU 2.5 TFLOPs and discrete Pascal GPU.
Definitely not CPU 2.5 TFLOPs as that would require a massive server to achieve. Maybe you meant SoC 2.5 TFLOPs?

As I also pointed out, there is only a marginal performance difference between the Tegra X1 and Tegra 'X2' when comparing the performance figures given by Nvidia for the Drive PX and PX2 platforms....
Dual Tegra X1 is 2.3 Teraflops in Drive PX, while dual Tegra 'X2' is 2.5 Teraflops in Drive PX2.

The dual-TX1 in the PX1 can do 2.3 TFLOPs FP16 and around half of that if it's FP32. The dual TX2 can do 2.5 TFLOPs FP32 which matches the "total 8 TFLOPs" in the PX2 that nvidia claimed a while ago.
We don't know yet if the Pascal GPU in Parker/TX2 can do 2*FP16 like the TX1. nvidia is certainly very quiet about that so far.

All things considered, we're looking at ~2x the compute throughput between TX1 and TX2 in FP32.
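For anyone following along, here are the same numbers worked through under that assumption (the Drive PX total quoted in FP16, the Drive PX2 total quoted in FP32):
Code:
# Per-pair FP32 throughput under the assumption above: the Drive PX figure
# is FP16 (the TX1 runs FP16 at 2x the FP32 rate), the Drive PX2 figure is FP32.
dual_tx1_fp16_tflops = 2.3
dual_tx1_fp32_tflops = dual_tx1_fp16_tflops / 2   # ~1.15 TFLOPS FP32
dual_tx2_fp32_tflops = 2.5

ratio = dual_tx2_fp32_tflops / dual_tx1_fp32_tflops
print(f"Dual TX1 FP32: ~{dual_tx1_fp32_tflops:.2f} TFLOPS")
print(f"Dual TX2 FP32: ~{dual_tx2_fp32_tflops:.2f} TFLOPS")
print(f"Ratio:         ~{ratio:.1f}x")   # roughly 2x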
 
Definitely not CPU 2.5 TFLOPs as that would require a massive server to achieve. Maybe you meant SoC 2.5 TFLOPs?



The dual-TX1 in the PX1 can do 2.3 TFLOPs FP16 and around half of that if it's FP32. The dual TX2 can do 2.5 TFLOPs FP32 which matches the "total 8 TFLOPs" in the PX2 that nvidia claimed a while ago.
We don't know yet if the Pascal GPU in Parker/TX2 can do 2*FP16 like the TX1. nvidia is certainly very quiet about that so far.

All things considered, we're looking at ~2x the compute throughput between TX1 and TX2 in FP32.
Where are you getting the FP32 figure from?
Some calculations in the other thread seem to think it is still FP16 without the discrete GPU cores.
Also, how is such a compute figure so different, as you say, when both generations have the same number of Cuda cores?
Look at Maxwell to Pascal to see how close they are without the extreme clocks, and remember the X1 was a 20nm SoC (not that it makes much difference in this context).
Seems you are suggesting there is a doubling of performance using the same number of Cuda cores, and this was not even possible with discrete Maxwell-to-Pascal, unless I am missing something.

Cheers
 
Where are you getting the FP32 figure from?
Some calculations in the other thread seem to think it is still FP16 without the discrete GPU cores.
Cheers
nvidia's PX2 page claims 2.5 TFLOPs with the dual TX2 plus 5 TFLOPs with the dual discrete GPUs (which are probably two GP107 or GP108) for a total of 7.5 TFLOPs.
Back in January, nvidia claimed the PX2 would do 8 TFLOPs FP32:

Ryan said:
As far as performance goes, NVIDIA spent much of the evening comparing the PX 2 to the GeForce GTX Titan X, and for good reason. The PX 2 is rated for 8 TFLOPS of FP32 performance, which puts PX 2 1 TFLOPS ahead of the 7 TFLOPS Titan X.


So the TFLOPs in PX 1's specs are measured with FP16 operations while the TFLOPs in PX 2 are FP32.
Therefore, the TX 2 either has 4 SM / 512 cuda cores at ~1.2GHz or something like 3 SM / 384 cuda cores at ~1.8GHz (or 2 SM / 256 cuda cores at 2.4 GHz?). Considering it's a SoC, I don't think that iGPU is getting >1.2 GHz clock speeds, but I could be wrong.
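A small sanity check of those guesses, assuming the usual FP32 formula (cores × 2 FMA ops per clock × clock speed) and an even split of the 2.5 TFLOPS across the two SoCs:
Code:
# Clock speed each configuration would need to reach ~1.25 TFLOPS FP32 per SoC
# (2.5 TFLOPS quoted for the dual-TX2 pair). FP32 FLOPS = cores * 2 * clock.
target_gflops_per_soc = 2.5 / 2 * 1000   # ~1250 GFLOPS

for cuda_cores in (512, 384, 256):
    clock_ghz = target_gflops_per_soc / (cuda_cores * 2)
    print(f"{cuda_cores} cores -> ~{clock_ghz:.2f} GHz")
# 512 cores -> ~1.22 GHz, 384 cores -> ~1.63 GHz, 256 cores -> ~2.44 GHz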
 
nvidia's PX2 page claims 2.5 TFLOPs with the dual TX2 plus 5 TFLOPs with the dual discrete GPUs (which are probably two GP107 or GP108) for a total of 7.5 TFLOPs.
Back in January, nvidia claimed the PX2 would do 8 TFLOPs FP32:
I think some of that was based upon unknowns, and the problem is removing the 2x discrete Pascal GPU figures from that data; the X1 Drive PX never implemented this, and it is where the improvements are coming from with the 'X2' over the X1, IMO.

Is it possible to explain how they managed to double the performance using the same number of 256 Cuda cores going from Maxwell in the X1 to Pascal in the 'X2'?
Unless now we are not talking about GPU cores within the X1 or 'X2'.
It does not make sense to me, especially as the discrete GPUs did not come anywhere close to that doubling when switching from Maxwell to Pascal, even with a smaller node, more cores, and extreme clocks.
Cheers
 
More and more rumors say that in fact, Tegra won't be in NX, so...

Would that mean that NVidia is not in the NX? Or are we to believe NVidia developed custom silicon for it? Would the rumours about the development kit using Tegra be fake then?
 
Can you explain how they managed to double the performance using the same number of 256 Cuda cores going from Maxwell in the X1 to Pascal in the 'X2'?

Who is saying the TX 2 has 256 cuda cores? I find that rather hard to believe.

A larger SM count and higher compute throughput would explain the TDP demands for the PX 2.
From a 250 W total TDP, if you take 75 W for each GP107, the remaining 100 W will be for the two TX2s. 50 W for a Tegra X2 with almost the same performance as a Tegra X1, which pulls 20 W at the wall in a full system, would make the new SoC a lot less efficient. That doesn't add up, especially considering the TX2 is made on FinFET+ and the TX1 was made on planar 20nm.

Even if the GP107s are actually using 100 W each and each TX2 is using 25 W, it wouldn't make sense from a process-evolution standpoint.
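The arithmetic being described, written out (just a sketch; the 250 W board TDP and the 75 W/100 W per-GP107 figures are the assumptions from the post above, not official per-chip numbers):
Code:
# Rough Drive PX2 power-budget split under the stated assumptions.
board_tdp_w = 250

for gp107_w in (75, 100):
    remaining_w = board_tdp_w - 2 * gp107_w   # left over for the two Tegra 'X2's
    print(f"GP107 at {gp107_w} W each -> ~{remaining_w / 2:.0f} W per Tegra 'X2'")
# 75 W  each -> ~50 W per Tegra
# 100 W each -> ~25 W per Tegra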
 
Who is saying the TX 2 has 256 cuda cores? I find that rather hard to believe.

A larger SM count and higher compute throughput would explain the TDP demands for the PX 2.
From a 250 W total TDP, if you take 75 W for each GP107, the remaining 100 W will be for the two TX2s. 50 W for a Tegra X2 with almost the same performance as a Tegra X1, which pulls 20 W at the wall in a full system, would make the new SoC a lot less efficient. That doesn't add up, especially considering the TX2 is made on FinFET+ and the TX1 was made on planar 20nm.

Even if the GP107s are actually using 100 W each and each TX2 is using 25 W, it wouldn't make sense from a process-evolution standpoint.

Computerbase.de is, with the slides they now have: https://www.computerbase.de/2016-08/nvidia-tegra-parker-denver-2-arm-pascal-16-nm/

Also, this is how the figure breaks down accurately to 25 DL TOPS, which is a form of INT8, using Nvidia's figures (as an example, the Pascal Titan X has 11 TFLOPS FP32 and 44 TOPS INT8):
  • Dual NVIDIA Tegra® processors delivering a combined 2.5 Teraflops
  • Dual NVIDIA Pascal discrete GPUs delivering over 5 TFLOPS and over 24 DL TOPS
Discrete GPU with 5 TFLOPs of FP32 provides 20 DL TOPS.
Tegra 'X2' with 2.5 Teraflops of FP16 provides 5 DL TOPS.
That gives the total of 25 DL TOPS.
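That breakdown can be checked directly; here is a quick sketch assuming the same ratios as the Titan X example above (INT8 DL TOPS at roughly 4x the FP32 rate, or 2x the FP16 rate):
Code:
# Reconstructing the 25 DL TOPS total from the per-part figures.
discrete_fp32_tflops = 5.0   # dual discrete Pascal GPUs (FP32)
tegra_fp16_tflops = 2.5      # dual Tegra 'X2', treated here as an FP16 figure

discrete_dl_tops = discrete_fp32_tflops * 4   # INT8 at 4x FP32 -> 20 DL TOPS
tegra_dl_tops = tegra_fp16_tflops * 2         # INT8 at 2x FP16 -> 5 DL TOPS
print(f"Total: ~{discrete_dl_tops + tegra_dl_tops:.0f} DL TOPS")   # ~25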

In the slide they showed the overall figure as 8 Tflops and 24 DL TOPS, which is only possible if there is a differentiation between the discrete GPUs and the Tegra 'X2' being FP16 (which ties in with the other thread and their calculations); yeah, Nvidia has fudged the figures again.
So I think, from all that has been discussed, it is becoming clearer why they radically went with a discrete GPU solution that integrates with the Pascal Tegra, IMO anyway.
Cheers
 
Would that mean that NVidia is not in the NX? Or are we to believe NVidia developed custom silicon for it? Would the rumours about the development kit using Tegra be fake then?

nVidia not in the NX at all is the new "trend". Won't be the first time that Eurogamer is wrong about nVidia and consoles... I think maybe it was in the mix at one point, maybe even some devkits for the handheld version, but I believe in the DMP gpu rumor more, and AMD for the home version (I don't believe in the hybrid rumor either....)
 
nVidia not in the NX at all is the new "trend". Won't be the first time that Eurogamer is wrong about nVidia and consoles... I think maybe it was in the mix at one point, maybe even some devkits for the handheld version, but I believe in the DMP gpu rumor more, and AMD for the home version (I don't believe in the hybrid rumor either....)

Hmm.. The plot thickens :D
 
Wow, the TX 2's iGPU really sucks! Close to zero GPU performance enhancements apart from bandwidth, at the cost of higher power consumption while using a more expensive manufacturing process? Damn, those Denver cores had better be the second coming of Conroe or something..
 
Well, the density improvements aren't significant between the two nodes, so there's only so much they can fit reasonably.

Perhaps doubling bandwidth + compression improvements should still see a decent boost.
 
Nvidia has pretty much stopped Tegra development, it seems.
Or, for their purposes, Pascal is now so efficient when closer to its Vlow value that it made more sense to have a solution that scales from a single 'X2' to multiple 'X2' processors, and from a single to multiple lower-spec mobile GPUs.
This is even more relevant with hints that Volta is meant to be another notable step up in performance and efficiency, and if going with a Tegra/ARM SoC you can only do so much due to the diverse functions and components in the design.

So from that strategic point of view it seems to make sense to make the switch as they have done and concede tablets etc.
Tegra development still exists and will continue to do so, as it is more than just the internal Cuda cores.
Cheers
 