NVIDIA Tegra Architecture

MyDrivers.com with some Wayne Tegra 4 information: http://news.mydrivers.com/1/232/232173.htm

64 CUDA cores (probably Kepler) @520MHz
Measured shader performance in the lab:
6x Adreno 320 @ APQ8064
4x SGX544MP2 @ OMAP5

Actually, in their article they really do mean the Adreno 225 and not the 320, so 6x the performance of the 225 doesn't sound that impressive considering the 320 is almost 3x faster than the 225. Besides, I won't believe it before we see it.
 

Well, 6x the Adreno 225 certainly would be impressive if that translates into real gaming performance, although I very much doubt it somehow.
 

Nothing about it makes sense whichever way you turn it. Currently the SGX543MP4 in the iPad 3 turns out almost 2.5x faster than the fastest Adreno 225 in GLBenchmark 2.1 Egypt offscreen. Claiming "any" Wayne (remember Wayne will scale from superphones to clamshells with varying amounts of units and/or frequencies) is 6x faster than an Adreno 225 isn't that absurd per se, if hypothetically someone compared a tablet-or-larger design against a smartphone design. The problem here is rather twofold:

1. How the hell do you get such a performance difference with only 64 SPs @ 520 MHz?
2. TI revealed some early GLBenchmark 2.5 results where their OMAP5430 gets 46.0 fps against 43.0 fps for the iPad 3. Now if the above claim were accurate, that supposed Wayne would need to get at least 170 fps in GLBenchmark 2.5 @ 1080p (a rough sketch of the implied numbers follows below).

Given the original "source" that rumor-mongering is based on, it's a waste of bandwidth to deal any further with that kind of crap.

I'm positive that NV will be far more aggressive with GPU performance in Wayne, but if someone wants me to believe those claims, the first questions would be the design target per SoC variant and the power consumption. Obviously a clamshell isn't going to have the same power requirements as a smartphone.
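A rough sketch of the arithmetic implied above, taking the rumoured 4x-the-SGX544MP2 multiplier at face value and assuming (my assumption) that GLBenchmark 2.5 offscreen 1080p scores scale roughly linearly with GPU performance:

Code:
# Back-of-the-envelope check using only the figures quoted above.
# The linear-scaling assumption is mine; real scores won't scale perfectly.

omap5430_fps = 46.0       # TI's early GLBenchmark 2.5 figure (SGX544MP2)
ipad3_fps = 43.0          # iPad 3 (SGX543MP4) in the same test

claimed_multiplier = 4.0  # rumour: Wayne = 4x the SGX544MP2 in OMAP5

implied_wayne_fps = omap5430_fps * claimed_multiplier
print(f"Implied Wayne score: {implied_wayne_fps:.0f} fps")           # ~184 fps
print(f"Relative to iPad 3:  {implied_wayne_fps / ipad3_fps:.1f}x")  # ~4.3x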
 

I call bull on this one; it doesn't seem realistic in a phone, a tablet maybe. But if you look at the claim, it specifically says "shader performance", not outright gaming performance, and that could narrow the scope somewhat.

Could it be in GPGPU abilities? Or some other metric that wouldn't translate into a real-world 6x over the Adreno 320?
And as pointed out, Adreno 320 tests so far have been in a smartphone setup, so...

Edit: how would they even have access to those GPUs to test??
 

Sorry, what am I going on about? I meant the 225. Anyway, they mentioned shader performance, not outright performance.
 

I was looking at this post and got myself mixed up even though I knew it was the 225 :)

What I mean, just looking at that quote, is that it states shader performance, not outright GPU performance, and they don't necessarily mean the same thing. If the metric used to define shader performance is GPGPU capability or something, then being 6x the Adreno 225 in that metric might not make it 6x in real benchmark terms, similar to how Fermi was miles more powerful at GPGPU or DX11, whereas in real games it didn't work out that way.

I'm just throwing that one out there, of course.
 

Oh, okay now it makes sense :)
 

Ignoring real-world efficiency and going purely by theoretical maximum numbers:

Adreno 225 = 8 Vec4 ALUs @ 400 MHz = 32 * 2 FLOPs * 0.4 GHz = 25.6 GFLOPs
Hypothetical Wayne = 64 SPs @ 520 MHz = 64 * 2 FLOPs * 0.52 GHz = 66.6 GFLOPs (oops, sorry, UltraSuperDuper FLOPs (tm))

On-paper difference = 2.6x

If the difference were really 6x, it would mean that roughly a third of the 225's ALUs are sitting idle and twiddling their thumbs. Adrenos have USC ALUs, hence FP32 and of course GPGPU-capable.

Oh, and just for the record, Fermi's advantage against its direct competitor from AMD is under extreme tessellation scenarios in DX11. Not really related to GPGPU.
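For clarity, the same back-of-the-envelope peak numbers as a small sketch, assuming (as the post does) 2 FLOPs per ALU lane per clock, i.e. one MAD, for both parts:

Code:
# Peak-rate arithmetic only; no efficiency factored in.

def peak_gflops(lanes, clock_ghz, flops_per_lane=2):
    return lanes * flops_per_lane * clock_ghz

adreno225 = peak_gflops(8 * 4, 0.40)  # 8 Vec4 ALUs = 32 lanes @ 400 MHz
wayne = peak_gflops(64, 0.52)         # rumoured 64 SPs @ 520 MHz

print(f"Adreno 225: {adreno225:.1f} GFLOPs")    # 25.6
print(f"Wayne:      {wayne:.1f} GFLOPs")        # 66.6
print(f"On paper:   {wayne / adreno225:.1f}x")  # ~2.6x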
 
Either it's fake or that is what they are talking about.
Fermi was much better than Evergreen at tessellation and GPGPU; Evergreen was better for pure gaming fps.
 
While it's completely OT: why do you keep repeating that Fermi was supposedly so much better with GPGPU?
 

Sorry, I don't know what that abbreviation (OT) means.

I keep repeating it, Ailuros, because it was. It was built for it from the ground up. Because it didn't do so well against Evergreen in power consumption and gaming fps, they chopped out some of the compute capabilities and left them to the Tesla cards, which freed up die area for gaming performance per watt, even in the new Kepler. In an ironic twist, positions have reversed with AMD, whose GCN is an all-rounder with very good GPGPU capabilities; but because of that functionality, and the lack of focus on that area in Kepler, Kepler is better at gaming and power consumption, even if, just like Fermi before it, GCN is far superior at GPGPU.

Here are a couple of reviews I dug up from well-respected sites which back up what I've said:
http://www.anandtech.com/show/2977/...tx-470-6-months-late-was-it-worth-the-wait-/6

http://hothardware.com/Reviews/NVIDIA-GeForce-GTX-480-GF100-Has-Landed/?page=15

Bear in mind Fermi was late, with likely less mature drivers than Evergreen, and also had a lot of its FP64 capped (not sure whether that would affect the results; you would know more than me). There are also various accounts of people preferring to use Fermi for GPGPU as it's apparently easier to develop for; indeed, there has been a discussion in the console thread about AMD's VLIW5 being poor for such scenarios.

Edit: we seem to have got a little lost. This is of course in the context of shader compute performance not necessarily meaning gaming performance would be just as good, Wayne vs. Adreno 225.
 
OT = off topic.

As for the OT above, it depends on the applications used and the perspective; for another cross-example: http://www.ixbt.com/video3/gf100-2-part2.shtml

Things aren't exactly black and white, and it still remains a fact that, despite the architectural differences, on paper a 5870 has 2.72 TFLOPs single precision and 544 GFLOPs double precision against the 1.3 TFLOPs single precision and 168 GFLOPs double precision of a GTX 480. Clearly Fermi/GF100/110 were far more concentrated on HPC markets than Cypress and co, and they have distinct advantages, but it doesn't mean it's an all-time win-win situation for Fermi.
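For anyone wondering where those peak figures come from, here is a rough reconstruction; the lane counts and clocks are the public specs (1600 lanes @ 850 MHz for the HD 5870, 480 cores @ 1401 MHz shader clock for the GTX 480), not something stated in the post, and the 1/5 and 1/8 DP rates are the usual Cypress and GeForce-Fermi ratios:

Code:
# Reconstruction of the quoted peak numbers from public specs (my addition).

def peak_sp_gflops(lanes, clock_ghz):
    return lanes * 2 * clock_ghz  # 2 FLOPs (MAD/FMA) per lane per clock

hd5870_sp = peak_sp_gflops(1600, 0.850)  # ~2720 GFLOPs = 2.72 TFLOPs
hd5870_dp = hd5870_sp / 5                # ~544 GFLOPs (1/5 rate on Cypress)
gtx480_sp = peak_sp_gflops(480, 1.401)   # ~1345 GFLOPs = ~1.3 TFLOPs
gtx480_dp = gtx480_sp / 8                # ~168 GFLOPs (capped to 1/8 on GeForce)

for name, sp, dp in [("HD 5870", hd5870_sp, hd5870_dp),
                     ("GTX 480", gtx480_sp, gtx480_dp)]:
    print(f"{name}: {sp:.0f} GFLOPs SP, {dp:.0f} GFLOPs DP")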
 

Well, anyway, I was just pointing out that "shader" performance does not necessarily mean overall gaming performance.
 

Since next-generation small form factor GPUs will most likely all (if not all, then at least the majority) have "scalar" ALUs, it should be expected that ALU utilisation, or if you prefer arithmetic and general efficiency, will be quite a bit higher than in current GPUs with vector ALUs.

While that will obviously work in favor of Wayne's GPU (as much as the Adreno 320, Mali T6xx etc.) against today's GPUs, I still have a damn hard time believing the difference will be of such a magnitude. If it is, then the highest-end Wayne obviously has a lot more stream processors in its GPU than "just" 64 @ 520 MHz.
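A toy illustration of the utilisation point (my own made-up example, not from the thread): a Vec4 ALU issues four lanes per instruction whether the shader needs them or not, while a scalar machine only spends lanes on components that actually do work.

Code:
# Hypothetical instruction stream: how many components each shader instruction uses.
instr_widths = [3, 1, 4, 2, 1, 3, 4, 1]  # e.g. vec3 maths, scalar ops, vec4 ops...

useful_ops = sum(instr_widths)    # lane-operations that do real work
vec4_issues = len(instr_widths)   # a Vec4 ALU burns 4 lanes per instruction
scalar_issues = sum(instr_widths) # a scalar ALU burns 1 lane per component

# Scalar comes out at 100% by construction here; real hardware obviously has
# other stall sources, so treat this purely as an illustration of lane waste.
print(f"Vec4 ALU utilisation:   {useful_ops / (vec4_issues * 4):.0%}")  # ~59%
print(f"Scalar ALU utilisation: {useful_ops / scalar_issues:.0%}")      # 100%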
 
Could something like compiler efficiency pull the scores up? Maybe some cache coherency with the CPU, à la T-604?
Also, NVIDIA does have the best drivers; probably the only reason why Tegra pulls ahead of the Adreno 225?

I haven't got a clue, to be honest. If it is real, then it's going to be some obscure NVIDIA in-house compute benchmark.
 
Could something like compiler efficiency pull the scores up?

Probably. And even if Adreno 3xx is scalar that doesn't mean it's trivial to schedule code for.

Maybe some cache coherency with the CPU, à la T-604?

I doubt it. Cache coherency may be useful if the CPU and GPU are sharing memory. For games it's usually very one-way, where the CPU writes to the GPU or somewhere in memory that the GPU reads.
 

Yes, but I read somewhere that for compute tasks some shared memory between GPU and CPU is beneficial? As it is being compared to the Adreno 225 (as opposed to the 3xx) in this scenario, is it possible that increased shader power coupled with a much more impressive compiler could yield 6x?
 

Sure, but you don't really think that 6x number involves compute tasks, do you? How much compute do you really think will be happening on phones and tablets even in that timeframe?
 