NVIDIA Tegra Architecture

They have AZDO; is there a real need for yet another CTM API?
http://c0de517e.blogspot.ca/2014/06/rate-my-api.html

To a degree, AZDO is more a solution "around" OpenGL than a fix for it: rather than creating an API that allows fast multithreaded command buffer generation, it provides a way to draw with minimal API/driver intervention.
In a way it's a feat of engineering genius: instead of waiting for OpenGL to evolve its multithreading model, it found a minimal set of extensions to work around it. On the other hand, this will probably delay the multithreading changes even further...

The results seem great. The downside of this approach is that all the other modern competitors (DirectX 12, Mantle, and the Xbox One and PS4 libGNM) both reduce CPU work by offloading state binding to GPU indirection and support fast CPU command buffer generation via multithreading and lower-level concepts, which map to more "conventional" engine pipelines a bit more easily. There is also the question of whether the more indirect approach is always the fastest (i.e. when dealing with draws that generate little GPU work), but that's still up for debate (AZDO is very new, and I'm not aware of comparisons pitting it against the other approach).
From the description it sounds like AZDO tackles one aspect of overhead by addressing command buffer generation, but doesn't address state binding. So AZDO probably leaves some performance on the table that a new CTM API could address.
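To make the contrast a bit more concrete, here's a minimal C sketch of the AZDO-style submission path: a persistently mapped indirect buffer that CPU threads fill, then a single glMultiDrawElementsIndirect call so the driver is only touched once per batch. glBufferStorage, glMapBufferRange and glMultiDrawElementsIndirect are real GL 4.3/4.4 entry points; the MAX_DRAWS size, the init/submit split and the omitted fencing/state setup are just illustrative assumptions, not any particular engine's code.

```c
/* Minimal AZDO-style sketch (assumes a GL 4.4 context and a loader such
 * as glad/GLEW providing the prototypes). */
#include <GL/glcorearb.h>   /* or your loader's header */

typedef struct {            /* layout required by GL for indirect element draws */
    GLuint count;
    GLuint instanceCount;
    GLuint firstIndex;
    GLuint baseVertex;
    GLuint baseInstance;    /* often used to index per-draw data in an SSBO */
} DrawElementsIndirectCommand;

enum { MAX_DRAWS = 4096 };  /* illustrative batch size */

static GLuint indirect_buf;
static DrawElementsIndirectCommand *cmds;   /* CPU-visible, persistently mapped */

void init_indirect_buffer(void)
{
    const GLbitfield flags =
        GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

    glGenBuffers(1, &indirect_buf);
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect_buf);
    /* Immutable storage + persistent/coherent mapping: worker threads can
     * write draw commands straight into this memory with no further
     * map/unmap or driver round trips. */
    glBufferStorage(GL_DRAW_INDIRECT_BUFFER,
                    sizeof(DrawElementsIndirectCommand) * MAX_DRAWS, NULL, flags);
    cmds = (DrawElementsIndirectCommand *)
        glMapBufferRange(GL_DRAW_INDIRECT_BUFFER, 0,
                         sizeof(DrawElementsIndirectCommand) * MAX_DRAWS, flags);
}

/* Called once per frame after threads have written num_draws commands into
 * cmds.  Program, VAO and (bindless/array) textures are assumed to be bound
 * once up front, which is the AZDO point: minimal per-draw driver work. */
void submit_batch(GLsizei num_draws)
{
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect_buf);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                (const void *)0, num_draws, 0 /* tightly packed */);
    /* Real code would insert a glFenceSync before reusing this range. */
}
```

The contrast with the CTM-style APIs is that here the CPU-side parallelism comes from worker threads writing into mapped memory, while the actual submission stays single-threaded; DX12/Mantle/libGNM instead expose command buffers that each thread can record and submit itself.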

If Khronos is slow to evolve OpenGL and unwilling to sacrifice compatibility for CAD makers and other parties, then in an idealized world it would be nice if nVidia wrote a new, clean, modern API that is familiar to OpenGL developers (to allow a quick transition) but has low CPU overhead and is GPU efficient, as well as being cross-platform and cross-vendor, and then forced Khronos' hand by publicly handing it over for standardization as, say, OpenGL CTM and seeing how they respond. Of course, that's kind of like what Mantle is, but it may be easier to convince AMD and Intel to adopt an nVidia-designed spec than nVidia to adopt an AMD-designed spec.
 
Just as I don't see Nvidia adopting Mantle (although they could), I don't see AMD adopting something similar from Nvidia, especially as they already have Mantle.

And when it comes to Intel, well, you'd have to show where it is in their best interest AND convince them that the market won't adopt their own solution (like x64, where they were forced to adopt AMD's design).

Considering Nvidia only holds a small fraction of the Android ecosystem and Intel currently dominates the PC ecosystem, that's a pretty tall order. And even worse for AMD with regards to Mantle.

On PC, Intel is likely more than happy to just wait for DX12. And for Android... well, Google seems to be extremely averse to anything close to the metal, if their comments with regard to GPU compute on Android devices are anything to go by.

Regards,
SB
 
It is not angry, it is twisted; I thought it was a good match.
As for why I did this "necro post": simply because I think the info hadn't been reported already.

No sarcasm or hard feelings here, cool :)

Edit
I have to say that product captured my attention. I might not buy it, but I would be pretty happy to try it; contrary to my first reaction toward those devices, tablets do not cut it for me.

As a side note, I wish HP came out with a tinier version, akin to the Chromebook they make.
 
Have they canned the gamepad Shield?

No.

The Nvidia Shield has been one of the more impressive dedicated portable gaming platforms we've seen recently, but it's also been out for quite some time. As we all know, technology has a pretty short shelf life. In an interview with Engadget back in October, Nvidia CEO Jen-Hsun Huang said that we should expect a second generation of the Shield soon. Now it seems that there's an official FCC filing detailing some of the key bits of the Shield 2. The document refers to the device as the "P2570," very similar to the current Shield's model number of "P2450."

Comparing the two suggests that the update has a sleeker, albeit more aggressive, design. It's not much, but the filing shows that it'll be a bit lighter than the original, use a Tegra K1 chip, and likely have a higher screen resolution. It also seems that the device is finished and ready to go, so it shouldn't be too much longer before we start seeing it on store shelves.


Nvidia Shield Line Will See an Update

http://www.tomshardware.com/news/nvidia-shield-2-android-shield,27010.html
 
I've been working on a die area analysis of Tegra K1 over the last few months, since the announcement at CES, and I think I've finally managed to identify the GPU. Not only that but I think I've also figured out where the Cortex A15s and the companion core are. Check it out:

[Image: tegrak1.png — annotated Tegra K1 die shot]


The latest techniques were used to get an accurate image of the first silicon layer, and while it's never going to be obvious to the layperson where the main blocks are, I've highlighted them.

The blue is the GK20A GPU. I've counted all of the cores and there's exactly the right number of them, according to what Jen-Hsun said at CES. Pretty awesome that you can see them all on the accurate die shot, and it's quite clear that they're all custom logic as well.

What's surprising is just how big they are, IMO. Knowing how big the die area is (the other estimates in the tech press have really been so close that I wonder if they all work for the foundry), I've been able to use a number of complex algorithms involving squares and other boxes to determine that the GPU is 39.61mm2. That gives Tegra K1 some performance per square millimetre.

Then there's the CPUs. The green highlights are the main A15 array. NVIDIA have clearly spaced them out on the die for power reasons. Because they don't need to be connected together in any way, you can see large amounts of area between them. NVIDIA use the GK20A as the L2 cache, so there's none of that dedicated to the main array.

What's less obvious is the smaller A15 companion core. Shunning big.LITTLE in favour of their own BIGBIGBIGBIG.miniscule technology, the companion core on the K1 is even smaller than most A7 syntheses on the market today. I don't know how they do it.

It should then be readily apparent from the layout, and the connecting green things around the GK20A near the companion core, that the companion core is actually the cache controller for the GK20A cache for the main A15 array.

I'm still working on it and should have more information to share in a few months.
 
Why big? 39.61mm2 / 192 SPs = 0.206mm2 per SP

Where are the TMUs/ROPs and the 64-bit units (?) in that thing? (Yes, I'm completely clueless with die shots...)
 
ROPs and TMUs? Everyone knows that the holy grail is to get rid of them. To do so while staying fully compatible with GK10x and GK110 and keeping the same or a better performance profile is an impressive achievement.
 
In fact it uses a mixed process: 28nm for the whole chip, but the companion A15 is on 16nm (and otherwise identical to the four other cores).
It's a very special arrangement; nvidia and TSMC have a secret deal on sharing the cost of it, and the latter benefits from early hands-on experience deploying the process in a real-world chip.

It does bring the yield down, but not too much, as the chip area on 16nm is very small. (And the tech to align the wafer for a second etching was a bitch to develop, but it's working fine.)
 
Wait a second: is that supposed to be some funky technical experiment (which 16nm anyway? 16FF isn't 16nm afaik) or what? Going to such lengths to save just a theoretical ~1.3mm2 doesn't sound all that convincing to me. If so, why not use the same hypothetical "smaller whatever" for the GPU as well, or for any other bigger block of the SoC?
 
It's because of the specific power requirements of the companion core. Because it's really quite power hungry, they've spent the mixed 16FF transistors (feature size for 16FF is really 17nm) just on that. It's expensive, but NV have judged it to be worth it.
 
So why didn't they use 16FF for the GPU as well? Yes, before anyone says it, it would have been DAMN expensive to go that far, but if they had been able to shrink the GPU block down to, say, 13+mm2 (assuming all your data is accurate), the die area and power savings would have been huge. When you have a custom layout already... or am I just rambling bullshit?
 
That's probably happening in the Denver version, now that they know it works on T124.
 