NVIDIA Tegra Architecture

Or else IMG is simply lying when it claims that even "just" improved rounding support has a significant hw cost: http://blog.imgtec.com/powervr/powervr-gpu-the-mobile-architecture-for-compute

As a layman, I understand "significant" to mean at least 10%. Now if my original estimate of at least +50% from 10.0 to 11.x counts as "minimal" additional area in your book, so be it.

Significant is subjective and nearly arbitrary - you shouldn't really understand it to mean anything in particular. I've hit points in optimizing software where 1-2% performance improvements are very significant to me.
 
And who says that hull & domain shaders and all the other logic you need for programmable tessellation are for free?
Rarely is anything for free. But in a unified shader architecture, the extra logic for the shaders is probably pretty small. The same way a geometry shader isn't very costly.

And the 'programmable tessellation' doesn't seem all that programmable to me, in the sense that a texture or ROP unit is arguably more programmable.

My understanding could be very wrong, since it's based on just some explanations on gamedev.net etc., but the tessellation-specific hardware looks like just a block that accepts a few input parameters (inside and edge tessellation factors) and outputs a bunch of attributes that a shader then uses to do something useful. It doesn't feel like something that costs an insane amount of area.

(Check out this article: http://www.gamedev.net/page/resources/_/technical/directx-and-xna/d3d11-tessellation-in-depth-r3059 )
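To make that picture concrete, here is a toy Python sketch of what that fixed-function block would do, under the simplifying assumptions of a quad domain, uniform partitioning, and a single tessellation factor for everything (a real D3D11 tessellator handles per-edge factors and fractional partitioning for crack-free stitching). The function names are hypothetical; the point is that the fixed block only emits domain coordinates, while all the "programmable" work happens in the (shader-run) domain stage:

```python
def tessellate_quad(tess_factor):
    """Fixed-function part: turn a tessellation factor into (u, v)
    domain points for a quad patch. Uniform partitioning only."""
    n = max(1, int(tess_factor))
    points = []
    for i in range(n + 1):
        for j in range(n + 1):
            points.append((i / n, j / n))
    return points

def domain_shader(uv, corners):
    """Programmable part (runs on the shader cores): here just a
    bilinear interpolation of the four patch corners, but this is
    where displacement mapping etc. would happen."""
    u, v = uv
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    x = (1-u)*(1-v)*x0 + u*(1-v)*x1 + (1-u)*v*x2 + u*v*x3
    y = (1-u)*(1-v)*y0 + u*(1-v)*y1 + (1-u)*v*y2 + u*v*y3
    return (x, y)

# A factor of 2 yields a 3x3 grid of domain points for the patch.
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
verts = [domain_shader(uv, corners) for uv in tessellate_quad(2)]
```

The asymmetry is the argument in a nutshell: `tessellate_quad` is a dumb enumerator that could be a small fixed-function block, while `domain_shader` is arbitrary code that maps straight onto the existing unified shader ALUs.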
 
Rarely is anything for free. But in a unified shader architecture, the extra logic for the shaders is probably pretty small. The same way a geometry shader isn't very costly.

Then why isn't a Fermi or Kepler cluster as small as a G80 cluster?

And the 'programmable tessellation' doesn't seem all that programmable to me, in the sense that a texture or ROP unit is arguably more programmable.

The tessellation units can be as fixed function as TMUs or ROPs; programmability comes from the shaders themselves.

My understanding could be very wrong, since it's based on just some explanations on gamedev.net etc., but the tessellation-specific hardware looks like just a block that accepts a few input parameters (inside and edge tessellation factors) and outputs a bunch of attributes that a shader then uses to do something useful. It doesn't feel like something that costs an insane amount of area.

(Check out this article: http://www.gamedev.net/page/resources/_/technical/directx-and-xna/d3d11-tessellation-in-depth-r3059 )

Especially NV's implementation since Fermi, where units are distributed across clusters and connected via a highly complex interconnect, sure. Of course it all sounds so simple and pretty trivial, but when it comes to real hw implementations, with millions of transistors flying around for each and every bit of it, it's a totally different story.

Everything is so "minuscule" and "simple", yet Fermi alone went from a single raster unit and a single trisetup unit to 4 raster and 4 trisetup units. Admittedly, a raster unit accounted for only a single-digit percentage of the die area before Fermi, but with Fermi that suddenly multiplied by 4, and twice over, for both raster and trisetup.

Ok, that's a high-end GPU implementation, but in order to get tessellation into a pipeline with any real degree of geometry generation and programmability, it takes more than just a dull tessellation unit and some magic software wand to get the job done.
 
You are correct in pointing out that, starting with Fermi, Nvidia's architectures dramatically changed the way they handle geometry.

IMHO, that, by itself, is a much better explanation for the area changes in those architectures than tessellation on its own.

IOW, correlation vs causation may be a factor here.
 
You are correct in pointing out that, starting with Fermi, Nvidia's architectures dramatically changed the way they handle geometry.

IMHO, that, by itself, is a much better explanation for the area changes in those architectures than tessellation on its own.

IOW, correlation vs causation may be a factor here.

As with all things, just getting the job done for meaningless paper specs is easy. Either you do something well or you'd better leave it be. High efficiency, or a job well done, takes time and effort.

That said I'd love to see a synthetic benchmark attempt from anyone with a K1 preferably compared to a Kepler GPU.
 
As with all things, just getting the job done for meaningless paper specs is easy. Either you do something well or you'd better leave it be.
Not sure how this statement fits in the current discussion.

My point is that the tessellation-specific HW seems pretty simple and that the heavy lifting is done in the shaders. We already know the TK1 has standard Kepler shader units, so the heavy lifting is already taken care of.

If you have a specific argument as to why the tessellation unit must have a major complexity and area impact, I'm more than happy to learn.
 
It really depends on the rest of the architecture as to whether it has major impact in implementation complexity. For any tiler there are issues (generating or removing geometry that crosses a tile boundary in the most obvious case) to solve that potentially affect multiple parts of the pipeline. For an IMR, adding the tessellator should have little impact if you already support stream out.

Definitely a solvable problem without significant area impact though, in all cases (IMHO and YMMV, I'm not a hardware architect).
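The tile-boundary issue above can be illustrated with a toy sketch. In a classic tiler, geometry is binned to screen tiles before per-tile shading, and binning is conservative (bounding-box based, in this hypothetical sketch). Tessellation amplifies geometry after the patch is submitted, so the tiler must either run expansion before binning (and buffer the output) or re-expand per tile; the binning step itself looks something like this:

```python
def bin_triangle(tri, tile_size, grid_w, grid_h):
    """Conservative tile binning: list every tile whose area the
    triangle's bounding box overlaps. tri is three (x, y) points."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    x0 = max(0, int(min(xs) // tile_size))
    x1 = min(grid_w - 1, int(max(xs) // tile_size))
    y0 = max(0, int(min(ys) // tile_size))
    y1 = min(grid_h - 1, int(max(ys) // tile_size))
    return [(tx, ty) for ty in range(y0, y1 + 1)
                     for tx in range(x0, x1 + 1)]

# One input patch tessellated into many small triangles means many
# binning operations and bin-list entries -- the buffering cost the
# post above alludes to, which an IMR with stream-out avoids.
tiles = bin_triangle([(10, 10), (70, 10), (10, 70)], 32, 4, 4)
```

This is only a sketch of the binning idea, not any vendor's actual implementation; real tilers use tighter triangle-tile intersection tests than a bounding box.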
 
Not sure how this statement fits in the current discussion.

My point is that the tessellation-specific HW seems pretty simple and that the heavy lifting is done in the shaders. We already know the TK1 has standard Kepler shader units, so the heavy lifting is already taken care of.

No one ever said or implied that GK20A lacks anything in that regard, to clarify that one. For your information, the original SGX had a primitive processor integrated, afaik (because when they conceived the architecture, those types of units were still bouncing in and out of the DX10 specs), which actually supports what you're saying.

I was very clear, more than a couple of times, that the hw overhead is NOT meant just for the tessellation unit but for the entire bundle of logic needed to get it to work in the first place. And yes, in theory there could be a crappy implementation where DX11 is barely supported but efficiency with tessellation just stinks. From what I've seen so far, that is clearly NOT the case with GK20A, or even Adreno 420 for that matter.

If you have a specific argument as to why the tessellation unit must have a major complexity and area impact, I'm more than happy to learn.
It's not the tessellation unit per se that creates the hw overhead by itself. Am I in some sort of detention where I have to write it 100 times on the wall so that it finally fits in your head, or what? :rolleyes:

If I take a DX10 Rogue with 192 SPs against the DX11.0 (feature level, not API support) GK20A, again with 192 SPs, which of the two has the higher perf/mm2 and perf/mW for any OGL_ES 3.1 material, and why?
 
It really depends on the rest of the architecture as to whether it has major impact in implementation complexity. For any tiler there are issues (generating or removing geometry that crosses a tile boundary in the most obvious case) to solve that potentially affect multiple parts of the pipeline. For an IMR, adding the tessellator should have little impact if you already support stream out.

Definitely a solvable problem without significant area impact though, in all cases (IMHO and YMMV, I'm not a hardware architect).

Don't tilers in general buffer a lot more than IMRs when it comes to any sort of geometry?
 
Looks like NVIDIA will be coming out with an 8" tablet [ST8 = Shield Tablet 8?] this year with TK1 inside: https://apps.fcc.gov/oetcf/eas/repo...ame=N&application_id=289458&fcc_id=VOB-P1761W

Dimensions are 221mm x 126mm x 8mm (LxWxH), weight is 350g, and battery capacity is 5200mAh.

For reference, the Xiaomi Mi Pad dimensions are 202mm x 135mm x 8.5mm (LxWxH), weight is 360g, and battery capacity is 6700mAh.

Presumably the screen resolution will be higher on Mi Pad than ST8, which explains why ST8 has significantly less battery capacity.
 
Looks like NVIDIA will be coming out with an 8" tablet [ST8 = Shield Tablet 8?] this year with TK1 inside: https://apps.fcc.gov/oetcf/eas/repo...ame=N&application_id=289458&fcc_id=VOB-P1761W

Dimensions are 221mm x 126mm x 8mm (LxWxH), weight is 350g, and battery capacity is 5200mAh.

For reference, the Xiaomi Mi Pad dimensions are 202mm x 135mm x 8.5mm (LxWxH), weight is 360g, and battery capacity is 6700mAh.

Presumably the screen resolution will be higher on Mi Pad than ST8, which explains why ST8 has significantly less battery capacity.
Going by width I'd guess it's either 1920x1200p or 1920x1080p.
 
Going by width I'd guess it's either 1920x1200p or 1920x1080p.

From the big, 144-page document:

"The EUT measures approximately 218 mm (L) x 123 mm (W) x 8 mm (H) and weighs approximately 350 g. "

218/123 = 1.772357724
16/9 = 1.777777778

Of course, the 218×123 dimensions include the bezels, so we can't know for sure, but this looks like a 16:9 device to me; so it's probably 1080p.
 
If the design is following the current trend of small tablet (8" or lower), then the bezel on the sides should be thin and the top/bottom should be thick. If this is the case, then it might be a 16:10 screen.
 
I believe he meant Perpetual Motion Squad:
 