Semi Accurate's 4XX views

Status
Not open for further replies.
Should we ignore the best current source only because he likes a different brand than you do? Nobody cares what his personal preferences are, only how reliable his information is.
 
Nobody cares what his personal preferences are, only how reliable his information is.

On the contrary, I think his personal preferences, as well as those of other bloggers, are relevant and interesting.
 
Should we ignore the best current source only because he likes a different brand than you do? Nobody cares what his personal preferences are, only how reliable his information is.
I agree. Once you separate his bias from the actual information, there is no better source of leaks on Fermi (or NV issues, for that matter). Kyle from [H] also has juicy bits now and then, but usually later than Charlie.
 
I really would like to know, because I'm puzzled by statements like the one you just made. At first sight, they just don't make sense... You obviously have better insight. Please enlighten me.

Thanks.

Unfortunately no, and I could be quite wrong in both my terminology and understanding. I suppose it would have been better to say it's "more" reliant on shader work than Cypress?

Everything I'm reading, from reviews to discussions, points to the fact that EG has the same tessellation unit from top to bottom, while Fermi's tessellation will largely be dependent on how many shaders are available for tessellation.

Yes, I realize that lower-end EG parts will still require work by the hull and domain shaders; however, the tessellation unit remains unchanged, which is contrary to what I'm reading with regard to Fermi.

So perhaps this needs more clarification, for me as well as for various reviewers.

Regards,
SB
 
Fermi's tessellation capability is scalable, yes: it starts out at 16 units and scales down from there. Cypress has one unit, and all of its derivatives also have only one unit. However, that difference has no bearing on one architecture being more reliant on shader work. What gave you that impression?
 
Given that Fermi can get somewhere in the range of 3-4 triangles a clock in very heavy tessellation, and Cypress is showing a throughput of 1 triangle every 3 cycles, it would need to be a very low-level Fermi derivative to show significant inferiority on that very specific part of the architecture.
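To put the per-clock figures above into perspective, here is a back-of-the-envelope sketch. It takes 3 triangles/clock (the low end of the quoted 3-4 range) for a full 16-unit part and 1/3 triangle/clock for Cypress, and it assumes, hypothetically, that Fermi's throughput scales linearly with the number of SMs (one polymorph engine each). Clock speeds, raster-engine limits and other bottlenecks are ignored, so this is a per-clock comparison only.

```python
import math

# Figures quoted in the thread (heavy tessellation, per clock).
FERMI_FULL_TRIS_PER_CLOCK = 3.0    # low end of the quoted 3-4 range
FERMI_FULL_SMS = 16                # one polymorph engine per SM
CYPRESS_TRIS_PER_CLOCK = 1.0 / 3   # one triangle every 3 cycles


def fermi_tris_per_clock(num_sms: int) -> float:
    """Hypothetical linear model: throughput proportional to SM count."""
    return FERMI_FULL_TRIS_PER_CLOCK * num_sms / FERMI_FULL_SMS


def min_sms_to_match_cypress() -> int:
    """Smallest SM count whose modelled per-clock rate reaches Cypress's."""
    return math.ceil(CYPRESS_TRIS_PER_CLOCK / fermi_tris_per_clock(1))


if __name__ == "__main__":
    for sms in (16, 8, 4, 2, 1):
        ratio = fermi_tris_per_clock(sms) / CYPRESS_TRIS_PER_CLOCK
        print(f"{sms:2d} SMs: {fermi_tris_per_clock(sms):.3f} tris/clk "
              f"({ratio:.2f}x Cypress)")
    print("SMs needed to match Cypress per clock:", min_sms_to_match_cypress())
```

Under this (admittedly crude) linear model, only a 1-SM derivative would fall below Cypress per clock, which is the sense in which it would take "a very low-level Fermi derivative".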
 
Hemlock (two R8xx tessellators) performs at a similar level to the GTX 480 in many tessellation benchmarks. I expect half a Fermi could be comparable to any RV8xx GPU tessellation-wise (with the exception of RV810, of course).
 
Maybe it's finally time to get to the bottom of this...

1) It's obvious that GF100 is using the general shaders units for hull and domain shader. Are you saying it's doing the FF tessellator operations on the shaders too? Considering that it's a very specific kind of logic that doesn't map at all to multiple SIMD cores and that it's probably very small in area, that doesn't sound very likely, does it?

Thanks.

It shouldn't be a problem at all. Tessellation was used in rendering software long ago, without any dedicated hardware. Nvidia has a Direct3D 10 SDK demo called Instanced Tessellation (http://developer.download.nvidia.com/SDK/10.5/direct3d/samples.html#InstancedTessellation).
They claim to have 16 "polymorph engines", one in each SM, alongside 4 raster engines. I think the FF tessellator operations (and the whole polymorph engine, too) could be emulated on each SM through CUDA without a problem.

Why would you waste core logic on it 16 times, once in each SM, when you only have 4 raster engines? :?: It might not be as fast as an FF tessellator, but it would be much more flexible, and of course everything stays within a single SM.
 
Everything I'm reading, from reviews to discussions, points to the fact that EG has the same tessellation unit from top to bottom, while Fermi's tessellation will largely be dependent on how many shaders are available for tessellation.
That's very questionable if most of the work of the tessellation pipeline is done in the domain shader. It is very important to distinguish between the FF tessellator and the tessellation pipeline. The former is the same across the whole AMD line. The latter is not.

Yes, I realize that lower-end EG parts will still require work by the hull and domain shaders; however, the tessellation unit remains unchanged, which is contrary to what I'm reading with regard to Fermi.
In that case, the only fundamental difference would be whether the FF tessellator is actually implemented in hardware or emulated in software on an SM.

The computational effort required in the hull shader and (especially) the domain shader will far outweigh what's done in the tessellator block. So even if the latter function is implemented on the shaders (which I highly doubt), its impact on overall performance should be minimal.
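A rough way to see why the domain shader dominates is to count invocations per patch. The sketch below assumes a quad-domain patch with 16 control points (a bicubic patch, chosen for illustration), integer partitioning with all tessellation factors equal, one hull-shader control-point invocation per output control point, and one domain-shader invocation per point the tessellator emits (vertex reuse ignored). The tessellator and the domain shader see the same number of points, but the tessellator only does a couple of adds per point while the domain shader runs a full program per point.

```python
def per_patch_work(tess_factor: int, control_points: int = 16) -> dict:
    """Rough per-patch invocation counts for a quad-domain patch.

    Hypothetical model: all edge/inside factors equal and integral, so the
    fixed-function tessellator emits a regular (n+1) x (n+1) grid of points,
    and the domain shader runs once per emitted point.
    """
    n = max(1, tess_factor)
    points = (n + 1) ** 2
    return {
        "hull_shader_control_point_invocations": control_points,
        "tessellator_points": points,
        "domain_shader_invocations": points,
    }


if __name__ == "__main__":
    for factor in (1, 4, 16, 64):
        work = per_patch_work(factor)
        ratio = (work["domain_shader_invocations"]
                 / work["hull_shader_control_point_invocations"])
        print(f"factor {factor:2d}: {work} -> DS/HS ratio {ratio:.1f}")
```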

Which gets me back to where I started: given that the overall tessellation architecture of Fermi and RV8xx is fundamentally the same, I don't see how some seem to come to the conclusion that tessellation on Fermi will have a much bigger impact on overall shader performance (which is what, e.g., Charlie has been claiming, and which has been parroted to death in a number of forums).
 
It is not at all obvious to me that this is the case.

Is it not true that the hull shader can create an insanely higher number of vertices for the output patch compared to the input patch? I'd say that for anyone who takes a deeper look at the DX11 tessellation pipeline, those things are more than obvious.
 
It is not at all obvious to me that this is the case.
The tessellator needs to spit out (u,v) coordinates of a uniform triangle or quad in a 2D plane bounded by (1,1), based on a couple of input parameters. This sounds to me like a fairly simple job: some kind of lookup table/calculation based on the integer portion of the input parameters, plus a simple accumulation loop. Once the increments have been calculated, it shouldn't be much more than two low-precision adds per coordinate.
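The accumulation loop described above can be sketched as follows for the quad domain. This is a simplified model, not the actual D3D11 tessellator: it assumes integer partitioning with all edge and inside factors equal, so the output is a regular grid over [0,1] x [0,1], and it uses floats where hardware would likely use low-precision fixed point. Fractional partitioning and per-edge factors are not handled.

```python
from typing import List, Tuple


def tessellate_quad_uv(tess_factor: float) -> List[Tuple[float, float]]:
    """Emit (u, v) domain points for a quad patch (simplified sketch).

    Only the integer portion of the factor is used, and all edges are
    assumed to share it, giving an (n+1) x (n+1) grid.
    """
    n = max(1, int(tess_factor))   # integer portion of the input parameter
    step = 1.0 / n                 # increment, computed once per patch
    points = []
    v = 0.0
    for _ in range(n + 1):
        u = 0.0
        for _ in range(n + 1):
            points.append((u, v))
            u += step              # one add per point for u ...
        v += step                  # ... and one add per row for v
    return points


if __name__ == "__main__":
    print(tessellate_quad_uv(2))
```

Everything inside the loops is a counter and an add; there are no memory reads, which is consistent with the argument that this block should be tiny in hardware.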

Compare this to the domain shader, where each coordinate needs to be transformed to a 3D coordinate on an arbitrarily complex surface.
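For contrast, here is what a domain shader might do with each (u,v) point. The bicubic Bézier patch is only an illustrative choice of surface (the thread does not say which surface type is used); any displacement lookup would come on top of this. Even this simple case needs 16 weighted control points per output vertex, versus a couple of adds in the tessellator.

```python
from math import comb
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bernstein3(i: int, t: float) -> float:
    """Cubic Bernstein basis polynomial B_{i,3}(t)."""
    return comb(3, i) * t ** i * (1.0 - t) ** (3 - i)


def eval_bicubic_bezier(ctrl: Sequence[Sequence[Vec3]], u: float, v: float) -> Vec3:
    """Map a domain point (u, v) onto a bicubic Bezier patch (4x4 control points)."""
    x = y = z = 0.0
    for i in range(4):
        bu = bernstein3(i, u)
        for j in range(4):
            w = bu * bernstein3(j, v)   # tensor-product weight
            px, py, pz = ctrl[i][j]
            x += w * px
            y += w * py
            z += w * pz
    return (x, y, z)


if __name__ == "__main__":
    # A flat patch whose control points are evenly spaced in x and y.
    flat = [[(i / 3, j / 3, 0.0) for j in range(4)] for i in range(4)]
    print(eval_bicubic_bezier(flat, 0.5, 0.25))
```

Because Bernstein polynomials reproduce linear functions, the flat test patch maps (u,v) back to (u,v,0), which makes a convenient sanity check.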

That said, thinking about it more now, the tessellator should be so simple to implement in HW that it's probably less than 0.1 mm². One would be crazy to waste SM resources on this...
 
Is it not true that the hull shader can create an insanely higher number of vertices for the output patch compared to the input patch? I'd say that for anyone who takes a deeper look at the DX11 tessellation pipeline, those things are more than obvious.

I seem to remember, from somewhere, a figure of about 8k triangles that can be generated per patch. Not 100 percent sure, though.
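The "about 8k" figure is consistent with a simple calculation. Direct3D 11 caps the tessellation factor at 64; assuming a quad patch with integer partitioning and all factors at the maximum produces a regular 64 x 64 grid of cells, each split into two triangles, the count is 2 x 64² = 8192. Other domains and fractional modes will give different numbers.

```python
D3D11_MAX_TESS_FACTOR = 64  # maximum tessellation factor in Direct3D 11


def quad_patch_triangles(tess_factor: int) -> int:
    """Triangles for one quad patch, assuming a regular n x n grid of cells."""
    n = min(max(1, tess_factor), D3D11_MAX_TESS_FACTOR)  # clamp to valid range
    return 2 * n * n


if __name__ == "__main__":
    print(quad_patch_triangles(D3D11_MAX_TESS_FACTOR))
```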
 
Thanks for the additional clarification, silent_guy. I think that gives me a better layman's understanding of what's going on.

Regards,
SB
 
Thanks for the additional clarification, silent_guy. I think that gives me a better layman's understanding of what's going on.
I could be very wrong about the simplicity of it all. ;)

But the main reason to think it could be small in terms of area is that this would be one of the rare units that requires only a handful of input parameters, just spits out coordinates plus connectivity information about how those coordinates connect to each other, and doesn't need any latency hiding because there's nothing to read. I can't see anything in there that would require tons of logic. Hence the engineer's approximation of stamping an area of 0.1 mm² on it.
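The connectivity part can be sketched the same way. Assuming the domain points of a uniform quad tessellation are numbered row by row (index = row * (n + 1) + col), the triangle list falls out of a pair of counters with no memory reads at all. The numbering scheme and winding order here are illustrative assumptions; the real D3D11 tessellator's output order may differ.

```python
from typing import List, Tuple


def quad_grid_indices(n: int) -> List[Tuple[int, int, int]]:
    """Triangle list connecting an (n+1) x (n+1) grid of domain points.

    Points are assumed to be numbered row by row; each grid cell is split
    into two triangles.
    """
    stride = n + 1
    tris = []
    for row in range(n):
        for col in range(n):
            a = row * stride + col   # point at (col, row)
            b = a + 1                # point at (col + 1, row)
            c = a + stride           # point at (col, row + 1)
            d = c + 1                # point at (col + 1, row + 1)
            tris.append((a, b, d))
            tris.append((a, d, c))
    return tris


if __name__ == "__main__":
    print(quad_grid_indices(1))
```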
 
I'd say that for anyone who takes a deeper look at the DX11 tessellation pipeline, those things are more than obvious.

I'll happily admit this though. My understanding of DX11's tess is less than adequate. :oops: Unfortunately, in the near future, I am too tied up to dig into this. :cry:

So I could very well be on the wrong track here.
 
I could be very wrong about the simplicity of it all. ;)

But the main reason to think it could be small in terms of area is that this would be one of the rare units that requires only a handful of input parameters, just spits out coordinates plus connectivity information about how those coordinates connect to each other, and doesn't need any latency hiding because there's nothing to read. I can't see anything in there that would require tons of logic. Hence the engineer's approximation of stamping an area of 0.1 mm² on it.

I didn't bring up SM-based tessellation to bitch about Nvidia or to give Charlie credit, but as a possible solution. It's completely irrelevant how they implemented it. Each polymorph engine is bound to a single SM, so each part with fewer SMs will have less tessellation performance, regardless of whether it's done in hardware or some other way. That's the price you pay when you place it deeper in the pipeline. We will have to wait and see whether it has any impact on the tessellation performance of the slower parts.
 
I didn't bring up SM-based tessellation to bitch about Nvidia or to give Charlie credit, but as a possible solution. It's completely irrelevant how they implemented it. Each polymorph engine is bound to a single SM, so each part with fewer SMs will have less tessellation performance, regardless of whether it's done in hardware or some other way. That's the price you pay when you place it deeper in the pipeline. We will have to wait and see whether it has any impact on the tessellation performance of the slower parts.

http://www.hardware.fr/articles/787-7/dossier-nvidia-geforce-gtx-480-470.html

[Graph: IMG0028392.gif]

[Graph: IMG0028393.gif]

In the latter graph, take the no-culling results and compare 14 SMs @ 607 MHz with 15 SMs @ 700 MHz.

I don't know how reliable Damien's tests here are, but at least they give us some first results to debate, rather than funky theories without a shred of documentation. I'm not claiming any miracles here, but if Damien's tests are indicative (and I stand open to correction), then there seems to be quite a bit of headroom when scaling down, even to just a 4 SM variant. Below that, I doubt DX11 is worth anything more than a funky number on the box, and that goes for both IHVs.
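One way to read the no-culling results is to compare the measured GTX 470/GTX 480 ratio with what a purely linear "SMs x clock" model would predict. The model is an assumption made here for illustration, not Damien's methodology, and the 4 SM part at 700 MHz is hypothetical. If the measured ratio lands close to the prediction, tessellation throughput is scaling with the number of polymorph engines and the clock.

```python
def relative_throughput(sms_a: int, mhz_a: float, sms_b: int, mhz_b: float) -> float:
    """Ratio of (SM count x clock) for part A vs part B (hypothetical linear model)."""
    return (sms_a * mhz_a) / (sms_b * mhz_b)


# 14 SMs @ 607 MHz vs. 15 SMs @ 700 MHz, as in the graph.
gtx470_vs_480 = relative_throughput(14, 607, 15, 700)
# Hypothetical 4 SM variant at the same 700 MHz clock.
four_sm_vs_480 = relative_throughput(4, 700, 15, 700)

print(f"GTX 470 / GTX 480 predicted: {gtx470_vs_480:.3f}")
print(f"4 SM part / GTX 480 predicted: {four_sm_vs_480:.3f}")
```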
 
I didn't bring up SM-based tessellation to bitch about Nvidia or to give Charlie credit, but as a possible solution. It's completely irrelevant how they implemented it. Each polymorph engine is bound to a single SM, so each part with fewer SMs will have less tessellation performance, regardless of whether it's done in hardware or some other way. That's the price you pay when you place it deeper in the pipeline. We will have to wait and see whether it has any impact on the tessellation performance of the slower parts.

It's not really deeper in the pipeline; both architectures are similar in that regard. It's just that nV has more tessellation (polymorph) units. But overall, you still need serious amounts of shader power to do tessellation and displacement.

http://www.pcgameshardware.com/aid,...-benchmarks-and-graphics-comparison/Practice/

This is a 5870 at fairly low resolution and minimal tessellation; what do you expect a lower-end variant to do?
 