I've replied to your points in various places, so let me collate them here:
- Addressing...
Dave H said:...shixel...
I have this response (to the empty space next to someone I'm not talking to...).
- Addressing...
Dave H said:...
And in general I'm afraid I don't think "N shixel pipelines" is ever a useful term without discussing what constitutes each pipeline. Again, with fixed-function pipelines, the only relevant constituent has been the number of TMUs. (And the amount of filtering they can do. And, for MSAA, the number of z-units. And maybe something else I'm forgetting.) But with shixel pipelines, you also need to talk about the ALU resources in each pipeline as well. And if you want to be accurate, you need to talk about them in some detail. I'm afraid that, for PS 2.0 type shaders at least, counting up the number of pipelines will have nearly zero bearing on overall performance, as the only thing it really limits is the number of pixels issued or retired on any given clock, and these limits are never going to be a limiting factor. (Well it also limits texturing efficiency to the degree that you're reading e.g. an odd number of textures with a 4x2 vs. an 8x1 organization. But we already have 4x2 and 8x1 as terms.)
It's sort of like...it's sort of like we're trying to measure the area of a rectangle and you're just telling me the width.
...
I share my thoughts in these comments. Also, addressing your edge triangle comments...
demalion said:...
Note also that it is my understanding that the R300 acts like an 8 proxel pipeline lock-step processor...I think it would be a matter of the organization of the pixels being rendered (4x2? 2x4? I could swear someone actually said sometime) and statistics whether it ever performed lower than the "4x2" as you are calling it for non-branching shaders.
I'll point out further that it seems to me the difference is more an issue of how the pixel coverage is organized. The R300 processes more pixels, and will not statistically be worse than the nv30 (unless it is organized 8 across and 1 down, I think...but it is 4 by 2 AFAIK). But since it is lock step (my assumption, and my term; perhaps there is a better one), it could statistically waste more, which could be ameliorated as another benefit of conditional pixel shading with arbitrary length, methinks. That doesn't give 4 pipelines an (overall) advantage compared to 8 at all; it just makes them more efficient. "1x1" is perfectly efficient for dispatching shader execution, but is it superior to other organizations?
I do think (in my complete lack of hardware designing experience in the field) the best case would be to have independent "1x1" shader handling, replicated however many times and in whatever organization desired, or served whatever limitations existed in the execution of the design. I had thought at the beginning the nv30 might offer that, but unfortunately it doesn't seem to, at least with current drivers (anyone want to run some pixel shaded line benchmarks?). I'm not sure if that is planned for the nv40, though they seem to be heading in that direction, and I actively doubt nv35 would offer it. The only problem with this for nvidia is that ATI has demonstrated the ability to reinvent more thoroughly in century revisions than nvidia has in their decade revisions. Even then, it would likely take very specific circumstances to show a benefit for that, and I'm not sure (not knowing how much complexity it would add) if it is worthwhile unless it can be utilized to facilitate other improvements at the same time.
Hmm...come to think of it, has anyone completely tested to eliminate the possibility of independent pixel shading assignment for the 4 pipelines, or proven/disproven that the nv30 offers this advantage with current benchmarks?
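To put the lock-step waste argument into something checkable, here is a toy model. It is only a sketch under my own assumptions: pixels are shaded in fixed, grid-aligned blocks, every block touched by a primitive issues all of its lanes, and the function name `lane_efficiency` and the test shapes are made up for illustration. It says nothing about how the R300 or nv30 actually dispatch work.

```python
def lane_efficiency(covered, block_w, block_h):
    """Fraction of issued lanes that shade a covered pixel.

    Assumption: pixels are processed in lock step in grid-aligned blocks
    of block_w x block_h, and a block that contains any covered pixel
    issues all block_w * block_h lanes.
    """
    blocks = {(x // block_w, y // block_h) for x, y in covered}
    issued = len(blocks) * block_w * block_h
    return len(covered) / issued


# Hypothetical one-pixel-wide lines, 8 pixels long (think shaded line tests).
shapes = {
    "horizontal": [(x, 0) for x in range(8)],
    "vertical": [(0, y) for y in range(8)],
    "diagonal": [(i, i) for i in range(8)],
}

# Block footprints (width, height), not "pipelines x TMUs" notation.
footprints = {
    "8 across, 1 down": (8, 1),
    "4 across, 2 down": (4, 2),
    "2 across, 4 down": (2, 4),
    "1 across, 1 down": (1, 1),
}

for shape_name, pixels in shapes.items():
    for footprint_name, (w, h) in footprints.items():
        eff = lane_efficiency(pixels, w, h)
        print(f"{shape_name:10s} {footprint_name:17s} {eff:.3f}")
```

In this model the independent "1 across, 1 down" case never wastes a lane, the 4 by 2 footprint never drops below 25% on these lines, and the 8 across, 1 down footprint falls to 12.5% on the vertical and diagonal lines. That only measures efficiency per issued lane, not total throughput, which is the distinction I was drawing above.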
- Addressing...
Dave H said:...[Wait a sec: just to be sure, by "proxel"(/"shixel") we are counting the number of fragments shaded in parallel, not the number of ops applied per clock, right? Because now that I think about it, the analogy with "texel" would probably suggest the latter definition. Anyways, it still doesn't capture all the necessary constraints.] ...
I point you to my proxel explanation again:
demalion said:...The basic idea behind it is to have a "proxel pipeline" relate to "proxel fillrate" as "pixel pipelines" used to be related to pixel fillrate. It is in the "fine" tradition of "texels" and, more recently, "zixels", to offer more opportunities to accurately portray strengths and weaknesses of a design.
This could allow 4x? notations that actually made sense for shading (see below), but until people got over laziness to perform multiplication themselves, it would most likely be best to simply count the pipelines or focus on proxel fillrate. The nv30 would still have problems with architecture expressed in proxel pipelines (they couldn't play as fast and loose with the definition of proxel), but that's where proxel fillrate (similar to texel fillrate) comes in. Also, a valid case could be made for calling the nv30 "8x0.5", but simply calling it 8 pipelines would be inaccurate.
The proposed measurements were minimum proxel fillrate, which illustrates worst case behavior (for the nv30 this would be fp32/texturing, I think), maximum proxel fillrate (for nv30 would be intermixed integer and fp16 with no texturing access), which indicates best case (where actual calculations occur...again see my mention of completeness and think of the z only shader performance figures), and a standardized measurement (which would likely resemble pocketmoon's benchmark testing examples), which would let NV30 (and hopefully more significantly NV35) optimizations and R300 >1 op per clock circumstances be represented in a real-world usage related way.
...
The notations I'm discussing are proxel pipeline notations. Note that the standardized proxel measurement figure I mention might facilitate a notation, but I think it would be too confusing with other factors affecting execution, so the point of "proxel" is to be used only for "proxel pipeline count" and for the "proxel fill rate" specifications I list. Quite simply, the variations of calculation performance possible don't allow the concept to be successfully simplified into the full "XxY" notation as for pixel pipelines. While less ubiquitous for its purpose than "pixel", those stipulations allow it to retain the simplicity of comparison...more points and parallels to the pixel/texel relationship are offered in that post.
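As a rough illustration of the minimum and maximum proxel fillrate figures, here is a sketch of the multiplication involved. Everything in it is an assumption made for the example: the function name `proxel_fillrate`, the idea of expressing each case as proxels per pipeline per clock, and all of the numbers, which are placeholders and not measured or published figures for the nv30, R300, or any other part.

```python
def proxel_fillrate(pipelines, clock_mhz, proxels_per_pipe_per_clock):
    """Proxel fillrate in Mproxels/s.

    Mirrors how pixel fillrate is derived from pixel pipelines:
    pipelines * clock * per-pipeline rate. The per-pipeline rate is an
    input here, because it depends on the shader mix being run.
    """
    return pipelines * clock_mhz * proxels_per_pipe_per_clock


# Hypothetical part: placeholder values only, not real chip specs.
PIPELINES = 4
CLOCK_MHZ = 500

# Hypothetical per-pipeline rates for the worst and best cases.
WORST_CASE_RATE = 0.5  # e.g. the slowest precision/texturing mix
BEST_CASE_RATE = 2.0   # e.g. the fastest mix with no texture access

min_fill = proxel_fillrate(PIPELINES, CLOCK_MHZ, WORST_CASE_RATE)
max_fill = proxel_fillrate(PIPELINES, CLOCK_MHZ, BEST_CASE_RATE)

print(f"minimum proxel fillrate: {min_fill:.0f} Mproxels/s")
print(f"maximum proxel fillrate: {max_fill:.0f} Mproxels/s")
```

The gap between the two results is the reason a single "XxY" figure doesn't carry over cleanly: the pipeline count stays fixed while the per-pipeline rate moves with the shader being run, so the min/max pair says more than the count alone.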
- Addressing...
Dave H said:...
But if we combine the fact that I don't want to talk about shader performance because there are too many other variables with the fact that I only want to talk about performance hits that actually mean something, I think you end up almost necessarily in a situation with trilinear and/or AF. Unless you turn on MSAA or something, but in that case you're going to be bandwidth limited, so no point in blaming anything on low fillrate!
...
I already addressed that in isolation here, if my understanding of both these comments, and your prior comments, is accurate.
I'll note that my point in addressing your comments is that "it doesn't matter given the rest of the nv30 design" is something I would agree with when the pipeline count is indicative of either the actual pixel pipeline count or the performance of the rest of the nv30 design. It only has to do one for the pipeline count to be justified in my mind. That is, if the "proxel pipeline" count was effectively 8 (the hope of those expecting very significant performance gains in the future for the nv30), and the pixel pipeline count was 4, I might still be proposing "proxel" to avoid confusion, but I wouldn't be finding fault with nvidia for calling it an 8 pipeline part (at least, I wouldn't still be finding fault with them). That's what I meant by my comments that if the nv35 had 8x1 pixel pipelines, but still had insufficient "proxel" pipelines to match, I wouldn't be criticizing nvidia for calling it 8x1, though I would likely be criticizing the nv35, in that case, for being an underperforming part for executing shaders.
Oh, and to answer another question, I'd say the point of terms such as those the Xabre uses is to be a substitute for actual specs. As I said, just because some people don't read the actual specs doesn't mean it is OK to put anything there you want to (but we agree on that in any case). BTW, I feel I must again point out there are people smart enough to recognize such terms as pure BS, even without fully understanding the (what is supposed to be) factual specs elsewhere, and are able to critically compare the figures presented in the specs. Heck, that's the kind of person I was when I was beginning to learn about computers, and would be still if I hadn't had the time and desire to progress further.