What indication is inspiring this dedication to "not a pipeline"?
A pipeline is a conceptual "start->end" organization of processing throughput. The NV3x has that, AFAICS. The rest of what you are describing is implementation detail.
When you say it doesn't have pipelines, it doesn't make sense to me.
It does if you say it doesn't have a "traditional" pipeline, but "programmable pixel pipeline" already indicates that evolution as programmability increases.
It does if you say it has flexibility in pipeline organization, but that wouldn't be saying anything new about the NV3x. Going further and saying it is completely flexible in that regard also contradicts the fact that things had to be changed so the NV35 could (possibly) output 8 color values per clock while the NV30 could not (and even that isn't confirmed yet).
Once you are processing multiple pieces of data, in whatever design, you seem to have pipelines. And things still have to be done in order; the pipeline is just a tool to hide the undesirable effects of that, with the implementation determining the method.
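To make concrete what I mean by a pipeline as a conceptual start->end organization, here's a minimal sketch in plain Python. The stage names and the one-stage-per-clock model are made up for illustration (not any particular chip): each pixel passes the stages in a fixed order, but the stages overlap across pixels, so the in-order constraint gets hidden behind one-result-per-clock throughput.

```python
# Toy sketch, not any real GPU: a fixed start->end stage order per pixel,
# with stages overlapped across pixels so in-order processing still yields
# one result per clock once the pipeline is full.
STAGES = ["fetch", "texture", "arithmetic", "blend"]  # hypothetical stage names

def simulate(num_pixels):
    clock = 0
    in_flight = []          # list of (pixel_id, stage_index)
    next_pixel = 0
    finished = []           # (pixel_id, clock at which it completed)
    while len(finished) < num_pixels:
        # advance every in-flight pixel one stage per clock, in order
        advanced = []
        for pid, stage in in_flight:
            if stage + 1 == len(STAGES):
                finished.append((pid, clock))
            else:
                advanced.append((pid, stage + 1))
        # issue one new pixel per clock while any remain
        if next_pixel < num_pixels:
            advanced.append((next_pixel, 0))
            next_pixel += 1
        in_flight = advanced
        clock += 1
    return finished

# After the initial fill (len(STAGES) clocks), one pixel completes per clock.
print(simulate(8))
```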
Concerning ILDP, if your comments in that thread are related to what you are proposing, I'll mention two opinions I formed when reading through the PDF: 1) its improvements seem inapplicable as a GPU solution unless you replicated them for parallelism (in which case they'd be pipelines); 2) they still talk about the ILDP design as a pipeline, and the benefits it offers are for interdependency optimization (i.e., not a substitute for parallelism, at least for a GPU's inherently parallel workload, but a tool for enhancing the functionality of the parallelism) and for a hardware implementation allowing higher clock speeds. EDIT: It does seem interesting for both branching and looping evolution and for the idea of shader-output AA solutions, however.
In pipelines, you'd have to respect an order. For example, I think the R300 needs to do texturing and then arithmetic (or it could be the opposite). That means if you do a texturing operation dependent on an arithmetic operation, you need two clock cycles.
In the case of the NV3x, however, I seem to remember that the order has no importance. My memory COULD be tricking me, though, so I might have to check on this.
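Assuming the ordering constraint is as described (I'm not claiming this is how the R300 actually schedules; the instruction format and pass model here are made up to model the claim), a rough sketch of why a texture fetch dependent on an arithmetic result would cost an extra pass/cycle, while the reverse dependency would not:

```python
# Rough sketch of the "fixed phase order" idea: within one pass, texture
# fetches are assumed to run before arithmetic, so a texture fetch whose
# coordinate comes from an arithmetic result has to wait for the next pass.
def count_passes(program):
    """program: list of ("tex" | "alu", set_of_dependency_indices)."""
    pass_of = {}                      # instruction index -> pass number
    for i, (kind, deps) in enumerate(program):
        earliest = 0
        for d in deps:
            dep_kind = program[d][0]
            if dep_kind == "alu" and kind == "tex":
                # dependent texture read: violates tex -> alu order, new pass
                earliest = max(earliest, pass_of[d] + 1)
            else:
                # same-pass use is fine when it respects the tex -> alu order
                earliest = max(earliest, pass_of[d])
        pass_of[i] = earliest
    return max(pass_of.values()) + 1

# alu result feeding a texture coordinate -> 2 passes on the order-constrained model
print(count_passes([("alu", set()), ("tex", {0}), ("alu", {1})]))   # 2
# texture fetch feeding arithmetic -> 1 pass
print(count_passes([("tex", set()), ("alu", {0})]))                 # 1
```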
Pipeline discussion.
What you are describing looks to me like a pipeline implementation decision to hide dependency latency, if true. It even makes sense for this "component cascade" idea, if you implemented it towards the goal of more processing throughput per pixel (rather than more pixels, and higher efficiency of processing for each one).
Since the data dependency doesn't actually disappear, it would have to be hidden. What are you calling the conceptual execution structure that would hide that, if not a pipeline?
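For what it's worth, one way such a structure could hide the latency without removing the dependency, sketched with made-up numbers and function names (purely illustrative, not a claim about how the NV3x does it): round-robin between independent pixels, so the clocks a dependent operation spends waiting get filled with other pixels' work.

```python
from collections import deque

def run(pixels, dep_latency=4):
    """pixels: dict pixel_id -> number of dependent ops left to issue."""
    ready = deque(pixels.keys())
    waiting = {}                      # pixel_id -> clock at which its result is back
    clock = 0
    while ready or waiting:
        # wake up pixels whose dependent result has arrived
        for pid, t in list(waiting.items()):
            if t <= clock:
                del waiting[pid]
                ready.append(pid)
        if ready:
            pid = ready.popleft()
            pixels[pid] -= 1          # issue one dependent op for this pixel
            if pixels[pid] > 0:
                waiting[pid] = clock + dep_latency
        clock += 1
    return clock

# With enough independent pixels in flight, the dependency latency is mostly
# covered by other pixels' work, so total clocks approach the total op count.
print(run({p: 3 for p in range(8)}))
```

The dependency is still there for each pixel; it is only its cost that gets covered by other, independent work, which is why I'd still call the structure doing the covering a pipeline.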