What I'm saying is that if you want to allow something like this in general, you'll destroy most of what makes a GPU a GPU: pixels could no longer be processed independently and in parallel.
The only thing you could realistically hope for along this avenue is a limited, pre-set collection of neighboring-pixel accesses. But since current hardware operates on quads (2x2 blocks of pixels), you could at most hope to reach the other pixels within a single quad. Regardless, I have rather strong doubts that you could find an algorithm that wants data from nearby pixels but doesn't just depend on the partial derivatives of that data at the current pixel.
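To make the quad point concrete, here is a rough CPU-side sketch in Python of how screen-space partial derivatives fall out of a 2x2 quad by simple differencing. It is only an illustrative model, not real shader code: `pixel_value` and `quad_derivatives` are names I made up, and actual hardware may difference the pixels somewhat differently.

```python
# Illustrative CPU-side model of quad-based derivatives; not real GPU code.

def pixel_value(x, y):
    """Hypothetical per-pixel quantity, e.g. an interpolated coordinate."""
    return 0.5 * x + 0.25 * y * y

def quad_derivatives(qx, qy):
    """Approximate d/dx and d/dy for the 2x2 quad whose top-left pixel is (qx, qy).

    Only values from pixels inside the same quad are used, which is the
    only neighbor information this model assumes is available.
    """
    v00 = pixel_value(qx, qy)          # top-left pixel
    v10 = pixel_value(qx + 1, qy)      # right neighbor in the quad
    v01 = pixel_value(qx, qy + 1)      # lower neighbor in the quad
    ddx = v10 - v00                    # finite difference in x
    ddy = v01 - v00                    # finite difference in y
    return ddx, ddy

ddx, ddy = quad_derivatives(4, 6)
print(ddx, ddy)
```

The point is that everything a pixel learns about its neighbors here is already summarized by those two differences.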
Note that there is always a workaround:
You can calculate, within the current pixel, what the value of a specific register in a neighboring pixel would be. On the whole, this is probably more efficient, even though it wastes quite a bit of processing, because it doesn't destroy parallelism.
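As a minimal sketch of that workaround (again a Python model rather than shader code, with `register_value` and `shade_with_right_neighbor` being hypothetical names), each pixel simply re-runs the neighbor's computation itself instead of reading the neighbor's register:

```python
# Illustrative model of the "recompute the neighbor" workaround; not real GPU code.

def register_value(x, y):
    """Hypothetical intermediate value a pixel would hold in some register."""
    return (3 * x + 5 * y) % 7

def shade_with_right_neighbor(x, y):
    """Per-pixel result that needs the right-hand neighbor's register value.

    Instead of reading the neighbor's register, the value is recomputed
    locally, so each pixel depends only on its own coordinates.
    """
    own = register_value(x, y)
    neighbor = register_value(x + 1, y)  # redundant work, but no cross-pixel read
    return neighbor - own

width, height = 4, 3
# Every pixel can be evaluated independently, in any order.
image = [[shade_with_right_neighbor(x, y) for x in range(width)]
         for y in range(height)]
```

The cost is that `register_value` gets evaluated twice per pixel, which is the wasted processing I mentioned, but no pixel ever has to wait on another one.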
Lastly, I'd like to comment directly on your idea of limiting the destruction of parallelism by only, say, reading pixels in the x direction: this sort of limitation doesn't help unless the hardware developers knew in advance that you would want to do such a thing. In other words, it would need to be one of a library of accepted neighboring-pixel reads, and, as I said, since hardware works on quads, I just don't think it's feasible.