First, remember that in floating point the representable values are not equally spaced. A half-float mantissa is 10 bits plus one implied bit, giving 11 bits of effective precision, so if you address up to 4096 texels (in either direction, not total texels as you thought) without filtering, each texel should come out pretty much fine, although small rounding errors can make individual texels come out differently sized. Remember that this budget includes repetition: e.g. a 256x256 texture repeated 16 times costs the same precision as a 4096x4096 texture with no repetition.
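To make the uneven spacing concrete, here's a small sketch using Python's `struct` half-float format as a stand-in for a 16-bit GPU coordinate (an assumption for illustration, not any particular hardware's behaviour):

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and back."""
    return struct.unpack('e', struct.pack('e', x))[0]

# 10 stored mantissa bits + 1 implied bit = 11 significant bits,
# so integer texel coordinates are exact only up to 2**11 = 2048.
assert to_half(2047.0) == 2047.0
assert to_half(2048.0) == 2048.0

# Past 2048 the spacing doubles to 2 texels: 4095 is not representable
# and rounds to a neighbour. This is why texels near the far edge of a
# large coordinate range can come out differently sized.
assert to_half(4094.0) == 4094.0
assert to_half(4095.0) == 4096.0
```

The key point is that precision is relative: near zero the coordinates are far finer than a texel, near the far edge they are coarser.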
For filtering, it makes sense to consider the worst case. Let's say one texel is black and the one next to it is white. Given that 32-bit color (8 bits per channel) is generally enough to make a gradient without banding, you need 8 more bits of precision in your texture coordinates. This only really matters, however, when you are close enough to the texture that two texels are 256 pixels apart on the screen, which is some pretty serious magnification.
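A rough way to see where the 8-bit figure comes from is to count how many distinct 8-bit shades a linear filter can produce between the black and white texels when the coordinate is quantized. `gradient_levels` and its parameters are illustrative names, not any real API:

```python
def gradient_levels(frac_bits: int, mag: int = 256) -> int:
    """Count distinct 8-bit shades from linearly filtering between a black
    and a white texel, with the coordinate quantized to frac_bits fractional
    bits and the two texel centres `mag` screen pixels apart."""
    q = 1 << frac_bits
    shades = set()
    for px in range(mag + 1):
        t = px / mag                      # ideal sub-texel position
        tq = int(t * q + 0.5) / q         # coordinate quantized to frac_bits
        shades.add(int(tq * 255 + 0.5))   # 8-bit lerp between 0 and 255
    return len(shades)

# 8 fractional bits: one coordinate step per 8-bit colour level, no banding.
assert gradient_levels(8) == 256
# 4 fractional bits: only 17 shades across 256 pixels -> visible banding.
assert gradient_levels(4) == 17
```

With less magnification (smaller `mag`) fewer shades are needed, which is why the 8 extra bits are only the worst case.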
In the end there is no definitive answer, just gradually degrading picture quality as precision drops. To further complicate matters, most hardware probably performs the filtering calculations at higher precision, even if the per-vertex data is lower precision. For that reason you'd probably only need a few (maybe 4) extra bits so that you don't get any local shifting of the texels. However, this wouldn't help for dependent texture lookups (e.g. reflective water), where the coordinates themselves are read from low-precision texture data.
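As a back-of-envelope check on the "few extra bits" estimate: assuming the worst-case coordinate error is half an ulp at the far edge of the range (an assumption of this sketch, not a measured hardware figure), the shift in texels works out as follows:

```python
def worst_shift_texels(sig_bits: int, size: int) -> float:
    """Worst-case coordinate error in texels: half an ulp at the far edge
    of a `size`-texel range, for `sig_bits` significant bits."""
    return size / 2 ** (sig_bits + 1)

# 11 significant bits (half float) over 4096 texels: up to a whole texel off.
assert worst_shift_texels(11, 4096) == 1.0

# 4 extra bits brings the worst case down to 1/16 of a texel, which is
# below what typical filtering precision would resolve anyway.
assert worst_shift_texels(15, 4096) == 0.0625
```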