Nothing really "happens" on GCN hardware. Simplified: when you "bind" something on the CPU side, the driver writes a "pointer" (a resource descriptor) into an array in memory. Later, when a wave starts running on a CU, it issues scalar instructions to load this array from memory into scalar registers. A buffer load or texture sample instruction takes a resource descriptor (held in scalar registers) and 64 offsets/UVs as input, and returns 64 results (filtered texels or values loaded from a buffer). Texture sampling has higher latency than buffer loads because the texture filtering hardware is further away from the execution units, whereas buffer loads get their data directly from the CU L1 cache.
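To make that flow concrete, here is a rough toy model in Python. It is only a sketch of the idea, not how any real driver or the GCN ISA is written: the names `ResourceDescriptor`, `bind`, `scalar_load_descriptor` and `buffer_load` are made up for illustration, the descriptor is reduced to a base address and an element count, and returning 0 for out-of-range lanes is an assumption of this model.

```python
from dataclasses import dataclass

WAVE_SIZE = 64  # lanes per GCN wave


@dataclass
class ResourceDescriptor:
    """Hypothetical, heavily reduced descriptor: base address and element count."""
    base: int
    num_elements: int


def bind(descriptor_table, slot, descriptor):
    """CPU side: 'binding' only writes a descriptor into an array in memory."""
    descriptor_table[slot] = descriptor


def scalar_load_descriptor(descriptor_table, slot):
    """Wave side: one scalar load fetches the descriptor once for the whole wave."""
    return descriptor_table[slot]


def buffer_load(memory, descriptor, offsets):
    """One vector instruction: a descriptor plus 64 per-lane offsets in, 64 values out."""
    assert len(offsets) == WAVE_SIZE
    results = []
    for offset in offsets:
        if 0 <= offset < descriptor.num_elements:
            results.append(memory[descriptor.base + offset])
        else:
            results.append(0)  # assumption of this toy model for out-of-range lanes
    return results


# Toy "GPU memory" holding the values 1000..1255.
memory = list(range(1000, 1256))

# The descriptor table is plain memory; its length (8) is arbitrary here.
table = [None] * 8
bind(table, 3, ResourceDescriptor(base=100, num_elements=128))

# Inside the wave: load the descriptor, then do one 64-wide buffer load.
desc = scalar_load_descriptor(table, 3)
offsets = list(range(WAVE_SIZE))
values = buffer_load(memory, desc, offsets)
```

In this model the "bound" resources are just entries in an array that the wave reads for itself, so nothing in it corresponds to a fixed set of hardware slots.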
Thanks, but I didn't quite follow that. Are you describing what happens currently or in a bindless scenario? For bindless, I understand that the shader gets a descriptor list and textures are loaded on the fly.
I was asking about the current situation, with its limits on the maximum number of textures that can be bound in a shader. I thought those limits were initially determined by the number of physical texture mapping units available on the chip. I'm trying to understand why the number of physical units is relevant.