At the most basic level, there is a fixed amount of data per pixel in the surface/texture. The usual format is a 4-tuple of 8-bit integers (RGBA8), but four 16-bit floating-point values (FP16) are increasingly common.
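As a rough sketch, the memory for one uncompressed surface is width × height × bytes per pixel. This assumes a tightly packed, single-mip surface with no row padding or alignment (real drivers usually add some). The format names are just labels for the two cases described above:

```python
# Bytes per pixel for the two layouts described above (4 channels each).
BYTES_PER_PIXEL = {
    "RGBA8": 4 * 1,    # four 8-bit integer channels
    "RGBA16F": 4 * 2,  # four 16-bit floating-point channels
}

def surface_size_bytes(width: int, height: int, fmt: str) -> int:
    """Size of an uncompressed, unpadded, single-mip surface in bytes."""
    return width * height * BYTES_PER_PIXEL[fmt]

if __name__ == "__main__":
    for fmt in BYTES_PER_PIXEL:
        size = surface_size_bytes(1920, 1080, fmt)
        print(f"{fmt}: {size} bytes ({size / 2**20:.1f} MiB)")
```

At 1920×1080, switching from RGBA8 to FP16 doubles the footprint from 8,294,400 to 16,588,800 bytes.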
That's it for a simple texture (in practice we tend to use block compression, so textures aren't actually stored like that).
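To make the block-compression point concrete, here is a small sketch using BC1 and BC7 as example formats (these are not named in the text above). Both encode each 4×4 block of texels into a fixed number of bytes: 8 for BC1 and 16 for BC7. Edges that aren't a multiple of 4 are rounded up to whole blocks. Mips and padding are ignored:

```python
# Bytes per 4x4 texel block for two common block-compressed formats.
BLOCK_BYTES = {
    "BC1": 8,   # 0.5 bytes per texel
    "BC7": 16,  # 1 byte per texel
}

def bc_size_bytes(width: int, height: int, fmt: str) -> int:
    """Size of a single-mip block-compressed texture in bytes."""
    blocks_x = (width + 3) // 4   # round partial blocks up
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BLOCK_BYTES[fmt]

if __name__ == "__main__":
    for fmt in BLOCK_BYTES:
        print(fmt, bc_size_bytes(1920, 1080, fmt), "bytes")
```

For a 1920×1080 texture this gives 1,036,800 bytes for BC1 and 2,073,600 bytes for BC7, an eighth and a quarter of the uncompressed RGBA8 size.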
That's also all that's needed for a render target/surface, but in practice there is more. The first is a depth buffer: technically it is a separate surface, but it's usual for render targets to have one. These usually store an extra 24 bits of depth (float or integer) and an 8-bit stencil value per pixel (used as per-pixel counters or to include/exclude operations for each pixel).
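A minimal sketch of how 24 bits of depth and 8 bits of stencil fit into one 32-bit word per pixel. The bit layout here (depth in the upper 24 bits as an unsigned normalized integer, stencil in the low 8 bits) is an assumption for illustration. Real hardware layouts vary, and many GPUs keep depth and stencil in separate planes internally. The saturating increment mimics using stencil as a per-pixel counter:

```python
DEPTH_BITS = 24
DEPTH_MAX = (1 << DEPTH_BITS) - 1  # largest 24-bit value

def pack_d24s8(depth: float, stencil: int) -> int:
    """Pack a [0, 1] depth and an 8-bit stencil into one 32-bit word.
    Hypothetical layout: depth in bits 31..8, stencil in bits 7..0."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    if not 0 <= stencil <= 0xFF:
        raise ValueError("stencil must fit in 8 bits")
    d = round(depth * DEPTH_MAX)  # 24-bit unsigned normalized depth
    return (d << 8) | stencil

def unpack_d24s8(word: int) -> tuple:
    """Return (depth as float in [0, 1], stencil as int)."""
    return (word >> 8) / DEPTH_MAX, word & 0xFF

def stencil_increment_saturate(word: int) -> int:
    """Bump the stencil counter, clamping at 255; depth bits are untouched."""
    stencil = word & 0xFF
    return (word & 0xFFFFFF00) | min(stencil + 1, 0xFF)
```

Because depth and stencil share one 32-bit word in this layout, a 1920×1080 depth/stencil buffer costs the same as an RGBA8 colour target.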
Then you get hardware-specific data used to accelerate rendering. This includes colour and Z compression data (used to reduce bandwidth when using MSAA) and Z/stencil acceleration data (hierarchical Z, etc.).
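The idea behind hierarchical Z can be sketched in a few lines. This keeps a coarse per-tile maximum depth alongside the full depth buffer, so whole tiles of a primitive can be rejected without reading per-pixel depth. It assumes a "less than" depth test and an 8×8 tile size. Both are illustrative choices, and real hardware formats are proprietary and more elaborate:

```python
TILE = 8  # hypothetical tile size in pixels

def build_hiz(depth, width, height):
    """Per-tile maximum depth for a row-major depth buffer (list of floats in [0, 1])."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    hiz = [0.0] * (tiles_x * tiles_y)
    for y in range(height):
        for x in range(width):
            t = (y // TILE) * tiles_x + (x // TILE)
            hiz[t] = max(hiz[t], depth[y * width + x])
    return hiz, tiles_x

def tile_can_reject(hiz, tiles_x, tx, ty, prim_min_depth):
    """With a LESS depth test, if the primitive's nearest depth within this tile
    is not closer than the farthest stored depth, every pixel in the tile fails."""
    return prim_min_depth >= hiz[ty * tiles_x + tx]
```

The coarse buffer is tiny compared with the full depth buffer, which is why it can be checked cheaply before any per-pixel work.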