The tightest G-buffer layout is 12 bytes per pixel: two ARGB8 render targets plus a 32-bit depth+stencil buffer.
You store albedo.rgb and roughness in the first ARGB8 render target, and grayscale specular and normal.xyz in the second. You could store the normal in just two channels with an encoding scheme, which frees one 8-bit channel for some other use, but 8+8 bits gives slightly too low quality for my taste.
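Here is a minimal sketch of that channel layout, assuming linear [0,1] inputs. The function names and channel order are just for illustration (CUDA-style code, not the engine's actual shaders):

```cuda
#include <cstdint>

__host__ __device__ inline uint32_t quantize8(float v)   // [0,1] -> 8 bits
{
    return (uint32_t)(v * 255.0f + 0.5f);
}

// RT0: albedo.rgb + roughness, 8 bits per channel.
__host__ __device__ inline uint32_t packRT0(float r, float g, float b, float roughness)
{
    return quantize8(r) | (quantize8(g) << 8) | (quantize8(b) << 16) | (quantize8(roughness) << 24);
}

// RT1: grayscale specular + normal.xyz (the variant that stores the full normal;
// a 2-channel normal encoding would free the last channel instead).
__host__ __device__ inline uint32_t packRT1(float specular, float nx, float ny, float nz)
{
    return quantize8(specular)
         | (quantize8(nx * 0.5f + 0.5f) << 8)     // remap normal from [-1,1] to [0,1]
         | (quantize8(ny * 0.5f + 0.5f) << 16)
         | (quantize8(nz * 0.5f + 0.5f) << 24);
}
```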
Chroma subsampling also frees one extra 8 bit channel per pixel (Cr and Cb are stored for every other pixel). We don't use this, because we have a third g-buffer layer on next gen consoles (total of 16 bytes per pixel).
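For reference, the chroma-subsampling trick looks roughly like this (just an illustrative sketch with the standard BT.601 constants, not something we ship): store luma for every pixel, alternate Cr/Cb between even and odd pixels, and reconstruct the missing chroma from a horizontal neighbour when shading.

```cuda
#include <cuda_runtime.h>   // float2/float3, make_float2

// Convert albedo to luma + one chroma channel, chosen by pixel parity.
__host__ __device__ inline float2 albedoToLumaChroma(float3 rgb, int pixelX)
{
    float y  = 0.299f * rgb.x + 0.587f * rgb.y + 0.114f * rgb.z;   // BT.601 luma
    float cr = 0.5f + 0.713f * (rgb.x - y);                        // stored on even pixels
    float cb = 0.5f + 0.564f * (rgb.z - y);                        // stored on odd pixels
    return make_float2(y, (pixelX & 1) ? cb : cr);
}
```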
In our case (Trials Evolution & Trials Fusion) we use a 10-10-10-2 format (instead of ARGB8) for the second render target. This gives 10+10 bits for the normal vector (four times the precision compared to 8 bits). We encode our normals with Lambert azimuthal equal-area projection. It costs only a few ALU instructions to encode and decode. Roughness is stored in the third component with 10 bits of precision (we use 2^x roughness in our physically based formula, so all the extra precision is welcome). The remaining 2-bit channel stores the selected lighting formula (four different formulas are supported).
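The encode/decode looks roughly like this (a sketch of the standard Lambert azimuthal equal-area mapping; the exact constants and channel assignment in our shaders may differ, and the code is CUDA-style for illustration):

```cuda
#include <cuda_runtime.h>   // float2/float3, make_float2/make_float3
#include <cmath>

// Project a unit (view-space) normal onto two [0,1] channels.
__host__ __device__ inline float2 encodeNormalLAEA(float3 n)
{
    float f = sqrtf(8.0f * n.z + 8.0f);                  // only breaks down at n.z == -1
    return make_float2(n.x / f + 0.5f, n.y / f + 0.5f);
}

// Reconstruct the unit normal from the two encoded channels.
__host__ __device__ inline float3 decodeNormalLAEA(float2 enc)
{
    float fx = enc.x * 4.0f - 2.0f;
    float fy = enc.y * 4.0f - 2.0f;
    float f  = fx * fx + fy * fy;
    float g  = sqrtf(1.0f - f * 0.25f);
    return make_float3(fx * g, fy * g, 1.0f - f * 0.5f);
}

// Quantize one encoded channel to 10 bits for the 10-10-10-2 target.
__host__ __device__ inline unsigned int quantize10(float v)
{
    return (unsigned int)(v * 1023.0f + 0.5f) & 0x3FF;
}
```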
If you use traditional (pixel shader based) deferred rendering, you also need to have an (HDR) lighting buffer in memory at the same time as the g-buffer, as the lighting shader reads the g-buffer and writes to the lighting buffer. This consumes an extra 8 bytes per pixel (ARGB16F). However, with modern compute shader based lighting, you can do the lighting "in-place": you first read the g-buffer into GPU local memory (LDS), do the lighting there, and output the result on top of the existing g-buffer. This way you can do pretty robust deferred rendering with just 16 bytes per pixel (eight 8-bit channels, three 10-bit channels, one 2-bit channel, 24-bit depth, 8-bit stencil).
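To make the "in-place" idea concrete, here is a bare-bones CUDA sketch of a tiled compute lighting pass: each thread group loads its g-buffer tile into on-chip shared memory (the CUDA analogue of LDS), shades from that local copy, and overwrites the same 8 bytes per pixel with an FP16 HDR result, so no separate lighting buffer is ever allocated. The structs, light loop and shading math are hypothetical placeholders (position reconstruction from depth is omitted), not our actual implementation.

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cstdint>

struct GBufferTexel {        // 8 bytes/pixel: the two color render targets (depth/stencil lives in its own buffer)
    uint32_t rt0;            // albedo.rgb + roughness (RGBA8)
    uint32_t rt1;            // encoded normal + roughness + material id (10-10-10-2)
};

struct HdrTexel {            // the same 8 bytes reused as a half-precision HDR color
    __half2 rg;
    __half2 b_pad;
};

struct Light { float3 color; };

__device__ float3 decodeAlbedo(uint32_t rt0)
{
    return make_float3((rt0 & 0xFF) * (1.0f / 255.0f),
                       ((rt0 >> 8) & 0xFF) * (1.0f / 255.0f),
                       ((rt0 >> 16) & 0xFF) * (1.0f / 255.0f));
}

__global__ void lightTileInPlace(GBufferTexel* gbuffer, int width, int height,
                                 const Light* lights, int numLights)
{
    __shared__ GBufferTexel tile[16][16];     // one 16x16 g-buffer tile in shared memory

    int x = blockIdx.x * 16 + threadIdx.x;
    int y = blockIdx.y * 16 + threadIdx.y;
    bool inBounds = (x < width && y < height);

    // 1. Cooperatively load the tile from the g-buffer. In this sketch each thread
    //    only reads its own texel back; a real pass would also use the shared data
    //    for per-tile depth bounds and light culling.
    if (inBounds)
        tile[threadIdx.y][threadIdx.x] = gbuffer[y * width + x];
    __syncthreads();

    if (!inBounds) return;

    // 2. Shade from the locally cached data (placeholder: albedo * summed light color).
    float3 albedo = decodeAlbedo(tile[threadIdx.y][threadIdx.x].rt0);
    float3 hdr = make_float3(0.0f, 0.0f, 0.0f);
    for (int i = 0; i < numLights; ++i) {
        hdr.x += albedo.x * lights[i].color.x;
        hdr.y += albedo.y * lights[i].color.y;
        hdr.z += albedo.z * lights[i].color.z;
    }

    // 3. Overwrite the pixel's g-buffer storage with the FP16 HDR result.
    HdrTexel out;
    out.rg    = __floats2half2_rn(hdr.x, hdr.y);
    out.b_pad = __floats2half2_rn(hdr.z, 0.0f);
    reinterpret_cast<HdrTexel*>(gbuffer)[y * width + x] = out;
}
```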