Consider an arm. Ideally, you can think of it as three points with lines between them. In the simplest case, these lines can be represented by two cylinder models (so that the arm can bend), each with its own transform matrix. However, at the joint, these cylinders leave a crack. The simplest way to cover that crack is a third 'elbow' model that bridges the gap. This elbow model would likely need to be created on the fly to match the bend in the arm.
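To put a rough number on that crack, here is a minimal 2D sketch. It assumes the upper arm stays put and the lower arm is rotated rigidly about the elbow; the names `rotate_about` and `joint_gap`, the elbow position, and the radius are all made up for illustration.

```python
import math

def rotate_about(point, pivot, angle):
    """Rotate a 2D point about a pivot by angle (radians, counter-clockwise)."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)

def joint_gap(radius, bend_angle, elbow=(1.0, 0.0)):
    """Distance between rim points of the two segments that touched when the arm was straight."""
    # Rim point on the end of the upper arm: it does not move.
    upper_rim = (elbow[0], elbow[1] + radius)
    # The matching rim point on the lower arm follows the lower-arm transform.
    lower_rim = rotate_about(upper_rim, elbow, bend_angle)
    return math.hypot(upper_rim[0] - lower_rim[0], upper_rim[1] - lower_rim[1])

straight = joint_gap(0.1, 0.0)          # no bend, rims still coincide
bent = joint_gap(0.1, math.pi / 2)      # 90 degree bend, rims have separated
```

In this toy setup the separation works out to 2 * radius * sin(angle / 2), so it grows with both the thickness of the arm and the bend. On one side of the elbow that shows up as a gap, on the other as the two cylinders poking into each other.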
Or, with vertex skinning, you could have a single model of the arm that is controlled by two or more separate transform matrices plus a 'skinning' array of per-vertex weights, which tells the vertex processor how much each matrix affects a particular vertex.
In the case of the arm, the upper arm vertices would be controlled entirely by the 'upper arm' transform matrix; the lower arm vertices entirely by the 'lower arm' transform matrix; and the vertices around the bent elbow would be controlled by a blend of the two.
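The blending described here can be sketched in a few lines. This is only an illustration in 2D with 3x3 homogeneous matrices, not how any particular API does it; `rotation_about`, `transform`, `skin_vertex`, and the sample weights are assumed names and values, and the weights for each vertex are assumed to sum to 1.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def rotation_about(pivot, angle):
    """3x3 homogeneous matrix rotating 2D points about pivot by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    px, py = pivot
    return [[c, -s, px - c * px + s * py],
            [s, c, py - s * px - c * py],
            [0.0, 0.0, 1.0]]

def transform(matrix, point):
    """Apply a 3x3 homogeneous matrix to a 2D point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])

def skin_vertex(point, matrices, weights):
    """Weighted sum of the point as transformed by each matrix."""
    x = y = 0.0
    for matrix, weight in zip(matrices, weights):
        tx, ty = transform(matrix, point)
        x += weight * tx
        y += weight * ty
    return (x, y)

upper_arm = IDENTITY                                   # upper arm stays put
lower_arm = rotation_about((1.0, 0.0), math.pi / 2)    # forearm bent 90 degrees at the elbow
matrices = [upper_arm, lower_arm]

shoulder_side = skin_vertex((0.5, 0.1), matrices, (1.0, 0.0))  # upper arm matrix only
wrist_side = skin_vertex((1.5, 0.1), matrices, (0.0, 1.0))     # lower arm matrix only
elbow = skin_vertex((1.0, 0.1), matrices, (0.5, 0.5))          # even blend of the two
```

With these made-up numbers the elbow vertex lands at roughly (0.95, 0.05), halfway between where the two matrices would each put it, which is what keeps the mesh closed across the joint.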
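In practical terms, it makes modelling 'easier' and rendering more efficient, because the whole arm can be drawn as one mesh instead of switching transform matrices between separate pieces, which removes some state changes.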
DX7-class vertex skinning got a bit of bad press because it only supported blending two matrices, which some deemed "not enough". I believe DX8 and up support a larger number (7?), which should make modelling much easier.
Of course, I might be wildly wrong on this, but I'll trust some other know-it-all to let us know if that's the case.
P.S. Tanks probably don't need skinning, since there are no 'deformable' parts in them. Robots would still likely need some sort of skinning at the joints (assuming the humanoid robots prevalent in classical anime).