Huh? Did you read beyond the first sentence?
Make a grid that's 1024x1024 vertices. You have a heightmap that's 1024x1024 pixels. That texture fits the grid perfectly, right? Now add an LOD system that tessellates further, so the grid can reach, say, 4x the vertex density at maximum. Take a 256x256 detail heightmap (representing rocky areas, I suppose) and apply it at a 1:1 ratio. With point sampling, several adjacent vertices read the same texel, so the result is full of steep angles where it should just be interpolated.
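Here's a minimal sketch of the difference, in plain C++ rather than shader code (the 1-D heights array and the 4x factor are made up purely for illustration):

```cpp
#include <cstdio>

// Sketch: point (nearest) sampling vs. linear interpolation when the vertex
// grid is denser than the heightmap. 'heights' is a hypothetical 1-D slice
// of a detail heightmap with a steady slope.
float nearestSample(const float* h, int n, float u)
{
    int i = static_cast<int>(u * n);            // snap to the containing texel
    return h[i < n ? i : n - 1];
}

float linearSample(const float* h, int n, float u)
{
    float f = u * (n - 1);                      // position in texel space
    int   i = static_cast<int>(f);
    float t = f - static_cast<float>(i);
    return h[i] * (1 - t) + h[i + 1 < n ? i + 1 : i] * t;
}

int main()
{
    const float heights[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
    for (int v = 0; v <= 16; ++v) {             // 4x the texel density
        float u = v / 16.0f;
        std::printf("u=%.2f  point=%.2f  linear=%.2f\n",
                    u, nearestSample(heights, 4, u), linearSample(heights, 4, u));
    }
}
```

The point-sampled column sits flat for four vertices and then jumps a whole unit; the linear column climbs steadily.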
Here's another example (that probably makes more sense):
If you just take a heightmap and use each pixel as the height for a vertex, you can get some very poor terrain. A cross-section of an area with a slope of 0.5 will look like this:
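```
               ___
            ___|
         ___|
      ___|
   ___|
```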
Ugly, eh?
The alternative is to use a grid that's one vertex larger in each dimension than the heightmap. Thus, for a 1024x1024 heightmap you have a 1025x1025-vertex terrain (which still spans 1024x1024 quads). The way to get it to look right, without the stairsteps, is to sample heights at the corners of each texel rather than at the centre. If you use point sampling, you have to do the math manually in the vertex shader and sample up to four texels; with linear texture filtering, it's automatic. The result for the 0.5 slope will be a smooth line.
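For anyone who wants the manual version spelled out, here's a rough CPU-side sketch (the names are mine; on the GPU you'd just bind the heightmap with a LINEAR sampler and sample at uv = (x/w, y/h)):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch: read the height for vertex (u, v) of a (w+1) x (h+1) grid from a
// w x h 8-bit heightmap, sampling at texel *corners* instead of centres.
// This reproduces what LINEAR filtering gives you for free at
// uv = (x / w, y / h).
float cornerHeight(const std::vector<uint8_t>& map, int w, int h,
                   float u, float v)
{
    // Shift from corner space into texel-centre space.
    float fx = u * w - 0.5f;
    float fy = v * h - 0.5f;
    int x0 = static_cast<int>(std::floor(fx));
    int y0 = static_cast<int>(std::floor(fy));
    float tx = fx - static_cast<float>(x0);
    float ty = fy - static_cast<float>(y0);

    // Clamp-to-edge addressing, like the usual sampler state.
    auto texel = [&](int x, int y) {
        x = std::clamp(x, 0, w - 1);
        y = std::clamp(y, 0, h - 1);
        return static_cast<float>(map[y * w + x]);
    };

    // Blend the (up to) four surrounding texels.
    float top    = texel(x0, y0)     * (1 - tx) + texel(x0 + 1, y0)     * tx;
    float bottom = texel(x0, y0 + 1) * (1 - tx) + texel(x0 + 1, y0 + 1) * tx;
    return top * (1 - ty) + bottom * ty;
}
```

For the 1025x1025 grid, vertex (i, j) would call cornerHeight(map, 1024, 1024, i / 1024.0f, j / 1024.0f); at interior corners that works out to an equal blend of the four surrounding texels, which is exactly the up-to-four-texel cost mentioned above.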
EDIT: Of course, this is only the case for an 8-bit heightmap. A heightmap that is generated and stored at floating-point precision will not suffer the same way.
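To make that concrete, here's what the 0.5 slope looks like in each storage format (a toy loop, not from any real pipeline):

```cpp
#include <cstdint>
#include <cstdio>

// Toy demo: a 0.5 slope stored as float vs. truncated to 8-bit integers.
// The 8-bit column repeats every other value -- that's the staircase.
int main()
{
    for (int x = 0; x < 8; ++x) {
        float   exact  = 0.5f * static_cast<float>(x);
        uint8_t stored = static_cast<uint8_t>(exact);   // 8-bit quantisation
        std::printf("x=%d  float=%.1f  8-bit=%u\n",
                    x, exact, static_cast<unsigned>(stored));
    }
}
```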