Even that isn't sufficient for a "perfect" implementation. You still wouldn't be able to handle cases where features are occluded in your current frame, but weren't occluded for the entirety of the shutter period. As an example, take this image of real-world motion blur from a camera:
If you look at the horse's legs, you'll notice that you can see the green grass through them even though they're opaque. This is because the grass was visible for some portion of the time that the shutter was open, and was only occluded for a part of that time.
Now with your typical 2D motion blur that's used in games, you would have a big problem in this scenario. Every frame is basically an instantaneous sample of the scene from an infinitesimally small period of time, as if you used a camera with an insanely high shutter speed. So in the case of the horse you would get an image with no motion blur, where the legs occlude the area of grass behind them based on where they were posed at the current simulation state.

When it comes time for the 2D motion blur to do its thing, the shader will typically look at the velocity (and probably neighboring velocities) and gather a whole bunch of samples to simulate the partial coverage that naturally happens from having a non-instantaneous shutter period. This works okay when the entire neighborhood is moving in the same direction, or for areas where the moving elements have "spread" onto nearby static elements. In both of those cases, gathering your neighbors along the velocity direction works well enough for approximating the real motion-blurred result. However it fails for the partial occlusion case that you have on the legs, since your 2D image simply doesn't contain enough information to tell you what was in the occluded area. You also have a similar situation when a non-moving foreground element (such as the right horse's head) occludes a moving background element: you need to know the contents of the occluded background so that you can integrate it with the visible neighboring areas.
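To make that a bit more concrete, here's a rough CPU-side sketch of what a gather-style blur might look like. This isn't any particular engine's implementation: `MotionBlurGather`, the struct names, and the fixed sample count are all made up for illustration, and a real shader would also weight each sample by depth and relative velocity rather than taking a plain average.

```cpp
#include <algorithm>
#include <vector>

struct Float2 { float x, y; };
struct Float3 { float r, g, b; };

// Hypothetical gather-style motion blur for a single pixel. Steps along the
// pixel's (straight-line) velocity and averages the samples, approximating
// the integral over the shutter period.
Float3 MotionBlurGather(const std::vector<Float3>& color,
                        const std::vector<Float2>& velocity,
                        int width, int height, int px, int py,
                        int numSamples = 8)
{
    // Instantaneous per-pixel velocity, in pixels per frame.
    const Float2 v = velocity[py * width + px];

    Float3 sum = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < numSamples; ++i)
    {
        // Sample offsets distributed over the shutter interval [-0.5, 0.5)
        const float t = (i + 0.5f) / numSamples - 0.5f;
        const int sx = std::clamp(px + int(v.x * t), 0, width - 1);
        const int sy = std::clamp(py + int(v.y * t), 0, height - 1);
        const Float3 s = color[sy * width + sx];
        sum.r += s.r; sum.g += s.g; sum.b += s.b;
    }
    return { sum.r / numSamples, sum.g / numSamples, sum.b / numSamples };
}
```

Note that every sample comes from the single visible frame, which is exactly why the occluded-grass case can't be reconstructed: the data simply isn't there to gather.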
In general, games will try to deal with this by re-using neighboring pixels to "make up" the contents of the partially occluded elements. In the case of the horse's legs this might actually work okay, since the grass is fairly uniform in color, and so you probably wouldn't notice that the post-process sampled the neighboring grass color instead of the occluded grass. But if the scene is more complex, the error can be much more jarring. It's also worth pointing out that defocus blur (depth of field) has almost exactly the same problem, except the partial occlusion happens due to the lens aperture instead of a shutter.
The other problem you can run into with 2D motion blur is errors from complex motion. Most games store a per-pixel velocity that's computed by taking the difference between the pixel's current screen position and its screen position from the previous frame. This essentially assumes that the surface being sampled for that pixel moved in a straight line between the previous frame and the current frame. In some cases this might be true, but in other cases it's definitely not. The classic example is a spinning wheel: a typical motion blur algorithm will end up blurring along a straight line based on a pixel's instantaneous velocity, but in real life the motion blur will follow the curve of the wheel.
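As a sketch of where that straight-line assumption comes from, here's roughly how the per-pixel velocity is typically derived, assuming the usual setup where each vertex is transformed by both the current and previous frame's world-view-projection matrices. The function and struct names here are hypothetical:

```cpp
struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// Convert a clip-space position to [0, 1] screen UV space:
// perspective divide, then remap [-1, 1] NDC to [0, 1]
// (flipping Y to match typical screen conventions).
Float2 ClipToScreenUV(Float4 clipPos)
{
    const float invW = 1.0f / clipPos.w;
    return { clipPos.x * invW * 0.5f + 0.5f,
             -clipPos.y * invW * 0.5f + 0.5f };
}

// The straight-line assumption lives in this one subtraction: a single
// screen-space delta can't express the curved path that a point on a
// spinning wheel actually traveled between the two frames.
Float2 ComputeVelocity(Float4 currClipPos, Float4 prevClipPos)
{
    const Float2 curr = ClipToScreenUV(currClipPos);
    const Float2 prev = ClipToScreenUV(prevClipPos);
    return { curr.x - prev.x, curr.y - prev.y };
}
```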