Right now the spec says that if an h-entry contains a video, then it's a video post. In practice, there are often h-entries with both a video and a photo property, where the photo is intended to be the poster frame of the video. The photo also serves as a fallback for consumers that don't know about videos.
It'd be useful for the spec to explicitly say how to handle poster images, and what to do when there is a photo property in addition to the video property.
The way I've been explaining it is:
if there is exactly one value for each of the video and photo properties, the photo is a poster frame for the video.
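
As a rough illustration of that rule, here is a minimal Python sketch. It assumes the h-entry has already been parsed into the usual mf2 JSON shape, where each property is a list of values; the function name `poster_for_video` is hypothetical and not part of any spec or library. It also assumes a photo value may be either a plain URL string or an object with a `value` key (as some parsers emit when an image has alt text).

```python
from typing import Optional


def poster_for_video(properties: dict) -> Optional[str]:
    """Return the poster frame URL for a video post, or None.

    Hypothetical helper: `properties` is assumed to be the "properties"
    object of a parsed h-entry, with each property as a list of values.
    """
    videos = properties.get("video", [])
    photos = properties.get("photo", [])

    # The rule only applies when there is exactly one video and one photo.
    if len(videos) != 1 or len(photos) != 1:
        return None

    photo = photos[0]
    # Assumption: a photo may be a plain URL or a {"value": ..., "alt": ...} object.
    if isinstance(photo, dict):
        return photo.get("value")
    return photo


entry = {
    "video": ["https://example.com/clip.mp4"],
    "photo": ["https://example.com/clip-poster.jpg"],
}
print(poster_for_video(entry))
```

With more than one photo or more than one video, this sketch returns `None` and leaves the interpretation to the consumer, since the rule above doesn't cover that case.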