Motion models

The key function of PatchMaker is estimating the motion of an image region selected by the user. The motion is analyzed and described as a parametric transformation belonging to one of the model types supported by PatchMaker. This section discusses the merits of each motion model in particular cases.

Theoretically, a parametric model can be applied to describe the region motion if:

  1. The given region corresponds to a planar surface in space, or

  2. The region "covers" different objects at different depths relative to the camera but the camera does not move with respect to these objects during the episode.

In most other cases, the image of the object evolves in a way that cannot be described exactly by a relatively simple parametric model. Even so, certain models can fit the real motion well enough.

The use of parametric motion models allows PatchMaker to accurately determine object motion by taking into account the displacements of all its pixels. The following motion models are supported:

  • Translation — a two-parameter model. The parameters are the displacements in the horizontal and vertical directions;

  • Rigid — a three-parameter model. The displacements along the two axes and the rotation angle are estimated;

  • Rigid & Scale — a four-parameter model. The displacement vector, the rotation angle, and the scaling factor are determined;

  • Affine — a six-parameter model. The displacement vector, the rotation angle, the two scaling factors along two perpendicular axes, and their orientation angle are computed.
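
The four models form a nested family: each one adds parameters to the previous one. The sketch below illustrates this with 2x3 matrices. The function names and the convention used (rotation about the coordinate origin, applied before the shift) are assumptions made for illustration, not PatchMaker's internal representation.

```python
import math

def translation(tx, ty):
    """2 parameters: shift only."""
    return [[1.0, 0.0, tx],
            [0.0, 1.0, ty]]

def rigid(tx, ty, angle):
    """3 parameters: rotation by `angle` (radians), then a shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, tx],
            [s,  c, ty]]

def rigid_scale(tx, ty, angle, r):
    """4 parameters: uniform scaling by r and rotation, then a shift."""
    c, s = r * math.cos(angle), r * math.sin(angle)
    return [[c, -s, tx],
            [s,  c, ty]]

def affine(a, b, c, d, tx, ty):
    """6 parameters: arbitrary 2x2 linear part, then a shift."""
    return [[a, b, tx],
            [c, d, ty]]

def apply(m, x, y):
    """Map the point (x, y) through the 2x3 matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Under this convention, `rigid_scale(tx, ty, angle, 1.0)` reduces to `rigid(tx, ty, angle)`, and `rigid(tx, ty, 0.0)` reduces to `translation(tx, ty)`.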

The motion model to be used is a property of the selected segment.

The default is the Affine model. Keep in mind, however, that when the object motion can be accurately described by a simpler model, a more complex model may give inferior results, especially if the object is small or has few textured details.

If the object's perspective does not change

The example below shows how the Rigid & Scale model can be applied when the aspect of the moving object does not change. Suppose we have two consecutive frames and the task is to track the motion of the aircraft in order to place a number on its body.

Let us shift, rotate, and scale the first frame to match the image of the aircraft in the second frame. If we simply copy the aircraft contour from the first frame to the second frame, we get:

By shifting the first frame by 13 pixels left and 3 pixels down, we get:

The match can be improved if we now rotate the aircraft image by a small angle counterclockwise about its "mass center":

Finally, shrinking the yellow contour slightly towards the "mass center" yields an almost exact match:

This is roughly what PatchMaker does as it tries to find a match for the key frame mask in another frame. The basic transforms that we applied combine to yield the four parameter values of the Rigid & Scale transform. Finally, the overlay available for the key frame is warped using this transform and pasted onto the next frame.
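
How a shift, a rotation about the "mass center", and a scaling towit collapse into just four numbers can be sketched as follows. This is an illustration only: the function names are hypothetical, the center coordinates, angle, and scale factor in the usage lines are made-up values (the text gives only the 13-pixel and 3-pixel shifts), and the sign of the angle depends on the image coordinate convention.

```python
import math

def compose_rigid_scale(dx, dy, angle, r, cx, cy):
    """Combine a shift (dx, dy), followed by a rotation by `angle` (radians)
    and a uniform scaling by `r` about the center (cx, cy), into the four
    parameters (tx, ty, angle, r) of the map p -> r * R(angle) * p + t."""
    c, s = r * math.cos(angle), r * math.sin(angle)
    ux, uy = dx - cx, dy - cy          # shift measured relative to the center
    tx = cx + c * ux - s * uy
    ty = cy + s * ux + c * uy
    return tx, ty, angle, r

def apply_rigid_scale(params, x, y):
    """Apply the combined four-parameter transform to the point (x, y)."""
    tx, ty, angle, r = params
    c, s = r * math.cos(angle), r * math.sin(angle)
    return (c * x - s * y + tx, s * x + c * y + ty)

def apply_step_by_step(dx, dy, angle, r, cx, cy, x, y):
    """Apply the three basic transforms one after another, for comparison."""
    x, y = x + dx, y + dy                          # 1. shift
    c, s = math.cos(angle), math.sin(angle)
    ux, uy = x - cx, y - cy
    return (cx + r * (c * ux - s * uy),            # 2. rotate and 3. scale
            cy + r * (s * ux + c * uy))            #    about (cx, cy)

# 13 pixels left and 3 pixels down (y grows downwards); other values are hypothetical.
params = compose_rigid_scale(-13, 3, math.radians(-2), 0.97, 200.0, 120.0)
combined = apply_rigid_scale(params, 250.0, 100.0)
stepwise = apply_step_by_step(-13, 3, math.radians(-2), 0.97, 200.0, 120.0, 250.0, 100.0)
```

Both `combined` and `stepwise` give the same point up to rounding, which is why the three manual adjustments can be stored as a single Rigid & Scale transform.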

When an affine model is indispensable

In the following example, the aspect of the object changes during the episode — the car first moves towards the camera and then turns right revealing its side:

As you can see, matching the car's front panels in these two frames requires different scaling along the horizontal and vertical axes. Only the Affine model allows this.
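
One way to see why the simpler models fall short here: the linear part of a Rigid & Scale transform is a rotation multiplied by a single scale factor, so it always has the form [[p, -q], [q, p]]. The check below is a minimal sketch with a hypothetical helper name and made-up scale factors.

```python
import math

def is_rigid_scale(a, b, c, d, tol=1e-9):
    """True if the linear part [[a, b], [c, d]] is a rotation times a uniform scale."""
    return abs(a - d) <= tol and abs(b + c) <= tol

# Uniform scaling by 0.9 combined with a 10-degree rotation.
r, t = 0.9, math.radians(10)
uniform = (r * math.cos(t), -r * math.sin(t), r * math.sin(t), r * math.cos(t))

# Different scaling along the horizontal and vertical axes (hypothetical values).
stretched = (0.8, 0.0, 0.0, 1.1)

print(is_rigid_scale(*uniform))    # the four-parameter model can represent this
print(is_rigid_scale(*stretched))  # this needs the six-parameter Affine model
```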

Instability of tracking with excessive parameters

Suppose we have to follow the motion of a low-contrast or noisy region, as in the example below:

If an overlay fails to move correctly and stably under the Affine model, the cause may be the low "information density" of the tracked region. Indeed, to estimate six affine parameters, the mask region must have at least three salient gradients in different directions or three distinguishable spots. In such cases, a model with fewer parameters (Translation, Rigid, Rigid & Scale) can yield better results.
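
The "three distinguishable spots" requirement can be illustrated by counting equations: each matched point gives two equations, so three points give the six needed for an affine transform, provided they do not lie on one line. The sketch below solves for the parameters from three point pairs. It is an illustration of the parameter count only and not PatchMaker's estimation method, which uses all pixels of the region; the function name is hypothetical.

```python
def affine_from_three_points(src, dst):
    """Solve x' = a*x + b*y + tx, y' = c*x + d*y + ty from three point pairs.

    Returns (a, b, c, d, tx, ty), or None when the source points are
    (nearly) collinear and the six parameters cannot be determined."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:
        return None

    def solve(v1, v2, v3):
        # Cramer's rule for p*x + q*y + t = v at the three source points.
        p = ((v2 - v1) * (y3 - y1) - (v3 - v1) * (y2 - y1)) / det
        q = ((x2 - x1) * (v3 - v1) - (x3 - x1) * (v2 - v1)) / det
        t = v1 - p * x1 - q * y1
        return p, q, t

    a, b, tx = solve(dst[0][0], dst[1][0], dst[2][0])
    c, d, ty = solve(dst[0][1], dst[1][1], dst[2][1])
    return a, b, c, d, tx, ty

# Three spots in general position determine the transform.
well_spread = affine_from_three_points([(0, 0), (1, 0), (0, 1)],
                                       [(5, -1), (7, -1), (5, 2)])

# Three spots on one line do not: the result is None.
on_a_line = affine_from_three_points([(0, 0), (1, 1), (2, 2)],
                                     [(0, 0), (2, 2), (4, 4)])
```

When the available detail is close to the degenerate case, small amounts of noise produce large changes in the estimated parameters, which is consistent with the unstable overlay motion described above.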

Motion models in PatchMaker do not cover all types of transformations of a planar surface

In the video clip below, the task was to "color" the fence, but PatchMaker (version 1.0) failed to do this. The most complex motion that PatchMaker can currently handle is Affine motion, represented by affine transformations in the image plane. Such transformations map parallel lines to parallel lines, and this condition is not satisfied in the two frames below.

The appropriate transform here is an eight-parameter perspective transform, which PatchMaker version 1.0 does not support.
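
The difference between the two transform families can be checked numerically. In the sketch below, two parallel segments (standing in for fence rails) stay parallel under an affine transform but not under a perspective one. All function names and coefficient values are assumptions chosen for illustration.

```python
def apply_affine(m, p):
    """m = (a, b, c, d, tx, ty); returns the image of the point p = (x, y)."""
    a, b, c, d, tx, ty = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)

def apply_perspective(h, p):
    """h is a 3x3 matrix as nested lists; with h[2][2] fixed to 1 it has
    eight free parameters."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def cross_of_segments(p0, p1, q0, q1):
    """Zero when the segments p0-p1 and q0-q1 are parallel."""
    return ((p1[0] - p0[0]) * (q1[1] - q0[1])
            - (p1[1] - p0[1]) * (q1[0] - q0[0]))

# Two parallel horizontal rails.
rails = [(0.0, 0.0), (10.0, 0.0), (0.0, 5.0), (10.0, 5.0)]

m = (1.2, 0.3, -0.1, 0.9, 4.0, 7.0)              # hypothetical affine coefficients
h = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.01, 0.0, 1.0]]                           # hypothetical perspective matrix

affine_cross = cross_of_segments(*[apply_affine(m, p) for p in rails])
perspective_cross = cross_of_segments(*[apply_perspective(h, p) for p in rails])
```

Here `affine_cross` is zero up to rounding, while `perspective_cross` is clearly nonzero: the rails converge, as they do in footage of a fence receding from the camera.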

Exact motion model equations

The overlay color at pixel (xk, yk) of frame #k retains the color C(x0, y0) of the pixel (x0, y0) in the match frame (the frame #0), where (xk, yk) and (x0, y0) are related by one of the following transformations:

  • The Translation model:

    where tx, ty are the displacement parameters;

  • The Rigid model:

    where the rotation angle is expressed in radians;

  • The Rigid & Scale model:

    where r is the scaling factor;

  • The Affine model:

    where a, b, c, d, tx, ty are the coefficients of an affine transform.
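
For reference, the four models are commonly written as shown below. This is a sketch in one conventional form, not a reproduction of the original equations: it assumes a forward mapping from the match frame to frame #k and rotation about the coordinate origin, and the symbol \varphi for the rotation angle is introduced here for illustration. PatchMaker's exact convention (for example, the rotation center or the direction of the mapping) may differ.

```latex
\begin{aligned}
\text{Translation:} \quad & x_k = x_0 + t_x, \quad y_k = y_0 + t_y \\
\text{Rigid:} \quad & x_k = x_0\cos\varphi - y_0\sin\varphi + t_x, \quad
                      y_k = x_0\sin\varphi + y_0\cos\varphi + t_y \\
\text{Rigid \& Scale:} \quad & x_k = r\,(x_0\cos\varphi - y_0\sin\varphi) + t_x, \quad
                               y_k = r\,(x_0\sin\varphi + y_0\cos\varphi) + t_y \\
\text{Affine:} \quad & x_k = a\,x_0 + b\,y_0 + t_x, \quad y_k = c\,x_0 + d\,y_0 + t_y
\end{aligned}
```

In this form, setting r = 1 in the Rigid & Scale model gives the Rigid model, and setting \varphi = 0 in the Rigid model gives the Translation model.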