Active Shape Model

All of the following information is summarized from the academic paper at https://link.springer.com/chapter/10.1007/978-3-642-54851-2_1


1. Labeling the Training Set

Suppose we have labeled each of the $N$ images in the training set with $n$ landmarks. This gives us a landmark set that represents the shape. For the $i^{th}$ image, we denote the $j^{th}$ landmark coordinate point by $(x_{ij}, y_{ij})$. The $2n$ element vector describing the $n$ points of the $i^{th}$ image can then be written as
$$X_{i} = \begin{bmatrix} x_{i0}, y_{i0}, x_{i1}, y_{i1}, \dots, x_{i(n-1)}, y_{i(n-1)} \end{bmatrix}^{T}$$
where $1 \le i \le N$. As a result, we generate $N$ such representative vectors from $N$ training images. Before carrying out statistical analysis on those vectors, we should ensure that the shapes represented are in the same coordinate frame.
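As a small illustration of this layout, the sketch below flattens per-image landmark lists into $2n$ element vectors and stacks them into an $N \times 2n$ array. The helper names and the toy coordinates are made up for the example.

```python
import numpy as np

def landmarks_to_vector(points):
    """Flatten an (n, 2) list of (x, y) landmarks into [x0, y0, x1, y1, ...]."""
    return np.asarray(points, dtype=float).reshape(-1)

def vector_to_landmarks(vec):
    """Inverse of landmarks_to_vector: 2n vector back to an (n, 2) array."""
    return np.asarray(vec, dtype=float).reshape(-1, 2)

# Hypothetical training set: N = 2 images, n = 3 landmarks each.
training_landmarks = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(0.1, 0.0), (1.2, 0.1), (0.0, 0.9)],
]
X = np.stack([landmarks_to_vector(p) for p in training_landmarks])
print(X.shape)  # (2, 6): N rows, 2n columns
```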

2. Aligning the Training Set

As mentioned above, in order to study the statistical characteristics of the coordinates of the landmark points, it is important that the shapes represented should be in a common coordinate frame. To achieve this, all the shapes must be aligned to each other to remove variation that could affect the statistical analysis result. The aligning can be done by scaling, rotating and translating the shapes of the training set to meet the requirement. The aim of alignment is to minimize a weighted sum of squares of distances between equivalent landmarks on different images.

Procrustes analysis is the most popular method for aligning shapes into a common coordinate frame.

A diagonal weight matrix $W$ is used to give more significance to the points that are more stable over the set. A stable point is one that moves less relative to the other points in a shape.

The following simple iterative algorithm aligns the set of $N$ shapes to each other.

a. Align each shape to one of the shapes, for instance, the first one or the mean shape.

b. Calculate the mean shape from the aligned shapes.

c. Normalize the pose of the current mean shape.

d. Realign each shape with the normalized mean.

e. If converged, stop the process; otherwise go to step b and repeat.

Normalizing the pose means:

a. Scale the shape so that the distance between two points becomes a certain constant.

b. Rotate the shape so that the line joining two pre-specified landmarks is directed in a certain direction.

c. Translate the shape so that it becomes centered at a certain coordinate.
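The iterative alignment above can be sketched as follows. This is a minimal, unweighted version (the weight matrix $W$ is left out), and it normalizes the mean by centering it on the origin, scaling it to unit norm and rotating it onto the first shape, which is one possible choice of the "certain constant" and "certain direction" rather than the only one. All function names are hypothetical.

```python
import numpy as np

def normalize_pose(shape):
    """Center an (n, 2) shape on the origin and scale it to unit norm."""
    centered = shape - shape.mean(axis=0)
    return centered / np.linalg.norm(centered)

def align_to(shape, reference):
    """Least-squares similarity alignment of `shape` onto a centered `reference`."""
    centered = shape - shape.mean(axis=0)
    denom = np.sum(centered ** 2)
    # a = s*cos(theta), b = s*sin(theta) of the optimal scaled rotation.
    a = np.sum(centered * reference) / denom
    b = np.sum(centered[:, 0] * reference[:, 1]
               - centered[:, 1] * reference[:, 0]) / denom
    return centered @ np.array([[a, -b], [b, a]]).T

def align_shapes(shapes, max_iter=100, tol=1e-10):
    """Iteratively align a list of (n, 2) shapes; returns (aligned, mean)."""
    shapes = [np.asarray(s, dtype=float) for s in shapes]
    reference = normalize_pose(shapes[0])
    mean = reference
    aligned = [align_to(s, mean) for s in shapes]          # align to the first shape
    for _ in range(max_iter):
        raw_mean = np.mean(aligned, axis=0)                # mean of the aligned shapes
        new_mean = normalize_pose(align_to(raw_mean, reference))  # normalize its pose
        aligned = [align_to(s, new_mean) for s in shapes]  # realign to the new mean
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return np.array(aligned), mean

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Toy data: the same rectangle under three different poses.
base = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
copies = [
    base,
    2.0 * base @ rotation(0.5).T + np.array([3.0, -1.0]),
    0.5 * base @ rotation(-1.2).T + np.array([-4.0, 2.0]),
]
aligned, mean = align_shapes(copies)
```

Because the three toy shapes differ only by pose, they should coincide after alignment; with real training data the residual differences are what the statistical model then describes.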

3. Statistical Model of Variation — PCA

After alignment, the $N$ vectors $X_{i}$ representing the images of the training set contain the new coordinates. These vectors form a distribution from which we can generate new examples similar to the original ones. We can also decide whether a new shape produced by the model during the search procedure is allowable.

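With $P$ the matrix whose columns are the principal components (the eigenvectors of the covariance matrix of the aligned shape vectors) and $\bar{X}$ the mean shape, a shape is modeled as $X = \bar{X} + Pb$, so that
$$b = P^{-1}(X-\bar{X})$$
where $b$ is the vector of shape parameters, that is, the weights of the principal components. Because the columns of $P$ are orthonormal, $P^{-1}$ is in practice the transpose $P^{T}$. By changing the elements of $b$, we can vary the shape.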

But an arbitrary $b$ could result in an unallowable shape. To avoid this problem, we can impose constraints on $b$. By setting limits, we can control how the shape changes so that only plausible shapes are generated. Since the variance of $b_{i}$ over the training set is the eigenvalue $\lambda_{i}$, the typical limits are
$$-3\sqrt{\lambda_{i}} \le b_{i} \le 3\sqrt{\lambda_{i}}$$
where $\lambda_{i}$ is the eigenvalue corresponding to the $i^{th}$ eigenvector.
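A minimal NumPy sketch of this shape model is given below. It assumes the aligned shapes are stacked as rows of an $N \times 2n$ array, uses $P^{T}$ in place of $P^{-1}$, and keeps enough components to explain 98% of the variance. That threshold and the function names are assumptions for the example, not something stated in the summary.

```python
import numpy as np

def build_shape_model(X, variance_kept=0.98):
    """X: (N, 2n) aligned shape vectors. Returns mean, P (2n, t), eigenvalues (t,)."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)          # (2n, 2n) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # sort descending instead
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    t = int(np.searchsorted(cumulative, variance_kept) + 1)
    return mean, eigvecs[:, :t], eigvals[:t]

def shape_to_params(x, mean, P, eigvals, limit=3.0):
    """b = P^T (x - mean), clamped to +/- limit * sqrt(lambda_i)."""
    b = P.T @ (x - mean)
    bound = limit * np.sqrt(eigvals)
    return np.clip(b, -bound, bound)

def params_to_shape(b, mean, P):
    """Rebuild a shape vector from its parameters: mean + P b."""
    return mean + P @ b

# Toy data standing in for aligned training shapes (N = 20, n = 4).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
mean, P, eigvals = build_shape_model(X)
b = shape_to_params(X[0], mean, P, eigvals)
plausible = params_to_shape(b, mean, P)
```

Clamping each $b_{i}$ independently is the simplest way to apply the limits; it keeps every generated shape inside a box in parameter space.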

4. Modeling Local Structure – Template Matching

It is not enough to obtain only the statistical characteristics of the position of landmarks. In order to find desired movement and make a good estimation of model position during the image search and classification procedure, besides shape information, a model containing gray-level information of the images in the training set should also be established. The core idea of the gray-level information modeling method is to collect pixels around each landmark and try to put the pixels’ gray information in a compact form so we can use it for image search.

For each landmark, we can sample $k$ pixels on either side of the landmark along a profile (the line passing through the landmark and perpendicular to the line connecting the landmark and its neighbors). This gives a gray-level profile of length $2k+1$ (including the landmark itself), which we describe by a vector $g$. To reduce the effect of global intensity changes, we do not use the actual vector $g$ but its normalized derivative, which reflects the change of gray level along the profile.

The gray-level profile of the $j^{th}$ landmark in the $i^{th}$ image is a vector of $2k+1$ elements
$$g_{ij} = \begin{bmatrix} g_{ij0}, g_{ij1}, \dots, g_{ij(2k-1)}, g_{ij(2k)} \end{bmatrix}^{T}$$
We then build statistical models (PCA) of the gray-level profile for every landmark over the training images.
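The profile sampling can be sketched as below. The sketch makes several assumptions that are not spelled out in the summary: the profile direction is taken as the normal to the line joining the two neighboring landmarks, pixels are read with nearest-neighbor lookup, the derivative is a finite difference from `np.gradient`, and the normalization divides by the sum of absolute derivative values. The function names are hypothetical.

```python
import numpy as np

def profile_normal(prev_pt, next_pt):
    """Unit normal to the line joining the two neighbors of a landmark."""
    tangent = np.asarray(next_pt, dtype=float) - np.asarray(prev_pt, dtype=float)
    normal = np.array([-tangent[1], tangent[0]])
    return normal / np.linalg.norm(normal)

def sample_profile(image, landmark, normal, k):
    """Sample 2k+1 gray levels centered on `landmark` (x, y) along `normal`."""
    offsets = np.arange(-k, k + 1)
    pts = np.asarray(landmark, dtype=float) + offsets[:, None] * normal
    # Nearest-neighbor lookup, clamped to the image border.
    cols = np.clip(np.rint(pts[:, 0]).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(pts[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols].astype(float)

def normalized_derivative(g):
    """Finite-difference derivative of g, divided by its total absolute value."""
    dg = np.gradient(g)
    total = np.sum(np.abs(dg))
    return dg / total if total > 0 else dg

# Toy image with a vertical step edge at column 10.
image = np.zeros((20, 20))
image[:, 10:] = 100.0
normal = profile_normal((10.0, 5.0), (10.0, 15.0))
g = sample_profile(image, (10.0, 10.0), normal, k=3)
d = normalized_derivative(g)
```

Rescaling the image intensities or adding a constant offset leaves `d` unchanged, which is the point of using the normalized derivative instead of the raw gray levels.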

5. Multi-resolution Active Shape Model

Based on the analysis above, the algorithm is implemented in a multi-resolution approach: it first searches a coarse image, where landmarks can make large jumps toward distant points, and then refines the location in a series of finer-resolution images, where the jumps are limited to nearby points.

Each training image has its own pyramid.

During the training, we build statistical models of gray-level along profile through each landmark, at each level of the image pyramid.

During the search, we start our searching from the top level of the pyramid. In each image, the initial search position of the model is the search output of its upper level, until the lowest level is reached.
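The coarse-to-fine control flow can be sketched like this. The pyramid here is built by plain 2x2 block averaging as a stand-in for whatever smoothing and subsampling the paper uses, and `search_level` is a hypothetical callback that refines a shape on one pyramid level; neither is taken from the source.

```python
import numpy as np

def build_pyramid(image, levels):
    """Level 0 is the full-resolution image; each further level halves the size."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(1, levels):
        img = pyramid[-1]
        h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
        blocks = img[:h, :w].reshape(h // 2, 2, w // 2, 2)
        pyramid.append(blocks.mean(axis=(1, 3)))  # average each 2x2 block
    return pyramid

def coarse_to_fine(pyramid, initial_shape, search_level):
    """Run `search_level(image, shape, level)` from the top level down to level 0.

    `initial_shape` is an (n, 2) array in full-resolution coordinates.
    """
    top = len(pyramid) - 1
    shape = np.asarray(initial_shape, dtype=float) / (2 ** top)
    for level in range(top, -1, -1):
        shape = search_level(pyramid[level], shape, level)
        if level > 0:
            shape = shape * 2.0  # result becomes the start for the finer level
    return shape

# Toy run with a search step that leaves the shape unchanged.
pyramid = build_pyramid(np.arange(64, dtype=float).reshape(8, 8), levels=3)
start = np.array([[1.0, 2.0], [5.0, 6.0]])
result = coarse_to_fine(pyramid, start, lambda image, shape, level: shape)
```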

6. Image Search Using Active Shape Model

Now, based on the model, we will locate a new example of the object in an image. The idea is: first, we give the model an initial position using some prior knowledge; second, we examine the landmarks and their neighborhoods along the gray-level profiles to find better landmark locations; third, we update the pose and shape parameters under suitable constraints, move the model to the new location, and produce a plausible shape. The model is named the Active Shape Model (ASM) because the shapes can only vary in a controlled way, by constraining the weights of the principal components.

We assume that an instance of an object is described as the mean shape obtained from the training set plus a weighted sum of the t principal components, with the possibility of this sum being scaled, rotated and translated.

6.1 Initial Estimate

We can express the estimate $X_{i}$ of the shape as a scaled, rotated and translated version of a shape $X_{l}$:
$$X_{i} = M(s_{i}, \theta_{i})[X_{l}] + t_{i}$$
where $s_{i}$, $\theta_{i}$ and $t_{i}$ are the scale, rotation and translation parameters, respectively, and $t_{i} = [t_{xi}, t_{yi}, t_{xi}, t_{yi}, \dots, t_{xi}, t_{yi}]^{T}$ has length $2n$. $X_{l}$ can also be expressed as $X_{l} = \bar{X} + dX_{l}$ with $dX_{l} = Pb_{l}$, where $\bar{X}$ is the mean shape of the model. Therefore
$$X_{i} = M(s_{i},\theta_{i})[\bar{X}+Pb_{l}] + t_{i}$$
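To make the notation concrete, here is a small sketch of how such a model instance could be generated. It assumes shape vectors use the interleaved layout $[x_0, y_0, x_1, y_1, \dots]$ and that $M(s, \theta)$ is a scaled 2-D rotation applied to every landmark. The function names are hypothetical.

```python
import numpy as np

def similarity(shape_vec, s, theta, t=(0.0, 0.0)):
    """Apply M(s, theta)[.] + t to a 2n vector [x0, y0, x1, y1, ...]."""
    pts = np.asarray(shape_vec, dtype=float).reshape(-1, 2)
    c, si = np.cos(theta), np.sin(theta)
    scaled_rotation = s * np.array([[c, -si], [si, c]])
    return (pts @ scaled_rotation.T + np.asarray(t, dtype=float)).reshape(-1)

def model_instance(mean, P, b, s, theta, t):
    """X = M(s, theta)[mean + P b] + t."""
    return similarity(mean + P @ b, s, theta, t)

# Toy model: two landmarks, one (empty) mode of variation.
mean = np.array([1.0, 0.0, 0.0, 1.0])
P = np.zeros((4, 1))
X = model_instance(mean, P, np.zeros(1), s=2.0, theta=np.pi / 2, t=(1.0, 1.0))
```

Passing `t` as a single `(tx, ty)` pair and broadcasting it over the landmarks is equivalent to the repeated $2n$ element translation vector in the text.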

6.2 Compute the Movements of Landmarks

Using a search along the gray-level profile, the best-fit position for landmark $j$ can be found. For the shape $X_{i}$, this process is repeated to find a suggested new position for each landmark, which gives a position offset $dX_{i}$. Thus, we finally obtain an "ideal" shape $X_{i}^{\prime} = X_{i} + dX_{i}$.

Generally, the $X_{i}^{\prime}$ obtained through the gray-level profile search is closer to the shape of the target object. We can then adjust the pose parameters, namely scale, rotation and translation, to move the initial estimate as close as possible to the target object. But we usually do not update $X_{i}$ to $X_{i}^{\prime}$ directly, because $X_{i}^{\prime}$ may not satisfy the shape constraints and may therefore be an unallowable shape.

Taking this problem into consideration, the shape parameter $b$ should also be updated to make $X_{i}$ as close as possible to $X_{i}^{\prime}$. Meanwhile, constraints should be imposed on $b$ to ensure that the updated shape is plausible.

$$X_{i} = M(s_{i}, \theta_{i})[\bar{X}+Pb_{l}]+t_{i} \xrightarrow{(1+ds),\ d\theta,\ dt} X_{i}^{\prime} = X_{i} + dX_{i}$$

Compute d b l db_{l} dbl by solving the following equation

$$M(s_{i}(1+ds), \theta_{i}+d\theta)[\bar{X}+P(b_{l}+db_{l})] + t_{i} + dt = X_{i} + dX_{i}$$

Substituting $X_{i} = M(s_{i}, \theta_{i})[\bar{X}+Pb_{l}]+t_{i}$, we get

$$\begin{aligned} M(s_{i}(1+ds), \theta_{i}+d\theta)[\bar{X}+P(b_{l}+db_{l})] + t_{i} + dt &= M(s_{i}, \theta_{i})[\bar{X}+Pb_{l}] + t_{i} + dX_{i} \\ M(s_{i}(1+ds), \theta_{i}+d\theta)[\bar{X}+P(b_{l}+db_{l})] &= M(s_{i}, \theta_{i})[\bar{X}+Pb_{l}] + dX_{i} - dt \end{aligned}$$

and since

$$M^{-1}(s,\theta)[\dots] = M(s^{-1}, -\theta)[\dots]$$

we obtain

$$db_{l} = P^{-1}\left(M\left((s_{i}(1+ds))^{-1}, -(\theta_{i} + d\theta)\right)\left[M(s_{i},\theta_{i})[\bar{X}+Pb_{l}]+dX_{i}-dt\right]-\bar{X}\right)-b_{l}$$

Now, we have enough information to form a new shape estimate using the parameters above.

$$X_{i}^{1} = M(s_{i}(1+ds), \theta_{i}+d\theta)[X_{l}+Pdb_{l}] + t_{i} + dt$$

Then we start this procedure again from $X_{i}^{1}$ to produce $X_{i}^{2}$, and so on until there is no significant change of the shape.
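The update of the shape parameters can be sketched in a few lines of NumPy. The sketch assumes the pose increments $ds$, $d\theta$, $dt$ have already been estimated, uses $P^{T}$ in place of $P^{-1}$ (valid when the columns of $P$ are orthonormal), and clamps the updated parameters to $\pm 3\sqrt{\lambda}$. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def similarity(shape_vec, s, theta):
    """M(s, theta)[.] applied to a 2n vector [x0, y0, x1, y1, ...]."""
    pts = np.asarray(shape_vec, dtype=float).reshape(-1, 2)
    c, si = np.cos(theta), np.sin(theta)
    return (pts @ (s * np.array([[c, -si], [si, c]])).T).reshape(-1)

def update_shape_params(mean, P, eigvals, b, s, theta, dX, ds, dtheta, dt, limit=3.0):
    """Solve the equation above for db and return the clamped b + db."""
    n = mean.size // 2
    dt_vec = np.tile(np.asarray(dt, dtype=float), n)   # [dtx, dty, dtx, dty, ...]
    # M(s, theta)[mean + P b] + dX - dt
    moved = similarity(mean + P @ b, s, theta) + dX - dt_vec
    # Map back into the model frame with the inverse of the updated pose.
    in_model = similarity(moved, 1.0 / (s * (1.0 + ds)), -(theta + dtheta))
    db = P.T @ (in_model - mean) - b
    bound = limit * np.sqrt(eigvals)
    return np.clip(b + db, -bound, bound)

# Toy check: start from the mean shape and recover known parameters.
rng = np.random.default_rng(1)
mean = rng.normal(size=8)
P, _ = np.linalg.qr(rng.normal(size=(8, 2)))   # orthonormal columns
eigvals = np.array([4.0, 1.0])
b_true = np.array([1.5, -0.5])
s_new, theta_new, t_new = 1.3, 0.2, np.array([2.0, -1.0])
target = similarity(mean + P @ b_true, s_new, theta_new) + np.tile(t_new, 4)
dX = target - mean                             # current estimate is the mean shape
b_new = update_shape_params(mean, P, eigvals, np.zeros(2), 1.0, 0.0,
                            dX, ds=s_new - 1.0, dtheta=theta_new, dt=t_new)
```

In the toy check the target is an exact model instance, so `b_new` should match `b_true`; with real profile-search offsets the projection only gives the closest plausible shape.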

The procedure of searching for a target object in a new image can be summarized as follows

  1. Let $X_{i} = M(s_{i},\theta_{i})[X_{l}] + t_{i}$ be the initial estimate of the model in the new image, where $X_{l} = \bar{X} + Pb_{l}$ and $P$ contains the first $t$ principal components.

  2. Use the gray-level profiles of the landmarks to search for the suggested movements of the landmarks. This gives a position offset $dX_{i}$ for each landmark of the model. Then move the initial shape to the suggested position $X_{i} + dX_{i}$.

  3. Calculate the additional pose parameters $ds$, $d\theta$, $dt$.

  4. Calculate the additional shape parameter $db_{l}$, noting that a suitable constraint (e.g. $-3\sqrt{\lambda_{i}} < db_{l} < 3\sqrt{\lambda_{i}}$) should be imposed on $b$ to keep unallowable shapes from appearing.

  5. Update the pose and shape parameters of $X_{i}$ to obtain a new shape $X_{i}^{1}$ close to the target. Then use $X_{i}^{1}$ as the estimate and repeat from step 2.

  6. Stop the iteration when no significant change is found.
