CH04. Stereo Systems (1)


In the previous posts, we looked at how much information additional viewpoints can give us about a scene.

Epipolar geometry related points in one image plane to points in another without any information about the 3D scene. In this lecture, we look at how to recover the 3D scene from multiple 2D images.

Triangulation

Triangulation is one of the most fundamental problems in multiview geometry.

Triangulation is the task of determining the location of a 3D point from two or more images onto which that point is projected.

For two-view triangulation, there are two cameras with intrinsic parameters $K$ and $K'$, respectively.

We can also define the rotation and translation $R, T$ that relate the two cameras.

Let $p$ and $p'$ be the projections of a 3D point $P$ onto the two camera image planes.

We do not know the exact location of $P$, but we do know the exact locations of $p$ and $p'$.

Since $K, K', R, T$ are also known, we can compute the lines $\ell$ and $\ell'$ that pass through the camera centers $O_1, O_2$ and the image points $p, p'$, respectively.

Because $P$ lies on both $\ell$ and $\ell'$, it is their intersection, so we can recover $P$ by intersecting the two lines.

This approach is mathematically sound and intuitive, but it does not work well in practice.

In the real world, the observed $p$ and $p'$ are corrupted by noise and the camera parameters are not known exactly, so the intersection of $\ell$ and $\ell'$ is hard to find.

In most cases the two lines do not intersect at all, so $P$ cannot be recovered this way.
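To make this concrete, here is a minimal NumPy sketch under a hypothetical rig: camera 1 is $K[I \mid 0]$ and camera 2 is $K[R \mid T]$ with $R = I$ and $T = (-1, 0, 0)$ (all numbers are made up for illustration). It back-projects $p$ and $p'$ into the rays $\ell$ and $\ell'$ and measures the shortest distance between them; the helpers `project` and `ray_gap` are hypothetical names.

```python
import numpy as np

# Hypothetical rig: camera 1 is K[I | 0], camera 2 is K[R | T] with R = I, T = (-1, 0, 0).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([-1.0, 0.0, 0.0])
O1 = np.zeros(3)       # center of camera 1
O2 = -R.T @ T          # center of camera 2, here (1, 0, 0)

def project(K, R, T, P):
    """Project a 3D point to homogeneous pixel coordinates (x, y, 1)."""
    ph = K @ (R @ P + T)
    return ph / ph[2]

def ray_gap(p, p_prime):
    """Shortest distance between ray l (through O1 and p) and ray l' (through O2 and p')."""
    d1 = np.linalg.solve(K, p)              # direction of l in world coordinates
    d2 = R.T @ np.linalg.solve(K, p_prime)  # direction of l', rotated into world coordinates
    n = np.cross(d1, d2)                    # common normal of the two lines
    return abs((O2 - O1) @ n) / np.linalg.norm(n)

P_true = np.array([0.2, -0.1, 5.0])
p = project(K, np.eye(3), np.zeros(3), P_true)
p_prime = project(K, R, T, P_true)

print(ray_gap(p, p_prime))                              # ~0: the rays meet at P
print(ray_gap(p, p_prime + np.array([0.0, 1.0, 0.0])))  # 1-pixel error in y: rays are skew
```

With exact projections the gap is zero; a one-pixel error in $p'$ perpendicular to the epipolar line (horizontal in this rig) already makes the two rays miss each other.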

So how can we deal with this?

A linear method for triangulation

Let's look at a simple linear triangulation method.

We are given the two image points $p = MP = (x, y, 1)$ and $p' = M'P = (x', y', 1)$, where the equalities hold up to scale in homogeneous coordinates.

Since $p$ and $MP$ are the same point up to scale, they are parallel vectors, so by the definition of the cross product $p \times (MP) = 0$.

Writing out its components (up to sign) gives the constraints

$$\begin{aligned} x(M_3P) - (M_1P) &= 0 \\ y(M_3P) - (M_2P) &= 0 \\ x(M_2P) - y(M_1P) &= 0 \end{aligned}$$

where $M_i$ is the $i$-th row of the matrix $M$. Only the first two are linearly independent: the third equals $y$ times the first minus $x$ times the second.
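The following NumPy sketch checks these constraints numerically. The camera matrix `M` and point `P` are hypothetical values chosen so the arithmetic is easy; `r1`, `r2`, `r3` are the three constraint rows acting on $P$.

```python
import numpy as np

# Hypothetical camera matrix (3x4) and homogeneous 3D point, for illustration only.
M = np.array([[500.0, 0.0, 320.0, 10.0],
              [0.0, 500.0, 240.0, -5.0],
              [0.0, 0.0, 1.0, 0.5]])
P = np.array([0.3, -0.2, 4.0, 1.0])

u = M @ P                        # (M_1 P, M_2 P, M_3 P)
x, y = u[0] / u[2], u[1] / u[2]  # observed point p = (x, y, 1)
p = np.array([x, y, 1.0])

# Constraint rows acting on P (M[k] is the row M_{k+1} in the text's 1-based notation).
r1 = x * M[2] - M[0]             # x M_3 - M_1
r2 = y * M[2] - M[1]             # y M_3 - M_2
r3 = x * M[1] - y * M[0]         # x M_2 - y M_1

print(np.cross(p, u))                    # ~0: p is parallel to MP
print(r1 @ P, r2 @ P, r3 @ P)            # ~0 for the true point
print(np.allclose(r3, y * r1 - x * r2))  # True: row 3 is a combination of rows 1 and 2
```

The last check holds for the rows themselves, not just at the true $P$, which is why each view contributes only two rows to the linear system.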

Similar constraints can be derived from $p'$ and $M'$.

Stacking the two independent constraints from each view gives a linear system of the form $AP = 0$, where

$$A = \begin{bmatrix} xM_3 - M_1 \\ yM_3 - M_2 \\ x'M'_3 - M'_1 \\ y'M'_3 - M'_2 \end{bmatrix}$$

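This system can be solved with SVD: the solution $P$, with $\|P\| = 1$, is the right singular vector of $A$ corresponding to its smallest singular value.

Another advantage of this method is that projections from additional views are easy to incorporate: we simply stack their rows at the bottom of $A$.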
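A minimal NumPy sketch of this linear method for any number of views, assuming each camera is given as a 3×4 matrix and each observation as pixel coordinates $(x, y)$. The function name `triangulate_linear` and the three-camera setup are hypothetical.

```python
import numpy as np

def triangulate_linear(cams, pts):
    """Linear triangulation: cams are 3x4 matrices M_i, pts are observed (x, y) pixels."""
    rows = []
    for M, (x, y) in zip(cams, pts):
        rows.append(x * M[2] - M[0])   # x M_3 - M_1
        rows.append(y * M[2] - M[1])   # y M_3 - M_2
    A = np.vstack(rows)                # (2N, 4): each extra view just adds two rows
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1]                         # unit-norm P minimizing ||A P||
    return P[:3] / P[3]                # back to Euclidean coordinates

def project(M, X):
    u = M @ np.append(X, 1.0)
    return u[:2] / u[2]

# Hypothetical cameras sharing K, translated along x and y.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
M3 = K @ np.hstack([np.eye(3), np.array([[0.0], [-1.0], [0.0]])])

X_true = np.array([0.2, -0.1, 5.0])
cams = [M1, M2, M3]
pts = [project(M, X_true) for M in cams]
print(triangulate_linear(cams, pts))   # recovers X_true up to floating-point error
```

With noisy pixels the same code returns the algebraic least-squares point, which is generally not the point with the smallest reprojection error.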

However, this method is not projective-invariant, so it is not suitable for projective reconstruction.

For example, suppose we replace the camera matrices $M, M'$ with the projectively transformed matrices $MH^{-1}, M'H^{-1}$.

Then $A$ becomes $AH^{-1}$.

So the problem $AP = 0$ becomes $(AH^{-1})(HP) = 0$.

The SVD solution imposes the constraint $\|P\| = 1$, but this norm is not preserved by a projective transformation $H$ (in general $\|HP\| \neq \|P\|$), so with noisy data the solution depends on the projective frame, i.e., it is not invariant.

So although this method is simple, it generally does not find the optimal solution.
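The non-invariance can be observed numerically. This sketch uses hypothetical cameras, an arbitrary invertible `H`, and hand-picked pixel noise: it triangulates once with $M, M'$ and once with $MH^{-1}, M'H^{-1}$ (mapping the result back with $H^{-1}$), then compares the two Euclidean points. The helper `frame_discrepancy` is a hypothetical name.

```python
import numpy as np

def build_A(cams, pts):
    """Stack the two independent constraint rows from each view."""
    rows = []
    for M, (x, y) in zip(cams, pts):
        rows += [x * M[2] - M[0], y * M[2] - M[1]]
    return np.vstack(rows)

def min_singular_vector(A):
    """Unit-norm vector minimizing ||A P|| (last right singular vector)."""
    return np.linalg.svd(A)[2][-1]

def project(M, P):
    u = M @ P
    return u[:2] / u[2]

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
P_true = np.array([0.2, -0.1, 5.0, 1.0])

# Hypothetical invertible 4x4 projective transformation.
H = np.array([[1.0, 0.3, 0.0, 0.2],
              [0.0, 1.0, 0.4, 0.0],
              [0.2, 0.0, 1.0, 0.1],
              [0.1, 0.0, 0.05, 1.0]])
H_inv = np.linalg.inv(H)

def frame_discrepancy(pts):
    """Distance between the linear solutions found in the original and the transformed frame."""
    P_a = min_singular_vector(build_A([M1, M2], pts))
    Q = min_singular_vector(build_A([M1 @ H_inv, M2 @ H_inv], pts))  # (A H^-1) Q = 0, Q ~ H P
    P_b = H_inv @ Q                                                    # map back to the original frame
    return np.linalg.norm(P_a[:3] / P_a[3] - P_b[:3] / P_b[3])

exact = [project(M, P_true) for M in (M1, M2)]
noisy = [exact[0] + np.array([0.4, -0.3]), exact[1] + np.array([-0.2, 0.5])]
print(frame_discrepancy(exact))  # ~0: noise-free data gives the same point in both frames
print(frame_discrepancy(noisy))  # nonzero: the ||P|| = 1 normalization depends on the frame
```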

A nonlinear method for triangulation

Instead, real-world triangulation is usually posed as the following minimization problem:

$$\min_{\hat P} \|M\hat P - p\|^2 + \|M'\hat P - p'\|^2$$

This finds the 3D point $\hat P$ that best approximates $P$ by minimizing the reprojection error of $\hat P$ in each image.

The reprojection error of a 3D point is the distance between its projection onto an image plane and the corresponding observed point in that image.

Taking Figure 2 above as an example, $M\hat P$ is the projection of $\hat P$ (after dividing by its homogeneous coordinate), and the observed point corresponding to $\hat P$ is $p$.

So the reprojection error in image 1 is $\|M\hat P - p\|$.

(The objective above is the sum of the squared reprojection errors over all images.)

More generally, with camera matrix $M_i$ and observation $p_i$ for each image $i$, we can write

$$\min_{\hat P} \sum_i \|M_i\hat P - p_i\|^2$$
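A small NumPy sketch of this objective, assuming 3×4 camera matrices and pixel observations; the function `reprojection_error` and the two-camera setup are hypothetical. The division by the homogeneous coordinate inside `project` is what makes the objective nonlinear in $\hat P$.

```python
import numpy as np

def project(M, X):
    """Project a Euclidean 3D point with a 3x4 camera matrix to 2D pixel coordinates."""
    u = M @ np.append(X, 1.0)
    return u[:2] / u[2]            # divide by the homogeneous coordinate (nonlinear step)

def reprojection_error(cams, pts, P_hat):
    """Sum over images of ||M_i P_hat - p_i||^2, with M_i P_hat taken after projection."""
    return sum(float(np.sum((project(M, P_hat) - p) ** 2)) for M, p in zip(cams, pts))

# Hypothetical two-camera setup with noise-free observations.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cams = [K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
        K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])]
X_true = np.array([0.2, -0.1, 5.0])
pts = [project(M, X_true) for M in cams]

print(reprojection_error(cams, pts, X_true))                              # 0 at the true point
print(reprojection_error(cams, pts, X_true + np.array([0.1, 0.0, 0.0])))  # ~200: 10 px off in each image
```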

In practice there are sophisticated optimization techniques that give very good approximations, but this lecture focuses on just one of them.

The technique we use is the Gauss-Newton algorithm.

A general nonlinear least-squares problem seeks the $x \in \mathbb{R}^n$ that minimizes

$$\|r(x)\|^2 = \sum_{i=1}^m r_i(x)^2$$

(Here $r: \mathbb{R}^n \to \mathbb{R}^m$ is the residual function $r(x) = f(x) - y$, where $f$ is the function, $x$ the input, and $y$ the observation.)

If $f$ is linear, the nonlinear least-squares problem reduces to an ordinary linear least-squares problem.

In general, however, our camera matrices are not affine: projecting onto the image plane requires dividing by the homogeneous coordinate, so the projection is generally nonlinear.

If we define the $2 \times 1$ vector $e_i = M_i\hat P - p_i$, the optimization problem can be rewritten as

$$\min_{\hat P} \sum_i \|e_i(\hat P)\|^2$$

which is exactly a nonlinear least-squares problem.

Now let's see how to apply the Gauss-Newton algorithm.

First, suppose we have a reasonable rough estimate $\hat P$, for example one obtained with the linear method above.

The Gauss-Newton algorithm repeatedly updates this estimate in a direction that reduces the reprojection error.

At each step we update $\hat P$ by $\delta_P$, that is, $\hat P \leftarrow \hat P + \delta_P$.

How, then, do we choose the update $\delta_P$?

The key insight of Gauss-Newton is to linearize the residual function around the current estimate $\hat P$.

For our optimization problem, the residual error $e$ of the point $P$ can be approximated as

$$e(\hat P + \delta_P) \approx e(\hat P) + \frac{\partial e}{\partial P}\delta_P$$

and the minimization problem becomes

$$\min_{\delta_P} \left\| \frac{\partial e}{\partial P}\delta_P - (-e(\hat P)) \right\|^2$$

Written this way, it has exactly the form of a standard linear least-squares problem, with $\partial e / \partial P$ as the matrix and $-e(\hat P)$ as the right-hand side.

Therefore, for the triangulation problem with $N$ images, the linear least-squares solution is

$$\delta_P = -(J^TJ)^{-1}J^Te$$

where

$$e = \begin{bmatrix} e_1 \\ \vdots \\ e_N \end{bmatrix} = \begin{bmatrix} M_1\hat P - p_1 \\ \vdots \\ M_N\hat P - p_N \end{bmatrix}, \qquad J = \begin{bmatrix} \dfrac{\partial e_1}{\partial \hat P_1} & \dfrac{\partial e_1}{\partial \hat P_2} & \dfrac{\partial e_1}{\partial \hat P_3} \\ \vdots & \vdots & \vdots \\ \dfrac{\partial e_N}{\partial \hat P_1} & \dfrac{\partial e_N}{\partial \hat P_2} & \dfrac{\partial e_N}{\partial \hat P_3} \end{bmatrix}$$

and $\hat P_1, \hat P_2, \hat P_3$ are the coordinates of $\hat P$.

Note that the residual vector $e_i$ for a single image is a $2 \times 1$ vector, since the image plane is 2D.

So for two-camera triangulation ($N = 2$), the residual vector $e$ is $2N \times 1 = 4 \times 1$ and the Jacobian $J$ is a $2N \times 3 = 4 \times 3$ matrix.
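The following sketch builds $e$ and $J$ for $N$ cameras, assuming 3×4 camera matrices and pixel observations. Each $2 \times 3$ block of $J$ comes from the chain rule: the derivative of $(u_1/u_3, u_2/u_3)$ with respect to $u = M_i[\hat P; 1]$, times the first three columns of $M_i$. The function name and camera values are hypothetical, and the analytic Jacobian is checked against central finite differences.

```python
import numpy as np

def residual_and_jacobian(cams, pts, P_hat):
    """Stack e_i = proj(M_i P_hat) - p_i and the matching 2x3 Jacobian blocks."""
    X = np.append(P_hat, 1.0)
    e_blocks, J_blocks = [], []
    for M, p in zip(cams, pts):
        u = M @ X                                    # homogeneous image point (u1, u2, u3)
        e_blocks.append(u[:2] / u[2] - p)            # 2x1 residual for this image
        # Chain rule: d(u1/u3, u2/u3)/du times du/dP_hat (the first three columns of M).
        d_proj = np.array([[1.0 / u[2], 0.0, -u[0] / u[2] ** 2],
                           [0.0, 1.0 / u[2], -u[1] / u[2] ** 2]])
        J_blocks.append(d_proj @ M[:, :3])           # 2x3 block
    return np.concatenate(e_blocks), np.vstack(J_blocks)  # shapes (2N,), (2N, 3)

def project(M, X):
    u = M @ np.append(X, 1.0)
    return u[:2] / u[2]

# Hypothetical two-camera setup and noise-free observations.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cams = [K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
        K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])]
X_true = np.array([0.2, -0.1, 5.0])
pts = [project(M, X_true) for M in cams]

P_hat = X_true + np.array([0.05, -0.05, 0.2])        # a slightly wrong estimate
e, J = residual_and_jacobian(cams, pts, P_hat)
print(e.shape, J.shape)                              # (4,) (4, 3)

# Compare the analytic Jacobian with central finite differences.
h = 1e-6
J_fd = np.column_stack([
    (residual_and_jacobian(cams, pts, P_hat + h * d)[0]
     - residual_and_jacobian(cams, pts, P_hat - h * d)[0]) / (2 * h)
    for d in np.eye(3)])
print(np.allclose(J, J_fd, atol=1e-4))               # True
```

Adding a camera simply appends one more residual pair and one more $2 \times 3$ block.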

This formulation is also convenient because a new image is handled simply by appending rows to $e$ and $J$.

Once the update $\delta_P$ is computed, we repeat the process for a fixed number of steps or until it has converged sufficiently.

An important property of the Gauss-Newton algorithm is that the assumption that the residual function is linear near our estimate does not guarantee convergence.

So in practice it is useful to cap the number of updates.
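Putting the pieces together, here is a minimal Gauss-Newton triangulation sketch in NumPy. It assumes an initial estimate `P0` (for instance from the linear method), applies $\delta_P = -(J^TJ)^{-1}J^Te$, caps the iterations at `n_iters`, and stops early when the step falls below `tol`; these names and defaults are hypothetical choices, not part of the notes.

```python
import numpy as np

def project(M, X):
    u = M @ np.append(X, 1.0)
    return u[:2] / u[2]

def residual_and_jacobian(cams, pts, P_hat):
    """Stacked residuals e (2N,) and Jacobian J (2N, 3) of the reprojection error."""
    X = np.append(P_hat, 1.0)
    e_blocks, J_blocks = [], []
    for M, p in zip(cams, pts):
        u = M @ X
        e_blocks.append(u[:2] / u[2] - p)
        d_proj = np.array([[1.0 / u[2], 0.0, -u[0] / u[2] ** 2],
                           [0.0, 1.0 / u[2], -u[1] / u[2] ** 2]])
        J_blocks.append(d_proj @ M[:, :3])
    return np.concatenate(e_blocks), np.vstack(J_blocks)

def gauss_newton_triangulate(cams, pts, P0, n_iters=10, tol=1e-12):
    """Refine an initial 3D estimate P0 with at most n_iters Gauss-Newton steps."""
    P_hat = np.asarray(P0, dtype=float).copy()
    for _ in range(n_iters):                        # fixed cap: convergence is not guaranteed
        e, J = residual_and_jacobian(cams, pts, P_hat)
        delta = -np.linalg.solve(J.T @ J, J.T @ e)  # delta_P = -(J^T J)^{-1} J^T e
        P_hat = P_hat + delta
        if np.linalg.norm(delta) < tol:             # stop early once the step is negligible
            break
    return P_hat

# Hypothetical two-camera setup with noise-free observations.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
cams = [K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
        K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])]
X_true = np.array([0.2, -0.1, 5.0])
pts = [project(M, X_true) for M in cams]

P0 = X_true + np.array([0.05, -0.05, 0.2])      # stand-in for a rough linear-method estimate
print(gauss_newton_triangulate(cams, pts, P0))  # close to X_true
```

With noisy observations the loop instead settles on a point that locally minimizes the reprojection error, which in general differs slightly from the true point.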

Affine structure from motion

The previous section showed how two or more views can be used to obtain information about a 3D scene. Now let's extend this further.

By combining observations of points from multiple views, a method called **Structure from motion (SFM)** can determine not only the 3D structure of the scene but also the camera parameters.

Suppose we have $m$ cameras, each described by an (unknown) camera transformation $M_i$ that encodes both its intrinsic and extrinsic parameters.

Let $X_j$ be one of the $n$ 3D points in the scene.

Each 3D point appears in each camera at some location $x_{ij}$: the projection of $X_j$ into the image of camera $i$ through the projective transformation $M_i$.

The goal of SFM is to use all the observations $x_{ij}$ to recover both the structure of the scene (the $n$ 3D points $X_j$) and the motion of the cameras (the $m$ projection matrices $M_i$).

The affine structure from motion problem

Before tackling general SFM, let's first look at an easier problem.

Here we assume the cameras are affine or weak-perspective.

For the full perspective model, the camera matrix is

$$M = \begin{bmatrix} A & b \\ v & 1 \end{bmatrix}$$

where $v$ is a nonzero $1 \times 3$ vector.

For the weak perspective model, on the other hand, $v = 0$.

Using this property, the homogeneous coordinates of $MX$ are

$$x = MX = \begin{bmatrix} m_1 \\ m_2 \\ 0 \;\; 0 \;\; 0 \;\; 1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ 1 \end{bmatrix} = \begin{bmatrix} m_1X \\ m_2X \\ 1 \end{bmatrix}$$

where $m_1, m_2$ are the first two rows of $M$. Since the last coordinate is always 1, no division is needed, and in Euclidean image coordinates this can also be written as

$$x = \begin{bmatrix} A & b \end{bmatrix} \begin{bmatrix} X \\ 1 \end{bmatrix} = AX + b$$

with the camera matrix expressed as $M_{\text{affine}} = \begin{bmatrix} A & b \end{bmatrix}$.
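A quick NumPy check of this form with a hypothetical affine camera (the values of `A` and `b` are made up): with last row $(0, 0, 0, 1)$, the homogeneous coordinate stays 1 and the projection reduces to $AX + b$.

```python
import numpy as np

# Hypothetical affine camera: A is 2x3, b is 2x1, and the last row of M is (0, 0, 0, 1).
A = np.array([[2.0, 0.5, -1.0],
              [0.0, 1.5, 0.3]])
b = np.array([10.0, -4.0])
M = np.vstack([np.hstack([A, b[:, None]]), [0.0, 0.0, 0.0, 1.0]])  # 3x4, v = 0

X = np.array([1.0, 2.0, 3.0])
x_h = M @ np.append(X, 1.0)              # homogeneous image point
print(x_h[2])                            # 1.0: no division by a varying coordinate
print(np.allclose(x_h[:2], A @ X + b))   # True: x = AX + b
```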

Returning to SFM: we need to estimate $m$ matrices $M_i$ and $n$ world points $X_j$, which means finding $8m + 3n$ unknowns (each affine camera $[A \;\; b]$ is $2 \times 4$, and each point has 3 coordinates) from $mn$ observations.

As we saw earlier, each observation provides 2 constraints.

So we get $2mn$ equations in total.

From this we can compute the minimum number of observations required. We need $2mn \ge 8m + 3n$; for example, with $m = 2$ cameras this becomes $4n \ge 16 + 3n$, so we need at least $n = 16$ 3D points.
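The same count can be evaluated for any number of cameras. This tiny sketch (the function name is hypothetical) solves $2mn \ge 8m + 3n$ for the smallest integer $n$; like the text, it is a naive equations-versus-unknowns count.

```python
def min_points_affine_sfm(m):
    """Smallest n with 2mn >= 8m + 3n, for m affine cameras (naive count)."""
    if m < 2:
        raise ValueError("2mn >= 8m + 3n has no solution for m < 2")
    # 2mn >= 8m + 3n  <=>  n(2m - 3) >= 8m  <=>  n >= 8m / (2m - 3)
    return -(-8 * m // (2 * m - 3))   # integer ceiling division

for m in (2, 3, 4):
    print(m, min_points_affine_sfm(m))   # prints "2 16", "3 8", "4 7"
```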

Now let's see how to solve the problem once we have enough correspondences.

The Tomasi and Kanade factorization method

Here we look at the factorization method of Tomasi and Kanade.

The method consists of two main steps:

data centering step
actual factorization step

First, let's look at the data centering step.

The main idea of this step is to center the data at the origin.

To do so, for each image $i$ we define new coordinates $\hat x_{ij}$ as

$$\hat x_{ij} = x_{ij} - \bar x_i = x_{ij} - \frac{1}{n}\sum_{k=1}^n x_{ik}$$

Affine SFM 의 경우 카메라 행렬의 $A i, b i <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>A</mi><mi>i</mi></msub><mo>,</mo><msub><mi>b</mi><mi>i</mi></msub></math>$ 를 통해

$x i j = A i X j + b i <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><msub><mi>x</mi><mrow data-mjx-texclass="ORD"><mi>i</mi><mi>j</mi></mrow></msub><mo>=</mo><msub><mi>A</mi><mi>i</mi></msub><msub><mi>X</mi><mi>j</mi></msub><mo>+</mo><msub><mi>b</mi><mi>i</mi></msub></math>$

과 같이 표현할 수 있다.

Therefore, subtracting the centroid of the observations in each image gives

$$
\begin{aligned}
\hat{x}_{ij} &= x_{ij} - \frac{1}{n}\sum_{k=1}^{n} x_{ik} \\
&= A_i X_j - \frac{1}{n}\sum_{k=1}^{n} A_i X_k \\
&= A_i \left( X_j - \frac{1}{n}\sum_{k=1}^{n} X_k \right) \\
&= A_i \left( X_j - \bar{X} \right) \\
&= A_i \hat{X}_j
\end{aligned}
$$

The translation $b_i$ drops out in the second line because it is the same for every point seen by camera $i$, so it cancels when the mean is subtracted.
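A quick numerical check of this derivation, as a sketch on synthetic data: we draw random affine cameras $A_i$, $b_i$ and points $X_j$ (sizes and seed are arbitrary choices of ours), project them, and confirm that centering each image removes $b_i$, leaving $\hat{x}_{ij} = A_i \hat{X}_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 10                        # number of cameras, number of 3D points (arbitrary)
A = rng.normal(size=(m, 2, 3))      # affine camera matrices A_i (2x3 each)
b = rng.normal(size=(m, 2))         # translations b_i (2-vectors)
X = rng.normal(size=(n, 3))         # 3D points X_j

# Affine projection x_ij = A_i X_j + b_i, stored as x[i, j] (a 2-vector)
x = np.einsum("iab,jb->ija", A, X) + b[:, None, :]

# Center each image by the centroid of its own observations
x_hat = x - x.mean(axis=1, keepdims=True)

# Center the 3D points by their centroid X_bar
X_hat = X - X.mean(axis=0)

# After centering, b_i is gone: x_hat_ij = A_i X_hat_j
pred = np.einsum("iab,jb->ija", A, X_hat)
print(np.allclose(x_hat, pred))     # True
```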

In other words, if we move the origin of the world coordinate system to the centroid $\bar{X}$ of the 3D points, $\hat{x}_{ij}$ and $\hat{X}_j$ are related by the $2 \times 3$ matrix $A_i$ alone.

Ultimately, this centering step is what lets us write the whole problem as a compact matrix product.

However, in $\hat{x}_{ij} = A_i \hat{X}_j$ we only know the left-hand side.

So we must somehow separate it into the unknowns $A_i$ and $\hat{X}_j$.

Stacking every observation from every camera into a measurement matrix $D$ gives

$$
D = \begin{bmatrix}
\hat{x}_{11} & \hat{x}_{12} & \dots & \hat{x}_{1n} \\
\hat{x}_{21} & \hat{x}_{22} & \dots & \hat{x}_{2n} \\
 & & \ddots & \\
\hat{x}_{m1} & \hat{x}_{m2} & \dots & \hat{x}_{mn}
\end{bmatrix}
$$

Since each $\hat{x}_{ij}$ is a 2-vector, $D$ is a $2m \times n$ matrix.
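In code, assuming the centered observations are stored in an array `x_hat` of shape `(m, n, 2)` (a layout we chose for illustration, and `measurement_matrix` is a hypothetical helper name), $D$ is obtained by putting the two coordinates seen by camera $i$ on rows $2i$ and $2i+1$:

```python
import numpy as np

def measurement_matrix(x_hat: np.ndarray) -> np.ndarray:
    """Stack centered observations x_hat[i, j] (2-vectors) into a 2m x n matrix D.

    Rows 2i and 2i+1 hold the x and y coordinates seen by camera i;
    column j holds every observation of point j.
    """
    m, n, _ = x_hat.shape
    # (m, n, 2) -> (m, 2, n) -> (2m, n)
    return x_hat.transpose(0, 2, 1).reshape(2 * m, n)

# Tiny example: 2 cameras, 3 points
x_hat = np.arange(12, dtype=float).reshape(2, 3, 2)
D = measurement_matrix(x_hat)
print(D.shape)  # (4, 3)
```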

Under the affine assumption, $D$ can be written as $D = MS$, the product of the $2m \times 3$ motion matrix $M$ (built from $A_1, \dots, A_m$) and the $3 \times n$ structure matrix $S$ (built from $\hat{X}_1, \dots, \hat{X}_n$).

The key observation is that $D$ has rank 3 (strictly, at most 3, and exactly 3 when $M$ and $S$ both have full rank 3), because it is the product of two matrices whose shared inner dimension is 3.

To factorize $D$ into $M$ and $S$, we use the SVD, $D = U \Sigma V^T$.

Since the rank is 3, $\Sigma$ has only three non-zero singular values.
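A sketch that checks the rank claim on synthetic, noise-free data (random $M$ and $S$ that we generate ourselves): the product has exactly three non-zero singular values, and the rest are zero up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 20
M = rng.normal(size=(2 * m, 3))    # motion matrix: stacked A_i (2m x 3)
S = rng.normal(size=(3, n))        # structure matrix: centered points (3 x n)
D = M @ S                          # noise-free measurement matrix (2m x n)

sigma = np.linalg.svd(D, compute_uv=False)
print(np.linalg.matrix_rank(D))    # 3
print(sigma[3:].max() / sigma[0])  # tiny, on the order of machine precision
```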

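We can therefore keep only the first three singular values and their singular vectors:

$$D = U \Sigma V^T = U_3 \Sigma_3 V_3^T$$

where $U_3$ and $V_3$ hold the first three columns of $U$ and $V$, and $\Sigma_3$ is the top-left $3 \times 3$ block of $\Sigma$.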

In practice, however, noise and the fact that real cameras are only approximately affine make the rank of $D$ greater than 3.

Even then, $U_3 \Sigma_3 V_3^T$ is still the best rank-3 approximation of $D$ in the Frobenius norm.
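To illustrate this, here is a sketch that adds synthetic Gaussian noise (the level 0.01 is arbitrary) to a rank-3 matrix, keeps the top three singular triplets, and checks the Eckart–Young property: the Frobenius error of the truncation equals the root-sum-square of the discarded singular values, and no other rank-3 matrix (here, the clean one) does better. The helper name `rank3_approx` is our own.

```python
import numpy as np

def rank3_approx(D: np.ndarray):
    """Return U3, Sigma3 (3x3 diagonal), V3^T and the rank-3 approximation U3 Sigma3 V3^T."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    U3, S3, Vt3 = U[:, :3], np.diag(s[:3]), Vt[:3, :]
    return U3, S3, Vt3, U3 @ S3 @ Vt3

rng = np.random.default_rng(2)
D_clean = rng.normal(size=(10, 3)) @ rng.normal(size=(3, 30))   # exactly rank 3
D_noisy = D_clean + 0.01 * rng.normal(size=D_clean.shape)       # noise raises the rank

U3, S3, Vt3, D3 = rank3_approx(D_noisy)
s = np.linalg.svd(D_noisy, compute_uv=False)
err = np.linalg.norm(D_noisy - D3, "fro")
print(np.linalg.matrix_rank(D_noisy))                 # greater than 3 because of the noise
print(np.isclose(err, np.sqrt(np.sum(s[3:] ** 2))))   # True (Eckart–Young)
```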

$U_3$ is a $2m \times 3$ matrix and $\Sigma_3 V_3^T$ is a $3 \times n$ matrix.

Matching these sizes and setting $M = U_3$ and $S = \Sigma_3 V_3^T$ may sound reasonable for affine SfM, but it does not give a unique solution.

This is because we could just as well choose $M = U_3 \Sigma_3$ and $S = V_3^T$.

So which one should we choose?

In their paper, Tomasi and Kanade argue that a robust choice is $M = U_3 \sqrt{\Sigma_3}$ and $S = \sqrt{\Sigma_3} V_3^T$.
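A minimal NumPy sketch of the whole pipeline with the split $M = U_3\sqrt{\Sigma_3}$, $S = \sqrt{\Sigma_3}V_3^T$. The function name `factorize` and the `(m, n, 2)` input layout are our own assumptions, not taken from the paper or the linked notebook. Since $M$ and $S$ are only defined up to the ambiguity discussed next, the check compares $MS$ with $D$ rather than comparing $M$ and $S$ with the ground truth.

```python
import numpy as np

def factorize(x: np.ndarray):
    """Affine SfM by factorization (sketch).

    x: raw observations of shape (m, n, 2), x[i, j] = projection of point j in camera i.
    Returns M (2m x 3), S (3 x n) and the measurement matrix D, with M @ S ~= D.
    """
    m, n, _ = x.shape
    x_hat = x - x.mean(axis=1, keepdims=True)         # center each image
    D = x_hat.transpose(0, 2, 1).reshape(2 * m, n)    # 2m x n measurement matrix
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    sqrt_S3 = np.diag(np.sqrt(s[:3]))                 # sqrt(Sigma_3)
    M = U[:, :3] @ sqrt_S3                            # motion
    S = sqrt_S3 @ Vt[:3, :]                           # structure
    return M, S, D

# Synthetic affine cameras and points
rng = np.random.default_rng(3)
m, n = 6, 25
A = rng.normal(size=(m, 2, 3))
b = rng.normal(size=(m, 2))
X = rng.normal(size=(n, 3))
x = np.einsum("iab,jb->ija", A, X) + b[:, None, :]

M, S, D = factorize(x)
print(M.shape, S.shape)        # (12, 3) (3, 25)
print(np.allclose(M @ S, D))   # True for noise-free data
```

Splitting $\sqrt{\Sigma_3}$ evenly between the two factors keeps $M$ and $S$ on a comparable numerical scale; the other splits above reproduce the same $D$.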

Ambiguity in reconstruction

Even so, the factorization $D = MS$ is still ambiguous:

$$D = M A A^{-1} S = (MA)(A^{-1} S)$$

because we can insert any invertible $3 \times 3$ matrix $A$ (unrelated to the camera matrices $A_i$) between the two factors.
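A short check of this ambiguity, as a sketch: for an arbitrary invertible $3 \times 3$ matrix (a random one, named `Q` in the code to avoid a clash with the camera matrices $A_i$), $(MQ)(Q^{-1}S)$ reproduces $D$ exactly while both factors change.

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(8, 3))      # some motion matrix (2m x 3, m = 4)
S = rng.normal(size=(3, 15))     # some structure matrix (3 x n)
D = M @ S

Q = rng.normal(size=(3, 3))      # arbitrary matrix; invertible with probability 1
assert abs(np.linalg.det(Q)) > 1e-9

M2 = M @ Q                       # a different motion matrix
S2 = np.linalg.solve(Q, S)       # Q^{-1} S without forming the inverse explicitly
print(np.allclose(M2 @ S2, D))   # True: the data cannot tell (M, S) from (MQ, Q^{-1}S)
print(np.allclose(M2, M))        # False
```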

Our solution is therefore underdetermined, and additional constraints are needed.

Saying that the reconstruction has an affine ambiguity means that parallelism is preserved, but the metric scale cannot be recovered.
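To make "parallelism is preserved, metric scale is not" concrete, here is a small sketch with a random affine map $Y = QX + t$ (our own example data): two parallel 3D segments stay parallel, and their length ratio survives, but the absolute lengths change by a factor we cannot know from the images.

```python
import numpy as np

rng = np.random.default_rng(5)
Q = rng.normal(size=(3, 3))      # arbitrary invertible linear part
t = rng.normal(size=3)           # arbitrary translation

def affine(p):
    """Apply the affine map p -> Q p + t."""
    return Q @ p + t

# Two parallel segments with the same direction d but different start points
d = np.array([1.0, 2.0, 0.5])
p0, p1 = np.zeros(3), np.array([3.0, -1.0, 2.0])
seg_a = affine(p0 + d) - affine(p0)
seg_b = affine(p1 + 2 * d) - affine(p1)   # twice as long as seg_a

# Still parallel (cross product vanishes) and the length ratio 2 is kept
print(np.allclose(np.cross(seg_a, seg_b), 0))                          # True
print(np.isclose(np.linalg.norm(seg_b) / np.linalg.norm(seg_a), 2.0))  # True
# But the absolute length is generally different from the original |d|
print(np.linalg.norm(seg_a), np.linalg.norm(d))
```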

Another important ambiguity in reconstruction is the similarity ambiguity.

(It arises from similarity transforms, i.e. rotation, translation and scaling.)

This ambiguity exists even when the cameras are intrinsically calibrated.

(The good news is that for calibrated cameras it is the only ambiguity.)

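Unless we make additional assumptions (for example, knowing the height of the house in the figure) or incorporate additional data, we cannot determine an object's absolute scale, exact position, or canonical orientation.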

This is because scaling the object up while moving it farther away (or scaling it down while moving it closer) produces the same image.
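A sketch of why scale cannot be recovered, using a pinhole camera $x \sim K[R \mid T]P$ with made-up $K$, $R$, $T$ and points: multiplying every point and the camera translation $T$ by the same factor $s$ (a scene $s$ times larger and $s$ times farther away) leaves every projected pixel unchanged.

```python
import numpy as np

def project(K, R, T, P):
    """Pinhole projection of 3D points P (n x 3) with x ~ K [R | T] P; returns n x 2 pixels."""
    cam = P @ R.T + T            # points in camera coordinates
    uvw = cam @ K.T              # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])  # assumed intrinsics
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])  # rotation about the y-axis
T = np.array([0.2, -0.1, 5.0])   # scene sits about 5 units in front of the camera

rng = np.random.default_rng(6)
P = rng.uniform(-1, 1, size=(10, 3))

s = 3.0                          # make the scene 3x larger ...
x1 = project(K, R, T, P)
x2 = project(K, R, s * T, s * P) # ... and 3x farther away
print(np.allclose(x1, x2))       # True: the images are identical
```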

Camera calibration is an example of exactly this kind of additional information.

During calibration we know the exact positions of the checkerboard corners in world coordinates, so we know the size of its squares.

That is why the metric scale can be recovered.
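A sketch of how one known length fixes the scale: given a reconstruction that is correct only up to a global scale, and two reconstructed checkerboard corners whose true separation is one square (assumed to be 25 mm here), we multiply everything by the ratio of true to estimated distance. The helper `fix_scale` and the 25 mm value are hypothetical. This removes only the scale part of the similarity ambiguity; rotation and translation would still need a known reference frame.

```python
import numpy as np

def fix_scale(points: np.ndarray, i: int, j: int, true_dist: float) -> np.ndarray:
    """Rescale a reconstruction (n x 3) so that |points[i] - points[j]| == true_dist.

    Hypothetical helper: assumes the reconstruction is already correct up to a global scale.
    """
    est_dist = np.linalg.norm(points[i] - points[j])
    return points * (true_dist / est_dist)

# Ground-truth corners of a 3x3 grid with 25 mm squares (on the z = 0 plane)
square = 25.0
gx, gy = np.meshgrid(np.arange(3), np.arange(3))
truth = np.stack([gx.ravel() * square, gy.ravel() * square, np.zeros(9)], axis=1)

recon = truth * 0.037                    # same shape, unknown scale (0.037 is arbitrary)
metric = fix_scale(recon, 0, 1, square)  # corners 0 and 1 are one square apart
print(np.allclose(metric, truth))        # True
```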

For an implementation of the Tomasi and Kanade factorization method, see

https://github.com/ianpark318/CS231A/blob/main/ps2/ps2_code/PSET2.ipynb


Attribution-NonCommercial
