[Vision Study] Week 2 Problem Solutions

Link

https://www.notion.so/janghoo/week2-3cec6897e76a4b7fb81cb19e36277f81

Problem 1.

Multiple choice 1. The following are statements about the SIFT algorithm. Select all that are incorrect.

salkuma.files.wordpress.com

https://salkuma.files.wordpress.com/2014/04/sifteca095eba6ac.pdf

Blob detection - Wikipedia

In computer vision, blob detection methods are aimed at detecting regions in a digital image that differ in properties, such as brightness or color, compared to surrounding regions. Informally, a blob is a region of an image in which some properties are constant or approximately constant; all the points in a blob can be considered in some sense to be similar to each other.

https://en.wikipedia.org/wiki/Blob_detection#The_Laplacian_of_Gaussian

Scale-invariant feature transform - Wikipedia

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.

https://en.wikipedia.org/wiki/Scale-invariant_feature_transform

Laplacian Operator

jem_graddivcurl.pdf The Laplace operator, as the symbol above indicates, is the Divergence of the Gradient. To understand this operator even roughly, you need a good grasp of the divergence and the gradient of a vector field. To aid understanding, someone's explanation is quoted here. From: http://physics.stackexchange.com/questions/20714/laplace-operators-interpretation The Laplacian measures what you could call the "curvature" or stress of the field.

https://micropilot.tistory.com/2970

The ultimate goal of the SIFT algorithm is to obtain Keypoint Descriptors.

→ The goal of the algorithm goes beyond computing Keypoint Descriptors: it uses them to perform Image Matching.

In the scale-space extrema detection step, Keypoints are obtained using DoG. Here DoG is used in place of LoG, which is sensitive to noise and has many parameters; DoG is itself an approximation of LoG, which is based on the second derivative.

• Problem correction: "in place of LoG, which is sensitive to noise and has many parameters" → "in place of LoG, which has many parameters"

LoG: the image is first blurred with a Gaussian function to suppress some of its noise, and then the Laplacian Operator is applied to detect edges.

Laplacian Operator

Gaussian function

LoG
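For reference, the standard definitions behind the three labels above can be written as follows. The symbols $f$ (image), $G$ (Gaussian) and $\sigma$ (blur scale) are notation assumed here, not taken from the original page.

```latex
% Laplacian Operator: sum of the second partial derivatives
\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}

% Gaussian function (2D, isotropic)
G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)

% LoG: the Laplacian applied to the Gaussian
\nabla^2 G(x, y, \sigma) = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, G(x, y, \sigma)
```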

DoG, however, reduces the amount of computation by subtracting two blurred images whose σ values are in a 1.6 : 1 ratio.

Because this computation yields values very close to LoG, the method is reportedly used to cut the computation time spent on σ.

DoG
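As a rough numerical check of the claim that DoG closely approximates LoG, the sketch below samples both kernels on a pixel grid and measures how similar their shapes are. It relies on the standard relation G(kσ) − G(σ) ≈ (k − 1)σ²∇²G; the function names, the choice σ = 2, and the grid radius are assumptions made only for this illustration.

```python
import math

def gaussian(x, y, sigma):
    """2D isotropic Gaussian with standard deviation sigma."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def log_kernel(x, y, sigma):
    """Laplacian of Gaussian, evaluated analytically."""
    r2 = x * x + y * y
    return (r2 - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def dog_kernel(x, y, sigma, k=1.6):
    """Difference of two Gaussians whose sigmas are in a k : 1 ratio."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def cosine_similarity(a, b):
    """Shape similarity of two sampled kernels, ignoring overall scale."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))

# Assumed values for the illustration: sigma = 2 pixels, 41 x 41 grid.
sigma, radius = 2.0, 20
grid = [(x, y) for y in range(-radius, radius + 1)
               for x in range(-radius, radius + 1)]
dog_vals = [dog_kernel(x, y, sigma) for x, y in grid]
log_vals = [log_kernel(x, y, sigma) for x, y in grid]

similarity = cosine_similarity(dog_vals, log_vals)
print(f"cosine similarity between DoG and LoG kernels: {similarity:.3f}")
```

A similarity near 1 means the two kernels differ mostly by a scale factor, which is why subtracting two blurred images can stand in for an explicit Laplacian filter.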

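In the step described in statement 2, an image is converted to several scales and Keypoints are obtained from the extrema at particular points. In the process, Keypoints exist for each scaled image, and because each scaled image uses the same coordinate system as the original image, the Keypoint positions can be marked on the original image without any computation.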

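Continuing from the previous statement, Keypoints are found with DoG on images of several sizes, each with a different Scale value. The output images all have a Scale different from the original image, so the coordinates in each of them mean something different, and a Keypoint position has to be converted before it can be marked on the original image.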
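The conversion can be sketched as below. This assumes the common pyramid layout in which octave o is the original image downsampled by a factor of 2^o (octave 0 being the original size); the text above only says the scales differ, so the factor of 2 and the function name are assumptions for illustration.

```python
def to_original_coords(x, y, octave):
    """Map a Keypoint found in a downsampled octave back to original-image pixels.

    Assumes octave `octave` is the original image downsampled by 2**octave.
    """
    factor = 2 ** octave
    return x * factor, y * factor

# The same pixel position in different octaves refers to different
# locations in the original image.
for octave in range(3):
    print(octave, to_original_coords(10, 20, octave))
```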

To build the Gradient feature vector, if the area around a Keypoint is divided using a 32x32 window, a 256-dimensional feature vector holding 8 directions of information is obtained.

The image above assumes a 16x16 Window around a single point; grouping this Window into 4x4 blocks gives an 8 (orientation) x 16 (window) feature vector.

Simply assuming a 32x32 Window does not by itself give an 8x32 feature vector. You also have to decide into what ?x? windows the 32x32 Window is grouped before building the feature vector. That value determines whether the result is 8x16 or 8x32, so the SubWindow size must be specified.
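To make the dependence on the SubWindow size concrete, here is a minimal sketch that builds one 8-bin orientation histogram per SubWindow and concatenates them. It is a simplification, not the full SIFT descriptor: Gaussian weighting, interpolation between bins, rotation to the dominant orientation, and normalization are all omitted, and the function name and parameters are assumptions for illustration.

```python
import math

def gradient_descriptor(patch, sub=4, bins=8):
    """Concatenate per-SubWindow orientation histograms of a square patch.

    patch: list of rows of pixel intensities (n x n, n divisible by sub).
    Returns a list of length (n // sub) ** 2 * bins.
    """
    n = len(patch)
    if n % sub != 0:
        raise ValueError("window size must be a multiple of the SubWindow size")
    cells = n // sub
    desc = [0.0] * (cells * cells * bins)
    for r in range(n):
        for c in range(n):
            # Central differences, clamped at the patch border.
            gx = patch[r][min(c + 1, n - 1)] - patch[r][max(c - 1, 0)]
            gy = patch[min(r + 1, n - 1)][c] - patch[max(r - 1, 0)][c]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            b = min(int(angle / (2 * math.pi) * bins), bins - 1)
            cell = (r // sub) * cells + (c // sub)
            desc[cell * bins + b] += magnitude
    return desc

# A horizontal intensity ramp, used only as dummy input.
patch16 = [[float(c) for c in range(16)] for _ in range(16)]
patch32 = [[float(c) for c in range(32)] for _ in range(32)]

print(len(gradient_descriptor(patch16, sub=4)))  # 16 SubWindows x 8 bins
print(len(gradient_descriptor(patch32, sub=8)))  # 16 SubWindows x 8 bins
print(len(gradient_descriptor(patch32, sub=4)))  # 64 SubWindows x 8 bins
```

The same 32x32 Window gives different vector lengths for different SubWindow sizes, so the window size alone does not fix the dimension.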

The f vector that represents a SIFT Feature also contains information about scale.