funMV: DPM: Deformable Part Model

See [1] for the theory. The program was downloaded from Felzenszwalb's homepage [2]; I used version 3.1. It is written mostly in Matlab, with the speed-critical parts implemented in C. The C code is compiled into mex files, which are then called from Matlab.

The code consists of a learning part and a detection part. This post covers only detection, using model files that have already been trained.

The Matlab file that runs detection is detect.m, but it is rather complicated. detect1.m [3] is a simplified version with the parts that are not needed here removed; it is listed in [4].

First, load the trained model.

>> load car_final.mat

Read the test image.

>> im = imread('000034.jpg'); % car image

Run the detector.

>> boxes = detect1(im, model, 0);

Display the result.

>> showboxes(im, boxes);

All detections are drawn on top of each other, so it is hard to pick out the real solution.
Apply non-maximum suppression and display the result again.

>> top = nms(boxes, 0.5);
>> showboxes(im, top);
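
To make the nms step less of a black box, here is a minimal greedy non-maximum suppression sketch. It is written in Python rather than Matlab and is not the release's nms.m. I am assuming that the score sits in the last column of each row (as the detection function's header comment says) and that overlap is measured as the intersection area divided by the area of the lower-scoring box; the boxes are made up.

```python
def nms(boxes, overlap):
    """Greedy non-maximum suppression (illustrative sketch).

    boxes:   rows of [x1, y1, x2, y2, ..., score]; the score is the last entry.
    overlap: a box is dropped when its intersection with an already kept,
             higher-scoring box covers more than this fraction of its own
             area (assumed overlap measure).
    """
    # Visit the boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][-1], reverse=True)
    kept = []
    for i in order:
        x1, y1, x2, y2 = boxes[i][:4]
        area = (x2 - x1 + 1) * (y2 - y1 + 1)
        suppressed = False
        for k in kept:
            kx1, ky1, kx2, ky2 = boxes[k][:4]
            # Width and height of the intersection rectangle (inclusive pixels).
            w = min(x2, kx2) - max(x1, kx1) + 1
            h = min(y2, ky2) - max(y1, ky1) + 1
            if w > 0 and h > 0 and (w * h) / area > overlap:
                suppressed = True
                break
        if not suppressed:
            kept.append(i)
    return [boxes[i] for i in kept]


detections = [
    [10, 10, 109, 59, 0.9],    # strong detection
    [14, 12, 113, 61, 0.4],    # near-duplicate of the first box
    [200, 40, 299, 89, 0.7],   # a separate object
]
top = nms(detections, 0.5)
print(top)
```

With a threshold of 0.5 the near-duplicate is dropped and the two distinct boxes survive, which is the effect seen when showboxes is called on the suppressed result.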

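While it runs, detect1.m calls several functions written in C: resize.cc, features.cc, fconv.cc, dt.cc, and so on. The original files did not compile with VS2010 on Windows 7, so I modified them, mainly in the header section.

Running the code on the test images from the original DPM paper [1] reproduced the results printed in the paper. Next, I tried images that I took myself.

I had a picture taken on the road on my way to work, so I tested it. Below is the image at its original size. It has been shrunk somewhat but is still large.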

>> size(im)

ans =

1224 1632 3

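>> boxes = detect1(im, model, 0);

>> top = nms(boxes, 0.5);

>> showboxes(im, top);

This is the detection result.

Not only the large car seen from behind but also the small oncoming car in the opposite lane, seen from the front, was detected. It looks small, but because the image itself is large, the car is not actually small in pixels.

Next, I tested an image that really is small.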

>> size(im)

ans =

398 598 3

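Each side of this image is roughly a third of the first image (about 12% of the pixels). Let's look at the result.

The car in the foreground is large and is detected, but the cars further back are too small to be detected.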
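
A rough way to see why the distant cars are missed: the root filter is only evaluated from pyramid level interval+1 upward, so the smallest window it can cover is the filter size in HOG cells times the cell size in pixels. The sketch below (Python, not part of the release) uses the car model values quoted in the listing in [4] (sbin = 8, interval = 10, root filters of 5x10 and 6x8 cells). The scale formula is my assumption about how the feature pyramid is built; it agrees with the "scale=8/1" comment at level 11.

```python
SBIN = 8                          # model.sbin from the car model dump
INTERVAL = 10                     # model.interval
ROOT_FILTERS = [(5, 10), (6, 8)]  # (rows, cols) in HOG cells, from the listing


def pyramid_scale(level, interval=INTERVAL):
    """Image scale at a 1-based pyramid level.

    Assumption: level 1 is the image at 2x, and the scale halves every
    `interval` levels, so level interval+1 is the original resolution.
    """
    return 2.0 / 2.0 ** ((level - 1) / interval)


def root_window_pixels(filter_cells, level, sbin=SBIN):
    """Height and width, in original-image pixels, covered by a root filter."""
    rows, cols = filter_cells
    px_per_cell = sbin / pyramid_scale(level)
    return rows * px_per_cell, cols * px_per_cell


for cells in ROOT_FILTERS:
    h, w = root_window_pixels(cells, INTERVAL + 1)
    print(cells, "-> smallest root window about %.0f x %.0f px (h x w)" % (h, w))
```

Under these assumptions a car has to occupy very roughly 40x80 or 48x64 pixels before a root filter can match it, which fits the observation that shrinking the image makes the far cars disappear.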

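Now let's detect people. This picture was taken at a crosswalk, and the people are well separated from one another.

>> load person_final.mat % load the trained model for person detection

>> im = imread('people1.jpg'); % read the test image

>> size(im) % print the size of the input image

ans =

586 888 3

This image is smaller than the large road image above.

>> boxes = detect1(im, model, 0); % run person detection

>> top = nms(boxes, 0.5); % non-maximum suppression (removes multiple responses)

>> showboxes(im, top); % display the result

Overall the detection works well, but there seem to be small errors at the left and right edges.

Here is a more complex image.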

>> im=imread('people2.jpg');

>> size(im)

ans =

618 940 3

>> boxes=detect1(im,model,0);

>> showboxes(im,boxes);

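Here the detections are displayed without non-maximum suppression.

The detection windows overlap so heavily that the result is hard to check. This time nms is applied.

Most of the overlap is removed, but a few false detections remain. Apart from people very far away, the people whose outline is visible are detected. The red box is the object's root window, and the blue boxes inside the root box are the part windows. For both cars and people, six parts are configured.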
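
The root and part boxes drawn here all come from one row of the boxes matrix. As a sketch (Python, with made-up coordinates), this is how such a row can be split, assuming the layout described in the header comment of the detection function in [4]: root box first, then one box per part, then the component index, then the score.

```python
def split_detection(row):
    """Split one detection row into named pieces.

    Assumed layout: [root x1 y1 x2 y2, part1 x1 y1 x2 y2, ..., component, score].
    """
    if len(row) < 6 or (len(row) - 2) % 4 != 0:
        raise ValueError("unexpected row length: %d" % len(row))
    coords, component, score = row[:-2], row[-2], row[-1]
    # Cut the coordinate run into consecutive groups of four.
    boxes = [tuple(coords[i:i + 4]) for i in range(0, len(coords), 4)]
    return {"root": boxes[0], "parts": boxes[1:],
            "component": component, "score": score}


# Made-up row: one root box, six part boxes, component 1, score 0.37.
row = [50, 60, 210, 140]
for k in range(6):
    row += [50 + 20 * k, 60, 90 + 20 * k, 100]
row += [1, 0.37]

det = split_detection(row)
print(len(row), det["root"], len(det["parts"]), det["component"], det["score"])
```

With six parts a row has 4*(1+6)+2 = 30 columns, matching the width of tmp in the listing.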

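If you download and install the code from the homepage of the paper's authors, you will find that many object categories have already been trained.

Load a model into Matlab with the load command and run detection on a test image, just as in the car and person examples.

To detect a new kind of object, you need to run the learning part of the code to produce an object-category .mat file like the ones above.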

References
[0] Setup
After moving to another computer, changing the Matlab version, or reinstalling, the functions may fail to run. The DPM code does its heavy computation in C functions, so these must be recompiled in the current environment.
(1) >> mex -setup: first select a C compiler for the current environment. Running this command lists the available compilers, and one of them must be selected.
(2) Five files need to be compiled: yprime.c, dt.cc, fconv.cc, features.cc, and resize.cc.
>> mex yprime.c
>> mex dt.cc
>> ...
(3) Once the mex files have been built, detect1.m can be run.

[1] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, Sep. 2010
[2] http://www.cs.berkeley.edu/~rbg/latent/index.html
[3] d:\temp_mat
[4] detect1.m:

function [boxes] = detect(input, model, thresh, bbox, ...
                          overlap, label, fid, id, maxsize)

% boxes = detect(input, model, thresh, bbox, overlap, label, fid, id, maxsize)
% Detect objects in input using a model and a score threshold.
% Higher threshold leads to fewer detections.
%
% The function returns a matrix with one row per detected object. The
% last column of each row gives the score of the detection. The
% column before last specifies the component used for the detection.
% The first 4 columns specify the bounding box for the root filter and
% subsequent columns specify the bounding boxes of each part.
%
% If bbox is not empty, we pick best detection with significant overlap.
% If label and fid are included, we write feature vectors to a data file.

if nargin > 3 && ~isempty(bbox)
  latent = true;
else
  latent = false;
end

if nargin > 6 && fid ~= 0
  write = true;
else
  write = false;
end

if nargin < 9
  maxsize = inf;
end

% we assume color images
input = color(input); % the image traced in the comments below is 480x640

% prepare model for convolutions
rootfilters = [];
%{
model =

  sbin: 8
  rootfilters: {[1x1 struct] [1x1 struct]}
  offsets: {[1x1 struct] [1x1 struct]}
  blocksizes: [1 775 1 744 1085 4 1085 4 620 4 620 4 992 4 496 4 992 4 496 4]
  regmult: [0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
  learnmult: [20 1 20 1 1 0.1000 1 0.1000 1 0.1000 1 0.1000 1 0.1000 1 0.1000 1 0.1000 1 0.1000]
  lowerbounds: {1x20 cell}
  components: {[1x1 struct] [1x1 struct]}
  interval: 10
  numcomponents: 2
  numblocks: 20
  partfilters: {1x12 cell}
  defs: {1x12 cell}
  maxsize: [6 10]
  minsize: [5 8]
  thresh: -1.3918
%}

% size(model.rootfilters) = 2: there are two root filters
% size(model.rootfilters{1}.w) = 5 10 31
% size(model.rootfilters{2}.w) = 6 8 31
% i.e. the first window spans 5x10 patches (cells), and each patch is a
% vector of 31 values
for i = 1:length(model.rootfilters)
  rootfilters{i} = model.rootfilters{i}.w;
end

partfilters = [];
% size(model.partfilters) = 12: there are twelve part filters
% size(model.partfilters{1}.w) = 7 5 31
% size(model.partfilters{2}.w) = 7 5 31
% i.e. 7x5 patches, and each patch is a vector of 31 values
for i = 1:length(model.partfilters)
  partfilters{i} = model.partfilters{i}.w;
end

% cache some data
for c = 1:model.numcomponents % model.numcomponents=2
  ridx{c} = model.components{c}.rootindex; % 1, 2
  oidx{c} = model.components{c}.offsetindex; % 1, 2
  root{c} = model.rootfilters{ridx{c}}.w; % root=[5x10x31]
  rsize{c} = [size(root{c},1) size(root{c},2)]; % rsize{1}= 5 10
  numparts{c} = length(model.components{c}.parts); % 6
  %{
  model.components{1}.parts{1}
  ans =
    partindex: 1
    defindex: 1
  model.components{1}.parts{2}
  ans =
    partindex: 2
    defindex: 2
  ...
  model.components{2}.parts{1}
  ans =
    partindex: 7
    defindex: 7
  ...
  %}
  for j = 1:numparts{c} % 6(c=1)
    pidx{c,j} = model.components{c}.parts{j}.partindex;
    didx{c,j} = model.components{c}.parts{j}.defindex;
    part{c,j} = model.partfilters{pidx{c,j}}.w; % part{1,1}=[7x5x31]
    psize{c,j} = [size(part{c,j},1) size(part{c,j},2)]; % psize{1,1}=7 5
    % reverse map from partfilter index to (component, part#)
    rpidx{pidx{c,j}} = [c j];
  end
end

% we pad the feature maps to detect partially visible objects
padx = ceil(model.maxsize(2)/2+1); % model.maxsize= 6 10
pady = ceil(model.maxsize(1)/2+1); % padx=6, pady=4

% the feature pyramid
interval = model.interval; % 10
[feat, scales] = featpyramid(input, model.sbin, interval);

% detect at each scale
best = -inf;
ex = [];
boxes = [];

% levels 1~10: pyramid images computed with 4x4 HOG cells
for level = interval+1:length(feat) % length(feat)=46 -> level: 11~46
  scale = model.sbin/scales(level); % scale=8/1 at level 11

  % size(feat{level}, 1) = 58, size(feat{level}, 2) = 78
  if size(feat{level}, 1)+2*pady < model.maxsize(1) || ... % 66<6
     size(feat{level}, 2)+2*padx < model.maxsize(2) || ... % 90<10
     (write && ftell(fid) >= maxsize)
    continue;
  end

  % convolve feature maps with filters
  % size(feat{11})=58 78 31
  % size(featr) = 66 90 31: feature image of the original image (480x640)
  % computed with 8x8 HOG cells, then zero-padded
  % size(featr(:,:,1))=66 90: i.e. 58x78 patches plus the padding
  featr = padarray(feat{level}, [pady padx 0], 0);
  % rootfilters = [5x10x31 double] [6x8x31 double]
  % at level=11 the root filters are convolved over the feature patches of
  % the original-size (480x640) image to search it
  % ****** the root match computation is the key step
  rootmatch = fconv(featr, rootfilters, 1, length(rootfilters));
  % rootmatch=[62x81 double] [61x83 double]

  if length(partfilters) > 0 % 12
    % size(featp)=134 182 31, size(feat{1})=118 158 31
    featp = padarray(feat{level-interval}, [2*pady 2*padx 0], 0);
    % level-interval: with level=11 and interval=10 this is feat{1}, the
    % 4x4 HOG features extracted from the original image
    %{
    partfilters=[7x5x31 double] [7x5x31 double] [5x7x31 double] [5x7x31 double]
    [5x8x31 double] [5x8x31 double] [8x4x31 double] [8x4x31 double]
    [4x8x31 double] [4x8x31 double] [4x8x31 double] [4x8x31 double]
    %}
    % ****** the part match computation is the key step
    partmatch = fconv(featp, partfilters, 1, length(partfilters));
    %{
    [128x178 double] [128x178 double] [130x176 double] [130x176 double]
    [130x175 double] [130x175 double] [127x179 double] [127x179 double]
    [131x175 double] [131x175 double] [131x175 double] [131x175 double]
    %}
  end

  for c = 1:model.numcomponents % 2
    % root score + offset
    %{
    model.offsets{1} = w: -6.7609, blocklabel: 1
    model.offsets{2} = w: -3.4770, blocklabel: 3
    ridx = 1, 2, size(rootmatch{1})=62 81
    %}
    score = rootmatch{ridx{c}} + model.offsets{oidx{c}}.w; % 62x81

    % add in parts
    for j = 1:numparts{c} % numparts=6, 6
      %{
      model.defs{1} = anchor: [1 4], w: [0.0227 0.0144 0.0202 0.0024]
      blocklabel: 6
      model.defs{2} = anchor: [16 4], w: [0.0227 -0.0144 0.0202 0.0024]
      model.defs{3} = anchor: [4 1], w: [0.0155 -0.0239 0.0560 -0.0014]
      blocklabel: 8
      ...
      model.defs{12} = anchor: [5 5], w: [0.0314 -0.0056 0.0521 -0.0033]
      blocklabel: 20
      %}

      def = model.defs{didx{c,j}}.w; % size(def)=1 4, didx{1,1}=1
      anchor = model.defs{didx{c,j}}.anchor; % size(anchor)=1 2
      % open question: isn't the anchor a position inside the root block?
      % the anchor position is shifted to account for misalignment
      % between features at different resolutions
      ax{c,j} = anchor(1) + 1; % ax=[2]
      ay{c,j} = anchor(2) + 1; % ay=[5]
      %{
      pidx =
      [1] [2] [3] [ 4] [ 5] [ 6]
      [7] [8] [9] [10] [11] [12]
      %}
      match = partmatch{pidx{c,j}}; % size(pidx)= 2 6, size(match)=128x178
      % size(M)=128x178, Ix=[128 178 int32], Iy=[128 178 int32]
      % size(score)=62x81
      % ay{c,j}:2:ay{c,j}+...=5:2:127, size(5:2:127)=62
      % ax{c,j}:2:ax{c,j}+...=2:2:162, size(2:2:162)=81
      [M, Ix{c,j}, Iy{c,j}] = dt(-match, def(1), def(2), def(3), def(4));
      score = score - M(ay{c,j}:2:ay{c,j}+2*(size(score,1)-1), ...
                        ax{c,j}:2:ax{c,j}+2*(size(score,2)-1));
      % score = 62x81(score: root)-62x81(M: part)
    end

    if ~latent
      % get all good matches
      I = find(score > thresh);

      % debug output: echo the current level and the matching indices
      level
      I

      [Y, X] = ind2sub(size(score), I);
      tmp = zeros(length(I), 4*(1+numparts{c})+2);
      for i = 1:length(I)
        x = X(i);
        y = Y(i);
        [x1, y1, x2, y2] = rootbox(x, y, scale, padx, pady, rsize{c});
        b = [x1 y1 x2 y2];

        for j = 1:numparts{c}
          [probex, probey, px, py, px1, py1, px2, py2] = ...
              partbox(x, y, ax{c,j}, ay{c,j}, scale, padx, pady, ...
                      psize{c,j}, Ix{c,j}, Iy{c,j});
          b = [b px1 py1 px2 py2];
        end

        tmp(i,:) = [b c score(I(i))];
      end

      boxes = [boxes; tmp];

    end % end of ~latent

  end % end of c

end % end of level
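
The "add in parts" loop above is the least obvious step, so here is a one-dimensional sketch of what I understand the dt call to compute. It is written in Python, uses a brute-force search instead of the linear-time algorithm in dt.cc, and rests on an assumption: that dt returns M(q) = min over p of [src(p) + a*(q-p)^2 + b*(q-p)], applied along each axis with (def(1), def(2)) and (def(3), def(4)). The response values and weights are made up.

```python
def dt1d(src, a, b):
    """Brute-force 1-D generalized distance transform (O(n^2), illustration only).

    Returns (dst, arg) where dst[q] = min over p of
    src[p] + a*(q-p)**2 + b*(q-p), and arg[q] is the minimizing p.
    """
    n = len(src)
    dst, arg = [], []
    for q in range(n):
        costs = [src[p] + a * (q - p) ** 2 + b * (q - p) for p in range(n)]
        best = min(range(n), key=costs.__getitem__)
        dst.append(costs[best])
        arg.append(best)
    return dst, arg


# Part filter response along one row; the strong peak is at index 3.
match = [0.0, 0.1, 0.2, 2.0, 0.1, 0.3]
a, b = 0.5, 0.0                        # made-up deformation weights

# As in the listing, the transform is applied to the negated response.
M, best_p = dt1d([-m for m in match], a, b)

# -M[q] is the best part score reachable from anchor position q after the
# deformation cost has been paid; best_p[q] is where the part actually sits.
part_score = [-m for m in M]
print(part_score)
print(best_p)
```

Positions next to the peak borrow its response minus a deformation penalty, while positions far away keep their own response. Because score is updated as score - M, each part adds its best displaced response to the root score. The listing then samples M at every second cell (ay:2:..., ax:2:...) because the part features are computed at twice the root resolution.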

Monday, June 23, 2014
