Korea Advanced Institute of Science and Technology (KAIST) Library

Main Bibliographic Information
A low cost 210 fps/core corner detector hardware for HD video based on FAST algorithm = FAST 알고리즘 기반의 210 fps/core 저비용 코너 검출기 하드웨어
Title / Author	A low cost 210 fps/core corner detector hardware for HD video based on FAST algorithm = FAST 알고리즘 기반의 210 fps/core 저비용 코너 검출기 하드웨어 / Jun-Seok Park.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2011].

Holdings

Registration number	8024225
Location / Call number	Academic Cultural Complex (Cultural Hall), Preservation Stacks / MEE 11169
Status	Available (not for loan)

Abstract

Corner detection is the first step of many vision tasks, such as object tracking, localization, SLAM (simultaneous localization and mapping), image matching and recognition, and image stitching. However, the mobile environment imposes many restrictions, such as size, weight, and especially power. To achieve the best performance with limited resources, many researchers have therefore tried to optimize vision processing at the hardware level. In this work, we propose a new corner detector based on the FAST corner detector, which is optimized for CPU-based platforms. To achieve both high performance and low power, we use three approaches: a new detection method based on a string searching algorithm, new hardware that reduces redundant computation and increases performance through bit-level parallelism, and delay reduction.

The segment test of the FAST algorithm considers a circle of sixteen pixels around the corner candidate P. It classifies P as a corner if there exists a set of 9 contiguous pixels in the circle that are all brighter than $I_P + t$, where $I_P$ is the intensity of the candidate pixel and $t$ is a threshold, or all darker than $I_P - t$. The decision tree approach learns a tree of branch statements that decides whether a pixel is a corner with a minimal number of comparisons; as the tree is trained, the entropy is reduced. FAST corner detection based on a decision tree is therefore well optimized for CPU-based platforms. However, because it requires more than 20 KB of I-cache and many branch operations, it can be a major cause of reduced performance on mobile systems.

The proposed method is based on a string searching algorithm, which finds a short string in a long text. On a PC system this method is slower than the decision tree method because it performs many redundant comparisons. However, the proposed hardware accelerates it by avoiding redundant calculation and executing the process in parallel using bit-level parallelism. We prove that the proposed method takes 4 execution cycles in the worst case to decide whether a pixel is a corner. The method can be implemented with simple hardware consisting of an array of 9 AND gates. We optimize this hardware with a carry look-ahead adder to shorten the critical path. The proposed hardware, which consists of 300 logic gates, completes the segment test of a pixel in 3~4 cycles, so it achieves 12~15 frames/s at a resolution of 1920 x 1080 (HD image) at an operating frequency of 100 MHz. Including all additional logic, the hardware uses only 4,500 logic gates. Because it is small, it can be adopted in embedded systems and mobile environments, whereas other works use more than 500K gates.

The proposed hardware is coded and synthesized at 400 MHz. Furthermore, the operating frequency is extended to 1.75 GHz by full-custom hardware design and a delay optimization technique. Previous works can be used only as dedicated hardware because of their limited operating frequency, which ranges from 100 MHz to 200 MHz. The proposed work, in contrast, can be widely adopted even in high-end GPUs and commercial mobile CPUs, which operate at 1.5 GHz and 1.2 GHz respectively. Its performance scales well, since it increases linearly with the operating frequency and the number of cores.

The proposed method also detects corners more energy-efficiently than previous works. Performance normalized by gate count, used as an estimate of power efficiency, is more than 50 times higher than that of previous works, because the algorithm is simple and the hardware is small. The results show that the proposed method is a practical solution for detecting corners in high-resolution video streams in mobile and GPGPU environments.
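
The bit-level view of the segment test can be illustrated in software. The following is a minimal Python sketch, not the hardware described in the thesis: it assumes the 16 circle pixels are given in order around the circle, reduces each pixel to one bit (brighter than $I_P + t$, or darker than $I_P - t$), packs the bits into a 16-bit word, and ANDs that word with eight rotated copies of itself, so a bit survives only where 9 cyclically contiguous bits are set. The names `rotl16`, `has_contiguous_run`, and `segment_test` are hypothetical and chosen for illustration.

```python
N_CIRCLE = 16                  # pixels on the circle around the candidate
ARC = 9                        # contiguous pixels required by the segment test
MASK = (1 << N_CIRCLE) - 1     # keeps values within 16 bits


def rotl16(x, k):
    """Rotate a 16-bit word left by k positions (the circle is cyclic)."""
    k %= N_CIRCLE
    return ((x << k) | (x >> (N_CIRCLE - k))) & MASK


def has_contiguous_run(bits, length=ARC):
    """True if `bits` contains `length` cyclically contiguous set bits.

    After ANDing with rotations 1..length-1, bit i is still set only if
    bits i, i-1, ..., i-(length-1) (mod 16) were all set in the input.
    """
    acc = bits
    for k in range(1, length):
        acc &= rotl16(bits, k)
    return acc != 0


def segment_test(center, circle, t):
    """Segment test on one candidate pixel (illustrative, assumed interface).

    center: intensity I_P of the candidate pixel
    circle: 16 intensities, in order around the circle
    t:      threshold
    """
    brighter = 0
    darker = 0
    for i, p in enumerate(circle):
        if p > center + t:
            brighter |= 1 << i
        elif p < center - t:
            darker |= 1 << i
    return has_contiguous_run(brighter) or has_contiguous_run(darker)
```

For example, with a centre intensity of 100 and `t = 20`, a circle in which nine consecutive entries are 150 and the rest are 100 is classified as a corner, while a circle with only eight consecutive bright entries is not. All sixteen comparisons and the run check operate on whole words, which is the kind of bit-level parallelism the abstract refers to; how the thesis maps this onto its AND-gate array and cycle count is not shown here.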

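The corner detector is the underlying algorithm of many vision applications, such as object tracking, augmented reality, SLAM (simultaneous localization and mapping), image matching and recognition, and image stitching. Despite advances in hardware, the corner detector demands so much computation that processing high-resolution video streams in real time is still difficult. In addition, as smartphones have become widespread, interest in real-time vision applications in the mobile environment is growing. The mobile environment, however, has many hardware constraints, such as power limits imposed by the battery, lower CPU performance, and limited cache capacity, so desktop-oriented optimization techniques are not suitable. Research on a corner detector suited to the mobile environment is therefore needed. In this work, we propose a new corner detector based on the FAST algorithm, which is well suited to CPUs. We obtained high performance and low power through broadly four approaches: a new FAST corner detection method based on pattern matching, hardware that supports smart prediction and bit-level parallel acceleration, and delay optimization using a carry look-ahead adder.

The segment test of the previously published FAST algorithm recognizes a corner when, among the 16 points on a circle, 9 or more contiguous points are brighter than the point at the center of the circle, or 9 or more contiguous points are darker. The decision tree approach builds branch statements in the form of a tree through training and uses them to decide whether a pixel is a corner with a minimal number of comparisons. It is well suited to achieving high performance on CPU-based hardware. However, the decision tree requires 20 KB or more of instruction memory and frequent branches, so applying it to a mobile CPU or a GPGPU architecture actually reduces speed. When the FAST algorithm is run on a mobile CPU, it delivers about 11~13 frames/s at a resolution of 320 x 240.

To accelerate the detector without a decision tree, the proposed method adapts string searching. String searching is an algorithm for finding a particular string in a long text. If the one-dimensional array in which the values of the 16 points on the circle repeat periodically is regarded as the text, and a one-dimensional array of 9 contiguous bright or dark points is regarded as the string, the FAST corner detection algorithm can be mapped to a string searching problem. On a CPU-based system this approach performs worse than the decision tree, because the decision tree performs only the minimum number of comparisons whereas string searching involves many redundant operations. With the help of hardware that has the proposed characteristics, however, unnecessary computation can be avoided as far as possible and bit-level parallelism can be maximized to raise performance. The proposed method is proven to take 4 cycles in the worst case to judge one corner. It can be implemented with simple hardware made of an array of 9 AND gates. Because delay can become a serious problem in this structure, we optimized it by applying a carry look-ahead adder. Since the proposed method can test one corner in 3~4 cycles with small hardware of 300 gates, a performance of 12~15 frames/s is expected for HD images on hardware with an operating frequency of 100 MHz.

Even including the additional control logic, the hardware is small enough to be built from 4,500 logic gates. Whereas previously implemented hardware was built from more than 500,000 gates and could be used only as a single dedicated chip, the proposed hardware can be applied to embedded systems or mobile environments without burden thanks to its small size. The proposed hardware can be synthesized at 400 MHz. Furthermore, through layout and delay optimization, the operating frequency was raised to 1.75 GHz. Whereas previously published hardware had a very narrow range of applications because of its limited operating frequency of 100~200 MHz, the proposed hardware can also be applied to commercial GPUs (1.5 GHz) and mobile CPUs (1.2 GHz). Because performance increases linearly with frequency and the number of cores, it is also flexible in tuning performance to the application.

The proposed method computes more energy-efficiently than existing methods. Compared with published results, it can hardly be called the best in terms of performance alone. However, because it uses so few gates, it consumes little power and occupies little chip area. In terms of performance per energy consumed or performance per area, it is therefore more than 50 times better than existing results. In conclusion, the proposed method offers a practical solution for corner detection on high-resolution video streams in mobile and GPGPU environments.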
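
The frame rates quoted in the abstract can be checked with simple arithmetic. The sketch below is an idealized estimate, not a model from the thesis: it assumes that every pixel of a 1920 x 1080 frame receives one segment test, that each test costs a fixed number of cycles, and that there is no memory or control overhead. The function name `frames_per_second` and its parameters are hypothetical.

```python
def frames_per_second(freq_hz, cycles_per_pixel, width=1920, height=1080, cores=1):
    """Idealized throughput: one segment test per pixel, no other overhead.

    Throughput scales linearly with clock frequency and with the number
    of cores, as stated in the abstract.
    """
    cycles_per_frame = width * height * cycles_per_pixel
    return cores * freq_hz / cycles_per_frame


if __name__ == "__main__":
    # Worst case of 4 cycles per pixel at 100 MHz (about 12 frames/s).
    print(round(frames_per_second(100e6, 4), 2))
    # Same worst case at the 1.75 GHz full-custom clock, single core.
    print(round(frames_per_second(1.75e9, 4), 2))
```

Under these assumptions, 4 cycles per pixel at 100 MHz gives about 12 frames/s, matching the lower end of the quoted 12~15 frames/s, and the same worst case at 1.75 GHz gives just under 211 frames/s per core, consistent with the 210 fps/core figure in the title. The ideal 3-cycle case at 100 MHz comes out slightly above 15 frames/s, so the quoted range presumably reflects a mix of 3-cycle and 4-cycle pixels or other overheads not modelled here.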

Other Bibliographic Information

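Call number	MEE 11169
Physical description	vi, 46 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean : 박준석. Advisor's name in English : Lee-sup Kim. Advisor's name in Korean : 김이섭
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Department of Electrical Engineering,
Bibliography note	References : p. 42-43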
