한국과학기술원 도서관

서지주요정보
Classification algorithms via integer optimization = 정수최적화를 이용한 분류 해법
서명 / 저자	Classification algorithms via integer optimization = 정수최적화를 이용한 분류 해법 / Ja-Young Kang.
발행사항	[대전 : 한국과학기술원, 2010].
Online Access	원문보기 원문인쇄

소장정보

등록번호

8021943

소장위치/청구기호

학술문화관(문화관) 보존서고

DIE 10009

휴대폰 전송

도서상태

이용가능(대출불가)

사유안내

반납예정일

리뷰정보

초록정보

In this thesis, classification problems are considered. There are various classification algorithms, but they have some weak points in their modeling and it is generally accepted that no algorithm is superior to others on all data sets. Therefore, it is still worthwhile to develop new classification algorithms. Due to the heavy burden in computation of integer optimization, integer optimization-based classification methods had not been seriously investigated. However, since the late 2000s, novel classification approaches via integer optimization have been proposed with some ideas to overcome the computational issues. These new algorithms are different from the previous classification algorithms via integer optimization in two points. First, these new algorithms have excellent modeling power. Most previous approaches generate a single hyperplane to separate two classes and their application is limited to the cases of which two classes can be mostly separated by a single hyperplane. However, these new algorithms make groups of points of the same class and determine non-overlapping group regions and finally they predict class labels of unknown points based on the group regions. Therefore, they can construct non-linear decision boundaries between classes and can model various data distributions. Second, these new algorithms alleviate the computational burden of integer optimization with some ideas. Until now, two ideas are presented to reduce the computational load. One is to make clusters of points and assign clusters, not individual points, to groups. By doing this, the size of the integer optimization problem can be reduced and the computational load can be alleviated. The other one is to start with a simplified integer optimization problem and iteratively modify and solve it. By using this iterative solution approach, the structural issue which increases the computational burden can be avoided. In this study, we provide a unified view on the classification algorithms via integer optimization to compare them with the other classification approaches and propose two classification algorithms via integer optimizations to improve the previous ones. In Chapter 2, Clustering and Optimization-based Algorithm(COA) is proposed for the binary classification problem. COA was firstly presented by Bertsimas and Shioda(2006). It was intended to improve the generalization performance via optimization and to improve the computational performance via clustering. Its main training procedure consists of three steps. The first step is to make clusters of points by incremently merging neighbor points of the same class. The second step is to assign clusters to groups via integer optimization. The third step is to determine non-overlapping group regions. After the training procedure, non-overlapping regions for a class are obtained and unknown points can be classified based on the regions. COA has potential to model various data distribution, but the previous algorithm has some limitations in modeling since its clustering algorithm may provide inapproapriate clustering results. Therefore, we propose a clustering algorithm to overcome this issue of the previous COA. Computational tests were preformed to compare the proposed algorithm and other algorithms including the previous COA and the results show the effectiveness of the proposed algorithm. In Chapter 3, Iterative integer Optimizabion-based Algorithm(IOA) is proposed for the multi-class classification problem. IOA was firstly presented by Xu and Papageorgiou(2009). It was intended to improve the generalization performance via integer optimization and to improve the computational performance via iterative solution approach. Similar to COA, IOA determines group assignment of points via integer optimization and depending on the data distributions, multiple groups for each class should be considered to model data distributions properly. But, when we consider multiple groups simultaneously, the corresponding integer optimization problem has a structural issue which increases computational burden unnecessarily. Therefore, iterative solution approach was adapted to avoid this difficulty. The previous IOA improved the compurational performance of integer optimization-based classification algorithm by adapting iterative solution approach, but it restricts the shape of group regions as hyper-boxes. Therefore, to overcome this representational limitations of the previous IOA, we propose another IOA, which uses new Mixed-Integer Programming(MIP) models for the group assignment and a new classifying procedure. To compare the proposed IOA with other algorithms including the previous IOA, computational tests were conducted and the results show the effectiveness of the proposed algorithm.

본 논문에서는 분류 문제(classification problem)에 대한 두 가지 정수최적화 기반 해법을 제안한다. 일반적으로 정수최적화는 NP-hard이므로 문제 크기가 큰 분류 문제에는 적합하지 않다는 인식이 있었다. 그러므로 분류 문제에 정수최적화를 적용한 연구가 충분히 이루어지지 않았었는데 2000년대 후반에 들어서면서 정수최적화를 적용한 새로운 분류 해법들이 발표되기 시작했다. 이 새로운 해법들은 기존의 정수최적화 해법들에 비해 두 가지 측면에서 차별화된다. 첫번째는 이들의 우수한 모델링 능력을 들 수 있다. 기존의 정수최적화 기반 해법들이 두개의 클래스(class)를 하나의 초평면(hyperplane)으로 구별하는 단순한 형태의 분류기(classifier)를 생성함으로써 그 모델링 능력의 한계가 두드러졌다면, 새로운 정수최적화 기반 해법들은 같은 클래스에 속한 포인트(point)들을 적절하게 그룹(group)에 할당하고 그룹 영역을 정의한 후 이에 근거하여 예측함으로써 비선형 경계를 요구하는 다양한 데이터 분포에 대해서도 적절하게 모델링하는 능력을 갖추고 있다. 두번째로는 새로운 정수최적화 기반 해법들이 정수최적화의 계산부담을 줄이기 위한 아이디어를 적용함으로써 그 적용 범위를 넓혔다는 것을 들 수 있다. 현재까지 발표된 아이디어는 두 가지인데 하나는 군집화(clustering) 한후 포인트 대신 클러스터(cluster)을 그룹에 할당함으로써 정수최적화의 문제 크기를 줄이는 것이고, 다른 하나는 반복적인 접근 방법(iterative solution approach)을 통하여 정수최적화 문제가 갖는 구조적인 어려움을 피하는 것이다. 본 논문에서는 이러한 정수최적화 기반 분류 해법들이 정수최적화를 사용하지 않는 기존의 해법들과 비교할 때에 갖는 의미를 정리하고 또한 이들의 성능을 향상시키는 새로운 정수최적화 기반 분류 해법을 제안하였다. 2장에서는 두 개의 클래스를 고려한 분류 문제를 다루며 이에 대하여 군집화와 최적화 기반 분류 해법(Clustering and Optimization-based Algorithm;COA)을 제안하였다. COA는 Bertsimas와 Shioda\cite{BS}에 의해 처음으로 개발된 방법으로써 최적화기법을 통하여 예측력을 높이고, 군집화(clustering)를 통하여 정수최적화의 계산부담을 줄이고자하는 새로운 분류 알고리듬이다. COA의 트레이닝 과정(training procedure)는 크게 세 단계로 구성된다. 첫번째 단계에서는 같은 클래스에 속하며 가까이 위치한 포인트들을 클러스터로 군집화하고, 두번째 단계에서는 한 클래스의 클러스터들을 적절한 수의 그룹에 할당한다. 세번째 단계에서는 그룹과 다른 클래스의 클러스터 사이를 가르는 초평면을 결정하여 각 그룹의 영역을 정의한다. 이러한 트레이닝 과정을 거치게 되면 두 클래스 중 한 클래스에 대하여 서로 겹치지 않는 여러 개의 그룹 영역이 결정되는데 클래스가 주어지지 않은 포인트에 대한 분류는 이러한 그룹 영역에 기반하여 이루어지게 된다. 그런데 Bertsimas와 Shioda\cite{BS}가 제안한 기존의 군집화 알고리듬은 서로 다른 클래스의 영역을 완벽하게 분할하지 못할 가능성을 가지고 있다. 그러므로 본 논문에서는 이러한 모델링 상의 취약점을 극복하는 새로운 군집화 알고리듬을 제안하였다. 새로운 군집화 알고리듬을 탑재한 COA의 성능을 비교 평가하기 위해 실제 데이터와 가상 데이터에 대하여 실험을 수행하였다. 그 결과 새로운 COA는 최적화 기반 해법인 기존의 COA, SVM과 근사하거나 더 좋은 예측 성능을 보여주었고, 휴리스틱인 의사결정트리(decision trees)에 비해서는 월등히 좋은 예측 성능을 보여주었다. 또한 기존의 COA와 SVM이 데이터에 따라 예측력이 크게 떨어지는 경우가 있었던 것에 비해 새로운 COA는 다양한 데이터에 대해 안정적으로 우수한 예측 성능을 보여주었다. 이것은 새로운 군집화 알고리듬으로 인하여 COA의 모델링 능력이 향상되었음을 의미한다. 3장에서는 다수의 클래스를 고려한 분류 문제를 다루며 이에 대하여 반복적으로 정수최적화를 적용하는 분류 해법(Iterative integer Optimization-based Algorithm; IOA)를 제안하였다. IOA는 Xu와 Papageorgiou$\cite{XuPapa}$에 의해 처음으로 개발된 방법으로써 최적화 기법을 통하여 예측력을 높이고, 반복적 접근 방식(iterative solution approach)을 통하여 정수최적화의 계산부담을 줄이고자 하는 새로운 분류 알고리듬이다. IOA는 다양한 데이터 분포를 적절하게 모델링하기 위해서는 정수최적화를 통하여 포인트들을 적절한 수의 그룹에 할당하는데 동시에 여러 개의 그룹을 고려하여 정수계획모형을 풀면 모형의 구조적인 특성으로 인하여 불필요한 계산부담을 떠앉게 된다. 이러한 문제를 해결하기 위하여 Xu와 Papageorgiou\cite[27]는 한번에 한 그룹씩 추가하여 정수최적화를 적용하되 기존 그룹에 대한 결과는 고정시키는 반복적 접근 방법을 적용하였는데 이는 IOA를 다른 정수최적화 기반 해법과 구별해주는 특징이라 할 수 있다. 이러한 IOA는 COA와 유사하게 그룹 영역에 기반하여 새로운 포인트의 클래스를 예측한다. 그런데 기존의 IOA는 그룹 영역의 형태를 하이퍼박스(hyper-box)로 제한하고 있으며 이로 인하여 IOA의 성능이 떨어질 수 있다. 그러므로 본 논문에서는 기존 IOA의 이러한 취약점을 극복하기 위하여 포인트의 그룹 할당을 위한 새로운 정수계획모형과 새로운 분류 과정(classifying procedure)를 개발하였다. 새로운 IOA의 성능 평가를 위하여 수행한 실험 결과 새로운 정수계획모형이 기존의 모형에 비하여 더 안정적으로 그룹 할당 결과를 제공하는 것으로 나타났으며 새로운 IOA는 기존의 IOA보다 우수한 예측 성능을 보여주었다. 실험에는 두 개의 IOA이외에도 일곱 개의 다른 알고리듬들이 테스트되었는데 제안하는 IOA는 SVM, Logistic Regression, Multilayer Perceptron과 더불어 우수한 성능을 보여주는 알고리즘 군을 이루었으며 특정 데이터에서는 가장 우수한 예측 성능을 보여주기도 하였다. 2장과 3장의 결과를 정리하자면, 제안하는 정수최적화 기반 분류 해법들은 기존의 정수최적화 기반 분류 해법들보다 더 우수한 성능을 나타내며, 정수최적화를 사용하지 않은 다른 분류 해법들과 비교할 때에 다양한 데이터 분포를 유연하게 모델링하여 우수한 예측력을 보인다는 측면에서 장점을 갖는 것을 확인할 수 있었다.

서지기타정보

서지기타정보
청구기호	{DIE 10009
형태사항	vii, 65 p. : 삽화 ; 26 cm
언어	영어
일반주기	저자명의 한글표기 : 강자영 지도교수의 영문표기 : Sung-Soo Park 지도교수의 한글표기 : 박성수
학위논문	학위논문(박사) - 한국과학기술원 : 산업및시스템공학과,
서지주기	References: p. 63-65

QR CODE

책소개

전체보기

나의 도서관정보

메뉴

소장정보

리뷰정보

초록정보

서지기타정보

책소개

목차

이 주제의 인기대출도서