한국과학기술원 도서관

서지주요정보
Recovering valuable information from large dataset = 빅데이터로부터 가치 있는 정보를 복구하는 기법
서명 / 저자	Recovering valuable information from large dataset = 빅데이터로부터 가치 있는 정보를 복구하는 기법 / Daesung Kim.
발행사항	[대전 : 한국과학기술원, 2023].
Online Access	원문보기 원문인쇄

소장정보

등록번호

8040303

소장위치/청구기호

학술문화관(도서관)2층 학위논문

DEE 23032

휴대폰 전송

도서상태

이용가능(대출불가)

사유안내

반납예정일

리뷰정보

초록정보

Recovering valuable information from large datasets has become a major challenge in this era of data explosion. Large datasets seem to be random at first glance, but they are usually explainable through much small number of features. The goal of this thesis is to propose some computationally efficient, mathematical algorithms that recover the underlying signals or features. We theoretically analyze the algorithms under some probabilistic models, and we verify our findings with some simulations. Three different problems, binary classification with XOR queries, hypergraph clustering, matrix completion, are considered in this thesis. For the first problem, we provide a sharp threshold on the number of queries required to recover the binary labels, and we propose an algorithm based on belief propagation that achieves this. We assume a practical scenario where the error probability changes depending on the worker, and the proposed algorithm achieves the bound even without the knowledge of the worker reliability. For the hypergraph clustering, we first estimate the adjacency relation between nodes through convex optimization, and we recover the communities by clustering the rows of estimated adjacency matrix. Lastly, we studied nonconvex optimization and gradient descent in matrix completion. We prove that gradient descent converges to the global minima even in the presence of local minima or saddle points if accompanied with small random initialization.

데이터가 폭발적으로 증가하는 최근, 빅데이터로부터 가치 있는 정보를 복구하는 일은 아주 중요해졌다. 빅데이터는 별도의 분석 없이는 무작위 데이터처럼 보이기도 하지만, 보통 훨씬 작은 수의 특징으로부터 생성된다. 이 논문에서는 빅데이터 속에 숨어 있는 중요한 특징들을 복구하기 위한 낮은 복잡도의 수학적 알고리즘들을 제안하고자 한다. 알고리즘들은 확률적 모델을 통해 이론적으로 분석되었으며, 이론적 결과들을 뒷받침하는 시뮬레이션 결과도 제시되었다. 구체적으로 세 가지 문제, XOR 질문을 통한 이진 분류, 하이퍼그래프 군집화, 행렬 채우기가 연구되었다. 첫 번째 문제에 대해서는 이진 레이블을 복구하는 데 필요한 최소한의 질문 개수를 분석하였고, 이것을 달성하는 알고리즘을 신뢰 전파 기법을 통해 제안하였다. 질문에 대답하는 사람에 따라 에러 확률이 달라지는 현실적인 모델을 사용하였으며, 사람들의 에러 확률에 대한 정보 없이도 제안된 알고리즘은 최소한의 질문 개수로 레이블을 복구할 수 있었다. 두 번째 문제에 대해서는 노드 간의 인접 정보가 먼저 볼록 최적화를 통해 예측되었으며, 인접 행렬의 행들을 군집화함으로써 각 노드의 군집 정보를 복구하였다. 마지막으로 행렬 채우기 문제에 대해서는 비볼록 최적화와 경사 하강법이 사용되었다. 작은 무작위 초기화가 같이 사용될 경우, 경사 하강법이 극소점과 안장점이 존재함에도 불구하고 전역 최소점으로 수렴한다는 것을 증명하였다.

서지기타정보

서지기타정보
청구기호	{DEE 23032
형태사항	v, 133 p. : 삽도 ; 30 cm
언어	영어
일반주기	저자명의 한글표기 : 김대성 지도교수의 영문표기 : Hye Won Chung 지도교수의 한글표기 : 정혜원 수록잡지명 : "Binary Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm". IEEE Transactions on Information Theory, vol. 67, no. 7, pp. 4588-4612(2021) 수록잡지명 : "Robust Hypergraph Clustering via Convex Relaxation of Truncated MLE". IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 3, pp. 613-631(2020) Including appendix
학위논문	학위논문(박사) - 한국과학기술원 : 전기및전자공학부,
서지주기	References : p. 123-131
주제	XOR query Hypergraph clustering Matrix completion Belief propagation Convex optimization Nonconvex optimization Gradient descent XOR 질문 하이퍼그래프 군집화 행렬 채우기 신뢰 전파 볼록 최적화 비볼록 최적화 경사 하강법

QR CODE

책소개

전체보기

나의 도서관정보

메뉴

소장정보

리뷰정보

초록정보

서지기타정보

책소개

목차

이 주제의 인기대출도서