서지주요정보
Variable grouping by CART and combination of marginal models for large scale modelling = CART를 활용한 변수 군집화와 주변 모형 결합에 의한 거대모형 개발
Title / Author: Variable grouping by CART and combination of marginal models for large scale modelling = CART를 활용한 변수 군집화와 주변 모형 결합에 의한 거대모형 개발 / Yoon-Jung Kim.
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 2005].

Holdings

Registration number: 8015950
Location / Call number: Academic Cultural Complex (Munhwagwan), Preservation Stacks / MMA 05009
Status: Available for use; available for loan

Abstract

These days we are exposed to huge data sets in which some variables are related to each other, but these relationships are difficult to find. Data mining is a series of procedures that extract information by exploring and modelling the relationships within such data, and CART is one of the most popular data mining tools. It builds a classification tree for a categorical response variable and a regression tree for a continuous response variable. The trees are grown so that predictor variables are selected one after another in decreasing order of the amount of information each carries about the response variable, where that amount is computed conditionally on the outcomes of the predictor variables already selected in the tree construction process.

Our goal in this thesis is to find a model structure for a large set of random variables, some of which are continuous and the rest categorical. While CART is a supervised learning method, log-linear modelling is unsupervised. We use CART at an initial stage of large scale modelling to select subgroups of the random variables in the whole data set. Since CART can handle data sets of many random variables of mixed type, is easy to apply, and yields easily interpreted results, we can readily group the variables so that the variables within a group are highly associated with each other. Once the groups of random variables are obtained, we apply log-linear modelling to the individual groups and obtain graphical log-linear models whose structures are representable by graphs of vertices and edges. From each graphical model, we find a particular type of graph separator called a "prime separator", defined as a graph separator that separates cliques or irreducible cycles. Prime separators have the useful property that they remain prime separators in both a graphical model and its marginal model.
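For the special case of a decomposable (chordal) graphical model whose cliques are already known, the separators can be sketched as below. Here every prime component is a clique, so we assume the prime separators coincide with the separators of a junction tree; graphs containing irreducible cycles are outside this sketch. The junction tree is built with the standard maximum-weight spanning tree construction (Kruskal's algorithm on clique-intersection sizes), not a procedure taken from the thesis, and the function name `junction_tree_separators` is hypothetical.

```python
from itertools import combinations


def junction_tree_separators(cliques):
    """Separators of a junction tree built from the cliques of a decomposable graph.

    Assumption: the graph is chordal, so a maximum-weight spanning tree of the
    clique graph (weight = size of the clique intersection) is a junction tree.
    """
    cliques = [frozenset(c) for c in cliques]
    parent = list(range(len(cliques)))  # union-find forest over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Candidate tree edges, heaviest intersections first (Kruskal's algorithm).
    candidates = sorted(
        ((len(a & b), i, j) for (i, a), (j, b) in combinations(enumerate(cliques), 2)),
        reverse=True,
    )
    separators = []
    for weight, i, j in candidates:
        if weight == 0:
            break  # empty intersections would only join disconnected components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            separators.append(cliques[i] & cliques[j])
    return separators


# Cliques of the chordal graph with edges AB, AC, BC, BD, CD, CE, DE.
seps = junction_tree_separators([{"A", "B", "C"}, {"B", "C", "D"}, {"C", "D", "E"}])
print(sorted(sorted(s) for s in seps))
```

In this example {C} is a clique intersection but not a separator of the graph (A still reaches E through B and D), and the spanning-tree step discards it, leaving {B, C} and {C, D}.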
This property of prime separators is used in combining the marginal models of a graphical log-linear model. We found that the grouping of the random variables strongly affects the whole modelling procedure. An edge connecting a pair of random variables is likely to be missing from the combined model if no group contains both variables. To recover such missing edges, a further grouping is needed, in which a marginal model is built for a set of variables containing both variables of each pair that corresponds to a missing edge. The proposed approach was applied to simulated data of 100 random variables, 80 of which are binary and the remaining 20 continuous; the continuous variables were categorized into binary or 4-level categorical variables. The approach produced a model that detected most of the edges in the true model, along with some spurious edges that can be removed by an extended marginal modelling procedure.
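Two bookkeeping steps from the paragraph above can be sketched briefly: listing the variable pairs that share no group (whose edges, per the abstract, are likely to be missing from the combined model and call for further grouping), and discretizing a continuous variable into 2 or 4 levels. The abstract does not say how the categories were formed, so cutting at empirical quantiles is an assumption, and both function names are hypothetical.

```python
from itertools import combinations


def uncovered_pairs(variables, groups):
    """Pairs of variables that never appear together in any group.

    An edge between such a pair cannot be estimated in any marginal model,
    so these pairs are candidates for a further grouping step.
    """
    covered = set()
    for g in groups:
        for u, v in combinations(sorted(g), 2):
            covered.add((u, v))
    return [pair for pair in combinations(sorted(variables), 2) if pair not in covered]


def categorize(values, levels):
    """Cut a continuous variable into `levels` categories (0..levels-1).

    Assumption: cut points are empirical quantiles; the thesis's rule is not stated.
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(k * n) // levels] for k in range(1, levels)]
    # Category = number of cut points the value reaches or exceeds.
    return [sum(x >= c for c in cuts) for x in values]


groups = [{"A", "B", "C"}, {"C", "D", "E"}]
print(uncovered_pairs("ABCDE", groups))           # pairs needing a new group
print(categorize([1, 2, 3, 4, 5, 6, 7, 8], 4))     # 4-level version
print(categorize([1, 2, 3, 4, 5, 6, 7, 8], 2))     # binary version
```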

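Data mining is a technique for discovering relationships that are hidden in data, or too complex to be readily apparent, and for forecasting on the basis of those relationships. Because the CART algorithm identifies, one after another, the main predictor (independent) variables affecting the response (dependent) variable, regardless of linearity or the presence of association among the variables, it can serve as a major analysis tool in data mining. The aim of this thesis is to find the structure of a large-scale model whose variables include both categorical and continuous data. Since CART allows the relationships among variables to be grasped quickly and easily and the variables to be clustered into subsets, this thesis proposes the following method for developing a large-scale model: each cluster obtained through CART is analysed with log-linear modelling to obtain a marginal graphical model, the "prime separators" of each model are found, and the marginal models are combined using these separators as a skeleton. Finally, by applying the method to data, we confirmed the efficiency of this model-search procedure and point out the problems it raises.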
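The CART-based grouping step can be sketched as follows, assuming every variable is already categorical (as after the discretization described above). The tree uses binary splits chosen by Gini-impurity reduction, with no pruning; each split is computed within the subset defined by earlier splits, mirroring the conditional selection of predictors. The grouping rule in `variable_group` (the response plus every predictor its tree uses) and all function names are hypothetical illustrations, not the thesis's exact procedure.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, response, predictors):
    """Best binary split 'x == level' vs 'x != level' by Gini reduction."""
    parent = gini([r[response] for r in rows])
    best = (0.0, None, None)  # (gain, predictor, level)
    for p in predictors:
        for level in sorted(set(r[p] for r in rows)):
            left = [r[response] for r in rows if r[p] == level]
            right = [r[response] for r in rows if r[p] != level]
            if not left or not right:
                continue  # split does not separate anything
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if parent - child > best[0]:
                best = (parent - child, p, level)
    return best


def grow_tree(rows, response, predictors, depth, used=None):
    """Grow a CART-style tree; record predictors in the order they are chosen."""
    if used is None:
        used = []
    if depth == 0 or len(rows) < 2:
        return used
    gain, p, level = best_split(rows, response, predictors)
    if p is None:
        return used  # no split reduces impurity
    if p not in used:
        used.append(p)
    # Each child is split conditionally on the outcome of the chosen predictor.
    grow_tree([r for r in rows if r[p] == level], response, predictors, depth - 1, used)
    grow_tree([r for r in rows if r[p] != level], response, predictors, depth - 1, used)
    return used


def variable_group(rows, response, predictors, depth=3):
    """Hypothetical grouping rule: the response plus every predictor its tree uses."""
    return [response] + grow_tree(rows, response, predictors, depth)


# Toy data: Y = A AND B, while C is irrelevant noise.
rows = [{"A": a, "B": b, "C": c, "Y": a & b}
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(variable_group(rows, "Y", ["A", "B", "C"]))
```

On the toy data the tree first splits on A, then on B within the A = 1 branch, and never uses C, so the group for Y is Y, A, B.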

Additional bibliographic information

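Call number: MMA 05009
Physical description: vi, 50 p. : ill. ; 26 cm
Language: English
General note: Appendix: A, Graphical terminology. - B, CART result
Author's name in Korean: 김윤정
Advisor's name in English: Sung-Ho Kim
Advisor's name in Korean: 김성호
Dissertation note: Thesis (M.S.) - Korea Advanced Institute of Science and Technology : Applied Mathematics,
Bibliography note: References: p. 48-50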
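Subjects: CART
Variable grouping
Large scale modelling
Continuous and categorical variable
Separator
CART (카트)
Variable clustering (변수 군집화)
Large-scale modelling (거대 모델링)
Continuous and discrete variables (연속과 이산 변수)
Separator (분리자)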