Bibliographic information
Efficient construction of decision trees using stratified sampling
Title / Author: 층화 샘플링을 이용한 결정 트리의 효율적인 구성 방법 = Efficient construction of decision trees using stratified sampling / Moon-Young Chung.
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 2001].

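Holdings

Registration no.: 8012009
Location: Academic Cultural Complex (Cultural Complex), preservation stacks
Call number: MCS 01039
Status: Available (not for loan)

Registration no.: 9007624
Location: Seoul thesis shelves
Call number: MCS 01039 c. 2
Status: Available (not for loan)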

Abstract

Classification is an important data mining problem. Given a training database of records, each tagged with a class label, the goal of classification is to build a concise model that can be used to predict the class label of future, unlabeled records. A number of popular classifiers construct decision trees to generate class models. Generally, current algorithms to construct decision trees, including main-memory algorithms, make several scans over the training databases. Given the computation-intensive nature of decision tree construction, algorithms for efficiently constructing an approximate decision tree are of crucial importance. In this paper, we propose a novel solution to the problem of efficient decision tree construction based on the idea of stratified sampling. We observe that a decision tree structure provides a natural, effective collection of strata that can significantly improve sampling accuracy compared to simple random sampling. By employing random samples drawn from leaves of the tree to construct the new tree, our algorithm is able to construct trees very quickly with higher accuracy than those constructed using a uniform random sample of the same size (drawn from the entire database). We present theoretical results that demonstrate the superiority of our stratified sampling method for estimating class proportions, a critical element for the construction of accurate decision trees. Our theoretical analysis is validated by extensive experimental results with both real-life data sets and synthetic data sets, demonstrating the effectiveness of our proposed scheme.
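The abstract's claim that strata improve the estimation of class proportions can be illustrated with a small calculation. The sketch below is not the thesis's algorithm. It assumes proportional allocation across strata and sampling with replacement (no finite population correction), and the leaf sizes and class fractions in `leaves` are hypothetical values chosen for illustration. Under those assumptions it compares the variance of a class-proportion estimate from a uniform random sample with the variance from a stratified sample of the same size.

```python
from typing import Sequence, Tuple

# Each stratum is (number of records, fraction of records in the target class).
Stratum = Tuple[int, float]


def srs_variance(strata: Sequence[Stratum], n: int) -> float:
    """Variance of the class-proportion estimate from a uniform random sample of size n."""
    total = sum(size for size, _ in strata)
    # Overall class proportion across the whole data set.
    p = sum(size * p_h for size, p_h in strata) / total
    return p * (1.0 - p) / n


def stratified_variance(strata: Sequence[Stratum], n: int) -> float:
    """Variance of the same estimate when the n samples are allocated
    to the strata in proportion to their sizes."""
    total = sum(size for size, _ in strata)
    # Only the within-stratum variance contributes; between-stratum spread is removed.
    return sum((size / total) * p_h * (1.0 - p_h) for size, p_h in strata) / n


# Hypothetical leaves of an existing tree (illustrative numbers, not from the thesis).
leaves = [(6000, 0.95), (3000, 0.10), (1000, 0.50)]
n = 500

print("uniform random sample:", srs_variance(leaves, n))
print("stratified sample:    ", stratified_variance(leaves, n))
```

Under these assumptions the stratified variance cannot exceed the uniform one, because the overall variance p(1 - p) splits into a within-stratum part, which the stratified estimator keeps, and a between-stratum part, which it removes. The purer the leaves are with respect to the class label, the larger the gain.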

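Additional bibliographic information

Call number: MCS 01039
Physical description: [v], 45 p. : illustrations ; 26 cm
Language: Korean
General notes: Appendix A: Classification functions for the synthetic data sets
Author's name in English: Moon-Young Chung
Advisor's name in Korean: 심규석
Advisor's name in English: Kyu-Seok Shim
Thesis: Master's thesis, Korea Advanced Institute of Science and Technology : Computer Science
Bibliography: References: p. 37-41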