Main bibliographic information
확률벡터와 메타 범주를 이용한 최적 문서범주화 모델 = Optimizing for text categorization using probability vector and meta category
Title / Author: 확률벡터와 메타 범주를 이용한 최적 문서범주화 모델 = Optimizing for text categorization using probability vector and meta category / 권오욱 (Oh-Woog Kwon).
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 1995].
Online access: Restricted (full text available after login)

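Holdings

Registration no.: 8005588
Location: Academic Cultural Complex (학술문화관), preservation stacks
Call number: MCS 95007
Status: Available (not for loan)

Registration no.: 9001742
Location: Seoul, thesis shelves
Call number: MCS 95007 c. 2
Status: Available (not for loan)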

Abstract

Traditionally, the concept of a document has been represented using multiple predefined categories. Most previous text categorization models did not consider the similarity between a document and a compound category, that is, a category consisting of multiple categories. Instead, they computed the similarity between each individual category and the document to rank candidate categories, and then assigned one or more top-ranking categories to the document using experimental and statistical criteria. This thesis presents a new text categorization model that associates a compound category with a document to improve both recall and precision. A compound category is treated as a "meta category". In the model, documents, categories, and meta categories are represented by probability vectors composed of terms and their weights. The probability vector of a category or meta category can be learned incrementally from sample documents. The model selects the meta category most relevant to a document using the cross entropy between the meta category and the document. The model was implemented using simulated annealing, and an experiment was carried out on the Reuters-22173 corpus. The experimental results show a micro-averaged recall of 69% and a micro-averaged precision of 72% on a sparse training data set, and a micro-averaged recall of 89% and a micro-averaged precision of 84% on a sufficient training data set. These recall and precision values are higher than those of Lewis's model, which used Bayesian probability with experimental and statistical criteria.
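
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the term weighting, the rule for combining categories into a meta category, or the direction of the cross entropy, so the add-alpha smoothing, the averaging of member vectors, and the formula H(d, c) = -sum_t d(t) log c(t) are all assumptions made here for illustration. The thesis searches with simulated annealing; this sketch swaps in an exhaustive search over small category subsets, which is only practical for a handful of categories. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def probability_vector(tokens, vocab, alpha=1.0):
    """Term distribution over vocab. Add-alpha smoothing is an assumption."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def meta_category_vector(member_vectors):
    """Combine category vectors by averaging (an assumed combination rule)."""
    n = len(member_vectors)
    return {t: sum(v[t] for v in member_vectors) / n for t in member_vectors[0]}


def cross_entropy(doc_vec, cat_vec):
    """Assumed form: H(d, c) = -sum_t d(t) * log c(t). Lower is a better match."""
    return -sum(p * math.log(cat_vec[t]) for t, p in doc_vec.items() if p > 0)


def best_meta_category(doc_vec, categories, max_size=2):
    """Exhaustive search over category subsets (stand-in for simulated annealing)."""
    best = None
    for k in range(1, max_size + 1):
        for names in combinations(sorted(categories), k):
            vec = meta_category_vector([categories[n] for n in names])
            score = cross_entropy(doc_vec, vec)
            if best is None or score < best[1]:
                best = (names, score)
    return best


# Toy data, invented for this example only.
vocab = ["wheat", "corn", "oil", "price"]
categories = {
    "grain": probability_vector(["wheat", "corn", "wheat", "price"], vocab),
    "crude": probability_vector(["oil", "oil", "price"], vocab),
}
mixed_doc = probability_vector(["wheat", "oil", "price", "oil", "wheat"], vocab)
grain_doc = probability_vector(["wheat", "corn", "wheat", "corn", "wheat"], vocab)

mixed_choice, mixed_score = best_meta_category(mixed_doc, categories)
grain_choice, grain_score = best_meta_category(grain_doc, categories)
```

On this toy data, the document that mixes grain and oil terms is matched to the two-category meta category, while the grain-only document is matched to the single category. That is the behavior the abstract attributes to scoring compound categories directly rather than ranking categories one at a time.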

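Other bibliographic information

Call number: MCS 95007
Physical description: iv, 45 p. : illustrations ; 26 cm
Language: Korean
General notes: Author's name in English: Oh-Woog Kwon
Advisor's name in Korean: 최기선
Advisor's name in English: Key-Sun Choi
Thesis: Thesis (Master's) - Korea Advanced Institute of Science and Technology : Department of Computer Science,
Bibliography note: References: p. 41-45
Subjects: Categorization (Linguistics)
Text processing (Computer science)
Statistical methods.
Document retrieval. -- Science and Technology Terminology Thesaurus
Categorization. -- Science and Technology Terminology Thesaurus
Probability model. -- Science and Technology Terminology Thesaurus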