Korea Advanced Institute of Science and Technology (KAIST) Library

Main bibliographic information
커뮤니티 제한 검색을 위한 웹 크롤링 및 PageRank 계산 = Web crawling and PageRank calculation for community-limited search
Title / Author	커뮤니티 제한 검색을 위한 웹 크롤링 및 PageRank 계산 = Web crawling and PageRank calculation for community-limited search / Gye-Jeong Kim (김계정).
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2005].

Holdings information
Registration no.	8016266
Location / call number	Academic Cultural Complex (학술문화관, 문화관), preservation stacks / MCS 05004
Status	Available (not for loan)


Abstract

Recently, there have been a number of efforts to limit the scope of web searching. Representative work includes site-limited searching, focused crawling, and web clustering. These techniques, however, have drawbacks. Site-limited searching cannot restrict the search to semantically related web pages. Focused crawling requires crawling at query time. Web clustering incurs much overhead since it performs global clustering over a large number of web pages after crawling is completed. To address these drawbacks, we first propose the notion of community-limited searching, which allows us to limit the scope of searching to a specific community. A community is defined as a collection of semantically related web pages. To identify communities, we then propose the cluster crawler. The novelty of the cluster crawler is incremental clustering, which drastically reduces the clustering cost. Community-limited searching has the following advantages: semantically related web pages can be searched; no crawling is required at query time; and the overhead of clustering web pages is minimal. Finally, we propose a two-step method for approximately computing PageRank using communities. The proposed method first computes the local PageRank values within each community and then computes the global PageRank values based on the local ones. Experimental results show that our method reduces the estimation error to as little as 59% of that of the method by Wang et al., which uses sites in place of communities.
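The two-step computation described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the abstract does not say how local values are combined into global ones, so the sketch assumes one simple scheme (rank each community on a graph of inter-community links, then scale each page's local value by its community's rank). The names `pagerank`, `two_step_pagerank`, `links`, and `community_of`, and the damping factor 0.85, are all illustrative assumptions.

```python
from collections import defaultdict


def pagerank(nodes, edges, damping=0.85, iters=50):
    """Power-iteration PageRank over weighted edges {(src, dst): weight}."""
    nodes = list(nodes)
    n = len(nodes)
    out_weight = defaultdict(float)
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        rank = new
    return rank


def two_step_pagerank(links, community_of):
    """Approximate PageRank from (src, dst) links and a page -> community map.

    Every page that appears in `links` must be a key of `community_of`.
    """
    members = defaultdict(set)
    for page, c in community_of.items():
        members[c].add(page)

    local_edges = defaultdict(lambda: defaultdict(float))
    community_edges = defaultdict(float)
    for src, dst in links:
        cs, cd = community_of[src], community_of[dst]
        if cs == cd:
            local_edges[cs][(src, dst)] += 1.0   # intra-community link
        else:
            community_edges[(cs, cd)] += 1.0     # inter-community link

    # Step 1: local PageRank inside each community (intra-community links only).
    local = {}
    for c, pages in members.items():
        local.update(pagerank(pages, local_edges[c]))

    # Step 2 (assumed combination rule): rank the communities themselves,
    # then scale each page's local value by its community's rank.
    community_rank = pagerank(members.keys(), community_edges)
    return {p: community_rank[community_of[p]] * local[p] for p in community_of}


if __name__ == "__main__":
    community_of = {"a": 0, "b": 0, "c": 1, "d": 1}
    links = [("a", "b"), ("b", "a"), ("b", "c"),
             ("c", "d"), ("d", "c"), ("d", "a")]
    for page, value in sorted(two_step_pagerank(links, community_of).items()):
        print(page, round(value, 4))
```

Because each local vector and the community vector both sum to 1, the combined values also sum to 1, so they can be compared directly with an exact PageRank vector when measuring estimation error.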

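Recently, many techniques for limiting the search scope have been studied in the field of web search; representative work includes site-limited searching, focused crawling, and web clustering. These methods, however, have the following problems. Site-limited searching cannot limit the search scope to semantically related sites, and focused crawling must crawl at query time. Web clustering incurs a large overhead because it clusters a large number of web pages. To solve these problems, this thesis proposes the notion of community-limited searching. Community-limited searching restricts the search scope to a specific community, where a community is defined as a set of semantically related sites obtained through link-based clustering. As a method for obtaining communities, we propose the cluster crawler. Because the cluster crawler clusters web pages incrementally during crawling, it drastically reduces the clustering cost. Community-limited searching has the advantages that the search scope can be limited to semantically related web pages, no crawling is done at query time, and the clustering overhead is minimized. Finally, this thesis proposes a two-step method for computing PageRank using communities. The proposed method computes PageRank locally, community by community, in the first step and then computes PageRank globally based on those results in the second step. Compared with the method proposed by Wang, the proposed method can reduce the error of the PageRank approximation to about 59%.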
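The idea of clustering pages incrementally while crawling, rather than in a global pass afterwards, can be sketched as follows. The abstract does not give the cluster crawler's actual clustering criterion, so this sketch assumes a simple link-vote rule: a newly fetched page joins the existing community that most of its out-links point to, and otherwise starts a new community. The class `IncrementalClusterer`, the method `add_page`, and the parameter `min_links` are hypothetical names introduced for illustration.

```python
from collections import Counter


class IncrementalClusterer:
    """Assigns each page to a community at the moment it is crawled."""

    def __init__(self, min_links=1):
        self.community_of = {}      # url -> community id
        self.next_id = 0            # id to use for the next new community
        self.min_links = min_links  # links needed to join an existing community

    def add_page(self, url, out_links):
        # Count links from the new page into each already-known community.
        votes = Counter(
            self.community_of[target]
            for target in out_links
            if target in self.community_of
        )
        best = votes.most_common(1)
        if best and best[0][1] >= self.min_links:
            cid = best[0][0]        # join the most-linked community
        else:
            cid = self.next_id      # no sufficiently linked community: start one
            self.next_id += 1
        self.community_of[url] = cid
        return cid


if __name__ == "__main__":
    clusterer = IncrementalClusterer()
    crawl = [("a", ["b"]), ("b", ["a"]), ("x", []), ("y", ["x", "a", "x"])]
    for url, out_links in crawl:
        print(url, clusterer.add_page(url, out_links))
```

Each page is handled once, using only links to pages already seen, which is what keeps the cost low compared with re-clustering the whole collection. The trade-off in this simplified rule is that an early assignment is never revisited.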

Other bibliographic information

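Call number	MCS 05004
Physical description	vii, 49 p. : illustrations ; 26 cm
Language	Korean
General note	Author's name in English: Gye-Jeong Kim; advisor's name in Korean: 황규영; advisor's name in English: Kyu-Young Whang
Dissertation note	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Computer Science,
Bibliography note	References: p. 46-49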
