Bibliographic details
Query enhancement for patent prior art search with keyterm dependency relations and semantic tags
Title / Author: Query enhancement for patent prior art search with keyterm dependency relations and semantic tags / Khanh Ly Nguyen.
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 2012].

Holdings

Registration number: 8023771
Location / call number: Academic Cultural Complex (Cultural Complex), preservation stacks; MICE 12002

Status: Available (not for loan)

Abstract

The growing number of patent applications and granted patents creates a critical demand for patent search. Prior art search is one of the most common types of patent search; its goal is to find patent documents that constitute prior art to a given patent. Current patent search systems are mostly keyword-based and, because patent documents are long and have complex structures, they do not perform very well. In this research, we propose a new query formulation method for patent prior art search that identifies the most discriminative terms using keyword dependency relations (KDR). Instead of using only a single field, we aim to select the most significant field or combination of fields for query formulation. Furthermore, we determine the appropriate number of key terms to include in the query by performing experiments with different query sizes. Our work differs from previously reported work in that, instead of using only keyterm extraction based on dependency relations, we combine keyterm extraction with semantic tags, which are identified from patent documents, to find prior art patents with similar IPC codes. For prior art search evaluation, we applied a re-ranking method based on the IPC classification codes assigned to the patent document, since this method helps identify prior art patents while avoiding the extra cost of expert judgments and the incompleteness of citations.

In this work, 36 experiments were conducted, and the results show that the proposed method achieves significant improvement over the baseline. The results indicate that: 1) for query formulation from a single field, e.g. a query formulated from the top 10 terms of Abstract, improvements of 18% for Sub-class, 17% for Main-group, and 13% for Sub-group over the baseline method can be obtained; 2) for query formulation from combined fields, e.g. a query formulated from the top 10 terms of Abstract and the top 10 terms of Claims, we can achieve improvements of 16% for Sub-class, 16% for Main-group, and 13% for Sub-group over the baseline method; 3) for query formulation combined with semantic tags, e.g. for Abstract, improvements of 46% for Sub-class, 42% for Main-group, and 45% for Sub-group over the baseline method can be achieved. The experimental results also show that extracting terms from Description gave the best performance of all fields (e.g. Abstract, Claims). The reason is that the Description field specifies what the process or method of the invention is and how it differs from previous patents and technology. Identifying IFPS terms from Description yields better performance when IFPS is used as a query by itself, and the best performance when it is used in combination with query selection by KDR. This is because IFPS includes information about the areas a patent belongs to (IF), which can be very helpful for identifying the IPC sub-classes of a patent document, and it includes Problems/Solutions (PS), which relate to the limitations of previous patents and the effects of the present invention and may help identify the IPC main-groups or sub-groups of the query patent. We also show the effectiveness of IFPS terms when IFPS is combined with KDR terms or tf*idf: adding IFPS yields a much larger improvement, which shows that it is a good strategy for query expansion. Our experiments show that terms about the details of the method or process of the invention are more significant for query formulation from Abstract or Claims, while terms about limitations or effects are more significant for query formulation from Description.

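Additional bibliographic details
Call number: MICE 12002
Physical description: iv, 54 p. : illustrations ; 30 cm
Language: English
General note: Korean rendering of author's name: N. Khanh Ly
Advisor's name in English: Sung-Hyon Myaeng
Advisor's name in Korean: 맹성현
Dissertation note: Thesis (Master's) - Korea Advanced Institute of Science and Technology : Department of Information and Communications Engineering,
Bibliography note: References: p. 45-48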