KAIST Library

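Main bibliographic information
Extracting relation triples from unseen data = 관계추출 모델의 일반화를 위한 학습 방법 [A training method for generalizing relation extraction models]
Title / Author	Extracting relation triples from unseen data = 관계추출 모델의 일반화를 위한 학습 방법 / Juhyuk Lee.
Publication	[Daejeon : KAIST, 2021].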

Holdings
Registration number	8038325
Location / call number	Academic Cultural Complex (문화관), preservation stacks / MAI 21017
Status	Available (not for loan)

Abstract

Developing a relation extraction model for unstructured text is essential for automating large-scale knowledge graph maintenance. To keep a knowledge graph up to date, a model must extract relational triples from sentences that may contain unseen entities. Simply fine-tuning BERT on a relational triple extraction task performs very well on seen entities (entities in the training data), but it does not generalize well to new entities. We find that augmentation with noisy data helps extract relational triples between unseen entities, at the cost of degraded performance on seen entities. Since we then have two experts, one on seen entities and the other on unseen entities, we filter the predictions of each expert and take their union to get the best of both. Experiments on two standard benchmark datasets, NYT and WebNLG, show that our model outperforms the current state-of-the-art model on unseen data while achieving competitive results on the original seen data.
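The abstract does not say how the noisy training data is constructed. As a rough illustration only, the sketch below assumes one plausible reading: entity mentions in a labeled sentence are replaced with random strings, so the model cannot rely on memorized entity names. The helper names `random_entity` and `augment`, and the (subject, relation, object) tuple layout, are hypothetical and not taken from the thesis.

```python
import random
import re
import string


def random_entity(rng, min_len=3, max_len=8):
    """Build a made-up entity name of one or two capitalized random tokens (assumed noise scheme)."""
    tokens = []
    for _ in range(rng.randint(1, 2)):
        length = rng.randint(min_len, max_len)
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        tokens.append(word.capitalize())
    return " ".join(tokens)


def augment(sentence, triples, rng):
    """Return a copy of (sentence, triples) with every entity swapped for a random name.

    triples is a list of (subject, relation, object) string tuples whose
    subject and object appear verbatim in the sentence.
    """
    # Give each distinct entity one replacement so repeated mentions stay consistent.
    mapping = {}
    for subj, _, obj in triples:
        for ent in (subj, obj):
            if ent not in mapping:
                mapping[ent] = random_entity(rng)
    if not mapping:
        return sentence, list(triples)

    # Replace all mentions in a single pass, longest entity first, so that
    # replacement text is never rescanned and shorter names cannot clip longer ones.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    new_sentence = pattern.sub(lambda m: mapping[m.group(0)], sentence)
    new_triples = [(mapping[s], r, mapping[o]) for s, r, o in triples]
    return new_sentence, new_triples
```

Calling `augment("Seoul is the capital of South Korea.", [("Seoul", "capital_of", "South Korea")], random.Random(0))` would yield a sentence with the same structure and relation label but invented entity names. Whether the thesis uses this scheme or another form of noise is not stated in this record.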

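Studying models that extract relations from unstructured text is essential for automating knowledge graph maintenance. To keep a knowledge graph up to date, a relation extraction model must also extract relations well for entities not seen during training. A relation extraction model can be built simply by fine-tuning BERT, and such a model extracts relations successfully for entities seen during training. However, it does not generalize to entities not seen during training. We found that augmenting the training data with noisy entities enables the model to extract relations well even for unseen entities. This comes at a cost, however: relation extraction for seen entities becomes worse than before. We can therefore use two models at once, one that extracts relations well for seen entities and one that does so for unseen entities. The predictions of each model can be filtered and then merged, which combines the strengths of the two models. Using two datasets commonly used for training and evaluating relation extraction models, we showed that the proposed model extracts relations better than existing models for both seen and unseen entities.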
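The filter-then-merge step can be sketched as follows. The abstract does not state the filtering criterion, so this example assumes a simple one: the seen-entity expert is trusted only for triples whose two entities both occur in the training data, and the unseen-entity expert is trusted for every other triple. `Triple`, `combine_experts`, and `train_entities` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def combine_experts(seen_preds, unseen_preds, train_entities):
    """Filter each expert's predictions by entity novelty, then return their union.

    seen_preds / unseen_preds: iterables of Triple predicted by each expert.
    train_entities: set of entity strings that appeared in the training data.
    """
    def both_seen(t):
        return t.subject in train_entities and t.obj in train_entities

    # Assumed rule: keep the seen-entity expert only where both entities are known.
    kept_seen = {t for t in seen_preds if both_seen(t)}
    # Assumed rule: keep the unseen-entity expert where at least one entity is new.
    kept_unseen = {t for t in unseen_preds if not both_seen(t)}
    return kept_seen | kept_unseen
```

With `train_entities = {"Seoul", "South Korea"}`, a triple about Seoul and South Korea would be taken from the seen-entity expert, while a triple involving an entity such as Daejeon, absent from training, would be taken from the unseen-entity expert. A confidence-based filter would be an equally plausible reading of the abstract.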

Other bibliographic information

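Call number	MAI 21017
Physical description	iii, 20 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 이주혁. Advisor's name in English: Eunho Yang. Advisor's name in Korean: 양은호. Includes appendix.
Thesis	Thesis (Master's) - KAIST : Graduate School of AI,
Bibliography note	References: p. 19-20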
