Main Bibliographic Information
Certainty and uncertainty in the hidden states of large language models
Title / Author: Certainty and uncertainty in the hidden states of large language models = 대형 언어 모델의 숨은 상태에서의 확신과 불확신 / Yeonjea Kim.
Publication: [Daejeon : Korea Advanced Institute of Science and Technology, 2025].

Holdings Information

Registration number: 8043961
Location / call number: Academic Cultural Complex (Library), 2nd floor, Theses; MAI 25012
Status: Available (not for loan)

Abstract

Large Language Models (LLMs) demonstrate remarkable performance in natural language processing but pose risks by generating incorrect information (hallucinations) without indicating uncertainty. This study investigates whether the hidden states of LLMs encode the model's level of certainty and whether certain and uncertain generations can be distinguished through them. We extracted hidden states from specific layers and time steps across multiple models and visualized them by applying dimensionality reduction techniques. Using factuality-based, hallucination-based, and qualitative-based datasets, we defined certain generations as those for which the model answers consistently and uncertain generations as those for which its responses vary. Our results confirmed that even when the generated outputs are the same, the hidden states of certain and uncertain generations form distinctly separated clusters in low-dimensional space. This suggests that the hidden states of LLMs contain features related to the level of certainty, which can help identify and mitigate hallucinations.

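Abstract (translated from the Korean): Large language models (LLMs) perform well in natural language processing, but they risk generating incorrect information (hallucinations) without signaling uncertainty. This study examines whether the hidden states of LLMs encode the model's degree of certainty and whether certain and uncertain outputs can be distinguished through them. We extracted hidden states at specific layers and time steps for several models and visualized them using dimensionality reduction. Using factuality-based, hallucination-based, and qualitative datasets, we grouped the experimental samples by defining certain generations as consistent answers and uncertain generations as varying responses. The results confirm that, even when the generated outputs are identical, the hidden states of certain and uncertain generations are clearly separated in low-dimensional space. This suggests that LLM hidden states contain features related to the degree of certainty, which may help identify and mitigate hallucinations.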
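The abstract defines certain generations as consistent answers and uncertain generations as varying responses. A minimal sketch of that labeling rule follows. The function name `label_certainty`, the agreement `threshold`, and the case and whitespace normalization are illustrative assumptions, not details taken from the thesis.

```python
from collections import Counter


def label_certainty(samples, threshold=1.0):
    """Label a prompt from repeated sampled answers.

    Returns "certain" when the most common answer accounts for at least
    `threshold` of all samples, otherwise "uncertain".
    The threshold is a hypothetical parameter; 1.0 demands full agreement.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    # Light normalization (an assumption) so casing or stray whitespace
    # does not count as a different answer.
    normalized = [s.strip().lower() for s in samples]
    top_count = Counter(normalized).most_common(1)[0][1]
    agreement = top_count / len(normalized)
    return "certain" if agreement >= threshold else "uncertain"


# Hypothetical sampled answers for two prompts.
print(label_certainty(["Paris", "paris", " Paris "]))
print(label_certainty(["1912", "1915", "1912"]))
```

Lowering `threshold` tolerates occasional disagreement among samples; which cutoff the thesis used, if any, is not stated in this record.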

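Other Bibliographic Information

Call number: MAI 25012
Physical description: v, 36 p. : illustrations ; 30 cm
Language: English
General note: Author's name in Korean: 김연지
Advisor's name in English: Choi, Jae Sik
Advisor's name in Korean: 최재식
Appendix: A. Datasets and models -- A.1. Facts : True and False -- A.2. Social Norms : Very Good to Very Bad -- A.3. Models -- B. Experiments setting and t-SNE results -- B.1. Experiments setting -- B.2. t-SNE results
Dissertation note: Thesis (Master's) - Korea Advanced Institute of Science and Technology : Kim Jaechul Graduate School of AI
Bibliography note: References: p. 32-34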
Subjects: Distinguishing Certainty and Uncertainty
Hidden State Analysis
LLM

Even when the model generates the same outputs with certainty and with uncertainty, we can distinguish between these cases through hidden state analysis. This implies that the concept of certainty is embedded in the hidden states.

A Taxonomy of Hallucination and Uncertainty of LLMs

Factuality-based Certainty and Uncertainty Clustering

Hallucination-based Certainty and Uncertainty Clustering

Certain and uncertain generations are clearly distinguishable. Despite including misclassified data in the calculations, the silhouette scores for certainty and uncertainty remain high. Additionally, we observe that the silhouette score is noticeably lower when clustering within certainty alone.
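For readers unfamiliar with the metric, the following is a minimal NumPy sketch of the standard mean silhouette score with Euclidean distance. It is not the thesis's evaluation code, and the toy one-dimensional points stand in for real hidden-state vectors.

```python
import numpy as np


def silhouette_score(X, labels):
    """Mean silhouette coefficient using Euclidean distance.

    For each point: a = mean distance to the other points in its own cluster,
    b = smallest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Full pairwise distance matrix; fine for small sets, memory-heavy for large ones.
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # singleton cluster keeps the default score of 0
        # dist[i, i] is 0, so dividing by n_same - 1 excludes the point itself.
        a = dist[i, same].sum() / (n_same - 1)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


# Toy data: two tight, well-separated groups (0 = certain, 1 = uncertain).
points = [[0.0], [1.0], [10.0], [11.0]]
groups = [0, 0, 1, 1]
print(round(silhouette_score(points, groups), 2))  # about 0.90
```

Scores near 1 indicate tight, well-separated groups, scores near 0 indicate overlapping groups, and negative scores indicate points that sit closer to another group than to their own.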

Qualitative-based Certainty and Uncertainty Clustering

These are the silhouette scores according to the time step. The score is highest at the initial generation point, but classification between certainty and uncertainty is also possible at the last input step and the next generation step.

Red indicates certainty, and orange represents uncertainty. From t = N to t = N+2, the two groups show meaningful clustering, with a clear separation observed at the initial generation point, t = N+1.

Certainty and uncertainty are observed to become clearly separated as they pass through the middle layers. Since they were extracted at time step N+1, the probability distribution remains consistent throughout.

Despite differences in density, all three LLMs show a clear distinction between certainty and uncertainty.
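The clustering views described here depend on projecting high-dimensional hidden states into two or three dimensions. The sketch below uses PCA via SVD as a simple stand-in; the record's appendix names t-SNE, and the exact reduction settings used in the thesis are not given here. The synthetic vectors, their dimensionality, and the group offsets are all assumptions made for illustration.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-ins for hidden states (16 dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
certain = rng.normal(loc=0.0, scale=1.0, size=(50, 16))
uncertain = rng.normal(loc=5.0, scale=1.0, size=(50, 16))

projected = pca_project(np.vstack([certain, uncertain]))
# Gap between group centroids along the first principal component.
gap = abs(projected[:50, 0].mean() - projected[50:, 0].mean())
print(projected.shape, gap > 10)
```

PCA is linear and deterministic, which makes it convenient for a quick check. t-SNE is nonlinear and may reveal cluster structure that a linear projection misses.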

We calculated the Mean Concept Direction and examined its values, but it was difficult to make any meaningful discoveries.

2D and 3D Clustering of Concept Directions : The Pairwise Concept Direction Set exhibits clear clusters among groups of concept directions, indicating shared features like Certainty.
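This record does not define the Mean Concept Direction or the Pairwise Concept Direction Set. One common reading, assumed here purely for illustration, takes the mean direction as the normalized difference between group means and the pairwise set as the normalized differences between every certain and uncertain hidden-state pair. All function names below are hypothetical.

```python
import numpy as np


def unit(v):
    """Normalize vectors along the last axis, guarding against zero norm."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.clip(norm, 1e-12, None)


def mean_concept_direction(certain, uncertain):
    """Assumed definition: unit vector from the uncertain mean to the certain mean."""
    certain = np.asarray(certain, dtype=float)
    uncertain = np.asarray(uncertain, dtype=float)
    return unit(certain.mean(axis=0) - uncertain.mean(axis=0))


def pairwise_concept_directions(certain, uncertain):
    """Assumed definition: unit difference vectors for every (certain, uncertain) pair."""
    certain = np.asarray(certain, dtype=float)
    uncertain = np.asarray(uncertain, dtype=float)
    diffs = certain[:, None, :] - uncertain[None, :, :]
    return unit(diffs.reshape(-1, certain.shape[1]))


# Toy vectors standing in for hidden states.
certain_states = [[1.0, 0.0], [3.0, 0.0]]
uncertain_states = [[0.0, 0.0]]
mean_dir = mean_concept_direction(certain_states, uncertain_states)
pair_dirs = pairwise_concept_directions(certain_states, uncertain_states)
# Cosine similarity of each pairwise direction with the mean direction.
print(pair_dirs @ mean_dir)
```

Under this reading, the pairwise set keeps per-pair variation that averaging discards, which may be why clustering it can show structure that a single mean vector does not.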

[t-SNE] Factuality-based Certainty and Uncertainty Clustering

[t-SNE] Hallucination-based Certainty and Uncertainty Clustering

[t-SNE] Qualitative-based Certainty and Uncertainty Clustering

[t-SNE] Silhouette Scores of Certainty and Uncertainty