KAIST Library

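Main Bibliographic Information
A Study on Generating Model Explanations Secure Against Model Inversion Attacks = Learning to generate inversion-resistant model explanations
Title / Author	A Study on Generating Model Explanations Secure Against Model Inversion Attacks = Learning to generate inversion-resistant model explanations / Hoyong Jeong.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology (KAIST), 2023].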
Online Access	Full text available (view / print)

Holdings
Registration No.	8040864
Location	Academic Cultural Complex (Library), 2nd floor, Theses
Call No.	MIS 23010
Status	Available (not for loan)

Abstract

The wide adoption of deep neural networks (DNNs) in mission-critical applications has spurred the need for interpretable models that explain their decisions. Unfortunately, previous studies have demonstrated that model explanations facilitate information leakage, rendering DNN models vulnerable to model inversion attacks. These attacks enable an adversary to reconstruct original images from model explanations, thereby leaking privacy-sensitive features. To address this, we present Generative Noise Injector for Model Explanations (GNIME), a novel defense framework that perturbs model explanations to minimize the risk of model inversion attacks while preserving the interpretability of the generated explanations. Specifically, we formulate defense training as a two-player minimax game between an inversion attack network, which aims to invert model explanations, and a noise generator network, which aims to inject perturbations that thwart model inversion attacks. We demonstrate that GNIME significantly reduces information leakage in model explanations, decreasing transferable classification accuracy in facial recognition models by up to 84.8% while preserving the original functionality of model explanations.

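As artificial intelligence is widely adopted for high-stakes decisions, the need has grown for interpretable AI that explains its own decisions. Unfortunately, recent studies have shown that disclosing model explanations facilitates information leakage, making AI models vulnerable to model inversion attacks. Using the disclosed model explanations, an attacker can reconstruct the original images more accurately, which can leak sensitive information. This study presents GNIME (Generative Noise Injector for Model Explanations), a new defense framework that perturbs model explanations to minimize the risk of inversion attacks while preserving their interpretability. To learn a model that injects optimal noise, the technique trains a noise generator (the defender) and an inversion model (the attacker) against each other in a minimax scenario. Experiments confirm that GNIME substantially reduces information leakage while preserving the functionality of the original model explanations, lowering transferable classification accuracy in facial recognition models by up to 84.8%.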
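The minimax training described in the abstract can be illustrated with a deliberately tiny Python sketch. It is not GNIME. The one-dimensional "explanation" that leaks its private feature verbatim, the linear inverter `a`, the scalar noise scale `sigma` standing in for the noise generator, the distortion weight `lam` standing in for the interpretability constraint, and every function name are hypothetical assumptions, chosen only to show how a defender and an attacker can be trained against each other with simultaneous gradient descent-ascent.

```python
import random


def make_toy_data(n=500, seed=0):
    """Hypothetical toy data, not from the thesis.

    Each private feature s_i is leaked verbatim by its explanation (worst-case
    leakage), and z_i is a fixed standard-normal draw that the defender scales.
    Fixing z keeps the sketch deterministic.
    """
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return s, z


def losses(a, sigma, s, z):
    """Return (attacker reconstruction MSE, explanation distortion)."""
    n = len(s)
    # Released explanation is s_i + sigma * z_i; the attacker predicts a * released.
    attack = sum((a * (si + sigma * zi) - si) ** 2 for si, zi in zip(s, z)) / n
    # Distortion: mean squared size of the injected noise (hypothetical stand-in
    # for lost interpretability).
    distortion = sum((sigma * zi) ** 2 for zi in z) / n
    return attack, distortion


def train_minimax(s, z, lam=0.25, lr=0.05, steps=2000):
    """Simultaneous gradient descent-ascent on the toy game

        min_a max_sigma  attack_mse(a, sigma) - lam * distortion(sigma)
    """
    a, sigma = 1.0, 0.1  # attacker starts as a perfect inverter; defender adds little noise
    n = len(s)
    for _ in range(steps):
        grad_a = 0.0
        grad_sigma = 0.0
        for si, zi in zip(s, z):
            released = si + sigma * zi      # perturbed explanation
            residual = a * released - si    # attacker's reconstruction error
            grad_a += 2.0 * residual * released
            grad_sigma += 2.0 * residual * a * zi - lam * 2.0 * sigma * zi * zi
        a -= lr * grad_a / n          # attacker: descend on its reconstruction loss
        sigma += lr * grad_sigma / n  # defender: ascend on attack loss minus distortion penalty
    attack, distortion = losses(a, sigma, s, z)
    return {"a": a, "sigma": sigma, "attack_mse": attack, "distortion": distortion}


if __name__ == "__main__":
    s, z = make_toy_data()
    print("undefended attack MSE:", losses(1.0, 0.0, s, z)[0])  # 0.0: explanation leaks s exactly
    print(train_minimax(s, z))
```

Under these toy assumptions the attacker's best response to a noise scale `sigma` shrinks `a` toward Var(s)/(Var(s) + sigma^2), and the game settles near `a` ≈ sqrt(`lam`) = 0.5 and `sigma` ≈ 1, where the attacker's error rises from 0 to roughly half the feature variance. Raising `lam` buys less distortion at the price of more leakage.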

Additional Bibliographic Information
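Call No.	MIS 23010
Physical Description	iv, 26 p. : ill. ; 30 cm
Language	Korean
General Note	Author's name in English: Hoyong Jeong; advisor's name in Korean: 손수엘; advisor's name in English: Sooel Son
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Graduate School of Information Security
Bibliography	References: p. 23-25
Subject	model inversion attack; explainable AI; model explanation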
