KAIST Library

Main bibliographic information
Text-conditioned sampling framework for text-to-image generation with masked generative models = 마스크 생성 모델에서의 문장 정보와 일치하는 이미지 생성을 위한 토큰 추출 프레임워크
Title / Author	Text-conditioned sampling framework for text-to-image generation with masked generative models = 마스크 생성 모델에서의 문장 정보와 일치하는 이미지 생성을 위한 토큰 추출 프레임워크 / Jaewoong Lee.
Publication	[Daejeon : Korea Advanced Institute of Science and Technology, 2023].

Holdings information

Registration number	8041172
Location / Call number	Academic Cultural Complex (Library), 2nd floor, Theses / MAI 23055

Status	Available (not for loan)

Abstract

Token-based masked generative models are gaining popularity for their fast inference with parallel decoding. While recent token-based approaches achieve performance competitive with diffusion-based models, their generation performance is still suboptimal because they sample multiple tokens simultaneously without considering the dependence among them. We empirically investigate this problem and propose a learnable sampling model, Text-Conditioned Token Selection (TCTS), to select optimal tokens via localized supervision with text information. TCTS improves not only the image quality but also the semantic alignment of the generated images with the given texts. To further improve the image quality, we introduce a cohesive sampling strategy, Frequency Adaptive Sampling (FAS), which is applied to each group of tokens divided according to the self-attention maps. We validate the efficacy of TCTS combined with FAS on various generative tasks, demonstrating that it significantly outperforms the baselines in image-text alignment and image quality. Our text-conditioned sampling framework further reduces the original inference time by more than 50% without modifying the original generative model.

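This thesis addresses a token selection method for generating images that match the given text in masked generative models. Keeping the image generation model itself unchanged, it uses a learnable sub-module together with the text information to select tokens so that the generated image agrees with the text. In addition, to resolve the oversimplification that arises in models that diffuse over many steps, it proposes varying the token selection method according to frequency information. Using both analytical and computational methods, it shows that images generated with the proposed token selection technique are more realistic and reflect the text better than those produced by existing methods.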
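
As a rough illustration of the parallel decoding the abstract refers to, the following is a minimal sketch of iterative masked-token decoding in which a separate scoring function decides which sampled tokens to keep at each step. It is not the thesis's TCTS or FAS implementation. `toy_generator`, `toy_selector`, the vocabulary size, and the linear unmasking schedule are all hypothetical stand-ins chosen for illustration; in a real system the generator would be a trained masked generative model and the selector a learned, text-conditioned model.

```python
import random

MASK = -1  # sentinel for a position that has not been decoded yet


def toy_generator(tokens, vocab_size, rng):
    """Hypothetical stand-in for a masked generative model.

    Proposes a token for every masked position, all at once.
    """
    return {i: rng.randrange(vocab_size) for i, t in enumerate(tokens) if t == MASK}


def toy_selector(proposals, rng):
    """Hypothetical stand-in for a learned selection model.

    Returns one score per proposed position; higher means "keep this token".
    """
    return {i: rng.random() for i in proposals}


def parallel_decode(num_tokens=16, vocab_size=8, num_steps=4, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(1, num_steps + 1):
        proposals = toy_generator(tokens, vocab_size, rng)
        if not proposals:
            break  # everything is already decoded
        scores = toy_selector(proposals, rng)
        # Linear schedule (an assumption): after `step` steps, this many
        # positions should be decoded in total.
        target_decoded = (num_tokens * step) // num_steps
        already_decoded = num_tokens - len(proposals)
        num_keep = max(1, target_decoded - already_decoded)
        # Keep only the highest-scoring proposals; the rest stay masked
        # and are proposed again in the next step.
        keep = sorted(proposals, key=lambda i: scores[i], reverse=True)[:num_keep]
        for i in keep:
            tokens[i] = proposals[i]
    return tokens
```

Calling `parallel_decode()` returns a list of 16 token ids with no masked positions left after four steps. Using fewer steps commits more tokens per step, which is the setting where a better selection rule matters most.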

Other bibliographic information
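Call number	MAI 23055
Physical description	v, 28 p. : illustrations ; 30 cm
Language	English
General note	Author's name in Korean: 이재웅. Advisor's name in English: Sungju Hwang. Advisor's name in Korean: 황성주. Including appendix.
Thesis	Thesis (Master's) - Korea Advanced Institute of Science and Technology : Kim Jaechul Graduate School of AI,
Bibliography note	References : p. 23-26
Subject	Multimodal ; Text-to-image generation ; Token-based diffusion model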
