We often face decision-making situations under uncertainty, in which we are not aware of all the components needed to accomplish the task. However, the majority of previous studies were conducted with a full explanation of the relevant factors; only a few were performed with insufficient task information, owing to the difficulty of designing such experiments and of building computational models for them. Beyond active investigation of the exploration of choice options, no previous study has yet examined how the brain explores new attributes during learning. In this study, we investigated how the brain integrates information, that is, how it explores new information and learns, during an inadequately informed multi-dimensional reinforcement learning task, using computational models including a hidden Markov model (HMM), a softmax function, and reinforcement learning (RL), together with fMRI. We observed that the brain integrates pieces of information in a sequential manner and that the insula is involved in the exploration process.
Twenty-nine subjects participated in the multi-dimensional behavioral task. Subjects were shown pictures with multiple features (shape, color, and pattern) and asked to collect as many points (rewards) as possible; they were not informed of the underlying rule and had to find optimal strategies to achieve their goal. During the task, subjects' behavioral performance and fMRI signals were recorded simultaneously.
To examine the exploratory behavior toward new attributes and the learning processes for newly acquired attributes, we fitted the behavioral data to two pairs of computational models: two probabilistic policy-search models using an HMM or a softmax function, and two differently initialized RL models. The comparison within each pair demonstrates that participants explore under high cognitive ambiguity and learn new values by inference from previously learned policies.
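The model family described above can be illustrated with a minimal sketch: a Q-learning value update, a softmax choice policy, and the entropy of the resulting choice distribution, which serves as the "cognitive ambiguity" signal. The parameter values and function names below are hypothetical illustrations, not the fitted models from the study.

```python
import numpy as np

def softmax_policy(q_values, beta=2.0):
    """Softmax over state-action values; beta is the inverse temperature
    (hypothetical value; the fitted parameter may differ)."""
    z = beta * (q_values - q_values.max())   # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def policy_entropy(p):
    """Shannon entropy of the choice distribution; high entropy marks
    trials on which exploration is expected."""
    return -np.sum(p * np.log(p + 1e-12))

def rl_update(q, action, reward, alpha=0.1):
    """Standard Q-learning update on the chosen action; returns the
    updated values and the reward prediction error (RPE)."""
    rpe = reward - q[action]
    q = q.copy()
    q[action] += alpha * rpe
    return q, rpe

# Example: three attribute-based options with uniform initial values,
# where the policy is maximally entropic and exploration is most likely.
q = np.zeros(3)
p = softmax_policy(q)
h = policy_entropy(p)
```

In this sketch, how the Q-values are initialized (e.g. at zero versus carried over from a previously learned attribute) is exactly what distinguishes the two RL models compared in the study.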
After confirming that new information is explored and that learning builds on previous values, we examined the related brain areas in the fMRI data. State-action value signals, reward prediction error signals, cognitive entropy signals (the cognitive ambiguity that induces exploration), and policy-transition time points were extracted from the models and located in the fMRI data by GLM analysis. As previously reported, the value signals and the error signals were observed in the ventromedial prefrontal cortex (vmPFC) and the anterior cingulate cortex (ACC), respectively. The exploration-related signals, the transition time points and the entropy, activated areas including the frontopolar cortex (FPC), a region known to be involved in exploration. These results indicate that higher-cognition areas including the vmPFC, ACC, and FPC act as a meta-controller and integrator of value and error signals.
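The step of turning trial-wise model signals into GLM regressors can be sketched as follows. This is a simplified illustration with a single-gamma HRF and hypothetical onsets and signal values; actual fMRI analyses typically use a canonical double-gamma HRF and a full design matrix with nuisance regressors.

```python
import numpy as np

def gamma_hrf(t, shape=6.0, scale=1.0):
    """Single-gamma haemodynamic response function (simplified stand-in
    for the canonical HRF)."""
    h = (t ** (shape - 1)) * np.exp(-t / scale)
    return h / h.max()

def parametric_regressor(onsets, values, n_scans, tr=2.0):
    """Place mean-centred trial-wise model signals at their onset scans
    and convolve with the HRF to form one design-matrix column."""
    x = np.zeros(n_scans)
    v = np.asarray(values, dtype=float)
    v = v - v.mean()                      # mean-centre the parametric modulator
    for t_on, val in zip(onsets, v):
        x[int(round(t_on / tr))] += val
    hrf = gamma_hrf(np.arange(0.0, 32.0, tr))
    return np.convolve(x, hrf)[:n_scans]  # truncate to scan length

# Hypothetical trial onsets (seconds) and model-derived signals
onsets = [0.0, 10.0, 20.0]
design = np.column_stack([
    parametric_regressor(onsets, sig, n_scans=30)
    for sig in ([0.2, 0.5, 0.9],          # state-action value
                [0.8, 0.3, -0.2],         # reward prediction error
                [1.1, 0.7, 0.4])          # cognitive entropy
])
```

Each column of `design` would then be regressed against the voxel-wise BOLD time series to locate the corresponding model signal in the brain.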
Overall, our results explain how the brain explores and learns new attributes for decision making during reinforcement learning. Our study therefore suggests how humans effectively process new information when the task dimension is extended.
We often face situations in which we must make decisions despite having insufficient information. However, because such situations are difficult to design experimentally and difficult to model computationally, they have not been studied sufficiently. Accordingly, most previous studies either started from a fully informed state and reduced the available information, or set the pieces of information in competition with one another. In this study, we constructed a multi-dimensional reward-learning task that requires decision making under insufficient information and conducted experiments on human participants. Twenty-nine subjects took part; they had to recognize and integrate three kinds of information (shape, color, and pattern) to make correct decisions. In this process, using a hidden Markov model and a softmax function built on a reinforcement learning algorithm, we traced the subjects' exploration by identifying which information they used to make their decisions, and we confirmed that, in the course of integrating new information, learning about the new information relies on previously learned values. Finally, functional magnetic resonance imaging revealed the activation of the related brain regions, showing that exploration of new information also occurs in the prefrontal cortex, an area previously known to carry out exploratory processes. In conclusion, this experiment showed that when information is insufficient, people explore new information, that this process is governed by the prefrontal cortex, and that the tendency to explore increases when entropy is high.