Research on emotional robots has recently become very active. Emotional robots use human speech, facial expressions, gestures, and so on to understand human emotion. An ordinary room contains many sound sources and background noise, so a robot must be able to separate the mixture of these sources back into the original signals in order to understand the speech of a specific person. It must also be able to turn or move toward that person to observe his or her expressions and actions effectively. Until now, research on the localization and separation of sound sources has been so theoretical and computationally demanding that real-time processing has not been possible. For a practical emotional robot, therefore, fast computation based on a simple principle is required. This paper proposes a method for detecting the direction of sound sources from the phase difference between the peaks in the spectra of the signals from two microphones, and a method for separating a mixture of sound sources using the fundamental frequency of the human voice and its overtones. Using these methods, it is shown that effective real-time localization and separation of sound sources is possible with simple equipment in an ordinary room.
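To make the two ideas concrete, the following is a minimal sketch, not the paper's actual implementation: direction is estimated from the phase difference between the two microphone channels at a few spectral peaks, and a harmonic mask around an assumed fundamental frequency keeps only the bins belonging to one voice. All names and parameters (MIC_DISTANCE, FS, estimate_direction, harmonic_mask, the frame length and peak count) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s at room temperature
MIC_DISTANCE = 0.15      # m, assumed spacing between the two microphones
FS = 16000               # Hz, assumed sampling rate

def estimate_direction(left, right, n_peaks=5):
    """Estimate the arrival angle from the inter-channel phase difference
    at the strongest spectral peaks of one short frame."""
    window = np.hanning(len(left))
    L = np.fft.rfft(left * window)
    R = np.fft.rfft(right * window)
    freqs = np.fft.rfftfreq(len(left), d=1.0 / FS)

    # Take the strongest bins of the left-channel magnitude spectrum
    # as peak candidates (a real peak picker would be more careful).
    mag = np.abs(L)
    peak_bins = np.argsort(mag)[-n_peaks:]
    peak_bins = peak_bins[freqs[peak_bins] > 0]

    angles = []
    for k in peak_bins:
        # Phase difference between the two channels at this peak frequency.
        dphi = np.angle(L[k]) - np.angle(R[k])
        dphi = (dphi + np.pi) % (2 * np.pi) - np.pi      # wrap to [-pi, pi]
        delay = dphi / (2 * np.pi * freqs[k])            # inter-channel delay (s)
        sin_theta = np.clip(SPEED_OF_SOUND * delay / MIC_DISTANCE, -1.0, 1.0)
        angles.append(np.degrees(np.arcsin(sin_theta)))
    return float(np.median(angles)) if angles else 0.0

def harmonic_mask(frame, f0, n_harmonics=10, half_width_hz=40.0):
    """Keep only the spectral bins near the fundamental frequency f0
    and its overtones, then return the resynthesized time signal."""
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    mask = np.zeros_like(freqs)
    for h in range(1, n_harmonics + 1):
        mask[np.abs(freqs - h * f0) < half_width_hz] = 1.0
    return np.fft.irfft(spectrum * mask, n=len(frame))
```

Because both steps use only one FFT per channel per frame and a few arithmetic operations per peak or harmonic, this kind of processing can run in real time on modest hardware, which is the point of preferring such a simple principle over heavier separation methods.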