Introduction
Over the years, many studies have been conducted to enable robots to work in dynamic environments (Thrun et al., 2005; Asaka & Ishikawa, 1994; Kanda et al., 2002). Various robots have been developed to support humans in workspaces such as houses and factories. However, it is not easy to make a robot behave like a human in a dynamic environment (Sogo et al., 1999). When humans work in a given environment, they subconsciously select an appropriate course of action by predicting the changes in the environment and its next state. They draw on past experience and memory to predict the posture and force that the environment requires. If they had to perform these actions consciously, it would be difficult; they might fail to accomplish their objective and, in some cases, suffer a loss. For example, human walking is rhythmic and stable because we adjust appropriately to sensory input from the environment and the body. While we walk, the brain must recognize the act of walking and the environment and adjust each joint of the body accordingly, so that we adapt to the environment without being aware of it. Through such prediction, humans maintain their balance in everyday life, reducing the risk of falling or of colliding with obstacles. Similarly, for robots, the control-processing load of behavior selection is considered to be large if prediction is not used. In recent years, robots with advanced behavioral capabilities have been developed, but humans control them through predefined control rules. In the future, however, robots, such as those expected to replace human labor, should acquire such control rules themselves through machine learning in dynamic environments rather than be governed by fixed, predefined rules.
It is desirable for robots, like humans, to decide on each action to take in a dynamic environment, beyond simply executing pre-registered commands. Moreover, hardware and limited computational resources impose physical limits, so a robot needs some time to decide its course of action; for example, one or more time steps may be required. State prediction is therefore important if robots are to replace humans in dynamic environments.
When an action is decided on the basis of future prediction, the properties of the disturbance imposed by the external environment must be known (Pivoňka et al., 2009). However, the properties of disturbance signals often cannot be described simply; a disturbance may be non-periodic, nonlinear and time-varying, or almost periodic. Moreover, a future prediction obtained with a machine learning technique is based on tendencies extracted from past training data. In such situations, the learning time increases in proportion to the amount of training data, and in the worst case, no usable tendency may be found for prediction at all (He et al., 2012).
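To make the disturbance classes named above concrete, the following Python sketch generates one illustrative signal of each kind and measures how far each departs from repeating after a candidate period of 2π. The signals, the function names (`periodic`, `almost_periodic`, `nonlinear_time_varying`, `period_error`), and the sampling grid are hypothetical choices made for illustration; they are not taken from the cited works or from the state-action pair prediction method.

```python
import math
from typing import Callable

# Illustrative signals only; not taken from the cited works.
Signal = Callable[[float], float]

# Candidate period tested below (an arbitrary illustrative choice).
CANDIDATE_PERIOD = 2.0 * math.pi


def periodic(t: float) -> float:
    """Periodic disturbance: repeats exactly every 2*pi."""
    return math.sin(t)


def almost_periodic(t: float) -> float:
    """Almost-periodic disturbance: the angular frequencies 1 and sqrt(2)
    are incommensurate, so the sum nearly recurs but never repeats exactly."""
    return math.sin(t) + math.sin(math.sqrt(2.0) * t)


def nonlinear_time_varying(t: float) -> float:
    """Non-periodic, time-varying disturbance: a chirp whose instantaneous
    frequency grows with t."""
    return math.sin(0.1 * t * t)


def period_error(x: Signal, period: float, t_max: float = 50.0, n: int = 5000) -> float:
    """Largest |x(t + period) - x(t)| on an evenly spaced grid over [0, t_max).

    A value near zero is consistent with `period` being a period of x on the
    sampled range; a large value shows that it is not.
    """
    step = t_max / n
    return max(abs(x(i * step + period) - x(i * step)) for i in range(n))


if __name__ == "__main__":
    for name, signal in [
        ("periodic", periodic),
        ("almost periodic", almost_periodic),
        ("nonlinear time-varying", nonlinear_time_varying),
    ]:
        print(f"{name:>24}: {period_error(signal, CANDIDATE_PERIOD):.3f}")
```

The periodic signal yields an error near zero, while the other two yield clearly nonzero errors. A predictor that assumes the disturbance repeats with a fixed period would therefore be exact only in the first case, which is the difficulty the paragraph above describes.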
A state-action pair prediction method has been proposed previously. Its prediction performance has been examined (Sugimoto & Kurashige, 2013; Sugimoto & Kurashige, 2013, The proposal for deciding; Sugimoto & Kurashige, 2015; Sugimoto & Kurashige, 2015, A study on the deciding), and action decision methods based on its prediction results have been proposed (Sugimoto & Kurashige, 2014; Sugimoto & Kurashige, 2015, The proposal for real-time; Sugimoto & Kurashige, 2015, The proposal for compensation; Sugimoto & Kurashige, 2015, Future motion; Sugimoto et al., 2016). These methods considered the behavior of a robot continuously subjected to an unknown periodic disturbance signal; examples include fixed-weighted prediction, variable-weighted prediction, a stochastic approach, and varying the sampling rate. However, all of these methods assume a periodic disturbance in the action decision or future prediction model, and thus far we have not addressed how to treat a non-periodic disturbance when obtaining predictions and deciding actions with state-action pair prediction.