Abstract: A Q-learning method is presented in which the Q-value function is approximated by an RBF (radial basis function) neural network, so that the information learnt by the learning agent generalises over continuous state and action spaces. The input of the RBF network is a state-action pair, and the output is the Q-value of that pair. The state is determined by the transfer characteristics of the system. The action component of the input consists of a greedy action, obtained by optimising the Q-value output of the RBF network, plus a noise action drawn from a normal distribution. The structure and parameters of the network are adjusted with the RNA algorithm and the gradient descent algorithm. The effectiveness of the proposed Q-learning method is demonstrated in simulation on the balancing control of a cart-pole system.
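The scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, the Gaussian centres sit on a fixed hand-picked grid with one shared width, the greedy action is found by a simple grid search over candidate actions (standing in for the paper's Q-value optimisation), and only the output weights are trained, by a semi-gradient TD step. The RNA structure-adaptation step is not reproduced.

```python
import math
import random
from typing import List, Sequence


class RBFQNetwork:
    """Hypothetical sketch: Q(s, a) as a Gaussian RBF network over the joint (s, a) input."""

    def __init__(self, centers: List[List[float]], width: float, lr: float = 0.1):
        self.centers = centers               # one centre per hidden unit, in joint (s, a) space
        self.width = width                   # shared Gaussian width (assumption)
        self.weights = [0.0] * len(centers)  # linear output-layer weights
        self.lr = lr                         # gradient-descent step size

    def features(self, state: Sequence[float], action: float) -> List[float]:
        # Network input is the state vector with the scalar action appended.
        x = list(state) + [action]
        return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * self.width ** 2))
                for c in self.centers]

    def q(self, state: Sequence[float], action: float) -> float:
        # Output is a weighted sum of the hidden-unit activations.
        return sum(w * f for w, f in zip(self.weights, self.features(state, action)))

    def greedy_action(self, state: Sequence[float], candidates: Sequence[float]) -> float:
        # Grid search over candidate actions stands in for the Q-value optimisation.
        return max(candidates, key=lambda a: self.q(state, a))

    def act(self, state: Sequence[float], candidates: Sequence[float],
            noise_std: float, rng: random.Random) -> float:
        # Executed action = greedy action + normally distributed noise, clipped to the range.
        a = self.greedy_action(state, candidates) + rng.gauss(0.0, noise_std)
        return min(max(a, min(candidates)), max(candidates))

    def update(self, state: Sequence[float], action: float, reward: float,
               next_state: Sequence[float], candidates: Sequence[float],
               gamma: float = 0.95) -> float:
        # Semi-gradient descent on the squared TD error, output weights only.
        target = reward + gamma * self.q(next_state, self.greedy_action(next_state, candidates))
        phi = self.features(state, action)
        td_error = target - sum(w * f for w, f in zip(self.weights, phi))
        self.weights = [w + self.lr * td_error * f for w, f in zip(self.weights, phi)]
        return td_error


# Usage: a 1-D state and 1-D action, repeatedly rewarding action 1.0 in state 0.0.
net = RBFQNetwork(centers=[[0.0, -1.0], [0.0, 0.0], [0.0, 1.0]], width=0.5, lr=0.5)
cands = [-1.0, 0.0, 1.0]
for _ in range(200):
    net.update([0.0], 1.0, 1.0, [0.0], cands, gamma=0.0)
print(net.q([0.0], 1.0), net.greedy_action([0.0], cands))
```

With `gamma=0.0` the TD target is just the reward, so Q(0, 1) is driven towards 1 and the greedy action in that state becomes 1.0. In a full agent, growing the set of centres when a new input is poorly covered would play the role of the structural adjustment the abstract attributes to the RNA algorithm.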