Abstract: Building on locally weighted learning, a temporal difference (TD) learning framework with incremental nearest neighbors is proposed for reinforcement learning problems in continuous spaces. The framework incrementally selects observed states to construct an instance dictionary, approximates the value function and policy of each newly observed state from its range-nearest-neighbor instances in the dictionary, and combines this approximation with a TD algorithm to iteratively update the value function and eligibility trace of every instance in the dictionary. Several schemes are designed for each key component of the framework, and a theoretical analysis of its convergence is given. Finally, twenty-four scheme combinations are evaluated in simulation; the results show that the SNDN combination achieves better learning performance and computational efficiency than the alternatives.
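To make the framework's moving parts concrete, the following Python sketch illustrates the general idea under stated assumptions; it is not the paper's implementation. The class name NNTD, the Gaussian kernel weighting, the fixed add/neighbor radii, and all parameter values are illustrative assumptions standing in for the paper's component schemes. The sketch combines the three elements the abstract names: an incrementally grown instance dictionary, a range-nearest-neighbor value approximation, and a TD(lambda) update with per-instance eligibility traces.

```python
import numpy as np

class NNTD:
    """Sketch: TD(lambda) over an incrementally built nearest-neighbor dictionary."""

    def __init__(self, add_radius=0.5, nn_radius=1.0, alpha=0.1,
                 gamma=0.95, lam=0.8, bandwidth=0.5):
        self.add_radius = add_radius  # add a new instance only if none is this close
        self.nn_radius = nn_radius    # neighbors within this range approximate V(s)
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.bandwidth = bandwidth    # kernel width for locally weighted averaging
        self.states = []              # instance dictionary (stored states)
        self.values = []              # value estimate per instance
        self.traces = []              # eligibility trace per instance

    def _weights(self, s):
        """Gaussian kernel weights over instances within nn_radius of s."""
        if not self.states:
            return np.array([]), np.array([], dtype=int)
        d = np.linalg.norm(np.asarray(self.states) - s, axis=1)
        idx = np.where(d <= self.nn_radius)[0]
        w = np.exp(-(d[idx] / self.bandwidth) ** 2)
        return w, idx

    def value(self, s):
        """Locally weighted value estimate from range nearest neighbors."""
        w, idx = self._weights(np.asarray(s, dtype=float))
        if len(idx) == 0:
            return 0.0
        return float(np.dot(w, np.asarray(self.values)[idx]) / w.sum())

    def observe(self, s):
        """Grow the dictionary: store s if it is far from all stored instances."""
        s = np.asarray(s, dtype=float)
        if (not self.states or
                min(np.linalg.norm(np.asarray(self.states) - s, axis=1)) > self.add_radius):
            v0 = self.value(s)        # initialize from existing neighbors
            self.states.append(s)
            self.values.append(v0)
            self.traces.append(0.0)

    def update(self, s, r, s_next, done):
        """TD(lambda) update spread over the neighbor instances via their weights."""
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        w, idx = self._weights(np.asarray(s, dtype=float))
        traces = np.asarray(self.traces) * self.gamma * self.lam  # decay all traces
        if len(idx) > 0:
            traces[idx] += w / w.sum()  # accumulate trace on the active neighbors
        self.traces = list(traces)
        self.values = list(np.asarray(self.values) + self.alpha * delta * traces)
```

A typical usage pattern would call observe(s) on each state encountered, then update(s, r, s_next, done) after each transition; the paper's twenty-four combinations would correspond to different choices for the instance-selection, neighbor-query, weighting, and update components that are fixed here for brevity.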