Path planning for mobile robots is paramount in unknown-environment exploration. Exploration efficiency and prediction accuracy largely depend on appropriate waypoint decisions. In this paper, an informative path planning approach based on the reinforcement learning paradigm is proposed for static environment exploration. In contrast to model-based algorithms, no assumptions are made about the environmental features. Computational cost is reduced and online planning capability is enhanced by evaluating the values of actions through the robot's interaction with the environment. To improve prediction accuracy, an action selection algorithm based on the Upper Confidence Bound (UCB) is employed to balance exploration and exploitation. Exploration of unknown areas is encouraged, which also helps avoid becoming trapped in local extrema. Numerical simulations have been performed on environments modeled with a Gaussian distribution and the Ackley function, respectively. The results show that the characteristics of the entire environment field can be effectively captured using the proposed path planning approach.
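The UCB-based exploration–exploitation balance mentioned above can be illustrated with a minimal UCB1-style action-selection sketch. This is an illustrative assumption, not the paper's exact formulation: the function name `ucb_select`, the bandit framing, and the exploration constant `c` are all hypothetical stand-ins for whatever action-value representation the planner actually maintains.

```python
import math

def ucb_select(counts, values, c=2.0):
    """Select the action maximizing mean value plus a UCB1 exploration bonus.

    counts: number of times each action (e.g. candidate waypoint) was tried
    values: running mean reward (e.g. information gain) of each action
    c:      exploration constant weighting the confidence bonus (assumed)
    """
    # Try every action at least once so log/count terms are well-defined
    for a, n in enumerate(counts):
        if n == 0:
            return a
    total = sum(counts)
    # Rarely tried actions get a large bonus, encouraging exploration of
    # unknown areas and reducing the risk of settling in a local extremum
    scores = [values[a] + c * math.sqrt(math.log(total) / counts[a])
              for a in range(len(counts))]
    return max(range(len(scores)), key=scores.__getitem__)
```

With a well-sampled action of mean 0.5 and a nearly untried action of mean 0.4, the confidence bonus makes the under-sampled action win, which is the exploration behavior the abstract describes.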