We study the problem of adaptive locomotion learning for modular self-reconfigurable robots (MSRs). MSRs are mostly used in unknown, difficult-to-navigate environments, where they can take on a completely new shape to accomplish the task at hand. It is therefore almost impossible to develop control sequences in advance for all possible configurations of varying shape and size. Instead, the modules have to learn and adapt their locomotion at run time in order to remain robust. In this paper, we propose a Q-learning-based locomotion adaptation strategy that balances exploration against exploitation. We have applied the proposed strategy mainly to the ModRED modular robot in the Webots simulator. To show the generalizability of our approach, we have also applied it to a Yamor modular robot. Experimental results show that the proposed technique outperforms a random locomotion strategy and is able to adapt to module failures.
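As a rough illustration of the underlying idea, the following minimal sketch shows textbook tabular Q-learning with an epsilon-greedy rule whose exploration rate decays over episodes. It is not the strategy proposed in this paper, and it does not use the Webots API. The reward table, the toy transition rule (the next state is simply the action just taken), and all names and parameter values are hypothetical assumptions chosen only to make the example self-contained.

```python
import random

# Hypothetical toy problem: 3 gait phases (states) and 3 joint commands (actions).
# REWARD_TABLE[state][action] stands in for the forward displacement obtained.
REWARD_TABLE = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]


def q_learning(reward_table, episodes=200, steps=20, alpha=0.1, gamma=0.9,
               eps_start=1.0, eps_end=0.05, seed=0):
    """Standard tabular Q-learning with a linearly decaying epsilon."""
    rng = random.Random(seed)
    n_states = len(reward_table)
    n_actions = len(reward_table[0])
    q = [[0.0] * n_actions for _ in range(n_states)]

    for ep in range(episodes):
        # Explore a lot early on, exploit more as learning progresses.
        frac = ep / max(1, episodes - 1)
        eps = eps_start + frac * (eps_end - eps_start)
        state = 0
        for _ in range(steps):
            if rng.random() < eps:
                action = rng.randrange(n_actions)          # explore
            else:
                action = max(range(n_actions),
                             key=lambda a: q[state][a])    # exploit
            reward = reward_table[state][action]
            next_state = action  # assumed toy transition, not a robot model
            # Q-learning update toward the one-step bootstrapped target.
            td_target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


q_table = q_learning(REWARD_TABLE, gamma=0.5)
print(greedy_policy(q_table))
```

In a real setting the reward would come from measured displacement of the robot in simulation or hardware, and the state and action spaces would depend on the module configuration; the decay schedule here is only one simple way to trade exploration for exploitation.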