B or state E or back to state C (look at the four arrows out of state D). If the agent is in state E, then the three possible actions are to go to state A, state F, or state D. If the agent is in state B, it can go either to state F or to state D. From state A, it can only go back to state E.

We can put the state diagram and the instant reward values into the following reward table, or matrix R:

                          Action to go to state
    Agent now in state     A    B    C    D    E    F
           A               -    -    -    -    0    -
           B               -    -    -    0    -   100
           C               -    -    -    0    -    -
           D               -    0    0    -    0    -
           E               0    -    -    0    -   100
           F               -    0    -    -    0   100

The minus sign in the table means that the row state has no action to go to the column state. For example, state A cannot go to state B (because no door connects room A and room B, remember?).

In the previous sections of this tutorial, we have modeled the environment and the reward system for our agent. This section will describe the learning algorithm called Q-learning (which is a simplification of reinforcement learning). We have modeled the environment's reward system as matrix R.

Now we need to put a similar matrix, named Q, into the brain of our agent to represent the memory of what the agent has learned through many experiences. Each row of matrix Q represents a current state of the agent, and each column of matrix Q points to the action of going to the next state.

In the beginning, we say that the agent knows nothing, so we set Q to be the zero matrix. In this example, for simplicity of explanation, we assume the number of states is known (to be six). In a more general case, you c
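The reward matrix R and the zero-initialized memory matrix Q described above can be sketched in code. This is a minimal illustration, not part of the original tutorial: it assumes states A through F are encoded as indices 0 through 5, and uses -1 for the "minus sign" entries (no door between the two rooms). The helper `valid_actions` is a hypothetical name introduced here for illustration.

```python
import numpy as np

# Reward matrix R: rows = current state, columns = action (next state).
# States A..F are encoded as indices 0..5 (an assumption for this sketch).
# -1 marks "no action" (no connecting door); 100 rewards reaching goal F.
R = np.array([
    [-1, -1, -1, -1,  0, -1],   # A: only door leads to E
    [-1, -1, -1,  0, -1, 100],  # B: doors to D and to F (goal)
    [-1, -1, -1,  0, -1, -1],   # C: door to D
    [-1,  0,  0, -1,  0, -1],   # D: doors to B, C and E
    [ 0, -1, -1,  0, -1, 100],  # E: doors to A, D and to F (goal)
    [-1,  0, -1, -1,  0, 100],  # F: doors to B, E and staying at F
])

# The agent's memory: Q starts as a zero matrix of the same shape,
# because the agent initially knows nothing about the environment.
n_states = 6
Q = np.zeros((n_states, n_states))

def valid_actions(state):
    """Columns whose reward is not -1 are the doors usable from `state`."""
    return np.where(R[state] >= 0)[0]

print(valid_actions(3))  # from state D the agent can reach B, C, E -> [1 2 4]
```

Reading a row of R gives exactly the arrows out of that state in the state diagram, which is why the valid actions can be recovered by filtering out the -1 entries.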