

Translation of Foreign-Language Material on Algorithms (Edited Draft)


If the agent is in state D, then the three possible actions are to go to state B, to state E, or back to state C (look at the three arrows out of state D). If the agent is in state E, then the three possible actions are to go to state A, to state F, or to state D. If the agent is in state B, it can go either to state F or to state D. From state A, it can only go back to state E.

We can put the state diagram and the instant reward values into the following reward table, or matrix R (rows: the state the agent is now in; columns: the action "go to state"):

                 Action: go to state
  Agent now in    A    B    C    D    E    F
        A         -    -    -    -    0    -
        B         -    -    -    0    -    100
        C         -    -    -    0    -    -
        D         -    0    0    -    0    -
        E         0    -    -    0    -    100
        F         -    0    -    -    0    100

The minus sign in the table says that the row state has no action to go to the column state. For example, state A cannot go to state B (because there is no door connecting room A and room B, remember?).

In the previous sections of this tutorial, we have modeled the environment and the reward system for our agent. This section describes the learning algorithm, called Q-learning (which is a simplification of reinforcement learning). We have modeled the environment's reward system as matrix R.

Now we need to put a similar matrix, named Q, in the brain of our agent; it will represent the memory of what the agent has learned through many experiences. Each row of matrix Q represents a current state of the agent, and each column of matrix Q points to the action of going to the next state.

In the beginning, we say that the agent knows nothing, so we set Q to the zero matrix. In this example, for simplicity of explanation, we assume the number of states is known (to be six). In a more general case, you c
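As a concrete illustration of the two matrices described above, the following Python sketch (using NumPy, which the excerpt does not mention and is an assumption here) encodes the reward table R with -1 standing in for the minus sign, and initializes Q as a zero matrix for the six known states. The single update shown at the end uses a simplified Q-learning rule, Q(s, a) = R(s, a) + gamma * max Q(s', ·); the excerpt stops before introducing any update rule, so that line and the value of gamma are assumptions for illustration only.

```python
import numpy as np

# Reward matrix R from the table above.
# Rows: the agent's current state; columns: the action "go to state", in order A..F.
# -1 plays the role of the minus sign: no door between the two rooms.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],   # A: only door leads to E
    [-1, -1, -1,  0, -1, 100],   # B: doors to D and to F (the goal)
    [-1, -1, -1,  0, -1,  -1],   # C: door to D
    [-1,  0,  0, -1,  0,  -1],   # D: doors to B, C and E
    [ 0, -1, -1,  0, -1, 100],   # E: doors to A, D and to F (the goal)
    [-1,  0, -1, -1,  0, 100],   # F: doors to B, E, and staying at F
])

# The agent starts knowing nothing: Q is the zero matrix with the same
# shape as R (six states are assumed to be known).
Q = np.zeros_like(R, dtype=float)

# One illustrative step using a simplified Q-learning update (assumed here,
# not stated in the excerpt): after taking action a in state s and landing
# in state s_next, set Q[s, a] = R[s, a] + gamma * max(Q[s_next, :]).
gamma = 0.8            # discount factor; this value is an assumption
s, a = 1, 5            # e.g. the agent is in state B and goes to state F
s_next = a             # in this model the action label is the next state
Q[s, a] = R[s, a] + gamma * Q[s_next].max()
print(Q)
```

Note that the valid actions in a state are exactly the columns whose R entry is not -1, so a full implementation would pick the next action from np.where(R[s] >= 0)[0] rather than from all six columns.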