Translated Foreign-Language Reference: Algorithms



We assign two arrows to each room of the previous graph. Each arrow carries an instant reward value. The graph then becomes the state diagram shown below.

An additional loop with the highest reward (100) is given to the goal room (F back to F), so that if the agent arrives at the goal, it will remain there forever. This type of goal is called an absorbing goal, because once the agent reaches the goal state, it stays in the goal state.

Ladies and gentlemen, now is the time to introduce our superstar agent…. Imagine our agent as a dumb virtual robot that can learn through experience. The agent can pass from one room to another but has no knowledge of the environment. It does not know which sequence of doors it must pass through to get outside the building. Suppose we want to model some kind of simple evacuation of an agent from any room in the building. Now suppose we have an agent in Room C and we want the agent to learn to reach the outside of the house (F). (See the diagram below.)

How to make our agent learn from experience?

Before we discuss how the agent will learn (using Q-learning) in the next section, let us discuss some terminology: state and action. We call each room (including the outside of the building) a state. The agent's movement from one room to another is called an action. Let us refer back to our state diagram: a state is depicted as a node in the state diagram, while an action is represented by an arrow. Suppose the agent is now in state C. From state C, the agent can go to state D because state C is connected to D. From state C, however, the agent cannot go directly to state B because there is no direct door connecting rooms B and C (thus, no arrow). From state D, the agent can go either to state B or state E, or back to state C.
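To make the state/action/reward description above concrete, here is a minimal sketch (not part of the original text) of how this rooms-and-doors environment could be encoded for a Q-learning agent. The door layout used below (A-E, B-D, B-F, C-D, D-E, E-F) is an assumption taken from the figure the text refers to; only the door between C and D and the goal F are stated explicitly in this excerpt. The reward of 100 for any action entering the absorbing goal F, including the extra F-to-F loop, follows the text.

import numpy as np

# Rooms are states; F is "outside the building" (the absorbing goal).
STATES = ["A", "B", "C", "D", "E", "F"]
# Assumed door layout from the referenced figure; doors work in both directions.
DOORS = [("A", "E"), ("B", "D"), ("B", "F"), ("C", "D"), ("D", "E"), ("E", "F")]
GOAL, GOAL_REWARD = "F", 100

def build_reward_matrix():
    # R[s][a] = -1 if there is no door from room s to room a (action not allowed),
    #            0 for an ordinary door,
    #          100 for any action that enters the goal state F,
    #          including the extra F -> F loop that makes the goal absorbing.
    idx = {s: i for i, s in enumerate(STATES)}
    R = -np.ones((len(STATES), len(STATES)))
    for a, b in DOORS:                        # open each door in both directions
        R[idx[a]][idx[b]] = 0
        R[idx[b]][idx[a]] = 0
    for s in STATES:                          # instant reward 100 for entering F
        if R[idx[s]][idx[GOAL]] >= 0:
            R[idx[s]][idx[GOAL]] = GOAL_REWARD
    R[idx[GOAL]][idx[GOAL]] = GOAL_REWARD     # absorbing loop F -> F
    return R

print(build_reward_matrix())

In this encoding an action is simply "move to an adjacent room", so the single matrix R captures both which actions are allowed from each state (entries that are not -1) and the instant reward each allowed action yields.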