Tianjin Polytechnic University, 2021 Undergraduate Graduation Project

Tsitsiklis, 1996) applies naturally to the case of autonomous agents, which receive sensations as inputs and take actions that affect their environments in order to achieve their own goals. RL is based on the idea that the tendency to produce an action should be strengthened (reinforced) if it produces favorable results, and weakened if it produces unfavorable results. This framework is appealing from a biological point of view, since an animal has built-in preferences but does not always have a teacher to tell it exactly what action it should take in every situation.

If the members of a group of agents each employ an RL algorithm, the resulting collective algorithm allows control policies to be learned in a decentralized way. Even in situations where centralized information is available, it may be advantageous to develop control policies in a decentralized way in order to simplify the search through policy space. Although it may be possible to synthesize a system whose goals can be achieved by agents with conflicting objectives, this paper focuses on teams of agents that share identical objectives corresponding directly to the goals of the system as a whole.

To demonstrate the power of multiagent RL, we focus on the difficult problem of elevator group supervisory control. Elevator systems operate in high-dimensional continuous state spaces and in continuous time as discrete event dynamic systems. Their states are not fully observable, and they are nonstationary due to changing passenger arrival rates. We use a team of RL agents, each of which is responsible for controlling one elevator car. Each agent uses artificial neural networks to store its action-value estimates. We compare a parallel architecture, in which the agents share the same networks, with a decentralized architecture, in which the agents have their own independent networks.
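The difference between the parallel and the decentralized architecture can be illustrated with a toy sketch. It is only an illustration under stated assumptions, not the system described in the paper: plain lookup tables with a one-step Q-learning update stand in for the neural networks, the environment is a single dummy state, and the global reward (the number of agents that picked action 1) is invented for the example. The names `QAgent`, `make_team`, and `train_team` are hypothetical.

```python
import random
from collections import defaultdict


class QAgent:
    """Tabular Q-learning agent; a simplified stand-in for one car's controller."""

    def __init__(self, q_table, n_actions, rng, alpha=0.1, gamma=0.5, epsilon=0.1):
        self.q = q_table  # dict keyed by (state, action); may be shared by the team
        self.n_actions = n_actions
        self.rng = rng
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def greedy(self, state):
        """Return the action with the highest current value estimate."""
        values = [self.q[(state, a)] for a in range(self.n_actions)]
        return values.index(max(values))

    def act(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy(state)

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update toward reward + gamma * max next value."""
        best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error


def make_team(n_agents, n_actions, shared, seed=0):
    """Parallel architecture: one table for all agents. Decentralized: one each."""
    rng = random.Random(seed)
    if shared:
        table = defaultdict(float)
        tables = [table] * n_agents
    else:
        tables = [defaultdict(float) for _ in range(n_agents)]
    return [QAgent(t, n_actions, rng) for t in tables]


def train_team(shared, n_agents=3, steps=5000, seed=0):
    team = make_team(n_agents, n_actions=2, shared=shared, seed=seed)
    state = 0  # assumption: a single dummy state, not an elevator simulation
    for _ in range(steps):
        actions = [agent.act(state) for agent in team]
        # Assumed toy reward: every agent receives the same global signal,
        # so each agent's reward also depends on what the others did.
        global_reward = float(sum(actions))
        for agent, action in zip(team, actions):
            agent.update(state, action, global_reward, state)
    return team


if __name__ == "__main__":
    for shared in (True, False):
        team = train_team(shared)
        label = "parallel (shared table)" if shared else "decentralized"
        print(label, [agent.greedy(0) for agent in team])
```

In the shared case every agent's experience updates the same estimates, so learning pools data across the team; in the decentralized case each agent learns only from its own actions, but still sees the reward change with what the other agents do.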
In either case, the team receives a global reward signal that is noisy from the perspective of each agent, due in part to the effects of the actions of the other agents. Despite these difficulties, our system outperforms all of the heuristic elevator control algorithms known to us. We also analyze the policies learned by the agents, and show that learning is relatively robust even in the face of increasingly incomplete state information. These results suggest that approaches to decentralized control using multiagent RL have considerable promise.

In the following sections, we give some additional background on RL, introduce the elevator domain, describe in more detail the multiagent RL algorithm and network architecture we used, present and discuss our results, and finally draw some conclusions.

2. Reinforcement Learning

Machine learning researchers have focused primarily on supervised learning, where a "teacher" provides the learning system with a set of training examples in the form of input-output pairs. Supervised learning algorithms are useful in a wide variety of problems involving pattern classification and function approximation. However, there are many problems in which training examples are costly or even impossible [...]

Appendix: Chinese-English Translation

1. Introduction

Interest in developing capable learning systems is increasing within the multiagent and AI research communities (e.g., Weiss & [...]
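(2) The traction system was designed, including: power calculation and selection of the traction machine; selection of the coupling; selection of the reducer; calculation and selection of the brake; design of the traction sheave and guide sheave; and selection and checking of the wire ropes.

When it has rotated far enough for the rope-pressing tongue in the eccentric fork to contact the governor rope, clamp block 4, by the self-locking principle, brings the governor to a stop, which in turn actuates the safety gear and clamps the car onto the guide rails.

The centrifugal force acting on the flyweights therefore increases accordingly, causing the flyweights to rotate about their pins and their centers to move outward.

Pendulum-type governors are generally used for lower-speed elevators.

The higher the rotation speed, the larger the swing amplitude of the pendulum.

While the car is running, it pulls the governor rope through the governor rope hitch, so that the governor sheave and the shaft connected to it rotate synchronously.

Design of the speed governor

By operating principle, speed governors can be divided into two types: pendulum and centrifugal.

If the speed of the counterweight safety gear exceeds [...], progressive safety gear should also be used.

The standard requires the average deceleration of the car during stopping to lie between [...] and [...] (g: gravitational acceleration), so the car must travel a certain stopping distance when the safety gear operates.

By structure and working principle, safety gears can be divided into instantaneous safety gears and progressive safety gears.

Figure 5-2 shows the spring buffer adopted in this design.

When the rated speed of the elevator is very low (for example, below [...]), the buffers beneath the car and the counterweight may be replaced by solid buffer blocks made of rubber, wood, or another suitably elastic material.

Buffers are divided into energy-accumulation buffers and energy-dissipation buffers.

Two horizontal infrared beams act like two rows of invisible "railings" across the full width of the door opening; if a person or object interrupts either beam during door travel, the door reopens.

This elevator uses a photoelectric protection device.

A protective device is also provided: when a passenger is struck, or is about to be struck, by the door while it is closing, the device stops the closing motion and automatically reopens the door.

First, the side of the door panel facing the passengers must be smooth, with no recesses or projections larger than 3 mm that could catch people or clothing.

Figure 5-1 Elevator door mechanism. 1: variable-frequency motor

Motor 1 drives pulley 2; a gear coaxial with the pulley drives synchronous belt 3, so that door vane 5, which is attached to the synchronous belt, moves horizontally, and door vane 5 in turn drives the landing door.

At present, passenger elevators mostly use variable-frequency door operators.

According to service requirements, the average closing speed is generally lower than the average opening speed, which prevents people from being caught when the door closes; passenger elevator doors are also fitted with safety edges.

It is installed above the car door and at its connection with the car door.

Unless there are special requirements, automatic door operators are generally used; this design adopts an automatic variable-frequency door operator.

Based on the customer's functional requirements and the design requirements, we chose a center-opening door.

Determining the main parameters of the elevator door

(1) Choice of door type

Elevator doors fall into two main types, sliding doors and swing doors; sliding doors are now in general use.

To transfer the motion of the car door to the landing door, the car door carries a coupling device (such as a door vane); by engaging with the landing door lock, the door vane allows the car door to drive the landing door.

The car door is therefore called the driving door and the landing door the driven door.

Ordinary elevators are fitted with automatically opened landing doors that are driven by the car door, and the landing doors carry door locks with electrical and mechanical interlocks.

(3) The car door, the landing door, and their relationship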
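The car door is the door at the car entrance, on the side of the car adjacent to the landing door, and serves the entry and exit of the operator, passengers, and goods.

[...] is a very important elevator safety device; according to incomplete statistics, about 70% of the personal-injury accidents involving elevators are caused by landing door quality, improper use, and similar factors.

Chapter 5 Door System, Buffers, Safety Gear, and Speed Governor

Design of the door system

The elevator door system and its function

(1) Function of the door system

Both the landing doors and the car door serve to prevent people and objects from falling into the hoistway, and to prevent passengers and objects in the car from colliding with the hoistway; both are important safety protection devices of the elevator.

The main specification parameters of a T-section guide rail are the base width b, the height h, and the working-surface thickness k.

For guide rails used at speeds below [...], the rail surface is generally not machined.

Ordinary steel guide rails are usually produced by machining or by cold rolling.

The guide rails of an elevator therefore comprise two sets: the car guide rails and the counterweight guide rails.

Design of the guide rails

The function of the guide rails is to restrict the degrees of freedom of the car and the counterweight so that each moves up and down only along its own rails, keeping both steady in operation without swaying. With guide rails, the car can only travel up and down along the vertical rails on its left and right sides.

Table 4-2 Rated load and maximum available car area

Rated load  Max. area   Rated load  Max. area   Rated load  Max. area   Rated load  Max. area
(kg)        (m²)        (kg)        (m²)        (kg)        (m²)        (kg)        (m²)
100         ...         525         ...         900         ...         1275        ...
180         ...         600         ...         975         ...         1350        ...
225         ...         630         ...         1000        ...         1425        ...
300         ...         675         ...         1050        ...         1500        ...
375         ...         750         ...         1125        ...         1600        ...
400         ...         800         ...         1200        ...         2000        ...
450         ...         825         ...         1250        ...         2500        ...

The available car area of the elevator we designed is [...] = [...], which is greater than the minimum value [...] and less than the maximum value [...].

To reduce the load difference carried by the traction sheave in the elevator drive and to improve the traction performance of the elevator, a compensation device must be used.

For example, an elevator in a 60 m high building uses six Φ13 mm wire ropes, whose total weight of about 360 kg cannot be neglected.

In particular, when the travel height of the elevator exceeds 30 m, the balance between the two sides varies even more, so a balance compensation device must be added to reduce this variation.

Solution: given G = 1000 kg, Q = 1000 kg, and K = 0.45, substituting into [...]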