[Main text]
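Given two parent configurations p1 = {R11, R12, ..., R1k} and p2 = {R21, R22, ..., R2k}, the partial configuration will be the set {R1, R2, ...
This process is repeated until all nodes (i.e., their corresponding variables) have been mapped to one of the register classes. Nodes are sorted in decreasing order of spill degree. The spill degree of a node i can be defined by one of the three equations below:

$$
\begin{aligned}
\mathrm{SDegree}_1(i) &= \mathrm{SCost}(i) \times \mathrm{Degree}(i) \\
\mathrm{SDegree}_2(i) &= \mathrm{SCost}(i) \times \mathrm{Degree}(i)^2 \\
\mathrm{SDegree}_3(i) &= \mathrm{SCost}(i)
\end{aligned}
\tag{1}
$$

In these expressions, SCost(i) is the spill cost of node i and Degree(i) is the number of edges incident to node i. In the register allocation problem, both the spill cost and the degree of a node in the given interference graph are taken into account. (Nodes i and j of the interference graph are said to be conflict-free when no edge connects them.)

B. Initial Population Generation

In general, the DSatur algorithm [5], a graph coloring heuristic, is used to generate the initial population. Each solution pi partitions the variables into register classes, pi = {R1, R2, ..., Rk}, where each class Ri contains the live ranges of the variables mapped to register ri, and k is the total number of registers. For two configurations p1 and p2, the distance between p1 and p2 is the minimum number of elementary transformations needed to transform p1 into p2. The process is repeated as long as the iteration count is less than or equal to a preset number MaxIter or the diversity of the population (popudiversity) is greater than zero. The LS operator is applied to improve p for a fixed number L of iterations (local_search). In each generation, two configurations p1 and p2 are selected from the population (select_parents). The general algorithm is as follows: Input: interference graph, IG = (V, E).

A.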
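Algorithm HEA is composed of a genetic component and local search (LS). We propose a hybrid evolutionary algorithm for the graph coloring register allocation problem, based on a new crossover operator and a new local search function. HEA rests on two elements: an efficient local search (LS) operator and a highly specialized crossover operator. Evolutionary algorithms combined with specialized operators yield sophisticated hybrid systems known as hybrid genetic algorithms, hybrid evolutionary algorithms, genetic local search algorithms, and memetic algorithms. There are other approaches as well, such as genetic programming and ant colony optimization. Simple evolutionary algorithms are generally not efficient for complex combinatorial problems. The genetic algorithm is a classical method in this category. An important feature of this algorithm is a specialized crossover, while the mutation operator of the GA is replaced by an LS operator. Mahajan [16] proposed a new hybrid evolutionary algorithm (HGR) for embedded systems, with a highly specialized, domain-specific crossover and a local search function, as the most effective algorithm for the graph coloring register allocation problem. Scholz [18] formulated register allocation as a partitioned Boolean quadratic optimization problem, which allows generic modeling of processor peculiarities. Some researchers have tried non-graph-coloring approaches. VPO addresses the problem of heterogeneous register classes by computing and propagating the correct reduced register class along the IR tree (the register class obtained by intersecting the classes of the destination/source operands of two instructions), which allows registers to be allocated without inserting additional move operations. [17] proposed a generalization of the degree-k test, called the p, q test, to handle irregular register sets and register classes. The algorithms and heuristics have been adapted in our infrastructure to limit spilling when few registers are available. George and Appel [11] proposed iterated coalescing, which eliminates aggressive coalescing entirely. He introduced an improved coloring strategy that produces better allocations for many graphs on which Chaitin's method fails. Although [13] proposed an integer-programming approach that supports irregular register sets, it requires register constraints to be modeled as inequalities, which makes it difficult to implement in an industrial compiler.

II.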
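Related Work

Register allocation has been widely discussed in the literature, and many approaches have been proposed. In the context of embedded processors, the most important limitation of the graph coloring approach is that it assumes a homogeneous register set. Spilling one or more live ranges creates a new and different interference graph. Because graph coloring is NP-complete, compilers use heuristic methods to search for a coloring; these are not guaranteed to find a k-coloring for every k-colorable graph. To find an allocation from G, the compiler looks for a k-coloring of G, that is, an assignment of k colors to the nodes of G such that adjacent nodes always receive different colors. To model register allocation as a graph coloring problem, the compiler first constructs an interference graph G. The nodes of G correspond to live ranges, and the edges represent interferences. Graph coloring abstracts the problem of assigning registers to the live ranges of a program into the problem of assigning colors to the nodes of an interference graph. Compilers for embedded processors must cope with these architectural optimizations and be able to exploit them. For critical applications, especially in embedded computing, industrial compilers are willing to accept longer compilation times if the final code is improved. Typically, this degrades run-time performance and increases power consumption. Of course, this is not always possible, so some variables must be transferred to and from memory (spilling). Its goal is to find a way to map the temporary variables used in a program to physical memory locations (either main memory or machine registers). We propose a hybrid evolutionary algorithm for the graph coloring register allocation problem, based on a new crossover operator called crossover by conflict-free sets (CCS) and a new local search function. Compared with memory, registers are much faster to access, but they are a scarce resource and must be used very efficiently.

Ali, Prof. Ram Meghe Institute of Technology and Research, Badnera, Amravati, India. Email:

Abstract: There is a growing demand for optimizing compilers for embedded systems that produce high-quality code with a limited set of general-purpose registers.

other parts of the framework are not changed. We used the x86 architecture, with its limited register file and register usage constraints. Our machine is a GHz Pentium 4 with 1 GB of RAM. We applied the algorithms to 6 embedded and real-time applications. For each application, the population size is set to 20 and the number of LS iterations is set from 500 to 2021. For each application, we run the allocator five times and average the results. Performance was evaluated with respect to the following parameters: the number of memory accesses required, spill loads, spill costs, the compile time needed by the allocator, the execution time of the generated code (including the number of loads/stores generated), and the size of the allocator itself.

Figure 1.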
Number of memory accesses
TABLE I. SPILL COST OF INSTRUCTION
TABLE II. RATIO OF THE SPILL LOADS PRODUCED

The comparison of the number of memory accesses is a comparison of the total static number of load and store instructions inserted by each register allocator. Figure 1 compares the number of load/store instructions in the assembly code. The HEA inserts fewer memory access instructions than ORA, IRC, and GPX. Table I gives the spill costs of all the algorithms. The spill cost is lower for the smallest population size. For each benchmark, the spill cost of each variable is set to the number of occurrences of that variable. The spill costs of the variables are average values. The IRC algorithm gives the highest total spill cost. The HEA algorithm produces a lower spill cost than the maximum in four tests, and it outperforms the IRC and GPX algorithms in all tests. The genetic operator systematically eliminates low-quality solutions from the population, preserves diversity between solutions, and provides better input for local search. A small amount of the spill cost is due to function callers and callees saving register contents in order to preserve correct program semantics. Spill loads are the additional loads incurred by the allocation algorithm. They indicate how well the allocator performs its task, and their number is highly correlated with application running time. We calculate the dynamic number of spill loads added to each module of the program by multiplying the number of spill loads added to each block by the number of times that block is executed, and then summing these dynamic spill loads over the blocks. We obtain the dynamic number of spill loads for the entire program by summing the dynamic spill loads added to each module.
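The dynamic spill-load calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout (block name mapped to a count), the function names `module_dynamic_spill_loads` and `program_dynamic_spill_loads`, and the sample numbers are all hypothetical and do not come from the paper's measurements or tooling.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation: block name -> count. The paper does not
# specify a data layout; these dictionaries are an illustrative assumption.
BlockCounts = Dict[str, int]


def module_dynamic_spill_loads(static_loads: BlockCounts,
                               exec_counts: BlockCounts) -> int:
    """Sum over blocks of (spill loads added to block) x (times block runs)."""
    return sum(loads * exec_counts.get(block, 0)
               for block, loads in static_loads.items())


def program_dynamic_spill_loads(
        modules: Iterable[Tuple[BlockCounts, BlockCounts]]) -> int:
    """Sum the per-module dynamic spill loads over the whole program."""
    return sum(module_dynamic_spill_loads(static, counts)
               for static, counts in modules)


# Made-up numbers for illustration only (not measurements from the paper).
main_module = ({"entry": 1, "loop": 2}, {"entry": 1, "loop": 100})
util_module = ({"body": 3}, {"body": 10})
total = program_dynamic_spill_loads([main_module, util_module])
print(total)  # 1*1 + 2*100 + 3*10 = 231
```

In this sketch a block with no recorded execution count contributes zero, which is a design choice of the example rather than something the text prescribes.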
Table II shows the spill loads for each allocator as a ratio to the spill loads generated by the IRC allocator (taken as the base allocator for comparison). The numbers are given as geometric means. We see improvements of HEA over the other allocators. Table III gives results in terms of compile time and run time. We observe that in most cases the different allocators performed almost identically in terms of compile time.

number of registers, k
Output: best configuration
begin
    P = generate_population(|P|)
    iter = 0
    while (iter ≤ MaxIter or popudiversity > 0) do
        (p1, p2) = select_parents(P)
        p = crossover(p1, p2)
        p = local_search(p, L)
        P = update_population(P, p)
        iter = iter + 1
    endwhile
end

The algorithm first builds an initial population of configurations (gen