

Bicubic Interpolation and Its Optimization

2024-08-28 04:18


[...] d2  bitwise AND: ANDPS xmm0, xmm1/m128

ORPD: Bitwise Logical OR of Double-Precision Floating-Point Values
  Opcode       Instruction            Description
  66 0F 56 /r  ORPD xmm1, xmm2/m128   Bitwise OR of xmm2/m128 and xmm1.
  DEST[127-0] ← DEST[127-0] BitwiseOR SRC[127-0];

[...]
  r32[1] ← SRC[15];
  * repeat operation for bytes 2 through 6;
[...]
  DEST[1] ← SRC[127];
[...]
  DEST[31-4] ← 0000000H;

MOVHPD instruction for XMM to memory move:
  DEST ← SRC[127-64];

MOVDQA: Move Aligned Double Quadword
  Instruction              Description
  MOVDQA xmm1, xmm2/m128   Move aligned double quadword from xmm2/m128 to xmm1.
  MOVDQA xmm2/m128, xmm1   Move aligned double quadword from xmm1 to xmm2/m128.

MOVDQU: Move Unaligned Double Quadword
  Instruction              Description
  MOVDQU xmm1, xmm2/m128   Move unaligned double quadword from xmm2/m128 to xmm1.
  MOVDQU xmm2/m128, xmm1   Move unaligned double quadword from xmm1 to xmm2/m128.

MOVHLPS: Move Packed Single-Precision Floating-Point Values High to Low
  Instruction          Description
  MOVHLPS xmm1, xmm2   Move two packed single-precision floating-point values from high quadword of xmm2 to low quadword of xmm1.
  DEST[63-0] ← SRC[127-64];

RSQRTSS: Scalar Single-Precision Floating-Point Square Root Reciprocal
  Opcode       Instruction             Description
  F3 0F 52 /r  RSQRTSS xmm1, xmm2/m32  Returns to xmm1 an approximation of the reciprocal of the square root of the low single-precision floating-point value in xmm2/m32.
  DEST[31-0] ← APPROXIMATE(1.0/SQRT(SRC[31-0]));

RSQRTPS: Packed Single-Precision Floating-Point Square Root Reciprocal
  Opcode    Instruction              Description
  0F 52 /r  RSQRTPS xmm1, xmm2/m128  Returns to xmm1 the packed approximations of the reciprocals of the square roots of the packed single-precision floating-point values in xmm2/m128.
  DEST[31-0] ← APPROXIMATE(1.0/SQRT(SRC[31-0]));
  [...]
  DEST[95-64] ← APPROXIMATE(1.0/SQRT(SRC[95-64]));

DIVSS: Scalar Single-Precision Floating-Point Divide
  DIVSS xmm0, xmm1/m32
  DEST[31-0] ← DEST[31-0] / SRC[31-0];

[...]
  DEST[95-64] ← DEST[95-64] / (SRC[95-64]);

DIVPD: Packed Double-Precision Floating-Point Divide
  DIVPD xmm0, xmm1/m128
  DEST[63-0] ← DEST[63-0] / (SRC[63-0]);

MULSD: Scalar Double-Precision Floating-Point Multiply
  Opcode       Instruction           Description
  F2 0F 59 /r  MULSD xmm1, xmm2/m64  Multiply the low double-precision floating-point value in xmm2/mem64 by low double-precision floating-point value in xmm1.
  DEST[63-0] ← DEST[63-0] * xmm2/m64[63-0];

MULPS: Packed Single-Precision Floating-Point Multiply
  Opcode    Instruction            Description
  0F 59 /r  MULPS xmm1, xmm2/m128  Multiply packed single-precision floating-point values in xmm2/mem by xmm1.
  DEST[31-0] ← DEST[31-0] * SRC[31-0];

PMULUDQ instruction with 128-bit operands:
  DEST[63-0] ← DEST[31-0] * SRC[31-0];

Appendix: SSE2 instruction summary

Arithmetic instructions:

ADDPD: Packed Double-Precision Floating-Point Add (SSE2). Adds 2 doubles element-wise.
  ADDPD xmm0, xmm1/m128
ADDPS: Packed Single-Precision Floating-Point Add (SSE). Adds 4 floats element-wise.
  ADDPS xmm0, xmm1/m128
ADDSD: Scalar Double-Precision Floating-Point Add (SSE2). Adds 1 double (the low element).
  ADDSD xmm0, xmm1/m64
ADDSS: Scalar Single-Precision Floating-Point Add (SSE). Adds 1 float (the low element).
  ADDSS xmm0, xmm1/m32

PADDB/PADDW/PADDD: Packed Add
  Opcode       Instruction            Description
  0F FC /r     PADDB mm, mm/m64       Add packed byte integers from mm/m64 and mm.
  66 0F FC /r  PADDB xmm1, xmm2/m128  Add packed byte integers from xmm2/m128 and xmm1.
  0F FD /r     PADDW mm, mm/m64       Add packed word integers from mm/m64 and mm.
  66 0F FD /r  PADDW xmm1, xmm2/m128  Add packed word integers from xmm2/m128 and xmm1.
  0F FE /r     PADDD mm, mm/m64       Add packed doubleword integers from mm/m64 and mm.
  66 0F FE /r  PADDD xmm1, xmm2/m128  Add packed doubleword integers from xmm2/m128 and xmm1.

PADDQ: Packed Quadword Add
  Opcode       Instruction            Description
  0F D4 /r     PADDQ mm1, mm2/m64     Add quadword integer mm2/m64 to mm1.
  66 0F D4 /r  PADDQ xmm1, xmm2/m128  Add packed quadword integers xmm2/m128 to xmm1.

PADDSB/PADDSW: Packed Add with Saturation
  Opcode       Instruction        Description
  0F EC /r     PADDSB mm, mm/m64  Add packed signed byte integers from mm/m64 and mm and saturate the results.
  66 0F EC /r  PADDSB xmm1, [...]

[...] cpuid [...]

3. Algorithm optimization

Bicubic interpolation needs the 16 surrounding points to compute a single point, with as many as 20 multiplications and 15 additions. The computational load is therefore very high, and optimization is essential.

1. Fetch the 16 points P1, P2, ..., P16.
2. Using the interpolation kernel formula S(x), compute the kernel vectors Su and Sv for the x and y directions.
3. Carry out the matrix operation to obtain the interpolation result:
     iTemp1  = Su0 * P1 + Su1 * P5 + Su2 * P9  + Su3 * P13
     iTemp2  = Su0 * P2 + Su1 * P6 + Su2 * P10 + Su3 * P14
     iTemp3  = Su0 * P3 + Su1 * P7 + Su2 * P11 + Su3 * P15
     iTemp4  = Su0 * P4 + Su1 * P8 + Su2 * P12 + Su3 * P16
     iResult = Sv1 * iTemp1 + Sv2 * iTemp2 + Sv3 * iTemp3 + Sv4 * iTemp4
4. The interpolated image turned out to contain "burrs" (speckle artifacts), so the result is post-processed. Let pSrc be the pixel value of the point in the original image. If abs(iResult - pSrc) exceeds a given threshold, the interpolated value is considered likely to contaminate the image, and the original pixel value pSrc is used instead.

[...]
    __asm {
        mov eax, 1
        [...]
        mov g_bSSE2, 1
    NotSupport:
    }

CPUs that support SSE2 provide eight 128-bit registers, so a single register can hold 4 points (RGB), which lends itself to parallel computation.

[...]
3. To eliminate division and floating-point arithmetic, the weights are scaled up by a factor of 256. Each kernel coefficient then needs 2 bytes, whereas the image data are 1 byte each, so aligning the operands for multiplication wastes half of the SSE2 register space and lengthens the run time. If the kernel precision is instead reduced so that each coefficient fits in 1 byte, the accuracy of the result drops considerably.
4. We lack experience with, and knowledge of, the cycle counts of individual instructions and whether a given sequence of instructions can be pipelined in parallel.

[...] the 8 low differences and 8 high differences are then summed separately to produce two word integer results.

PSUBB/PSUBW/PSUBD: Packed Subtract
  Opcode       Instruction            Description
  0F F8 /r     PSUBB mm, mm/m64       Subtract packed byte integers in mm/m64 from packed byte integers in mm.
  66 0F F8 /r  PSUBB xmm1, xmm2/m128  Subtract packed byte integers in xmm2/m128 from packed byte integers in xmm1.
  0F F9 /r     PSUBW mm, mm/m64       Subtract packed word integers in mm/m64 from packed word integers in mm.
  66 0F F9 /r  PSUBW xmm1, xmm2/m128  Subtract packed word integers in xmm2/m128 from packed word integers in xmm1.
  0F FA /r     PSUBD mm, mm/m64       Subtract packed doubleword integers in mm/m64 from packed doubleword integers in mm.
  66 0F FA /r  PSUBD xmm1, xmm
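
As a rough illustration of the four steps in section 3 (the 16 neighbours, the kernel vectors Su and Sv, the matrix product, and the threshold post-processing), here is a scalar C sketch that uses weights scaled by 256, as the text describes. The excerpt does not give the kernel formula S(x), so the sketch assumes the common cubic convolution kernel with a = -0.5. The function names (`cubic_kernel`, `kernel_weights_q8`, `bicubic_q8`, `despeckle`) and the `threshold` parameter are hypothetical and are not the author's code. It is a plain scalar version, not the SSE2 implementation.

```c
#include <stdlib.h>  /* abs */

/* ASSUMPTION: cubic convolution kernel with a = -0.5; the source does not
   state which S(x) it uses. */
static double cubic_kernel(double x)
{
    const double a = -0.5;
    if (x < 0.0) x = -x;
    if (x <= 1.0) return (a + 2.0) * x * x * x - (a + 3.0) * x * x + 1.0;
    if (x < 2.0)  return a * x * x * x - 5.0 * a * x * x + 8.0 * a * x - 4.0 * a;
    return 0.0;
}

/* Weights for the four taps at offsets -1, 0, +1, +2 from the base pixel,
   scaled by 256 so that later arithmetic is integer only.
   frac is the fractional position in [0, 1). */
static void kernel_weights_q8(double frac, int w[4])
{
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        double v = 256.0 * cubic_kernel(frac - (double)(i - 1));
        w[i] = (int)(v >= 0.0 ? v + 0.5 : v - 0.5);  /* round to nearest */
        sum += w[i];
    }
    w[1] += 256 - sum;  /* absorb rounding error so the weights sum to 256 */
}

/* p[0..15] holds P1..P16 in the order used by the formulas in the text:
   iTemp(j+1) = su[0]*P(j+1) + su[1]*P(j+5) + su[2]*P(j+9) + su[3]*P(j+13),
   result     = sum over j of sv[j] * iTemp(j+1). */
static int bicubic_q8(const unsigned char p[16], const int su[4], const int sv[4])
{
    int acc = 0;
    for (int j = 0; j < 4; ++j) {
        int tmp = su[0] * p[j] + su[1] * p[j + 4]
                + su[2] * p[j + 8] + su[3] * p[j + 12];
        acc += sv[j] * tmp;
    }
    /* Both weight vectors carry a factor of 256, so divide by 256 * 256. */
    int v = (acc + 32768) / 65536;
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    return v;
}

/* Step 4: fall back to the source pixel when the interpolated value
   deviates from it by more than the threshold. */
static int despeckle(int result, int src, int threshold)
{
    return abs(result - src) > threshold ? src : result;
}
```

With a fractional offset of 0 in both directions the weights collapse to {0, 256, 0, 0}, so `bicubic_q8` simply returns P6, which is a quick sanity check for the index order.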