[Main text]
ecision floating-point value in xmm2/m32.
DEST[31:0] ← APPROX(1.0/(SRC[31:0]))
DEST[95:64] ← APPROXIMATE(1.0/(SRC[95:64]))

PAVGB/PAVGW: Packed Average
Opcode | Instruction | Description
0F E0 /r | PAVGB mm1, mm2/m64 | Average packed unsigned byte integers from mm2/m64 and mm1, with rounding.
66 0F E0 /r | PAVGB xmm1, xmm2/m128 | Average packed unsigned byte integers from xmm2/m128 and xmm1, with rounding.
0F E3 /r | PAVGW mm1, mm2/m64 | Average packed unsigned word integers from mm2/m64 and mm1, with rounding.
66 0F E3 /r | PAVGW xmm1, xmm2/m128 | Average packed unsigned word integers from xmm2/m128 and xmm1, with rounding.

PMAXSW: Packed Signed Integer Word Maximum
Opcode | Instruction | Description
0F EE /r | PMAXSW mm1, mm2/m64 | Compare signed word integers in mm2/m64 and mm1 for maximum values.
66 0F EE /r | PMAXSW xmm1, xmm2/m128 | Compare signed word integers in xmm2/m128 and xmm1 for maximum values.

PMAXUB: Packed Unsigned Integer Byte Maximum
Opcode | Instruction | Description
0F DE /r | PMAXUB mm1, mm2/m64 | Compare unsigned byte integers in mm2/m64 and mm1 for maximum values.
66 0F DE /r | PMAXUB xmm1, xmm2/m128 | Compare unsigned byte integers in xmm2/m128 and xmm1 for maximum values.

PMINSW: Packed Signed Integer Word Minimum
Opcode | Instruction | Description
0F EA /r | PMINSW mm1, mm2/m64 | Compare signed word integers in mm2/m64 and mm1 for minimum values.
66 0F EA /r | PMINSW xmm1, xmm2/m128 | Compare signed word integers in xmm2/m128 and xmm1 for minimum values.

PMINUB: Packed Unsigned Integer Byte Minimum
Opcode | Instruction | Description
0F DA /r | PMINUB mm1, mm2/m64 | Compare unsigned byte integers in mm2/m64 and mm1 for minimum values.
66 0F DA /r | PMINUB xmm1, xmm2/m128 | Compare unsigned byte integers in xmm2/m128 and xmm1 for minimum values.

RCPPS: Packed Single-Precision Floating-Point Reciprocal
Opcode | Instruction | Description
0F 53 /r | RCPPS xmm1, xmm2/m128 | Returns to xmm1 the packed approximations of the reciprocals of the packed single-precision floating-point values in xmm2/m128.
DEST[31:0] ← APPROXIMATE(1.0/(SRC[31:0]))

DIVSS: Scalar Single-Precision Floating-Point Divide
DIVSS xmm0, xmm1/m32
DEST[31:0] ← DEST[31:0] / SRC[31:0]

DIVSD: Scalar Double-Precision Floating-Point Divide
DIVSD xmm0, xmm1/m64
DEST[63:0] ← DEST[63:0] / SRC[63:0]
DEST[95:64] ← DEST[95:64] / (SRC[95:64])

DIVPS: Packed Single-Precision Floating-Point Divide
DIVPS xmm0, xmm1/m128
DEST[31:0] ← DEST[31:0] / (SRC[31:0])

DIVPD: Packed Double-Precision Floating-Point Divide
DIVPD xmm0, xmm1/m128
DEST[63:0] ← DEST[63:0] / (SRC[63:0])

MULSS: Scalar Single-FP Multiply
Opcode | Instruction | Description
F3 0F 59 /r | MULSS xmm1, xmm2/m32 | Multiply the low single-precision floating-point value in xmm2/mem by the low single-precision floating-point value in xmm1.
DEST[31:0] ← DEST[31:0] * SRC[31:0]

MULSD: Scalar Double-Precision Floating-Point Multiply
Opcode | Instruction | Description
F2 0F 59 /r | MULSD xmm1, xmm2/m64 | Multiply the low double-precision floating-point value in xmm2/mem64 by the low double-precision floating-point value in xmm1.
DEST[63:0] ← DEST[63:0] * xmm2/m64[63:0]
DEST[95:64] ← DEST[95:64] * SRC[95:64]

MULPS: Packed Single-Precision Floating-Point Multiply
Opcode | Instruction | Description
0F 59 /r | MULPS xmm1, xmm2/m128 | Multiply packed single-precision floating-point values in xmm2/mem by xmm1.
DEST[31:0] ← DEST[31:0] * SRC[31:0]

MULPD: Packed Double-Precision Floating-Point Multiply
Opcode | Instruction | Description
66 0F 59 /r | MULPD xmm1, xmm2/m128 | Multiply packed double-precision floating-point values in xmm2/m128 by xmm1.
DEST[63:0] ← DEST[63:0] * SRC[63:0]

PMULUDQ instruction with 128-bit operands:
DEST[63:0] ← DEST[31:0] * SRC[31:0]

differences are then summed to produce an unsigned word integer result.
66 0F F6 /r | PSADBW xmm1, xmm2/m128 | Absolute difference of packed unsigned byte integers from xmm2/m128 and xmm1

Appendix: SSE2 instruction summary

Arithmetic instructions:

ADDPD: Packed Double-Precision Floating-Point Add (SSE2). Adds the 2 corresponding doubles.
ADDPD xmm0, xmm1/m128

ADDPS: Packed Single-Precision Floating-Point Add (SSE). Adds the 4 corresponding floats.
ADDPS xmm0, xmm1/m128

ADDSD: Scalar Double-Precision Floating-Point Add (SSE2). Adds the 1 corresponding double (low element).
ADDSD xmm0, xmm1/m64

ADDSS: Scalar Single-Precision Floating-Point Add (SSE). Adds the 1 corresponding float (low element).
ADDSS xmm0, xmm1/m32

PADDB/PADDW/PADDD: Packed Add
Opcode | Instruction | Description
0F FC /r | PADDB mm, mm/m64 | Add packed byte integers from mm/m64 and mm.
66 0F FC /r | PADDB xmm1, xmm2/m128 | Add packed byte integers from xmm2/m128 and xmm1.
0F FD /r | PADDW mm, mm/m64 | Add packed word integers from mm/m64 and mm.
66 0F FD /r | PADDW xmm1, xmm2/m128 | Add packed word integers from xmm2/m128 and xmm1.
0F FE /r | PADDD mm, mm/m64 | Add packed doubleword integers from mm/m64 and mm.
66 0F FE /r | PADDD xmm1, xmm2/m128 | Add packed doubleword integers from xmm2/m128 and xmm1.

PADDQ: Packed Quadword Add
Opcode | Instruction | Description
0F D4 /r | PADDQ mm1, mm2/m64 | Add quadword integer mm2/m64 to mm1.
66 0F D4 /r | PADDQ xmm1, xmm2/m128 | Add packed quadword integers xmm2/m128 to xmm1.

PADDSB/PADDSW: Packed Add with Saturation
Opcode | Instruction | Description
0F EC /r | PADDSB mm, mm/m64 | Add packed signed byte integers from mm/m64 and mm and saturate the results.
66 0F EC /r | PADDSB xmm1,

2. When 16 bytes of data are read into a register, the image address is not guaranteed to be 16-byte aligned, so the MOVDQU instruction must be used, which costs more clock cycles (6 or more); if the address can be made 16-byte aligned, the MOVDQA instruction can be used instead (1 clock