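PERFLAB Lab Report
【Objective】 Optimize the smooth function.
【Function overview】
Rotate: rotates the image 90° counterclockwise.
Smooth: applies a mean filter to the image using a 3×3 window.
【Program optimization】
Smooth:
1.1 Initial code: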
void naive_smooth(int dim, pixel *src, pixel *dst) {
    int i, j;
    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
}
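The code calls avg for every pixel, and avg in turn calls initialize_pixel_sum, accumulate_sum, and assign_sum_to_pixel each time. This function-call overhead should be reduced, so all of the work is written directly inside smooth and avg is no longer called.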
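As a minimal sketch of what inlining buys, the block below compares a generic, bounds-checked average with a hand-inlined expression for the top-left corner. The pixel layout (three unsigned short channels) and the names avg_red_generic and avg_red_corner00 are assumptions made for illustration; they are not part of the lab code.

```c
#include <assert.h>

/* Assumed pixel layout: three 16-bit channels. */
typedef struct { unsigned short red, green, blue; } pixel;

/* Generic version (hypothetical): clamps the window to the image on every
   call and divides by a neighbor count that is only known at run time. */
int avg_red_generic(int dim, int i, int j, const pixel *src)
{
    int r0 = i > 0 ? i - 1 : 0, r1 = i < dim - 1 ? i + 1 : dim - 1;
    int c0 = j > 0 ? j - 1 : 0, c1 = j < dim - 1 ? j + 1 : dim - 1;
    int sum = 0, num = 0;
    for (int r = r0; r <= r1; r++)
        for (int c = c0; c <= c1; c++) {
            sum += src[r * dim + c].red;
            num++;
        }
    return sum / num;
}

/* Inlined version (hypothetical) for the top-left corner only: the four
   neighbors are known in advance, so there is no loop, no clamping, and
   the division by 4 becomes a right shift by 2. */
int avg_red_corner00(int dim, const pixel *src)
{
    assert(dim >= 2);
    return (src[0].red + src[1].red + src[dim].red + src[dim + 1].red) >> 2;
}
```

Both functions return the same value for pixel (0, 0); the second simply fixes at compile time everything the first works out at run time.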
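The pixels are split into three groups that are handled separately: the four corners, the four edges, and the interior. A corner pixel needs the mean of only 4 pixels, an edge pixel the mean of 6 pixels, and an interior pixel the mean of 9 pixels.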
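The 4/6/9 counts follow from how much of the 3×3 window falls inside the image. A small sketch, using a hypothetical helper named window_count that is not part of the lab code:

```c
#include <assert.h>

/* Number of pixels of the 3x3 window centered at (i, j) that lie inside a
   dim x dim image: 4 at a corner, 6 on an edge, 9 in the interior. */
int window_count(int dim, int i, int j)
{
    assert(dim >= 2 && i >= 0 && i < dim && j >= 0 && j < dim);
    int rows = 3 - (i == 0) - (i == dim - 1);  /* rows of the window inside the image */
    int cols = 3 - (j == 0) - (j == dim - 1);  /* columns of the window inside the image */
    return rows * cols;
}
```

Because the count depends only on the region, each region can use a constant divisor, and the corners can use a shift since their divisor is 4.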
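As the figure shows, for two horizontally adjacent interior pixels A and B, six of the pixels in their filter windows overlap (green). B can therefore reuse the sum obtained when processing A: add the three pixels to the right of B (blue) and subtract the three pixels that belong only to A's window (red). The number of values needed per pixel drops from 9 to 7 (the previous sum plus six pixels).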
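The sliding-window idea can be sketched on a single channel. The function below is an illustration under simplifying assumptions (one unsigned short value per pixel, row-major storage, a hypothetical name smooth_row_interior); it is not the report's code, which does the same thing for red, green, and blue.

```c
#include <assert.h>

/* Smooth the interior pixels of row i (1 <= i <= dim - 2) of a
   single-channel dim x dim image, keeping a running 3x3 window sum. */
void smooth_row_interior(int dim, int i, const unsigned short *src,
                         unsigned short *dst)
{
    assert(dim >= 3 && i >= 1 && i <= dim - 2);
    int row = i * dim;
    int sum = 0;

    /* Full 9-pixel sum for the first interior pixel (column 1). */
    for (int r = -1; r <= 1; r++)
        for (int c = 0; c <= 2; c++)
            sum += src[row + r * dim + c];
    dst[row + 1] = (unsigned short)(sum / 9);

    for (int j = 2; j < dim - 1; j++) {
        int curr = row + j;
        /* Drop the column that left the window (j - 2) ... */
        sum -= src[curr - dim - 2] + src[curr - 2] + src[curr + dim - 2];
        /* ... and add the column that entered it (j + 1). */
        sum += src[curr - dim + 1] + src[curr + 1] + src[curr + dim + 1];
        dst[curr] = (unsigned short)(sum / 9);
    }
}
```

For a 4×4 image holding the values 0 to 15 in row-major order, calling the function with i = 1 writes 5 and 6 to the two interior pixels of that row.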
1.2 Modified code:
void smooth(int dim, pixel *src, pixel *dst) {
    int i, j, lastr, lastb, lastg;
    int row = dim;
    int curr;

    // Four corners: use >> instead of / to save time
    dst[0].red=(src[0].red+src[1].red+src[dim].red+src[dim+1].red)>>2;
    dst[0].blue=(src[0].blue+src[1].blue+src[dim].blue+src[dim+1].blue)>>2;
    dst[0].green=(src[0].green+src[1].green+src[dim].green+src[dim+1].green)>>2;
    dst[dim-1].red=(src[dim-1].red+src[dim-2].red+src[dim*2-2].red+src[dim*2-1].red)>>2;
    dst[dim-1].blue=(src[dim-1].blue+src[dim-2].blue+src[dim*2-2].blue+src[dim*2-1].blue)>>2;
    dst[dim-1].green=(src[dim-1].green+src[dim-2].green+src[dim*2-2].green+src[dim*2-1].green)>>2;
    dst[(dim-1)*dim].red=(src[(dim-1)*dim].red+src[(dim-1)*dim+1].red+src[(dim-2)*dim].red+src[(dim-2)*dim+1].red)>>2;
    dst[(dim-1)*dim].blue=(src[(dim-1)*dim].blue+src[(dim-1)*dim+1].blue+src[(dim-2)*dim].blue+src[(dim-2)*dim+1].blue)>>2;
    dst[(dim-1)*dim].green=(src[(dim-1)*dim].green+src[(dim-1)*dim+1].green+src[(dim-2)*dim].green+src[(dim-2)*dim+1].green)>>2;
    dst[dim*dim-1].red=(src[dim*dim-1].red+src[dim*dim-2].red+src[(dim-1)*dim-1].red+src[(dim-1)*dim-2].red)>>2;
    dst[dim*dim-1].blue=(src[dim*dim-1].blue+src[dim*dim-2].blue+src[(dim-1)*dim-1].blue+src[(dim-1)*dim-2].blue)>>2;
    dst[dim*dim-1].green=(src[dim*dim-1].green+src[dim*dim-2].green+src[(dim-1)*dim-1].green+src[(dim-1)*dim-2].green)>>2;

    // Four edges: top, bottom, left, right
    for (j=1; j < dim-1; j++) {
        dst[j].red=(src[j].red+src[j-1].red+src[j+1].red+src[j+dim].red+src[j+1+dim].red+src[j-1+dim].red)/6;
        dst[j].green=(src[j].green+src[j-1].green+src[j+1].green+src[j+dim].green+src[j+1+dim].green+src[j-1+dim].green)/6;
        dst[j].blue=(src[j].blue+src[j-1].blue+src[j+1].blue+src[j+dim].blue+src[j+1+dim].blue+src[j-1+dim].blue)/6;
    }
    for (j=dim*(dim-1)+1; j < dim*dim-1; j++) {
        dst[j].red=(src[j].red+src[j-1].red+src[j+1].red+src[j-dim].red+src[j+1-dim].red+src[j-1-dim].red)/6;
        dst[j].green=(src[j].green+src[j-1].green+src[j+1].green+src[j-dim].green+src[j+1-dim].green+src[j-1-dim].green)/6;
        dst[j].blue=(src[j].blue+src[j-1].blue+src[j+1].blue+src[j-dim].blue+src[j+1-dim].blue+src[j-1-dim].blue)/6;
    }
    for (i=dim; i < dim*(dim-1); i+=dim) {
        dst[i].red=(src[i].red+src[i-dim].red+src[i+1].red+src[i+dim].red+src[i+1+dim].red+src[i-dim+1].red)/6;
        dst[i].green=(src[i].green+src[i-dim].green+src[i+1].green+src[i+dim].green+src[i+1+dim].green+src[i-dim+1].green)/6;
        dst[i].blue=(src[i].blue+src[i-dim].blue+src[i+1].blue+src[i+dim].blue+src[i+1+dim].blue+src[i-dim+1].blue)/6;
    }
    for (i=dim+dim-1; i < dim*dim-1; i+=dim) {
        dst[i].red=(src[i].red+src[i-1].red+src[i-dim].red+src[i+dim].red+src[i-dim-1].red+src[i-1+dim].red)/6;
        dst[i].green=(src[i].green+src[i-1].green+src[i-dim].green+src[i+dim].green+src[i-dim-1].green+src[i-1+dim].green)/6;
        dst[i].blue=(src[i].blue+src[i-1].blue+src[i-dim].blue+src[i+dim].blue+src[i-dim-1].blue+src[i-1+dim].blue)/6;
    }

    // Image interior
    for (i=1; i < dim-1; i++) {
        // First interior pixel of each row. Time spent inside the loop weighs
        // heavily on total run time, so the pixel terms in lastr, lastb and
        // lastg are ordered for efficient evaluation.
        lastr=src[row-dim].red+src[row-dim+1].red+src[row-dim+2].red+src[row].red+src[row+1].red+src[row+2].red+src[row+dim].red+src[row+dim+1].red+src[row+dim+2].red;
        lastb=src[row-dim].blue+src[row-dim+1].blue+src[row-dim+2].blue+src[row].blue+src[row+1].blue+src[row+2].blue+src[row+dim].blue+src[row+dim+1].blue+src[row+dim+2].blue;
        lastg=src[row-dim].green+src[row-dim+1].green+src[row-dim+2].green+src[row].green+src[row+1].green+src[row+2].green+src[row+dim].green+src[row+dim+1].green+src[row+dim+2].green;
        dst[row+1].red=lastr/9;
        dst[row+1].blue=lastb/9;
        dst[row+1].green=lastg/9;
        for (j=2; j < dim-1; j++) {
            curr=row+j;
            // Other interior pixels reuse the sums from the previous pixel in the row
            lastr=lastr-src[curr-dim-2].red+src[curr-dim+1].red-src[curr-2].red+src[curr+1].red-src[curr+dim-2].red+src[curr+dim+1].red;
            lastb=lastb-src[curr-dim-2].blue+src[curr-dim+1].blue-src[curr-2].blue+src[curr+1].blue-src[curr+dim-2].blue+src[curr+dim+1].blue;
            lastg=lastg-src[curr-dim-2].green+src[curr-dim+1].green-src[curr-2].green+src[curr+1].green-src[curr+dim-2].green+src[curr+dim+1].green;
            dst[curr].red=lastr/9;
            dst[curr].blue=lastb/9;
            dst[curr].green=lastg/9;
        }
        row+=dim;
    }
}
2.3 Benchmark score:
2.4 Problems encountered:
At first, the loop that processes the image interior was written differently. The filtered mean of the first pixel in each row (taking red as the example) was
dst[row+1].red=(src[row-dim].red+src[row-dim+1].red+src[row-dim+2].red+src[row].red+src[row+1].red+src[row+2].red+src[row+dim].red+src[row+dim+1].red+src[row+dim+2].red)/9,
followed by lastr=dst[row+1].red, so lastr held the mean of the nine pixels around that point. Every other pixel in the row was then computed as
dst[curr].red=lastr+(-src[curr-dim-2].red+src[curr-dim+1].red-src[curr-2].red+src[curr+1].red-src[curr+dim-2].red+src[curr+dim+1].red)/9,
followed by lastr=dst[curr].red.
This version failed the correctness check when run. The computed RGB values were close to the expected ones, so the difference probably came from limited precision during the computation. Because lastr held the mean of the previous pixel, each integer division discarded a remainder, and adding these truncated terms produced results that differ from dividing the full sum once. The code was therefore changed to the current version, in which lastr holds the sum of the pixels needed to filter the previous point.
For the first pixel of each row:
lastr=src[row-dim].red+src[row-dim+1].red+src[row-dim+2].red+src[row].red+src[row+1].red+src[row+2].red+src[row+dim].red+src[row+dim+1].red+src[row+dim+2].red, then dst[row+1].red=lastr/9.
For the other pixels in the row:
lastr=lastr-src[curr-dim-2].red+src[curr-dim+1].red-src[curr-2].red+src[curr+1].red-src[curr+dim-2].red+src[curr+dim+1].red, then dst[curr].red=lastr/9.
After this change the program ran correctly, which confirms the guess.
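The effect can be reproduced on a small 1-D analogue with a 3-wide window. This is a sketch under simplifying assumptions (plain int samples, hypothetical function names running_from_sum and running_from_mean), not the report's code: carrying the sum and dividing once per pixel gives the exact truncated mean, while carrying the mean and adding a divided difference loses the remainders.

```c
#include <assert.h>

/* out[k] = (a[k] + a[k+1] + a[k+2]) / 3, carrying the window SUM. */
void running_from_sum(const int *a, int n, int *out)
{
    assert(n >= 3);
    int sum = a[0] + a[1] + a[2];
    out[0] = sum / 3;
    for (int k = 1; k + 2 < n; k++) {
        sum += a[k + 2] - a[k - 1];   /* exact integer update */
        out[k] = sum / 3;             /* one division per output */
    }
}

/* Same intent, but carrying the window MEAN: each update divides the
   difference separately, so remainders are truncated and lost. */
void running_from_mean(const int *a, int n, int *out)
{
    assert(n >= 3);
    int mean = (a[0] + a[1] + a[2]) / 3;
    out[0] = mean;
    for (int k = 1; k + 2 < n; k++) {
        mean += (a[k + 2] - a[k - 1]) / 3;  /* truncates toward zero */
        out[k] = mean;
    }
}
```

With the input {1, 1, 2, 3, 3}, the sum-carrying version yields 1, 2, 2, which matches computing each window directly, while the mean-carrying version yields 1, 1, 1.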