In this post I'm going to implement the Canny edge detector, an algorithm developed by John Canny.

I will provide a reference CPU implementation and two CUDA implementations:

  • The Canny edge detector CPU approach.
  • The Canny edge detector GPU 1D approach.
  • The Canny edge detector GPU 2D approach.

The Canny edge detection algorithm can be broken down into 5 steps:

  • Apply Gaussian filter to smooth the image in order to remove the noise.
  • Find the intensity gradients of the image.
  • Apply non-maximum suppression to get rid of spurious response to edge detection.
  • Apply double threshold to determine potential edges.
  • Track edge by hysteresis: Finalise the detection of edges by suppressing all the other edges that are weak and not connected to strong edges.

An in-depth explanation can be found here.

I'm not going to derive or explain the equations; that is up to you, my dear reader. What I am going to do is walk you through my own implementation of this algorithm, step by step.

First we need to know where we want to go. Treating the Canny edge detector as a black box, I want something like this:

[Figure: Global process, input image → Canny edge detector → output image]

OK, the global process is pretty straightforward: take an input image, process it, and get an output image.

Inside, the box is going to look something like this:

[Figure: The Canny edge detector box]
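To make that structure concrete before diving into the code, here is how I picture the box as a single C function. This skeleton is my own framing, not code from the repository; the actual CPU code below runs the stages inline, and the buffer names in the comments match the ones used throughout the post.

/* My own sketch of the box (not code from the repository): one function that
 * takes a greyscale image and produces the edge map, running the five steps
 * listed above in order. The CPU code below performs these steps inline. */
void canny_box(const float *im, float *image_out, int width, int height, float level)
{
	/* 1. Gaussian 5x5 filter (noise reduction):  im -> NR                      */
	/* 2. Intensity gradients:                    NR -> Gx, Gy, G, phi          */
	/* 3. Non-maximum suppression:                G, phi -> pedge               */
	/* 4. Double threshold + 5. hysteresis:       G, pedge, level -> image_out  */
	(void)im; (void)image_out; (void)width; (void)height; (void)level; /* placeholder body */
}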

Canny edge detector CPU approach

Gaussian filter for noise reduction:

Supposing my input image has dimensions width × height, and applying a 5×5 Gaussian filter:

[Figure: the 5×5 Gaussian kernel with coefficients (2 4 5 4 2; 4 9 12 9 4; 5 12 15 12 5; 4 9 12 9 4; 2 4 5 4 2), normalised by 1/159]

the noise-reduction step can be written like this:


for(i=2; i<height-2; i++)
		for(j=2; j<width-2; j++)
		{
			// Noise reduction
			NR[i*width+j] =
				 (2.0*im[(i-2)*width+(j-2)] +  4.0*im[(i-2)*width+(j-1)] +  5.0*im[(i-2)*width+(j)] +  4.0*im[(i-2)*width+(j+1)] + 2.0*im[(i-2)*width+(j+2)]
				+ 4.0*im[(i-1)*width+(j-2)] +  9.0*im[(i-1)*width+(j-1)] + 12.0*im[(i-1)*width+(j)] +  9.0*im[(i-1)*width+(j+1)] + 4.0*im[(i-1)*width+(j+2)]
				+ 5.0*im[(i  )*width+(j-2)] + 12.0*im[(i  )*width+(j-1)] + 15.0*im[(i  )*width+(j)] + 12.0*im[(i  )*width+(j+1)] + 5.0*im[(i  )*width+(j+2)]
				+ 4.0*im[(i+1)*width+(j-2)] +  9.0*im[(i+1)*width+(j-1)] + 12.0*im[(i+1)*width+(j)] +  9.0*im[(i+1)*width+(j+1)] + 4.0*im[(i+1)*width+(j+2)]
				+ 2.0*im[(i+2)*width+(j-2)] +  4.0*im[(i+2)*width+(j-1)] +  5.0*im[(i+2)*width+(j)] +  4.0*im[(i+2)*width+(j+1)] + 2.0*im[(i+2)*width+(j+2)])
				/159.0;
		}

In NR I'm storing the image after applying the Gaussian filter; note that the loops skip a two-pixel border, which is left untouched.
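The snippets in this post read and write a handful of flat, row-major arrays of size width*height, which is why a pixel (i, j) is addressed as i*width + j. As a minimal sketch (my own setup code, not taken from the repository, and the element types are assumptions), the buffers could be allocated like this:

/* A minimal sketch of the buffers the snippets assume (my own setup code,
 * not taken from the repository; element types are assumptions). Every
 * buffer is a flat, row-major array of width*height elements. */
#include <stdlib.h>

float *im, *NR, *Gx, *Gy, *G, *phi, *image_out;
int   *pedge;

void allocate_buffers(int width, int height)
{
	size_t n = (size_t)width * height;
	im        = calloc(n, sizeof(float)); /* input greyscale image           */
	NR        = calloc(n, sizeof(float)); /* image after the Gaussian filter */
	Gx        = calloc(n, sizeof(float)); /* horizontal gradient             */
	Gy        = calloc(n, sizeof(float)); /* vertical gradient               */
	G         = calloc(n, sizeof(float)); /* gradient magnitude              */
	phi       = calloc(n, sizeof(float)); /* quantised gradient direction    */
	pedge     = calloc(n, sizeof(int));   /* non-maximum suppression mask    */
	image_out = calloc(n, sizeof(float)); /* final edge map                  */
}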

Finding the intensity gradient of the image and Non-maximum suppression.

To find the intensity gradient of the image we compute the magnitude G and the direction Θ:

[Figure: G = √(Gx² + Gy²), Θ = atan2(|Gy|, |Gx|)]

and to apply non-maximum suppression we need to determine whether the gradient magnitude G is a local maximum along the gradient direction:

  • if Θ = 0º (edge runs N-S): it is an edge if G > Ge and G > Gw.
  • if Θ = 90º (edge runs E-W): it is an edge if G > Gn and G > Gs.
  • if Θ = 135º (edge runs NE-SW): it is an edge if G > Gnw and G > Gse.
  • if Θ = 45º (edge runs NW-SE): it is an edge if G > Gne and G > Gsw.

For this step I'll be needing NR:


for(i=2; i<height-2; i++)
		for(j=2; j<width-2; j++)
		{
			// Intensity gradient of the image
			Gx[i*width+j] =
				 (1.0*NR[(i-2)*width+(j-2)] +  2.0*NR[(i-2)*width+(j-1)] +  (-2.0)*NR[(i-2)*width+(j+1)] + (-1.0)*NR[(i-2)*width+(j+2)]
				+ 4.0*NR[(i-1)*width+(j-2)] +  8.0*NR[(i-1)*width+(j-1)] +  (-8.0)*NR[(i-1)*width+(j+1)] + (-4.0)*NR[(i-1)*width+(j+2)]
				+ 6.0*NR[(i  )*width+(j-2)] + 12.0*NR[(i  )*width+(j-1)] + (-12.0)*NR[(i  )*width+(j+1)] + (-6.0)*NR[(i  )*width+(j+2)]
				+ 4.0*NR[(i+1)*width+(j-2)] +  8.0*NR[(i+1)*width+(j-1)] +  (-8.0)*NR[(i+1)*width+(j+1)] + (-4.0)*NR[(i+1)*width+(j+2)]
				+ 1.0*NR[(i+2)*width+(j-2)] +  2.0*NR[(i+2)*width+(j-1)] +  (-2.0)*NR[(i+2)*width+(j+1)] + (-1.0)*NR[(i+2)*width+(j+2)]);

			Gy[i*width+j] =
				 ((-1.0)*NR[(i-2)*width+(j-2)] + (-4.0)*NR[(i-2)*width+(j-1)] +  (-6.0)*NR[(i-2)*width+(j)] + (-4.0)*NR[(i-2)*width+(j+1)] + (-1.0)*NR[(i-2)*width+(j+2)]
				+ (-2.0)*NR[(i-1)*width+(j-2)] + (-8.0)*NR[(i-1)*width+(j-1)] + (-12.0)*NR[(i-1)*width+(j)] + (-8.0)*NR[(i-1)*width+(j+1)] + (-2.0)*NR[(i-1)*width+(j+2)]
				+    2.0*NR[(i+1)*width+(j-2)] +    8.0*NR[(i+1)*width+(j-1)] +    12.0*NR[(i+1)*width+(j)] +    8.0*NR[(i+1)*width+(j+1)] +    2.0*NR[(i+1)*width+(j+2)]
				+    1.0*NR[(i+2)*width+(j-2)] +    4.0*NR[(i+2)*width+(j-1)] +     6.0*NR[(i+2)*width+(j)] +    4.0*NR[(i+2)*width+(j+1)] +    1.0*NR[(i+2)*width+(j+2)]);

			G[i*width+j]   = sqrtf((Gx[i*width+j]*Gx[i*width+j])+(Gy[i*width+j]*Gy[i*width+j]));	//G = √Gx²+Gy²
			phi[i*width+j] = atan2f(fabs(Gy[i*width+j]),fabs(Gx[i*width+j]));

			if(fabs(phi[i*width+j])<=PI/8 )
				phi[i*width+j] = 0;
			else if (fabs(phi[i*width+j])<= 3*(PI/8))
				phi[i*width+j] = 45;
			else if (fabs(phi[i*width+j]) <= 5*(PI/8))
				phi[i*width+j] = 90;
			else if (fabs(phi[i*width+j]) <= 7*(PI/8))
				phi[i*width+j] = 135;
			else phi[i*width+j] = 0;
	}

Double threshold and Edge tracking by hysteresis.

The first loop marks the pixels that survive non-maximum suppression (pedge). The second applies the double threshold with hysteresis: a pixel is kept when its gradient magnitude is above the high threshold, or when it lies between the two thresholds and one of its 3×3 neighbours is above the high threshold.

// Edge
	for(i=3; i<height-3; i++)
		for(j=3; j<width-3; j++)
		{
			pedge[i*width+j] = 0;
			if(phi[i*width+j] == 0){
				if(G[i*width+j]>G[i*width+j+1] && G[i*width+j]>G[i*width+j-1]) //edge is in N-S
					pedge[i*width+j] = 1;

			} else if(phi[i*width+j] == 45) {
				if(G[i*width+j]>G[(i+1)*width+j+1] && G[i*width+j]>G[(i-1)*width+j-1]) // edge is in NW-SE
					pedge[i*width+j] = 1;

			} else if(phi[i*width+j] == 90) {
				if(G[i*width+j]>G[(i+1)*width+j] && G[i*width+j]>G[(i-1)*width+j]) //edge is in E-W
					pedge[i*width+j] = 1;

			} else if(phi[i*width+j] == 135) {
				if(G[i*width+j]>G[(i+1)*width+j-1] && G[i*width+j]>G[(i-1)*width+j+1]) // edge is in NE-SW
					pedge[i*width+j] = 1;
			}
		}

	// Hysteresis Thresholding
	lowthres = level/2;
	hithres  = 2*(level);

	for(i=3; i<height-3; i++)
		for(j=3; j<width-3; j++)
		{
			if(G[i*width+j]>hithres && pedge[i*width+j])
				image_out[i*width+j] = 255;
			else if(pedge[i*width+j] && G[i*width+j]>=lowthres && G[i*width+j]<hithres)
				// check neighbours 3x3
				for (ii=-1;ii<=1; ii++)
					for (jj=-1;jj<=1; jj++)
						if (G[(i+ii)*width+j+jj]>hithres)
							image_out[i*width+j] = 255;
		}

The entire code for the CPU approach can be found in my git repository:

git clone https://adri1988@bitbucket.org/adri1988/cannyedgedetectorcpu.git

The Canny edge detector GPU 1D approach.

We already know the theory, so I'm going to jump right into the kernels.

I’ve coded four different kernels, one for each step of the algorithm.

Noise reduction

 

__global__ void noiseReduction1d(float *d_NR, float *d_im, int height, int width)
{

	int j;
	j = blockIdx.x * blockDim.x + threadIdx.x;

	if (j>=2 && j < width-2)
		for (int i=2;i<height-2;i++){

					d_NR[i*width+j] =  	(2.0*d_im[(i-2)*width+(j-2)] +  4.0*d_im[(i-2)*width+(j-1)] +  5.0*d_im[(i-2)*width+(j)] +  4.0*d_im[(i-2)*width+(j+1)] + 2.0*d_im[(i-2)*width+(j+2)]
									+ 4.0*d_im[(i-1)*width+(j-2)] +  9.0*d_im[(i-1)*width+(j-1)] + 12.0*d_im[(i-1)*width+(j)] +  9.0*d_im[(i-1)*width+(j+1)] + 4.0*d_im[(i-1)*width+(j+2)]
									+ 5.0*d_im[(i  )*width+(j-2)] + 12.0*d_im[(i  )*width+(j-1)] + 15.0*d_im[(i  )*width+(j)] + 12.0*d_im[(i  )*width+(j+1)] + 5.0*d_im[(i  )*width+(j+2)]
									+ 4.0*d_im[(i+1)*width+(j-2)] +  9.0*d_im[(i+1)*width+(j-1)] + 12.0*d_im[(i+1)*width+(j)] +  9.0*d_im[(i+1)*width+(j+1)] + 4.0*d_im[(i+1)*width+(j+2)]
									+ 2.0*d_im[(i+2)*width+(j-2)] +  4.0*d_im[(i+2)*width+(j-1)] +  5.0*d_im[(i+2)*width+(j)] +  4.0*d_im[(i+2)*width+(j+1)] + 2.0*d_im[(i+2)*width+(j+2)])
									/159.0;
	}

}

Finding the intensity gradient of the image and Non-maximum suppression

 

__global__ void intensityGradienteIm1d(float *d_G, float *d_Gx, float *d_Gy ,float *d_phi ,float *d_NR ,int height, int width)
{
	int j;
	float PI = 3.141593;
	j = blockIdx.x * blockDim.x + threadIdx.x;

	if (j>=2 && j < width-2)
		for (int i=2;i<height-2;i++){
				// Intensity gradient of the image
			d_Gx[i*width+j] =
					 (1.0*d_NR[(i-2)*width+(j-2)] +  2.0*d_NR[(i-2)*width+(j-1)] +  (-2.0)*d_NR[(i-2)*width+(j+1)] + (-1.0)*d_NR[(i-2)*width+(j+2)]
					+ 4.0*d_NR[(i-1)*width+(j-2)] +  8.0*d_NR[(i-1)*width+(j-1)] +  (-8.0)*d_NR[(i-1)*width+(j+1)] + (-4.0)*d_NR[(i-1)*width+(j+2)]
					+ 6.0*d_NR[(i  )*width+(j-2)] + 12.0*d_NR[(i  )*width+(j-1)] + (-12.0)*d_NR[(i  )*width+(j+1)] + (-6.0)*d_NR[(i  )*width+(j+2)]
					+ 4.0*d_NR[(i+1)*width+(j-2)] +  8.0*d_NR[(i+1)*width+(j-1)] +  (-8.0)*d_NR[(i+1)*width+(j+1)] + (-4.0)*d_NR[(i+1)*width+(j+2)]
					+ 1.0*d_NR[(i+2)*width+(j-2)] +  2.0*d_NR[(i+2)*width+(j-1)] +  (-2.0)*d_NR[(i+2)*width+(j+1)] + (-1.0)*d_NR[(i+2)*width+(j+2)]);

			d_Gy[i*width+j] =
					 ((-1.0)*d_NR[(i-2)*width+(j-2)] + (-4.0)*d_NR[(i-2)*width+(j-1)] +  (-6.0)*d_NR[(i-2)*width+(j)] + (-4.0)*d_NR[(i-2)*width+(j+1)] + (-1.0)*d_NR[(i-2)*width+(j+2)]
					+ (-2.0)*d_NR[(i-1)*width+(j-2)] + (-8.0)*d_NR[(i-1)*width+(j-1)] + (-12.0)*d_NR[(i-1)*width+(j)] + (-8.0)*d_NR[(i-1)*width+(j+1)] + (-2.0)*d_NR[(i-1)*width+(j+2)]
					+    2.0*d_NR[(i+1)*width+(j-2)] +    8.0*d_NR[(i+1)*width+(j-1)] +    12.0*d_NR[(i+1)*width+(j)] +    8.0*d_NR[(i+1)*width+(j+1)] +    2.0*d_NR[(i+1)*width+(j+2)]
					+    1.0*d_NR[(i+2)*width+(j-2)] +    4.0*d_NR[(i+2)*width+(j-1)] +     6.0*d_NR[(i+2)*width+(j)] +    4.0*d_NR[(i+2)*width+(j+1)] +    1.0*d_NR[(i+2)*width+(j+2)]);

			 __syncthreads();
			d_G[i*width+j]   = sqrtf((d_Gx[i*width+j]*d_Gx[i*width+j])+(d_Gy[i*width+j]*d_Gy[i*width+j]));	//G = √Gx²+Gy²
			d_phi[i*width+j] = atan2f(fabs(d_Gy[i*width+j]),fabs(d_Gx[i*width+j]));
			 __syncthreads();
			 	if(fabs(d_phi[i*width+j])<=PI/8 )
					d_phi[i*width+j] = 0;
				else if (fabs(d_phi[i*width+j])<= 3*(PI/8))
					d_phi[i*width+j] = 45;
				else if (fabs(d_phi[i*width+j]) <= 5*(PI/8))
					d_phi[i*width+j] = 90;
				else if (fabs(d_phi[i*width+j]) <= 7*(PI/8))
					d_phi[i*width+j] = 135;
				else d_phi[i*width+j] = 0;
		}

}

Double threshold

 

__global__ void edges1d(int *d_pedge, float *d_G,float *d_phi, int height, int width)
{
	int j;
	j = blockIdx.x * blockDim.x + threadIdx.x;

	// Edge
	if (j>=2 && j < width-2)
		for (int i=2;i<height-2;i++){
				d_pedge[i*width+j] = 0;
				if(d_phi[i*width+j] == 0){
					if(d_G[i*width+j]>d_G[i*width+j+1] && d_G[i*width+j]>d_G[i*width+j-1]) //edge is in N-S
						d_pedge[i*width+j] = 1;

				} else if(d_phi[i*width+j] == 45) {
					if(d_G[i*width+j]>d_G[(i+1)*width+j+1] && d_G[i*width+j]>d_G[(i-1)*width+j-1]) // edge is in NW-SE
						d_pedge[i*width+j] = 1;

				} else if(d_phi[i*width+j] == 90) {
					if(d_G[i*width+j]>d_G[(i+1)*width+j] && d_G[i*width+j]>d_G[(i-1)*width+j]) //edge is in E-W
						d_pedge[i*width+j] = 1;

				} else if(d_phi[i*width+j] == 135) {
					if(d_G[i*width+j]>d_G[(i+1)*width+j-1] && d_G[i*width+j]>d_G[(i-1)*width+j+1]) // edge is in NE-SW
						d_pedge[i*width+j] = 1;
				}
			}

}

Edge tracking by hysteresis

 

__global__ void hysteresisThresholding1d(float *d_image_out, int level ,int* d_pedge,float *d_G, int height, int width)
{

	int j;
	j = blockIdx.x * blockDim.x + threadIdx.x;

	float lowthres,hithres;

// Hysteresis Thresholding
	lowthres = level/2;
	hithres  = 2*(level);
	if (j<width)
		for (int i=0;i<height;i++)
			d_image_out[i*width+j]=0.0; // Initialize to 0

	__syncthreads();

	if (j>=3 && j < width-3)
		for (int i=3;i<height-3;i++){
			if(d_G[i*width+j]>hithres && d_pedge[i*width+j])
				d_image_out[i*width+j] = 255;
			else if(d_pedge[i*width+j] && d_G[i*width+j]>=lowthres && d_G[i*width+j]<hithres)
				// check neighbours 3x3
				for (int ii=-1;ii<=1; ii++)
					for (int jj=-1;jj<=1; jj++)
						if (d_G[(i+ii)*width+j+jj]>hithres)
							d_image_out[i*width+j] = 255;
		}
}

So basically all I did was remove the inner loop over the columns, give each thread one column, and let the parallel magic do the work.
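To give an idea of how these kernels are driven from the host, here is a minimal sketch of the launch sequence. The 256-thread block size and the wrapper function are my own assumptions (the repository may configure this differently); it presumes the kernels above live in the same .cu file and that the d_* buffers were already allocated with cudaMalloc and the input image copied into d_im.

/* Minimal 1D launch sketch (my assumptions: 256 threads per block, wrapper
 * function name). One thread is created per image column; each thread walks
 * down its column inside the kernel. */
void canny_gpu_1d(float *d_im, float *d_NR, float *d_G, float *d_Gx, float *d_Gy,
                  float *d_phi, int *d_pedge, float *d_image_out,
                  int level, int height, int width)
{
	int threads = 256;
	int blocks  = (width + threads - 1) / threads;   /* enough blocks to cover every column */

	noiseReduction1d        <<<blocks, threads>>>(d_NR, d_im, height, width);
	intensityGradienteIm1d  <<<blocks, threads>>>(d_G, d_Gx, d_Gy, d_phi, d_NR, height, width);
	edges1d                 <<<blocks, threads>>>(d_pedge, d_G, d_phi, height, width);
	hysteresisThresholding1d<<<blocks, threads>>>(d_image_out, level, d_pedge, d_G, height, width);

	cudaDeviceSynchronize(); /* wait for the last kernel before copying the result back */
}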

You can clone my entire project:

git clone https://adri1988@bitbucket.org/adri1988/cannyedgedetectorcpuandgpu1d.git

Executing this code on a 1920×1200 image:

[Figure: CPU vs GPU 1D execution times]

A speedup of 5.01x, pretty cool.

The Canny edge detector GPU 2D approach.

The same as the 1D approach, four kernels, but now I've parallelized both loops.

Noise reduction

 

__global__ void noiseReduction2d(float *d_NR, float *d_im, int height, int width)
{

	int j,i;

	i = blockIdx.x * blockDim.x + threadIdx.x;
	j = blockIdx.y * blockDim.y + threadIdx.y;

	if (j>=2 && j<width-2 && i>=2 && i < height-2){
		d_NR[i*width+j] =    (2.0*d_im[(i-2)*width+(j-2)] +  4.0*d_im[(i-2)*width+(j-1)] +  5.0*d_im[(i-2)*width+(j)] +  4.0*d_im[(i-2)*width+(j+1)] + 2.0*d_im[(i-2)*width+(j+2)]
							+ 4.0*d_im[(i-1)*width+(j-2)] +  9.0*d_im[(i-1)*width+(j-1)] + 12.0*d_im[(i-1)*width+(j)] +  9.0*d_im[(i-1)*width+(j+1)] + 4.0*d_im[(i-1)*width+(j+2)]
							+ 5.0*d_im[(i  )*width+(j-2)] + 12.0*d_im[(i  )*width+(j-1)] + 15.0*d_im[(i  )*width+(j)] + 12.0*d_im[(i  )*width+(j+1)] + 5.0*d_im[(i  )*width+(j+2)]
							+ 4.0*d_im[(i+1)*width+(j-2)] +  9.0*d_im[(i+1)*width+(j-1)] + 12.0*d_im[(i+1)*width+(j)] +  9.0*d_im[(i+1)*width+(j+1)] + 4.0*d_im[(i+1)*width+(j+2)]
							+ 2.0*d_im[(i+2)*width+(j-2)] +  4.0*d_im[(i+2)*width+(j-1)] +  5.0*d_im[(i+2)*width+(j)] +  4.0*d_im[(i+2)*width+(j+1)] + 2.0*d_im[(i+2)*width+(j+2)])
							/159.0;

	}

}
 

Finding the intensity gradient of the image and Non-maximum suppression

 

__global__ void intensityGradienteIm2d(float *d_G, float *d_Gx, float *d_Gy, float *d_phi, float *d_NR, int height, int width)
{
	int j,i;
	float PI = 3.141593;
	i = blockIdx.x * blockDim.x + threadIdx.x;
	j = blockIdx.y * blockDim.y + threadIdx.y;

	if (j>=2 && j<width-2 && i>=2 && i < height-2)
			{
				// Intensity gradient of the image
			d_Gx[i*width+j] =
					 (1.0*d_NR[(i-2)*width+(j-2)] +  2.0*d_NR[(i-2)*width+(j-1)] +  (-2.0)*d_NR[(i-2)*width+(j+1)] + (-1.0)*d_NR[(i-2)*width+(j+2)]
					+ 4.0*d_NR[(i-1)*width+(j-2)] +  8.0*d_NR[(i-1)*width+(j-1)] +  (-8.0)*d_NR[(i-1)*width+(j+1)] + (-4.0)*d_NR[(i-1)*width+(j+2)]
					+ 6.0*d_NR[(i  )*width+(j-2)] + 12.0*d_NR[(i  )*width+(j-1)] + (-12.0)*d_NR[(i  )*width+(j+1)] + (-6.0)*d_NR[(i  )*width+(j+2)]
					+ 4.0*d_NR[(i+1)*width+(j-2)] +  8.0*d_NR[(i+1)*width+(j-1)] +  (-8.0)*d_NR[(i+1)*width+(j+1)] + (-4.0)*d_NR[(i+1)*width+(j+2)]
					+ 1.0*d_NR[(i+2)*width+(j-2)] +  2.0*d_NR[(i+2)*width+(j-1)] +  (-2.0)*d_NR[(i+2)*width+(j+1)] + (-1.0)*d_NR[(i+2)*width+(j+2)]);

			d_Gy[i*width+j] =
					 ((-1.0)*d_NR[(i-2)*width+(j-2)] + (-4.0)*d_NR[(i-2)*width+(j-1)] +  (-6.0)*d_NR[(i-2)*width+(j)] + (-4.0)*d_NR[(i-2)*width+(j+1)] + (-1.0)*d_NR[(i-2)*width+(j+2)]
					+ (-2.0)*d_NR[(i-1)*width+(j-2)] + (-8.0)*d_NR[(i-1)*width+(j-1)] + (-12.0)*d_NR[(i-1)*width+(j)] + (-8.0)*d_NR[(i-1)*width+(j+1)] + (-2.0)*d_NR[(i-1)*width+(j+2)]
					+    2.0*d_NR[(i+1)*width+(j-2)] +    8.0*d_NR[(i+1)*width+(j-1)] +    12.0*d_NR[(i+1)*width+(j)] +    8.0*d_NR[(i+1)*width+(j+1)] +    2.0*d_NR[(i+1)*width+(j+2)]
					+    1.0*d_NR[(i+2)*width+(j-2)] +    4.0*d_NR[(i+2)*width+(j-1)] +     6.0*d_NR[(i+2)*width+(j)] +    4.0*d_NR[(i+2)*width+(j+1)] +    1.0*d_NR[(i+2)*width+(j+2)]);

			d_G[i*width+j]   = sqrtf((d_Gx[i*width+j]*d_Gx[i*width+j])+(d_Gy[i*width+j]*d_Gy[i*width+j]));	//G = √Gx²+Gy²
			d_phi[i*width+j] = atan2f(fabs(d_Gy[i*width+j]),fabs(d_Gx[i*width+j]));

			 	if(fabs(d_phi[i*width+j])<=PI/8 )
					d_phi[i*width+j] = 0;
				else if (fabs(d_phi[i*width+j])<= 3*(PI/8))
					d_phi[i*width+j] = 45;
				else if (fabs(d_phi[i*width+j]) <= 5*(PI/8))
					d_phi[i*width+j] = 90;
				else if (fabs(d_phi[i*width+j]) <= 7*(PI/8))
					d_phi[i*width+j] = 135;
				else d_phi[i*width+j] = 0;
		}

}
 

Double threshold

 

__global__ void edges2d(int *d_pedge, float *d_G,float *d_phi, int height, int width)
{
	int j,i;
	i = blockIdx.x * blockDim.x + threadIdx.x;
	j = blockIdx.y * blockDim.y + threadIdx.y;

	// Edge
	if (j>=2 && j<width-2 && i>=2 && i < height-2){
				d_pedge[i*width+j] = 0;
				if(d_phi[i*width+j] == 0){
					if(d_G[i*width+j]>d_G[i*width+j+1] && d_G[i*width+j]>d_G[i*width+j-1]) //edge is in N-S
						d_pedge[i*width+j] = 1;

				} else if(d_phi[i*width+j] == 45) {
					if(d_G[i*width+j]>d_G[(i+1)*width+j+1] && d_G[i*width+j]>d_G[(i-1)*width+j-1]) // edge is in NW-SE
						d_pedge[i*width+j] = 1;

				} else if(d_phi[i*width+j] == 90) {
					if(d_G[i*width+j]>d_G[(i+1)*width+j] && d_G[i*width+j]>d_G[(i-1)*width+j]) //edge is in E-W
						d_pedge[i*width+j] = 1;

				} else if(d_phi[i*width+j] == 135) {
					if(d_G[i*width+j]>d_G[(i+1)*width+j-1] && d_G[i*width+j]>d_G[(i-1)*width+j+1]) // edge is in NE-SW
						d_pedge[i*width+j] = 1;
				}
			}

}
 

Edge tracking by hysteresis

 

__global__ void hysteresisThresholding2d(float *d_image_out, int level ,int* d_pedge,float *d_G, int height, int width)
{

	int j,i;
	i = blockIdx.x * blockDim.x + threadIdx.x;
	j = blockIdx.y * blockDim.y + threadIdx.y;

	float lowthres,hithres;

// Hysteresis Thresholding
	lowthres = level/2;
	hithres  = 2*(level);
	if (j<width && i<height)
		d_image_out[i*width+j]=0.0; // Initialize to 0

	__syncthreads();

	if (j>=3 && j<width-3 && i>=3 && i<height-3)
		{
			if(d_G[i*width+j]>hithres && d_pedge[i*width+j])
				d_image_out[i*width+j] = 255;
			else if(d_pedge[i*width+j] && d_G[i*width+j]>=lowthres && d_G[i*width+j]<hithres)
				// check neighbours 3x3
				for (int ii=-1;ii<=1; ii++)
					for (int jj=-1;jj<=1; jj++) 						if (d_G[(i+ii)*width+j+jj]>hithres)
							d_image_out[i*width+j] = 255;
		}
}
 

Now, with both loops executing in parallel, things get interesting.
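Here, as a minimal sketch, is how the 2D kernels might be launched. The 16×16 block size and the wrapper function are my own assumptions, and the gradient kernel's name is assumed to follow the same naming pattern as the others. Note that in these kernels the row index i comes from the x dimension and the column index j from the y dimension, so the grid's x extent has to cover the image height and its y extent the width.

/* Minimal 2D launch sketch (my assumptions: 16x16 thread blocks, wrapper
 * function name). One thread is created per pixel. */
void canny_gpu_2d(float *d_im, float *d_NR, float *d_G, float *d_Gx, float *d_Gy,
                  float *d_phi, int *d_pedge, float *d_image_out,
                  int level, int height, int width)
{
	dim3 block(16, 16);
	dim3 grid((height + block.x - 1) / block.x,   /* rows    -> x */
	          (width  + block.y - 1) / block.y);  /* columns -> y */

	noiseReduction2d        <<<grid, block>>>(d_NR, d_im, height, width);
	intensityGradienteIm2d  <<<grid, block>>>(d_G, d_Gx, d_Gy, d_phi, d_NR, height, width);
	edges2d                 <<<grid, block>>>(d_pedge, d_G, d_phi, height, width);
	hysteresisThresholding2d<<<grid, block>>>(d_image_out, level, d_pedge, d_G, height, width);

	cudaDeviceSynchronize(); /* wait for the last kernel before copying the result back */
}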

You can clone the 2D approach:

git clone https://adri1988@bitbucket.org/adri1988/cannyedgedetectotgpu2d.git

Executing this code on a 1920×1200 image:

[Figure: CPU vs GPU 1D vs GPU 2D execution times]

The speedup is even bigger now: 9.4x faster than the CPU and about 2x faster than the 1D version.

 
