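After testing a few dozen images, my conclusion is that C# with the TPL (Task Parallel Library) is 2 to 10 times faster than the C++ AMP approach.
Build: Release, VS2012 RC.
By the way, you need a DX11 graphics card. Without one, it presumably falls back to software emulation, which is tens to hundreds of times slower than the GPU.
The tests show that the two only roughly break even at around ten million pixels.
Is my computer not good enough, or is it the graphics card? How can the results come out like this?
I plan to try it on my work computer on Monday. Really strange.
Also, this test ran slower than my earlier one that used WPF. The main difference is the way the bitmap memory is locked.
I will test the speed under WPF when I have time.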
1. C# TPL
using System.Collections.Concurrent;
using System.Drawing;
using System.Drawing.Imaging;
using System.Threading.Tasks;

// Both methods live inside a class and require compiling with /unsafe.
private static unsafe Image GrayByParallelForEach(Image image)
{
    var bmp = (Bitmap) image;

    int height = bmp.Height;
    int width = bmp.Width;

    // Lock the whole bitmap as 32-bit BGRA so the pixels can be reached through a raw pointer.
    var data = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    var startPtr = (PixelColor*)data.Scan0.ToPointer();
    ParallelForEach(startPtr, width, height);
    bmp.UnlockBits(data);

    return bmp;
}

// Assumes rows are tightly packed (data.Stride == width * 4).
private static unsafe void ParallelForEach(PixelColor* startPtr, int width, int height)
{
    // Partitioner.Create splits [0, height) into row ranges; each range is processed by one task.
    Parallel.ForEach(Partitioner.Create(0, height), (h) =>
    {
        // First pixel of the first row in this range.
        var ptr = startPtr + h.Item1*width;

        for (int y = h.Item1; y < h.Item2; y++)
        {
            for (int x = 0; x < width; x++)
            {
                var c = *ptr;
                // Integer weights 38 + 75 + 15 = 128, so >> 7 divides by 128.
                var gray = ((c.Red*38 + c.Green*75 + c.Blue*15) >> 7);
                ptr->Green = ptr->Red = ptr->Blue = (byte) gray;

                ptr++;
            }
        }
    });
}
using System.Runtime.InteropServices;

// Field order matches the in-memory byte order of Format32bppArgb: B, G, R, A.
[StructLayout(LayoutKind.Sequential)]
public struct PixelColor
{
    public byte Blue;
    public byte Green;
    public byte Red;
    public byte Alpha;
}
This mainly relies on Microsoft's TPL and on pointer operations, plus casting the raw pixel pointer to a pointer to a color struct.
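The weights 38, 75 and 15 in the loop appear to be the usual 0.299 / 0.587 / 0.114 luma weights scaled by 128, which is why the sum is shifted right by 7. Here is a minimal C++ sketch to check that reading. The names gray_fixed, gray_float and max_abs_error are made up for this illustration and are not part of the original code.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>

// Fixed-point grayscale as in the listing: 38 + 75 + 15 = 128,
// so the right shift by 7 divides the weighted sum by 128.
inline std::uint8_t gray_fixed(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((r * 38 + g * 75 + b * 15) >> 7);
}

// Floating-point reference using the 0.299 / 0.587 / 0.114 weights
// (an assumption about where the integer weights come from).
inline std::uint8_t gray_float(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::lround(0.299 * r + 0.587 * g + 0.114 * b));
}

// Largest absolute difference between the two versions over a coarse
// grid of colours (every 5th value per channel, which includes 0 and 255).
inline int max_abs_error() {
    int worst = 0;
    for (int r = 0; r < 256; r += 5) {
        for (int g = 0; g < 256; g += 5) {
            for (int b = 0; b < 256; b += 5) {
                const int fixed = gray_fixed(static_cast<std::uint8_t>(r),
                                             static_cast<std::uint8_t>(g),
                                             static_cast<std::uint8_t>(b));
                const int real = gray_float(static_cast<std::uint8_t>(r),
                                            static_cast<std::uint8_t>(g),
                                            static_cast<std::uint8_t>(b));
                const int diff = std::abs(fixed - real);
                if (diff > worst) worst = diff;
            }
        }
    }
    return worst;
}
```

Because the integer weights sum to exactly 128, pure white stays at 255 and the result always fits in a byte, so the (byte) cast in the C# loop cannot overflow.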
2. C++ AMP code
#include <amp.h>
using namespace concurrency;

extern "C" __declspec(dllexport) void __stdcall gray_image(unsigned int* image, int height, int width)
{
    concurrency::extent<2> image_extent(height, width);

    // View the caller's buffer as a 2-D array of packed 32-bit pixels (0xAARRGGBB).
    array_view<unsigned int, 2> image_av(image_extent, image);

    parallel_for_each(image_av.extent,
        [=](index<2> idx) restrict(amp)
        {
            // Unpack the four 8-bit channels from one 32-bit value.
            unsigned int color = image_av[idx];
            unsigned int a = (color >> 24) & 0xFF;
            unsigned int r = (color >> 16) & 0xFF;
            unsigned int g = (color >> 8) & 0xFF;
            unsigned int b = (color) & 0xFF;

            // Same integer weights as the C# version.
            auto gray = ((r * 38 + g * 75 + b * 15) >> 7);

            // Repack: alpha is kept, R, G and B all become the gray value.
            image_av[idx] = a << 24 | gray << 16 | gray << 8 | gray;
        });

    // Copy the result back from the GPU to the CPU buffer.
    image_av.synchronize();
}
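It seems that byte-sized types cannot be used inside restrict(amp) code, so each pixel has to be handled as a packed unsigned int and unpacked with shifts.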
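To show why the shifts 24 / 16 / 8 / 0 line up with the B, G, R, A byte order of the C# PixelColor struct, here is a small sketch in standard C++ (no amp.h needed). pack_le and gray_packed are hypothetical helper names. pack_le spells out the little-endian reading of the four bytes, which is what an x86 machine does when the same memory is read as an unsigned int.

```cpp
#include <cstdint>

// Same byte order as the C# struct: B, G, R, A in memory.
struct PixelColor {
    std::uint8_t blue;
    std::uint8_t green;
    std::uint8_t red;
    std::uint8_t alpha;
};

// Interpret the four bytes, in memory order, as one little-endian
// 32-bit value. The result has the layout 0xAARRGGBB.
inline std::uint32_t pack_le(const PixelColor& p) {
    return static_cast<std::uint32_t>(p.blue)
         | static_cast<std::uint32_t>(p.green) << 8
         | static_cast<std::uint32_t>(p.red) << 16
         | static_cast<std::uint32_t>(p.alpha) << 24;
}

// Same arithmetic as the body of the AMP kernel, on one packed pixel.
inline std::uint32_t gray_packed(std::uint32_t color) {
    const std::uint32_t a = (color >> 24) & 0xFF;
    const std::uint32_t r = (color >> 16) & 0xFF;
    const std::uint32_t g = (color >> 8) & 0xFF;
    const std::uint32_t b = color & 0xFF;
    const std::uint32_t gray = (r * 38 + g * 75 + b * 15) >> 7;
    return a << 24 | gray << 16 | gray << 8 | gray;
}
```

For example, a pixel with blue 10, green 200, red 100 and alpha 255 packs to 0xFF64C80A, and gray_packed turns it into 0xFF949494: alpha is untouched and the three colour channels all become 148.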
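For comparison, I also timed plain C++ code, and the speed was about the same. So the problem is probably not the cost of calling into the DLL, but rather the performance of AMP itself or of the graphics card.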
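The plain C++ version is not shown in this post. A minimal sketch of what such a baseline could look like, with the same packed-pixel arithmetic as gray_image, might be the following. gray_image_cpu and time_gray_ms are hypothetical names, and the timing helper runs on a synthetic image filled with one colour, not on the test pictures.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-threaded CPU counterpart of gray_image (an assumed baseline,
// not the code that was actually measured).
void gray_image_cpu(std::uint32_t* image, int height, int width) {
    assert(image != nullptr && height >= 0 && width >= 0);
    const std::size_t count =
        static_cast<std::size_t>(height) * static_cast<std::size_t>(width);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t color = image[i];
        const std::uint32_t a = (color >> 24) & 0xFF;
        const std::uint32_t r = (color >> 16) & 0xFF;
        const std::uint32_t g = (color >> 8) & 0xFF;
        const std::uint32_t b = color & 0xFF;
        const std::uint32_t gray = (r * 38 + g * 75 + b * 15) >> 7;
        image[i] = a << 24 | gray << 16 | gray << 8 | gray;
    }
}

// Time one pass over a synthetic height x width image, in milliseconds.
double time_gray_ms(int height, int width) {
    const std::size_t count =
        static_cast<std::size_t>(height) * static_cast<std::size_t>(width);
    std::vector<std::uint32_t> image(count, 0xFF336699u);

    const auto start = std::chrono::steady_clock::now();
    gray_image_cpu(image.data(), height, width);
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_gray_ms for a range of sizes, say from 1000 x 1000 up to 4000 x 4000, gives a CPU curve to set against the AMP timings. The AMP number should include the synchronize() call, since the copy back from the GPU is part of the cost.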