图像截屏的优化处理方案

 最近花了点时间研究这个。写写收获吧。

首先要说的是,要实现屏幕的捕捉,目前有以下几种方法。

1。使用GDI函数或者Windows Media API函数
2。使用DirectX技术
3。使用api hook技术
4。使用图形驱动技术

关于实现,这里有一篇老外的文章,来自codeproject,可以看一下。
引用
Various methods for capturing the screen
By Gopalakrishna Palem

Contents
Introduction
Capture it the GDI way
And the DirectX way of doing it
Capturing the screen with Windows Media API
Introduction

Some times, we want to capture the contents of the entire screen programmatically. The following explains how it can be done. Typically, the immediate options we have, among others, are using GDI and/or DirectX. Another option that is worth considering is Windows Media API. Here, we would consider each of them and see how they can be used for our purpose. In each of these approaches, once we get the screenshot into our application defined memory or bitmap, we can use it in generating a movie. Refer to the article Create Movie From HBitmap for more details about creating movies from bitmap sequences programmatically.
Capture it the GDI way

When performance is not an issue and when all that we want is just a snapshot of the desktop, we can consider the GDI option. This mechanism is based on the simple principle that the desktop is also a window - that is it has a window Handle (HWND) and a device context (DC). If we can get the device context of the desktop to be captured, we can just blit those contents to our application defined device context in the normal way. And getting the device context of the desktop is pretty straightforward if we know its window handle - which can be achieved through the function GetDesktopWindow(). Thus, the steps involved are:
Acquire the Desktop window handle using the function GetDesktopWindow();
Get the DC of the desktop window using the function GetDC();
Create a compatible DC for the Desktop DC and a compatible bitmap to select into that compatible DC. These can be done using CreateCompatibleDC() and CreateCompatibleBitmap(); selecting the bitmap into our DC can be done with SelectObject();
Whenever you are ready to capture the screen, just blit the contents of the Desktop DC into the created compatible DC - that's all - you are done. The compatible bitmap we created now contains the contents of the screen at the moment of the capture.
Do not forget to release the objects when you are done. Memory is precious (for the other applications).
Example
Void CaptureScreen()
{
    int nScreenWidth = GetSystemMetrics(SM_CXSCREEN);
    int nScreenHeight = GetSystemMetrics(SM_CYSCREEN);
    HWND hDesktopWnd = GetDesktopWindow();
    HDC hDesktopDC = GetDC(hDesktopWnd);
    HDC hCaptureDC = CreateCompatibleDC(hDesktopDC);
    HBITMAP hCaptureBitmap =CreateCompatibleBitmap(hDesktopDC,
                            nScreenWidth, nScreenHeight);
    SelectObject(hCaptureDC,hCaptureBitmap);
    BitBlt(hCaptureDC,0,0,nScreenWidth,nScreenHeight,
           hDesktopDC,0,0,SRCCOPY|CAPTUREBLT);
    SaveCapturedBitmap(hCaptureBitmap); //Place holder - Put your code

                                //here to save the captured image to disk

    ReleaseDC(hDesktopWnd,hDesktopDC);
    DeleteDC(hCaptureDC);
    DeleteObject(hCaptureBitmap);
}

In the above code snippet, the function GetSystemMetrics() returns the screen width when used with SM_CXSCREEN, and returns the screen height when called with SM_CYSCREEN. Refer to the accompanying source code for details of how to save the captured bitmap to the disk and how to send it to the clipboard. Its pretty straightforward. The source code implements the above technique for capturing the screen contents at regular intervals, and creates a movie out of the captured image sequences.
And the DirectX way of doing it

Capturing the screenshot with DirectX is a pretty easy task. DirectX offers a neat way of doing this.

Every DirectX application contains what we call a buffer, or a surface to hold the contents of the video memory related to that application. This is called the back buffer of the application. Some applications might have more than one back buffer. And there is another buffer that every application can access by default - the front buffer. This one, the front buffer, holds the video memory related to the desktop contents, and so essentially is the screen image.

By accessing the front buffer from our DirectX application, we can capture the contents of the screen at that moment.

Accessing the front buffer from the DirectX application is pretty easy and straightforward. The interface IDirect3DDevice9 provides the GetFrontBufferData() method that takes a IDirect3DSurface9 object pointer and copies the contents of the front buffer onto that surface. The IDirect3DSurfce9 object can be generated by using the method IDirect3DDevice8::CreateOffscreenPlainSurface(). Once the screen is captured onto the surface, we can use the function D3DXSaveSurfaceToFile() to save the surface directly to the disk in bitmap format. Thus, the code to capture the screen looks as follows:
extern IDirect3DDevice9* g_pd3dDevice;
Void CaptureScreen()
{
    IDirect3DSurface9* pSurface;
    g_pd3dDevice->CreateOffscreenPlainSurface(ScreenWidth, ScreenHeight,
        D3DFMT_A8R8G8B8, D3DPOOL_SCRATCH, &pSurface, NULL);
    g_pd3dDevice->GetFrontBufferData(0, pSurface);
    D3DXSaveSurfaceToFile("Desktop.bmp",D3DXIFF_BMP,pSurface,NULL,NULL);
    pSurface->Release();
}

In the above, g_pd3dDevice is an IDirect3DDevice9 object, and has been assumed to be properly initialized. This code snippet saves the captured image onto the disk directly. However, instead of saving to disk, if we just want to operate on the image bits directly - we can do so by using the method IDirect3DSurface9::LockRect(). This gives a pointer to the surface memory - which is essentially a pointer to the bits of the captured image. We can copy the bits to our application defined memory and can operate on them. The following code snippet presents how the surface contents can be copied into our application defined memory:
extern void* pBits;
extern IDirect3DDevice9* g_pd3dDevice;
IDirect3DSurface9* pSurface;
g_pd3dDevice->CreateOffscreenPlainSurface(ScreenWidth, ScreenHeight,
                                          D3DFMT_A8R8G8B8, D3DPOOL_SCRATCH,
                                          &pSurface, NULL);
g_pd3dDevice->GetFrontBufferData(0, pSurface);
D3DLOCKED_RECT lockedRect;
pSurface->LockRect(&lockedRect,NULL,
                   D3DLOCK_NO_DIRTY_UPDATE|
                   D3DLOCK_NOSYSLOCK|D3DLOCK_READONLY)));
for( int i=0 ; i < ScreenHeight ; i++)
{
    memcpy( (BYTE*) pBits + i * ScreenWidth * BITSPERPIXEL / 8 ,
        (BYTE*) lockedRect.pBits + i* lockedRect.Pitch ,
        ScreenWidth * BITSPERPIXEL / 8);
}
g_pSurface->UnlockRect();
pSurface->Release();

In the above, pBits is a void*. Make sure that we have allocated enough memory before copying into pBits. A typical value for BITSPERPIXEL is 32 bits per pixel. However, it may vary depending on your current monitor settings. The important point to note here is that the width of the surface is not same as the captured screen image width. Because of the issues involved in the memory alignment (memory aligned to word boundaries are assumed to be accessed faster compared to non aligned memory), the surface might have added additional stuff at the end of each row to make them perfectly aligned to the word boundaries. The lockedRect.Pitch gives us the number of bytes between the starting points of two successive rows. That is, to advance to the correct point on the next row, we should advance by Pitch, not by Width. You can copy the surface bits in reverse, using the following:
for( int i=0 ; i < ScreenHeight ; i++)
{
    memcpy((BYTE*) pBits +( ScreenHeight - i - 1) *
        ScreenWidth * BITSPERPIXEL/8 ,
        (BYTE*) lockedRect.pBits + i* lockedRect.Pitch ,
        ScreenWidth* BITSPERPIXEL/8);
}

This may come handy when you are converting between top-down and bottom-up bitmaps.

While the above technique of LockRect() is one way of accessing the captured image content on IDirect3DSurface9, we have another more sophisticated method defined for IDirect3DSurface9, the GetDC() method. We can use the IDirect3DSurface9::GetDC() method to get a GDI compatible device context for the DirectX image surface, which makes it possible to directly blit the surface contents to our application defined DC. Interested readers can explore this alternative.

The sample source code provided with this article implements the technique of copying the contents of an off-screen plain surface onto a user created bitmap for capturing the screen contents at regular intervals, and creates a movie out of the captured image sequences.

However, a point worth noting when using this technique for screen capture is the caution mentioned in the documentation: The GetFrontBufferData() is a slow operation by design, and should not be considered for use in performance-critical applications. Thus, the GDI approach is preferable over the DirectX approach in such cases.
Windows Media API for capturing the screen

Windows Media 9.0 supports screen captures using the Windows Media Encoder 9 API. It includes a codec named Windows Media Video 9 Screen codec that has been specially optimized to operate on the content produced through screen captures. The Windows Media Encoder API provides the interface IWMEncoder2 which can be used to capture the screen content efficiently.

Working with the Windows Media Encoder API for screen captures is pretty straightforward. First, we need to start with the creation of an IWMEncoder2 object by using the CoCreateInstance() function. This can be done as:
IWMEncoder2* g_pEncoder=NULL;
CoCreateInstance(CLSID_WMEncoder,NULL,CLSCTX_INPROC_SERVER,
        IID_IWMEncoder2,(void**)&g_pEncoder);

The Encoder object thus created contains all the operations for working with the captured screen data. However, in order to perform its operations properly, the encoder object depends on the settings defined in what is called a profile. A profile is nothing but a file containing all the settings that control the encoding operations. We can also create custom profiles at runtime with various customized options, such as codec options etc., depending on the nature of the captured data. To use a profile with our screen capture application, we create a custom profile based on the Windows Media Video 9 Screen codec. Custom profile objects have been supported with the interface IWMEncProfile2. We can create a custom profile object by using the CoCreateInstance() function as:
IWMEncProfile2* g_pProfile=NULL;
CoCreateInstance(CLSID_WMEncProfile2,NULL,CLSCTX_INPROC_SERVER,
        IID_IWMEncProfile2,(void**)&g_pProfile);

We need to specify the target audience for the encoder in the profile. Each profile can hold multiple number of audience configurations, which are objects of the interface IWMEncAudienceObj. Here, we use one audience object for our profile. We create the audience object for our profile by using the method IWMEncProfile::AddAudience(), which would return a pointer to IWMEncAudienceObj which can then be used for configurations such as video codec settings (IWMEncAudienceObj::put_VideoCodec()), video frame size settings (IWMEncAudienceObj::put_VideoHeight() and IWMEncAudienceObj::put_VideoWidth()) etc. For example, we set the video codec to be Windows Media Video 9 Screen codec as:
extern IWMEncAudienceObj* pAudience;
#define VIDEOCODEC MAKEFOURCC('M','S','S','2')
    //MSS2 is the fourcc for the screen codec

long lCodecIndex=-1;
g_pProfile->GetCodecIndexFromFourCC(WMENC_VIDEO,VIDEOCODEC,
    &lCodecIndex); //Get the Index of the Codec

pAudience->put_VideoCodec(0,lCodecIndex);

The fourcc is a kind of unique identifier for each codec in the world. The fourcc for the Windows Media Video 9 Screen codec is MSS2. The IWMEncAudienceObj::put_VideoCodec() accepts the profile index as the input to recognize a particular profile - which can be obtained by using the method IWMEncProfile::GetCodecIndexFromFourCC().

Once we have completed configuring the profile object, we can choose that profile into our encoder by using the method IWMEncSourceGroup :: put_Profile() which is defined on the source group objects of the encoder. A source group is a collection of sources where each source might be a video stream or audio stream or HTML stream etc. Each encoder object can work with many source groups from which it get the input data. Since our screen capture application uses only a video stream, our encoder object need to have one source group with a single source, the video source, in it. This single video source needs to configured to use the Screen Device as the input source, which can be done by using the method IWMEncVideoSource2::SetInput(BSTR) as:
extern IWMEncVideoSource2* pSrcVid;
pSrcVid->SetInput(CComBSTR("ScreenCap://ScreenCapture1");

The destination output can be configured to save into a video file (wmv movie) by using the method IWMEncFile::put_LocalFileName() which requires an IWMEncFile object. This IWMEncFile object can be obtained by using the method IWMEncoder::get_File() as:
IWMEncFile* pOutFile=NULL;
g_pEncoder->get_File(&pOutFile);
pOutFile->put_LocalFileName(CComBSTR(szOutputFileName);

Now, once all the necessary configurations have been done on the encoder object, we can use the method IWMEncoder::Start() to start capturing the screen. The methods IWMEncoder::Stop() and IWMEncoder::Pause might be used for stopping and pausing the capture.

While this deals with full screen capture, we can alternately select the regions of capture by adjusting the properties of input video source stream. For this, we need to use the IPropertyBag interface of the IWmEnVideoSource2 object as:
#define WMSCRNCAP_WINDOWLEFT CComBSTR("Left")
#define WMSCRNCAP_WINDOWTOP CComBSTR("Top")
#define WMSCRNCAP_WINDOWRIGHT CComBSTR("Right")
#define WMSCRNCAP_WINDOWBOTTOM CComBSTR("Bottom")
#define WMSCRNCAP_FLASHRECT CComBSTR("FlashRect")
#define WMSCRNCAP_ENTIRESCREEN CComBSTR("Screen")
#define WMSCRNCAP_WINDOWTITLE CComBSTR("WindowTitle")
extern IWMEncVideoSource2* pSrcVid;
int nLeft, nRight, nTop, nBottom;
pSrcVid->QueryInterface(IID_IPropertyBag,(void**)&pPropertyBag);
CComVariant varValue = false;
pPropertyBag->Write(WMSCRNCAP_ENTIRESCREEN,&varValue);
varValue = nLeft;
pPropertyBag->Write( WMSCRNCAP_WINDOWLEFT, &varValue );
varValue = nRight;
pPropertyBag->Write( WMSCRNCAP_WINDOWRIGHT, &varValue );
varValue = nTop;
pPropertyBag->Write( WMSCRNCAP_WINDOWTOP, &varValue );
varValue = nBottom;
pPropertyBag->Write( WMSCRNCAP_WINDOWBOTTOM, &varValue );
http://www.codeproject.com/KB/dialog/screencap.aspx

The accompanied source code implements this technique for capturing the screen. One point that might be interesting, apart from the nice quality of the produced output movie, is that in this, the mouse cursor is also captured. (By default, GDI and DirectX are unlikely to capture the mouse cursor).

Note that your system needs to be installed with Windows Media 9.0 SDK components to create applications using the Window Media 9.0 API.

To run your applications, end users must install the Windows Media Encoder 9 Series. When you distribute applications based on the Windows Media Encoder SDK, you must also include the Windows Media Encoder software, either by redistributing Windows Media Encoder in your setup, or by requiring your users to install Windows Media Encoder themselves.

The Windows Media Encoder 9.0 can be downloaded from:
Windows Media Encoder
Conclusion

All the variety of techniques discussed above are aimed at a single goal - capturing the contents of the screen. However, as can be guessed easily, the results vary depending upon the particular technique that is being employed in the program. If all that we want is just a random snapshot occasionally, the GDI approach is a good choice, given its simplicity. However, using Windows Media would be a better option if we want more professional results. One point worth noting is, the quality of the content captured through these mechanisms might depend on the settings of the system. For example, disabling hardware acceleration (Desktop properties | Settings | Advanced | Troubleshoot) might drastically improve the overall quality and performance of the capture application.
关于API HOOK和图形驱动Mirror Driver,可以看看这篇文章:
引用
http://www.cnblogs.com/niukun/archive/2008/03/08/1096601.html

屏幕录制,远程桌面传输,基于Windows图形驱动的屏幕截图技术
 

计算机屏幕图像的截取在屏幕的录制、计算机远程控制以及多媒体教学软件中都是关键术,基于Windows操作系统有多种截屏方法,研究的重点集中在如何快速有效的截取DBI(Device-Independent Bitmap)格式的屏幕图形数据。现在商业软件流行的截屏技术主要采取的Api Hook技术,但这种技术一次截屏仍有较大的时间消耗,这样就对运行软件的硬件仍有较多的限制,而且是一种非标准的技术,不为微软公司所推荐。

1截屏技术

1.1使用api hook技术

         使用api hook技术截屏基于一下的原理;多数屏幕图形的绘制都是通过调用用户态gdi32.dll中的绘图函数实现的,如果利用api hook技术拦截系统中所有对这些函数的调用,就可以得到屏幕图形刷新或变化的区域坐标;然后使用api函数bitblt将刷新或者变化后的屏幕区域的ddb格式的位图拷贝到内存中,接着使用函数getbits将ddb位图转换为dbi位图,最后压缩、存储或者传输。

         这种方案只有捕捉到变化,才进行截屏。这样每次截屏都是有效的操作。每次(第一次除外)仅截取了栓新或变化部分,从根本上解决了数据量大的问题。但是这种技术仍然有一下缺点:1实际截屏采用的api函数,截取的是ddb位图,要经过一次格式转换,耗时较大。2如果屏幕变化区域矩形的坐标r1、r2、……rn相继到达,为了不是屏幕变化的信息丢失,通常不是逐个截取,而是将矩形坐标合并,这样就可以截取并未变化的区域,不经增加截屏的时间消耗,而且产生冗余数据。3该技术不支持directdraw技术,由于应用程序可能使用directdraw驱动程序进行直接操纵显示内存、硬件位块转移,硬件重叠和交换表面等图形操作,而不必进行gdi调用,所以此时api hook技术将失去效用,不能捕捉屏幕变化。4api hook技术在屏幕取词,远程控制和多媒体教学中都有实际的应用,但是这种技术是一种非标准的技术,微软公司并不推荐使用。

1.2 使用图形驱动技术

该技术的原理:应用程序调用win32 gdi 函数进行图形输出请求,这个请求通过核心模式gdi发送。核心模式gdi把这些请求发送到相应的图形驱动程序。如,显示器驱动程序,通信流如图。现将该技术详细解释如下:

(1)显示器驱动输出一系列设备驱动程序接口DDI(Device Driver Interface)函数供GDI调用。信息通过这些入口点的输入/输出参数在GDI和驱动程序之间传递。

(2)       在显示器驱动加载时,GDI调用显示器驱动程序函数DrvEnableDriver,在这里我们向GDI提供显示器驱动支持的,可供GDI调用的DDI函数入口点,其中部分时将要Hook的图形输出函数。

(3)       在GDI调用函数DrvEnableDriver成功返回后,GDI调用显示器驱动的DrvEnablePDEV函数,在这里可以设置显示器的显示模式,然后创建一个PDEV结构,PDEV结构是物理显示器的逻辑表示。

(4)       在成功创建PDEV结构之后,显示驱动为视频硬件创建一个表面,该表面可以是标准的DIB位图管理表面,然后驱动程序使该表面与一个PDEV结构相关联,这样显示驱动支持的所有绘画操作都将在该DIB位图表面上进行。

(5)       当应用程序调用用户态GDI32.DLL中的绘图函数发出图形请求时,该请求将图形引擎通过相应的DDI函数发送到显示驱动,显示驱动程序将这次图形变化事件通知应用程序。

(6)       应用程序接受到通知后,调用函数ExtEscape发出一个请求,并通过参数传递一个缓冲区Buffer,图形引擎调用DDI函数DrvEscape处理应用层的ExtEscape调用,将变化部分的图形数据从其创建的表面拷贝Buffer,这样数据就从核心层图形驱动传到应用层。

(7)       应用程序接收到的图形数据已是DIB标准格式,所以可以直接进行压缩传输或储存。

1.3图形驱动技术的特点

         上面叙述了采用图形驱动实现屏幕实现截屏的原理和过程,可以看出这种技术涉及核心图形驱动的编写,实现上较为复杂,而其具备的优点主要为:

(1)       驱动技术只截取变化的屏幕区域,这一点与API Hook技术相当;但驱动技术是一种标注技术,为微软公司所推荐。

(2)       API Hook技术在实际截屏时,采用API函数实现,截取DDB位图,必须经过一次DDB到DIB的转换;而驱动技术直接从其管理的DIB位图(表面)中将截取区域的图形数据拷贝到应用程序,显著的降低了一次截屏的时间消耗。

(3)       如果屏幕图形小区域范围变化较快,屏幕变化区域矩形坐标R1、R2、R3……、Rn相继到达,由于一次截屏时间消耗降低,区域矩形坐标叠加的概率变小,这样屏幕变化区域及时的得到了处理,不仅增加了连续性,而且截屏时间消耗和产生的数据量一般不会出现峰值,这也是这种技术的优越之处。

经过以上对比,无论是做远程桌面还是屏幕录制,基于MirrorDriver的屏幕截取将会是一个不错的选择,无论从性能占用资源的大小(主要是cpu),取得的数据量来说都要优于Hook。
最近在做远程桌面的传输,所以有必要研究一下Mirror,这项技术在很多软件中都有应用。但是开源的driver我还没有看见过,因为没有精力去编写。所以才用网上的免费的driver同时也提供了api文档。
driver内部实现的原理大致就是把显示输出拷贝到一个缓冲区当中,并且记录每次屏幕更新的矩形区域。根据这些输出,应用程序就很容易得到缓冲区中的数据了。
代码还在调试,等写完了,上源码!!!
看了这两篇文章,我想对于屏幕监控的实现应该有个概念了吧。当然,对于像Mirror Driver这样高深的东西我现在还没有入门。我主要研究了下GDI的实现。

说到这里,我们不能不提到大名鼎鼎的Radmin,对于早期的版本,它就是通过隔行扫描实现的。图象非常流畅,网络通信量非常小,CPU占用较低。我也搜集了不少木马程序的屏幕监控,发现很少有超过Radmin的。大概说来,目前使用GDI函数有这样几种方法:隔行扫描,差异比较,XOR算法,最小矩形算法,“格”算法等等。我着重分析了我认为两个比较优秀的XOR算法和隔行扫描。

先说说XOR(异或)算法吧。XOR算法具有实现简单,画面质量高,网络通信量小等明显优点,不足之处在于CPU消耗相对较高,每秒刷新帧数有限,在本机测试时最快可以达到40帧/秒。它的实现方法就是不停的截屏,然后将后一屏的图象数据与前一屏的数据逐个进行xor处理,因为屏幕变化的区域通常较少,所以得到的结果将是大量的0。再采用ZLib算法等压缩,压缩率将非常高。目前东南网安远程控制软件SEU_Peeper 0.11 beta3就是采用该算法。

再来说说隔行扫描。因为最近弄到一份Gh0st的代码,其中的屏幕传输就是用的隔行扫描。其实现就是每隔一定的行(比如10行)从上往下扫描整个屏幕,根据扫描的结果推测屏幕变化的区域,然后仅截取变化的区域发送。当然,发送前也可以进行压缩以减少通信量。隔行扫描可以达到60帧/秒以上。故反应十分敏捷。其通信量是XOR(异或)算法的两倍。但CPU占用率极低。

这几天仔细看了下Gh0st屏幕传输的代码,其实隔行扫描很像是在不停的给屏幕打补丁。用一个接一个小补丁来逐渐更新屏幕的变化。其实Gh0st的隔行扫描也存在一些问题。尽管刷新的频率很高,但是屏幕在运动的时候仍然不是很流畅。可以看到明显的碎片。细看它的代码可以发现,它并不是扫描完整个屏幕再来分析的,而是扫到一个改变,就适当的扩大区域,发送数据。这样一是造成区域重复发送,二是造成大量碎片。

针对这个问题,我试图改进算法,我先是采用将屏幕扫描完后,得到一个大的改变区域发送,结果发现虽然碎片的问题解决了,但cpu占用却上去了。另外,我尝试对屏幕进行分块,然后来推测改变的区域,结果发现碎片问题没有根本解决,cpu占用也增加了。

也想不到其它的好方法。先写到这里吧。

本文来源:单克隆抗体's blog   
原文地址:http://www.dklkt.cn/article.asp?id=114

你可能感兴趣的:(算法相关)