NBU Media Server backup fails with status code 2

NBU version: 7.5

Media server: Windows Server 2008 R2

Backed-up data: SQL Server databases

Tape library: IBM 3584

The Activity Monitor shows the following:

Info nbjm(pid=7004) started backup (backupid=xxxx_1379096131) job for client xxxx, policy centralDWH, schedule full on storage unit xxxx-hcart2-robot-tld-0

9/14/2013 2:15:33 AM - started process bpbrm (14008)

9/14/2013 2:15:34 AM - connecting

9/14/2013 2:15:34 AM - connected; connect time: 00:00:00

9/14/2013 2:20:15 AM - Error bpbrm(pid=14008) from client xxxx: ERR - command failed: none of the requested files were backed up (2)

9/14/2013 2:20:15 AM - Error bpbrm(pid=14008) from client xxxx: ERR - bphdb exit status = 2: none of the requested files were backed up

9/14/2013 2:20:41 AM - Info dbclient(pid=18520) ERR - Error in GetConfiguration: 0x80770003.

9/14/2013 2:20:41 AM - Info dbclient(pid=18520) CONTINUATION: - The api was waiting and the timeout interval had elapsed.

9/14/2013 2:20:46 AM - Info dbclient(pid=18520) ERR - Error in VDS->Close: 0x80770004.

9/14/2013 2:20:47 AM - Info dbclient(pid=18520) CONTINUATION: - An abort request is preventing anything except termination actions.

9/14/2013 2:20:47 AM - Info dbclient(pid=18520) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.

9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - Results of executing <C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch>:

9/14/2013 2:20:49 AM - Info dbclient(pid=18520) <0> operations succeeded. <1> operations failed.

9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - The following object(s) were not backed up successfully.

9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - CentralDWH

SQL Server log from the same time:

Date | Source | Severity | Message
09/14/2013 02:20:15 | Backup | Unknown | BACKUP failed to complete the command BACKUP DATABASE CentralDWH. Check the backup application log for detailed messages.
09/14/2013 02:20:15 | Backup | Unknown | Error: 3041, Severity: 16, State: 1.
09/14/2013 02:04:57 | Backup | Unknown | BACKUP failed to complete the command BACKUP DATABASE CentralDWH. Check the backup application log for detailed messages.
09/14/2013 02:04:57 | Backup | Unknown | Error: 3041, Severity: 16, State: 1.


Analysis:

First, these lines in the log:

Error bpbrm(pid=14008) from client xxxx: ERR - command failed: none of the requested files were backed up (2)

Error bpbrm(pid=14008) from client xxxx: ERR - bphdb exit status = 2: none of the requested files were backed up

This means the bch script failed and no files from the requested database were backed up.

Then this part:

9/14/2013 2:20:41 AM - Info dbclient(pid=18520) ERR - Error in GetConfiguration: 0x80770003.

9/14/2013 2:20:41 AM - Info dbclient(pid=18520) CONTINUATION: - The api was waiting and the timeout interval had elapsed.

9/14/2013 2:20:46 AM - Info dbclient(pid=18520) ERR - Error in VDS->Close: 0x80770004.

9/14/2013 2:20:47 AM - Info dbclient(pid=18520) CONTINUATION: - An abort request is preventing anything except termination actions.

9/14/2013 2:20:47 AM - Info dbclient(pid=18520) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.

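These lines show that NBU's connection to VDI timed out. The VDI timeout defaults to 300 seconds; because nothing was ever received from the database, the script timed out after 300 seconds (the batch reports 310 seconds elapsed) and VDI reported an error. At the same time, the Windows Server event log recorded a matching error: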

SQLVDI: Loc=SignalAbort. Desc=Client initiates abort
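To double-check this reading of the log, the elapsed time in the batch summary can be compared with the VDI connection timeout. Below is a minimal Python sketch, assuming the relevant dbclient lines have been copied into a string; the function name exceeded_vdi_timeout is hypothetical, and the 300-second fallback is just the default mentioned above.

```python
import re

# e.g. "Created VDI object for SQL Server instance <xxxx>. Connection timeout is <300> seconds."
TIMEOUT_RE = re.compile(r"Connection timeout is <(\d+)> seconds")
# e.g. "... FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds."
ELAPSED_RE = re.compile(r"Elapsed time = (\d+)\(\d+\) seconds")


def exceeded_vdi_timeout(log_text: str, default_timeout: int = 300) -> bool:
    """Return True if the batch ran at least as long as the VDI connection timeout.

    Falls back to default_timeout (assumed 300 s) when no timeout line is present.
    """
    t = TIMEOUT_RE.search(log_text)
    timeout = int(t.group(1)) if t else default_timeout
    e = ELAPSED_RE.search(log_text)
    if e is None:
        raise ValueError("no 'Elapsed time' line found")
    return int(e.group(1)) >= timeout
```

With the two lines from this job (timeout 300, elapsed 310) the check is true, which fits the timeout explanation; a job that failed well before 300 seconds would point somewhere else.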

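Since the script apparently had not run, I checked the bch script but found nothing wrong. I then re-ran the policy manually and NBU failed again, though this time the script was not the problem: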

INF - Created VDI object for SQL Server instance <xxxx>. Connection timeout is <300> seconds.
ERR - Error in GetConfiguration: 0x80770003.

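Once again, 300 seconds after the VDI object was created, Error in GetConfiguration: 0x80770003 appeared. So the problem seemed to be with the VDI object, which the NBU client presumably creates by calling SQLVDI.DLL.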

Next, the dbclient log. This log is only written if a dbclient folder has first been created under NetBackup\logs:

<2> logconnections: BPRD CONNECT FROM media-ip.62961 TO master-ip.1556 fd = 1268

<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_34284_37776_1>, SQL userid <sa> handle <0x0080d1b0>.

<4> CDBbackrec::InitDeviceSet(): INF - Created VDI object for SQL Server instance <instance>. Connection timeout is <300> seconds.------ the VDI object is created here

<2> vnet_pbxConnect: pbxConnectEx Succeeded

<2> logconnections: BPRD CONNECT FROM media-ip.62962 TO master-ip.1556 fd = 1396

<2> vnet_pbxConnect: pbxConnectEx Succeeded

<2> logconnections: BPRD CONNECT FROM media-ip.62963 TO master-ip.1556 fd = 952

<4> CGlobalInformation::VCSVirtualNameList: INF - Veritas Cluster Server is not installed.--- no Veritas cluster is installed

<1> CGlobalInformation::VCSVirtualNameList: CONTINUATION: - The system cannot find the path specified. ------ path not found

<4> getServerName: Read server name from nb_master_config: xxxxx

<4> CDBIniParms::CDBIniParms: INF - NT User is Administrator

<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065acf0>.---- sa logs in to SQL Server (handle 0x0065acf0)

<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065c260>.---- sa logs in to SQL Server (handle 0x0065c260)

<4> CGlobalInformation::CreateDSN: INF - A successful connection to SQL Server <xxxx\instance> has been made using Trusted security with DSN <NBMSSQL_temp_23736_9600_1> using standard userid <sa>.

<4> DBDisconnect: INF - Logging out of SQL Server with handle <0x0065c260>--- sa (handle 0x0065c260) logs out

<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065c690>. another sa login

<4> DBDisconnect: INF - Logging out of SQL Server with handle <0x0065c690> logs out immediately

<4> SQLEnumerator: INF - Enumerated SQL hosts: SERVER:Server={BJDSQLCLUSTER\instance};UID:Login ID=?;PWD:Password=?;Trusted_Connection:Use Integrated Security=?;*APP:AppName=?;*WSID:WorkStation ID=?

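01:17:34.156 [23736.9600] <4> SQLEnumerator: INF - Could not enumerate Local SQL host/instance using SQLBrowseConnectW --- the local SQL host and instance could not be enumerated with SQLBrowseConnectW. SQLBrowseConnect is the ODBC function used to discover and enumerate the values needed to connect to a data source (host name, instance name, etc.)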

<4> CGlobalInformation::SQLEnumerator: INF - Hosts and instances retrieved from host list string

<4> CGlobalInformation::SQLEnumerator: INF - host: mediaserver

<4> CGlobalInformation::SQLEnumerator: INF - instance: xxxx

<4> CGlobalInformation::SQLEnumerator: INF - host: BJDSQLCLUSTER

<4> CGlobalInformation::SQLEnumerator: INF - instance: xxxxx

<4> CGlobalInformation::CreateDSN: INF - A successful connection to SQL Server <xxxx\instance> has been made using Trusted security with DSN <NBMSSQL_23736_9600_2> using standard userid <sa>.---- the host names and instances were found in the host list and the connection succeeded. So the NBU client did reach the database instance; next, let's see why the backup still failed

-----------------------------------------------------------------------------------------------------------

<4> StartupProcess: INF - Starting: <C:\Program Files\Veritas\NetBackup\bin\admincmd\bppllist.exe -byclient mediaserver>

(Another batch of login messages follows, ending with a successful database connection; omitted here.)

<4> getServerName: Read server name from nb_master_config: masterserver

<2> vnet_pbxConnect: pbxConnectEx Succeeded

<2> logconnections: BPRD CONNECT FROM media-ip.62996 TO master-ip.1556 fd = 960 -- the media server connects to bprd on the master server

<16> writeToServer: ERR - send() to server on socket failed: ------ the socket send failed

<16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket


<16> CDBbackrec::InitDeviceSet_Part2(): ERR - Error in GetConfiguration: 0x80770003. ------ same error as in the Activity Monitor

01:22:09.551 <1> CDBbackrec::InitDeviceSet_Part2(): CONTINUATION: - The api was waiting and the timeout interval had elapsed.

<2> vnet_pbxConnect: pbxConnectEx Succeeded

<2> logconnections: BPRD CONNECT FROM media-ip.63001 TO master-ip.1556 fd = 1400

01:22:09.703 <4> KillAllThreads: INF - Killing group #0

01:22:09.704 [34284.33648] <4> KillAllThreads: INF - Killing group #0

01:22:09.704 <4> KillAllThreads: INF - Issuing SignalAbort to MS SQL Server VDI -- the message seen in the Windows event log

01:22:09.704 [34284.33416] <4> KillAllThreads: INF - Killing group #0

01:22:09.704 [34284.32560] <4> KillAllThreads: INF - Killing group #0


01:22:12.709 <2> vnet_pbxConnect: pbxConnectEx Succeeded

01:22:12.710 <2> logconnections: BPRD CONNECT FROM media-ip.63002 TO master-ip.1556 fd = 1276

01:22:14.546 <16> writeToServer: ERR - send() to server on socket failed:

<16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket

<16> CDBbackrec::FreeDeviceSet(): ERR - Error in VDS->Close: 0x80770004.

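So the cause appears to be that the progress status could not be written to the NAME socket on the bprd connection, which broke communication between the media server and the master server and in turn led to the VDI timeout.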
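Finding these lines by hand in a long dbclient log is tedious. In the log above, errors are tagged <16> and CONTINUATION lines <1>, optionally preceded by a timestamp and a [pid.tid] prefix. A minimal Python sketch that keeps only the error lines and their continuations, assuming exactly that line format (error_lines is a hypothetical helper name):

```python
import re

# Optional "hh:mm:ss.mmm" timestamp and "[pid.tid]" prefix, then the level tag,
# e.g. "<16> ..." or "01:22:09.551 <1> ..." as in the dbclient log above.
LEVEL_RE = re.compile(r"^\s*(?:[\d:.]+\s+)?(?:\[[\d.]+\]\s+)?<(\d+)>")


def error_lines(log_text: str) -> list[str]:
    """Return level-16 (error) lines plus the level-1 CONTINUATION lines that follow them."""
    out: list[str] = []
    in_error = False
    for raw in log_text.splitlines():
        m = LEVEL_RE.match(raw)
        if m is None:
            continue  # blank lines and untagged text don't change state
        level = int(m.group(1))
        if level == 16:
            in_error = True
            out.append(raw.strip())
        elif level == 1 and in_error:
            out.append(raw.strip())  # continuation of the preceding error
        else:
            in_error = False  # any other level ends the error run
    return out
```

Run over the section after the divider, this returns the writeToServer, dbc_RemoteWriteFile and GetConfiguration errors plus the "timeout interval had elapsed" continuation, while <1> continuations of informational lines (such as the VCSVirtualNameList one) are skipped.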

http://www.symantec.com/business/support/index?page=content&id=TECH182435

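This article says that in 7.1 a dbc_RemoteWriteFile - RemoteWriteFile status = 0 message can be ignored and will be fixed in the next release, but I am on 7.5, so this does not seem to be the issue.

http://www.symantec.com/docs/TECH146444 This article mentions a SQL Server patch that updated SQLVDI.DLL and caused backups to fail. That is not my problem either.

http://www.symantec.com/connect/forums/having-problem-mssql-agent-backup This thread suggests two fixes:

1. Kill the dbbackex.exe process.
2. Increase the client connect timeout, i.e. Client Read Timeout. The timeout between NBU and VDI can also be set by adding VDITIMEOUTSECONDS XXXX to the bch script (see the NetBackup for Microsoft SQL Server Administrator's Guide for this parameter).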
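Editing the bch file is the simplest fix. If there are many batch files, a small Python helper can add or update the keyword. This is a sketch under assumptions: set_vdi_timeout is a hypothetical name, and it assumes each operation in the bch file runs from an OPERATION line to an ENDOPER line with one keyword per line; check the Administrator's Guide for the exact syntax in your version.

```python
def set_vdi_timeout(bch_text: str, seconds: int) -> str:
    """Ensure every OPERATION ... ENDOPER block carries 'VDITIMEOUTSECONDS <seconds>'.

    An existing VDITIMEOUTSECONDS line is rewritten in place; otherwise the keyword
    is inserted just before the block's ENDOPER line.
    """
    out: list[str] = []
    has_timeout = False
    for line in bch_text.splitlines():
        parts = line.split()
        keyword = parts[0].upper() if parts else ""
        if keyword == "OPERATION":
            has_timeout = False  # new block: no timeout seen yet
        if keyword == "VDITIMEOUTSECONDS":
            out.append(f"VDITIMEOUTSECONDS {seconds}")
            has_timeout = True
            continue
        if keyword == "ENDOPER" and not has_timeout:
            out.append(f"VDITIMEOUTSECONDS {seconds}")
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, set_vdi_timeout(open("centralDWH.bch").read(), 1800) yields the text with the 1800-second timeout used in the fix below. The function is idempotent, so running it twice on the same file does not add a second keyword.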

Note:

Before running another backup, ensure the following log folders exist on media server:

bptm and bpbrm.

If backup still fails after increasing media server timeouts, please check a new set of logs:

dbclient on SQL client, bptm and bpbrm on media server.
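Creating these folders is easy to forget, so here is a small Python sketch. It assumes the default install path C:\Program Files\Veritas\NetBackup (the path the bch script lives under in the logs above) with the legacy logs in its logs subfolder. In this setup the SQL client appears to be the media server itself, so dbclient, bptm and bpbrm all go on one host. ensure_log_folders is a hypothetical helper name.

```python
from pathlib import Path

# Assumed default location of NetBackup legacy logs on Windows; adjust if installed elsewhere.
DEFAULT_LOG_ROOT = Path(r"C:\Program Files\Veritas\NetBackup\logs")


def ensure_log_folders(root: Path = DEFAULT_LOG_ROOT,
                       names: tuple[str, ...] = ("dbclient", "bptm", "bpbrm")) -> list[Path]:
    """Create any missing log folders under root and return the ones that were created."""
    created: list[Path] = []
    for name in names:
        folder = Path(root) / name
        if not folder.is_dir():
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

Calling ensure_log_folders() on the media server before the next run is enough; folders that already exist are left alone, so it is safe to run repeatedly.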



Solution

After adding VDITIMEOUTSECONDS 1800 to the script, a manual backup succeeded.


Remarks:

The site http://www.sqlbackuprestore.com/vdierrors.htm explains the VDI error codes in detail, including 0x80770003 and 0x80770004:

0x80770003 (-2139684861)

The api was waiting and the timeout interval had elapsed.

This can happen when the backup application has waited a set amount of time for SQL Server to respond to its backup request but did not receive any response.

0x80770004 (-2139684860)

An abort request is preventing anything except termination actions.

An example of this error is when the backup software has encountered a critical error, and has issued an abort request to the VDI.
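The two numbers given for each code are the same 32-bit value, read once as unsigned hex and once as a signed integer. A small Python sketch shows the conversion and splits the value using the standard HRESULT layout (severity bit, 11-bit facility, 16-bit code). The helper names are my own; the messages are copied from the table above.

```python
def to_signed32(code: int) -> int:
    """Reinterpret an unsigned 32-bit value (e.g. 0x80770003) as a signed int32."""
    return code - (1 << 32) if code & 0x80000000 else code


def hresult_parts(code: int) -> tuple[int, int, int]:
    """Split an HRESULT into (severity bit, facility, code)."""
    severity = (code >> 31) & 0x1
    facility = (code >> 16) & 0x7FF
    return severity, facility, code & 0xFFFF


# Messages copied from the VDI error descriptions above.
VDI_MESSAGES = {
    0x80770003: "The api was waiting and the timeout interval had elapsed.",
    0x80770004: "An abort request is preventing anything except termination actions.",
}

for hr in sorted(VDI_MESSAGES):
    sev, fac, num = hresult_parts(hr)
    print(f"0x{hr:08X} ({to_signed32(hr)}) severity={sev} facility={fac:#x} code={num}: {VDI_MESSAGES[hr]}")
```

Both codes share severity 1 (failure) and facility 0x77 and differ only in the low 16 bits (3 vs. 4).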

A good document on troubleshooting NBU with SQL Server:

http://www.symantec.com/business/support/index?page=content&id=TECH38369


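Afterword

Backup flow: NBU policy -> NBU backup script (.bch) -> VDI on the media server -> database backup processes on the media server

The media server runs the local script, which communicates through VDI with a set of backup processes in SQL Server. Each database being backed up has 3 such processes; when the backup finishes, these processes should exit and notify the media server through VDI, and the media server then completes the backup.

If the SQL Server backup processes cannot finish within N seconds (N is the timeout set in the script), they cannot notify the media server through VDI, and NBU treats the backup as failed. If those processes are still around when the next backup starts, that backup will fail as well.
One possible cause of such slow backups is an underpowered SQL Server host, which makes the processes run slowly.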
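The timeout behaviour described above can be modelled in a few lines of Python. This is only a toy: the real wait happens inside SQLVDI.DLL, which is not called here. A threading.Event stands in for "SQL Server has responded", and the delays are scaled down from the 300-second default and the 1800 seconds used in the fix. The names wait_for_sql_side, simulate and TIMEOUT_CODE are hypothetical.

```python
import threading

# Illustrative constant: the VDI code reported when the wait times out.
TIMEOUT_CODE = 0x80770003  # "The api was waiting and the timeout interval had elapsed."


def wait_for_sql_side(sql_responded: threading.Event, timeout_s: float) -> int:
    """Toy model of the backup application's wait: 0 on success, TIMEOUT_CODE otherwise."""
    return 0 if sql_responded.wait(timeout=timeout_s) else TIMEOUT_CODE


def simulate(sql_delay_s: float, timeout_s: float) -> int:
    """The simulated SQL side 'answers' after sql_delay_s; the client gives up after timeout_s."""
    responded = threading.Event()
    timer = threading.Timer(sql_delay_s, responded.set)
    timer.start()
    try:
        return wait_for_sql_side(responded, timeout_s)
    finally:
        timer.cancel()  # don't leave the simulated SQL side pending after we give up
```

For example, simulate(0.2, 0.1) models a slow SQL side with the default timeout and returns TIMEOUT_CODE, while simulate(0.2, 1.0) models the same SQL side after raising the timeout and returns 0. Raising VDITIMEOUTSECONDS works around a slow server rather than making it faster.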

Thoughts

The SQL Server host's performance should be improved.
