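Microsoft's SCOM is certainly a good product, but there are also plenty of free, open-source monitoring tools, and Nagios is one of them. For cost and other reasons, we once used Nagios to monitor our internal Windows and Linux servers and built our internal monitoring platform on top of it. Compared with VBScript, PowerShell is the newer generation of scripting. Below are a few simple, practical PowerShell scripts that run under NSClient++. I hope they serve as a starting point and add to the pool of shared IT operations scripts: one for all, all for one.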

Script 1:

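A "generic" check of a server's critical services. The idea: in our experience, whether a server runs AD, Exchange, or some other application, its critical services are set to start automatically at boot. Based on this rule of thumb, there is no need to monitor each application's services individually. Instead, we judge server health from the service startup type: every service whose startup type is Automatic should be running, and if one is not, the server is unhealthy. The advantage of this script is that it is generic and simple!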

    # Check all services and alert when a service that should start automatically is not running.
    # The .NET optimization services (clr_optimization_*) are excepted.
    # To execute from within NSClient++
    #
    # [NRPE Handlers]
    # check_auto_services=cmd /c echo C:\Scripts\Nagios\AutoServicesHealth.ps1; exit($lastexitcode) | PowerShell.exe -Command -
    #
    # "; exit($lastexitcode)" makes powershell.exe return the script's exit code (0/2) instead of just 0 or 1.
    # On the check_nrpe command include -t 30 in case the WMI query exceeds the default 10-second timeout.
    # 2014-01-22 chenyitai renewed this script.

    $NagiosStatus = 0
    $NagiosDescription = ""

    # All services whose startup type is Automatic, except the .NET optimization services,
    # which typically stop by themselves once their work is done.
    $services = Get-WmiObject Win32_Service |
        Where-Object { ($_.Name -notmatch "clr_optimization") -and ($_.StartMode -eq "Auto") }

    foreach ($AutoService in $services)
    {
        if ($AutoService.State -ne "Running")
        {
            $NagiosStatus = 2   # CRITICAL
            $NagiosDescription += "$($AutoService.Name) is $($AutoService.State), "
        }
    }

    if ($NagiosStatus -eq 2)
    {
        Write-Host "CRITICAL: $NagiosDescription"
    }
    else
    {
        Write-Host "OK: All Auto services are Running."
    }

    exit $NagiosStatus
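
The rule above ("every Automatic service must be Running") can be pulled out of the WMI call so it can be tried against fake data before it goes into production. The following is only a sketch: the function name Get-AutoServiceResult and its -Exclude parameter are hypothetical, not part of the original script, and [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Hypothetical helper: applies the "every Automatic service must be Running" rule
# to objects shaped like Win32_Service (Name, StartMode, State) and returns a
# Nagios-style result: ExitCode 0 = OK, 2 = CRITICAL, plus the status text.
function Get-AutoServiceResult {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [object[]] $Services,
        [string] $Exclude = 'clr_optimization'   # name pattern to skip, as in the script
    )

    # Automatic, not excluded, and not running
    $stopped = @($Services | Where-Object {
        ($_.StartMode -eq 'Auto') -and
        ($_.Name -notmatch $Exclude) -and
        ($_.State -ne 'Running')
    })

    if ($stopped.Count -eq 0) {
        return [pscustomobject]@{ ExitCode = 0; Text = 'OK: All Auto services are Running.' }
    }

    $details = ($stopped | ForEach-Object { "$($_.Name) is $($_.State)" }) -join ', '
    [pscustomobject]@{ ExitCode = 2; Text = "CRITICAL: $details" }
}

# Fake service objects, so no WMI query is needed
$fake = @(
    [pscustomobject]@{ Name = 'Spooler'; StartMode = 'Auto';   State = 'Running' },
    [pscustomobject]@{ Name = 'W32Time'; StartMode = 'Auto';   State = 'Stopped' },
    [pscustomobject]@{ Name = 'clr_optimization_v4.0.30319_64'; StartMode = 'Auto'; State = 'Stopped' },
    [pscustomobject]@{ Name = 'Fax';     StartMode = 'Manual'; State = 'Stopped' }
)
$result = Get-AutoServiceResult -Services $fake
Write-Output $result.Text   # CRITICAL: W32Time is Stopped
```

On a real server the input would be `@(Get-WmiObject Win32_Service)`, followed by `Write-Host $result.Text` and `exit $result.ExitCode`.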


Script 2:

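Check whether critical certificates are about to expire. For security, many services and protocols now use SSL or TLS encryption, which relies on a certificate in the server's "Local Computer > Personal" store. Common examples are web services and HTTPS, Exchange Outlook Anywhere, Lync, NPS, and so on. In our operations experience, incidents where a critical service stops working because a certificate was not renewed in time happen all too often, which is embarrassing ⊙﹏⊙b. The script below monitors certificate status and warns us a set number of days before a certificate expires!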

    # Get-ChildItem -Path Cert:\LocalMachine\My -ExpiringInDays 10
    # -ExpiringInDays lists the Local Computer > Personal certificates that expire within 10 days; it requires PowerShell 3.0.
    # List certificates by days until expiration:
    # Get-ChildItem cert: -Recurse | Where-Object {$_.NotAfter -gt (Get-Date)} | Select Subject,Thumbprint,@{Name="Expires in (Days)";Expression={($_.NotAfter).Subtract([DateTime]::Now).Days}} | Sort "Expires in (Days)"
    #
    # To execute from within NSClient++
    #
    # [NRPE Handlers]
    # check_cert_expiry=cmd /c echo C:\Scripts\Nagios\LocalMachineMyCert.ps1; exit($lastexitcode) | PowerShell.exe -Command -
    #
    # 2014-01-23 chenyitai renewed this script.

    $NagiosStatus = 0
    $NagiosDescription = ""
    $today = Get-Date
    $WarningDays = 10   # alert when a certificate has fewer than this many days of validity left

    $certs = Get-ChildItem -Path Cert:\LocalMachine\My

    foreach ($cert in $certs)
    {
        # NotAfter - today is a TimeSpan. Compare its TotalDays: comparing the
        # TimeSpan object directly with the number 10 does not compare days.
        if (($cert.NotAfter - $today).TotalDays -lt $WarningDays)
        {
            $NagiosDescription += "$($cert.Subject) $($cert.SerialNumber) $($cert.NotAfter), "
            $NagiosStatus = 2   # Set the status to failed.
        }
    }

    if ($NagiosStatus -eq 2)
    {
        Write-Host "CRITICAL: $NagiosDescription"
    }
    else
    {
        Write-Host "OK: All Certificates are in validity period."
    }

    exit $NagiosStatus
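
The key step is turning NotAfter into a number of days before comparing it with the threshold. A minimal sketch of that step as a standalone function that can be exercised with fixed dates; the name Get-ExpiringCertificate and its parameters are hypothetical, and [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Hypothetical helper: given certificate-like objects (Subject, NotAfter) and a
# reference date, return the ones with fewer than $WarningDays days left.
# Already-expired certificates are included because their remaining days are negative.
function Get-ExpiringCertificate {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [object[]] $Certificates,
        [int] $WarningDays = 10,
        [datetime] $Now = (Get-Date)
    )

    foreach ($cert in $Certificates) {
        # DateTime - DateTime yields a TimeSpan; compare its TotalDays, not the TimeSpan itself
        $daysLeft = ($cert.NotAfter - $Now).TotalDays
        if ($daysLeft -lt $WarningDays) {
            [pscustomobject]@{ Subject = $cert.Subject; DaysLeft = [math]::Floor($daysLeft) }
        }
    }
}

$now = [datetime]'2014-01-23'
$fakeCerts = @(
    [pscustomobject]@{ Subject = 'CN=mail.contoso.com'; NotAfter = [datetime]'2014-01-30' },  # 7 days left
    [pscustomobject]@{ Subject = 'CN=lync.contoso.com'; NotAfter = [datetime]'2014-06-30' },  # plenty of time
    [pscustomobject]@{ Subject = 'CN=old.contoso.com';  NotAfter = [datetime]'2014-01-01' }   # already expired
)
$expiring = @(Get-ExpiringCertificate -Certificates $fakeCerts -Now $now)
$expiring | ForEach-Object { "$($_.Subject): $($_.DaysLeft) days" }
# CN=mail.contoso.com: 7 days
# CN=old.contoso.com: -22 days
```

Against the real store, X509Certificate2 objects already expose Subject and NotAfter, so `Get-ExpiringCertificate -Certificates @(Get-ChildItem Cert:\LocalMachine\My)` should work; the `@()` keeps the call valid when the store is empty.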

Script 3:

Check whether the DAG database copies and their content indexes on an Exchange 2010 Mailbox server are healthy.

    # Test Mailbox Database and Content Index Health
    # Place in the C:\Scripts\ folder and edit nsc.ini to call "check_mb_servername=cmd /c echo C:\Scripts\MailboxDatabaseHealth.ps1 ; exit($lastexitcode) | PowerShell.exe -Command -"
    # 2014-01-21 chenyitai renewed this script.

    $flag1 = 0   # becomes 1 if any database copy is neither Mounted nor Healthy
    $flag2 = 0   # becomes 2 if any content index is not Healthy
    $NagiosDescription1 = ""
    $NagiosDescription2 = ""

    if ((Get-PSSnapin -Name Microsoft.Exchange.Management.PowerShell.E2010 -ErrorAction SilentlyContinue) -eq $null)
    {
        Add-PSSnapin Microsoft.Exchange.Management.PowerShell.E2010
    }

    $Status = Get-MailboxDatabaseCopyStatus -Server $env:COMPUTERNAME   # Get the database copy status

    foreach ($State in $Status)
    {
        # Active copies should be Mounted, passive copies Healthy
        if (($State.Status -notmatch '^Mounted') -and ($State.Status -notmatch '^Healthy'))
        {
            $NagiosDescription1 += "$($State.Name): $($State.Status), "   # += appends
            $flag1 = 1
        }
    }

    foreach ($ContentIndexState in $Status)
    {
        if ($ContentIndexState.ContentIndexState -notmatch '^Healthy')
        {
            $NagiosDescription2 += "$($ContentIndexState.Name) Index: $($ContentIndexState.ContentIndexState), "
            $flag2 = 2
        }
    }

    # 0 = all healthy, 1 = database problem only, 2 = index problem only, 3 = both
    $flag = $flag1 + $flag2

    if ($flag -eq 0) {
        Write-Host "OK: All Databases and Indexes Are Healthy"
        exit 0
    } elseif ($flag -eq 1) {
        Write-Host "CRITICAL: $NagiosDescription1"
        exit 2
    } elseif ($flag -eq 2) {
        Write-Host "WARNING: $NagiosDescription2"
        exit 1
    } else {
        Write-Host "CRITICAL: $NagiosDescription1 $NagiosDescription2"
        exit 2
    }
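
The flag arithmetic above encodes a simple policy: any database-copy problem is CRITICAL, while a content-index problem on its own is only a WARNING. The sketch below restates that policy as a function that takes objects shaped like Get-MailboxDatabaseCopyStatus output (Name, Status, ContentIndexState), so it can be checked without an Exchange server. The name Get-MailboxDbResult is hypothetical, and [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Hypothetical helper mirroring the script's policy:
# database copy problems -> CRITICAL (2), content-index-only problems -> WARNING (1).
function Get-MailboxDbResult {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [object[]] $CopyStatus
    )

    $badCopies  = @($CopyStatus | Where-Object { "$($_.Status)" -notmatch '^(Mounted|Healthy)' })
    $badIndexes = @($CopyStatus | Where-Object { "$($_.ContentIndexState)" -notmatch '^Healthy' })

    $dbText  = ($badCopies  | ForEach-Object { "$($_.Name): $($_.Status)" }) -join ', '
    $idxText = ($badIndexes | ForEach-Object { "$($_.Name) Index: $($_.ContentIndexState)" }) -join ', '

    if ($badCopies.Count -gt 0) {
        # Any copy problem is critical; append index problems if there are any
        $text = (@($dbText, $idxText) | Where-Object { $_ }) -join ', '
        return [pscustomobject]@{ ExitCode = 2; Text = "CRITICAL: $text" }
    }
    if ($badIndexes.Count -gt 0) {
        return [pscustomobject]@{ ExitCode = 1; Text = "WARNING: $idxText" }
    }
    [pscustomobject]@{ ExitCode = 0; Text = 'OK: All Databases and Indexes Are Healthy' }
}

$rows = @(
    [pscustomobject]@{ Name = 'DB01\EX01'; Status = 'Mounted'; ContentIndexState = 'Healthy' },
    [pscustomobject]@{ Name = 'DB02\EX01'; Status = 'Healthy'; ContentIndexState = 'Crawling' }
)
$r = Get-MailboxDbResult -CopyStatus $rows
"$($r.ExitCode) $($r.Text)"   # 1 WARNING: DB02\EX01 Index: Crawling
```

Keeping the policy in one function also avoids the four-way `if` on the summed flags, which is easy to break if a third kind of check is added later.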

Script 4:

Check whether the queues on an Exchange 2010 Hub Transport server have grown too long (more than 50 messages).


    # Test Queue Health
    # To execute from within NSClient++
    #
    # [NRPE Handlers]
    # check_exchange_mailqueue=cmd /c echo C:\Scripts\Nagios\ExchangeQueueHealth.ps1; exit($lastexitcode) | PowerShell.exe -Command -
    #
    # On the check_nrpe command include -t 30, since it takes some time to load the Exchange cmdlets.
    # 2014-01-21 chenyitai renewed this script.

    $NagiosStatus = 0
    $NagiosDescription = ""
    $Threshold = 50   # alert when a queue holds more than 50 messages

    if ((Get-PSSnapin -Name Microsoft.Exchange.Management.PowerShell.E2010 -ErrorAction SilentlyContinue) -eq $null)
    {
        Add-PSSnapin Microsoft.Exchange.Management.PowerShell.E2010
    }

    $Status = Get-Queue -Server $env:COMPUTERNAME   # Get the queue status

    foreach ($Queue in $Status)
    {
        if ($Queue.MessageCount -gt $Threshold)
        {
            # Expand Identity inside the string. Writing $Queue.Identity + " queue has ..." fails with
            # "Method invocation failed because [Microsoft.Exchange.Data.QueueViewer.QueueIdentity]
            # doesn't contain a method named 'op_Addition'."
            $NagiosDescription += "$($Queue.Identity) queue has $($Queue.MessageCount) messages to $($Queue.NextHopDomain), "
            $NagiosStatus = 2   # Set the status to failed.
        }
    }

    if ($NagiosStatus -eq 2)
    {
        Write-Host "CRITICAL: $NagiosDescription"
    }
    else
    {
        Write-Host "OK: All mail queues within limits."
    }

    exit $NagiosStatus
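
The threshold test and the string expansion of Identity can be checked offline with fake queue objects. A minimal sketch; Get-LongQueue is a hypothetical name, the fake objects only mimic the Identity, MessageCount and NextHopDomain properties of Get-Queue output, and [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Hypothetical helper: returns one line of text per queue whose MessageCount exceeds $Threshold.
# Identity is expanded inside a double-quoted string, which converts it to its string form
# and so never uses the + operator on a QueueIdentity object.
function Get-LongQueue {
    param(
        [Parameter(Mandatory = $true)]
        [AllowEmptyCollection()]
        [object[]] $Queues,
        [int] $Threshold = 50
    )
    foreach ($q in $Queues) {
        if ($q.MessageCount -gt $Threshold) {
            "$($q.Identity) queue has $($q.MessageCount) messages to $($q.NextHopDomain)"
        }
    }
}

$fakeQueues = @(
    [pscustomobject]@{ Identity = 'EX01\1234';       MessageCount = 120; NextHopDomain = 'contoso.com' },
    [pscustomobject]@{ Identity = 'EX01\Submission'; MessageCount = 3;   NextHopDomain = 'Submission' }
)
$long = @(Get-LongQueue -Queues $fakeQueues)
if ($long.Count -gt 0) { "CRITICAL: " + ($long -join ', ') } else { 'OK: All mail queues within limits.' }
# CRITICAL: EX01\1234 queue has 120 messages to contoso.com
```

Joining the collected lines with `-join` at the end also avoids the trailing ", " that the `+=` approach leaves in the status text.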

Because running PowerShell under Nagios is still fairly new, see the following page for how NSClient++ runs PowerShell scripts on behalf of Nagios:

http://nsclient.org/nscp/wiki/guides/nagios/external_scripts
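
All four scripts rely on the Nagios plugin convention: the status text comes from the script's output and the state from its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). One gap in the scripts above is that an unexpected error, such as the Exchange snap-in failing to load, is not reported as UNKNOWN. A minimal sketch of a wrapper for that case follows. Invoke-NagiosCheck is a hypothetical name, the check is assumed to return an object with ExitCode and plain Text (no status prefix), and non-terminating errors are only caught if the check uses `-ErrorAction Stop`. [pscustomobject] assumes PowerShell 3.0 or later.

```powershell
# Hypothetical wrapper: runs a check scriptblock that returns ExitCode + Text and
# turns any terminating error into Nagios state 3 (UNKNOWN).
function Invoke-NagiosCheck {
    param([Parameter(Mandatory = $true)] [scriptblock] $Check)

    $labels = @{ 0 = 'OK'; 1 = 'WARNING'; 2 = 'CRITICAL'; 3 = 'UNKNOWN' }
    try {
        $result = & $Check
        if ($null -eq $result) { throw 'The check returned no result.' }
        $code = [int]$result.ExitCode
        if (-not $labels.ContainsKey($code)) { $code = 3 }   # anything unexpected is UNKNOWN
        [pscustomobject]@{ ExitCode = $code; Text = "$($labels[$code]): $($result.Text)" }
    }
    catch {
        [pscustomobject]@{ ExitCode = 3; Text = "UNKNOWN: $($_.Exception.Message)" }
    }
}

$r   = Invoke-NagiosCheck { [pscustomobject]@{ ExitCode = 1; Text = 'queue EX01\1234 is growing' } }
$bad = Invoke-NagiosCheck { throw 'Exchange snap-in not found' }
$r.Text     # WARNING: queue EX01\1234 is growing
$bad.Text   # UNKNOWN: Exchange snap-in not found
```

In a real check script the last two lines would be `Write-Host $r.Text` and `exit $r.ExitCode`, and the NRPE handler still needs `; exit($lastexitcode)` so that powershell.exe passes the code on to NSClient++.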