OS Watcher for Windows (OSWFW) is a set of batch files that run the Windows utilities logman and schtasks. The logman utility collects various Operating System counters and archives these metrics to aid in diagnosing performance and Operating System issues. OSWFW segments these counter collections into several categories. The schtasks utility runs a batch file that cleans up the archive files so that only 24 hours of data are kept. If Oracle Real Application Clusters (RAC) is involved, schtasks also runs a batch file that checks the RAC Interconnect. OSWFW can be downloaded from this note. Installation instructions for OSWFW are provided in this User Guide.
Author : Kevin Reardon
Create Date 05-23-2007
Update Date 05-13-2013
Expire Date
Version: OSWFW 2.5.1
The OS Watcher For Windows (OSWFW) User Guide
Kevin Reardon, Center of Expertise
OSWFW consists of a batch file and a series of logman configuration files that contain the counter paths to be captured. The main controlling batch file is "OSWATCHER.BAT," which uses the Windows logman utility to create and schedule individual counter collections, each collecting a specific kind of data. Each counter collection has its own output file. This version of OSWFW is aware of Oracle Real Application Clusters. When it runs, it detects whether Oracle Clusterware is installed, installs itself on all nodes in the Cluster, and schedules a batch file that checks the RAC Interconnect.
Data collection intervals are configurable by the user, and all counter collections run on this interval. For example, if OSWFW is configured to collect data once per minute, each counter collection will collect its data, append it to its output file, sleep for one minute, and repeat the data collection. Each output file will contain, at most, one hour of data. At the end of each hour, logman creates a new file. This file creation interval cannot be changed from the command line.
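As a rough worked example of the arithmetic above, the following Python sketch computes how many samples fit in one hourly output file for a given collection interval, and how many samples 24 hours of files would hold. The function names are hypothetical and are not part of OSWFW; the results are upper bounds, since a file contains at most one hour of data.

```python
def samples_per_hourly_file(interval_seconds):
    """Upper bound on the samples one hourly output file can hold."""
    if interval_seconds <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return 3600 // interval_seconds


def samples_retained(interval_seconds, retention_hours=24):
    """Upper bound on the samples kept on disk for one counter collection."""
    return samples_per_hourly_file(interval_seconds) * retention_hours


print(samples_per_hourly_file(60))  # 60 samples per file at one sample per minute
print(samples_retained(30))         # 2880 samples over 24 hours at a 30-second interval
```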
The Operating System utility schtasks is used to remove older data collection files. This prevents the collection files from filling up the disk they reside on. OSWFW keeps twenty-four hours of data on disk and deletes the older files. If these files need to be saved, view the help for schtasks to set up a separate task that archives them.
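The retention rule can be pictured with a short sketch. This is not the cleanup batch file that OSWFW schedules; it is a minimal Python illustration that assumes the archive files sit under one directory tree and that a file's modification time is an acceptable proxy for its age. The function name remove_old_files and its parameters are hypothetical.

```python
import time
from pathlib import Path


def remove_old_files(archive_dir, max_age_hours=24, now=None):
    """Delete files under archive_dir older than max_age_hours; return the deleted paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    # Materialize the listing first so that deletions do not disturb the walk.
    for path in list(Path(archive_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A task that archives files instead of deleting them could follow the same age test and move each file rather than unlink it.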
OSWFW will terminate after the Run Time Interval. With the proper command line options, OSWFW can be stopped on all nodes, or on individual nodes.
OSWFW is certified to run on the following platforms:
- Windows XP (x86 & x64)
- Windows 7 (x86 & x64)
- Windows 8 (x86 & x64)
- Windows 2003 R1 & R2 (x86 and x64)
- Windows 2008 R1 & R2 (x86 & x64)
- Windows 2012
OSWFW needs to be run in an Administrator account (Run as Administrator). Exactly which OS permissions are required to run logman or schtasks is not covered in this document; see the appropriate Microsoft documentation on this topic. OSWFW was tested on a default installation of the Operating System (kept at the most current patch set available during the testing period) with all permissions at their default settings.
OSWFW cannot run on OS installations that use a language other than English, because various commands used in the batch files do not reliably return their results in English.
OSWFW should be installed manually by using the following procedure. OSWFW is available through My Oracle Support and is downloaded as a zip file. The user then copies the file oswfw.zip to the directory where OSWFW is to be installed and issues the following command:
C:\> unzip oswfw.zip
This installs all the files associated with OSWFW into this directory. OSWFW is now installed.
OSWFW runs in a Real Application Clusters environment and will deploy itself on all nodes that are cluster members and are up. Before running OSWFW for the first time, rename the file OSWPrivNet.config.template to OSWPrivNet.config and modify it to contain the IP addresses of all the Interconnect interfaces. These addresses are the initial IP addresses of the interfaces and not the HAIP addresses, because the HAIP addresses can change between system reboots. An example of the OSWPrivNet.config file is as follows:
# Start of OSWPrivNet.config file
# Put the IP addresses for all Interconnect interfaces of all nodes on a single line
# Remove the "#" character from the address line. The following are examples only:
192.168.2.1
192.168.2.2
192.168.2.3
192.168.2.4
# End of OSWPrivNet.config file
In this case each node in the cluster has two interfaces for a total of four IP addresses.
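Before starting OSWFW, it can be useful to confirm that the edited file contains only valid addresses. The following Python sketch is one way to do that; it is not part of OSWFW. It assumes that lines starting with "#" are comments and that addresses are separated by whitespace or line breaks, so it accepts both one address per line and several on a line. The function name is hypothetical.

```python
import ipaddress


def read_interconnect_addresses(text):
    """Return the IP addresses found on the non-comment lines of the config text."""
    addresses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        for token in line.split():
            # ip_address raises ValueError for anything that is not a valid address
            addresses.append(str(ipaddress.ip_address(token)))
    return addresses


sample = """# Start of OSWPrivNet.config file
192.168.2.1
192.168.2.2
192.168.2.3
192.168.2.4
# End of OSWPrivNet.config file
"""
print(read_interconnect_addresses(sample))
```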
The OSWPrivNet.bat file runs as a scheduled task. Its purpose is to check the viability of the interconnect network. It does this by both pinging and running tracert (traceroute).
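The kind of check described above can be sketched as follows. This is an illustration only and does not reproduce OSWPrivNet.bat, whose exact commands are not shown in this guide. The sketch assumes the standard Windows options "ping -n" (number of echo requests) and "tracert -d" (do not resolve addresses to host names); the function names and the default ping count are hypothetical.

```python
import subprocess


def interconnect_check_commands(address, ping_count=2):
    """Build the ping and tracert command lines for one interconnect address."""
    return [
        ["ping", "-n", str(ping_count), address],
        ["tracert", "-d", address],
    ]


def run_checks(addresses):
    """Run both commands for every address and collect their exit codes (Windows only)."""
    results = {}
    for address in addresses:
        results[address] = [
            subprocess.run(command, capture_output=True, text=True).returncode
            for command in interconnect_check_commands(address)
        ]
    return results


print(interconnect_check_commands("192.168.2.1"))
```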
Removing OSWFW is quite simple.
C:\> oswatcher remove
This will first stop and then remove all the OSWFW counters and tasks from a single node, or all RAC nodes.
To complete the removal, on the host where OSWFW was installed, and on each node of the RAC cluster where it was installed, issue the following command:
C:\> del /s osw
This last step must be manual in order to prevent accidental deletion of the captured data.
OSWFW has had a few more command line options added in order to work in a RAC environment. These are detailed in the following section.
Initially configure OSWFW
To initially configure OSWFW, specify the interval at which logman will collect the counter data, the number of hours OSWFW will run, and whether it is to run on RAC. The following is the syntax to configure OSWFW:
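OSWatcher {ARG1} {ARG2} {ARG3}
ARG1 = the data collection interval, in seconds
ARG2 = the number of hours OSWFW will run
ARG3 = RAC, if OSWFW is to run on a RAC system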
When OSWFW is started for the first time it creates the Archive sub-directory and several sub-directories (one for each data collection). OSWFW will automatically start after this command is given.
OSWFW can be reconfigured at any time, running or not, using the same syntax above.
OSWFW will start after the first time the command is issued. It can also be stopped from the command line. To start the OSWFW utility execute the OSWATCHER.BAT batch script from the directory where OSWFW was installed. If not run from this directory, OSWFW will not find its configuration files. If it is installed on RAC, this command starts OSWFW on all nodes or an individual node.
The start command line syntax is:
OSWatcher start {node name}
If the node name is left off, and OSWFW was installed on a RAC system, it will start all the counters on all the nodes. It does not matter if they have already been started as no change occurs to an already started counter.
OSWFW is configured to create a new log file every hour and this interval is not configurable (there should be no need to configure it). If no arguments are entered, the script runs with default values of collecting data every 30 seconds and will run for 48 hours.
C:\> oswatcher 60 10 RAC
This would start the tool, collect data at 60-second intervals, and run for 10 hours. With the last argument, OSWFW will detect it is on RAC, configure all the nodes, and start on all nodes.
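The effect of the arguments and their defaults can be sketched in Python. This is not how OSWATCHER.BAT parses its command line, which this guide does not show. The sketch assumes that any argument left off falls back to the documented defaults (30 seconds, 48 hours) and that the RAC keyword is matched without regard to case; the function names are hypothetical.

```python
DEFAULT_INTERVAL_SECONDS = 30  # default stated in this guide
DEFAULT_RUN_HOURS = 48         # default stated in this guide


def parse_oswatcher_args(args):
    """Interpret [interval] [hours] [RAC]; missing values use the defaults (an assumption)."""
    interval = int(args[0]) if len(args) > 0 else DEFAULT_INTERVAL_SECONDS
    hours = int(args[1]) if len(args) > 1 else DEFAULT_RUN_HOURS
    rac = len(args) > 2 and args[2].upper() == "RAC"
    return {"interval_seconds": interval, "run_hours": hours, "rac": rac}


def expected_samples(config):
    """Samples each counter collection gathers over the whole run."""
    return config["run_hours"] * 3600 // config["interval_seconds"]


config = parse_oswatcher_args(["60", "10", "RAC"])
print(config)
print(expected_samples(config))  # 600 samples: 10 hours at one sample per minute
```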
OSWatcher stop {node name}
To stop the OSWFW utility execute the OSWatcher stop command from the directory where OSWFW was installed. This will stop all the counters. If OSWFW is installed on a RAC system, an optional node name can be given to stop OSWFW on that node. To stop OSWFW on all nodes, no node name is given.
C:\> oswatcher stop
This will stop OSWFW on the system it is installed on, or all nodes in a RAC system.
C:\> oswatcher stop curiousgeorge1
This will stop OSWFW on the RAC node named curiousgeorge1.
To find out the status of all of the counters, use the command line option of "status". If installed on a RAC system, the status of a specific node can be found. The status command line option is used to provide a quick check of the status. If more detail is needed, use the query command line option.
OSWatcher status {node name}
It will list all the counters and show if they are running or not:
In this example, all OSWatcher counters are running on the node curiousgeorge1. For this example, OSWFW was installed on a RAC system, and the status for one node was requested. This is why the task OSWPrivNet was included. The task OSWCleanup is also included, and would be even for a stand-alone system.
The query command line option displays more detailed information about the counters. The syntax is:
OSWatcher query {node name} {counter / OSWCleanup / OSWPrivNet}
To query more extensive details of a specific Counter or task on a node, use the query command line option. Counter names are case sensitive. A special counter name "all" is used to specify all nodes or all counters (which includes the tasks OSWCleanup and, if on RAC, OSWPrivNet).
As an example, to query the counter OSWThread on the node curiousgeorge1:
C:\> oswatcher query curiousgeorge1 OSWThread
To display details for all the counters and, if on RAC, all the nodes, use the option "all". This option displays all the details for each counter, on all nodes, one at a time.
C:\> oswatcher query all
The files that OSWFW creates can contain more counter outputs than can be easily managed. To break these files down into more manageable sizes, the Windows utility "relog" is used.
Each entry in an OSWFW capture file represents a unique Operating System entity, and as such its name can vary from system to system. Other OSWFW capture files capture different counters, so follow this procedure to find the names of those objects. The utility "relog" allows you to see all the names of the captured objects. The following is a list of the possible formats of these captured objects:
Even though the wildcard "*" can be used, it is not a very robust option in this version of the Operating System and often does not produce reliable results. As such, a different method is outlined in this document: put the unique names of the objects of interest into a configuration file and have relog use that file. The relog command line syntax can be displayed by entering "relog /?" at the command line. This help text explains the syntax of the command quite well and can be referred to if need be.
To extract the names of all the captured objects in a trace file, and save it off so it can be used to create the configuration file, use this command:
relog {trace_file_name} -q > {trace_file_name}.counter.txt
This will extract the counters as they are in the log file. Typically, these counters are listed in the order they were created, by Performance Object Counter. If you are only after a specific Counter type for all Threads or Objects, then you can use this file to parse out the specific data.
If you wish to group the counters of a specific type, another technique is to sort the file:
relog {trace_file_name} -q | sort /+1 > {trace_file_name}.sorted.counter.txt
This output file, {trace_file_name}.sorted.counter.txt, now contains just the names of the captured objects and has them sorted. The sorting will group the various counters for a specific OS object. For example, from the entire capture file, once these names are extracted and sorted, the following can be extracted:
\\GEORGE\Thread(svchost/0)\% Privileged Time
\\GEORGE\Thread(svchost/0)\% Processor Time
\\GEORGE\Thread(svchost/0)\% User Time
\\GEORGE\Thread(svchost/0)\Elapsed Time
\\GEORGE\Thread(svchost/0)\ID Process
\\GEORGE\Thread(svchost/0)\ID Thread
\\GEORGE\Thread(svchost/0)\Priority Base
\\GEORGE\Thread(svchost/0)\Priority Current
\\GEORGE\Thread(svchost/0)\Thread State
\\GEORGE\Thread(svchost/0)\Thread Wait Reason
\\GEORGE\Thread(svchost/0#1)\% Privileged Time
\\GEORGE\Thread(svchost/0#1)\% Processor Time
\\GEORGE\Thread(svchost/0#1)\% User Time
\\GEORGE\Thread(svchost/0#1)\Elapsed Time
\\GEORGE\Thread(svchost/0#1)\ID Process
\\GEORGE\Thread(svchost/0#1)\ID Thread
\\GEORGE\Thread(svchost/0#1)\Priority Base
\\GEORGE\Thread(svchost/0#1)\Priority Current
\\GEORGE\Thread(svchost/0#1)\Thread State
\\GEORGE\Thread(svchost/0#1)\Thread Wait Reason
\\GEORGE\Thread(svchost/0#2)\% Privileged Time
\\GEORGE\Thread(svchost/0#2)\% Processor Time
\\GEORGE\Thread(svchost/0#2)\% User Time
\\GEORGE\Thread(svchost/0#2)\Elapsed Time
\\GEORGE\Thread(svchost/0#2)\ID Process
\\GEORGE\Thread(svchost/0#2)\ID Thread
\\GEORGE\Thread(svchost/0#2)\Priority Base
\\GEORGE\Thread(svchost/0#2)\Priority Current
\\GEORGE\Thread(svchost/0#2)\Thread State
\\GEORGE\Thread(svchost/0#2)\Thread Wait Reason
From above, we see that the Machine name is "GEORGE," the object is "Thread," and the parent executable is "svchost." In this case, the parent executable's base instance, svchost/0, is listed along with two of its indexes (#1 and #2). Each index is a separate thread. Even though each thread has an Index ID, this number is not the ID Thread. Finding the ID Thread for a particular thread is a little more complex and is outlined later in this document. The last part of the captured object name is the actual counter, for instance "Thread Wait Reason" or "Thread State."
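The parts of a captured object name described above can be pulled apart programmatically. The following Python sketch is an illustration under stated assumptions: it handles only names of the form \\Machine\Object(Parent/Instance#Index)\Counter, as seen in the Thread listing, and treats the "#Index" part as optional. The pattern and function names are hypothetical.

```python
import re

# \\Machine\Object(Parent/Instance#Index)\Counter, with "#Index" optional
COUNTER_PATH = re.compile(
    r"^\\\\(?P<machine>[^\\]+)"
    r"\\(?P<object>[^\\(]+)"
    r"\((?P<parent>[^/()]+)/(?P<instance>[^#()]+)(?:#(?P<index>\d+))?\)"
    r"\\(?P<counter>.+)$"
)


def parse_counter_path(path):
    """Split a captured object name into its named parts."""
    match = COUNTER_PATH.match(path.strip())
    if match is None:
        raise ValueError("unrecognized counter path: " + path)
    return match.groupdict()


print(parse_counter_path(r"\\GEORGE\Thread(svchost/0#1)\% Privileged Time"))
```

For the base instance, which carries no "#Index" suffix, the index field is returned as None.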
Other techniques can also be used. If there are Unix utilities installed on your Windows system, you can use the utility "grep" to extract just the "Thread State" counters, or any other combination of strings.
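Where no Unix utilities are installed, the same kind of filtering can be approximated with a few lines of Python. This is a minimal sketch, not a replacement for grep: it keeps every line that contains a given string. The function name is hypothetical.

```python
def filter_counters(lines, needle):
    """Keep the lines that contain needle, with trailing line breaks removed."""
    return [line.rstrip("\r\n") for line in lines if needle in line]


sample = [
    "\\\\GEORGE\\Thread(svchost/0)\\Thread State\n",
    "\\\\GEORGE\\Thread(svchost/0)\\Thread Wait Reason\n",
    "\\\\GEORGE\\Thread(svchost/0#1)\\Thread State\n",
]
for name in filter_counters(sample, "\\Thread State"):
    print(name)
```

To filter a saved counter list, pass the open file object as the lines argument, since iterating over a text file yields its lines.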
Since this part of this guide concerns reducing the amount of information in one of the capture files, we are going to extract all of the counters for the base executable and just one of its child Threads. To do this we copy the file we created above to a file we are to modify. We do this just in case there will be a different combination of objects we wish to extract later.
copy {trace_file_name}.counter.txt thread_svchost_0.txt
Edit the thread_svchost_0.txt file to contain only the counters that refer to svchost/0 and svchost/0#1.
\\GEORGE\Thread(svchost/0)\% Privileged Time
\\GEORGE\Thread(svchost/0)\% Processor Time
\\GEORGE\Thread(svchost/0)\% User Time
\\GEORGE\Thread(svchost/0)\Elapsed Time
\\GEORGE\Thread(svchost/0)\ID Process
\\GEORGE\Thread(svchost/0)\ID Thread
\\GEORGE\Thread(svchost/0)\Priority Base
\\GEORGE\Thread(svchost/0)\Priority Current
\\GEORGE\Thread(svchost/0)\Thread State
\\GEORGE\Thread(svchost/0)\Thread Wait Reason
\\GEORGE\Thread(svchost/0#1)\% Privileged Time
\\GEORGE\Thread(svchost/0#1)\% Processor Time
\\GEORGE\Thread(svchost/0#1)\% User Time
\\GEORGE\Thread(svchost/0#1)\Elapsed Time
\\GEORGE\Thread(svchost/0#1)\ID Process
\\GEORGE\Thread(svchost/0#1)\ID Thread
\\GEORGE\Thread(svchost/0#1)\Priority Base
\\GEORGE\Thread(svchost/0#1)\Priority Current
\\GEORGE\Thread(svchost/0#1)\Thread State
\\GEORGE\Thread(svchost/0#1)\Thread Wait Reason
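The manual edit above can also be scripted. The following Python sketch keeps only the counter names whose instance, the text between the parentheses, exactly matches one of the requested instances, so that "svchost/0" does not also select "svchost/0#2". It is an illustration only; the function name is hypothetical.

```python
def select_instances(counter_lines, instances):
    """Keep counter names whose parenthesized instance is one of the given instances."""
    wanted = {"(" + name + ")" for name in instances}
    selected = []
    for line in counter_lines:
        line = line.strip()
        start = line.find("(")
        end = line.find(")", start)
        if start != -1 and end != -1 and line[start:end + 1] in wanted:
            selected.append(line)
    return selected


sample = [
    r"\\GEORGE\Thread(svchost/0)\ID Thread",
    r"\\GEORGE\Thread(svchost/0#1)\ID Thread",
    r"\\GEORGE\Thread(svchost/0#2)\ID Thread",
]
for name in select_instances(sample, ["svchost/0", "svchost/0#1"]):
    print(name)
```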
Save this file. We now run relog to extract the values of these counters from the original log file:
relog {trace_file_name} -cf thread_svchost_0.txt -f csv -o thread_svchost_0.csv
This command creates a comma-delimited (csv) file that can be opened in Excel or another spreadsheet application, either to use its graphing capabilities or to examine the data further.
Keep in mind that if Excel is to be used, some versions limit the number of columns one spreadsheet can have (256 columns in Excel 2000, so check your version's limits). Each counter will be a column in Excel, and each row will hold the counters' values for one sample. The number of rows depends on the command line options given when OSWFW was started to create these log files.
Depending on the size of the file and the number of counters listed, this extraction could take some time. It was found that the fewer counters there are in the configuration file, the quicker the extraction runs. It might be faster to perform several small extracts and concatenate the output files at the end. This determination is left to the reader.
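One way to prepare several small extracts is to split the counter list into chunks that each stay under the spreadsheet's column limit. The following Python sketch is an assumption-laden illustration: it allows 255 counters per chunk on the assumption that the first column of the csv output holds the sample time, and it builds relog command lines using only the options shown in this guide. The function names and the "chunk" file prefix are hypothetical.

```python
def chunk_counters(counters, max_per_file=255):
    """Split the counter names into lists of at most max_per_file entries."""
    return [counters[i:i + max_per_file] for i in range(0, len(counters), max_per_file)]


def relog_commands(trace_file, chunk_count, prefix="chunk"):
    """Build one relog command line per chunk configuration file."""
    return [
        "relog {0} -cf {1}_{2}.txt -f csv -o {1}_{2}.csv".format(trace_file, prefix, number)
        for number in range(chunk_count)
    ]


counters = ["counter_%d" % n for n in range(600)]  # placeholder names for illustration
chunks = chunk_counters(counters)
print([len(chunk) for chunk in chunks])  # [255, 255, 90]
for command in relog_commands("{trace_file_name}", len(chunks)):
    print(command)
```

Each chunk would be written to its own configuration file, one counter name per line, before the corresponding relog command is run.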
OSWFW, by default, is configured to capture the ID Thread counter. All Performance Counters, on the other hand, use the "Thread Instance Number" to delineate a thread spawned by a particular process. This Thread Instance Number is a monotonically increasing number, starting from zero, which identifies a thread within a particular process. In addition to the Process Name and Thread Instance Number, there is also the ID Thread, which is a globally unique number assigned to each Thread. Unfortunately, logman does not put the ID Thread in the counter name, only the Process Name and the Thread Instance Number, so the ID Thread has to be captured as a separate counter. The ID Thread counter does not change during the lifetime of the Thread. The Thread Instance Number, depending on how often the parent process creates and destroys threads, can be reused. The global ID Thread might also repeat, but that case is exceptional, because computers today do not have enough memory to accommodate that many threads.
When the Oracle Database views V$PROCESS.SPID or V$SESSION.PROCESS are queried for the Process ID of a particular process, both the Process ID and ID Thread are returned.
SQL> SELECT PROGRAM, SPID, ADDR FROM V$PROCESS;
Since the Windows Operating System is thread based, the Process ID alone will not give enough information to trace down the information that OSWatcher delivers, so the ID Thread is needed. Unfortunately, the Operating System logs that can be used (the Counters) do not use the Process ID or the ID Thread but use the Process Name and the Thread Instance Number. This section describes how to find the ID Thread in the logs and relate them to the Process Name and Thread Instance Number so the information in the logs for the ID Thread of interest can be extracted from the connection log files.
OSWFW, by default, is configured to capture the ID Thread counter by using the "\Thread(*)\ID Thread" counter path. This path logs the ID Thread for all threads in the system (because of the wildcard "*"). This static counter does not change for the life of the thread. All Performance Counters use the Thread Instance Number to delineate a thread spawned by a particular process. This Thread Instance Number is a monotonically increasing number, starting from zero, which identifies a thread within a particular process, while the ID Thread is a globally unique number assigned to the thread when it is created.
If you wish to find the performance counter that corresponds to the ID Thread of interest, you will have to find the Process Name and Thread Instance Number for that ID Thread. This counter does not change during the lifetime of the Thread.
To extract the ID Thread for a particular thread, first all the ID Threads must be extracted from the log file. This can be done using the wildcard "*". The syntax of relog is a little touchy, so the method outlined above, creating a configuration file from the exact counter names, is used here. To list the names of all the captured counters, from which the ID Thread counters will be selected, issue the following command:
relog {trace_file_name} -q > {trace_file_name}.counter.txt
Sorting at this point will not help, as the log file already puts all the ID Thread counters together. This extract does not include the values of the counters, just the counters' names. Once this file is created, copy it to another file that will be edited to leave only the ID Thread counter names.
copy {trace_file_name}.counter.txt IDThread.txt
Edit this file to leave only the entries that are of this format:
\\Machine\Thread({Parent/Instance#Index})\ID Thread
Since it is expected that the reader will be interested only in the process parents associated with Oracle, leave only those entries with the process parent "oracle," "TNSLSNR," or "oradim." As an example, the list will take on this appearance:
\\GEORGE\Thread(TNSLSNR/0)\ID Thread
\\GEORGE\Thread(TNSLSNR/1)\ID Thread
\\GEORGE\Thread(TNSLSNR/2)\ID Thread
\\GEORGE\Thread(oracle/0)\ID Thread
\\GEORGE\Thread(oracle/1)\ID Thread
\\GEORGE\Thread(oracle/2)\ID Thread
\\GEORGE\Thread(oracle/3)\ID Thread
\\GEORGE\Thread(oracle/4)\ID Thread
\\GEORGE\Thread(oracle/5)\ID Thread
\\GEORGE\Thread(oracle/6)\ID Thread
\\GEORGE\Thread(oracle/7)\ID Thread
\\GEORGE\Thread(oracle/8)\ID Thread
\\GEORGE\Thread(oracle/9)\ID Thread
\\GEORGE\Thread(oracle/10)\ID Thread
\\GEORGE\Thread(oracle/11)\ID Thread
\\GEORGE\Thread(oracle/12)\ID Thread
\\GEORGE\Thread(oracle/13)\ID Thread
\\GEORGE\Thread(oracle/14)\ID Thread
\\GEORGE\Thread(oracle/15)\ID Thread
\\GEORGE\Thread(oracle/16)\ID Thread
\\GEORGE\Thread(oracle/17)\ID Thread
\\GEORGE\Thread(oradim/0)\ID Thread
This list contains the process parents of the Oracle Listener (TNSLSNR), the Oracle executable (oracle), and the Oracle instance and service administration utility (oradim). This file will be used to extract just the ID Threads.
relog {trace_file_name} -cf IDThread.txt -f csv -o IDThread.csv
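The editing step that produces IDThread.txt can be scripted instead of done by hand. The following Python sketch keeps only the names that end in "\ID Thread" and whose process parent is one of the Oracle parents named above. It is an illustration only: it assumes the Thread object name format shown in this guide and matches parent names with exact case. The function and constant names are hypothetical.

```python
ORACLE_PARENTS = ("oracle", "TNSLSNR", "oradim")


def oracle_id_thread_counters(counter_lines, parents=ORACLE_PARENTS):
    """Keep the ID Thread counter names that belong to the given process parents."""
    selected = []
    marker = "Thread("
    for line in counter_lines:
        line = line.strip()
        if not line.endswith("\\ID Thread"):
            continue  # not an ID Thread counter
        start = line.find(marker)
        if start == -1:
            continue  # not a Thread object with an instance
        parent = line[start + len(marker):].split("/", 1)[0]
        if parent in parents:
            selected.append(line)
    return selected


sample = [
    r"\\GEORGE\Thread(svchost/0)\ID Thread",
    r"\\GEORGE\Thread(oracle/3)\ID Thread",
    r"\\GEORGE\Thread(oracle/3)\Thread State",
    r"\\GEORGE\Thread(TNSLSNR/0)\ID Thread",
]
for name in oracle_id_thread_counters(sample):
    print(name)
```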
The output file, IDThread.csv, now contains all the ID Threads for the Oracle Threads. The simplest method to use at this point is to bring up the file in Excel to find the ID Thread. It will be the number that was found from V$PROCESS or V$SESSION. When selecting the process ID from V$SESSION, remote sessions will have the Process ID of the Client process also. The format will be:
Client ID Thread:Server ID Thread
The select statement to use to find the ID Thread is:
In this example, the SQLPLUS.EXE ID Thread is 480 while the SQL*Plus Process ID is 2572. If the Oracle background threads are under scrutiny, use the V$PROCESS view to find the ID Thread:
In the case where the intent is to isolate which thread the SQL*Plus session is part of, take the PADDR from V$SESSION (3425290C) and find it in V$PROCESS. This will result in the ID Thread of 3344 (ORACLE.EXE (SHAD) 3344 3425290C).
Once the ID Thread of interest is found in the IDThread.csv file, the name of the counter will be the header for that column. In the case where the PMON thread is to be examined, search for the ID Thread 3220. In this case it will have the counter name of:
\\GEORGE\Thread(oracle/3)\ID Thread = 3220
After all of this work, the ID Thread can now be related to the parent Process Name and the Thread Instance Number. From this information, all the counters for this particular thread can be extracted from the log file. In the case mentioned above, where the interest lies in PMON, the counter "\\GEORGE\Thread(oracle/3)\ID Thread" is extracted.
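The search through IDThread.csv can also be done without a spreadsheet. The following Python sketch scans the csv text and returns the column headers whose values equal the ID Thread of interest. It assumes that the first row holds the counter names, that the first column holds the sample time, and that values are written as quoted numbers; the sample data shown is invented for illustration, and the function name is hypothetical.

```python
import csv
import io


def find_counters_for_thread_id(csv_text, thread_id):
    """Return the column headers whose values equal thread_id in any row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    matches = []
    for row in reader:
        # Skip the first column, assumed to be the sample time.
        for name, value in zip(header[1:], row[1:]):
            try:
                number = float(value)
            except ValueError:
                continue  # blank or non-numeric sample
            if number == thread_id and name not in matches:
                matches.append(name)
    return matches


sample = "\n".join([
    r'"(PDH-CSV 4.0)","\\GEORGE\Thread(oracle/2)\ID Thread","\\GEORGE\Thread(oracle/3)\ID Thread"',
    r'"05/13/2013 10:00:00.000","3216","3220"',
    r'"05/13/2013 10:00:30.000","3216","3220"',
])
print(find_counters_for_thread_id(sample, 3220))
```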
But wait, there's more. The use of wild cards would come in quite handy at this point in the process, but lacking that, the counters for the particular thread have to be pulled from the list of all counters created earlier.
type {trace_file_name}.counter.txt | sort /+1 > Thread_oracle_3.txt
This sort groups all the counters by name, rather than by the order in which they were gathered. From this new file it should be easy to get the counters for \\GEORGE\Thread(oracle/3). Once all the extraneous counters are removed, the file should contain something like:
\\GEORGE\Thread(oracle/3)\% Privileged Time
\\GEORGE\Thread(oracle/3)\% Processor Time
\\GEORGE\Thread(oracle/3)\% User Time
\\GEORGE\Thread(oracle/3)\Elapsed Time
\\GEORGE\Thread(oracle/3)\ID Process
\\GEORGE\Thread(oracle/3)\ID Thread
\\GEORGE\Thread(oracle/3)\Priority Base
\\GEORGE\Thread(oracle/3)\Priority Current
\\GEORGE\Thread(oracle/3)\Thread State
\\GEORGE\Thread(oracle/3)\Thread Wait Reason
Now you can extract the counters for the thread of interest:
relog {trace_file_name} -cf Thread_oracle_3.txt -f csv -o Thread_oracle_3.csv
The file Thread_oracle_3.csv can now be viewed in Excel, or some other editor.
As stated above, when OSWFW is started for the first time it creates the archive subdirectory under the OSWFW installation directory. The archive directory contains several subdirectories, one for each data collection. These directories are named OSWMemory, OSWNetstat, OSWPhysicalDisk, OSWProcess, OSWProcessor, OSWServer_Work_Queue, OSWSystem, and OSWThread. One file per hour will be generated in each of the subdirectories. A new file is created after each hour that OSWFW is running. The file will be in the following format:
%COMPUTERNAME%_OSW<Performance Object>_MMDDHHMM_nnn.csv
The format of MMDDHHMM is Month, Day, Hour, and Minute. The nnn is a numerical value that starts at 001 and can increase by one, but typically will not in this configuration.
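When many archive files have accumulated, the naming format above can be parsed to sort or select files by collection and time. The following Python sketch is an illustration only; the example file name is made up, and the pattern and function names are hypothetical.

```python
import re

# %COMPUTERNAME%_OSW<Performance Object>_MMDDHHMM_nnn.csv
ARCHIVE_NAME = re.compile(
    r"^(?P<computer>.+)_OSW(?P<object>.+)"
    r"_(?P<month>\d{2})(?P<day>\d{2})(?P<hour>\d{2})(?P<minute>\d{2})"
    r"_(?P<sequence>\d{3})\.csv$"
)


def parse_archive_name(file_name):
    """Split an archive file name into computer, object, time fields, and sequence."""
    match = ARCHIVE_NAME.match(file_name)
    if match is None:
        raise ValueError("not an OSWFW archive file name: " + file_name)
    fields = match.groupdict()
    for key in ("month", "day", "hour", "minute", "sequence"):
        fields[key] = int(fields[key])
    return fields


print(parse_archive_name("GEORGE_OSWServer_Work_Queue_05131400_001.csv"))
```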
The descriptions of these Counters can be found by bringing up the Windows Performance monitor. First open the Taskbar, Start, Run. In the Run prompt screen, type in "perfmon.msc", without the quotes. In the Performance Microsoft Management Console, the lower right section will list various Counters. Right click this part of the window and select Add Counters. In the Add Counters window the Counter of interest can be brought up and the Explain button can be pressed to bring up the description.
At the end of this document are links to attachments, which are text files listing all the Counters and their descriptions for the supported versions of Windows. They were acquired using Microsoft's PowerShell v2.0, which is installed either by default or through patching the Windows Operating System.
The format of a Counter's name is:
For example:
This is the percentage of the elapsed time that the logical C: disk drive was busy servicing read or write requests.
OSWFW does not run in a directory with spaces in it. This is planned to be fixed in the next release.
If OSWFW is not run as Administrator, it may falsely report that it cannot run on a remote drive when the drive is in fact local. This is because the OS utilities being called can only be run by the Administrator.
The current version of OSWatcher for Windows is 2.5.1 (May 13, 2013).
Click here to download the zip file containing OSWFW.
The list of counters can be downloaded via the following links:
Windows2003R2x64Counters
Windows2003R2x86Counters
Windows2003x64Counters
Windows2008R2x64Counters
Windows2008x86Counters
Windows7x64Counters
Windows7x86Counters
Windows8x64Counters
Windows8x86Counters
WindowsXPx64Counters
WindowsXPx86Counters
Windows2012x64Counters