This page collects frequently asked questions about ABACUS. It may answer your question before you need to ask it.
TELNET, RLOGIN and FTP have been disabled on ABACUS for security reasons. Users can use SSH or SLOGIN to log into their accounts on ABACUS, and SCP to transfer files between machines. Windows users can use "Secure Shell Client" to log in and "Secure File Transfer Client" to transfer files. Users who do not have access to "Secure Shell Client" on their Windows machines can download PuTTY (putty.exe), a free SSH client, from the PuTTY web page. Users who prefer an FTP-style interface for file transfer can use SFTP instead of FTP.
The head node is the login node. Its host name is abacus.uwaterloo.ca. The following examples show how users can log into the head node. Logging into compute nodes is not recommended, but users may do so when necessary. Home directories look the same no matter which node you log into.
Example of logging into ABACUS from another UNIX/Linux machine: suppose you are a user on a UNIX machine named monolith and want to log into ABACUS, where your user name is "foobar" and your password is "tricky". Type the following (the text after each prompt is what you type):
monolith:~% ssh -l foobar abacus.uwaterloo.ca
foobar@abacus's password: tricky
[foobar@head ~]$

Example of transferring files between ABACUS and another UNIX/Linux machine: suppose you are a user on the UNIX machine monolith and want to copy file.txt, located in your home directory on monolith, to ABACUS, where your user name is "foobar" and your password is "tricky". Type the following:
monolith:~% scp file.txt foobar@abacus.uwaterloo.ca:
foobar@abacus's password: tricky

Example of using SFTP: suppose you are a user on the UNIX machine monolith and want to transfer files between monolith and ABACUS, where your user name is "foobar" and your password is "tricky". Type the following:
monolith:~% sftp foobar@abacus.uwaterloo.ca
foobar@abacus's password: tricky
sftp>
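Some commands further down this page use the short name abacus (for example `scp id_rsa.pub abacus:` and `ssh foobar@abacus`). If that short name does not resolve from your machine, one option, assuming your client is OpenSSH, is a Host alias in ~/.ssh/config. The sketch below appends such an entry once; the function name add_abacus_alias is hypothetical, and "foobar" is the placeholder user name from the examples above.

```bash
# Hypothetical helper: add an OpenSSH "Host abacus" alias to ~/.ssh/config
# so that "ssh abacus" and "scp file abacus:" reach abacus.uwaterloo.ca.
# "foobar" is the placeholder user name used in this FAQ; pass your own.
add_abacus_alias() {
    local user=${1:-foobar}
    local cfg="$HOME/.ssh/config"

    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"

    # Leave the file alone if an entry for this alias already exists.
    if [ -f "$cfg" ] && grep -q '^Host abacus$' "$cfg"; then
        return 0
    fi

    cat >> "$cfg" <<EOF
Host abacus
    HostName abacus.uwaterloo.ca
    User $user
EOF
    chmod 600 "$cfg"
}
```

After running `add_abacus_alias foobar` once on monolith, `ssh abacus` and `scp file.txt abacus:` should behave like the full-name commands above.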
First, log in to ABACUS, then run the command 'passwd'. The system will prompt you for your old (existing) password and ask you to choose a new one. Please follow this guideline when choosing a password:
[foobar@head ~]$ passwd
Suppose you have logged into the head node and now want to log into node035 (i.e., quad32g001). Type:
[foobar@head ~]$ ssh node035
On monolith, generate a key pair:

monolith:~% ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/foobar/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/foobar/.ssh/id_rsa.
Your public key has been saved in /home/foobar/.ssh/id_rsa.pub.
The key fingerprint is:
0c:44:8c:3e:b9:b4:20:e3:83:4b:19:d9:54:cf:65:35 foobar@monolith

Please note: when the system prompts for a passphrase, just press Enter; do not type a passphrase.
Then copy the public key to your home directory on ABACUS:

monolith:~% cd .ssh
monolith:~/.ssh% scp id_rsa.pub abacus:

On ABACUS:
[foobar@head ~]$ cd .ssh

If the file authorized_keys does not already exist, create it with touch; then append the public key to it:
[foobar@head .ssh]$ touch authorized_keys
[foobar@head .ssh]$ cat ~/id_rsa.pub >> authorized_keys

Now user foobar can log into ABACUS from monolith without typing a password:
monolith:~% ssh foobar@abacus
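The steps on ABACUS can be collected into one helper. This is a sketch, not a site-provided tool: the name install_pubkey is hypothetical, it assumes the key was copied to ~/id_rsa.pub as above, and it also tightens permissions, because OpenSSH's sshd (with its default StrictModes setting) ignores keys in an authorized_keys file or ~/.ssh directory that other users can write to. It skips the append if the key is already present, so running it twice is harmless.

```bash
# Hypothetical helper (not a site-provided tool): append a copied public key
# to ~/.ssh/authorized_keys and restrict the permissions sshd checks.
# Assumes the key was copied to ~/id_rsa.pub with "scp id_rsa.pub abacus:".
install_pubkey() {
    local pubkey=${1:-$HOME/id_rsa.pub}
    local sshdir="$HOME/.ssh"
    local auth="$sshdir/authorized_keys"
    local key

    if [ ! -s "$pubkey" ]; then
        echo "install_pubkey: $pubkey is missing or empty" >&2
        return 1
    fi
    key=$(cat "$pubkey")

    mkdir -p "$sshdir"
    touch "$auth"

    # Append the key only if that exact line is not already present.
    if ! grep -qxF -- "$key" "$auth"; then
        printf '%s\n' "$key" >> "$auth"
    fi

    # With sshd's default StrictModes, group- or world-writable files are ignored.
    chmod 700 "$sshdir"
    chmod 600 "$auth"
}
```

On ABACUS, `install_pubkey ~/id_rsa.pub` replaces the touch/cat steps. Many OpenSSH installations also ship ssh-copy-id, which does a similar append from the client side (for example `ssh-copy-id foobar@abacus.uwaterloo.ca`).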
The Portable Batch System, PBS, is a workload management system for Linux clusters. It supplies commands to submit, monitor, and delete jobs, and has the following components:
Job Server - also called pbs_server, provides the basic batch services, such as receiving or creating a batch job, modifying the job, protecting the job against system crashes, and running the job.
Job Executor - a daemon (pbs_mom) that actually places the job into execution when it receives a copy of the job from the Job Server, and returns the job's output to the user.
Job Scheduler - a daemon that contains the site's policy controlling which job is run, and where and when it is run. PBS allows each site to create its own Scheduler. ABACUS uses the Maui Scheduler.
Below are the steps needed to run a user job:
PBS Options
Below are some commonly used PBS options in a job script file. Each option line starts with "#PBS".
Option                        Description
======                        ===========
#PBS -N MyJob                 Assigns a job name. The default is the name of the PBS job script.
#PBS -l nodes=4:ppn=2         The number of nodes and processors per node.
#PBS -q queuename             Assigns the queue your job will use.
#PBS -l walltime=01:00:00     The maximum wall-clock time during which this job can run.
#PBS -o mypath/my.out         The path and file name for standard output.
#PBS -e mypath/my.err         The path and file name for standard error.
#PBS -j oe                    Join option that merges the standard error stream with the standard output stream of the job.
#PBS -W stagein=file_list     Copies the listed files onto the execution host before the job starts.
#PBS -W stageout=file_list    Copies the listed files from the execution host after the job completes.
#PBS -m b                     Sends mail to the user when the job begins.
#PBS -m e                     Sends mail to the user when the job ends.
#PBS -m a                     Sends mail to the user when the job aborts (with an error).
#PBS -m ba                    Combines several mail events (here, begin and abort) on one line. If the same flag is given on several lines, only the last one takes effect.
#PBS -r n                     Indicates that the job should not be rerun if it fails.
#PBS -V                       Exports all environment variables to the job.

Job Script Example
A job script may consist of PBS directives, comments and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command line options.
For example, a simple job script, named geo1.bash, contains the following lines:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -V
# Directory to run in and write output to
PBS_O_WORKDIR=/home/huang/temp
# Program to run and its input file
myPROG='/home/huang/software/nwchem-4.7/bin/LINUX64_x86_64/nwchem'
myARGS='/home/huang/software/tce-test/geo-0.98.nw'
cd $PBS_O_WORKDIR
# ">&" sends both standard output and standard error to out1
$myPROG $myARGS >& out1

An example that runs a job on a specific node contains the following lines:
#!/bin/bash
#PBS -l nodes=node035:ppn=1
#PBS -V
# nodes=node035 above asks PBS for that specific node
PBS_O_WORKDIR=/home/huang/temp
myPROG='/home/huang/software/nwchem-4.7/bin/LINUX64_x86_64/nwchem'
myARGS='/home/huang/software/tce-test/geo-0.98.nw'
cd $PBS_O_WORKDIR
$myPROG $myARGS >& out1

Another example, an MPI job script named geo2.bash, contains the following lines:
#!/bin/bash
#PBS -l nodes=4:ppn=4
#PBS -V
# Total number of MPI processes: nodes x ppn = 4 x 4
NCPUS=16
PBS_O_WORKDIR=/home/huang/temp
cd $PBS_O_WORKDIR
# Hosts allocated by PBS, passed to mpirun as the machine file
cat $PBS_NODEFILE > .machinefile
myPROG='/home/huang/software/nwchem-4.7/bin/LINUX64_x86_64/nwchem_mpi'
myARGS='/home/huang/software/tce-test/geo-0.98.nw'
MPIRUN='/opt/mpich.pgi/bin/mpirun'
$MPIRUN -np $NCPUS -machinefile .machinefile $myPROG $myARGS >& out2
Adapt the job script templates above to your job. Usually you only need to change the values of the variables PBS_O_WORKDIR, myPROG and myARGS; for an MPI job, also keep NCPUS equal to the number of nodes times ppn (4 times 4 = 16 in geo2.bash).
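As a variant of geo2.bash, NCPUS can be derived rather than kept in step with nodes and ppn by hand. Under Torque/PBS, the file named by $PBS_NODEFILE lists one line per allocated processor slot (so nodes=4:ppn=4 gives 16 lines), and PBS sets PBS_O_WORKDIR to the directory qsub was run from, so submitting from your working directory makes the manual PBS_O_WORKDIR line unnecessary. The sketch below assumes both behaviours, reuses the paths from geo2.bash, and adds a few options from the PBS Options table; the job name geo2 and the one-hour walltime are arbitrary placeholders, and count_slots and run_job are hypothetical function names chosen so the steps can be checked outside a job. Check the mpirun path and flags against the MPI installation you actually use.

```bash
#!/bin/bash
#PBS -N geo2
#PBS -l nodes=4:ppn=4
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -m ae
#PBS -V

# Paths copied from geo2.bash; adjust for your own program and input.
myPROG='/home/huang/software/nwchem-4.7/bin/LINUX64_x86_64/nwchem_mpi'
myARGS='/home/huang/software/tce-test/geo-0.98.nw'
MPIRUN='/opt/mpich.pgi/bin/mpirun'

# Number of processor slots PBS allocated: one line per slot in the node file.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

run_job() {
    # PBS sets PBS_O_WORKDIR to the directory where qsub was run.
    cd "$PBS_O_WORKDIR" || exit 1
    cat "$PBS_NODEFILE" > .machinefile
    NCPUS=$(count_slots "$PBS_NODEFILE")
    "$MPIRUN" -np "$NCPUS" -machinefile .machinefile "$myPROG" "$myARGS" > out2 2>&1
}

# Start the MPI run only inside a PBS job, where PBS_NODEFILE is defined.
if [ -n "${PBS_NODEFILE:-}" ]; then
    run_job
fi
```

Because `-j oe` merges the streams of the job itself, the `> out2 2>&1` redirection only affects where the program's own output lands; changing `nodes=4:ppn=4` no longer requires editing NCPUS.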
Submitting a Job
Use the qsub command to submit the job,
qsub geo2.bash

Once a job is submitted, PBS assigns it a unique job identifier (e.g. 70.head), which you use later to monitor the job's status. A queued job is selected for execution based on how long it has been in the queue, its wall-clock time limit, and the number of processors it requests.
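Because qsub prints the new job identifier on standard output, a script can capture it and reuse it. One use, sketched here under the assumption that ABACUS's PBS accepts the Torque/PBS `-W depend=afterok:JOBID` dependency syntax, is submitting a follow-up job that starts only after the first one finishes successfully. The helper name submit_chain and the second script post.bash are hypothetical.

```bash
# Hypothetical helper: submit a job, then a second job that PBS holds until
# the first one has finished with exit status 0 ("afterok" dependency).
submit_chain() {
    local first=$1 second=$2 jobid
    # qsub prints the new job identifier, e.g. "70.head".
    jobid=$(qsub "$first") || return 1
    qsub -W depend=afterok:"$jobid" "$second"
}
```

For example, `submit_chain geo2.bash post.bash` submits geo2.bash and prints the identifier of the dependent post.bash job.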
Monitoring a Job
Below are the PBS commands for monitoring a job:
Command           Function
=======           ========
qstat -a          Check the status of jobs, queues, and the PBS server.
qstat -f          Get all the information about a job, i.e. resources requested, resource limits, owner, source, destination, queue, etc.
qdel JobID        Delete a job from the queue.
qhold JobID       Hold a job if it is in the queue.
qrls JobID        Release a job from hold.
Some Maui commands are also useful for monitoring a job:
Command            Description
=======            ===========
showq              Show a detailed list of submitted jobs.
showbf             Show the resources (time and processors) free at the moment.
checkjob JobID     Show a detailed description of job JobID.
showstart JobID    Give an estimate of the expected start time of job JobID.
For example, to check the status of a job,
qstat -f 70.head

or

checkjob 70.head
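To wait for a job from a script instead of re-running qstat by hand, one option is to read the job_state field from `qstat -f` output (Torque prints it as a line such as `job_state = R`) and poll until the job is completed (state C) or no longer known to the server. This is a sketch under those assumptions; job_state and wait_for_job are hypothetical helper names, and the default 60-second interval is arbitrary.

```bash
# Hypothetical helpers: poll a PBS job until it finishes.

# Print the one-letter state (Q, R, H, E, C, ...) of a job, or nothing if
# the server no longer knows the job.
job_state() {
    qstat -f "$1" 2>/dev/null |
        awk -F' = ' '/^[[:space:]]*job_state = / { print $2; exit }'
}

# Return once the job is completed (C) or has left the queue entirely.
wait_for_job() {
    local jobid=$1 interval=${2:-60} state
    while :; do
        state=$(job_state "$jobid")
        if [ -z "$state" ] || [ "$state" = "C" ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

For example, `wait_for_job 70.head && echo "70.head finished"` blocks until the job from the example above is done; `checkjob 70.head` can then show how it ended.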
File systems on the head node are backed up to tape drives once a week. The /home file system is also backed up incrementally to another Linux machine every day. For safety, users are encouraged to back up their own files to another system or to removable media. For example, to copy files to another UNIX/Linux machine, users can use the rsync or scp commands. To copy files to their PCs, users can use 'SSH Secure File Transfer Client'.
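For a do-it-yourself copy, one approach is to bundle a directory into a dated tar archive on ABACUS and then copy that single file off with scp. A minimal sketch; the function name archive_dir, the directories ~/results and ~/backups, and the host backuphost are placeholders, not site conventions.

```bash
# Hypothetical helper: pack a directory into a dated, compressed tar archive
# in DESTDIR (default: current directory) and print the archive's path.
archive_dir() {
    local src=$1 destdir=${2:-.}
    local name archive
    name=$(basename "$src")
    archive="$destdir/$name-$(date +%Y-%m-%d).tar.gz"
    # -C stores paths relative to the directory's parent, e.g. "results/a.txt".
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}
```

For example, `archive=$(archive_dir ~/results ~/backups)` followed by `scp "$archive" user@backuphost:` copies one file per backup (the ~/backups directory must already exist). Alternatively, `rsync -av ~/results/ user@backuphost:results/` keeps a mirror of the directory and transfers only files that have changed since the last run.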