Continuing from last week:
- In Bash scripts, subshells (written with parentheses) are a convenient way to group commands. A common example is temporarily moving to a different working directory, e.g.
```sh
# do something in current dir
(cd /some/other/dir && other-command)
# continue in original dir
```
- In Bash, note there are lots of kinds of variable expansion. Checking a variable exists: `${name:?error message}`. For example, if a Bash script requires a single argument, just write `input_file=${1:?usage: $0 input_file}`. Using a default value if a variable is empty: `${name:-default}`. If you want to have an additional (optional) parameter added to the previous example, you can use something like `output_file=${2:-logfile}`. If `$2` is omitted and thus empty, `output_file` will be set to `logfile`. Arithmetic expansion: `i=$(( (i+1) % 5 ))`. Sequences: `{1..10}`. Trimming of strings: `${var%suffix}` and `${var#prefix}`. For example if `var=foo.pdf`, then `echo ${var%.pdf}.txt` prints `foo.txt`.
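A minimal sketch pulling several of these together in a script header (the file names and usage text are illustrative):
```sh
#!/usr/bin/env bash
# Fail with a usage message if the first argument is missing or empty
input_file=${1:?usage: $0 input_file [output_file]}
# Fall back to a default if the optional second argument is omitted
output_file=${2:-logfile}
# Trim a suffix to derive a related name: report.csv -> report.txt
summary_file=${input_file%.csv}.txt
echo "reading $input_file, writing $output_file and $summary_file"
```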
- Brace expansion using `{`...`}` can reduce having to re-type similar text and automate combinations of items. This is helpful in examples like `mv foo.{txt,pdf} some-dir` (which moves both files), `cp somefile{,.bak}` (which expands to `cp somefile somefile.bak`) or `mkdir -p test-{a,b,c}/subtest-{1,2,3}` (which expands all possible combinations and creates a directory tree). Brace expansion is performed before any other expansion.
- The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and filename expansion. (For example, a range like `{1..20}` cannot be expressed with variables using `{$a..$b}`. Use `seq` or a `for` loop instead, e.g., `seq $a $b` or `for((i=a; i<=b; i++)); do ...; done`.)
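A quick sketch of that caveat (`a` and `b` are placeholder variables):
```sh
a=1; b=5
echo {$a..$b}      # prints the literal text {1..5}: brace expansion runs before variables expand
echo $(seq $a $b)  # prints: 1 2 3 4 5
for ((i=a; i<=b; i++)); do echo "$i"; done
```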
- The output of a command can be treated like a file via `<(some command)` (known as process substitution). For example, compare local `/etc/hosts` with a remote one:
```sh
diff /etc/hosts <(ssh somehost cat /etc/hosts)
```
- When writing scripts you may want to put all of your code in curly braces. If the closing brace is missing, your script will be prevented from executing due to a syntax error. This makes sense when your script is going to be downloaded from the web, since it prevents partially downloaded scripts from executing:
```sh
{
    # Your code here
}
```
- A "here document" allows redirection of multiple lines of input as if from a file:
cat <
- In Bash, redirect both standard output and standard error via: `some-command >logfile 2>&1` or `some-command &>logfile`. Often, to ensure a command does not leave an open file handle to standard input, tying it to the terminal you are in, it is also good practice to add `</dev/null`.
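Putting those together, a minimal sketch for running a job detached from the terminal (the command and log name are illustrative):
```sh
# Discard stdin, capture stdout and stderr in a log, and run in the background
some-long-command </dev/null >logfile 2>&1 &
```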
- Use `man ascii` for a good ASCII table, with hex and decimal values. For general encoding info, `man unicode`, `man utf-8`, and `man latin1` are helpful.
- Use `screen` or [tmux](https://tmux.github.io) to multiplex the screen, especially useful on remote ssh sessions and to detach and re-attach to a session. `byobu` can enhance screen or tmux by providing more information and easier management. A more minimal alternative for session persistence only is [dtach](https://github.com/bogner/dtach).
- In ssh, knowing how to port tunnel with `-L` or `-D` (and occasionally `-R`) is useful, e.g. to access web sites from a remote server; a short sketch follows the configuration example below.
- It can be useful to make a few optimizations to your ssh configuration; for example, this `~/.ssh/config` contains settings to avoid dropped connections in certain network environments, uses compression (which is helpful with scp over low-bandwidth connections), and multiplexes channels to the same server with a local control file:
```
TCPKeepAlive=yes
ServerAliveInterval=15
ServerAliveCountMax=6
Compression=yes
ControlMaster auto
ControlPath /tmp/%r@%h:%p
ControlPersist yes
```
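A minimal sketch of the tunneling flags mentioned above (hostnames and ports are illustrative):
```sh
# Local forward: connections to localhost:8080 go to port 80 on remotehost
ssh -L 8080:localhost:80 user@remotehost
# Dynamic forward: a SOCKS proxy on localhost:1080 routed through remotehost
ssh -D 1080 user@remotehost
# Remote forward: expose local port 3000 as port 8080 on remotehost
ssh -R 8080:localhost:3000 user@remotehost
```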
- A few other options relevant to ssh are security sensitive and should be enabled with care, e.g. per subnet or host or in trusted networks: `StrictHostKeyChecking=no`, `ForwardAgent=yes`
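For example, one way to scope such options to a single trusted machine is a per-host stanza in `~/.ssh/config` (the host name is illustrative):
```
Host trusted-internal-host
    ForwardAgent yes
    StrictHostKeyChecking no
```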
- Consider [mosh](https://mosh.mit.edu) an alternative to ssh that uses UDP, avoiding dropped connections and adding convenience on the road (requires server-side setup).
- To get the permissions on a file in octal form, which is useful for system configuration but not available in `ls` and easy to bungle, use something like
```sh
stat -c '%A %a %n' /etc/timezone
```
- For interactive selection of values from the output of another command, use [percol](https://github.com/mooz/percol) or [fzf](https://github.com/junegunn/fzf).
- For interaction with files based on the output of another command (like `git`), use `fpp` (PathPicker).
- For a simple web server for all files in the current directory (and subdirs), available to anyone on your network, use: `python -m SimpleHTTPServer 7777` (for port 7777 and Python 2) and `python -m http.server 7777` (for port 7777 and Python 3).
- For running a command as another user, use `sudo`. Defaults to running as root; use `-u` to specify another user. Use `-i` to login as that user (you will be asked for your password).
- For switching the shell to another user, use `su username` or `su - username`. The latter with "-" gets an environment as if another user just logged in. Omitting the username defaults to root. You will be asked for the password of the user you are switching to.
- Know about the 128K limit on command lines. This "Argument list too long" error is common when wildcard matching large numbers of files. (When this happens alternatives like `find` and `xargs` may help.)
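For instance, a sketch of working around that limit (directory and pattern are illustrative):
```sh
# Instead of: grep -l pattern huge-dir/*.log   (may fail with "Argument list too long")
find huge-dir -name '*.log' -print0 | xargs -0 grep -l pattern
```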
- For a basic calculator (and of course access to Python in general), use the `python` interpreter. For example,
```
>>> 2+3
5
```
## Processing files and data
- To locate a file by name in the current directory, `find . -iname '*something*'` (or similar). To find a file anywhere by name, use `locate something` (but bear in mind `updatedb` may not have indexed recently created files).
- For general searching through source or data files, there are several options more advanced or faster than `grep -r`, including (in rough order from older to newer) [ack](https://github.com/beyondgrep/ack2), [ag](https://github.com/ggreer/the_silver_searcher) ("the silver searcher"), and [rg](https://github.com/BurntSushi/ripgrep) (ripgrep).
- To convert HTML to text: `lynx -dump -stdin`
- For Markdown, HTML, and all kinds of document conversion, try [pandoc](http://pandoc.org/). For example, to convert a Markdown document to Word format: `pandoc README.md --from markdown --to docx -o temp.docx`
- If you must handle XML, `xmlstarlet` is old but good.
- For JSON, use [jq](http://stedolan.github.io/jq/). For interactive use, also see [jid](https://github.com/simeji/jid) and [jiq](https://github.com/fiatjaf/jiq).
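A small sketch of typical `jq` usage (the file and field names are illustrative):
```sh
# Pretty-print a JSON file
jq . data.json
# Extract one field from each element of an array
jq '.[] | .name' data.json
# Keep only objects matching a field value
jq '.[] | select(.status == "active")' data.json
```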
- For YAML, use [shyaml](https://github.com/0k/shyaml).
- For Excel or CSV files, [csvkit](https://github.com/onyxfish/csvkit) provides `in2csv`, `csvcut`, `csvjoin`, `csvgrep`, etc.
- For Amazon S3, [s3cmd](https://github.com/s3tools/s3cmd) is convenient and [s4cmd](https://github.com/bloomreach/s4cmd) is faster. Amazon's [aws](https://github.com/aws/aws-cli) and the improved [saws](https://github.com/donnemartin/saws) are essential for other AWS-related tasks.
- Know about `sort` and `uniq`, including uniq's `-u` and `-d` options -- see one-liners below. See also `comm`.
- Know about `cut`, `paste`, and `join` to manipulate text files. Many people use `cut` but forget about `join`.
- Know about `wc` to count newlines (`-l`), characters (`-m`), words (`-w`) and bytes (`-c`).
- Know about `tee` to copy from stdin to a file and also to stdout, as in `ls -al | tee file.txt`.
- For more complex calculations, including grouping, reversing fields, and statistical calculations, consider [datamash](https://www.gnu.org/software/datamash/).
- Know that locale affects a lot of command line tools in subtle ways, including sorting order (collation) and performance. Most Linux installations will set `LANG` or other locale variables to a local setting like US English. But be aware sorting will change if you change locale. And know i18n routines can make sort or other commands run many times slower. In some situations (such as the set operations or uniqueness operations below) you can safely ignore slow i18n routines entirely and use traditional byte-based sort order, using `export LC_ALL=C`.
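A quick sketch of the difference (the input file is illustrative):
```sh
# Locale-aware sort (can be much slower on large inputs)
sort huge-file.txt > sorted.txt
# Traditional byte-order sort, ignoring locale collation
LC_ALL=C sort huge-file.txt > sorted.txt
```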
- You can set a specific command's environment by prefixing its invocation with the environment variable settings, as in `TZ=Pacific/Fiji date`.
- Know basic `awk` and `sed` for simple data munging. See [One-liners](#one-liners) for examples.
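For a flavor of each, two minimal sketches (column numbers and line ranges are illustrative):
```sh
# awk: sum the values in the first column
awk '{ sum += $1 } END { print sum }' data.txt
# sed: print only lines 10-20
sed -n '10,20p' data.txt
```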
- To replace all occurrences of a string in place, in one or more files: `perl -pi.bak -e 's/old-string/new-string/g' my-files-*.txt`
- To rename multiple files and/or search and replace within files, try [repren](https://github.com/jlevy/repren). (In some cases the `rename` command also allows multiple renames, but be careful as its functionality is not the same on all Linux distributions.)
```sh
# Full rename of filenames, directories, and contents foo -> bar:
repren --full --preserve-case --from foo --to bar .
# Recover backup files whatever.bak -> whatever:
repren --renames --from '(.*)\.bak' --to '\1' *.bak
# Same as above, using rename, if available:
rename 's/\.bak$//' *.bak
```
- As the man page says, `rsync` really is a fast and extraordinarily versatile file copying tool. It's known for synchronizing between machines but is equally useful locally. When security restrictions allow, using `rsync` instead of `scp` allows recovery of a transfer without restarting from scratch. It also is among the fastest ways to delete large numbers of files:
```sh
mkdir empty && rsync -r --delete empty/ some-dir && rmdir some-dir
```
- For monitoring progress when processing files, use [pv](http://www.ivarch.com/programs/pv.shtml), [pycp](https://github.com/dmerejkowsky/pycp), [pmonitor](https://github.com/dspinellis/pmonitor), [progress](https://github.com/Xfennec/progress), `rsync --progress`, or, for block-level copying, `dd status=progress`.
- Use `shuf` to shuffle or select random lines from a file.
- Know `sort`'s options. For numbers, use `-n`, or `-h` for handling human-readable numbers (e.g. from `du -h`). Know how keys work (`-t` and `-k`). In particular, watch out that you need to write `-k1,1` to sort by only the first field; `-k1` means sort according to the whole line. Stable sort (`sort -s`) can be useful. For example, to sort first by field 2, then secondarily by field 1, you can use `sort -k1,1 | sort -s -k2,2`.
- If you ever need to write a tab literal in a command line in Bash (e.g. for the `-t` argument to `sort`), press ctrl-v [Tab] or write `$'\t'` (the latter is better as you can copy/paste it).
- The standard tools for patching source code are `diff` and `patch`. See also `diffstat` for summary statistics of a diff and `sdiff` for a side-by-side diff. Note `diff -r` works for entire directories. Use `diff -r tree1 tree2 | diffstat` for a summary of changes. Use `vimdiff` to compare and edit files.
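A minimal sketch of the round trip (file names are illustrative):
```sh
# Create a unified diff and apply it elsewhere
diff -u original.c modified.c > fix.patch
patch original.c < fix.patch
```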
- For binary files, use `hd`, `hexdump` or `xxd` for simple hex dumps and `bvi`, `hexedit` or `biew` for binary editing.
- Also for binary files, `strings` (plus `grep`, etc.) lets you find bits of text.
- For binary diffs (delta compression), use `xdelta3`.
- To convert text encodings, try `iconv`. Or `uconv` for more advanced use; it supports some advanced Unicode things. For example:
```sh
# Displays hex codes or actual names of characters (useful for debugging):
uconv -f utf-8 -t utf-8 -x '::Any-Hex;' < input.txt
uconv -f utf-8 -t utf-8 -x '::Any-Name;' < input.txt
# Lowercase and removes all accents (by expanding and dropping them):
uconv -f utf-8 -t utf-8 -x '::Any-Lower; :: Any-NFD; [:Nonspacing Mark:]>; ::Any-NFC;' < input.txt > output.txt
```
- To split files into pieces, see `split` (to split by size) and `csplit` (to split by a pattern).
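For example (sizes and names are illustrative):
```sh
# Split into 100 MB pieces named chunk-aa, chunk-ab, ...
split -b 100M big-file chunk-
# Reassemble
cat chunk-* > big-file-restored
```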
date -u +"%Y-%m-%dT%H:%M:%SZ"
(other options are problematic). To manipulate date and time expressions, usedateadd
,datediff
,strptime
etc. fromdeteutils
.Use
zless
,zmore
,zcat
, andzgrep
to operate on compressed files.File attributes are settable via
chattr
and offer a lower-level alternative to file permissions. For example, to protect against accidental file deletion the immutable flag:sudo chattr +i /critical/directory/or/file
Use
getfacl
andsetfacl
to save and restore file permissions. For example:
getfacl -R /some/path > permission.txt
setfacl --restore=permissions.txt
- To create empty files quickly, use `truncate` (creates a sparse file), `fallocate` (ext4, xfs, btrfs and ocfs2 filesystems), `xfs_mkfile` (almost any filesystem, comes in the xfsprogs package), `mkfile` (for Unix-like systems like Solaris, Mac OS).
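For instance (sizes and names are illustrative):
```sh
# Sparse 1 GB file: allocates no blocks until data is written
truncate -s 1G sparse.img
# Preallocated 1 GB file on supported filesystems
fallocate -l 1G prealloc.img
```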
## System debugging
- For web debugging, `curl` and `curl -I` are handy, or their `wget` equivalents, or the more modern `httpie`.
- To know current cpu/disk status, the classic tools are `top` (or the better `htop`), `iostat`, and `iotop`. Use `iostat -mxz 15` for basic CPU and detailed per-partition disk stats and performance insight.
- For network connection details, use `netstat` and `ss`.
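For example, a quick sketch with `ss`:
```sh
# Listening TCP sockets with numeric ports and owning processes
ss -tlnp
# Summary statistics across socket types
ss -s
```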
- For a quick overview of what's happening on a system, `dstat` is especially useful. For broadest overview with details, use `glances`.
- To know memory status, run and understand the output of `free` and `vmstat`. In particular, be aware the "cached" value is memory held by the Linux kernel as file cache, so effectively counts toward the "free" value.
- Java system debugging is a different kettle of fish, but a simple trick on Oracle's and some other JVMs is that you can run `kill -3 <pid>` and a full stack trace and heap summary (including generational garbage collection details, which can be highly informative) will be dumped to stderr/logs. The JDK's `jps`, `jstat`, `jstack`, `jmap` are useful. SJK tools are more advanced.
- Use `mtr` as a better traceroute, to identify network issues.
- For looking at why a disk is full, `ncdu` saves time over the usual commands like `du -sh *`.
- To find which socket or process is using bandwidth, try `iftop` or `nethogs`.
- The `ab` tool (comes with Apache) is helpful for quick-and-dirty checking of web server performance. For more complex load testing, try `siege`.
- For more serious network debugging, `wireshark`, `tshark`, or `ngrep`.
- Know about `strace` and `ltrace`. These can be helpful if a program is failing, hanging, or crashing, and you don't know why, or if you want to get a general idea of performance. Note the profiling option (`-c`), and the ability to attach to a running process (`-p`). Use the trace child option (`-f`) to avoid missing important calls.
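For example (the pid is a placeholder):
```sh
# Summarize syscall counts and time of a running process (interrupt with ctrl-c to see the report)
strace -c -f -p 1234
# Trace only file-open syscalls of a new command
strace -f -e trace=open,openat ls /tmp
```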
- Know about `ldd` to check shared libraries etc - but never run it on untrusted files.
- Know how to connect to a running process with `gdb` and get its stack traces.
- Use `/proc`. It's amazingly helpful sometimes when debugging live problems. Examples: `/proc/cpuinfo`, `/proc/meminfo`, `/proc/cmdline`, `/proc/xxx/cwd`, `/proc/xxx/exe`, `/proc/xxx/fd/`, `/proc/xxx/smaps` (where `xxx` is the process id or pid).