Continuing from last week:
- In Bash scripts, subshells (written with parentheses) are a convenient way to group commands. A common example is temporarily moving to a different working directory, e.g.
```sh
# do something in current dir
(cd /some/other/dir && other-command)
# continue in original dir
```
- In Bash, note there are lots of kinds of variable expansion. Checking a variable exists: `${name:?error message}`. For example, if a Bash script requires a single argument, just write `input_file=${1:?usage: $0 input_file}`. Using a default value if a variable is empty: `${name:-default}`. If you want to have an additional (optional) parameter added to the previous example, you can use something like `output_file=${2:-logfile}`. If `$2` is omitted and thus empty, `output_file` will be set to `logfile`. Arithmetic expansion: `i=$(( (i+1) % 5 ))`. Sequences: `{1..10}`. Trimming of strings: `${var%suffix}` and `${var#prefix}`. For example if `var=foo.pdf`, then `echo ${var%.pdf}.txt` prints `foo.txt`.
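A minimal sketch pulling several of these together in a script header (the file names and usage text are illustrative):
```sh
#!/usr/bin/env bash
# Fail with a usage message if the first argument is missing or empty
input_file=${1:?usage: $0 input_file [output_file]}
# Fall back to a default if the optional second argument is omitted
output_file=${2:-logfile}
# Trim a suffix to derive a related name: report.csv -> report.txt
summary_file=${input_file%.csv}.txt
echo "reading $input_file, writing $output_file and $summary_file"
```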
- Brace expansion using `{`...`}` can reduce having to re-type similar text and automate combinations of items. This is helpful in examples like `mv foo.{txt,pdf} some-dir` (which moves both files), `cp somefile{,.bak}` (which expands to `cp somefile somefile.bak`) or `mkdir -p test-{a,b,c}/subtest-{1,2,3}` (which expands all possible combinations and creates a directory tree). Brace expansion is performed before any other expansion.
- The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and command substitution (done in a left-to-right fashion); word splitting; and filename expansion. (For example, a range like `{1..20}` cannot be expressed with variables using `{$a..$b}`. Use `seq` or a `for` loop instead, e.g., `seq $a $b` or `for((i=a; i<=b; i++)); do ...; done`.)
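A quick sketch of that caveat (`a` and `b` are placeholder variables):
```sh
a=1; b=5
echo {$a..$b}      # prints the literal text {1..5}: brace expansion runs before variables expand
echo $(seq $a $b)  # prints: 1 2 3 4 5
for ((i=a; i<=b; i++)); do echo "$i"; done
```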
- The output of a command can be treated like a file via `<(some command)` (known as process substitution). For example, compare local `/etc/hosts` with a remote one:
```sh
diff /etc/hosts <(ssh somehost cat /etc/hosts)
```
- When writing scripts you may want to put all of your code in curly braces. If the closing brace is missing, your script will be prevented from executing due to a syntax error. This makes sense when your script is going to be downloaded from the web, since it prevents partially downloaded scripts from executing:
```sh
{
    # Your code here
}
```
- A "here document" allows redirection of multiple lines of input as if from a file:
cat <
- In Bash, redirect both standard output and standard error via: `some-command >logfile 2>&1` or `some-command &>logfile`. Often, to ensure a command does not leave an open file handle to standard input, tying it to the terminal you are in, it is also good practice to add `</dev/null`.
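Putting those together, a minimal sketch for running a job detached from the terminal (the command and log name are illustrative):
```sh
# Discard stdin, capture stdout and stderr in a log, and run in the background
some-long-command </dev/null >logfile 2>&1 &
```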
- Use `man ascii` for a good ASCII table, with hex and decimal values. For general encoding info, `man unicode`, `man utf-8`, and `man latin1` are helpful.
- Use `screen` or [tmux](https://tmux.github.io) to multiplex the screen, especially useful on remote ssh sessions and to detach and re-attach to a session. `byobu` can enhance screen or tmux by providing more information and easier management. A more minimal alternative for session persistence only is [dtach](https://github.com/bogner/dtach).
- In ssh, knowing how to port tunnel with `-L` or `-D` (and occasionally `-R`) is useful, e.g. to access web sites from a remote server; a short sketch follows the configuration example below.
- It can be useful to make a few optimizations to your ssh configuration; for example, this `~/.ssh/config` contains settings to avoid dropped connections in certain network environments, uses compression (which is helpful with scp over low-bandwidth connections), and multiplexes channels to the same server with a local control file:
```
TCPKeepAlive=yes
ServerAliveInterval=15
ServerAliveCountMax=6
Compression=yes
ControlMaster auto
ControlPath /tmp/%r@%h:%p
ControlPersist yes
```
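A minimal sketch of the tunneling flags mentioned above (hostnames and ports are illustrative):
```sh
# Local forward: connections to localhost:8080 go to port 80 on remotehost
ssh -L 8080:localhost:80 user@remotehost
# Dynamic forward: a SOCKS proxy on localhost:1080 routed through remotehost
ssh -D 1080 user@remotehost
# Remote forward: expose local port 3000 as port 8080 on remotehost
ssh -R 8080:localhost:3000 user@remotehost
```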
- A few other options relevant to ssh are security sensitive and should be enabled with care, e.g. per subnet or host or in trusted networks: `StrictHostKeyChecking=no`, `ForwardAgent=yes`
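For example, one way to scope such options to a single trusted machine is a per-host stanza in `~/.ssh/config` (the host name is illustrative):
```
Host trusted-internal-host
    ForwardAgent yes
    StrictHostKeyChecking no
```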
- Consider [mosh](https://mosh.mit.edu) an alternative to ssh that uses UDP, avoiding dropped connections and adding convenience on the road (requires server-side setup).
- To get the permissions on a file in octal form, which is useful for system configuration but not available in `ls` and easy to bungle, use something like
```sh
stat -c '%A %a %n' /etc/timezone
```
- For interactive selection of values from the output of another command, use [percol](https://github.com/mooz/percol) or [fzf](https://github.com/junegunn/fzf).
- For interaction with files based on the output of another command (like `git`), use `fpp` (PathPicker).
- For a simple web server for all files in the current directory (and subdirs), available to anyone on your network, use: `python -m SimpleHTTPServer 7777` (for port 7777 and Python 2) and `python -m http.server 7777` (for port 7777 and Python 3).
- For running a command as another user, use `sudo`. Defaults to running as root; use `-u` to specify another user. Use `-i` to login as that user (you will be asked for your password).
- For switching the shell to another user, use `su username` or `su - username`. The latter with "-" gets an environment as if another user just logged in. Omitting the username defaults to root. You will be asked for the password of the user you are switching to.
- Know about the 128K limit on command lines. This "Argument list too long" error is common when wildcard matching large numbers of files. (When this happens alternatives like `find` and `xargs` may help.)
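For instance, a sketch of working around that limit (directory and pattern are illustrative):
```sh
# Instead of: grep -l pattern huge-dir/*.log   (may fail with "Argument list too long")
find huge-dir -name '*.log' -print0 | xargs -0 grep -l pattern
```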
- For a basic calculator (and of course access to Python in general), use the `python` interpreter. For example,
```
>>> 2+3
5
```
## Processing files and data
- To locate a file by name in the current directory, `find . -iname '*something*'` (or similar). To find a file anywhere by name, use `locate something` (but bear in mind `updatedb` may not have indexed recently created files).
- For general searching through source or data files, there are several options more advanced or faster than `grep -r`, including (in rough order from older to newer) [ack](https://github.com/beyondgrep/ack2), [ag](https://github.com/ggreer/the_silver_searcher) ("the silver searcher"), and [rg](https://github.com/BurntSushi/ripgrep) (ripgrep).
- To convert HTML to text: `lynx -dump -stdin`
- For Markdown, HTML, and all kinds of document conversion, try [pandoc](http://pandoc.org/). For example, to convert a Markdown document to Word format: `pandoc README.md --from markdown --to docx -o temp.docx`
- If you must handle XML, `xmlstarlet` is old but good.
- For JSON, use [jq](http://stedolan.github.io/jq/). For interactive use, also see [jid](https://github.com/simeji/jid) and [jiq](https://github.com/fiatjaf/jiq).
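A small sketch of typical `jq` usage (the file and field names are illustrative):
```sh
# Pretty-print a JSON file
jq . data.json
# Extract one field from each element of an array
jq '.[] | .name' data.json
# Keep only objects matching a field value
jq '.[] | select(.status == "active")' data.json
```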
- For YAML, use [shyaml](https://github.com/0k/shyaml).
- For Excel or CSV files, [csvkit](https://github.com/onyxfish/csvkit) provides `in2csv`, `csvcut`, `csvjoin`, `csvgrep`, etc.
- For Amazon S3, [s3cmd](https://github.com/s3tools/s3cmd) is convenient and [s4cmd](https://github.com/bloomreach/s4cmd) is faster. Amazon's [aws](https://github.com/aws/aws-cli) and the improved [saws](https://github.com/donnemartin/saws) are essential for other AWS-related tasks.
- Know about `sort` and `uniq`, including uniq's `-u` and `-d` options -- see one-liners below. See also `comm`.
- Know about `cut`, `paste`, and `join` to manipulate text files. Many people use `cut` but forget about `join`.
- Know about `wc` to count newlines (`-l`), characters (`-m`), words (`-w`) and bytes (`-c`).
- Know about `tee` to copy from stdin to a file and also to stdout, as in `ls -al | tee file.txt`.
- For more complex calculations, including grouping, reversing fields, and statistical calculations, consider [datamash](https://www.gnu.org/software/datamash/).
- Know that locale affects a lot of command line tools in subtle ways, including sorting order (collation) and performance. Most Linux installations will set `LANG` or other locale variables to a local setting like US English. But be aware sorting will change if you change locale. And know i18n routines can make sort or other commands run many times slower. In some situations (such as the set operations or uniqueness operations below) you can safely ignore slow i18n routines entirely and use traditional byte-based sort order, using `export LC_ALL=C`.
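A quick sketch of the difference (the input file is illustrative):
```sh
# Locale-aware sort (can be much slower on large inputs)
sort huge-file.txt > sorted.txt
# Traditional byte-order sort, ignoring locale collation
LC_ALL=C sort huge-file.txt > sorted.txt
```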
- You can set a specific command's environment by prefixing its invocation with the environment variable settings, as in `TZ=Pacific/Fiji date`.
- Know basic `awk` and `sed` for simple data munging. See [One-liners](#one-liners) for examples.
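For a flavor of each, two minimal sketches (column numbers and line ranges are illustrative):
```sh
# awk: sum the values in the first column
awk '{ sum += $1 } END { print sum }' data.txt
# sed: print only lines 10-20
sed -n '10,20p' data.txt
```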
- To replace all occurrences of a string in place, in one or more files: `perl -pi.bak -e 's/old-string/new-string/g' my-files-*.txt`
- To rename multiple files and/or search and replace within files, try [repren](https://github.com/jlevy/repren). (In some cases the `rename` command also allows multiple renames, but be careful as its functionality is not the same on all Linux distributions.)
```sh
# Full rename of filenames, directories, and contents foo -> bar:
repren --full --preserve-case --from foo --to bar .
# Recover backup files whatever.bak -> whatever:
repren --renames --from '(.*)\.bak' --to '\1' *.bak
# Same as above, using rename, if available:
rename 's/\.bak$//' *.bak
```
- As the man page says, `rsync` really is a fast and extraordinarily versatile file copying tool. It's known for synchronizing between machines but is equally useful locally. When security restrictions allow, using `rsync` instead of `scp` allows recovery of a transfer without restarting from scratch. It also is among the fastest ways to delete large numbers of files:
```sh
mkdir empty && rsync -r --delete empty/ some-dir && rmdir some-dir
```
- For monitoring progress when processing files, use [pv](http://www.ivarch.com/programs/pv.shtml), [pycp](https://github.com/dmerejkowsky/pycp), [pmonitor](https://github.com/dspinellis/pmonitor), [progress](https://github.com/Xfennec/progress), `rsync --progress`, or, for block-level copying, `dd status=progress`.
- Use `shuf` to shuffle or select random lines from a file.
- Know `sort`'s options. For numbers, use `-n`, or `-h` for handling human-readable numbers (e.g. from `du -h`). Know how keys work (`-t` and `-k`). In particular, watch out that you need to write `-k1,1` to sort by only the first field; `-k1` means sort according to the whole line. Stable sort (`sort -s`) can be useful. For example, to sort first by field 2, then secondarily by field 1, you can use `sort -k1,1 | sort -s -k2,2`.
- If you ever need to write a tab literal in a command line in Bash (e.g. for the `-t` argument to `sort`), press ctrl-v [Tab] or write `$'\t'` (the latter is better as you can copy/paste it).
- The standard tools for patching source code are `diff` and `patch`. See also `diffstat` for summary statistics of a diff and `sdiff` for a side-by-side diff. Note `diff -r` works for entire directories. Use `diff -r tree1 tree2 | diffstat` for a summary of changes. Use `vimdiff` to compare and edit files.
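A minimal sketch of the round trip (file names are illustrative):
```sh
# Create a unified diff and apply it elsewhere
diff -u original.c modified.c > fix.patch
patch original.c < fix.patch
```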
- For binary files, use `hd`, `hexdump` or `xxd` for simple hex dumps and `bvi`, `hexedit` or `biew` for binary editing.
- Also for binary files, `strings` (plus `grep`, etc.) lets you find bits of text.
- For binary diffs (delta compression), use `xdelta3`.
- To convert text encodings, try `iconv`. Or `uconv` for more advanced use; it supports some advanced Unicode things. For example:
```sh
# Displays hex codes or actual names of characters (useful for debugging):
uconv -f utf-8 -t utf-8 -x '::Any-Hex;' < input.txt
uconv -f utf-8 -t utf-8 -x '::Any-Name;' < input.txt
# Lowercase and removes all accents (by expanding and dropping them):
uconv -f utf-8 -t utf-8 -x '::Any-Lower; :: Any-NFD; [:Nonspacing Mark:]>; ::Any-NFC;' < input.txt > output.txt
```
- To split files into pieces, see `split` (to split by size) and `csplit` (to split by a pattern).
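For example (sizes and names are illustrative):
```sh
# Split into 100 MB pieces named chunk-aa, chunk-ab, ...
split -b 100M big-file chunk-
# Reassemble
cat chunk-* > big-file-restored
```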
date -u +"%Y-%m-%dT%H:%M:%SZ"
(other options are problematic). To manipulate date and time expressions, usedateadd
,datediff
,strptime
etc. fromdeteutils
.Use
zless
,zmore
,zcat
, andzgrep
to operate on compressed files.File attributes are settable via
chattr
and offer a lower-level alternative to file permissions. For example, to protect against accidental file deletion the immutable flag:sudo chattr +i /critical/directory/or/file
Use
getfacl
andsetfacl
to save and restore file permissions. For example:
getfacl -R /some/path > permission.txt
setfacl --restore=permissions.txt
- To create empty files quickly, use `truncate` (creates a sparse file), `fallocate` (ext4, xfs, btrfs and ocfs2 filesystems), `xfs_mkfile` (almost any filesystem, comes in the xfsprogs package), `mkfile` (for Unix-like systems like Solaris, Mac OS).
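For instance (sizes and names are illustrative):
```sh
# Sparse 1 GB file: allocates no blocks until data is written
truncate -s 1G sparse.img
# Preallocated 1 GB file on supported filesystems
fallocate -l 1G prealloc.img
```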
## System debugging
- For web debugging, `curl` and `curl -I` are handy, or their `wget` equivalents, or the more modern `httpie`.
- To know current cpu/disk status, the classic tools are `top` (or the better `htop`), `iostat`, and `iotop`. Use `iostat -mxz 15` for basic CPU and detailed per-partition disk stats and performance insight.
- For network connection details, use `netstat` and `ss`.
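For example, a quick sketch with `ss`:
```sh
# Listening TCP sockets with numeric ports and owning processes
ss -tlnp
# Summary statistics across socket types
ss -s
```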
- For a quick overview of what's happening on a system, `dstat` is especially useful. For broadest overview with details, use `glances`.
- To know memory status, run and understand the output of `free` and `vmstat`. In particular, be aware the "cached" value is memory held by the Linux kernel as file cache, so effectively counts toward the "free" value.
- Java system debugging is a different kettle of fish, but a simple trick on Oracle's and some other JVMs is that you can run `kill -3 <pid>` and a full stack trace and heap summary (including generational garbage collection details, which can be highly informative) will be dumped to stderr/logs. The JDK's `jps`, `jstat`, `jstack`, `jmap` are useful. SJK tools are more advanced.
- Use `mtr` as a better traceroute, to identify network issues.
- For looking at why a disk is full, `ncdu` saves time over the usual commands like `du -sh *`.
- To find which socket or process is using bandwidth, try `iftop` or `nethogs`.
- The `ab` tool (comes with Apache) is helpful for quick-and-dirty checking of web server performance. For more complex load testing, try `siege`.
- For more serious network debugging, `wireshark`, `tshark`, or `ngrep`.
- Know about `strace` and `ltrace`. These can be helpful if a program is failing, hanging, or crashing, and you don't know why, or if you want to get a general idea of performance. Note the profiling option (`-c`), and the ability to attach to a running process (`-p`). Use the trace child option (`-f`) to avoid missing important calls.
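For example (the pid is a placeholder):
```sh
# Summarize syscall counts and time of a running process (interrupt with ctrl-c to see the report)
strace -c -f -p 1234
# Trace only file-open syscalls of a new command
strace -f -e trace=open,openat ls /tmp
```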
- Know about `ldd` to check shared libraries etc - but never run it on untrusted files.
- Know how to connect to a running process with `gdb` and get its stack traces.
- Use `/proc`. It's amazingly helpful sometimes when debugging live problems. Examples: `/proc/cpuinfo`, `/proc/meminfo`, `/proc/cmdline`, `/proc/xxx/cwd`, `/proc/xxx/exe`, `/proc/xxx/fd/`, `/proc/xxx/smaps` (where `xxx` is the process id or pid).