Biostar Study Notes (2)

  1. If you want to practice at your own pace, you can download related materials from the following website: Biostar data site.
  2. I have only skimmed through the whole book so far. It is a good book for newcomers: you will learn what bioinformatics is and pick up some basic commands. If you want to go further, the authors also provide useful resources and links. The book guides you through bioinformatics in a systematic way. The best way to learn is by practicing.
  3. What is Bioinformatics?
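    Making sense of biological data by using computational methods. Most bioinformatics work falls into the following four categories:
    • Assembly: genome assembly, building a new genome
    • Resequencing: aligning sequences against a known genome to identify mutations and variation
    • Classification: determining the species composition of a population of organisms
    • Quantification: using DNA sequencing to measure the functional characteristics of a cell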
  4. "pwd": print the path of the current working directory. To reuse the value, store it in a variable, for example: DATA_PATH=${PWD}
  5. "|": the pipe sign. Very useful when a simple goal can be reached in several steps, each passing its output to the next through a pipe.
  6. Keep folders well organized so that they are easy to remember and use.
  7. parallel: use multiple processes to finish similar tasks at the same time, e.g.:
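mkdir -p ~/tmp/fastq && cd ~/tmp/fastq
touch GSE89245.txt
# seq -w pads the numbers with leading zeros so that they all have the same width (compare "seq -w 1 10" and "seq 1 10")
for i in $(seq -w 86 95); do echo "SRP0921$i" >> GSE89245.txt; done
# read the IDs from the file written above and run one fastq-dump job per ID; -O sets the output directory
# note: fastq-dump works on run accessions (SRR...), so the list should hold the run IDs of the dataset
cat GSE89245.txt | parallel fastq-dump -O sra --split-files {}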
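To see what "seq -w" and a parallel run do without downloading anything, here is a minimal sketch. It assumes a POSIX-like shell; the "ID" prefix is a made-up placeholder rather than a real accession, "echo" stands in for the real download command, and "xargs -P" is used as a stand-in for GNU parallel because it is available on most systems.

```shell
# Compare plain and zero-padded sequences.
plain=$(echo $(seq 8 10))
padded=$(echo $(seq -w 8 10))
echo "plain:  $plain"
echo "padded: $padded"

# Build a list of placeholder IDs (hypothetical, not real accessions).
workdir=$(mktemp -d)
for i in $(seq -w 8 10); do echo "ID$i"; done > "$workdir/ids.txt"

# Run up to 3 jobs at once; echo replaces the real command so nothing is fetched.
# The output order of parallel jobs is not fixed, so sort it before saving.
xargs -P 3 -I {} echo "would process {}" < "$workdir/ids.txt" | sort > "$workdir/log.txt"
cat "$workdir/log.txt"
```

Replacing the dry-run "echo" with a real command is the only change needed once the ID list is correct.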
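  8. View and combine files
# for regular files
cat file1 file2 file... >> bigfile
# for gzipped files: gzip streams can be concatenated directly, so use cat rather than zcat
# (zcat would write uncompressed text into a file named .gz)
cat file1.gz file2.gz filen.gz >> bigfile.gz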
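A quick way to convince yourself that gzipped files can be joined with plain "cat" is the small check below. It is only a sketch with throwaway files in a temporary folder; the file contents are made up, and "gzip -dc" is used instead of "zcat" because it behaves the same on Linux and macOS.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two small text files and their combined version.
printf 'line from file1\n' > file1
printf 'line from file2\n' > file2
cat file1 file2 > bigfile

# Compress each file separately, then join the compressed files with cat.
gzip -c file1 > file1.gz
gzip -c file2 > file2.gz
cat file1.gz file2.gz > bigfile.gz

# Decompressing the joined archive gives back the combined text.
gzip -dc bigfile.gz > roundtrip
cmp bigfile roundtrip && echo "identical"
```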
  9. The $PATH environment variable
echo $PATH
# append the export line to ~/.bashrc so that it applies to every new shell
echo 'export PATH=/file/path/of/real/programs:$PATH' >> ~/.bashrc
source ~/.bashrc
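To try out how $PATH lookup works without touching ~/.bashrc, the sketch below changes PATH for the current session only. The tool name "mytool" and its folder are hypothetical and exist only for this demonstration.

```shell
# Create a throwaway folder holding one tiny executable script.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from mytool"\n' > "$bindir/mytool"
chmod +x "$bindir/mytool"

# Put the folder at the front of PATH (current session only).
export PATH="$bindir:$PATH"

# The shell now finds the script by name.
command -v mytool
mytool
```

Because the new folder comes first, a program there takes priority over one with the same name later in PATH.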
  10. "grep" command: usually combined with "cat" or "zcat", "|" and "cut -f" to extract certain columns and pass the values on to downstream analysis
man grep
cat SGD_features.tab | cut -f 2,3,4 | grep ORF | grep -v Dubious | wc -l # count the ORF lines that are not marked Dubious
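The same pipeline can be tried on a tiny table first. The sketch below uses a made-up tab-separated file; the IDs, gene names and column layout are assumptions for illustration and are not taken from the real SGD_features.tab.

```shell
workdir=$(mktemp -d)

# Toy table: id, feature type, qualifier, name (all values invented).
printf 'S1\tORF\tVerified\tgeneA\n'         > "$workdir/toy.tab"
printf 'S2\tORF\tDubious\tgeneB\n'         >> "$workdir/toy.tab"
printf 'S3\ttRNA\tNone\tgeneC\n'           >> "$workdir/toy.tab"
printf 'S4\tORF\tUncharacterized\tgeneD\n' >> "$workdir/toy.tab"

# Keep columns 2-4, keep ORF lines, drop Dubious ones, count what is left.
count=$(cut -f 2,3,4 "$workdir/toy.tab" | grep ORF | grep -v Dubious | wc -l | tr -d ' ')
echo "non-dubious ORFs: $count"
```

Removing the final "wc -l" stage shows the lines themselves, which is a handy way to check each step of a pipe before counting.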
  11. "sed": replace strings with new values. Very useful when renaming multiple files with similar name patterns.
man sed
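As a small illustration of the renaming use case, the sketch below changes the extension of several files with sed. The file names are made up, and everything happens in a temporary folder.

```shell
# Substitute on a single string first: replace a trailing ".fastq" with ".fq".
echo "sample_A.fastq" | sed 's/\.fastq$/.fq/'

# Apply the same substitution to every matching file name.
workdir=$(mktemp -d)
touch "$workdir/sample_A.fastq" "$workdir/sample_B.fastq"
for f in "$workdir"/*.fastq; do
  new=$(echo "$f" | sed 's/\.fastq$/.fq/')
  mv "$f" "$new"
done
ls "$workdir"
```

Printing the old and new names with "echo" before running "mv" is a safe way to test the pattern.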
  12. "awk" command. It is more complex than grep and sed, so online resources are a good way to learn more.
man awk
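Two of the most common awk patterns, filtering rows by a column value and summing a column, can be sketched as follows. The table is invented for this example (gene names and counts are not real data).

```shell
workdir=$(mktemp -d)

# Toy table: name and count, separated by a tab.
printf 'geneA\t120\ngeneB\t45\ngeneC\t300\n' > "$workdir/counts.tsv"

# Print the name (column 1) of every row whose count (column 2) is above 100.
high=$(awk -F '\t' '$2 > 100 { print $1 }' "$workdir/counts.tsv")
echo "$high"

# Add up column 2 and print the total after the last line has been read.
total=$(awk -F '\t' '{ sum += $2 } END { print sum }' "$workdir/counts.tsv")
echo "total: $total"
```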

Since I spent most of my time skimming through this book, I will write more about the grep/sed/awk commands in the future. I hope you find this useful.
