cd ~/local/app/
curl -O https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.8.2/sratoolkit.2.8.2-centos_linux64.tar.gz
tar xzvf sratoolkit.2.8.2-centos_linux64.tar.gz
cd sratoolkit.2.8.2-centos_linux64
ll
cd bin
ll
echo "export PATH=\$PATH:\$HOME/local/app/sratoolkit.2.8.2-centos_linux64/bin" >> ~/.bashrc
source ~/.bashrc
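Before moving on, it can help to confirm that the shell actually finds the newly installed tools. The following is a minimal sketch for a POSIX shell; `check_tool` is a hypothetical helper name introduced here for illustration, not part of the SRA Toolkit.

```shell
# check_tool is a hypothetical helper, not part of the SRA Toolkit.
# It reports whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# "|| true" keeps the session going even if a tool is missing.
check_tool prefetch || true
check_tool fastq-dump || true
```

If either tool is reported as missing, the PATH line appended to ~/.bashrc is the first thing to recheck.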
mkdir -p ~/edu/lec4
cd ~/edu/lec4
prefetch -h
Usage:
prefetch [options] <SRA accession | kart file> [...]
Download SRA or dbGaP files and their dependencies
prefetch [options] <SRA file> [...]
Check SRA file for missed dependencies and download them
prefetch --list <kart file> [...]
List the content of a kart file
...............................................................
prefetch SRR1553610
ls ~/ncbi
find ~/ncbi
/home/sunchengquan/ncbi
/home/sunchengquan/ncbi/public
/home/sunchengquan/ncbi/public/sra
/home/sunchengquan/ncbi/public/sra/SRR1553610.sra
fastq-dump -h
Data downloaded from NCBI keeps both ends of a paired-end run in a single file, so the reads need to be split back into two files
cd ~/ncbi/public/sra/
fastq-dump --split-files SRR1553610
The raw data files in FASTQ format are now in the current folder
wc -l *.fastq
879348 SRR1553610_1.fastq
879348 SRR1553610_2.fastq
1758696 total
head SRR1553610_1.fastq
cat *.fastq | grep '^@SRR' | wc -l
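Each FASTQ record spans four lines, so the 879348 lines reported per file correspond to 879348 / 4 = 219837 reads. The sketch below shows the same arithmetic on a tiny FASTQ file whose name and contents are invented purely for illustration; `count_reads` is a hypothetical helper name. The second record's quality line deliberately starts with `@`, which is why a header count should match the full read-name prefix at the start of the line rather than a bare `@`.

```shell
# count_reads is a hypothetical helper: number of lines divided by 4,
# assuming a well-formed FASTQ file with exactly four lines per record.
count_reads() {
    lines=$(wc -l < "$1" | tr -d ' ')
    echo $((lines / 4))
}

# Tiny invented FASTQ file with two records, for demonstration only.
tmp=$(mktemp -d)
printf '%s\n' \
    '@demo.1 1 length=4' 'ACGT' '+demo.1 1 length=4' 'IIII' \
    '@demo.2 2 length=4' 'TTGA' '+demo.2 2 length=4' '@III' \
    > "$tmp/demo_1.fastq"

count_reads "$tmp/demo_1.fastq"        # count based on line total
grep -c '^@demo' "$tmp/demo_1.fastq"   # count based on header lines
```

Both commands should agree on a well-formed file; a mismatch usually points to a truncated download or wrapped sequence lines.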
echo SRR1553607 > sra.ids
echo SRR1553605 >> sra.ids
prefetch --option-file sra.ids
fastq-dump --split-files ~/ncbi/public/sra/SRR155360*
Or generate one command per ID with sed and pipe the result to bash to run it:
cat sra.ids | sed 's/SRR/fastq-dump --split-files SRR/' | bash
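An explicit loop over the ID file is an alternative to the sed approach. The sketch below is a dry run: it only prints the commands, so it is safe to try without the toolkit installed. It recreates `sra.ids` with the two accessions used above; `dump_commands` is a hypothetical helper name.

```shell
# Recreate the ID list used above (one accession per line).
printf '%s\n' SRR1553607 SRR1553605 > sra.ids

# dump_commands is a hypothetical helper. It is a dry run: it prints one
# fastq-dump command per accession instead of executing it.
dump_commands() {
    while read -r id; do
        [ -n "$id" ] || continue   # skip blank lines
        echo "fastq-dump --split-files $id"
    done < "$1"
}

dump_commands sra.ids            # inspect the commands first
# dump_commands sra.ids | bash   # then run them, once the output looks right
```

Printing the commands before executing them makes it easy to spot a malformed ID file before any long-running conversion starts.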
esearch -db sra -query PRJNA257197 | efetch -format runinfo
esearch -db sra -query PRJNA257197 | efetch -format runinfo > runinfo.txt
cat runinfo.txt|wc -l
927
cat runinfo.txt|grep SRR|wc -l
891
cat runinfo.txt | cut -f 1 -d ","
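The two counts above differ (927 lines in total, 891 containing SRR), which suggests that runinfo.txt also holds lines that are not run records, such as header or blank lines; that reading is an interpretation, not something the output states. Filtering the first column keeps only the run accessions. The sketch below uses a small invented stand-in for runinfo.txt: real files have many more columns, the `NA` values are placeholders, and the names `demo_runinfo.txt` and `runids.txt` are hypothetical.

```shell
# Invented stand-in for runinfo.txt: a header line, runs, a blank line and a
# repeated header. Only three columns are shown and "NA" is a placeholder.
printf '%s\n' \
    'Run,ReleaseDate,LoadDate' \
    'SRR1553610,NA,NA' \
    'SRR1553607,NA,NA' \
    '' \
    'Run,ReleaseDate,LoadDate' \
    'SRR1553605,NA,NA' \
    > demo_runinfo.txt

# Keep the first comma-separated column, then only lines starting with SRR.
cut -d, -f1 demo_runinfo.txt | grep '^SRR' > runids.txt
cat runids.txt
```

The resulting file has one accession per line, the same layout as the sra.ids file built by hand earlier.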
du -hs ~/ncbi
For example, to check how much space each item in the current user's home directory takes up:
du -hs *
4.0K blast.sh
2.8G Cosmic_grep
2.5G data
16M edu
5.6M features.gff
11M GAS_match
67M illumina-data.fq
23M illumina-data.fq.gz
95M input
4.8M lec7.tar.gz
1.4G local
4.0K match.gff
42M mydata.tar.gz
790M ncbi
................................................
For du, -h prints sizes in human-readable units (K, M, G), and -s prints a single total for each argument instead of listing every subdirectory
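To spot the largest items quickly, the du output can be sorted by size. This is a sketch assuming GNU coreutils, whose `sort -h` understands the suffixes that `du -h` prints; the directory and file names are invented for the demo.

```shell
# Build a throwaway directory with one small and one large item.
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/big"
echo hi > "$demo/small/a.txt"
head -c 2000000 /dev/urandom > "$demo/big/b.bin"   # about 2 MB of random data

# One summary line per item, sorted so the largest comes last.
(cd "$demo" && du -hs * | sort -h)
```

The same pipeline run in the home directory would list the biggest directories at the bottom of the output.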
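First go to https://www.ncbi.nlm.nih.gov/sra/
Enter the accession you are looking for: PRJNA257197
Click Search
You will see many search results
Click Send to in the upper right corner
Select File and change Format to RunInfo
Click Create File to generate a SraRunInfo.csv file
Notice that the command line version is almost a one-to-one translation of these web page steps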