jishigou / thinksns
Daily automated batch download of weibo canteen (weibo食堂) images: food and recipe pictures
Download the fixed microblog page index.php?mod=yugao008 to use while debugging the script, then show the downloaded file
[root@localhost ~]# wget 'http://weibo.cns*****.com/index.php?mod=yugao008' && ls -l index.php\?mod\=yugao008
-rw-r--r-- 1 root root 135540 04-23 11:37 index.php?mod=yugao008
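Because wget names the saved file after the last part of the URL, the query string ends up in the file name and has to be escaped in every later command. A minimal sketch of one way around that, assuming it is acceptable to save the page under a plain name: fetch_page and page.html are illustrative names that do not appear in the original notes, and the URL in the usage comment is a placeholder because the real host is masked.

```shell
#!/bin/sh
# Sketch: fetch one page into a predictable file name.
# fetch_page is a hypothetical helper, not part of the original notes.
fetch_page() {
    url=$1
    out=$2
    # -q: quiet output; -O: write to the given file instead of a name
    # derived from the URL (which would contain '?' and '=').
    wget -q -O "$out" "$url" || return 1
    # Show the file only after the download has finished successfully.
    ls -l "$out"
}

# Usage (placeholder host; quote the URL so '?' is not treated as a glob):
# fetch_page 'http://weibo.example.com/index.php?mod=yugao008' page.html
```

Quoting the URL and chaining with && keep the listing from running before the download is complete.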
Search for lines containing the image-file keyword o.jpg and display them
[root@localhost ~]# grep 'o.jpg" class="artZoomAll"' index.php\?mod\=yugao008 | more
<li><a href="http://weibo.cns*****.com/images/topic/9/39/75512_o.jpg" class="artZoomAll" rel="http://weibo.cns*****.com/images/topic/9/39/75512_s.jpg" rev="1444907_lLhRjs"><img src="http://wei *.com/images/topic/9/39/75512_s.jpg" /></a></li>
<li><a href="http://weibo.cns*****.com/images/topic/3/27/75505_o.jpg" class="artZoomAll" rel="http://weibo.cns*****.com/images/topic/3/27/75505_s.jpg" rev="1444805_LiCXAH"><img src="http://wei *.com/images/topic/3/27/75505_s.jpg" /></a></li>
<li><a href="http://weibo.cns*****.com/images/topic/e/19/75506_o.jpg" class="artZoomAll" rel="http://weibo.cns*****.com/images/topic/e/19/75506_s.jpg" rev="1444805_LiCXAH"><img src="http://wei *.com/images/topic/e/19/75506_s.jpg" /></a></li>
Search for lines containing the keyword o.jpg, use awk to split each line on double quotes and take the second field (the URL), save the result as $(hostname)_$(date +%Y%m%d%H%M%S).txt, and list the resulting files
[root@localhost ~]# grep 'o.jpg" class="artZoomAll"' index.php\?mod\=yugao008 | awk -F "\"" '{print $2}' > $(hostname)_$(date +%Y%m%d%H%M%S).txt && ls -l $(hostname)*.txt
-rw-r--r-- 1 root root 1286 04-23 12:59 lindows_20130423125938.txt
-rw-r--r-- 1 root root 1286 04-23 14:49 lindows_20130423144952.txt
-rw-r--r-- 1 root root 1286 04-23 14:49 lindows_20130423144957.txt
-rw-r--r-- 1 root root 1286 04-23 14:51 lindows_20130423145111.txt
-rw-r--r-- 1 root root 1286 04-23 14:51 lindows_20130423145149.txt
-rw-r--r-- 1 root root 1286 04-23 14:53 lindows_20130423145307.txt
...
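Taking the second quote-delimited field with awk works as long as href is the first attribute on each matching line. As a hedged alternative, the sketch below pulls out only the URLs that end in _o.jpg with grep -o (available in GNU and BSD grep). extract_image_urls is an illustrative helper name, and the sample HTML uses a made-up img.example.com host because the real one is masked in these notes.

```shell
#!/bin/sh
# Sketch: extract full-size image URLs (ending in _o.jpg) from HTML.
# extract_image_urls is a hypothetical helper name.
extract_image_urls() {
    # First keep only lines carrying the artZoomAll class, then print
    # just the matching URL text instead of the whole line.
    grep 'class="artZoomAll"' "$1" | grep -o 'http://[^"]*_o\.jpg'
}

# Made-up sample input: two gallery items and one unrelated line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<li><a href="http://img.example.com/images/topic/9/39/75512_o.jpg" class="artZoomAll" rel="http://img.example.com/images/topic/9/39/75512_s.jpg"><img src="http://img.example.com/images/topic/9/39/75512_s.jpg" /></a></li>
<li><a href="http://img.example.com/images/topic/3/27/75505_o.jpg" class="artZoomAll" rel="http://img.example.com/images/topic/3/27/75505_s.jpg"><img src="http://img.example.com/images/topic/3/27/75505_s.jpg" /></a></li>
<p>unrelated line mentioning http://img.example.com/other_o.jpg</p>
EOF

urls=$(extract_image_urls "$sample")
printf '%s\n' "$urls"
rm -f "$sample"
```

The thumbnail URLs (_s.jpg) and the unrelated line are skipped, so only the two full-size URLs are printed.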
With the URLs extracted as above (second quote-delimited field via awk, saved as $(hostname)_$(date +%Y%m%d%H%M%S).txt), list the files and display the contents of one of them
[root@localhost ~]# ls -l $(hostname)*.txt
[root@localhost ~]# more lindows_20130423125938.txt
http://weibo.cns*****.com/images/topic/9/39/75512_o.jpg
http://weibo.cns*****.com/images/topic/3/27/75505_o.jpg
http://weibo.cns*****.com/images/topic/e/19/75506_o.jpg
http://weibo.cns*****.com/images/topic/d/82/75500_o.jpg
http://weibo.cns*****.com/images/topic/9/45/75501_o.jpg
...
Search all $(hostname)*.txt files for lines containing the keyword http and display each file name together with the matching line
[root@localhost ~]# grep 'o.jpg" class="artZoomAll"' index.php\?mod\=yugao008 | awk -F "\"" '{print $2}' > $(hostname)_$(date +%Y%m%d%H%M%S).txt && grep http $(hostname)*.txt | more
lindows_20130423144952.txt:http://weibo.cns*****.com/images/topic/9/45/75501_o.jpg
lindows_20130423144952.txt:http://weibo.cns*****.com/images/topic/0/17/75458_o.jpg
lindows_20130423144952.txt:http://weibo.cns*****.com/images/topic/c/99/75459_o.jpg
lindows_20130423144952.txt:http://weibo.cns*****.com/images/topic/b/28/75454_o.jpg
...
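A note on chaining: a step that writes a file and a step that reads that file need to run one after the other. A pipe starts both sides at the same time and, once the left side has redirected its output to a file, carries no data, whereas && runs the second command only after the first has succeeded. The small demonstration below uses a throwaway file name (list.txt) that is purely illustrative.

```shell
#!/bin/sh
# Sketch: compare "redirect then pipe" with "redirect then &&".
tmp=$(mktemp -d)

# Redirect, then pipe: printf writes to the file, so wc reads an empty stream.
piped=$(printf 'one\ntwo\n' > "$tmp/list.txt" | wc -l | tr -d ' ')

# Redirect, then &&: wc runs only after the file has been written.
chained=$(printf 'one\ntwo\n' > "$tmp/list.txt" && wc -l < "$tmp/list.txt" | tr -d ' ')

echo "piped=$piped chained=$chained"   # prints: piped=0 chained=2
rm -rf "$tmp"
```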
Search all $(hostname)*.txt files for lines containing the keyword http, strip the leading lindows*.txt: file-name prefix, then sort, remove duplicates, and display the result
[root@localhost ~]# grep 'o.jpg" class="artZoomAll"' index.php\?mod\=yugao008 | awk -F "\"" '{print $2}' > $(hostname)_$(date +%Y%m%d%H%M%S).txt && grep http $(hostname)*.txt | awk -F "txt:" '{print $2}' | sort | uniq | more
http://weibo.cns*****.com/images/topic/0/17/75458_o.jpg
http://weibo.cns*****.com/images/topic/0/20/75450_o.jpg
http://weibo.cns*****.com/images/topic/3/25/75423_o.jpg
http://weibo.cns*****.com/images/topic/3/27/75505_o.jpg
http://weibo.cns*****.com/images/topic/3/82/75455_o.jpg
http://weibo.cns*****.com/images/topic/4/27/75302_o.jpg
http://weibo.cns*****.com/images/topic/5/27/75276_o.jpg
http://weibo.cns*****.com/images/topic/6/0/75351_o.jpg
http://weibo.cns*****.com/images/topic/6/3/75390_o.jpg
...
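The file-name prefix only appears because grep is given several files. As a possible simplification, grep -h (GNU and BSD grep) leaves the prefix out, so no second awk pass is needed, and sort -u sorts and removes duplicates in one step. merge_url_lists and the host_*.txt sample files are illustrative names, and the URLs are made up.

```shell
#!/bin/sh
# Sketch: merge several URL snapshot files into one sorted, unique list.
# merge_url_lists is a hypothetical helper name.
merge_url_lists() {
    # -h: do not prefix matches with the file name.
    # sort -u: sort and drop duplicate lines.
    grep -h '^http' "$@" | sort -u
}

# Two made-up snapshot files that share one URL.
dir=$(mktemp -d)
printf '%s\n' 'http://img.example.com/b_o.jpg' 'http://img.example.com/a_o.jpg' > "$dir/host_1.txt"
printf '%s\n' 'http://img.example.com/a_o.jpg' 'http://img.example.com/c_o.jpg' > "$dir/host_2.txt"

merged=$(merge_url_lists "$dir"/host_*.txt)
printf '%s\n' "$merged"
rm -rf "$dir"
```

The shared URL appears once in the merged list, which is the property the download step relies on.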
Search all $(hostname)*.txt files for lines containing the keyword http, strip the leading lindows*.txt: file-name prefix, sort and remove duplicates, then batch-download every jpg into the target directory /home/lindows/
[root@localhost ~]# grep 'o.jpg" class="artZoomAll"' index.php\?mod\=yugao008 | awk -F "\"" '{print $2}' > $(hostname)_$(date +%Y%m%d%H%M%S).txt && grep http $(hostname)*.txt | awk -F "txt:" '{print $2}' | sort | uniq | xargs -I {} wget -P /home/lindows/ {}
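The download step can be tried out safely before touching the network. The sketch below wraps it in a function whose downloader command can be swapped for echo, which prints the arguments that would be passed instead of fetching anything. download_list, its optional third argument, and the sample URLs are assumptions made for illustration; -P (target directory) and -nc (skip files that already exist) are standard wget options.

```shell
#!/bin/sh
# Sketch: download every URL listed in a file into a target directory.
# download_list is a hypothetical helper; the third argument is the
# downloader command and defaults to wget.
download_list() {
    list=$1
    dest=$2
    cmd=${3:-wget}
    mkdir -p "$dest" || return 1
    # xargs -n 1 runs the downloader once per URL.
    xargs -n 1 "$cmd" -P "$dest" -nc < "$list"
}

# Dry run with made-up URLs: echo prints the arguments instead of downloading.
work=$(mktemp -d)
printf '%s\n' 'http://img.example.com/a_o.jpg' 'http://img.example.com/b_o.jpg' > "$work/urls.txt"
plan=$(download_list "$work/urls.txt" "$work/images" echo)
printf '%s\n' "$plan"
rm -rf "$work"
```

For the daily run mentioned in the title, a script wrapping these steps could be scheduled from cron, for example with an entry like 0 6 * * * /home/lindows/fetch_weibo_images.sh, where both the schedule and the script path are hypothetical.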