Website: http://www.medsci.cn/sci/
(1) Search by ISSN
(2) Get the result, along with the parts that need to be extracted
<?xml version="1.0" encoding="UTF-8"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:x="urn:schemas-microsoft-com:office:excel"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:html="http://www.w3.org/TR/REC-html40">
  <Worksheet ss:Name="sci">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">ISSN</Data></Cell>
        <Cell><Data ss:Type="String">年文章</Data></Cell>   <!-- articles per year -->
        <Cell><Data ss:Type="String">投稿难易</Data></Cell> <!-- submission difficulty -->
        <Cell><Data ss:Type="String">一审周期</Data></Cell> <!-- first-review turnaround -->
      </Row>
    </Table>
  </Worksheet>
</Workbook>
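The source does not show how the spreadsheet itself is written out. As a minimal sketch only, the same Workbook/Worksheet shell could be assembled in PHP roughly as follows. The helper names `sheet_row` and `build_workbook` are hypothetical (they do not come from myspider.php), and the second row uses a placeholder ISSN rather than real data. PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: render one <Row> of string cells, escaping XML specials.
function sheet_row(array $values): string
{
    $cells = '';
    foreach ($values as $value) {
        $text = htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $cells .= '<Cell><Data ss:Type="String">' . $text . '</Data></Cell>';
    }
    return '<Row>' . $cells . '</Row>';
}

// Hypothetical helper: wrap the rows in the same namespaces as the template above.
function build_workbook(array $rows, string $sheetName = 'sci'): string
{
    $name = htmlspecialchars($sheetName, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    $body = implode("\n", array_map('sheet_row', $rows));

    // The declaration is split so the PHP closing tag never appears literally.
    return '<?xml version="1.0" encoding="UTF-8"?' . '>' . "\n"
        . '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"'
        . ' xmlns:x="urn:schemas-microsoft-com:office:excel"'
        . ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"'
        . ' xmlns:html="http://www.w3.org/TR/REC-html40">' . "\n"
        . '<Worksheet ss:Name="' . $name . '"><Table>' . "\n"
        . $body . "\n"
        . '</Table></Worksheet></Workbook>' . "\n";
}

$xml = build_workbook([
    ['ISSN', '年文章', '投稿难易', '一审周期'], // header row copied from the template
    ['0000-0000', '', '', ''],                  // placeholder ISSN, not real data
]);
```

Writing `$xml` to a file with `file_put_contents()` would give a file that has the same layout as the template. Escaping each cell with `htmlspecialchars()` keeps values containing `&` or `<` from breaking the XML.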
UTF-8 encoding:
preg_match("/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u", $str)
GBK encoding:
preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str)
Matching specific Chinese characters (for example 候鸟):
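UTF-8 encoding:
<?php
header("Content-Type:text/html; charset=utf-8");
// This source file is assumed to be saved as GB2312/GBK, so the literal holds GBK bytes.
$gb = "可你跟随那南归的候鸟飞的那么远";
// Convert to UTF-8 so that the /u pattern can match by code point.
$utf8 = iconv('GB2312', 'UTF-8', $gb);
// \x{5019} and \x{9E1F} are the code points of 候 and 鸟.
preg_match("/\x{5019}\x{9E1F}/u", $utf8, $match1);
echo "<pre>";
print_r($match1);
echo "</pre>";
?>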
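GBK encoding:
<?php
// No UTF-8 header here: the page is served in the file's own GBK encoding.
//header("Content-Type:text/html; charset=utf-8");
// This source file is assumed to be saved as GBK, so both the subject and the pattern are GBK bytes.
$gb = "可你跟随那南归的候鸟飞的那么远";
// Without the /u modifier the pattern is compared byte by byte.
preg_match("/候鸟/", $gb, $match2);
echo "<pre>";
print_r($match2);
echo "</pre>";
?>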
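Both variants above depend on knowing in advance how the input is encoded. As a rough sketch, the two cases could be folded into one by normalizing the input to UTF-8 before matching. The helpers `to_utf8` and `contains_houniao` are hypothetical names introduced only for illustration. The sketch assumes that the mbstring extension is available, that the file is saved as UTF-8, and that input is either UTF-8 or GBK (`CP936` is mbstring's name for the GBK code page).

```php
<?php
// Hypothetical helper: return the text as UTF-8, converting from GBK
// only when the bytes are not already valid UTF-8.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', 'CP936');
}

// Hypothetical helper: true when the text contains 候鸟 (U+5019 U+9E1F).
function contains_houniao(string $text): bool
{
    return preg_match('/\x{5019}\x{9E1F}/u', to_utf8($text)) === 1;
}

$utf8 = "可你跟随那南归的候鸟飞的那么远";            // UTF-8 bytes (file saved as UTF-8)
$gbk  = mb_convert_encoding($utf8, 'CP936', 'UTF-8'); // the same sentence as GBK bytes

var_dump(contains_houniao($utf8));
var_dump(contains_houniao($gbk));
```

The validity check is only a heuristic: a short GBK byte sequence can occasionally also be valid UTF-8, so when the encoding of the scraped page is known it is safer to convert explicitly.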
(2) Run the myspider.php script
(3) Get the result table