Regex Notes

}
</script>
<script language=javascript>
ati('#', 'http://www.cnblogs.com/../UpLoadFile/Product/20101010162153846.jpg', '加厚青色围脖');
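To match across the line break between the <script> tag and the ati(...) call in the source above,
use the following: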
<script language=javascript>\s\sati
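Two \s are needed here, presumably because the line break is \r\n (two whitespace characters).
You can also use \s*? for a more general pattern that does not depend on the line-ending style.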
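
A small sketch of the difference, assuming the page uses Windows-style \r\n line endings (the variable names are mine, and the sample is just the snippet above pasted into a string):

```python
import re

# The snippet from above, with \r\n line endings (an assumption about the page).
html = (
    "}\r\n"
    "</script>\r\n"
    "<script language=javascript>\r\n"
    "ati('#', 'http://www.cnblogs.com/../UpLoadFile/Product/20101010162153846.jpg', '加厚青色围脖');\r\n"
)

# Exactly two whitespace characters: fits \r\n, but not a bare \n.
strict = re.compile(r"<script language=javascript>\s\sati")
# Any run of whitespace: fits \r\n, \n, or extra indentation.
loose = re.compile(r"<script language=javascript>\s*ati")

# The same source with Unix-style line endings.
unix_html = html.replace("\r\n", "\n")

strict_on_crlf = strict.search(html) is not None
loose_on_crlf = loose.search(html) is not None
strict_on_lf = strict.search(unix_html) is not None
loose_on_lf = loose.search(unix_html) is not None

print(strict_on_crlf, loose_on_crlf, strict_on_lf, loose_on_lf)
```

The strict pattern stops matching as soon as the line ending changes, which is why the \s* form is the safer default.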
<tr><td><a href='(?P<link>/Product/Detail_\d*\.html)'[\s\S]*?><img src='(?P<img>[^']*)' width='130' height='130'
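Relying too heavily on [\s\S]* causes excessive backtracking, which can make the program hang. The pattern above is my improved version; the earlier one just kept hanging. I did not save that original, which used [\s\S]*? everywhere. I recommend replacing it with a negated character class such as [^']* wherever the value is delimited by a known character.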
Use
<div class="goodsItem">[\s\S]*?<a href="(?P<link>[^"]*?)" target="_blank"><img src="(?P<img>[^"]*?)"
rather than
<div class="goodsItem">[\s\S]*?<a href="(?P<link>[\s\S]*?)" target="_blank"><img src="(?P<img>[\s\S]*?)"
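
Here is a rough sketch of the recommended form in use. The markup is made up to fit the pattern (it is not the original page), and the names html, pattern and items are only for illustration:

```python
import re

# Hypothetical markup shaped like the pattern above; not the real page.
html = '''
<div class="goodsItem">
    <a href="/goods-1.html" target="_blank"><img src="/images/1_thumb.jpg" alt="" /></a>
</div>
<div class="goodsItem">
    <a href="/goods-2.html" target="_blank"><img src="/images/2_thumb.jpg" alt="" /></a>
</div>
'''

# [^"]* cannot run past the closing quote, so each attribute value has a
# hard upper bound and a failed attempt gives up quickly.
pattern = re.compile(
    r'<div class="goodsItem">[\s\S]*?'
    r'<a href="(?P<link>[^"]*?)" target="_blank">'
    r'<img src="(?P<img>[^"]*?)"'
)

items = [(m.group("link"), m.group("img")) for m in pattern.finditer(html)]
print(items)
```

With [\s\S]*? inside the quotes, a near miss lets the engine keep extending the value across the rest of the document before giving up, which is where the time goes.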
re.finditer(pattern, string[, flags])

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
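
A minimal sketch of finditer applied to the product-list pattern from earlier in these notes. The two rows are invented for the example, and I use [^>]*> instead of [\s\S]*?> to skip the remaining attributes of the <a> tag, which is my own substitution:

```python
import re

# Hypothetical rows shaped like the product-list markup; not the real page.
html = (
    "<tr><td><a href='/Product/Detail_101.html' title='a'>"
    "<img src='/img/101.jpg' width='130' height='130' /></a></td></tr>\n"
    "<tr><td><a href='/Product/Detail_102.html' title='b'>"
    "<img src='/img/102.jpg' width='130' height='130' /></a></td></tr>\n"
)

pattern = re.compile(
    r"<tr><td><a href='(?P<link>/Product/Detail_\d*\.html)'[^>]*>"
    r"<img src='(?P<img>[^']*)' width='130' height='130'"
)

# finditer yields one match object per non-overlapping match, left to right.
results = []
for m in pattern.finditer(html):
    results.append((m.group("link"), m.group("img")))

print(results)
```

Compared with findall, each item is a full match object, so named groups and positions (m.start(), m.end()) stay available.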

7.2.6.9. Raw String Notation

Raw string notation (r"text") keeps regular expressions sane. Without it, every backslash ('\') in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical:

>>> re.match(r"\W(.)\1\W", " ff ")
<_sre.SRE_Match object at ...>
>>> re.match("\\W(.)\\1\\W", " ff ")
<_sre.SRE_Match object at ...>

When one wants to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means r"\\". Without raw string notation, one must use "\\\\", making the following lines of code functionally identical:

>>> re.match(r"\\", r"\\")
<_sre.SRE_Match object at ...>
>>> re.match("\\\\", r"\\")
<_sre.SRE_Match object at ...>


Update, 2010-10-15
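For markup such as

<div class="listPic"><a href="/?mod=goods&amp;do=display&amp;id=2032&amp;sid=f11ee838a106889a37abf4e9227a03fe" target="_blank"><img src='/upload/photobase/2010-09/100924112121_s.jpg' border="0" title="新款 银色小雏菊三叶草满钻白色珍珠开口戒指" /></a>

we can use backreferences so that the closing quote of each attribute matches its opening quote. Note that every parenthesized group is a capturing group: link and img are named groups, while ("|') is not explicitly named, but all of them take a number, counting from 1. So in the pattern below, the first ("|') is group 1, (?P<link>...) is group 2, the backreference \1 does not take a number, the second ("|') is group 3, and (?P<img>...) is group 4.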
<div class="listPic"><a[\s\S]*?href=("|')(?P<link>[^"]*?)\1[\s\S]*?<img[\s\S]*?src=("|')(?P<img>[^"]*?)\3
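
To double-check the numbering, here is a short sketch that runs this pattern against a shortened copy of the sample markup (I dropped the sid parameter and the title attribute to keep the line short; the variable names are mine):

```python
import re

# Shortened copy of the sample: href uses double quotes, src uses single quotes.
html = '''<div class="listPic"><a href="/?mod=goods&amp;do=display&amp;id=2032" target="_blank"><img src='/upload/photobase/2010-09/100924112121_s.jpg' border="0" /></a>'''

pattern = re.compile(
    r'''<div class="listPic"><a[\s\S]*?href=("|')(?P<link>[^"]*?)\1[\s\S]*?<img[\s\S]*?src=("|')(?P<img>[^"]*?)\3'''
)

m = pattern.search(html)

# Named groups are numbered together with the unnamed ones.
numbering = dict(pattern.groupindex)   # name -> group number
quote_a = m.group(1)                   # quote character used by href
quote_img = m.group(3)                 # quote character used by src
link = m.group("link")                 # same value as m.group(2)
img = m.group("img")                   # same value as m.group(4)

print(numbering, quote_a, quote_img, link, img)
```

pattern.groupindex is a handy way to see which number a named group ended up with before writing a backreference like \3.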

 
