Everyday Scripting with Ruby: Reading Notes (5)

---
Scraping Web Pages with Regular Expressions
---


irb(main):001:0> require 'open-uri'
=> true
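① With open-uri loaded, the open method can open URLs as well as files. (On Ruby 3.0 and later, use URI.open for URLs; plain open no longer accepts them.)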
irb(main):002:0> page = open('http://espn.go.com')
=> #<File:C:/DOCUME~1/GRAY~1.LAN/LOCALS~1/Temp/open-uri.2544.0>
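② When two statements share a line, irb prints only the value of the last one, so appending "; nil" suppresses the long page dump.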
irb(main):003:0> text = page.read; nil
=> nil
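
The session above relies on the older behavior where open accepts a URL directly. As a minimal sketch for newer Ruby versions, the same fetch can be written with URI.open. The helper name fetch_text is hypothetical and not from the book, and the call is left commented out so that nothing touches the network unless you enable it.

```ruby
require 'open-uri'

# Hypothetical helper (not from the book): download a page and
# return its body as a String. The block form closes the handle.
def fetch_text(url)
  URI.open(url) { |page| page.read }
end

# Usage, assuming the site is reachable:
# text = fetch_text('http://espn.go.com')
# puts text.length
```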
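③ The scan method returns every substring that matches the given regular expression. If the pattern contains groups, each element is itself an array of the captured groups.
④ A question mark after a quantifier (as in .*?) makes it non-greedy.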

irb(main):014:0> text.scan(/<li\s+class=\"(lo.*?)\".*><a.+>(.+)<\/a>/)
=> [["lo", "ESPN"], ["lo", "Fantasy"], ["lo", "NFL"], ["lo solid", "MLB"], ["lo solid", "NBA"], ["lo solid", "NHL"], ["lo solid", "ESPNU"], ["lo solid", "College FB"], ["lo solid", "Men's BB"], ["lo solid", "Women's BB"], ["lo solid", "NASCAR"], ["lo solid", "Racing"], ["lo solid", "Golf"], ["lo solid", "Soccer"], ["lo solid", "High School"], ["lo solid", "Tennis"], ["lo solid", "Boxing"], ["lo solid", "More +"]]
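
To see notes ③ and ④ without downloading anything, here is a small sketch that runs the same kind of pattern on a made-up HTML fragment. The markup in html is an assumption for illustration, not the real ESPN page.

```ruby
# Hypothetical HTML standing in for the downloaded page text.
html = '<li class="lo"><a href="/a">ESPN</a></li>' + "\n" +
       '<li class="lo solid"><a href="/b">MLB</a></li>'

# No groups: scan returns an array of the matched substrings.
classes = html.scan(/class="[^"]*"/)

# Two groups: scan returns one [group1, group2] array per match.
pairs = html.scan(/<li\s+class="(lo.*?)".*><a.+>(.+)<\/a>/)

# Greedy vs. non-greedy capture on the same input.
attrs  = 'class="lo" id="x"'
greedy = attrs[/"(.*)"/, 1]   # runs to the last quote
lazy   = attrs[/"(.*?)"/, 1]  # stops at the first closing quote

p classes
p pairs
p greedy
p lazy
```

The non-greedy (lo.*?) is what keeps the first capture from running past the closing quote of the class attribute.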

Hashes
irb(main):002:0> hash = {}
=> {}
irb(main):003:0> hash[:firstname] = "George"
=> "George"
irb(main):004:0> hash[:lastname] = "Hamilton"
=> "Hamilton"
irb(main):005:0> hash[:firstname]
=> "George"
irb(main):006:0> hash2 = {:firstname => "Tom", :lastname => "Johnson"}
=> {:firstname=>"Tom", :lastname=>"Johnson"}
irb(main):007:0> hash
=> {:firstname=>"George", :lastname=>"Hamilton"}
irb(main):008:0>
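
A few more hash operations that follow naturally from the session above, as a rough sketch. The variable names and the :age key are made up for illustration.

```ruby
# Ruby 1.9+ shorthand; equivalent to {:firstname => "George", ...}.
person = { firstname: "George", lastname: "Hamilton" }

# Looking up a missing key returns nil rather than raising.
missing = person[:age]

# fetch lets you supply a default for a missing key.
fallback = person.fetch(:age, 0)

# Values can be combined like any other strings.
full_name = "#{person[:firstname]} #{person[:lastname]}"

# each yields key/value pairs in insertion order.
person.each { |key, value| puts "#{key}: #{value}" }
```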

String slicing (a two-dot range includes the end index; a three-dot range excludes it)
irb(main):008:0> "abcdefg"[0..2]
=> "abc"
irb(main):009:0> "abcdefg"[0...2]
=> "ab"
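
The two range forms above, plus a couple of related slicing forms, can be compared side by side in this short sketch. The variable names are illustrative only.

```ruby
s = "abcdefg"

inclusive = s[0..2]    # indices 0, 1 and 2
exclusive = s[0...2]   # indices 0 and 1 only
tail      = s[-3..-1]  # negative indices count from the end
by_length = s[2, 3]    # start at index 2, take 3 characters

puts inclusive, exclusive, tail, by_length
```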
