The file 1.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<dataroot xmlns:od="urn:schemas-microsoft-com:officedata" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="book1.xsd" generated="2008-12-28T19:12:24">
<book1>
<bianhao>GAR001</bianhao>
<name>计算机组装与维护教程</name>
<借阅时间>2008-03-14T17:23:28</借阅时间>
<author>刘瑞新</author>
<publish>机械工业出版社</publish>
<count>1</count>
<language>中文</language>
<manager>ctec</manager>
</book1>
<book1>
<bianhao>GAR002</bianhao>
<name>计算机接口技术</name>
<借阅时间>2008-03-14T17:27:16</借阅时间>
<author>刘星等</author>
<publish>机械工业出版社</publish>
<count>1</count>
<language>中文</language>
<manager>ctec</manager>
</book1>
<book1>
<bianhao>GAR003</bianhao>
<name>数值分析与算法</name>
<借阅时间>2008-03-14T17:28:50</借阅时间>
<author>徐士良</author>
<publish>机械工业出版社</publish>
<count>1</count>
<language>中文</language>
<manager>ctec</manager>
</book1>
</dataroot>
Contents of ruby.rb:
require 'rexml/document'

# Parse 1.xml and print selected fields of every <book1> element.
xml = REXML::Document.new(File.read('1.xml'))
xml.each_element('//book1') do |newbook|
  puts newbook.elements['bianhao'].text
  puts newbook.elements['name'].text
  puts newbook.elements['author'].text
  puts newbook.elements['publish'].text
  puts newbook.elements['count'].text
  puts newbook.elements['language'].text
  puts newbook.elements['manager'].text
end
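As a variation on the script above, the fields of each book can be collected into a Hash instead of being printed one by one. This is only a minimal sketch: the inline SAMPLE_XML is a trimmed copy of 1.xml so that it runs without the file, and the helper name books_from is hypothetical, not part of REXML.

```ruby
require 'rexml/document'

# Trimmed inline copy of 1.xml (an assumption made so the sketch is self-contained).
SAMPLE_XML = <<~XML
  <dataroot>
    <book1>
      <bianhao>GAR001</bianhao>
      <name>计算机组装与维护教程</name>
      <author>刘瑞新</author>
    </book1>
    <book1>
      <bianhao>GAR002</bianhao>
      <name>计算机接口技术</name>
      <author>刘星等</author>
    </book1>
  </dataroot>
XML

# Hypothetical helper: return one Hash per <book1>, keyed by child tag name.
def books_from(xml_text)
  doc = REXML::Document.new(xml_text)
  REXML::XPath.match(doc, '//book1').map do |book|
    row = {}
    book.each_element { |child| row[child.name] = child.text }
    row
  end
end

books_from(SAMPLE_XML).each do |row|
  puts row.values_at('bianhao', 'name', 'author').join(' | ')
end
```

Keying by tag name means a newly added column in the XML shows up in the Hash without any change to the code.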
Reposted from:
http://zmfbird.iteye.com/blog/306174
Using Nokogiri, the Ruby HTML Parsing Library, in Practice
2010-05-20 10:27
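There are many write-ups on Ruby libraries for parsing XML and HTML; the best known are Hpricot and Nokogiri. Of the two, Nokogiri is the recommended choice (see http://www.espace.com.eg/blog/2009/03/24/nokogiri-vs-hpricot/).
Installation instructions and documentation: http://nokogiri.org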
Nokogiri (Github repository), a new HTML and XML parser for Ruby. It "parses and searches XML/HTML faster than Hpricot" (Hpricot being the current de facto Ruby HTML parser) and boasts XPath support, CSS3 selector support (a big deal, because CSS3 selectors are mega powerful) and the ability to be used as a "drop in" replacement for Hpricot.
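Overall, Nokogiri is one of the better HTML/XML parsing libraries for Ruby.
Installation: gem install nokogiri
Note: you may sometimes hit libxml2-related errors, such as "libxml2.dll could not be found" or "The procedure entry point xmlNewDocPI could not be located in the dynamic link library libxml2.dll". In that case, go to the nokogiri installation directory C:\Ruby18\lib\ruby\gems\1.8\gems\nokogiri-1.4.1-x86-mswin32\, find the DLL files under ext\nokogiri\, and copy them to C:\Ruby18\bin.
Test: create a Ruby file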
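require 'nokogiri'
require 'open-uri'

# Note: on Ruby 3.0 and later, open-uri no longer handles URLs through
# Kernel#open; call URI.open(...) instead of open(...).
doc = Nokogiri::HTML(open('http://zhangxh.net/hr/quiz/1/bar_list.html'))
# '*' selects the child elements of the document node, i.e. the root <html>.
doc.xpath('*').each do |link|
  puts link.content
end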
Run it and you get the corresponding output.
page.css('td') # should return an array of 4 table cell nodes
page.css('td')[3] # return the 4th 'td' node, counting starts at 0
http://nokogiri.rubyforge.org/nokogiri/Nokogiri/XML/Document.html
Reposted from:
http://hi.baidu.com/kenrome/blog/item/269730120149795df919b824.html
Code I wrote myself:
require 'nokogiri'
require 'open-uri'

def getXML()
  # Fetch a document from "intsig.net" whose body is XML, for example:
  # <card_list>
  # <card t="1" cid = "21212">http://fifjeiaofjoe//</card>
  # </card_list>
  # The HTML parser wraps the content in <html><body>, hence the XPath below.
  doc = Nokogiri::HTML(open('http://intsig.net'))
  # puts(doc)
  doc.search('//html/body/card_list/card').each do |card|
    puts(card.text)
    t = card.attribute("t")
    cid = card.attribute("cid")
    puts(t)
    puts(cid)
  end
end