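I had some free time recently and got curious about file uploads, so I decided to write a Java file-upload utility class myself instead of relying on a third-party upload package. After some digging into the HTTP protocol, I finally have a small result to show.
Here is the form snippet from the page: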
<form action="gb/test.do" method="post" enctype="multipart/form-data">
    <h1>Hello World!</h1>
    name:<input type="text" name="name" value="" /><br>
    address:<input type="text" name="address" value="" /><br>
    age:<input type="text" name="age" value="" /><br>
    photo1:<input type="file" name="f1"><br>
    photo2:<input type="file" name="f2"><br>
    captcha:<input type="text" name="image"><img src="RandomImageServlet">
    <input type="submit">
</form>
After filling in the form and submitting it, HttpWatch shows the following request:
POST /testWeb/gb/test.do HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, */*
Referer: http://localhost:8084/testWeb/
Accept-Language: zh-cn
Content-Type: multipart/form-data; boundary=---------------------------7db1e4183026c
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.3; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Host: localhost:8084
Content-Length: 7647320
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: JSESSIONID=FEF31749D9D8CBE22C2A49C7A6799A15

-----------------------------7db1e4183026c
Content-Disposition: form-data; name="name"

小强
-----------------------------7db1e4183026c
Content-Disposition: form-data; name="address"

辽宁大连
-----------------------------7db1e4183026c
Content-Disposition: form-data; name="age"

23
-----------------------------7db1e4183026c
Content-Disposition: form-data; name="f1"; filename="C:\Documents and Settings\mark\桌面\JavaEE_CN.chm"
Content-Type: application/octet-stream

ITSF ... (binary file content omitted) ...
-----------------------------7db1e4183026c
Content-Disposition: form-data; name="f2"; filename="C:\Documents and Settings\mark\桌面\pietty0327.exe"
Content-Type: application/octet-stream

MZ ... This program cannot be run in DOS mode. ... (binary file content omitted) ...
-----------------------------7db1e4183026c
Content-Disposition: form-data; name="image"

7hti
-----------------------------7db1e4183026c--
Let's take a closer look at the data above:
Content-Type: multipart/form-data; boundary=---------------------------7db1e4183026c
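Note the boundary parameter here. As the word suggests, it marks a dividing line, and its value "
---------------------------7db1e4183026c
" is the delimiter that splits the form data into separate parts. Look closely, though, and you will see that each delimiter in the body is 2 bytes longer than this value: it has two extra leading "-" characters. Each delimiter in the body also ends with a carriage return/line feed pair (the protocol requires lines to end with CR/LF), which takes another two bytes. Also pay attention to the last delimiter, the terminator: it has two more "-" than the delimiters before it. But note that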
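terminator != delimiter + "--";
but rather
terminator = delimiter - CR/LF + "--" + CR/LF!
(Here "delimiter" means the delimiter line as it appears in the body, counted together with its trailing CR/LF.)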
I suspect many people get this wrong the first time.
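To make the byte counts concrete, here is a minimal sketch that derives both lines from the Content-Type header. The class name BoundaryLines and its methods are hypothetical helpers made up for illustration, not part of any library, and the header parsing is deliberately simplified (it assumes the parameter is spelled exactly "boundary=").

```java
final class BoundaryLines {
    private static final String CRLF = "\r\n";

    /** Pulls the boundary parameter out of a Content-Type header value (simplified parsing). */
    static String extractBoundary(String contentType) {
        String key = "boundary=";
        int start = contentType.indexOf(key);
        if (start < 0) {
            throw new IllegalArgumentException("no boundary parameter: " + contentType);
        }
        String value = contentType.substring(start + key.length());
        int semicolon = value.indexOf(';');
        if (semicolon >= 0) {
            value = value.substring(0, semicolon);
        }
        value = value.trim();
        // The value may be wrapped in double quotes; strip them if present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return value;
    }

    /** The line that starts every part: two hyphens, the boundary, then CR/LF. */
    static String delimiterLine(String boundary) {
        return "--" + boundary + CRLF;
    }

    /** The line that ends the body: the extra "--" goes before the CR/LF, not after it. */
    static String closeDelimiterLine(String boundary) {
        return "--" + boundary + "--" + CRLF;
    }

    public static void main(String[] args) {
        String header = "multipart/form-data; boundary=---------------------------7db1e4183026c";
        String boundary = extractBoundary(header);
        String delimiter = delimiterLine(boundary);
        String terminator = closeDelimiterLine(boundary);
        System.out.println(delimiter.length() - boundary.length()); // 4: "--" plus CR and LF
        System.out.println(terminator.equals(delimiter + "--"));    // false
    }
}
```

Comparing raw lines against these two strings (or their byte forms) avoids the off-by-two mistakes described above.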
We can also see that the part for an uploaded file looks like this:
Content-Disposition: form-data; name="f1"; filename="C:\Documents and Settings\mark\桌面\JavaEE_CN.chm"
Content-Type: application/octet-stream
A plain text field, by contrast, has no filename="..." and no Content-Type line below it:
Content-Disposition: form-data; name="address"
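The difference between the two kinds of parts can be sketched as a small header parser. PartHeader below is a hypothetical class written only for illustration. It assumes parameter values are double-quoted and contain no escaped quotes, which holds for the request captured above but is not guaranteed for every client.

```java
final class PartHeader {
    final String name;      // the form field name
    final String fileName;  // null for an ordinary text field

    PartHeader(String name, String fileName) {
        this.name = name;
        this.fileName = fileName;
    }

    /** A part is a file upload only if its header carries a filename parameter. */
    boolean isFile() {
        return fileName != null;
    }

    /** Returns the quoted value of the given parameter, or null if it is absent. */
    static String param(String header, String key) {
        String marker = key + "=\"";
        int from = 0;
        while (true) {
            int start = header.indexOf(marker, from);
            if (start < 0) {
                return null;
            }
            // Reject a match inside a longer word, e.g. name=" at the end of filename=".
            boolean atTokenStart = start == 0
                    || !Character.isLetterOrDigit(header.charAt(start - 1));
            if (atTokenStart) {
                int valueStart = start + marker.length();
                int end = header.indexOf('"', valueStart);
                return end < 0 ? null : header.substring(valueStart, end);
            }
            from = start + 1;
        }
    }

    static PartHeader parse(String contentDisposition) {
        return new PartHeader(param(contentDisposition, "name"),
                              param(contentDisposition, "filename"));
    }

    public static void main(String[] args) {
        PartHeader field = parse("Content-Disposition: form-data; name=\"address\"");
        PartHeader file = parse("Content-Disposition: form-data; name=\"f1\"; filename=\"a.chm\"");
        System.out.println(field.name + " isFile=" + field.isFile());
        System.out.println(file.name + " isFile=" + file.isFile() + " fileName=" + file.fileName);
    }
}
```

Checking for the filename parameter directly is one possible way to tell the two cases apart; looking for a following Content-Type line is another.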
Our job is to process this data stream and extract the parameter names and parameter values from it. Here is the Java code that does so:
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.HashMap;
import javax.servlet.ServletInputStream;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

/**
 *
 * @author mark
 */
public class UploadFile {

    private static Log log = LogFactory.getLog(UploadFile.class);

    /**
     * File-upload component. A servlet must call request.setCharacterEncoding()
     * before using this method, with the same encoding as the page.
     * @param sis the request input stream
     * @param encoding character encoding; must match the JSP page encoding, otherwise text is garbled
     * @param length length of the stream in bytes
     * @param upLoadPath directory in which uploaded files are saved
     * @throws FileNotFoundException
     * @throws IOException
     */
    public static HashMap<String, String> uploadFile(ServletInputStream sis, String encoding, int length, String upLoadPath) throws IOException {
        HashMap<String, String> paramMap = new HashMap<String, String>();
        boolean isFirst = true;
        String boundary = null; // delimiter line, including its trailing CR/LF
        byte[] tmpBytes = new byte[4096]; // bytes of the line most recently read
        int[] readBytesLength = new int[1]; // readBytesLength[0] holds the number of bytes the last readLine() call actually read
        int readStreamlength = 0; // number of bytes consumed from the stream so far
        String tmpString = readLine(tmpBytes, readBytesLength, sis, encoding);
        readStreamlength = readStreamlength + readBytesLength[0];
        while (tmpString != null && readStreamlength < length) {
            if (isFirst) {
                // The first line of the body is always a delimiter line.
                boundary = tmpString;
                isFirst = false;
            }
            if (tmpString.equals(boundary)) {
                String contentDisposition = readLine(tmpBytes, readBytesLength, sis, encoding);
                readStreamlength = readStreamlength + readBytesLength[0];
                String contentType = readLine(tmpBytes, readBytesLength, sis, encoding);
                readStreamlength = readStreamlength + readBytesLength[0];
                if (contentDisposition == null) {
                    break; // stream ended or could not be read
                }
                // For a file part this line is the Content-Type header; for a text field it is just the blank line.
                if (contentType != null && contentType.trim().length() != 0) {
                    String paramName = getParamName(contentDisposition);
                    String fileName = getFileName(getFilePath(contentDisposition));
                    paramMap.put(paramName, fileName);
                    // Skip the blank line between the part headers and the content.
                    readLine(tmpBytes, readBytesLength, sis, encoding);
                    readStreamlength = readStreamlength + readBytesLength[0];
                    // A non-empty file name means a file was actually uploaded.
                    if (fileName != null && fileName.trim().length() != 0) {
                        fileName = upLoadPath + fileName;
                        byte[] cache = new byte[4096]; // holds the previous chunk until we know what follows it
                        int flag = 0; // number of valid bytes in cache
                        /*
                         * The terminator is not simply the delimiter line with "--" appended.
                         * The delimiter line is "-----------------------------45931489520280" followed by an invisible CR/LF (0D 0A);
                         * the terminator is "-----------------------------45931489520280--" followed by CR/LF (0D 0A).
                         * The delimiter line without its CR/LF is therefore a prefix of both.
                         */
                        String marker = boundary.substring(0, boundary.length() - 2);
                        FileOutputStream fos = new FileOutputStream(fileName);
                        try {
                            tmpString = readLine(tmpBytes, readBytesLength, sis, encoding);
                            readStreamlength = readStreamlength + readBytesLength[0];
                            while (tmpString != null && tmpString.indexOf(marker) == -1) {
                                System.arraycopy(tmpBytes, 0, cache, 0, readBytesLength[0]);
                                flag = readBytesLength[0];
                                tmpString = readLine(tmpBytes, readBytesLength, sis, encoding);
                                readStreamlength = readStreamlength + readBytesLength[0];
                                if (tmpString == null || tmpString.indexOf(marker) == -1) {
                                    fos.write(cache, 0, flag);
                                } else {
                                    // The CR/LF just before a delimiter belongs to the delimiter, not to the file.
                                    // Math.max guards the rare case where that CR/LF was split across two reads.
                                    fos.write(cache, 0, Math.max(flag - 2, 0));
                                }
                            }
                        } finally {
                            fos.close();
                        }
                    } else {
                        // No file was selected: skip the empty content line
                        readLine(tmpBytes, readBytesLength, sis, encoding);
                        readStreamlength = readStreamlength + readBytesLength[0];
                        // then read the next delimiter or the terminator.
                        tmpString = readLine(tmpBytes, readBytesLength, sis, encoding);
                        readStreamlength = readStreamlength + readBytesLength[0];
                    }
                } else {
                    // Not a file upload: an ordinary form field.
                    String paramName = getParamName(contentDisposition);
                    String value = readLine(tmpBytes, readBytesLength, sis, encoding);
                    readStreamlength = readStreamlength + readBytesLength[0];
                    // Strip the trailing CR/LF (the last two bytes) from the raw bytes.
                    if (value != null && readBytesLength[0] >= 2) {
                        int valueLength = readBytesLength[0] - 2;
                        value = (encoding == null)
                                ? new String(tmpBytes, 0, valueLength)
                                : new String(tmpBytes, 0, valueLength, encoding);
                    }
                    paramMap.put(paramName, value);
                    tmpString = readLine(tmpBytes, readBytesLength, sis, encoding);
                    readStreamlength = readStreamlength + readBytesLength[0];
                }
            } else {
                // Not a delimiter line (for example, the rest of a multi-line value): keep reading so the loop cannot spin forever.
                tmpString = readLine(tmpBytes, readBytesLength, sis, encoding);
                readStreamlength = readStreamlength + readBytesLength[0];
            }
        }
        sis.close();
        return paramMap;
    }

    /**
     * Reads one line from the stream.
     * @param bytes byte array that receives the bytes read from the stream
     * @param index single-element int array; index[0] receives the number of bytes actually read
     * @param sis the input stream
     * @param encoding encoding used to build the string
     * @return the bytes read, decoded with the given encoding, or null at end of stream or on error
     */
    private static String readLine(byte[] bytes, int[] index, ServletInputStream sis, String encoding) {
        try {
            // readLine() stores what it reads in bytes[0..bytes.length) and returns the number of bytes actually read.
            index[0] = sis.readLine(bytes, 0, bytes.length);
            if (index[0] < 0) {
                return null;
            }
        } catch (IOException e) {
            log.error("read line ioexception", e);
            return null;
        }
        if (encoding == null) {
            return new String(bytes, 0, index[0]);
        } else {
            try {
                return new String(bytes, 0, index[0], encoding);
            } catch (UnsupportedEncodingException ex) {
                log.error("Unsupported Encoding", ex);
                return null;
            }
        }
    }

    private static String getParamName(String contentDisposition) {
        String s = contentDisposition.substring(contentDisposition.indexOf("name=\"") + 6);
        s = s.substring(0, s.indexOf('\"'));
        return s;
    }

    private static String getFilePath(String contentDisposition) {
        String s = contentDisposition.substring(contentDisposition.indexOf("filename=\"") + 10);
        s = s.substring(0, s.indexOf('\"'));
        return s;
    }

    private static String getFileName(String filePath) {
        String rtn = null;
        if (filePath != null) {
            // Whether the name contains "/" is used to guess the browser type.
            int index = filePath.lastIndexOf("/");
            if (index != -1) {
                // Contains "/": assume the file was uploaded by Firefox.
                rtn = filePath.substring(index + 1); // file name
            } else {
                // No "/": assume the file was uploaded by IE, which sends a Windows path.
                index = filePath.lastIndexOf("\\");
                if (index != -1) {
                    rtn = filePath.substring(index + 1); // file name
                } else {
                    rtn = filePath;
                }
            }
        }
        return rtn;
    }
}
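The whole parser rests on how ServletInputStream.readLine() behaves: it stops after a line feed or when the buffer is full, whichever comes first, so a long run of binary data arrives in several chunks. The sketch below imitates that behaviour on a plain InputStream so it can be tried without a servlet container. LineReader is a hypothetical stand-in written for this post, not the servlet API itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class LineReader {

    /**
     * Reads bytes into buf until a '\n' has been stored or buf is full.
     * Returns the number of bytes stored, or -1 if the stream was already at its end.
     */
    static int readLine(InputStream in, byte[] buf) throws IOException {
        if (buf.length == 0) {
            return 0;
        }
        int count = 0;
        while (count < buf.length) {
            int c = in.read();
            if (c == -1) {
                break; // end of stream
            }
            buf[count++] = (byte) c;
            if (c == '\n') {
                break; // the line feed is kept in the buffer
            }
        }
        return count == 0 ? -1 : count;
    }

    public static void main(String[] args) throws IOException {
        // A tiny body: delimiter line, ten bytes of "file" data, terminator line.
        byte[] body = "--b\r\nabcdefghij\r\n--b--\r\n".getBytes(StandardCharsets.ISO_8859_1);
        InputStream in = new ByteArrayInputStream(body);
        byte[] buf = new byte[8]; // deliberately small so the data line is split
        int n;
        while ((n = readLine(in, buf)) != -1) {
            System.out.println(n + " bytes: "
                    + new String(buf, 0, n, StandardCharsets.ISO_8859_1).trim());
        }
    }
}
```

With the 8-byte buffer the data line comes back as two chunks, and only the second one carries the CR/LF that has to be dropped before the terminator. That is why the upload code holds each chunk back until it has seen the next one.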