PHP: reading a UTF-16LE file line by line and converting it to UTF-8

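UTF-16 stores each character in two bytes (characters outside the Basic Multilingual Plane take four bytes, as a surrogate pair).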

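UTF-8 is variable-width: a character takes one, two, or three bytes (four for characters outside the Basic Multilingual Plane).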
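To make the size difference concrete, here is a small sketch that compares byte lengths of the same character in both encodings. It assumes the mbstring extension is available; the helper name `byteLengths` is made up for this illustration.

```php
<?php
/**
 * Hypothetical helper: return the byte length of a UTF-8 string
 * in UTF-8 and in UTF-16LE (requires the mbstring extension).
 */
function byteLengths($utf8) {
    $utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
    return array('utf8' => strlen($utf8), 'utf16le' => strlen($utf16));
}

// Characters are written as explicit UTF-8 byte escapes so the result
// does not depend on the encoding of this source file.
$samples = array(
    'A'      => "A",            // U+0041
    'e-acute' => "\xC3\xA9",    // U+00E9
    'zhong'  => "\xE4\xB8\xAD", // U+4E2D
);
foreach ($samples as $name => $ch) {
    $n = byteLengths($ch);
    printf("%s: UTF-8 = %d byte(s), UTF-16LE = %d byte(s)\n", $name, $n['utf8'], $n['utf16le']);
}
```

All three samples are in the Basic Multilingual Plane, so each takes two bytes in UTF-16LE while the UTF-8 length grows from one to three.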

A UTF-16 file begins with a Byte Order Mark (BOM): the bytes FF FE for little-endian, or FE FF for big-endian.
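As a rough illustration of the BOM check, the sketch below looks at the first two bytes of a string. The function name `detectUtf16Bom` is hypothetical, and the check is deliberately naive: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE and would be reported as little-endian UTF-16 here.

```php
<?php
/**
 * Hypothetical helper: report the byte order implied by a leading UTF-16 BOM.
 * Returns 'LE', 'BE', or null when the string does not start with a UTF-16 BOM.
 */
function detectUtf16Bom($bytes) {
    $head = substr($bytes, 0, 2);
    if ($head === "\xFF\xFE") {
        return 'LE'; // little-endian: low byte first
    }
    if ($head === "\xFE\xFF") {
        return 'BE'; // big-endian: high byte first
    }
    return null;     // no BOM found
}

// "A" in UTF-16LE with a BOM in front
var_dump(detectUtf16Bom("\xFF\xFE" . "A\x00"));
```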

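private static function decodeHttpResponseUtf16LeXML( $fileName ) {
    $fp = fopen( $fileName, 'r' );
    if ( $fp === false ) {
        throw new Exception( 'Unable to open file!' );
    }
    /*** loop over the file pointer ***/
    $num = 0;
    while ( !feof( $fp ) ) {
        // In UTF-16LE a line feed is the two bytes 0x0A 0x00,
        // so the delimiter must include the trailing zero byte.
        $buffer = stream_get_line( $fp, 10000, chr(0x0a).chr(0) );
        // Only the first line of the file carries the BOM. Prepend one to every
        // later line so that utf16_to_utf8() can detect the byte order.
        if ( $num > 0 ) {
            $buffer = chr(0xFF).chr(0xFE).$buffer;
        }
        $string = self::utf16_to_utf8( $buffer );
        if ( !empty( $string ) ) {
            $arr = explode( ',', $string );
            $length = count( $arr );
            // Guard against rows with fewer than 11 fields.
            $pingyu = isset( $arr[10] ) ? trim( $arr[10] ) : '';
            // The comment field (index 10) may itself contain commas. When a row
            // has more than 15 fields, merge the extra pieces back into one field.
            if ( !empty( $pingyu ) && $length > 15 ) {
                $pingyuRealy = array_slice( $arr, 10, $length - 14 );
                $pingyuRealy = join( ',', $pingyuRealy );
                array_splice( $arr, 10, $length - 14, array( $pingyuRealy ) );
            }
            // The first row is skipped and only later rows are inserted.
            if ( $num > 0 ) {
                self::insertSql( $arr );
            }
            unset( $arr );
            $num++;
        }
    }
    fclose( $fp );
    // self::convert() and self::insertSql() are helpers not shown in this post.
    echo $num.'row', self::convert( memory_get_usage( true ) ), "\n";
    return $num;
}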
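The field-merging step in the loop above can be looked at on its own. The following is a minimal sketch of that idea, assuming a row should have 15 fields and that only the field at index 10 can contain stray commas. The name `mergeCommentFields` and its parameters are invented for this example, and unlike the loop above it does not check whether the field is empty.

```php
<?php
/**
 * Hypothetical helper: if a comma-split row has more fields than expected,
 * glue the surplus pieces starting at $index back into a single field.
 */
function mergeCommentFields(array $fields, $index = 10, $expected = 15) {
    $count = count($fields);
    if ($count <= $expected) {
        return $fields; // nothing to merge
    }
    // Number of pieces that belong to the one over-split field.
    $span = $count - $expected + 1;
    $merged = implode(',', array_slice($fields, $index, $span));
    array_splice($fields, $index, $span, array($merged));
    return $fields;
}

// A row of 17 pieces: the field at index 10 was split into 3 parts.
$row = array();
for ($i = 0; $i < 17; $i++) {
    $row[] = 'f' . $i;
}
$fixed = mergeCommentFields($row);
echo count($fixed), ' fields, field 10 = ', $fixed[10], "\n";
```

With the default arguments, `$span` equals `$length - 14`, which is the same slice length used in the loop.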


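/**
 * Convert a UTF-16 string made of 16-bit code units to UTF-8.
 * The string must start with a BOM; without one it is returned unchanged.
 * Surrogate pairs (code points above U+FFFF) are not combined: each half
 * is encoded separately.
 * @param string $str
 * @return string
 */
public static function utf16_to_utf8($str) {
    // Too short to hold a BOM: nothing to convert.
    if (strlen($str) < 2) {
        return $str;
    }
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);
    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;   // big-endian BOM
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $be = false;  // little-endian BOM
    } else {
        return $str;  // no BOM
    }
    $str = substr($str, 2);
    $len = strlen($str);
    $dec = '';
    // Requiring $i + 1 < $len ignores a trailing odd byte instead of reading past the end.
    for ($i = 0; $i + 1 < $len; $i += 2) {
        $c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) : ord($str[$i + 1]) << 8 | ord($str[$i]);
        if ($c <= 0x007F) {
            // One byte: 0xxxxxxx
            $dec .= chr($c);
        } else if ($c > 0x07FF) {
            // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            $dec .= chr(0xE0 | (($c >> 12) & 0x0F));
            $dec .= chr(0x80 | (($c >>  6) & 0x3F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        } else {
            // Two bytes: 110xxxxx 10xxxxxx
            $dec .= chr(0xC0 | (($c >>  6) & 0x1F));
            $dec .= chr(0x80 | (($c >>  0) & 0x3F));
        }
    }
    return $dec;
}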
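If the mbstring extension is available, the per-line conversion can probably be delegated to `mb_convert_encoding()` instead of a hand-written loop, and that also handles surrogate pairs. The sketch below is one possible shape of this, not the approach used above. The function name `utf16leLineToUtf8` is hypothetical, and it assumes the input is always little-endian.

```php
<?php
/**
 * Hypothetical alternative: convert one UTF-16LE line to UTF-8
 * with mbstring, stripping a leading little-endian BOM if present.
 */
function utf16leLineToUtf8($line) {
    if (strncmp($line, "\xFF\xFE", 2) === 0) {
        $line = substr($line, 2); // drop the BOM so it is not converted as a character
    }
    return mb_convert_encoding($line, 'UTF-8', 'UTF-16LE');
}

// BOM + "A" (U+0041) + U+4E2D, all little-endian
$utf8 = utf16leLineToUtf8("\xFF\xFE" . "A\x00" . "\x2D\x4E");
echo bin2hex($utf8), "\n";
```

Because the byte order is fixed by the encoding name, lines after the first need no BOM prepended.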



