Detecting a String's Encoding Type in PHP

Here are two functions I collected.

In my tests the first one turned out to be not quite good enough, although it gives the right answer most of the time.

  
    
// Heuristic: returns true if $word starts with, ends with, or contains two or
// more consecutive 3-byte sequences whose lead byte is 0xE4-0xE9 (chr(228)-chr(233)).
// It does not validate the whole string.
function is_utf8_bak( $word ) {
    // One sequence: lead byte 0xE4-0xE9, then two continuation bytes 0x80-0xBF.
    $seq = '[\xE4-\xE9][\x80-\xBF][\x80-\xBF]';
    return preg_match( '/^(?:' . $seq . ')/', $word ) === 1
        || preg_match( '/(?:' . $seq . ')$/', $word ) === 1
        || preg_match( '/(?:' . $seq . '){2,}/', $word ) === 1;
}
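To illustrate why this check is only right "most of the time", here is a minimal sketch. The name `has_cjk_like_sequence` is hypothetical; it is a standalone restatement of the heuristic so the snippet runs on its own, and the sample strings are my own assumptions, not the data from the original tests.

```php
<?php
// Hypothetical standalone copy of the heuristic: looks for 3-byte sequences
// with a lead byte 0xE4-0xE9 at the start, at the end, or twice in a row.
function has_cjk_like_sequence(string $word): bool {
    $seq = '[\xE4-\xE9][\x80-\xBF][\x80-\xBF]';
    return preg_match('/^(?:' . $seq . ')/', $word) === 1
        || preg_match('/(?:' . $seq . ')$/', $word) === 1
        || preg_match('/(?:' . $seq . '){2,}/', $word) === 1;
}

$samples = [
    'chinese'       => "\xE4\xB8\xAD\xE6\x96\x87", // "中文" encoded as UTF-8
    'ascii'         => 'hello',                    // plain ASCII, also valid UTF-8
    'latin'         => "caf\xC3\xA9",              // "café", uses a 2-byte sequence
    'mixed_invalid' => "\xE4\xB8\xAD\xFF",         // valid start, then a stray 0xFF byte
];

foreach ($samples as $label => $s) {
    echo $label, ': ', var_export(has_cjk_like_sequence($s), true), "\n";
}
```

The sketch reports `true` for `chinese` and `mixed_invalid`, and `false` for `ascii` and `latin`. So valid UTF-8 without characters in that 3-byte range is rejected, while a string containing an invalid byte is accepted as long as it starts with a matching sequence.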

The second one did somewhat better than the first in my tests.

  
    
// Returns true if $string is valid UTF-8 and false otherwise.
// Pattern from http://w3.org/International/questions/qa-forms-utf-8.html
function is_utf8( $string ) {
    // preg_match() returns 1, 0, or false, so compare with 1 to get a real boolean.
    return preg_match( '%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string ) === 1;
}
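As a rough usage sketch, the pattern can be run against a few byte strings to see what it accepts. The name `is_utf8_strict` is hypothetical (a standalone copy so the snippet runs by itself), and the sample inputs are assumptions of mine. The comparison with `mb_check_encoding()` only runs if the mbstring extension happens to be installed.

```php
<?php
// Hypothetical standalone copy of the W3C-based check.
function is_utf8_strict(string $string): bool {
    return preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string) === 1;
}

$samples = [
    'utf8'     => "\xE4\xB8\xAD\xE6\x96\x87", // "中文" encoded as UTF-8
    'gbk'      => "\xD6\xD0\xCE\xC4",         // "中文" encoded as GBK
    'ascii'    => 'hello',
    'overlong' => "\xC0\xAF",                 // overlong encoding of "/"
    'escape'   => "\x1B[0m",                  // ESC control character plus ASCII
];

foreach ($samples as $label => $s) {
    $line = $label . ': regex=' . var_export(is_utf8_strict($s), true);
    // Optional cross-check against the mbstring built-in, if available.
    if (function_exists('mb_check_encoding')) {
        $line .= ' mbstring=' . var_export(mb_check_encoding($s, 'UTF-8'), true);
    }
    echo $line, "\n";
}
```

The regex column is `true` for `utf8` and `ascii`, and `false` for `gbk`, `overlong`, and `escape`. The last case shows that the pattern is deliberately stricter than plain UTF-8 validity: its ASCII class only allows tab, line feed, carriage return, and printable characters, so other control characters are rejected.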
