引用
invalid byte sequence in utf-8 mysql
In Ruby 1.9.3 is possible to use String.encode to "ignore" the invalid UTF-8 sequences. Here is a snippet that will work both in 1.8 (iconv) and 1.9 (String#encode) :
require 'iconv' unless String.method_defined?(:encode) if String.method_defined?(:encode) file_contents.encode!('UTF-8', 'UTF-8', :invalid => :replace) else ic = Iconv.new('UTF-8', 'UTF-8//IGNORE') file_contents = ic.iconv(file_contents) end
or if you have really troublesome input you can do a double conversion from UTF-8 to UTF-16 and back to UTF-8:
require 'iconv' unless String.method_defined?(:encode) if String.method_defined?(:encode) file_contents.encode!('UTF-16', 'UTF-8', :invalid => :replace, :replace => '') file_contents.encode!('UTF-8', 'UTF-16') else ic = Iconv.new('UTF-8', 'UTF-8//IGNORE') file_contents = ic.iconv(file_contents) end