Nutch 2.1 + MySQL: error and solution

Error message:
java.io.IOException: java.sql.BatchUpdateException: Incorrect string value: '\xD6\xD0\xB9\xFA\xB9\xA4...' for column 'content' at row 1
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.sql.BatchUpdateException: Incorrect string value: '\xD6\xD0\xB9\xFA\xB9\xA4...' for column 'content' at row 1
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1666)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1082)
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
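
The escaped bytes in the message hint at the cause. The sketch below is only an illustration, under the assumption (suggested by the byte values, not stated in the original post) that the fetched page was encoded in GBK: the first four bytes `\xD6\xD0\xB9\xFA` are not valid UTF-8, but they decode cleanly as GBK. The class and method names are hypothetical, and the GBK charset is assumed to be available in the running JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // First four bytes quoted in the error message: \xD6\xD0\xB9\xFA
    static final byte[] RAW = {(byte) 0xD6, (byte) 0xD0, (byte) 0xB9, (byte) 0xFA};

    // Returns true only if the bytes decode without error in the given charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Decodes the bytes as GBK (assumed source encoding of the crawled page).
    static String decodeAsGbk(byte[] bytes) {
        return new String(bytes, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        System.out.println("valid UTF-8: " + isValid(RAW, StandardCharsets.UTF_8));
        System.out.println("decoded as GBK: " + decodeAsGbk(RAW));
    }
}
```

If the content is stored without being converted from its real encoding, a UTF-8 column would reject bytes like these, which is consistent with the "Incorrect string value" error above.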


Solution:
In the Nutch 2.1 configuration, set the following property:

<property>
  <name>encodingdetector.charset.min.confidence</name>
  <value>1</value>
  <description>An integer between 0-100 indicating minimum confidence value
  for charset auto-detection. Any negative value disables auto-detection.
  </description>
</property>
 


Also make sure the MySQL database encoding is UTF-8.
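
One way to check the database encoding from Java is sketched below. It is a rough illustration, not part of the original fix: the class name, the `nutch` database name, and the host and port in the URL are hypothetical, and the MySQL Connector/J driver is assumed to be on the classpath. `SHOW VARIABLES LIKE 'character_set_database'` is a standard MySQL statement, and `useUnicode` / `characterEncoding` are Connector/J connection properties.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MysqlCharsetCheck {
    // MySQL names its UTF-8 charsets "utf8", "utf8mb3" and "utf8mb4".
    static boolean isUtf8(String mysqlCharset) {
        return mysqlCharset != null && mysqlCharset.toLowerCase().startsWith("utf8");
    }

    // Reads the default character set of the currently selected database.
    static String databaseCharset(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'character_set_database'")) {
            // Result columns are (Variable_name, Value); the charset is column 2.
            return rs.next() ? rs.getString(2) : null;
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical URL: host, port and database name are assumptions.
        String url = "jdbc:mysql://localhost:3306/nutch"
                   + "?useUnicode=true&characterEncoding=UTF-8";
        // args[0] = user, args[1] = password
        try (Connection conn = DriverManager.getConnection(url, args[0], args[1])) {
            String charset = databaseCharset(conn);
            System.out.println("character_set_database = " + charset);
            System.out.println("UTF-8? " + isUtf8(charset));
        }
    }
}
```

If the reported character set is something like `latin1`, the database (and the table holding the `content` column) would need to be converted or recreated with a UTF-8 character set.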
 
 
 
         
