Weka, short for Waikato Environment for Knowledge Analysis, is free, non-commercial, open-source machine learning and data mining software that runs on Java. Its commercial counterpart is Clementine, the data mining product from SPSS.
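MySQL is a relational database management system originally developed by the Swedish company MySQL AB and now owned by Oracle. A relational database stores data in separate tables rather than in one large repository, which improves speed and flexibility. SQL, the language MySQL uses, is the most widely used standardized language for accessing databases. MySQL is dual-licensed and comes in a community edition and a commercial edition. Because it is small, fast, has a low total cost of ownership and, above all, is open source, it is a common choice as the database for small and medium-sized websites. The community edition performs well and, combined with PHP and Apache, makes a good development environment.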
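Weka uses its own data format, so data has to be converted to ARFF before Weka can process it. Converting from SQL to ARFF by hand is tedious, but Weka is well prepared for this case: with a little configuration you can connect to a MySQL database and work with it directly from the Weka GUI.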
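To make the SQL to ARFF step more concrete, here is a minimal sketch of what an ARFF file looks like when it is built from tabular rows. It only handles numeric columns, and the class name `ArffSketch`, the relation name, and the column names are hypothetical, chosen for illustration. It is not how Weka performs the conversion internally.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: writes a tiny ARFF document for numeric columns.
class ArffSketch {

    // Builds ARFF text: a header naming the relation and its attributes,
    // followed by a @data section with one comma-separated row per line.
    static String toArff(String relation, List<String> numericAttributes, List<double[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : numericAttributes) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(row[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical table with two numeric columns and two rows.
        List<String> columns = Arrays.asList("height", "weight");
        List<double[]> rows = Arrays.asList(new double[] {1.72, 65.0}, new double[] {1.80, 78.5});
        System.out.print(toArff("example_table", columns, rows));
    }
}
```

Every table column becomes one `@attribute` line and every table row becomes one line under `@data`, which is the mapping Weka has to perform when it reads from a database.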
Prerequisites:
Java runtime environment
A Weka installation
mysql-connector-java-5.1.26-bin.jar
Create a lib folder in the Weka installation directory and copy mysql-connector-java-5.1.26-bin.jar into it. Also place a copy of the same JAR in %JAVA_HOME%\jre\lib\ext.
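Before going further, it can help to confirm that the JVM actually sees the driver JAR. The following is a small sketch, assuming the driver class name `com.mysql.jdbc.Driver` used later in DatabaseUtils.props; the class name `DriverCheck` is hypothetical.

```java
// Illustrative only: reports whether a class can be loaded from the classpath.
class DriverCheck {

    // Returns true if the named class can be loaded, false otherwise.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "com.mysql.jdbc.Driver";
        System.out.println(driver + " on classpath: " + isClassAvailable(driver));
    }
}
```

If this prints false when run with the same Java installation that starts Weka, the JAR is probably not in a location that the JVM searches.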
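Find weka.jar in the Weka installation directory and extract it in place. This produces a new folder named weka. Inside it, open the experiment folder, find DatabaseUtils.props.mysql, and rename it to DatabaseUtils.props so that it replaces the existing DatabaseUtils.props file. Then edit the file so that it reads as follows (jdbcDriver and jdbcURL are the entries to adapt to your own setup):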
# Database settings for MySQL 3.23.x, 4.x
#
# General information on database access can be found here:
# http://weka.wikispaces.com/Databases
#
# url: http://www.mysql.com/
# jdbc: http://www.mysql.com/products/connector/j/
# author: Fracpete (fracpete at waikato dot ac dot nz)
# version: $Revision: 5836 $

# JDBC driver (comma-separated list)
#jdbcDriver=org.gjt.mm.mysql.Driver
jdbcDriver=com.mysql.jdbc.Driver

# database URL
#jdbcURL=jdbc:mysql://server_name:3306/database_name
jdbcURL=jdbc:mysql://localhost:3306/rtest
# specific data types
# string, getString() = 0; --> nominal
# boolean, getBoolean() = 1; --> nominal
# double, getDouble() = 2; --> numeric
# byte, getByte() = 3; --> numeric
# short, getByte()= 4; --> numeric
# int, getInteger() = 5; --> numeric
# long, getLong() = 6; --> numeric
# float, getFloat() = 7; --> numeric
# date, getDate() = 8; --> date
# text, getString() = 9; --> string
# time, getTime() = 10; --> date

# specific data types
string, getString() = 0; --> nominal
boolean, getBoolean() = 1; --> nominal
double, getDouble() = 2; --> numeric
byte, getByte() = 3; --> numeric
short, getByte()= 4; --> numeric
int, getInteger() = 5; --> numeric
long, getLong() = 6; --> numeric
float, getFloat() = 7; --> numeric
date, getDate() = 8; --> date
text, getString() = 9; --> string
time, getTime() = 10; --> date
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
INT_UNSIGNED=6
BIGINT=6
LONG=6
REAL=7
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
CHAR=0
TEXT=0
VARCHAR=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BLOB=9
DATE=8
TIME=8
DATETIME=8
TIMESTAMP=8

# other options
CREATE_DOUBLE=DOUBLE
CREATE_STRING=TEXT
CREATE_INT=INT
CREATE_DATE=DATETIME
DateFormat=yyyy-MM-dd HH:mm:ss
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true

# All the reserved keywords for this database
# Based on the keywords listed at the following URL (2009-04-13):
# http://dev.mysql.com/doc/mysqld-version-reference/en/mysqld-version-reference-reservedwords-5-0.html
Keywords=\
  ADD,\
  ALL,\
  ALTER,\
  ANALYZE,\
  AND,\
  AS,\
  ASC,\
  ASENSITIVE,\
  BEFORE,\
  BETWEEN,\
  BIGINT,\
  BINARY,\
  BLOB,\
  BOTH,\
  BY,\
  CALL,\
  CASCADE,\
  CASE,\
  CHANGE,\
  CHAR,\
  CHARACTER,\
  CHECK,\
  COLLATE,\
  COLUMN,\
  COLUMNS,\
  CONDITION,\
  CONNECTION,\
  CONSTRAINT,\
  CONTINUE,\
  CONVERT,\
  CREATE,\
  CROSS,\
  CURRENT_DATE,\
  CURRENT_TIME,\
  CURRENT_TIMESTAMP,\
  CURRENT_USER,\
  CURSOR,\
  DATABASE,\
  DATABASES,\
  DAY_HOUR,\
  DAY_MICROSECOND,\
  DAY_MINUTE,\
  DAY_SECOND,\
  DEC,\
  DECIMAL,\
  DECLARE,\
  DEFAULT,\
  DELAYED,\
  DELETE,\
  DESC,\
  DESCRIBE,\
  DETERMINISTIC,\
  DISTINCT,\
  DISTINCTROW,\
  DIV,\
  DOUBLE,\
  DROP,\
  DUAL,\
  EACH,\
  ELSE,\
  ELSEIF,\
  ENCLOSED,\
  ESCAPED,\
  EXISTS,\
  EXIT,\
  EXPLAIN,\
  FALSE,\
  FETCH,\
  FIELDS,\
  FLOAT,\
  FLOAT4,\
  FLOAT8,\
  FOR,\
  FORCE,\
  FOREIGN,\
  FROM,\
  FULLTEXT,\
  GOTO,\
  GRANT,\
  GROUP,\
  HAVING,\
  HIGH_PRIORITY,\
  HOUR_MICROSECOND,\
  HOUR_MINUTE,\
  HOUR_SECOND,\
  IF,\
  IGNORE,\
  IN,\
  INDEX,\
  INFILE,\
  INNER,\
  INOUT,\
  INSENSITIVE,\
  INSERT,\
  INT,\
  INT1,\
  INT2,\
  INT3,\
  INT4,\
  INT8,\
  INTEGER,\
  INTERVAL,\
  INTO,\
  IS,\
  ITERATE,\
  JOIN,\
  KEY,\
  KEYS,\
  KILL,\
  LABEL,\
  LEADING,\
  LEAVE,\
  LEFT,\
  LIKE,\
  LIMIT,\
  LINES,\
  LOAD,\
  LOCALTIME,\
  LOCALTIMESTAMP,\
  LOCK,\
  LONG,\
  LONGBLOB,\
  LONGTEXT,\
  LOOP,\
  LOW_PRIORITY,\
  MATCH,\
  MEDIUMBLOB,\
  MEDIUMINT,\
  MEDIUMTEXT,\
  MIDDLEINT,\
  MINUTE_MICROSECOND,\
  MINUTE_SECOND,\
  MOD,\
  MODIFIES,\
  NATURAL,\
  NOT,\
  NO_WRITE_TO_BINLOG,\
  NULL,\
  NUMERIC,\
  ON,\
  OPTIMIZE,\
  OPTION,\
  OPTIONALLY,\
  OR,\
  ORDER,\
  OUT,\
  OUTER,\
  OUTFILE,\
  PRECISION,\
  PRIMARY,\
  PRIVILEGES,\
  PROCEDURE,\
  PURGE,\
  READ,\
  READS,\
  REAL,\
  REFERENCES,\
  REGEXP,\
  RELEASE,\
  RENAME,\
  REPEAT,\
  REPLACE,\
  REQUIRE,\
  RESTRICT,\
  RETURN,\
  REVOKE,\
  RIGHT,\
  RLIKE,\
  SCHEMA,\
  SCHEMAS,\
  SECOND_MICROSECOND,\
  SELECT,\
  SENSITIVE,\
  SEPARATOR,\
  SET,\
  SHOW,\
  SMALLINT,\
  SONAME,\
  SPATIAL,\
  SPECIFIC,\
  SQL,\
  SQLEXCEPTION,\
  SQLSTATE,\
  SQLWARNING,\
  SQL_BIG_RESULT,\
  SQL_CALC_FOUND_ROWS,\
  SQL_SMALL_RESULT,\
  SSL,\
  STARTING,\
  STRAIGHT_JOIN,\
  TABLE,\
  TABLES,\
  TERMINATED,\
  THEN,\
  TINYBLOB,\
  TINYINT,\
  TINYTEXT,\
  TO,\
  TRAILING,\
  TRIGGER,\
  TRUE,\
  UNDO,\
  UNION,\
  UNIQUE,\
  UNLOCK,\
  UNSIGNED,\
  UPDATE,\
  UPGRADE,\
  USAGE,\
  USE,\
  USING,\
  UTC_DATE,\
  UTC_TIME,\
  UTC_TIMESTAMP,\
  VALUES,\
  VARBINARY,\
  VARCHAR,\
  VARCHARACTER,\
  VARYING,\
  WHEN,\
  WHERE,\
  WHILE,\
  WITH,\
  WRITE,\
  XOR,\
  YEAR_MONTH,\
  ZEROFILL

# The character to append to attribute names to avoid exceptions due to
# clashes between keywords and attribute names
KeywordsMaskChar=_

#flags for loading and saving instances using DatabaseLoader/Saver
nominalToStringLimit=50
idColumn=auto_generated_id
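To show how the entries in this file read, the sketch below loads a short excerpt with `java.util.Properties` and looks up the Weka attribute type for a few SQL column types, using the numeric codes documented in the file's own comments (0 and 1 are nominal, 2 to 7 numeric, 8 and 10 date, 9 string). The class name `PropsSketch` and its helper methods are hypothetical. This illustrates the mapping only and is not Weka's own loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative only: interprets the type-mapping entries of DatabaseUtils.props.
class PropsSketch {

    // Weka attribute type for each code 0..10, as listed in the file's comments.
    static final String[] WEKA_TYPE = {
        "nominal", "nominal", "numeric", "numeric", "numeric", "numeric",
        "numeric", "numeric", "date", "string", "date"
    };

    // Parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the Weka attribute type for an SQL type name, or null if unmapped.
    static String wekaTypeFor(Properties props, String sqlType) {
        String code = props.getProperty(sqlType);
        if (code == null) {
            return null;
        }
        return WEKA_TYPE[Integer.parseInt(code.trim())];
    }

    public static void main(String[] args) {
        // Excerpt of the file shown above.
        String excerpt = "jdbcDriver=com.mysql.jdbc.Driver\n"
                + "jdbcURL=jdbc:mysql://localhost:3306/rtest\n"
                + "VARCHAR=0\nINT=5\nDATETIME=8\nBLOB=9\n";
        Properties props = parse(excerpt);
        System.out.println("URL: " + props.getProperty("jdbcURL"));
        for (String sqlType : new String[] {"VARCHAR", "INT", "DATETIME", "BLOB"}) {
            System.out.println(sqlType + " -> " + wekaTypeFor(props, sqlType));
        }
    }
}
```

If a column type in your table has no entry in the file, adding a line for it with the appropriate code is the usual fix.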
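Then repack the weka folder as weka.jar and replace the original weka.jar with it. Start Weka, choose Open DB, click User, enter your user name and password, and click Connect. When the Info panel shows connecting to: jdbc:mysql://localhost:3306/myweka = true, the connection has succeeded (the URL in that message reflects the database you connected to). The Explorer can now load datasets from the database.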
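If the GUI reports a failed connection, it can be useful to test the same URL and credentials with plain JDBC, outside of Weka. The following is a minimal sketch. The class name `ConnectionCheck` is hypothetical, and the user name and password in `main` are placeholders to replace with your own. It needs the MySQL connector JAR on the classpath to succeed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Illustrative only: tries to open a JDBC connection and reports the outcome.
class ConnectionCheck {

    // Returns true if a connection can be opened and validated, false otherwise.
    static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(2); // wait at most 2 seconds for validation
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // URL taken from jdbcURL in DatabaseUtils.props; credentials are placeholders.
        String url = "jdbc:mysql://localhost:3306/rtest";
        System.out.println("connected: " + canConnect(url, "your_user", "your_password"));
    }
}
```

A "No suitable driver" message points at the JAR location, while an access-denied message points at the user name or password.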