Connecting Weka to a MySQL Database

Introduction to Weka

  Weka is short for Waikato Environment for Knowledge Analysis. It is free, non-commercial (in contrast to commercial data mining products such as SPSS's Clementine), open-source software for machine learning and data mining that runs on Java.

Weka Data Format

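WEKA stores data in ARFF (Attribute-Relation File Format) files, which are ASCII text files. The ARFF file below holds a two-dimensional table. It is the "weather.arff" file that ships with WEKA and can be found in the "data" subdirectory of the WEKA installation directory.
Code:
% ARFF file for the weather data with some numeric features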
%
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
%
% 14 instances
%
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
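
The structure of the file above (a header of @relation and @attribute declarations, then the rows after @data, with % starting a comment) can be illustrated with a small reader. This is only a minimal sketch using the Java standard library, not Weka's own parser; the class name ArffSummary is made up for illustration, and it assumes attribute names contain no spaces or quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative ARFF reader: lists attribute names and counts data rows. */
public class ArffSummary {

    /** Returns the attribute names declared in the header, in order. */
    public static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String raw : arff.split("\\R")) {
            String line = raw.trim();
            if (line.toLowerCase().startsWith("@attribute")) {
                // "@attribute <name> <type>": the name is the second token
                names.add(line.split("\\s+")[1]);
            }
        }
        return names;
    }

    /** Counts data rows: non-empty, non-comment lines after "@data". */
    public static int instanceCount(String arff) {
        boolean inData = false;
        int count = 0;
        for (String raw : arff.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and % comments
            }
            if (inData) {
                count++;
            } else if (line.equalsIgnoreCase("@data")) {
                inData = true;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
            "% shortened weather example",
            "@relation weather",
            "@attribute outlook {sunny, overcast, rainy}",
            "@attribute temperature real",
            "@data",
            "sunny,85",
            "rainy,70");
        System.out.println(attributeNames(arff)); // prints [outlook, temperature]
        System.out.println(instanceCount(arff));  // prints 2
    }
}
```

Run against the full weather.arff shown above, the same two methods would report five attributes and 14 instances.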

Introduction to MySQL

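  MySQL is a relational database management system developed by the Swedish company MySQL AB and now owned by Oracle. A relational database keeps data in separate tables rather than putting everything into one big store, which improves speed and flexibility. SQL, the language MySQL uses, is the most common standardized language for accessing databases. MySQL is dual-licensed and comes in a community edition and a commercial edition. Because it is small, fast, has a low total cost of ownership, and above all is open source, it is a common choice as the database for small and medium-sized websites. The community edition performs well and, combined with PHP and Apache, makes a good development environment.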

Connecting Weka Directly to MySQL

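Because Weka uses its own data format, data must be converted to ARFF before Weka can process it, so data held in a database would have to go through an SQL -> ARFF conversion, which is tedious. Weka is well prepared for this, however: with a little configuration you can connect to and work with a MySQL database directly from the Weka GUI.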
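
To give a sense of what the manual SQL -> ARFF conversion involves, here is a rough sketch that writes already-fetched rows out as ARFF text. It is not how Weka does the conversion internally; the class name RowsToArff and its parameters are hypothetical, values are assumed not to need quoting, and a null cell is written as "?", the ARFF marker for a missing value.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative converter from tabular rows to ARFF text. */
public class RowsToArff {

    /**
     * Builds ARFF text from a relation name, parallel arrays of attribute
     * names and ARFF types, and rows whose cells are already strings.
     */
    public static String toArff(String relation, String[] names,
                                String[] types, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        for (int i = 0; i < names.length; i++) {
            sb.append("@attribute ").append(names[i])
              .append(' ').append(types[i]).append('\n');
        }
        sb.append("@data\n");
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                // "?" marks a missing value in ARFF
                sb.append(row[i] == null ? "?" : row[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"sunny", "85", "no"},
            new String[] {"rainy", null, "yes"});
        System.out.print(toArff("weather",
            new String[] {"outlook", "temperature", "play"},
            new String[] {"{sunny, overcast, rainy}", "real", "{yes, no}"},
            rows));
    }
}
```

Every table would need its column types mapped to ARFF types by hand in this approach, which is the tedium the direct database connection avoids.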

Prerequisites:

Java runtime environment

Weka installation

mysql-connector-java-5.1.26-bin.jar

Detailed configuration steps:

  Create a lib folder in the Weka installation directory and copy mysql-connector-java-5.1.26-bin.jar into it. Also copy mysql-connector-java-5.1.26-bin.jar into %JAVA_HOME%\jre\lib\ext.
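
One way to confirm that the JVM can actually see the connector jar after copying it is to try loading the driver class by name. The sketch below is a hypothetical helper (DriverCheck is not part of Weka); the default class name com.mysql.jdbc.Driver is the one used by the 5.1.x connector. Note that the jre\lib\ext extension directory exists only in Java 8 and earlier, so on newer JVMs the jar has to be on the classpath instead.

```java
/** Illustrative check that a JDBC driver class is visible to the JVM. */
public class DriverCheck {

    /** Returns true if the named class can be loaded from the classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but could not be linked/initialized
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver
            + (isOnClasspath(driver) ? " found" : " NOT found on classpath"));
    }
}
```

Running it with the same Java installation that launches Weka gives the most meaningful answer.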

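  In the Weka installation directory, find weka.jar and extract it into the current directory; a new folder named weka appears. Inside it, go to the experiment folder, find DatabaseUtils.props.mysql, and rename it to DatabaseUtils.props so that it replaces the existing DatabaseUtils.props file. Then edit the file so that it reads as follows (the settings to adapt are jdbcDriver and jdbcURL):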

# Database settings for MySQL 3.23.x, 4.x
#
# General information on database access can be found here:
# http://weka.wikispaces.com/Databases
#
# url:     http://www.mysql.com/
# jdbc:    http://www.mysql.com/products/connector/j/
# author:  Fracpete (fracpete at waikato dot ac dot nz)
# version: $Revision: 5836 $

# JDBC driver (comma-separated list)
#jdbcDriver=org.gjt.mm.mysql.Driver
jdbcDriver=com.mysql.jdbc.Driver

# database URL
#jdbcURL=jdbc:mysql://server_name:3306/database_name
jdbcURL=jdbc:mysql://localhost:3306/rtest

# specific data types
# string, getString() = 0;    --> nominal
# boolean, getBoolean() = 1;  --> nominal
# double, getDouble() = 2;    --> numeric
# byte, getByte() = 3;        --> numeric
# short, getShort() = 4;      --> numeric
# int, getInteger() = 5;      --> numeric
# long, getLong() = 6;        --> numeric
# float, getFloat() = 7;      --> numeric
# date, getDate() = 8;        --> date
# text, getString() = 9;      --> string
# time, getTime() = 10;       --> date
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
INT_UNSIGNED=6
BIGINT=6
LONG=6
REAL=7
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
CHAR=0
TEXT=0
VARCHAR=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BLOB=9
DATE=8
TIME=8
DATETIME=8
TIMESTAMP=8

# other options
CREATE_DOUBLE=DOUBLE
CREATE_STRING=TEXT
CREATE_INT=INT
CREATE_DATE=DATETIME
DateFormat=yyyy-MM-dd HH:mm:ss
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true

# All the reserved keywords for this database
# Based on the keywords listed at the following URL (2009-04-13):
# http://dev.mysql.com/doc/mysqld-version-reference/en/mysqld-version-reference-reservedwords-5-0.html
Keywords=\
  ADD,\
  ALL,\
  ALTER,\
  ANALYZE,\
  AND,\
  AS,\
  ASC,\
  ASENSITIVE,\
  BEFORE,\
  BETWEEN,\
  BIGINT,\
  BINARY,\
  BLOB,\
  BOTH,\
  BY,\
  CALL,\
  CASCADE,\
  CASE,\
  CHANGE,\
  CHAR,\
  CHARACTER,\
  CHECK,\
  COLLATE,\
  COLUMN,\
  COLUMNS,\
  CONDITION,\
  CONNECTION,\
  CONSTRAINT,\
  CONTINUE,\
  CONVERT,\
  CREATE,\
  CROSS,\
  CURRENT_DATE,\
  CURRENT_TIME,\
  CURRENT_TIMESTAMP,\
  CURRENT_USER,\
  CURSOR,\
  DATABASE,\
  DATABASES,\
  DAY_HOUR,\
  DAY_MICROSECOND,\
  DAY_MINUTE,\
  DAY_SECOND,\
  DEC,\
  DECIMAL,\
  DECLARE,\
  DEFAULT,\
  DELAYED,\
  DELETE,\
  DESC,\
  DESCRIBE,\
  DETERMINISTIC,\
  DISTINCT,\
  DISTINCTROW,\
  DIV,\
  DOUBLE,\
  DROP,\
  DUAL,\
  EACH,\
  ELSE,\
  ELSEIF,\
  ENCLOSED,\
  ESCAPED,\
  EXISTS,\
  EXIT,\
  EXPLAIN,\
  FALSE,\
  FETCH,\
  FIELDS,\
  FLOAT,\
  FLOAT4,\
  FLOAT8,\
  FOR,\
  FORCE,\
  FOREIGN,\
  FROM,\
  FULLTEXT,\
  GOTO,\
  GRANT,\
  GROUP,\
  HAVING,\
  HIGH_PRIORITY,\
  HOUR_MICROSECOND,\
  HOUR_MINUTE,\
  HOUR_SECOND,\
  IF,\
  IGNORE,\
  IN,\
  INDEX,\
  INFILE,\
  INNER,\
  INOUT,\
  INSENSITIVE,\
  INSERT,\
  INT,\
  INT1,\
  INT2,\
  INT3,\
  INT4,\
  INT8,\
  INTEGER,\
  INTERVAL,\
  INTO,\
  IS,\
  ITERATE,\
  JOIN,\
  KEY,\
  KEYS,\
  KILL,\
  LABEL,\
  LEADING,\
  LEAVE,\
  LEFT,\
  LIKE,\
  LIMIT,\
  LINES,\
  LOAD,\
  LOCALTIME,\
  LOCALTIMESTAMP,\
  LOCK,\
  LONG,\
  LONGBLOB,\
  LONGTEXT,\
  LOOP,\
  LOW_PRIORITY,\
  MATCH,\
  MEDIUMBLOB,\
  MEDIUMINT,\
  MEDIUMTEXT,\
  MIDDLEINT,\
  MINUTE_MICROSECOND,\
  MINUTE_SECOND,\
  MOD,\
  MODIFIES,\
  NATURAL,\
  NOT,\
  NO_WRITE_TO_BINLOG,\
  NULL,\
  NUMERIC,\
  ON,\
  OPTIMIZE,\
  OPTION,\
  OPTIONALLY,\
  OR,\
  ORDER,\
  OUT,\
  OUTER,\
  OUTFILE,\
  PRECISION,\
  PRIMARY,\
  PRIVILEGES,\
  PROCEDURE,\
  PURGE,\
  READ,\
  READS,\
  REAL,\
  REFERENCES,\
  REGEXP,\
  RELEASE,\
  RENAME,\
  REPEAT,\
  REPLACE,\
  REQUIRE,\
  RESTRICT,\
  RETURN,\
  REVOKE,\
  RIGHT,\
  RLIKE,\
  SCHEMA,\
  SCHEMAS,\
  SECOND_MICROSECOND,\
  SELECT,\
  SENSITIVE,\
  SEPARATOR,\
  SET,\
  SHOW,\
  SMALLINT,\
  SONAME,\
  SPATIAL,\
  SPECIFIC,\
  SQL,\
  SQLEXCEPTION,\
  SQLSTATE,\
  SQLWARNING,\
  SQL_BIG_RESULT,\
  SQL_CALC_FOUND_ROWS,\
  SQL_SMALL_RESULT,\
  SSL,\
  STARTING,\
  STRAIGHT_JOIN,\
  TABLE,\
  TABLES,\
  TERMINATED,\
  THEN,\
  TINYBLOB,\
  TINYINT,\
  TINYTEXT,\
  TO,\
  TRAILING,\
  TRIGGER,\
  TRUE,\
  UNDO,\
  UNION,\
  UNIQUE,\
  UNLOCK,\
  UNSIGNED,\
  UPDATE,\
  UPGRADE,\
  USAGE,\
  USE,\
  USING,\
  UTC_DATE,\
  UTC_TIME,\
  UTC_TIMESTAMP,\
  VALUES,\
  VARBINARY,\
  VARCHAR,\
  VARCHARACTER,\
  VARYING,\
  WHEN,\
  WHERE,\
  WHILE,\
  WITH,\
  WRITE,\
  XOR,\
  YEAR_MONTH,\
  ZEROFILL

# The character to append to attribute names to avoid exceptions due to
# clashes between keywords and attribute names
KeywordsMaskChar=_

#flags for loading and saving instances using DatabaseLoader/Saver
nominalToStringLimit=50
idColumn=auto_generated_id
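
Since DatabaseUtils.props is an ordinary Java properties file, its settings can be inspected with java.util.Properties. The sketch below is only an illustration of the file format, not Weka's own loading code: the class name PropsPeek and the mask method are hypothetical, and the case-insensitive keyword comparison is an assumption. The masking follows the comment in the file, which says KeywordsMaskChar is appended to attribute names that clash with a keyword.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

/** Illustrative reader for settings in a DatabaseUtils.props-style file. */
public class PropsPeek {

    /** Parses properties-file text; backslash line continuations are joined. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Appends KeywordsMaskChar if the name clashes with a listed keyword. */
    public static String mask(Properties props, String attributeName) {
        String maskChar = props.getProperty("KeywordsMaskChar", "_");
        // Assumption: keywords are matched case-insensitively
        String upper = attributeName.toUpperCase(Locale.ROOT);
        for (String keyword : props.getProperty("Keywords", "").split(",")) {
            if (keyword.trim().equals(upper)) {
                return attributeName + maskChar;
            }
        }
        return attributeName;
    }

    public static void main(String[] args) {
        Properties props = parse(String.join("\n",
            "jdbcDriver=com.mysql.jdbc.Driver",
            "jdbcURL=jdbc:mysql://localhost:3306/rtest",
            "Keywords=\\",
            "  SELECT,\\",
            "  TABLE",
            "KeywordsMaskChar=_"));
        System.out.println(props.getProperty("jdbcURL"));
        System.out.println(mask(props, "table"));    // prints table_
        System.out.println(mask(props, "humidity")); // prints humidity
    }
}
```

Loading the edited file this way before repackaging weka.jar is a quick check that jdbcDriver and jdbcURL hold the intended values.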

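  Then repackage the weka folder as weka.jar, replacing the original weka.jar. Start Weka, click Open DB, click User, enter the user name and password, and click Connect. If the Info area shows "connecting to: jdbc:mysql://localhost:3306/myweka = true", the connection succeeded (the URL shown is the jdbcURL you configured, so the database name may differ). The Explorer can then load the dataset from the database.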
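
If the connection fails in the GUI, it can help to test the same URL and credentials outside Weka with plain JDBC. The following is a minimal sketch under stated assumptions: ConnectionCheck is a hypothetical helper, the default host, port, and database are taken from the jdbcURL above, the user name and password come from the command line, and the MySQL connector jar must be on the classpath when it runs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Illustrative JDBC connectivity test, independent of Weka. */
public class ConnectionCheck {

    /** Builds a MySQL JDBC URL of the form used in DatabaseUtils.props. */
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    /** Returns true if a connection can be opened and reports itself valid. */
    public static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(5); // wait up to 5 seconds for the check
        } catch (SQLException e) {
            // Covers a missing driver, wrong credentials, unreachable server
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java ConnectionCheck <user> <password>");
            return;
        }
        String url = jdbcUrl("localhost", 3306, "rtest");
        System.out.println("connecting to: " + url + " = "
            + canConnect(url, args[0], args[1]));
    }
}
```

A result of false here points at the driver, the URL, or the credentials rather than at the Weka configuration.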
