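Abstract: This post uses examples to explain three Weka classes: Instances, Instance, and Attribute. It can be read alongside 日撸 Java 三百行 (days 51-60, kNN and NB).
Let's run a test first.
First download the test dataset weather-original.arff from here: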
https://gitee.com/fansmale/javasampledata
package thinking;

import java.io.FileReader;

import weka.core.Instances;
import weka.core.Instance;
import weka.core.Attribute;

/**
 * Test the data management of Weka.
 *
 * @author Fan Min [email protected]
 *
 */
public class WekaDataTest {

    /**
     *****************
     * The only testing method.
     *
     * @param args Not used.
     *****************
     */
    public static void main(String[] args) {
        Instances tempData = null;
        try {
            FileReader fileReader = new FileReader("D:/workplace/javasampledata/weather-original.arff");
            tempData = new Instances(fileReader);
            fileReader.close();
        } catch (Exception ee) {
            System.out.println("Cannot read the file: \r\n" + ee);
            System.exit(0);
        } // Of try

        // Step 1. Show the data.
        System.out.println("\r\n********* Part 1 *********");
        System.out.println("The data table is:\r\n" + tempData);

        // Step 2. Show one instance.
        System.out.println("\r\n********* Part 2 *********");
        System.out.println("The 3rd instance is: \r\n" + tempData.instance(2));

        // Step 3. Show one attribute.
        System.out.println("\r\n********* Part 3 *********");
        System.out.println("The 2nd attribute is: \r\n" + tempData.attribute(1));
        System.out.println("Its number of values is: \r\n" + tempData.attribute(1).numValues());
        System.out.println("The 3rd attribute is: \r\n" + tempData.attribute(2));
        System.out.println("Its number of values is: \r\n" + tempData.attribute(2).numValues());

        // Step 4. Take out one value from the data table.
        System.out.println("\r\n********* Part 4 *********");
        System.out.println("The 1st attribute value of the 1st instance is: " + tempData.instance(0).value(0));
        System.out.println("The 3rd attribute value of the 1st instance is: " + tempData.instance(0).value(2));
        System.out.println("The 5th attribute value of the 1st instance is: " + tempData.instance(0).value(4));

        // Step 5. Set the class attribute and show.
        System.out.println("\r\n********* Part 5 *********");
        tempData.setClassIndex(0);
        System.out.println("If we use the 1st attribute as the class, it is: \r\n" + tempData.classAttribute());
        tempData.setClassIndex(4);
        System.out.println("If we use the 5th attribute as the class, it is: \r\n" + tempData.classAttribute());
        System.out.println("The class value of the 1st instance is: " + tempData.instance(0).classValue());
    }// Of main
}// Of class WekaDataTest
********* Part 1 *********
The data table is:
@relation weather.original
@attribute outlook {sunny,overcast,rainy}
@attribute temperature {hot,mild,cool}
@attribute humidity numeric
@attribute windy {TRUE,FALSE}
@attribute play {yes,no}
@data
sunny,hot,95,FALSE,no
sunny,hot,92,TRUE,no
overcast,hot,91,FALSE,yes
rainy,mild,88,FALSE,yes
rainy,cool,78,FALSE,yes
rainy,cool,75,TRUE,no
overcast,cool,72,TRUE,yes
sunny,mild,89,FALSE,no
sunny,cool,77,FALSE,yes
rainy,mild,71,FALSE,yes
sunny,mild,78,TRUE,yes
overcast,mild,88,TRUE,yes
overcast,hot,70,FALSE,yes
rainy,mild,72,TRUE,no
********* Part 2 *********
The 3rd instance is:
overcast,hot,91,FALSE,yes
********* Part 3 *********
The 2nd attribute is:
@attribute temperature {hot,mild,cool}
Its number of values is:
3
The 3rd attribute is:
@attribute humidity numeric
Its number of values is:
0
********* Part 4 *********
The 1st attribute value of the 1st instance is: 0.0
The 3rd attribute value of the 1st instance is: 95.0
The 5th attribute value of the 1st instance is: 1.0
********* Part 5 *********
If we use the 1st attribute as the class, it is:
@attribute outlook {sunny,overcast,rainy}
If we use the 5th attribute as the class, it is:
@attribute play {yes,no}
The class value of the 1st instance is: 1.0
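The Attribute class manages a single attribute, for example
@attribute temperature {hot,mild,cool}
This is a nominal attribute named temperature with 3 possible values.
@attribute humidity numeric
This is a numeric (real-valued) attribute named humidity, and numValues() reports 0 for it. A numeric attribute has infinitely many possible values and no list to count, so the method simply answers 0.
See the Part 3 output.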
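To make the "3 versus 0" result concrete, here is a minimal plain-Java sketch. ToyAttribute is a hypothetical stand-in written for this post, not weka.core.Attribute; it only assumes that a nominal attribute carries an explicit list of labels while a numeric one carries none.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy stand-in for illustration only; this is NOT weka.core.Attribute.
class ToyAttribute {
    private final String name;
    private final List<String> labels; // empty for a numeric attribute

    // Nominal attribute: the legal values are listed explicitly.
    static ToyAttribute nominal(String name, String... labels) {
        return new ToyAttribute(name, Arrays.asList(labels));
    }

    // Numeric attribute: any real number is legal, so there is no list.
    static ToyAttribute numeric(String name) {
        return new ToyAttribute(name, Collections.<String>emptyList());
    }

    private ToyAttribute(String name, List<String> labels) {
        this.name = name;
        this.labels = labels;
    }

    // Size of the label list; 0 when nothing is enumerated.
    int numValues() {
        return labels.size();
    }

    public static void main(String[] args) {
        ToyAttribute temperature = nominal("temperature", "hot", "mild", "cool");
        ToyAttribute humidity = numeric("humidity");
        System.out.println(temperature.name + ": " + temperature.numValues()); // temperature: 3
        System.out.println(humidity.name + ": " + humidity.numValues());       // humidity: 0
    }
}
```

Read this way, the 0 in Part 3 means "no enumerated values", not "no legal values".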
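The Instance class manages one row of data; see the Part 2 output.
The Instances class manages a whole table; see the Part 1 output.
To get the value at row i, column j, you must write
tempData.instance(i).value(j);
This returns the internal representation of the data. For example, sunny, overcast, and rainy are stored internally as 0.0, 1.0, and 2.0, in the order they are declared. To use them as integers, cast explicitly: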
(int)tempData.instance(i).value(j);
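The mapping between labels and internal values can be sketched in plain Java. NominalCodec is a hypothetical helper for illustration, not part of Weka; it assumes only what the Part 4 output shows, namely that a label is stored as its position in the attribute declaration. In Weka itself, the label should be recoverable with tempData.attribute(j).value((int) tempData.instance(i).value(j)).

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the label <-> index mapping; not Weka's own code.
class NominalCodec {
    private final List<String> labels;

    NominalCodec(String... labels) {
        this.labels = Arrays.asList(labels);
    }

    // Internal representation: position of the label in the declaration, as a double.
    double encode(String label) {
        int index = labels.indexOf(label);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        return index;
    }

    // The cast to int is needed because a double cannot index a list.
    String decode(double internalValue) {
        return labels.get((int) internalValue);
    }

    public static void main(String[] args) {
        NominalCodec outlook = new NominalCodec("sunny", "overcast", "rainy");
        NominalCodec play = new NominalCodec("yes", "no");
        System.out.println(outlook.encode("sunny")); // 0.0
        System.out.println(play.encode("no"));       // 1.0
        System.out.println(outlook.decode(2.0));     // rainy
    }
}
```

This is why Part 4 prints 1.0 for the 5th attribute of the 1st instance: its label is no, the second entry of {yes,no}.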
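Note: when reading a value from the table, you cannot take the column first and then the row. tempData.attribute(j) returns the attribute's declaration, not a full column of data; see the Part 3 output.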
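If a whole column is needed, it has to be collected row by row. The sketch below does this on a plain double[][]; the three rows are copied by hand from the Part 1 output with nominal values already in internal form, and ColumnByRows is a hypothetical name. With Weka the loop body would be tempData.instance(i).value(j), and Instances also provides attributeToDoubleArray(j) for the same purpose.

```java
// Plain-Java sketch; the table is a hand-copied fragment of the Part 1 output.
class ColumnByRows {
    // Collect column j by visiting every row, mirroring instance(i).value(j).
    static double[] column(double[][] rows, int j) {
        double[] result = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {
            result[i] = rows[i][j];
        }
        return result;
    }

    public static void main(String[] args) {
        // First three rows, nominal values already in internal form.
        double[][] rows = {
            {0, 0, 95, 1, 1}, // sunny,hot,95,FALSE,no
            {0, 0, 92, 0, 1}, // sunny,hot,92,TRUE,no
            {1, 0, 91, 1, 0}, // overcast,hot,91,FALSE,yes
        };
        // Column 2 is humidity.
        System.out.println(java.util.Arrays.toString(column(rows, 2))); // [95.0, 92.0, 91.0]
    }
}
```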
For a prediction task you need to set the decision (class) attribute with setClassIndex(); see Part 5.
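What choosing a class attribute amounts to can be sketched without Weka. ClassIndexSketch and its two methods are hypothetical names for illustration; the row is the 1st instance from Part 1 in internal form. The class index just marks one column as the label to predict and leaves the others as conditions.

```java
import java.util.Arrays;

// Plain-Java sketch of what choosing a class (decision) attribute means.
class ClassIndexSketch {
    // The label of a row is simply its value at the class index.
    static double classValue(double[] row, int classIndex) {
        return row[classIndex];
    }

    // The remaining values are the conditions used to predict the label.
    static double[] conditions(double[] row, int classIndex) {
        double[] result = new double[row.length - 1];
        for (int j = 0, k = 0; j < row.length; j++) {
            if (j != classIndex) {
                result[k++] = row[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] first = {0, 0, 95, 1, 1}; // sunny,hot,95,FALSE,no
        int classIndex = first.length - 1;  // last attribute, i.e. play
        System.out.println(classValue(first, classIndex));                  // 1.0
        System.out.println(Arrays.toString(conditions(first, classIndex))); // [0.0, 0.0, 95.0, 1.0]
    }
}
```

The 1.0 agrees with the last line of the Part 5 output, where the 5th attribute (play) is the class.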