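Problem: randomly generate Salary {name, baseSalary, bonus} records such as "wxxx,10,1", one record per line, 10 million records in total, and write them to a text file (UTF-8 encoded).
Then read the file back and sum the annual salaries of all records whose name shares the same first two characters, e.g. wx, 1 million, 3 people. Finally group and sort, and output the 10 groups with the highest total annual salary:
wx, 2 million, 10 people
lt, 1.8 million, 8 people
....
name: 4 random characters from a-z; baseSalary: random in [0,100]; bonus: random in [0,5]; annual salary = baseSalary*13 + bonus
Try to optimize the program so that it finishes within 5 seconds.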
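To make the arithmetic concrete, here is a minimal sketch of the two rules in the problem statement: the annual total for a single record and the two-character grouping key. The class and method names (`SalaryRules`, `annualTotal`, `groupKey`) are made up for illustration and are not part of the original task.

```java
/** Illustrative helper; all names here are hypothetical. */
final class SalaryRules {
    private SalaryRules() {}

    /** Annual total for one record: baseSalary * 13 + bonus. */
    static int annualTotal(int baseSalary, int bonus) {
        return baseSalary * 13 + bonus;
    }

    /** Grouping key: the first two characters of the name. */
    static String groupKey(String name) {
        return name.substring(0, 2);
    }

    public static void main(String[] args) {
        // The sample record "wxxx,10,1" falls into group "wx" and contributes 10 * 13 + 1.
        System.out.println(groupKey("wxxx") + " " + annualTotal(10, 1)); // prints "wx 131"
    }
}
```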
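Entity class:
import lombok.Getter;
import lombok.Setter;
import lombok.ToString;

@Getter @Setter @ToString
public class Salary {
    private String name;        // employee name
    private Integer baseSalary; // base salary
    private Integer bonus;      // bonus
}
Required format of the generated random data: {"baseSalary":26,"bonus":4,"name":"lvox"}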
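import com.alibaba.fastjson.JSON; // assumption: the JSON class used here is fastjson's
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Generates 10 million random records.
 */
public class SalaryTest {
    private static final String CHARS = "abcdefghijklmnopqrstuvwxyz";

    public static void main(String[] args) throws IOException {
        int count = 10_000_000;
        // Write UTF-8 explicitly (FileWriter uses the platform default charset) and
        // overwrite instead of appending, so that reruns do not grow the file.
        try (BufferedWriter out = Files.newBufferedWriter(
                Paths.get("D:/upload/test.txt"), StandardCharsets.UTF_8)) {
            for (int n = 0; n < count; n++) {
                StringBuilder name = new StringBuilder(4);
                for (int i = 0; i < 4; i++) {
                    name.append(CHARS.charAt((int) (Math.random() * 26)));
                }
                Salary salary = new Salary();
                salary.setName(name.toString());
                salary.setBaseSalary((int) (Math.random() * 101)); // [0, 100], as the problem states
                salary.setBonus((int) (Math.random() * 6));        // [0, 5], as the problem states
                out.write(JSON.toJSONString(salary));
                out.write("\n");
            }
        }
    }
}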
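Since every record has the same fixed shape, one possible way to cut generation overhead is to build each line by hand instead of creating a Salary object and serializing it. The following is only a sketch under that assumption: the class `FastSalaryWriter` and its methods are hypothetical, it uses `ThreadLocalRandom` in place of `Math.random()`, and no timing has been measured for it here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical alternative generator; names are illustrative only. */
final class FastSalaryWriter {
    private FastSalaryWriter() {}

    /** Appends one line such as {"baseSalary":26,"bonus":4,"name":"lvox"} plus '\n'. */
    static void appendRecord(StringBuilder sb, ThreadLocalRandom rnd) {
        sb.append("{\"baseSalary\":").append(rnd.nextInt(101)) // [0, 100]
          .append(",\"bonus\":").append(rnd.nextInt(6))        // [0, 5]
          .append(",\"name\":\"");
        for (int i = 0; i < 4; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));         // 4 random letters a-z
        }
        sb.append("\"}\n");
    }

    /** Writes count records to path as UTF-8, reusing a single StringBuilder. */
    static void write(Path path, int count) throws IOException {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        StringBuilder sb = new StringBuilder(64);
        try (BufferedWriter out = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (int n = 0; n < count; n++) {
                sb.setLength(0);
                appendRecord(sb, rnd);
                out.append(sb);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("salary", ".txt");
        write(tmp, 1000);
        // Show the first generated record (content is random).
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8).get(0));
        Files.delete(tmp);
    }
}
```

The key order (baseSalary, bonus, name) simply mirrors the required format shown earlier, so a reader written for that format should accept these lines as well.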
Generated result: one JSON record per line, in the format shown above.
Computing the Top N:
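import com.alibaba.fastjson.JSON; // assumption: the JSON class used here is fastjson's
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Test {
    public static void main(String[] args) throws IOException {
        long start = System.currentTimeMillis();
        /*
         * One could also use random-access streams: split the file into chunks and
         * process them with several threads. Doing so requires chunking the file first,
         * i.e. scanning the whole file once to mark the split points and divide it
         * into several parts. 10 million records is not much data, so that approach
         * would actually be slower here.
         */
        // Salary is reused as the accumulator: baseSalary holds the group's total
        // annual salary and bonus holds the group's head count.
        Map<String, Salary> map = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("D:/upload/test.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Salary salary = JSON.parseObject(line, Salary.class);
                String key = salary.getName().substring(0, 2);
                int annual = salary.getBaseSalary() * 13 + salary.getBonus();
                Salary result = map.get(key);
                if (result != null) {
                    result.setBaseSalary(result.getBaseSalary() + annual);
                    result.setBonus(result.getBonus() + 1);
                } else {
                    result = new Salary();
                    result.setName(key);
                    result.setBonus(1);
                    result.setBaseSalary(annual);
                    map.put(key, result);
                }
            }
        }
        /*
         * Java 8+ provides stream sorting, which is more concise. With at most
         * 26 * 26 = 676 groups, the cost of the sort itself is negligible either way.
         */
        List<Salary> list = map.values().stream()
                .sorted(Comparator.comparingInt(Salary::getBaseSalary).reversed())
                .limit(10)
                .collect(Collectors.toList());
        /* Pre-Java 8 equivalent:
           List<Salary> values = new ArrayList<>(map.values());
           Collections.sort(values, new Comparator<Salary>() {
               public int compare(Salary o1, Salary o2) {
                   return o2.getBaseSalary() - o1.getBaseSalary();
               }
           });
         */
        System.out.println("Elapsed time: " + (System.currentTimeMillis() - start) + " ms");
        for (Salary s : list) {
            System.out.println(s);
        }
    }
}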
Output:
Elapsed time: 2627 ms
Salary(name=nb, baseSalary=10050715, bonus=15156)
Salary(name=nd, baseSalary=10012526, bonus=15093)
Salary(name=kl, baseSalary=10009653, bonus=15064)
Salary(name=qj, baseSalary=10005223, bonus=15077)
Salary(name=qo, baseSalary=9998526, bonus=15052)
Salary(name=ug, baseSalary=9992740, bonus=14993)
Salary(name=ky, baseSalary=9991997, bonus=15013)
Salary(name=dv, baseSalary=9976088, bonus=15101)
Salary(name=zk, baseSalary=9974697, bonus=15071)
Salary(name=qb, baseSalary=9961493, bonus=14974)
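Because names consist of lowercase letters only, there can be at most 26 * 26 = 676 groups. One possible further optimization is therefore to replace the map of Salary objects with two flat arrays indexed by prefix, and to pull the three fields out of each line by hand. The sketch below assumes every line has exactly the JSON format shown earlier; `PrefixTopN` and its methods are hypothetical names, the field extraction is not a general JSON parser, and no timing is claimed for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical array-based aggregator; assumes the fixed line format from the post. */
final class PrefixTopN {
    private static final int LETTERS = 26;
    private final long[] totals = new long[LETTERS * LETTERS]; // total annual salary per prefix
    private final int[] counts = new int[LETTERS * LETTERS];   // head count per prefix

    /** Reads the non-negative integer that directly follows key in line. */
    private static int intAfter(String line, String key) {
        int i = line.indexOf(key) + key.length();
        int v = 0;
        while (i < line.length() && Character.isDigit(line.charAt(i))) {
            v = v * 10 + (line.charAt(i++) - '0');
        }
        return v;
    }

    /** Adds one record to its two-letter group. */
    void accept(String line) {
        int base = intAfter(line, "\"baseSalary\":");
        int bonus = intAfter(line, "\"bonus\":");
        String nameKey = "\"name\":\"";
        int n = line.indexOf(nameKey) + nameKey.length();
        int idx = (line.charAt(n) - 'a') * LETTERS + (line.charAt(n + 1) - 'a');
        totals[idx] += base * 13L + bonus;
        counts[idx]++;
    }

    /** Returns up to n entries "prefix, total, count", highest total first. */
    List<String> top(int n) {
        Integer[] order = new Integer[totals.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Long.compare(totals[b], totals[a]));
        List<String> out = new ArrayList<>();
        for (int k = 0; k < order.length && out.size() < n; k++) {
            int idx = order[k];
            if (counts[idx] == 0) continue; // skip prefixes that never occurred
            String prefix = "" + (char) ('a' + idx / LETTERS) + (char) ('a' + idx % LETTERS);
            out.add(prefix + ", " + totals[idx] + ", " + counts[idx]);
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixTopN agg = new PrefixTopN();
        agg.accept("{\"baseSalary\":10,\"bonus\":1,\"name\":\"wxab\"}");
        agg.accept("{\"baseSalary\":20,\"bonus\":2,\"name\":\"wxcd\"}");
        agg.accept("{\"baseSalary\":5,\"bonus\":0,\"name\":\"ltzz\"}");
        agg.top(10).forEach(System.out::println); // prints "wx, 393, 2" then "lt, 65, 1"
    }
}
```

Using `long` totals and `Long.compare` also sidesteps any overflow concern that comes with summing into an `int` or comparing by subtraction.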
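Final remarks:
About file chunking: it really amounts to preprocessing the data in advance, so that next time several threads can be started on it. If the file is only processed once, whether chunking pays off depends on the complexity of the business logic and on the data volume. More threads do not necessarily mean higher efficiency, so do not chase multithreading blindly.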
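For reference, the chunking step could look roughly like the sketch below: pick evenly spaced byte offsets, then move each one forward to just past the next line break, so that every chunk contains only whole records and could be handed to its own thread. `LineChunker` and `split` are hypothetical names, and, in line with the remarks above, nothing here shows that this would be faster for a file of this size.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper that computes line-aligned chunk boundaries. */
final class LineChunker {
    private LineChunker() {}

    /**
     * Returns parts + 1 offsets; chunk i covers bytes [offsets[i], offsets[i + 1]).
     * Each interior offset lies just after a '\n' (or at end of file), so no record is split.
     */
    static long[] split(Path file, int parts) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            long[] offsets = new long[parts + 1];
            offsets[parts] = size;
            for (int i = 1; i < parts; i++) {
                // Start from the even split point, but never before the previous boundary.
                long pos = Math.max(offsets[i - 1], size * i / parts);
                raf.seek(pos);
                int b;
                // Advance to just past the next newline, or stop at end of file.
                while ((b = raf.read()) != -1 && b != '\n') {
                    // keep scanning
                }
                offsets[i] = raf.getFilePointer();
            }
            return offsets;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".txt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append("{\"baseSalary\":").append(i).append(",\"bonus\":1,\"name\":\"abcd\"}\n");
        }
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.toString(split(tmp, 3)));
        Files.delete(tmp);
    }
}
```

Scanning for the byte `'\n'` is safe in UTF-8 because that byte value never occurs inside a multi-byte character.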