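Scraping 51job with Java, saving to MySQL, and analyzing the data

This was the final project for a practical training course in the second semester of my sophomore year. I decided to scrape job listings. The original plan was to use Python, but then I thought I would try Java instead.

I had only been teaching myself Java for about a month, and I wanted to practice object-oriented thinking.

After looking around online, I found that sites like Lagou generate their pages dynamically, while 51job serves static pages, which are easier to scrape, so I settled on 51job.

This is based on https://blog.csdn.net/qq_42982169/article/details/83155040. I changed a lot of it to make the code more modular and added the ability to save the results.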

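Prerequisites:

Create a Maven project to make dependency management easier.

Use httpclient 3.1 and jsoup 1.8.3 to fetch pages and extract information. These two versions are widely used.

mysql-connector-java 8.0.13 loads the data into the database and supports MySQL 8.0+.

For the analysis, tablesaw (optional; any tool you are comfortable with will do).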

 

The search for "大数据+上海" (big data + Shanghai) serves as the example URL. Any URL of the same form should work.

https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=
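The keyword segment of the path appears to be the keyword percent-encoded twice: 大数据 is %E5%A4%A7%E6%95%B0%E6%8D%AE in UTF-8, and encoding the % signs again gives the %25E5... form seen in the URL. Here is a minimal sketch of building that segment. `SearchUrlBuilder`, `encodeKeyword` and `listPath` are hypothetical names, and reading the first number as the area code and the last number before `.html` as the page number is an assumption taken from the example URL, not something documented by 51job.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrlBuilder {
    // Percent-encodes the keyword twice, which reproduces the %25E5... form of the example URL.
    static String encodeKeyword(String keyword) {
        try {
            String once = URLEncoder.encode(keyword, "UTF-8");
            return URLEncoder.encode(once, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: fills area code, keyword and page into the path part only.
    // The fixed segments are copied from the example URL; their meaning is not verified.
    static String listPath(String areaCode, String keyword, int page) {
        return "https://search.51job.com/list/" + areaCode
                + ",000000,0000,00,9,99," + encodeKeyword(keyword)
                + ",2," + page + ".html";
    }
}
```

Calling `SearchUrlBuilder.listPath("020000", "java", 1)` yields the path of the Java + Shanghai URL used later, since a plain ASCII keyword is unchanged by the encoding.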

 

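I sketched out the rough functionality first and revised it several times. In the end this layout felt the clearest: a list of JobBean objects is the medium that every feature passes around.

[Image 1]

First, scrape the pages and save the results locally.

Create the JobBean class: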

public class JobBean {
    private String jobName;
    private String company;
    private String address;
    private String salary;
    private String date;
    private String jobURL;
    
    public JobBean(String jobName, String company, String address, String salary, String date, String jobURL) {
        this.jobName = jobName;
        this.company = company;
        this.address = address;
        this.salary = salary;
        this.date = date;
        this.jobURL = jobURL;
    }
    
    
    
    @Override
    public String toString() {
        return "jobName=" + jobName + ", company=" + company + ", address=" + address + ", salary=" + salary
                + ", date=" + date + ", jobURL=" + jobURL;
    }



    public String getJobName() {
        return jobName;
    }
    public void setJobName(String jobName) {
        this.jobName = jobName;
    }
    public String getCompany() {
        return company;
    }
    public void setCompany(String company) {
        this.company = company;
    }
    public String getAddress() {
        return address;
    }
    public void setAddress(String address) {
        this.address = address;
    }
    public String getSalary() {
        return salary;
    }
    public void setSalary(String salary) {
        this.salary = salary;
    }
    public String getDate() {
        return date;
    }
    public void setDate(String date) {
        this.date = date;
    }
    public String getJobURL() {
        return jobURL;
    }
    public void setJobURL(String jobURL) {
        this.jobURL = jobURL;
    }
}

Next, write a utility class for saving the list, so that it can be saved at any stage.

import java.io.*;
import java.util.*;

/**
 * Provides:
 * 1. Saving a JobBean list to a local file
 * 2. Loading the local file back into a JobBean list (with filtering)
 * @author PowerZZJ
 *
 */
public class JobBeanUtils {
    
    /** Appends one JobBean to the local file.
     * @param job the JobBean to save
     */
    public static void saveJobBean(JobBean job) {
        try(BufferedWriter bw =
                new BufferedWriter(
                        new FileWriter("JobInfo.txt",true))){
            String jobInfo = job.toString();
            bw.write(jobInfo);
            bw.newLine();
            bw.flush();
        }catch(Exception e) {
            System.out.println("Failed to save JobBean");
            e.printStackTrace();
        }
    }
    
    /** Saves a JobBean list to the local file.
     * @param jobBeanList JobBean list
     */
    public static void saveJobBeanList(List<JobBean> jobBeanList) {
        System.out.println("Backing up the list to the local file");
        for(JobBean jobBean : jobBeanList) {
            saveJobBean(jobBean);
        }
        System.out.println("Backup finished, "+jobBeanList.size()+" records in total");
    }
    
    /** Loads the local file into a JobBean list (with filtering).
     * @return JobBean list
     */
    public static List<JobBean> loadJobBeanList(){
        List<JobBean> jobBeanList = new ArrayList<>();
        try(BufferedReader br = 
                new BufferedReader(
                        new FileReader("JobInfo.txt"))){
            String str = null;
            while((str=br.readLine())!=null) {
                // Filter: some company names contain "," and break the format; skip those lines
                try {
                    String[] datas = str.split(","); 
                    // Each offset skips the "key=" prefix (plus the space after the comma)
                    String jobName = datas[0].substring(8);
                    String company = datas[1].substring(9);
                    String address = datas[2].substring(9);
                    String salary = datas[3].substring(8);
                    String date = datas[4].substring(6);
                    String jobURL = datas[5].substring(8);
                    // Filter: every field must be non-empty, the salary must be a range,
                    // and the URL must start with http before a JobBean is created
                    if (jobName.equals("") || company.equals("") || address.equals("") || salary.equals("")
                            || !(salary.contains("-"))|| date.equals("") || !(jobURL.startsWith("http")))
                        continue;
                    JobBean jobBean = new JobBean(jobName, company, address, salary, date, jobURL);
                    // Add to the list
                    jobBeanList.add(jobBean);
                }catch(Exception e) {
                    System.out.println("Local load filter: skipping malformed line: "+str);
                    continue;
                }
            }
            System.out.println("Load finished, "+jobBeanList.size()+" records read");
            return jobBeanList;
        }catch(Exception e) {
            System.out.println("Failed to load the JobBean list");
            e.printStackTrace();
        }
        return jobBeanList;
    }
}
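Splitting on "," is the reason a company name containing a comma breaks a line. As an alternative sketch (not the approach used above), the line can be cut at the `, key=` markers that `toString` writes, so commas inside a value survive. `JobLineParser` and `parse` are hypothetical names, and the sketch assumes that no value itself contains a later marker such as `, address=`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JobLineParser {
    // Field names in the order JobBean.toString() writes them
    private static final String[] KEYS = {"jobName", "company", "address", "salary", "date", "jobURL"};

    // Returns field name -> value in order, or null if an expected marker is missing.
    static Map<String, String> parse(String line) {
        if (!line.startsWith(KEYS[0] + "=")) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        int valueStart = KEYS[0].length() + 1;
        for (int i = 1; i < KEYS.length; i++) {
            // A value ends where the next ", key=" marker begins
            String marker = ", " + KEYS[i] + "=";
            int at = line.indexOf(marker, valueStart);
            if (at < 0) {
                return null;
            }
            fields.put(KEYS[i - 1], line.substring(valueStart, at));
            valueStart = at + marker.length();
        }
        // The last value runs to the end of the line
        fields.put(KEYS[KEYS.length - 1], line.substring(valueStart));
        return fields;
    }
}
```

With this, a line whose company is `Foo, Inc.` keeps the full company name instead of being dropped by the filter.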

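Next comes the key part, the scraping itself.

[Image 2]

Each element with the class el holds the information we need. The first el is the header row and has to be dropped later.

Each one contains elements with the classes t1, t2, t3, t4 and t5; take them out one by one, in order.

Then look at the "next page" element, which has the class bk. Note that there are two bk elements: the first is "previous page" and the second is "next page".

Before I noticed this, my crawler went into an infinite loop...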
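One way to guard against this kind of loop is to remember which URLs have already been fetched and stop as soon as one repeats. The sketch below is an illustration under stated assumptions rather than the Spider class itself: `PageWalker`, `walk` and `maxPages` are hypothetical, and the `nextPage` function stands in for "fetch the page and read its next-page link".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class PageWalker {
    // Follows next-page links from startUrl and returns the URLs visited, in order.
    // nextPage returns the following URL, or "" / null when there is none.
    static List<String> walk(String startUrl, Function<String, String> nextPage, int maxPages) {
        Set<String> visited = new LinkedHashSet<>();
        String url = startUrl;
        while (url != null && !url.isEmpty() && visited.size() < maxPages) {
            if (!visited.add(url)) {
                break; // already seen: following it again would never terminate
            }
            url = nextPage.apply(url);
        }
        return new ArrayList<>(visited);
    }
}
```

If the "previous page" link is picked up by mistake, page 2 points back to page 1 and the walk stops after two pages instead of running forever.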

Finally, a single spider method ties together extracting the listings and moving on to the next page.

import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/** Scrapes job listings from the result pages.
 * @author PowerZZJ
 *
 */
public class Spider {
    // Number of the page being scraped
    private static int pageCount = 1;
    
    private String strURL;
    private String nextPageURL;
    private Document document;// the whole parsed page
    private List<JobBean> jobBeanList;
    
    public Spider(String strURL) {
        this.strURL = strURL;
        nextPageURL = strURL;// start from the given URL so the loop can begin there
        jobBeanList = new ArrayList<>();
        
    }
    
    /** Fetches and parses a page.
     * @param strURL page address
     * @return the parsed page, or null on failure
     */
    public Document getDom(String strURL) {
        try {
            URL url = new URL(strURL);
            // Parse, with a timeout in milliseconds
            document = Jsoup.parse(url, 4000);
            return document;
        }catch(Exception e) {
            System.out.println("getDom failed");
            e.printStackTrace();
        }
        return null;
    }
    

    /** Extracts the listings on the current page as JobBean objects and adds them to the list.
     * @param document the parsed page
     */
    public void getPageInfo(Document document) {
        // The CSS selector "#resultList .el" matches the el rows
        Elements elements = document.select("#resultList .el");
        // Drop the header row (guarded so an empty result page does not throw)
        if(!elements.isEmpty()) {
            elements.remove(0);
        }
        // Extract the fields
        for(Element element: elements) {
            Elements elementsSpan = element.select("span");
            String jobURL = elementsSpan.select("a").attr("href");
            String jobName = elementsSpan.get(0).select("a").attr("title");
            String company = elementsSpan.get(1).select("a").attr("title");
            String address = elementsSpan.get(2).text();
            String salary = elementsSpan.get(3).text();
            String date = elementsSpan.get(4).text();
            // Create the JobBean
            JobBean jobBean = new JobBean(jobName, company, address, salary, date, jobURL);
            // Add to the list
            jobBeanList.add(jobBean);
        }
    }
    
    /** Gets the URL of the next page.
     * @param document the parsed page
     * @return the URL, "" when there is no next page, or null on failure
     */
    public String getNextPageURL(Document document) {
        try {
            Elements elements = document.select(".bk");
            // The second bk is "next page"
            Element element = elements.get(1);
            // attr returns "" when the element has no link
            nextPageURL = element.select("a").attr("href");
            if(nextPageURL != null) {
                System.out.println("---------"+(pageCount++)+"--------");
                return nextPageURL;
            }
        }catch(Exception e) {
            System.out.println("Failed to get the next page URL");
            e.printStackTrace();
        }
        return null;
    }
    
    
    /** Starts the crawl.
     * 
     */
    public void spider() {
        // Stop on "" (no next page) and on null (a failure in getNextPageURL)
        while(nextPageURL != null && !nextPageURL.equals("")) {
            // Fetch the page
            document = getDom(nextPageURL);
            if(document == null) {
                break;
            }
            // Add its listings to the list
            getPageInfo(document);
            // Find the URL of the next page
            nextPageURL = getNextPageURL(document);
        }
    }
    
    // Returns the JobBean list
    public List<JobBean> getJobBeanList() {
        return jobBeanList;
    }
}

Now test scraping and saving:

import java.util.ArrayList;
import java.util.List;

public class Test1 {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();
        // Big data + Shanghai
        String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=1&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";        

        // Test Spider and saving
        Spider spider = new Spider(strURL);
        spider.spider();
        // The JobBean list produced by the crawl
        jobBeanList = spider.getJobBeanList();
        
        // Save the list locally with the JobBean utility class
        JobBeanUtils.saveJobBeanList(jobBeanList);
    
        // Load it back from the local file, with filtering
        jobBeanList = JobBeanUtils.loadJobBeanList();
        
    }
}

A JobInfo.txt file now exists locally.

[Image 3]

The next step is to put the JobBean list into MySQL. My database is named 51job and the table is named jobInfo. Every column is a string; I just went with strings.

[Image 4]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectMySQL {
    // Database settings
    private static final String DBaddress = "jdbc:mysql://localhost/51job?serverTimezone=UTC";
    private static final String userName = "root";
    private static final String password = "Woshishabi2813";
    
    private Connection conn;
    
    // Load the driver and connect to the database
    public ConnectMySQL() {
        LoadDriver();
        // Connect to the database
        try {
            conn = DriverManager.getConnection(DBaddress, userName, password);
        } catch (SQLException e) {
            System.out.println("Database connection failed");
        }
    }
    
    // Load the driver
    private void LoadDriver() {
        try {
            Class.forName("com.mysql.cj.jdbc.Driver");
            System.out.println("Driver loaded");
        } catch (Exception e) {
            System.out.println("Failed to load the driver");
        }
    }
    
    // Get the connection
    public Connection getConn() {
        return conn;
    }
}

Next, write the utility class for the database operations.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;


public class DBUtils {
    
    /** Stores the JobBean list in the database (records that fail to insert are skipped).
     * @param conn database connection
     * @param jobBeanList JobBean list
     */
    public static void insert(Connection conn, List<JobBean> jobBeanList) {
        System.out.println("Inserting data");
        // Placeholders let the driver escape the values, so a quote in a job title
        // cannot break the statement
        String command = "insert into jobInfo values(?,?,?,?,?,?)";
        try(PreparedStatement ps = conn.prepareStatement(command)) {
            for(JobBean j: jobBeanList) {
                try {
                    ps.setString(1, j.getJobName());
                    ps.setString(2, j.getCompany());
                    ps.setString(3, j.getAddress());
                    ps.setString(4, j.getSalary());
                    ps.setString(5, j.getDate());
                    ps.setString(6, j.getJobURL());
                    ps.executeUpdate();
                } catch (Exception e) {
                    System.out.println("Skipped a record that could not be inserted: "+j.getJobName());
                }
            }
        } catch (Exception e) {
            System.out.println("Could not prepare the insert statement");
        }
        System.out.println("Insert finished");

    }
    
    /** Reads the JobBean list back from the database.
     * @param conn database connection
     * @return JobBean list (empty if the query fails)
     */
    public static List<JobBean> select(Connection conn){
        List<JobBean> jobBeanList  = new ArrayList<>();

        String command = "select * from jobInfo";
        try(PreparedStatement ps = conn.prepareStatement(command);
                ResultSet rs = ps.executeQuery()) {
            while(rs.next()) {
                JobBean jobBean = new JobBean(rs.getString(1), 
                            rs.getString(2), 
                            rs.getString(3), 
                            rs.getString(4),
                            rs.getString(5),
                            rs.getString(6));

                jobBeanList.add(jobBean);
            }
        } catch (Exception e) {
            System.out.println("Database query failed");
        }
        // Returning the list rather than null lets callers iterate safely after a failure
        return jobBeanList;
    }
}

 

Then test it:

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

public class Test2 {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();
        jobBeanList = JobBeanUtils.loadJobBeanList();

        // Database test
        ConnectMySQL cm = new ConnectMySQL();
        Connection conn = cm.getConn();
        
        // Insert test
        DBUtils.insert(conn, jobBeanList);
        // Select test
        jobBeanList = DBUtils.select(conn);
        for(JobBean j: jobBeanList) {
            System.out.println(j);
        }
    }
}

 

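[Image 5]

As the image above shows, even though the search was for "big data + Shanghai", unrelated postings such as operations engineer still appear. They will be filtered out later; for now everything goes into the database.

Next, an end-to-end test of all the features. Delete JobInfo.txt and recreate the database first.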

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;


public class TestMain {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();
        // Big data + Shanghai
        String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,1.html?lang=c&stype=1&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";        
//        // Java + Shanghai
//        String strURL = "https://search.51job.com/list/020000,000000,0000,00,9,99,java,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=";
        
        // Test of all features
        // The crawler
        Spider jobSpider = new Spider(strURL);
        jobSpider.spider();
        // The JobBean list after crawling
        jobBeanList = jobSpider.getJobBeanList();
        
        // Save the list locally with the JobBean utility class
        JobBeanUtils.saveJobBeanList(jobBeanList);
    
        // Load it back from the local file, with filtering
        jobBeanList = JobBeanUtils.loadJobBeanList();
    
        // Connect to the database and get the connection
        ConnectMySQL cm = new ConnectMySQL();
        Connection conn = cm.getConn();
        
        // Store the JobBean list in the database with the database utility class
        DBUtils.insert(conn, jobBeanList);
        
//        // Query the database with the utility class and get a JobBean list back
//        jobBeanList = DBUtils.select(conn);
//        
//        for(JobBean j: jobBeanList) {
//            System.out.println(j);
//        }
    }
}

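Each of these features can be used on its own; they do not have to be chained together like this.

The next steps are to read from the database, apply some simple filters, and then analyze the data.

First, the mind map:

[Image 6]

The first filters are by keyword and by date.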

 

import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

public class BaseFilter {
    private List<JobBean> jobBeanList;
    // Removing elements inside a for-each loop throws ConcurrentModificationException,
    // so collect the items to delete in a second list and call removeAll afterwards
    private List<JobBean> removeList;
    
    public BaseFilter(List<JobBean> jobBeanList) {
        removeList = new ArrayList<>();
        // This is the caller's own list object, so the caller sees the filtered
        // result whether or not it calls getJobBeanList
        this.jobBeanList = jobBeanList;
        printNum();
    }
    
    // Prints the number of JobBean objects in the list
    public void printNum() {
        System.out.println(jobBeanList.size()+" records now");
    }
    

    /** Filters by job title.
     * @param containJobName keyword a title must contain to be kept
     */
    public void filterJobName(String containJobName) {
        for(JobBean j: jobBeanList) {
            if(!j.getJobName().contains(containJobName)) {
                removeList.add(j);
            }
        }
        jobBeanList.removeAll(removeList);
        removeList.clear();
        printNum();
    }
    
    /** Filters by date: keeps only postings published today.
     * Dates are expected in the form MM-dd.
     */
    public void filterDate() {
        Calendar now=Calendar.getInstance();
        int nowMonth = now.get(Calendar.MONTH)+1;
        int nowDay = now.get(Calendar.DATE);
        
        for(JobBean j: jobBeanList) {
            String[] date = j.getDate().split("-");
            int jobMonth = Integer.valueOf(date[0]);
            int jobDay = Integer.valueOf(date[1]);
            if(!(jobMonth==nowMonth && jobDay==nowDay)) {
                removeList.add(j);
            }
        }
        jobBeanList.removeAll(removeList);
        removeList.clear();
        printNum();
    }
    
    public List<JobBean> getJobBeanList(){
        return jobBeanList;
    }
    
}
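Since Java 8, `Collection.removeIf` performs the collect-then-remove step in a single call, and `java.time.MonthDay` can compare MM-dd strings without Calendar arithmetic. Here is a minimal sketch of the same two filters, assuming a simplified `Posting` class in place of JobBean; `Posting` and `PostingFilter` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.MonthDay;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Simplified stand-in for JobBean: only the two fields the filters look at
class Posting {
    final String title;
    final String date; // "MM-dd", as scraped

    Posting(String title, String date) {
        this.title = title;
        this.date = date;
    }
}

class PostingFilter {
    private static final DateTimeFormatter MM_DD = DateTimeFormatter.ofPattern("MM-dd");

    // Removes, in place, every posting whose title lacks the keyword.
    static void keepTitlesContaining(List<Posting> postings, String keyword) {
        postings.removeIf(p -> !p.title.contains(keyword));
    }

    // Removes, in place, every posting not published on the given day.
    static void keepPostedOn(List<Posting> postings, LocalDate day) {
        MonthDay target = MonthDay.from(day);
        postings.removeIf(p -> !MonthDay.parse(p.date, MM_DD).equals(target));
    }
}
```

Passing `LocalDate.now()` to `keepPostedOn` gives the same "published today" behaviour as `filterDate`.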

Test the filters:

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;


public class Test3 {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();
        // Read the JobBean list from the database
        ConnectMySQL cm = new ConnectMySQL();
        Connection conn = cm.getConn();
        jobBeanList = DBUtils.select(conn);
        
        BaseFilter bf = new BaseFilter(jobBeanList);
        // Filter by date
        bf.filterDate();
        // Filter by keyword; a title must contain both keywords to be kept
        bf.filterJobName("数据");
        bf.filterJobName("分析");
        
        for(JobBean j: jobBeanList) {
            System.out.println(j);
        }
    }
}

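[Image 7]

Up to this point the functionality is generic. The analysis that follows depends on the kind of job and on what you want to find out, although the approach is much the same each time.

Here the analysis covers the "big data + Shanghai" results. To get a larger sample, any title containing "数据" (data) is kept, which leaves 247 postings.

I used the tablesaw package because I had seen it recommended. When I ran into problems, searching turned up almost nothing apart from the official documentation, which I read over and over. It also cannot draw charts on its own

and needs additional dependencies, so I will just produce tables... I have given up on visualization (why didn't I use Python...).

The analysis has little that calls for object-oriented design; it is basically one long main method. See the official documentation for how the API works, and treat this as a look at the results.

Salaries are converted to a common unit, 万/月 (10,000 CNY per month).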
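The conversion can be sketched as a single function. It follows the same rule as the analysis code in this post: take the lower bound of the range, divide 千/月 (thousand per month) by 10 and 万/年 (ten thousand per year) by 12, then round to two decimals. `SalaryNormalizer` and the -1 return value for unrecognised input are assumptions made for illustration; the unit strings are written as Unicode escapes so the source stays ASCII.

```java
class SalaryNormalizer {
    private static final String WAN_PER_MONTH = "\u4e07/\u6708";  // 10,000 CNY per month
    private static final String QIAN_PER_MONTH = "\u5343/\u6708"; // 1,000 CNY per month
    private static final String WAN_PER_YEAR = "\u4e07/\u5e74";   // 10,000 CNY per year

    // Returns the lower bound of a salary range in units of 10,000 CNY per month,
    // rounded to two decimals, or -1 if the unit or the number is not recognised.
    static double lowerBoundWanPerMonth(String salary) {
        double divisor;
        if (salary.endsWith(WAN_PER_MONTH)) {
            divisor = 1;
        } else if (salary.endsWith(QIAN_PER_MONTH)) {
            divisor = 10;
        } else if (salary.endsWith(WAN_PER_YEAR)) {
            divisor = 12;
        } else {
            return -1;
        }
        // "8-10<unit>" -> "8"; a salary without "-" leaves the unit attached and fails to parse
        String lowText = salary.split("-")[0];
        try {
            double low = Double.parseDouble(lowText);
            return Math.round(low / divisor * 100) / 100.0;
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

For example, 8-10千/月 becomes 0.8 and 20-30万/年 becomes 1.67 (20 / 12 rounded to two decimals).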

import static tech.tablesaw.aggregate.AggregateFunctions.*;

import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

import tech.tablesaw.api.*;

public class Analyze {
    public static void main(String[] args) {
        List<JobBean> jobBeanList = new ArrayList<>();

        ConnectMySQL cm = new ConnectMySQL();
        Connection conn = cm.getConn();
        jobBeanList = DBUtils.select(conn);
        
        BaseFilter bf = new BaseFilter(jobBeanList);
        bf.filterDate();
        bf.filterJobName("数据");
        int nums = jobBeanList.size();
        
        // Analysis
        // Collect the fields into arrays, one per table column, so the table can be sorted by salary
        String[] jobNames = new String[nums];
        String[] companys = new String[nums];
        String[] addresss = new String[nums];
        double[] salarys = new double[nums];
        String[] jobURLs = new String[nums];
        for(int i=0; i<nums; i++) {
            JobBean j = jobBeanList.get(i);
            String jobName = j.getJobName();
            String company = j.getCompany();
            // Extract the district name from the address
            String address;
            if(j.getAddress().contains("-")) {
                address = j.getAddress().split("-")[1];
            }else{
                address = j.getAddress();
            }
            
            // Convert the salary to a common unit (万/月), using the lower bound of the range
            String sSalary = j.getSalary();
            double dSalary;
            if(sSalary.contains("万/月")) {
                dSalary = Double.valueOf(sSalary.split("-")[0]);
            }else if(sSalary.contains("千/月")) {
                dSalary = Double.valueOf(sSalary.split("-")[0])/10;
                dSalary = (double) Math.round(dSalary * 100) / 100;
            }else if(sSalary.contains("万/年")) {
                dSalary = Double.valueOf(sSalary.split("-")[0])/12;
                dSalary = (double) Math.round(dSalary * 100) / 100;
            }else {
                dSalary = 0;
                System.out.println("Salary conversion failed");
                continue;
            }
            String jobURL = j.getJobURL();
            
            jobNames[i] = jobName;
            companys[i] = company;
            addresss[i] = address;
            salarys[i] = dSalary;
            jobURLs[i] = jobURL;
        }
        
        Table jobInfo = Table.create("Job Info")
                .addColumns(
                    StringColumn.create("jobName", jobNames),
                    StringColumn.create("company", companys),
                    StringColumn.create("address", addresss),
                    DoubleColumn.create("salary", salarys),
                    StringColumn.create("jobURL", jobURLs)
                        );
        
//        System.out.println("All of Shanghai");
//        System.out.println(salaryInfo(jobInfo));
        
        
        List<Table> addressJobInfo = new ArrayList<>();
        // Split by district
        Table ShanghaiJobInfo = chooseByAddress(jobInfo, "上海");
        Table jingAnJobInfo = chooseByAddress(jobInfo, "静安区");
        Table puDongJobInfo = chooseByAddress(jobInfo, "浦东新区");
        Table changNingJobInfo = chooseByAddress(jobInfo, "长宁区");
        Table minHangJobInfo = chooseByAddress(jobInfo, "闵行区");
        Table xuHuiJobInfo = chooseByAddress(jobInfo, "徐汇区");
        // Too few postings
//        Table songJiangJobInfo = chooseByAddress(jobInfo, "松江区");
//        Table yangPuJobInfo = chooseByAddress(jobInfo, "杨浦区");
//        Table hongKouJobInfo = chooseByAddress(jobInfo, "虹口区");
//        Table OtherInfo = chooseByAddress(jobInfo, "异地招聘");
//        Table puTuoJobInfo = chooseByAddress(jobInfo, "普陀区");        
        addressJobInfo.add(jobInfo);
        // Postings listed under Shanghai without a district
        addressJobInfo.add(ShanghaiJobInfo);
        addressJobInfo.add(jingAnJobInfo);
        addressJobInfo.add(puDongJobInfo);
        addressJobInfo.add(changNingJobInfo);
        addressJobInfo.add(minHangJobInfo);
        addressJobInfo.add(xuHuiJobInfo);
//        addressJobInfo.add(songJiangJobInfo);
//        addressJobInfo.add(yangPuJobInfo);
//        addressJobInfo.add(hongKouJobInfo);
//        addressJobInfo.add(puTuoJobInfo);
//        addressJobInfo.add(OtherInfo);

        for(Table t: addressJobInfo) {
            System.out.println(salaryInfo(t));
        }
        
        for(Table t: addressJobInfo) {
            System.out.println(sortBySalary(t).first(10));
        }
        
    }
    
    // Salary mean, standard deviation, median, max and min
    public static Table salaryInfo(Table t) {        
        return t.summarize("salary",mean,stdDev,median,max,min).apply();
    }
    
    // Sort by salary, descending
    public static Table sortBySalary(Table t) {
        return t.sortDescendingOn("salary");
    }
    
    // Select the rows for one district
    public static Table chooseByAddress(Table t, String address) {
        Table t2 = Table.create(address)
                .addColumns(
                    StringColumn.create("jobName"),
                    StringColumn.create("company"),
                    StringColumn.create("address"),
                    DoubleColumn.create("salary"),
                    StringColumn.create("jobURL"));
        for(Row r: t) {
            if(r.getString(2).equals(address)) {
                t2.addRow(r);
            }
        }
        return t2;
    }
}
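For the summary statistics alone, the standard library can stand in for tablesaw. The sketch below groups postings by district and computes count, min, max and mean with `DoubleSummaryStatistics`; it does not provide the median or standard deviation. `DistrictStats` and its nested `Job` class are hypothetical stand-ins for the table built above, not part of the original program.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class DistrictStats {
    // Simplified stand-in for one table row: district name and salary in 10,000 CNY per month
    static class Job {
        final String district;
        final double salary;

        Job(String district, double salary) {
            this.district = district;
            this.salary = salary;
        }
    }

    // Groups the jobs by district (sorted by name) and summarises the salaries of each group.
    static Map<String, DoubleSummaryStatistics> salaryByDistrict(List<Job> jobs) {
        return jobs.stream().collect(Collectors.groupingBy(
                (Job j) -> j.district,
                TreeMap::new,
                Collectors.summarizingDouble((Job j) -> j.salary)));
    }
}
```

Each map value exposes `getCount()`, `getMin()`, `getMax()` and `getAverage()`, which covers most of what `salaryInfo` prints per district.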

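The first half of the output is the summary for each district.

[Image 8]

 

The second half lists the ten highest-paying postings in each district. As you can see, tablesaw's tables are about as ugly as tables get...

Each jobURL can be opened directly in a browser.

[Image 9]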

 

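Testing with a different URL

I want to find a Java development job.

Replace strURL in TestMain with the Java + Shanghai URL:

https://search.51job.com/list/020000,000000,0000,00,9,99,java,2,1.html?lang=c&stype=&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&providesalary=99&lonlat=0%2C0&radius=-1&ord_field=0&confirmdate=9&fromType=&dibiaoid=0&address=&line=&specialarea=00&from=&welfare=

Delete JobInfo.txt and recreate the database.

Run it. The crawl covered more than 270 pages. The local JobInfo.txt:

[Image 10]

The database:

[Image 11]

 

Then, in Analyze, change bf.filterJobName("数据");

to "Java", add another call with "开发" (development), and run it.

[Image 12]

[Image 13]

[Image 14]

All the information is there. As for the analysis itself, for now I will only comment briefly on the tables...

One extension I would like to add is to follow each jobURL and compile statistics on the job requirements. I have not done this yet; I may work on it over the summer if I am still interested.

The database could also be redesigned: split the salary into a minimum and a maximum and store them as double values, which would make later analysis easier.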

 
