Introduction
With the rapid development of AI and the arrival of the big data era, Python not only plays a prominent role in AI but also has distinct advantages in data processing. It is increasingly important in web development, network programming, automated operations and maintenance, game development, finance, and other fields.
Listed companies and companies that do not need financing keep strict control over how funds are allocated. As a result, they set more stringent staffing requirements, cut redundant personnel, and direct funds to other core areas. This has created fierce competition among applicants in the Internet industry.
More graduates are choosing to enter this industry. We see the same career goal among the students around us, who major in communication and new media. To address this practical problem, we originally set out to identify which positions are most promising in the Internet industry. After mining data from Lagou, however, we instead explore how basic factors such as urban demand, salary level, and professional skills relate to hiring in the Internet industry. Company conditions also hint at the future trend of the industry. From statistics on data analyst postings, we can build a clear picture of the talent profile the Internet industry most requires.
Methods
Preliminary preparation
Lagou.com is a recruitment website that provides job opportunities for Internet practitioners. We searched the website for the keyword ‘data analysis’ and found 30 pages of results with 15 job postings per page. The information we need includes position, company, salary, city, work experience, education, domain, company financing, company scale, and job description.
Retrieve data from dynamic web pages
Visiting Lagou.com in the Chrome browser and opening the console, we find that the site is a dynamic web page whose data is loaded with ‘ajax’. ‘DevTools’ shows the data we need: Request Headers and Form Data. The POST parameter ‘kd’ is the search keyword, and ‘pn’ is the page number. To obtain the data, we analyze the interface and use ‘requests’ to call it directly and get JSON data.
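The request described above can be sketched roughly as follows. This is a minimal illustration, not our exact crawler: the two URLs and the helper names are assumptions made for the example, and only the ‘kd’ and ‘pn’ parameters come from the DevTools analysis.

```python
# Hypothetical endpoint and Referer: neither URL is stated in this article.
AJAX_URL = "https://www.lagou.com/jobs/positionAjax.json"
LIST_URL = "https://www.lagou.com/jobs/list_data%20analysis"


def build_form_data(keyword, page):
    """Form Data for one results page: 'kd' is the search keyword, 'pn' the page number."""
    return {"kd": keyword, "pn": str(page)}


def build_headers(user_agent="Mozilla/5.0"):
    """Request Headers modelled on DevTools; the real site may check more fields."""
    return {"User-Agent": user_agent, "Referer": LIST_URL}


def fetch_page(session, keyword, page):
    """POST the form to the ajax interface and return the decoded JSON body.

    `session` is any object that behaves like requests.Session.
    """
    response = session.post(
        AJAX_URL,
        data=build_form_data(keyword, page),
        headers=build_headers(),
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

With the ‘requests’ library installed, a call would look like fetch_page(requests.Session(), "data analysis", 1). Passing the session in as an argument keeps the form-building logic easy to check without touching the network.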
Deal with anti-crawling
During crawling, we found that Lagou.com’s anti-crawling mechanism is effective. Even with ‘Referer’ and ‘User-Agent’ information added, the system still recognizes the crawler and blocks further requests after a few attempts, returning the prompt: “You have operated too frequently, please visit again later”. So we obtained ‘cookies’ information using ‘requests.Session’. To avoid crawling too fast, we also set a delay of 10 seconds between requests. We parse the data with ‘xpath’ syntax to get the job description. In the end, 380 records were obtained and saved as a ‘CSV’ file.
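The throttling and storage steps can be sketched as below. This is an illustrative outline under stated assumptions: the function names and the shape of the rows are hypothetical, and `fetch` stands in for whatever function requests one page (in practice it would wrap a ‘requests.Session’ so that cookies persist between calls).

```python
import csv
import time


def crawl(fetch, pages, delay_seconds=10, sleep=time.sleep):
    """Call fetch(page) for each page, pausing between consecutive requests.

    `fetch` is any callable that returns a list of row dicts for one page.
    """
    rows = []
    for index, page in enumerate(pages):
        if index > 0:
            sleep(delay_seconds)  # throttle: the article uses a 10-second delay
        rows.extend(fetch(page))
    return rows


def save_csv(rows, path, fieldnames):
    """Write the collected rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Injecting `sleep` as a parameter is a design choice for the sketch: it lets the delay logic be exercised without actually waiting.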
Data cleaning and processing
Because the dataset is small, we use Excel for data cleaning and processing. First, so that internships do not distort the salary analysis, we remove postings that recruit interns; a keyword search eliminates 12 records. Second, for the industry field, some companies list multiple domains, so we keep only the most important one. Third, salaries are posted as custom ranges with no uniform standard, which is inconvenient for later visualization, so we use the median of each range as the measure. Using Excel’s Data > Text to Columns feature with ‘-’ as the delimiter, we split each range into its minimum and maximum values and then compute the median with a function. The resulting data is shown in the figure:
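We did this cleaning in Excel, but the same two steps can be sketched in Python for readers who prefer a script. The sketch assumes salaries are written like ‘10k-20k’ and that each row has a ‘position’ field; the function names and the default keyword are hypothetical.

```python
import re


def salary_midpoint(salary_range):
    """Return the midpoint of a range such as '10k-20k', in thousands per month.

    For two values the midpoint equals the median. Returns None when the
    text does not match the assumed 'MINk-MAXk' format.
    """
    match = re.fullmatch(r"\s*(\d+)\s*[kK]\s*-\s*(\d+)\s*[kK]\s*", salary_range)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2


def drop_interns(rows, keywords=("intern",)):
    """Remove postings whose position title contains an internship keyword."""
    return [
        row for row in rows
        if not any(word in row["position"].lower() for word in keywords)
    ]
```

Returning None for unexpected formats, rather than guessing, makes it easy to spot rows that need manual review.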
Visualization and Results
Analysis of relevant company conditions
In terms of industry conditions and company size, mobile Internet accounts for 30% of demand and is the hottest industry for data analysis. E-commerce, data services, entertainment, and finance each account for about 10%. Demand in areas such as medical care, software development, and information security is relatively small. This reflects the huge amount of data in the mobile Internet industry and its demand for talent. Companies generally need people for data analysis positions regardless of their financing situation and size.
Analysis of urban demand
From the analysis above, we can see that demand is mainly concentrated in China’s three major economic circles (Beijing-Tianjin-Hebei, the Yangtze River Delta, and the Pearl River Delta) and in central cities. In terms of specific cities, Beijing, Shanghai, Shenzhen, and Guangzhou make up the top four, followed by Hangzhou and Chengdu. The number of jobs in these cities is significantly higher than in other cities, which is in line with their current level of economic development.
Analysis of academic requirements and work experience
In terms of education requirements, 87.1% of the companies require a bachelor’s degree or higher, and 3.16% even require a master’s degree or higher. Most companies therefore set a high bar for candidates. There are two likely reasons. On the one hand, data analysis requires a high level of knowledge: in addition to mathematics, it draws on a lot of related industry knowledge. On the other hand, with the continuous development of the mobile Internet and the arrival of the big data era, everything is updated and iterated very quickly, which raises the requirements for candidates.
In terms of work experience, the most common requirement is 3–5 years, accounting for about 45% of postings, while postings that require less than one year of experience account for only 1.32%. So for students who have not yet graduated, internship experience is very important for getting a good position in the data analysis industry.
Analysis of salary level
The three bar charts of salary level show that the three cities with the highest salaries in the data analysis industry are Beijing, Shanghai, and Shenzhen. Compared with the overall salary ranking of Chinese cities, Shenzhen stands out above Guangzhou, perhaps because many Internet companies are based in Shenzhen. The average salary in the six cities is about 10k-23k per month, while in the top three it is about 20k-23k, with little difference among them.
In terms of industry, salaries in the cultural entertainment and consumer life industries are noticeably higher than in others, at about 24k-26k per month. Salaries in other industries are 16k-20k per month and do not differ much from one another. This is in line with the pattern of talent distribution and shows that talent is unevenly distributed in the market. However, it also suggests that there are many new opportunities for data analysts in the cultural entertainment and consumer life industries.
Salary level and work experience show a clear positive correlation with a similar slope, but growth gradually slows. The golden growth period falls between 3 and 5 years of experience, which is consistent with a normal-distribution pattern. The highest salary data analysts can get is around 28k per month. Demand for fresh graduates is also large, and they can earn around 8k per month.
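The city, industry, and experience comparisons above all reduce to the same operation: averaging the salary midpoint within each group. A minimal sketch of that step follows; the function name and the ‘salary’ field are assumptions for illustration, and the charts in this article were not necessarily produced this way.

```python
from collections import defaultdict


def mean_salary_by(rows, key):
    """Average the 'salary' field of rows grouped by `key`, highest first."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for row in rows:
        bucket = totals[row[key]]
        bucket[0] += row["salary"]
        bucket[1] += 1
    means = {group: total / count for group, (total, count) in totals.items()}
    # Sort groups from highest to lowest average salary
    return dict(sorted(means.items(), key=lambda item: item[1], reverse=True))
```

Calling mean_salary_by(rows, "city"), mean_salary_by(rows, "domain"), or mean_salary_by(rows, "experience") on the cleaned rows would give the values behind each of the three bar charts, assuming the rows carry those fields.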
Analysis of professional skills
As we can see, what most companies require of data analyst candidates, regardless of industry or position, is the ability to use professional software. The most required computer skills are SQL, Python, SPSS, PPT, and Excel, a combination of programming tools and office software. Data analysis is a practical job whose core demand is data processing and analysis capability. Graduates should therefore master data processing tools thoroughly rather than superficially, so that they can develop a good understanding of business and operations.
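One simple way to rank skills like these is to count how many job descriptions mention each one. The sketch below illustrates the idea; the function name and the matching rule are assumptions, not necessarily the method behind our figures.

```python
import re
from collections import Counter

SKILLS = ("SQL", "Python", "SPSS", "PPT", "Excel")


def count_skill_mentions(descriptions, skills=SKILLS):
    """Count how many job descriptions mention each skill at least once."""
    counts = Counter()
    for text in descriptions:
        for skill in skills:
            # Reject matches that touch other ASCII letters, so "SQL" is not
            # counted inside "MySQL" but is still found next to Chinese text.
            pattern = rf"(?<![A-Za-z]){re.escape(skill)}(?![A-Za-z])"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[skill] += 1
    return counts
```

The lookarounds are used instead of `\b` because Python treats Chinese characters as word characters, so `\b` would miss a skill name written directly against Chinese text.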
Conclusions
The employment outlook for data analysis is optimistic. In China, demand for related jobs is still concentrated in the three major economic circles, especially in Beijing, Shanghai, and Shenzhen. By industry, demand is mainly concentrated in mobile Internet, data services, and e-commerce. Most data analysis positions require a bachelor’s degree, and the required work experience is mostly 1–5 years. Because of the development of big data and artificial intelligence, salaries for data analysis positions have risen. Monthly salaries range from 10k to 30k. Salaries in Beijing and Shanghai are higher than in other cities, and salaries in the entertainment and consumer life industries are higher than in other industries. The most required computer skills are SQL, Python, SPSS, PPT, and Excel. In terms of skills, companies pay more attention to data analysis and to understanding of business and products.
Thanks for reading. You can see our code on GitHub:
Lagou.com data analysis job posting crawler