Common Interview Questions for Big Data Data Scientists
Choose a job you love, and you will never have to work a day in your life. — Confucius
Introduction
An interview is a formal meeting between an employer and a job applicant. It is the first step toward the job a candidate aspires to, and it gives the employer an equal opportunity to appraise the applicant's qualifications, appearance and overall fit for the opening. For technical roles such as data science, an interview can contain an enormous set of questions meant to prove your skills and your command of the subjects that best suit the organization's requirements.
Data science jobs have risen sharply: 650% job growth since 2012 (source: LinkedIn), with 11.5 million new jobs expected by 2026 (source: U.S. Bureau of Labor Statistics). The field is getting bigger with each passing day, and as a result it is churning out plenty of opportunities for those interested in a career as a data scientist. So if you are preparing for any such interview, you have come to the right blog.
Motivation
Remember why you started
It may make you feel better, and calm your nerves, to know that everyone who is preparing for or giving data science interviews has their own story of struggle. It is all about having the patience to learn continuously. Data science interview questions cover a vast range of topics because the field is interdisciplinary, and those cheeky interviewers love to throw the odd curveball. To handle such situations, you need to answer these questions confidently and assure the interviewer that you have the relevant knowledge of the subject.
If you browse for data science material, there are plenty of articles out there, each explaining things with different examples, which can confuse you even more. And what if a question comes from a topic you did not study?
The main motive behind this blog is to bring everything together. It is divided into the following sections:
1. Technical questions, with the following subtopics:
a) Mathematics
b) Statistics
c) Coding
d) Machine Learning
2. Practical Experience questions
3. Soft Skills
4. Scenarios / Case Studies
Technical Questions
Interviewers expect candidates to have strong knowledge of mathematics, statistics, coding and machine learning. Candidates are likely to be asked to demonstrate hands-on skill in these areas, but should be prepared to show their theoretical understanding too.
Mathematics
Image: https://unsplash.com/photos/GzDrm7SYQ0g
Machine learning builds upon the language of mathematics to express concepts that seem intuitively obvious but are difficult to formalize. Mathematics underpins the study of machine learning, statistics, algorithms and computer architecture, among others. Applied mathematics is at the heart of the matter.
Showing good knowledge of math signals to the interviewer that you have a strong understanding of how the algorithms work. Here are some questions related to mathematics:
- What is the sum of the numbers from 1 to 100?
- What do you mean by the Fibonacci series?
- What is the equation of a line? What do you mean by the intercept and slope of a line?
- What is the difference between exponential and logarithmic functions?
- What are the different types of probability?
- What is the difference between the normal, binomial and Bernoulli distributions?
- What are the rules for differentiation and integration?
- What is the chain rule of differentiation?
- What is a matrix? Explain its properties.
- What are tensors? Why are they important?
- A snail is at the bottom of a well 50 ft deep. Each day it climbs up 3 ft, and each night it slides down 1 ft. How many days does it take to get out?
- You have a 10 * 10 * 10 cube made up of a thousand 1 * 1 * 1 cubes. If you remove the outer layer of this structure, how many cubes are left?
- A race track has 5 lanes. There are 25 cars, and we would like to find the 3 fastest of those 25. What is the minimum number of races needed to determine the 3 fastest cars?
- Four people want to cross a rickety bridge at night. They have a single torch, and the bridge is too dangerous to cross without one. The bridge is only strong enough to support two people at a time. Not everyone takes the same time to cross: 1 min, 2 mins, 7 mins and 10 mins. What is the shortest time needed for all four of them to cross the bridge?
- Consider an extension of rock, paper, scissors with N options instead of 3. For what values of N is it possible to construct a fair game, where "fair" means that for any move a player makes, an equal number of moves beat it and lose to it?
- In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?
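Several of the puzzles above are easy to sanity-check with a few lines of code. The following is a minimal Python sketch, not a model answer: the function names are hypothetical, the snail is assumed to start at the bottom of the well and to stop sliding once it reaches the top, and the bridge function relies on the standard observation that, for four people, only two crossing strategies need to be compared.

```python
def sum_to(n):
    """Sum of the integers 1..n using the closed form n(n+1)/2."""
    return n * (n + 1) // 2


def snail_days(depth_ft, climb_ft, slip_ft):
    """Days needed to leave the well: climb by day, slide by night,
    with no slide on the day the snail reaches the top."""
    height, days = 0, 0
    while True:
        days += 1
        height += climb_ft
        if height >= depth_ft:
            return days
        height -= slip_ft


def inner_cubes(side):
    """Unit cubes left after removing one outer layer from a cube."""
    return max(side - 2, 0) ** 3


def bridge_min_time_four(times):
    """Shortest crossing time for exactly four people, two at a time."""
    a, b, c, d = sorted(times)
    fastest_escorts_everyone = 2 * a + b + c + d
    two_slowest_cross_together = a + 3 * b + d
    return min(fastest_escorts_everyone, two_slowest_cross_together)


total = sum_to(100)
days = snail_days(50, 3, 1)
remaining = inner_cubes(10)
crossing = bridge_min_time_four([1, 2, 7, 10])
print(total, days, remaining, crossing)
```

In an interview the reasoning matters more than the code, but a quick simulation like this is a useful way to catch off-by-one mistakes, such as forgetting that the snail does not slide back on its last day.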
Statistics
Image: https://unsplash.com/photos/NDfqqq_7QWM
Did you know that data scientists were once called statisticians? The two professions are not one and the same, but many data scientists hold a statistics degree. Statistics is one of the founding fathers of data science. In an interview, you will be tested on your ability to reason logically about statistics, and it is important to use precise technical language. Consider the following list of questions:
- What is the central limit theorem, and why is it important?
- What is sampling, and how many sampling methods do you know?
- What is a null hypothesis, and how do we state it?
- How would you explain a linear regression to a business executive?
- What is heteroskedasticity, and how do you address it?
- How do you find the correlation between categorical values and continuous values?
- What are the assumptions of linear regression?
- What do terms like p-value, coefficient and R-squared value mean? What is the significance of each of these components?
- What is a statistical interaction?
- What do you mean by statistical power, and how do you calculate it?
- What is selection bias?
- What is an example of a data set with a non-Gaussian distribution?
- Please explain the difference between overfitting and underfitting.
- Explain what cross-validation is. How and why is it used?
- Explain bootstrapping as if you were talking to a non-technical person.
- State some biases that you are likely to encounter when cleaning a database.
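For the bootstrapping question, it can help to have a concrete picture in mind: resample the data with replacement many times, compute the statistic on each resample, and read the spread of those results as a measure of uncertainty. The sketch below uses only the Python standard library. The function name, the percentile method for the interval, and the toy data are all illustrative assumptions, not a prescribed procedure.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        # Each resample draws len(data) values with replacement.
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    lower_index = int(round((alpha / 2) * n_resamples))
    upper_index = int(round((1 - alpha / 2) * n_resamples)) - 1
    return means[lower_index], means[upper_index]


sample = list(range(1, 101))  # toy data with a true mean of 50.5
ci_low, ci_high = bootstrap_mean_ci(sample)
print(f"95% bootstrap CI for the mean: ({ci_low:.1f}, {ci_high:.1f})")
```

A plain-language version of the same idea: "we pretend our sample is the whole population, draw from it again and again, and see how much the answer moves."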
Coding
Image: https://unsplash.com/photos/OqtafYT5kTw
Every data scientist is expected to have a certain amount of programming knowledge. You do not need to be a pro, but you must know enough to use the libraries efficiently. You need a firm grip on the basics and the potential for continuous improvement.
Python, R and SQL are the bread and butter of programming in data science, and questions will be asked around these three staples. Some of them are listed below:
R Language
- Can you write and explain some of the most common syntax in R?
- How do you list the preloaded datasets in R?
- What are the different data structures in R? Please explain them.
- How do you load files such as .json, .csv and .xlsx using R?
- What is R Markdown? What is it used for?
- Explain the steps to build and evaluate a linear regression.
- Which packages are used for data imputation?
- How do you write a custom function in R? Explain with an example.
- Name some functions available in the "dplyr" package.
- How would you create a new R6 class?
- Tell me something about Shiny.
- What is the advantage of using the apply family of functions in R?
- What packages are used for data mining in R?
- Give examples of the "rbind()" and "cbind()" functions in R.
- Give examples of while and for loops in R.
- Give examples of the "select" and "filter" functions from the "dplyr" package.
- What is the use of the stringr package? Give some examples of its functions.
- How would you put multiple plots on a single page in R?
- How would you create a scatterplot using the ggplot2 package?
- How would you facet the data using the ggplot2 package?
- How do you create a series from vector values?
- Explain the "initialize()" function in R.
- How would you do a left and a right join in R?
- What is a factor? How would you create a factor in R?
- What are the different import functions in R?
- Name some functions that can be used for debugging in R.
- How would you check the distribution of a categorical variable in R?
- How would you rename the columns of a data frame?
- How are missing values and impossible values represented in R?
- How would you extract one particular word from a string?
- When is it appropriate to use the "next" statement in R?
- What is the use of with() and by() in R?
- What is a factor variable, and why would you use one?
- When is it appropriate to use the which() function?
- What is the difference between lapply and sapply?
- How do you merge two data frames in R?
- What command is used to store R objects in a file?
- How can you split a continuous variable into different groups/ranks in R?
- Please explain the key differences between Python and R.
Python
- What are the different data types used in Python?
- Explain the difference between a list and a tuple.
- What is a Python dictionary?
- Explain lambda functions.
- Explain list comprehensions and how they are used in Python.
- What is a negative index, and how is it used in Python?
- What are the commonly used Python libraries?
- What is pandas, and how is it useful?
- How do you read files using pandas?
- How can you create a Series in Python?
- What is the default missing value marker in pandas, and how can you detect all missing values in a DataFrame?
- How will you reverse a string in Python?
- Which Python library would you prefer for data wrangling?
- How can you build a simple logistic regression?
- What is the shortest way to open a text file in Python?
- Have you done web scraping in Python? How can you do that?
- Please explain what "pass" is in Python.
- Please explain the concept of pattern matching in Python.
- What tool would you use to find bugs?
- What is your preferred library for plotting in Python?
- What is the main difference between a pandas Series and a single-column DataFrame in Python?
- Write code to sort a DataFrame in both ascending and descending order.
- Why should we use NumPy arrays instead of nested Python lists?
- How can you train and interpret a linear regression model in scikit-learn?
- How can you handle duplicate values in a dataset for a variable in Python?
- Which Random Forest parameters can be tuned to enhance the predictive power of the model?
- How can you check whether a data set or time series is random?
- Can we create a DataFrame with multiple data types in Python? If yes, how can you do it?
- Which is the standard missing data marker used in pandas?
- How are NumPy and SciPy related?
- Which plot will you use to assess the uncertainty of a statistic?
- What are some features of pandas that you like or dislike?
- Which scientific libraries in SciPy have you worked with in your projects?
- What is pylab?
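A few of the core-language questions above have very short answers that are worth being able to write from memory. The snippet below is a small sketch using only built-in Python features; the variable names and sample values are illustrative.

```python
text = "data science"

# Reverse a string: a slice with step -1 walks the string backwards.
reversed_text = text[::-1]

# Negative index: -1 refers to the last element, -2 to the one before it.
last_char = text[-1]

# List comprehension: build a list from a loop and an optional filter.
squares_of_evens = [x * x for x in range(10) if x % 2 == 0]

# Lambda: a small anonymous function, often passed as a sort key.
libraries = ["pandas", "numpy", "scipy"]
by_length = sorted(libraries, key=lambda name: len(name))

# Dictionary: a mapping from keys to values.
uses = {"pandas": "data wrangling", "numpy": "arrays"}
pandas_use = uses["pandas"]

print(reversed_text, last_char, squares_of_evens, by_length, pandas_use)
```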
SQL
- You have a table with the columns CUST_ID, Order_Date, Order_ID and Tran_Amt. How would you select the top 100 customers with the highest spend over a long period of time?
- Describe the different parts of a SQL query.
- What are a primary key and a foreign key?
- What is the difference between UNION and UNION ALL?
- Write a SQL script that returns data from two tables.
- What is the difference between a primary key and a unique key?
- What is the difference between SQL, MySQL and SQL Server?
- What are the different types of JOINs, and how would you perform them?
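The top-spenders question above can be sketched with an aggregate query. The example below runs the SQL through Python's built-in sqlite3 module so that it is self-contained. The table name `orders`, the sample rows, the date window standing in for "a long period of time", and the limit of 2 (instead of 100) are all assumptions made for illustration; the original question does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("  # hypothetical table name
    "CUST_ID INTEGER, Order_Date TEXT, Order_ID INTEGER, Tran_Amt REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-05", 10, 100.0),
        (1, "2020-03-01", 11, 50.0),
        (2, "2020-02-10", 12, 400.0),
        (3, "2020-06-15", 13, 20.0),
        (3, "2019-12-31", 14, 999.0),  # outside the date window below
    ],
)

# Total spend per customer within a date window, highest first.
TOP_SPENDERS_SQL = """
SELECT CUST_ID, SUM(Tran_Amt) AS total_spend
FROM orders
WHERE Order_Date BETWEEN ? AND ?
GROUP BY CUST_ID
ORDER BY total_spend DESC
LIMIT ?
"""
top_spenders = conn.execute(
    TOP_SPENDERS_SQL, ("2020-01-01", "2020-12-31", 2)
).fetchall()

# UNION removes duplicate rows; UNION ALL keeps them.
union_rows = conn.execute("SELECT 1 UNION SELECT 1").fetchall()
union_all_rows = conn.execute("SELECT 1 UNION ALL SELECT 1").fetchall()

print(top_spenders, len(union_rows), len(union_all_rows))
```

Other databases express the row limit differently (for example `TOP` or `FETCH FIRST`), so it is worth mentioning the dialect you are assuming when you answer.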
Machine Learning
An understanding of machine learning methodologies is essential for every aspiring data scientist. You should be able to explain the key concepts in detail. The interviewer may pose a business problem and ask you to come up with possible solutions based on machine learning algorithms. Along with the algorithms, expect to touch upon commonly observed problems and their fixes. Some of the questions are listed below:
- What is the difference between AI, data science and DL?
- What is the difference between supervised learning, unsupervised learning and reinforcement learning?
- Describe the architecture of machine learning.
- What is the trade-off between bias and variance?
- How will you handle imbalanced data?
- How do you ensure that you do not overfit the model?
- How is KNN different from K-means clustering?
- Explain how the ROC curve works.
- What is linear regression? Explain the OLS model.
- What are L1 and L2 regularization?
- What is R-squared, and how is it different from adjusted R-squared?
- What are the different metrics used for regression models?
- How do you evaluate the prediction accuracy of a classification model?
- Explain the SVM algorithm and the hyperparameters associated with it.
- What is Bayes' theorem? How is it useful in machine learning?
- What is "naive" in Naive Bayes?
- What is your favorite algorithm? Explain it.
- What do you mean by a confusion matrix?
- What is the difference between Type I and Type II errors?
- What is the difference between probability and likelihood?
- What is cross-validation, and how would you apply it to time series data?
- Explain the decision tree algorithm. What is decision tree pruning?
- Which is more important to you: model accuracy or model performance?
- What is the F1 score? How would you use it?
- What do you mean by ensemble techniques? What are their types?
- What is bagging? Explain an algorithm that uses it.
- What is boosting? Explain an algorithm that uses the boosting concept.
- What is the kernel trick, and how is it useful?
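The confusion matrix and F1 score questions above are easier to answer with a worked example in hand. The sketch below computes the four confusion-matrix counts and the F1 score in plain Python for a binary classifier. The function names and the toy labels are illustrative assumptions; in practice a library such as scikit-learn provides equivalent metrics.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly predicted positive
        elif predicted == positive:
            fp += 1  # false alarm
        elif actual == positive:
            fn += 1  # missed positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
print(tp, fp, fn, tn, f1)
```

Because F1 ignores true negatives, it is most useful when the positive class is rare and plain accuracy would look misleadingly high.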
Practical Experience Questions
Practical questions are important because they reflect how you have used your technical knowledge in projects to solve business problems. There is an endless list of data science questions, and the interviewer is not going to waste time asking dozens of them. Instead, they ask these practical questions to gauge whether you are the right candidate for them.
These practical questions are designed to shed light on your pace of work, experience and habits. Here are some of the interview questions:
- Summarize your experience in the field of data science.
- Tell me about your first data science projects.
- How do you keep yourself updated in the field of data science?
- Have you ever implemented a research paper?
- So Python is your preferred programming language? What experience do you have with R?
- Do you have any experience with Tableau or Power BI?
- What kind of RDBMS software do you have experience with?
- Do you have any experience with cloud services? How have you used them in data science?
- What was the size of your data science team? How did everyone collaborate with each other?
- How often do you monitor and update your machine learning models?
Behavioral Questions
Interviewers are interested in how you handle workplace situations, how you work within a team and whether you are a fit for the position. These questions are often asked indirectly; for example, the interviewer may pose broad questions about your motivations or the tasks you enjoy. There are no exact answers, but your responses help the interviewer understand your past behavior and what kind of person you are.
Consider an example where you faced a conflict while working on a team project. Instead of asking a hypothetical question ("How would you deal with..."), the interviewer hopes to elicit a more meaningful response by pushing you to talk about a real-life incident. They will always look for four things in your story:
Situation: What was the context? (spend 10% of your time answering this)
Task: What needed to be done? (spend 10% of your time answering this)
Action: What did you do? (spend 10% of your time answering this)
Results: What were the accomplishments?
Some examples of behavioral questions:
- Please describe your data science project.
- Tell us about a situation when you had to balance competing priorities.
- Describe a time when you managed to persuade someone to see things your way.
- Was there ever a situation where you got bored with your job?
- How do you motivate yourself?
- Tell us about a time when you failed to meet a deadline.
- Our team is brand new and underfinanced. We have no standard procedures or training, and everything is ad hoc. How will you handle this?
Case Study Questions
The purpose of scenarios (case study questions) is to test your experience in data science fields. The interviewer is likely to be looking for skills outside of the technical toolkit, such as logical reasoning or business understanding. It is important for you to demonstrate structured thinking, reasoning and problem-solving skills. You cannot be a good data scientist if you cannot identify or solve business problems. For example:
A real estate company's sales department has increased the selling price of all items by 5%. There are 10 regions, each containing houses at different prices. Before the price increase, gross revenue was $50,000,000 with an average selling price of $100,000. After the price increase, gross revenue was $50,500,000 with an average selling price of $95,000. Why hasn't the price increase had the desired impact of increasing revenue and average selling price?
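A quick way into this scenario is to back out the number of units sold from the figures given. The sketch below does only that arithmetic. The variable names are illustrative, and the closing interpretation (a shift in the sales mix toward cheaper regions) is one plausible reading rather than something the scenario states.

```python
revenue_before, avg_price_before = 50_000_000, 100_000
revenue_after, avg_price_after = 50_500_000, 95_000

# Units sold = gross revenue / average selling price.
units_before = revenue_before / avg_price_before
units_after = revenue_after / avg_price_after

# Revenue grew by only about 1% despite the 5% price increase.
revenue_change = revenue_after / revenue_before - 1

# If the mix of houses sold had stayed the same, the average
# price would have risen by 5% instead of falling.
expected_avg_same_mix = avg_price_before * 105 // 100

print(f"units before: {units_before:.0f}, units after: {units_after:.1f}")
print(f"revenue change: {revenue_change:.1%}")
print(f"average price if mix were unchanged: {expected_avg_same_mix}")
```

More units were sold after the increase, yet the average price fell below the old average. That pattern is consistent with buyers shifting toward lower-priced houses or regions, which is the kind of hypothesis an interviewer would expect you to raise and then propose testing with region-level data.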
You may also be given case studies that ask you to predict or estimate something, such as market sizing questions. These questions also depend on the domain in which you work.
Conclusion
An interview is a dialogue, not a written test
We have only tried to scratch the surface of the data science interview questions you may encounter. The industry is booming, and companies are constantly adapting their interview sessions.
Data science interview questions may vary in their particulars, but the types of questions remain the same. A base knowledge of these types, together with a good amount of preparation, will allow you to logically tackle any question the interviewer has up their sleeve.
Happy reading,
Saurav Anand
Translated from: https://medium.com/datadriveninvestor/want-to-be-a-data-scientist-a-simple-guide-to-tackle-data-science-interview-d4c957636510