Facebook Engineer
Introduction
Facebook is arguably the world’s most popular social media network, with over 2 billion active users worldwide. It is no longer news that Facebook accumulates and stores massive amounts of user data, making it a treasure trove for anyone looking to build a career in data science. Whether you are a data scientist, a data analyst, or a data engineer, Facebook offers a scale only a few companies can match.
As a data engineer at Facebook, you will not only get to work with the most advanced tools and platforms a data engineer could dream of, but you will also see a direct link between your work, company growth, and user satisfaction.
The Data Engineer Role at Facebook
Data engineer roles in any enterprise data analytics team range from managing, optimizing, and overseeing data retrieval systems to building complex and robust data pipelines and algorithms. In more technical terms, the job involves finding trends in datasets, developing algorithms for enhanced data collection, building database systems, and writing complex queries to refine datasets.
“Data engineers at Facebook are part of a tightly integrated team of core technical functions that support every product team at Facebook. They help product decisions alongside software engineering, design, product management, data science, research, and others”.
At Facebook, data engineers lay the groundwork for data analysis by building and managing scalable data pipelines and frameworks, designing data warehouses for internal business use, and leveraging big data technologies to transform raw and complex data into actionable insights for better business decision making.
Love Facebook and working with data? Check out “The Facebook Data Analyst Interview” article on Interview Query!
Required Skills
The data engineer role at Facebook requires time-tested skills and extensive industry experience. As a result, Facebook hires only highly qualified applicants with at least 4 years of industry experience in the data warehouse space.
Other Minimum Qualifications Include:
- BS/BA in Computer Science, Mathematics, Physics, or another technical field.
- Over 4 years of experience writing complex SQL, working with DataFrame APIs, and developing, implementing, and maintaining custom ETL.
- Extensive industry experience with either a MapReduce or an MPP system.
- Deep understanding of data architecture, machine learning methods, schema design, and dimensional data modelling.
- Hands-on experience with object-oriented programming languages (Java, Python, C++, Scala, Perl, etc.).
- Experience in analyzing large datasets to identify deliverables, gaps, and inconsistencies.
Types of Data Engineer Teams at Facebook
Facebook is a very large product-based company with many departments, teams, and sub-division levels. As a data-driven company, Facebook relies heavily on data to make sound business decisions.
Data engineers are responsible for data collection and data integrity, and they work cross-functionally with internal teams to help turn data into sound decisions. Because data engineers at Facebook are embedded in teams, their specific responsibilities may differ slightly from team to team.
Depending on the team, data engineer roles at Facebook may include:
Facebook App Monetization (FAM) Team: Roles include designing and building a strong data foundation, infrastructure, and architecture that help analytics, product, engineering, and FAM leadership drive better decisions. Data engineers on this team also work closely with data infrastructure teams to suggest improvements and modifications to existing data and ETL pipelines, and communicate strategies and processes to multi-functional groups and leadership.
Data Warehouse Team: Roles within the team include designing, building, and launching new ETL processes and data models in production, managing data warehouse plans, partnering with engineers, product managers, and product analysts to understand data needs, and collaborating with the data infrastructure team to triage infrastructure issues and drive them to resolution.
Novi Blockchain Data Engineering Team: Data engineers in this team design and implement scalable data repositories to integrate qualitative and quantitative research data, build and launch new ETL processes in production, and identify, collect and transform user interaction data and server events data into scalable schema models. They also work closely with Product Managers, Data Scientists, Software Engineers, Economic Researchers, Compliance, and Risk Management to build unique and intuitive products to tackle challenging problems.
Facebook Video Distribution: Roles include developing optimal data processing architecture and systems for new data and ETL pipelines, and recommending improvements and modifications to existing data and ETL pipelines. They also include collaborating with Facebook internal teams to understand their needs and linking those needs to data engineering solutions.
Partnerships Central Systems, Data and Tools Team: Responsibilities include building and maintaining efficient and reliable data pipelines to move and transform data, building models that provide intuitive analytics, and collaborating cross-functionally to frame problems, gather data, and provide business-impact recommendations.
Family Ecosystems: General roles include working with data infrastructure, product software engineering, and product management teams to develop and validate architecture-driven, end-to-end analytics products, tools, and infrastructure stacks. Other roles include building an optimal data processing framework for new data and ETL pipelines/applications, building visualizations for data and metrics insights, and communicating strategy effectively within teams and across leadership levels.
The Interview Process
The Facebook data engineer interview follows the same standard process as other Facebook technical roles. It starts with an initial recruiter phone call, where the role and the interview process are explained. Next comes a one-hour technical phone screen involving SQL and Python/Java coding. After passing the technical screen, an onsite interview consisting of 3 to 4 back-to-back rounds is scheduled.
Learn more about Facebook’s interview process by reading this article about “Facebook Data Science Interview Questions and Solutions”.
Initial Screen
This is a 30-minute phone interview with a recruiter or HR. During the call, the recruiter explains more about the role and what to expect from the rest of the interview process.
Technical Screen
The Facebook DE technical interview is a 1-hour phone interview involving SQL and Python/Java coding (depending on your programming language preference) using “Coderpad”. There are usually 8 to 10 questions, split evenly between SQL and Python (for example, 5 SQL and 5 Python), including one algorithm question for each.
Note: You will always be limited by time (1 hour maximum). It helps to communicate your thought process clearly to the interviewer while solving problems, especially in the coding section.
Looking to practice your Python before your interview? We recommend reviewing this article about “Python Data Science Interview Questions” on Interview Query!
Onsite Interview
The last stage of the Facebook data engineer interview process is an onsite interview comprising 3 full-stack interviews (2 ETL rounds, 1 data modelling round), 1 behavioral round, and a lunch break in between.
Except for the behavioral interview, every round has a product-sense element that tests the candidate’s knowledge of key operational metrics. You can expect questions like “What metrics would be good to capture for x scenario?” or “Describe a situation where you did not agree with the stakeholders. How did you handle it?” Questions around ETL and modelling are case-based and may require some amount of coding.
A breakdown of the onsite interview process is as follows:
ETL Round: This round involves writing SQL and Python/Java code that resembles standard Facebook ETL code.
Modelling Round: This round mixes SQL and Python, and involves data modelling questions based on a business scenario.
Behavioral: This interview assesses a candidate’s communication skills and how well they can convey their thoughts and ideas. Work on preparing your own stories, for example, a story on how you achieved success on a project, or about a time you dealt with a major failure, or on how you overcame a particular challenge on a project.
Note: Before the pandemic, this interview was held onsite at the Facebook building; because of the pandemic, every interview is now conducted virtually (online).
Notes and Tips
The Facebook data engineer interview process aims to assess candidates’ abilities to utilize big data to provide actionable business insights for growth. Facebook uses standardized questions to test the candidate’s in-depth knowledge of data architecture and frameworks as well as key operational metrics for all Facebook products.
Also, remember that Facebook uses standardized questions throughout its interview process, especially in coding interviews. Try to explain your thought process while answering questions; communicate clearly to the interviewer how and why you chose the methods you used.
The Facebook data engineer interview covers the length and breadth of data science domains including modelling, visualization, system designs, and end-to-end solutions from a data engineering perspective. Questions can span across:
- Data structures and algorithms
- Writing SQL queries to solve a real-world problem
- DB performance tuning
- Data pipeline design
- Metric and visualization solution design for a business case
- Statistics and modelling
- Previous project experience
- Big data solutions like Spark, EMR
- Reporting tools like Tableau, Excel
- Building data platforms or architecture for a hypothetical or existing Facebook product
Practice plenty of SQL, Python/Java, modelling, and algorithm questions, including lists, arrays (strings and substrings), dot products, JOINs, subqueries, aggregate functions, and GROUP BY. Try coding on a whiteboard to get familiar with the onsite interview experience. Finally, prepare for your interview by practicing Facebook data engineering questions on Interview Query!
Facebook Data Engineer Interview Questions
- Given an array of integers, determine whether the array is monotonic (non-decreasing or non-increasing).
Examples:
1 2 5 5 8->true
9 4 4 2 2->true
1 4 6 3->false
1 1 1 1 1 1->true
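As a rough illustration of the monotonic-array question, here is a minimal Python sketch. The function name `is_monotonic` and the choice to treat empty and single-element lists as monotonic are assumptions made for this example; the original question does not specify them.

```python
def is_monotonic(values):
    """Return True if `values` is non-decreasing or non-increasing.

    Assumes `values` is a list (or another sliceable sequence) of
    comparable items. Empty and single-element inputs count as monotonic.
    """
    # Compare each element with its successor.
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing


if __name__ == "__main__":
    # The four examples listed above.
    for sample in ([1, 2, 5, 5, 8], [9, 4, 4, 2, 2], [1, 4, 6, 3], [1, 1, 1, 1, 1, 1]):
        print(sample, "->", is_monotonic(sample))
```

This version makes two passes over the pairs for readability; in an interview you might be asked to do it in a single pass by tracking both flags in one loop.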
- Design a dashboard to highlight a certain aspect of user behaviour.
- Does a database view occupy disk space?
- What is a loop that goes on forever?
- What is the term used to select non-duplicates in SQL?
- Find the maximum number in a given array (without using the max function).
- Find the minimum absolute difference between the elements of an array.
- Create DDL (tables and foreign keys) for several tables in a provided ERD. The ERD contains at least one many-to-many relationship.
- Recursively parse a string for a pattern that can be either 1 or 2 characters long.
- Perform a merge sort with SQL only.
- Given full authority to “make it work”, import a large data set with duplicates into a warehouse while meeting the requirements of a business intelligence designer for query speed.
- Query a many-to-many relationship while not violating the grain of a fact table.
- Given a number and an array, determine whether any two numbers in the array sum to the given number.
- Design an experiment to test whether a feature spurs conversation.
- Describe your projects.
- Given a raw data table, how would you write the SQL to perform the ETL to get data into the desired format?
- How do you rate the popularity of a video posted online?
- Given an IP address as an input string, validate it and return True/False.
- Count the neighbors of each node in a graph. The input graph is a multidimensional list.
- Given a list of tuples of movie watch times, find how many unique minutes of the movie the viewer watched. For example, given [(0,15),(10,25)], the viewer watched 25 minutes of the movie.
- How do you delete duplicates in a list?
- Given a multi-step product feature, write SQL to see how well this feature is doing (loading times, step completion %). Then use Python to constantly update the average step time as new values stream in, given that there are too many to store in memory.
- How do you join two tables with all the information on the left one unchanged?
- Which operator would you use to join two tables, keeping every row of the left table and only the matching rows of the right one?
- If you don’t specify a sort direction, how does ORDER BY sort in SQL by default: ascending or descending?
- What command would you use to add or delete a column of a table in a database?
- You have a 2-D array of friends like [[A,B],[A,C],[B,D],[B,C],[R,M], [S],[P], [A]]. Write a function that creates a dictionary of how many friends each person has. People can have 0 to many friends. However, there won’t be repeated relationships like [A,B] and [B,A], and there won’t be more than 2 people in a relationship.
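To make a few of the Python questions above more concrete, here is one possible sketch covering the friends dictionary, the unique minutes watched, and the streaming average. All names (`count_friends`, `unique_minutes_watched`, `RunningMean`) are hypothetical, and the code assumes people are represented as strings, every relationship entry has one or two elements, and watch intervals are (start, end) tuples with start <= end. These are illustrative approaches, not the answers Facebook expects.

```python
from collections import defaultdict


def count_friends(relationships):
    """Map each person to their number of friends.

    Each entry is either a pair [a, b] (a friendship) or a
    single-element list [a] (a person listed without a friend).
    """
    counts = defaultdict(int)
    for entry in relationships:
        if len(entry) == 2:
            a, b = entry
            counts[a] += 1
            counts[b] += 1
        else:
            # Adding 0 registers the person with no friends,
            # without changing a count they already have.
            counts[entry[0]] += 0
    return dict(counts)


def unique_minutes_watched(intervals):
    """Total length of the union of (start, end) watch intervals."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it out.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


class RunningMean:
    """Average of a stream kept in constant memory (count and mean only)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental mean update; no past values are stored.
        self.mean += (value - self.mean) / self.count
        return self.mean


if __name__ == "__main__":
    friends = [["A", "B"], ["A", "C"], ["B", "D"], ["B", "C"],
               ["R", "M"], ["S"], ["P"], ["A"]]
    print(count_friends(friends))
    print(unique_minutes_watched([(0, 15), (10, 25)]))
    stream = RunningMean()
    for step_time in (2, 4, 6):
        print(stream.update(step_time))
```

The interval function sorts first and then merges overlapping ranges, which keeps it O(n log n); the running mean stores only two numbers, which is the point of the "too many to store in memory" constraint.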
Thanks for Reading
If you’re interested in sharpening your data science skills, check out Interview Query!
Check out my YouTube channel for more interviewing guides, and tips & tricks for solving problems.
Find more Facebook interview guides like the Facebook Data Analyst Interview and Facebook Data Science Interview Questions and Solutions on the Interview Query blog.
Originally published at https://www.interviewquery.com on August 31, 2020.
Translated from: https://towardsdatascience.com/the-facebook-data-engineer-interview-345235afaac0