Data Science Careers
In case you missed it, there’s a pandemic out there, and it has forced all of us to shut down public events. As time goes by, we are all beginning to understand the impact of lockdowns, social distancing and the absence of gatherings. One of the things we realized (and by “we” I mean the Algo group at Taboola, where I work) is the impact this has on those who are just beginning their career path or are about to change it.
We used to host and attend many data science meetups and conferences, and noticed that many junior data scientists and data scientists-to-be used these gatherings to ask for and receive guidance and informal advice about their career paths. Now that all of these are canceled, they have no one to reach out to. So we came up with a new initiative, which we named Algo Boost (algoboost.me), to allow anyone to schedule a 30-minute, one-on-one Zoom session with us and get the guidance they seek.
We were stunned by how much the data science community here in Israel, where we’re located, needed this. All of our volunteers were fully booked for an entire month within six hours of launch.
I have personally already spent a few hours answering questions and providing guidance where I could, and found that certain questions and anxieties are common to most people. To be honest, I had them too when I started my career. So I thought it would be a good idea to write them all down, along with my personal thoughts, as I believe more people will find them useful at this time, in other places across the globe too. I would still like to emphasize that these are my personal views, and nothing more than my own advice.
The actual meaning of data scientist varies a lot. You might have noticed this when going over data science job descriptions: each company interprets data science in a different way. In some places it means you’re going to work on deep-learning models; in others the role involves mostly SQL and Excel. Make sure you understand what the specific role you’re looking at actually is. And if your definition of data science is working on machine-learning and deep-learning models, then:
Your first job won’t be as a data scientist, and that’s OK. If there’s one thing I wish someone had told me when I was just starting my career, it’s this. A data scientist is a person who knows how to model a problem, how to analyze both data and results, and can implement the code that does it, and obviously tune it. If that sounds like a lot of skills, that’s because it is. This is also why it’s not a first job. You become a data scientist after gaining experience in a previous role as either an analyst or a software developer, and then filling in the gaps of the other role as part of your first data scientist job. So if you’re looking for your first job, start by being an analyst or a software developer (preferably big-data related), whichever suits you best. This is how most of us began. For example, I started my career as a data engineer at Appsflyer, and this was certainly one of the best things that could have happened to me on my career path, and one of the things I’m proud of to this day. And so, just in case it wasn’t clear enough, allow me to emphasize:
Programming is part of the job, a major part of it. Implementing machine-learning models means you need to code them. And test them. And deploy them. And fix bugs, and upgrade them, and we haven’t even touched on input data processing and feature design. Coding is what data scientists do most of the day, and it is not necessarily coding a state-of-the-art model. Not everyone is into coding, and that’s perfectly fine, because while I’m probably stating the obvious, I would like to make this clear:
It’s OK not to be a data scientist. These days, it seems like there’s a big halo surrounding data science. It is the hottest trend, and people sometimes get the feeling that being a data scientist is the best career path you can have. That is absolutely false. The best career path for you is the one that suits you best, because this is where you will thrive. My first question to anyone who asks me how to become a data scientist is: describe to me your working day five years from now. If you’re into analyzing and making sense of data and using statistics to uncover interesting insights, but coding is something you’d prefer to avoid, then go be an analyst. If you want to talk to people, present ideas and make decisions based on data, then you should be a product manager. This is not “letting yourself down” or “settling for second best”. These are meaningful, demanding and extremely challenging roles with a lot of impact, and if this is what you actually want to do, go do it. A title is just a title. But if it really is data science you’re after, here are some of my personal tips:
Focus on models which are relevant to the industry. There are a ton of different model types and fields under machine learning and deep learning, but only some of them are widely used in today’s industry. These are, mostly, image recognition, natural language processing (NLP) and recommendation systems. And so, while reinforcement learning might be the coolest thing you’ve ever seen (and I couldn’t agree with you more), it isn’t what I would recommend you focus on when kickstarting your career. Go to Kaggle, pick up challenges of the types I mentioned, and try to solve them yourself. By that I mean: using external libraries to do the hard technical stuff for you is how we actually do things in the industry, but try to implement a simple version of them yourself at least once. For example, using NLTK for NLP stemming is great, but try to see if you can implement a basic version of it yourself. Andrew Ng’s machine-learning course on Coursera even has an exercise where you implement back-propagation from scratch. This kind of work will really help you understand how things work, and will surely show in your job interviews. And if you’re not sure how you should tackle these challenges:
Read, and make sure you understand. One of the most important skills of a data scientist is the ability to look for solutions on their own. Many of the challenges we face, we face for the first time. Knowing which sources are relevant and being able to read academic papers and technical blog posts is a must for this job. Practice this, and if there’s anything in a paper you don’t fully understand, go find the answers. And once you’ve figured out the answers:
Write blog posts, with code. This tip is one of my personal favorites, as it is one of the things I did when I started looking for my first job as a data scientist, and still do to this day. The audience I write my blog posts for is always the same: me, six months from now, after I’ve forgotten everything written in that post. So whenever I write a blog post, I make sure to explain everything I do starting from the very basics, provide examples, and not leave any holes or open questions. I found this to be the best way to make sure I really understand what I think I understand, following the quote often attributed to Einstein: if you can’t explain it simply, you don’t understand it well enough. Adding code as examples only makes it better, as it forces you to turn theory into practice.
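As an illustration of turning theory into practice, here is a minimal sketch of the kind of exercise suggested earlier: a toy suffix-stripping stemmer written from scratch in Python. The rule list, the `MIN_STEM_LENGTH` threshold and the `simple_stem` name are all assumptions made up for this example; it is far cruder than the stemmers that ship with NLTK and is only meant to show the idea.

```python
# Toy suffix-stripping stemmer: an illustrative sketch only.
# This is NOT the Porter algorithm and not what NLTK implements.

# Hypothetical rules as (suffix, replacement), checked in order.
# Longer or more specific suffixes come first so they win over shorter ones.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("ly", ""),
    ("ss", "ss"),  # protect words like "class" from the plain "s" rule
    ("s", ""),
]

# Arbitrary guard (an assumption) so short words are not over-stripped.
MIN_STEM_LENGTH = 3


def simple_stem(word: str) -> str:
    """Apply the first suffix rule that leaves a long-enough stem."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= MIN_STEM_LENGTH:
                return stem + replacement
    # No rule applied: return the (lowercased) word unchanged.
    return word


if __name__ == "__main__":
    for w in ["models", "training", "studies", "sing", "class"]:
        print(w, "->", simple_stem(w))
```

Writing even a toy like this quickly surfaces the questions a real stemmer has to answer, such as rule ordering and how to avoid mangling short words, which is exactly the kind of detail worth explaining in a blog post.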
Do you really need a higher degree (M.Sc./Ph.D.)? The answer to this question varies by the country you live in. I can tell you that here in Israel, the answer is: probably yes, and there’s a reason why. As the name implies, data scientists are scientists, meaning we perform research. Mastering fields which were unknown to you just a month ago, keeping up to date with academic papers, and designing and performing experiments are the core of data science. These are exactly the things people do when working toward an academic thesis, which is why having completed one is a major advantage. That being said, there are data scientists holding only a bachelor’s degree. To be honest, my first manager at Taboola had only a B.Sc., and was still considered one of the brightest in the group.
Do you need an academic degree in machine learning? The short answer is no. The longer answer is: no, but you’ll have to work harder to fill in the gaps. Truth be told, you don’t even need a degree in computer science, but that comes at a cost. I never studied computer science. My academic background is physics, yet here I am. But I realized I had holes in my data science knowledge and experience, and worked extra hard to fill them. If you too don’t come from an academic data science background, you’ll have to catch up: learn to code, learn to work with data, learn to model, learn to analyze, and really understand what you’re doing and why. Not having the formal background means the path to your first data science job may be longer than it is for others, and will require more effort, but it’s absolutely possible. Wondering what steps you should be taking to land your first job as a data scientist? Well, read from the top :). Good luck!
Translated from: https://towardsdatascience.com/how-to-kickstart-your-career-as-a-data-scientist-3a904ff58e5c