https://brohrer.github.io/imposter_syndrome.html
Imposter syndrome
I am not a real data scientist.
I have never used a deep learning framework, like TensorFlow or Keras.
I have never touched a GPU.
I don’t have a degree in computer science or statistics. My degree is in mechanical engineering, of all things.
I don't know R.
But I haven’t given up hope. After reading a bunch of job postings, I figured out that all it will take to become a real data scientist is five PhDs and 87 years of job experience.
If this sounds familiar, know that you are not alone. You are not the only one who wonders how much longer they can get away with pretending to be a data scientist. You are not the only one who has nightmares about being laughed out of your next interview.
Imposter syndrome is feeling like everyone else in your field is more qualified than you are, that you will never get hired or, if you already have been, that you are a mistake of the hiring process. Despite its statistical implausibility, most of us feel below average. Based on my conversations with colleagues, I estimate that 9 out of 10 of us suffer from imposter syndrome at one time or another. (If this sounds entirely unfamiliar to you, I recommend an introspective reading of “Unskilled and unaware of it” by Kruger and Dunning.)
[Picture omitted: a dog dressed up as a bear.]
Even Ewoks feel like imposters sometimes. (Photo courtesy of Diane Rohrer.)
What a real data scientist looks like
“Data science” is a term that has generated a lot of excitement and, like a magnet, has pulled in lots of nearby subfields. The field we call data science is still relatively young, yet already too broad for an individual to be an expert in every corner of it. In my experience, the master-of-all-trades data science unicorn is a mythical beast. None of us can cover all the bases. So how are we to proceed?
There are two paths forward: generalist and specialist.
A good generalist
- is superficially familiar with every part of data science,
- recognizes all the jargon and technical terms,
- has a good notion of what tools and expertise are needed to solve a given problem, and
- asks insightful questions in technical reviews.
A good specialist
- understands one area deeply,
- can explain their area of expertise to non-experts,
- understands the tradeoffs between different approaches,
- is up to date on current research and new tools, and
- can use their tools quickly to produce high-quality results.
A generalist does not necessarily know the details of how an algorithm works and the tricks of using a tool. They will tell you that data cleaning is critical, but may not be able to enumerate the trade-offs between methods for replacing missing values. They will tell you that Spark is a good way to speed up your computations, but may not be able to advise you on the best settings to use.
A specialist does not necessarily know much about something that is outside their area. They will know the best architecture for running a linear regression on 500 million data points, but may not be able to explain a naive Bayes classifier. They will keenly grasp the trade-offs between square loss, hinge loss and logistic loss, but may be unable to query data from a Hive table.
Another way to describe generalists and specialists is “broad” versus “deep”. They are both technically savvy, but their expertise is distributed differently. We are all part generalist and part specialist. As you evolve through your career, you get to find the mixture that works best for you.
This distinction can be helpful when hiring data scientists too. Asking specifically for research experience in deep neural networks or a background in financial data visualization will draw applicants that fit your needs more effectively than calling for a "full-stack" data scientist.
How to prove that you are a real data scientist
Traditionally we establish our qualifications in a field with advanced degrees. Unfortunately for most of us, there are few such degrees available in data science. We have no piece of paper to use as a shield when someone questions our qualifications. So what do we do instead? How can we answer our critics, our interviewers, our colleagues, and, harshest of all, the voices in our head?
Consider woodworking. Imagine that you want to install a custom cabinet in your kitchen. Three carpenters show up inquiring after the job. The first one presents you with a certificate. She says, “I apprenticed with the premier cabinet maker in the city for seven years.” The second opens her toolbox and says, “My chisels are of the latest design, and no one has a sharper plane.” The third hands you a small box, cherry-colored and perfectly smooth. When you pull the handle with a fingertip, a drawer slides out soundlessly. She says, “I made this.”
Certifications, tools and portfolios are all popular ways of establishing credentials. I won’t argue that one is superior to another, but portfolios are particularly effective for data scientists. Certifications are few and not yet standardized. Listing algorithms and computer languages we have used doesn’t convey our depth of familiarity with them or what we can do with them. Building things shows a non-technical audience what we can do for them and demonstrates our expertise to technical interviewers and colleagues. Of course, this doesn’t guarantee that you’ll get a job on your first interview. But even if you don't, that’s normal. Keep interviewing.
How it feels to be a real data scientist
Note that both generalists and specialists have lots of things they don’t know. This means that even real data scientists will spend most of their days feeling lost. Our project lead will ask us questions that we don't know the answer to. Colleagues will talk comfortably about algorithms we've never heard of. Teammates will write code that we can't begin to decipher. Articles will cite "hot" subfields that we didn't know existed. arXiv papers will throw around equations that may as well be hieroglyphic gibberish. Interns will point out fundamental flaws in our reasoning. This is OK. You're not doing it wrong. This is OK.
Our goal isn’t to accumulate answers, but to ask better questions. If you are asking questions and using data to find answers, YOU ARE A DATA SCIENTIST. Period.
Brandon
August 22, 2017