数据科学r语言_您应该为数据科学学习哪些语言?

数据科学r语言

Data science is an exciting field to work in, combining advanced statistical and quantitative skills with real-world programming ability. There are many potential programming languages that the aspiring data scientist might consider specializing in.

数据科学是一个令人兴奋的领域,它将先进的统计和定量技能与实际编程能力相结合。 有抱负的数据科学家可能会考虑采用许多潜在的编程语言。

While there is no correct answer, there are several things to take into consideration. Your success as a data scientist will depend on many points, including:

尽管没有正确的答案,但有几件事要考虑。 您作为数据科学家的成功取决于很多方面,包括:

Specificity

特异性

When it comes to advanced data science, you will only get so far reinventing the wheel each time. Learn to master the various packages and modules offered in your chosen language. The extent to which this is possible depends on what domain-specific packages are available to you in the first place!

当涉及到高级数据科学时,每次您都只能重新发明轮子。 学习掌握以您选择的语言提供的各种软件包和模块。 可能的程度首先取决于您可以使用哪些特定于域的软件包!

Generality

概论

A top data scientist will have good all-round programming skills as well as the ability to crunch numbers. Much of the day-to-day work in data science revolves around sourcing and processing raw data or ‘data cleaning’. For this, no amount of fancy machine learning packages are going to help.

一位顶尖的数据科学家将具有良好的全面编程技能以及处理数字的能力。 数据科学中的许多日常工作都围绕着采购和处理原始数据或“数据清理”。 为此,没有任何花哨的机器学习包会有所帮助。

Productivity

生产率

In the often fast-paced world of commercial data science, there is much to be said for getting the job done quickly. However, this is what enables technical debt to creep in — and only with sensible practices can this be minimized.

在通常快节奏的商业数据科学世界中,要快速完成工作有很多话要说。 但是,这正是技术债务蔓延的原因,只有明智的实践才能使这种债务减至最少。

Performance

性能

In some cases it is vital to optimize the performance of your code, especially when dealing with large volumes of mission-critical data. Compiled languages are typically much faster than interpreted ones; likewise statically typed languages are considerably more fail-proof than dynamically typed. The obvious trade-off is against productivity.

在某些情况下,优化代码的性能至关重要,尤其是在处理大量关键任务数据时。 编译语言通常比解释语言要快得多。 同样,静态类型的语言比动态类型的语言具有更好的防故障能力。 明显的权衡是不利于生产力。

To some extent, these can be seen as a pair of axes (Generality-Specificity, Performance-Productivity). Each of the languages below fall somewhere on these spectra.

在某种程度上,这些可以看作是一对轴(通用性,性能和生产率)。 下面的每种语言都属于这些频谱。

With these core principles in mind, let’s take a look at some of the more popular languages used in data science. What follows is a combination of research and personal experience of myself, friends and colleagues — but it is by no means definitive! In approximately order of popularity, here goes:

牢记这些核心原则,让我们看一下数据科学中使用的一些更流行的语言。 接下来是我自己,朋友和同事的研究和个人经验的结合,但这绝不是确定的! 按流行程度大致如下:

[R (R)

你需要知道的 (What you need to know)

Released in 1995 as a direct descendant of the older S programming language, R has since gone from strength to strength. Written in C, Fortran and itself, the project is currently supported by the R Foundation for Statistical Computing.

R作为较早的S编程语言的直接后代于1995年发布,此后R变得越来越强大。 该项目由C,Fortran及其本身编写,目前得到R统计计算基金会的支持。

执照 (License)

Free!

自由!

优点 (Pros)

  • Excellent range of high-quality, domain specific and open source packages. R has a package for almost every quantitative and statistical application imaginable. This includes neural networks, non-linear regression, phylogenetics, advanced plotting and many, many others.

    高质量,特定领域和开源软件包的优秀产品。 R提供了几乎所有可以想象的定量和统计应用程序的软件包。 这包括神经网络,非线性回归,系统发育,高级绘图以及许多其他功能。

  • The base installation comes with very comprehensive, in-built statistical functions and methods. R also handles matrix algebra particularly well.

    基本安装带有非常全面的内置统计功能和方法。 R还可以很好地处理矩阵代数。
  • Data visualization is a key strength with the use of libraries such as ggplot2.

    借助ggplot2之类的库,数据可视化是关键优势。

缺点 (Cons)

  • Performance. There’s no two ways about it, R is not a quick language.

    性能。 关于它,没有两种方法, R不是一种快速的语言 。

  • Domain specificity. R is fantastic for statistics and data science purposes. But less so for general purpose programming.

    域特异性。 对于统计和数据科学而言,R太棒了。 但是对于通用编程而言则更少。
  • Quirks. R has a few unusual features that might catch out programmers experienced with other languages. For instance: indexing from 1, using multiple assignment operators, unconventional data structures.

    怪癖。 R具有一些不寻常的功能,这些功能可能赶不上使用其他语言的程序员。 例如:使用多个赋值运算符从1开始索引,使用非常规数据结构。

裁决-“为设计目的而精采” (Verdict — “brilliant at what it’s designed for”)

R is a powerful language that excels at a huge variety of statistical and data visualization applications, and being open source allows for a very active community of contributors. Its recent growth in popularity is a testament to how effective it is at what it does.

R是一种功能强大的语言,擅长于各种统计和数据可视化应用程序,并且开源是一个非常活跃的贡献者社区。 它最近受欢迎程度的提高证明了它在工作中的有效性。

Python (Python)

你需要知道的 (What you need to know)

Guido van Rossum introduced Python back in 1991. It has since become an extremely popular general purpose language, and is widely used within the data science community. The major versions are currently 3.6 and 2.7.

Guido van Rossum于1991年引入Python。此后,Python成为一种非常流行的通用语言,并在数据科学界广泛使用。 当前的主要版本是3.6和2.7 。

执照 (License)

Free!

自由!

优点 (Pros)

  • Python is a very popular, mainstream general purpose programming language. It has an extensive range of purpose-built modules and community support. Many online services provide a Python API.

    Python是一种非常流行的主流通用编程语言。 它具有广泛的专用模块和社区支持。 许多在线服务都提供Python API。

  • Python is an easy language to learn. The low barrier to entry makes it an ideal first language for those new to programming.

    Python是一种易于学习的语言。 入门门槛低,使其成为编程新手的理想第一语言。
  • Packages such as pandas, scikit-learn and Tensorflow make Python a solid option for advanced machine learning applications.

    诸如pandas , scikit-learn和Tensorflow之类的软件包使Python成为高级机器学习应用程序的可靠选择。

缺点 (Cons)

  • Type safety: Python is a dynamically typed language, which means you must show due care. Type errors (such as passing a String as an argument to a method which expects an Integer) are to be expected from time-to-time.

    类型安全性:Python是一种动态类型化的语言,这意味着您必须格外小心。 有时会出现类型错误(例如将String作为参数传递给需要Integer的方法)。
  • For specific statistical and data analysis purposes, R’s vast range of packages gives it a slight edge over Python. For general purpose languages, there are faster and safer alternatives to Python.

    为了实现特定的统计和数据分析目的,R广泛的软件包使其与Python相比有一点优势。 对于通用语言,Python提供了更快,更安全的替代方法。

裁决-“优秀的全能选手” (Verdict — “excellent all-rounder”)

Python is a very good choice of language for data science, and not just at entry-level. Much of the data science process revolves around the ETL process (extraction-transformation-loading). This makes Python’s generality ideally suited. Libraries such as Google’s Tensorflow make Python a very exciting language to work in for machine learning.

Python是数据科学语言的很好选择,而不仅仅是入门级的语言。 许多数据科学过程都围绕ETL过程 (提取-转换-加载)进行。 这使得Python的通用性非常适合。 诸如Google的Tensorflow之类的库使Python成为一种非常激动人心的语言,可用于机器学习。

SQL (SQL)

你需要知道的 (What you need to know)

SQL (‘Structured Query Language’) defines, manages and queries relational databases. The language appeared by 1974 and has since undergone many implementations, but the core principles remain the same.

SQL (“结构化查询语言”)定义,管理和查询关系数据库 。 该语言于1974年问世,此后经历了许多实现,但是核心原理保持不变。

执照 (License)

Varies — some implementations are free, others proprietary

不同-有些实现是免费的,而另一些则是专有的

优点 (Pros)

  • Very efficient at querying, updating and manipulating relational databases.

    在查询,更新和操作关系数据库方面非常高效。
  • Declarative syntax makes SQL an often very readable language . There’s no ambiguity about what SELECT name FROM users WHERE age > 18 is supposed to do!

    声明式语法使SQL成为一种非常易读的语言。 对于SELECT name FROM users WHERE age > 18 SELECT name FROM users WHERE age >SELECT name FROM users WHERE age >应该做什么没有任何歧义!

  • SQL is very used across a range of applications, making it a very useful language to be familiar with. Modules such as SQLAlchemy make integrating SQL with other languages straightforward.

    SQL在各种应用程序中都非常常用,这使其成为一种非常有用的语言。 诸如SQLAlchemy之类的模块使SQL与其他语言的集成变得简单。

缺点 (Cons)

  • SQL’s analytical capabilities are rather limited — beyond aggregating and summing, counting and averaging data, your options are limited.

    SQL的分析功能非常有限-除了聚合,求和,计数和平均数据之外,您的选择也受到限制。
  • For programmers coming from an imperative background, SQL’s declarative syntax can present a learning curve.

    对于来自命令性背景的程序员而言,SQL的声明性语法可以显示学习曲线。
  • There are many different implementations of SQL such as PostgreSQL, SQLite, MariaDB . They are all different enough to make inter-operability something of a headache.

    SQL有许多不同的实现,例如PostgreSQL , SQLite , MariaDB 。 它们之间的差异足以使互操作性令人头疼。

裁决-“永恒而高效” (Verdict — “timeless and efficient”)

SQL is more useful as a data processing language than as an advanced analytical tool. Yet so much of the data science process hinges upon ETL, and SQL’s longevity and efficiency are proof that it is a very useful language for the modern data scientist to know.

SQL作为数据处理语言比作为高级分析工具更有用。 然而,这么多的数据科学过程都取决于ETL,而SQL的寿命和效率证明了它是现代数据科学家了解的非常有用的语言。

Java (Java)

你需要知道的 (What you need to know)

Java is an extremely popular, general purpose language which runs on the (JVM) Java Virtual Machine. It’s an abstract computing system that enables seamless portability between platforms. Currently supported by Oracle Corporation.

Java是一种非常流行的通用语言,可在(JVM)Java虚拟机上运行。 这是一个抽象的计算系统,可实现平台之间的无缝移植。 目前由Oracle Corporation支持。

执照 (License)

Version 8 — Free! Legacy versions, proprietary.

版本8 —免费! 旧版,专有。

优点 (Pros)

  • Ubiquity . Many modern systems and applications are built upon a Java back-end. The ability to integrate data science methods directly into the existing codebase is a powerful one to have.

    无处不在。 许多现代系统和应用程序都建立在Java后端上。 将数据科学方法直接集成到现有代码库中的能力非常强大。
  • Strongly typed. Java is no-nonsense when it comes to ensuring type safety. For mission-critical big data applications, this is invaluable.

    强类型。 在确保类型安全性方面,Java毫无疑问。 对于关键任务大数据应用程序来说,这是无价的。
  • Java is a high-performance, general purpose, compiled language . This makes it suitable for writing efficient ETL production code and computationally intensive machine learning algorithms.

    Java是一种高性能的通用编译语言。 这使其适合编写高效的ETL生产代码和计算密集型机器学习算法。

缺点 (Cons)

  • For ad-hoc analyses and more dedicated statistical applications, Java’s verbosity makes it an unlikely first choice. Dynamically typed scripting languages such as R and Python lend themselves to much greater productivity.

    对于临时分析和更专用的统计应用程序,Java的冗长性使其成为不太可能的首选。 动态类型的脚本语言(例如R和Python)可提高生产力。
  • Compared to domain-specific languages like R, there aren’t a great number of libraries available for advanced statistical methods in Java.

    与R之类的领域特定语言相比,Java中没有太多可用于高级统计方法的库。

裁决-“数据科学的有力竞争者” (Verdict — “a serious contender for data science”)

There is a lot to be said for learning Java as a first choice data science language. Many companies will appreciate the ability to seamlessly integrate data science production code directly into their existing codebase, and you will find Java’s performance and and type safety are real advantages.

学习Java作为首选的数据科学语言有很多话要说。 许多公司将欣赏将数据科学生产代码直接无缝集成到其现有代码库中的能力,并且您会发现Java的性能和类型安全是真正的优势。

However, you’ll be without the range of stats-specific packages available to other languages. That said, definitely one to consider — especially if you already know one of R and/or Python.

但是,您将没有其他语言可用的特定于统计信息的软件包范围。 就是说,绝对要考虑的一个-特别是如果您已经了解R和/或Python之一。

Scala (Scala)

你需要知道的 (What you need to know)

Developed by Martin Odersky and released in 2004, Scala is a language which runs on the JVM. It is a multi-paradigm language, enabling both object-oriented and functional approaches. Cluster computing framework Apache Spark is written in Scala.

Scala由Martin Odersky开发并于2004年发布,是一种在JVM上运行的语言。 它是一种多范式语言,支持面向对象的方法和功能方法。 集群计算框架Apache Spark用Scala编写。

执照 (License)

Free!

自由!

优点 (Pros)

  • Scala + Spark = High performance cluster computing. Scala is an ideal choice of language for those working with high-volume data sets.

    Scala + Spark =高性能集群计算。 对于使用大量数据集的人员来说,Scala是理想的语言选择。
  • Multi-paradigmatic: Scala programmers can have the best of both worlds. Both object-oriented and functional programming paradigms available to them.

    多范式:Scala程序员可以兼得两者。 面向对象和功能编程范例都可以使用。
  • Scala is compiled to Java bytecode and runs on a JVM. This allows inter-operability with the Java language itself, making Scala a very powerful general purpose language, while also being well-suited for data science.

    Scala被编译为Java字节码,并在JVM上运行。 这允许与Java语言本身进行互操作,从而使Scala成为功能非常强大的通用语言,同时也非常适合数据科学。

缺点 (Cons)

  • Scala is not a straightforward language to get up and running with if you’re just starting out. Your best bet is to download sbt and set up an IDE such as Eclipse or IntelliJ with a specific Scala plug-in.

    如果您刚入门,Scala不是一种简单易用的语言。 最好的选择是下载sbt并使用特定的Scala插件设置IDE(例如Eclipse或IntelliJ)。

  • The syntax and type system are often described as complex. This makes for a steep learning curve for those coming from dynamic languages such as Python.

    语法和类型系统通常被描述为复杂的。 这为那些来自动态语言(例如Python)的人提供了陡峭的学习曲线。

裁决-“完美,适用于大数据” (Verdict — “perfect, for suitably big data”)

When it comes to using cluster computing to work with Big Data, then Scala + Spark are fantastic solutions. If you have experience with Java and other statically typed languages, you’ll appreciate these features of Scala too.

在使用群集计算与大数据一起使用时,Scala + Spark是绝佳的解决方案。 如果您有Java和其他静态类型语言的使用经验,那么您也会喜欢Scala的这些功能。

Yet if your application doesn’t deal with the volumes of data that justify the added complexity of Scala, you will likely find your productivity being much higher using other languages such as R or Python.

但是,如果您的应用程序不处理足以证明Scala增加了复杂性的数据量,那么使用R或Python等其他语言可能会发现您的生产力要高得多。

朱莉亚 (Julia)

你需要知道的 (What you need to know)

Released just over 5 years ago, Julia has made an impression in the world of numerical computing. Its profile was raised thanks to early adoption by several major organizations including many in the finance industry.

Julia(Julia)发布于5年前,在数值计算领域给人留下了深刻的印象。 由于包括金融业在内的数个主要组织的早期采用,提高了它的形象。

执照 (License)

Free!

自由!

优点 (Pros)

  • Julia is a JIT (‘just-in-time’) compiled language, which lets it offer good performance. It also offers the simplicity, dynamic-typing and scripting capabilities of an interpreted language like Python.

    Julia是一种JIT(“及时”)编译语言,可提供良好的性能。 它还提供了像Python这样的解释语言的简单性,动态键入和脚本编写功能。
  • Julia was purpose-designed for numerical analysis. It is capable of general purpose programming as well.

    Julia是专为数字分析而设计的。 它也能够进行通用编程。
  • Readability. Many users of the language cite this as a key advantage

    可读性。 许多使用该语言的用户都将其作为主要优势

缺点 (Cons)

  • Maturity. As a new language, some Julia users have experienced instability when using packages. But the core language itself is reportedly stable enough for production use.

    到期。 作为一种新语言,一些Julia用户在使用软件包时会遇到不稳定的情况。 但是据报道,核心语言本身已经足够稳定,可供生产使用。
  • Limited packages are another consequence of the language’s youthfulness and small development community. Unlike long-established R and Python, Julia doesn’t have the choice of packages (yet).

    有限的软件包是该语言的年轻化和小型开发社区的另一个结果。 与历史悠久的R和Python不同,Julia尚未选择软件包。

裁决-“一个为未来” (Verdict — “one for the future”)

The main issue with Julia is one that cannot be blamed for. As a recently developed language, it isn’t as mature or production-ready as its main alternatives Python and R.

Julia的主要问题是不能责怪的。 作为一种新近开发的语言,它不像其主要替代品Python和R那样成熟或可以投入生产。

But, if you are willing to be patient, there’s every reason to pay close attention as the language evolves in the coming years.

但是,如果您愿意耐心等待,那么随着语言在未来几年的发展,我们有充分的理由要密切注意。

的MATLAB (MATLAB)

你需要知道的 (What you need to know)

MATLAB is an established numerical computing language used throughout academia and industry. It is developed and licensed by MathWorks, a company established in 1984 to commercialize the software.

MATLAB是一种在学术界和行业中广泛使用的已建立的数值计算语言。 它由MathWorks开发并获得许可,该公司成立于1984年,旨在将该软件商业化。

执照 (License)

Proprietary — pricing varies depending on your use case

专有-定价因您的用例而异

优点 (Pros)

  • Designed for numerical computing. MATLAB is well-suited for quantitative applications with sophisticated mathematical requirements such as signal processing, Fourier transforms, matrix algebra and image processing.

    专为数值计算而设计。 MATLAB非常适合具有复杂数学要求的定量应用,例如信号处理,傅立叶变换,矩阵代数和图像处理。
  • Data Visualization. MATLAB has some great inbuilt plotting capabilities.

    数据可视化。 MATLAB具有一些出色的内置绘图功能。
  • MATLAB is often taught as part of many undergraduate courses in quantitative subjects such as Physics, Engineering and Applied Mathematics. As a consequence, it is widely used within these fields.

    在许多物理,工程和应用数学等定量学科的本科课程中,经常将MATLAB授课。 结果,它在这些领域中被广泛使用。

缺点 (Cons)

  • Proprietary licence. Depending on your use-case (academic, personal or enterprise) you may have to fork out for a pricey licence. There are free alternatives available such as Octave. This is something you should give real consideration to.

    专有许可证。 根据您的用例(学术,个人或企业),您可能需要为获得昂贵的许可证付出代价。 有免费的替代方法,例如Octave 。 这是您应该真正考虑的事情。

  • MATLAB isn’t an obvious choice for general-purpose programming.

    对于通用编程,MATLAB不是一个明显的选择。

裁决-“最适合数学密集型应用程序” (Verdict — “best for mathematically intensive applications”)

MATLAB’s widespread use in a range of quantitative and numerical fields throughout industry and academia makes it a serious option for data science.

MATLAB在整个行业和学术界在定量和数值领域的广泛使用使其成为数据科学的重要选择。

The clear use-case would be when your application or day-to-day role requires intensive, advanced mathematical functionality. Indeed, MATLAB was specifically designed for this.

明确的用例是您的应用程序或日常角色需要密集的高级数学功能时。 实际上,MATLAB是为此专门设计的。

其他语言 (Other Languages)

There are other mainstream languages that may or may not be of interest to data scientists. This section provides a quick overview… with plenty of room for debate of course!

还有其他主流语言可能会或可能不会对数据科学家感兴趣。 本节提供快速概述…当然还有足够的讨论空间!

C ++ (C++)

C++ is not a common choice for data science, although it has lightning fast performance and widespread mainstream popularity. The simple reason may be a question of productivity versus performance.

尽管C ++具有闪电般的快速性能和广泛的主流流行度,但它并不是数据科学的常见选择。 简单的原因可能是生产率与性能的问题。

As one Quora user puts it:

正如Quora的一位用户所说 :

“If you’re writing code to do some ad-hoc analysis that will probably only be run one time, would you rather spend 30 minutes writing a program that will run in 10 seconds, or 10 minutes writing a program that will run in 1 minute?”

“如果您正在编写代码以进行可能仅运行一次的临时分析,您宁愿花30分钟编写将在10秒内运行的程序,还是花10分钟编写将在1秒内运行的程序?分钟?”

The dude’s got a point. Yet for serious production-level performance, C++ would be an excellent choice for implementing machine learning algorithms optimized at a low-level.

花花公子的观点。 但是对于严重的生产级性能,C ++将是实现在低级优化的机器学习算法的绝佳选择。

Verdict — “not for day-to-day work, but if performance is critical…”

裁决-“不是日常工作,但如果绩效至关重要……”

JavaScript (JavaScript)

With the rise of Node.js in recent years, JavaScript has become more and more a serious server-side language. However, its use in data science and machine learning domains has been limited to date (although checkout brain.js and synaptic.js!). It suffers from the following disadvantages:

近年来,随着Node.js的兴起, JavaScript越来越成为一种严肃的服务器端语言。 但是,它在数据科学和机器学习领域中的使用迄今受到限制(尽管结帐brain.js和synaptic.js !)。 它具有以下缺点:

  • Late to the game (Node.js is only 8 years old!), meaning…

    游戏晚了(Node.js才8岁!),这意味着…
  • Few relevant data science libraries and modules are available. This means no real mainstream interest or momentum

    很少有相关的数据科学库和模块可用。 这意味着没有真正的主流兴趣或动力
  • Performance-wise, Node.js is quick. But JavaScript as a language is not without its critics.

    在性能方面,Node.js很快。 但是JavaScript作为一种语言并不是没有批评者的 。

Node’s strengths are in asynchronous I/O, its widespread use and the existence of languages which compile to JavaScript. So it’s conceivable that a useful framework for data science and realtime ETL processing could come together.

Node的优势在于异步I / O,其广泛使用以及可编译为JavaScript的语言的存在。 因此可以想象,一个有用的数据科学框架和实时ETL处理可以融合在一起。

The key question is whether this would offer anything different to what already exists.

关键问题是,这是否会提供与已经存在的不同的东西。

Verdict — “there is much to do before JavaScript can be taken as a serious data science language”

结论—“在将JavaScript视为一种严肃的数据科学语言之前,还有很多事情要做”

Perl (Perl)

Perl is known as a ‘Swiss-army knife of programming languages’, due to its versatility as a general-purpose scripting language. It shares a lot in common with Python, being a dynamically typed scripting language. But, it has not seen anything like the popularity Python has in the field of data science.

Perl因其作为通用脚本语言的多功能性而被称为“编程语言的瑞士军刀”。 它是动态类型化的脚本语言,与Python有很多共同点。 但是,它没有像Python在数据科学领域那样受欢迎。

This is a little surprising, given its use in quantitative fields such as bioinformatics. Perl has several key disadvantages when it comes to data science. It isn’t stand-out fast, and its syntax is famously unfriendly. There hasn’t been the same drive towards developing data science specific libraries. And in any field, momentum is key.

考虑到它在生物信息学等定量领域的使用,这有点令人惊讶。 在数据科学方面,Perl有几个关键的缺点。 它不是很快就脱颖而出,并且其语法众所周知是不友好的 。 开发特定于数据科学的库的驱动力不同。 在任何领域,动力都是关键。

Verdict — “a useful general purpose scripting language, yet it offers no real advantages for your data science CV”

判决-“一种有用的通用脚本语言,但对于您的数据科学CV并没有任何真正的优势”

Ruby (Ruby)

Ruby is another general purpose, dynamically typed interpreted language. Yet it also hasn’t seen the same adoption for data science as has Python.

Ruby是另一种通用的动态类型的解释语言。 但是,它还没有像Python那样被数据科学采用。

This might seem surprising, but is likely a result of Python’s dominance in academia, and a positive feedback effect . The more people use Python, the more modules and frameworks are developed, and the more people will turn to Python.

这似乎令人惊讶,但很可能是Python在学术界的统治地位以及积极的反馈效应的结果。 使用Python的人越多,开发的模块和框架就越多,并且使用Python的人也就越多。

The SciRuby project exists to bring scientific computing functionality, such as matrix algebra, to Ruby. But for the time being, Python still leads the way.

存在SciRuby项目是为了将科学计算功能(例如矩阵代数)引入Ruby。 但是就目前而言,Python仍然处于领先地位。

Verdict — “not an obvious choice yet for data science, but won’t harm the CV”

判决-“对于数据科学而言,尚不是一个显而易见的选择,但不会损害简历”

结论 (Conclusion)

Well, there you have it — a quickfire guide to which languages to consider for data science. The key here is to understand your usage requirements in terms of generality vs specificity, as well as your personal preferred development style of performance vs productivity.

好了,您已掌握了它-速成指南,可为数据科学考虑使用哪些语言。 这里的关键是要从通用性和特异性方面了解您的使用要求,以及您个人偏爱的性能与生产力开发风格。

I use R, Python and SQL on a regular basis, as my current role largely focuses on developing existing data pipeline and ETL processes. These languages give the right balance of generality and productivity to do the job, with the option of using R’s more advanced statistics packages when needed.

我经常使用R,Python和SQL,因为我目前的职责主要集中在开发现有的数据管道和ETL流程上。 这些语言可以在通用性和生产率之间实现适当的平衡,并在需要时可以选择使用R的高级统计软件包。

However — you may already have some experience with Java. Or you may want to use Scala for big data. Or, perhaps you’re keen to get involved with the Julia project.

但是,您可能已经对Java有一定的经验。 或者您可能想将Scala用于大数据。 或者,也许您渴望参与Julia项目。

Maybe you learned MATLAB at university, or want to give SciRuby a chance? Perhaps you have an altogether different suggestion. If so, please leave a reply below — I look forward to hearing from you!

也许您是在大学学习过MATLAB的,还是想给SciRuby一个机会? 也许您有完全不同的建议。 如果是这样,请在下面留下答复-我期待您的来信!

Thanks for reading!

谢谢阅读!

翻译自: https://www.freecodecamp.org/news/which-languages-should-you-learn-for-data-science-e806ba55a81f/

数据科学r语言

你可能感兴趣的:(大数据,编程语言,python,机器学习,人工智能)