Mapping Ecosystems of Software Development

On the data team here at Stack Overflow, we spend a lot of time and energy thinking about tech ecosystems and how technologies are related to each other. We use these kinds of relationships all over the place, from improving the experience of everyone coming to Stack Overflow by suggesting relevant content to helping our clients understand how to hire developers. One way to get at this idea of relationships between technologies is tag correlation. The correlation between two tags measures how often they appear together relative to how often they appear separately. You can check out a chapter of my book (co-authored with fellow Stack Overflow data scientist Dave Robinson) for a more detailed discussion.

Together vs. apart

We have a number of data sources that can be used to measure tag correlation. For example, Matt Sherman, an engineering manager at Stack Overflow, built a tool that measures how often tags appear together on Stack Overflow questions. We could also use traffic data and look at how often the same users visit pairs of tags. For this analysis, though, I am going to use a different dataset: the "liked tags" on Developer Stories here at Stack Overflow. If you haven’t made your own Developer Story or explored them, feel free to check out mine. Notice that I have identified some Stack Overflow tags that I want to work with in my professional life; for me, it’s R, dplyr, ggplot2, Shiny, and so forth. You can tell from those tags that I have a specific set of skills and do a certain kind of work (if you’re familiar with those technologies, anyway). There is similar signal in other Developer Stories here on Stack Overflow, and we can use the distribution of these tags and how they are related to learn how technologies are interrelated. The reason I like using Developer Stories for this kind of tag analysis is that the data has a high signal-to-noise ratio. I am interested in how technologies are connected and how they are used together, and developers’ own descriptions of their work and careers are a great place to get that.

To start with, we are just looking at which tags are used most often. We see the usual suspects here, some of the most common languages used by developers today.

Next, let’s count up co-occurrences of tags and find which tags are commonly used together. For example, what are the tags that occur most often with a few important languages like C#, C++, JavaScript, and Python on Developer Stories?
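The counting step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `stories` list is made-up toy data standing in for the liked tags on Developer Stories, and `count_pairs` and `top_partners` are hypothetical helper names, not code from the original analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set stands in for the liked tags on one Developer Story.
stories = [
    {"python", "django", "javascript"},
    {"python", "pandas", "r"},
    {"javascript", "html", "css"},
    {"python", "javascript"},
]

def count_pairs(stories):
    """Count how many stories contain each unordered pair of tags."""
    pair_counts = Counter()
    for tags in stories:
        # sorted() gives every pair one canonical order, e.g. ("django", "python").
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def top_partners(pair_counts, tag, n=3):
    """Return the n tags that co-occur most often with `tag`."""
    partners = Counter()
    for (a, b), count in pair_counts.items():
        if a == tag:
            partners[b] += count
        elif b == tag:
            partners[a] += count
    return partners.most_common(n)

pair_counts = count_pairs(stories)
print(top_partners(pair_counts, "python"))
```

Raw counts like these favor tags that are popular everywhere, which is why the most common partners tend to be the same general-purpose languages.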

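Notice that these are still many of the same important, general-purpose languages that we saw in the first plot. Developers often list languages like Java and C alongside these four important languages on their Developer Stories, but those are simply the most common technologies overall. To explore tag correlation, we want to ask a slightly different question. We want to find the tags that are more likely to occur together, compared to the other tags in this dataset. Which tags are most correlated with these four languages?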

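Now we see a different set of technologies. These tags are more likely to appear on developers’ Developer Stories with these four languages than with other tags, and now we are using aggregated data from developers on Stack Overflow to learn how technologies are used together. For example, we see more evidence here for how developers use Python both for data science along with R (another language used for data science), Pandas, and NumPy, as well as for web development with Django and Flask. We are able to find these related technologies because we calculated tag correlations.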
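To make the difference between co-occurrence and correlation concrete, here is a minimal sketch. The post does not say which correlation measure was used; this sketch assumes the phi coefficient, a common choice for binary present/absent data. The `stories` data and the `phi` function name are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: each set stands in for the liked tags on one Developer Story.
stories = [
    {"python", "javascript", "pandas"},
    {"python", "pandas"},
    {"python", "javascript"},
    {"javascript", "html"},
    {"javascript", "html"},
    {"java", "javascript"},
]

def phi(stories, x, y):
    """Phi coefficient between tags x and y over a list of tag sets."""
    n11 = sum(1 for tags in stories if x in tags and y in tags)
    n10 = sum(1 for tags in stories if x in tags and y not in tags)
    n01 = sum(1 for tags in stories if x not in tags and y in tags)
    n00 = len(stories) - n11 - n10 - n01
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One of the tags appears in every story or in none; correlation is undefined.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom

# Both pairs co-occur in exactly two stories, but their correlations differ.
print(round(phi(stories, "python", "pandas"), 3))
print(round(phi(stories, "python", "javascript"), 3))
```

In this toy data, pandas and JavaScript each appear with Python in two stories, yet pandas almost never appears without Python while JavaScript appears nearly everywhere. The correlation is high for the first pair and negative for the second, which is the distinction the text draws between "most common" and "most correlated".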

Correlation networks

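We are not limited to looking at one tag at a time. We can extend this correlation calculation to many more tags and then build a network of tags based on how they relate to each other.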

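In this interactive network visualization (you can zoom, scroll, and click), the size of each circle represents how often that tag is used. Tags with larger circles are used more often. The circles are colored according to their subgroup membership in the network as a whole, which is determined via many random walks (the walktrap clustering algorithm). This network includes tags that are used more than 800 times on Developer Stories and have correlations greater than 0.1 with other tags.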
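The two thresholds mentioned above (more than 800 uses, correlation greater than 0.1) amount to a filter on nodes and edges. The following is a rough sketch of that filtering step only; the counts and correlations are invented for illustration, and `build_network` is a hypothetical helper rather than the code behind the visualization.

```python
from collections import defaultdict

# Hypothetical inputs: tag usage counts and pairwise correlations (made-up numbers).
tag_counts = {"python": 5000, "django": 1200, "flask": 900, "r": 1500, "cobol": 40}
correlations = {
    ("django", "python"): 0.35,
    ("flask", "python"): 0.28,
    ("python", "r"): 0.12,
    ("django", "r"): 0.02,
    ("cobol", "python"): 0.30,
}

def build_network(tag_counts, correlations, min_count=800, min_corr=0.1):
    """Keep frequent tags and sufficiently correlated pairs as a weighted adjacency map."""
    frequent = {tag for tag, n in tag_counts.items() if n > min_count}
    adjacency = defaultdict(dict)
    for (a, b), corr in correlations.items():
        if a in frequent and b in frequent and corr > min_corr:
            # The network is undirected, so store the edge in both directions.
            adjacency[a][b] = corr
            adjacency[b][a] = corr
    return dict(adjacency)

network = build_network(tag_counts, correlations)
for tag in sorted(network):
    print(tag, sorted(network[tag]))
```

The subgroup coloring is a separate step that this sketch does not attempt. Graph libraries provide random-walk community detection; python-igraph, for example, has a `Graph.community_walktrap` method that could be applied to a graph built from these edges.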

There is so much we can see by exploring this network! One thing we can notice is subgroups within the network that show us tech ecosystems, some of them densely interconnected. We see some groups made up of:

  • Front-end web development technologies from HTML to JavaScript to Bootstrap
  • Microsoft-related technologies including C#, .NET, and SQL Server
  • DevOps technologies like AWS and Docker (Go is in this cluster!)
  • Mobile technologies including Android and Objective-C

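Where are the technologies you use, and how are they connected? You can explore this network yourself; the network data structure is available as a dataset on Kaggle. You can also check out the Kaggle kernel I created, which shows how to use the network nodes and links to draw a network graph.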

Another thing we can notice in this network is that some technologies act as bridges between tech ecosystems. Python, one of the most commonly used languages on Developer Stories, connects to the front-end cluster (through Django), to a Linux/systems administration cluster, to a C/C++/embedded cluster, and to R and machine learning. We see time and again how unique Python has become in today’s technology landscape. Java, git, and JSON are other "bridge" technologies that connect parts of this network.
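One simple way to look for such bridges, given cluster labels and links, is to count how many distinct clusters each tag touches. This is a heuristic sketch and not the method used in the post; the `clusters` and `links` data are invented, and `clusters_reached` is a hypothetical helper. Measures such as betweenness centrality would be another option.

```python
# Hypothetical inputs: a cluster label per tag and a list of undirected links.
clusters = {
    "python": "data", "r": "data", "pandas": "data",
    "django": "web", "javascript": "web", "html": "web",
    "linux": "systems", "bash": "systems",
}
links = [
    ("python", "r"), ("python", "pandas"), ("python", "django"),
    ("python", "linux"), ("django", "javascript"),
    ("javascript", "html"), ("linux", "bash"),
]

def clusters_reached(clusters, links):
    """Map each tag to the set of clusters covered by itself and its direct neighbors."""
    reached = {tag: {cluster} for tag, cluster in clusters.items()}
    for a, b in links:
        reached[a].add(clusters[b])
        reached[b].add(clusters[a])
    return reached

reached = clusters_reached(clusters, links)
# Treat a tag as a bridge if it touches at least three clusters (an arbitrary cutoff).
bridges = sorted(tag for tag, found in reached.items() if len(found) >= 3)
print(bridges)
```

In this toy network only Python touches the data, web, and systems clusters at once, which mirrors the role the text describes for it.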

This analysis used the liked tags on Developer Stories to explore the rich, complex network of technologies that we work within. When developers share who we are as professionals in ways that we actually care about, such as through the technologies we want to use, we can all learn more about the developer community. You can write your own Developer Story now and highlight your career, interests, and the technologies you want to work with.

from:https://stackoverflow.blog/2017/10/03/mapping-ecosystems-software-development/
