These are the best free open data sources anyone can use

by Hiren Patel

What Is Open Data?

In simple terms, Open Data is data that anyone can freely access, modify, reuse, and share.

Open Data grows out of various “open movements” such as open source, open hardware, open government, and open science.

Governments, independent organizations, and agencies have stepped forward to open the floodgates, releasing more and more data for free and easy access.

Why Is Open Data Important?

Open data is important because the world has grown increasingly data-driven. But if access to and use of data are restricted, the idea of data-driven business and governance cannot be realized.

Therefore, open data has its own unique place. It can give us a fuller understanding of global and universal problems. It can give businesses a big boost. It can be a great impetus for machine learning. It can help fight global problems such as disease, crime, and famine. Open data can empower citizens and thereby strengthen democracy. It can streamline the processes and systems that societies and governments have built. It can help transform the way we understand and engage with the world.

So here’s my list of 15 awesome Open Data sources:

1. World Bank Open Data

As a repository of the world’s most comprehensive data on what is happening in countries across the world, World Bank Open Data is a vital source of Open Data. It also provides access to the other datasets listed in its data catalog.

World Bank Open Data is massive: it has 3,000 datasets and 14,000 indicators covering microdata, time series statistics, and geospatial data.

Accessing and discovering the data you want is also quite easy. All you need to do is to specify the indicator names, countries or topics and it will open up the treasure-house of Open Data for you. It also allows you to download data in different formats such as CSV, Excel, and XML.

If you are a journalist or an academic, you will be impressed by the array of tools available to you. You get access to analysis and visualization tools that can bolster your research and facilitate a deeper, better understanding of global problems.

You also get access to an API that can help you create the data visualizations you need, combine the data live with other data sources, and much more.
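
As a rough sketch of how the API can be used, the snippet below builds a request URL for one indicator in one country and flattens the JSON reply into (year, value) pairs. It assumes the v2 URL pattern `https://api.worldbank.org/v2/country/{code}/indicator/{indicator}?format=json` and a reply shaped as `[metadata, records]`. The indicator code `SP.POP.TOTL` (total population) is used only for illustration, so check the official API documentation before relying on any of these details.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed root of the World Bank API v2; verify against the official docs.
BASE = "https://api.worldbank.org/v2"


def indicator_url(country: str, indicator: str, per_page: int = 100) -> str:
    """Build the request URL for one indicator in one country (assumed URL pattern)."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{BASE}/country/{country}/indicator/{indicator}?{query}"


def parse_series(payload: list) -> list[tuple[str, float]]:
    """Turn a [metadata, records] reply into (year, value) pairs, skipping missing values."""
    if len(payload) < 2 or not payload[1]:
        return []  # error replies carry only a message object
    return [(rec["date"], rec["value"]) for rec in payload[1] if rec["value"] is not None]


def fetch_series(country: str, indicator: str) -> list[tuple[str, float]]:
    """Download and parse one series (needs network access; only the first page is read)."""
    with urlopen(indicator_url(country, indicator)) as resp:
        return parse_series(json.load(resp))


if __name__ == "__main__":
    # SP.POP.TOTL (total population) is used purely as an illustration.
    print(indicator_url("BR", "SP.POP.TOTL"))
```

The network call is kept separate from the parsing step, so the parsing logic can be tried on a saved reply without going online.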

Therefore, it’s no surprise that World Bank Open Data tops any list of Open Data sources!

2. WHO (World Health Organization): Open Data Repository

WHO’s Open Data repository is how WHO keeps track of health-specific statistics of its 194 Member States.

The repository keeps the data systematically organized, and it can be accessed according to different needs. For instance, whether you are interested in mortality or the burden of disease, you can access data classified under 100 or more categories. These include the Millennium Development Goals (child nutrition, child health, maternal and reproductive health, immunization, HIV/AIDS, tuberculosis, malaria, neglected diseases, water and sanitation), noncommunicable diseases and risk factors, epidemic-prone diseases, health systems, environmental health, violence and injuries, and equity.

For your specific needs, you can go through the datasets according to themes, category, indicator, and country.

The good thing is that you can download whatever data you need in Excel format. You can also monitor and analyze data through its data portal.

An API for the World Health Organization’s data and statistics content is also available.
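
Here is a minimal sketch of querying that API from Python. It assumes the WHO Global Health Observatory OData endpoint at `https://ghoapi.azureedge.net/api/`, and a reply whose `value` list holds records with `SpatialDim` (country code), `TimeDim` (year), `Dim1` (a breakdown such as sex), and `NumericValue` fields. These names come from my understanding of the service, not from this article, so verify them against WHO's API documentation.

```python
from __future__ import annotations

import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed root of WHO's Global Health Observatory OData API.
GHO_ROOT = "https://ghoapi.azureedge.net/api"


def gho_url(indicator: str, country: str | None = None) -> str:
    """Build an OData request for one indicator, optionally filtered to an ISO3 country code."""
    url = f"{GHO_ROOT}/{indicator}"
    if country:
        # Keep "$" and quotes literal and encode spaces as %20 in the OData filter.
        flt = urlencode({"$filter": f"SpatialDim eq '{country}'"}, safe="$'", quote_via=quote)
        url += "?" + flt
    return url


def parse_gho(payload: dict) -> list[tuple[int, str | None, float]]:
    """Extract (year, breakdown, value) rows, dropping records without a numeric value."""
    rows = [
        (rec["TimeDim"], rec.get("Dim1"), rec["NumericValue"])
        for rec in payload.get("value", [])
        if rec.get("NumericValue") is not None
    ]
    return sorted(rows, key=lambda r: (r[0], r[1] or ""))


def fetch(indicator: str, country: str | None = None) -> list[tuple[int, str | None, float]]:
    """Download and parse one indicator (needs network access)."""
    with urlopen(gho_url(indicator, country)) as resp:
        return parse_gho(json.load(resp))
```

For example, `fetch("WHOSIS_000001", "BRA")` would request life-expectancy rows for Brazil, assuming that indicator code is still current.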

3. Google Public Data Explorer

Launched in 2010, Google Public Data Explorer can help you explore vast amounts of public-interest datasets. You can visualize and communicate the data for your respective uses.

It brings together data from different agencies and sources. For instance, you can access data from the World Bank, the U.S. Bureau of Labor Statistics and U.S. Bureau, the OECD, the IMF, and others.

Different stakeholders access this data for a variety of purposes. Whether you are a student or a journalist, whether you are a policy maker or an academic, you can leverage this tool in order to create visualizations of public data.

With the help of Data Explorer, you can represent the data in various ways, such as line graphs, bar graphs, maps, and bubble charts.

The best part is that these visualizations are dynamic: you can watch them change over time, switch topics, focus on different entries, and modify the scale.

It is easily shareable too. As soon as you get the chart ready, you can embed it on your website or blog or simply share a link with your friends.

4. Registry of Open Data on AWS (RODA)

This is a repository of public datasets that are available through AWS resources.

With RODA, you can discover and share publicly available data.

In RODA, you can search by keywords and by tags for common types of data, such as genomic data, satellite imagery, and transportation, to find the data you are looking for. All of this happens in a simple web interface.

For every dataset, you will find a detail page, usage examples, license information, and tutorials or applications that use the data.

By making use of a broad range of compute and data analytics products, you can analyze the open data and build whatever services you want.

While the data is available through AWS resources, bear in mind that it is not provided by AWS. It belongs to various agencies, government organizations, researchers, businesses, and individuals.
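
Many RODA datasets live in public S3 buckets that allow anonymous reads. The sketch below lists objects using only the standard library. It sends an unsigned S3 ListObjectsV2 request (`?list-type=2`) and parses the XML reply. The bucket name `example-open-data` is hypothetical; take the real bucket and prefix from a dataset's detail page. Public access is an assumption that varies by dataset.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace used in S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket: str, prefix: str = "", max_keys: int = 100) -> str:
    """Build an unsigned ListObjectsV2 request URL for a public bucket."""
    query = urlencode({"list-type": 2, "prefix": prefix, "max-keys": max_keys})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_doc: str | bytes) -> tuple[list[tuple[str, int]], bool]:
    """Return ([(key, size_in_bytes), ...], is_truncated) from a ListBucketResult."""
    root = ET.fromstring(xml_doc)
    objects = [
        (c.findtext("s3:Key", namespaces=S3_NS), int(c.findtext("s3:Size", namespaces=S3_NS)))
        for c in root.findall("s3:Contents", S3_NS)
    ]
    truncated = root.findtext("s3:IsTruncated", namespaces=S3_NS) == "true"
    return objects, truncated


def list_objects(bucket: str, prefix: str = "") -> tuple[list[tuple[str, int]], bool]:
    """Fetch one page of the listing (anonymous request; network access needed)."""
    with urlopen(list_url(bucket, prefix)) as resp:
        return parse_listing(resp.read())
```

When `is_truncated` is true, more keys remain. A fuller client would follow S3's continuation token, which this sketch omits.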

5. European Union Open Data Portal

The European Union Open Data Portal gives you a single platform for accessing the open data published by EU institutions, agencies, and other organizations.

The EU Open Data Portal is home to vital open data pertaining to EU policy domains. These policy domains include economy, employment, science, environment, and education.

Around 70 EU institutions, organizations, and departments have made their datasets public and open to access. These include Eurostat, the European Environment Agency, the Joint Research Centre, and other European Commission Directorates-General and EU agencies. To date, the number of datasets has passed 11,700.

The portal makes access easy. Through a catalog of common metadata, you can search, explore, link, download, and reuse the data for your own purposes, whether commercial or non-commercial.

You can search the metadata catalog through an interactive search engine (Data tab) and SPARQL queries (Linked data tab).
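
To show what the SPARQL route can look like, here is a minimal sketch that sends a SELECT query over HTTP GET, following the W3C SPARQL 1.1 Protocol. It requests the standard SPARQL JSON results format and flattens the bindings into plain dicts. The endpoint address is an assumption, so check the portal's Linked data tab for the current one. The query assumes datasets are described with the DCAT vocabulary, and its "air quality" filter is only an example.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://data.europa.eu/sparql"  # assumed endpoint URL; check the portal

QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
  FILTER(CONTAINS(LCASE(STR(?title)), "air quality"))
}
LIMIT 10
"""


def sparql_request(endpoint: str, query: str) -> Request:
    """SPARQL 1.1 Protocol 'query via GET', asking for the JSON results format."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON bindings to {variable: value} dicts (unbound variables are absent)."""
    return [{var: cell["value"] for var, cell in b.items()} for b in results["results"]["bindings"]]


def run(endpoint: str = ENDPOINT, query: str = QUERY) -> list[dict[str, str]]:
    """Send the query and return flattened rows (needs network access)."""
    with urlopen(sparql_request(endpoint, query)) as resp:
        return rows(json.load(resp))
```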

By making use of this catalog, you can gain access to the data stored on the different websites of the EU institutions, agencies and organizations.

6. FiveThirtyEight

It is a great site for data-driven journalism and story-telling.

It provides the data sources behind its work across sectors such as politics, sports, science, and economics, and you can download the data as well.

When you access the data, you will come across a brief explanation regarding each dataset with respect to its source. You will also get to know what it stands for and how to use it.

To make this data user-friendly, it provides datasets in formats that are as simple and non-proprietary as possible, such as CSV files. These formats can be easily read and processed by humans and machines alike.
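
Because the files are plain CSV, the standard library is enough to start working with them. The sketch below reads a CSV with a header row into dicts and pulls out one numeric column, skipping blank or non-numeric cells. The URL is a placeholder for a raw file in FiveThirtyEight's public GitHub data repository. The column names in the usage example (`team`, `elo`) are hypothetical.

```python
from __future__ import annotations

import csv
import io
from urllib.request import urlopen

# Placeholder: substitute the raw URL of the CSV file you actually want.
EXAMPLE_URL = "https://raw.githubusercontent.com/fivethirtyeight/data/master/<dataset>/<file>.csv"


def read_rows(text_stream) -> list[dict[str, str]]:
    """Parse a CSV stream that has a header row into a list of dicts."""
    return list(csv.DictReader(text_stream))


def numeric_column(rows: list[dict[str, str]], column: str) -> list[float]:
    """Pull one column as floats, skipping blank, missing, or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (ValueError, TypeError, KeyError):
            continue
    return values


def load_csv(url: str) -> list[dict[str, str]]:
    """Download a CSV over HTTP and parse it (needs network access)."""
    with urlopen(url) as resp:
        return read_rows(io.TextIOWrapper(resp, encoding="utf-8", newline=""))


# Usage with in-memory text (hypothetical columns):
# rows = read_rows(io.StringIO("team,elo\nA,1500\nB,\n"))
# numeric_column(rows, "elo")
```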

With the help of these datasets, you can create stories and visualizations as per your own requirements and preference.

7. U.S. Census Bureau

The U.S. Census Bureau is the largest statistical agency of the federal government. It stores and provides reliable facts and data about America’s people, places, and economy.

The Census Bureau sees its mission as serving as the most reliable provider of quality data.

Federal, state, local, and tribal governments all use census data for a variety of purposes. They use it to decide where to locate new housing and public facilities, and when examining the demographic characteristics of communities, states, and the USA as a whole.

This data is also used to plan transportation systems and roadways. It comes in handy when deciding quotas and drawing police and fire precincts, and when governments define local areas for elections, schools, utilities, and so on. Population information is compiled once a decade, and this data is central to that effort.

Various tools, such as American Fact Finder, Census Data Explorer, and Quick Facts, are useful when you want to search, customize, and visualize data.

For instance, Quick Facts alone contains statistics for all states and counties, and for cities and towns with a population of 5,000 or more.

Likewise, American Fact Finder can help you find popular facts such as population and income. It provides frequently requested information.

The good thing is that Census Data Explorer lets you search and interact with the data, learn about popular statistics, and see the related charts. You can also use a visual tool to customize data in an interactive map.

8. Data.gov

Data.gov is the treasure-house of the U.S. government’s open data. Only recently was the decision made to make all government data available for free.

When it launched, it hosted only 47 datasets. There are now 180,000.

Data.gov is a great resource because you can find data, tools, and resources to use for a variety of purposes. You can conduct research, develop web and mobile applications, and even design data visualizations.

All you need to do is enter keywords in the search box and browse through types, tags, formats, groups, organization types, organizations, and categories. This will facilitate easy access to data or datasets that you need.
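
The same keyword search can be scripted. Data.gov's catalog has been built on CKAN, and the sketch below assumes it still exposes CKAN's `package_search` action at `https://catalog.data.gov/api/3/action/package_search`. `q` carries the keywords and `fq` narrows results by resource format. The reply shape (`success`, `result.results[]` with `title` and `resources[].format`) follows CKAN's Action API; the endpoint itself is an assumption to verify.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

CATALOG = "https://catalog.data.gov/api/3/action/package_search"  # assumed CKAN endpoint


def search_url(keywords: str, rows: int = 10, fmt: str | None = None) -> str:
    """Keyword search, optionally narrowed by resource format via a CKAN filter query."""
    params: dict[str, object] = {"q": keywords, "rows": rows}
    if fmt:
        params["fq"] = f'res_format:"{fmt}"'
    return f"{CATALOG}?{urlencode(params)}"


def summarize(response: dict) -> list[tuple[str, list[str]]]:
    """Map each hit to (title, sorted distinct resource formats)."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported an error")
    return [
        (pkg["title"], sorted({r["format"] for r in pkg.get("resources", []) if r.get("format")}))
        for pkg in response["result"]["results"]
    ]


def search(keywords: str, rows: int = 10, fmt: str | None = None) -> list[tuple[str, list[str]]]:
    """Run the search (needs network access)."""
    with urlopen(search_url(keywords, rows, fmt)) as resp:
        return summarize(json.load(resp))
```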

Data.gov follows the Project Open Data Schema: a set of required fields (Title, Description, Tags, Last Update, Publisher, Contact Name, etc.) for every dataset displayed on Data.gov.
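
To make the idea of required fields concrete, here is a small checker that reports which ones a catalog entry is missing. The key names are an assumption based on the Project Open Data metadata schema (v1.1). In that schema, Tags, Last Update, and Contact Name correspond to `keyword`, `modified`, and `contactPoint`, and federal agencies must also supply `bureauCode` and `programCode`. The sample record is entirely hypothetical.

```python
from __future__ import annotations

# Assumed "always required" fields from the Project Open Data schema (v1.1).
REQUIRED = ("title", "description", "keyword", "modified",
            "publisher", "contactPoint", "identifier", "accessLevel")


def missing_fields(dataset: dict) -> list[str]:
    """List required fields that are absent or empty in one catalog entry."""
    return [field for field in REQUIRED if not dataset.get(field)]


# Hypothetical record for illustration only.
example = {
    "title": "Example Air Quality Readings",
    "description": "Hourly sensor readings.",
    "keyword": ["air quality", "environment"],
    "modified": "2020-01-15",
    "publisher": {"name": "Example Agency"},
    "contactPoint": {"fn": "Jane Doe", "hasEmail": "mailto:jane@example.gov"},
    "identifier": "example-aq-001",
}

print(missing_fields(example))  # prints ['accessLevel']
```

Treating empty values as missing (rather than only absent keys) is a deliberate choice: an empty `keyword` list is as unhelpful to a catalog as no list at all.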

9. DBpedia

As you know, Wikipedia is a great source of information. DBpedia aims to extract structured content from the valuable information created in Wikipedia.

With DBpedia, you can semantically search and explore the relationships and properties of Wikipedia resources, including links to other related datasets.

There are around 4.58 million entities in the DBpedia dataset. Of these, 4.22 million are classified in an ontology, including 1,445,000 persons, 735,000 places, 123,000 music albums, 87,000 films, 19,000 video games, 241,000 organizations, 251,000 species, and 6,000 diseases.

There are labels and abstracts for these entities in around 125 languages. There are 25.2 million links to images. There are 29.8 million links to external web pages.

To use DBpedia, you can either write SPARQL queries against its endpoint or download its data dumps.
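
As an illustration of the endpoint route, the sketch below POSTs a query to `https://dbpedia.org/sparql` as a URL-encoded form, which the SPARQL 1.1 Protocol allows and which is convenient for long queries. It then converts typed literals (for example `xsd:nonNegativeInteger`) into Python numbers. The query is only an example: it asks for the most populous German cities, and its results depend on how the current DBpedia release types those resources.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

DBPEDIA = "https://dbpedia.org/sparql"
XSD = "http://www.w3.org/2001/XMLSchema#"
INT_TYPES = {XSD + t for t in ("integer", "int", "long", "nonNegativeInteger", "positiveInteger")}
FLOAT_TYPES = {XSD + t for t in ("decimal", "double", "float")}

# Illustrative query; property and class names come from the DBpedia ontology.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country <http://dbpedia.org/resource/Germany> ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 5
"""


def post_request(query: str, endpoint: str = DBPEDIA) -> Request:
    """SPARQL 1.1 'query via URL-encoded POST'."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(endpoint, data=body, method="POST", headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    })


def typed(cell: dict):
    """Convert one JSON result cell to a Python value using its XSD datatype, if any."""
    dtype = cell.get("datatype")
    if dtype in INT_TYPES:
        return int(cell["value"])
    if dtype in FLOAT_TYPES:
        return float(cell["value"])
    return cell["value"]


def parse(results: dict) -> list[dict]:
    """Turn SPARQL JSON bindings into {variable: typed value} dicts."""
    return [{var: typed(cell) for var, cell in b.items()} for b in results["results"]["bindings"]]


def run(query: str = QUERY) -> list[dict]:
    """Send the query to DBpedia and parse the reply (needs network access)."""
    with urlopen(post_request(query)) as resp:
        return parse(json.load(resp))
```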

DBpedia has benefited several enterprises, such as Apple (via Siri), Google (via Freebase and the Google Knowledge Graph), and IBM (via Watson), particularly their prestigious artificial intelligence projects.

10. freeCodeCamp Open Data

It is an open source community. It matters because it enables you to learn to code, build pro bono projects for nonprofits, and land a job as a developer.

Along the way, the freeCodeCamp.org community generates enormous amounts of data every month, and it has turned that data into open data.

You will find a variety of things in this repository: datasets, analyses of them, and even demos of projects based on the freeCodeCamp data. You can also find links to external projects that use the freeCodeCamp data.

It can help with a wide range of projects and tasks you may have in mind. Whether you are doing web analytics, social media analytics, social network analysis, education analysis, data visualization, data-driven web development, or bots, the data offered by this community can be extremely useful.

11. Yelp Open Datasets

The Yelp dataset is a subset of Yelp’s own businesses, reviews, and user data, made available for personal, educational, and academic purposes.

There are 5,996,996 reviews, 188,593 businesses, 280,991 pictures and 10 metropolitan areas included in Yelp Open Datasets.

You can use them for different purposes. Since they are available as JSON files, you can use them to teach students about databases. You can also use them to learn NLP, or as sample production data while you learn how to design mobile apps.

In this dataset, each file contains a single object type, with one JSON object per line.
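
Since each line is a complete JSON object, you can process the files one line at a time instead of loading a whole file into memory. The sketch below streams records and computes the average star rating per city for businesses with at least a minimum number of reviews. The field names `city`, `stars`, and `review_count` and the file name in the usage comment are assumptions about the business file; check them against the dataset's documentation.

```python
from __future__ import annotations

import json
from collections import defaultdict
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank line (JSON Lines: one object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def mean_stars_by_city(records: Iterable[dict], min_reviews: int = 10) -> dict[str, float]:
    """Average star rating per city, ignoring businesses with too few reviews."""
    totals = defaultdict(lambda: [0.0, 0])  # city -> [sum of stars, count]
    for rec in records:
        if rec.get("review_count", 0) >= min_reviews:
            acc = totals[rec["city"]]
            acc[0] += rec["stars"]
            acc[1] += 1
    return {city: total / count for city, (total, count) in totals.items()}


# Usage with the real file (name assumed):
# with open("yelp_academic_dataset_business.json", encoding="utf-8") as fh:
#     print(mean_stars_by_city(iter_records(fh)))
```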

12. UNICEF Dataset

Since UNICEF concerns itself with a wide variety of critical issues, it has compiled relevant data on education, child labor, child disability, child mortality, maternal mortality, water and sanitation, low birth-weight, antenatal care, pneumonia, malaria, iodine deficiency disorder, female genital mutilation/cutting, and adolescents.

UNICEF’s open datasets published on the IATI Registry (http://www.iatiregistry.org/publisher/unicef) have been extracted directly from UNICEF’s operating system (VISION) and other data systems, and they reflect inputs made by individual UNICEF offices.

The good thing is that there is a regular update when it comes to these datasets. Every month, the data is updated in order to make it more comprehensive, reliable and accurate.

You can access this data freely and easily by downloading it in CSV format. You can also preview sample data before downloading it.

While anybody can explore and visualize UNICEF’s datasets, there are three principal publishers:

UNICEF’s Aid Transparency Portal: This portal makes the datasets much easier to access. It also includes details for each country that UNICEF works in.

Publisher d-portal: At the time of writing, it is in beta. With this portal, you can explore IATI data.

You can search for information on development activities, budgets, and so on, and explore it country by country.

Publisher’s data platform: On this platform, you can easily access statistics, charts, and metrics on data accessed via the IATI Registry. You can sort many of its tables by clicking the column headers, and many of the datasets on the platform are also available in machine-readable JSON format.

13. Kaggle

Kaggle is great because it supports a variety of dataset publication formats. Better still, it strongly recommends that dataset publishers share their data in an accessible, non-proprietary format.

The platform supports open and accessible data formats. It is important not just for access but also for whatever you want to do with this data. Therefore, Kaggle Dataset clearly defines the file formats which are recommended while sharing data.

What makes Kaggle Datasets unique is that it is not just a data repository. Each dataset has its own community, where you can discuss the data, find public code and techniques, and develop your own projects in Kernels.

Kaggle supports file types such as CSV, JSON, SQLite, archives, and BigQuery. You will find a variety of resources to get started on your open data project.

The best part is that Kaggle allows you to publish and share datasets privately or publicly.

14. LODUM

It is the Open Data initiative of the University of Münster. Under this initiative, it is made possible for anyone to access any public information about the university in machine-readable formats. You can easily access and reuse it as per your needs.

Under this project, open data about scientific artifacts is made available, encoded as Linked Data.

Linked Data makes it possible to share and use data, ontologies, and various metadata standards. In fact, it is envisaged that Linked Data will become the accepted standard for publishing metadata, and the data itself, on the Web.

The LODUM team has co-initiated LinkedUniversities.org and LinkedScience.org.

You can use a SPARQL editor or the SPARQL package for R to analyze the data.

The SPARQL package lets you connect to a SPARQL endpoint over HTTP and issue either a SELECT query or an update query (LOAD, INSERT, DELETE).
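
For readers working in Python rather than R, here is a sketch of the same two kinds of request at the HTTP level, following the W3C SPARQL 1.1 Protocol. A SELECT travels in the `query` parameter. LOAD, INSERT, and DELETE go in the `update` parameter of a form-encoded POST. The endpoint URL is a hypothetical placeholder, not LODUM's address. Most public endpoints are read-only, so updates usually require an endpoint you control and appropriate authentication.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint for illustration


def select_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Read queries (SELECT) are sent in the 'query' parameter of a GET."""
    return Request(f"{endpoint}?{urlencode({'query': query})}",
                   headers={"Accept": "application/sparql-results+json"})


def update_request(update: str, endpoint: str = ENDPOINT) -> Request:
    """Updates (LOAD/INSERT/DELETE) are sent in the 'update' parameter of a form POST."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(endpoint, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


# Build (but do not send) an INSERT DATA request for a hypothetical triple.
INSERT = """
PREFIX dct: <http://purl.org/dc/terms/>
INSERT DATA { <http://example.org/paper/1> dct:title "An example paper" . }
"""
req = update_request(INSERT)
# Sending it would be urllib.request.urlopen(req), against an endpoint you may write to.
```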

15. UCI Machine Learning Repository

It serves as a comprehensive repository of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms.

In this repository, there are, at present, 463 datasets as a service to the machine learning community.

The Center for Machine Learning and Intelligent Systems at the University of California, Irvine hosts and maintains it. David Aha originally created it as a graduate student at UC Irvine.

Since then, students, educators, and researchers all over the world have used it as a reliable source of machine learning datasets.

Each dataset has its own web page that lists all the known details, including any relevant publications that investigate it. You can download the datasets as ASCII files, often in the handy CSV format.
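
Many of these files have no header row; the column names are documented separately on the dataset's page. The sketch below parses such a file by supplying the names yourself, using the classic Iris data as the example. The column names and the legacy direct-download URL are assumptions that may not match the repository's current layout, and the parser assumes every column except the last is numeric.

```python
from __future__ import annotations

import csv
from collections import Counter
from typing import Iterable
from urllib.request import urlopen

# Names taken from the dataset's separate description; the raw file has no header row.
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
# Legacy direct-download path (assumed; the site layout has changed over time).
IRIS_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"


def parse_headerless(lines: Iterable[str], columns: list[str]) -> list[dict]:
    """Parse header-less CSV lines, converting every column but the last to float."""
    rows = []
    for record in csv.reader(lines):
        if len(record) != len(columns):  # skips blank or malformed lines
            continue
        row = dict(zip(columns, record))
        for col in columns[:-1]:
            row[col] = float(row[col])
        rows.append(row)
    return rows


def class_counts(rows: list[dict]) -> Counter:
    """Count examples per class label (the last column)."""
    return Counter(row["class"] for row in rows)


def load_iris(url: str = IRIS_URL) -> list[dict]:
    """Download and parse the file (needs network access)."""
    with urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return parse_headerless(text.splitlines(), IRIS_COLUMNS)
```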

Datasets are summarized by attributes such as attribute types, number of instances, number of attributes, and year published, and the list can be sorted and searched by these attributes.

Open Data Portals and Search Engines:

While there are plenty of datasets published by numerous agencies every year, very few datasets become recognized and established.

So few datasets last as useful resources because it is hard to develop, manage, and provide data in a way that people and organizations find useful and easy to use.

Below is a list of a few other important open data portals and platforms that let users access open data easily, study its impact, and glean valuable insights.

  1. Google Dataset Search

  2. Dataverse

  3. Open Data Kit

  4. CKAN

  5. Open Data Monitor

  6. Plenar.io

  7. Open Data Impact Map

Conclusion

Open data is the order of the day. The world has gradually started moving towards open systems and open data is rightly in sync with that.

Businesses and organizations that leverage open data will gain a competitive edge and will be well placed to lead in the future.

Source: https://www.freecodecamp.org/news/https-medium-freecodecamp-org-best-free-open-data-sources-anyone-can-use-a65b514b0f2d/
