How To Install and Use TimescaleDB on Ubuntu 18.04

The author selected the Computer History Museum to receive a donation as part of the Write for DOnations program.

Introduction

Many applications, such as monitoring systems and data collection systems, accumulate data for further analysis. These analyses often look at the way a piece of data or a system changes over time. In these instances, data is represented as a time series, with every data point accompanied by a timestamp. An example would look like this:

2019-11-01 09:00:00    server.cpu.1    0.9
2019-11-01 09:00:00    server.cpu.15   0.8
2019-11-01 09:01:00    server.cpu.1    0.9
2019-11-01 09:01:00    server.cpu.15   0.8
...

Managing time series data has become an essential skill with the rise of the Internet of Things (IoT) and Industrial Internet of Things. There are more and more devices that collect various time series information: fitness trackers, smart watches, home weather stations, and various sensors, to name a few. These devices collect a lot of information, and all this data must be stored somewhere.

Classic relational databases are most often used to store data, but they don’t always fit when it comes to the huge data volumes of time series. When you need to process a large amount of time series data, relational databases can be too slow. Because of this, specially optimized databases, called NoSQL databases, have been created to avoid the problems of relational databases.

TimescaleDB is an open-source database optimized for storing time series data. It is implemented as an extension of PostgreSQL and combines the ease-of-use of relational databases and the speed of NoSQL databases. As a result, it allows you to use PostgreSQL for both storing business data and time series data in one place.

By following this tutorial, you’ll set up TimescaleDB on Ubuntu 18.04, configure it, and learn how to work with it. You’ll create time series databases and make simple queries. Finally, you’ll see how to get rid of unnecessary data.

Prerequisites

To follow this tutorial, you will need:

  • One Ubuntu 18.04 server set up by following our Initial Server Setup Guide for Ubuntu 18.04, including a non-root user with sudo privileges and a firewall.

  • PostgreSQL installed on your server. Follow Step 1 of How To Install and Use PostgreSQL on Ubuntu 18.04 to install it.

Step 1 — Installing TimescaleDB

TimescaleDB is not available in Ubuntu’s default package repositories, so in this step you will install it from the TimescaleDB Personal Packages Archive (PPA).

First, add Timescale’s APT repository:

  • sudo add-apt-repository ppa:timescale/timescaledb-ppa

Confirm this action by hitting the ENTER key.

Next, refresh your APT cache to update your package lists:

  • sudo apt update

You can now proceed with the installation. This tutorial uses PostgreSQL version 10; if you are using a different version of PostgreSQL (11 or 9.6, for example), replace the value in the following command and run it:

  • sudo apt install timescaledb-postgresql-10

TimescaleDB is now installed and ready to be used. Next, you will turn it on and adjust some of the settings associated with it in the PostgreSQL configuration file to optimize the database.

Step 2 — Configuring TimescaleDB

The TimescaleDB module works fine with the default PostgreSQL configuration settings, but to improve performance and make better use of processor, memory, and disk resources, the developers of TimescaleDB suggest configuring some individual parameters. This can be done automatically with the timescaledb-tune tool or by manually editing your server’s postgresql.conf file.

In this tutorial, you will use the timescaledb-tune tool, which will read the postgresql.conf file and interactively suggest making changes.

Run the following command to start the configuration wizard:

  • sudo timescaledb-tune

First, you will be asked to confirm the path to the PostgreSQL configuration file:


Output
Using postgresql.conf at this path:
/etc/postgresql/10/main/postgresql.conf

Is this correct? [(y)es/(n)o]:

The utility automatically detects the path to the configuration file, so confirm this by entering y:


Output
...
Is this correct? [(y)es/(n)o]: y
Writing backup to: /tmp/timescaledb_tune.backup201911181111

Next, you will be prompted to change the shared_preload_libraries variable to preload the TimescaleDB module upon starting the PostgreSQL server:


Output
shared_preload_libraries needs to be updated
Current:
#shared_preload_libraries = ''
Recommended:
shared_preload_libraries = 'timescaledb'
Is this okay? [(y)es/(n)o]:

shared_preload_libraries accepts a comma separated list of modules as a value, designating which modules PostgreSQL should load before starting the database server. Making this change will add the timescaledb module to that list.

Note: If a library specified by shared_preload_libraries is not found, the database server will fail to start. Keep this in mind when debugging applications that make use of shared_preload_libraries. For more information on this, see this PostgresqlCO.NF article on shared_preload_libraries.

Enable the TimescaleDB module by typing y at this prompt and pressing ENTER:


Output
...
Is this okay? [(y)es/(n)o]: y
success: shared_preload_libraries will be updated

Based on the characteristics of your server and the PostgreSQL version, the script will then offer to tune your settings. Press y to start the tuning process:


Output
Tune memory/parallelism/WAL and other settings? [(y)es/(n)o]: y
Recommendations based on 7.79 GB of available memory and 4 CPUs for PostgreSQL 10

Memory settings recommendations
Current:
shared_buffers = 128MB
#effective_cache_size = 4GB
#maintenance_work_mem = 64MB
#work_mem = 4MB
Recommended:
shared_buffers = 1994MB
effective_cache_size = 5982MB
maintenance_work_mem = 1021001kB
work_mem = 5105kB
Is this okay? [(y)es/(s)kip/(q)uit]:

timescaledb-tune will automatically detect the server’s available memory and calculate recommended values for a number of settings. shared_buffers, for example, determines the amount of memory allocated for caching data. By default this setting is relatively low to account for a wider range of platforms, so timescaledb-tune has suggested increasing the value from 128MB to 1994MB, taking better advantage of resources by making more room to store cached information like repeated queries. The work_mem variable has been increased as well to allow for more complicated sorts.
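
These figures are not arbitrary: the shared_buffers suggestion follows the common 25%-of-RAM heuristic (and effective_cache_size roughly 75% of RAM), which you can check against the 7.79 GB reported above. A quick sketch:

```python
available_gb = 7.79  # memory detected by timescaledb-tune in the run above

# Common heuristics: ~25% of RAM for shared_buffers,
# ~75% of RAM for effective_cache_size.
shared_buffers_mb = int(available_gb * 1024 / 4)
effective_cache_size_mb = int(available_gb * 1024 * 3 / 4)

print(f"shared_buffers = {shared_buffers_mb}MB")              # shared_buffers = 1994MB
print(f"effective_cache_size = {effective_cache_size_mb}MB")  # effective_cache_size = 5982MB
```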

If you would like to learn more about the process of tuning memory settings for PostgreSQL, see the Tuning Your PostgreSQL Server article on the PostgreSQL wiki.

Enter y to accept the values:


Output
...
Is this okay? [(y)es/(s)kip/(q)uit]: y
success: memory settings will be updated

At this point, if your server has multiple CPUs, you will see recommendations for parallelism settings. These settings determine how multiple CPUs can execute queries in parallel, scanning databases and returning the requested data more quickly.

Those with multiple CPUs will encounter recommendations like this:


Output
Parallelism settings recommendations
Current:
missing: timescaledb.max_background_workers
#max_worker_processes = 8
#max_parallel_workers_per_gather = 2
#max_parallel_workers = 8
Recommended:
timescaledb.max_background_workers = 8
max_worker_processes = 13
max_parallel_workers_per_gather = 1
max_parallel_workers = 2
Is this okay? [(y)es/(s)kip/(q)uit]:

These settings regulate the number of workers, which process requests and background tasks. You can learn more about these settings from the TimescaleDB and PostgreSQL documentation.

Type y then ENTER to accept these settings:


Output
...
Is this okay? [(y)es/(s)kip/(q)uit]: y
success: parallelism settings will be updated

Next, you will find recommendations for Write Ahead Log (WAL):


Output
WAL settings recommendations
Current:
#wal_buffers = -1
#min_wal_size = 80MB
#max_wal_size = 1GB
Recommended:
wal_buffers = 16MB
min_wal_size = 4GB
max_wal_size = 8GB
Is this okay? [(y)es/(s)kip/(q)uit]:

WAL is a logging method in which PostgreSQL logs changes to data files before the changes are made to the database. By prioritizing an up-to-date record of data changes, WAL ensures that you can reconstruct your database in the event of a crash. In this way, it preserves data integrity. However, the default settings can cause inefficient input/output (I/O) operations that slow down write performance. To fix this, type y and press ENTER:


Output
...
Is this okay? [(y)es/(s)kip/(q)uit]: y
success: WAL settings will be updated

You’ll now find some miscellaneous recommendations:


Output
Miscellaneous settings recommendations
Current:
#default_statistics_target = 100
#random_page_cost = 4.0
#checkpoint_completion_target = 0.5
#max_locks_per_transaction = 64
#autovacuum_max_workers = 3
#autovacuum_naptime = 1min
#effective_io_concurrency = 1
Recommended:
default_statistics_target = 500
random_page_cost = 1.1
checkpoint_completion_target = 0.9
max_locks_per_transaction = 64
autovacuum_max_workers = 10
autovacuum_naptime = 10
effective_io_concurrency = 200
Is this okay? [(y)es/(s)kip/(q)uit]:

All of these different parameters are aimed at increasing performance. For example, SSDs can process many concurrent requests, so the best value for the effective_io_concurrency might be in the hundreds. You can find more info about these options in the PostgreSQL documentation.

Press y then ENTER to continue.

y然后按ENTER以继续。


Output
...
Is this okay? [(y)es/(s)kip/(q)uit]: y
success: miscellaneous settings will be updated
Saving changes to: /etc/postgresql/10/main/postgresql.conf

As a result, you will get a ready-made configuration file at /etc/postgresql/10/main/postgresql.conf.

Note: If you are automating the installation, you could also run the initial command with the --quiet and --yes flags, which will automatically apply all the recommendations and will make changes to the postgresql.conf configuration file:

  • sudo timescaledb-tune --quiet --yes

In order for the configuration changes to take effect, you must restart the PostgreSQL service:

  • sudo systemctl restart postgresql.service

Now the database is running with optimal parameters and is ready to work with the time series data. In the next steps, you’ll try out working with this data: creating new databases and hypertables and performing operations.

Step 3 — Creating a New Database and Hypertable

With your TimescaleDB setup optimized, you are ready to work with time series data. TimescaleDB is implemented as an extension of PostgreSQL, so operations with time series data are not much different from relational data operations. At the same time, the database allows you to freely combine data from time series and relational tables in the future.

To demonstrate this, you will use PostgreSQL commands to create a database, then enable the TimescaleDB extension to create a hypertable, which is a higher-level abstraction of many individual tables. Hypertables are the main structures you will work with in TimescaleDB.

Log into your PostgreSQL database:

  • sudo -u postgres psql

Now create a new database and connect to it. This tutorial will name the database timeseries:

  • CREATE DATABASE timeseries;

  • \c timeseries

You can find additional information about working with the PostgreSQL database in our How To Create, Remove & Manage Tables in PostgreSQL on a Cloud Server tutorial.

Finally, enable the TimescaleDB extension:

  • CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

You will see the following output:


Output
WARNING:
WELCOME TO
 _____ _                               _     ____________
|_   _(_)                             | |    |  _  \ ___ \
  | |  _ _ __ ___   ___  ___  ___ __ _| | ___| | | | |_/ /
  | | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
  | | | | | | | | |  __/\__ \ (_| (_| | |  __/ |/ /| |_/ /
  |_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
               Running version 1.5.1
For more information on TimescaleDB, please visit the following links:

 1. Getting started: https://docs.timescale.com/getting-started
 2. API reference documentation: https://docs.timescale.com/api
 3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture

Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.

CREATE EXTENSION

As mentioned earlier, the primary points of interaction with your time series data are hypertables, which consist of many individual tables holding data, called chunks.

To create a hypertable, start with a regular SQL table and then convert it into a hypertable via the function create_hypertable.

Make a table that will store data for tracking temperature and humidity across a collection of devices over time:

  • CREATE TABLE conditions (
  •   time        TIMESTAMP WITH TIME ZONE NOT NULL,
  •   device_id   TEXT,
  •   temperature NUMERIC,
  •   humidity    NUMERIC
  • );

This command creates a table called conditions with four columns. The first column will store the timestamp, which includes the time zone and cannot be empty. Next, you will use the time column to transform your table into a hypertable that is partitioned by time:

  • SELECT create_hypertable('conditions', 'time');

This command calls the create_hypertable() function, which creates a TimescaleDB hypertable from a PostgreSQL table, replacing the latter.

You will receive the following output:


Output
    create_hypertable
-------------------------
 (1,public,conditions,t)
(1 row)

In this step, you created a new hypertable to store time series data. Now you can populate it with data by writing to the hypertable, then run through the process of deleting it.

Step 4 — Writing and Deleting Data

In this step, you will insert data using standard SQL commands and import large sets of data from external sources. This will show you the relational database aspects of TimescaleDB.

First, try out the basic commands. You can insert data into the hypertable using the standard INSERT SQL command. Insert some sample temperature and humidity data for the theoretical device weather-pro-000000 using the following command:

  • INSERT INTO conditions(time, device_id, temperature, humidity)
  •   VALUES (NOW(), 'weather-pro-000000', 84.1, 84.1);

You’ll see the following output:


Output
INSERT 0 1

You can also insert multiple rows of data at once. Try the following:

  • INSERT INTO conditions
  •   VALUES
  •     (NOW(), 'weather-pro-000002', 71.0, 51.0),
  •     (NOW(), 'weather-pro-000003', 70.5, 50.5),
  •     (NOW(), 'weather-pro-000004', 70.0, 50.2);

You will receive the following:


Output
INSERT 0 3

You can also specify that the INSERT command should return some or all of the inserted data using the RETURNING clause:

  • INSERT INTO conditions
  •   VALUES (NOW(), 'weather-pro-000002', 70.1, 50.1) RETURNING *;

You will see the following output:


Output
             time              |     device_id      | temperature | humidity
-------------------------------+--------------------+-------------+----------
 2019-09-15 14:14:01.576651+00 | weather-pro-000002 |        70.1 |     50.1
(1 row)

If you want to delete data from the hypertable, use the standard DELETE SQL command. Run the following to delete whatever data has a temperature higher than 80 or a humidity higher than 50:

  • DELETE FROM conditions WHERE temperature > 80;
  • DELETE FROM conditions WHERE humidity > 50;

After the delete operation, use the VACUUM command to reclaim space still used by data that has been deleted.

  • VACUUM conditions;

You can find more info about the VACUUM command in the PostgreSQL documentation.

These commands are fine for small-scale data entry, but since time series data often generates huge datasets from multiple devices simultaneously, it’s essential also to know how to insert hundreds or thousands of rows at a time. If you have prepared data from external sources in a structured form, for example in csv format, this task can be accomplished quickly.
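
If you ever want to generate your own test data rather than download a sample set, any csv whose columns match the conditions schema (time, device_id, temperature, humidity, in that order) can be loaded with \COPY. A minimal sketch in Python — the file name and readings here are made up:

```python
import csv
from datetime import datetime, timedelta, timezone

start = datetime(2019, 11, 1, 9, 0, tzinfo=timezone.utc)

# Two hypothetical devices reporting once a minute for three minutes
rows = [
    [(start + timedelta(minutes=m)).isoformat(),
     f"weather-pro-{d:06d}",
     70.0 + d,            # temperature
     50.0 + m * 0.1]      # humidity
    for m in range(3)
    for d in range(2)
]

# Column order must match the hypertable: time, device_id, temperature, humidity
with open("my_conditions.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

print(len(rows))  # 6
```

You could then load it the same way as the sample set, e.g. sudo -u postgres psql -d timeseries -c "\COPY conditions FROM my_conditions.csv CSV".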

To test this out, you will use a sample dataset that represents temperature and humidity data from a variety of locations. This is an official TimescaleDB dataset, made to test out their database. You can check out more info about sample datasets in the TimescaleDB documentation.

Let’s see how you can import data from the weather_small sample dataset into your database. First, quit PostgreSQL:

  • \q

Then download the dataset and extract it:

  • wget https://timescaledata.blob.core.windows.net/datasets/weather_small.tar.gz

  • tar -xvzf weather_small.tar.gz

Next, import the temperature and humidity data into your database:

  • sudo -u postgres psql -d timeseries -c "\COPY conditions FROM weather_small_conditions.csv CSV"

This connects to the timeseries database and executes the \COPY command that copies the data from the chosen file into the conditions hypertable. It will run for a few seconds.

When the data has been entered into your table, you will receive the following output:


Output
COPY 1000000

In this step, you added data to the hypertable manually and in batches. Next, continue on to performing queries.

Step 5 — Querying Data

Now that your table contains data, you can perform various queries to analyze it.

To get started, log in to the database:

  • sudo -u postgres psql -d timeseries

As mentioned before, to work with hypertables you can use standard SQL commands. For example, to show the last 10 entries from the conditions hypertable, run the following command:

  • SELECT * FROM conditions LIMIT 10;

You will see the following output:


Output
          time          |     device_id      |    temperature     | humidity
------------------------+--------------------+--------------------+----------
 2016-11-15 12:00:00+00 | weather-pro-000000 |               39.9 |     49.9
 2016-11-15 12:00:00+00 | weather-pro-000001 |               32.4 |     49.8
 2016-11-15 12:00:00+00 | weather-pro-000002 | 39.800000000000004 |     50.2
 2016-11-15 12:00:00+00 | weather-pro-000003 | 36.800000000000004 |     49.8
 2016-11-15 12:00:00+00 | weather-pro-000004 |               71.8 |     50.1
 2016-11-15 12:00:00+00 | weather-pro-000005 |               71.8 |     49.9
 2016-11-15 12:00:00+00 | weather-pro-000006 |                 37 |     49.8
 2016-11-15 12:00:00+00 | weather-pro-000007 |                 72 |       50
 2016-11-15 12:00:00+00 | weather-pro-000008 |               31.3 |       50
 2016-11-15 12:00:00+00 | weather-pro-000009 |               84.4 |     87.8
(10 rows)

This command lets you see what data is in the database. Since the database contains a million records, you used LIMIT 10 to limit the output to 10 entries.

To see the most recent entries, sort the data array by time in descending order:

  • SELECT * FROM conditions ORDER BY time DESC LIMIT 20;

This will output the top 20 most recent entries.

You can also add a filter. For example, to see entries from the weather-pro-000000 device, run the following:

  • SELECT * FROM conditions WHERE device_id = 'weather-pro-000000' ORDER BY time DESC LIMIT 10;

In this case, you will see the 10 most recent temperature and humidity datapoints recorded by the weather-pro-000000 device.

In addition to standard SQL commands, TimescaleDB also provides a number of special functions that are useful for time series data analysis. For example, to find the median of the temperature values, you can use the following query with the percentile_cont function:

  • SELECT percentile_cont(0.5)
  •   WITHIN GROUP (ORDER BY temperature)
  •   FROM conditions
  •   WHERE device_id = 'weather-pro-000000';

You will see the following output:


Output
 percentile_cont
-----------------
            40.5
(1 row)

In this way, you’ll see the median temperature for the entire observation period where the weather-pro-00000 sensor is located.
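As a sanity check, percentile_cont(0.5) behaves like an ordinary continuous median, which you can reproduce outside the database (the readings below are hypothetical):

```python
import statistics

# Hypothetical temperature readings from a single device
temperatures = [38.7, 39.9, 40.5, 40.5, 41.2]

# percentile_cont(0.5) computes the continuous median: with an even row
# count it interpolates between the two middle values, just like
# statistics.median does.
print(statistics.median(temperatures))  # 40.5
```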

To show the latest values from each of the sensors, you can use the last function:

  • SELECT device_id, last(temperature, time)
  •   FROM conditions
  •   GROUP BY device_id;

In the output you will see a list of all the sensors and relevant latest values.

To get the first values, use the first function.

The following example is more complex. It will show the hourly average, minimum, and maximum temperatures for the chosen sensor within the last 24 hours:

  • SELECT time_bucket('1 hour', time) "hour",
  •   trunc(avg(temperature), 2) avg_temp,
  •   trunc(min(temperature), 2) min_temp,
  •   trunc(max(temperature), 2) max_temp
  •   FROM conditions
  •   WHERE device_id = 'weather-pro-000000'
  •   GROUP BY "hour" ORDER BY "hour" DESC LIMIT 24;

Here you used the time_bucket function, which acts as a more powerful version of the PostgreSQL date_trunc function. As a result, you will see which periods of the day the temperature rises or decreases:


Output
          hour          | avg_temp | min_temp | max_temp
------------------------+----------+----------+----------
 2016-11-16 21:00:00+00 |    42.00 |    42.00 |    42.00
 2016-11-16 20:00:00+00 |    41.92 |    41.69 |    42.00
 2016-11-16 19:00:00+00 |    41.07 |    40.59 |    41.59
 2016-11-16 18:00:00+00 |    40.11 |    39.79 |    40.59
 2016-11-16 17:00:00+00 |    39.46 |    38.99 |    39.79
 2016-11-16 16:00:00+00 |    38.54 |    38.19 |    38.99
 2016-11-16 15:00:00+00 |    37.56 |    37.09 |    38.09
 2016-11-16 14:00:00+00 |    36.62 |    36.39 |    37.09
 2016-11-16 13:00:00+00 |    35.59 |    34.79 |    36.29
 2016-11-16 12:00:00+00 |    34.59 |    34.19 |    34.79
 2016-11-16 11:00:00+00 |    33.94 |    33.49 |    34.19
 2016-11-16 10:00:00+00 |    33.27 |    32.79 |    33.39
 2016-11-16 09:00:00+00 |    33.37 |    32.69 |    34.09
 2016-11-16 08:00:00+00 |    34.94 |    34.19 |    35.49
 2016-11-16 07:00:00+00 |    36.12 |    35.49 |    36.69
 2016-11-16 06:00:00+00 |    37.02 |    36.69 |    37.49
 2016-11-16 05:00:00+00 |    38.05 |    37.49 |    38.39
 2016-11-16 04:00:00+00 |    38.71 |    38.39 |    39.19
 2016-11-16 03:00:00+00 |    39.72 |    39.19 |    40.19
 2016-11-16 02:00:00+00 |    40.67 |    40.29 |    40.99
 2016-11-16 01:00:00+00 |    41.63 |    40.99 |    42.00
 2016-11-16 00:00:00+00 |    42.00 |    42.00 |    42.00
 2016-11-15 23:00:00+00 |    42.00 |    42.00 |    42.00
 2016-11-15 22:00:00+00 |    42.00 |    42.00 |    42.00
(24 rows)

You can find more useful functions in the TimescaleDB documentation.
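
Under the hood, for intervals like '1 hour', time_bucket simply floors each timestamp to the start of its interval. A rough Python equivalent of the one-hour bucketing used above (the sample timestamp is made up):

```python
from datetime import datetime, timezone

def time_bucket(interval_seconds, ts):
    """Floor a timestamp to the start of its bucket, like time_bucket('1 hour', time)."""
    epoch = ts.timestamp()
    return datetime.fromtimestamp(epoch - epoch % interval_seconds, tz=timezone.utc)

reading = datetime(2016, 11, 16, 21, 42, 17, tzinfo=timezone.utc)
print(time_bucket(3600, reading))  # 2016-11-16 21:00:00+00:00
```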

Now you know how to handle your data. Next, you will go through how to delete unnecessary data and how to compress data.

Step 6 — Configuring Data Compression and Deletion

As data accumulates, it will take up more and more space on your hard drive. To save space, the latest version of TimescaleDB provides a data compression feature. This feature doesn’t require tweaking any file system settings, and can be used to quickly make your database more efficient. For more information on how this compression works, take a look at this Compression article from TimescaleDB.

First, enable the compression of your hypertable:

  • ALTER TABLE conditions SET (
  •   timescaledb.compress,
  •   timescaledb.compress_segmentby = 'device_id'
  • );

You will receive the following data:


Output
NOTICE:  adding index _compressed_hypertable_2_device_id__ts_meta_sequence_num_idx ON _timescaledb_internal._compressed_hypertable_2 USING BTREE(device_id, _ts_meta_sequence_num)
ALTER TABLE

Note: You can also set up TimescaleDB to compress data over the specified time period. For example, you could run:

  • SELECT add_compress_chunks_policy('conditions', INTERVAL '7 days');

In this example, the data will be automatically compressed after a week.

You can see the statistics on the compressed data with the command:

  • SELECT * FROM timescaledb_information.compressed_chunk_stats;

You will then see a list of chunks with their statuses: compression status and how much space is taken up by uncompressed and compressed data in bytes.
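
The uncompressed and compressed byte counts let you compute the effective compression ratio for a chunk. A small sketch with hypothetical numbers:

```python
# Hypothetical totals read from timescaledb_information.compressed_chunk_stats
uncompressed_total_bytes = 180_000_000
compressed_total_bytes = 12_000_000

ratio = uncompressed_total_bytes / compressed_total_bytes
print(f"compression ratio: {ratio:.0f}x")  # compression ratio: 15x
```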

If you don’t have the need to store data for a long period of time, you can delete out-of-date data to free up even more space. There is a special drop_chunks function for this. It allows you to delete chunks with data older than the specified time:

  • SELECT drop_chunks(interval '24 hours', 'conditions');

This query will drop all chunks from the hypertable conditions that only include data older than a day ago.

You will receive the following output:


Output
              drop_chunks
----------------------------------------
 _timescaledb_internal._hyper_1_2_chunk
(1 row)

To automatically delete old data, you can configure a cron task. See our tutorial to learn more about how to use cron to automate various system tasks.

Exit from the database:

  • \q

Next, edit your crontab with the following command, which should be run from the shell:

  • crontab -e

Now add the following line to the end of the file:

现在,将以下行添加到文件末尾:

crontab
...

0 1 * * * /usr/bin/psql -h localhost -p 5432 -U postgres -d postgres -c "SELECT drop_chunks(interval '24 hours', 'conditions');" >/dev/null 2>&1

This job will delete obsolete data that is older than one day at 1:00 AM every day.

Conclusion

You’ve now set up TimescaleDB on your Ubuntu 18.04 server. You also tried out creating hypertables, inserting data into it, querying the data, compressing, and deleting unnecessary records. With these examples, you’ll be able to take advantage of TimescaleDB’s key benefits over traditional relational database management systems for storing time-series data, including:

  • Higher data ingest rates

  • Quicker query performance

  • Time-oriented features

Now that you know how to store time series data, you could use the data to create graphs. TimescaleDB is compatible with visualization tools that work with PostgreSQL, like Grafana. You can use our How To Install and Secure Grafana on Ubuntu 18.04 tutorial to learn more about this popular visualization tool.

Translated from: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-timescaledb-on-ubuntu-18-04
