What's OpenTSDB?

OpenTSDB - A Distributed, Scalable Monitoring System


StumbleUpon sponsored the development and open-source release of OpenTSDB and uses it as its main monitoring system.
OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

Thanks to HBase's scalability, OpenTSDB allows you to collect many thousands of metrics from thousands of hosts and applications, at a high rate (every few seconds). OpenTSDB will never delete or downsample data and can easily store billions of data points. As a matter of fact, StumbleUpon uses it to keep track of hundreds of thousands of time series and collects over 600 million data points per day in its main production datacenter.

Imagine having the ability to quickly plot a graph showing the number of DELETE statements going to your MySQL database along with the number of slow queries and temporary files created, and correlate this with the 99th percentile of your service's latency. OpenTSDB makes generating such graphs on the fly a trivial operation, while manipulating millions of data points for very fine-grained, real-time monitoring.

15464 points retrieved, 932 points plotted in 100ms
Generating custom graphs and correlating events is easy.

At StumbleUpon, we have found this system tremendously helpful to:

Get real-time state information about our infrastructure and services.
Understand outages or how complex systems interact together.
Measure SLAs (availability, latency, etc.)
Tune our applications and databases for maximum performance.
Do capacity planning.

OpenTSDB is free software and is available under the LGPLv3+ license.



Time series database

From Wikipedia, the free encyclopedia

A time series database server (TSDS) is a software system that is optimized for handling a time series. In this context, a time series is an associative array of numbers indexed by a datetime or a datetime range. These time series are often called profiles or curves, depending upon the market. A time series of stock prices might be called a price curve, or a time series of energy consumption might be called a load profile. Despite the disparate naming, the operations performed on them are sufficiently common as to demand special database treatment.
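As a rough illustration of the "associative array indexed by a datetime" idea, the sketch below models a load profile as a plain Python dictionary and selects a time range from it. The name load_profile, the helper in_range and all values are made up for this example; a real TSDS would use a specialised storage layout rather than an in-memory dict.

```python
from datetime import datetime

# A time series modelled as an associative array: timestamp -> numeric value.
# The name and the readings are illustrative only.
load_profile = {
    datetime(2024, 1, 1, 0, 0): 41.5,
    datetime(2024, 1, 1, 1, 0): 39.8,
    datetime(2024, 1, 1, 2, 0): 38.2,
}


def in_range(series, start, end):
    """Return the points with start <= timestamp < end, ordered by time."""
    return {t: v for t, v in sorted(series.items()) if start <= t < end}


window = in_range(
    load_profile,
    datetime(2024, 1, 1, 1, 0),
    datetime(2024, 1, 1, 3, 0),
)
for timestamp, value in window.items():
    print(timestamp.isoformat(), value)
```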

TSDSs simplify the development of software with complex business rules in a wide variety of sectors. Queries for historical data, replete with time ranges, roll-ups and arbitrary time zone conversions, are difficult in a relational database. Compositions of those rules are even more difficult. The problem is compounded by the free-form nature of relational systems themselves: many relational schemas do not model time series data correctly. TSDSs, on the other hand, impose a model, and this allows them to provide more features for working with such data.

Ideally, these repositories are implemented natively using specialized database algorithms. However, good performance has also been obtained by storing time series as binary large objects (BLOBs) in a relational database, or by using a VLDB (very large database) approach coupled with a pure star schema. These work best when time is treated as a fact, not a dimension.

Overview

The TSDS allows users to create, enumerate, update and destroy various time series and organize them in some fashion. These series may be organized hierarchically and optionally have companion metadata available with them. The server often supports a number of basic calculations that work on a series as a whole, such as multiplying, adding, or otherwise combining various time series into a new time series. It can also filter on arbitrary patterns defined by the day of the week, apply low-value or high-value filters, or even have the values of one series filter another. Some TSDSs also build in a wealth of statistical functions.

For example, consider the following hypothetical "time series" or "profile" expression:

SELECT nymex/gold_price * nymex/gold_volume

To analyze this, the TSDS would join the two series nymex/gold_price and nymex/gold_volume based on the overlapping areas of time for each, multiply the values where they intersect, and then output a single composite time series.
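The join-and-multiply step described above can be sketched in a few lines of Python. This is only an approximation under a simplifying assumption: both series are dictionaries keyed by exact timestamps, so "overlapping" means "the same timestamp appears in both". The function name multiply_series and the sample prices and volumes are invented for the example.

```python
from datetime import datetime


def multiply_series(a, b):
    """Multiply two series point-wise over the timestamps they share."""
    shared = sorted(a.keys() & b.keys())  # the overlapping region of time
    return {t: a[t] * b[t] for t in shared}


# Illustrative data only: the two series overlap on 3 and 4 January.
gold_price = {
    datetime(2024, 1, 2): 2000.0,
    datetime(2024, 1, 3): 2010.0,
    datetime(2024, 1, 4): 2005.0,
}
gold_volume = {
    datetime(2024, 1, 3): 10.0,
    datetime(2024, 1, 4): 20.0,
    datetime(2024, 1, 5): 5.0,
}

notional = multiply_series(gold_price, gold_volume)
for timestamp, value in notional.items():
    print(timestamp.date(), value)
```

Points that exist in only one of the inputs are dropped, which mirrors the "multiply the values where they intersect" behaviour; a real TSDS might instead interpolate or align ranges.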

Obviously, more complex expressions are allowed. TSDSs often allow users to manage a repository of filters or masks that specify in some way a pattern based on the day of the week and a set of holidays. In this way, one can readily assemble time series data. Assuming such a filter exists, one might hypothetically write:

SELECT onpeak( cellphoneusage )

which would extract the portion of the cellphoneusage time series that intersects 'onpeak'. Some systems might generalize the filter to be a time series itself.
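A filter of this kind might look like the following Python sketch. The definition of "on-peak" used here (weekdays from 08:00 to 20:00, excluding a holiday set) is purely an assumption for the example, as are the HOLIDAYS set and the sample usage values; the text does not define what 'onpeak' means.

```python
from datetime import date, datetime

# Hypothetical holiday calendar; a real TSDS would keep this in its filter repository.
HOLIDAYS = {date(2024, 1, 1)}


def is_onpeak(t):
    """Assumed rule: Monday-Friday, 08:00 <= hour < 20:00, and not a holiday."""
    return t.weekday() < 5 and 8 <= t.hour < 20 and t.date() not in HOLIDAYS


def onpeak(series):
    """Keep only the points of the series that fall inside the on-peak mask."""
    return {t: v for t, v in series.items() if is_onpeak(t)}


cellphoneusage = {
    datetime(2024, 1, 1, 10): 5.0,  # Monday, but a holiday
    datetime(2024, 1, 2, 10): 7.0,  # Tuesday, inside the on-peak window
    datetime(2024, 1, 2, 22): 3.0,  # Tuesday, outside the on-peak hours
    datetime(2024, 1, 6, 10): 4.0,  # Saturday
}

print(onpeak(cellphoneusage))
```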

This syntactical simplicity drives the appeal of the TSDS. For example, a simple utility bill might be implemented using a query such as:

SELECT MAX( onpeak( powerusagekw ) ) * demand_charge;
SELECT SUM( onpeak( powerusagekwh ) ) * energy_charge;
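To make the arithmetic behind the two bill queries concrete, here is a small Python sketch that computes the same two quantities: the peak on-peak demand times a demand charge, and the total on-peak energy times an energy charge. The on-peak rule (weekdays, 08:00 to 20:00), the rates and the readings are all assumptions chosen for the example.

```python
from datetime import datetime


def onpeak(series):
    """Assumed on-peak mask: Monday-Friday, 08:00 <= hour < 20:00."""
    return {t: v for t, v in series.items() if t.weekday() < 5 and 8 <= t.hour < 20}


def bill(powerusagekw, powerusagekwh, demand_charge, energy_charge):
    """Return (demand cost, energy cost) for the on-peak portion of two series."""
    demand = max(onpeak(powerusagekw).values(), default=0.0) * demand_charge
    energy = sum(onpeak(powerusagekwh).values()) * energy_charge
    return demand, energy


# Illustrative hourly readings for Tuesday, 2 January 2024.
powerusagekw = {
    datetime(2024, 1, 2, 9): 4.0,
    datetime(2024, 1, 2, 10): 6.0,
    datetime(2024, 1, 2, 23): 9.0,  # off-peak, ignored by the mask
}
powerusagekwh = {
    datetime(2024, 1, 2, 9): 4.0,
    datetime(2024, 1, 2, 10): 6.0,
    datetime(2024, 1, 2, 23): 9.0,  # off-peak, ignored by the mask
}

demand_cost, energy_cost = bill(powerusagekw, powerusagekwh,
                                demand_charge=10.0, energy_charge=0.5)
print(demand_cost, energy_cost)
```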

TSDSs also generally implement conversions to and from specific time zones at the server level.

Example

A workable implementation of a time series database can be easily deployed in a conventional SQL-based relational database provided that the database software supports both binary large objects (BLOBs) and user-defined functions. SQL statements that operate on one or more time series quantities on the same row of a table or join can easily be written, as the user-defined time series functions operate comfortably inside of a SELECT statement. However, time series functionality such as a SUM function operating in the context of a GROUP BY clause cannot be easily achieved.
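The BLOB-plus-user-defined-function approach can be tried out with Python's built-in sqlite3 module. The sketch below is one possible arrangement, not a reference design: it assumes each series is a list of evenly spaced values packed as little-endian doubles, with timestamps omitted, and the table curves and the functions ts_sum and ts_mul are names invented here.

```python
import sqlite3
import struct


def pack_series(values):
    """Encode a list of floats as a BLOB of little-endian 8-byte doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack_series(blob):
    """Decode a BLOB produced by pack_series back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def ts_sum(blob):
    """User-defined scalar function: sum of all values in one series."""
    return sum(unpack_series(blob))


def ts_mul(a, b):
    """User-defined scalar function: element-wise product, returned as a BLOB."""
    return pack_series([x * y for x, y in zip(unpack_series(a), unpack_series(b))])


conn = sqlite3.connect(":memory:")
conn.create_function("ts_sum", 1, ts_sum)
conn.create_function("ts_mul", 2, ts_mul)

conn.execute("CREATE TABLE curves (name TEXT PRIMARY KEY, price BLOB, volume BLOB)")
conn.execute(
    "INSERT INTO curves VALUES (?, ?, ?)",
    ("gold", pack_series([2.0, 3.0]), pack_series([10.0, 20.0])),
)

# Both series live on the same row, so the functions compose inside a SELECT.
total = conn.execute(
    "SELECT ts_sum(ts_mul(price, volume)) FROM curves WHERE name = 'gold'"
).fetchone()[0]
print(total)
conn.close()
```

Here both functions operate on series stored in a single row, which is the case the paragraph above describes as easy to write.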



