Introduction
New users of InfluxDB sometimes make the mistake of assuming that time-series data is similar in structure to relational data. The truth is that time-series data and relational data are unique in their own ways, with their own sets of benefits and tradeoffs. A key part of understanding how InfluxDB performs is understanding the internal mechanics of how InfluxDB stores and retrieves data. To help you get the best performance out of InfluxDB, this technical paper will delve into the top five tips for improving both write and query performance with InfluxDB.
This paper will cover:
1. Series Cardinality is King
A discussion of what series cardinality is, and what it means for your database.
2. Always Batch Points
A discussion around what ‘batching’ is, and how it can help you save on performance.
3. Down-Sample Your Data
A discussion on how to down-sample and aggregate your time-series data to conserve IO and storage.
4. Condense Like Data, Separate Unlike Data
A discussion around organizing and structuring data in InfluxDB, and how a little bit of upfront planning can save you major headaches in the long run.
5. Be Precise
A discussion on how to choose which time precision to use when writing your data, and how to optimize for high-volume compression with the new TSM storage engine.
Please note: This article assumes you are somewhat familiar with InfluxDB terminology and concepts. It is recommended that you read the InfluxDB Getting Started Guide before continuing, if you have not done so already.
So, let’s get started by discussing one of the key components of any schema design in time-series data: series cardinality.
Series Cardinality is King
A series is a collection of data in InfluxDB that shares a common tag set, measurement, and retention policy. Series cardinality is the count of all of the unique series in a database. This may not be the easiest concept to visualize, so let’s start with an example:
Let’s say that you run a web application, and are using InfluxDB as a centralized database for logging response times (along with a few other metrics). In each of these response time metrics, you have the following five items:
● The IP address of the client (or the end-user)
● The IP address of the application server (we’re assuming you are running multiple front-end servers in this example)
● The version of the running application
● The response time (in milliseconds) of the request
● The HTTP response code of the request (for a full list of response codes, see here)
Now, before storing data in InfluxDB, it is critical that you determine the potential cardinality (or the number of possible values) for each item that you are planning on storing, and determine whether that item should be stored as either a tag or a field. InfluxDB stores an inverted index for each series in memory, and, while the in-memory representation is relatively compact, the size of this index has direct query performance implications and will dictate the size of the machine required to handle your application.
A quick overview of the differences between tags and fields:
● Tags are indexed in memory, which can provide very performant lookups when querying and grouping. The downside is that if you use too many unique tag values, your memory and hardware requirements will start to climb. More information on hardware sizing here.
● Fields are not indexed in memory, but can store ‘typed’ values and can be used as operands in math functions. Fields also have no effect on series cardinality.
The five things you should always consider before making a value a tag or a field:
● Does the value have high cardinality? If so, start with the assumption it will be a field.
● Do you frequently reference the metric in the WHERE clause? That implies you want the value to be indexed, and so it should be stored as a tag. (Unindexed values in the WHERE clause will work, but those queries are significantly less performant.)
● Do you want to be able to reference the value in a GROUP BY? If yes, then it must be stored as a tag. (Fields cannot be referenced in a GROUP BY clause.)
● Do you want to use the value in a mathematical function (MEAN(), STDDEV(), PERCENTILE(), etc.)? If yes, then it must be stored as a field. (Tag values are not valid input for functions.)
● Do you need to store a value’s type (e.g., int, float, bool, string)? If yes, then it must be stored as a field. (All tag values are always strings.)
In summary, use fields for high-cardinality values, values you need to perform math on (mean, derivative, etc.), or values that you need to store as a specific type (boolean, int, float, etc.). Use tags for values you need to use in GROUP BY, or that you frequently reference in the WHERE clause.
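To make the distinction concrete, here is a minimal sketch using the influxdb-python client (the database, measurement, and key names are hypothetical): the GROUP BY clause references a tag, while MEAN() operates on a field.

```python
from influxdb import InfluxDBClient

# Connect to a local InfluxDB instance; "webapp" is a hypothetical database.
client = InfluxDBClient(host='localhost', port=8086, database='webapp')

# GROUP BY works on tags ("server_ip"); MEAN() works on fields
# ("response_time"). Swapping the two would not work: fields cannot be
# grouped on, and tag values (always strings) are not valid function input.
result = client.query(
    'SELECT MEAN("response_time") FROM "http_requests" GROUP BY "server_ip"'
)
print(list(result.get_points()))
```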
Now that we have a better understanding of the differences between tags and fields, let’s try to provide rough estimates for the cardinality of each of the five metrics we mentioned earlier:
The IP address of the client (or the end-user)
Depending on who the end-users are, this item could potentially be enormous (or near-infinite, if you are someone like Facebook). Due to the huge cardinality of this item, we are going to store it in InfluxDB as a field. All other considerations are moot when considering near-infinite cardinality.
The IP address of the application server
Again, the number of potential values for this item will depend on the scale of your operation. For the sake of this example, however, we are going to assume that you have
ten application servers. Due to the small cardinality of this item, there’s no need to make it a field. In addition, it could also be useful to group the data points coming from each server (using GROUP BY), which can help when attempting to diagnose performance issues. For these reasons, we are going to store it in InfluxDB as a tag.
The version of the running application
This item will most likely have a small range of possible values. Most applications don’t have tens of thousands of different versions. It will also most likely be very helpful to group by the application version when attempting to gather performance characteristics of your application. Due to the low cardinality and need for grouping, we are going to store this value as a tag.
The response time (in milliseconds) of the request
This item will certainly have a large (or infinite) cardinality, and will most likely be used in math operations. Due to the high cardinality, the desire to pass the value to mathematical functions, and the typed nature of this value, we are going to store this in InfluxDB as a field.
The HTTP response code of the request
This item has a defined cardinality of, at the very most, several dozen (even though most applications only use a handful). This value also has no need for math and does not need to be typed, so we will store this in InfluxDB as a tag.
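Putting those decisions together, a single response-time point might look like the following sketch, written with the influxdb-python client (all names are illustrative, not prescriptive):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='webapp')

point = {
    "measurement": "http_requests",
    "tags": {                    # low-cardinality values we group and filter on
        "server_ip": "10.0.0.2",
        "app_version": "1.4.2",
        "response_code": "200",
    },
    "fields": {                  # high-cardinality and typed values
        "client_ip": "203.0.113.7",
        "response_time": 42.7,   # milliseconds, stored as a float
    },
}

client.write_points([point])
```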
Now that we have determined the cardinality of the items above, let’s see what the total cardinality of this measurement would be. Assuming a worst case for each item (doubling or tripling our estimates):
● The IP address of the end-user - Since this value is unbounded, we’ll use a cardinality of infinity, which is why we are storing it as a field.
● The IP address of the application server - Assuming we currently have 10 application servers, let’s give ourselves some room to scale out over time and use a cardinality of 30.
● The version of the running application - Assuming we will have multiple deployed versions of our application, let’s use a worst-case cardinality of 50.
● The response time (in milliseconds) of the request - Since this value is also unbounded, we’ll use a cardinality of infinity, which is why we are storing it as a field.
● The HTTP response code of the request - Since there are only a handful of commonly-used HTTP codes, let’s use a worst-case cardinality of 20.
These numbers give us a total series cardinality of about 30,000 (= 30 x 50 x 20). This series cardinality can easily fit under our low load recommendation, with quite a bit of room to scale out over time (as more application versions and servers are added). In addition, most of the tag values used above will not be entirely independent of each other, which is also important to remember when calculating series cardinality. For example, is it likely that every version of your application will run on all of the different application servers at once? This can vary depending on your use-case, but the odds are that the tags themselves will be dependent on each other, making the actual series cardinality significantly lower than the potential cardinality.
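If you want to verify the actual (rather than potential) cardinality of a running database, one approach is to count the series returned by a SHOW SERIES query; a rough sketch, assuming the same hypothetical database as above:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='webapp')

# SHOW SERIES lists every unique series in the database, so the number
# of rows returned approximates the database's actual series cardinality.
series = list(client.query('SHOW SERIES').get_points())
print("actual series cardinality:", len(series))
```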
As mentioned earlier, it is critical that you map out the cardinality and shape of your schema before writing any data into InfluxDB. A small amount of forethought and effort will save you major headaches moving forward. If series cardinality can grow unbounded, your RAM needs are also unbounded, which is rarely a good situation.
Note: In addition to avoiding storage of high-cardinality values as tags, it is also recommended to limit the length/size of tag keys and values when possible. The larger the tag keys and values, the larger the memory footprint. This is not an immediate concern, but for historic or long-lived data, larger strings lead to increased storage and index costs, which can cause performance problems over time. Just something to keep in mind when crafting your schema.
Next, let’s discuss the importance of batching writes to InfluxDB, and how small changes in batch size can have large performance implications.
Always Batch Points
Batching is when you send multiple points to InfluxDB over the same HTTP request or connection. The HTTP overhead of a single point written in line protocol is very high, and you can easily flood the server by opening thousands of unique HTTP write requests, one for every point. Add in the overhead of HTTPS/TLS encryption (which is highly recommended) and you have a recipe for disaster.
To counter this effect, you always want to send multiple points (a batch) to InfluxDB whenever possible. Our recommended starting point is 5,000 points per batch; from there, move to larger or smaller batches depending on your hardware and application.
Note: If your application cannot batch points due to latency or client restrictions, it is recommended that you use the UDP Service Listener instead of the HTTP API endpoint.
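As a sketch of what batching looks like in practice (reusing the hypothetical measurement from earlier), the influxdb-python client’s write_points accepts a list of points and an optional batch_size:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='webapp')

# Accumulate points in memory, then send them in batches rather than
# issuing one HTTP request per point.
points = [
    {
        "measurement": "http_requests",
        "tags": {"server_ip": "10.0.0.2", "app_version": "1.4.2"},
        "fields": {"response_time": float(i % 500)},
    }
    for i in range(20000)
]

# batch_size splits the list into one request per 5,000 points, the
# recommended starting point from this paper.
client.write_points(points, batch_size=5000)
```

If your collection loop cannot hold 5,000 points in memory, flushing on a timer (every few seconds) is a reasonable compromise; the 5,000-point figure is a starting point, not a hard rule.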
Down-sample Your Data
Storing raw data over a long period of time can be very hard on query performance and disk space, so it is important to down-sample and aggregate data whenever possible. InfluxDB comes prepackaged with two tools that help in this area:
● Retention Policies - The retention policy a point is written to determines how long InfluxDB will store that data. The default retention policy for a new database is infinite (i.e., data is never dropped), but it is important to identify a set of RPs early on that work for your use-case and requirements around data retention.
● Continuous Queries - A continuous query is a query that runs automatically over a given period of time, typically with the purpose of writing aggregated/down-sampled data from one retention policy to another.
As a brief example of how to use these two tools efficiently, let’s reuse the example from the Series Cardinality is King section above. In that example, we are recording response times for a web application. Typically with application performance monitoring, long-term historical data is of little use, so let’s create a set of retention policies that fits that assumption:
● default / four days - The default retention policy (where raw points are written) will keep data for four days. This allows our engineering team to have live, un-aggregated performance data to use for identifying performance regressions or other issues.
● two_week / 14 days - The two_week retention policy will be the first tier of our roll-up data policies, where data will be kept for fourteen days before being deleted. We will down-sample data from the default retention policy above by averaging the response time field into 10-minute buckets, and storing the resulting buckets for two weeks. This gives our engineers a resolution of 10 minutes for reviewing data from the past two weeks that is older than four days.
● two_months / 2 months - The two_months retention policy will be the second tier of our roll-up data policies, where data will be kept for two months before being deleted. We will down-sample data from the two_week retention policy above by averaging the response time field into 30-minute buckets, and storing the resulting buckets for two months. Again, this gives our engineers a resolution of 30 minutes for reviewing data from the past two months that is older than two weeks.
● historical / inf - The historical retention policy will be the third and final tier of our roll-up policies, where data is kept indefinitely. We will down-sample data from the two_months retention policy by averaging the response time field into 60-minute buckets, storing the results indefinitely. This will give our engineers a 60-minute resolution for performance data indefinitely, which can be useful for looking at historical performance.
Now that we have defined our retention policies, crafting continuous queries to perform the down-sampling is much simpler. Writing the CQs themselves is a bit outside the scope of this document, but more information on the art of mastering Retention Policies and Continuous Queries (including an in-depth case study) is available here.
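Still, as a rough sketch of what the first tier might look like when issued through the influxdb-python client (check the exact retention policy and continuous query syntax against the documentation for your version):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='webapp')

# Raw points land in the default policy, which we shorten to four days.
client.query('ALTER RETENTION POLICY "default" ON "webapp" '
             'DURATION 4d DEFAULT')
client.query('CREATE RETENTION POLICY "two_week" ON "webapp" '
             'DURATION 14d REPLICATION 1')

# First-tier roll-up: average the response time into 10-minute buckets
# and write the results into the two_week policy, preserving all tags.
client.query(
    'CREATE CONTINUOUS QUERY "cq_10m" ON "webapp" BEGIN '
    'SELECT MEAN("response_time") AS "response_time" '
    'INTO "webapp"."two_week"."http_requests" '
    'FROM "webapp"."default"."http_requests" '
    'GROUP BY time(10m), * END'
)
```

The two_months and historical tiers follow the same pattern, each reading from the tier above it with a wider bucket.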
Condense Like Data, Separate Unlike Data
Due to the way that the TSM storage engine stores and retrieves data, there are a few performance improvements to be gained by simply restructuring the schema of your data. With that in mind, there are two rules to consider when crafting a schema:
● Put similar data into the same measurement. This may seem like common sense, but it is important to remember when writing data to InfluxDB. TSM stores data in a columnar format, so the ‘heavier’ your points are (the more fields and tags they carry), the more data can be compressed and queried at once. The Telegraf project makes good use of this idea, so the Telegraf schema is a good place to start.
● Separate unlike data into different databases. Again, this may seem like common sense, but it can lead to large performance gains in the long run. Only one process writes to a shard at any given time, so the more you can separate your shards (by splitting data into different databases), the more writes can occur simultaneously (see the sketch below).
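Creating a database is a one-line operation, so splitting unlike data up front costs very little; a minimal sketch with hypothetical database names:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086)

# Unlike data goes into separate databases, which also separates the
# underlying shards and lets more writes proceed in parallel.
client.create_database('webapp_metrics')
client.create_database('business_events')
```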
For more information and design documentation about the TSM storage engine, please see here.
Be Precise
As discussed in the point above, it is important to keep in mind how the TSM storage engine stores and retrieves data on writes and reads. There are many benefits to be gleaned from TSM’s state-of-the-art compression mechanisms, the most basic of which is choosing what time precision to write your points in. InfluxDB can handle time precision ranging from nanoseconds to hours, and it is important to understand the benefits and trade-offs when choosing a smaller or larger time precision. The precisions available for each individual point are:
● Nanosecond Precision (ns) - Nanosecond precision is the smallest and most precise timestamp precision available in InfluxDB, though it is also the most resource-intensive to store.
● Microsecond Precision (u)
● Millisecond Precision (ms)
● Second Precision (s)
● Minute and Hour Precision (m/h) - In addition to the precisions listed above, you can also write data into InfluxDB at minute- and hour-level precision. This option is not widely used due to its lack of precision, but it is still available for any use-cases that need it. Because of their coarse granularity, the minute and hour precisions require the least amount of resources to store, but have the highest risk of timestamp collisions.
With the above precisions in mind, there are two important points to remember:
1. You should not choose a precision that is smaller than the collection interval. For example, if you are collecting and storing metrics every 10 seconds, then use second precision. Choosing a precision smaller than seconds will lead to wasted space and a larger performance overhead.
2. Regularly spaced points (i.e., each point is exactly 10 seconds apart) lead to larger compression benefits. If possible, try to write points at rounded intervals for better performance and more storage capacity. For example, write a point every 10 seconds, as opposed to one point at 9 seconds, then another at 11 seconds, and so on. (A sketch combining both points follows.)
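As that sketch, the following writes a second-precision point on a rounded 10-second grid using the influxdb-python client (the rounding logic and all names are our own):

```python
import time

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='webapp')

# Round the timestamp down to the nearest 10 seconds so points land on
# a regular grid; regularly spaced timestamps compress better.
now = int(time.time()) // 10 * 10

point = {
    "measurement": "http_requests",
    "tags": {"server_ip": "10.0.0.2"},
    "fields": {"response_time": 42.7},
    "time": now,
}

# time_precision='s' submits the integer timestamp in seconds rather
# than the default nanoseconds, matching our 10-second collection interval.
client.write_points([point], time_precision='s')
```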
It is also better to query for small amounts of recent data than for large amounts of data over a longer period of time. Shard duration in InfluxDB defaults to seven days, so the more shards that have to be opened (a side-effect of querying over large spans of time), the longer your query will take. The shard duration itself can be tuned using the SHARD DURATION option when creating retention policies, so if your use-case involves writing sparse data (a few points per day, for example), changing the shard duration to a longer period of time (26 weeks, 52 weeks, etc.) can help increase the density of data per shard, leading to more performant query lookups.
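A minimal sketch of tuning shard duration for sparse data, assuming the hypothetical database from earlier (again, verify the syntax against your version’s documentation):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086)

# A 52-week retention window with 26-week shards: sparse data packs into
# fewer, denser shards, so queries over long time ranges open fewer files.
client.query('CREATE RETENTION POLICY "sparse" ON "webapp" '
             'DURATION 52w REPLICATION 1 SHARD DURATION 26w')
```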
The size and complexity of your queries will obviously be dictated by your specific use-case, but you can drastically improve performance by querying often and for as little time as needed.
What’s Next?
We hope that this has been helpful! As for what to do next? Start building! Read the documentation for the latest technical information regarding InfluxDB. Get involved with the InfluxDB community by opening issues (or, more importantly, pull requests) on GitHub, or ask questions on the InfluxDB Google Group. And as always, please feel free to contact us at contact@influxdb.com with any questions or concerns. We also offer varying levels of Professional Services engagements, including performance tuning and schema consultations, if you ever need help from the experts.
Documentation
● 0.13 Download
● 0.13 Installation
● 0.13 Getting Started
● 0.13 Schema Design
● 0.13 Line Protocol
● 0.13 Key Concepts
● Upgrading from previous versions
Q&A? Get involved!
● InfluxDB Google Group
● contact@influxdata.com
Need Help from the Experts?
● Technical Support
● Consulting
● Public Training, Private Training, Virtual Training