Key concepts in InfluxDB
The following concepts are essential for working with InfluxDB. They are at the core of the way it writes, stores, and accesses time series data.
Read the docs 😎📘
Throughout this guide we will link to the InfluxDB documentation, which contains a wealth of useful information.
If you are new to InfluxDB, you might want to familiarize yourself with installing and writing data with InfluxDB first.
Line protocol
Line protocol is InfluxDB’s data input format.
Use line protocol to write data into InfluxDB. Each line represents a single data point, made up of a measurement, a tag set, a field set, and an optional timestamp.
Example
Here is an example of line protocol:
environment,devId=b47f6944 Temp=21.00,Lat=50.087325,Lon=14.407154 1603091412
White space, commas, and position all have important meanings in line protocol. Let’s examine the parts of this line.
environment,devId=b47f6944 Temp=21.00,Lat=50.087325,Lon=14.407154 1603091412
+---------+ +------------+ +------------------------------------+ +--------+
measurement      tags                      fields                 timestamp
The measurement comes first and must be followed by a comma. After the comma is a tag set, then a space, followed by a field set. The timestamp is optional and comes after another space. If a timestamp is not provided, the current time is used.
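To make this concrete, here is a minimal sketch of writing the point above through the InfluxDB 2 HTTP write API using only the Python standard library. The URL, organization, bucket, and token are placeholder values you would replace with your own; precision=s tells InfluxDB the timestamp is in seconds.

# Minimal sketch: write one line protocol point to InfluxDB 2 over HTTP.
# The url, org, bucket, and token values below are placeholders, not real credentials.
import urllib.request

url = "http://localhost:8086/api/v2/write?org=my-org&bucket=my-bucket&precision=s"
line = "environment,devId=b47f6944 Temp=21.00,Lat=50.087325,Lon=14.407154 1603091412"

request = urllib.request.Request(
    url,
    data=line.encode("utf-8"),
    headers={"Authorization": "Token my-token"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.status)  # 204 means the point was accepted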
InfluxDB resources
The code in the example application frequently uses the following four concepts.
User
InfluxDB users are granted permission to access the database. Users are added as members of an organization and use authentication tokens to access resources.
Token
Tokens (or authentication tokens) verify user and organization permissions in InfluxDB 2. There are different types of authentication tokens:
- admin tokens grant full read and write access to all resources in all organizations (admin tokens are not supported in InfluxDB Cloud)
- all-access tokens grant full read and write access to all resources in an organization
- read and write tokens grant read or write access to specific resources in an organization
Bucket
A bucket is a named location where time series data is stored. All buckets have a retention period, a duration of time that each data point persists. All buckets belong to an organization.
Organization
An organization is a logical workspace for a group of users. Members, buckets, tasks, and dashboards (along with a number of other resources), belong to an organization.
Cardinality
Performance in accessing time series data can be affected by the cardinality of that data. In InfluxDB, the primary concern is series cardinality: the number of unique measurement, tag set, and field key combinations stored. High series cardinality increases memory use and can degrade write and query performance.
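As a rough illustration, the sketch below counts unique series keys in a handful of points, treating the series key as the measurement plus its tag set (in InfluxDB 2.x the field key also contributes). Every new tag value creates another series, which is why tags with unbounded values drive cardinality up.

# Toy illustration of series cardinality: count unique measurement + tag-set combinations.
# (In InfluxDB 2.x the field key is also part of the series key; omitted here for brevity.)
points = [
    ("environment", (("devId", "b47f6944"),)),
    ("environment", (("devId", "b47f6944"),)),  # same tag set: no new series
    ("environment", (("devId", "18f6587a"),)),  # made-up second device id: one more series
]

series_keys = {(measurement, tags) for measurement, tags in points}
print(len(series_keys))  # 2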
Flux
Flux is InfluxDB’s scripting language. Flux code can be sent by other code as strings to be executed by the InfluxDB API.
A simple query might look like this:
from(bucket:"example-bucket")
|> range(start:-1h)
|> filter(fn:(r) =>
r._measurement == "my-measurement" and
r.my-tag-key == "my-tag-value"
)
Query data from a bucket with the from() function. The range() function filters records based on time bounds. The filter() function filters data based on conditions defined in a predicate function (fn).
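Because Flux travels to InfluxDB as plain text, a minimal sketch of running the query above from application code might look like this; the URL, organization, and token are placeholders:

# Minimal sketch: send a Flux script as a string to the InfluxDB 2 query endpoint.
# The url, org, and token values are placeholders.
import urllib.request

flux = '''
from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) =>
    r._measurement == "my-measurement" and
    r["my-tag-key"] == "my-tag-value"
  )
'''

request = urllib.request.Request(
    "http://localhost:8086/api/v2/query?org=my-org",
    data=flux.encode("utf-8"),
    headers={
        "Authorization": "Token my-token",
        "Content-Type": "application/vnd.flux",
        "Accept": "application/csv",
    },
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # Annotated CSV result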
You can use Flux for a variety of tasks including but not limited to:
- data transformations
- data analysis
- writing custom tasks, checks, and notifications
Annotated CSV
Annotated CSV is the output format of a Flux query in InfluxDB 2.0.
For example, imagine that we’re writing data about the number of “calico” and “tabby” cats, both “young” and “adult” cats, in two shelters, “A” and “B”. The data layout looks like this:
Bucket | Measurement |
---|---|
“cats-and-dogs” | “cats” |

Tag Keys | Tag Values |
---|---|
“shelter” | “A”, “B” |
“type” | “calico”, “tabby” |

Field Keys |
---|
“young”, “adult” |
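With this layout, writes for the two shelters might look like the following line protocol (the field values and the timestamp here are illustrative):

cats,shelter=A,type=calico young=3,adult=8 1589568633
cats,shelter=B,type=tabby young=5,adult=2 1589568633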
If we query our data with:
from(bucket: "cats-and-dogs")
|> range(start: 2020-05-15T00:00:00Z, stop: 2020-05-16T00:00:00Z)
|> filter(fn: (r) => r["_measurement"] == "cats")
|> filter(fn: (r) => r["_field"] == "adult")
|> filter(fn: (r) => r["shelter"] == "A")
|> filter(fn: (r) => r["type"] == "calico")
|> limit(n:2)Copy
This is what our Annotated CSV result looks like:
#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,_result,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,shelter,type
,,0,2020-05-15T00:00:00Z,2020-05-16T00:00:00Z,2020-05-15T18:50:33.262484Z,8,adult,cats,A,calico
,,0,2020-05-15T00:00:00Z,2020-05-16T00:00:00Z,2020-05-15T18:51:48.852085Z,7,adult,cats,A,calico
The Annotated CSV has three Annotations. The Annotation rows describe column properties and start with a #.
#group
A boolean that indicates whether the column is part of the group key. A group key is a list of columns for which every row in the table has the same value. Let’s look at the difference between the true and false columns, or the columns that are and aren’t part of the group key, respectively.
- true columns: In our example query above, we’ve filtered by a single field, “adult”, a single “shelter” tag, “A”, and a single “type” tag, “calico”. These values are constant across rows, so those columns are set to true. Also, filtering for a single value across tags and fields means that all of our rows belong to the same table, so the table column is also true for this query. The _start and _stop columns, defined by our range, are constant across rows, so these values are also true.
- false columns: The _time and _value columns have different values across rows, which is why they receive a false for the #group Annotation.

#datatype
Describes the type of data or which line protocol element the column represents.

#default
The value to use for rows with an empty value. For example, if we had assigned our query to the variable ourQuery, this annotation would look like:
#default,ourQuery,,,,,,,,,
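As a minimal sketch of how client code might consume this format, the snippet below separates the annotation rows from the header and data rows of the result above (the CSV string is shortened to a single data row):

# Minimal sketch: split an Annotated CSV response into annotations, header, and data rows.
import csv
import io

annotated_csv = """#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,_result,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,shelter,type
,,0,2020-05-15T00:00:00Z,2020-05-16T00:00:00Z,2020-05-15T18:50:33.262484Z,8,adult,cats,A,calico
"""

annotations = {}
rows = []
for record in csv.reader(io.StringIO(annotated_csv)):
    if not record:
        continue
    if record[0].startswith("#"):
        annotations[record[0]] = record[1:]  # e.g. "#datatype" -> column types
    else:
        rows.append(record)

header, data = rows[0], rows[1:]
print(annotations["#datatype"])        # the declared type of each column
print(dict(zip(header, data[0])))      # first data row keyed by column name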