TSDB tutorial - 2023/11/16
We will study here the basic usage of a Time Series Database. We will base our work on the InfluxDB database, and study how to use it from the command line and through its web interface. We will query existing data, inject custom data and build some queries against it, illustrating some features of InfluxDB.
We will need 3 terminal windows: one for running the InfluxDB database (and see its log messages), another one for running the Telegraf service, and the last one for running the command-line client influx.
Key concepts
InfluxDB stores measurements as series. A measurement is composed of a name, a timestamp, and its values. Since there may be multiple properties for the same measurement, values are stored in the form field=value, where field is an alphanumeric identifier. Field values can be strings (enclosed in double quotes), floats, integers, or booleans.
Another, similar notion associated with measurements is tags: tags are optionally defined for each measurement as tagname=value, where the value is always treated as a string.
The main and important difference between fields and tags is that tags are indexed: InfluxDB will be faster at querying data based on tags. They should contain appropriate metadata (source of measurement, sensor id…).
A series is a collection of points that share a measurement, a tag set and a field key.
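For instance, a single point written in InfluxDB's line protocol combines all of these notions (the measurement, tag, and field names below are made-up illustrations, not values from this tutorial):

```
weather,station=a1,city=Nantes temperature=21.5,humidity=0.64,status="ok" 1700000000000000000
```

Here weather is the measurement name; station and city are indexed tags; temperature, humidity, and status are fields (note the double quotes around the string value); the final number is the timestamp. All points sharing the measurement weather, the same tag set, and the field key temperature belong to the same series.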
Data is stored in buckets. A bucket combines the concept of a database and of a retention period (the duration of time that each data point persists). Buckets may (optionally) define explicit schemas that measurements must respect.
The database can be queried using a query language named Flux. Previous versions offered a SQL-inspired language with time-specific extensions, named InfluxQL, but some queries could not be expressed with it. In the context of this tutorial, we will use Flux.
IMPORTANT NOTE for Windows users
While not required by Influx, which can run in any environment, in the context of this tutorial you must use a WSL shell to execute all the commands given in the text.
Setup
We will use InfluxDB version 2.7.3.
To set things up, we will download static binaries from https://portal.influxdata.com/downloads/
Note: you do not need to register, you can just close the Register dialog popup.
Static binaries install
This unconventional approach can be used even when you do not have admin rights on the computer: all software runs in user mode and uses non-privileged ports. Obviously you should
IMPORTANT NOTE for Windows: it seems you need to locate the database in a directory located on the Windows filesystem (i.e. not in the pseudo-fs emulated for $HOME). So: make sure you are in your User directory on Windows (the prompt should display something like /mnt/c/Users/username/). Create a tsdb directory somewhere in this tree (for instance /mnt/c/Users/username/Desktop/tsdb, to find it easily) and make sure to always run the commands from there.
Download InfluxDB (v2.7.3), the Telegraf software, and the InfluxDB v2 Cloud CLI, as static binaries, for Linux 64-bit (even on Windows, since we will use WSL).
Uncompress the archives, and run the server in a terminal (again: make sure you are in the right directory):
./influxdb2-2.7.3/usr/bin/influxd --bolt-path=data/bolt --engine-path=data/engine
Then access the administrative URL at the address http://localhost:8086/
Create a new user, with a password (do not use your existing account password). Use tp for the initial organization name (it corresponds to a group), and local for the initial bucket name.
Note down in a file the API token it generates, which authenticates you as the admin user.
Then click on Quick start to set up local metric collection. It will automatically create sensors for multiple measurements linked to the InfluxDB database itself (number of users, of queries, of databases…) so that you can monitor its behaviour. The associated dashboard InfluxDB 2.0 OSS Metrics will also be created so that you can visualize the data.
[Question] Open the InfluxDB 2.0 OSS Metrics dashboard. How many buckets are defined in the system?
CLI usage
Let's configure the authentication of the command-line tool, so that we can use it more conveniently.
You should have kept the API token generated for the admin user. If not, you will need to generate a new token to grant access to the database. In the web GUI, go to Load Data / API Tokens. Click on Generate API token and select All Access API token (to simplify things in the context of this tutorial; do not do this at home). Click on the created token and copy its textual value into a file to preserve it.
Open another terminal (the first one is used to have the daemon run) and set the environment variables so that you can use them more easily:
export INFLUX_TOKEN=<token value>
export INFLUX_ORG=tp
export INFLUX_BUCKET=local
Then configure the authentication for the CLI with the config command:

./influx config create --config-name tpconfig \
  --host-url http://localhost:8086 \
  --org $INFLUX_ORG \
  --token $INFLUX_TOKEN \
  --active
You can now use influx on the command line to interact with your database. Do not use the web interface for the moment.
[Question] What is the influx command that allows you to list the available buckets?
[Question] What are the names of the available buckets?
Monitor system resources
For the moment, our InfluxDB is configured to monitor itself.
We will configure a system resource monitor, in order to monitor the computer system. In the main page, click on the Load your data button.
The telegraf component is used to collect data and send it to the InfluxDB database. In the Load data interface, select the Telegraf tab and create a new configuration. Select the System configuration, select all measurements (cpu, disk, etc.), and create the configuration.
A specific token is created so that the Telegraf component can only write its data to a specific bucket. Copy it to the same file as the first token to keep a copy. A URL endpoint for writing data is also given. They are indicated in the dialog displayed after the creation.
Open another terminal (you should already have one for the database and one for running the influx client), set the INFLUX_TOKEN environment variable to the token generated for the Telegraf component, and start the telegraf server (note that the binary is in usr/bin in the extracted telegraf directory) with the command line given by Influx. We will let it run for a while so that it collects some data before exploring it.
Exploring existing data
Database creation
We will use a sample database. Download the sample data; the grep part is there to keep only the lines that define measurements.
curl https://s3.amazonaws.com/noaa.water-database/NOAA_data.txt | grep h2o > NOAA_data.txt
Have a look at the loaded data to make sure everything is OK. Check the timestamp values and determine their unit. Hint: in Python, time.localtime returns a time structure from a number of seconds.
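Applying the hint: feed one raw timestamp to time.localtime and check whether the resulting year is plausible. The value below is an assumed ten-digit example in epoch seconds, not necessarily a value taken from NOAA_data.txt:

```python
import time

ts = 1439856000  # assumed sample value, about 10 digits long

# If the unit is seconds, this yields a plausible year.
print(time.localtime(ts).tm_year)  # 2015

# Nanosecond timestamps are about 19 digits; passed directly they raise
# OverflowError (or give an absurd date), so you would divide by 1e9 first.
```
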
Using the command-line interface, create a new bucket named noaa and import the NOAA data into it (influx binary, write command). Important: do not forget to specify the precision (time unit) used in your file.
- [Question] Give the command-line used to import the data file.
Global exploration
We will first use the command-line interface influx to start exploring the data, and make queries using the Flux language.
The Flux language often requires filtering against a time range, so we first have to determine the appropriate time range for our dataset. For this, we will run some queries with a very large time range (starting in 1900) so that we are sure to capture all the data.
The following command runs the query that outputs the first measurement for all series:
./influx query -r 'from(bucket: "noaa")
  |> range(start: 1900-01-01, stop: now())
  |> first()
  |> keep(columns: ["_time", "_measurement"])'
[Question] What is the beginning datetime of measurements? Note: you will need this information in the next questions.
Modify the above query to find the time of the last measurement of noaa.
[Question] What is the new query, and what is the end datetime of measurements?
Now that we know the appropriate time range, we can investigate our data structure. Influx provides a package (functions defined using Flux) to help with this, in the schema package. See https://docs.influxdata.com/influxdb/v2/query-data/flux/explore-schema/ for its documentation.
Note that many of the exploring functions take a start parameter expressing a duration (the default of -30d is fine for real-time data, but is too short in our case).
- [Question] What measurements are present in the noaa bucket? What is the query to list them?
- [Question] What are the fields defined in the h2o_feet measurement, and the query to find them?
- [Question] What are the tag keys defined in the h2o_feet measurement, and the query to find them?
- [Question] What are the different values for the location tag in h2o_feet measurements, and the query to find them?
Data exploration
Using the Flux syntax, answer the following questions:
- [Question] How many measurement points of water level are there in the h2o_feet measurement for each location? (hint: count). Give the query used.
- [Question] Convert the h2o_feet measurement into a new h2o_meter measurement by converting the water_level values to meters (hint: map for mapping values, to for storing the new measurement into noaa). Give the query used.
Using the Giraffe GUI
The CLI is useful for administration and some common tasks like data manipulations, but dealing with time series data is more convenient when you can display the stored data. The InfluxDB binary includes a web-based GUI named Giraffe.
The Explore component allows you to build queries and visualize results. The Dashboard component can combine multiple visualizations into a single-page dashboard.
- Build a query through the Data Explorer component (expand left column to see the names) to display the water level for both locations as a timeline (Graph). Pay attention to the time range definition.
- Click on "Script editor" to see the Flux query built by the GUI. To better understand the output structure, use the Table visualisation component (upper left corner) that displays raw data in a table.
- In the Query editor, remove the aggregateWindow call and see the difference in the rendering of the graph (when zoomed).
[Question] Display the h2o_temperature graph. Its values are in Fahrenheit degrees; do a conversion (hint: map) to get a display in Celsius degrees. Give the query used.
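Whatever the query language, the conversion itself is °C = (°F − 32) × 5⁄9. A quick Python check of the formula (the Flux map body applies the same arithmetic to the _value column):

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0

print(fahrenheit_to_celsius(32.0))   # 0.0   (freezing point of water)
print(fahrenheit_to_celsius(212.0))  # 100.0 (boiling point of water)
```
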
- Dashboard building: create a new Dashboard where you will define 2 timeline views: one for the water level, and the other for the average temperature.
[Question] Provide a screenshot of the visualisation.
Injecting data
Data injection is most commonly done using the Telegraf component, as we saw at the beginning of the session. But there are multiple bindings that offer this possibility from different languages.
InfluxDB exposes a REST API on port 8086. You can write a measurement with a curl command:

curl --request POST \
  "http://localhost:8086/api/v2/write?org=$INFLUX_ORG&bucket=$INFLUX_BUCKET&precision=ns" \
  --header "Authorization: Token $INFLUX_TOKEN" \
  --header "Content-Type: text/plain; charset=utf-8" \
  --header "Accept: application/json" \
  --data-binary '
monitoring,computer=u1 temperature=72.5,load=.7
monitoring,computer=u2 temperature=67,load=.5'
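The same write can be issued from any language that can speak HTTP. A minimal Python sketch (standard library only; the host, organization, bucket, and token below are placeholders) that builds the request without sending it:

```python
import urllib.request

def build_write_request(host, org, bucket, token, precision, lines):
    """Build (but do not send) a POST request for the /api/v2/write endpoint."""
    url = f"{host}/api/v2/write?org={org}&bucket={bucket}&precision={precision}"
    return urllib.request.Request(
        url,
        data="\n".join(lines).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )

req = build_write_request(
    "http://localhost:8086", "tp", "local", "<token value>", "ns",
    ["monitoring,computer=u1 temperature=72.5,load=0.7"],
)
print(req.full_url)
# urllib.request.urlopen(req) would actually send it (the server must be running).
```
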
Another way is to use the influx write command from the influx CLI:

influx write \
  -b bucketName \
  -o orgName \
  -p s \
  'monitoring,computer=u1 temperature=72.5,load=.7 1556896326'
But we will here use the JavaScript client library to instrument a web page in order to send measurements to the database.
Create a mouse bucket in InfluxDB. Download the provided HTML template into a directory and complete it (look for the TODO string) to connect to the Influx database and send tracking events to the mouse bucket. The HTML file displays 4 squares. The goal is to trace the activity of the mouse in the different squares, and determine attributes of the mouse trajectory in each of the squares.
[Question] Provide the relevant source code extract that you had to implement.
To access the page, launch a temporary web server (python3 -m http.server in the same directory as the HTML file will do the job) and access it through the web browser (to avoid security limitations for JS on local files).
Make sure, through the Influx GUI, that you can see the different captured values for x and y, grouped by source.
[Question] Use the Scatter plot to display mouse trajectories in a meaningful way, illustrating the captured moves. You will have to pivot the data. Give the Flux query and a screenshot.
- Determine the global distance covered by the mouse in each of the squares (hint: it involves pivot, difference, and math.sqrt).
[Question] Give the query used to obtain the result
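The arithmetic behind the distance hint can be checked outside Flux first. A minimal Python sketch with a made-up sample trajectory (not real mouse data): pivot lines up x and y per timestamp, difference yields the consecutive deltas, and math.sqrt gives each segment length.

```python
import math

# Assumed sample trajectory: (x, y) positions at successive timestamps.
points = [(0, 0), (3, 4), (3, 4), (6, 8)]

# Pairwise deltas (what difference() computes in Flux), then segment norms.
total = sum(
    math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
)
print(total)  # 5.0 + 0.0 + 5.0 = 10.0
```

The Flux version applies the same per-interval computation to the pivoted x and y columns and then sums over each square.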
- Extending the previous query, determine the mean speed (in pixels/second) of the mouse for each of the squares (hint: it involves derivative).
[Question] Give the query used to obtain the result
There are no more questions to answer. The rest of the text presents some additional features that you are free to explore at your leisure.
- Bonus activity 1: define a dashboard that displays the scatter plot and speed/distance by widget, and updates in real time.
- Bonus activity 2: implement additional sensors (mouse clicks, number of typed characters, acceleration…)
To go further
We only saw here some basic features of the InfluxDB stack. We did not address complex queries, continuous queries, using alerts… Feel free to explore.
Bonus activity: replicate the tutorial using Warp10.io instead of InfluxDB, using either a locally installed version or the https://sandbox.senx.io/ sandbox server.
For reference, some interesting approaches to test for time series analytics/prediction:
- https://github.com/uber/orbit
- https://tsfresh.readthedocs.io/
- https://ai.googleblog.com/2020/12/using-automl-for-time-series-forecasting.html : AutoML approach by Google
- TS smoothing: use Gaussian smoothing rather than a moving average; see the article "Stop aggregating away the signal in your data"
- Other possible data source: Station Nantes - Pont Anne de Bretagne (Loire)