skylight-api

PVX API — PVQL

Table of contents

  1. PVQL syntax
  2. PVQL functions

The PVQL Syntax

The complete PVQL syntax is as follows:

<values> [BY <group>] FROM <source>
    [WHERE <group-filter>] [HAVING <values-filter>]
    [TOP <window-and-ordering>] [SINCE <timestamp>] [UNTIL <timestamp>]
    [RAW]
    [SETTINGS <key-values>]

A valid PVQL query includes at least one <value> and one <source> in the FROM clause; the BY/WHERE/HAVING clauses are optional, and all other clauses default to specific values (defined below).

Group and Values

<values> and <group> are expressions built from fields; the exhaustive list of PVQL fields can be found in the Definitions page. A field is either a value field (a metric), or a key field (a group), and can be used in the corresponding clause.

The fields are the simplest form of expressions, and you can request them as follows:

client.traffic FROM transport

To request multiple values at once, use a comma as separator:

client.traffic, server.traffic FROM transport

Some fields are built from simpler ones; for example, traffic is defined as client.traffic + server.traffic. You can use either one when querying:

traffic FROM transport

client.traffic + server.traffic FROM transport

When requesting multiple complex expressions, it is useful to name them to distinguish which is which. PVQL supports aliases with the AS keyword:

client.traffic + server.traffic AS "Total traffic",
    client.traffic / (client.traffic + server.traffic) AS "Client Traffic Ratio"
    FROM transport

The group clause (with the BY keyword) specifies how to partition the results: by time, by zones, applications, IP subnets, etc. You can group by multiple expressions.

client.traffic BY application.name FROM transport

traffic BY client.zone.name, server.zone FROM transport

traffic BY time(600) FROM transport

traffic BY time_exclusion.business_hours, time_exclusion.maintenance_windows
    FROM tcp

In the last example, we group the data for each period of 10 minutes (600 s). A given point in PVQL corresponds to the data that preceded it. For example, when using BY time(600), the point at 12:00:00 is actually the data accumulated between 11:50:00 and 12:00:00.

Filtering

The filter specified with the WHERE keyword lets you limit the query to certain keys.

client.traffic BY layer, application.name FROM transport WHERE application.name != "http"

client.rtt BY client.ip FROM transport WHERE (application.name = "http"
    OR application.name = "https") AND server.zone IN "/Local"

traffic FROM tcp WHERE NOT time_exclusion.any

Notice the IN operator in this last example. It allows for filtering on the "/Local" zone or any of its children.

The IN operator may also be used to check if an element belongs to a set:

client.traffic FROM transport WHERE server.port IN {80, 443, 8000..8800}

and for wildcard or pattern matching (using re2 syntax):

client.traffic FROM tls WHERE server_name IN glob("*.google.*")

client.traffic FROM http WHERE host IN re("^www\d")

These functions exist in case-insensitive variants (iglob, ire). There are also functions text / itext for simple string comparisons (default is case-insensitive).

You can further filter the final result with an HAVING clause, to keep only the rows matching the given expression.

client.traffic BY application.name FROM transport HAVING client.traffic >= 10000

Additionnaly, if you need to eliminate values at an early stage of the query, you can filter fields directly in the WHERE clause using the raw. prefix. Note that filtering this way requires a certain knowledge of the way we store data.

client.traffic, client.dtt BY application.name FROM transport WHERE raw.client.dtt != 0

Window and Ordering

The window & ordering clause (which uses the TOP keyword) allows to order and extract a portion of the result. To extract the first 5 results, use the following:

client.traffic BY application.name FROM transport TOP 5

To request 10 lines of data while skipping the first 5 (that is, at offset 5, since offsets start at 0), use the following:

client.traffic, server.traffic BY client.zone.name, server.zone FROM transport TOP 10@5

The ALL keyword lets you request all lines of data starting at a given offset:

client.traffic, server.traffic BY client.zone.name, server.zone FROM transport TOP ALL@5

As for sorting, PVQL lets you sort on any or all of the values and keys in the query. The i-th value is referred to by the name vi, while the j-th key is referred to by the name kj. These names are used in the TOP clause:

client.traffic, server.traffic BY client.zone.name, server.zone FROM transport
    TOP 10@5 {v2 ASC, k1 ASC}

The sort direction accepts ASC or DESC. By default, the keys are sorted in ascending order, and the values in descending order.

Some additional names are available in the ordering clause:

By default the result is sorted by values in descending order unless the query is grouped by time, in which case the default behavior is to return the result in chronological order.

Please note that you can only sort on expressions present in the query.

Source Layer

The FROM clause lets you specify which layer to request.

global.dtt BY application.name FROM http

Date Range

The SINCE / UNTIL clauses allow for limitting the query to a given time interval. They accept either the ISO8601 standard, or time expressions:

global.dtt BY application.name FROM http SINCE @(2018-02-20T10:00:00) UNTIL @now - 60*60

In this example, @now refers to the current time in seconds, thus @now - 60*60 refers to 3600 seconds ago (1 hour ago).

Count Operator

PVQL provides a special # operator if we want to retrieve for a key, the number of distinct elements.

#client.ip FROM tcp

In this example, we request to PVQL the number of distinct client IPs. While without this operator, PVQL will return a set of distinct client IPs.

Settings

A PVQL query can provide a SETTINGS clause to change some internal behaviors of the query, and can be requested as follows:

client.traffic FROM transport SETTINGS name = value [, ...]

We provide several settings for PVQL:

limit_size_set, which limits the size of the returned sets, for example when we request a key as a value, the argument can take a positive number, or a special value ALL.

client.ip FROM transport SETTINGS limit_size_set = 10

This query above will return the first 10 client IPs found during the execution of the query.

client.ip FROM transport SETTINGS limit_size_set = ALL

While this query will return all client IPs found during the execution of the query.

By default, PVQL returns the first 100 elements on a set.

query_id is a parameter if we want to specify a specific PVQL query_id in case we want to cancel the current query, the query could be cancel using the endpoint query-cancel.

degradation_mode, this setting is used to know how much PVQL should degrade the query. The more a query has been degraded, the faster but the less accurate it will be, and the less a query has been degraded, the slower but more precise it will be.

We have several parameters for this setting:

fast mode is designed for queries where response time is really important (ex: a dashboard with many widgets) and where we just need to have an approximation regarding the result of the query. Generally speaking, only complex queries are degraded: for example queries involving keys or values that are expensive to process, or queries executed on long time ranges.

client.ip FROM transport SETTINGS degradation_mode = "fast"

precise mode will give more accurate results, while keeping the execution time under control. It’s a middle-ground between accuracy and execution time, it can be used for a single widget with many values or keys that are complex to process.

client.ip FROM transport SINCE @(2019-01-20T10:00:00) UNTIL @(2019-02-20T10:00:00) SETTINGS degradation_mode = "precise"

unlimited mode is the most accurate mode and will give the result with the best accuracy as possible, but should be used only if we don’t have any other choice, or for debugging purpose as the execution time will be unchecked: it might prove too long for real-world queries and may slow down the whole system.

client.ip FROM transport SETTINGS degradation_mode = "unlimited"

Raw

Finally, the RAW clause lets you disable result aggregation and retrieve raw values stored in the database:

traffic BY application.name FROM transport RAW

This query does not return the total traffic for each application, but the traffic and application of each database record.

String Literals

A string literal in PVQL is an arbitrary sequence of non-null characters (characters with nonzero values) bounded by double quotes ("), for example "This is a string". To include a double-quote character within a string literal, write two adjacent double quotes, e.g. "Alice says ""Hello"" to Bob".

PVQL also accepts “escape” string literals, bounded by backquotes ( `). Within an escape string, a backslash character (\) begins a backslash escape sequence, in which the combination of backslash and the following character(s) represents a special character. The recognized escape sequences are:

Escape Sequence Meaning
\\ Backslash
\` Backquote
\a ASCII Bell (BEL)
\b ASCII Backspace (BS)
\f ASCII Formfeed (FF)
\n ASCII Linefeed (LF)
\r ASCII Carriage Return (CR)
\t ASCII Horizontal Tab (TAB)
\v ASCII Vertical Tab (VT)
\xhh Character with hex value hh
\uxxxx Character with 16-bit hex value xxxx
\Uxxxxxxxx Character with 32-bit hex value xxxxxxxx
\N{name} Character named name in the Unicode database

Variables

PVQL supports variables. This allows you to insert variables with names you choose (e.g. $server) whereever PVQL is expecting a value. When querying data with such a PVQL expression, you can assign the variables with custom values. The query is performed as if all variables were substituted with the corresponding values.

For example, if you want a PVQL query regarding an application for a specific server, you would usually write something like this:

server.rt, server.dtt FROM tcp WHERE application.name = "NC" AND server.ip = 8.8.8.8

However, you might want to be able to easily choose another IP address without rewriting the query each time. For this purpose, you can introduce a variable, such as $server, in place of the actual value:

server.rt, server.dtt FROM tcp WHERE application.name = "NC" AND server.ip = $server

Then wherever you request data using this query, you pass the corresponding value in the variables parameters.

Example:

=> query expr="server.rt, server.dtt FROM tcp WHERE application.name = \"NC\" AND server.ip = $server" \
..       variables={"server": {"value": "8.8.4.4", "type": {"type": "ip"}}}

Examples

In the following examples, it is implied that the queries are applied on the last hour (because if neither SINCE nor UNTIL are provided, the default time interval is used).

Retrieving the client traffic over time for the “transport” layer (which is the combination of TCP, UDP, ICMP, Non IP and Other IP layers):

client.traffic FROM transport BY time()

Retrieving both the client traffic and the server traffic, over time for each “transport” layer:

client.traffic, server.traffic FROM transport BY time(), layer

Retrieving the client traffic, the server traffic, and the mean of both, over time by points of 5 minutes (300 seconds):

client.traffic, server.traffic, (client.traffic + server.traffic) / 2
    BY time(300) FROM transport

Retrieving the total traffic for each /24 IPv4 subnet limited to those in 192.168.0.0/16 (thus for each 256 192.168.x.0/24 networks):

traffic BY server.ip[24] FROM transport WHERE server.ip[16] = 192.168.0.0

Retrieving the 20 highest average server response times for a matrix of application × client zone:

server.rt BY application.name, client.zone.name FROM transport TOP 20

Retrieving the server traffic per client zone:

server.traffic BY client.zone.name FROM transport

Retrieving the server traffic for a matrix of client zone × application:

server.traffic BY client.zone.name, application.name FROM transport

Retrieving the total traffic by protocol stack for each IPv4 client within 192.168.0.0/16:

traffic BY protostack, client.ip FROM transport WHERE client.ip[16] = 192.168.0.0

Retrieving the total traffic from the HTTP layer for the given host:

traffic FROM http WHERE host = "www.google.com"

Retrieving the top 10 total traffic by host from the HTTP layer limited to those matching the given pattern:

traffic BY host FROM http WHERE host IN "*.google.com" TOP 10

The server.ip[n] expression in these examples refer to subnets.

PVQL Functions

date

Signature

date(spec: set) -> set

Description

Parse a date in ISO 8601 format as a PVQL time.

Arguments

Returns

A PVQL time based on the date string.

date

Signature

date(spec: string) -> time

Description

Parse a date in ISO 8601 format as a PVQL time.

Arguments

Returns

A PVQL time based on the date string.

family

Signature

family(ip: set) -> set

Description

Get the family type of the current IP.

Arguments

Returns

IPv4 or IPv6, depending of the current IP address type.

family

Signature

family(ip: ip) -> ipfamily

Description

Get the family type of the current IP.

Arguments

Returns

IPv4 or IPv6, depending of the current IP address type.

flatten

Signature

flatten(zone: set, depth: set) -> set

Description

Simplify a zone hierarchy up to a maximum depth level.

Arguments

Returns

A parent zone ID, depending on depth.

Example

flatten(client.zone.id, 1) where zone hierarchy of client.zone.id = [1, 2] returns 1.

flatten

Signature

flatten(url: set, depth: set) -> set

Description

Return an URL to a maximum depth level using its hierarchical path.

Arguments

Returns

An URL to a maximum depth level using its hierarchical path.

Example

traffic BY flatten(url, 2) FROM http where url is http://example.com/a/b/c/d returns http://example.com/a/b/.

flatten

Signature

flatten(url: url, depth: number) -> url

Description

Return an URL to a maximum depth level using its hierarchical path.

Arguments

Returns

An URL to a maximum depth level using its hierarchical path.

Example

traffic BY flatten(url, 2) FROM http where url is http://example.com/a/b/c/d returns http://example.com/a/b/.

flatten

Signature

flatten(zone: set, depth: set) -> set

Description

Simplify a zone hierarchy up to a maximum depth level.

Arguments

Returns

A parent zone, depending on depth.

Example

flatten(client.zone.name, 2) where client.zone.name = "/All/Public/Documents" returns /All/Public.

flatten

Signature

flatten(zone: zone, depth: number) -> zone

Description

Simplify a zone hierarchy up to a maximum depth level.

Arguments

Returns

A parent zone, depending on depth.

Example

flatten(client.zone.name, 2) where client.zone.name = "/All/Public/Documents" returns /All/Public.

flatten

Signature

flatten(zone: zone_id, depth: number) -> zone_id

Description

Simplify a zone hierarchy up to a maximum depth level.

Arguments

Returns

A parent zone ID, depending on depth.

Example

flatten(client.zone.id, 1) where zone hierarchy of client.zone.id = [1, 2] returns 1.

glob

Signature

glob(value: set) -> set

Description

Build a wildcard pattern matcher.

Internally, the PVQL wildcard pattern is converted to a case sensitive regex.

Arguments

Returns

A case sensitive regex.

Example

traffic FROM http WHERE url IN glob("http://*") returns the traffic from all URLs that use the HTTP scheme.

glob

Signature

glob(value: string) -> regex

Description

Build a wildcard pattern matcher.

Internally, the PVQL wildcard pattern is converted to a case sensitive regex.

Arguments

Returns

A case sensitive regex.

Example

traffic FROM http WHERE url IN glob("http://*") returns the traffic from all URLs that use the HTTP scheme.

iglob

Signature

iglob(value: set) -> set

Description

Build a wildcard pattern matcher.

Internally, the PVQL wildcard pattern is converted to a non-case sensitive regex.

Arguments

Returns

A non-case sensitive regex.

Example

traffic FROM http WHERE url IN glob("http://*") returns the traffic from all URL using the HTTP protocol.

iglob

Signature

iglob(value: string) -> regex

Description

Build a wildcard pattern matcher.

Internally, the PVQL wildcard pattern is converted to a non-case sensitive regex.

Arguments

Returns

A non-case sensitive regex.

Example

traffic FROM http WHERE url IN glob("http://*") returns the traffic from all URL using the HTTP protocol.

ire

Signature

ire(value: string) -> regex

Description

Build with a re2 regex string, as a PVQL case-insensitive regex.

Arguments

Returns

The same string, converted to a regex type.

Example

traffic FROM http WHERE url IN re("^(http|https)://*.com$") returns the traffic from HTTP URLs that use the top-level domain .com.

ire

Signature

ire(value: set) -> set

Description

Build with a re2 regex string, as a PVQL case-insensitive regex.

Arguments

Returns

The same string, converted to a regex type.

Example

traffic FROM http WHERE url IN re("^(http|https)://*.com$") returns the traffic from HTTP URLs that use the top-level domain .com.

itext

Signature

itext(value: set) -> set

Description

Force PVQL to apply a case insentitive comparaison on filtering.

Arguments

Returns

The same string, with an internal marker for doing case-insensitive operations.

Example

query FROM databases WHERE itext(system) = itext("mysql") returns all queries from the MySQL databases.

itext

Signature

itext(value: Parameter<S>) -> normalized_case

Description

Force PVQL to apply a case insentitive comparaison on filtering.

Arguments

Returns

The same string, with an internal marker for doing case-insensitive operations.

Example

query FROM databases WHERE itext(system) = itext("mysql") returns all queries from the MySQL databases.

lower

Signature

lower(value: Parameter<S>) -> Parameter<S>

Description

Transform a string to a lowercase string.

Arguments

Returns

The same string, on lowercase format.

lower

Signature

lower(value: set) -> set

Description

Transform a string to a lowercase string.

Arguments

Returns

The same string, on lowercase format.

meta_category_capture

Signature

meta_category_capture(cat_type: set) -> set

Description


meta_category_capture

Signature

meta_category_capture(cat_type: string) -> string

Description


meta_category_capture_iface_client

Signature

meta_category_capture_iface_client(cat_type: string) -> string

Description


meta_category_capture_iface_client

Signature

meta_category_capture_iface_client(cat_type: set) -> set

Description


meta_category_capture_iface_server

Signature

meta_category_capture_iface_server(cat_type: set) -> set

Description


meta_category_capture_iface_server

Signature

meta_category_capture_iface_server(cat_type: string) -> string

Description


meta_category_zones_client

Signature

meta_category_zones_client(cat_type: string) -> string

Description


meta_category_zones_client

Signature

meta_category_zones_client(cat_type: set) -> set

Description


meta_category_zones_server

Signature

meta_category_zones_server(cat_type: set) -> set

Description


meta_category_zones_server

Signature

meta_category_zones_server(cat_type: string) -> string

Description


prefix

Signature

prefix(mac: set, mask: set) -> set

Description

Keep the nth bytes on the current MAC address. The expression prefix_mac(client.mac, 24) is equivalent to client.mac/24.

Arguments

Returns

A MAC address where the first nth bytes have been kept, and the remaining ones are zeros.

prefix

Signature

prefix(ip: ip, mask: number) -> ip

Description

Apply a CIDR mask on the current IPv4. The expression prefix(client.ip, 24) is equivalent to client.ip/24.

Arguments

Returns

An IPv4 with the applied CIDR mask.

prefix

Signature

prefix(ip: set, mask: set) -> set

Description

Apply a CIDR mask on the current IPv4. The expression prefix(client.ip, 24) is equivalent to client.ip/24.

Arguments

Returns

An IPv4 with the applied CIDR mask.

prefix

Signature

prefix(mac: mac, mask: number) -> mac

Description

Keep the nth bytes on the current MAC address. The expression prefix_mac(client.mac, 24) is equivalent to client.mac/24.

Arguments

Returns

A MAC address where the first nth bytes have been kept, and the remaining ones are zeros.

prefix6

Signature

prefix6(ip: ip, mask: number) -> ip

Description

Apply a CIDR mask on the current IPv6. The expression prefix_ipv6(client.ip, 48) is equivalent to client.ip/48.

Arguments

Returns

An IPv6 with the applied CIDR mask.

prefix6

Signature

prefix6(ip: set, mask: set) -> set

Description

Apply a CIDR mask on the current IPv6. The expression prefix_ipv6(client.ip, 48) is equivalent to client.ip/48.

Arguments

Returns

An IPv6 with the applied CIDR mask.

re

Signature

re(value: set) -> set

Description

Build with a re2 regex string, as a PVQL case sensitive regex.

Arguments

Returns

The same string, converted as a regex type.

Example

traffic FROM http WHERE url IN re("^(http|https)://*.com$") returns the traffic from HTTP URLs that use the top-level domain .com.

re

Signature

re(value: string) -> regex

Description

Build with a re2 regex string, as a PVQL case sensitive regex.

Arguments

Returns

The same string, converted as a regex type.

Example

traffic FROM http WHERE url IN re("^(http|https)://*.com$") returns the traffic from HTTP URLs that use the top-level domain .com.

safesum

Signature

safesum(left: set, right: set) -> set

Description

Safe operator for add if we have a nullable argument. If either operand is NULL, it will be replaced by 0.

Arguments

Returns

The result of left + right.

safesum

Signature

safesum(left: number, right: number) -> number

Description

Safe operator for add if we have a nullable argument. If either operand is NULL, it will be replaced by 0.

Arguments

Returns

The result of left + right.

strip_url

Signature

strip_url(value: set) -> set

Description

Removes the query string and fragment identifier from an URL.

Arguments

Returns

An URL, without the query string and fragmant identifier.

strip_url

Signature

strip_url(value: string) -> url

Description

Removes the query string and fragment identifier from an URL.

Arguments

Returns

An URL, without the query string and fragmant identifier.

strip_url

Signature

strip_url(value: url) -> url

Description

Removes the query string and fragment identifier from an URL.

Arguments

Returns

An URL, without the query string and fragmant identifier.

strip_url

Signature

strip_url(value: set) -> set

Description

Removes the query string and fragment identifier from an URL.

Arguments

Returns

An URL, without the query string and fragmant identifier.

substr

Signature

substr(value: set, offset: set, length: set) -> set

Description

Return a substring from an offset, up to a certain bytes length.

Arguments

Returns

A substring.

substr

Signature

substr(value: Parameter<S>, offset: number, length: number) -> Parameter<S>

Description

Return a substring from an offset, up to a certain bytes length.

Arguments

Returns

A substring.

substr

Signature

substr(value: Parameter<S>, length: number) -> Parameter<S>

Description

Return a substring from the beginning, up to a certain bytes length.

Arguments

Returns

A substring.

substr

Signature

substr(value: set, length: set) -> set

Description

Return a substring from the beginning, up to a certain bytes length.

Arguments

Returns

A substring.

text

Signature

text(value: Parameter<S>) -> normalized_case

Description

Force PVQL to apply a case-insentitive comparaison on the filtering.

Arguments

Returns

The same string, with an internal marker for not doing case-insensitive operations.

Example

query FROM databases WHERE text(system) = text("MySQL") returns all queries from the MySQL databases.

text

Signature

text(value: set) -> set

Description

Force PVQL to apply a case-insentitive comparaison on the filtering.

Arguments

Returns

The same string, with an internal marker for not doing case-insensitive operations.

Example

query FROM databases WHERE text(system) = text("MySQL") returns all queries from the MySQL databases.