API monitoring

Overview

OpenTV Video Platform uses Prometheus to collect API usage and performance data from both Platform and SSP. You can then query Prometheus to get information such as:

Number of responses from a particular endpoint
Total response time
Whether a particular probe was successful or not

Prometheus lets you construct complex queries. Covering all of its functionality is beyond the scope of this page.

For full details, see the Prometheus API documentation.

You can perform calculations on the data that is returned to compute additional metrics.

Metric categories

For OpenTV Video Platform and SSP, there are two categories of metrics that NAGRA exposes through Prometheus:

Probe metrics

The probe_success metric indicates whether execution was successful for each probe.

Nginx metrics

A number of metrics are gathered by monitoring nginx. These include response time, which is the interval between the arrival of a request and its response (that is, how long it takes to service each request). Response time is a key indicator of an application's performance.

An increase in response time can mean that there is an issue with an end-user application or with an upstream service, which may be caused by a recent change or upgrade.

The following metrics are available:

sni_http_response_count_total – the total number of processed HTTP responses
sni_http_response_time_seconds – a summary vector of the total response times (in seconds)
sni_http_response_time_seconds_sum – a sum of the total response times in seconds
sni_http_response_time_seconds_count – the total number of processed HTTP responses

You can perform calculations on the data that is returned to compute additional metrics, such as:

Average response time per API
Requests per second per API
Response time for the top n% of calls per API
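As a rough illustration of the first two calculations, the following Python sketch builds PromQL expressions that could be passed as the query parameter. The metric names and the request_uri label come from this page. The 5-minute window and the helper function names are assumptions chosen for the example, not values defined by the platform. A percentile query is not shown because it depends on how the summary metric is configured.

```python
# Metric and label names are taken from this page; the 5-minute window
# and the helper names are illustrative assumptions.
WINDOW = "5m"


def average_response_time_query(window: str = WINDOW) -> str:
    """PromQL for the mean response time (seconds) per API over the window."""
    return (
        f"sum by (request_uri) (rate(sni_http_response_time_seconds_sum[{window}]))"
        " / "
        f"sum by (request_uri) (rate(sni_http_response_time_seconds_count[{window}]))"
    )


def requests_per_second_query(window: str = WINDOW) -> str:
    """PromQL for the per-second response rate per API over the window."""
    return f"sum by (request_uri) (rate(sni_http_response_count_total[{window}]))"


print(average_response_time_query())
print(requests_per_second_query("1m"))
```

Grouping by request_uri collapses the separate series for each method and HTTP status into one value per API. Add further labels to the by clause if you need a finer breakdown.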

Available metrics

The following nginx metrics are available:

Module	Metric name	REST methods
Account and Device Manager (ADM)	`adm_accounts_actions`	DELETE
	`adm_bundled_accounts`	GET, POST
	`adm_devices`	GET, POST, DELETE
	`adm_update`	GET, POST
	`adm_user_accounts`	GET
API Gateway (AGW)	`agw_create`	POST
Cast, Crew, and Persona Service (CCP)	`ccp`	GET
Content Builder	`rail`	GET
CRM Gateway (CRM-GW)	`crm_gateway`	GET
IAM (Keycloak)	`iam`	GET
Identity Authentication Service (IAS)	`ias_content_token`	GET, POST
	`ias_token`	POST
Image Handler Service (IHS)	`ihs`	GET
Keycloak	`keycloak_nagra`	POST
	`keycloak_opcon`	GET, POST
	`keycloak_resources`	GET
Metadata Server (MDS)	`mds_events`	GET, PUT, POST, DELETE
	`btv_programmes`	GET
	`btv_services`	GET
	`epg`	GET
	`solr_search`	GET
	`vod_editorials`	GET
	`vod_nodes`	GET
	`vod_products`	GET
Operator Console (OpCon)	`opui`	GET
	`opconsole_adm`	GET, POST
	`opconsole_bcm`	GET
	`opconsole_core`	GET, PUT, POST
Rights Manager (RMG)	`rmg`	GET, POST
User Activity Vault (UAV)	`uav`	GET, PUT, POST
User Recordings	`cdvr`	GET, POST, DELETE

Authentication

Access to the Prometheus APIs is controlled by Keycloak. See Accessing operator APIs using Keycloak for more information.

Output formatting

If you are using Postman to make these requests, it automatically pretty-prints the JSON output.

If you are using curl, you can pipe its output to the jq JSON formatting tool to make the output more readable.

For example:

CODE

curl -s --location -g --request GET 'https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=probe_success' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Authorization: Bearer <keycloak_token>' | jq .
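If you prefer to script these requests, the same call can be prepared with the Python standard library. This is a minimal sketch: the host name and token are placeholders, and build_query_request is a hypothetical helper rather than part of any platform SDK. It builds the request without sending it, so you can inspect the encoded URL first.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(base_url: str, promql: str, token: str) -> Request:
    """Build (but do not send) a GET request for an instant PromQL query."""
    # urlencode escapes spaces, parentheses, braces, and quotes in the expression.
    query_string = urlencode({"query": promql})
    url = f"{base_url}/prometheus-ext-server/api/v1/query?{query_string}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical host and token, for illustration only.
request = build_query_request(
    "https://operator.example.invalid",
    "count without (job,api)(probe_success)",
    "dummy-token",
)
print(request.full_url)
```

To send the request, pass it to urllib.request.urlopen and parse the body with json.load.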

Examples

Get all monitored endpoints

Request

To query Prometheus for all the endpoints it is monitoring, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=probe_success

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

To save space, the following example includes the output for one module only (in this case, ADM).

CODE

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "probe_success",
                    "api": "Account and Device Manager",
                    "instance": "http://http-router/adm/v1/accounts?limit=0",
                    "job": "adm-api"
                },
                "value": [
                    1673878896.183,
                    "1"
                ]
            },
            ...
        ]
    }
}

Get a count of monitored endpoints

Request

To query Prometheus for a count of the monitored endpoints, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=count(probe_success)

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

CODE

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {},
                "value": [
                    1674043081.6,
                    "31"
                ]
            }
        ]
    }
}

This shows that 31 endpoints are being monitored. (The other value in the same block is the Unix epoch timestamp.)
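The value pair can be unpacked programmatically. The sketch below, which assumes the response shape shown above and uses a hypothetical helper named read_scalar_sample, converts the timestamp to a UTC datetime and the count to an integer.

```python
import json
from datetime import datetime, timezone

# Response body copied from the example above.
response_text = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [{"metric": {}, "value": [1674043081.6, "31"]}]}}
"""


def read_scalar_sample(payload: dict) -> tuple[datetime, int]:
    """Return (sample time in UTC, integer value) for a one-element vector."""
    timestamp, raw_value = payload["data"]["result"][0]["value"]
    # Prometheus encodes sample values as strings, so convert explicitly.
    return datetime.fromtimestamp(timestamp, tz=timezone.utc), int(raw_value)


sampled_at, endpoint_count = read_scalar_sample(json.loads(response_text))
print(sampled_at.isoformat(), endpoint_count)
```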

Get a list of monitored endpoints showing only the most relevant fields

Request

To query Prometheus for a list of monitored endpoints, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=count without (job,api)(probe_success)

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

To save space, the following example includes the output for one endpoint only (in this case, ADM accounts).

CODE

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "instance": "https://operator.sitq3ga.otv-staging.com/adm/v1/accounts?limit=0"
                },
                "value": [
                    1674045670.591,
                    "1"
                ]
            },
            ...
        ]
    }
}

If you are using curl and jq, you can pass jq a filter, together with the -r (raw output) option, to show just the list of endpoints.

For example:

CODE

curl -s --location -g --request GET 'https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=count%20without(job,api)(probe_success)' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Authorization: Bearer <keycloak_token>' | jq -r '.data.result[].metric.instance'

Get a list of inactive endpoints

Request

To query Prometheus for just the endpoints that are inactive, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=probe_success==0

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

To save space, the following example includes the output for one module only (in this case, MDS).

CODE

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "probe_success",
                    "api": "External",
                    "instance": "https://admin.sitq3ga.otv-staging.com/metadata/delivery/GLOBAL/vod/nodes?limit=0",
                    "job": "mds-api"
                },
                "value": [
                    1674048210.825,
                    "0"
                ]
            },
            ...
        ]
    }
}
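If you have already fetched the full probe_success result, the same filtering can be done on the client side. The following sketch assumes the response shape shown above. The helper name and the second, healthy sample are invented for the illustration.

```python
def failed_probe_instances(payload: dict) -> list[str]:
    """Return the instance URL of every sample whose probe value is "0"."""
    return [
        sample["metric"]["instance"]
        for sample in payload["data"]["result"]
        if sample["value"][1] == "0"
    ]


# Hand-made payload in the shape shown above; the second instance is made up.
payload = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"instance": "https://admin.sitq3ga.otv-staging.com/metadata/delivery/GLOBAL/vod/nodes?limit=0",
                    "job": "mds-api"},
         "value": [1674048210.825, "0"]},
        {"metric": {"instance": "http://example.invalid/healthy", "job": "demo"},
         "value": [1674048210.825, "1"]},
    ]},
}
print(failed_probe_instances(payload))
```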

Get usage counts for all metrics and statuses

Request

To query Prometheus for the total response count per HTTP status for each metric, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sni_http_response_count_total

The value that is returned for a particular metric and status is the cumulative number of responses since the service started. To get the number of responses over a particular time period, use a time offset to get the count at a specific point in the past and compare it with the current value.

Note that multiple blocks are returned for certain modules.

For example, for ADM, there are separate blocks for adm_devices, adm_update, adm_bundled_accounts, and adm_user_accounts.

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

To save space, the following example includes the output for one metric and one HTTP status only (in this case, status 201 for RMG).

CODE

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "sni_http_response_count_total",
                    "environment": "sitq3ga",
                    "host": "sni_router01",
                    "http_code": "201",
                    "instance": "sni_router01",
                    "job": "sni_router-log-exporter",
                    "method": "POST",
                    "request_uri": "rmg",
                    "status": "201"
                },
                "value": [
                    1673955157.403,
                    "27"
                ]
            },
            ...
        ]
    }
}
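Because the counter is cumulative, the number of responses in a period is the difference between two samples. The sketch below shows one way to do this, assuming you run the query twice: once as is, and once with the PromQL offset modifier. The one-hour offset, the earlier sample value of 20, and the helper name are assumptions made for the example.

```python
def responses_in_window(count_now: str, count_then: str) -> int:
    """Difference between two cumulative counter samples.

    If the current value is lower than the earlier one, the counter was
    reset (for example, the service restarted), so the current value is
    returned as the best available lower bound.
    """
    now, then = int(float(count_now)), int(float(count_then))
    return now - then if now >= then else now


# Illustrative PromQL: the second expression reads the counter as it was 1h ago.
CURRENT_QUERY = 'sni_http_response_count_total{request_uri="rmg",http_code="201"}'
PAST_QUERY = CURRENT_QUERY + " offset 1h"

# "27" is the value from the example above; "20" is a made-up earlier sample.
print(responses_in_window("27", "20"))
```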

Get count for a specific metric and status

Request

To query Prometheus for the total response count for a specific HTTP status for a specific metric, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sni_http_response_count_total{http_code="200",request_uri="adm_devices"}

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

This shows the response that is returned when you request the response count for HTTP status 200 for the adm_devices metric.

CODE

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "sni_http_response_count_total",
                    "environment": "sitq3ga",
                    "host": "sni_router01",
                    "http_code": "200",
                    "instance": "sni_router01",
                    "job": "sni_router-log-exporter",
                    "method": "DELETE",
                    "request_uri": "adm_devices",
                    "status": "200"
                },
                "value": [
                    1673955157.403,
                    "12"
                ]
            },
            ...
        ]
    }
}
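A query such as this can return one block per REST method. If you want a single figure for the metric, you can total the blocks yourself. This is a sketch under the assumption that each block carries a method label, as in the example above. The GET block and its value are invented.

```python
from collections import defaultdict


def totals_by_method(payload: dict) -> dict[str, int]:
    """Sum sample values per HTTP method across all returned blocks."""
    totals: dict[str, int] = defaultdict(int)
    for sample in payload["data"]["result"]:
        totals[sample["metric"]["method"]] += int(float(sample["value"][1]))
    return dict(totals)


# The DELETE block is from the example above; the GET block is invented.
payload = {"data": {"result": [
    {"metric": {"method": "DELETE", "request_uri": "adm_devices", "http_code": "200"},
     "value": [1673955157.403, "12"]},
    {"metric": {"method": "GET", "request_uri": "adm_devices", "http_code": "200"},
     "value": [1673955157.403, "30"]},
]}}
by_method = totals_by_method(payload)
print(by_method, sum(by_method.values()))
```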

Get the total response time for all metrics and statuses

Request

To query Prometheus for the total response time for all available metrics and statuses, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sni_http_response_time_seconds_sum

You can use the total response time together with the usage counts to calculate the average response time for each metric.

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

If there were no requests for the endpoints that are covered by a particular metric for the data collection period, the value returned will be NaN.

Example

This shows the response that is returned when you request the total response time.

To save space, the following example includes the output for one metric and one HTTP status only (in this case, status 200 for MDS events).

CODE

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "sni_http_response_time_seconds_sum",
                    "environment": "sitq3ga",
                    "host": "sni_router01",
                    "http_code": "200",
                    "instance": "sni_router01",
                    "job": "sni_router-log-exporter",
                    "method": "DELETE",
                    "request_uri": "mds_events",
                    "status": "200"
                },
                "value": [
                    1674482200.427,
                    "79.46000000000002"
                ]
            },
            ...
        ]
    }
}
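To turn the total response time into an average, divide it by the matching response count (for example, from sni_http_response_time_seconds_count). The sketch below assumes both results have already been fetched and that series are matched on the request_uri, method, and http_code labels. The function names and the count of 40 are assumptions for the illustration. Series with NaN values or a zero count are skipped.

```python
import math

LABELS = ("request_uri", "method", "http_code")


def index_samples(result: list[dict]) -> dict[tuple, float]:
    """Map (request_uri, method, http_code) to the sample's float value."""
    return {
        tuple(s["metric"].get(name) for name in LABELS): float(s["value"][1])
        for s in result
    }


def average_response_times(sum_result: list[dict], count_result: list[dict]) -> dict[tuple, float]:
    """Divide total response time by response count for each matching series."""
    count_index = index_samples(count_result)
    averages = {}
    for key, total in index_samples(sum_result).items():
        count = count_index.get(key, 0.0)
        # Skip series with no requests: the value is NaN or the count is zero.
        if math.isnan(total) or math.isnan(count) or count == 0:
            continue
        averages[key] = total / count
    return averages


# The sum is from the example above; the count of 40 and the rmg series are made up.
sums = [
    {"metric": {"request_uri": "mds_events", "method": "DELETE", "http_code": "200"},
     "value": [1674482200.427, "79.46000000000002"]},
    {"metric": {"request_uri": "rmg", "method": "GET", "http_code": "200"},
     "value": [1674482200.427, "NaN"]},
]
counts = [
    {"metric": {"request_uri": "mds_events", "method": "DELETE", "http_code": "200"},
     "value": [1674482200.427, "40"]},
]
averages = average_response_times(sums, counts)
print(averages)
```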