Metrics API

Overview

OpenTV Video Platform relies on Prometheus to collect public API usage and health metrics. These are the metrics used to build our API Service Status and API Service Usage Grafana dashboards, and you can also collect them directly from the standard Prometheus API to integrate with an external monitoring system: https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1
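The PromQL queries shown on this page contain spaces, braces, and quotes, so they must be URL-encoded before being appended to this base URL. The following Python sketch shows one way to build an instant-query URL with the standard library; the function name build_query_url and the sample environment values are illustrative assumptions, not part of the platform.

```python
from urllib.parse import urlencode


def build_query_url(environment_name: str, dns_domain: str, promql: str) -> str:
    """Hypothetical helper: build an instant-query URL for the external Prometheus API.

    urlencode() percent-encodes the PromQL expression (spaces become '+'),
    so braces, quotes and parentheses are safe to send.
    """
    base = f"https://operator.{environment_name}.{dns_domain}/prometheus-ext-server/api/v1"
    return f"{base}/query?{urlencode({'query': promql})}"
```

For example, build_query_url("myenv", "example.com", 'min by (api)(probe_success{api="IAS API"})') yields a URL with no raw spaces that can be pasted into any HTTP client.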

You can query the Prometheus API to get information such as:

Public API endpoints' health status (those used to build your Grafana “[UD01] Service API Status Dashboards”):
- Whether each API endpoint probe succeeded or failed
Public API endpoints' traffic usage metrics (those used to build your Grafana “[UD02] Service API Usage Dashboards”):
- Request throughput (number of calls per API endpoint)
- Error ratio per API endpoint
- Response time per API endpoint

Prometheus allows you to construct complex, sophisticated queries, and you can perform calculations on the returned data to compute additional metrics. Covering all of its functionality is beyond the scope of this page.

For full details, see the Prometheus HTTP API documentation.

As the SaaS provider, NAGRA is responsible for monitoring the OpenTV Video Platform. These metrics are exposed so that customers integrating the platform can use their own monitoring solution if they wish; doing so is entirely optional.
Also be aware that the exposed metrics are tightly linked to our underlying reverse proxy technology (i.e., Istio), and are expected to evolve and possibly be replaced in future releases. We are constantly evolving our inbound stack to keep it aligned with current best practices.

Metric categories

For the OpenTV Video Platform (and SSP), NAGRA exposes two categories of metrics through Prometheus:

Probe metrics

The probe_success metric indicates whether each API endpoint probe received a successful reply.

The returned value is 1 for healthy and 0 for unhealthy.

Probe metrics labels

The most useful probe_success metric labels for your queries are as follows:

Labels

Description

api

API endpoint name (as displayed in the [UD01] Service API Status Dashboards):

"AGS API"
"ADM API"
"CDVR API"
"CIM API"
"CRM-GATEWAY API"
"Cast, Crew, and Persona Service API"
"Content Discovery Gateway API"
"Content Workflow Manager API"
"Content and Product Manager API"
"IAM-API"
"IAS API"
"ION External Endpoint website"
"Keycloak API"
"Metadata API"
"Ncanto API probe Endpoint APIs"
"Opconsole API"
"Prometheus Federated API"
"Rights Management API"
"User Activity Vault API"

To retrieve the exhaustive list of "api" label values, query:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/label/api/values

instance

URI used to probe a given API endpoint

job

Internal probe job name

Inbound traffic metrics

Our Istio ingress gateways emit a number of inbound traffic metrics. These include counters of received HTTP requests and of HTTP request durations, broken down by endpoint, HTTP method, and HTTP status.

Request duration is the interval between the arrival of a request and the response, that is, how long it takes to serve each request. This is a key indicator of an application's performance.

An increase in response time can mean that there is an issue with an end-user application or with an upstream service, which may be caused by a recent change or upgrade.

The following metrics are available:

istio_requests_total: counter type metric incremented for every request handled by our Istio proxy
- Used to monitor the inbound traffic throughput (number of requests per second received on a given endpoint)
istio_request_duration_milliseconds_sum: counter type metric accumulating the duration of all requests handled by our Istio proxy
- Used, together with istio_request_duration_milliseconds_count, to monitor the average response time per API endpoint over a given time period
istio_request_duration_milliseconds_count: counter type metric incremented for every request handled by the Istio proxy (equivalent to istio_requests_total)
istio_request_duration_milliseconds_bucket: histogram type metric used to track the distribution of Istio request durations. It is typically used to compute request duration percentiles (using the histogram_quantile Prometheus function).
For example, the bucket istio_request_duration_milliseconds_bucket{le="100"} counts the number of requests that had a duration of less than or equal to 100 milliseconds.
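To make the bucket semantics concrete, here is a simplified Python sketch of the linear interpolation that Prometheus's histogram_quantile function performs over cumulative "le" buckets. It is an illustration under simplifying assumptions (buckets already aggregated with sum by (le) and sorted, q between 0 and 1), not Prometheus's exact implementation; the bucket boundaries in the usage note are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Hypothetical sketch: estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound and
    ending with (math.inf, total), e.g. the per-"le" values of
    istio_request_duration_milliseconds_bucket after sum by (le).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total  # number of observations at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile lies in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound
```

With buckets [(50, 30), (100, 80), (250, 95), (math.inf, 100)], the median falls in the 50 to 100 ms bucket and is interpolated to 70 ms. Because of this interpolation, a percentile computed from buckets is an estimate whose precision depends on the bucket boundaries.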

The above metrics can be used to perform calculations such as:

Average response time per API
Requests per second per API
Response time percentiles per API (for example, the time within which 95% of calls are served)
Etc.
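The average response time is the increase of the _sum counter divided by the increase of the _count counter over the same window; in PromQL, rate(..._sum) / rate(..._count) expresses the same ratio. A minimal Python sketch working on two readings of each counter (the function name and sample numbers are hypothetical):

```python
def average_duration_ms(sum_then, count_then, sum_now, count_now):
    """Hypothetical helper: average request duration (ms) between two readings of
    istio_request_duration_milliseconds_sum and istio_request_duration_milliseconds_count."""
    requests = count_now - count_then
    if requests <= 0:
        return None  # no requests in the window, or a counter reset
    return (sum_now - sum_then) / requests
```

For example, if the _sum counter grew from 1000 to 4000 ms while _count grew from 10 to 20, the ten requests in between averaged 300 ms.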

See the Istio Standard Metrics documentation for more information.

Some useful Istio metrics labels

Each Istio metric is returned as one time series per combination of label values, so it is often useful to aggregate a metric over selected labels. For example, you may want to compute the number of requests handled over a given time frame per endpoint (request_url) and per HTTP status (response_code).

Labels	Description
`app`	Ingress gateway application that issued the metrics. To monitor inbound traffic, always filter on `app="ingress-gateway-otvpcse"`.
`request_host`	Host header of the HTTP request. Example: `"request_host": "api.<environment_name>.<dns_domain>"`
`request_method`	HTTP request method (GET, PUT, POST, OPTIONS, etc.). Example: `"request_method": "GET"`
`request_url`	HTTP request URL endpoint (see Monitored endpoints below). Example: `"request_url": "/adm/v1/user"`
`response_code`	HTTP response status code (2xx, 3xx, 4xx or 5xx). Example: `"response_code": "200"`
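Prometheus normally performs this aggregation server-side with sum by (...). If you have already fetched unaggregated series (for example, one per request_method), a client-side equivalent might look like the following Python sketch; the function name sum_by is hypothetical, and the input is assumed to be the data.result list of an instant query.

```python
from collections import defaultdict


def sum_by(series, labels):
    """Hypothetical client-side equivalent of PromQL `sum by (labels)`.

    series: list of {"metric": {...}, "value": [timestamp, "value"]} objects.
    Returns a dict keyed by the tuple of the requested label values.
    """
    totals = defaultdict(float)
    for s in series:
        key = tuple(s["metric"].get(name, "") for name in labels)
        totals[key] += float(s["value"][1])  # sample values arrive as strings
    return dict(totals)
```

Grouping on ["request_url", "response_code"] collapses the GET and POST series of the same endpoint and status into one total.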

Monitored endpoints

The following table lists a selection of the monitored API endpoints:

Services	Endpoints (regex)
Account and Device Manager (ADM)	`/adm/.*` `/adm/v[0-9]+/admin.*` `/adm/v[0-9]+/accountProfiles.*` `/adm/v[0-9]+/accounts.*` `/adm/v[0-9]+/bundled/accounts.*` `/adm/v[0-9]+/deviceProfiles.*` `/adm/v[0-9]+/devices.*` `/adm/v[0-9]+/pinTypes.*` `/adm/v[0-9]+/user.*`
Authentication Gateway Service (AGS)	`/ags.*` `/ags/servicediscovery.*` `/ags/signOn.*`
Content Builder	`/contentbuilder.*` `/contentbuilder/v[0-9]+/curators.*` `/contentbuilder/v[0-9]+/templates.*` `/contentbuilder/v[0-9]+/templateviews.*`
Content and Product Manager (CPM)	`/cpm/admin` `/cpm/commercial` `/cpm/content` `/cpm/operator` `/cpm/purge`
Content Discovery Gateway (CDG)	`/contentdiscovery.*` `/contentdiscovery/v[0-9]+/contexts.*` `/contentdiscovery/v[0-9]+/recommendations.*` `/contentdiscovery/v[0-9]+/templates.*`
Content Importer (CIM)	`/importcim`
CRM Gateway	`/crm-gateway`
Content Workflow Manager (CWM)	`/workflow`
Identity Authentication Service (IAS)	`/ias/.*` `/ias/v[0-9]+/content_token.*` `/ias/v[0-9]+/token.*` `/ias/v[0-9]+/refresh.*` `/ias/v[0-9]+/signout.*` `/ias/v[0-9]+/localinfo.*`
Image Metadata Service (IMDS)	`/imagemetadata`
Metadata Aggregation Service (MAS)	`/mas`
Metadata Server (MDS)	`/metadata.*` `/metadata/delivery/changes.*` `/metadata/delivery/.*/vod/editorials.*` `/metadata/delivery/.*/vod/series.*` `/metadata/delivery/.*/vod/nodes.*` `/metadata/delivery/.*/vod/products.*` `/metadata/delivery/.*/btv/products.*` `/metadata/delivery/.*/btv/programmes.*` `/metadata/delivery/.*/btv/services.*` `/metadata/solr/GLOBAL/vod/.*/search.*`
Open Device Messaging (ODM)	`/odm`
Rights Manager (RMG)	`/rmg/v1/operator` `/rmg/v1/user`
User Activity Vault (UAV)	`/useractivityvault`
User Recordings	`/cdvr.*` `/cdvr/v[0-9]+/aggregatedrecordings.*` `/cdvr/v[0-9]+/recordings.*` `/cdvr/v[0-9]+/seriesrecordings.*`
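Prometheus regular-expression matchers (=~) are fully anchored, so a pattern must match the whole label value rather than a substring. The Python sketch below mimics that behavior with re.fullmatch to check which service a request_url belongs to. The MONITORED mapping is a hypothetical subset written in the style of the table above, and Python's re syntax only coincides with Prometheus's RE2 syntax for simple patterns like these.

```python
import re

# Hypothetical subset of endpoint regexes, keyed by service.
MONITORED = {
    "ADM": [r"/adm/.*"],
    "IAS": [r"/ias/v[0-9]+/token.*", r"/ias/v[0-9]+/refresh.*"],
    "MDS": [r"/metadata/delivery/.*/btv/services.*"],
}


def matching_services(request_url):
    """Return the services whose regexes match request_url.

    re.fullmatch() requires the whole string to match, like Prometheus's
    anchored =~ matcher; re.search() would wrongly accept substrings.
    """
    return [
        service
        for service, patterns in MONITORED.items()
        if any(re.fullmatch(p, request_url) for p in patterns)
    ]
```

Because of the anchoring, "/api/adm/v1/user" matches none of these patterns even though it contains "/adm/".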

Authentication

Access to the Prometheus APIs is controlled by Keycloak. See Accessing operator APIs using Keycloak for more information.

Output formatting

If you are using Postman to make these requests, it automatically pretty-prints the JSON output.

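If you are using curl, you can pipe its output to the jq JSON formatting tool to make the output more readable. Queries that contain spaces, braces, or quotes must be URL-encoded; curl's -G and --data-urlencode options can do this for you.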

For example:

BASH

curl -s --location -g --request GET 'https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=probe_success' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Authorization: Bearer <keycloak_token>' | jq
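If you prefer to script these requests rather than use curl or Postman, the following Python sketch uses only the standard library. The function names and the base URL passed in are assumptions for illustration; the Keycloak token must be obtained as described under Authentication.

```python
import json
import urllib.request
from urllib.parse import urlencode


def make_query_request(base_url: str, token: str, promql: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a GET request for /query.

    base_url is assumed to be https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1
    """
    url = f"{base_url}/query?{urlencode({'query': promql})}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def run_query(base_url: str, token: str, promql: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body; HTTP errors raise urllib.error.HTTPError."""
    with urllib.request.urlopen(make_query_request(base_url, token, promql), timeout=timeout) as resp:
        return json.load(resp)
```

Printing json.dumps(run_query(base_url, token, "probe_success"), indent=2) gives output similar to piping curl through jq.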

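All Prometheus responses are returned in JSON format. For queries, the data.result array contains one object per time series, made up of a metric object (the series labels) and a value.

A metric's value is returned as an array of two elements: [Unix epoch timestamp, "value"], where the sample value is a string.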
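Because the sample value is a JSON string, client code has to convert it before doing arithmetic. A minimal Python sketch (the function name is hypothetical) that turns an instant-query response into (labels, timestamp, value) tuples:

```python
def parse_instant_vector(body: dict) -> list:
    """Hypothetical helper: convert an instant-query response into
    (labels, timestamp, value) tuples with numeric timestamp and value."""
    if body.get("status") != "success":
        # Prometheus error responses carry "errorType" and "error" fields.
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    return [
        (sample["metric"], float(sample["value"][0]), float(sample["value"][1]))
        for sample in body["data"]["result"]
    ]
```

float() also accepts the "NaN" and "+Inf" strings that Prometheus can return for some queries.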

Examples

Note:

Many of the following examples use label filters. They use only a few of the many labels that can be filtered on; you can filter on whichever labels you need to get the output that you require.

APIs health monitoring (based on probe metrics)

Get the list of all APIs whose health is monitored

Request

To list all the APIs whose health is monitored by probes, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/label/api/values

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

JSON

{
    "status": "success",
    "data": [
        "AGS API",
        "Account and Device Manager API",
        "CDVR API",
        "CIM API",
        "CRM-GATEWAY API",
        "Cast, Crew, and Persona Service API",
        "Content Discovery Gateway API",
        "Content Workflow Manager API",
        "Content and Product Manager API",
        "External Endpoint APIs",
        "IAM-api",
        "IAS API",
        "ION External Endpoint website",
        "Keycloak API",
        "Metadata API",
        "Ncanto API probe Endpoint APIs",
        "Opconsole API",
        "Prometheus Federated API",
        "Rights Management API",
        "User Activity Vault API"
    ]
}

Get the probe health status for a given API

Request

To get the latest probe result for a given API (e.g., "IAS API"), send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=min by (api)(probe_success{api="IAS API"})

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

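Several public or private endpoints may be probed to determine whether a given API is healthy. The min by (api) aggregation therefore reports 0 for an API as soon as any one of its probes fails.

The returned value is a [Unix epoch timestamp, "value"] pair, where the value is: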

1 for success
0 for failure

Example

JSON

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "api": "IAS API"
                },
                "value": [
                    1715185977.698,
                    "1"
                ]
            }
        ]
    }
}

You can use the min function without a by clause (e.g., GET https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=min(probe_success{api="IAS API"})) to return the API health status as a single value, without any labels.
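The same min-based reduction can be applied client-side if you fetch raw probe_success series (one per probed instance). This Python sketch, with a hypothetical function name and input shape, treats an API as healthy only when every one of its probes returns 1:

```python
def api_health(probe_samples):
    """Hypothetical helper mirroring min by (api)(probe_success).

    probe_samples: iterable of (labels, value) pairs, where labels is the
    series' label dict and value is the probe_success sample as a float.
    """
    lowest = {}
    for labels, value in probe_samples:
        api = labels.get("api", "")
        lowest[api] = min(lowest.get(api, 1.0), value)
    # Healthy only if the lowest probe result for the API is 1.
    return {api: v == 1.0 for api, v in lowest.items()}
```

A single failing instance is enough to mark its API as unhealthy, exactly as the server-side min by (api) query does.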

Get a list of all unhealthy APIs

Request

To query Prometheus for a list of all the probed APIs that are currently unhealthy, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=min by (api)(probe_success==0)

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

JSON

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "api": "ION External Endpoint website"
                },
                "value": [
                    1715185011.919,
                    "0"
                ]
            }
        ]
    }
}

If you are using curl and jq, you can apply the jq filter .data.result[].metric.api to show just the list of API names; the -r option prints them as raw strings without JSON quotes.

For example:

BASH

curl -s -G --location 'https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query' \
--data-urlencode 'query=min by (api)(probe_success==0)' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Authorization: Bearer <keycloak_token>' | jq -r '.data.result[].metric.api'

APIs usage monitoring (based on Istio metrics)

Get total number of requests received for a given API endpoint

Request

To query Prometheus for the total requests for a given API endpoint (i.e., request_url), send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sum by (request_url)(istio_requests_total{app="ingress-gateway-otvpcse",request_url=~"/metadata/delivery/GLOBAL/btv/services"})

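The returned value is the cumulative number of requests counted since the Istio proxies started; counters restart from zero when a proxy restarts. To get the number of requests over a particular time period, compare the current value with the value at a specific point in the past (using the offset modifier), or use the increase() function, which also accounts for counter resets: sum by (request_url)(increase(istio_requests_total{app="ingress-gateway-otvpcse",request_url=~"/metadata/delivery/GLOBAL/btv/services"}[1h]))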

See Monitored endpoints, above, for examples of API endpoint regexes you can use in your query.
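Comparing two counter readings, as described above, can be sketched in Python for a single series. This is a simplified illustration with hypothetical function names: it assumes at most one reset between the two readings, whereas Prometheus's increase() and rate() detect resets across every sample in the range. Apply it per series before summing, because a reset in one series can be hidden by a sum over many.

```python
def counter_increase(previous: float, current: float) -> float:
    """Hypothetical helper: requests counted between two readings of one counter
    series such as istio_requests_total. A lower current value means the counter
    was reset (e.g. the proxy restarted), so only the post-reset count is known."""
    if current < previous:
        return current
    return current - previous


def counter_rate(previous: float, current: float, t_previous: float, t_current: float) -> float:
    """Average per-second rate between two readings (Unix timestamps in seconds)."""
    elapsed = t_current - t_previous
    if elapsed <= 0:
        raise ValueError("t_current must be later than t_previous")
    return counter_increase(previous, current) / elapsed
```

Dividing the increase by the elapsed seconds gives the same kind of throughput figure that rate() returns in the requests/second example further down.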

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

In this example, 139406 requests have been served since the Istio proxies started:

JSON

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "request_url": "/metadata/delivery/GLOBAL/btv/services"
                },
                "value": [
                    1715187063.594,
                    "139406"
                ]
            }
        ]
    }
}

Get total number of requests received for a given API endpoint per HTTP response status code

Request

To query Prometheus for the total requests received per HTTP response code status for a given API endpoint, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sum by (request_url,response_code)(istio_requests_total{app="ingress-gateway-otvpcse",request_url="/metadata/delivery/GLOBAL/btv/services"})

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

JSON

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "request_url": "/metadata/delivery/GLOBAL/btv/services",
                    "response_code": "200"
                },
                "value": [
                    1715187189.338,
                    "92816"
                ]
            },
            {
                "metric": {
                    "request_url": "/metadata/delivery/GLOBAL/btv/services",
                    "response_code": "403"
                },
                "value": [
                    1715187189.338,
                    "32302"
                ]
            },
            {
                "metric": {
                    "request_url": "/metadata/delivery/GLOBAL/btv/services",
                    "response_code": "304"
                },
                "value": [
                    1715187189.338,
                    "2172"
                ]
            },
            {
                "metric": {
                    "request_url": "/metadata/delivery/GLOBAL/btv/services",
                    "response_code": "204"
                },
                "value": [
                    1715187189.338,
                    "11740"
                ]
            },
            {
                "metric": {
                    "request_url": "/metadata/delivery/GLOBAL/btv/services",
                    "response_code": "503"
                },
                "value": [
                    1715187189.338,
                    "335"
                ]
            },
            {
                "metric": {
                    "request_url": "/metadata/delivery/GLOBAL/btv/services",
                    "response_code": "0"
                },
                "value": [
                    1715187189.338,
                    "44"
                ]
            }
        ]
    }
}
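The error ratio mentioned in the overview can be derived from per-status counts like the ones above. A minimal Python sketch, assuming the counts have already been extracted into a dict keyed by response_code (the function name and the choice of "5" as the default error class are hypothetical):

```python
def error_ratio(counts_by_code: dict, error_prefix: str = "5") -> float:
    """Hypothetical helper: share of responses whose status code starts with error_prefix.

    counts_by_code maps response_code strings (e.g. "200", "503") to request counts.
    """
    total = sum(counts_by_code.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in counts_by_code.items() if code.startswith(error_prefix))
    return errors / total
```

Applied to the example response, 335 of the 139409 counted requests returned 503, a 5xx ratio of roughly 0.24%. Codes such as "0" are not counted as errors by this sketch; decide how your own monitoring should treat them. Because these counters are cumulative, the ratio covers the whole period since the proxies started unless you apply it to increases over a window.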

Get request throughput (requests/second) for a given API endpoint

Request

To query Prometheus for the average request throughput (i.e., the number of requests per second) for a given API endpoint (e.g., "/metadata/delivery/GLOBAL/btv/services") over the last 10 minutes, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sum by (request_url)(rate(istio_requests_total{app="ingress-gateway-otvpcse",request_url=~"/metadata/delivery/GLOBAL/btv/services"}[10m]))

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

Throughput averaged 0.013 requests/second over the past 10 minutes:

JSON

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "request_url": "/metadata/delivery/GLOBAL/btv/services"
                },
                "value": [
                    1715187654.312,
                    "0.013705368055555556"
                ]
            }
        ]
    }
}

Get average response time for a given API endpoint

Request

To query Prometheus for the average response time in milliseconds for a given API endpoint (e.g., "/metadata/v1/epg") over the last 10 minutes, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sum(rate(istio_request_duration_milliseconds_sum{app="ingress-gateway-otvpcse",request_url=~"/metadata/v1/epg"}[10m])) by (request_url) / sum(rate(istio_request_duration_milliseconds_count{app="ingress-gateway-otvpcse",request_url=~"/metadata/v1/epg"}[10m])) by (request_url)

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

Average response time was 1072 milliseconds over the past 10 minutes:

JSON

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "request_url": "/metadata/v1/epg"
                },
                "value": [
                    1715188797.602,
                    "1072.780273852061"
                ]
            }
        ]
    }
}

Get the 95th percentile response time for a given API endpoint

Request

To query Prometheus for the 95th percentile response time for a given API endpoint (e.g., "/metadata/delivery.*") over the last 10 minutes, send a GET request to:

CODE

https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=histogram_quantile(0.95,sum(rate(istio_request_duration_milliseconds_bucket{app="ingress-gateway-otvpcse",request_url=~"/metadata/delivery.*"}[10m])) by (le))

Headers

Authorization – Bearer <keycloak_token>
Content-Type – application/x-www-form-urlencoded

Response

See the Prometheus docs for the status codes that it returns.

Example

Over the past 10 minutes, an estimated 95% of the requests to the /metadata/delivery.* API endpoints were served in 166 milliseconds or less:

JSON

{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {},
                "value": [
                    1715189231.843,
                    "166.48048820847555"
                ]
            }
        ]
    }
}