Metrics API
Overview
OpenTV Video Platform relies on Prometheus to collect public API usage and health metrics. These are the same metrics used to build our API Service Status and API Service Usage Grafana dashboards. You can also retrieve them directly through the standard Prometheus HTTP API to integrate with an external monitoring system: https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1
You can query the Prometheus API to get information such as:
Public API endpoints' health status (those used to build your Grafana “[UD01] Service API Status Dashboards”):
Whether each probe of an API endpoint succeeded or failed
Public API endpoints' traffic usage metrics (those used to build your Grafana “[UD02] Service API Usage Dashboards”):
Request throughput (number of calls per API endpoint)
Error ratio per API endpoint
Response time per API endpoint
Prometheus allows you to construct complex, sophisticated queries. It is beyond the scope of this page to cover all of its functionality.
For full details, see the Prometheus HTTP API documentation.
You can perform calculations on the data that is returned to compute additional metrics.
As the SaaS provider, NAGRA is responsible for monitoring the OpenTV Video Platform. These metrics are exposed so that customers integrating the OpenTV platform can feed them into their own monitoring solution if they wish. Doing so is optional and not required.
Also be aware that the exposed metrics are tightly linked to our underlying reverse proxy technology (Istio) and are expected to evolve, and possibly be replaced, in future releases. We continuously evolve our inbound stack to keep it aligned with current best practices.
Metric categories
For OpenTV Video Platform (and SSP), there are two categories of metrics that NAGRA exposes through Prometheus:
Probe metrics
The probe_success metric indicates whether the reply to each API endpoint probe was successful.
The returned value is 0 for unhealthy and 1 for healthy.
Probe metrics labels
The most useful probe_success metric labels for your queries are as follows:
Labels | Description |
---|---|
api | API endpoint name (as displayed in the [UD01] Service API Status Dashboards). To retrieve the exhaustive list of “api” values, see Get list of all APIs health being monitored, below. |
| URI used to probe a given API |
| Internal probe job name |
Inbound traffic metrics
Our Istio ingress gateways emit a number of inbound traffic metrics. These include counters of received HTTP requests and HTTP request durations, broken down by endpoint, HTTP method, and HTTP status.
Request duration is the interval between the arrival of a request and the response, that is, how long it takes to serve each request. This is a key indicator of an application's performance.
An increase in response time can mean that there is an issue with an end-user application or with an upstream service, which may be caused by a recent change or upgrade.
The following metrics are available:
istio_requests_total: counter-type metric incremented for every request handled by our Istio proxy. Used to monitor inbound traffic throughput (number of requests per second received on a given endpoint).
istio_request_duration_milliseconds_sum: counter-type metric accumulating the durations of all requests handled by our Istio proxy. Divided by istio_request_duration_milliseconds_count, it gives the average per-request response time of an API endpoint over a given time period.
istio_request_duration_milliseconds_count: counter-type metric incremented for every request handled by the Istio proxy (equivalent to istio_requests_total).
istio_request_duration_milliseconds_bucket: histogram-type metric used to track the distribution of Istio request durations. It is typically used to compute request duration percentiles with the histogram_quantile Prometheus function. For example, the bucket istio_request_duration_milliseconds_bucket{le="100"} counts the number of requests whose duration was less than or equal to 100 milliseconds.
The above metrics can be used to perform calculations such as:
Average response time per API
Requests per second per API
Response time percentiles per API (for example, the time within which 95% of calls are served)
Etc.
See the Istio Standard Metrics documentation for more information.
Some useful Istio metrics labels
Istio metrics are returned as one time series per combination of label values, and it is often useful to aggregate a metric over some of these labels. For example, you may want to compute the number of requests handled over a given time frame for a given endpoint (request_url label) and a given HTTP response status (response_code label).
Labels | Description |
---|---|
app | Ingress gateway application that issued the metrics. Example: ingress-gateway-otvpcse |
| Host header of the HTTP request. Example: |
| HTTP request method (GET, PUT, POST, OPTIONS, etc.). Example: |
request_url | HTTP request URL endpoint (see Monitored endpoints below). Example: /metadata/delivery/GLOBAL/btv/services |
response_code | HTTP response status code (2xx, 3xx, 4xx, or 5xx). Example: 200 |
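When you build such aggregation queries programmatically, label values must be written as PromQL string literals. The following Python sketch uses hypothetical helpers (not a NAGRA or Prometheus client API) to assemble a selector from (label, operator, value) matchers, escaping backslashes and double quotes, and to wrap it in a sum by (...) aggregation that keeps only the labels you want to group on.

```python
from typing import Iterable, Tuple

# Hypothetical helpers for illustration only; not part of any NAGRA SDK.
# PromQL label matching operators: equal, not equal, regex match, regex non-match.
MATCH_OPERATORS = {"=", "!=", "=~", "!~"}


def promql_string(value: str) -> str:
    """Quote a label value as a double-quoted PromQL string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def selector(metric: str, matchers: Iterable[Tuple[str, str, str]]) -> str:
    """Build metric{label<op>"value",...} from (label, operator, value) tuples."""
    parts = []
    for label, op, value in matchers:
        if op not in MATCH_OPERATORS:
            raise ValueError(f"unsupported matcher operator: {op}")
        parts.append(f"{label}{op}{promql_string(value)}")
    return f"{metric}{{{','.join(parts)}}}"


def sum_by(labels: Iterable[str], expr: str) -> str:
    """Aggregate expr with sum, keeping only the given labels."""
    return f"sum by ({','.join(labels)})({expr})"


if __name__ == "__main__":
    query = sum_by(
        ["request_url", "response_code"],
        selector(
            "istio_requests_total",
            [
                ("app", "=", "ingress-gateway-otvpcse"),
                ("request_url", "=", "/metadata/delivery/GLOBAL/btv/services"),
            ],
        ),
    )
    print(query)
```

Regex matchers (=~) also take a string literal, so a regex escape such as \. must reach Prometheus as \\. inside the quotes; promql_string handles that doubling for you.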
Monitored endpoints
Here is a selection of the monitored API endpoints:
Services | Endpoints (regex) |
---|---|
Account and Device Manager (ADM) | |
Authentication Gateway Service (AGS) | |
Content Builder | |
Content and Product Manager (CPM) | |
Content Discovery Gateway (CDG) | |
Content Importer (CIM) | |
CRM Gateway | |
Content Workflow Manager (CWM) | |
Identity Authentication Service (IAS) | |
Image Metadata Service (IMDS) | |
Metadata Aggregation Service (MAS) | |
Metadata Server (MDS) | |
Open Device Messaging (ODM) | |
Rights Manager (RMG) | |
User Activity Vault (UAV) | |
User Recordings | |
Authentication
Access to the Prometheus APIs is controlled by Keycloak. See Accessing operator APIs using Keycloak for more information.
Output formatting
If you are using Postman to make these requests, it automatically pretty-prints the JSON output.
If you are using curl, you can pipe its output to the jq JSON formatting tool to make the output more readable.
For example:
curl -s --location -g --request GET 'https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=probe_success' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Authorization: Bearer <keycloak_token>' | jq .
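All Prometheus responses are returned in JSON format. For an instant query, the data.result array contains one object per time series, each with a metric object (the series labels) and a value.
A metric's value is returned as an array of two values: [Unix epoch timestamp, "value"]. Note that the sample value is encoded as a string.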
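As a minimal sketch of consuming these responses programmatically, the following Python function (a hypothetical helper, not part of any NAGRA SDK) flattens an instant-query response into (labels, timestamp, value) tuples and converts the string-encoded sample value to a float. It assumes you have already fetched the response body, for example with curl as shown above; the sample is the "IAS API" response from the examples on this page.

```python
import json
from typing import Dict, List, Tuple

# (series labels, Unix epoch timestamp, sample value)
Sample = Tuple[Dict[str, str], float, float]


def parse_instant_vector(body: str) -> List[Sample]:
    """Hypothetical helper: flatten a Prometheus instant-query JSON response.

    Returns one (labels, unix_timestamp, value) tuple per series in
    data.result. Raises ValueError if the query did not succeed or did
    not return an instant vector.
    """
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise ValueError(f"query failed: {doc.get('error', 'unknown error')}")
    data = doc["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    samples = []
    for series in data["result"]:
        timestamp, value = series["value"]  # [epoch seconds, "value" as a string]
        samples.append((series["metric"], float(timestamp), float(value)))
    return samples


if __name__ == "__main__":
    response = '''{"status": "success", "data": {"resultType": "vector",
      "result": [{"metric": {"api": "IAS API"},
                  "value": [1715185977.698, "1"]}]}}'''
    for labels, ts, value in parse_instant_vector(response):
        print(labels["api"], ts, value)
```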
Examples
Note:
Many of the following examples use filters. They use only a few of the many labels that can be filtered on; you can filter on whichever labels you need to get the output you require.
APIs health monitoring (based on probe metrics)
Get list of all APIs health being monitored
Request
To list all the APIs whose health is monitored by probing, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/label/api/values
Headers
Authorization: Bearer <keycloak_token>
Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
{
"status": "success",
"data": [
"AGS API",
"Account and Device Manager API",
"CDVR API",
"CIM API",
"CRM-GATEWAY API",
"Cast, Crew, and Persona Service API",
"Content Discovery Gateway API",
"Content Workflow Manager API",
"Content and Product Manager API",
"External Endpoint APIs",
"IAM-api",
"IAS API",
"ION External Endpoint website",
"Keycloak API",
"Metadata API",
"Ncanto API probe Endpoint APIs",
"Opconsole API",
"Prometheus Federated API",
"Rights Management API",
"User Activity Vault API"
]
}
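If you retrieve this list from a script, the following Python sketch shows one way to build the label-values URL and read the returned names. The helper names are hypothetical and BASE_URL simply reuses the placeholder URL from this page; substitute your own environment values.

```python
import json
from typing import List
from urllib.parse import quote

# Placeholder base URL from this page (assumption: replace with your environment).
BASE_URL = "https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1"


def label_values_url(base_url: str, label: str) -> str:
    """Hypothetical helper: URL of the endpoint listing all values of a label."""
    return f"{base_url.rstrip('/')}/label/{quote(label, safe='')}/values"


def parse_label_values(body: str) -> List[str]:
    """Hypothetical helper: return the values from a /label/<name>/values response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise ValueError(f"request failed: {doc.get('error', 'unknown error')}")
    return list(doc["data"])


if __name__ == "__main__":
    print(label_values_url(BASE_URL, "api"))
    sample = '{"status": "success", "data": ["AGS API", "IAS API", "Keycloak API"]}'
    print(parse_label_values(sample))
```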
Get the list of probe health checks' status for a given API
Request
To query Prometheus to get the latest result of probe requests for a given API (e.g., “IAS API”), send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=min by (api)(probe_success{api="IAS API"})
Headers
Authorization: Bearer <keycloak_token>
Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Several public or private endpoints may be probed to determine whether a given API is healthy.
The returned value is [Unix epoch timestamp, "value"], where value is:
1 for success
0 for failure
Example
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"api": "IAS API"
},
"value": [
1715185977.698,
"1"
]
}
]
}
}
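The min by (api) aggregation returns a single health status per API: 0 if any of the API's probes failed, and 1 only if all of them succeeded. You can also use min without the by clause (e.g., GET https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=min(probe_success{api="IAS API"})), which returns the same single value without the api label.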
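The query URLs above contain spaces, parentheses, braces, and quotes. Postman encodes them for you, but other HTTP clients may not. The following Python sketch (hypothetical helpers; BASE_URL reuses this page's placeholder) percent-encodes the PromQL expression with the standard library and maps each api label of a min by (api)(probe_success) response to a healthy/unhealthy flag.

```python
import json
from typing import Dict
from urllib.parse import urlencode

# Placeholder base URL from this page (assumption: replace with your environment).
BASE_URL = "https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1"


def instant_query_url(base_url: str, promql: str) -> str:
    """Hypothetical helper: build a /query URL with the PromQL expression encoded."""
    # urlencode percent-encodes reserved characters and turns spaces into '+'
    return f"{base_url.rstrip('/')}/query?{urlencode({'query': promql})}"


def health_by_api(body: str) -> Dict[str, bool]:
    """Hypothetical helper: map each api label to True if its min probe value is 1."""
    doc = json.loads(body)
    return {
        series["metric"].get("api", ""): float(series["value"][1]) == 1.0
        for series in doc["data"]["result"]
    }


if __name__ == "__main__":
    print(instant_query_url(BASE_URL, 'min by (api)(probe_success{api="IAS API"})'))
```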
Get a list of all unhealthy APIs
Request
To query Prometheus for a list of all the probed APIs that are currently unhealthy, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=min by (api)(probe_success==0)
Headers
Authorization: Bearer <keycloak_token>
Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"api": "ION External Endpoint website"
},
"value": [
1715185011.919,
"0"
]
}
]
}
}
If you are using curl and jq, you can use a jq filter to show just the list of API names; the -r option prints them as raw strings, without JSON quotes.
For example (the -G and --data-urlencode options make curl URL-encode the PromQL query, which contains spaces and parentheses):
curl -s -G 'https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query' \
--data-urlencode 'query=min by (api)(probe_success==0)' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Authorization: Bearer <keycloak_token>' | jq -r '.data.result[].metric.api'
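If your monitoring integration is written in Python rather than shell, the jq filter above can be sketched as follows. The function name is hypothetical, and it assumes the response body of the min by (api)(probe_success==0) query has already been fetched; the names are sorted so the output is stable between polls.

```python
import json
from typing import List


def unhealthy_apis(body: str) -> List[str]:
    """Hypothetical helper mirroring jq -r '.data.result[].metric.api'.

    Expects the response of min by (api)(probe_success==0); an empty
    result means every probed API is currently healthy.
    """
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise ValueError(f"query failed: {doc.get('error', 'unknown error')}")
    return sorted(series["metric"]["api"] for series in doc["data"]["result"])


if __name__ == "__main__":
    sample = '''{"status": "success", "data": {"resultType": "vector",
      "result": [{"metric": {"api": "ION External Endpoint website"},
                  "value": [1715185011.919, "0"]}]}}'''
    for name in unhealthy_apis(sample):
        print(name)
```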
APIs usage monitoring (based on Istio metrics)
Get total number of requests received for a given API endpoint
Request
To query Prometheus for the total number of requests for a given API endpoint (request_url label), send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sum by (request_url)(istio_requests_total{app="ingress-gateway-otvpcse",request_url=~"/metadata/delivery/GLOBAL/btv/services"})
The returned value is the cumulative number of requests since the Istio services were started. To get the number of requests over a particular time period, use a time offset to get the count at a specific point in the past and subtract it from the current value.
See Monitored endpoints, above, for examples of API endpoint regexes you can use in your query.
Headers
Authorization: Bearer <keycloak_token>
Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
139406 requests have been served since the Istio services were started:
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"request_url": "/metadata/delivery/GLOBAL/btv/services"
},
"value": [
1715187063.594,
"139406"
]
}
]
}
}
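The offset approach described above can be written as a single PromQL expression that subtracts the older total from the current one. The following Python sketch (hypothetical helpers that only build query strings) shows that form next to an alternative using PromQL's increase() function. The plain subtraction gives a wrong or negative result if the counters reset during the window (for example after a gateway restart), whereas increase() compensates for resets; note that increase() extrapolates to the window boundaries, so it can return non-integer values.

```python
# Hypothetical query builders for illustration; they only produce PromQL strings.


def requests_in_window_offset(selector: str, window: str) -> str:
    """Current cumulative count minus the count `window` ago (offset approach)."""
    now = f"sum by (request_url)({selector})"
    before = f"sum by (request_url)({selector} offset {window})"
    return f"{now} - {before}"


def requests_in_window_increase(selector: str, window: str) -> str:
    """Alternative using increase(), which compensates for counter resets."""
    return f"sum by (request_url)(increase({selector}[{window}]))"


if __name__ == "__main__":
    sel = ('istio_requests_total{app="ingress-gateway-otvpcse",'
           'request_url="/metadata/delivery/GLOBAL/btv/services"}')
    print(requests_in_window_offset(sel, "1h"))
    print(requests_in_window_increase(sel, "1h"))
```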
Get total number of requests received for a given API endpoint per HTTP response status code
Request
To query Prometheus for the total requests received per HTTP response code status for a given API endpoint, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sum by (request_url,response_code)(istio_requests_total{app="ingress-gateway-otvpcse",request_url="/metadata/delivery/GLOBAL/btv/services"})
Headers
Authorization: Bearer <keycloak_token>
Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"request_url": "/metadata/delivery/GLOBAL/btv/services",
"response_code": "200"
},
"value": [
1715187189.338,
"92816"
]
},
{
"metric": {
"request_url": "/metadata/delivery/GLOBAL/btv/services",
"response_code": "403"
},
"value": [
1715187189.338,
"32302"
]
},
{
"metric": {
"request_url": "/metadata/delivery/GLOBAL/btv/services",
"response_code": "304"
},
"value": [
1715187189.338,
"2172"
]
},
{
"metric": {
"request_url": "/metadata/delivery/GLOBAL/btv/services",
"response_code": "204"
},
"value": [
1715187189.338,
"11740"
]
},
{
"metric": {
"request_url": "/metadata/delivery/GLOBAL/btv/services",
"response_code": "503"
},
"value": [
1715187189.338,
"335"
]
},
{
"metric": {
"request_url": "/metadata/delivery/GLOBAL/btv/services",
"response_code": "0"
},
"value": [
1715187189.338,
"44"
]
}
]
}
}
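Per-status-code totals like these can be turned into the error ratio mentioned in the overview. The following Python sketch (hypothetical helpers) groups the series into 1xx to 5xx classes and divides the error classes by the total. It assumes that only 5xx responses count as errors; adjust error_classes to your own definition. With the response above, 335 of 139409 responses are 5xx, a ratio of about 0.24%.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def counts_by_status_class(body: str) -> Dict[str, float]:
    """Hypothetical helper: group per-response_code totals into 1xx..5xx classes."""
    doc = json.loads(body)
    classes: Dict[str, float] = defaultdict(float)
    for series in doc["data"]["result"]:
        code = series["metric"].get("response_code", "")
        if len(code) == 3 and code[0] in "12345":
            key = f"{code[0]}xx"
        else:
            key = "other"  # e.g. response_code "0", which is not an HTTP status
        classes[key] += float(series["value"][1])
    return dict(classes)


def error_ratio(classes: Dict[str, float], error_classes: Iterable[str] = ("5xx",)) -> float:
    """Hypothetical helper: share of responses in the error classes (NaN if none)."""
    total = sum(classes.values())
    if total == 0:
        return float("nan")
    return sum(classes.get(c, 0.0) for c in error_classes) / total
```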
Get request throughput (requests/second) for a given API endpoint
Request
To query Prometheus for the average request throughput (i.e., the number of requests per second) for a given API endpoint (e.g., "/metadata/delivery/GLOBAL/btv/services") over the past 10 minutes, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sum by (request_url)(rate(istio_requests_total{request_url=~"/metadata/delivery/GLOBAL/btv/services"}[10m]))
Headers
Authorization: Bearer <keycloak_token>
Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
Throughput averaged approximately 0.0137 requests per second over the past 10 minutes:
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"request_url": "/metadata/delivery/GLOBAL/btv/services"
},
"value": [
1715187654.312,
"0.013705368055555556"
]
}
]
}
}
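To relate this rate to a request count, multiply it by the window: 0.013705 requests/second × 600 seconds ≈ 8.2 requests over the 10 minutes (rate() extrapolates, hence the fractional value). The following Python sketch is a simplified, hypothetical approximation of what rate() computes from raw counter samples: it sums the increases between consecutive samples, treating any decrease as a counter reset, and divides by the elapsed time. Unlike Prometheus, it does not extrapolate to the window boundaries.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: approximate rate from (timestamp_s, counter_value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter restarted from 0, so count only cur.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # Three samples five minutes apart; the counter grows by 10 requests.
    print(per_second_rate([(0, 100), (300, 104), (600, 110)]))
```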
Get average response time for a given API endpoint
Request
To query Prometheus for the average response time in milliseconds for a given API endpoint (e.g., "/metadata/v1/epg") over the past 10 minutes, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sum(rate(istio_request_duration_milliseconds_sum{app="ingress-gateway-otvpcse",request_url=~"/metadata/v1/epg"}[10m])) by (request_url) / sum(rate(istio_request_duration_milliseconds_count{app="ingress-gateway-otvpcse",request_url=~"/metadata/v1/epg"}[10m])) by (request_url)
Headers
Authorization: Bearer <keycloak_token>
Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
Average response time was approximately 1073 milliseconds over the past 10 minutes:
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"request_url": "/metadata/v1/epg"
},
"value": [
1715188797.602,
"1072.780273852061"
]
}
]
}
}
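The query divides two rates taken over the same 10-minute window, so the window length cancels out: the result is simply the growth of the duration sum divided by the growth of the request count. The following Python sketch (a hypothetical helper) applies that arithmetic to two snapshots of the _sum and _count counters, ignoring counter resets and the extrapolation that rate() performs.

```python
import math
from typing import Tuple

# (istio_request_duration_milliseconds_sum, istio_request_duration_milliseconds_count)
Snapshot = Tuple[float, float]


def average_response_time_ms(start: Snapshot, end: Snapshot) -> float:
    """Hypothetical helper: mean duration of the requests served between two snapshots."""
    d_sum = end[0] - start[0]
    d_count = end[1] - start[1]
    if d_count <= 0:
        return math.nan  # no requests in the window, like a 0/0 in PromQL
    return d_sum / d_count


if __name__ == "__main__":
    # 3 requests took 3000 ms in total between the snapshots.
    print(average_response_time_ms((1000.0, 10.0), (4000.0, 13.0)))
```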
Get response time 95th percentile for a given API endpoint
Request
To query Prometheus for the 95th percentile response time for a given API endpoint (e.g., "/metadata/delivery.*") over the past 10 minutes, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=histogram_quantile(0.95,sum(rate(istio_request_duration_milliseconds_bucket{app="ingress-gateway-otvpcse",request_url=~"/metadata/delivery.*"}[10m])) by (le))
Headers
Authorization: Bearer <keycloak_token>
Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
Over the past 10 minutes, an estimated 95% of the requests to the /metadata/delivery.* API endpoints were served in less than 166 milliseconds:
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {},
"value": [
1715189231.843,
"166.48048820847555"
]
}
]
}
}
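The 166 milliseconds above is an estimate: histogram_quantile finds the bucket in which the requested rank falls and interpolates linearly between that bucket's lower and upper bounds, so its accuracy depends on the bucket boundaries. The following Python sketch is a simplified, hypothetical re-implementation of that interpolation over cumulative (le, count) buckets. It assumes the buckets are sorted, the lowest bucket starts at 0 (true for durations), and the last bucket is +Inf. It does not cover every edge case that Prometheus handles.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Hypothetical helper: estimate quantile q from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    example = [(50.0, 40.0), (100.0, 80.0), (250.0, 95.0), (500.0, 99.0), (math.inf, 100.0)]
    print(histogram_quantile(0.5, example))
```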