API monitoring
Overview
OpenTV Video Platform uses Prometheus to collect API usage and performance data from both Platform and SSP. You can then query Prometheus to get information such as:
- Number of responses from a particular endpoint
- Total response time
- Whether a particular probe was successful or not
Prometheus allows you to construct complex, sophisticated queries. It is beyond the scope of this page to cover all of its functionality.
For full details, see the Prometheus API documentation.
You can perform calculations on the data that is returned to compute additional metrics.
Metric categories
For OpenTV Video Platform and SSP, there are two categories of metrics that NAGRA exposes through Prometheus:
Probe metrics
The probe_success metric indicates whether execution was successful for each probe. A value of 1 means that the probe succeeded; a value of 0 means that it failed.
Nginx metrics
There are a number of metrics that are gathered by monitoring nginx. These include response time, which is the interval between the arrival of a request and the response, that is, how long it takes to service each request. This is a key indicator of an application's performance.
An increase in response time can mean that there is an issue with an end-user application or with an upstream service, which may be caused by a recent change or upgrade.
The following metrics are available:
- sni_http_response_count_total: the total number of processed HTTP responses
- sni_http_response_time_seconds: a summary vector of the total response times (in seconds)
- sni_http_response_time_seconds_sum: a sum of the total response times in seconds
- sni_http_response_time_seconds_count: the total number of processed HTTP responses
You can perform calculations on the data that is returned to compute additional metrics, such as:
- Average response time per API
- Requests per second per API
- Response time for the top n% of calls per API
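As a rough illustration of the second calculation in the list above, the following Python sketch derives a requests-per-second figure from two samples of a cumulative counter such as sni_http_response_count_total. The function name and the way a counter reset is handled are assumptions made for this example; they are not part of the platform.

```python
def requests_per_second(count_then: float, count_now: float,
                        interval_seconds: float) -> float:
    """Average response rate between two samples of a cumulative counter.

    count_then and count_now are the counter values at the start and end
    of the interval; interval_seconds is the time between the two samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = count_now - count_then
    if delta < 0:
        # Assumption: a lower value means the service restarted and the
        # counter began again from zero, so only count_now is usable.
        delta = count_now
    return delta / interval_seconds


# 60 additional responses over 60 seconds is one request per second.
print(requests_per_second(100, 160, 60))
```

The same arithmetic applies per API when the two samples are taken for a single request_uri value.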
Available metrics
The following nginx metrics are available:
Module | Metric name | REST methods |
---|---|---|
Account and Device Manager (ADM) | | DELETE |
| | | GET, POST |
| | | GET, POST, DELETE |
| | | GET, POST |
| | | GET |
API Gateway (AGW) | agw_create | POST |
Cast, Crew, and Persona Service (CCP) | ccp | GET |
Content Builder | | GET |
CRM Gateway (CRM-GW) | crm_gateway | GET |
IAM (Keycloak) | iam | GET |
Identity Authentication Service (IAS) | | GET, POST |
| | ias_token | POST |
Image Handler Service (IHS) | ihs | GET |
Keycloak | keycloak_nagra | POST |
| | keycloak_opcon | GET, POST |
| | keycloak_resources | GET |
Metadata Server (MDS) | mds_events | GET, PUT, POST, DELETE |
| | btv_programmes | GET |
| | btv_services | GET |
| | epg | GET |
| | | GET |
| | vod_editorials | GET |
| | vod_nodes | GET |
| | vod_products | GET |
Operator Console (OpCon) | opui | GET |
| | opconsole_adm | GET, POST |
| | opconsole_bcm | GET |
| | opconsole_core | GET, PUT, POST |
Rights Manager (RMG) | rmg | GET, POST |
User Activity Vault (UAV) | uav | GET, PUT, POST |
User Recordings | | GET, POST, DELETE |
Authentication
Access to the Prometheus APIs is controlled by Keycloak. See Accessing operator APIs using Keycloak for more information.
Output formatting
If you are using Postman to make these requests, it automatically pretty-prints the JSON output.
If you are using curl, you can pipe its output to the jq JSON formatting tool to make the output more readable.
For example:
curl -s --location -g --request GET 'https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=probe_success' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Authorization: Bearer <keycloak_token>' | jq .
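If you prefer to script these requests, the following Python sketch builds the same kind of request using only the standard library. It is a minimal example under stated assumptions: the function names, the base URL https://operator.example.test, and the token value are placeholders invented for illustration, and the request is only sent if you call run_query yourself.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(base_url: str, promql: str,
                        token: str) -> urllib.request.Request:
    """Build a GET request for the Prometheus instant-query endpoint."""
    # urlencode escapes spaces, brackets, and commas in the PromQL query.
    query_string = urllib.parse.urlencode({"query": promql})
    url = f"{base_url}/prometheus-ext-server/api/v1/query?{query_string}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def run_query(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical base URL and token, shown only to illustrate the URL shape.
request = build_query_request("https://operator.example.test",
                              "probe_success", "<keycloak_token>")
print(request.full_url)
```

Building the query string with urlencode avoids having to escape queries that contain spaces by hand.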
Examples
Get all monitored endpoints
Request
To query Prometheus for all the endpoints it is monitoring, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=probe_success
Headers
- Authorization: Bearer <keycloak_token>
- Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
To save space, the following example includes the output for one module only (in this case, ADM).
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "probe_success",
"api": "Account and Device Manager",
"instance": "http://http-router/adm/v1/accounts?limit=0",
"job": "adm-api"
},
"value": [
1673878896.183,
"1"
]
},
...
]
}
}
Get a count of monitored endpoints
Request
To query Prometheus for a count of the monitored endpoints, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=count(probe_success)
Headers
- Authorization: Bearer <keycloak_token>
- Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {},
"value": [
1674043081.6,
"31"
]
}
]
}
}
This shows that 31 endpoints are being monitored. (The other value in the same block is the Unix epoch timestamp.)
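Each value in a response is a pair: a numeric Unix timestamp followed by the sample value encoded as a string. The Python sketch below shows one way to convert such a pair into native types; the function name parse_sample is an assumption made for this example.

```python
from datetime import datetime, timezone
from typing import List, Tuple, Union


def parse_sample(value: List[Union[float, str]]) -> Tuple[datetime, float]:
    """Convert a [timestamp, "value"] pair into (UTC datetime, float)."""
    timestamp, raw_value = value
    sample_time = datetime.fromtimestamp(float(timestamp), tz=timezone.utc)
    # float() also accepts the string "NaN", which Prometheus can return.
    return sample_time, float(raw_value)


# The pair from the example response above.
sample_time, endpoint_count = parse_sample([1674043081.6, "31"])
print(sample_time.date(), int(endpoint_count))
```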
Get a list of monitored endpoints showing only the most relevant fields
Request
To query Prometheus for a list of monitored endpoints, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=count without (job,api)(probe_success)
Headers
- Authorization: Bearer <keycloak_token>
- Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
To save space, the following example includes the output for one endpoint only (in this case, ADM accounts).
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"instance": "https://operator.sitq3ga.otv-staging.com/adm/v1/accounts?limit=0"
},
"value": [
1674045670.591,
"1"
]
},
...
]
}
}
If you are using curl and jq, you can add a jq filter with the -r (raw output) option to show just the list of endpoints.
For example:
curl -s --location -g --request GET 'https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=count%20without(job,api)(probe_success)' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--header 'Authorization: Bearer <keycloak_token>' | jq -r '.data.result[].metric.instance'
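If jq is not available, the same extraction can be done in a few lines of Python. This sketch assumes that the response body has already been saved as a JSON string; the function name monitored_instances is invented for the example, and the sample data is the single entry from the example response above.

```python
import json
from typing import List

# One result entry, copied from the example response on this page.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "instance": "https://operator.sitq3ga.otv-staging.com/adm/v1/accounts?limit=0"
        },
        "value": [1674045670.591, "1"]
      }
    ]
  }
}
"""


def monitored_instances(response: dict) -> List[str]:
    """Return the instance label of every entry in the result vector."""
    return [item["metric"]["instance"] for item in response["data"]["result"]]


for instance in monitored_instances(json.loads(SAMPLE_RESPONSE)):
    print(instance)
```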
Get a list of inactive endpoints
Request
To query Prometheus for just the endpoints that are inactive, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=probe_success==0
Headers
- Authorization: Bearer <keycloak_token>
- Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
To save space, the following example includes the output for one module only (in this case, MDS).
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "probe_success",
"api": "External",
"instance": "https://admin.sitq3ga.otv-staging.com/metadata/delivery/GLOBAL/vod/nodes?limit=0",
"job": "mds-api"
},
"value": [
1674048210.825,
"0"
]
},
...
]
}
}
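When many endpoints are monitored, it can help to group the failures by job. The Python sketch below does this for a probe_success result vector; it also works on the unfiltered output because it checks each value itself. The function name is an assumption, and the sample data combines two entries taken from the example responses on this page.

```python
from collections import defaultdict
from typing import Dict, List

# Two entries taken from the example responses on this page.
SAMPLE_RESULT = [
    {
        "metric": {
            "__name__": "probe_success",
            "api": "Account and Device Manager",
            "instance": "http://http-router/adm/v1/accounts?limit=0",
            "job": "adm-api",
        },
        "value": [1673878896.183, "1"],
    },
    {
        "metric": {
            "__name__": "probe_success",
            "api": "External",
            "instance": "https://admin.sitq3ga.otv-staging.com/metadata/delivery/GLOBAL/vod/nodes?limit=0",
            "job": "mds-api",
        },
        "value": [1674048210.825, "0"],
    },
]


def failed_probes_by_job(result: List[dict]) -> Dict[str, List[str]]:
    """Map each job name to the instances whose probe value is 0."""
    failures = defaultdict(list)
    for item in result:
        if float(item["value"][1]) == 0:
            job = item["metric"].get("job", "unknown")
            failures[job].append(item["metric"]["instance"])
    return dict(failures)


print(failed_probes_by_job(SAMPLE_RESULT))
```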
Get usage counts for all metrics and statuses
Request
To query Prometheus for the total response count per HTTP status for each metric, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sni_http_response_count_total
The value that is returned for a particular metric and status is the cumulative number of responses since the service started. To get the number of responses over a particular time period, use a time offset to get the count at a specific point in the past and compare it with the current value.
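One way to express the time-offset comparison described above is to subtract the offset value from the current value inside a single PromQL query. The Python sketch below builds and URL-encodes such a query; the helper name and the one-hour window are assumptions made for illustration, and a service restart within the window would reset the counter and make the difference misleading.

```python
import urllib.parse


def responses_in_window_query(metric: str, window: str) -> str:
    """Build a PromQL expression: current value minus the value `window` ago."""
    return f"{metric} - {metric} offset {window}"


promql = responses_in_window_query("sni_http_response_count_total", "1h")
# Percent-encode the query so that it can be used as the `query` parameter.
encoded = urllib.parse.quote(promql, safe="")
print(promql)
print(encoded)
```

The encoded string can be appended to the request URL after query=.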
Note that multiple blocks are returned for certain modules. For example, for ADM, there are separate blocks for adm_devices, adm_update, adm_bundled_accounts, and adm_user_accounts.
Headers
- Authorization: Bearer <keycloak_token>
- Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
To save space, the following example includes the output for one metric and one HTTP status only (in this case, status 201 for RMG).
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "sni_http_response_count_total",
"environment": "sitq3ga",
"host": "sni_router01",
"http_code": "201",
"instance": "sni_router01",
"job": "sni_router-log-exporter",
"method": "POST",
"request_uri": "rmg",
"status": "201"
},
"value": [
1673955157.403,
"27"
]
},
...
]
}
}
Get count for a specific metric and status
Request
To query Prometheus for the total response count for a specific HTTP status for a specific metric, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sni_http_response_count_total{http_code="200",request_uri="adm_devices"}
Headers
- Authorization: Bearer <keycloak_token>
- Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
Example
This shows the response that is returned when you request the response count for HTTP status 200 for the adm_devices metric.
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "sni_http_response_count_total",
"environment": "sitq3ga",
"host": "sni_router01",
"http_code": "200",
"instance": "sni_router01",
"job": "sni_router-log-exporter",
"method": "DELETE",
"request_uri": "adm_devices",
"status": "200"
},
"value": [
1673955157.403,
"12"
]
},
...
]
}
}
Get the total response time for all metrics and statuses
Request
To query Prometheus for the total response time for all available metrics and statuses, send a GET request to:
https://operator.<environment_name>.<dns_domain>/prometheus-ext-server/api/v1/query?query=sni_http_response_time_seconds_sum
You can use the total response time together with the usage counts to calculate the average response time for each metric.
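The calculation can be sketched in Python as follows. The example joins the result vector of a response-time query with the result vector of a count query on the request_uri, method, and http_code labels, then divides one by the other. The function names and the choice of join labels are assumptions made for this example, and the sample values are invented purely to show the arithmetic.

```python
import math
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (request_uri, method, http_code)


def index_by_labels(result: List[dict]) -> Dict[Key, float]:
    """Index a result vector by the labels that identify one series."""
    indexed = {}
    for item in result:
        labels = item["metric"]
        key = (labels.get("request_uri", ""), labels.get("method", ""),
               labels.get("http_code", ""))
        indexed[key] = float(item["value"][1])
    return indexed


def average_response_times(sum_result: List[dict],
                           count_result: List[dict]) -> Dict[Key, Optional[float]]:
    """Average seconds per response; None when no usable count exists."""
    counts = index_by_labels(count_result)
    averages = {}
    for key, total in index_by_labels(sum_result).items():
        count = counts.get(key)
        if count is None or count == 0 or math.isnan(count) or math.isnan(total):
            averages[key] = None
        else:
            averages[key] = total / count
    return averages


# Invented sample values: 10 seconds in total across 4 responses.
labels = {"request_uri": "mds_events", "method": "GET", "http_code": "200"}
sums = [{"metric": labels, "value": [0, "10.0"]}]
counts = [{"metric": labels, "value": [0, "4"]}]
print(average_response_times(sums, counts))
```

Prometheus can also perform the division itself if both series are divided in a single PromQL expression, which avoids the client-side join.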
Headers
- Authorization: Bearer <keycloak_token>
- Content-Type: application/x-www-form-urlencoded
Response
See the Prometheus docs for the status codes that it returns.
If there were no requests for the endpoints that are covered by a particular metric during the data collection period, the value returned is NaN.
Example
This shows the response that is returned when you request the total response time.
To save space, the following example includes the output for one metric and one HTTP status only (in this case, status 200 for MDS events).
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "sni_http_response_time_seconds_sum",
"environment": "sitq3ga",
"host": "sni_router01",
"http_code": "200",
"instance": "sni_router01",
"job": "sni_router-log-exporter",
"method": "DELETE",
"request_uri": "mds_events",
"status": "200"
},
"value": [
1674482200.427,
"79.46000000000002"
]
},
...
]
}
}