View Metrics (Prometheus)

HPE Machine Learning Inferencing Software (MLIS) provides access to metrics generated by the underlying components of your deployed model services (e.g., Kubernetes, KServe, BentoML, OpenLLM, and NIM). These components provide numerous metrics out of the box. The specific metrics available for your deployment depend on the type of model you deploy and how your system is configured.

Before You Start

Review the following information before you start:

  • All containers get KServe and Knative metrics.
  • Any metrics surfaced via /metrics on the container port (8080) are also collected (see the example query after this list).
  • MLIS does not generate any metrics of its own.
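
For example, a query against a custom counter surfaced this way might look like the following sketch (the metric name my_model_predictions_total is a hypothetical placeholder for whatever your container actually exposes on /metrics):

  # Per-second rate of a hypothetical custom counter over the last 5 minutes
  rate(my_model_predictions_total[5m])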

How to View Metrics

  1. Navigate to the Deployments dashboard.
  2. Select the Ellipsis icon for the deployment you want to monitor.
  3. Select the Dashboard option. A new browser tab opens with the Grafana dashboard for the selected deployment. By default, all packaged model versions are displayed.
  4. Navigate to Explore.
  5. Select the Prometheus data source.
  6. In the Select label dropdown, select one of the following Label names:
    Label Name | Value | Description
    serving.kserve.io/inferenceservice | The deployment name | Selects all instances of all versions of your inference service. Selectable in the Deployment Dashboard via the Deployment Name dropdown.
    inference/packaged-model | The packaged model name and version, for example fb125m-model.v1 | Selectable in the Deployment Dashboard via the Packaged Model Version dropdown. By default, all versions of the deployment are shown.
    inference/deployment-id | The deployment’s id value | For advanced use. Normally serving.kserve.io/inferenceservice is used, as long as deployment names are not reused for different instances.
    inference/packaged-model-id | The packaged model’s id value | For advanced use. Normally inference/packaged-model is used, as long as packaged model names are not reused for different instances.
    Can't find a label?
    If you can’t find any of the mentioned labels while building a query, it’s likely because the time range selected doesn’t have any data for that label. Try expanding the time range.
  7. In the Select value dropdown, select the corresponding value that matches your deployment or packaged model.
  8. Continue building your query as needed (a sample query is sketched after these steps).
  9. Select Run Query.
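
For example, to chart the request rate across all versions of a deployment, you can combine one of the labels above with a BentoML metric. A minimal sketch, assuming a deployment named my-llm and that your Prometheus installation sanitizes Kubernetes label names by replacing dots and slashes with underscores (Prometheus label names cannot contain those characters; check the label names Explore actually offers):

  # Per-version request rate over the last 5 minutes for the my-llm deployment
  sum by (service_version) (
    rate(bentoml_api_server_request_total{serving_kserve_io_inferenceservice="my-llm"}[5m])
  )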

Metrics

BentoML & OpenLLM

The following table lists some metrics that are commonly useful when monitoring BentoML and OpenLLM deployments.

Description | Metric Name | Metric Type | Dimensions
API server requests in progress | bentoml_api_server_request_in_progress | Gauge | endpoint, service_name, service_version
Runner requests in progress | bentoml_runner_request_in_progress | Gauge | endpoint, runner_name, service_name, service_version
API server requests total | bentoml_api_server_request_total | Counter | endpoint, service_name, service_version, http_response_code
Runner requests total | bentoml_runner_request_total | Counter | endpoint, service_name, runner_name, service_version, http_response_code
API server request duration in seconds | bentoml_api_server_request_duration_seconds_sum, bentoml_api_server_request_duration_seconds_count, bentoml_api_server_request_duration_seconds_bucket | Histogram | endpoint, service_name, service_version, http_response_code
Runner request duration in seconds | bentoml_runner_request_duration_seconds_sum, bentoml_runner_request_duration_seconds_count, bentoml_runner_request_duration_seconds_bucket | Histogram | endpoint, service_name, runner_name, service_version, http_response_code
Runner adaptive batch size | bentoml_runner_adaptive_batch_size_sum, bentoml_runner_adaptive_batch_size_count, bentoml_runner_adaptive_batch_size_bucket | Histogram | method_name, service_name, runner_name, worker_index
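
For example, the duration histograms above can be turned into a latency percentile. A minimal sketch, assuming default scraping of these BentoML metrics:

  # Approximate p95 API server latency per endpoint over the last 5 minutes
  histogram_quantile(
    0.95,
    sum by (le, endpoint) (
      rate(bentoml_api_server_request_duration_seconds_bucket[5m])
    )
  )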

KServe & Knative

See the Knative observability documentation for more information on metrics that can be made available for your deployment.

Additional Config Required
Some of the metrics mentioned in the Knative observability documentation may require additional configuration to be available in your deployment.
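
As a sketch, a per-revision request-rate query for a Knative metric might look like the following (revision_request_count is one of the metrics described in the Knative observability documentation; whether it is exported, and under exactly this name, depends on your configuration):

  # Requests per second handled by each Knative revision
  sum by (revision_name) (rate(revision_request_count[5m]))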

NIM

Some NVIDIA NIM containers, such as the llama3 container, also provide metrics; however, reranking and embedding NIMs currently do not. See the official NIM documentation for details.
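
To check exactly which metric names a given NIM deployment exposes, you can list them from Explore with a query like this sketch (my-nim is a hypothetical deployment name, and the sanitized label name is the same assumption noted earlier):

  # List all metric names reported for the my-nim deployment
  count by (__name__) ({serving_kserve_io_inferenceservice="my-nim"})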