Object Model Reference Schema

Description and control of your inference service is accomplished via two primary objects: Packaged Model and Deployment. Additionally, when referencing a Packaged Model via an external hosting method, such as an S3 bucket or the huggingface.co registry, you may need to configure a Registry to enable that access.

Deployment

The Deployment object controls how a Packaged Model is deployed. The Deployment object has the following attributes:

Input Attributes

  • name: The name of the deployment, used to access it via the REST interface or CLI. This is also the name used for the associated KServe inference service that will be created.
  • namespace: The Kubernetes namespace into which the service is deployed. It must already exist.
  • model: The name (or id) of the packaged model to be deployed.
  • security: Encapsulates the security option (authenticationRequired) for the deployed service.
  • autoScaling: Controls the scaling limits minReplicas/maxReplicas, metric to control scaling, and the target value.
  • canaryTrafficPercent: The percentage of traffic to route to this particular model version. The default is 100.
  • goalStatus: Specifies the intended status to be achieved by the deployment. The default is Ready.
  • environment: Environment variables to be provided to the container image when started.
  • arguments: Arguments to be passed to the container image when started. These are in addition to any configured on the packaged model.

Managed Attributes

  • id: A unique identifier for this deployment.
  • status: Summary status of the deployed service.
  • state: State details of the current service configuration requested. See the DeploymentStateDetails component for details.
  • secondaryState: State details of a prior service configuration until the currently requested configuration has been fully rolled out.
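Taken together, the input attributes above can be pictured as a single request payload. The following sketch uses the attribute names from this list, but every value (names, namespace, metric, arguments) is a hypothetical example rather than a product default, and the JSON-style shape is illustrative only:

```python
# Hypothetical Deployment payload; field names follow the attribute list above,
# all values are made-up examples.
deployment = {
    "name": "sentiment-api",              # also used for the KServe inference service
    "namespace": "models",                # Kubernetes namespace; must already exist
    "model": "sentiment-classifier",      # name (or id) of the packaged model
    "security": {"authenticationRequired": True},
    "autoScaling": {
        "minReplicas": 1,
        "maxReplicas": 4,
        "metric": "concurrency",          # metric that drives scaling
        "target": 10,                     # target value for that metric
    },
    "canaryTrafficPercent": 100,          # default: all traffic to this version
    "goalStatus": "Ready",                # default intended status
    "environment": {"LOG_LEVEL": "info"},
    "arguments": ["--workers", "2"],      # added to any args on the packaged model
}
```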

PackagedModel

The Packaged Model object identifies the model and code that make up your inference service. The code may be provided via a container image, or via an external hosting method (S3 or huggingface.co registry). The Packaged Model object has the following attributes:

Input Attributes

  • name: The name of the model.
  • description: A text description of the model.
  • modelFormat: Model format for downloaded models (e.g. from S3, HTTP, etc.).
  • registry: The name or id of a registry object. If the model data is not provided via a container image, this must be specified.
  • url: Reference to the Bento or model to be served.
  • resources: The resource requirements for running the service (requests/limits) for cpu/memory/gpu.
  • image: The container image (containerized Bento) from which the inference service is deployed.
  • environment: Environment variables to be provided to the container image when started. See Packaged Model Environment Variables for a list of default options.
  • arguments: Arguments to be passed to the container image when started.

Managed Attributes

  • id: A unique identifier for this particular model version.
  • version: An automatically incrementing integer version of the model as you make changes.
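As with Deployment, the Packaged Model attributes can be illustrated as one payload. The attribute names below come from the list above; the values, the `modelFormat` string, and the resource quantities are all hypothetical examples:

```python
# Hypothetical PackagedModel payload; names follow the attribute list above,
# values are invented for illustration.
packaged_model = {
    "name": "sentiment-classifier",
    "description": "BERT-based sentiment model",
    "modelFormat": "bento-archive",        # example format value, not a product default
    "registry": "team-s3",                 # required when no container image is given
    "url": "s3://ml-models/sentiment/v3",  # reference to the Bento or model to serve
    "resources": {
        "requests": {"cpu": "1", "memory": "2Gi"},
        "limits": {"cpu": "2", "memory": "4Gi", "gpu": "1"},
    },
    "environment": {"BATCH_SIZE": "8"},    # passed to the container at start
    "arguments": ["--timeout", "30"],
}
```

Note that `registry` and `url` are only meaningful for externally hosted models; a model delivered entirely as a container image would instead set `image`.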

Registry

The Registry object provides the metadata that describes how to download a Packaged Model for deployment.

Input Attributes

  • name: The name of the registry, used to access it via the REST interface or CLI.
  • description: A text description of the registry.
  • type: The type of this model registry.
  • endpointUrl: The registry endpoint (host).
  • bucket: The bucket name (for S3-style registries) or organization name (for huggingface.co), depending on the registry type.
  • accessKey: The access key, username or team name for the registry.
  • secretKey: The password, secret key, or access token for the registry.
  • insecureHttps: If true, the server certificate for HTTPS endpoints is accepted without validation.

Managed Attributes

  • id: A unique identifier for this registry.
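A Registry definition for an S3-hosted model might look like the sketch below. Field names follow the attribute list above; the `type` string, endpoint, and credentials are all hypothetical placeholders:

```python
# Hypothetical Registry payload for an S3-style backend; all values are examples.
registry = {
    "name": "team-s3",
    "description": "Team model bucket",
    "type": "s3",                       # example registry type, not a product default
    "endpointUrl": "s3.example.com",    # registry endpoint (host)
    "bucket": "ml-models",              # bucket name for an S3-style registry
    "accessKey": "AKIAEXAMPLE",         # access key / username / team name
    "secretKey": "example-secret",      # password / secret key / access token
    "insecureHttps": False,             # validate the server certificate
}
```

A Packaged Model would then reference this object by its `name` (here, `team-s3`) in its `registry` attribute.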

DeploymentStateDetails

The state details of an inference service Deployment are described with the following attributes:

Attributes

  • endpoint: The endpoint URI used to access the inference service.
  • nativeAppName: The name of the Kubernetes application for the specific service version. Use this name to match the app value in Grafana/Prometheus to obtain logs and metrics for this deployed service version.
  • status: The status of a particular inference service revision.
  • trafficPercentage: Percent of traffic being processed by this service/model version.
  • failureInfo: A list of any failures associated with the deployment of this service/model version.
  • modelId: The id of the deployed packaged model associated with this state.
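A reported state object might therefore look like the following sketch. The attribute names come from the list above; every value is a hypothetical example of what a healthy, fully rolled-out revision could return:

```python
# Hypothetical DeploymentStateDetails as reported for one service/model version;
# all values are invented examples.
state = {
    "endpoint": "https://sentiment-api.models.example.com",
    "nativeAppName": "sentiment-api-predictor-00002",  # match the "app" value in Grafana/Prometheus
    "status": "Ready",
    "trafficPercentage": 100,    # percent of traffic handled by this version
    "failureInfo": [],           # empty when no deployment failures occurred
    "modelId": "pm-12345",       # id of the deployed packaged model
}
```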