Metric Profile
The metric profile contains a list of queries used to retrieve metrics such as CPU usage, throttling, memory usage, and more. Users can create metric profiles based on their cluster or datasource provider, such as Prometheus or Thanos. These profiles can be tagged to create experiment APIs, which will then fetch metrics according to the metric profile to generate recommendations.
This article describes how to add and list Metric Profiles with REST APIs using curl command. Documentation still in progress stay tuned.
Attributes
- apiVersion
A string representing version of the Kubernetes API to create metric profile - kind
A string representing type of kubernetes object - metadata
A JSON object containing Data that helps to uniquely identify the metric profile, including a name string- name
A unique string name for identifying each metric profile.
- name
- profile_version
a double value specifying the current version of the profile. - slo
Service Level Objective containing the direction, objective_function and function_variables- slo_class
a standard slo “bucket” defined by Kruize. Can be “resource_usage”, “throughput” or “response_time” - direction
based on the slo_class, it can be ‘maximize’ or ‘minimize’ - objective_function
Define the performance objective here.- function_type
can be specified as ‘source’ (a java file) or as an ‘expression‘(algebraic). If it’s an expression, it needs to defined below. - expression
an algebraic expression that details the calculation using function variables. Only valid if the “function_type” is “expression”
- function_type
- function_variables
Define the variables used in the objective_function- name
name of the variable - datasource
datasource of the query - value_type
can be double or integer - query
one of the query or aggregation_functions is mandatory. Both can be present. - kubernetes_object
k8s object that this query is tied to: “deployment”, “pod” or “container” - aggregation_functions
aggregate functions associated with this variable- function
can be ‘avg’, ‘sum’, ‘min’, ‘max’ - query
corresponding query - versions
Any specific versions that this query is tied to
- function
- name
- slo_class
CreateMetricProfile
This is quick guide instructions to create metric profile using input JSON as follows.
Request
POST /createMetricProfile
curl -H 'Accept: application/json' -X POST --data 'copy paste below JSON' http://<URL>:<PORT>/createMetricProfile
{
"apiVersion": "recommender.com/v1",
"kind": "KruizePerformanceProfile",
"metadata": {
"name": "resource-optimization-openshift"
},
"profile_version": 1,
"k8s_type": "openshift",
"slo": {
"slo_class": "resource_usage",
"direction": "minimize",
"objective_function": {
"function_type": "source"
},
"function_variables": [
{
"name": "cpuRequest",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD ', pod!='', resource='cpu', unit='core' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "sum",
"query": "sum by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='cpu', unit='core' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "min",
"query": "min by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='cpu', unit='core' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "max",
"query": "max by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='cpu', unit='core' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
}
]
},
{
"name": "cpuLimit",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='cpu', unit='core',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "sum",
"query": "sum by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='cpu', unit='core',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "max",
"query": "max by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='cpu', unit='core',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "min",
"query": "min by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='cpu', unit='core',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
}
]
},
{
"name": "cpuUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by(container, namespace)(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "min",
"query": "min by(container, namespace)(min_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "max",
"query": "max by(container, namespace)(max_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "sum",
"query": "sum by(container, namespace)(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
}
]
},
{
"name": "cpuThrottle",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by(container,namespace) (rate(container_cpu_cfs_throttled_seconds_total{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "max",
"query": "max by(container,namespace) (rate(container_cpu_cfs_throttled_seconds_total{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "min",
"query": "min by(container,namespace) (rate(container_cpu_cfs_throttled_seconds_total{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "sum",
"query": "sum by(container,namespace) (rate(container_cpu_cfs_throttled_seconds_total{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
}
]
},
{
"name": "memoryRequest",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='memory', unit='byte' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "sum",
"query": "sum by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='memory', unit='byte' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "max",
"query": "max by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='memory', unit='byte' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "min",
"query": "min by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='memory', unit='byte' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
}
]
},
{
"name": "memoryLimit",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='memory', unit='byte',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "sum",
"query": "sum by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='memory', unit='byte',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "max",
"query": "max by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='memory', unit='byte',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
{
"function": "min",
"query": "min by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='memory', unit='byte',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
}
]
},
{
"name": "memoryUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by(container, namespace) (avg_over_time(container_memory_working_set_bytes{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "min",
"query": "min by(container, namespace) (min_over_time(container_memory_working_set_bytes{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "max",
"query": "max by(container, namespace) (max_over_time(container_memory_working_set_bytes{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "sum",
"query": "sum by(container, namespace) (avg_over_time(container_memory_working_set_bytes{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
}
]
},
{
"name": "memoryRSS",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by(container, namespace) (avg_over_time(container_memory_rss{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "min",
"query": "min by(container, namespace) (min_over_time(container_memory_rss{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "max",
"query": "max by(container, namespace) (max_over_time(container_memory_rss{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
{
"function": "sum",
"query": "sum by(container, namespace) (avg_over_time(container_memory_rss{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
}
]
},
{
"name": "maxDate",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "max",
"query": "max by(namespace,container) (last_over_time((timestamp(container_cpu_usage_seconds_total{namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"} > 0))[15d:]))"
}
]
},
{
"name": "namespaceCpuRequest",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "sum",
"query": "sum by (namespace) (kube_resourcequota{namespace=\"$NAMESPACE$\", resource=\"requests.cpu\", type=\"hard\"})",
"version": ""
}
]
},
{
"name": "namespaceCpuLimit",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "sum",
"query": "sum by (namespace) (kube_resourcequota{namespace=\"$NAMESPACE$\", resource=\"limits.cpu\", type=\"hard\"})",
"version": ""
}
]
},
{
"name": "namespaceMemoryRequest",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "sum",
"query": "sum by (namespace) (kube_resourcequota{namespace=\"$NAMESPACE$\", resource=\"requests.memory\", type=\"hard\"})",
"version": ""
}
]
},
{
"name": "namespaceMemoryLimit",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "sum",
"query": "sum by (namespace) (kube_resourcequota{namespace=\"$NAMESPACE$\", resource=\"limits.memory\", type=\"hard\"})",
"version": ""
}
]
},
{
"name": "namespaceCpuUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "avg",
"query": "avg_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
{
"function": "max",
"query": "max_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
{
"function": "min",
"query": "min_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
}
]
},
{
"name": "namespaceCpuThrottle",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "avg",
"query": "avg_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
{
"function": "max",
"query": "max_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
{
"function": "min",
"query": "min_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
}
]
},
{
"name": "namespaceMemoryUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "avg",
"query": "avg_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
{
"function": "max",
"query": "max_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
{
"function": "min",
"query": "min_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
}
]
},
{
"name": "namespaceMemoryRSS",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "avg",
"query": "avg_over_time(sum by(namespace) (container_memory_rss{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
{
"function": "max",
"query": "max_over_time(sum by(namespace) (container_memory_rss{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
{
"function": "min",
"query": "min_over_time(sum by(namespace) (container_memory_rss{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
}
]
},
{
"name": "namespaceTotalPods",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "sum",
"query": "sum(count(kube_pod_status_phase{namespace=\"$NAMESPACE$\"}))",
"version": ""
}
]
},
{
"name": "namespaceRunningPods",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": [
{
"function": "sum",
"query": "sum(count(kube_pod_status_phase{namespace=\"$NAMESPACE$\", phase=\"Running\"}))",
"version": ""
}
]
},
{
"name": "acceleratorCoreUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
},
{
"function": "min",
"query": "min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
},
{
"function": "max",
"query": "max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
}
]
},
{
"name": "acceleratorMemoryUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": [
{
"function": "avg",
"query": "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
},
{
"function": "min",
"query": "min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
},
{
"function": "max",
"query": "max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
}
]
}
]
}
}
Response
- Success
{ "message": "Metric Profile : <name> created successfully. View all metric profiles at /listMetricProfiles", "httpcode": 201, "documentationLink": "", "status": "SUCCESS" } - Failure
- Duplicate Metric Profile name.
{ "message": "Metric Profile already exists", "httpcode": 409, "documentationLink": "", "status": "ERROR" } - Mandatory parameters are missing.
{ "message": "Missing mandatory parameters", "httpcode": 400, "documentationLink": "", "status": "ERROR" } - Any unknown exception on server side
{ "message": "Internal Server Error", "httpcode": 500, "documentationLink": "", "status": "ERROR" }Note: One of query or aggregation_functions is mandatory. Both can be present together.
- Duplicate Metric Profile name.
List Metric Profiles
List metric profiles output JSON as follows.
Request
GET /listMetricProfiles
curl -H 'Accept: application/json' http://<URL>:<PORT>/listMetricProfiles
Response
[
{
"name": "resource-optimization-openshift"
}
]
Request
GET /listMetricProfiles
curl -H 'Accept: application/json' http://<URL>:<PORT>/listMetricProfiles?name=resource-optimization-openshift
Response
[
{
"apiVersion": "recommender.com/v1",
"kind": "KruizePerformanceProfile",
"metadata": {
"name": "resource-optimization-openshift"
},
"profile_version": 1.0,
"k8s_type": "openshift",
"slo": {
"sloClass": "resource_usage",
"objective_function": {
"function_type": "source"
},
"direction": "minimize",
"function_variables": [
{
"name": "cpuRequest",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD ', pod!='', resource='cpu', unit='core' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"min": {
"function": "min",
"query": "min by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='cpu', unit='core' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"max": {
"function": "max",
"query": "max by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='cpu', unit='core' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"sum": {
"function": "sum",
"query": "sum by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='cpu', unit='core' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
}
}
},
{
"name": "cpuLimit",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='cpu', unit='core',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"min": {
"function": "min",
"query": "min by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='cpu', unit='core',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"max": {
"function": "max",
"query": "max by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='cpu', unit='core',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"sum": {
"function": "sum",
"query": "sum by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='cpu', unit='core',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
}
}
},
{
"name": "cpuUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by(container, namespace)(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"min": {
"function": "min",
"query": "min by(container, namespace)(min_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"max": {
"function": "max",
"query": "max by(container, namespace)(max_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"sum": {
"function": "sum",
"query": "sum by(container, namespace)(avg_over_time(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
}
}
},
{
"name": "cpuThrottle",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by(container,namespace) (rate(container_cpu_cfs_throttled_seconds_total{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"min": {
"function": "min",
"query": "min by(container,namespace) (rate(container_cpu_cfs_throttled_seconds_total{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"max": {
"function": "max",
"query": "max by(container,namespace) (rate(container_cpu_cfs_throttled_seconds_total{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"sum": {
"function": "sum",
"query": "sum by(container,namespace) (rate(container_cpu_cfs_throttled_seconds_total{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
}
}
},
{
"name": "memoryRequest",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='memory', unit='byte' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"min": {
"function": "min",
"query": "min by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='memory', unit='byte' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"max": {
"function": "max",
"query": "max by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='memory', unit='byte' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"sum": {
"function": "sum",
"query": "sum by(container, namespace) (kube_pod_container_resource_requests{container!='', container!='POD', pod!='', resource='memory', unit='byte' ,namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
}
}
},
{
"name": "memoryLimit",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='memory', unit='byte',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"min": {
"function": "min",
"query": "min by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='memory', unit='byte',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"max": {
"function": "max",
"query": "max by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='memory', unit='byte',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
},
"sum": {
"function": "sum",
"query": "sum by(container,namespace) (kube_pod_container_resource_limits{container!='', container!='POD', pod!='', resource='memory', unit='byte',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"})"
}
}
},
{
"name": "memoryUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by(container, namespace) (avg_over_time(container_memory_working_set_bytes{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"min": {
"function": "min",
"query": "min by(container, namespace) (min_over_time(container_memory_working_set_bytes{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\" }[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"max": {
"function": "max",
"query": "max by(container, namespace) (max_over_time(container_memory_working_set_bytes{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"sum": {
"function": "sum",
"query": "sum by(container, namespace) (avg_over_time(container_memory_working_set_bytes{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
}
}
},
{
"name": "memoryRSS",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by(container, namespace) (avg_over_time(container_memory_rss{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"min": {
"function": "min",
"query": "min by(container, namespace) (min_over_time(container_memory_rss{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"max": {
"function": "max",
"query": "max by(container, namespace) (max_over_time(container_memory_rss{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
},
"sum": {
"function": "sum",
"query": "sum by(container, namespace) (avg_over_time(container_memory_rss{container!='', container!='POD', pod!='',namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))"
}
}
},
{
"name": "maxDate",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"max": {
"function": "max",
"query": "max by(namespace,container) (last_over_time((timestamp(container_cpu_usage_seconds_total{namespace=\"$NAMESPACE$\",container=\"$CONTAINER_NAME$\"} > 0))[15d:]))"
}
}
},
{
"name": "namespaceCpuRequest",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"sum": {
"function": "sum",
"query": "sum by (namespace) (kube_resourcequota{namespace=\"$NAMESPACE$\", resource=\"requests.cpu\", type=\"hard\"})",
"version": ""
}
}
},
{
"name": "namespaceCpuLimit",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"sum": {
"function": "sum",
"query": "sum by (namespace) (kube_resourcequota{namespace=\"$NAMESPACE$\", resource=\"limits.cpu\", type=\"hard\"})",
"version": ""
}
}
},
{
"name": "namespaceMemoryRequest",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"sum": {
"function": "sum",
"query": "sum by (namespace) (kube_resourcequota{namespace=\"$NAMESPACE$\", resource=\"requests.memory\", type=\"hard\"})",
"version": ""
}
}
},
{
"name": "namespaceMemoryLimit",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"sum": {
"function": "sum",
"query": "sum by (namespace) (kube_resourcequota{namespace=\"$NAMESPACE$\", resource=\"limits.memory\", type=\"hard\"})",
"version": ""
}
}
},
{
"name": "namespaceCpuUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
"min": {
"function": "min",
"query": "min_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
"max": {
"function": "max",
"query": "max_over_time(sum by(namespace) (node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
}
}
},
{
"name": "namespaceCpuThrottle",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
"min": {
"function": "min",
"query": "min_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
"max": {
"function": "max",
"query": "max_over_time(sum by(namespace) (rate(container_cpu_cfs_throttled_seconds_total{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''}[5m]))[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
}
}
},
{
"name": "namespaceMemoryUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
"min": {
"function": "min",
"query": "min_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
"max": {
"function": "max",
"query": "max_over_time(sum by(namespace) (container_memory_working_set_bytes{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
}
}
},
{
"name": "namespaceMemoryRSS",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg_over_time(sum by(namespace) (container_memory_rss{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
"min": {
"function": "min",
"query": "min_over_time(sum by(namespace) (container_memory_rss{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
},
"max": {
"function": "max",
"query": "max_over_time(sum by(namespace) (container_memory_rss{namespace=\"$NAMESPACE$\", container!='', container!='POD', pod!=''})[$MEASUREMENT_DURATION_IN_MIN$m:])",
"version": ""
}
}
},
{
"name": "namespaceTotalPods",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"sum": {
"function": "sum",
"query": "sum(count(kube_pod_status_phase{namespace=\"$NAMESPACE$\"}))",
"version": ""
}
}
},
{
"name": "namespaceRunningPods",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "namespace",
"aggregation_functions": {
"sum": {
"function": "sum",
"query": "sum(count(kube_pod_status_phase{namespace=\"$NAMESPACE$\", phase=\"Running\"}))",
"version": ""
}
}
},
{
"name": "acceleratorCoreUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
},
"min": {
"function": "min",
"query": "min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
},
"max": {
"function": "max",
"query": "max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
}
}
},
{
"name": "acceleratorMemoryUsage",
"datasource": "prometheus",
"value_type": "double",
"kubernetes_object": "container",
"aggregation_functions": {
"avg": {
"function": "avg",
"query": "avg by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (avg_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
},
"min": {
"function": "min",
"query": "min by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (min_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
},
"max": {
"function": "max",
"query": "max by (Hostname,device,modelName,UUID,exported_container,exported_namespace) (max_over_time(DCGM_FI_DEV_MEM_COPY_UTIL{exported_namespace=\"$NAMESPACE$\",exported_container=\"$CONTAINER_NAME$\"}[$MEASUREMENT_DURATION_IN_MIN$m]))",
"version": ""
}
}
}
]
}
}
]