Integrating Pushgateway with Prometheus to Monitor a Kubernetes Cluster


Prometheus Deployment

Environment

The Kubernetes cluster used in this article is v1.20.13, set up from binaries.

Preparing the manifests

Watch out for version compatibility: download the kube-prometheus release that matches your cluster version from GitHub before you start.

Note: for cluster versions before v1.21.x, be sure to download the package from the matching release branch, otherwise you may run into problems. The mapping between cluster versions and compatible branches is as follows:

(Compatibility matrix omitted; see the kube-prometheus README. For the Kubernetes v1.20 cluster used here, the matching branch is release-0.7.)
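
To confirm which branch your cluster needs, first check the server version (this assumes kubectl on the host already points at the cluster):

kubectl version --short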

Deploying kube-prometheus

Download the installation package:


git clone https://github.com/prometheus-operator/kube-prometheus.git

Or download a specific release branch:

git clone -b release-0.7 https://gh.con.sh/https://github.com/prometheus-operator/kube-prometheus.git

After the download completes, enter the repository and run the installation commands.

The CRD objects can be created directly from the kube-prometheus/manifests/setup directory:

[root@master01 kube-prometheus-main]# kubectl apply --server-side -f manifests/setup 
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
namespace/monitoring serverside-applied

You can see that the following CRD objects have been created:

[root@master01 prometheus-Operator]# kubectl get crd | grep coreos
alertmanagerconfigs.monitoring.coreos.com        2021-12-16T12:17:58Z
alertmanagers.monitoring.coreos.com              2021-12-16T07:03:11Z
podmonitors.monitoring.coreos.com                2021-12-16T12:17:58Z
probes.monitoring.coreos.com                     2021-12-16T12:17:58Z
prometheuses.monitoring.coreos.com               2021-12-16T07:03:11Z
prometheusrules.monitoring.coreos.com            2021-12-16T07:03:11Z
servicemonitors.monitoring.coreos.com            2021-12-16T07:03:11Z
thanosrulers.monitoring.coreos.com               2021-12-16T12:17:58Z
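
Before applying the remaining manifests, it can help to wait until the CRDs are established so the next apply does not race against CRD registration (the upstream README suggests a similar step):

kubectl wait --for condition=Established --all CustomResourceDefinition --namespace=monitoring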

Then apply the remaining resource manifests from the parent directory:

[root@master01 kube-prometheus-main]# kubectl apply -f manifests/ 
alertmanager.monitoring.coreos.com/main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager-main created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
unable to recognize "manifests/alertmanager-podDisruptionBudget.yaml": no matches for kind "PodDisruptionBudget" in version "policy/v1"
unable to recognize "manifests/prometheus-podDisruptionBudget.yaml": no matches for kind "PodDisruptionBudget" in version "policy/v1"
unable to recognize "manifests/prometheusAdapter-podDisruptionBudget.yaml": no matches for kind "PodDisruptionBudget" in version "policy/v1"

Three resources failed to be created because this cluster version (v1.20) only serves PodDisruptionBudget under policy/v1beta1, not policy/v1, so the three podDisruptionBudget manifests need a small edit.

Change: apiVersion: policy/v1
to:     apiVersion: policy/v1beta1
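
Instead of editing the three files by hand, the same change can be made with one command (a sketch, run from the kube-prometheus directory):

sed -i 's#apiVersion: policy/v1$#apiVersion: policy/v1beta1#' manifests/*-podDisruptionBudget.yaml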

Then re-apply the three manifests:

[root@master01 manifests]# kubectl apply -f alertmanager-podDisruptionBudget.yaml
poddisruptionbudget.policy/alertmanager-main created
[root@master01 manifests]# kubectl apply -f prometheus-podDisruptionBudget.yaml
poddisruptionbudget.policy/prometheus-k8s created
[root@master01 manifests]# kubectl apply -f prometheusAdapter-podDisruptionBudget.yaml
poddisruptionbudget.policy/prometheus-adapter created

Check the Pods that were created:

[root@master01 prometheus-Operator]# kubectl get pod -n monitoring 
NAME                                        READY   STATUS    RESTARTS   AGE
alertmanager-main-0                         2/2     Running   0          6d
alertmanager-main-1                         2/2     Running   0          6d
alertmanager-main-2                         2/2     Running   0          6d
blackbox-exporter-6798fb5bb4-mwdkt          3/3     Running   0          6d
grafana-d7f564887-fd7m2                     1/1     Running   0          6d
kube-state-metrics-7b8ccf569-nxcxv          3/3     Running   0          5d20h
node-exporter-2pgfm                         2/2     Running   0          6d
node-exporter-chcfv                         2/2     Running   2          6d
node-exporter-kfgth                         2/2     Running   0          6d
node-exporter-pvvm2                         2/2     Running   0          6d
node-exporter-vwpzr                         2/2     Running   0          6d
node-exporter-zcm82                         2/2     Running   0          6d
prometheus-adapter-56b57579b4-hrb7x         1/1     Running   0          5d23h
prometheus-adapter-56b57579b4-t5dzv         1/1     Running   0          5d23h
prometheus-k8s-0                            2/2     Running   0          6d
prometheus-k8s-1                            2/2     Running   0          6d
prometheus-operator-66cf6bd9c6-xcxjp        2/2     Running   0          6d

Check the Services that were created:

[root@master01 prometheus-Operator]# kubectl get svc -n monitoring
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main                    ClusterIP   10.254.30.198    <none>        9093/TCP,8080/TCP            6d
alertmanager-operated                ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   6d
blackbox-exporter                    ClusterIP   10.254.83.3      <none>        9115/TCP,19115/TCP           6d
grafana                              ClusterIP   10.254.132.62    <none>        3000/TCP                     6d
kube-state-metrics                   ClusterIP   None             <none>        8443/TCP,9443/TCP            6d
node-exporter                        ClusterIP   None             <none>        9100/TCP                     6d
prometheus-adapter                   ClusterIP   10.254.27.119    <none>        443/TCP                      6d
prometheus-k8s                       ClusterIP   10.254.12.222    <none>        9090/TCP,8080/TCP            6d
prometheus-operated                  ClusterIP   None             <none>        9090/TCP                     6d
prometheus-operator                  ClusterIP   None             <none>        8443/TCP                     6d

As you can see, the Services we use most often, prometheus-k8s and grafana, are of type ClusterIP. To reach them from outside the cluster we can either change them to NodePort or use an Ingress. Here we configure Ingress resources:

  • Configure prometheus-ingress.yaml

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: prometheus-ingress  # name of the Ingress, used only as an identifier
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:                      # L7 routing rules
  - host: prometheus.com   # route by virtual hostname (replace with your own domain)
    http:
      paths:                  # route by path
      - path: /
        backend:
          serviceName: prometheus-k8s  # backend is the prometheus-k8s Service created earlier
          servicePort: 9090
  • Configure grafana-ingress.yaml

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: grafana-ingress  # name of the Ingress, used only as an identifier
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:                      # L7 routing rules
  - host: grafana.com  # route by virtual hostname (replace with your own domain)
    http:
      paths:                  # route by path
      - path: /
        backend:
          serviceName: grafana  # backend is the grafana Service created earlier
          servicePort: 3000
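
Apply both Ingress manifests (this assumes an NGINX ingress controller is already installed, since ingressClassName: nginx is referenced above):

kubectl apply -f prometheus-ingress.yaml -f grafana-ingress.yaml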

Check the Ingress resources:

[root@master01 manifests]# kubectl get  ingress -n monitoring
NAME                 CLASS   HOSTS            ADDRESS          PORTS   AGE
grafana-ingress      nginx   grafana.com      10.254.174.120   80      42h
prometheus-ingress   nginx   prometheus.com   10.254.174.120   80      44h
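
If prometheus.com and grafana.com are not resolvable in your environment, you can point them at the ingress controller for a quick test (hypothetical hosts entry; replace the placeholder with the address where the NGINX ingress controller is exposed):

echo "<ingress-controller-ip> prometheus.com grafana.com" >> /etc/hosts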

Deploying prometheus-pushgateway

The Pushgateway manifests are kept in a separate directory under the repository:

cd /root/kube-prometheus/manifests/pushgateway

[root@test-master-65 pushgateway]# cat pushgateway-deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-pushgateway
  name: prometheus-pushgateway
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: prometheus-pushgateway
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: prometheus-pushgateway
        component: "prometheus-pushgateway"
    spec:
      serviceAccountName: prometheus-pushgateway
      containers:
        - name: prometheus-pushgateway
          image: "prom/pushgateway:v0.5.2"
          imagePullPolicy: "IfNotPresent"
          args:
          ports:
            - containerPort: 9091
          readinessProbe:
            httpGet:
              path: /#/status
              port: 9091
            initialDelaySeconds: 10
            timeoutSeconds: 10
          resources:
            {}
[root@test-master-65 pushgateway]# cat pushgateway-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: prometheus-pushgateway
  name: prometheus-pushgateway
  namespace: monitoring

[root@test-master-65 pushgateway]# cat pushgateway-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/probe: pushgateway
  labels:
    k8s-app: prometheus-pushgateway
  name: prometheus-pushgateway
  namespace: monitoring
spec:
  ports:
    - name: http
      port: 9091
      protocol: TCP
      targetPort: 9091
      nodePort: 31014
  selector:
    k8s-app: prometheus-pushgateway
    component: "prometheus-pushgateway"
  type: NodePort   # required because nodePort is set above; use LoadBalancer instead if your environment supports it
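
Apply the three manifests and check that the Pushgateway Pod comes up (a sketch of the expected steps):

kubectl apply -f pushgateway-serviceaccount.yaml -f pushgateway-service.yaml -f pushgateway-deployment.yaml
kubectl get pod -n monitoring -l k8s-app=prometheus-pushgateway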

Reference: https://github.com/prometheus/pushgateway
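
Note that the manifests above only deploy Pushgateway; for Prometheus to actually scrape it, a ServiceMonitor (or an extra scrape config) is still needed. A minimal sketch of such a ServiceMonitor, assuming the default prometheus-k8s instance selects ServiceMonitors in the monitoring namespace and the Service keeps the port name "http"; honorLabels preserves the job/instance labels supplied by pushing clients:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-pushgateway
  namespace: monitoring
  labels:
    k8s-app: prometheus-pushgateway
spec:
  selector:
    matchLabels:
      k8s-app: prometheus-pushgateway
  namespaceSelector:
    matchNames:
    - monitoring
  endpoints:
  - port: http
    interval: 30s
    honorLabels: true        # keep the labels pushed by clients instead of overwriting them

Once the target appears under Status → Targets in the Prometheus UI, you can push a test metric following the pattern from the Pushgateway README (the hostname is a placeholder; 31014 is the NodePort configured in the Service above, 9091 works from inside the cluster):

echo "some_metric 3.14" | curl --data-binary @- http://<node-ip>:31014/metrics/job/some_job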

Viewing monitoring data in Grafana

1. Log in to Grafana (Grafana's default credentials are admin/admin unless they were changed in the grafana-config secret).


2. Verify the configured data sources.


The Prometheus data source is already configured by default.


3. Click the Home drop-down menu and select one of the pre-configured dashboards.


Viewing Alertmanager alerts

Looking at the Alertmanager Service, it is currently of type ClusterIP:

[root@master01 manifests]# kubectl get svc -A | grep alertmanager-main 
monitoring      alertmanager-main                    ClusterIP   10.254.30.198    <none>        9093/TCP,8080/TCP              6d17h

First, temporarily change it to type NodePort:

kubectl edit svc -n monitoring alertmanager-main
Change type: ClusterIP
to     type: NodePort
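
Alternatively, the same change can be made non-interactively with a patch (equivalent to the edit above):

kubectl -n monitoring patch svc alertmanager-main -p '{"spec":{"type":"NodePort"}}'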

Check the Alertmanager Service again:

[root@master01 manifests]# kubectl get svc -n monitoring alertmanager-main 
NAME                TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
alertmanager-main   NodePort   10.254.30.198   <none>        9093:30407/TCP,8080:30024/TCP   6d17h

Open Alertmanager in a browser and inspect its configuration.
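
Using the NodePort from the output above (hypothetical node address; replace with the IP of any cluster node):

http://<node-ip>:30407/#/status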


This configuration actually comes from the alertmanager-secret.yaml file we applied earlier from the kube-prometheus/manifests directory:

[root@master01 manifests]# cat alertmanager-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.23.0
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    "global":
      "resolve_timeout": "5m" 
    "inhibit_rules": #告警抑制规则
    - "equal":
      - "namespace"
      - "alertname"
      "source_matchers":
      - "severity = critical"
      "target_matchers":
      - "severity =~ warning|info"
    - "equal":
      - "namespace"
      - "alertname"
      "source_matchers":
      - "severity = warning"
      "target_matchers":
      - "severity = info"
    "receivers": # 接收器指定发送人以及发送渠道
    - "name": "Default"
    - "name": "Watchdog"
    - "name": "Critical"
    "route":
      "group_by": # 报警分组
      - "namespace"
      "group_interval": "5m" # 如果组内内容不变化,合并为一条警报信息,5m后发送。
      "group_wait": "30s" # 在组内等待所配置的时间,如果同组内,30秒内出现相同报警,在一个组内出现
      "receiver": "Default"
      "repeat_interval": "12h" # 发送报警间隔,如果指定时间内没有修复,则重新发送报警。
      "routes":
      - "matchers": #根据 标签进行匹配,走不同的接收规则
        - "alertname = Watchdog"
        "receiver": "Watchdog"
      - "matchers":
        - "severity = critical"
        "receiver": "Critical"
type: Opaque

At this point, however, the alerts are only visible inside the Prometheus alerting system itself; the receivers above do not yet send notifications anywhere.



