大数据集群监控配置操作指导(四)Spark监控使用jmx

芒果2年前技术文章2474


graphite_exporter方式
Graphite 来收集度量标准,Grafana 则用于构建仪表板,首先,需要配置 Spark 以将 metrics 报告到 Graphite。
prometheus 提供了一个插件(graphite_exporter),可以将 Graphite metrics 进行转化并写入 Prometheus (本文的方式)。
先去https://prometheus.io/download/下载graphite_exporter。
wget https://github.com/prometheus/graphite_exporter/releases/download/v0.13.1/graphite_exporter-0.13.1.linux-amd64.tar.gz
解压并修改为graphite_exporter
[root@hd1 exporters]# tar -xvf graphite_exporter-0.13.1.linux-amd64.tar.gz
graphite_exporter-0.13.1.linux-amd64/
graphite_exporter-0.13.1.linux-amd64/LICENSE
graphite_exporter-0.13.1.linux-amd64/NOTICE
graphite_exporter-0.13.1.linux-amd64/graphite_exporter
graphite_exporter-0.13.1.linux-amd64/getool
[root@hd1 exporters]# mv graphite_exporter-0.13.1.linux-amd64 graphite_exporter
进入到graphite_exporter下
创建graphite_exporter_mapping文件:
vim graphite_exporter_mapping
添加如下内容
mappings:
- match: '*.*.executor.filesystem.*.*'
  name: spark_app_filesystem_usage
  labels:
    application: $1
    executor_id: $2
    fs_type: $3
    qty: $4
- match: '*.*.jvm.*.*'
  name: spark_app_jvm_memory_usage
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4
- match: '*.*.executor.jvmGCTime.count'
  name: spark_app_jvm_gcTime_count
  labels:
    application: $1
    executor_id: $2
- match: '*.*.jvm.pools.*.*'
  name: spark_app_jvm_memory_pools
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4
- match: '*.*.executor.threadpool.*'
  name: spark_app_executor_tasks
  labels:
    application: $1
    executor_id: $2
    qty: $3
- match: '*.*.BlockManager.*.*'
  name: spark_app_block_manager
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4
- match: '*.*.DAGScheduler.*.*'
  name: spark_app_dag_scheduler
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4
- match: '*.*.CodeGenerator.*.*'
  name: spark_app_code_generator
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4
- match: '*.*.HiveExternalCatalog.*.*'
  name: spark_app_hive_external_catalog
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4
- match: '*.*.*.StreamingMetrics.*.*'
  name: spark_app_streaming_metrics
  labels:
    application: $1
    executor_id: $2
    app_name: $3
    type: $4
    qty: $5
- match: '*.*.executor.filesystem.*.*'
  name: filesystem_usage
  labels:
    application: $1
    executor_id: $2
    fs_type: $3
    qty: $4
- match: '*.*.executor.threadpool.*'
  name: executor_tasks
  labels:
    application: $1
    executor_id: $2
    qty: $3
- match: '*.*.executor.jvmGCTime.count'
  name: jvm_gcTime_count
  labels:
    application: $1
    executor_id: $2
- match: '*.*.executor.*.*'
  name: executor_info
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4
- match: '*.*.jvm.*.*'
  name: jvm_memory_usage
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4
- match: '*.*.jvm.pools.*.*'
  name: jvm_memory_pools
  labels:
    application: $1
    executor_id: $2
    mem_type: $3
    qty: $4
- match: '*.*.BlockManager.*.*'
  name: block_manager
  labels:
    application: $1
    executor_id: $2
    type: $3
    qty: $4
- match: '*.driver.DAGScheduler.*.*'
  name: DAG_scheduler
  labels:
    application: $1
    type: $2
    qty: $3
- match: '*.driver.*.*.*.*'
  name: task_info
  labels:
    application: $1
    task: $2
    type1: $3
    type2: $4
    qty: $5
启动graphite_exporter(成功后 停止进程配置服务)
./graphite_exporter --graphite.mapping-config=graphite_exporter_mapping
配置成服务
vim /etc/systemd/system/graphite_exporter.service
[Unit]
Description=graphite_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=root
ExecStart=/opt/dtstack/exporters/graphite_exporter/graphite_exporter --graphite.mapping-config=/opt/dtstack/exporters/graphite_exporter/graphite_exporter_mapping
Restart=on-failure
[Install]
WantedBy=multi-user.target
启动graphite_exporter服务,并配置开机自启
systemctl daemon-reload
systemctl start graphite_exporter
systemctl status graphite_exporter
systemctl enable graphite_exporter 

B6BCAACC-286F-49A4-BD45-BCC918F6A7C7.png



配置Prometheus
vim /opt/dtstack/prometheus-2.33.3/prometheus.yml
增加
  - job_name: 'graphite_exporter'
    static_configs:
    - targets:
      - ‘hd1:9108' 




重启prometheus
在prometheus服务器上执行
systemctl restart prometheus
Spark配置Graphite metrics
Spark 是自带 Graphite Sink 的,
只需要配置一下metrics.properties;
 进入到spark安装目录下,进入到conf目录下,找到metrics.properties
cd /opt/spark/conf/
vim metrics.properties
添加如下内容:(graphite_exporter 接收数据端口为9109)
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.protocol=tcp
*.sink.graphite.host=hd1(主机名)
*.sink.graphite.port=9109
*.sink.graphite.period=1
*.sink.graphite.unit=seconds 

D723D32F-30D9-4C28-89A0-A04B3C4E1E5A.png



启动Spark程序 
启动spark程序时,需要加上–files /usr/etc/spark/conf/metrics.properties参数。
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --files /opt/spark/conf/metrics.properties --executor-cores 1 --queue default examples/jars/spark-examples_2.12-3.3.1.jar 10
shell:
./bin/spark-shell --files /opt/spark/conf/metrics.properties
访问Prometheus是否收集到metrics数据
http://hd1:9108/metrics 


23C9B04B-81D3-4D9D-8BF7-E6D1ED1DBCB7.png


相关文章

mysql高可用半同步配置(二)

一、配置半同步1.1、部署半同步:#首先判断MySQL服务器是否支持动态增加插件mysql> select @@have_dynamic_loading#确认支持动态增加插件后,检查MySQL的...

Kafka性能维度标准

如何判断一个kafka集群是否已经处于性能瓶颈,通常的判断条件有如下几点:维度1:磁盘IO读写磁盘性能是kafka重要的参数指标,如果磁盘IO到达性能瓶颈会直接导致业务故障。Kafka读写性能跟磁盘I...

Debezium抽取SQL Server同步kafka

Debezium抽取SQL Server同步kafka

ebezium SQL Server连接器捕获SQL Server数据库模式中发生的行级更改。官方2.0文档:https://debezium.io/documentation/reference/2...

PromQL语法

PromQL语法

一、PromQL语法1.1、什么是PromQLPromQL(Prometheus Query Language)是 Prometheus 自己开发的表达式语言,语言表现力很丰富,内置函数也很多。使用它...

SQL Server优化入门系列(二)—— 等待事件

SQL Server优化入门系列(二)—— 等待事件

在上一篇文章中(SQL Server优化入门系列(一)——快速定位阻塞SQL),我们介绍了如何快速定位SQL Server中当前正在执行的SQL,以及被阻塞的SQL。这里,我们将介绍如何通过等待事件来...

Kubernetes 网络插件

Kubernetes 自身并不提供网络解决方案,允许托管使用第三方的网络解决方案。flannelcalicocanelkube-router......各种 CNI 插件的解决方案: 虚拟网桥(bri...

发表评论    

◎欢迎参与讨论,请在这里发表您的看法、交流您的观点。