MongoDB复制原理

太阳1年前 (2023-12-18)技术文章479

一、Initial Sync

大体来说，MongoDB副本集同步主要包含两个步骤： 1. Initial Sync，全量同步 2. Replication，即sync oplog 先通过init sync同步全量数据，再通过replication不断重放Primary上的oplog同步增量数据。全量同步完成后，成员从转换 STARTUP2为SECONDARY

1.1 初始化同步过程

1) 全量同步开始，获取同步源上的最新时间戳t1
2) 全量同步集合数据，建立索引（比较耗时）
3) 获取同步源上最新的时间戳t2
4) 重放t1到t2之间所有的oplog
5) 全量同步结束

简单来说，就是遍历Primary上的所有DB的所有集合，将数据拷贝到自身节点，然后读取全量同步开始到结束时间段内的oplog并重放。

initial sync结束后，Secondary会建立到Primary上local.oplog.rs的tailable cursor，不断从Primary上获取新写入的oplog，并应用到自身。

1.2 初始化同步场景

Secondary节点当出现如下状况时，需要先进⾏全量同步 1) oplog为空 2) local.replset.minvalid集合⾥_initialSyncFlag字段设置为true（用于init sync失败处理） 3) 内存标记initialSyncRequested设置为true（用于resync命令，resync命令只用于master/slave架构，副本集无法使用）

这3个场景分别对应(场景2和场景3没看到官网文档有写，参考张友东大神博客) 1) 新节点加⼊，⽆任何oplog，此时需先进性initial sync 2) initial sync开始时，会主动将_initialSyncFlag字段设置为true，正常结束后再设置为false；如果节点重启时，发现_initialSyncFlag为true，说明上次全量同步中途失败了，此时应该重新进⾏initial sync 3)当⽤户发送resync命令时，initialSyncRequested会设置为true，此时会强制重新开始⼀次initial sync

二、Replication

2.1 sync oplog的过程

全量同步结束后，Secondary就开始从结束时间点建立tailable cursor，不断的从同步源拉取oplog并重放应用到自身，这个过程并不是由一个线程来完成的，mongodb为了提升同步效率，将拉取oplog以及重放oplog分到了不同的线程来执行。具体线程和作用如下（这部分暂时没有在官方文档找到，来自张友东大神博客）：

producer thread：这个线程不断的从同步源上拉取oplog，并加入到一个BlockQueue的队列里保存着，BlockQueue最大存储240MB的oplog数据，当超过这个阈值时，就必须等到oplog被replBatcher消费掉才能继续拉取。
replBatcher thread：这个线程负责逐个从producer thread的队列里取出oplog，并放到自己维护的队列里，这个队列最多允许5000个元素，并且元素总大小不超过512MB，当队列满了时，就需要等待oplogApplication消费掉
oplogApplication会取出replBatch thread当前队列的所有元素，并将元素根据docId（如果存储引擎不支持文档锁，则根据集合名称）分散到不同的replWriter线程，replWriter线程将所有的oplog应用到自身；等待所有oplog都应用完毕，oplogApplication线程将所有的oplog顺序写入到local.oplog.rs集合。

2.3 注意事项

initial sync单线程复制数据，效率比较低，生产环境应该尽量避免initial sync出现，需合理配置oplog。
新加⼊节点时，可以通过物理复制的⽅式来避免initial sync，将Primary上的dbpath拷⻉到新的节点，然后直接启动。
当Secondary同步滞后是因为主上并发写入太高导致，db.serverStatus().metrics.repl.buffer的 sizeBytes值持续接近maxSizeBytes的时候，可通过调整Secondary上replWriter并发线程数来提升。

三、日志分析

3.1 初始化同步日志

将日志级别 verbosity设置为 1，然后过滤日志 cat mg36000.log |egrep "clone|index|oplog" >b.log 最后拿出过滤后的部分日志。 3.4.21新加入节点日志

因为日志太多，贴太多出来也没什么意义，下面贴出了对db01库的某个
集合的日志。
可以发现是先创建collection索引，然后clone集合数据和索引数据，这样就完成了该集合的clone。最后将配置改为下一个集合。

2019-08-21T16:50:10.880+0800 D STORAGE  [InitialSyncInserters-db01.test20] create uri: table:db01/index-27-154229953453504826 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "num" : 1 }, "name" : "num_1", "ns" : "db01.test2" }),
2019-08-21T16:50:10.882+0800 I INDEX    [InitialSyncInserters-db01.test20] build index on: db01.test2 properties: { v: 2, key: { num: 1.0 }, name: "num_1", ns: "db01.test2" }
2019-08-21T16:50:10.882+0800 I INDEX    [InitialSyncInserters-db01.test20]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-08-21T16:50:10.882+0800 D STORAGE  [InitialSyncInserters-db01.test20] create uri: table:db01/index-28-154229953453504826 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "db01.test2" }),
2019-08-21T16:50:10.886+0800 I INDEX    [InitialSyncInserters-db01.test20] build index on: db01.test2 properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "db01.test2" }
2019-08-21T16:50:10.886+0800 I INDEX    [InitialSyncInserters-db01.test20]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-08-21T16:50:10.901+0800 D INDEX    [InitialSyncInserters-db01.test20]      bulk commit starting for index: num_1
2019-08-21T16:50:10.906+0800 D INDEX    [InitialSyncInserters-db01.test20]      bulk commit starting for index: _id_
2019-08-21T16:50:10.913+0800 D REPL     [repl writer worker 11] collection clone finished: db01.test2
2019-08-21T16:50:10.913+0800 D REPL     [repl writer worker 11]     collection: db01.test2, stats: { ns: "db01.test2", documentsToCopy: 2000, documentsCopied: 2000, indexes: 2, fetchedBatches: 1, start: new Date(1566377410875), end: new Date(1566377410913), elapsedMillis: 38 }
2019-08-21T16:50:10.920+0800 D STORAGE  [InitialSyncInserters-db01.collection10] create uri: table:db01/index-30-154229953453504826 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "db01.collection1" }),

3.6.12加入新节点日志

3.6较3.4的区别是，复制数据库的线程明确了是：repl writer worker 进行重放（看文档其实3.4已经是如此了）
还有就是明确是用cursors来进行。
其他和3.4没有区别，也是创建索引，然后clone数据。

2019-08-22T13:59:39.444+0800 D STORAGE  [repl writer worker 9] create uri: table:db01/index-32-3334250984770678501 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "db01.collection1" }),log=(enabled=true)
2019-08-22T13:59:39.446+0800 I INDEX    [repl writer worker 9] build index on: db01.collection1 properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "db01.collection1" }
2019-08-22T13:59:39.446+0800 I INDEX    [repl writer worker 9]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-08-22T13:59:39.447+0800 D REPL     [replication-1] Collection cloner running with 1 cursors established.
2019-08-22T13:59:39.681+0800 D INDEX    [repl writer worker 7]      bulk commit starting for index: _id_
2019-08-22T13:59:39.725+0800 D REPL     [repl writer worker 7] collection clone finished: db01.collection1
2019-08-22T13:59:39.725+0800 D REPL     [repl writer worker 7]     database: db01, stats: { dbname: "db01", collections: 1, clonedCollections: 1, start: new Date(1566453579439), end: new Date(1566453579725), elapsedMillis: 286 }
2019-08-22T13:59:39.725+0800 D REPL     [repl writer worker 7]     collection: db01.collection1, stats: { ns: "db01.collection1", documentsToCopy: 50000, documentsCopied: 50000, indexes: 1, fetchedBatches: 1, start: new Date(1566453579440), end: new Date(1566453579725), elapsedMillis: 285 }
2019-08-22T13:59:39.731+0800 D STORAGE  [repl writer worker 8] create uri: table:test/index-34-3334250984770678501 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "test.user1" }),log=(enabled=true)

4.0.11加入新节点日志

使用cursors，和3.6基本一致

2019-08-22T15:02:13.806+0800 D STORAGE  [repl writer worker 15] create uri: table:db01/index-30--463691904336459055 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "num" : 1 }, "name" : "num_1", "ns" : "db01.collection1" }),log=(enabled=false)
2019-08-22T15:02:13.816+0800 I INDEX    [repl writer worker 15] build index on: db01.collection1 properties: { v: 2, key: { num: 1.0 }, name: "num_1", ns: "db01.collection1" }
2019-08-22T15:02:13.816+0800 I INDEX    [repl writer worker 15]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-08-22T15:02:13.816+0800 D STORAGE  [repl writer worker 15] create uri: table:db01/index-31--463691904336459055 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "db01.collection1" }),log=(enabled=false)
2019-08-22T15:02:13.819+0800 I INDEX    [repl writer worker 15] build index on: db01.collection1 properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "db01.collection1" }
2019-08-22T15:02:13.819+0800 I INDEX    [repl writer worker 15]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-08-22T15:02:13.820+0800 D REPL     [replication-0] Collection cloner running with 1 cursors established.

3.2 复制日志

2019-08-22T15:15:17.566+0800 D STORAGE  [repl writer worker 2] create collection db01.collection2 { uuid: UUID("8e61a14e-280c-4da7-ad8c-f6fd086d9481") }
2019-08-22T15:15:17.567+0800 I STORAGE  [repl writer worker 2] createCollection: db01.collection2 with provided UUID: 8e61a14e-280c-4da7-ad8c-f6fd086d9481
2019-08-22T15:15:17.567+0800 D STORAGE  [repl writer worker 2] stored meta data for db01.collection2 @ RecordId(22)
2019-08-22T15:15:17.580+0800 D STORAGE  [repl writer worker 2] db01.collection2: clearing plan cache - collection info cache reset
2019-08-22T15:15:17.580+0800 D STORAGE  [repl writer worker 2] create uri: table:db01/index-43--463691904336459055 config: type=file,internal_page_max=16k,leaf_page_max=16k,checksum=on,prefix_compression=true,block_compressor=,,,,key_format=u,value_format=u,app_metadata=(formatVersion=8,infoObj={ "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "db01.collection2" }),log=(enabled=false)

返回列表

上一篇：ReadConcern与WriteConcern

下一篇：副本集同步原理

MongoDB复制原理

一、Initial Sync

1.1 初始化同步过程

1.2 初始化同步场景

二、Replication

2.1 sync oplog的过程

2.3 注意事项

三、日志分析

3.1 初始化同步日志

3.2 复制日志

相关文章

通过Nodeport方式暴露集群

oracle手工完全恢复

HDP-Yarn开启CPU调度和隔离

trino组件对接ldap（二）

CDH实操--修改集群主机名

MySQL基本配置文件

发表评论

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号

MongoDB复制原理

一、Initial Sync

1.1 初始化同步过程

1.2 初始化同步场景

二、Replication

2.1 sync oplog的过程

2.3 注意事项

三、日志分析

3.1 初始化同步日志

3.2 复制日志

相关文章

通过Nodeport方式暴露集群

oracle手工完全恢复

HDP-Yarn开启CPU调度和隔离

trino组件对接ldap（二）

CDH实操--修改集群主机名

MySQL基本配置文件

发表评论 取消回复

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号var _hmt = _hmt || [];(function() { var hm = document.createElement("script"); hm.src = "https://hm.baidu.com/hm.js?dcf8139ce75b768b71dccc5e589b983c"; var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(hm, s);})();

发表评论

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号