MySQL优化器特性（六）表扫描成本计算

俊达2年前 (2023-05-19)技术文章1020

全表扫描成本

使用optimizer_trace，或者使用explain format=tree, 或者explain format=json，可以查看查询的cost

mysql> explain format=tree select * from test_ror;
+---------------------------------------------------+
| EXPLAIN                                           |
+---------------------------------------------------+
| -> Table scan on test_ror  (cost=10.75 rows=105)
 |
+---------------------------------------------------+
1 row in set (0.00 sec)

使用explain format=json可以看到更详细的数据

mysql> explain format=json select * from test_ror;

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10.75"
    },
    "table": {
      "table_name": "test_ror",
      "access_type": "ALL",
      "rows_examined_per_scan": 105,
      "rows_produced_per_join": 105,
      "filtered": "100.00",
      "cost_info": {
        "read_cost": "0.25",
        "eval_cost": "10.50",
        "prefix_cost": "10.75",
        "data_read_per_join": "2K"
      },
      "used_columns": [
        "id",
        "a",
        "b",
        "c"
      ]
    }
  }
}

从上面的例子可以看到，查询总cost为10.75，其中包括read_cost 0.25, eval_cost 10.50。

我们来分析下这里的成本分别是怎么计算的。

全表扫描成本

全表扫描成本计算代码位于sql/sql_planner.cc文件，函数Optimize_table_order::best_access_path中计算。

    double scan_read_cost = calculate_scan_cost(
        tab, idx, best_ref, prefix_rowcount, found_condition, disable_jbuf,
        &rows_after_filtering, &trace_access_scan);

    /*
      We estimate the cost of evaluating WHERE clause for found
      records as row_evaluate_cost(prefix_rowcount * rows_after_filtering).
      This cost plus scan_cost gives us total cost of using
      TABLE/INDEX/RANGE SCAN.
    */
    const double scan_total_cost =
        scan_read_cost +
        cost_model->row_evaluate_cost(prefix_rowcount * rows_after_filtering);

成本分为两部分：

扫描表的IO成本（table_scan_cost)
读取记录、分析记录的成本(row_evaluate_cost)

需要注意，这里的全表扫描成本，是在没有其他访问路径可选的情况下计算得到的成本。优化器在评估索引范围扫描的成本时，也会先评估一下全表扫描的成本，这时计算全表扫描的成本公式和本文的描述会有一些差异。

table_scan_cost

Cost_estimate handler::table_scan_cost() {
  /*
    This function returns a Cost_estimate object. The function should be
    implemented in a way that allows the compiler to use "return value
    optimization" to avoid creating the temporary object for the return value
    and use of the copy constructor.
  */

  const double io_cost = scan_time() * table->cost_model()->page_read_cost(1.0);
  Cost_estimate cost;
  cost.add_io(io_cost);
  return cost;
}

scan_time

对于innodb引擎，scan_time取primary key(cluster index)的大小（单位是page）。

double ha_innobase::scan_time() {
  ulint stat_clustered_index_size;
  stat_clustered_index_size = m_prebuilt->table->stat_clustered_index_size;
  return ((double)stat_clustered_index_size);
}

这个信息最终从统计信息中获取：

mysql> select * from mysql.innodb_table_stats where database_name='test' and table_name='test_ror';
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update         | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| test          | test_ror   | 2023-05-06 15:07:16 |    105 |                    1 |                        3 |
+---------------+------------+---------------------+--------+----------------------+--------------------------+

page_read_cost

page_read_cost会基于表在内存中的缓存比例来计算读取数据块的成本。mysql会评估表的缓存比例，分别计算内存中的块的数量以及磁盘中的块的数据量，

pages_in_mem = pages * in_mem

pages_on_disk = pages - pages_in_mem

cost = pages_in_mem * memory_block_read_cost + pages_on_disk * io_block_read_cost

其中memory_block_read_cost和io_block_read_cost分别代表从内存和磁盘上读取一个页面的成本，这两个值可以在mysql.engine_cost中配置。8.0版本中，memory_block_read_cost默认为0.25，io_block_read_cost默认为1。

page_read_cost代码：

double Cost_model_table::page_read_cost(double pages) const {
  assert(m_initialized);
  assert(pages >= 0.0);

  // 表cache在内存中的比例，范围是0 - 1，0表示所有数据都不在内存中。1表示所有数据都缓存在内存中
  const double in_mem = m_table->file->table_in_memory_estimate();

  const double pages_in_mem = pages * in_mem;
  const double pages_on_disk = pages - pages_in_mem;
  assert(pages_on_disk >= 0.0);

  // buffer_blaock_read_cost：从内存中读取page的成本
  // io_block_read_cost：从磁盘上读取page的成本
  const double cost =
      buffer_block_read_cost(pages_in_mem) + io_block_read_cost(pages_on_disk);

  return cost;
}

table_in_memory_estimate：计算表和索引数据缓存率

对于InnoDB，MySQL会维护每个索引缓存在内存中的Page数，可以基于这个数据计算出数据的缓存率。如果存储引擎没用提供缓存率数据（如MyISAM存储引擎），mysql会使用一个经验公式来预估数据的缓存率，具体逻辑在函数estimate_in_memory_buffer中实现。

double handler::table_in_memory_estimate() const {
  assert(stats.table_in_mem_estimate == IN_MEMORY_ESTIMATE_UNKNOWN ||
         (stats.table_in_mem_estimate >= 0.0 &&
          stats.table_in_mem_estimate <= 1.0));

  /*
    If the storage engine has supplied information about how much of the
    table that is currently in a memory buffer, then use this estimate.
  */
  if (stats.table_in_mem_estimate != IN_MEMORY_ESTIMATE_UNKNOWN)
    return stats.table_in_mem_estimate;

  /*
    The storage engine has not provided any information about how much of
    this index is in memory, use an heuristic to produce an estimate.
  */
  return estimate_in_memory_buffer(stats.data_file_length);
}

estimate_in_memory_buffer

根据表的大小和内存大小计算表的缓存率。对于InnoDB，内存大小是指InnoDB Buffer Pool的大小。

经验公式：如果表占用的空间不到内存的20%，则认为所有数据都缓存在内存中（缓存率为1）。如果表占用的空间超过内存的大小，则认为缓存率为0。介于两者之间的表，缓存率为 ( 1 - x ) / 0.8， 0.2 < x < 1，x为表的大小和内存大小的比值。

double handler::estimate_in_memory_buffer(ulonglong table_index_size) const {
  /*
    The storage engine has not provided any information about how much of
    the table/index is in memory. In this case we use a heuristic:

    - if the size of the table/index is less than 20 percent (pick any
      number) of the memory buffer, then the entire table/index is likely in
      memory.
    - if the size of the table/index is larger than the memory buffer, then
      assume nothing of the table/index is in memory.
    - if the size of the table/index is larger than 20 percent but less than
      the memory buffer size, then use a linear function of the table/index
      size that goes from 1.0 to 0.0.
  */

  longlong memory_buf_size = get_memory_buffer_size();
  if (memory_buf_size <= 0) memory_buf_size = 100 * 1024 * 1024;  // 100 MB

  const double table_index_in_memory_limit = 0.2;

  const double percent_of_mem =
      static_cast<double>(table_index_size) / memory_buf_size;

  double in_mem_est;

  if (percent_of_mem < table_index_in_memory_limit)  // Less than 20 percent
    in_mem_est = 1.0;
  else if (percent_of_mem > 1.0)  // Larger than buffer
    in_mem_est = 0.0;
  else {
    in_mem_est = 1.0 - (percent_of_mem - table_index_in_memory_limit) /
                           (1.0 - table_index_in_memory_limit);
  }
  return in_mem_est;
}

InnoDB表和索引缓存率的计算

InnoDB维护了每个索引在buffer pool中缓存的页面数，基于这个统计数据，可以计算InnoDB索引的缓存率。

InnoDB表的缓存率为主键的缓存率。（如果InnoDB没有建立显式主键，则好像不会记录表的缓存率，这种情况下会使用经验公式来预估缓存率）。

变量buf_stat_per_index记录了每个索引缓存在buffer pool中的页面数。

/** Estimate what percentage of an index's pages are cached in the buffer pool
@param[in]      index   index whose pages to look up
@return a real number in [0.0, 1.0] designating the percentage of cached pages
*/
inline double index_pct_cached(const dict_index_t *index) {
const ulint n_leaf = index->stat_n_leaf_pages;
if (n_leaf == 0) {
return (0.0);
}
const uint64_t n_in_mem =
buf_stat_per_index->get(index_id_t(index->space, index->id));
const double ratio = static_cast<double>(n_in_mem) / n_leaf;
return (std::max(std::min(ratio, 1.0), 0.0));
}

row_evaluate_cost

row_evaluate_cost计算访问行的成本，计算公式为 rows * row_evaluate_cost，row_evaluate_cost的值可以在mysql.server_cost表配置，8.0版本中，这个值默认为0.1，也就是访问一行数据的成本是0.1。

  double row_evaluate_cost(double rows) const {
    return rows * m_server_cost_constants->row_evaluate_cost();
  }

回到本文开始的例子，这个表总共有105行记录，cluster index占用了1个页面，数据全部缓存在内存中，总体成本计算：

io_cost = cluster_index_size * 0.25

row_eval_cost = rows * 0.1 = 10.5

cost = io_cost + row_eval_cost = 0.25 + 10.5 = 10.75

标签: MySQL运维实战 MySQL优化器

返回列表

上一篇：查看 Redis 不过期 key

下一篇：MySQL优化器特性（七）成本估算常数

MySQL优化器特性（六）表扫描成本计算

全表扫描成本

全表扫描成本

table_scan_cost

scan_time

page_read_cost

table_in_memory_estimate：计算表和索引数据缓存率

estimate_in_memory_buffer

InnoDB表和索引缓存率的计算

row_evaluate_cost

相关文章

MySQL优化器特性（二）MRR优化

MySQL优化器特性（一）IN和Exists(semijoin)子查询优化策略

MySQL优化器特性(五）单表访问路径

MySQL优化器特性（九）行数评估

MySQL优化器特性（七）成本估算常数

MySQL优化器特性（四）表关联之BNL（Block Nested Loop）和Hash Join

发表评论

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号

MySQL优化器特性（六）表扫描成本计算

全表扫描成本

全表扫描成本

table_scan_cost

scan_time

page_read_cost

table_in_memory_estimate：计算表和索引数据缓存率

estimate_in_memory_buffer

InnoDB表和索引缓存率的计算

row_evaluate_cost

相关文章

MySQL优化器特性（二）MRR优化

MySQL优化器特性（一）IN和Exists(semijoin)子查询优化策略

MySQL优化器特性(五）单表访问路径

MySQL优化器特性（九）行数评估

MySQL优化器特性（七）成本估算常数

MySQL优化器特性（四）表关联之BNL（Block Nested Loop）和Hash Join

发表评论 取消回复

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号var _hmt = _hmt || [];(function() { var hm = document.createElement("script"); hm.src = "https://hm.baidu.com/hm.js?dcf8139ce75b768b71dccc5e589b983c"; var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(hm, s);})();

发表评论

©Copyrights 2016-2022 YUNCHE 浙ICP备2021017017号