Knob Tuning (for China Mobile)

QTune for MySQL (Co-contributors: Jiesi Liu, Jianming Wu, Guoliang Li) [Reference]

We tested QTune for a banking scenario, where most workloads are OLTP. Besides, there may be bottlenecks like IO, concurrent connections, and data update.

Tuning Information

1. Hardware Settings
2 core 4 GiB ESSD PL0 80GiB (2760 IOPS)
Mysql Ver 14.14 Distrib 5.7.34, for Linux (x86_64) using EditLine wrapper
‌2. Workload (OLTP: sysbench; data: 34.9 GiB)
T1: SELECT c FROM sbtest3 WHERE id=3624874;
T2: SELECT c FROM sbtest2 WHERE id BETWEEN 1317306 AND 1317315 ORDER BY c;
T3: SELECT c FROM sbtest2 WHERE id BETWEEN 2488565 AND 2488574;
The workload is CPU intensive‌
(max_connections = 1)
avg_qps(queries/s): 595.2381
avg_lat(s): 0.001610
(max_connections = 1000)
avg_qps(queries/s): 900.9009
avg_lat(s): 1.0936
‌We can see 1 or 1000 are not the optimal knob values if we consider both throughput and latency.
3. Default knob values (mainly list part of the CPU and IO (CPU-consuming) related knobs)

Effectiveness Evaluation

(1) The relations between workload performance with max_connections
Latency gets worse with the increase of max_connections; while throughput first gets better (at around 5) and turns to be steady afterward, which is generally more important for OLTP workloads. But we also need to balance between latency and throughput based on the SLA requirements.

Tuning Results

(1) First, consider the initial knob setting as "max_connections = 1" (串行场景), we set the target as "stopping_throughput_improvement_percentage = 0.52". However, the aim seems to be too "cautious" and it only takes one turn to randomly find the settings.‌
Hence, we do not set the performance threshold and see how much performance QTune can improve within 500 iterations (numtrial=500). The results show that the performance does not change much after max_connectionsis larger than 400. And there are two observations:‌
  1. 1.
    max_connections have significant effects to the query performance;
  2. 2.
    QTune may mainly change max_connections value and ignore the other optimal values for the other knobs
‌(2) Then, we compare the performance under default settings and do not change the value of max_connections(max_connections = 151). ‌ The following are performance changes in 50 iterations.

Methodology behind QTune

@ Tuning Model (Double-State Deep Deterministic Policy Gradient)‌
@ Train the Actor-Model and Critic-Model‌
基于偏导的链式法则,计算actor网络的梯度:给定Observation S' = S + A 的情况下,要增加Q值,我们计算action对Q的梯度,再计算theta对action的梯度,最后乘起来就是theta对Q的梯度。用这个来向增加Q值的方向移动theta。


Q: "SQL Error 1040: Too Many Connection"
‌A: Limit to the max_connection parameter
def execute_command(self):
if self.parser.argus['thread_num_auto'] == '0':
thread_num = int(self.parser.argus['thread_num'])
thread_num = int(self.db.max_connections_num) - 1
run_job(thread_num, self.workload, self.parser.resfile)
Q: connection resources cannot be fully used
A: update the max_connection_num before executing the workload
def max_connections(self):
# if not self.max_connections_num:
if 1:
conn = self._get_conn()
cursor = conn.cursor()
sql = "show global variables like 'max_connections';"
self.max_connections_num = int(cursor.fetchone()["Value"])
except Exception as error:
return self.max_connections_num
‌Q: mysql execute: (2003, "Can't connect to MySQL server on '' ([Errno 111] Connection refused)")
‌A: Restart Database
systemctl restart mysqld.service
Q:Check Database Size
‌A: SELECT table_schema AS "sysbench", ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS "Size in (MB)" FROM information_schema.TABLES GROUP BY table_schema;
Q: 【关于不同benchmark之间迁移模型】在不同benchmark之间迁移模型时(TPC-H --> JOB, JOB --> Sysbench),请问是迁移Agent 和 Predictor 共同组成的DS-DDPG model,抑或是只迁移Agent的部分呢?
A: 整体transfer的。对于predictor,我们需要它具有适应不同[workload, db state]的能力。在换到新的benchmark时,只需要将关系表位替换成当前dataset就可以了。
Q: 【关于Agent训练】论文中提及,在训练Agent的时候采用了三元组,其中,其中是运行query之前环境中的真实值, 由Predictor预测产生。那么,相较于用真实运行query之后的状态来训练,为什么采用Predictor的预测值来训练Agent呢?
A: 为了体现查询的代价特征。预测的状态s’中的\Delta s体现了执行当前负载的开销,这能供agent学习执行当前负载的需求(如work_mem_size, IO frequence),从而提前调好参数。CDBTune中只用到了当前的真实s,导致新来负载的差别很大时,表现劣化。
Q:【关于PostgreSQL上cluster-level实验】从Figure 8中我注意到cluster-level tuning方法应用在了TPC-H与JOB benchmark 上,但是没有在Sysbench上测试;我猜测cluster-level应用场景主要适合于OLAP等操作复杂的场景,不知道这一理解是否正确
A: 是的,主要针对慢查询。clustering算法有执行开销,不是很适合OLTP负载,如OLTP中很多查询都在1ms内就执行完了。参数聚合这一块作为补充部分没有在paper中细化,感觉可以做进一步研究。
Q: query featurization部分提及用table行数代替query information 中的"1"; 考虑到 insert/delete/update操作可能改变table的行数,那么在执行每一条query之前,都需要查询表的行数吗?
A: 这是一个很好的问题。因为row number不会在系统视图中存储(如pg),而且表很大时count开销较大。但是由于我们不需要精确数字,利用事务统计的语句就可以做规模估计(