ICode9

精准搜索请尝试: 精确搜索
首页 > 数据库> 文章详细

腾讯云TDSQL PostgreSQL版 -最佳实践 |优化 SQL 语句

2021-08-11 20:34:33  阅读:264  来源: 互联网

标签:rows PostgreSQL tbase Scan TDSQL 0.00 width cost SQL


查看是否为分布键查询
postgres=# explain select * from tbase_1 where f1=1;
QUERY PLAN

Remote Fast Query Execution (cost=0.00…0.00 rows=0 width=0)
Node/s: dn001, dn002
-> Gather (cost=1000.00…7827.20 rows=1 width=14)
Workers Planned: 2
-> Parallel Seq Scan on tbase_1 (cost=0.00…6827.10 rows=1 width=14)
Filter: (f1 = 1)
(6 rows)
postgres=# explain select * from tbase_1 where f2=1;
QUERY PLAN

Remote Fast Query Execution (cost=0.00…0.00 rows=0 width=0)
Node/s: dn001
-> Gather (cost=1000.00…7827.20 rows=1 width=14)
Workers Planned: 2
-> Parallel Seq Scan on tbase_1 (cost=0.00…6827.10 rows=1 width=14)
Filter: (f2 = 1)
(6 rows)
如上,第一个查询为非分布键查询,需要发往所有节点,这样最慢的节点决定了整个业务的速度,需要保持所有节点的响应性能一致,如第二个查询所示,业务设计查询时尽可能带上分布键。

查看是否使用索引
postgres=# create index tbase_2_f2_idx on tbase_2(f2);
CREATE INDEX
postgres=# explain select * from tbase_2 where f2=1;
QUERY PLAN

Remote Fast Query Execution (cost=0.00…0.00 rows=0 width=0)
Node/s: dn001, dn002
-> Index Scan using tbase_2_f2_idx on tbase_2 (cost=0.42…4.44 rows=1 width=14)
Index Cond: (f2 = 1)
(4 rows)
postgres=# explain select * from tbase_2 where f3=‘1’;
QUERY PLAN

Remote Fast Query Execution (cost=0.00…0.00 rows=0 width=0)
Node/s: dn001, dn002
-> Gather (cost=1000.00…7827.20 rows=1 width=14)
Workers Planned: 2
-> Parallel Seq Scan on tbase_2 (cost=0.00…6827.10 rows=1 width=14)
Filter: (f3 = ‘1’::text)
(6 rows)
postgres=#
第一个查询使用了索引,第二个没有使用索引,通常情况下,使用索引可以加速查询速度,但索引也会增加更新的开销。

查看是否为分布 key join
postgres=# explain select tbase_1.* from tbase_1,tbase_2 where tbase_1.f1=tbase_2.f1 ;
QUERY PLAN

Remote Subquery Scan on all (dn001,dn002) (cost=29.80…186.32 rows=3872 width=40)
-> Hash Join (cost=29.80…186.32 rows=3872 width=40)
Hash Cond: (tbase_1.f1 = tbase_2.f1)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.00…158.40 rows=880 width=40)
Distribute results by S: f1
-> Seq Scan on tbase_1 (cost=0.00…18.80 rows=880 width=40)
-> Hash (cost=18.80…18.80 rows=880 width=4)
-> Seq Scan on tbase_2 (cost=0.00…18.80 rows=880 width=4)
(8 rows)
postgres=# explain select tbase_1.* from tbase_1,tbase_2 where tbase_1.f2=tbase_2.f1 ;
QUERY PLAN

Remote Fast Query Execution (cost=0.00…0.00 rows=0 width=0)
Node/s: dn001, dn002
-> Hash Join (cost=18904.69…46257.08 rows=500564 width=14)
Hash Cond: (tbase_1.f2 = tbase_2.f1)
-> Seq Scan on tbase_1 (cost=0.00…9225.64 rows=500564 width=14)
-> Hash (cost=9225.64…9225.64 rows=500564 width=4)
-> Seq Scan on tbase_2 (cost=0.00…9225.64 rows=500564 width=4)
(7 rows)
第一个查询需要数据重分布,而第二个不需要,分布键 join 查询性能会更高。

查看 join 发生的节点
postgres=# explain select tbase_1.* from tbase_1,tbase_2 where tbase_1.f1=tbase_2.f1 ;
QUERY PLAN

Hash Join (cost=29.80…186.32 rows=3872 width=40)
Hash Cond: (tbase_1.f1 = tbase_2.f1)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.00…158.40 rows=880 width=40)
-> Seq Scan on tbase_1 (cost=0.00…18.80 rows=880 width=40)
-> Hash (cost=126.72…126.72 rows=880 width=4)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.00…126.72 rows=880 width=4)
-> Seq Scan on tbase_2 (cost=0.00…18.80 rows=880 width=4)
(7 rows)
postgres=# set prefer_olap to on;
SET
postgres=# explain select tbase_1.* from tbase_1,tbase_2 where tbase_1.f1=tbase_2.f1 ;
QUERY PLAN

Remote Subquery Scan on all (dn001,dn002) (cost=29.80…186.32 rows=3872 width=40)
-> Hash Join (cost=29.80…186.32 rows=3872 width=40)
Hash Cond: (tbase_1.f1 = tbase_2.f1)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.00…158.40 rows=880 width=40)
Distribute results by S: f1
-> Seq Scan on tbase_1 (cost=0.00…18.80 rows=880 width=40)
-> Hash (cost=18.80…18.80 rows=880 width=4)
-> Seq Scan on tbase_2 (cost=0.00…18.80 rows=880 width=4)
(8 rows)
第一个 join 在 cn 节点执行,第二个在 dn 上重分布后再 join,业务设计上,一般 OLTP 类业务在 cn 上进行少数据量 join ,性能会更好。

查看并行的 worker 数
postgres=# explain select count(1) from tbase_1;
QUERY PLAN

Finalize Aggregate (cost=118.81…118.83 rows=1 width=8)
-> Remote Subquery Scan on all (dn001,dn002) (cost=118.80…118.81 rows=1 width=0)
-> Partial Aggregate (cost=18.80…18.81 rows=1 width=8)
-> Seq Scan on tbase_1 (cost=0.00…18.80 rows=880 width=0)
(4 rows)
postgres=# analyze tbase_1;
ANALYZE
postgres=# explain select count(1) from tbase_1;
QUERY PLAN

Parallel Finalize Aggregate (cost=14728.45…14728.46 rows=1 width=8)
-> Parallel Remote Subquery Scan on all (dn001,dn002) (cost=14728.33…14728.45 rows=1 width=0)
-> Gather (cost=14628.33…14628.44 rows=1 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=13628.33…13628.34 rows=1 width=8)
-> Parallel Seq Scan on tbase_1 (cost=0.00…12586.67 rows=416667 width=0)
(6 rows)
上面第一个查询没走并行,第二个查询 analyze 后走并行才是正确的,建议大数据量更新再执行 analyze。

查看各节点的执行计划是否一致
./tbase_run_sql_dn_master.sh “explain select * from tbase_2 where f2=1”
dn006 — psql -h 172.16.0.13 -p 11227 -d postgres -U tbase -c “explain select * from tbase_2 where f2=1”
QUERY PLAN

Bitmap Heap Scan on tbase_2 (cost=2.18…7.70 rows=4 width=40)
Recheck Cond: (f2 = 1)
-> Bitmap Index Scan on tbase_2_f2_idx (cost=0.00…2.18 rows=4 width=0)
Index Cond: (f2 = 1)
(4 rows)
dn002 — psql -h 172.16.0.42 -p 11012 -d postgres -U tbase -c “explain select * from tbase_2 where f2=1”
QUERY PLAN

Index Scan using tbase_2_f2_idx on tbase_2 (cost=0.42…4.44 rows=1 width=14)
Index Cond: (f2 = 1)
(2 rows)
两个 dn 的执行计划不一致,最大可能是数据倾斜或者是执行计划被禁用。
如有可能,DBA 可以配置在系统空闲时执行全库 analyze 和 vacuum。

标签:rows,PostgreSQL,tbase,Scan,TDSQL,0.00,width,cost,SQL
来源: https://www.cnblogs.com/tencentdb/p/15130104.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有