首页 > 数据库> 文章详细

Postgresql 表连接方法介绍(和Oracle对比测试)

2021-11-09 09:58:22 阅读：198 来源： 互联网

标签：city rows Postgresql .. country 测试 Oracle id select

作者：杨云龙，瀚高PG实验室核心成员，数据库高级工程师，擅长HGDB、Post-greSQL、Oracle等主流数据库。

表连接方式方法介绍(Nested Loop/Hash Join/Merge Join/Join) 数据库版本(oracle11.2.0.4 and PostgreSQL 13.1)

环境构造
Nested Loop
- Oracle 执行举例
- Postgresql 执行举例
Hash Join
- Oracle举例
- Postgresql举例
Merge Join
- Oracle举例
- Postgresql举例
三种连接方法的优劣
Join连接类型
union 介绍
自连接
参考文档

环境构造

-- oracle11.2.0.4
--城市、国家
drop table country purge;
CREATE TABLE country (
country_id int primary key,
country_name VARCHAR(50) NOT NULL
);

drop table city purge;
CREATE TABLE city (
city_id int primary key,
city_name VARCHAR(50) NOT NULL,
country_id int NOT NULL
);

begin
for i in 1 .. 10 loop
insert into country values(i,'country'||i);
end loop;
commit;
end;

begin
for i in 1 .. 10000 loop
insert into city values(i,'city'||i,ceil(i/1000));
end loop;
commit;
end;

execute dbms_stats.gather_table_stats(ownname =>'SCOTT',tabname => 'CITY' ,estimate_percent=> 100 ,cascade => true);
execute dbms_stats.gather_table_stats(ownname =>'SCOTT',tabname => 'COUNTRY' ,estimate_percent=> 100 ,cascade => true);

-- PostgreSQL 13.1
drop table country;
CREATE TABLE country (
country_id integer primary key,
country_name text NOT NULL
);

drop table city purge;
CREATE TABLE city (
city_id integer primary key,
city_name text NOT NULL,
country_id integer NOT NULL
);

insert into country values (generate_series(1,10),'country'||generate_series(1,10));

insert into city values(generate_series(1,10000),'city'||generate_series(1,10000),ceil(random()*(10-1)+1));

analyze city;
analyze country;

Nested Loop

如上图所示，为Nested Loop方式介绍。其算法：驱动表返回一行数据，通过连接列传值给被驱动表，驱动表返回多少行，被驱动表就要被扫描多少次。 被驱动表索引方式索引唯一扫描或者范围扫描。在被驱动表数据子集较少的情况下，嵌套循环是比较好的选择，也就是适合结果集比较小的查询，通常超过10000行被认为大行，会变成低效。举例如下：

Oracle 执行举例

SQL> explain plan for select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id=99;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2738185913

---------------------------------------------------------------------------------------------
| Id  | Operation		     | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	     |		    |	  1 |	 93 |	  2   (0)| 00:00:01 |
|   1 |  NESTED LOOPS		     |		    |	  1 |	 93 |	  2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CITY	    |	  1 |	 53 |	  1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN	     | SYS_C0018242 |	  1 |	    |	  1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| COUNTRY	    |	  1 |	 40 |	  1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN	     | SYS_C0018239 |	  1 |	    |	  0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CITY"."CITY_ID"=99)
   5 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

18 rows selected.

语句对比

--下面方式 oracle会选择 hash  全表扫描
explain plan for select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id>=10;

--强制使用nested loop 方式
explain plan for select /*+ leading(city) use_nl(country) */ city_name,country_name from city,country
 where city.country_id=country.country_id and city.city_id>=10;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 103883790

---------------------------------------------------------------------------------------------
| Id  | Operation		     | Name	    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|   1 |  NESTED LOOPS		     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|   2 |   NESTED LOOPS		     |		    |  9991 |	907K| 10005   (1)| 00:02:01 |
|*  3 |    TABLE ACCESS FULL	     | CITY	    |  9991 |	517K|	 11   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN	     | SYS_C0018239 |	  1 |	    |	  0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| COUNTRY	    |	  1 |	 40 |	  1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("CITY"."CITY_ID">=10)
   4 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

22 rows selected.
--由上所示，被驱动别执行了 9991次，也就是（10000-9），资源消耗明显上升

Postgresql 执行举例

mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id=99;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.44..16.49 rows=1 width=40) (actual time=0.031..0.036 rows=1 loops=1)
   ->  Index Scan using city_pkey on city  (cost=0.29..8.30 rows=1 width=12) (actual time=0.015..0.017 rows=1 loops=1)
         Index Cond: (city_id = 99)
   ->  Index Scan using country_pkey on country  (cost=0.15..8.17 rows=1 width=36) (actual time=0.006..0.006 rows=1 loops=1)
         Index Cond: (country_id = city.country_id)
 Planning Time: 0.621 ms
 Execution Time: 0.098 ms
(7 rows)

--返回多行，强制使用nested loop,比默认选择hash join 多30ms
mydb=# set enable_hashjoin=off;
SET
mydb=# set enable_mergejoin=off;
SET
mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id and city.city_id>=10;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.15..1934.29 rows=9991 width=40) (actual time=0.050..43.354 rows=9991 loops=1)
   ->  Seq Scan on city  (cost=0.00..188.00 rows=9991 width=12) (actual time=0.025..4.870 rows=9991 loops=1)
         Filter: (city_id >= 10)
         Rows Removed by Filter: 9
   ->  Index Scan using country_pkey on country  (cost=0.15..0.17 rows=1 width=36) (actual time=0.002..0.002 rows=1 loops=9991)
         Index Cond: (country_id = city.country_id)
 Planning Time: 0.393 ms
 Execution Time: 45.366 ms
(8 rows)

PG Nested loop说明

Postgresql使用嵌套循环执行查询，那么它可以通过迭代表1中的所有条目、迭代表2中的所有条目，然后在表1和表2中的行对满足筛选条件时发出一行。嵌套循环是Postgresql唯一可以用来处理任何连接的连接算法。例如下面语句，Oracle会选择Merge Join,Postgresql则选择 Nested Loop,详细可参考 Merge Join部分。

--即使数据量较大情况，以下语句pg还会选择Nested Loop
explain  select a.city_name,b.city_id from city a,city2 b where a.country_id<b.country_id;

Hash Join

两表关联时主要依靠哈希运算来得到结果集的表连接方式，只支持等值。算法：两表等值连接，返回大量数据，较少的表选为驱动表，将驱动表相关列读入PGA中的work area(PG则放入内存work_mem),然后对驱动表的连接列进行hash运算生成hash table，然后读取被驱动表并对关联列进行hash运算，然后到pga探测hash table，找到数据关联上。如果HASH表太大，无法一次构造在内存中，则分成若干个partition，写入磁盘的temporary segment，则会多一个写的代价，会降低效率

hash Loop

上图为Hash大概流程，如果统计信息等都准确，数据库会自动选择最好执行计划。当使用ORDERED提示时,FROM子句中的第一张表将用于建立哈希表。

Oracle举例

-- country 作为驱动表,不管使用那种join(left/right/full)方式，Oracle都会选择小表为驱动表。
SQL> explain plan for select city_name,country_name from city,country where city.country_id=country.country_id;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 114462077

------------------------------------------------------------------------------
| Id  | Operation	   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	     | 10000 |	 781K|	  14   (0)| 00:00:01 |
|*  1 |  HASH JOIN	   |	     | 10000 |	 781K|	  14   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| COUNTRY |	  10 |	 400 |	   3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| CITY    | 10000 |	 390K|	  11   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("CITY"."COUNTRY_ID"="COUNTRY"."COUNTRY_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

19 rows selected.

Postgresql举例

mydb=# explain analyze select city_name,country_name from city,country where city.country_id=country.country_id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=38.58..227.91 rows=10000 width=40) (actual time=0.185..12.086 rows=10000 loops=1)
   Hash Cond: (city.country_id = country.country_id)
   ->  Seq Scan on city  (cost=0.00..163.00 rows=10000 width=12) (actual time=0.026..3.735 rows=10000 loops=1)
   ->  Hash  (cost=22.70..22.70 rows=1270 width=36) (actual time=0.064..0.067 rows=10 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on country  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.025..0.031 rows=10 loops=1)
 Planning Time: 1.953 ms
 Execution Time: 13.983 ms
(8 rows)

为了可以使用小表建立hash table，优化器会转换,如 right join 被改为Left Join

mydb=# explain analyze select * from country right join city on city.country_id=country.country_id;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=38.58..227.91 rows=10000 width=52) (actual time=0.087..11.883 rows=10000 loops=1)
   Hash Cond: (city.country_id = country.country_id)
   ->  Seq Scan on city  (cost=0.00..163.00 rows=10000 width=16) (actual time=0.020..3.035 rows=10000 loops=1)
   ->  Hash  (cost=22.70..22.70 rows=1270 width=36) (actual time=0.044..0.047 rows=10 loops=1)
         Buckets: 2048  Batches: 1  Memory Usage: 17kB
         ->  Seq Scan on country  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.024..0.028 rows=10 loops=1)
 Planning Time: 0.413 ms
 Execution Time: 13.588 ms
(8 rows)

Merge Join

两张表在做连接时用排序操作和合并操作来得到结果集的连接方式。排序主要处理非等值管理。算法：先对两张表根据连接列各自进行排序，嵌套循环是从被驱动表的索引中匹配数据，排序合并连接是从内存(PGA中的work area)中匹配数据，严格来说没有驱动表，可以认为较少表作为驱动表。HASH JOIN只需要把驱动表放入PGA中，但是排序连接合并需要将两表结果集均放入PGA中

Merge Loop

上图为Merge大概流程，如果统计信息等都准确，数据库会自动选择最好执行计划。

Oracle举例

col PLAN_TABLE_OUTPUT for a100
set lines 200 pages 999
explain plan for select a.city_name,b.country_name from city a,country b where a.country_id<b.country_id;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1026867539

------------------------------------------------------------------------------------------------------
| Id  | Operation		      | Name	     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	      | 	     | 45000 |	1054K|	     |	  62   (2)| 00:00:01 |
|   1 |  MERGE JOIN		      | 	     | 45000 |	1054K|	     |	  62   (2)| 00:00:01 |
|   2 |   SORT JOIN		      | 	     |	  10 |	 120 |	     |	   2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| COUNTRY      |	  10 |	 120 |	     |	   2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN	      | SYS_C0018239 |	  10 |	     |	     |	   1   (0)| 00:00:01 |
|*  5 |   SORT JOIN		      | 	     | 10000 |	 117K|	 408K|	  60   (2)| 00:00:01 |
|   6 |    TABLE ACCESS FULL	      | CITY	     | 10000 |	 117K|	     |	  11   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(INTERNAL_FUNCTION("A"."COUNTRY_ID")<INTERNAL_FUNCTION("B"."COUNTRY_ID"))
       filter(INTERNAL_FUNCTION("A"."COUNTRY_ID")<INTERNAL_FUNCTION("B"."COUNTRY_ID"))

19 rows selected.

Postgresql举例

merge join需要两个JOIN的表的KEY都是先排好顺序的，有索引的话没有排序过程，merge join的好处是两个表都只扫描一次. 而nested loop的话其中一个表扫描一次, 另一个表则循环多次.

Postgresql中，Merge Join只适用于自然连接和等值连接，如下例所示，即使将enable_nestloop关闭，pg在此种连接方式时都会转变成Nested Loop，且初始成本估算很大。由此看出，Postgresql与Oracle 机制有区别。

mydb=# analyze city;
ANALYZE
mydb=# analyze country;
ANALYZE
mydb=# set enable_nestloop=off;
SET
mydb=#  explain analyze select a.city_name,b.country_name from city a,country b where a.country_id<b.country_id;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=10000000000.00..10000016637.12 rows=333333 width=18) (actual time=0.104..419.608 rows=400687 loops=1)
   Join Filter: (a.country_id < b.country_id)
   Rows Removed by Join Filter: 599313
   ->  Seq Scan on city a  (cost=0.00..1636.00 rows=100000 width=13) (actual time=0.066..20.058 rows=100000 loops=1)
   ->  Materialize  (cost=0.00..1.15 rows=10 width=13) (actual time=0.000..0.001 rows=10 loops=100000)
         ->  Seq Scan on country b  (cost=0.00..1.10 rows=10 width=13) (actual time=0.014..0.019 rows=10 loops=1)
 Planning Time: 0.649 ms
 Execution Time: 457.499 ms
(8 rows)

hash join 只有当在内存中执行是最快连接方式，当pg认为hash连接所需超出内存时，会选择Merge join，Merge Join可以更有效使用磁盘。Postgresql中，Merge Join仅适用于条件为相等的连接。其工作原理，比较每个输出的第一行（最小值），如果相等，结果中添加一行，并从input删除已比较的两行；如果字段不相同，删除id最小的，并继续与下一个最小的对比。

Postgres可以非常有效地在磁盘上执行合并，方法是将数据的一小部分保存在内存中，对该数据执行合并，将合并结果写入磁盘，然后从磁盘将更多的输入读取到内存中。这种方法一次在磁盘上读写大量的数据，比Hash 随机读高效。

mydb=# explain  select a.city_name,b.city_id from city a,city2 b where a.country_id=b.country_id and a.city_id<30000;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Hash Join  (cost=3274.21..3764393.24 rows=333242482 width=13)
   Hash Cond: (a.country_id = b.country_id)
   ->  Index Scan using city_pkey on city a  (cost=0.29..1052.87 rows=30033 width=13)
         Index Cond: (city_id < 30000)
   ->  Hash  (cost=1634.63..1634.63 rows=99863 width=8)
         ->  Seq Scan on city2 b  (cost=0.00..1634.63 rows=99863 width=8)
(6 rows)

mydb=# set enable_hashjoin=off;
SET
mydb=# explain  select a.city_name,b.city_id from city a,city2 b where a.country_id=b.country_id and a.city_id<30000;
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 Merge Join  (cost=13213.55..5012000.94 rows=333242482 width=13)
   Merge Cond: (a.country_id = b.country_id)
   ->  Sort  (cost=3286.46..3361.55 rows=30033 width=13)
         Sort Key: a.country_id
         ->  Index Scan using city_pkey on city a  (cost=0.29..1052.87 rows=30033 width=13)
               Index Cond: (city_id < 30000)
   ->  Sort  (cost=9927.09..10176.74 rows=99863 width=8)
         Sort Key: b.country_id
         ->  Seq Scan on city2 b  (cost=0.00..1634.63 rows=99863 width=8)
(9 rows)

Merge Join方法主要是内存不够用时，效果更好

当数据量特别大时，会选择Merge Join，如果没有索引，会选择"Materialize",将结果缓存内存或文件中，避免多次全表扫描。如下所示：

mydb=# create  unlogged table tbl1(id int, info text) with (autovacuum_enabled=off);
CREATE TABLE
mydb=#  create  unlogged table tbl2(id int, info text) with (autovacuum_enabled=off);
CREATE TABLE
mydb=# insert into tbl1 select generate_series(1,10000000);  
INSERT 0 10000000
mydb=# insert into tbl2 select generate_series(1,10000000);  
INSERT 0 10000000
mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id;  
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Merge Join  (cost=2075531.84..2370522825.11 rows=157893676470 width=72)
   Merge Cond: (tbl1.id = tbl2.id)
   ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
         Sort Key: tbl1.id
         ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Materialize  (cost=1037765.92..1065863.40 rows=5619496 width=36)
         ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
               Sort Key: tbl2.id
               ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(9 rows)

mydb=#  explain select * from tbl1,tbl2 where tbl1.id<tbl2.id;  
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..720393977232.90 rows=10526245098005 width=72)
   Join Filter: (tbl1.id < tbl2.id)
   ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Materialize  (cost=0.00..172443.44 rows=5619496 width=36)
         ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(5 rows)

mydb=# set enable_mergejoin=off;
SET
mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id;  
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Hash Join  (cost=214589.66..5526742276.56 rows=157893676470 width=72)
   Hash Cond: (tbl1.id = tbl2.id)
   ->  Seq Scan on tbl1  (cost=0.00..100442.96 rows=5619496 width=36)
   ->  Hash  (cost=100442.96..100442.96 rows=5619496 width=36)
         ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(5 rows)

mydb=#  explain select * from tbl1,tbl2 where tbl1.id=tbl2.id and tbl1.id<2000;  
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Merge Join  (cost=1449858.75..790941515.17 rows=52631216124 width=72)
   Merge Cond: (tbl1.id = tbl2.id)
   ->  Sort  (cost=412092.83..416775.74 rows=1873165 width=36)
         Sort Key: tbl1.id
         ->  Seq Scan on tbl1  (cost=0.00..114491.70 rows=1873165 width=36)
               Filter: (id < 2000)
   ->  Materialize  (cost=1037765.92..1065863.40 rows=5619496 width=36)
         ->  Sort  (cost=1037765.92..1051814.66 rows=5619496 width=36)
               Sort Key: tbl2.id
               ->  Seq Scan on tbl2  (cost=0.00..100442.96 rows=5619496 width=36)
(10 rows)

三种连接方法的优劣

还有一种连接叫笛卡尔积(Cross Join/Cartesian Product),两表关联没有任何条件。结果集为两张表行数的乘积，此种方式基本不会用到，这次不做讨论。

类别	Nested Loop	Hash Join	Merge Join
使用条件	任何条件	等值连接	非等值，’<>'除外
使用资源	CPU， disk IO	内存、临时空间	内存、临时空间
特点	高选择索引或进行限制性搜索时效率高，能快速返回第一次搜索结果	表记录多，缺乏有效索引，效率高	非等值连接，效率高
缺点	数据量大效率低，尤其无索引或限制条件不够时	需要大量内存，可能会用磁盘	表都需要排序，结果最后返回，PG批量依次内存排序

Join连接类型

Merge Loop

inner Join

--结果只包含满足连接条件的记录
select a.id,a.col1,b.col2 from a,b where a.id=b.id;
--or
select a.id,a.col1,b.col2 from a inner join b on a.id=b.id;
--or
select id,a.col1,b.col2 from a inner join b using(id);

Full outer Join

SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 1 aaa		   1 aaa
			       2 bbb
	 3 ccc

--如果没有匹配记录，null作为值，可以去掉相同结果集
SELECT * FROM TableA FULL OUTER JOIN TableB ON TableA.name = TableB.name 
where tablea.id is null or tableb.id is null;
	ID      NAME  ID       NAME
---------- ------ ------ ------
			        2     bbb
	 3       ccc

left/right join

--如上图所示，产生A表所有结果，B没有的用null作为值,除相同值外，right与left正好相反
SELECT * FROM TableA left JOIN TableB ON TableA.name = TableB.name;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 1 aaa		   1 aaa
	 3 ccc
--可以去掉相同结果集
SELECT * FROM TableA left JOIN TableB ON TableA.name = TableB.name 
where  tableb.id is null;
	ID NAME 	  ID NAME
---------- ------ ---------- ------
	 3 ccc

union 介绍

UNION 操作符用于合并两个或多个 SELECT 语句的结果集。

请注意，UNION 内部的 SELECT 语句必须拥有相同数量的列。列也必须拥有相似的数据类型。同时，每条 SELECT 语句中的列的顺序必须相同。UNION 只选取记录，有去重功能，而UNION ALL会列出所有记录。

SELECT name FROM TableA UNION SELECT name FROM TableB;
NAME
------------------------------
aaa
bbb

SELECT name FROM TableA UNION all SELECT name FROM TableB;

NAME
------------------------------
aaa
aaa
bbb

有些情况下可以使用union 代替or ，提供查询效率

select * from t1 where owner='SCOTT' or object_id in (select object_id from t2);
--上述语句会使用FILTER，可以调整sql,或者t2.object_id创建索引
select * from t1 where owner='SCOTT' 
union 
select * from t1 where object_id in (select object_id from t2);

--列转行举例
select employee,child_1 as child from personnel
union all
select employee,child_2 as child from personnel;

自连接

分页查询

--四层分页更适合页数比较大的分页查询，返回结果集小的当做驱动表，扫描被驱动表。
with tmp as
(SELECT * FROM
    ( SELECT rid, ROWNUM as RN
       FROM
           (SELECT rowid as rid
             FROM t1
             where owner='SYS'
             order by object_id desc
            ) WHERE ROWNUM <= 500
   )  WHERE RN > 490
) 
select  /*+ use_nl(a) leading(b) */ owner,object_name,object_id,rn
from t1 a,tmp b
where a.rowid=b.rid;
--三层select是常用的
select * from (select owner,object_name,object_id,rownum as rn
 from(select owner,object_name,object_id from t1 where owner='SYS' order by object_id desc) 
 where rownum<=500) 
 where rn>490;

删除重复行

--极致函数
delete from products p1
where rowid < (
   select max(p2.rowid) from products p2
   where p1.name = p2.name
   and p1.price =p2.price
);
--使用非等值连接
delete from products p1
where exists (
   select * from products p2
   where p1.name = p2.name
   and p1.price = p2.price
   and p1.rowid < p2.rowid
);

关联子查询比较行与行 eg

--求与上一年营业额一样的年份
select year,sale from sales s1
where sale = (
   select sale from sales s2 
   where s2.year = s1.year-1
) order by year;

--or
select s1.year,s1.sale from sales s1,sales s2
where s2.sale =s1.sale
and s2.year = s1.year-1
order by year;

--求同期对比情况
select s1.year,s1.sale,
case when s1.sale = s2.sale then '持平'
     when s1.sale > s2.sale then '增长'
     when s1.sale < s2.sale then '减少'
else '-' end as var
from sales s1,sales s2
where s2.year = s1.year -1
order by year;

参考文档

Join introduce: https://www.cnblogs.com/xqzt/p/4469673.html
Join introduce: https://logicalread.com/
pg join method: http://www.jasongj.com/2015/03/07/Join1/
pg join method: https://developer.aliyun.com/article/70992
pg join method: https://www.interdb.jp/pg/pgsql03.html
pg Merge Joins: https://malisper.me/postgres-merge-joins/
表连接方法参考：书籍《SQL 优化核心思想》
SQL连接示例参考：书籍《SQL进阶教程》

标签：city,rows,Postgresql,..,country,测试,Oracle,id,select
来源： https://blog.csdn.net/pg_hgdb/article/details/121221665

本站声明： 1. iCode9 技术分享网（下文简称本站）提供的所有内容，仅供技术学习、探讨和分享；
2. 关于本站的所有留言、评论、转载及引用，纯属内容发起人的个人观点，与本站观点和立场无关；
3. 关于本站的所有言论和文字，纯属内容发起人的个人观点，与本站观点和立场无关；
4. 本站文章均是网友提供，不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属；如您发现该文章侵犯了您的权益，可联系我们第一时间进行删除；
5. 本站为非盈利性的个人网站，所有内容不会用来进行牟利，也不会利用任何形式的广告来间接获益，纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

ICode9

Postgresql 表连接方法介绍(和Oracle对比测试)

环境构造

Nested Loop

Oracle 执行举例

Postgresql 执行举例

Hash Join

Oracle举例

Postgresql举例

Merge Join

Oracle举例

Postgresql举例

三种连接方法的优劣

Join连接类型

inner Join

Full outer Join

left/right join

union 介绍

自连接

分页查询

删除重复行

关联子查询比较行与行 eg

参考文档