你可以为列统计直方图。这些直方图为列数据的分布情况提供准确的估算。当列数据倾斜时,直方图提供更为优化的选择性估算,为数据分布不均匀的情况提供最优的执行计划。
Oracle Database为提供2种类别的列统计信息直方图:
-
Height-Balanced Histograms
-
Frequency Histograms
数据库存储直方图信息*TAB_COL_STATISTICS视图(用户和DBA)。列值范围:HEIGHTBALANCED, FREQUENCY, or NONE。
2、Height-Balanced Histograms
在height-balanced histogram中,列值被划分为桶,使得每个桶包含大致相同数量的行。直方图显示端点在值范围内的位置。
考虑一个my_col值为1到100之间的列,以及一个10个桶的直方图。如果数据my_col均匀分布,则直方图看起来与图13-1类似,其中数字是端点值。例如,第七个桶具有值在60到70之间的行。
图13-1具有均匀分布的高度平衡直方图
每个桶中的行数为总行数的10%。在这个均匀分布的例子中,40%的行的值在60到100之间。
如果数据不均匀分布,则直方图可能如图13-2所示。在这种情况下,大多数行的列的值为5。只有10%的行的值在60到100之间。
图13-2具有非均匀分布的高度平衡直方图
您可以使用USER_TAB_HISTOGRAMS表格查看高度平衡的直方图,如示例13-1所示。
-
BEGIN
-
DBMS_STATS
.
GATHER_table_STATS
(
-
OWNNAME
=
>
'OE'
,
-
TABNAME
=
>
'INVENTORIES'
,
-
METHOD_OPT
=
>
'FOR COLUMNS SIZE 10 quantity_on_hand'
)
;
-
END
;
-
/
-
SELECT
COLUMN_NAME
,
NUM_DISTINCT
,
NUM_BUCKETS
,
HISTOGRAM
-
FROM
USER_TAB_COL_STATISTICS
-
WHERE
TABLE_NAME
=
'INVENTORIES'
AND
COLUMN_NAME
=
'QUANTITY_ON_HAND'
;
-
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
-
------------------------------ ------------ ----------- ---------------
-
QUANTITY_ON_HAND 237 10 HEIGHT BALANCED
-
SELECT
ENDPOINT_NUMBER
,
ENDPOINT_VALUE
-
FROM
USER_TAB_HISTOGRAMS
-
WHERE
TABLE_NAME
=
'INVENTORIES'
AND
COLUMN_NAME
=
'QUANTITY_ON_HAND'
-
ORDER
BY
ENDPOINT_NUMBER
;
-
ENDPOINT_NUMBER ENDPOINT_VALUE
-
--------------- --------------
-
0 0
-
1 27
-
2 42
-
3 57
-
4 74
-
5 98
-
6 123
-
7 149
-
8 175
-
9 202
-
10 353
在示例13-1查询输出中,一行(1-10)对应于直方图中的每个桶。Oracle数据库向该直方图添加了特殊的第0个数据桶,因为第1个数据桶(27)中的值不是quantity_on_hand列的最小值。第0个桶的最小值为0 quantity_on_hand。
3、
frequency histogra
m
在
frequency histogram中,列的每个值对应于直方图的单个桶。每个桶包含此单个值的出现次数。例如,假设36行包含列的值1 warehouse_id。端点值1具有端点号36。
数据库在以下条件下自动创建频率直方图,而不是高度平衡的直方图:
-
不同值的数量小于或等于指定的直方图桶数(最多254个)。
-
每个列值重复一次。
您可以使用USER_TAB_HISTOGRAMS视图查看频率直方图,如示例13-2所示。
-
BEGIN
-
DBMS_STATS
.
GATHER_TABLE_STATS
(
-
OWNNAME
=
>
'OE'
,
-
TABNAME
=
>
'INVENTORIES'
,
-
METHOD_OPT
=
>
'FOR COLUMNS SIZE 20 warehouse_id'
)
;
-
END
;
-
/
-
SELECT
COLUMN_NAME
,
NUM_DISTINCT
,
NUM_BUCKETS
,
HISTOGRAM
-
FROM
USER_TAB_COL_STATISTICS
-
WHERE
TABLE_NAME
=
'INVENTORIES'
AND
COLUMN_NAME
=
'WAREHOUSE_ID'
;
-
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM
-
------------------------------ ------------ ----------- ---------------
-
WAREHOUSE_ID 9 9 FREQUENCY
-
SELECT
ENDPOINT_NUMBER
,
ENDPOINT_VALUE
-
FROM
USER_TAB_HISTOGRAMS
-
WHERE
TABLE_NAME
=
'INVENTORIES'
AND
COLUMN_NAME
=
'WAREHOUSE_ID'
-
ORDER
BY
ENDPOINT_NUMBER
;
-
ENDPOINT_NUMBER ENDPOINT_VALUE
-
--------------- --------------
-
36 1
-
213 2
-
261 3
-
370 4
-
484 5
-
692 6
-
798 7
-
984 8
-
1112 9
在例13-2中,第一个桶为warehouse_id1。该值在表中显示36次,如以下查询所证实:
oe@PROD> SELECT COUNT(*) FROM inventories WHERE warehouse_id = 1;
COUNT(*)
----------
36
5、
练习4
、直方图优化练习
统计已销户用户数量,请优化以下语句
select count(1) from ht.c_cons where status='close';
-
SQL
>
select
status
,
count
(
1
)
from
ht
.
c_cons
group
by
status
;
-
STATUS
COUNT
(
1
)
-
------------------------------------------------------------ ----------
-
close 19
-
open 9519
-
creating 462
-
SQL
>
create
index
ht
.
idx_c_cons_status
on
ht
.
c_cons
(
status
)
;
-
SQL
>
col
owner
for
a10
-
col
table_name
for
a20
-
col
column_name
for
a20
-
col
data_type
for
a30
-
col
histogram
for
a20
-
select
owner
,
table_name
,
column_name
,
data_type
,
-
column_id
,
num_distinct
,
histogram
,
NUM_NULLS
,
LAST_ANALYZED
from
-
dba_tab_columns
where
table_name
=
'C_CONS'
and
owner
=
'HT'
-
order
by
column_id
;
SQL
>
SQL
>
SQL
>
SQL
>
SQL
>
2 3 4
-
OWNER TABLE_NAME COLUMN_NAME DATA_TYPE COLUMN_ID NUM_DISTINCT HISTOGRAM NUM_NULLS LAST_ANALYZED
-
---------- -------------------- -------------------- ------------------------------ ---------- ------------ -------------------- ---------- ------------------------------
-
HT C_CONS CONS_NO
NUMBER
1 10000 NONE 0 20
-
AUG
-
17
-
HT C_CONS CONS_NAME
VARCHAR2
2 5057 NONE 0 20
-
AUG
-
17
-
HT C_CONS ORG_NAME
VARCHAR2
3 12 NONE 0 20
-
AUG
-
17
-
HT C_CONS BUILD_DATE
DATE
4 10000 NONE 0 20
-
AUG
-
17
-
HT C_CONS STATUS
VARCHAR2
5 3 NONE 0 20
-
AUG
-
17
-
SQL
>
exec DBMS_STATS
.
GATHER_TABLE_STATS
(
ownname
=
>
'HT'
,
tabname
=
>
'C_CONS'
,
estimate_percent
=
>
30
,
method_opt
=
>
'for columns size 50 status'
,
no_invalidate
=
>
FALSE
,
degree
=
>
4
,
cascade
=
>
TRUE
)
;
-
PL
/
SQL procedure successfully completed
.
-
SQL
>
col
owner
for
a10
-
SQL
>
col
table_name
for
a20
-
col
column_name
for
a20
-
col
data_type
for
a30
-
col
histogram
for
a20
-
select
owner
,
table_name
,
column_name
,
data_type
,
-
column_id
,
num_distinct
,
histogram
,
NUM_NULLS
,
LAST_ANALYZED
from
-
dba_tab_columns
where
table_name
=
'C_CONS'
and
owner
=
'HT'
-
order
by
column_id
;
SQL
>
SQL
>
SQL
>
SQL
>
2 3 4
-
OWNER TABLE_NAME COLUMN_NAME DATA_TYPE COLUMN_ID NUM_DISTINCT HISTOGRAM NUM_NULLS LAST_ANALYZED
-
---------- -------------------- -------------------- ------------------------------ ---------- ------------ -------------------- ---------- ------------------------------
-
HT C_CONS CONS_NO
NUMBER
1 10000 NONE 0 20
-
AUG
-
17
-
HT C_CONS CONS_NAME
VARCHAR2
2 5057 NONE 0 20
-
AUG
-
17
-
HT C_CONS ORG_NAME
VARCHAR2
3 12 NONE 0 20
-
AUG
-
17
-
HT C_CONS BUILD_DATE
DATE
4 10000 NONE 0 20
-
AUG
-
17
-
HT C_CONS STATUS
VARCHAR2
5 3 FREQUENCY 0 20
-
AUG
-
17
-
SQL
>
select
count
(
1
)
from
ht
.
c_cons
where
status
=
'open'
;
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash
value
:
2016425671
-
-------------------------------------------------------------------------------------------
-
|
Id
|
Operation
|
Name
|
Rows
|
Bytes
|
Cost
(
%
CPU
)
|
Time
|
-
-------------------------------------------------------------------------------------------
-
|
0
|
SELECT
STATEMENT
|
|
1
|
6
|
8
(
0
)
|
00
:
00
:
01
|
-
|
1
|
SORT AGGREGATE
|
|
1
|
6
|
|
|
-
|
*
2
|
INDEX
FAST FULL SCAN
|
IDX_C_CONS_STATUS
|
9639
|
57834
|
8
(
0
)
|
00
:
00
:
01
|
-
-------------------------------------------------------------------------------------------
-
Predicate Information
(
identified
by
operation id
)
:
-
---------------------------------------------------
-
2
-
filter
(
"STATUS"
=
'open'
)
-
Statistics
-
----------------------------------------------------------
-
1 recursive calls
-
0 db block gets
-
28 consistent gets
-
0 physical reads
-
0 redo
size
-
527 bytes sent via SQL
*
Net
to
client
-
523 bytes received via SQL
*
Net
from
client
-
2 SQL
*
Net roundtrips
to
/
from
client
-
0 sorts
(
memory
)
-
0 sorts
(
disk
)
-
1
rows
processed
-
SQL
>
-
SQL
>
select
count
(
1
)
from
ht
.
c_cons
where
status
=
'close'
;
-
Execution Plan
-
----------------------------------------------------------
-
Plan hash
value
:
2292286995
-
---------------------------------------------------------------------------------------
-
|
Id
|
Operation
|
Name
|
Rows
|
Bytes
|
Cost
(
%
CPU
)
|
Time
|
-
---------------------------------------------------------------------------------------
-
|
0
|
SELECT
STATEMENT
|
|
1
|
6
|
1
(
0
)
|
00
:
00
:
01
|
-
|
1
|
SORT AGGREGATE
|
|
1
|
6
|
|
|
-
|
*
2
|
INDEX
RANGE SCAN
|
IDX_C_CONS_STATUS
|
24
|
144
|
1
(
0
)
|
00
:
00
:
01
|
-
---------------------------------------------------------------------------------------
-
Predicate Information
(
identified
by
operation id
)
:
-
---------------------------------------------------
-
2
-
access
(
"STATUS"
=
'close'
)
-
Statistics
-
----------------------------------------------------------
-
1 recursive calls
-
0 db block gets
-
2 consistent gets
-
0 physical reads
-
0 redo
size
-
526 bytes sent via SQL
*
Net
to
client
-
523 bytes received via SQL
*
Net
from
client
-
2 SQL
*
Net roundtrips
to
/
from
client
-
0 sorts
(
memory
)
-
0 sorts
(
disk
)
-
1
rows
processed