周一开发人员报告说上周六一个job跑了整整一天都没有运行完,但是到了周日该job运行又恢复正常,要求查找一下原因
首先生成了一下周六当天的AWR报告,查看sql部分的信息
发现buffer gets排名第二高sql的executions为0,与开发人员确认,确实是该存储过程
检查该存储过程pkd_justin.pro_execute(sysdate),里面针对表justin的所有查询大致如下
select *
from justin t
where t.time < trunc(sysdate) + 1
and t.time >= trunc(sysdate);
查看该语句对应的执行计划,由于job执行的都是前一天的数据,所以查看周五当天的,执行计划明显有问题
SQL> explain plan for select * from justin t
2 where t.time < trunc(sysdate-3) + 1
3 and t.time >= trunc(sysdate-3);
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1449K| 710M| 1 (0)| 00:00:01 | | |
|* 1 | FILTER | | | | | | | |
| 2 | PARTITION RANGE ITERATOR | | 1449K| 710M| 1 (0)| 00:00:01 | KEY | KEY |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID| justin | 1449K| 710M| 1 (0)| 00:00:01 | KEY | KEY |
|* 4 | INDEX SKIP SCAN | PK_justin | 15258 | | 1 (0)| 00:00:01 | KEY | KEY |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TRUNC(SYSDATE@!-3)+1>TRUNC(SYSDATE@!-3))
4 - access("T"."time">=TRUNC(SYSDATE@!-3) AND "T"."time" filter("T"."time"=TRUNC(SYSDATE@!-3))
查看索引分区统计信息,相比之下,2月19号对应的分区统计信息有问题
SQL> select index_name,partition_name,blevel,leaf_blocks,distinct_keys,clustering_factor,num_rows,sample_size from user_ind_partitions where partition_name in('P219','P220','P218') order by partition_name;
INDEX_NAME PARTITION_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS SAMPLE_SIZE
------------------------------ ------------------------------ ---------- ----------- ------------- ----------------- ---------- -----------
PK_justin P218 2 5055 1450676 113018 1450676 322833
IDX_justin_TIME_D P218 2 3854 82474 364667 1452752 1452752
IDX_justin_TIME_D P219 1 25 100 800 2500 1452752
PK_justin P219 0 0 0 0 0
IDX_justin_TIME_D P220 2 3854 82474 364667 1452752 1452752
PK_justin P220 2 5055 1450676 113018 1450676 322833
18 rows selected
查看user_objects,可以看到分区P219对应的索引在18日也就是周五下午重建过,由此可以大致推断出原因,由于索引重建后没有收集统计信息
导致第二天job运行的时候选择了错误的执行计划
SQL> select object_name, to_char(last_ddl_time,'yyyy-mm-dd hh24:mi:ss') from user_objects where subobject_name ='P219';
OBJECT_NAME TO_CHAR(LAST_DDL_TIME,'YYYY-MM
-------------------------------------------------------------------------------- ------------------------------
PK_justin 2011-02-18 17:42:14
IDX_justin_TIME_D 2011-02-18 17:42:19
justin 2011-01-11 14:34:52
重新收集一下该分区上的所有索引的统计信息, 由于
SQL> begin
2 dbms_stats.gather_table_stats(ownname => user,tablename => 'justin',partname => 'P219',force => true,no_invalidate => false);
3 end;
4 /
PL/SQL procedure successfully completed
然后在查看统计信息
SQL> select index_name,partition_name,blevel,leaf_blocks,distinct_keys,clustering_factor,num_rows,sample_size from user_ind_partitions where partition_name in('P219') order by partition_name;
INDEX_NAME PARTITION_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS SAMPLE_SIZE
------------------------------ ------------------------------ ---------- ----------- ------------- ----------------- ---------- -----------
PK_justin P219 2 5884 1688581 130830 1688581 319144
IDX_justin_TIME_D P219 2 4510 83136 419426 1700253 421486
查看执行计划,已经恢复正常
SQL> explain plan for select *
2 from justin t
3 where t.time < trunc(sysdate - 3) + 1
4 and t.time >= trunc(sysdate - 3);
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 926600260
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1449K| 710M| 24462 (2)| 00:04:54 | | |
|* 1 | FILTER | | | | | | | |
| 2 | PARTITION RANGE ITERATOR| | 1449K| 710M| 24462 (2)| 00:04:54 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | justin | 1449K| 710M| 24462 (2)| 00:04:54 | KEY | KEY |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TRUNC(SYSDATE@!-3)+1>TRUNC(SYSDATE@!-3))
3 - filter("T"."time"=TRUNC(SYSDATE@!-3))
至此该问题算是告一段落
总结: oracle在9i的时候,创建或者重建索引的时候需要显示指定compute statistics才会收集统计信息,演化到10g,该功能已变成默认的;
我们数据库版本是10.2.0.5,却遇到此问题,那是因为该表上所有分区的统计信息都已被lock;
这个案例也为我们提了个醒,以后重建索引后需要检查一下其相应统计信息才行
SQL> select partition_name,stattype_locked from user_tab_statistics where table_name='JUSTIN' and partition_name ='P219';
PARTITION_NAME STATTYPE_LOCKED
------------------------------ ---------------
P219 ALL
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/15480802/viewspace-688367/,如需转载,请注明出处,否则将追究法律责任。