Gather stats on a large table

We have a very large table with almost 80 billion records and a single hash partition. Any suggestions on how to gather stats on this table? It takes a week to complete, or fails in between with a "snapshot too old" error. I am using the following call to gather stats:

BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'ABC',
    tabname          => 'table1',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => DBMS_STATS.AUTO_CASCADE,
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    degree           => DBMS_STATS.DEFAULT_DEGREE);
END;
/

Oracle version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.17.0.0.0

asked Mar 3, 2023 at 12:48 by Mansi Raval

I provided an answer below, but also wanted to ask why you have only one hash partition. Partitioning provides no benefit if there is only one partition. And why hash rather than range or list? Is access to the table always across 100% of your data, rarely date-ranged or scoped to a specific class of records? I ask because improper partitioning can also increase stats-gathering times.

Commented Mar 3, 2023 at 13:39

Thank you for the answer @PaulW. I know one partition doesn't make sense; it's a legacy application with old code. About the partitioning, what I found is that the table is hash partitioned, but I'm not sure why I can see just one hash partition with all rows in it. Is that the way hash partitioning works on a table?

Commented Mar 3, 2023 at 18:00

If the hash key (see dba_part_key_columns) is not frequently used by equality predicates in application queries, then you can probably skip gathering stats on the partition entirely. To do so, add the line granularity => 'GLOBAL' to the gather_table_stats call I suggested in my answer.
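For example (a sketch using the same placeholder owner/table names as my answer; adjust to your environment):

-- Find the hash partition key column
SELECT column_name
FROM   dba_part_key_columns
WHERE  owner = 'ABC'
AND    name  = 'TABLE1';

-- Gather global stats only, skipping partition-level stats
BEGIN
  dbms_stats.gather_table_stats(
    ownname     => 'ABC',
    tabname     => 'TABLE1',
    granularity => 'GLOBAL',
    degree      => 16);
END;
/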

Commented Mar 3, 2023 at 18:11

Well, the package uses all the data. The hash partition is on Account_type, which has 5 different values; we use at least 3 of them (Account_type IN (1, 2, 3)) and delete account types 4 and 5 as a cleanup process.

Commented Mar 3, 2023 at 18:17

If your SQL uses equality predicates on account_type, you will need stats on the partition, because Oracle will prune to the partition and won't even look at global stats. So just leave granularity at its default.
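You can check whether partition-level stats are already in place (a sketch, again with the placeholder names):

SELECT partition_name, num_rows, last_analyzed
FROM   dba_tab_partitions
WHERE  table_owner = 'ABC'
AND    table_name  = 'TABLE1';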

Commented Mar 3, 2023 at 19:15

2 Answers

There are various speed tweaks available to you: (1) lower the estimate percent, (2) explicitly use parallelism, (3) request block sampling, (4) bypass indexes, and (5) disallow histograms by setting SIZE 1 on all columns:

BEGIN
  dbms_stats.gather_table_stats(
    ownname          => 'ABC',
    tabname          => 'TABLE1',
    estimate_percent => 0.1,                       -- (1) lower estimate percent
    block_sample     => true,                      -- (3) block sampling
    cascade          => false,                     -- (4) skip indexes
    method_opt       => 'FOR ALL COLUMNS SIZE 1',  -- (5) no histograms
    degree           => 16);                       -- (2) explicit parallelism
END;
/

Obviously you have to consult your own database hardware sizing to determine an appropriate degree of parallelism. Some systems can handle more, others less.
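If you have nothing better to go on, the instance CPU count gives a rough starting point (a sketch; the right degree is ultimately site-specific):

-- A common rule of thumb: degree no higher than cpu_count,
-- and lower on a busy shared system
SELECT value FROM v$parameter WHERE name = 'cpu_count';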

I have found that on massive tables a percent well below 1%, something like 0.1%, works just fine, and goes 10x faster than 1%, which in turn goes 100x faster than a full 100% sample.

Block sampling can lead to some incorrect results if your data is badly skewed and you use a tiny percent (like 0.001%), but in many cases it's fine at the 0.1% level or higher, and it does speed things up by randomly picking blocks rather than rows, which reduces the number of blocks that need to be read.

Lastly, histograms are expensive, and often they are unneeded. If you find out later that you do need a histogram on a particular column to support an important application SQL that is getting a bad plan because of the missing histogram, you can explicitly request SIZE AUTO on just that column while keeping all the others at 1. Histograms take a long time to gather, so they are best kept to an as-needed basis.
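For example, if account_type (the column from the comments above; substitute whichever column actually needs it) turned out to need a histogram, something like this sketch would build one there and nowhere else:

BEGIN
  dbms_stats.gather_table_stats(
    ownname    => 'ABC',
    tabname    => 'TABLE1',
    method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS account_type SIZE AUTO',
    degree     => 16);
END;
/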