It’s rather incredible what a simple thing like DISTINCT can do to performance.  Recently I was tasked with improving performance on a query that was taking over an hour to complete.  Running the query and watching it in SQL Monitor, it was easy to see that the plan was spending the vast majority of its time on a SORT UNIQUE step.  It was sorting through 1.4 billion rows, so that is a lot of data to go through.  But the worst part was that the same number of rows came out of the SORT UNIQUE as went in.  Meaning, all the rows going in were already unique.

Looking at the columns in the select list, one column (an ID type field) was part of the primary key of a table in the query.  The other column of that primary key was a numeric representation of a date, and this second column was not in the select list.  The actual date was stored as a separate column.  (Why they do this I have no idea, but there it is.)  This date column was in the select list along with the ID column, and because of this the ID and date combination was unique.  The optimizer didn’t and couldn’t know this, because those two columns were not the primary key definition.  Hence it had to do the UNIQUE step to find the unique values, which in this case was the entire set of rows.  Why it didn’t do a HASH UNIQUE I couldn’t figure out; that likely would have been faster than the SORT UNIQUE.
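To make the shape of the problem concrete, here is a minimal sketch using made-up names (ORDERS_FACT, ORDER_ID, ORDER_DATE_KEY and ORDER_DATE are all hypothetical, not the actual tables involved).  The primary key is (ORDER_ID, ORDER_DATE_KEY), but the query selects ORDER_ID and ORDER_DATE, which happen to be just as unique in the data:

-- Hypothetical shape of the query; the DISTINCT forces a SORT UNIQUE
-- over every row even though (ORDER_ID, ORDER_DATE) is already unique.
SELECT DISTINCT o.order_id, o.order_date
  FROM orders_fact o
 WHERE o.order_date >= DATE '2018-01-01';

-- Same result set, no UNIQUE step needed, because the data is already unique.
SELECT o.order_id, o.order_date
  FROM orders_fact o
 WHERE o.order_date >= DATE '2018-01-01';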

But really the fastest way to do this was to not do it at all.  It was completely unnecessary.  Removing the unneeded DISTINCT chopped off an hour of processing time.

Once again proving the fastest way to do anything is to not do it at all.

What I hope you get out of this is: know your data.  You will always know something the optimizer doesn’t.  Use that knowledge to write a better query and help the optimizer make better choices.
At the start of this year I embarked on a new adventure.  I’m now at Zione Solutions in a consulting role as a Senior Database Administrator, doing what I’ve been teaching for the past so many years.  It’s been quite a change.  For one, I am now using that strange new join syntax “LEFT OUTER JOIN … ON …” which I had successfully avoided all these years; it’s turned out to be harder to learn than I expected.  Also, the queries I’ve been tuning are rather huge.  An average query is about 300-400 lines long, and the plans for these will easily be 1,000 lines.  These are reporting queries, the classic data warehouse type.  The smaller tables are millions of rows, and tens of billions of rows in a table is not uncommon.  Also, it’s on Exadata.

I’ve seen firsthand that what I’ve been teaching works.  

One of the most important things you can do when tuning a query is to let the numbers lead you to the problem.  Some folks can look at a query and, without running it, make good guesses as to where the problem will likely be.  That can work, but why bother?  Run the query and use a tool like SQL Monitor to see how the plan is executing.  In the SQL Monitor report the Activity column is a great way to see which step is taking the largest percentage of time.  Now you can see exactly where a query is spending all its time and then pound that nail down.

Also it’s about starting and staying small with the rows.  Here is where getting predicates applied at the right time and use of things like the WITH clause can really help.  With Exadata and huge volumes of data this concept is still critical to success; however, more often than not the use of indexes is not the solution.

Why?  Not because indexes are bad; it’s just that index use tends to serialize the access.  This tends to cause nested loop joins, which often end up being very expensive because the plan hits the inner table (or index) way too many times.  This can be millions of times, and even with a unique index lookup, doing it over a million times isn’t efficient.  A fast full index scan can work fine, since it works like a full table scan and can be done in parallel.

Subquery factors (also called CTEs, Common Table Expressions) are very powerful in these queries.  These are the temporary views you define in the WITH clause, and typically it’s best to use the MATERIALIZE hint.  This hint forces the optimizer to make the subquery factor a global temporary table like structure; without the hint the optimizer might just merge it back into the query, which is of course exactly what you don’t want to happen most of the time.  Taking a subquery that is executed even just a couple of times and making it a subquery factor can be huge.  Also, sometimes taking that subquery out of the main query and making it a “stand alone” query can help parallelization.  This can also work well to get that “start small, stay small” thing.  A rough sketch of the pattern is below.
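Here is a minimal sketch of that pattern, assuming made-up table and column names (ORDERS, CUSTOMERS and the date filter are hypothetical).  The idea is to apply the filtering predicate inside the subquery factor so later joins see as few rows as possible, and to materialize it so the optimizer doesn’t merge it back in:

-- A minimal sketch, not an actual production query.
WITH recent_orders AS (
  SELECT /*+ materialize */ cust_no, total_order_price
    FROM orders
   WHERE order_date >= DATE '2019-01-01'   -- start small: filter before joining
)
SELECT c.cust_name, SUM(o.total_order_price)
  FROM recent_orders o
  JOIN customers c ON c.cust_no = o.cust_no
 GROUP BY c.cust_name;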

On the team I’m working with now, it’s speculated that about 60% of the performance problems are solved by using subquery factors (CTEs).

Oh, and did I mention LAT (lateral) views?  Yeah, these tend to work very poorly in these kinds of queries.  Seeing one or more in these plans is a known performance killer.  Making them go away can be tricky sometimes.  A typical cause is having an OR in the ON clause of an outer join.  These LAT views might work well with smaller data sets, but with large ones they can cause a query to stall at that LAT view for very long periods of time, like hours.  An example of the pattern is below.
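Here is a hedged sketch of the kind of join that tends to show up as a lateral (VW_LAT) view in the plan; the table and column names are made up.  One possible rewrite is also shown, though the semantics differ slightly when both codes match, so any rewrite needs to be verified against your data:

-- Outer join with an OR in the ON clause: a common source of LAT views.
SELECT f.order_id, d.descr
  FROM orders_fact f
  LEFT OUTER JOIN dim_codes d
    ON (d.code = f.primary_code OR d.code = f.alternate_code);

-- One possible rewrite: two single-condition outer joins,
-- picking the match in the select list.
SELECT f.order_id, COALESCE(d1.descr, d2.descr) AS descr
  FROM orders_fact f
  LEFT OUTER JOIN dim_codes d1 ON d1.code = f.primary_code
  LEFT OUTER JOIN dim_codes d2 ON d2.code = f.alternate_code;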

For me it’s been quite a beginning to the year.  As the saying goes, change is inevitable, except from vending machines.  And this year has had plenty, but still not from vending machines.

Recently in class a student mentioned that she’d been told that with the WITH clause you don’t need to use the MATERIALIZE hint anymore.  Maybe.  It turns out that if you REALLY want the subquery factor to be materialized, you do need to use it.

Keep in mind that the optimizer sees your SQL statement as just that, a statement.  It can and will rearrange and rewrite your SQL into something it likes better.  And this means it can merge your subquery factors defined in the WITH clause into the statement.

I’ll use a simple statement to illustrate this point.  Clearly this select could (and really should) be written without a WITH clause at all; it’s just a simple example to illustrate the point.  Once you have a better understanding of how it works you can decide what is best for your particular situation.

Edit: Tests were done on Oracle Database 12c Enterprise Edition Release 12.2.0.1.0, on a Windows laptop.

SQL> -- no hints
SQL> WITH inline_view AS
  2  (
  3  SELECT /*+ qb_name(cust_totals)*/ CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders
  4    FROM ord2
  5   GROUP BY CUST_NO
  6  )
  7  SELECT /*+ qb_name(main) */ *
  8    FROM inline_view where total_orders > 1400000;

        CUST_NO    TOTAL_ORDERS
--------------- ---------------
           9379      1409495.95

If we run this, the optimizer merges the subquery factor and the main query together into one statement, which really makes sense here since the subquery factor isn’t doing anything special.  Here is the plan; notice the query block MAIN disappears completely and the only reference to the CUST_TOTALS block is the table alias:

SQL_ID  bnzfdp1yvk7vc, child number 0
-------------------------------------
WITH inline_view AS ( SELECT /*+ qb_name(cust_totals)*/ CUST_NO,
SUM(TOTAL_ORDER_PRICE) AS total_orders   FROM ord2  GROUP BY CUST_NO )
SELECT /*+ qb_name(main) */ *   FROM inline_view where total_orders >
1400000

Plan hash value: 2809195938

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |    69 (100)|          |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   HASH GROUP BY     |      |    50 |   550 |    69   (2)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| ORD2 | 12890 |   138K|    68   (0)| 00:00:01 |
----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$8092F496
   3 - SEL$8092F496 / ORD2@CUST_TOTALS

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(SUM("TOTAL_ORDER_PRICE")>1400000)

Now use the MATERIALIZE hint, and the subquery factor defined in the WITH clause stays a separate entity.

SQL> -- materialize hint
SQL> WITH inline_view AS
  2  (
  3  SELECT /*+ materialize qb_name(cust_totals)*/ CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders
  4    FROM ord2
  5   GROUP BY CUST_NO
  6  )
  7  SELECT /*+ qb_name(main) */ *
  8    FROM inline_view where total_orders > 1400000;

        CUST_NO    TOTAL_ORDERS
--------------- ---------------
           9379      1409495.95

In the plan this time we can see both query blocks and a new one, which is the materialized data from the subquery factor.

SQL_ID  65bmr36y814d3, child number 0
-------------------------------------
WITH inline_view AS ( SELECT /*+ materialize qb_name(cust_totals)*/
CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders   FROM ord2  GROUP BY
CUST_NO ) SELECT /*+ qb_name(main) */ *   FROM inline_view where
total_orders > 1400000

Plan hash value: 1306428065

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |       |       |    72 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6653_4B0C1F7 |       |       |            |          |
|   3 |    HASH GROUP BY                         |                            |  1000 | 11000 |    69   (2)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                    | ORD2                       | 12890 |   138K|    68   (0)| 00:00:01 |
|*  5 |   VIEW                                   |                            |  1000 | 26000 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL                     | SYS_TEMP_0FD9D6653_4B0C1F7 |  1000 | 11000 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - MAIN
   2 - CUST_TOTALS
   4 - CUST_TOTALS  / ORD2@CUST_TOTALS
   5 - SEL$03D1C9D1 / INLINE_VIEW@MAIN
   6 - SEL$03D1C9D1 / T1@SEL$03D1C9D1

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("TOTAL_ORDERS">1400000)

This makes it clear that if the optimizer can merge the subquery factor into the main query, it will.  The MATERIALIZE hint keeps the subquery factor as a separate entity.  The bottom line is that if you want a subquery factor kept as a separate entity, and you notice in the plan that it’s being merged in, then use the MATERIALIZE hint.

Which brings up another point.  Some folks will use the NO_MERGE hint to keep the subquery factor from being merged in.  This works, sort of.  Here is the same query using the NO_MERGE hint.

SQL> -- no merge hint
SQL> WITH inline_view AS
  2  (
  3  SELECT /*+ no_merge qb_name(cust_totals)*/ CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders
  4    FROM ord2
  5   GROUP BY CUST_NO
  6  )
  7  SELECT /*+ qb_name(main) */ *
  8    FROM inline_view where total_orders > 1400000;

        CUST_NO    TOTAL_ORDERS
--------------- ---------------
           9379      1409495.95

And now here is the plan.  Notice that the subquery factor isn’t completely merged in, but it basically becomes an inline view instead.  This might be just fine and work the way you want; however, it’s not a materialized version of the subquery factor, just an inline view.

SQL_ID  6mf7u56n8bad6, child number 0
-------------------------------------
WITH inline_view AS ( SELECT /*+ no_merge qb_name(cust_totals)*/
CUST_NO, SUM(TOTAL_ORDER_PRICE) AS total_orders   FROM ord2  GROUP BY
CUST_NO ) SELECT /*+ qb_name(main) */ *   FROM inline_view where
total_orders > 1400000

Plan hash value: 2169976290

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    69 (100)|          |
|   1 |  VIEW                |      |     1 |    26 |    69   (2)| 00:00:01 |
|*  2 |   FILTER             |      |       |       |            |          |
|   3 |    HASH GROUP BY     |      |     1 |    11 |    69   (2)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| ORD2 | 12890 |   138K|    68   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - CUST_TOTALS / INLINE_VIEW@MAIN
   2 - CUST_TOTALS
   4 - CUST_TOTALS / ORD2@CUST_TOTALS

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SUM("TOTAL_ORDER_PRICE")>1400000)
Some folks believe that the optimizer will not use stale statistics.  This is simply not true. 

The stats being stale or not is only used by the stats collection job to know which tables should have stats gathered on them.  Below is a script you can use to see that this is true.  I’ll go through the key points in the script to show what is going on.  I was using Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 on a Windows laptop for this test.  In this post I'm only showing the core information from the run for brevity; run the script to see all the details.

At the top of the script a table is created as a copy of the good old EMPLOYEES table from the sample schema.  Then stats are gathered on it and a query is run.  You can see that the stats are being used: the estimated rows is 107, which is the number of rows in the table.

SQL> SELECT /*+ qb_name(main) */ COUNT(*) FROM STALE_TAB;
  COUNT(*)
----------
       107

Here is the plan:

SELECT /*+ qb_name(main) */ COUNT(*) FROM STALE_TAB

Plan hash value: 3396615767

------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |       |     3 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |            |          |
|   2 |   TABLE ACCESS FULL| STALE_TAB |   107 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------

14 rows selected.

A look into the 10053 trace for this query shows this for the base statistics.  These are the stats on the table just created and used.  

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: STALE_TAB  Alias: STALE_TAB
  #Rows: 107  SSZ: 0  LGR: 0  #Blks:  5  AvgRowLen:  69.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000

Then in the script is an update that updates all rows in the table. The stats are now stale.

SQL> UPDATE STALE_TAB SET SALARY = SALARY *.15;

107 rows updated.

SQL> commit;

Commit complete.

SQL> -- show the staleness of the stats
SQL> EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

PL/SQL procedure successfully completed.

SQL> SELECT '****************' STARS, STALE_STATS FROM USER_TAB_STATISTICS WHERE TABLE_NAME = 'STALE_TAB';

STARS            STALE_STATS
---------------- ---------------
**************** YES

1 row selected.

The script then runs a new select on the table.  This is using the stale stats.  If it were not using the stale stats it would have to do one of two things: either use dynamic sampling or use default statistics.  It is doing neither.  If it were doing dynamic sampling there would be a note about it in the plan, and there isn’t.  If it were using default statistics the base number of rows for the table would be something like 3186 or 8168, which are the default row counts in a database with an 8K block size, which is what this database has.  (Side note: the 3186 number appears to be used for a real table which has not been analyzed, and the 8168 number for a casted table.)

Here is the plan:

SQL_ID  g63r4gfua0wtq, child number 0
-------------------------------------
SELECT /*+ QB_NAME(MAIN) */ DEPARTMENT_ID, SUM(SALARY) FROM STALE_TAB
GROUP BY DEPARTMENT_ID

Plan hash value: 1068443980

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |       |       |     4 (100)|          |
|   1 |  HASH GROUP BY     |           |    11 |    77 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| STALE_TAB |   107 |   749 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

15 rows selected.

And here are the stats from the 10053 trace of the above query.  Notice this time it pulled the stats on the DEPARTMENT_ID column, because that column is used in the GROUP BY clause.  And remember, this is while the stats are stale.

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: STALE_TAB  Alias: STALE_TAB
  #Rows: 107  SSZ: 0  LGR: 0  #Blks:  5  AvgRowLen:  69.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
  Column (#11): DEPARTMENT_ID(NUMBER)
    AvgLen: 3 NDV: 11 Nulls: 1 Density: 0.090909 Min: 0.000000 Max: 10.000000

Notice there is nothing saying that the stats are stale.  The optimizer uses stats as long as they are there, stale or not.  

Further down in the script the stats are dropped and another query is run.  Now the optimizer has to do dynamic sampling since there are no stats.  However, the select that ran the very first time is still in the library cache and still runs without a new parse.  Dropping stats will not invalidate plans that are in the cache.

SQL> EXEC DBMS_STATS.DELETE_TABLE_STATS ('OP', 'STALE_TAB');

PL/SQL procedure successfully completed.

SQL> EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

PL/SQL procedure successfully completed.

SQL> SELECT '****************' STARS, STALE_STATS FROM USER_TAB_STATISTICS WHERE TABLE_NAME = 'STALE_TAB';

STARS            STALE_STATS
---------------- ---------------
**************** NULL

1 row selected.

. . .

SELECT /*+ QB_NAME(MAIN) */ DEPARTMENT_ID, COUNT(*) FROM STALE_TAB
GROUP BY DEPARTMENT_ID

Plan hash value: 1068443980

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |       |       |     4 (100)|          |
|   1 |  HASH GROUP BY     |           |   107 |  1391 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| STALE_TAB |   107 |  1391 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


19 rows selected.  


And here is the first query being run again after the stats are dropped:


SELECT /*+ qb_name(main) */ COUNT(*) FROM STALE_TAB

Plan hash value: 3396615767

------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |       |     3 (100)|          |
|   1 |  SORT AGGREGATE    |           |     1 |            |          |
|   2 |   TABLE ACCESS FULL| STALE_TAB |   107 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------


14 rows selected.


* * * * * * * * * * * * * * * * * * * * * * * * * * * *

Here is the full script if you’d like to try it yourself:

rem demo to show stale stats are used by the optimizer
rem November 2018 RVD
rem file name stale_test.sql
rem
set NULL NULL
set long 255
set lines 255
set serveroutput off
set echo on feedback on termout on heading on

COLUMN STALE_STATS FORMAT A15

-- create the test table
-- this uses the standard EMPLOYEES table as a base from the sample schema
DROP TABLE STALE_TAB PURGE;
CREATE TABLE STALE_TAB AS SELECT * FROM EMPLOYEES;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'OP',TABNAME =>'STALE_TAB',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1');
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

-- run a query on it and show the plan
SELECT /*+ qb_name(main) */ COUNT(*) FROM STALE_TAB;
-- capture the SQL_ID
COLUMN PREV_SQL_ID NEW_VALUE PSQLID
COLUMN PREV_CHILD_NUMBER NEW_VALUE PCHILDNO
SELECT PREV_SQL_ID, PREV_CHILD_NUMBER FROM V$SESSION
WHERE AUDSID = USERENV('SESSIONID');
-- Showing the plan
SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR ('&PSQLID','&PCHILDNO','TYPICAL NOTE'));
-- create a 10053 trace file from the sql
EXECUTE DBMS_SQLDIAG.DUMP_TRACE(P_SQL_ID=>'&PSQLID',  P_CHILD_NUMBER=>'&PCHILDNO', -
   P_COMPONENT=>'Optimizer', P_FILE_ID=>'STALE01_TEST_'||TO_CHAR(SYSDATE, 'HH24MISS'));

-- show the staleness of the stats
-- stars are there just to make it easier to see in the output
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
SELECT '****************' STARS, STALE_STATS FROM USER_TAB_STATISTICS WHERE TABLE_NAME = 'STALE_TAB';

-- update every row in the table
UPDATE STALE_TAB SET SALARY = SALARY *.15;
COMMIT;
-- show the staleness of the stats
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
SELECT '****************' STARS, STALE_STATS FROM USER_TAB_STATISTICS WHERE TABLE_NAME = 'STALE_TAB';

-- run another query and show the plan
-- notice dynamic sampling is not done, it's using the stale stats
SELECT /*+ QB_NAME(MAIN) */ DEPARTMENT_ID, SUM(SALARY) FROM STALE_TAB GROUP BY DEPARTMENT_ID;
COLUMN PREV_SQL_ID NEW_VALUE PSQLID
COLUMN PREV_CHILD_NUMBER NEW_VALUE PCHILDNO
SELECT PREV_SQL_ID, PREV_CHILD_NUMBER FROM V$SESSION
WHERE AUDSID = USERENV('SESSIONID');
-- Showing the plan
SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR ('&PSQLID','&PCHILDNO','TYPICAL NOTE'));
-- create a 10053 trace file from the sql
-- look for BASE STATISTICAL INFORMATION to see the stats used
-- the table stats are stale but are still being used
EXECUTE DBMS_SQLDIAG.DUMP_TRACE(P_SQL_ID=>'&PSQLID',  P_CHILD_NUMBER=>'&PCHILDNO', -
   P_COMPONENT=>'Optimizer', P_FILE_ID=>'STALE02_TEST_'||TO_CHAR(SYSDATE, 'HH24MISS'));

-- run the first query on it and show the plan
-- this is using the plan that is in the library cache
SELECT /*+ qb_name(main) */ COUNT(*) FROM STALE_TAB;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR (FORMAT=>'TYPICAL NOTE'));


-- delete the stats and run another query
-- show the staleness, notice it is NULL because there are no stats
EXEC DBMS_STATS.DELETE_TABLE_STATS ('OP', 'STALE_TAB');
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
SELECT '****************' STARS, STALE_STATS FROM USER_TAB_STATISTICS WHERE TABLE_NAME = 'STALE_TAB';

-- run another query and show the plan
-- notice dynamic sampling is done because the stats are gone in this case
SELECT /*+ QB_NAME(MAIN) */ DEPARTMENT_ID, COUNT(*) FROM STALE_TAB GROUP BY DEPARTMENT_ID;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR (FORMAT=>'TYPICAL NOTE'));

-- run the first query on it and show the plan
-- this is using the plan that is in the library cache
-- hence no dynamic sampling on this one
SELECT /*+ qb_name(main) */ COUNT(*) FROM STALE_TAB;

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR (FORMAT=>'TYPICAL NOTE'));

Bitmap indexes are really great for flexibility of data selection in data warehouse environments.  This is because the optimizer can use several bitmaps at a time to find data in the table, and the set of indexes used can change based on the need of the moment.  Bitmap indexes can be merged together (typically with a BITMAP AND or BITMAP OR operation), unlike normal B-tree indexes.  But in a transactional system they can really cause problems, and locking is the issue.  The problem is that when you lock an index entry in a bitmap index you lock a set of rowids, not just one as you do in a normal B-tree index.

A simplified look at a bitmap entry on a column COLOR would be something like this:

Control Bytes     Key      Start ROWID      End ROWID        Bitmap
FLG-LCK-LEN       BLUE     AAAcc7AA…        AAAcc7AA…        1010111100000…

The range of rowids for any one entry could cover one extent of a table, though there could be more than one entry for a value across large extents.  What this means is that if I do any INSERT, UPDATE or DELETE of rows for BLUE in this range, I’ve effectively locked that entire set of rows in the table.  Not literally, of course; the lock is on the index entry, not at the table level.  But from a functional point of view that hardly matters: I can’t do DML on other rows in that same range with the same value.  (Here I mean DML as in the commands INSERT, UPDATE and DELETE.  The SELECT command is a DML statement but isn’t part of this discussion.)

Here is an example to illustrate the point. 

Session ONE                       Session TWO
Time 1 - insert into bitmaplock values (9000,'RED');
                                                Time 2 - insert into bitmaplock values (9001,'BLUE');
Time 3 - insert into bitmaplock values (9002,'BLUE');
Session one is now waiting on session two.
                                                Time 4 - insert into bitmaplock values (9003,'RED');
                                                Session two is now waiting on session one, a deadlock.
Session one now gets this message:
insert into bitmaplock values (9002,'BLUE')
            *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

At this point session one must either commit or rollback to get out of the deadlock situation.  Reissuing the insert will just continue the deadlock and no one gets anywhere.

Notice that this example is using INSERT commands.  It should be impossible to have a deadlock with inserts, since neither session is aware of the other session’s rows, so how could they be deadlocked?  It’s because they are deadlocked on the bitmap index, not the table data.  Of course, if the two sessions happen to do the inserts into different extents of the table they would be fine.  Even so, I’m sure you can see this is just a tripwire that will be hit at some point.

The bottom line on this: don’t use bitmap indexes on tables that have DML activity.  You may be able to get away with relatively small amounts of DML from a single session, but many folks have found that even small amounts of DML can be problematic.  Hence they will drop all bitmap indexes on a table, do the DML operation, and then recreate the bitmap indexes.  The good news is that bitmap indexes tend to be much smaller and build faster than normal B-trees.  A sketch of that pattern follows.
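Here is a minimal sketch of that pattern around a batch load; the index name is hypothetical, and marking the index UNUSABLE then rebuilding it is one common variation on simply dropping and recreating it, so verify which fits your environment:

-- Before the batch DML: get the bitmap index out of the way.
ALTER INDEX sales_region_bix UNUSABLE;

-- ... run the batch INSERT/UPDATE/DELETE here ...

-- After the batch DML: rebuild the bitmap index.
ALTER INDEX sales_region_bix REBUILD;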

Here is the code to do this example yourself:

drop table bitmaplock;

create table bitmaplock (id number, color varchar2(10));

begin
  for i in 1..1000 loop
    insert into bitmaplock values(1, 'RED');
    insert into bitmaplock values(2, 'WHITE');
    insert into bitmaplock values(3, 'BLUE');
    insert into bitmaplock values(4, 'YELLOW');
    insert into bitmaplock values(5, 'GREEN');
    insert into bitmaplock values(6, 'BROWN');
    insert into bitmaplock values(7, 'BLACK');
    insert into bitmaplock values(8, 'TAN');
    insert into bitmaplock values(9, 'GRAY');
    insert into bitmaplock values(10, 'GOLD');
  end loop;
  commit;
  end;
/

create bitmap index bitmaplock_idx on bitmaplock(color);

exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BITMAPLOCK', estimate_percent=>100, cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 1');

-- the following steps must be done in order
-- need two sessions using same schema
-- in first session
insert into bitmaplock values (9000,'RED');

-- in other session
insert into bitmaplock values (9001,'BLUE');

-- in first session
insert into bitmaplock values (9002,'BLUE');

-- in other session
insert into bitmaplock values (9003,'RED');

-- at this point you will get a deadlock error
Many folks are familiar with the single table hash cluster in Oracle land.  However, there are really two kinds of clusters: index and hash.  Both are rarely used these days and have their roots way back in Oracle history, when they were the only alternative to a POT (Plain Old Table).  Should you use them?  Most likely no; I’ll talk about them and show you why not.

The index cluster is a way to pre-join table data.  It is also the default type of cluster.  Creating one is a multi-step process: you first create the cluster, then the tables in the cluster, then the index on the cluster key.  What happens at the block level is that data with the same cluster key is stored together.  The main point here is that, unlike a POT, data from the set of tables is stored together in the same block.  (Typically there will be two tables in an index cluster, but you can have more.)

Using the classic example of the good old EMP and DEPT tables, if you were to create a cluster for these you’d use DEPTNO as the cluster key, then create each table in the cluster.  Now the table data for a matching department number in both tables is stored together in the same data block.  Note that the cluster is now the storage segment, not the tables.  A sketch of the DDL is below.
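Here is a minimal sketch of the multi-step DDL, using simplified versions of the EMP and DEPT tables (the names DEPT_C and EMP_C and the SIZE value are made up for illustration):

-- Step 1: create the cluster on the cluster key.
CREATE CLUSTER emp_dept_cluster (deptno NUMBER(2))
  SIZE 1024;

-- Step 2: create the tables in the cluster.
CREATE TABLE dept_c (
  deptno NUMBER(2) PRIMARY KEY,
  dname  VARCHAR2(14),
  loc    VARCHAR2(13)
) CLUSTER emp_dept_cluster (deptno);

CREATE TABLE emp_c (
  empno  NUMBER(4) PRIMARY KEY,
  ename  VARCHAR2(10),
  sal    NUMBER(7,2),
  deptno NUMBER(2)
) CLUSTER emp_dept_cluster (deptno);

-- Step 3: create the index on the cluster key.
CREATE INDEX emp_dept_cluster_idx ON CLUSTER emp_dept_cluster;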

On the surface this can seem like a great thing, and it is, provided you always retrieve the data in the joined state.  But what if you access the tables independently?  Keep in mind they are literally stored together in the same data blocks.  This means that to scan either table, you are reading blocks that contain data from both tables.  In a typical parent-child relationship the parent table has far fewer rows than the child, hence to get a set of parent rows you may end up having to go through hundreds or thousands of blocks to find a relatively small amount of data.

There are other issues: modifying the cluster key is more expensive than if the table were not clustered.  If the data from the set of tables with the same key will occupy more than about two blocks, then you’re likely better off not clustering the tables.  Also, if the number of matching rows per key varies a lot, the cluster will likely have a lot of wasted space in it.  Any block in the cluster will only have rows for one cluster key in it; hence you could have some blocks with very few rows in them.

So should you use them?  Personally, every time I tried to use them, the issues mentioned above ended up forcing us to go back to POTs.  But if you always return the data in the joined form they can be helpful in reducing IO, since you can get data from all the tables with very few LIOs.

Now how about those hash clusters?  They too can have more than one table in them.  It’s the same basic technique: create the cluster, and put the tables into it.  You don’t have to create an index on the key; you can if you want to, but the idea of using a hash key is to not have an index on it.  This time Oracle stores data that has the same hash value in the same block.  Again, if you retrieve the data from the set of tables joining on the hash key, this is good.

These are popular as single table hash clusters, which are very cool because they allow you to get a row with about two IO operations pretty much no matter how big the table is.  With small tables this is easily achievable with an index; however, as soon as the index grows to two or more levels, the LIOs will increase with the levels of the index.

To create one you use the clause “SINGLE TABLE HASHKEYS N” where N is a positive value.  This is the number of hash keys expected in the table, and it is rounded up to the nearest prime number.  Warning: unlike a POT, Oracle will immediately allocate the space needed for all the HASHKEYS, hence a large number will allocate a lot of space right away.  Of course, if you need this then all is good; just be aware that it’s allocating all the space up front.  Optionally you can also specify a SIZE parameter, which is the amount of space reserved for all rows with the same hash key.  If not specified, one data block is allocated per key.

You can even define the hash algorithm used to hash the values; if you don’t, Oracle uses its own.  This gives you a lot of control over how the data is stored, and you can even predict which keys will hash to the same value if you define it.

The best use of a single table hash cluster is looking up rows by the hash key using an equality predicate.  You’ll see the access of the table as “TABLE ACCESS HASH”.  Even when you retrieve a set of rows (when the hash key is not unique) this will likely take only a few LIOs.

Here is a set of test queries that shows the good and the bad of single table hash clusters.  The test uses ORD2, a POT, and ORD2_HASH, the same data in a single table hash cluster.  The script is below; I’ll just show the output here.

These tests are running this query which will retrieve a count of 20 rows:
select count(*) into x1 from [TABLE] where cust_no = 32076;
......................................................
selecting from ORD2 10000 times a POT
......................................................
In hundredths of a second
**** TIME - 524
**** CPU  - 524
**** LIO  - 1910000
......................................................
selecting from ORD2 10000 times a POT with index
......................................................
In hundredths of a second
**** TIME - 53
**** CPU  - 50
**** LIO  - 20000

......................................................
selecting from ORD2_HASH 10000 times a single table hash cluster
......................................................
In hundredths of a second
**** TIME - 47
**** CPU  - 45
**** LIO  - 10079

What you can see in this first set of tests is that the single table hash cluster beat out the POT even when the POT had an index on the column.  So this is looking good for the single table hash cluster.  Now let’s see how things go with a range scan.

These tests are running this query which will retrieve a count of 26 rows:
select count(*) into x1 from [TABLE] where cust_no between 32076 and 32080;
......................................................
range scan from ORD2 10000 times a POT with index
......................................................
In hundredths of a second
**** TIME - 50
**** CPU  - 50
**** LIO  - 20000

......................................................
range scan from ORD2_HASH 10000 times a single table hash cluster
......................................................
In hundredths of a second
**** TIME - 1273
**** CPU  - 1265
**** LIO  - 3170067

......................................................
range scan from ORD2_HASH 10000 times a single table hash cluster with index
......................................................
In hundredths of a second
**** TIME - 53
**** CPU  - 51
**** LIO  - 20014

Here you can see the single table hash cluster really doesn’t help.  Without an index on the column it reverts to a full table scan.  With an index it’s really the same as the POT with an index (the extra LIOs and time are from parsing the statement because of the new index).  Is it really worth it?  That is a question only you can answer for your environment.

As you might guess, most of the issues with index clusters are still there with hash clusters.  A hash cluster is likely not a good choice when the hash key is updated.  This will (most likely) require the row to move, which in effect causes delete/insert-like activity that is slower than just updating the field.  Full table scans on tables in a hash cluster will likely take longer: since Oracle only stores rows with the same hash value in a block, there are likely to be blocks with only a few rows in them.  And it’s really important to use equality predicates when accessing the hash key; other types of predicates will tend to drive full scans, as seen above.  Yes, you can have an index on the column, but the basic idea of using a cluster is to not use an index.  If you’re going to use an index, then you’re likely better off with just a POT and an index.

What you should glean from this is that you pretty much have to know the size of the data you’ll be putting in the single table hash cluster.  You need to know the number of distinct keys that will hash (the HASHKEYS) and how big the total set of rows per key will be (the SIZE).  A single table hash cluster is not likely to work optimally when these are not known.  You don’t necessarily have to be precise with these values, but you should be close, and likely overestimate a bit.  The issue is that if you underestimate too much, overflow blocks get created and you start to lose the advantage of the hashing.  In which case your performance is likely back to what a POT and indexing would have been, maybe worse.  A rough sizing sketch is below.
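As a rough illustration of that sizing exercise, here is a hedged sketch; the cluster name and every figure below are made up purely for illustration:

-- Assumed (hypothetical) figures:
--   expected distinct keys        : 200        -> HASHKEYS 200
--   average rows per key          : 100
--   average row length            : 70 bytes
--   space per key ~ 100 * 70      : 7000 bytes -> SIZE 7000
CREATE CLUSTER cust_orders_sized (cust_no NUMBER(*,0))
  SIZE 7000
  SINGLE TABLE
  HASHKEYS 200;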

What’s the bottom line?  Using clusters is cool and they do work well for very specific usages.  If you wander outside these usages then they will likely not help and may even hurt performance.


Here is the script I used for the tests shown above.  The script to create the ORD2 table is a bit too big to include in a blog post.  If you’d like it please contact me at ric.van.dyke at hotsos dot com and I can send it to you.

rem testing selecting from a POT vs a single table hash cluster
rem OCT2018  RVD
rem
set echo on feedback on serveroutput on
drop index ord2_cust_no;
drop index ord2h_cust_no;
drop table ord2_hash;
drop cluster cust_orders;

CREATE CLUSTER cust_orders (cust_id NUMBER(*,0))
SINGLE TABLE HASHKEYS 200;

create table ord2_hash
cluster cust_orders(cust_no)
as select * from ord2;

declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2 where cust_no = 32076;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting from ORD2 10000 times a POT');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

create index ord2_cust_no on ord2(cust_no);

declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2 where cust_no = 32076;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting from ORD2 10000 times a POT with index');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/


declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2_hash where cust_no = 32076;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting from ORD2_HASH 10000 times a single table hash cluster');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

rem now doing a range scan leaving the index in place on ORD2 for CUST_NO
rem but no index on ORD2_HASH for CUST_NO

declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2 where cust_no between 32076 and 32080;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('range scan from ORD2 10000 times a POT with index');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/


declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2_hash where cust_no between 32076 and 32080;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('range scan from ORD2_HASH 10000 times a single table hash cluster');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

rem lastly put an index on ORD2_HASH.CUST_NO

create index ord2h_cust_no on ORD2_HASH(CUST_NO);

declare
    x1           number;
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 10000 loop
           select count(*) into x1 from ord2_hash where cust_no between 32076 and 32080;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('range scan from ORD2_HASH 10000 times a single table hash cluster with index');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

When looking for a single row in a table it’s always better to use an index, and a unique index is best.  Yes, even for a small table.  As a test of this I have a table that has 10 rows in it and just two columns: a number and a character string of 20 random characters.  I’ll run a select in a loop that gets each row one at a time using the number column, which is unique, 1 to 10.  I’ll do this 100,000 times and capture both elapsed time and CPU time.  I’ll also capture the number of consistent gets (LIOs) done for each run.

The first run is without an index, then with an index, then with a unique index, and last with a unique covering index.  Here is how it stacked up in my test (I ran the test multiple times and the results were consistent):

......................................................
selecting all 10 rows without an index 100000 times
......................................................
In hundredths of a second
**** TIME - 5669
**** CPU  - 5629
**** LIO  - 7000020

......................................................
selecting all 10 rows with an index 100000 times
......................................................
In hundredths of a second
**** TIME - 4883
**** CPU  - 4852
**** LIO  - 2000011

......................................................
selecting all 10 rows with a unique index 100000 times
......................................................
In hundredths of a second
**** TIME - 4556
**** CPU  - 4536
**** LIO  - 2000004

......................................................
selecting all 10 rows with a unique covering index 100000 times
......................................................
In hundredths of a second
**** TIME - 4516
**** CPU  - 4482
**** LIO  - 1000014

Well, there it is.  Overall everything got better as the test went from no index to an index and then to a unique index.  The full table scan used for the first query did about seven LIOs per execution: it’s a five-block table plus the segment header block, which gets visited twice each scan.  The data of the table fits in one block, but the high water mark is at five.  A full table scan always scans to the high water mark, so we get five LIOs for the table scan.

For the first two index scans there are about two LIOs per execution: one in the index and one in the table.  The entire index is in one index block, which supplies the ROWID, and then the scan into the table goes to just the one block where the data is.

The LIOs are not really much different when the unique index is used, but it was faster.  Why?  A key here is the latching protocol used.  With a unique index Oracle is able to use a lighter-weight latching protocol (consistent gets - examination) when accessing the index, which ends up taking less time than the normal latching protocol (consistent gets) used with a non-unique index.

The last query used a covering index; this index has both columns of the table in it.  The time is better, not by much, but it’s now down to just one LIO per access.  This is because it goes to the index, gets everything it needs, and never even visits the table.  Doing only one LIO really is best; it has the best scalability and the least possible contention.  (You could also use an IOT (Index Organized Table), which would behave the same; a sketch is below.)
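For reference, here is a minimal sketch of the IOT alternative mentioned above, using the same column layout as the test table (the table name IAMSMALL_IOT is made up).  The whole row lives in the primary key index, so a lookup by ID_COL is a single index access:

-- An index-organized version of the test table.
CREATE TABLE iamsmall_iot (
  id_col   NUMBER PRIMARY KEY,
  data_col VARCHAR2(20)
) ORGANIZATION INDEX;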

So it is very true that looking up a single row is faster with an index, and best with a unique index, even on a 10-row table.  It might not seem like much, but done thousands or millions of times the savings start to make a difference.

Note:  There are some “extra” LIOs when the queries run, most of which are likely caused by any parsing going on.
 
Here is the code for the test if you’d like to try it yourself:

rem testing to get one row out of a table.
rem SEP2018  RVD
rem
set echo on feedback on serveroutput on
drop table iamsmall purge;
create table iamsmall (id_col number, data_col varchar(20));
begin
 for x in 1..10 loop
   insert into iamsmall values (x, (dbms_random.string('A',20)));
 end loop;
end;
/

declare
    x1           varchar2(20);
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 100000 loop
     for i in 1 .. 10 loop
        select data_col into x1 from iamsmall where id_col = i;
     end loop;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting all 10 rows without an index 100000 times');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

rem create a nonunique index on id_col
create index ias_id on iamsmall(id_col);
declare
    x1           varchar2(20);
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 100000 loop
     for i in 1 .. 10 loop
        select data_col into x1 from iamsmall where id_col = i;
     end loop;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting all 10 rows with an index 100000 times');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

rem create a unique index on id_col
drop index ias_id;
create unique index ias_id on iamsmall(id_col);
declare
    x1           varchar2(20);
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 100000 loop
     for i in 1 .. 10 loop
        select data_col into x1 from iamsmall where id_col = i;
     end loop;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting all 10 rows with a unique index 100000 times');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/


rem create a unique covering index on id_col,data_col

drop index ias_id;
create unique index ias_id on iamsmall(id_col, data_col);
declare
    x1           varchar2(20);
    l_start_time pls_integer;
    l_start_cpu  pls_integer;
    l_start_cr   pls_integer :=0;
    l_end_cr     pls_integer :=0;
begin
    select value into l_start_cr from v$mystat where STATISTIC# = 139;
    l_start_time := DBMS_UTILITY.GET_TIME;
    l_start_cpu  := DBMS_UTILITY.GET_CPU_TIME;
    for ii in 1 .. 100000 loop
     for i in 1 .. 10 loop
        select data_col into x1 from iamsmall where id_col = i;
     end loop;
    end loop;
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('selecting all 10 rows with a unique covering index 100000 times');
    DBMS_OUTPUT.put_line ('......................................................');
    DBMS_OUTPUT.put_line ('In hundredths of a second');
    DBMS_OUTPUT.put_line ('**** TIME - '||to_char(DBMS_UTILITY.GET_TIME - l_start_time));
    DBMS_OUTPUT.put_line ('**** CPU  - '||to_char(DBMS_UTILITY.GET_CPU_TIME - l_start_cpu));
    select value into l_end_cr from v$mystat where STATISTIC# = 139;
    DBMS_OUTPUT.put_line ('**** LIO  - '||to_char( l_end_cr - l_start_cr));
end;
/

Hey, did you miss this in the latest drop of SQL Developer?  You can now rename the columns in the AUTOTRACE output!  Way cool!!  (Oh, and Explain Plan too.)

Since I do a lot of screen shots this is great.  A lot of the columns in AUTOTRACE start with the word LAST, so when shrinking columns to make the screen shot a nice size the names are less than useful.  For example:

[Screen shot: AUTOTRACE output grid with several columns all truncated to “LAST…”]
Notice the three columns with just “LAST…” in them; um, which is which?  Well, now I can right click on the column heading and rename it!  YES!  This is very nice.

[Screen shot: right-clicking a column heading in the AUTOTRACE grid to rename it]
The SQL Developer team keeps hitting home runs.  Go Team!!


When the Oracle optimizer receives a query, one key thing it does is transform the query.  This is literally a rewrite of the query text.  The optimizer has always done this to some degree, but it’s getting much more aggressive as we move into 12 and 18.  When I teach about this in my classes, I’m often asked “How can I see the query after the transformations are done?”

The only place I know of is the 10053 trace file.  In the trace you’ll find a section with the title:


Final query after transformations:******* UNPARSED QUERY IS *******


After this line will be the query after all the transformations have happened.  Here are a couple of points about the statement.  First it is one long string.  There are no line breaks so you may want to copy it off to a text editor to make it a bit more readable.  Also everything has been fully qualified. 

This means that even a very simple select like this one:

select * from emp

Becomes this:
SELECT "EMP"."EMPNO" "EMPNO","EMP"."ENAME" "ENAME","EMP"."JOB" "JOB","EMP"."GENDER" "GENDER","EMP"."MGR" "MGR","EMP"."HIREDATE" "HIREDATE","EMP"."SAL" "SAL","EMP"."COMM" "COMM","EMP"."DEPTNO" "DEPTNO" FROM "OP"."EMP" "EMP"

This is the query text that will actually be optimized.   Notice that the entire statement has been converted into uppercase; of course, any quoted strings wouldn’t be.

“So how do I get this?” you might be asking yourself. 

Getting the 10053 trace used to be rather cumbersome.  The problem is that the 10053 event only traces a hard parse, and the only way to turn it on was the alter session command (a sketch of that older approach is below).  More recently Oracle has added the dbms_sqldiag.dump_trace call, which creates a trace file from an existing cursor that is in the library cache.
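
For reference, here is a minimal sketch of that older approach (the select is just an example statement; it must hard parse while the event is enabled for anything to be written to the trace):

alter session set events '10053 trace name context forever, level 1';
select * from emp;   -- example statement; must hard parse while the event is set
alter session set events '10053 trace name context off';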

The dbms_sqldiag.dump_trace call looks like this:
execute dbms_sqldiag.dump_trace(p_sql_id=>'&hsqlid',  p_child_number=>'&hchild', p_component=>'Optimizer', p_file_id=>'MY_TRACE');

The first two parameters are what you really have to enter, the SQL_ID and the CHILD_NUMBER.  The component parameter has two values, Optimizer or Compiler.  I’d recommend that you stick with the Optimizer value (it’s the default).  Compiler adds more to the trace, but the additional stuff is internal code calls which are likely great for the folks out in Redwood Shores who are developing the optimizer, but not too useful for us. 

The last parameter is pretty cool; this lets you add a string that will be appended to the end of the trace file name to help you find the trace file you just created.  This has the same effect as setting TRACEFILE_IDENTIFIER.  I hope this gets added to creating 10046 trace files someday.
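
In case you haven’t used it, TRACEFILE_IDENTIFIER is just a session setting; something like the line below tags every trace file the session writes from that point on (the identifier value here is only an example):

alter session set tracefile_identifier = 'MY_TRACE';   -- trace file names from this session now end with _MY_TRACE.trc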
 
Here is a great blog post by Greg Rahn about dbms_sqldiag.dump_trace if you’d like to see more about it, http://structureddata.org/2011/08/18/creating-optimizer-trace-files/

So now you can create a 10053 trace of a sql statement and see the statement that was actually optimized. 

A little example:
SQL> select /*+ qb_name (main1) */
  2  d.department_name, e.last_name as manager_name
  3  from dept1700 d, employees e
  4  where e.department_id = d.department_id;

DEPARTMENT_NAME                MANAGER_NAME
------------------------------ -------------------------
Purchasing                     Baida
Finance                        Chen
...

SQL> select sql_id from v$sql where sql_text like 'select /*+ qb_name (main1) */%';

SQL_ID
-------------
5sshry022tdrc
SQL>
SQL> execute dbms_sqldiag.dump_trace(p_sql_id=>'5sshry022tdrc',  p_child_number=>'0', -
>    p_component=>'Optimizer', p_file_id=>'HOTSOS_SQL_10053_'||to_char(sysdate, 'hh24miss'));
SQL>
SQL> SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Default Trace File';

VALUE
----------------------------------------------------------------------
C:\APP\ORACLE\diag\rdbms\orcl\orcl\trace\orcl_ora_11880_HOTSOS_SQL_10053_082115.trc

In the trace here is the final query:

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT /*+ QB_NAME ("MAIN1") */ "DEPARTMENTS"."DEPARTMENT_NAME" "DEPARTMENT_NAME","E"."LAST_NAME" "MANAGER_NAME" FROM "OP"."DEPARTMENTS" "DEPARTMENTS","OP"."EMPLOYEES" "E" WHERE "E"."DEPARTMENT_ID"="DEPARTMENTS"."DEPARTMENT_ID" AND "DEPARTMENTS"."LOCATION_ID"=1700
 
If you compare this to what was run, a couple of things stand out.  The query was written to select from a table called dept1700, which isn’t here in the final query; the table DEPARTMENTS is there instead, so clearly dept1700 is really a view.  And a new predicate on LOCATION_ID has been added to the final query, which comes from the view.  This is a transformation called view merging in action.
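
For illustration, dept1700 was presumably defined as something along these lines (a hypothetical definition, consistent with the merged text above; the actual view in my schema may differ):

create or replace view dept1700 as
  select * from departments where location_id = 1700;

When the optimizer merges the view, its query block is folded into the outer query, which is why the LOCATION_ID=1700 predicate shows up in the final statement.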

There is a lot more in the 10053 trace, and some of it is even useful to us; a lot of it, however, is more suitable for the developers out in Redwood Shores.   

  • Add Tags 
Every once in a while it’s good to refresh yourself on basic assumptions you have about how things work in Oracle.   Like how unique index scans work.   Somewhere along the line I had picked up the “false-truth” that as long as there was a unique constraint on the column(s) the optimizer would be able to do a unique scan, even if the index wasn’t unique.  Now this is NOT the norm, which might be why I thought this.  However, a test is worth... well, you know the rest of the saying.

Here we go.  First I create a non-unique index, then put a Primary Key constraint on the ID column, fill the table with 1000 rows, and run a select.  (The column “SOMEDATA” is a random 9-character string.)  The full code for the example is at the bottom; I’m just highlighting some important bits here.

If I try to insert a value that is already in the table I get (as expected) an error:

Error starting at line : 33 in command -
INSERT INTO drop_me_now VALUES (1,'ABCDEFGHI')
Error report -
ORA-00001: unique constraint (OP.DMN_PK) violated


And when I select that row here is the plan:

SELECT * FROM drop_me_now WHERE ID=1;
        ID SOMEDATA
---------- ---------
         1 lVxoEQSTS

SELECT * FROM TABLE(dbms_xplan.display_cursor (FORMAT=>'typical'));
---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DROP_ME_NOW |     1 |    14 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DMN_ID      |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

Notice it is a RANGE SCAN, but the optimizer does know it’s only going to get one row because of the primary key constraint.   

Coolness.

Now I recreate the table, this time with a unique index but no constraints at all, no primary key or unique constraint.  Again I fill it with 1000 rows and do the same routine.  When inserting a row that already exists I again get an error.   Notice it reports a “unique constraint” violation even though there isn’t a constraint on the column; the constraint name used is actually the index name.

Error starting at line : 65 in command -
INSERT INTO drop_me_now VALUES (1,'ABCDEFGHI')
Error report -
ORA-00001: unique constraint (OP.DMN_ID) violated

And here is the plan for the select getting ID=1 on this table with just the unique index, but without any constraint on the column.

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| DROP_ME_NOW |     1 |    14 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | DMN_ID      |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=1)

OK, I got that “false-truth” out of my head now!  Here is the full script used if you want to have a go at it yourself.

rem file pk_index_example1.sql
rem RVD JUN2018
rem create a table with a nonunique index to support a
rem primary key constraint
rem then recreate with a unique index without a
rem primary key constraint
set serveroutput off

DROP TABLE drop_me_now;

CREATE TABLE drop_me_now ( ID NUMBER, somedata VARCHAR2(9));

CREATE INDEX dmn_id ON drop_me_now(ID);

ALTER TABLE drop_me_now ADD CONSTRAINT dmn_pk PRIMARY KEY (ID);

BEGIN
 FOR X IN 1..1000 LOOP
   INSERT INTO drop_me_now VALUES (X, (dbms_random.STRING('A',9)));
 END LOOP;
END;
/
COMMIT;

BEGIN
  dbms_stats.gather_table_stats (
    ownname => 'OP',
    tabname => 'DROP_ME_NOW',
    estimate_percent => dbms_stats.auto_sample_size);
END;
/

INSERT INTO drop_me_now VALUES (1,'ABCDEFGHI');

SELECT * FROM drop_me_now WHERE ID=1;

SELECT * FROM TABLE(dbms_xplan.display_cursor (FORMAT=>'typical'));

rem ***************************
rem Recreating the table
rem ***************************

DROP TABLE drop_me_now;

CREATE TABLE drop_me_now ( ID NUMBER, somedata VARCHAR2(9));

CREATE UNIQUE INDEX dmn_id ON drop_me_now(ID);

BEGIN
 FOR X IN 1..1000 LOOP
   INSERT INTO drop_me_now VALUES (X, (dbms_random.STRING('A',9)));
 END LOOP;
END;
/
COMMIT;

BEGIN
  dbms_stats.gather_table_stats (
    ownname => 'OP',
    tabname => 'DROP_ME_NOW',
    estimate_percent => dbms_stats.auto_sample_size);
END;
/

INSERT INTO drop_me_now VALUES (1,'ABCDEFGHI');

SELECT * FROM drop_me_now WHERE ID=1;

SELECT * FROM TABLE(dbms_xplan.display_cursor (FORMAT=>'typical'));

rem DROP TABLE drop_me_now;
