Others written in SQL to improve the efficiency of a good summary of the method, very practical, down
In the massive data tables, each table has a basic or more indexes to ensure efficient query, the index during the ETL process needs to use the following principles:
(1) When inserting the data for the number of data records in the table more than 10%, you first need to remove the table indexes to improve the efficiency of data insertion, and then when all the data into the index.
(2) Avoid using functions on indexed columns or calculations in the where clause, if the index column is part of the function, the optimizer will not use the index to use full table scan. For example:
Inefficient: select * ROM DEPT where SAL * 12> 25000;
Efficient: select * FROM DEPT where SAL> 25000/12;
(3) Avoid the use of the index columns and NOT "! =" Index can only tell what exists in the table, but can not tell what does not exist in the table, when the database encounters NOT and "! =", It will Stop using the index instead perform a full table scan.
(4) an index column with> = alternative>
Efficient: select * FROM EMP where DEPTNO> = 4
Inefficient: select * FROM EMP where DEPTNO> 3
The difference is that the former DBMS will skip the first record of DEPT is equal to 4 while the latter will first locate the DEPTNO = 3 records, and scans forward to a record of greater than 3 DEPT.
(5) function of the column to enable indexing, be sure to use the function if the column enable indexing, Oracle9i or later a new function: function-based index (Function-Based Index) is a better solution, but the type of index drawback is that only for a function to create and use this function.
create INDEX EMP_I ON EMP (UPPER (ENAME));
select * FROM EMP where UPPER (ENAME) = 'BLACKSNAIL';
3.2 The proper use of the cursor
When the mass data table for data delete, update and insert operations with the efficiency of cursor processing is the slowest way, but it's used during the ETL process and essential, but its use has an important role, Therefore, the proper use of the cursor is particularly important.
The data warehouse dimension table data for maintenance, because of the need to ensure the consistency of the dimension table ID, so the cursor is used to maintain the integrity of the data the best way. Because of its low efficiency, according to the ordinary way can not handle a large amount of data in the dimension table data maintenance (generally refers to more than 100,000 records the dimension table), the following is an effective way to deal with this situation:
(1) data extraction of the source table using the timestamp, so the dimension table data every day to maintain only the latest date for the updated time data, greatly reducing the need to maintain data records.
(2) in the dimension table insert and update are combined with a condition to filter dimension record already exists in the table, examples are:
insert INTO DIM_CUSTOMER select * FROM ODS_CUSTOMER where ODS_CUSTOMER.CODE NOT exists (DIM_CUSTOMER.CODE)
(3) the use of explicit cursors (CURSORs), since the use of implicit cursor will perform two operations, first retrieve the record, a second inspection TOO MANY ROWS this EXCEPTION, while the explicit cursor does not perform a second operation .
3.3 Data extraction and load on the SQL optimization
3.3.1 where clause in the order of the connections
ORACLE uses a bottom-up sequence analysis where clause, according to this principle, the connection between the table must be written before in other conditions where those records can filter out the maximum number of conditions must be written in the where clause at the end.
Inefficient: select * FROM EMP E where SAL> 50000 AND JOB = 'MANAGER' AND 25 <(select count (*) FROM EMP where MGR = E. EMPNO);
Efficient: select * FROM EMP E where 25 <(select count (*) FROM EMP where MGR = E. EMPNO) AND SAL> 50000 AND JOB = 'MANAGER';
3.3.2 Delete entire table with the truncate instead delete
When the Delete to delete records in the table, there rollback (rollback segments) can be used to store recovery information, and when the use of truncate, the rollback will not store any information that can be restored, so the execution time will very short. Also note that truncate the whole table only apply to the deletion, because the truncate is DDL not DML.
3.3.3 as much as possible use the COMMIT
ETL process in data with a lot of steps, the data warehouse is the data after extraction using model Chong calculation principles, so unlike the business of the COMMIT data system to ensure data integrity and consistency of an operation requiring all completed in order to carry out, whenever possible in the program for each delete, insert and update operations as much as possible to use COMMIT, so the system performance because of the resources released by COMMIT greatly improved.
3.3.4 use alternative exists IN
In many queries based on the underlying table in order to satisfy a condition often need to join another table, for example, in the ETL process to write data to the model 10 or so often associated with the dimension table, in this case, the use exists and IN will not improve the query efficiency.
3.3.5 replace NOT IN with NOT exists
Subquery, NOT IN clause will perform an internal sorting and merging, no matter under what circumstances, NOT IN is the least efficient, because it subquery table performed a full table traverse. Replace NOT IN with NOT exists to improve the efficiency of query.
3.3.6 GROUP BY Optimization
Improve the efficiency of the GROUP BY statement, you can not record in the GROUP BY to filter out before.
Inefficient: select JOB, AVG (SAL) FROM EMP GROUP BY JOB HAVING JOB = 'PRESIDENT' OR JOB = 'MANAGER'
Efficient: select JOB, AVG (SAL) FROM EMP where JOB = 'PRESIDENT' OR JOB = 'MANAGER' GROUP BY JOB
3.3.7 Conditional replace union with union-ALL
ETL process operation for multi-table joins many cases, the use of union-ALL conditional replace union on the premise that: the connection is no primary key in each table the same record, because the union ALL will repeat the same set of output two results records.
When a SQL statement requires two union query result set, the two result sets will be the way to union-ALL were merged, then sorted out before the final result. If union ALL alternative union, so that sorting is not necessary, the efficiency will be improved 3-5 times so
3.3.8 separating tables and indexes
Always build your tables and indexes in different table space, never to ORACLE does not belong to the object stored within the system to the SYSTEM table space. While ensuring that the data tablespace and index tablespace set to a different hard drive controller with the control of hard disk