Introduction to Couchbase for Oracle Developers and Experts: Part 7: Optimizer

Here are the previous articles comparing architecture, database objects, data types, data modeling, statements and features, and indexing of Oracle with Couchbase. This post will focus on the optimizer.

To database what Yoda is to Star Wars, optimizer is. – Yoda

Separating the HOW (logical, physical representations, access methods) from WHAT (the SQL query) was the genius of the relational model. SQL has been unreasonably effective for relational models and beyond. A good optimizer is critical for SQL irrespective of the data model. The job of an optimizer is to produce an efficient execution algorithm, commonly known as the query plan or simply the plan.

Oracle is an RDBMS, that used a rule-based optimizer (RBO) for many years before switching to Selinger style cost-based optimizer. Couchbase N1QL is a declarative query language that is SQL for JSON. Couchbase N1QL also started with a rule-based optimizer. In 2021, we added a cost-based optimizer (CBO) for the query service. CBO for the analytics service is on the roadmap. JSON is flexible, supports complex objects, arrays in documents and predicates.

Couchbase’s patented cost-based optimizer builds on and extends the traditional CBO concepts to JSON’s flexible, nested model. Just like Oracle, RBO and CBO coexist in Couchbase. This article describes the query optimizer in the query service of Couchbase. Couchbase analytics service still uses the rule-based optimizer.

*See the full intro to Couchbase for the Oracle developer series here.

Below is a typical flow of query execution:

Query execution flow

The job of the optimizer is to transform the query into a query plan:

Transforming the query into a query plan

To achieve that, at a high level, the query optimizer does the following steps:

  1. Rewrite a query to its logical equivalent to make the optimizer easier/better.

    1. Eg, Expression rewrites, subquery flattening.

  2. Join reordering (aka join enumeration).

    1. Determine the most efficient join order.

  3. Choose an efficient access method for each data source and join method.

    1. Choose the right index, predicate pushdowns, join type (eg, hash join or nested loop join) for each join.

  4. Create a physical plan for the query engine to execute.

Oracle Resources on Optimizer

SQL Maria has great byte-sized videos explaining the Oracle Optimizer. Much of the Oracle features and descriptions in this article are from them. Oracle’s SQL tuning guide is also a great resource for learning about the Oracle optimizer. Optimizer is the key to the success of SQL and the performance of your queries.

Let’s deep dive into each of the optimizers and compare the features.

RESOURCES

ORACLE

Couchbase

Optimizer Concepts

Optimizer Statistics

Optimizer hints

Advisor

Oracle CBO Book

Rule-Based Optimizer

CBO documentation

CBO Blog and Stats blog

The CBO Talk

Query Transformation (Rewrites) (chapter 5)

INLIST handling

IN list handling.

We’ve seen queries with 25,000 values ​​in the IN list! N1QL optimizer handles various forms of IN list efficiency by using multiple index lookups on both static and dynamic lists by creating a hash map that has O(1) lookup complexity during IN clause evaluation. This was added in Couchbase 6.5.

Subqueries:

Significant subquery optimizations via query rewrites. See chapter 5.4

Subqueries. There are simple optimizations for both correlated and non-correlated subqueries. Subquery unnesting (flatting) or rewriting subqueries as joins aren’t done yet.

Materialized view rewrites. Used mainly in data warehouse workload.

Couchbase does have materialized views (simply called views) that execute map-reduce functions. These views have a separate API to access and are not used by N1QL.

Star join transformations and others.

Query service is targeted for OLTP-like workload. Many of the transformations like star transformations are useful for data warehousing workload.

JOIN Methods

Nested loop

Nested loop

Hash join

Hash join. With RBO, you need to specify the build or the probe side of the has joined for one of the collections; CBO determines probe and builds sides based on the statistics. The query service does limit the number of items for the hash table to 16 million to avoid memory bloat.

Sort-merge join

Unavailable

Join reordering is an important step in optimization; Join order can change a query execution time from hours to minutes and vice versa. It’s based on the cardinality estimates to create a plan with the lowest cost.

Couchbase 7.1 adds cost-based join-reordering optimization (aka, join enumeration). Prior to 7.1, the join order is the same as the order specified in the FROM clause of the SELECT statement.

ACCESS methods

Full table scan; Reads all the rows in the table without any logical order. Rarely done in an OLTP application and use only where index paths are unavailable.

Couchbase collection is a hash distributed table and doesn’t support the exact full table (collection) scan. However, you can create a primary index that provides equivalent functionality. In addition, you can filter the data based on the document key and you can also exploit the key ordering in the index

Eg FROM t ORDER BY META

Table access by rowid

Document key is the equivalent of rowid. Index scans return a set of document keys to query service which uses a look-up by rowid to retrieve the document for further processing. The applications can also retrieve documents by specifying one or more keys

SELECT * FROM t USE KEYS [“k1”]

SELECT * FROM t USE KEYS [“k1”, “k2”, “K8”, “K4”]

Index unique scan

Index scan with limit 1

Index range scan

Index range scan is the main workhorse for the query service to use the indexer. The query specifies the spans for the index to filter and return the results. This range scan can also skip the keys in the case of a composite index. Eg index on (c1, c2, c3) can be used with index spans on c1 and c3 for WHERE c1 = 20 and c3 > 40;

Index skip scan

The optimizer does a statistical analysis of the leading columns to scan and filter values ​​in non-leading columns. This can reduce the number of indexes you need in the system.

Unavailable.

Full Index Scan

Full index scans are used depending on the predicate and cost.

Eg WHERE c1 IS NOT MISSING ORDER BY c1;

Fast Full Index Scan

[All the columns are in the index and used for order by]

This is simply an ordinary index scan with an optimizer matching the order by expressions with index key order; Additional optimization by pushing down the pagination clauses (OFFSET, LIMIT) to the index scan

Index join

In Couchbase parlance, it’s called a multi-index scan. Multiple indexes are used to evaluate a single collection, results of which can be joined with other indexes.

Bitmap indexes

Not available

Array indexes are known as multivalue indexes in Oracle. They need to have a special code remove duplicates from the qualified rowids returned from the index scan.

An array index scan is similar to Couchbase. For arrays, there could be multiple index entries for a single document. Hence duplicate document keys returned by the index scan need to be removed. Couchbase array indexes can themselves have an array constructor, complex expressions, and therefore is more flexible.

Statistics Collection

ANALYZE statement

ANALYZE statement

Same as

UPDATE STATISTICS statement

Automatic statistics collection via DBMS package.

Unavailable

EXPLAIN

Multiple ways to generate the query plan. The plan is tabular with indentation to represent the order.

  1. EXPLAIN PLAN

  2. V$SQL_PLAN (Plan for the statement in cursor cache)

See the full guide here.

A single way to generate the query plan. The plan generated is in JSON and therefore the query execution tree is correctly represented without having to guess.

  1. EXPLAIN. Explained here and here.

Graphical plan with profile: I searched and searched. I could only see an old graphical explain; Nothing for 21c

Graphical explain plan via query work bench.

Graphical plan with profile: I searched and searched. I could only see an old graphical explain; Nothing for 21c

When you execute the query, the profile will give the same plan with different shades of color to give you a sense of the expense. Clicking on each box will give you stats on that iterator.

Hints: Extensive hints are available.

Hints: Extensive hints are available. N1QL hints are similar to the other Oracle database, MySQL than Oracle database.

.

Leave a Comment