Introduction to Couchbase for Oracle Developers and Experts: Part 4: Data Modeling

There are three important things in the database world: performance, performance, and performance. Bruce Lindsay

Let me start with the real impact of proper modeling on application performance. This is an excerpt from a talk by Amadeus engineers about their Customer Experience Management (Traveler Loyalty App) app they migrated from Enterprise RDBMS to Couchbase.

Performance comparison between Enterprise RDBMS and Couchbase:

Enterprise performance comparison RDBMS and Couchbase

“In terms of stability and performance it’s been great as well. It’s 100% available for over 1 year and still counting. In terms of performance, it was day and night compared to what we had before, basically, these graphs give you the percentage of graphic response time The first graph is the previous implementation using the standard RDBMS. With RDBMS, at 99 percentage Of the cases, we were responding in less than 4 seconds from the database. Now on Couchbase for 99.9 percentage Of the cases, we respond in less than 300 milliseconds. This is more than 10 times faster [13.3 times]. This is so amazing. In fact, the average response time in Couchbase corresponds to 90 percentage The RDBMS… which is amazing. To be honest, when the developers initially extracted this data, they thought it was a scaling problem. It really is day and night. Performance in Couchbase is really more stable. As you can see, the graph for 99 percentage Flat, that’s amazing. “- Amadeus

Watch the full presentation here:

I have worked in RDBMS R&D for a long time. It’s hard to beat RDBMS performance by this margin. very Difficult. How did Amadeus achieve this with Couchbase?

Answer:

  • Super data modeling with JSON
  • Access methods and optimizations in N1QL (SQL for JSON)

Data Modeling for JSON

Couchbase has a good number of resources with detailed advice on how to move from the relational model to the Couchbase JSON model. Best practices, tips, and things to avoid. These are the main resources.

Couchbase 7.0 has a three-tier Bucket->Scope-> Collection for organizing data. Detailed explanations for this here and here. So, with Couchbase 7.0 and above, here are the database objects that you use to model your physical data.

By using ranges and collections, you can avoid having to use type in every document, index definition, and queries. All of these get a little more natural. While you can assign a table to a collection and move a relational table to a Couchbase collection, to reap the full benefit of the model, you need to redesign to exploit JSON’s nested structures: objects and arrays. Just like Amadeus did.

database objects

Notes about access and performance methods in Couchbase

The main purpose of practicing logical and physical data modeling exercises is to improve programmability and performance. The easier it is for the developer to write programs to access the required data, the better. This program must perform to meet service level agreements. It depends not only on the logical and physical data model, but also on the access and optimizer methods in the system.

Here are the data access APIs and languages ​​available via the Couchbase SDKs:

  • Direct access to and manipulation of a JSON document (eg Java SDK)
  • Direct access to and manipulation of the subdocument. (eg Java SDK)
  • N1QL- SQL to JSON for OLTP access
    • This is a rich language that supports different project processes being selected, and joined
    • Numerous indexes on metrics and matrices for scope and pattern scans covering all use cases made by Oracle. Couchbase has a built-in index advisor as well as a service.
    • State of the art improved
    • Acid transactions
  • N1QL – SQL for JSON for Analytics
    • Rich SQL optimized for easier analytics. This is a tutorial
    • Large scale parallel distributed execution of large group analysis
  • Full text search
    • Linguistics, language-based research
    • Complex query language for mixing search with scope scanning
    • Fully integrated with N1QL and tools.
  • Event
    • It helps you easily encode final actions after changes to data.

Example query from Amadeus

Now, let’s look more closely at how Amadeus achieved a 13x improvement compared to an enterprise RDBMS.

Here’s their typical query to fetch an eligible Couchbase query:

SELECT DATA,
       META(DATA).id,
                  META(data.cas)
FROM
  (SELECT RAW id
   FROM
     (SELECT id,
             SUM(weights)
      FROM (
              (SELECT META().id AS weights
               FROM prdcem_xx
               WHERE TYPE = "links"
                 AND OWNER = "XX"
                 AND (ANY param IN `identity`.`Contact`.`Emails` SATISFIES (param = $1) END)
               LIMIT 50)
            UNION ALL
              (SELECT META().id,
                             1 AS weight
               FROM prdcem_xx
               WHERE TYPE = "links"
                 AND OWNER = "XX"
                 AND (ANY param IN `identity`.`Contact`.`Phones` SATISFIES ((REVERSE(param.`Number`) LIKE $2) OR (reverse(param.`Number`) LIKE $3)) END)
               LIMIT 50) UNION ALL
              (SELECT (META().id,
                              ` AS weight FROM prdcem_xx WHERE type = "links" AND owner = "XX" AND (ANY param IN `IDENTITY`.`Name`.`NAMES` satisfies [param.`FirstName`, param.`Lastname`] = [$4, $5] END)
               LIMIT 50)) AS union_query
      GROUP BY id
      ORDER BY weights DESC) AS order_query) AS dkeys
INNER JOIN prdcem_xx AS data ON dkeys

Only in this query you can see a number of powerful modeling features of JSON, N1QL language, and indexing:

pipe lines

The FROM clause can process data from collections or other SELECT statements – both are data sources that return a set of JSON documents. Pipelines (subqueries) can be nested at any level and anywhere, table expression is allowed.

matrix processing

predicate (any value in arrayfield SATISFIES value = “abc” end) Finds the value abc in an array field. Matrix is ​​the main difference between relational model and JSON model. You can use arrays of volumetrics any database can store arrays, but not everyone can index and search arrays efficiently. Couchbase supports efficient indexing of index keys using GSI and FTS. Index Search supports indexing and scanning of multiple fields from multiple arrays in an efficient manner.

indexing

Indexing allows keys, arrays, array expressions, simple scalar expressions, and more. This allows developers to efficiently implement expressions (inverse(param.`Number`) LIKE$3) by using them as an index key.

More powerful

The final query plan that is generated must be effective at every stage of the pipeline as well as the entire query. For example, the optimizer must select the correct index for each subquery, but also push limit 50 to index scans. This avoids the need to transfer and process unnecessary data.

Set operations

  1. N1QL works based on a set of documents. Apart from selecting and joining the project, real applications need additional group operations: UNION, UNION ALL, EXCEPT, EXCEPT ALL, INTERSECT, INTERSECT ALL. N1QL supports all of these standard operations.
  2. Index split: Couchbase groups are automatically split on the document key. GSI indexes can be unpartitioned (on a single node, this is default) or partitioned on any expression. Search indexes (the hash on the document key) can also be divided into more than one partition.

Couchbase is a modern database that is greater than the sum of its parts. Each feature is integrated with other features to make life easier for developers. I hope you benefit from it as much as Amadeus!

.

Leave a Comment