MongoDB to Couchbase, Part 4: Data Modeling

Data Modeling Guides at this introduction, this video tutorial, and this additional video tutorial.

Additional examples

Data Modeling Guides: in this Couchbase documentation, this JSON Data Modeling Guide, and this video.

More examples at this blog.

Flexible document model based on BSON (binary JSON)

Flexible document model based on JSON and binary objects

Supports embedding and loose references (no foreign key reference enforcement)

Supports embedding and loose references (no foreign key reference enforcement)

Atomic Write operations:

Single document atomicity by default; Multi-document writes are supported as well as single-document atomicity.

Multi-document atomicity and full ACID for MQL operations is supported via transactions in 4.2 and above.

Atomic Write operations:

Single document atomicity by default; Multi-document writes are supported as well as single-document atomicity. The operations can be done via direct KV SDK or N1QL.

Multi-document atomicity has been supported since 6.5. Full ACID for N1QL statements is supported via N1QL transactions in 7.0 and above.

Performance Advisor:

Index advisor built into MongoDB ops manager

Performance Advisor:

Has index advisor built into the product (via ADVISE statement and ADVISOR function) as well as an open service.

Schema Validation is optionally done synchronously.

Synchronous schema validation is unavailable. Schema validation can be done asynchronously.

Embedded data model

Exactly the same thing on JSON and Couchbase

Normalized data model: References to other documents are stored as ObjectId().

Example:

“product”: ObjectId(“Lego.US.beatles123”)

“friends”: [ObjectId(“fred.123”), objectId(“joe.234”), objectId(“john.345”)]

In a normalized schema, queries typically end up joining the data using the $lookupoperator. $lookup implements left outer join against an unsharded collection. Joining between two sharded collections has to be implemented by the application/user.

$graphlookup mimics recursion using Oracle’s CONNECT BY feature by querying the same object based on the previous result.

Normalized data model: References to other documents are simply stored as a string representing the document key of the references.

Example:

“product”: “Lego.US.beatles123”

“friends”: [“fred.123”, “joe.234”, “john.345”]

In a normalized schema, queries can issue INNER JOIN, LEFT OUTER JOIN, and RIGHT OUTER JOIN to manage the data. Collections in Couchbase are always automatically sharded (hash partitioned). All joins are supported partitioned collections. See details here and here.

Couchbase doesn’t have the equivalent recursive querying capability. There are some workarounds: See here; The JavaScript UDF feature in 7.1 will help you write a loop or recursion easily.

Sharding Collections

The collections are not sharded by default. You’d need to add the config servers and mongos to shard a collection. Mongo supports hash and range-based strategies for collection and index sharding. Indexes follow the collections shard strategy since the MongoDB indexes are local to the data. Mongo supports one or more fields as the shard key for hash strategy.

Sharding is called partitioning in Couchbase. The collections in Couchbase are always hash-partitioned based on the document key. There is no range partitioning of the collection itself. The indexes in Couchbase can use a different strategy than the collection. Indexes can be partitioned or not. Indexes can be partitioned using hash on any field or expression or any complicated expression. The expressions for creating the index provide the flexibility to create indexes for a subset of the data (partial index).

Time to Live (TTL) is supported by creating an index on a date field within the collection.

Couchbase supports TTL natively on the collections without the need for a separate index.

Modeling 1:1 relationships with embedding documents

For 1:1 relationships, you use the exact same approach.

Modeling 1:n relationships with embedding documents. This comes naturally with BSON arrays. Supporting storing an array of objects and or object of arrays is easy. The important thing is the support for querying and speeding them up using indexes. MongoDB supports querying data based on any of the values ​​or fields in an array. The queries can be accelerated using array indexes. While MongoDB does not support indexing multiple arrays in a single index, it allows creating an array key with keys from multiple arrays and creating a single index on it.

You can use the exact same approach on Couchbase. Arrays are a fundamental type in JSON and can be nested easily. Couchbase supports querying arrays, indexing arrays, and expressions on the index. The query support both nesting and unnesting of arrays easily in the FROM clause of the statement. Array indexes support indexing on one or more keys or expressions. Couchbase also supports indexing multiple arrays in a single index in FTS and using them in a direct search query or N1QL query. See additional details here.

Modeling 1:n relationship with references. Simply store the n references as an array of document _id list. Within the query, simply do the $lookup to join the documents. Also, you can sequence many $lookup operations in a single aggregate() query. $lookup is fine as long as you avoid joining a sharded collection to another sharded collection. You have to be careful about joining two humongous collections. The second limitation is $lookup only supports nested loop join. When the amount of data increases, you need other join implementations like hash joins, which Mongo does not have.

You can use the exact same approach on Couchbase. Instead of $lookup, use JOIN operations. See details here and here. The collections and the indexes used for the JOIN operations can be partitioned (Couchbase term for sharding). Couchbase support both nested loop and hash joins in both query and analytics service. If you’re using the Couchbase cost-based optimizer in 7.0, it chooses the join method based on the lowest cost estimates. The rule-based optimizer uses NL join by default and can use hash join when you specify the hash join hint.

.

Leave a Comment