MongoDB Schema Design



1. Basics

The main data related features supported by MongoDB are:

MongoDB data models created in Daprota M2 include:


Document View


Collection

Document

Reference

MongoDB resolves relationships by either embedding or referencing.

The embedding approach resolves relationships by storing all related data (documents) in a single document.

The referencing approach resolves relationships through references that point to related documents.

Embedding is equivalent to de-normalization while referencing is equivalent to normalization in data modeling.

A referenced document can be in the same collection, in a separate collection in the same database, or in another database.
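To make the two approaches concrete, here is a minimal sketch using plain Python dictionaries to stand in for BSON documents; the collection and field names (posts, comments, post_id) are illustrative, not part of any fixed schema.

```python
# Embedding: related comments are stored inside the post document,
# so one read retrieves the post and its comments together.
post_embedded = {
    "_id": "post1",
    "title": "Schema Design",
    "comments": [
        {"author": "ann", "text": "Very helpful."},
        {"author": "bob", "text": "Thanks for sharing."},
    ],
}

# Referencing: comments are separate documents that point back to
# the post through a manual reference field (post_id).
post = {"_id": "post1", "title": "Schema Design"}
comments = [
    {"_id": "c1", "post_id": "post1", "author": "ann", "text": "Very helpful."},
    {"_id": "c2", "post_id": "post1", "author": "bob", "text": "Thanks for sharing."},
]

# With referencing, the application resolves the relationship itself,
# e.g. by filtering on the reference field.
post_comments = [c for c in comments if c["post_id"] == post["_id"]]
```

The same data is reachable either way; the difference is whether one read returns everything (embedding) or the application performs the join (referencing).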

Two types of references are supported: manual references and DBRefs.


References


MongoDB does not support joins. For some data models, it is fine to model data with embedded documents (denormalized model), but in some cases referencing documents (normalized model) is a better choice.

There are several types of MongoDB data models you can create:


Embedding and Referencing Model


Embedding model enables de-normalization of data, which means that two or more related pieces of data will be stored in a single document. Frequently it is a choice for "contains" and "one-to-many" relationships between entities (documents).

Referencing model enables normalization of data by storing references between documents to indicate a relationship between the data stored in each document.

A hybrid model is a combination of the embedding and referencing models. It is usually used when neither embedding nor referencing alone is the best choice, but their combination makes the most balanced model.


2. Key Considerations with Data Modeling

The key considerations you have to think about while modeling data that will be stored and managed in MongoDB are:

You will need to consider performance, complexity and flexibility of your solution in order to come up with the most appropriate model.


3. Embedding Model

Embedding model enables de-normalization of data, which means that two or more related pieces of data will be stored in a single document. Frequently it is a choice for "contains" and "one-to-many" relationships between entities (documents).

Generally, embedding provides better read operation performance since data can be retrieved in a single database operation. In other words, embedding supports locality. If your application frequently accesses related data objects, the best performance can be achieved by putting them in a single document which is supported by the embedding model.

You should embed if

Examples
Embedded One-to-Many
Embedded One-to-One
Pre-Aggregated Report
Product Catalog
Tara Modeling Ontology Embedded
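One of the sample models listed above, Embedded One-to-One, can be sketched roughly as follows; the customer/address shape and field names are illustrative assumptions.

```python
# A customer and its single shipping address, embedded one-to-one.
customer = {
    "_id": "cust1",
    "name": "Ada Lovelace",
    "address": {                      # one-to-one: a single embedded document
        "street": "12 Main St",
        "city": "London",
        "zip": "N1 9GU",
    },
}

# The whole entity comes back in a single database operation,
# which is the locality benefit described above.
city = customer["address"]["city"]
```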


4. Referencing Model

The referencing model enables normalization of data by storing references between documents to indicate a relationship between the data stored in each document.

Main features of the referencing model include:

These are additional considerations for making design decisions with regards to references:

These are typical types of relationships that are usually implemented via references:

Examples
Referenced One-to-Many V1
Referenced One-to-Many V2
Tree with Child References
Tree with an Array of Ancestors
Inventory Management
Process
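One of the sample models listed above, Tree with Child References, can be sketched as follows; the category names and the recursive traversal helper are illustrative, not part of the sample model itself.

```python
# A category tree stored with child references: each document keeps
# an array of its children's _id values.
categories = {
    "books":   {"_id": "books",   "children": ["fiction", "science"]},
    "fiction": {"_id": "fiction", "children": ["fantasy"]},
    "science": {"_id": "science", "children": []},
    "fantasy": {"_id": "fantasy", "children": []},
}

def descendants(node_id):
    """Collect all descendant _ids by following child references."""
    result = []
    for child in categories[node_id]["children"]:
        result.append(child)
        result.extend(descendants(child))
    return result
```

In MongoDB itself, each hop of this traversal would be a query on the parent document's children array.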


5. Embedding and Referencing Model Optimizations

5.1 Embedded One-to-N relationship where N is greater than one


5.1.1 N is less than a couple of hundred, assuming that documents are not large


5.1.1.1 Documents do not grow

Design an embedding model where the N-side documents are embedded via an array field in the "one"-side document.
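A minimal sketch of this design, using a Python dict for the document; the person/addresses shape is an illustrative assumption.

```python
# One-to-few: a person with a handful of addresses embedded as an array.
person = {
    "_id": "p1",
    "name": "Kate Monster",
    "addresses": [                      # the N side, embedded in the one side
        {"street": "123 Sesame St", "city": "Anytown"},
        {"street": "555 Avenue Q",  "city": "New York"},
    ],
}

# All N-side documents come back with the person in a single read.
address_count = len(person["addresses"])
```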

Examples

Basic One-to-Few
Embedded One-to-Many

Advantages

Disadvantages


5.1.1.2 Documents grow by adding new fields, or are updated frequently

Design a referencing model where a field in the "one"-side document is an array of references to the N-side documents.
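A sketch of this design under the assumption of a product/parts relationship (names are illustrative); the "one" side holds only references, and the application resolves them.

```python
# One-to-many via an array of references: the "one"-side document
# (a product) holds the _ids of its N-side documents (parts).
parts = [
    {"_id": "part1", "name": "bolt", "qty": 4},
    {"_id": "part2", "name": "nut",  "qty": 4},
]
product = {
    "_id": "prod1",
    "name": "shelf",
    "part_ids": ["part1", "part2"],     # references, not embedded documents
}

# The application-level join: resolve each reference to its document.
by_id = {p["_id"]: p for p in parts}
product_parts = [by_id[pid] for pid in product["part_ids"]]
```

Because the N-side documents live in their own collection, they can grow or be updated frequently without forcing the "one"-side document to be rewritten.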

Examples

Basic One-to-Many

Advantages

Disadvantages

Additional Optimizations

If you want to further optimize your model, under the assumption that your application pattern allows it, you can also apply bi-directional (two-way) referencing, in which case the N-side referenced documents also hold a manual reference to the "one"-side document. Keep in mind that in this case you are paying the price of not having atomic updates.
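Continuing the hypothetical product/parts example, two-way referencing looks roughly like this:

```python
# Two-way referencing: the product keeps an array of part _ids, and
# each part also keeps a manual reference back to its product.
product = {"_id": "prod1", "name": "shelf", "part_ids": ["part1", "part2"]}
parts = [
    {"_id": "part1", "name": "bolt", "product_id": "prod1"},
    {"_id": "part2", "name": "nut",  "product_id": "prod1"},
]

# Either direction can now be navigated without touching the other side,
# but the application must keep the two references in sync itself --
# the two documents cannot be updated in one atomic operation.
backrefs_ok = all(p["product_id"] == product["_id"] for p in parts)
```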

Examples

One-to-Many (Bi-directional)

Again, depending on your application patterns, you may also wish to de-normalize your model on either the "N" or the "one" side of the relationships in your model. This eliminates the need to perform application-level joins in some cases. The following sample models demonstrate these techniques:

This kind of denormalization helps when there is a high ratio of reads to updates. Otherwise it could be counterproductive.
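For illustration, denormalizing the "one" side into the "N" side of the hypothetical product/parts relationship might look like this (the copied product_name field is the denormalization):

```python
# Each part copies the name of the product it belongs to, so listing
# parts with their product needs no second query.
parts = [
    {"_id": "part1", "name": "bolt", "product_id": "prod1",
     "product_name": "shelf"},        # denormalized copy of the one side
    {"_id": "part2", "name": "nut",  "product_id": "prod1",
     "product_name": "shelf"},
]

# Reads avoid the join, but every rename of the product must now be
# propagated to all of its parts -- cheap only if reads dominate updates.
labels = [f'{p["name"]} ({p["product_name"]})' for p in parts]
```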


5.1.2 N is very large (tens of thousands/hundreds of thousands/millions)

Go with "parent" referencing.

Examples

Basic One-to-Squillions

If needed, you can also denormalize the "Basic One-to-Squillions" example. You can either put information about the "one" side into the "squillions" side (sample model Basic One-to-Squillions (Denormalizing One-Side)), or put information from the "squillions" side into the "one" side (sample model Basic One-to-Squillions (Denormalizing Squillions-Side)).
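Parent referencing can be sketched as follows, assuming a host/log-message relationship (names illustrative); the point is that neither embedding nor an array of references on the "one" side could hold millions of entries.

```python
# One-to-squillions: a host produces far too many log messages to embed
# or to reference from an array in the host document. Instead each
# message carries a "parent" reference to its host.
host = {"_id": "host1", "name": "app-server-01"}
logs = [
    {"_id": f"log{i}", "host_id": "host1", "msg": f"event {i}"}
    for i in range(5)                  # stands in for millions of documents
]

# Messages for a host are found by querying on the parent reference
# (in MongoDB this field would be indexed).
host_logs = [m for m in logs if m["host_id"] == host["_id"]]
```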


5.2 Some fields are embedded documents with a large number of fields

There are more than a few hundred fields in the embedded document (the critical number depends on the size of the fields).

Transform the embedded document into one whose fields are themselves documents, grouping the large number of fields into groups based on the application logic. This creates an intra-document hierarchy.

Why do we have to do this?

MongoDB has a document size limit of 16 MB. MongoDB also stores BSON documents as a sequence of fields and values. When MongoDB writes to a document, or updates a field in a document, it has to read the document sequentially in order to reach the fields to be updated or to add a field with its value. If the document has many fields, this sequential access to the document's fields takes longer. If you are dealing with a large number of documents during some application operations, it can add up to a lot of time spent just moving through documents to access specific fields or to add fields.

Examples

Pre-Aggregated Report shows how the 1440-field Minute embedded document was transformed into an embedded document of 24 sub-document fields where minutes are grouped by hour.
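The transformation in the Pre-Aggregated Report example can be sketched as follows (the h0/m0 field-name scheme is an illustrative assumption):

```python
# Flat layout: 1440 sibling minute fields in one embedded document.
flat = {f"m{i}": 0 for i in range(1440)}

# Intra-document hierarchy: 24 hour sub-documents of 60 minutes each.
grouped = {
    f"h{h}": {f"m{m}": 0 for m in range(60)}
    for h in range(24)
}

# An update now addresses a short path such as h13.m37 instead of
# scanning past up to 1439 sibling fields at the top level.
grouped["h13"]["m37"] = 1
```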


5.3 Some fields are arrays with a large number of scalar values or embedded documents

There are several hundred array elements, though the critical number of elements depends on their size.

Split the array into smaller arrays and use "to-be-continued" ("continuation") documents to hold the continued arrays.
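A minimal sketch of the continuation technique, with a next field chaining the chunks (the field names and chunk size are illustrative assumptions):

```python
# Splitting a large array across "continuation" documents: each chunk
# holds a slice of the array plus a reference to the next chunk.
chunks = [
    {"_id": "doc1", "values": [1, 2, 3], "next": "doc2"},
    {"_id": "doc2", "values": [4, 5, 6], "next": "doc3"},
    {"_id": "doc3", "values": [7, 8],    "next": None},
]
by_id = {c["_id"]: c for c in chunks}

def read_all(first_id):
    """Reassemble the full array by following the continuation chain."""
    values, doc_id = [], first_id
    while doc_id is not None:
        doc = by_id[doc_id]
        values.extend(doc["values"])
        doc_id = doc["next"]
    return values
```

Most operations touch only one small chunk; reading the whole sequence costs one extra fetch per continuation document.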

Why do we have to do this?

MongoDB has a document size limit of 16 MB. MongoDB also stores BSON documents as a sequence of fields and values. When MongoDB writes to a document, or updates a field in a document, it has to read the document sequentially in order to reach the fields to be updated or to add a field with its value. If the document has many fields, this sequential access to the document's fields takes longer. If you are dealing with a large number of documents during some application operations, it can add up to a lot of time spent just moving through documents to access specific fields or to add fields.

Examples

Continuation Document


5.4 Array constantly grows or grows significantly at some point

Design a referencing model.

Why do we have to do this?

MongoDB will move documents to accommodate the new space requirements of an enlarged array. Document moves are generally slow (every index must be updated) and can also fragment the storage space of the document's collection. If the array field is indexed, one document in the collection is responsible for a separate index entry for each and every element in its array. So inserting or deleting a document with a 100-element array, if that array is indexed, is like inserting or deleting 100 documents in terms of the amount of indexing work required. The BSON data format is manipulated with a linear memory scan, so finding elements all the way at the end of a large array takes a long time, and most operations dealing with such a document would be slow.


5.5 Array has a high rate of modifications

Design a referencing model.

Why do we have to do this?

Large arrays incur relatively high CPU overhead, which slows down inserting, updating, and querying array elements.


5.6 Documents in a single collection have different structures but a common set of fields

Model a document with fields belonging to the common set of attributes, plus a separate embedded document containing the type-specific fields for each type of object you want to represent.
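A sketch of this pattern for a hypothetical catalog holding books and music albums in one collection (all field names and sample data are illustrative):

```python
# Every product shares the common fields (_id, name, price, type);
# type-specific fields live in an embedded "details" document.
products = [
    {"_id": "p1", "name": "A Database Book", "price": 30, "type": "book",
     "details": {"author": "J. Smith", "pages": 312}},
    {"_id": "p2", "name": "Blue Train", "price": 15, "type": "album",
     "details": {"artist": "John Coltrane", "tracks": 5}},
]

# Queries on common fields work uniformly across all types...
cheap = [p["name"] for p in products if p["price"] < 20]
# ...while type-specific fields stay cleanly separated per type.
book_pages = products[0]["details"]["pages"]
```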

Examples

Product Catalog