MongoDB Schema Design Patterns: A Guide to Flexible and Scalable Data Modelling
MongoDB Schema Design Patterns with examples to design efficient document data models or schema for your application
Overview
MongoDB, a leading NoSQL document database, offers unparalleled flexibility compared to traditional relational databases. However, this flexibility doesn't mean a lack of structure. Effective schema design is crucial for building high-performing, scalable, and maintainable MongoDB applications. This article explores common MongoDB schema design patterns, providing examples to illustrate their application.
Understanding MongoDB's Document Model
Before diving into patterns, it's essential to grasp MongoDB's core concept: the BSON document. Documents are JSON-like structures that can contain embedded documents and arrays. This enables rich, hierarchical data representation within a single record, often eliminating the need for complex joins typically found in relational databases.
Key considerations in MongoDB schema design include:
Data Locality: Keeping related data together often leads to improved performance, as fewer queries are required.
Atomicity of Operations: Ensuring that a single write operation affects all necessary data.
Application Access Patterns: How your application queries and updates data is paramount.
Scalability: Designing for growth and distributed environments.
To elaborate on the schema design patterns in detail, let’s consider an example of a particular Case Management Software.
Common Schema Design Patterns: MongoDB
Below are the most common schema design patterns used or adopted when creating document models or schemas in the MongoDB datastore.
Inheritance Pattern
With the Inheritance Pattern for data modeling in MongoDB, the document could be modeled to have common fields describing the inheritance, with each document having other fields describing the document details.
Here, let’s consider the Conversation example concerning Case Management. The picture below shows what different conversations a tech-support case could have and how those could be modeled in a MongoDB document using the Inheritance Pattern.
Here, Chat, Audio, and Survey are different forms of Conversation with the conversation_type field, indicating it’s a type of Conversation, having other different fields describing each Conversation type.
Single Collection Pattern
To elaborate on the Single Collection Pattern, considering the Case Management example. Considering the Domain Driven Design Model, where Case Management is the bounded context, and below are the domain models related to the bounded context.
User
Case Document
Conversation
Case
When storing data in MongoDB for this particular use, one could consider creating separate collections for each document model. Like, the Conversations collection could have all conversation-related documents, and the same for all other domain models mentioned above. Now, as the scope of the use case increases, there could be more domain models introduced. Having different collections could lead to performance bottlenecks moving forward; it is better to have fewer collections and have related data stored together, as in this use case.
Putting It all together in a single collection
Let’s create a Case Management collection by adding the domain models mentioned above as documents, each having a doc_type field to help differentiate each of them.
Schema Versioning Pattern
Considering the above example. There are some of the client applications that access this data through the API. Suppose one of the client applications is integrated with the Users API, pulling user-related information. To fit certain use cases, it is required to change the API contract and the schema or data model for the User. In the current example, the Case Management Software needs to support subscriptions for the users, differentiating each user based on the subscription they have opted for.
Now, if there is a change to the API contract and the underlying data model in the datastore. The client application may have to accommodate the changes to process the user data. To enable the client application(s) to keep using the older version of the data until it’s ready to consume and process the new version. It would be logical to version the API and the schema for the document in the MongoDB datastore.
Versioning User Document Schema
With this change client application can continue to consume and process user data with version 1 API it would get the documents with schema_version as 1, as the query now could filter documents with schema_version.
Extended Reference Pattern
The Extended Reference Pattern is a form of denormalization used to improve read performance by reducing the need for costly lookups (similar to SQL JOIN) operations.
Considering the Case Management example(as mentioned above), A support case could have case-related documents. Let’s assume a case is related to the user KYC. To model this in the datastore, usually we would consider having a separate document for the case-document and the case with their doc_types, identifying the type of document. If a client application wants to query the datastore for a KYC type of case and its associated documents, it would need to look up multiple doc types.
To avoid client queries from multiple lookups, one could add an extended reference to the most frequently accessed fields inside the case document. With this approach, a single query would fetch the details of the case and the required data for the case documents
Example: KYC case document with case-document reference
{
"id": "id-case-kyc",
"case_type": "KYC",
"doc_type": "case",
"case_details": {
"problem": "abc issue",
"agent_assigned": "test-agent"
},
"case_documents": {
"kyc_documents": [
{
"id": "docid_1",
"name": "abc_doc",
"download_link": "link"
},
{
"id": "docid_2",
"name": "xyz_doc",
"download_link": "link"
}
]
},
"created_at": "date",
"update_at": "date"
}
Conclusion
MongoDB's flexible document model empowers developers to design highly efficient and scalable databases. By understanding and strategically applying schema design patterns, such as embedding, referencing, subsetting, bucketing, and the attribute pattern, you can unlock the full potential of MongoDB and build robust applications that cater to your unique data access patterns and growth needs. Always iterate and refine your schema as your application evolves, using MongoDB's schema-less nature to your advantage.