It is important to consider an application's data requirements from the very start of development. If your application will use NoSQL and you come from an RDBMS/SQL background, thinking about data in NoSQL terms may seem difficult. This article will help by showing how some basic data modeling concepts apply in the NoSQL domain.
I will use MongoDB for the discussion, as it is one of the leading open source NoSQL databases thanks to its simplicity, performance, scalability, and active user base. This article assumes that you understand basic MongoDB concepts such as collections and documents. If not, I suggest reading some of the earlier articles on SitePoint to get started with MongoDB.
Understanding relationships
Relationships show how your MongoDB documents relate to each other. To understand the different ways documents are organized, let's look at possible relationships.
One to one relationship (1:1)
When an object of an entity is related to one and only one object of another entity, a 1:1 relationship exists. For example, a user can have one and only one birth date. Therefore, if we have a document storing user information and another document storing birth date, there will be a 1:1 relationship between them.
One to many relationship (1:N)
This relationship exists when an object of one entity can be related to many objects of another entity. For example, there may be a 1:N relationship between a user and their contact numbers, because one user may have several numbers.
Many to many relationship (M:N)
This relationship exists when an object of one entity is related to multiple objects of another entity, and vice versa. Take users and the items they buy as an example: a user can buy more than one item, and an item can be purchased by multiple users.
Modeling one-to-one relationships (1:1)
Consider the following example: we need to store address information for each user (for now, assume that each user has exactly one address). In this case, we can design an embedded document with the following structure:
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": ObjectId("5146bb52d8524270060001f4"),
    "user_name": "Mark Benzamin",
    "dob": "12 Jan 1991",
    "address": {
        "flat_name": "305, Sunrise Park",
        "street": "Cold Pool Street",
        "city": "Los Angeles"
    }
}</code></span></span>
We embed the address entity into the user entity so that all information exists in a single document. This means that we can find and retrieve all content through a single query.
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-"><?php
// query to find user 'Mark Benzamin' and his address
$cursor = $collection->find(
    array("user_name" => "Mark Benzamin"),
    array("user_name" => 1, "address" => 1)
);</code></span></span>
Embedding a document is roughly analogous to denormalization and is useful when there is a "contains" relationship between two entities, that is, when one document can be stored inside another so that related pieces of information live in a single document. Because all the information is available in one document, this approach gives better read performance: a query within a single document is less expensive for the server, and we can find and retrieve the related data in the same query.
In contrast, the normalized approach requires two documents (preferably in separate collections), one storing basic user information and the other storing address information. The second document contains a user_id field that indicates which user the address belongs to.
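To make the single-lookup idea concrete, here is a minimal sketch that uses plain PHP arrays as a stand-in for the collection; it does not talk to MongoDB. The helper `findOneProjected()` is hypothetical and only imitates a query with a field projection (unlike a real MongoDB projection, it does not return `_id` by default), and the `_id` is a plain string rather than an ObjectId.

```php
<?php
// In-memory stand-in for a users collection; no MongoDB driver is involved.
$users = [
    [
        "_id" => "5146bb52d8524270060001f4",
        "user_name" => "Mark Benzamin",
        "dob" => "12 Jan 1991",
        "address" => [
            "flat_name" => "305, Sunrise Park",
            "street" => "Cold Pool Street",
            "city" => "Los Angeles",
        ],
    ],
];

// Hypothetical helper: return the first document whose $field equals $value,
// keeping only the requested top-level fields (a rough imitation of a projection).
function findOneProjected(array $docs, $field, $value, array $fields)
{
    foreach ($docs as $doc) {
        if (isset($doc[$field]) && $doc[$field] === $value) {
            return array_intersect_key($doc, array_flip($fields));
        }
    }
    return null;
}

// One lookup returns the user name and the embedded address together.
$result = findOneProjected($users, "user_name", "Mark Benzamin", ["user_name", "address"]);
echo $result["address"]["city"], "\n"; // prints "Los Angeles"
```

Because the address lives inside the user document, no second lookup is needed to read it.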
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": ObjectId("5146bb52d8524270060001f4"),
    "user_name": "Mark Benzamin",
    "dob": "12 Jan 1991"
}</code></span></span>
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": ObjectId("5146bb52d852427006000de4"),
    "user_id": ObjectId("5146bb52d8524270060001f4"),
    "flat_name": "305, Sunrise Park",
    "street": "Cold Pool Street",
    "city": "Los Angeles"
}</code></span></span>
We now need to execute two queries to get the same data:
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-"><?php
// query to find user information
$user = $collection->findOne(
    array("user_name" => "Mark Benzamin"),
    array("_id" => 1, "user_name" => 1)
);

// query to find the address corresponding to the user
// (if addresses live in a separate collection, run this on that collection's handle)
$address = $collection->findOne(
    array("user_id" => $user["_id"]),
    array("flat_name" => 1, "street" => 1, "city" => 1)
);</code></span></span>
The first query gets the user's _id, which is then used in the second query to retrieve his address information.
Embedding makes more sense than referencing in this case because user_name and address are frequently retrieved together. Which approach you should ultimately use depends on how you logically connect your entities and what data you need to retrieve from the database.
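The cost of the referenced design can be sketched with two in-memory arrays standing in for the two collections. This is only an illustration under stated assumptions: `findOneBy()` is a hypothetical helper, the IDs are plain strings instead of ObjectIds, and `$lookups` simply counts how many times the "database" is consulted.

```php
<?php
// In-memory stand-ins for a users collection and an addresses collection.
$users = [
    ["_id" => "5146bb52d8524270060001f4", "user_name" => "Mark Benzamin", "dob" => "12 Jan 1991"],
];
$addresses = [
    [
        "_id" => "5146bb52d852427006000de4",
        "user_id" => "5146bb52d8524270060001f4",
        "flat_name" => "305, Sunrise Park",
        "street" => "Cold Pool Street",
        "city" => "Los Angeles",
    ],
];

// Hypothetical helper: first document whose $field equals $value, or null.
function findOneBy(array $docs, $field, $value)
{
    foreach ($docs as $doc) {
        if (isset($doc[$field]) && $doc[$field] === $value) {
            return $doc;
        }
    }
    return null;
}

$lookups = 0;

// Lookup 1: find the user to learn their _id.
$user = findOneBy($users, "user_name", "Mark Benzamin");
$lookups++;

// Lookup 2: use that _id to find the matching address document.
$address = findOneBy($addresses, "user_id", $user["_id"]);
$lookups++;

echo $user["user_name"], " lives in ", $address["city"], " (", $lookups, " lookups)\n";
// prints "Mark Benzamin lives in Los Angeles (2 lookups)"
```

The second lookup cannot start until the first has returned, which is the main read-side cost of referencing when the two pieces of data are usually needed together.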
Modeling embedded one to many relationships (1:N)
Now let's consider the case where a user can have multiple addresses. If all addresses should be retrieved together with basic user information, it would be ideal to embed the address entity into the user entity.
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": ObjectId("5146bb52d8524270060001f4"),
    "user_name": "Mark Benzamin",
    "address": [
        {
            "flat_name": "305, Sunrise Park",
            "street": "Cold Pool Street",
            "city": "Los Angeles"
        },
        {
            "flat_name": "703, Sunset Park",
            "street": "Hot Fudge Street",
            "city": "Chicago"
        }
    ]
}</code></span></span>
We can still get all the necessary information through a single query. The referenced (normalized) approach would require three documents (one user, two addresses) and two queries to accomplish the same task.
Beyond efficiency and convenience, we should use the embedded approach when operations need to be atomic. Because every update happens within the same document, atomicity is guaranteed.
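As a rough illustration of why embedding helps here, the sketch below expresses "add a second address" as one update against one document. The update array has the shape MongoDB uses for its `$push` operator, but `applyPush()` is a hypothetical in-memory helper written for this example, not part of any driver, and the `_id` is a plain string.

```php
<?php
// Hypothetical helper that imitates MongoDB's $push operator on a plain PHP array.
function applyPush(array $doc, array $update)
{
    foreach ($update['$push'] as $field => $value) {
        $doc[$field][] = $value; // append to the embedded array
    }
    return $doc;
}

$user = [
    "_id" => "5146bb52d8524270060001f4",
    "user_name" => "Mark Benzamin",
    "address" => [
        ["flat_name" => "305, Sunrise Park", "street" => "Cold Pool Street", "city" => "Los Angeles"],
    ],
];

// Update spec in the shape of a $push update. Note the single quotes:
// inside double quotes PHP would try to interpolate $push as a variable.
$update = [
    '$push' => [
        "address" => ["flat_name" => "703, Sunset Park", "street" => "Hot Fudge Street", "city" => "Chicago"],
    ],
];

// The whole change touches exactly one document.
$user = applyPush($user, $update);
echo count($user["address"]), "\n"; // prints 2
```

With the referenced design, the same change would mean inserting a separate address document, so any related change to the user document would be a second, independent write.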
Modeling referenced one to many relationships (1:N)
Keep in mind that embedded documents may keep growing over the lifetime of the application, which can seriously affect write performance. Each document is also limited to a maximum size of 16MB. If the embedded documents grow too large, if embedding would produce a lot of duplicated data, or if you need to model complex or hierarchical relationships between documents, the normalized approach is preferable.
Consider an example in which we maintain the posts made by a user. Suppose we want each post to show the user's name and profile picture (similar to a Facebook post, where the name and profile picture appear with every post). The denormalized approach stores the user information in each post document:
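A back-of-the-envelope calculation can give a feel for the size limit. The sketch below uses the length of the JSON encoding as a stand-in for the stored size, which is only an approximation (MongoDB stores BSON, whose size differs from JSON), so treat the resulting numbers as an order-of-magnitude guide rather than a real capacity figure.

```php
<?php
// The 16MB per-document limit, expressed in bytes.
$limitBytes = 16 * 1024 * 1024;

$address = [
    "flat_name" => "305, Sunrise Park",
    "street" => "Cold Pool Street",
    "city" => "Los Angeles",
];

// Rough proxy only: JSON length is not the same as BSON size.
$bytesPerAddress = strlen(json_encode($address));

// Very rough upper bound on how many such addresses one user document could hold,
// ignoring the space taken by the rest of the document.
$maxAddresses = intdiv($limitBytes, $bytesPerAddress);

echo "about {$bytesPerAddress} bytes per address, roughly {$maxAddresses} addresses at most\n";
```

For addresses the limit is unlikely to matter in practice, but the same arithmetic applied to something unbounded, such as comments or log entries embedded in a parent document, shows how an embedded array can eventually become a problem.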
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": ObjectId("5146bb52d8524270060001f7"),
    "post_text": "This is my demo post 1",
    "post_likes_count": 12,
    "user": {
        "user_name": "Mark Benzamin",
        "profile_pic": "markbenzamin.jpg"
    }
}

{
    "_id": ObjectId("5146bb52d8524270060001f8"),
    "post_text": "This is my demo post 2",
    "post_likes_count": 32,
    "user": {
        "user_name": "Mark Benzamin",
        "profile_pic": "markbenzamin.jpg"
    }
}</code></span></span>
We can see that this method stores redundant information in each post document. Looking ahead, if the user name or profile picture has changed, we will have to update the corresponding fields in all corresponding posts.
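The update cost of this duplication can be sketched in memory. In the example below, `updateProfilePic()` is a hypothetical helper and the new file name `mark-new.jpg` is made up for illustration; the point is only that the number of documents touched grows with the number of posts.

```php
<?php
// In-memory stand-in for a posts collection with user data embedded in each post.
$posts = [
    [
        "_id" => "5146bb52d8524270060001f7",
        "post_text" => "This is my demo post 1",
        "post_likes_count" => 12,
        "user" => ["user_name" => "Mark Benzamin", "profile_pic" => "markbenzamin.jpg"],
    ],
    [
        "_id" => "5146bb52d8524270060001f8",
        "post_text" => "This is my demo post 2",
        "post_likes_count" => 32,
        "user" => ["user_name" => "Mark Benzamin", "profile_pic" => "markbenzamin.jpg"],
    ],
];

// Hypothetical helper: rewrite the embedded profile picture in every post by
// the given user and report how many post documents had to be modified.
function updateProfilePic(array $posts, $userName, $newPic)
{
    $touched = 0;
    foreach ($posts as &$post) {
        if ($post["user"]["user_name"] === $userName) {
            $post["user"]["profile_pic"] = $newPic;
            $touched++;
        }
    }
    unset($post); // break the reference left over from the loop
    return [$posts, $touched];
}

list($posts, $touched) = updateProfilePic($posts, "Mark Benzamin", "mark-new.jpg");
echo $touched, "\n"; // prints 2
```

A user with thousands of posts would need thousands of post documents rewritten for a single profile change.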
Therefore, the ideal approach is to normalize the information and connect it by reference.
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": ObjectId("5146bb52d852427006000121"),
    "user_name": "Mark Benzamin",
    "profile_pic": "markbenzamin.jpg"
}</code></span></span>
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": ObjectId("5146bb52d8524270060001f7"),
    "post_text": "This is my demo post 1",
    "post_likes_count": 12,
    "user_id": ObjectId("5146bb52d852427006000121")
}

{
    "_id": ObjectId("5146bb52d8524270060001f8"),
    "post_text": "This is my demo post 2",
    "post_likes_count": 32,
    "user_id": ObjectId("5146bb52d852427006000121")
}</code></span></span>
The user_id field in each post document contains a reference to the user document. We can therefore use the following two queries to get the posts published by a user:
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-"><?php
// first query: find the user and the fields we want to show with each post
$user = $collection->findOne(
    array("user_name" => "Mark Benzamin"),
    array("_id" => 1, "user_name" => 1, "profile_pic" => 1)
);

// second query: find all posts that reference this user's _id
$posts = $collection->find(
    array("user_id" => $user["_id"])
);</code></span></span>
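Once both result sets are in hand, the application has to stitch them together itself. The following sketch shows one way to do that with plain PHP arrays standing in for the query results; the `$feed` structure is an assumption made for this example and the IDs are plain strings rather than ObjectIds.

```php
<?php
// In-memory stand-ins for the results of the user query and the posts query.
$users = [
    ["_id" => "5146bb52d852427006000121", "user_name" => "Mark Benzamin", "profile_pic" => "markbenzamin.jpg"],
];
$posts = [
    ["_id" => "5146bb52d8524270060001f7", "post_text" => "This is my demo post 1", "post_likes_count" => 12, "user_id" => "5146bb52d852427006000121"],
    ["_id" => "5146bb52d8524270060001f8", "post_text" => "This is my demo post 2", "post_likes_count" => 32, "user_id" => "5146bb52d852427006000121"],
];

// Index the users by _id so each post can resolve its author quickly.
$usersById = array_column($users, null, "_id");

// Combine each post with the name and picture of the user it references.
$feed = [];
foreach ($posts as $post) {
    $author = $usersById[$post["user_id"]];
    $feed[] = [
        "post_text" => $post["post_text"],
        "user_name" => $author["user_name"],
        "profile_pic" => $author["profile_pic"],
    ];
}

echo $feed[1]["post_text"], " by ", $feed[1]["user_name"], "\n";
// prints "This is my demo post 2 by Mark Benzamin"
```

The name and picture now live in exactly one place, so changing either one means updating a single user document, at the price of this extra join step on every read.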
Modeling many to many relationships (M:N)
Let's return to the earlier example of users and the items they purchase, store them as referenced documents (preferably in separate collections), and use them to illustrate the M:N relationship. Suppose the collection that stores user information looks like the following, where each document contains the IDs of the items the user has purchased.
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": "user1",
    "items_purchased": {
        "0": "item1",
        "1": "item2"
    }
}

{
    "_id": "user2",
    "items_purchased": {
        "0": "item2",
        "1": "item3"
    }
}</code></span></span>
Similarly, suppose another collection stores documents for the available items. These documents, in turn, store the IDs of the users who purchased each item.
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-">{
    "_id": "item1",
    "purchased_by": {
        "0": "user1"
    }
}

{
    "_id": "item2",
    "purchased_by": {
        "0": "user1",
        "1": "user2"
    }
}

{
    "_id": "item3",
    "purchased_by": {
        "0": "user2"
    }
}</code></span></span>
To get all the items purchased by a user, we would write the following query:
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-"><?php
// query to find items purchased by a user
$items = $collection->find(
    array("_id" => "user1"),
    array("items_purchased" => 1)
);</code></span></span>
The above query returns the IDs of all items purchased by user1. We can use these IDs later to get the corresponding item information.
Or, if we want to get the users who have purchased a specific item, we would write the following:
<span style="background-color:#f9f9fa"><span style="color:#000000"><code class="language-"><?php
// query to find users who have purchased an item
$users = $collection->find(
    array("_id" => "item1"),
    array("purchased_by" => 1)
);</code></span></span>
The above query returns the IDs of all users who have purchased item1. We can use these IDs later to get the corresponding user information.
This example shows an M:N relationship, which is very useful in some cases. Keep in mind, however, that you can often handle such relationships with a 1:N relationship and some smart queries. This reduces the amount of data that has to be maintained in both documents.
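The follow-up step of turning a list of IDs into full documents can be sketched in memory as below. With MongoDB this would typically be a query on `_id` using the `$in` operator; here `findByIds()` is a hypothetical helper over plain PHP arrays, and the ID lists are written as ordinary PHP lists for brevity.

```php
<?php
// In-memory stand-ins for the users and items collections.
$users = [
    ["_id" => "user1", "items_purchased" => ["item1", "item2"]],
    ["_id" => "user2", "items_purchased" => ["item2", "item3"]],
];
$items = [
    ["_id" => "item1", "purchased_by" => ["user1"]],
    ["_id" => "item2", "purchased_by" => ["user1", "user2"]],
    ["_id" => "item3", "purchased_by" => ["user2"]],
];

// Hypothetical helper: return every document whose _id appears in $ids.
function findByIds(array $docs, array $ids)
{
    return array_values(array_filter($docs, function ($doc) use ($ids) {
        return in_array($doc["_id"], $ids, true);
    }));
}

// Step 1 gave us user1's document; step 2 resolves its item IDs to item documents.
$user1 = $users[0];
$purchased = findByIds($items, $user1["items_purchased"]);

echo implode(", ", array_column($purchased, "_id")), "\n"; // prints "item1, item2"
```

The same helper works in the other direction, resolving an item's `purchased_by` list against the users array.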
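One way to read the 1:N suggestion is to keep the list of purchased items only on the user side and derive the reverse direction on demand. The sketch below does this in memory; `purchasersOf()` is a hypothetical helper, and the sketch assumes `items_purchased` is stored as a plain list of item IDs.

```php
<?php
// Only the user documents carry references; item documents no longer store purchased_by.
$users = [
    ["_id" => "user1", "items_purchased" => ["item1", "item2"]],
    ["_id" => "user2", "items_purchased" => ["item2", "item3"]],
];

// Hypothetical helper: derive "who bought this item" by scanning the user side.
function purchasersOf(array $users, $itemId)
{
    $ids = [];
    foreach ($users as $user) {
        if (in_array($itemId, $user["items_purchased"], true)) {
            $ids[] = $user["_id"];
        }
    }
    return $ids;
}

echo implode(", ", purchasersOf($users, "item2")), "\n"; // prints "user1, user2"
```

A purchase now updates a single user document instead of a user and an item, so the two sides cannot drift out of sync. The trade-off is that the reverse lookup has to search the users rather than read a ready-made list.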
Conclusion
That's it for this article. We have covered some basic modeling concepts that will help you get started with your own data modeling: 1:1, 1:N, and M:N relationships, along with data normalization and denormalization. You should be able to apply these concepts to the modeling requirements of your own applications. If you have any questions or comments about this article, please feel free to share them in the comments section below.