It’s important to consider the data needs of your application right from the very start of development. But if your app will be using NoSQL and you come from a RDBMS/SQL background, then you might think looking at data in terms of NoSQL might be difficult. This article will help you by showing you how some of the basic data modeling concepts apply in the realm of NoSQL.
从开发之初就考虑应用程序的数据需求非常重要。 但是,如果您的应用程序将使用NoSQL,并且您来自RDBMS / SQL背景,那么您可能会认为,以NoSQL来看数据可能很困难。 本文将通过向您展示一些基本数据建模概念如何在NoSQL领域中应用来为您提供帮助。
I’ll be using MongoDB for our discussion as it is one of the leading open-source NoSQL databases due to its simplicity, performance, scalability, and active user base. Of course the article assumes that you are aware of basic MongoDB concepts (like collections and documents). If not, I suggest you read some of the previous articles here at SitePoint to get started with MongoDB.
我将使用MongoDB进行讨论,因为它是简单的,性能,可伸缩性和活跃用户群的领先的开源NoSQL数据库之一。 当然,本文假设您了解MongoDB的基本概念(例如集合和文档)。 如果没有,我建议您在SitePoint上阅读以前的一些文章,以开始使用MongoDB。
Relationships show how your MongoDB documents are related to each other. To understand the different ways in which we can organize our documents, let’s look at the possible relationships.
关系显示了您的MongoDB文档如何相互关联。 为了了解组织文档的不同方式,让我们看一下可能的关系。
A 1:1 relationship exists when one object of an entity is related to one and only one object of another entity. For example, one user can have one and only one birth date. So if we have a document which stores user information and another document which stores birth dates, there will be a 1:1 relationship between them.
当一个实体的一个对象与另一个实体的一个对象相关时,则存在1:1关系。 例如,一个用户可以只有一个生日。 因此,如果我们有一个存储用户信息的文档和另一个存储出生日期的文档,那么它们之间将存在1:1的关系。
Such relationship exists when one object of an entity can be related to many objects of another entity. For example, there can be 1:N relationship between a user and his contact numbers since it is possible for one user to have more than one number.
当一个实体的一个对象可以与另一个实体的许多对象相关时,就存在这种关系。 例如,一个用户与他的联系号码之间可能存在1:N的关系,因为一个用户可能拥有一个以上的号码。
Such relationship exists when one object of an entity is related to many objects of another entity and vice versa. If we correlate it to the case of users and items bought by them, one user can purchase more than one items as well as one item can be purchased by more than one user.
当实体的一个对象与另一实体的许多对象相关时,则存在这种关系,反之亦然。 如果我们将其与用户和他们购买的商品的情况相关联,则一个用户可以购买多个商品,而一个商品可以由多个用户购买。
Consider the following example where we need to store address information for each user (for now let’s assume that there’s a single address for each user). In this case, we can design an embedded document having the following structure:
考虑下面的示例,在该示例中,我们需要存储每个用户的地址信息(现在,我们假设每个用户有一个地址)。 在这种情况下,我们可以设计具有以下结构的嵌入式文档:
{ "_id": ObjectId("5146bb52d8524270060001f4"), "user_name": "Mark Benzamin" "dob": "12 Jan 1991", "address": { "flat_name": "305, Sunrise Park", "street": "Cold Pool Street", "city": "Los Angeles" } }We have the address entity embedded within the user entity making all of the information present in a single document. This means we can find and retrieve everything with a single query.
我们将address实体嵌入到用户实体中,使所有信息都出现在单个文档中。 这意味着我们可以通过一个查询找到并检索所有内容。
<?php // query to find user 'Mark Benzamin' and his address $cursor = $collection->find( array("user_name" => "Mark Benzamin"), array("user_name" => 1,"address" => 1) );Embedding documents is roughly analogous to de-normalization and is useful when there is a “contains” relationship between two entities. That is, one document can be stored within another, thus placing the related pieces of information within a single document. As all the information is available in one document, such approach has better read performance because a query operation within a document is less expensive for the server and we find and retrieve the related data in the same query.
嵌入文档大致类似于反规范化,并且在两个实体之间存在“包含”关系时很有用。 也就是说,一个文档可以存储在另一个文档中,从而将相关信息放置在单个文档中。 由于所有信息都可以在一个文档中获得,因此这种方法具有更好的读取性能,因为文档中的查询操作对服务器而言较便宜,并且我们在同一查询中查找和检索相关数据。
In contrast, a normalized approach would call for two documents (ideally in separate collections), one to store basic user information and another to store address information. The second document would contain a user_id field indicate the user to which the address belongs.
相比之下,标准化方法将需要两个文档(最好是在单独的集合中),一个文档用于存储基本用户信息,另一个文档用于存储地址信息。 第二个文档将包含一个user_id字段,指示该地址所属的用户。
{ "_id": ObjectId("5146bb52d8524270060001f4"), "user_name": "Mark Benzamin", "dob": "12 Jan 1991" } { "_id": ObjectId("5146bb52d852427006000de4"), "user_id": ObjectId("5146bb52d8524270060001f4"), "flat_name": "305, Sunrise Park", "street": "Cold Pool Street", "city": "Los Angeles" }We now need to execute two queries to fetch the same data:
现在,我们需要执行两个查询以获取相同的数据:
<?php // query to find user information $user = $collection->findOne( array("user_name" => "Mark Benzamin"), array("_id" => 1, "user_name" => 1) ); // query to find address corresponding to user $address = $collection->findOne( array("user_id" => $user["_id"]), array("flat_name" => 1, "street" => 1, "city" => 1) );The first query fetches the _id of the user which is then used in the second query to retrieve his address information.
第一个查询获取用户的_id ,然后在第二个查询中将其用于检索其地址信息。
The embedding approach makes more sense than the referencing approach in this case since we are frequently retrieving user_name and address together. What approach you should use ultimately depends on how you logically connect your entities and what data you need to retrieve from the database.
在这种情况下,嵌入方法比引用方法更有意义,因为我们经常一起检索user_name和address 。 最终应使用哪种方法取决于您在逻辑上如何连接实体以及需要从数据库检索哪些数据。
Now let’s consider the case where one user can have multiple addresses. If all of the addresses should be retrieved along with the basic user information, it would be ideal to embed the address entities inside the user entity.
现在让我们考虑一个用户可以有多个地址的情况。 如果应该与基本用户信息一起检索所有地址,则将地址实体嵌入用户实体内部将是理想的。
{ "_id": ObjectId("5146bb52d8524270060001f4"), "user_name": "Mark Benzamin" "address": [ { "flat_name": "305, Sunrise Park", "street": "Cold Pool Street", "city": "Los Angeles" }, { "flat_name": "703, Sunset Park", "street": "Hot Fudge Street", "city": "Chicago" } ] }We’re still able to fetch all of the required information with a single query. The referencing/normalized approach would have us design three documents (one user, two addresses) and two queries to accomplish the same task. Besides efficiency and convenience, we should use the embedded approach in instances where we need atomicity in operations. Since any updates happen within the same document, atomicity is always insured.
我们仍然可以通过一个查询来获取所有必需的信息。 引用/规范化方法将使我们设计三个文档(一个用户,两个地址)和两个查询以完成同一任务。 除了效率和便利性,我们还应该在需要操作原子性的情况下使用嵌入式方法。 由于任何更新都发生在同一文档中,因此始终确保原子性。
Bear in mind that embedded documents can continue to grow in size over the life of an application which can badly impact write performance. There’s also a limit of 16MB on the maximum size of each document. A normalized approach is preferred if the embedded documents would be too large, when the embedding approach would result in a lot of duplicate data, or if you need to model complex or hierarchical relationships between documents.
请记住,嵌入式文档在应用程序的生命周期中可能会继续增长,这可能严重影响写入性能。 每个文档的最大大小上限为16MB。 如果嵌入的文档太大,嵌入的方法会导致大量重复数据,或者需要对文档之间的复杂关系或层次关系进行建模,则首选标准化方法。
Consider the example of maintaining posts made by a user. Let’s assume that we want the user’s name and his profile picture with every post (similar to a Facebook post where we can see a name and profile picture with every post). The denormalized approach would store the user information in each post document:
考虑维护用户发表的帖子的示例。 假设我们希望每个帖子都带有用户名和个人资料图片(类似于Facebook帖子,我们可以在每个帖子中看到姓名和个人资料图片)。 非规范化方法会将用户信息存储在每个后置文档中:
{ "_id": ObjectId("5146bb52d8524270060001f7"), "post_text": "This is my demo post 1", "post_likes_count": 12, "user": { "user_name": "Mark Benzamin", "profile_pic": "markbenzamin.jpg" } } { "_id": ObjectId("5146bb52d8524270060001f8"), "post_text": "This is my demo post 2", "post_likes_count": 32, "user": { "user_name": "Mark Benzamin", "profile_pic": "markbenzamin.jpg" } }We can see this approach stores redundant information in each post document. Looking a bit forward, if the user name or profile picture was ever changed, we would have to update the appropriate field in all of the corresponding posts. So the ideal approach would be to normalize the information and connect it via references.
我们可以看到这种方法在每个后期文档中都存储了冗余信息。 展望未来,如果用户名或个人资料图片已更改,我们将必须更新所有相应帖子中的相应字段。 因此,理想的方法是将信息标准化并通过引用将其连接。
{ "_id": ObjectId("5146bb52d852427006000121"), "user_name": "Mark Benzamin", "profile_pic": "markbenzamin.jpg" } { "_id": ObjectId("5146bb52d8524270060001f7"), "post_text": "This is my demo post 1", "post_likes_count": 12, "user_id": ObjectId("5146bb52d852427006000121") } { "_id": ObjectId("5146bb52d8524270060001f8"), "post_text": "This is my demo post 2", "post_likes_count": 32, "user_id": ObjectId("5146bb52d852427006000121") }The user_id field in the post document contains a reference to the user document. Thus, we can fetch posts made by the user using two queries as follows:
发布文档中的user_id字段包含对用户文档的引用。 因此,我们可以使用以下两个查询来获取用户发表的帖子:
<?php $user = $collection->findOne( array("user_name" => "Mark Benzamin"), array("_id" => 1, "user_name" => 1, "profile_pic" => 1) ); $posts = $collection->find( array("user_id" => $user["_id"]) );Let’s take our previous example of storing users and the items they purchased (ideally in separate collections) and design referenced documents to illustrate the M:N relationship. Assume the collection which store documents for user information is as follows such that each document contains reference IDs for the list of items purchased by the user.
让我们以前面的示例为例,该示例存储用户及其购买的商品(理想情况下位于单独的集合中),并设计参考文档以说明M:N关系。 假设存储文档以供用户信息的集合如下,以使每个文档都包含用户购买的物品清单的参考ID。
{ "_id": "user1", "items_purchased": { "0": "item1", "1": "item2" } } { "_id": "user2", "items_purchased": { "0": "item2", "1": "item3" } }Similarly, assume another collection storing the documents for the available items. These documents will in turn store reference IDs for the list of users who have purchased it.
同样,假设另一个集合存储可用项目的文档。 这些文档将依次存储已购买用户列表的参考ID。
{ "_id": "item1", "purchased_by": { "0": "user1" } } { "_id": "item2", "purchased_by": { "0": "user1", "1": "user2" } } { "_id": "item3", "purchased_by": { "0": "user2" } }To fetch all the items purchased by a user, we would write the following query:
要获取用户购买的所有商品,我们将编写以下查询:
<?php // query to find items purchased by a user $items = $collection->find( array("_id" => "user1"), array("items_purchased" => 1) );The above query will return the IDs of all the items purchased by user1. We can later use these to fetch the corresponding item information.
上面的查询将返回user1购买的所有商品的ID。 以后我们可以使用它们来获取相应的项目信息。
Alternatively, if we wanted to fetch the users who have purchased a specific item, we would write the following:
或者,如果我们想获取购买了特定商品的用户,则可以编写以下内容:
<?php // query to find users who have purchased an item $users = $collection->find( array("_id" => "item1"), array("purchased_by" => 1) );The above query returns the IDs of all the users who have purchased item1. We can later use these IDs to fetch the corresponding user information.
上面的查询返回已购买item1的所有用户的ID。 以后我们可以使用这些ID来获取相应的用户信息。
This example demonstrated M:N relationships which are very useful in some cases. However, you should keep in mind that many times such relationships can be handled using 1:N relationships along with some smart queries. This reduces the amount of data to be maintained in both documents.
该示例说明了M:N关系,在某些情况下非常有用。 但是,请记住,很多时候可以使用1:N关系以及一些智能查询来处理这种关系。 这减少了两个文档中要维护的数据量。
That’s it for this article. We’ve learned about some basic modeling concepts which will surely help you head start your own data modeling: 1-to-1, 1-to-many, and many-to-many relationships, and also a little about data normalization and de-normalization. You should be able to easily apply these concepts to your own application’s modeling needs. If you have any questions or comments related to the article, feel free to share in the comments section below.
本文就是这样。 我们已经学习了一些基本的建模概念,这些概念肯定会帮助您开始自己的数据建模:一对一,一对多和多对多关系,以及有关数据规范化和数据删除的一些知识。 -正常化。 您应该能够轻松地将这些概念应用于自己的应用程序的建模需求。 如果您对本文有任何疑问或评论,请随时在下面的评论部分中分享。
翻译自: https://www.sitepoint.com/modeling-data-relationships-in-mongodb/
相关资源:jdk-8u281-windows-x64.exe