MongoDB Indexing

2023-09-28

In part 1 of this series we had an introduction to indexing in MongoDB. We saw how to create and use indexes and how to analyze queries with them, giving us a good foundation to build on. In this part, we’ll take a look at a few more small but important concepts: indexing on sub-documents and embedded fields, index direction, and covered queries.

Of course, this part assumes that you know how to create an index and use the explain() method to analyze a query. If you don’t already know how, I suggest you go back and read part one before continuing here.

We used a collection named posts in the last article. For our work here, let’s add a new field location to it to store the location from which the post was made. The field is a sub-document, and stores the city, state, and country of the user as shown below (a sub-document is a field having a document structure):

{
    "_id": ObjectId("5146bb52d852470060001f4"),
    "comments": {
        "0": "This is the first comment",
        "1": "This is the second comment"
    },
    "post_likes": 40,
    "post_tags": {
        "0": "MongoDB",
        "1": "Tutorial",
        "2": "Indexing"
    },
    "post_text": "Hello Readers!! This is my post text",
    "post_type": "private",
    "user_name": "Mark Anthony",
    "location": {
        "city": "Los Angeles",
        "state": "California",
        "country": "USA"
    }
}

Indexing on Sub-documents

Suppose we want to search posts based on where the user lives. One option is to create an index on the sub-document field location itself. Such an index stores each location value as a whole, so it supports equality matches against the complete sub-document:

<?php
// query to find posts made from Los Angeles, California, USA
// (the sub-document must match exactly, including the order of its fields)
$cursor = $collection->find(
    array(
        "location" => array(
            "city" => "Los Angeles",
            "state" => "California",
            "country" => "USA"
        )
    ),
    array()
);

Note that we can’t search the individual sub-fields (city, state, and country) using only location as the key. A query such as array("location" => "Los Angeles") compares the whole location value with the string "Los Angeles", so it matches none of our posts.

It should also be noted that an index on a sub-document keeps the entire sub-document in every index entry, so it can grow large. Such indexes should be used with care since each index occupies space in memory. To query on individual sub-fields, index the embedded fields instead.

Indexing on Embedded Fields

Sometimes we only need to query on one field of a sub-document. If our application only finds posts based on city, and never on state or country, we can create the index on the embedded field city alone, using the dot notation location.city.
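
Creating such an index might look like the sketch below. It assumes the legacy PHP driver (the one whose find() takes a query array and a fields array, as in the examples in this article) and its MongoCollection::createIndex() method; older driver versions expose the same operation as ensureIndex(). The helper name createCityIndex is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: index only the embedded "city" field of "location".
// $collection is assumed to be a MongoCollection for the posts collection.
function createCityIndex($collection)
{
    // Dot notation addresses a field inside the sub-document; 1 = ascending.
    return $collection->createIndex(array("location.city" => 1));
}
```

Calling createCityIndex($collection) once is enough; the index is then maintained automatically as posts are added or changed.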

We can now use this index in queries to find posts based on city:

<?php
// query to find posts from the city of Los Angeles
$cursor = $collection->find(
    array(
        "location.city" => "Los Angeles"
    ),
    array()
);

Index Direction (Ascending/Descending)

We’ve always provided an index direction (1 or -1) for each key when creating our indexes. I touched on this briefly in part 1, but it’s an important point that I’d like to pick up again. If we have only one key in the index, the direction doesn’t really matter, but it comes into play when we do sorting or range queries with compound indexes.

Suppose we have a compound index with key field1 ascending and key field2 ascending. We can picture the index as a table of nine rows, numbered 1 to 9 in index order, where rows 1 to 3, 4 to 6, and 7 to 9 each share one value of field1 and are ordered by field2 within the group.

A query sorting on field1 ascending and field2 ascending will travel the rows in this order: 1, 2, 3, 4, 5, 6, 7, 8, 9. A query sorting on field1 ascending and field2 descending will travel: 3, 2, 1, 6, 5, 4, 9, 8, 7. Such out-of-order jumps in the search tree can be costly to query performance.

Of course, the index structure above is represented as a table only to aid understanding. Remember, MongoDB uses tree structures internally; each element is stored as a node of a tree. Elements that are close to each other in index order sit under the same branches and are therefore quick to reach. If a query has to retrieve multiple records in sorted order, it is faster when those elements are neighbors in the tree than when the query has to jump from one node to a distant one to collect them.
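
To make the row orders above concrete, here is a small simulation in plain PHP, with no database involved. It assumes nine made-up index entries stored with field1 ascending and field2 ascending; the function name traversalOrder and the sample values are hypothetical. It reports the index positions a sort would have to visit, in the order it visits them.

```php
<?php
// Nine index entries in index order (field1 ascending, field2 ascending).
// The array key + 1 is the row's position in the index; the values are made up.
$indexRows = array(
    array("field1" => "a", "field2" => 1), // row 1
    array("field1" => "a", "field2" => 2), // row 2
    array("field1" => "a", "field2" => 3), // row 3
    array("field1" => "b", "field2" => 1), // row 4
    array("field1" => "b", "field2" => 2), // row 5
    array("field1" => "b", "field2" => 3), // row 6
    array("field1" => "c", "field2" => 1), // row 7
    array("field1" => "c", "field2" => 2), // row 8
    array("field1" => "c", "field2" => 3), // row 9
);

// Return the index row numbers in the order a sort with the given
// directions (1 = ascending, -1 = descending) would need to read them.
function traversalOrder(array $indexRows, $dir1, $dir2)
{
    $positions = array_keys($indexRows);
    usort($positions, function ($x, $y) use ($indexRows, $dir1, $dir2) {
        $a = $indexRows[$x];
        $b = $indexRows[$y];
        if ($a["field1"] !== $b["field1"]) {
            return $dir1 * strcmp($a["field1"], $b["field1"]);
        }
        return $dir2 * ($a["field2"] - $b["field2"]);
    });
    // Convert zero-based array keys into one-based row numbers.
    return array_map(function ($p) {
        return $p + 1;
    }, $positions);
}

echo implode(", ", traversalOrder($indexRows, 1, 1)), PHP_EOL;  // 1, 2, 3, 4, 5, 6, 7, 8, 9
echo implode(", ", traversalOrder($indexRows, 1, -1)), PHP_EOL; // 3, 2, 1, 6, 5, 4, 9, 8, 7
```

The first sort reads the index front to back, while the second has to hop backwards inside every group of field1 values.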

If you are looking to sort on field1:1, field2:1, then the index {field1:1, field2:1} will be faster than {field1:1, field2:-1} or {field1:-1, field2:1}, because MongoDB can read the matching index in order (or exactly in reverse) and doesn’t need a separate sort step.
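
In PHP, matching the index to the sort could be sketched as follows. This again assumes the legacy driver’s MongoCollection::createIndex(), MongoCollection::find() and MongoCursor::sort(); the helper name findSortedByBothFields is hypothetical, and field1 and field2 are the placeholder keys used above.

```php
<?php
// Hypothetical helper: build the compound index, then sort in the same
// directions so the index order can be used for the sort.
// $collection is assumed to be a MongoCollection.
function findSortedByBothFields($collection)
{
    // Both keys ascending, matching the sort below.
    $collection->createIndex(array("field1" => 1, "field2" => 1));

    // The sort specification mirrors the index key order and directions.
    return $collection->find()->sort(array("field1" => 1, "field2" => 1));
}
```

Running explain() on the returned cursor is a good way to confirm that the index is actually being used for the sort.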

Covered Queries

As per MongoDB’s documentation, a covered query is one in which:

all the fields used in the query are part of an index used in the query, and

all the fields returned in the results are in the same index

Since all the fields are covered in the index itself, MongoDB can match the query condition as well as return the result fields using the same index without looking inside the documents. Since indexes are stored in RAM or sequentially located on disk, such access is a lot faster.

Suppose we have a compound index defined on the post_type and user_name fields. This index covers the following query:

<?php
// query to find posts with type public and get only user_name in result
$cursor = $collection->find(
    array(
        "post_type" => "public"
    ),
    array(
        "user_name" => 1,
        "_id" => 0
    )
);

We’ve explicitly excluded the _id field from the result to take advantage of the covered query. As you may already know, all queries return the _id field by default. As per the second condition for covered queries, all the fields returned in the result should be included in the index. We don’t have _id in our compound index on post_type and user_name, so we have to exclude this field from the result.

To check if the query is covered, we can look at the indexOnly field in the result of the explain() method. A value of true indicates that ours was a covered query. (The indexOnly field is reported by MongoDB versions before 3.0; later versions instead show totalDocsExamined as 0 in the execution statistics of a covered query.)
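
Putting the pieces together, a check for the covered query above might be sketched like this. It assumes the legacy PHP driver (MongoCollection::createIndex(), MongoCollection::find(), MongoCursor::explain()) and a server whose explain() output includes indexOnly; the helper name isPublicPostsQueryCovered is hypothetical.

```php
<?php
// Hypothetical helper: run the covered query from above and report whether
// explain() flagged it as index-only.
// $collection is assumed to be a MongoCollection for the posts collection.
function isPublicPostsQueryCovered($collection)
{
    // Compound index on the two fields the query touches.
    $collection->createIndex(array("post_type" => 1, "user_name" => 1));

    // Filter on post_type, return only user_name, and exclude _id
    // because _id is not part of the index.
    $cursor = $collection->find(
        array("post_type" => "public"),
        array("user_name" => 1, "_id" => 0)
    );

    // Assumption: the plan contains an "indexOnly" flag (older servers).
    $plan = $cursor->explain();
    return isset($plan["indexOnly"]) && $plan["indexOnly"] === true;
}
```

If the helper returns false, the usual suspects are a returned field that is missing from the index or a forgotten "_id" => 0 in the fields array.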

It’s important to know that an index can’t cover a query if:

any of the indexed fields is an array (e.g. post_tags), or

any of the indexed fields is a field in a sub-document (e.g. location.city); this second restriction applies to MongoDB versions before 3.6

Thus, it’s always a good practice to check your query index usage with explain().

Removing Indexes

To check the current index size for a collection, we can use the mongo shell’s totalIndexSize() method on that collection, which returns the total size of its indexes in bytes.

We just have to ensure that we have enough RAM available to accommodate indexes as well as the data that MongoDB manages and uses regularly.

To delete an existing index, and thus free up resources, we use the dropIndex() method.

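
From PHP, both steps could be sketched as below. The sketch assumes the legacy PHP driver, where the index size can be read from the collStats command via MongoDB::command() and where dropping an index is exposed as MongoCollection::deleteIndex() (the shell’s dropIndex() equivalent). The helper names totalIndexSizeOf and dropCityIndex are hypothetical.

```php
<?php
// Hypothetical helpers; $db is assumed to be a MongoDB object and
// $collection a MongoCollection from the legacy PHP driver.

// Total size in bytes of all indexes on one collection, read from collStats.
function totalIndexSizeOf($db, $collectionName)
{
    $stats = $db->command(array("collStats" => $collectionName));
    return $stats["totalIndexSize"];
}

// Drop the index that was created on the embedded city field.
function dropCityIndex($collection)
{
    // The key specification must match the one used to create the index.
    return $collection->deleteIndex(array("location.city" => 1));
}
```

Comparing totalIndexSizeOf($db, "posts") before and after dropCityIndex($collection) shows how much space the index was taking.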

Conclusion

That’s all for this part and also the series. We’ve touched on a lot of important topics to get you up to speed with indexing in MongoDB.

Analyzing your indexes to make sure they are doing well is an ongoing process as your application grows and your data changes. If you have any questions or comments about the article, feel free to post them in the comments below.

Image via Fotolia

Translated from: https://www.sitepoint.com/mongodb-indexing-2/
