MongoDB Indexing

tech, 2023-09-26

Indexing is one of the more important concepts in working with MongoDB. Understanding it well matters because indexes can dramatically increase an application's performance and throughput by reducing the number of full documents a query has to read. Because indexes can be a bit difficult to understand, this two-part series takes a closer look at them.

In this article we’ll explore the following five types of indexes:

Default _id Index
Secondary Index
Compound Index
Multikey Index
Multikey Compound Index

There are other index types to discuss as well, but I’ve kept them for part 2 to keep the explanation clear and avoid confusion.

Although more than one index can be defined on a collection, a query generally uses only one index during its execution (index intersection, added in MongoDB 2.6, is the exception). MongoDB’s query optimizer chooses the best index from the available options at runtime.

This article assumes that you have a basic understanding of MongoDB concepts (such as collections and documents) and can perform basic queries using PHP (such as find and insert). If not, I suggest you read our beginner articles: Introduction to MongoDB and MongoDB Revisited.

For the series we’ll assume we have a collection named posts populated with 500 documents having the following structure:

{
    "_id": ObjectId("5146bb52d852470060001f4"),
    "comments": {
        "0": "This is the first comment",
        "1": "This is the second comment"
    },
    "post_likes": 40,
    "post_tags": {
        "0": "MongoDB",
        "1": "Tutorial",
        "2": "Indexing"
    },
    "post_text": "Hello Readers!! This is my post text",
    "post_type": "private",
    "user_name": "Mark Anthony"
}

Now, let’s explore various types of indexing in detail.

Default _id Index

By default, MongoDB creates an index on the _id field of every collection. Each document has a unique _id field that serves as its primary key; unless you supply your own value, it is a 12-byte ObjectId. This index serves queries on _id. Queries on other fields cannot use it and, when no other suitable index exists, fall back to scanning the whole collection.

To view the indexes for a collection, open the MongoDB shell and do the following:

The getIndexes() method returns all of the indexes for our collection. As you can see, we have the default index with the name _id_. The key field indicates that the index is on the _id field, and the value of 1 indicates ascending order. We’ll learn about ordering in the next section.
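
The call being described is presumably `db.posts.getIndexes()`. As a rough illustration only, the JavaScript sketch below hard-codes the two fields the text mentions (name and key) and turns them into a readable sentence. The object literal and the `describeIndex` helper are hypothetical; real getIndexes() output contains additional fields.

```javascript
// Hand-written illustration of one entry returned by getIndexes().
// Only the two fields discussed in the text are shown (assumption).
const defaultIndex = { key: { _id: 1 }, name: "_id_" };

// Turn an index description into a readable sentence.
// 1 means ascending order, -1 means descending order.
function describeIndex(index) {
  const parts = Object.entries(index.key).map(
    ([field, order]) => `${field} (${order === 1 ? "ascending" : "descending"})`
  );
  return `${index.name}: ${parts.join(", ")}`;
}

console.log(describeIndex(defaultIndex)); // _id_: _id (ascending)
```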

Secondary Index

For cases where we want to use indexing on fields other than the _id field, we have to define custom indexes. Suppose we want to search for posts based on the user_name field. In this case, we’ll define a custom index on the user_name field of the collection. Such custom indexes, other than the default index, are called secondary indexes.

To demonstrate the effect of indexing on the database, let’s first briefly analyze query performance without an index. For this, we’ll execute a query to find all posts whose user_name is “Jim Alexandar”.

<?php
// query to find posts with user_name "Jim Alexandar"
$cursor = $collection->find(
    array("user_name" => "Jim Alexandar")
);

// use explain() to get explanation of query indexes
var_dump($cursor->explain());

An important method often used with indexing is explain(), which returns information about how the query was executed, including any index it used. The output of the above explain() is as shown below:

Some of the important keys worth looking at are:

cursor – indicates the cursor type, and with it the index, used by the query. BasicCursor indicates that no index was used and MongoDB had to scan the entire collection. Going ahead, we’ll see that when we apply indexing, BtreeCursor is used instead of BasicCursor.

n – indicates the number of documents the query returned (one document in this case).

nscannedObjects – indicates the number of documents examined by the query (in this case, all 500 documents of the collection were examined). This can be an operation with large overhead if the number of documents in the collection is very large.

nscanned – indicates the number of items (index entries, or documents when no index is used) examined during the operation.

Ideally, n should be equal to or close to nscanned, which means the query examined few items beyond the ones it returned.
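
To make these counters concrete, here is a minimal plain-JavaScript sketch of an unindexed lookup. It does not talk to MongoDB: the 500 documents, the `user_N` names, and the `collectionScan` helper are all made up for illustration, and the returned object only mimics the keys discussed above.

```javascript
// 500 fake posts; exactly one has user_name "Jim Alexandar" (assumption).
const samplePosts = Array.from({ length: 500 }, (_, i) => ({
  _id: i,
  user_name: i === 123 ? "Jim Alexandar" : `user_${i}`,
}));

// Unindexed query: every document has to be inspected.
function collectionScan(docs, field, value) {
  let nscannedObjects = 0;
  const results = [];
  for (const doc of docs) {
    nscannedObjects += 1;
    if (doc[field] === value) results.push(doc);
  }
  // Without an index, the items scanned are the documents themselves.
  return {
    cursor: "BasicCursor",
    n: results.length,
    nscannedObjects,
    nscanned: nscannedObjects,
  };
}

const scanStats = collectionScan(samplePosts, "user_name", "Jim Alexandar");
console.log(scanStats.n, scanStats.nscannedObjects); // 1 500
```

Here n is far smaller than nscanned, which is the pattern that suggests a missing index.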

Now, let’s execute the same query but using a secondary index. To create the index, execute the following in the MongoDB shell:

We created an index on the user_name field in the posts collection using the ensureIndex() method (deprecated since MongoDB 3.0 in favor of createIndex()). I’m sure you’ve noticed the order argument passed to the method, which indicates whether the index keeps its keys in ascending (1) or descending (-1) order. To see why this matters, suppose each document had a timestamp field. If we wanted the most recent posts first, we would use descending order; for the oldest posts first, we would choose ascending order.
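
Since the index is later referred to as user_name_1, the shell command was presumably `db.posts.ensureIndex({user_name: 1})`. The effect of the order argument can be sketched in plain JavaScript; the timestamp values and the `orderedKeys` helper below are hypothetical and only mimic how keys would be laid out.

```javascript
// Hypothetical timestamp values, in insertion order.
const timestamps = [30, 10, 20];

// Return the keys in the order an index with the given direction
// would keep them: 1 sorts ascending, -1 sorts descending.
function orderedKeys(values, order) {
  return [...values].sort((a, b) => order * (a - b));
}

console.log(orderedKeys(timestamps, 1));  // oldest first
console.log(orderedKeys(timestamps, -1)); // newest first
```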

After creating the index, the same find() and explain() methods are used to execute and analyze the query as before. The output is:

The output shows that the query used a BtreeCursor named user_name_1 (which we defined earlier) and scanned only one document as opposed to the 500 documents searched in the previous query without indexing.
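
Why an index cuts the scan from 500 documents to one can be sketched in plain JavaScript. This is a simplification under stated assumptions: a sorted array with binary search stands in for MongoDB's B-tree, and the data, `userNameIndex`, and `indexLookup` are invented for the example.

```javascript
// 500 fake posts; exactly one has user_name "Jim Alexandar" (assumption).
const indexedPosts = Array.from({ length: 500 }, (_, i) => ({
  _id: i,
  user_name: i === 123 ? "Jim Alexandar" : `user_${i}`,
}));

// Simplified index: [key, position] pairs kept sorted by key (ascending).
const userNameIndex = indexedPosts
  .map((doc, position) => [doc.user_name, position])
  .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));

function indexLookup(docs, index, value) {
  // Binary search for the first entry whose key is >= value.
  let low = 0;
  let high = index.length;
  while (low < high) {
    const mid = Math.floor((low + high) / 2);
    if (index[mid][0] < value) low = mid + 1;
    else high = mid;
  }
  // Walk the matching entries and fetch only those documents.
  const results = [];
  let nscanned = 0;
  for (let i = low; i < index.length && index[i][0] === value; i += 1) {
    nscanned += 1;
    results.push(docs[index[i][1]]);
  }
  return { n: results.length, nscanned, nscannedObjects: results.length };
}

const lookupStats = indexLookup(indexedPosts, userNameIndex, "Jim Alexandar");
console.log(lookupStats.n, lookupStats.nscanned); // 1 1
```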

For now, understand that all MongoDB indexes use a B-tree data structure, and BtreeCursor is the cursor type used to traverse them. A detailed discussion of BtreeCursor is out of scope for this article, but this doesn’t affect any further understanding.

The above comparison shows how dramatically indexes can improve query performance.

Compound Index

There will be cases when a query uses more than one field. In such cases, we can use compound indexes. Consider the following query which uses both the post_type and post_likes fields:

<?php
// query to find posts with type public and 100 likes
$cursor = $collection->find(
    array(
        "post_type" => "public",
        "post_likes" => 100
    ),
    array()
);

Analyzing this query with explain() gives the following result, which shows that the query uses BasicCursor and that all 500 documents are scanned to retrieve one document.

This is highly inefficient, so let’s apply some indexes. We can define a compound index on fields post_type and post_likes as follows:

Analyzing the query now gives the following result:

A very important point to note here is that a compound index defined on multiple fields can also serve queries on a leading subset (a prefix) of those fields. For example, suppose there is a compound index {field1,field2,field3}. This index can be used to query on:

field1
field1, field2
field1, field2, field3

So, if we’ve defined the index {field1,field2,field3}, we don’t need to define separate {field1} and {field1,field2} indexes. Queries on field2 alone, or on field2 and field3, do not start with field1, so the optimizer will not normally pick this index for them; if we still want it used, we can force it with hint().
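
The prefix rule can be written down as a small check. This is a simplified sketch, not MongoDB's actual index selection logic: `isIndexPrefix` is a hypothetical helper that assumes equality conditions only and distinct field names in the query.

```javascript
// Returns true when the queried fields are exactly the first k fields
// of the compound index (in any order within the query itself).
function isIndexPrefix(indexFields, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexFields.length) {
    return false;
  }
  const prefix = new Set(indexFields.slice(0, queryFields.length));
  return queryFields.every((field) => prefix.has(field));
}

const compound = ["field1", "field2", "field3"];
console.log(isIndexPrefix(compound, ["field1"]));           // true
console.log(isIndexPrefix(compound, ["field1", "field2"])); // true
console.log(isIndexPrefix(compound, ["field2"]));           // false
console.log(isIndexPrefix(compound, ["field2", "field3"])); // false
```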

The hint() method can be used to force MongoDB to use an index we specify, overriding the default selection and query optimization process. You can specify the field names used in the index as an argument as shown below:

<?php
// query to find posts with type public and 100 likes
// use hint() to force MongoDB to use the index we created
$cursor = $collection
    ->find(
        array(
            "post_type" => "public",
            "post_likes" => 100
        )
    )
    ->hint(
        array(
            "post_type" => 1,
            "post_likes" => 1
        )
    );

This ensures the query uses the compound index defined on the post_type and post_likes fields.

Multikey Index

When indexing is done on an array field, it is called a multikey index. Consider our post document again; we can apply a multikey index on post_tags. The multikey index indexes each element of the array, so in this case a separate index entry would be created for each post_tags value: MongoDB, Tutorial, Indexing, and so on.
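
A rough way to picture this is one index entry per array element, each pointing back to the documents that contain it. The plain-JavaScript sketch below uses a Map instead of a B-tree, and the two sample posts and the `buildMultikeyIndex` helper are made up for illustration.

```javascript
// Two hypothetical posts with array-valued post_tags.
const taggedPosts = [
  { _id: 1, post_tags: ["MongoDB", "Tutorial", "Indexing"] },
  { _id: 2, post_tags: ["MongoDB", "PHP"] },
];

// Build one entry per distinct array element, mapping it to document ids.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of new Set(doc[field])) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildMultikeyIndex(taggedPosts, "post_tags");
console.log(tagIndex.get("MongoDB"));  // ids of both posts
console.log(tagIndex.get("Tutorial")); // id of the first post only
```

Two documents already produce five index entries here, which is why array indexes grow quickly.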

Indexes on array fields must be used very selectively, though, as they consume a lot of memory because of the indexing of each value.

Multikey Compound Index

We can create a multikey compound index, but with the limitation that at most one field in the index can be an array. So, if field1 is a string and field2 and field3 are both arrays, we can’t define the index {field2,field3}, since both of its fields are arrays.
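
The restriction can be expressed as a one-line check per document. The helper below, `canIndexDocument`, is hypothetical and only mirrors the rule stated above; it is not how MongoDB itself validates documents.

```javascript
// A document fits a compound index only if at most one of the
// indexed fields holds an array in that document.
function canIndexDocument(indexFields, doc) {
  const arrayFields = indexFields.filter((field) => Array.isArray(doc[field]));
  return arrayFields.length <= 1;
}

// Hypothetical post with two array fields and one string field.
const examplePost = {
  post_tags: ["MongoDB", "Tutorial"],
  comments: ["This is the first comment"],
  user_name: "Mark Anthony",
};

console.log(canIndexDocument(["post_tags", "user_name"], examplePost)); // true
console.log(canIndexDocument(["post_tags", "comments"], examplePost));  // false
```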

In the example below, we create an index on the post_tags and user_name fields:

Indexing Limitations and Considerations

It is important to know that indexes can’t be used effectively in queries which use regular expressions (other than case-sensitive prefix patterns such as /^Mongo/), negation operators (e.g. $ne, $not), arithmetic operators (e.g. $mod), JavaScript expressions in the $where clause, and in some other cases.

Indexing operations also come with their own cost. Each index occupies space as well as causes extra overhead on each insert, update, and delete operation on the collection. You need to consider the read:write ratio for each collection; indexing is beneficial to read-heavy collections, but may not be for write-heavy collections.

MongoDB tries to keep indexes in RAM. Make sure that the total index size does not exceed the available RAM; if it does, parts of the indexes will have to be read from disk and queries will slow down. Also, a collection can have a maximum of 64 indexes.

Summary

That’s all for this part. To summarize, indexes are highly beneficial for an application if a proper indexing approach is chosen. In the next part, we’ll look at using indexes on embedded documents, sub-documents, and ordering. Stay tuned!

Image via Fotolia

Translated from: https://www.sitepoint.com/mongodb-indexing-1/
