SOLR Geospatial Search Tutorial

In a recent series of articles I looked in detail at Apache’s SOLR and Solarium.

To recap: SOLR is a search service with a raft of features, such as faceted search and result highlighting, which runs as a web service. Solarium is a PHP library which allows you to integrate with SOLR, whether local or remote, interacting with it as if it were a native component of your application. If you’re unfamiliar with either, I’d urge you to take a look at that series first.

In this article, I’m going to look at another part of SOLR which warrants its own discussion: geospatial search.

An Example

I’ve put together a simple example application to accompany this article. You can get it from Github, or see it in action here.

Before we delve into that, let’s look at some of the background.

Sometimes, the things you want to search for have geographical locations. Often, that provides vital context. It’s all very well me being able to search for “Italian restaurants”, but I’m hungry – a restaurant on another continent, as good as it might be, is of no help. Rather, it would be far more useful to be able to run a search which asks “show me Italian restaurants, but within 5 miles”. Or alternatively, “show me the ten closest Italian restaurants”. That’s where Geospatial search comes in.

Geospatial Search and Points

In geospatial applications we often talk about “points”; i.e., a specific geographical location. Specifically, we’re really talking about a latitude and longitude pair. A latitude and longitude defines a point on the globe, potentially to within a few metres.

One of the challenges when you’re developing anything involving geographic points is that you need some way of making sense of them for people who don’t think in latitude and longitude – which I’m pretty sure is most of us. Geolocation comes in handy here, because it can be used to determine the latitude and longitude of “where you are”, without the ambiguities of place names. (If you want to take the latter approach, I’ve written about it before.)

So, the first challenge when you’re doing any sort of geo-related work is to work out how to determine the start point – i.e., where to search from. In our example application we’ll hedge our bets and take three approaches. We’ll use the HTML5 geolocation functionality to allow the user’s browser to locate them. For convenience and simplicity we’ll include an arbitrary list of some major cities, which when clicked will populate the latitude and longitude from some hard-coded values. Finally, just so we have all bases covered, and for the geo-geeks among us, we’ll include text fields in which users can manually enter their latitude and longitude.

Setting up the Schema

To set up our SOLR core to support geographical locations, we need to make a few tweaks to the schema.

The first thing we need to do is to add the location field type to schema.xml:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

Note that this field is made up of sub-fields; i.e., a latitude and a longitude. We need to ensure we have a suitable type for those:

<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

As you can see, it’s basically a field of type double (specifically tdouble, represented internally by the Java class solr.TrieDoubleField).

Both of these <fieldType> declarations need to be placed within the <types> element of your schema.xml, alongside the other field type definitions.

Now that the types are set up, you can define a new field to hold the latitude and longitude. In the following example, I’m calling it latlon:

<field name="latlon" type="location" indexed="true" stored="true" multiValued="false" />

It’s important that multiValued is set to false – multiple lat/lon pairs aren’t supported.

You’ll also need to set up a dynamic field for the components; i.e. the latitude and longitude. _coordinate refers to the suffix we specified when we defined our location field type above.

<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

Both the <field> and <dynamicField> declarations go in the <fields> section.

Your schema is now set up to support latitude / longitude pairs, and we’ve added a field called latlon. Next, let’s look at how to populate that field.

You’ll find an example schema.xml file in the sample application’s repository.

Assigning Location Data

When it comes to assigning a value to a location field, you provide the latitude and longitude as a single comma-separated string, in this format:

{lat},{long}

So, using Solarium:

$doc->latlon = doubleval($latitude) . "," . doubleval($longitude);

Refer to the section “Populating the Data” for a concrete example.
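As a rough illustration of that format, here is a minimal sketch of a helper which builds the string and rejects coordinates that are out of range. The function name formatLatLon is my own invention for this example and is not part of Solarium or SOLR.

```php
<?php
// Hypothetical helper (not part of Solarium): builds the "lat,lon" string
// used to populate a location field, rejecting out-of-range coordinates.
function formatLatLon(float $latitude, float $longitude): string
{
    if ($latitude < -90.0 || $latitude > 90.0) {
        throw new InvalidArgumentException('Latitude must be between -90 and 90.');
    }
    if ($longitude < -180.0 || $longitude > 180.0) {
        throw new InvalidArgumentException('Longitude must be between -180 and 180.');
    }

    // Latitude first, then longitude, separated by a comma with no spaces.
    return $latitude . ',' . $longitude;
}

// Example values only; any valid pair works the same way.
echo formatLatLon(51.5, -0.25), PHP_EOL;
```

The resulting string could then be assigned to the document's latlon field in place of the inline concatenation.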

Geospatial Queries in SOLR with Solarium

You might recall that in part three of the SOLR series, we looked at Solarium’s helpers. Basically, these act as syntactic sugar, enabling you to create more complex queries without having to worry too much about the underlying SOLR query syntax.

Here’s an example of how to add an additional filter to a search query, which – given a $latitude and a $longitude – limits the results to within $distance kilometres:

$query->createFilterQuery('distance')->setQuery( $helper->geofilt( 'latlon', doubleval($latitude), doubleval($longitude), doubleval($distance) ) );
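To give a feel for what ends up in the filter query, here is a small sketch which builds a geofilt expression by hand using SOLR's local-params syntax. The function buildGeofiltQuery is hypothetical and only stands in for the helper; the exact string Solarium generates (parameter order, for instance) may differ.

```php
<?php
// Hypothetical stand-in for a geofilt helper: returns a SOLR filter
// expression limiting results to $distanceKm around the given point.
// The parameter order Solarium itself emits may not match this sketch.
function buildGeofiltQuery(string $field, float $lat, float $lon, float $distanceKm): string
{
    // sfield = the location field, pt = "lat,lon" centre point, d = radius in km
    return sprintf('{!geofilt sfield=%s pt=%s,%s d=%s}', $field, $lat, $lon, $distanceKm);
}

// Example: everything within 50 km of a point in Berlin.
echo buildGeofiltQuery('latlon', 52.5167, 13.3333, 50), PHP_EOL;
```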

The filter expects the distance in kilometres. If you prefer to work in miles, you simply need to multiply $distance by 1.609344 to convert it:

$query->createFilterQuery('distance')->setQuery( $helper->geofilt( 'latlon', doubleval($latitude), doubleval($longitude), doubleval($distance * 1.609344) ) );
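If you find yourself converting in both directions (miles in for the filter, miles out for display), it may be tidier to wrap the factor in a pair of small functions. This is only a sketch; the names KM_PER_MILE, milesToKm and kmToMiles are mine.

```php
<?php
// One international mile expressed in kilometres.
const KM_PER_MILE = 1.609344;

// Convert a user-supplied distance in miles to the kilometres used by the filter.
function milesToKm(float $miles): float
{
    return $miles * KM_PER_MILE;
}

// Convert a distance in kilometres back to miles, e.g. for display.
function kmToMiles(float $km): float
{
    return $km / KM_PER_MILE;
}

printf("%.2f km\n", milesToKm(5));
```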

If you want to return the distance with the search results, you’ll need to add the geodist function to the list of fields, using the same values as the geofilt filter. Again, you can use a helper:

$query->addField($helper->geodist( 'latlon', doubleval($latitude), doubleval($longitude) ) );

It’s far more useful to add a field alias, much like you would in SQL, which you can use to retrieve the value later. The convention with aliases is to prefix and suffix with an underscore, like so:

$query->addField('_distance_:' . $helper->geodist( 'latlon', doubleval($latitude), doubleval($longitude) ) );
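For intuition about the number that comes back, the distance between two latitude/longitude points is a great-circle distance. The haversine formula is one common way to compute it, sketched below in plain PHP. Treat this as an approximation for sanity-checking results rather than a description of SOLR's internals; the function name and the Earth radius of 6371 km are my assumptions.

```php
<?php
// Great-circle distance in kilometres between two points, using the
// haversine formula. Assumes a spherical Earth with a mean radius of
// 6371 km, so results may differ slightly from what SOLR reports.
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0;

    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);

    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;

    // min() guards against rounding pushing the value fractionally above 1.
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Approximate coordinates for London and Paris, for illustration only.
printf("%.1f km\n", haversineKm(51.5074, -0.1278, 48.8566, 2.3522));
```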

Now, you can display the distance in your search results:

<ul>
<?php foreach ($resultset as $document): ?>
    <li><?php print $document->title ?> (<?php print round($document->_distance_, 2) ?> kilometres away)</li>
<?php endforeach; ?>
</ul>

In order to sort the results by distance, you need to apply a little trickery. Rather than use setSort, you actually need to use a query; this is then used to “score” results based on distance. The underlying SOLR query will look like this:

{!func}geodist(fieldname,lat,lng)

To do this with Solarium, again using a helper:

$query->setQuery('{!func}' . $helper->geodist( 'latlon', doubleval($latitude), doubleval($longitude) ));

The net result of this is that the score will reflect the proximity; the lower the score, the closer it is geographically.

So, to sort the results by distance, closest first:

$query->addSort('score', 'asc');

Enough of the theory; let’s build something.

Building our Example Application

I’ve created a simple example application where people can search for their nearest airports, which you can find on Github, in the solr folder. There’s an online demo here.

It uses Silex as a framework, along with Twig for templating. You shouldn’t need an in-depth knowledge of either in order to follow along, since most of the application’s complexity comes from the SOLR integration, which is covered here.

Populating the Data

The data we’re using is taken from the excellent OpenFlights.org service. You’ll find the data file in the repository, along with a simple script to populate the search index – run it as follows:

php scripts/populate.php

Here’s the relevant section:

// Now let's start importing
while (($row = fgetcsv($fp, 1000, ",")) !== FALSE) {

    // get an update query instance
    $update = $client->createUpdate();

    // Create a document
    $doc = $update->createDocument();

    $doc->id = $row[0];
    $doc->name = $row[1];
    $doc->city = $row[2];
    $doc->country = $row[3];
    $doc->faa_faa_code = $row[4];
    $doc->icao_code = $row[5];
    $doc->altitude = $row[8];

    // Latitude and longitude are stored together as a "lat,lon" string
    $doc->latlon = doubleval($row[6]) . "," . doubleval($row[7]);

    // Let's simply add and commit straight away.
    $update->addDocument($doc);
    $update->addCommit();

    // this executes the query and returns the result
    $result = $client->update($update);

    $num_imported++;

    // Sleep for a couple of seconds, lest we go too fast for SOLR
    sleep(2);
}
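The import script trusts every row of the data file. If you would rather skip rows with missing or malformed coordinates, something along these lines could sit in front of the document creation. It is a sketch only: airportRowToFields is a hypothetical function, the column positions simply mirror the script above, and the sample row is made up for illustration rather than copied from the data file.

```php
<?php
// Hypothetical pre-processing step: maps one CSV row to an array of field
// values, or returns null when the row lacks usable coordinates.
// Column positions follow the import script (6 = latitude, 7 = longitude).
function airportRowToFields(array $row): ?array
{
    if (count($row) < 9 || !is_numeric($row[6]) || !is_numeric($row[7])) {
        return null;
    }

    return [
        'id'       => $row[0],
        'name'     => $row[1],
        'city'     => $row[2],
        'country'  => $row[3],
        'altitude' => $row[8],
        // Same "lat,lon" format used for the location field.
        'latlon'   => (float) $row[6] . ',' . (float) $row[7],
    ];
}

// Illustrative sample row, in the same column order as the script expects.
$line = '507,"London Heathrow Airport","London","United Kingdom","LHR","EGLL",51.4706,-0.461941,83';
print_r(airportRowToFields(str_getcsv($line, ',', '"', '\\')));
```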

Building the Search Form

We’ll start with a simple form with longitude and latitude fields, as well as a drop-down with which the user can specify the distance to limit to:

<form method="get" action="/">
    <div class="form-group">
        <a href="#/" id="findme" class="btn btn-default"><i class="icon icon-target"></i> Find my location</a>
    </div>
    <div class="form-group">
        <label for="form-lat">Latitude</label>
        <input type="text" name="lat" id="form-lat" class="form-control" />
    </div>
    <div class="form-group">
        <label for="form-lng">Longitude</label>
        <input type="text" name="lng" id="form-lng" class="form-control" />
    </div>
    <div class="form-group">
        <label for="form-dist">Within <em>x</em> kilometers</label>
        <select name="dist" id="form-dist" class="form-control">
            <option value="50">50</option>
            <option value="100">100</option>
            <option value="250">250</option>
            <option value="500">500</option>
        </select>
    </div>
    <div class="form-group">
        <button type="submit" class="btn btn-primary"><i class="icon icon-search"></i> Search</button>
    </div>
</form>

Next, let’s implement the “find me” button, which uses HTML5 geolocation – if the user’s browser supports it – to populate the search form.

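// Populate the form fields with the position reported by the browser
function success(position) {
    $('input[name="lat"]').val(position.coords.latitude);
    $('input[name="lng"]').val(position.coords.longitude);
}

// Called with a plain string by our own code, or with an error object by the browser
function error(msg) {
    alert(typeof msg === 'string' ? msg : msg.message);
}

$('#findme').click(function(){
    if (navigator.geolocation) {
        navigator.geolocation.getCurrentPosition(success, error);
    } else {
        error('not supported');
    }
});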

Users will need to grant our application permission to locate them, so really it’s best to run this upon some sort of user interaction, such as at the click of a button, rather than on page-load.

Finally, we’ll provide a list of “default” cities; a user can click one to populate the latitude and longitude fields automatically.

Here’s the HTML, showing a limited number of cities for brevity:

<ul id="cities">
    <li><a href="#/" data-lat="52.51670" data-lng="13.33330">Berlin, Germany</a></li>
    <li><a href="#/" data-lat="-34.33320" data-lng="-58.49990">Buenos Aires, Argentina</a></li>
    <!-- remaining cities omitted for brevity -->
</ul>

The corresponding JavaScript is extremely simple:

$('#cities a').click(function(e){
    $('input[name="lat"]').val($(this).data('lat'));
    $('input[name="lng"]').val($(this).data('lng'));
});

Next up, we’re going to implement the search.

The Search Page

Let’s start by defining a single route for the one and only page in our example application. It displays the search form, and also displays the results once the latitude and longitude have been provided via GET parameters by submitting the form.

// Display the search form / run the search
$app->get('/', function (Request $request) use ($app) {

    $resultset = null;

    $query = $app['solr']->createSelect();
    $helper = $query->getHelper();

    $query->setRows(100);
    $query->addSort('score', 'asc');

    if (($request->get('lat')) && ($request->get('lng'))) {

        $latitude = $request->get('lat');
        $longitude = $request->get('lng');
        $distance = $request->get('dist');

        $query->createFilterQuery('distance')->setQuery(
            $helper->geofilt(
                'latlon',
                doubleval($latitude),
                doubleval($longitude),
                doubleval($distance)
            )
        );

        $query->setQuery('{!func}' . $helper->geodist(
            'latlon',
            doubleval($latitude),
            doubleval($longitude)
        ));

        $query->addField('_distance_:' . $helper->geodist(
            'latlon',
            doubleval($latitude),
            doubleval($longitude)
        ));

        $resultset = $app['solr']->select($query);
    }

    // Render the form / search results
    return $app['twig']->render('index.twig', array(
        'resultset' => $resultset,
    ));

});

The boilerplate code is pretty simple stuff – defining the route, grabbing the relevant parameters and rendering the view.

The code which runs the search utilizes the code we looked at earlier. Essentially it does the following:

Creates a filter query, restricting the search to within $distance km of the point specified by $latitude and $longitude; all three are provided as GET parameters

Uses the geodist helper to tell Solarium which field we’re interested in (the latlon field we defined earlier) in order to sort the results

Adds a pseudo-field _distance_ so that we can incorporate it into our search results

Runs the query and assigns its result to the view.
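One thing the route glosses over is validation: the GET parameters are used more or less as they arrive, and a latitude or longitude of exactly 0 would be treated as "not provided" by the truthiness check. A possible way to tighten this up is sketched below. The function parseSearchParams is hypothetical, and the whitelist of distances is simply copied from the form's drop-down.

```php
<?php
// Hypothetical validation step for the search route: returns cleaned
// values, or null when the coordinates are missing or out of range.
function parseSearchParams(array $params, array $allowedDistances = [50, 100, 250, 500]): ?array
{
    // filter_var() returns false for missing or non-numeric input.
    $lat  = filter_var($params['lat'] ?? null, FILTER_VALIDATE_FLOAT);
    $lng  = filter_var($params['lng'] ?? null, FILTER_VALIDATE_FLOAT);
    $dist = filter_var($params['dist'] ?? null, FILTER_VALIDATE_INT);

    if ($lat === false || $lng === false
        || $lat < -90 || $lat > 90 || $lng < -180 || $lng > 180) {
        return null;
    }

    // Fall back to the smallest radius if the distance isn't one we offer.
    if ($dist === false || !in_array($dist, $allowedDistances, true)) {
        $dist = $allowedDistances[0];
    }

    return ['lat' => $lat, 'lng' => $lng, 'dist' => $dist];
}

// Example values as they might arrive in the query string.
var_dump(parseSearchParams(['lat' => '52.5167', 'lng' => '13.3333', 'dist' => '100']));
```

In the route, the array returned here could replace the three separate calls to $request->get(), with a null result meaning "just show the form".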

Displaying the Results

Here’s the portion of the template which is responsible for displaying the search results:

{% if resultset %}
    {% for doc in resultset %}
        <article>
            <h4><i class="icon icon-airplane"></i> {{ doc.name }}</h4>
            <p><strong>{{ doc.city }}</strong>, {{ doc.country}} ({{ doc._distance_|number_format }} km away)</p>
        </article>
        <hr />
    {% endfor %}
{% endif %}

It’s pretty straightforward; note how the _distance_ field is available in our search result document, along with the name and country fields. We’re using Twig’s number_format filter to format the distance.
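With no arguments, that filter rounds to a whole number and adds a thousands separator, much like PHP's own number_format(). If you ever need the same formatting outside of a template, or want to show miles instead, a small function like the following would do. It is a sketch, and formatDistance is a name I have made up.

```php
<?php
// Hypothetical display helper: formats a distance given in kilometres,
// optionally converting it to miles first.
function formatDistance(float $km, int $decimals = 0, bool $miles = false): string
{
    $value = $miles ? $km / 1.609344 : $km;

    // number_format() rounds and inserts a comma as the thousands separator.
    return number_format($value, $decimals) . ($miles ? ' miles' : ' km');
}

echo formatDistance(1234.567), PHP_EOL;
echo formatDistance(1234.567, 1), PHP_EOL;
```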

That’s all there is to it – you’ll find the complete example in the repository.

Of course, this example only searches based on distance. You can also combine text-based search with geospatial search; I’ll leave that as an exercise.
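As a starting point for that exercise, here is one possible shape for the underlying request, written as raw SOLR parameters rather than Solarium calls. The idea is to keep the text search as the main query, keep the geofilt filter, and sort by the geodist function directly instead of by score. The function buildTextGeoParams and the example search term are hypothetical, and I haven't run this against the example application, so treat it as a sketch.

```php
<?php
// Hypothetical sketch: raw SOLR parameters combining a text query with a
// distance filter, sorted closest-first. Not taken from the example app.
function buildTextGeoParams(
    string $text,
    float $lat,
    float $lon,
    float $distanceKm,
    string $field = 'latlon'
): array {
    $geodist = sprintf('geodist(%s,%s,%s)', $field, $lat, $lon);

    return [
        'q'    => $text,                                   // text search stays the main query
        'fq'   => sprintf('{!geofilt sfield=%s pt=%s,%s d=%s}', $field, $lat, $lon, $distanceKm),
        'fl'   => '*,_distance_:' . $geodist,              // return the distance as a pseudo-field
        'sort' => $geodist . ' asc',                       // closest first, ignoring text relevance
        'rows' => 100,
    ];
}

// Example: airports matching "international" within 100 km of a point in Berlin.
echo http_build_query(buildTextGeoParams('name:international', 52.5167, 13.3333, 100)), PHP_EOL;
```

Note that sorting purely by distance discards the text relevance score; whether that is what you want depends on the application.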

Summary

In this article I’ve shown how you can use SOLR – in conjunction with the PHP library Solarium – in order to perform geospatial searches. We’ve looked at some of the theory, then dived into setting up our schema, constructing our query and putting it into practice.

Feedback? Comments? Leave them below!

Translated from: https://www.sitepoint.com/geospatial-search-solr-solarium/
