Using Regular Expressions in PHP

^[A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4}$

It makes about as much sense to you as ancient Egyptian hieroglyphics, although those little pictures at least look like they have meaning. But this… this looks like gibberish. What does it mean? It means oleomarg23@hotmail.com, Fiery.Rebel@veneuser.info, robustlamp+selfmag@gmail.ca, or nearly any other simple email address, because this is a pattern written in a language that describes how to match text in strings. When you’re looking to go beyond straight text matches, like finding “stud” in “Mustard” (which would fail btw), and you need a way to “explain” what you’re looking for because each instance may be different, you’ve come to need Regular Expressions, affectionately called regex.

Intro to Regex Notation

To get your feet wet, let’s take the above example and break it down piece by piece.

First up is the ^ at the very start of the pattern. The beginning of a line can be detected much as a carriage return can, even though it isn’t really an invisible character. Using ^ tells the regex engine that the match must start at the beginning of the line.

Next comes A-Za-z, just inside the first bracket. Instead of specifying each and every letter of the alphabet, we have a shorthand that gives a range. Matching is case sensitive by default, so you’ll have to specify both an uppercase and a lowercase range.

Then 0-9. The same goes for numbers; we can simply shorten them to a range instead of writing all 10 digits.

After the ranges comes -_.+% to finish off the set. These are the special characters we’re allowing: a dash, underscore, dot, plus-sign, and percent-sign.

Now the brackets themselves, [ and ]. The brackets surrounding our ranges effectively take everything you’ve put between them to create your own custom wildcard. Our “wildcard” is capable of matching any letter A-Z in either uppercase or lowercase, a digit 0-9, or one of our special punctuation characters.

Right after the closing bracket sits a +. This is a quantifier; it modifies how many times the previous item should match, and in this case the previous item is the set within brackets. + means “at least one,” so in our example, after matching the beginning of the string, we have to have at least one of the characters within the brackets.

At this point we can match (given the sample email addresses from earlier) oleomarg23, Fiery.Rebel, and robustlamp+selfmag. Something like @SodaCanDrive.com would fail because we must have at least one of the characters in the bracketed set at the very beginning of the text.

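To see this step in isolation, here is a minimal sketch that applies only the first piece of the pattern to the sample addresses. The variable names are placeholders chosen for illustration, and preg_match() itself is covered later in the article.

```php
<?php
// Only the start anchor, the bracketed set, and the + quantifier.
$localPart = '/^[A-Za-z0-9-_.+%]+/';

$samples = [
    'oleomarg23@hotmail.com',
    'Fiery.Rebel@veneuser.info',
    'robustlamp+selfmag@gmail.ca',
    '@SodaCanDrive.com',
];

$results = [];
foreach ($samples as $sample) {
    // On success preg_match() stores the matched text in $m[0].
    $results[$sample] = preg_match($localPart, $sample, $m) ? $m[0] : null;
}

print_r($results);
```

The first three entries hold the text before the at-sign, while the last one is null because nothing from the set sits at the start of the string.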

In addition to + as a quantifier, there is *, which is almost identical except that it also succeeds when the preceding item doesn’t appear at all. If we replaced the first + quantifier in the sample with * and had this:

^[A-Za-z0-9-_.+%]*@[A-Za-z0-9-.]+\.[A-Za-z]{2,4}$

it would have successfully matched the string @SodaCanDrive.com as we are effectively telling the regex engine to keep matching until it comes across a character not in the set, even if there aren’t any.

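A small sketch of the difference, assuming the full pattern with the escaped dot and both anchors; the variable names are made up for this example.

```php
<?php
$withPlus = '/^[A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4}$/';
$withStar = '/^[A-Za-z0-9-_.+%]*@[A-Za-z0-9-.]+\.[A-Za-z]{2,4}$/';

$subject = '@SodaCanDrive.com';

// + needs at least one character from the set before the @.
$plusResult = preg_match($withPlus, $subject);

// * is satisfied by zero characters, so the match can begin at the @.
$starResult = preg_match($withStar, $subject);

var_dump($plusResult, $starResult);
```

Only the * version accepts the address with nothing before the at-sign.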

Back to our original pattern…

Next is the @. The @ matches literally, so now we’ve matched oleomarg23@, Fiery.Rebel@, and robustlamp+selfmag@. The text greencandelabra.com fails because it doesn’t have an at-sign!

Then the second bracketed set, [A-Za-z0-9-.]+. This portion of the expression is similar to what we matched before the at-sign, except this time we’re not allowing the underscore, plus-sign, or percent-sign. Now we’re up to oleomarg23@hotmail.com, Fiery.Rebel@veneuser.info and robustlamp+selfmag@gmail.ca. gnargly3.1415@pie_a_la_mode.com would only match up to gnargly3.1415@pie.

After that comes \. outside the brackets. Here we have an escaped dot so as to match it literally. Note the plus-sign matched literally when it was inside brackets, but outside it had special meaning as a quantifier. Outside the brackets, the dot has to be escaped or it will be treated as the wildcard character; inside the brackets, a dot means a dot.
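The three ways a dot can behave are easy to compare side by side. This is only a sketch with throwaway patterns and subjects that are not from the article.

```php
<?php
$unescaped = '/^gmail.ca$/';   // outside brackets, . is a wildcard
$escaped   = '/^gmail\.ca$/';  // \. is a literal dot
$inClass   = '/^gmail[.]ca$/'; // inside brackets, a dot is just a dot

$subjects = ['gmail.ca', 'gmailxca'];

$outcome = [];
foreach ($subjects as $subject) {
    $outcome[$subject] = [
        preg_match($unescaped, $subject),
        preg_match($escaped, $subject),
        preg_match($inClass, $subject),
    ];
}

print_r($outcome);
```

The unescaped pattern accepts both subjects, while the escaped and bracketed versions accept only the one that really contains a dot.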

Uh oh! Since we already matched the .com, .info and .ca, it would seem like the match would fail because we don’t have any more dots. But regex is smart: the matching engine tries backtracking to find the match. So now we’re back to oleomarg23@hotmail., Fiery.Rebel@veneuser. and robustlamp+selfmag@gmail..

At this point, gnargly3.1415@pie_a_la_mode.com fails because the character after what’s matching so far is not a dot. drnddog@chewwed.legs.onchair continues as drnddog@chewwed.legs..

Then {2,4} after the last bracketed set. Remember how we made our own custom wildcard using brackets? We can do a similar thing with braces to make custom quantifiers. {2,4} means to match at least two times but no more than four times. If we only wanted to match exactly two times, we would write {2}. We could handle any quantity up to a maximum of four with {0,4}. {2,} would match a minimum of two.

{2,4} is our special quantifier that limits the last wildcard match to any 2, 3, or 4 letters. We’ve nearly fully matched oleomarg23@hotmail.com, Fiery.Rebel@veneuser.info and robustlamp+selfmag@gmail.ca. drnddog@chewwed.legs.onchair has to go backwards further to drnddog@chewwed.legs to make the match.
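Here is a rough sketch of the four brace forms mentioned above, each anchored at both ends so the whole subject has to fit the quantifier. The subjects and array names are illustrative only.

```php
<?php
$patterns = [
    '{2,4}' => '/^[A-Za-z]{2,4}$/', // two to four letters
    '{2}'   => '/^[A-Za-z]{2}$/',   // exactly two letters
    '{0,4}' => '/^[A-Za-z]{0,4}$/', // anything up to four letters
    '{2,}'  => '/^[A-Za-z]{2,}$/',  // two letters or more
];

$subjects = ['c', 'ca', 'info', 'museum'];

$table = [];
foreach ($patterns as $label => $pattern) {
    foreach ($subjects as $subject) {
        // 1 means the subject fits the quantifier, 0 means it does not.
        $table[$label][$subject] = preg_match($pattern, $subject);
    }
}

print_r($table);
```

Reading across a row shows which lengths each quantifier tolerates; for instance, museum only passes under {2,}.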

We just have one more to go…

Finally, the $ at the very end. $ is the counterpart to ^. Just as ^ does for the start of the line, $ anchors the match to the end of the line. Our examples all match now, and drnddog@chewwed.legs.onchair fails because there aren’t 2, 3, or 4 letters preceded by a dot at the end of the string.
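With every piece in place, the whole walkthrough can be replayed in one go. The sketch below assumes the pattern with its escaped dot and simply runs it against each address mentioned so far; $verdicts is a name invented for the example.

```php
<?php
$pattern = '/^[A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4}$/';

$samples = [
    'oleomarg23@hotmail.com',
    'Fiery.Rebel@veneuser.info',
    'robustlamp+selfmag@gmail.ca',
    '@SodaCanDrive.com',               // nothing before the @
    'greencandelabra.com',             // no @ at all
    'gnargly3.1415@pie_a_la_mode.com', // underscore after the @
    'drnddog@chewwed.legs.onchair',    // last segment longer than four letters
];

$verdicts = [];
foreach ($samples as $sample) {
    $verdicts[$sample] = preg_match($pattern, $sample) === 1;
}

var_dump($verdicts);
```

The first three addresses come back true and the remaining four false, for the reasons noted in the comments.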

Regexes in PHP

It’s all well and good to have this basic understanding of the notation used by regular expressions, but we still need to know how to apply it in the context of PHP to actually do something productive, so let’s look at the functions preg_match(), preg_replace() and preg_match_all().

preg_match()

To validate a form field for an email address, we’d use preg_match():

<?php
// The dot before the final letters is escaped so it matches a literal dot.
if (preg_match('/^[A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4}$/', $_POST["emailAddy"])) {
    echo "Email address accepted";
} else {
    echo "Email address is all broke.";
}

If a match is found, preg_match() returns 1, otherwise 0. Notice that we added slashes to the beginning and end of the regex. These are used as delimiters to show the function where the regular expression begins and ends. You may ask, “But Jason, isn’t that what the quotes are for?” Let me assure you that there is more to it, as I will explain shortly.

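One way to package that check is sketched below. isSimpleEmail() is a hypothetical helper name, not something from PHP or the original article. The comparison against 1 is strict because preg_match() can also return false when something goes wrong, and the second call shows that a delimiter other than the slash works just as well.

```php
<?php
// Hypothetical helper, named only for this example.
function isSimpleEmail(string $candidate): bool
{
    // preg_match() gives 1 for a match, 0 for no match, and false on error.
    return preg_match(
        '/^[A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4}$/',
        $candidate
    ) === 1;
}

// The delimiter does not have to be a slash; here # marks both ends instead.
$sameWithHash = preg_match(
    '#^[A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4}$#',
    'oleomarg23@hotmail.com'
);

var_dump(isSimpleEmail('oleomarg23@hotmail.com'), isSimpleEmail('greencandelabra.com'), $sameWithHash);
```

Picking a delimiter that does not occur in the pattern saves escaping it; a pattern full of slashes, such as one matching URLs, is a common reason to switch.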

preg_replace()

To find an email address and add formatting, we would use preg_replace():

<?php
$formattedBlock = preg_replace(
    '/([A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4})/U',
    '<b>\1</b>', // single quotes, so \1 reaches preg_replace() as a back-reference
    $blockOText
);

Here’s that explanation that was promised: we’ve placed a U after the ending delimiter as a flag that modifies how the regex matches. We’ve seen how regex quantifiers are greedy, gobbling up as many characters as they can and only backtracking if they have to. U makes them “un-greedy,” so each one takes as few characters as it can and only extends when the rest of the pattern needs it. The difference shows most with loose patterns: /<b>.+<\/b>/ swallows all of <b>tweedle</b> and <b>dum</b> as one match, while adding U stops at the first closing tag. One caution for our email pattern: U flips every quantifier, {2,4} included, and since there’s no $ here to hold the end in place, the string tweedle@dee.com-and-tweedle@dum.com gives back the shortest match it can, tweedle@dee.co. Leave U off if you need the full ending.
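Greediness is easiest to judge by running a pattern both ways and looking at what comes back. The sketch below does that for the unanchored email pattern and for a looser tag pattern that is an assumption of this example rather than part of the article; all variable names are illustrative.

```php
<?php
$subject = 'tweedle@dee.com-and-tweedle@dum.com';

$greedy   = '/([A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4})/';
$ungreedy = '/([A-Za-z0-9-_.+%]+@[A-Za-z0-9-.]+\.[A-Za-z]{2,4})/U';

preg_match($greedy, $subject, $greedyMatch);
preg_match($ungreedy, $subject, $ungreedyMatch);

// A looser pattern, where greed decides how far the match runs.
$html = '<b>tweedle</b> and <b>dum</b>';
preg_match('/<b>.+<\/b>/', $html, $greedyTag);
preg_match('/<b>.+<\/b>/U', $html, $ungreedyTag);

var_dump($greedyMatch[0], $ungreedyMatch[0], $greedyTag[0], $ungreedyTag[0]);
```

Without U the first email match is tweedle@dee.com; with U the trailing {2,4} settles for two letters and the match ends at tweedle@dee.co. The tag pattern shows the opposite trade-off, where U is what keeps the match from running to the last closing tag.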

Did you notice we also wrapped the whole expression in parentheses? This causes the regex engine to capture a copy of the text that matches the expression between the parentheses, which we can reference with a back-reference (\1). The second argument to preg_replace() tells the function to replace the text with an opening bold tag, whatever matched the pattern between the first set of parentheses, and a closing bold tag. If there were other sets of parentheses, they could be referenced with \2, \3, etc. depending on their position.
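To show more than one back-reference at work, here is a small sketch that splits the pattern into two groups, one on each side of the at-sign. The sample sentence and the [at] replacement are invented for illustration, and the U flag is left off so the groups stay greedy.

```php
<?php
$text = 'Write to oleomarg23@hotmail.com today.';

// Group 1 captures the part before the @, group 2 the part after it.
$pattern = '/([A-Za-z0-9-_.+%]+)@([A-Za-z0-9-.]+\.[A-Za-z]{2,4})/';

// \1 and \2 refer to the groups by position, counted by opening parenthesis.
$obfuscated = preg_replace($pattern, '\1 [at] \2', $text);

echo $obfuscated;
```

PHP also accepts $1 and $2 as an equivalent spelling in the replacement string, which avoids any confusion over backslashes inside quotes.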

preg_match_all()

To scan some text and extract an array of all the email addresses found in it, preg_match_all() is our best choice:

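<?php
// Note: because of U, the trailing {2,4} is satisfied after two letters;
// drop U if the full ending of each address is needed.
$matchesFound = preg_match_all(
    '/([a-z0-9-_.+%]+@[a-z0-9-.]+\.[a-z]{2,4})/Ui',
    $articleWithEmailAddys,
    $listOfEmails
);
if ($matchesFound) {
    foreach ($listOfEmails[0] as $foundEmail) {
        echo $foundEmail . "<br>";
    }
}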

preg_match_all() returns how many matches it found, and sticks those matches into the variable reference we supplied as the third argument. It actually creates a multi-dimensional array in which the matches we’re looking for are found at index 0.

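The shape of that multi-dimensional array is easier to see with more than one capture group. The following sketch uses two groups and a made-up sentence, and leaves out the U flag so each group stays greedy; none of the variable names come from the article.

```php
<?php
$text = 'Contact oleomarg23@hotmail.com or Fiery.Rebel@veneuser.info.';

// Group 1 is the part before the @, group 2 the part after it.
$count = preg_match_all(
    '/([a-z0-9-_.+%]+)@([a-z0-9-.]+\.[a-z]{2,4})/i',
    $text,
    $found
);

// $found[0] holds every full match, $found[1] every first group,
// and $found[2] every second group, all in the order they were found.
print_r($found);
```

Index 0 is always the text matched by the whole pattern, which is why the loop in the article reads from $listOfEmails[0].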

In addition to the U modifier, we provided i which instructs the regex engine we want the pattern to be applied in a case-insensitive manner. That is, /a/i would match both a lower-case A and an upper-case A (or /A/i would work equally well for that matter since the modifier is asking the engine to be case-agnostic). This allows us to write things like [a-z0-9] in our expression now instead of [A-Za-z0-9] which makes it a little shorter and easier to grok.

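A quick sketch of the i modifier on its own, using throwaway subjects rather than anything from the article:

```php
<?php
$caseSensitive   = preg_match('/a/', 'A');  // no flag: a does not match A
$caseInsensitive = preg_match('/a/i', 'A'); // with i: it does
$upperPattern    = preg_match('/A/i', 'a'); // and the other way around

// With i, the shorter set covers both cases.
$shortSet = preg_match('/^[a-z0-9]+$/i', 'Fiery23');
$longSet  = preg_match('/^[A-Za-z0-9]+$/', 'Fiery23');

var_dump($caseSensitive, $caseInsensitive, $upperPattern, $shortSet, $longSet);
```

Only the first call comes back 0; the two sets at the end accept the same subject.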

Well that about wraps things up. While there is a lot more you can do using regular expressions involving look-ahead, look-behind, and more intricate examples of back-references, all of which you can find in PHP’s online documentation, hopefully you have plenty to work with that will serve you for many scripts just from this article.

Image via Boris Mrdja / Shutterstock

Translated from: https://www.sitepoint.com/regular-expressions/
