Implementing the Range Operator in PHP

tech2022-09-03  123

We sometimes come across some amazing posts in other locations, and with the permissions of their authors, repost them on SitePoint. This is one such instance. In the post below, Thomas Punt implements the range operator in PHP. If you’ve ever been interested in PHP internals and adding features to your favorite programming language, now’s the time to learn!


This article assumes that the reader is able to build PHP from source. If this is not the case, then please see the Building PHP chapter of the PHP Internals Book first.




This article will demonstrate how to implement a new operator in PHP. The following steps will be taken to do this:


Updating the lexer: This will make it aware of the new operator syntax so that it can be turned into a token


Updating the parser: This will say where it can be used, as well as what precedence and associativity it will have


Updating the compilation stage: This is where the abstract syntax tree (AST) is traversed and opcodes are emitted from it


Updating the Zend VM: This is used to handle the interpretation of the new opcode for the operator during script execution


This article therefore seeks to provide a brief overview of a number of PHP’s internal aspects.


Also, a big thank you to Nikita Popov for proofreading and helping to improve my article!


The Range Operator

The operator that will be added into PHP in this article will be called the range operator (|>). To keep things simple, the range operator will be defined with the following semantics:


1. The incrementation step will always be one

2. Both operands must either be integers or floats

3. If min = max, return a one-element array consisting of min.

(The above points will all be referenced in the final section, Updating the Zend VM, when we finally implement the above semantics.)


If any of the above semantics are not satisfied, then an Error exception will be thrown. This will therefore occur if:


either operand is not an integer or float

min > max

the range (max - min) is too large

Examples:


1 |> 3; // [1, 2, 3]
2.5 |> 5; // [2.5, 3.5, 4.5]
$a = $b = 1;
$a |> $b; // [1]
2 |> 1; // Error exception
1 |> '1'; // Error exception
new StdClass |> 1; // Error exception

Updating the Lexer

Firstly, the new token must be registered in the lexer so that when the source code is tokenized, it will turn |> into the T_RANGE token. For this, the Zend/zend_language_scanner.l file will need to be updated by adding the following code to it (where all of the other tokens are defined, line ~1200):


<ST_IN_SCRIPTING>"|>" { RETURN_TOKEN(T_RANGE); }

The ST_IN_SCRIPTING mode is the state the lexer is currently in. This means it will only match the |> character sequence when it is in normal scripting mode. The code between the curly braces is C code that will be executed when |> is found in the source code. In this example, it simply returns a T_RANGE token.
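To make the matching rule more concrete, here is a minimal hand-written scanner in plain C. It is only a sketch of the idea and not what re2c generates: the function name next_token and every token kind other than T_RANGE are hypothetical. The point it illustrates is that the two-character sequence |> has to be tried before the single | so that the longer match wins.

```c
/* Hypothetical token kinds for this sketch; only T_RANGE mirrors the article. */
enum token { TOK_EOF, TOK_NUMBER, TOK_PIPE, T_RANGE, TOK_UNKNOWN };

/* Scan one token starting at *cursor and advance the cursor past it. */
enum token next_token(const char **cursor)
{
    const char *p = *cursor;
    enum token kind;

    while (*p == ' ') {                 /* skip spaces between tokens */
        ++p;
    }

    if (*p == '\0') {
        kind = TOK_EOF;
    } else if (*p >= '0' && *p <= '9') {
        while (*p >= '0' && *p <= '9') {
            ++p;
        }
        kind = TOK_NUMBER;
    } else if (p[0] == '|' && p[1] == '>') {
        p += 2;                         /* longest match first: "|>" */
        kind = T_RANGE;
    } else if (*p == '|') {
        p += 1;                         /* a lone "|" is a different token */
        kind = TOK_PIPE;
    } else {
        p += 1;
        kind = TOK_UNKNOWN;
    }

    *cursor = p;
    return kind;
}
```

Calling next_token repeatedly on the string "1 |> 2" yields a number, T_RANGE, a number, and then the end-of-input token. A generated lexer gives us this longest-match behaviour without us having to order the checks by hand.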


Note: Since we’re modifying the lexer, we will need Re2c to regenerate it. This dependency is not needed for normal builds of PHP.


Next, the T_RANGE identifier must be declared in the Zend/zend_language_parser.y file. To do this, we must add the following line to where the other token identifiers are declared (at the end will do, line ~220):


%token T_RANGE "|> (T_RANGE)"

PHP now recognizes the new operator:


1 |> 2; // Parse error: syntax error, unexpected '|>' (T_RANGE) in...

But since its usage hasn’t been defined yet, using it will lead to a parse error. This will be fixed in the next section.


First though, we must regenerate the ext/tokenizer/tokenizer_data.c file in the tokenizer extension to cater for the newly added token. (The tokenizer extension simply provides an interface for PHP’s lexer to userland through the token_get_all and token_name functions.) At the moment, it is blissfully ignorant of our new T_RANGE token:


echo token_name(token_get_all('<?php 1|>2;')[2][0]); // UNKNOWN

We regenerate the ext/tokenizer/tokenizer_data.c file by going into the ext/tokenizer directory and executing the tokenizer_data_gen.sh file. Then go back into the root php-src directory and build PHP again. Now the tokenizer extension works again:


echo token_name(token_get_all('<?php 1|>2;')[2][0]); // T_RANGE

Updating the Parser

The parser needs to be updated now so that it can validate where the new T_RANGE token is used in PHP scripts. It’s also responsible for stating the precedence and associativity of the new operator and generating the abstract syntax tree (AST) node for the new construct. This will all be done in the Zend/zend_language_parser.y grammar file, which contains the token definitions and production rules that Bison will use to generate the parser from.




Digression:


Precedence determines the rules of grouping expressions. For example, in the expression 3 + 4 * 2, * has a higher precedence than +, and so it will be grouped as 3 + (4 * 2).


Associativity is how the operator will behave when chained. It determines whether the operator can be chained, and if so, then what direction it will be grouped from in a particular expression. For example, the ternary operator has (rather strangely) left-associativity, and so it will be grouped and executed from left to right. Therefore, the following expression:


1 ? 0 : 1 ? 0 : 1; // 1

Will be executed as follows:


(1 ? 0 : 1) ? 0 : 1; // 1

This can, of course, be changed (read: rectified) to be right-associative with proper grouping:


$a = 1 ? 0 : (1 ? 0 : 1); // 0

Some operators, however, are non-associative and therefore cannot be chained at all. For example, the less than (<) operator is like this, and so the following is invalid:

1 < $a < 2;

Since the range operator will evaluate to an array, having it as an input operand to itself would be rather useless (i.e. 1 |> 3 |> 5 would be nonsensical). So let’s make the operator non-associative, and whilst we’re at it, let’s set it to have the same precedence as the combined comparison operator (T_SPACESHIP). This is done by adding the T_RANGE token onto the end of the following line (line ~70):

%nonassoc T_IS_EQUAL T_IS_NOT_EQUAL T_IS_IDENTICAL T_IS_NOT_IDENTICAL T_SPACESHIP T_RANGE
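To show what precedence and non-associativity mean for a parser, here is a small precedence-climbing evaluator in plain C. It is a sketch under simplifying assumptions, not how Bison implements %nonassoc: operands are single digits, there is no whitespace, and '<' stands in for a non-associative operator such as |>. All function names are hypothetical.

```c
/* Binding strength of each operator; 0 means "not an operator". */
static int precedence(char op)
{
    switch (op) {
    case '<': return 1;   /* non-associative in this sketch */
    case '+':
    case '-': return 2;   /* left-associative */
    case '*': return 3;   /* left-associative, binds tightest */
    default:  return 0;
    }
}

static long apply(char op, long lhs, long rhs)
{
    switch (op) {
    case '<': return lhs < rhs;
    case '+': return lhs + rhs;
    case '-': return lhs - rhs;
    default:  return lhs * rhs;
    }
}

/* Parse operators of at least min_prec. Returns 0 on success, -1 on a syntax error. */
static int parse_level(const char **p, int min_prec, long *value)
{
    long lhs, rhs;

    if (**p < '0' || **p > '9') {
        return -1;
    }
    lhs = **p - '0';
    ++*p;

    for (;;) {
        char op = **p;
        int prec = precedence(op);

        if (prec == 0 || prec < min_prec) {
            break;
        }
        ++*p;
        /* prec + 1 forces the right operand to bind tighter: left-to-right grouping */
        if (parse_level(p, prec + 1, &rhs) != 0) {
            return -1;
        }
        lhs = apply(op, lhs, rhs);

        /* non-associative: a second operator of the same level may not follow */
        if (op == '<' && precedence(**p) == prec) {
            return -1;
        }
    }

    *value = lhs;
    return 0;
}

/* Evaluate a whole expression; trailing garbage counts as a syntax error. */
int eval_expr(const char *src, long *value)
{
    if (parse_level(&src, 1, value) != 0 || *src != '\0') {
        return -1;
    }
    return 0;
}
```

With this sketch, "1+2*3" evaluates to 7 because * binds tighter, "9-3-2" evaluates to 4 because - groups from the left, and "1<2<3" is rejected as a syntax error, which is the behaviour we are asking for by listing T_RANGE on a %nonassoc line.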

Next, we must update the expr_without_variable production rule to cater for our new operator. This will be done by adding the following code into the rule (I placed it just below the T_SPACESHIP rule, line ~930):


| expr T_RANGE expr { $$ = zend_ast_create(ZEND_AST_RANGE, $1, $3); }

The pipe character (|) is used to denote an or, meaning that any one of those rules can match in that particular production rule. The code within the curly braces is to be executed when that match occurs. The $$ denotes the result node that stores the value of the expression. The zend_ast_create function is used to create our AST node for our operator. This AST node is created with the name ZEND_AST_RANGE, and has two values: $1 references the left operand (the first expr in expr T_RANGE expr) and $3 references the right operand (the second expr).
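As a rough picture of what such a grammar action builds, here is a tiny stand-alone AST in C. The struct layout and the functions ast_number, ast_create_binary and ast_free are hypothetical and much simpler than the real zend_ast; the sketch only shows a node of a given kind holding its two operands as children.

```c
#include <stdlib.h>

enum ast_kind { AST_NUMBER, AST_RANGE };   /* hypothetical node kinds */

struct ast {
    enum ast_kind kind;
    long value;            /* meaningful for AST_NUMBER only */
    struct ast *child[2];  /* meaningful for AST_RANGE only */
};

/* Create a leaf node holding a literal number. */
struct ast *ast_number(long value)
{
    struct ast *node = calloc(1, sizeof *node);

    if (node != NULL) {
        node->kind = AST_NUMBER;
        node->value = value;
    }
    return node;
}

/* Counterpart of the grammar action: left plays the role of $1, right of $3. */
struct ast *ast_create_binary(enum ast_kind kind, struct ast *left, struct ast *right)
{
    struct ast *node = calloc(1, sizeof *node);

    if (node != NULL) {
        node->kind = kind;
        node->child[0] = left;
        node->child[1] = right;
    }
    return node;
}

/* Release a node together with everything below it. */
void ast_free(struct ast *node)
{
    if (node == NULL) {
        return;
    }
    ast_free(node->child[0]);
    ast_free(node->child[1]);
    free(node);
}
```

For the expression 1 |> 3, the equivalent call would be ast_create_binary(AST_RANGE, ast_number(1), ast_number(3)), after which child[0] is the left operand and child[1] is the right operand.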


Next, we will need to define the ZEND_AST_RANGE constant for the AST. To do this, the Zend/zend_ast.h file will need to be updated by simply adding the ZEND_AST_RANGE constant under the list of two children nodes (I added it under ZEND_AST_COALESCE):


ZEND_AST_RANGE,

Now executing our range operator will just cause the interpreter to hang:


1 |> 2;

It’s time to update the compilation stage.


Updating the Compilation Stage

We now need to update the compilation stage. The parser outputs an AST that is then recursively traversed, where functions are triggered to execute as each node in the AST is visited. These triggered functions emit opcodes that the Zend VM will then execute later during the interpretation phase.


This compilation happens in Zend/zend_compile.c , so let’s start by adding our new AST node name (ZEND_AST_RANGE) into the large switch statement in the zend_compile_expr function (I’ve added it just below ZEND_AST_COALESCE, line ~7200):


case ZEND_AST_RANGE:
    zend_compile_range(result, ast);
    return;

Now we must define the zend_compile_range function somewhere in that same file:


void zend_compile_range(znode *result, zend_ast *ast) /* {{{ */
{
    zend_ast *left_ast = ast->child[0];
    zend_ast *right_ast = ast->child[1];
    znode left_node, right_node;

    zend_compile_expr(&left_node, left_ast);
    zend_compile_expr(&right_node, right_ast);

    zend_emit_op_tmp(result, ZEND_RANGE, &left_node, &right_node);
}
/* }}} */

We start by dereferencing the left and right operands of the ZEND_AST_RANGE node into the pointer variables left_ast and right_ast. We then define two znode variables that will hold the result of compiling down the AST nodes for both of the operands (this is the recursive part of traversing the AST and compiling its nodes into opcodes).


Next, we emit the ZEND_RANGE opcode with its two operands using the zend_emit_op_tmp function.


Now would probably be a good time to quickly discuss opcodes and their types to better explain the usage of the zend_emit_op_tmp function.


Opcodes are instructions that are executed by the Zend VM. They each have:


a name (a constant that maps to some integer)

an op1 node (optional)

an op2 node (optional)

a result node (optional). This is usually used to store a temporary value of the opcode operation

an extended value (optional). This is an integer value that is used to differentiate between behaviours for overloaded opcodes

Digression:


The opcodes for a PHP script can be seen using either:


PHPDBG: sapi/phpdbg/phpdbg -np* program.php


Opcache


Vulcan Logic Disassembler (VLD) extension: sapi/cli/php -dvld.active=1 program.php


Or, if the script is short and trivial, then 3v4l can be used




Opcode nodes (znode_op structures) can be of a number of different types:


IS_CV – for Compiled Variables. These are simple variables (like $a) that are cached in a simple array to bypass hash table lookups. They were introduced in PHP 5.1 under the Compiled Variables optimisation. They’re denoted by !n in VLD (where n is an integer)


IS_VAR – for all other variable-like expressions that aren’t simple (like $a->b). They can hold an IS_REFERENCE zval, and are denoted by $n in VLD (where n is an integer)


IS_CONST – for literal values (e.g. hard-coded strings)


IS_TMP_VAR – temporary variables are used to hold the intermediate result of an expression (making them typically short-lived). They too can be refcounted (as of PHP 7), but cannot hold an IS_REFERENCE zval (because temporary values cannot be used as references). They are denoted by ~n in VLD (where n is an integer)


IS_UNUSED – generally used to mark an op node as not used. Sometimes, however, data will be stored in the znode_op.num member to be used in the VM.


This brings us back to the above zend_emit_op_tmp function, which will emit a zend_op with a type of IS_TMP_VAR. We want to do this because our operator will be an expression, and so the value it produces (an array) will be a temporary variable that may be used as an operand to another opcode (such as ASSIGN from code like $var = 1 |> 3;).
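The idea of an instruction whose result lands in a fresh temporary, which then feeds the next instruction, can be sketched in plain C as follows. Everything here is a simplified model and not the Zend API: the struct operand, struct instruction and struct op_array types, the OPERAND_* and OPC_* constants, and the emit_op_tmp and compile_assign_range functions are all hypothetical stand-ins.

```c
/* Hypothetical, heavily simplified model of opcodes and their operands. */
enum operand_type { OPERAND_UNUSED, OPERAND_CONST, OPERAND_TMP_VAR, OPERAND_CV };
enum opcode { OPC_RANGE, OPC_ASSIGN };

struct operand { enum operand_type type; long num; };
struct instruction { enum opcode opcode; struct operand op1, op2, result; };

#define MAX_INSTRUCTIONS 16

struct op_array {
    struct instruction ops[MAX_INSTRUCTIONS];
    int count;      /* number of instructions emitted so far */
    int next_tmp;   /* index of the next free temporary (~0, ~1, ...) */
};

/* Append an instruction whose result is a fresh temporary, and return that
 * temporary so that it can be used as an operand of a later instruction. */
struct operand emit_op_tmp(struct op_array *oa, enum opcode opcode,
                           struct operand op1, struct operand op2)
{
    struct operand result = { OPERAND_UNUSED, 0 };
    struct instruction *ins;

    if (oa->count >= MAX_INSTRUCTIONS) {
        return result;   /* no room left: report an unused operand */
    }

    ins = &oa->ops[oa->count++];
    result.type = OPERAND_TMP_VAR;
    result.num = oa->next_tmp++;

    ins->opcode = opcode;
    ins->op1 = op1;
    ins->op2 = op2;
    ins->result = result;
    return result;
}

/* Model of compiling "$cv = min |> max;" into two instructions. */
void compile_assign_range(struct op_array *oa, long cv, long min, long max)
{
    struct operand lo = { OPERAND_CONST, min };
    struct operand hi = { OPERAND_CONST, max };
    struct operand var = { OPERAND_CV, cv };
    struct operand tmp = emit_op_tmp(oa, OPC_RANGE, lo, hi);

    emit_op_tmp(oa, OPC_ASSIGN, var, tmp);
}
```

After compile_assign_range runs on an empty op_array, the first instruction is the range opcode with a temporary as its result, and the second instruction is the assignment whose second operand is that same temporary.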


Updating the Zend VM

Now we will need to update the Zend VM to handle the execution of our new opcode. This will involve updating the Zend/zend_vm_def.h file by adding the following code (at the bottom will do):


ZEND_VM_HANDLER(182, ZEND_RANGE, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV)
{
    USE_OPLINE
    zend_free_op free_op1, free_op2;
    zval *op1, *op2, *result, tmp;

    SAVE_OPLINE();
    op1 = GET_OP1_ZVAL_PTR_DEREF(BP_VAR_R);
    op2 = GET_OP2_ZVAL_PTR_DEREF(BP_VAR_R);
    result = EX_VAR(opline->result.var);

    // if both operands are integers
    if (Z_TYPE_P(op1) == IS_LONG && Z_TYPE_P(op2) == IS_LONG) {
        // for when min and max are integers
    } else if ( // if both operands are either integers or doubles
        (Z_TYPE_P(op1) == IS_LONG || Z_TYPE_P(op1) == IS_DOUBLE)
        && (Z_TYPE_P(op2) == IS_LONG || Z_TYPE_P(op2) == IS_DOUBLE)
    ) {
        // for when min and max are either integers or floats
    } else {
        // for when min and max are neither integers nor floats
    }

    FREE_OP1();
    FREE_OP2();
    ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
}

(The opcode number should be one more than the previous highest, so 182 may already be taken for you. To quickly see what the highest current opcode number is, look at the bottom of the Zend/zend_vm_opcodes.h file – the ZEND_VM_LAST_OPCODE constant should hold this value.)




Digression:


The above code contains a number of pseudo-macros (like USE_OPLINE and GET_OP1_ZVAL_PTR_DEREF). These aren’t actual C macros – instead, they’re replaced by the Zend/zend_vm_gen.php script during VM generation, as opposed to by the preprocessor during source code compilation. Therefore, if you’d like to look up their definitions, you’ll need to dig through the Zend/zend_vm_gen.php file.




The ZEND_VM_HANDLER pseudo-macro contains each opcode’s definition. It can have 5 parameters:


The opcode number (182)

The opcode name (ZEND_RANGE)

The valid left operand types (CONST|TMP|VAR|CV) (see $vm_op_decode in Zend/zend_vm_gen.php for all types)


The valid right operand types (CONST|TMP|VAR|CV) (ditto)


An optional flag holding the extended value for overloaded opcodes (see $vm_ext_decode in Zend/zend_vm_gen.php for all types)


From our above definitions of the types, we can see that:


// CONST enables for
1 |> 5.0;

// TMP enables for
(2**2) |> (1 + 3);

// VAR enables for
$cmplx->var |> $var[1];

// CV enables for
$a |> $b;

If one or both operands are not used, then they are marked with ANY.


Note that TMPVAR was introduced into ZE3 and is similar to TMP|VAR in that it handles the same opcode node types, but differs in what code is generated. TMPVAR generates a single method to handle both TMP and VAR, which decreases the VM size but requires more conditional logic. Conversely, TMP|VAR generates methods for both of TMP and VAR, increasing the VM size but with less conditionals.


Moving onto the body of our opcode definition, we begin by invoking the USE_OPLINE pseudo-macro to declare the opline variable (a zend_op struct). This will be used to fetch the operands (with pseudo-macros like GET_OP1_ZVAL_PTR_DEREF) and to set the return value of the opcode.

Next, we declare two zend_free_op variables. These are simply pointers to zvals that are declared for each operand we use. They are used when checking if that particular operand needs to be freed. Four zval variables are then declared. op1 and op2 are pointers to zvals that hold the operand values. result is declared to store the result of the opcode operation. Lastly, tmp is used as an intermediary value of the range looping operation that will be copied upon each iteration into a hash table.


The op1 and op2 variables are then initialized by the GET_OP1_ZVAL_PTR_DEREF and GET_OP2_ZVAL_PTR_DEREF pseudo-macros, respectively. These pseudo-macros are also responsible for initializing the free_op1 and free_op2 variables. The BP_VAR_R constant that is passed into the aforementioned macros is a type flag. It stands for BackPatching Variable Read and is used when fetching compiled variables. Lastly, the opline’s result is dereferenced and assigned to result, to be used later on.


Let’s now fill in the first if body when both min and max are integers:


zend_long min = Z_LVAL_P(op1), max = Z_LVAL_P(op2);
zend_ulong size, i;

if (min > max) {
    zend_throw_error(NULL, "Min should be less than (or equal to) max");
    HANDLE_EXCEPTION();
}

// calculate size (one less than the total size for an inclusive range)
size = max - min;

// the size cannot be greater than or equal to HT_MAX_SIZE
// HT_MAX_SIZE - 1 takes into account the inclusive range size
if (size >= HT_MAX_SIZE - 1) {
    zend_throw_error(NULL, "Range size is too large");
    HANDLE_EXCEPTION();
}

// increment the size to take into account the inclusive range
++size;

// set the zval type to be a long
Z_TYPE_INFO(tmp) = IS_LONG;

// initialise the array to a given size
array_init_size(result, size);
zend_hash_real_init(Z_ARRVAL_P(result), 1);
ZEND_HASH_FILL_PACKED(Z_ARRVAL_P(result)) {
    for (i = 0; i < size; ++i) {
        Z_LVAL(tmp) = min + i;
        ZEND_HASH_FILL_ADD(&tmp);
    }
} ZEND_HASH_FILL_END();
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();

We start by defining the min and max variables. These are declared as zend_long, which must be used when declaring long integers (likewise with zend_ulong for defining unsigned long integers). The size variable is then declared as zend_ulong; it will hold the size of the array to be generated.

A check is then performed to see if min is greater than max – if it is, an Error exception is thrown. By passing in NULL as the first argument to zend_throw_error, the exception class defaults to Error. We could specialise this exception by sub-classing Error and make a new class entry in Zend/zend_exceptions.c, but that’s probably best covered in a later article. If an exception is thrown, then we invoke the HANDLE_EXCEPTION pseudo-macro that skips onto the next opcode to be executed.


Next, we calculate the size of the array to be generated. This size is one less than the actual size because it does not take into account the inclusive range. The reason why we don’t simply plus one onto this size is because of the potential for overflow to occur if min is equal to ZEND_LONG_MIN (PHP_INT_MIN) and max is equal to ZEND_LONG_MAX (PHP_INT_MAX).


The size is then checked against the HT_MAX_SIZE constant to ensure that the array will fit inside of the hash table. The total array size must not be greater than or equal to HT_MAX_SIZE – if it is, then we once again throw an Error exception and exit the VM.


Because HT_MAX_SIZE is equal to INT_MAX + 1, we know that if size is less than this, we can safely increment size without fear of overflow. So this is what we do next so that our size now accommodates for an inclusive range.
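The ordering of these checks (subtract, compare against the limit, and only then add one) can be tried out in isolation with the stand-alone C sketch below. It is a variant under stated assumptions and not the handler code itself: the function inclusive_range_size and the constant SKETCH_HT_MAX_SIZE are hypothetical, fixed-width types replace zend_long and zend_ulong, and the subtraction is done explicitly in unsigned arithmetic so that even the widest possible range is well defined in C.

```c
#include <stdint.h>

/* Assumed stand-in for HT_MAX_SIZE (INT_MAX + 1 with a 32-bit int). */
#define SKETCH_HT_MAX_SIZE ((uint64_t) 0x80000000u)

/* Compute the element count of the inclusive range [min, max].
 * Returns 0 and stores the count in *size on success,
 * -1 when min > max, and -2 when the range is too large. */
int inclusive_range_size(int64_t min, int64_t max, uint64_t *size)
{
    uint64_t span;

    if (min > max) {
        return -1;
    }

    /* The true difference always fits in 64 unsigned bits, and unsigned
     * wrap-around is well defined, so this yields max - min exactly. */
    span = (uint64_t) max - (uint64_t) min;

    /* Reject before adding one, so the increment below cannot overflow. */
    if (span >= SKETCH_HT_MAX_SIZE - 1) {
        return -2;
    }

    *size = span + 1;
    return 0;
}
```

With this sketch, the range from 1 to 3 has size 3, a range where min equals max has size 1, and the range from the smallest to the largest 64-bit integer is rejected as too large instead of wrapping around to zero.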


We then change the type of the tmp zval to IS_LONG, and then initialise result using the array_init_size macro. This macro basically sets the type of result to IS_ARRAY_EX, allocates memory for the zend_array structure (a hashtable), and sets up its corresponding hashtable. The zend_hash_real_init function then allocates memory for the Bucket structures that hold each of the elements of the array. The second argument, 1, specifies that we would like it to be a packed hashtable.




Digression:


A packed hashtable is effectively an actual array, i.e. one that is numerically accessed via integer keys (unlike typical associative arrays in PHP). This optimization was introduced into PHP 7 because it was recognized that many arrays in PHP were integer indexed (keys in increasing order). Packed hashtables allow the hashtable buckets to be directly accessed (like a normal array). See Nikita’s PHP’s new hashtable implementation article for more information.




Note: The _zend_array structure has two aliases: zend_array and HashTable.


Next, we populate the array. This is done with the ZEND_HASH_FILL_PACKED macro, which basically keeps track of the current bucket to insert into. The tmp zval stores the intermediary result (the array element) when generating the array. The ZEND_HASH_FILL_ADD macro makes a copy of tmp, inserts this copy into the current hashtable bucket, and increments to the next bucket for the next iteration.

Finally, the ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION macro (introduced in ZE3 to replace the separate CHECK_EXCEPTION() and ZEND_VM_NEXT_OPCODE() calls made in ZE2) checks whether an exception has occurred. Provided an exception hasn’t occurred, then the VM skips onto the next opcode.


Let’s now take a look at the else if block:


long double min, max, size, i;

if (Z_TYPE_P(op1) == IS_LONG) {
    min = (long double) Z_LVAL_P(op1);
    max = (long double) Z_DVAL_P(op2);
} else if (Z_TYPE_P(op2) == IS_LONG) {
    min = (long double) Z_DVAL_P(op1);
    max = (long double) Z_LVAL_P(op2);
} else {
    min = (long double) Z_DVAL_P(op1);
    max = (long double) Z_DVAL_P(op2);
}

if (min > max) {
    zend_throw_error(NULL, "Min should be less than (or equal to) max");
    HANDLE_EXCEPTION();
}

size = max - min;

if (size >= HT_MAX_SIZE - 1) {
    zend_throw_error(NULL, "Range size is too large");
    HANDLE_EXCEPTION();
}

// we cast the size to an integer to get rid of the decimal places,
// since we only care about whole number sizes
size = (int) size + 1;

Z_TYPE_INFO(tmp) = IS_DOUBLE;

array_init_size(result, size);
zend_hash_real_init(Z_ARRVAL_P(result), 1);
ZEND_HASH_FILL_PACKED(Z_ARRVAL_P(result)) {
    for (i = 0; i < size; ++i) {
        Z_DVAL(tmp) = min + i;
        ZEND_HASH_FILL_ADD(&tmp);
    }
} ZEND_HASH_FILL_END();
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();

Note: We use long double to handle the cases where there is potentially a mix of floats and integers as operands. This is because double only has 53 bits of precision, and so any integer greater than 2^53 may not be accurately represented as a double. long double, on the other hand, has at least 64 bits of precision on common x86 toolchains such as GCC and Clang, and so there it can accurately represent 64 bit integers. (The C standard only requires long double to be at least as precise as double, so this does not hold on every platform.)
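The 53-bit limit of double is easy to observe with a small stand-alone C check. This is only an illustration of the precision problem, assuming IEEE 754 doubles; the function name survives_double is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when value converts to double and back without changing. */
int survives_double(int64_t value)
{
    /* volatile forces the value to be stored as a real 64-bit double */
    volatile double d = (double) value;

    /* Converting an out-of-range double back to int64_t is undefined,
     * so values that rounded up to 2^63 are rejected before the cast. */
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0) {
        return 0;
    }
    return (int64_t) d == value;
}
```

Here 2^53 (9007199254740992) survives the round trip, but 2^53 + 1 does not, because the nearest representable double is 2^53 itself. The largest 64-bit integer does not survive either, which is why mixing large integers with plain double operands would silently produce the wrong range bounds.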


This code is very similar to the previous logic. The main difference now is that we handle the data as floating point numbers. This includes fetching them with the Z_DVAL_P macro, setting the type info for tmp to IS_DOUBLE, and inserting the zval (of type double) with the Z_DVAL macro.
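The effect of truncating the size in the floating point case can be seen in the following stand-alone C sketch. It is a simplification and not the handler itself: it uses plain double instead of long double, writes into a caller-supplied buffer instead of a hashtable, and the function name float_range and its error codes are hypothetical.

```c
/* Fill out[] with min, min + 1, min + 2, ... while the values stay <= max.
 * Returns the number of elements written, -1 when min > max (or an operand
 * is NaN), and -2 when the range does not fit into capacity elements. */
long float_range(double min, double max, double *out, long capacity)
{
    long count, i;

    if (!(min <= max)) {
        return -1;
    }
    /* Checked before the cast below, so the cast cannot overflow. */
    if (!(max - min < (double) capacity)) {
        return -2;
    }

    /* The cast drops the fractional part; + 1 makes the range inclusive. */
    count = (long) (max - min) + 1;

    for (i = 0; i < count; ++i) {
        out[i] = min + (double) i;
    }
    return count;
}
```

For the operands 2.5 and 5, the difference is 2.5, which truncates to 2 and gives three elements: 2.5, 3.5 and 4.5. For 1 and 1.5 the difference truncates to 0, so the result is the single element 1.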


Lastly, we must handle the case where either (or both) min and max are not integers or floats. As defined in point #2 of our range operator semantics, only integers and floats are supported as operands – if anything else is provided, an Error exception will be thrown. Paste the following code in the else block:


zend_throw_error(NULL, "Unsupported operand types - only ints and floats are supported"); HANDLE_EXCEPTION();

With our opcode definition done, we must now regenerate the VM. This is done by running the Zend/zend_vm_gen.php file, which will use the Zend/zend_vm_def.h file to regenerate the Zend/zend_vm_opcodes.h, Zend/zend_vm_opcodes.c, and Zend/zend_vm_execute.h files.


Now build PHP again so that we can see the range operator in action:


var_dump(1 |> 1.5);
var_dump(PHP_INT_MIN |> PHP_INT_MIN + 1);

Outputs:


array(1) {
  [0]=>
  float(1)
}
array(2) {
  [0]=>
  int(-9223372036854775808)
  [1]=>
  int(-9223372036854775807)
}

Now our operator finally works! We’re not quite done yet, though. We still need to update the AST pretty printer (that turns the AST back to code). It currently does not support our range operator – this can be seen by using it within assert():


assert(1 |> 2); // segfaults

Note that assert() uses the pretty printer to output the expression being asserted as part of its error message upon failure. This is only done when the asserted expression is not in string form (since the pretty printer would not be needed otherwise), and is something that is new to PHP 7.


To rectify this, we simply need to update the Zend/zend_ast.c file (http://lxr.php.net/xref/PHP_7_0/Zend/zend_ast.c) to turn our ZEND_AST_RANGE node into a string. We will firstly update the precedence table comment (line ~520) by specifying our new operator to have a priority of 170 (this should match the zend_language_parser.y file):

* 170 non-associative == != === !== |>

Next, we need to insert a case statement in the zend_ast_export_ex function to handle ZEND_AST_RANGE (just above case ZEND_AST_GREATER):


case ZEND_AST_RANGE: BINARY_OP(" |> ", 170, 171, 171);
case ZEND_AST_GREATER: BINARY_OP(" > ", 180, 181, 181);
case ZEND_AST_GREATER_EQUAL: BINARY_OP(" >= ", 180, 181, 181);
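To give an idea of what the three numbers are for, here is a stand-alone C sketch of a priority-driven pretty printer. It is modelled loosely on the idea behind BINARY_OP and is not the code from zend_ast.c: the node type, the append and export_node functions, and the priorities used for && and + are assumptions made for illustration. Each operator has its own priority plus the priorities passed down to its left and right operands, and parentheses are emitted only when the surrounding context demands a higher priority than the operator has.

```c
#include <stdio.h>
#include <string.h>

enum node_kind { NODE_NUMBER, NODE_AND, NODE_RANGE, NODE_ADD };

struct node {
    enum node_kind kind;
    int value;                        /* used by NODE_NUMBER only */
    const struct node *left, *right;  /* used by the binary kinds */
};

/* Append text to a NUL-terminated buffer, truncating when it is full. */
static void append(char *buf, size_t cap, const char *text)
{
    size_t used = strlen(buf);

    if (used + 1 < cap) {
        strncat(buf, text, cap - used - 1);
    }
}

/* Export n into buf; priority is what the surrounding expression requires. */
void export_node(char *buf, size_t cap, const struct node *n, int priority)
{
    const char *op;
    int p, pl, pr;
    char digits[16];

    switch (n->kind) {
    case NODE_NUMBER:
        snprintf(digits, sizeof digits, "%d", n->value);
        append(buf, cap, digits);
        return;
    case NODE_AND:   op = " && "; p = 130; pl = 130; pr = 131; break;
    case NODE_RANGE: op = " |> "; p = 170; pl = 171; pr = 171; break;
    default:         op = " + ";  p = 200; pl = 200; pr = 201; break;
    }

    if (priority > p) {
        append(buf, cap, "(");
    }
    export_node(buf, cap, n->left, pl);
    append(buf, cap, op);
    export_node(buf, cap, n->right, pr);
    if (priority > p) {
        append(buf, cap, ")");
    }
}
```

Starting from an empty string buffer and a priority of 0, a tree for the range 1 |> 2 used as the left operand of an addition with 3 is exported as (1 |> 2) + 3, whereas a range whose right operand is the addition 2 + 3 is exported as 1 |> 2 + 3 without parentheses. Passing 171 to both operands is what expresses non-associativity: a nested range on either side would always be parenthesised.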

The pretty printer has been updated now and assert() works fine once again:


assert(false && 1 |> 2); // Warning: assert(): assert(false && 1 |> 2) failed...

Conclusion

This article has covered a lot of ground, albeit thinly. It has shown the stages ZE goes through when running PHP scripts, how these stages interoperate with one another, and how we can modify each of these stages to add a new operator to PHP. This article demonstrated just one possible implementation of the range operator in PHP – I’ll cover an alternative (and better) implementation in the follow-up.

Translated from: https://www.sitepoint.com/implementing-the-range-operator-in-php/
