We sometimes come across amazing posts elsewhere and, with the permission of their authors, repost them on SitePoint. This is one such instance. In the post below, Thomas Punt reimplements the range operator he previously added to PHP, this time using an improved approach. If you’ve ever been interested in PHP internals and adding features to your favorite programming language, now’s the time to learn!
This article assumes that the reader is able to build PHP from source. If this is not the case, then please see the Building PHP chapter of the PHP Internals Book first.
In the prequel to this article (hint: make sure you’ve read it first), I showed one way to implement a range operator in PHP. Initial implementations, however, are rarely the best, and so it is the intention of this article to look at how the previous implementation can be improved.
Thanks once again to Nikita Popov for proofreading this article!
The initial implementation put all of the logic for the range operator into the Zend VM, which forced computation to take place purely at runtime, when the ZEND_RANGE opcode was executed. This not only meant that computation could not be shifted to compile time when both operands were literals, but also that some features, such as using the operator in constant expressions, would simply not work.
In this implementation, we will shift the range operator logic out of the Zend VM so that computation can be done at either compile time (for literal operands) or runtime (for dynamic operands). This will not only provide a small win for Opcache users, but, more importantly, will allow the range operator to be used in constant expressions.
For example:
```php
// as constant definitions
const AN_ARRAY = 1 |> 100;

// as initial property definitions
class A
{
    private $a = 1 |> 2;
}

// as default values for optional parameters:
function a($a = 1 |> 2)
{
    //
}
```

So without further ado, let’s reimplement the range operator.
The lexer implementation remains exactly the same. The token is first registered in Zend/zend_language_scanner.l (line ~1200):
```
<ST_IN_SCRIPTING>"|>" {
	RETURN_TOKEN(T_RANGE);
}
```

It is then declared in Zend/zend_language_parser.y (line ~220):
```
%token T_RANGE           "|> (T_RANGE)"
```

The tokenizer extension must again be regenerated by going into the ext/tokenizer directory and executing the tokenizer_data_gen.sh file.
The parser implementation is partially the same as before. We again start by stating the operator’s precedence and associativity by adding the T_RANGE token onto the end of the following line (line ~70):
```
%nonassoc T_IS_EQUAL T_IS_NOT_EQUAL T_IS_IDENTICAL T_IS_NOT_IDENTICAL T_SPACESHIP T_RANGE
```

We then update the expr_without_variable production rule again, though this time the semantic action (the code within the curly braces) will be slightly different. Update it with the following code (I placed it just below the T_SPACESHIP rule, line ~930):
```
	|	expr T_RANGE expr { $$ = zend_ast_create_binary_op(ZEND_RANGE, $1, $3); }
```

This time, we’ve used the zend_ast_create_binary_op function (instead of the zend_ast_create function), which creates a ZEND_AST_BINARY_OP node for us. zend_ast_create_binary_op takes an opcode (here, ZEND_RANGE), which is stored in the node’s attr field and used to distinguish binary operations from one another during the compilation stage.
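To make the idea of one generic node type concrete, here is a minimal, self-contained C sketch. It does not use the real Zend API: `toy_ast`, `toy_ast_create_binary_op` and the `TOY_*` constants are hypothetical stand-ins for `zend_ast`, `zend_ast_create_binary_op` and the opcode constants, assuming (as `zend_compile_binary_op` suggests by reading `ast->attr`) that the opcode lives in the node's `attr` field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ZEND_AST_* node kinds and ZEND_* opcodes. */
enum toy_ast_kind { TOY_AST_CONST, TOY_AST_BINARY_OP };
enum toy_opcode { TOY_ADD = 1, TOY_DIV, TOY_RANGE };

typedef struct toy_ast {
    enum toy_ast_kind kind;
    uint32_t attr;            /* binary-op nodes: which operation this is */
    long value;               /* constant nodes: the literal value */
    struct toy_ast *child[2]; /* binary-op nodes: left and right operands */
} toy_ast;

static toy_ast *toy_ast_alloc(enum toy_ast_kind kind)
{
    toy_ast *node = calloc(1, sizeof *node);
    assert(node != NULL);
    node->kind = kind;
    return node;
}

static toy_ast *toy_ast_create_const(long value)
{
    toy_ast *node = toy_ast_alloc(TOY_AST_CONST);
    node->value = value;
    return node;
}

/* One node kind serves every binary operator: the opcode is kept in attr,
 * so adding a new operator does not require a new node kind. */
static toy_ast *toy_ast_create_binary_op(uint32_t opcode, toy_ast *left, toy_ast *right)
{
    toy_ast *node = toy_ast_alloc(TOY_AST_BINARY_OP);
    node->attr = opcode;
    node->child[0] = left;
    node->child[1] = right;
    return node;
}

/* Frees a node and, recursively, its children. */
static void toy_ast_free(toy_ast *node)
{
    if (node == NULL) {
        return;
    }
    toy_ast_free(node->child[0]);
    toy_ast_free(node->child[1]);
    free(node);
}
```

In this model the parser action for `1 |> 5` corresponds to `toy_ast_create_binary_op(TOY_RANGE, toy_ast_create_const(1), toy_ast_create_const(5))`; a later compilation step would switch on `attr` rather than on the node kind.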
Since we’re reusing the ZEND_AST_BINARY_OP node type now, there is no need to define a new ZEND_AST_RANGE node type as done before in the Zend/zend_ast.h file.
This time, there is no need to update the Zend/zend_compile.c file since it already contains the necessary logic to handle binary operations. Thus, we are simply reusing this logic by making our operator a ZEND_AST_BINARY_OP node.
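Why is reusing the binary-op path enough? A minimal, self-contained C model of the compile step helps: when both operands are constants, the compiler looks up an evaluation function by opcode and folds the expression; otherwise it leaves the work to the VM. Every name here (`toy_get_binary_op`, `toy_try_ct_eval_binary_op`, `toy_compile_binary_op`, the `TOY_*` opcodes) is hypothetical, and plain `long` values stand in for zvals. The one detail mirrored from PHP is the guard that refuses to fold a division by zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes; the real ones are ZEND_ADD, ZEND_DIV, ZEND_RANGE, ... */
enum toy_opcode { TOY_ADD = 1, TOY_DIV };

typedef long (*toy_binary_op_fn)(long op1, long op2);

/* Overflow is ignored in these toy operators for brevity. */
static long toy_add(long op1, long op2) { return op1 + op2; }
static long toy_div(long op1, long op2) { return op1 / op2; }

/* Counterpart of get_binary_op(): map an opcode to its evaluation function.
 * Supporting another operator means adding one more case here. */
static toy_binary_op_fn toy_get_binary_op(int opcode)
{
    switch (opcode) {
    case TOY_ADD: return toy_add;
    case TOY_DIV: return toy_div;
    default:      return NULL;
    }
}

/* A compiled operand: either a known constant or a runtime value. */
typedef struct {
    bool is_const;
    long value;
} toy_node;

/* Counterpart of zend_try_ct_eval_binary_op(): fold only when it is safe. */
static bool toy_try_ct_eval_binary_op(long *result, int opcode, long op1, long op2)
{
    toy_binary_op_fn fn = toy_get_binary_op(opcode);
    if (fn == NULL) {
        return false;
    }
    if (opcode == TOY_DIV && op2 == 0) {
        return false; /* leave the division-by-zero error to runtime */
    }
    *result = fn(op1, op2);
    return true;
}

/* Returns true if the expression was folded into a constant; false means an
 * instruction would be emitted for the VM instead (not modelled here). */
static bool toy_compile_binary_op(toy_node *result, int opcode, toy_node left, toy_node right)
{
    assert(result != NULL);
    if (left.is_const && right.is_const
        && toy_try_ct_eval_binary_op(&result->value, opcode, left.value, right.value)) {
        result->is_const = true;
        return true;
    }
    result->is_const = false;
    return false;
}
```

Nothing in `toy_compile_binary_op` is specific to any one operator, which is the property that lets the range operator piggyback on the existing compilation logic.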
The following is a trimmed version of the zend_compile_binary_op function:
```c
void zend_compile_binary_op(znode *result, zend_ast *ast) /* {{{ */
{
	zend_ast *left_ast = ast->child[0];
	zend_ast *right_ast = ast->child[1];
	uint32_t opcode = ast->attr;

	znode left_node, right_node;
	zend_compile_expr(&left_node, left_ast);
	zend_compile_expr(&right_node, right_ast);

	if (left_node.op_type == IS_CONST && right_node.op_type == IS_CONST) {
		if (zend_try_ct_eval_binary_op(&result->u.constant, opcode,
				&left_node.u.constant, &right_node.u.constant)
		) {
			result->op_type = IS_CONST;
			zval_ptr_dtor(&left_node.u.constant);
			zval_ptr_dtor(&right_node.u.constant);
			return;
		}
	}

	do {
		// redacted code
		zend_emit_op_tmp(result, opcode, &left_node, &right_node);
	} while (0);
}
/* }}} */
```

As we can see, it is pretty similar to the zend_compile_range function we created last time. The two important differences concern how the opcode type is acquired and what happens when both operands are literals.
The opcode type is acquired from the AST node this time (as opposed to being hardcoded, as seen last time), since the ZEND_AST_BINARY_OP node stores this value (as seen from the new production rule’s semantic action) to differentiate between binary operations. When both operands are literals, the zend_try_ct_eval_binary_op function will be invoked. This function looks as follows:
```c
static inline zend_bool zend_try_ct_eval_binary_op(zval *result, uint32_t opcode, zval *op1, zval *op2) /* {{{ */
{
	binary_op_type fn = get_binary_op(opcode);

	/* don't evaluate division by zero at compile-time */
	if ((opcode == ZEND_DIV || opcode == ZEND_MOD) &&
	    zval_get_long(op2) == 0) {
		return 0;
	} else if ((opcode == ZEND_SL || opcode == ZEND_SR) &&
	    zval_get_long(op2) < 0) {
		return 0;
	}

	fn(result, op1, op2);
	return 1;
}
/* }}} */
```

The function obtains a callback from the get_binary_op function in Zend/zend_opcode.c according to the opcode type. This means we will next need to update that function to cater for the ZEND_RANGE opcode. Add the following case statement to the get_binary_op function (line ~750):
```c
case ZEND_RANGE:
	return (binary_op_type) range_function;
```

Now we must define the range_function function. This will be done in the Zend/zend_operators.c file alongside all of the other operators:
```c
ZEND_API int ZEND_FASTCALL range_function(zval *result, zval *op1, zval *op2) /* {{{ */
{
	zval tmp;

	ZVAL_DEREF(op1);
	ZVAL_DEREF(op2);

	if (Z_TYPE_P(op1) == IS_LONG && Z_TYPE_P(op2) == IS_LONG) {
		zend_long min = Z_LVAL_P(op1), max = Z_LVAL_P(op2);
		zend_ulong size, i;

		if (min > max) {
			zend_throw_error(NULL, "Min should be less than (or equal to) max");
			return FAILURE;
		}

		// calculate size (one less than the total size for an inclusive range);
		// subtracting as unsigned avoids signed overflow for very wide ranges
		size = (zend_ulong) max - (zend_ulong) min;

		// the size cannot be greater than or equal to HT_MAX_SIZE
		// HT_MAX_SIZE - 1 takes into account the inclusive range size
		if (size >= HT_MAX_SIZE - 1) {
			zend_throw_error(NULL, "Range size is too large");
			return FAILURE;
		}

		// increment the size to take into account the inclusive range
		++size;

		// set the zval type to be a long
		Z_TYPE_INFO(tmp) = IS_LONG;

		// initialise the array to a given size
		array_init_size(result, size);
		zend_hash_real_init(Z_ARRVAL_P(result), 1);
		ZEND_HASH_FILL_PACKED(Z_ARRVAL_P(result)) {
			for (i = 0; i < size; ++i) {
				Z_LVAL(tmp) = min + i;
				ZEND_HASH_FILL_ADD(&tmp);
			}
		} ZEND_HASH_FILL_END();
	} else if ( // if both operands are either integers or doubles
		(Z_TYPE_P(op1) == IS_LONG || Z_TYPE_P(op1) == IS_DOUBLE)
		&& (Z_TYPE_P(op2) == IS_LONG || Z_TYPE_P(op2) == IS_DOUBLE)
	) {
		long double min, max, size, i;

		if (Z_TYPE_P(op1) == IS_LONG) {
			min = (long double) Z_LVAL_P(op1);
			max = (long double) Z_DVAL_P(op2);
		} else if (Z_TYPE_P(op2) == IS_LONG) {
			min = (long double) Z_DVAL_P(op1);
			max = (long double) Z_LVAL_P(op2);
		} else {
			min = (long double) Z_DVAL_P(op1);
			max = (long double) Z_DVAL_P(op2);
		}

		// written as !(min <= max) so that NAN operands are rejected too
		if (!(min <= max)) {
			zend_throw_error(NULL, "Min should be less than (or equal to) max");
			return FAILURE;
		}

		size = max - min;

		if (size >= HT_MAX_SIZE - 1) {
			zend_throw_error(NULL, "Range size is too large");
			return FAILURE;
		}

		// we cast the size to an integer to get rid of the decimal places,
		// since we only care about whole number sizes
		size = (int) size + 1;

		Z_TYPE_INFO(tmp) = IS_DOUBLE;

		array_init_size(result, size);
		zend_hash_real_init(Z_ARRVAL_P(result), 1);
		ZEND_HASH_FILL_PACKED(Z_ARRVAL_P(result)) {
			for (i = 0; i < size; ++i) {
				Z_DVAL(tmp) = min + i;
				ZEND_HASH_FILL_ADD(&tmp);
			}
		} ZEND_HASH_FILL_END();
	} else {
		zend_throw_error(NULL, "Unsupported operand types - only ints and floats are supported");
		return FAILURE;
	}

	return SUCCESS;
}
/* }}} */
```

The function prototype contains two new macros: ZEND_API and ZEND_FASTCALL. ZEND_API is used to control the visibility of functions by making them available to extensions that are compiled as shared objects. ZEND_FASTCALL is used to ensure a more efficient calling convention is used, where the first two arguments are passed in registers rather than on the stack (more relevant to 32-bit builds than to 64-bit builds on x86).
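Stripped of the zval and HashTable machinery, the size arithmetic in range_function can be checked in isolation. The following self-contained C sketch is an approximation, not Zend code: it fills a plain `malloc`'d array instead of a packed HashTable, uses `double` rather than `long double`, uses a hypothetical `TOY_MAX_SIZE` in place of `HT_MAX_SIZE`, and returns -1 where range_function would throw an Error and return FAILURE. It mirrors the two branches: an inclusive integer range of `max - min + 1` elements, and a floating-point range that starts at `min`, steps by 1 and keeps only whole steps.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap standing in for HT_MAX_SIZE on 64-bit builds. */
#define TOY_MAX_SIZE 0x80000000UL

/* Integer branch: on success stores a malloc'd array of max - min + 1
 * values in *out and returns 0; returns -1 if min > max or the range is
 * too large. */
static int toy_range_long(int64_t min, int64_t max, int64_t **out, size_t *count)
{
    assert(out != NULL && count != NULL);
    if (min > max) {
        return -1;
    }
    /* unsigned subtraction cannot overflow, and is exact since min <= max */
    uint64_t size = (uint64_t) max - (uint64_t) min;
    if (size >= TOY_MAX_SIZE - 1) {
        return -1;
    }
    ++size; /* inclusive range */

    int64_t *values = malloc(size * sizeof *values);
    if (values == NULL) {
        return -1;
    }
    for (uint64_t i = 0; i < size; ++i) {
        values[i] = min + (int64_t) i; /* never exceeds max */
    }
    *out = values;
    *count = (size_t) size;
    return 0;
}

/* Floating-point branch: min, min + 1, min + 2, ... while <= max. */
static int toy_range_double(double min, double max, double **out, size_t *count)
{
    assert(out != NULL && count != NULL);
    if (!(min <= max)) { /* also rejects NaN operands */
        return -1;
    }
    double span = max - min;
    if (span >= (double) (TOY_MAX_SIZE - 1)) { /* also rejects infinities */
        return -1;
    }
    size_t size = (size_t) span + 1; /* truncate to whole steps, count min */

    double *values = malloc(size * sizeof *values);
    if (values == NULL) {
        return -1;
    }
    for (size_t i = 0; i < size; ++i) {
        values[i] = min + (double) i;
    }
    *out = values;
    *count = size;
    return 0;
}
```

For example, under this logic `1.5 |> 4` would yield `[1.5, 2.5, 3.5]`: the span of 2.5 is truncated to 2 whole steps, plus the starting value.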
The function body is very similar to what we had in the Zend/zend_vm_def.h file in the previous article. The VM-specific code is gone: the HANDLE_EXCEPTION macro calls have been replaced with return FAILURE;, and the ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION macro call has been removed entirely (checking for an exception and moving on to the next opcode must stay in the VM, so the macro will be invoked from the VM code later).
Another noteworthy difference is that we’re applying ZVAL_DEREF to both operands to ensure that references are handled properly. This was previously done inside the VM using the pseudo-macro GET_OPn_ZVAL_PTR_DEREF, but has now been shifted into this function. This was done not because it is needed at compile time (for compile-time evaluation, both operands must be literals, and literals cannot be references), but because it allows other places in the codebase to safely invoke range_function without having to worry about reference handling. As such, reference handling is performed by most of the operator functions rather than in their VM opcode definitions (except where performance matters).
Lastly, we must add the range_function prototype to the Zend/zend_operators.h file:
```c
ZEND_API int ZEND_FASTCALL range_function(zval *result, zval *op1, zval *op2);
```

Now we must once again update the Zend VM to handle the execution of the ZEND_RANGE opcode during runtime. Place the following code in Zend/zend_vm_def.h (at the bottom):
```c
ZEND_VM_HANDLER(182, ZEND_RANGE, CONST|TMPVAR|CV, CONST|TMPVAR|CV)
{
	USE_OPLINE
	zend_free_op free_op1, free_op2;
	zval *op1, *op2;

	SAVE_OPLINE();
	op1 = GET_OP1_ZVAL_PTR(BP_VAR_R);
	op2 = GET_OP2_ZVAL_PTR(BP_VAR_R);
	range_function(EX_VAR(opline->result.var), op1, op2);
	FREE_OP1();
	FREE_OP2();
	ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
}
```

(Again, the opcode number must be one greater than the current highest opcode number, which can be seen at the bottom of the Zend/zend_vm_opcodes.h file.)
The definition this time is far shorter since all of the work is handled in range_function. We simply invoke this function, passing in the result operand of the current opline to hold the computed value. The exception check and the advance to the next opcode, both removed from range_function, are still handled in the VM by the call to ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION at the end. Also, as mentioned previously, we avoid handling references in the VM by using the GET_OPn_ZVAL_PTR pseudo-macros (rather than GET_OPn_ZVAL_PTR_DEREF).
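The division of labour described above, where the operator function only computes and reports failure while the VM decides whether to carry on, can be modelled in a few lines of self-contained C. This is a sketch under simplifying assumptions, not the Zend VM: instructions carry literal operands, a boolean flag stands in for the engine's pending-exception state, the operator reports failure through its return value rather than by throwing, and every name (`toy_vm`, `toy_instr`, `toy_vm_run`, `toy_range_count`) is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

typedef bool (*toy_op_fn)(long *result, long op1, long op2);

typedef struct {
    toy_op_fn fn;        /* shared operator function to call */
    long op1, op2;       /* literal operands */
    size_t result_slot;  /* where the handler stores the result */
} toy_instr;

typedef struct {
    long slots[TOY_SLOTS];
    bool exception;      /* stands in for the engine's pending exception */
} toy_vm;

/* Operator logic: computes or reports failure, but never touches VM state.
 * Returns the element count of the inclusive integer range min..max. */
static bool toy_range_count(long *result, long min, long max)
{
    if (min > max) {
        return false;
    }
    unsigned long span = (unsigned long) max - (unsigned long) min;
    if (span >= (unsigned long) LONG_MAX) {
        return false;
    }
    *result = (long) span + 1;
    return true;
}

/* Runs up to n instructions and returns how many completed. */
static size_t toy_vm_run(toy_vm *vm, const toy_instr *code, size_t n)
{
    size_t ip = 0;
    while (ip < n) {
        const toy_instr *op = &code[ip];
        assert(op->result_slot < TOY_SLOTS);
        /* handler body: delegate to the shared operator function */
        if (!op->fn(&vm->slots[op->result_slot], op->op1, op->op2)) {
            vm->exception = true;
        }
        /* VM's job, like ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION: check, then advance */
        if (vm->exception) {
            break;
        }
        ++ip;
    }
    return ip;
}
```

Keeping the check-and-advance step in the loop means the same `toy_range_count` could also be called from a compile-time folding pass, where there is no instruction pointer to advance.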
Now regenerate the VM by executing the Zend/zend_vm_gen.php file.
Lastly, the pretty printer needs updating in the Zend/zend_ast.c file once again. Update the precedence table comment by specifying the new operator to have a priority of 170 (line ~520):
```c
 * 170     non-associative == != === !== |>
```

Then, insert a case statement into the zend_ast_export_ex function to handle the ZEND_RANGE opcode in the ZEND_AST_BINARY_OP case statement (line ~1300):
```c
case ZEND_RANGE: BINARY_OP(" |> ", 170, 171, 171);
```

This article has shown an alternative way to implement the range operator, where the computation logic was shifted out of the VM. This has the advantage of allowing the range operator to be used in constant expression contexts.
The third part of this article series will build upon this implementation by covering how we can overload this operator. This will enable objects to be used as operands (such as those from the GMP library or those that implement a __toString method). It will also show how we can add proper support for strings (unlike the support offered by PHP’s current range function). But for now, I hope this has served as a nice demonstration of some further aspects of the Zend Engine (ZE) when implementing operators in PHP.
Source: https://www.sitepoint.com/re-implementing-the-range-operator-in-php/