Using cURL for Remote Requests

tech2023-09-28  87

If you’re a Linux user then you’ve probably used cURL. It’s a powerful tool used for everything from sending email to downloading the latest My Little Pony subtitles. In this article I’ll explain how to use the cURL extension in PHP. The extension offers us the same functionality as the console utility in the comfortable world of PHP. I’ll discuss sending GET and POST requests, handling login cookies, and FTP functionality.

Before we begin, make sure you have the extension (and the libcURL library) installed. It’s not installed by default. In most cases it can be installed using your system’s package manager, but barring that you can find instructions in the PHP manual.

How Does it Work?

All cURL requests follow the same basic pattern:

First we initialize the cURL resource (often abbreviated as ch for “cURL handle”) by calling the curl_init() function.

Next we set various options, such as the URL, request method, payload data, etc. Options can be set individually with curl_setopt(), or we can pass an array of options to curl_setopt_array().

Then we execute the request by calling curl_exec().

Finally, we free the resource by calling curl_close() to release its memory.

So, the boilerplate code for making a request looks something like this:

<?php
// init the resource
$ch = curl_init();

// set a single option...
curl_setopt($ch, OPTION, $value);
// ... or an array of options
curl_setopt_array($ch, array(
    OPTION1 => $value1,
    OPTION2 => $value2
));

// execute
$output = curl_exec($ch);

// free
curl_close($ch);

The only thing that changes from request to request is which options are set, which of course depends on what you’re doing with cURL.

Retrieve a Web Page

The most basic example of using cURL that I can think of is simply fetching the contents of a web page. So, let’s fetch the homepage of the BBC as an example.

<?php
// init the resource
$ch = curl_init();

curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://www.bbc.co.uk/',
    CURLOPT_RETURNTRANSFER => true
));

$output = curl_exec($ch);
echo $output;

// free
curl_close($ch);

Check the output in your browser and you should see the BBC website displayed. We’re lucky that the site displays correctly; it does so because it links to its stylesheets and images with absolute URLs.

The options we just used were:

CURLOPT_URL – specifies the URL for the request

CURLOPT_RETURNTRANSFER – when set to false, curl_exec() prints the response directly and returns true or false depending on the success of the request. When set to true, curl_exec() returns the contents of the response as a string (or false on failure).
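
Putting these two options together, the fetch can be wrapped in a small helper. This is only a sketch: the function name fetch_page() is hypothetical (it is not part of the cURL extension), and the 10-second timeout is an arbitrary assumption.

```php
<?php
// Hypothetical helper (not part of the cURL extension): fetch a URL and
// return the response body, or throw if the transfer fails.
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT => 10           // assumed limit, in seconds
    ));

    $output = curl_exec($ch);
    if ($output === false) {
        // grab the message before the handle is closed
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $message);
    }

    curl_close($ch);
    return $output;
}
```

Calling echo fetch_page('http://www.bbc.co.uk/'); should then behave like the example above, with the difference that a failed transfer raises an exception instead of echoing nothing.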

Log in to a Website

cURL executed a GET request to retrieve the BBC page, but cURL can also use other methods, such as POST and PUT. For this example, let’s simulate logging into a WordPress-powered website. Logging in is done by sending a POST request to http://example.com/wp-login.php with the following details:

log – the username

pwd – the password

redirect_to – the URL we want to go to after logging in

testcookie – should be set to 1 (this is just for WordPress)

Of course these parameters are specific to each site. You should always check the input names for yourself, something that can easily be done by viewing the source of the HTML page in your browser.

<?php
$ch = curl_init();

// the keys must match the input names of the login form
$postData = array(
    'log' => 'acogneau',
    'pwd' => 'secretpassword',
    'redirect_to' => 'http://example.com',
    'testcookie' => '1'
);

curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://example.com/wp-login.php',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $postData,
    CURLOPT_FOLLOWLOCATION => true
));

$output = curl_exec($ch);
echo $output;

The new options are:

CURLOPT_POST – set this true if you want to send a POST request

CURLOPT_POSTFIELDS – the data that will be sent in the body of the request

CURLOPT_FOLLOWLOCATION – if set true, cURL will follow redirects
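
One detail about CURLOPT_POSTFIELDS is worth knowing: when it is given an array, cURL sends the body as multipart/form-data, and when it is given a URL-encoded string it sends application/x-www-form-urlencoded, which is what an ordinary HTML login form submits. The sketch below builds the option array with a string body; the helper name build_post_options() is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: build the options for a form-style POST request.
function build_post_options(string $url, array $fields): array
{
    return array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST => true,
        // http_build_query() URL-encodes the fields into a single string
        CURLOPT_POSTFIELDS => http_build_query($fields),
        CURLOPT_FOLLOWLOCATION => true
    );
}
```

It could be used as curl_setopt_array($ch, build_post_options('http://example.com/wp-login.php', $postData)); either form of CURLOPT_POSTFIELDS is valid, so which one to use depends on what the target site expects.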

Uh oh! If you test the above, however, you’ll see an error message: “ERROR: Cookies are blocked or not supported by your browser. You must enable cookies to use WordPress.” This is normal, because we need to have cookies enabled for sessions to work. We do this by adding two more options.

<?php
// $ch and $postData are set up as in the previous example
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://example.com/wp-login.php',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_POSTFIELDS => $postData,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_COOKIESESSION => true,
    CURLOPT_COOKIEJAR => 'cookie.txt'
));

The new options are:

CURLOPT_COOKIESESSION – if set to true, cURL will start a new cookie session and ignore any previous cookies

CURLOPT_COOKIEJAR – this is the name of the file where cURL should save cookie information. Make sure you have the correct permissions to write to the file!

Now that we’re logged in, we only need to reference the cookie file for subsequent requests.
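
To reference the cookie file on a later request, cURL provides CURLOPT_COOKIEFILE, which names the file to read cookies from. A minimal sketch, assuming the cookies were saved to cookie.txt as above; the helper name build_cookie_options() and the wp-admin URL are illustrative assumptions, not something the site is guaranteed to expose.

```php
<?php
// Hypothetical helper: options for a request that reuses saved cookies.
function build_cookie_options(string $url, string $cookieFile): array
{
    return array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE => $cookieFile, // read the cookies saved by the login
        CURLOPT_COOKIEJAR => $cookieFile   // write back any updated cookies
    );
}

// assumed URL of a page that requires the login cookies
$options = build_cookie_options('http://example.com/wp-admin/', 'cookie.txt');
```

The resulting array can be passed to curl_setopt_array() on a fresh handle before calling curl_exec().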

Working with FTP

Using cURL to download and upload files via FTP is easy as well. Let’s look at downloading a file:

<?php
$ch = curl_init();

curl_setopt_array($ch, array(
    CURLOPT_URL => 'ftp://ftp.example.com/test.txt',
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_USERPWD => 'username:password'
));

$output = curl_exec($ch);
echo $output;

curl_close($ch);

Note that, for security reasons, there aren’t many public FTP servers that allow anonymous uploads and downloads, so the URL and credentials above are just placeholders.

This is almost the same as sending an HTTP request, with only a couple of minor differences:

CURLOPT_URL – the URL of the file, note the use of “ftp://” instead of “http://”

CURLOPT_USERPWD – the login credentials for the FTP server, in the form username:password
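
For larger files it may be preferable to stream the download straight to disk rather than hold it in a string. CURLOPT_FILE accepts a writable stream for that purpose. The sketch below is one possible way to do it, not the only one; download_to_file() is a hypothetical name and the optional credentials argument is an assumption about how you might want to call it.

```php
<?php
// Hypothetical helper: stream a remote file to a local path.
// $userpwd is optional and uses the 'username:password' format.
function download_to_file(string $url, string $dest, ?string $userpwd = null): bool
{
    $fp = fopen($dest, 'w');
    if ($fp === false) {
        return false;
    }

    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_FILE => $fp // write the response body to this stream
    ));
    if ($userpwd !== null) {
        curl_setopt($ch, CURLOPT_USERPWD, $userpwd);
    }

    // without CURLOPT_RETURNTRANSFER, curl_exec() returns true or false
    $ok = curl_exec($ch);

    curl_close($ch);
    fclose($fp);
    return $ok !== false;
}
```

A call such as download_to_file('ftp://ftp.example.com/test.txt', 'test.txt', 'username:password') would mirror the download example above, with the same placeholder URL and credentials.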

Uploading a file via FTP is slightly more complex, but still manageable. It looks like this:

<?php
$ch = curl_init();
$fp = fopen('test.txt', 'r');

curl_setopt_array($ch, array(
    CURLOPT_URL => 'ftp://ftp.example.com/test.txt',
    CURLOPT_USERPWD => 'username:password',
    CURLOPT_UPLOAD => true,
    CURLOPT_INFILE => $fp,
    CURLOPT_INFILESIZE => filesize('test.txt')
));

curl_exec($ch);

fclose($fp);
curl_close($ch);

The important options here are:

CURLOPT_UPLOAD – set this to true to tell cURL that we are uploading a file

CURLOPT_INFILE – a readable stream for the file we want to upload

CURLOPT_INFILESIZE – the size of the file we want to upload in bytes

Sending Multiple Requests

Imagine we have to perform five requests to retrieve all of the necessary data. Keep in mind that some things will be beyond our control, such as network latency and the response speed of the target servers. It should be obvious then that any delays when issuing five consecutive calls can really add up! One way to mitigate this problem is to issue the requests asynchronously.

Asynchronous techniques are more common in the JavaScript and Node.js communities. Briefly: instead of waiting for a time-consuming task to complete, we assign the task to a different thread or process and continue to do other things in the meantime. When the task is complete, we come back for its result. The important thing is that we haven’t wasted time waiting for a result; we spent it executing other code.

The approach for performing multiple asynchronous cURL requests is a bit different from before. We start out the same way (we initialize each channel and then set its options), but then we initialize a multihandler using curl_multi_init() and add our channels to it with curl_multi_add_handle(). We execute the transfers by calling curl_multi_exec() in a loop and checking its status. In the end we get each response’s content with curl_multi_getcontent().

<?php
// URLs we want to retrieve
$urls = array(
    'http://www.google.com',
    'http://www.bing.com',
    'http://www.yahoo.com',
    'http://www.twitter.com',
    'http://www.facebook.com'
);

// initialize the multihandler
$mh = curl_multi_init();

$channels = array();
foreach ($urls as $key => $url) {
    // initiate individual channel
    $channels[$key] = curl_init();
    curl_setopt_array($channels[$key], array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true
    ));

    // add channel to multihandler
    curl_multi_add_handle($mh, $channels[$key]);
}

// execute - if there is an active connection then keep looping
$active = null;
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        // wait for network activity instead of spinning the CPU
        curl_multi_select($mh);
    }
} while ($active && $status == CURLM_OK);

// echo the content, remove the handlers, then close them
foreach ($channels as $chan) {
    echo curl_multi_getcontent($chan);
    curl_multi_remove_handle($mh, $chan);
    curl_close($chan);
}

// close the multihandler
curl_multi_close($mh);

The above code took around 1,100 ms to execute on my laptop. Performing the requests sequentially, without the multi interface, took around 2,000 ms. Imagine what your gain will be if you are sending hundreds of requests!
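
The loop above echoes whatever each channel returned, so a failed transfer silently produces nothing. One way to tell the two cases apart is curl_multi_info_read(), which reports a result code per finished transfer. The following is a sketch under assumptions: the function name fetch_all() and its return format (body string on success, false on failure) are my own choices.

```php
<?php
// Hypothetical helper: fetch several URLs in parallel.
// Returns an array with the same keys as $urls; each value is the response
// body on success or false if that transfer failed.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $handles[$key] = curl_init($url);
        curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $handles[$key]);
    }

    // run all transfers, waiting for activity between iterations
    $active = null;
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh, 1.0);
        }
    } while ($active && $status === CURLM_OK);

    // collect the result code (CURLE_OK or an error code) of each transfer
    $codes = array();
    while (($info = curl_multi_info_read($mh)) !== false) {
        foreach ($handles as $key => $ch) {
            if ($ch === $info['handle']) {
                $codes[$key] = $info['result'];
            }
        }
    }

    $results = array();
    foreach ($handles as $key => $ch) {
        $ok = isset($codes[$key]) && $codes[$key] === CURLE_OK;
        $results[$key] = $ok ? curl_multi_getcontent($ch) : false;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Passing the $urls array from the example above would return the five pages keyed 0 to 4, with false in place of any page that could not be retrieved.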

Multiple projects exist that abstract and wrap the multi interface. Discussing them is beyond the scope of the article, but if you’re planning to issue multiple requests asynchronously then I recommend you take a look at them:

github.com/petewarden/ParallelCurl

semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading

Troubleshooting

If you’re using cURL then you are probably performing your requests to third-party servers. You can’t control them and much can go wrong: servers can go offline, directory structures can change, etc. We need an efficient way to find out what’s wrong when something doesn’t work, and luckily cURL offers two functions for this: curl_getinfo() and curl_error().

curl_getinfo() returns an array with all of the information regarding the channel, so if you want to check if everything is all right you can use:

<?php var_dump(curl_getinfo($ch));

If an error pops up, you can check it out with curl_error():

<?php
// compare strictly: with CURLOPT_RETURNTRANSFER an empty response is also falsy
if (curl_exec($ch) === false) {
    // curl_exec() returned false and thus failed
    echo 'An error has occurred: ' . curl_error($ch);
} else {
    echo 'everything was successful';
}
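
Keep in mind that curl_exec() only fails on transport-level problems; by default an HTTP error status such as 404 still counts as a successful transfer. A sketch that checks both, using curl_getinfo() with CURLINFO_HTTP_CODE, might look like this. The function name fetch_checked() and the layout of the returned array are assumptions made for the example.

```php
<?php
// Hypothetical helper: run a GET request and report both transport errors
// and the HTTP status code.
function fetch_checked(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true
    ));

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 for non-HTTP transfers
    $error = curl_error($ch);                        // '' when nothing went wrong
    curl_close($ch);

    return array(
        'ok' => $body !== false && $status < 400,
        'status' => $status,
        'error' => $error,
        'body' => $body
    );
}
```

A caller can then branch on $result['ok'] and log $result['status'] or $result['error'] to see whether the server answered with an error or the connection itself failed.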

Conclusion

cURL offers a powerful and efficient way to make remote calls, so if you’re ever in need of a crawler or something to access an external API, cURL is a great tool for the job. It provides us with a nice interface and a relatively easy way to execute requests. For more information, check out the PHP Manual and the cURL website. See you next time!

Translated from: https://www.sitepoint.com/using-curl-for-remote-requests/
