1. Why use cURL
To fetch web content in PHP we can use file_get_contents, file, readfile and similar functions, but they lack flexibility and effective error handling. More demanding tasks, such as handling cookies, authentication, form submission and file upload, are also hard to accomplish with them. More ways for PHP to get web content: http://www.ucooper.com/php-get-webpage-content.html
2. Basic steps for establishing cURL requests in PHP
- Initialization
- Set options
- Execute and obtain results
- Release cURL handle
// 1. Initialization
$ch = curl_init ( ) ;
// 2. Set options, including the URL
curl_setopt ( $ch , CURLOPT_URL, "http://www.ucooper.com" ) ;
curl_setopt ( $ch , CURLOPT_RETURNTRANSFER, 1 ) ;
curl_setopt ( $ch , CURLOPT_HEADER, 0 ) ;
// 3. Execute and retrieve HTML document content
$output = curl_exec ( $ch ) ;
// 4. Release curl handle
curl_close ( $ch ) ;
- CURLOPT_URL: Target URL
- CURLOPT_PORT: Target port
- CURLOPT_RETURNTRANSFER: Return the response as a string instead of printing it directly
- CURLOPT_HTTPHEADER: Request headers, given as an array, as shown in the "Browser-based redirection" example
- CURLOPT_FOLLOWLOCATION: Follow redirects
- CURLOPT_FRESH_CONNECT: Force a new connection instead of reusing a cached one
- CURLOPT_HEADER: Include the response headers in the output
- CURLOPT_NOBODY: Leave the body of the page out of the output
- CURLOPT_POST: Submit the request as a POST form
- CURLOPT_POSTFIELDS: The fields to submit in a POST, given as an array, as shown in "Send data by POST method"
- CURLOPT_PROXY: Proxy address, IP and port
- CURLOPT_PROXYUSERPWD: Proxy credentials, username and password
- CURLOPT_PROXYTYPE: Proxy type, HTTP or SOCKS
More options: http://www.programfan.com/doc/php_manual/function.curl-setopt.html
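Several options can also be set in one call with curl_setopt_array. The following is a minimal sketch, not a recommended configuration: the helper name make_handle, the timeout parameter, and the CURLOPT_MAXREDIRS and CURLOPT_TIMEOUT values are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: creates a handle with a few commonly used options.
// The specific values (5 redirects, 10 seconds) are illustrative assumptions.
function make_handle(string $url, int $timeoutSeconds = 10)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body as a string
        CURLOPT_HEADER         => false,  // keep response headers out of the output
        CURLOPT_FOLLOWLOCATION => true,   // follow Location: redirects
        CURLOPT_MAXREDIRS      => 5,      // stop after this many redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    return $ch;
}
```

The returned handle is used like any other: pass it to curl_exec, then to curl_close.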
3. Check for errors
$output = curl_exec($ch);
if ($output === FALSE) {
    echo "cURL Error: " . curl_error($ch);
}
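The same check can be wrapped in a function that raises an exception instead of printing the message. This is only a sketch under assumptions: the function name fetch_or_fail and the choice of RuntimeException are not from the original text.

```php
<?php
// Hypothetical wrapper around the error check: returns the body or throws.
function fetch_or_fail(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $output = curl_exec($ch);
    if ($output === false) {
        // Read the error details before the handle is released.
        $message = curl_error($ch);
        $code = curl_errno($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error $code: $message", $code);
    }

    curl_close($ch);
    return $output;
}
```

A caller would wrap fetch_or_fail($url) in try/catch and decide there whether to retry, log, or give up.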
4. Getting information about the request
curl_exec($ch);
$info = curl_getinfo($ch);
echo 'Fetching ' . $info['url'] . ' took ' . $info['total_time'] . ' seconds';
The returned array contains the following information:
- "url": URL of the resource
- "content_type": Content-Type of the response
- "http_code": HTTP status code
- "header_size": size of the headers received
- "request_size": size of the request sent
- "filetime": remote time of the retrieved document
- "ssl_verify_result": result of the SSL certificate verification
- "redirect_count": number of redirects followed
- "total_time": total transfer time
- "namelookup_time": time spent on the DNS lookup
- "connect_time": time spent establishing the connection
- "pretransfer_time": time from the start until the transfer is about to begin
- "size_upload": size of the uploaded data
- "size_download": size of the downloaded data
- "speed_download": download speed
- "speed_upload": upload speed
- "download_content_length": content length of the download
- "upload_content_length": content length of the upload
- "starttransfer_time": time until the first byte is about to be transferred
- "redirect_time": time spent on redirects
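When only a few values are needed, curl_getinfo also accepts a second argument naming a single item. A small sketch, assuming a hypothetical helper called summarize_transfer that is called after curl_exec:

```php
<?php
// Hypothetical helper: builds a one-line summary from individual info values.
function summarize_transfer($ch): string
{
    $status    = curl_getinfo($ch, CURLINFO_HTTP_CODE);      // same as "http_code"
    $seconds   = curl_getinfo($ch, CURLINFO_TOTAL_TIME);     // same as "total_time"
    $redirects = curl_getinfo($ch, CURLINFO_REDIRECT_COUNT); // same as "redirect_count"

    return sprintf("HTTP %d in %.3f s after %d redirect(s)", $status, $seconds, $redirects);
}
```

Call it between curl_exec($ch) and curl_close($ch), for example echo summarize_transfer($ch);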
5. Browser-based redirection
In the first example, we write a piece of code that detects whether a server performs browser-based redirection. For example, some websites redirect visitors depending on whether they use a mobile browser, or even on which country they come from.
We use the CURLOPT_HTTPHEADER option to set the HTTP request headers we send, including user agent information and default language. Then let's see if these specific sites redirect us to different URLs.
// URLs for testing
$urls = array(
    "http://www.cnn.com",
    "http://www.mozilla.com",
    "http://www.facebook.com"
);
// Browser information for testing
$browsers = array(
    "standard" => array(
        "user_agent" => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)",
        "language" => "en-us,en;q=0.5"
    ),
    "iphone" => array(
        "user_agent" => "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A537a Safari/419.3",
        "language" => "en"
    ),
    "french" => array(
        "user_agent" => "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6; .NET CLR 2.0.50727)",
        "language" => "fr,fr-FR;q=0.5"
    )
);
foreach ($urls as $url) {
    echo "URL: $url\n";
    foreach ($browsers as $test_name => $browser) {
        $ch = curl_init();
        // Set the URL
        curl_setopt($ch, CURLOPT_URL, $url);
        // Set the browser-specific headers
        curl_setopt($ch, CURLOPT_HTTPHEADER, array(
            "User-Agent: {$browser['user_agent']}",
            "Accept-Language: {$browser['language']}"
        ));
        // We don't need the page content
        curl_setopt($ch, CURLOPT_NOBODY, 1);
        // Return only the HTTP headers
        curl_setopt($ch, CURLOPT_HEADER, 1);
        // Return the result instead of printing it
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $output = curl_exec($ch);
        curl_close($ch);
        // Is there a redirect header?
        if (preg_match("!Location: (.*)!", $output, $matches)) {
            echo "$test_name: redirects to $matches[1]\n";
        } else {
            echo "$test_name: no redirection\n";
        }
    }
    echo "\n\n";
}
First, we define a set of URLs to test, then a set of browser profiles to test with. Finally, a nested loop tries every combination of URL and browser.
Because of the cURL options we set, the returned output (stored in $output) contains only the HTTP headers. With a simple regular expression we check whether these headers contain "Location:".
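The header check can be isolated in a small function, which makes it easier to try out on sample header text. This is a sketch only: the name find_redirect_target is hypothetical, and the pattern is a slightly stricter variant of the one above (case-insensitive and anchored to the start of a line, so a header such as Content-Location is not matched).

```php
<?php
// Hypothetical helper: returns the redirect target from raw response headers,
// or null when no Location header is present.
function find_redirect_target(string $rawHeaders): ?string
{
    // "m" makes ^ and $ match at line boundaries; "i" ignores header-name case.
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $matches)) {
        return $matches[1];
    }
    return null;
}
```

In the loop above it would be called as find_redirect_target($output) right after curl_exec.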
6. Send data by POST method
With this, we can achieve the effect of form submission.
Forms can be submitted in two ways: GET and POST. For GET, we can build the URL ourselves and fetch the content with a function such as file_get_contents. For POST, the data is sent in the HTTP request body instead of the query string, so that simple approach does not work.
First, create a page to receive the request (the form action), post_output.php, with the content: print_r($_POST); In other words, it prints the $_POST array directly.
Then, write the cURL script that submits the data as a POST:
$url = "http://localhost/post_output.php";
$post_data = array(
    "foo" => "bar",
    "query" => "Nettuts",
    "action" => "Submit"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// We are sending POST data
curl_setopt($ch, CURLOPT_POST, 1);
// Add the POST variables
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;
This script sends a POST request to post_output.php, which prints the $_POST array it received; cURL captures that output and we echo it.
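One detail worth knowing: when CURLOPT_POSTFIELDS receives an array, the body is sent as multipart/form-data, while a URL-encoded string is sent as application/x-www-form-urlencoded, which is what an ordinary HTML form without file fields sends. The sketch below uses http_build_query for the second form; the function name post_form is an assumption for illustration.

```php
<?php
// Hypothetical helper: posts the fields as application/x-www-form-urlencoded.
function post_form(string $url, array $fields)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    // A string body (instead of an array) selects URL-encoded form data.
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));

    $output = curl_exec($ch); // response body, or false on failure
    curl_close($ch);
    return $output;
}
```

With the receiving page from this section it could be called as post_form("http://localhost/post_output.php", array("foo" => "bar")).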
7. File upload
Uploading files is very similar to the previous POST example, because all file upload forms are submitted with the POST method.
First, create a new page to receive files, named upload_output.php, the content is: print_r($_FILES);
Here is the script that actually performs the file upload:
$url = "http://localhost/upload_output.php";
$post_data = array(
    "foo" => "bar",
    // The local file to upload
    "upload" => new CURLFile("C:/wamp/www/test.zip")
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
$output = curl_exec($ch);
curl_close($ch);
echo $output;
To upload a file, pass it like any other POST variable, but wrap the path in a CURLFile object (available since PHP 5.5). Older versions of this example prefixed the path with an @ sign instead, as in "@C:/wamp/www/test.zip"; PHP 7 no longer supports that form. Executing this script prints the $_FILES array reported by upload_output.php.
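In current PHP versions the file field is usually built with the CURLFile class, which also lets you state the MIME type and the file name reported to the server. A minimal sketch, assuming a hypothetical helper make_upload_fields and a ZIP archive as in the example above:

```php
<?php
// Hypothetical helper: builds the POST fields for uploading one ZIP file.
function make_upload_fields(string $path): array
{
    return [
        "foo"    => "bar",
        // Arguments: local path, MIME type, file name sent to the server.
        "upload" => new CURLFile($path, "application/zip", basename($path)),
    ];
}
```

The returned array is passed to CURLOPT_POSTFIELDS exactly like $post_data in the script above.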
8. cURL batch processing
cURL also has an advanced feature, batch handle. This feature allows you to open multiple URL connections simultaneously or asynchronously.
All you have to do here is open multiple cURL handles and assign them to a batch handle. Then you just need to wait for it to finish in a while loop.
// Create two cURL resources
$ch1 = curl_init();
$ch2 = curl_init();
// Specify the URLs and appropriate options
curl_setopt($ch1, CURLOPT_URL, "http://lxr.php.net/");
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch2, CURLOPT_HEADER, 0);
// Create the cURL batch handle
$mh = curl_multi_init();
// Add the two handles created above
curl_multi_add_handle($mh, $ch1);
curl_multi_add_handle($mh, $ch2);
// Predefine a state variable
$active = null;
// Execute the batch
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// Close each handle
curl_multi_remove_handle($mh, $ch1);
curl_multi_remove_handle($mh, $ch2);
curl_multi_close($mh);
There are two main loops in this example. The first do-while loop calls curl_multi_exec() repeatedly. This function is non-blocking and does as little work as possible on each call. It returns a status value; as long as that value equals the constant CURLM_CALL_MULTI_PERFORM, there is still urgent work to do (for example, sending the HTTP headers for a URL), so we keep calling the function until the return value changes.
The following while loop continues only while the $active variable is true. This variable was passed to curl_multi_exec() as its second parameter and indicates whether the batch handle still has active connections. Inside the loop we call curl_multi_select(), which blocks until there is activity on one of the connections (for example, a server response arriving). When it returns successfully, we enter another do-while loop to process the pending work.
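The same idea generalizes to any number of URLs. The sketch below is one possible arrangement, not the only one: the function name fetch_all is hypothetical, it uses a single, slightly simpler loop than the example above, and the one-second wait passed to curl_multi_select is an arbitrary choice.

```php
<?php
// Hypothetical helper: fetches several URLs in parallel and returns
// the response bodies, keyed like the input array.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $active = 0;
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            // Wait for activity on any connection, at most one second.
            curl_multi_select($mh, 1.0);
        }
    } while ($active && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // body of this transfer
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

For the two URLs above it would be called as fetch_all(array("http://lxr.php.net/", "http://www.php.net/")).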
Let's see how to put this feature into practice.
9. WordPress Connection Checker
10. Other cURL options
HTTP authentication
If a URL request requires HTTP-based authentication, you can use the following code:
$url = "http://www.somesite.com/members/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Send the username and password
curl_setopt($ch, CURLOPT_USERPWD, "myusername:mypassword");
// Allow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Keep sending the username and password after cURL follows a redirect
curl_setopt($ch, CURLOPT_UNRESTRICTED_AUTH, 1);
$output = curl_exec($ch);
curl_close($ch);
FTP Upload
PHP has its own FTP library, but you can also use cURL:
// Open a file pointer
$fp = fopen("/path/to/file", "r");
// The URL contains most of the required information
$url = "ftp://username:password@mydomain.com:21/path/to/new/file";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Upload-related options
curl_setopt($ch, CURLOPT_UPLOAD, 1);
curl_setopt($ch, CURLOPT_INFILE, $fp);
curl_setopt($ch, CURLOPT_INFILESIZE, filesize("/path/to/file"));
// Whether to use ASCII mode (useful for uploading text files)
curl_setopt($ch, CURLOPT_FTPASCII, 1);
$output = curl_exec($ch);
curl_close($ch);
fclose($fp);
Using a proxy
You can use a proxy to initiate cURL requests:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Specify the proxy address
curl_setopt($ch, CURLOPT_PROXY, '11.11.11.11:8080');
// Provide a username and password if necessary
curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:pass');
$output = curl_exec($ch);
curl_close($ch);
Callback functions
You can have cURL call a callback function of your choice while a request is in progress. For example, you can start using the response data as it downloads instead of waiting until the download is complete.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://net.tutsplus.com');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "progress_function");
curl_exec($ch);
curl_close($ch);
function progress_function($ch, $str) {
    echo $str;
    return strlen($str);
}
The callback function must return the length of the string it was given; otherwise the transfer will not work properly.
While the response is being received, the function is called each time a chunk of data arrives.
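The callback does not have to be a named function; a closure works too and can keep state between calls. The sketch below records the size of every chunk; make_chunk_counter is a hypothetical name, and the closure follows the same contract of returning the number of bytes it handled.

```php
<?php
// Hypothetical factory: returns a write callback that records chunk sizes
// in the array passed by reference.
function make_chunk_counter(array &$sizes): callable
{
    return function ($ch, string $chunk) use (&$sizes): int {
        $sizes[] = strlen($chunk);
        // cURL expects the number of bytes handled; any other value aborts the transfer.
        return strlen($chunk);
    };
}
```

It would be registered with curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_chunk_counter($sizes)); after curl_exec, array_sum($sizes) gives the total number of bytes received.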
Summary: Today we learned about the power and flexibility of the cURL library. I hope you liked it. Next time you need to make a URL request, consider cURL.