
HTTP Basics and Using cURL in PHP

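  To lay the groundwork for the tutorials that follow, this article explains how to send HTTP requests from PHP to simulate a login, and covers a little of how HTTP requests work along the way. Simulating a login simply means imitating what a browser does when it logs in. A request is nothing more than you sending some text to a website and the website sending some text back, normally over HTTP or HTTPS. Usually the browser does this work for us: it packages the data, sends it to the target site, receives the reply, and finally renders it as a web page. In a simulated login we have to do all of that ourselves, except that nothing needs to be rendered at the end; we only extract the data we want.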

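  There are three ways to simulate a login in PHP. The first is to call file_get_contents(URL) directly; it is simple to use, so I won't cover it. The second is to use a socket, writing out the characters of the request according to the protocol and sending them yourself; I haven't looked into it much, so I won't cover it either. The last, of course, is the cURL extension that ships with PHP, which lets you set request headers, send a request body and so on as needed, and is also very convenient. As for the format of an HTTP message, that is basic HTTP protocol material and I won't go into it. The following uses cURL to send a request to Baidu: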

    $curl = curl_init();                                      // initialize a cURL handle
    curl_setopt($curl, CURLOPT_URL, 'http://www.baidu.com');  // set the URL
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5);            // 5-second connect timeout
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);            // 1 = return the response as a string instead of printing it
    // Spoof a browser; worth setting, because some sites block requests based on the client.
    // CURLOPT_USERAGENT takes only the value, without a "User-Agent: " prefix.
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
    $response = curl_exec($curl);                             // run the request; the response is stored in $response
    $state = curl_getinfo($curl, CURLINFO_HTTP_CODE);         // HTTP status code of the response
    curl_close($curl);                                        // release the cURL handle
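  The article skips the HTTP message format, so here is a minimal sketch of the text that a client puts on the wire for a request like the one to Baidu. buildRawRequest is a hypothetical helper written for illustration only; it is not part of cURL or of the class built later, and the exact headers a real client sends will differ.

```php
<?php
// Hypothetical helper: builds the raw text of a simple HTTP/1.1 request.
function buildRawRequest(string $method, string $host, string $path, array $headers = [], string $body = ''): string
{
    // Request line first, then the mandatory Host header.
    $lines = ["$method $path HTTP/1.1", "Host: $host"];
    foreach ($headers as $name => $value) {
        $lines[] = "$name: $value";
    }
    if ($body !== '') {
        // A body needs its length announced so the server knows where it ends.
        $lines[] = 'Content-Length: ' . strlen($body);
    }
    // Every header line ends with CRLF; an empty line separates headers from body.
    return implode("\r\n", $lines) . "\r\n\r\n" . $body;
}

$raw = buildRawRequest('GET', 'www.baidu.com', '/', ['Connection' => 'close']);
echo $raw;
```

  This is roughly the text the socket approach mentioned earlier would have to write by hand, and what cURL assembles for you from the options you set.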

 

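  Once curl_exec returns, the request is complete. $response holds the response, that is, the HTML source of the Baidu page as a string. Two HTTP request methods matter here, GET and POST (see an HTTP reference for the details). The request to Baidu above uses GET. POST differs in that the parameters are sent as the message body, formatted as p1=v1&p2=v2&p3=v3…, where each p is a parameter name and each v its value; if that is unclear, read up on HTTP before continuing. With the parameter string in $param, set it like this (setting CURLOPT_POSTFIELDS also makes cURL send the request as a POST):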

         curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
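  To make the p1=v1&p2=v2 format concrete, the sketch below builds $param from an array with PHP's built-in http_build_query, which URL-encodes names and values. The field names and values are made up for illustration.

```php
<?php
// Hypothetical form fields; the values contain characters that must be encoded.
$fields = ['username' => 'tom & jerry', 'password' => 'p@ss=1'];

// Build "name=value&name=value", URL-encoding each name and value.
// The separator is passed explicitly so the result does not depend on php.ini.
$param = http_build_query($fields, '', '&');

echo $param, "\n";
```

  The resulting string can be passed to CURLOPT_POSTFIELDS as shown above. Without encoding, the & and = inside the values would be read as separators.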

  To set request headers:

         curl_setopt($curl, CURLOPT_HTTPHEADER, $header);

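  Here $header is an array in which each element is one complete header line of the form 'Name: value'. For example, to send the CLIENT-IP and X-FORWARDED-FOR headers, use $header = array('CLIENT-IP: value', 'X-FORWARDED-FOR: value').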
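  CURLOPT_HTTPHEADER expects a plain list of 'Name: value' strings, not an associative array. If you prefer to keep headers as name => value pairs, a small conversion step is enough. toHeaderLines is a hypothetical helper, and the IP address is a placeholder.

```php
<?php
// Hypothetical helper: turn ['Name' => 'value'] pairs into the
// list of 'Name: value' strings that CURLOPT_HTTPHEADER expects.
function toHeaderLines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$header = toHeaderLines([
    'CLIENT-IP'       => '10.0.0.1',   // placeholder address
    'X-FORWARDED-FOR' => '10.0.0.1',
]);
print_r($header);
```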

  cURL has many more CURLOPT_* options for curl_setopt. I won't list them here; look them up in the PHP manual.

  Now that we know how to use cURL, can we wrap it in a utility class for simulating logins?

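  I call this class RequestClient. A request generally involves three things: the URL, the request method, and the request parameters. Request headers are optional, but I include them as well. From the response we keep the response text and the status code. Putting that together, the class has these member variables: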

    private $response = null;
    private $url;
    private $header = null;
    private $parameter = null;
    private $method = 'GET';        // GET is the default request method
    private $state = null;

   

  The URL must be given at instantiation, and can also be changed through a setter:

    public function __construct($url) {
        $this->url = $url;
    }

    public function setUrl($url) {
        $this->url = $url;
    }

       

  The setter for the headers ($header uses the format described above):

    public function setHeader($header) {
        $this->header = $header;
    }

        

  And the getters:

    public function getUrl() { return $this->url; }
    public function getParameter() { return $this->parameter; }
    public function getHeader() { return $this->header; }
    public function getMethod() { return $this->method; }
    public function getState() { return $this->state; }
    public function getResponse() { return $this->response; }

   

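  Next come the parameters. There are two ways to set them: pass an array, which is converted into a parameter string, or pass the string directly. The array format is array("p1"=>"value1", "p2"=>"value2"…). $encode controls whether the parameters are URL-encoded (the default is yes).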

    public function setParameter($parameter = null, $encode = true) {
        if (is_array($parameter)) {
            // Convert the array into a "p1=v1&p2=v2" string.
            $pairs = array();
            foreach ($parameter as $key => $value) {
                // URL-encode name and value unless the caller opted out.
                $pairs[] = $encode ? urlencode($key) . '=' . urlencode($value) : "$key=$value";
            }
            $this->parameter = implode('&', $pairs);
        } elseif (is_string($parameter)) {
            // A string is assumed to be formatted already.
            $this->parameter = $parameter;
        }
    }

 

  Below are the get and post methods, which send GET and POST requests. The response goes into $response and the status code into $state.

    public function get($timeout=5) {
        $this->method = 'GET';
        // For GET, parameters are appended to the URL. A local copy is used
        // so that calling get() twice does not append them twice.
        $url = $this->url;
        if ($this->parameter != null) {
            $url .= (strpos($url, '?') === false ? '?' : '&') . $this->parameter;
        }
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
        if ($this->header != null) {            // set headers only when there are any
            curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
        }
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
        $this->response = curl_exec($curl);
        $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);
        return $this->response;
    }

    public function post($timeout=5) {
        $this->method = 'POST';
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $this->url);
        curl_setopt($curl, CURLOPT_HEADER, 1);  // include the response headers in $response
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($curl, CURLOPT_POSTFIELDS, $this->parameter);   // parameters go in the body
        if ($this->header != null) {
            curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
        }
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
        $this->response = curl_exec($curl);
        $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);
        return $this->response;
    }
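  The get method attaches the parameter string to the URL. As a standalone sketch of that step, the hypothetical helper below returns a new URL instead of modifying the stored one, and picks & when the URL already carries a query string. It does not handle URLs with a #fragment, which is an accepted simplification here.

```php
<?php
// Hypothetical helper: append a ready-made query string to a URL.
function appendQuery(string $url, string $query): string
{
    if ($query === '') {
        return $url;                       // nothing to append
    }
    // Use '?' for the first parameter block, '&' if one already exists.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . $query;
}

echo appendQuery('http://example.com/search', 'q=php'), "\n";
echo appendQuery('http://example.com/search?page=2', 'q=php'), "\n";
```

  Because the helper returns a value rather than changing state, calling it repeatedly with the same inputs always gives the same URL.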

 

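  With that, a class for HTTP requests is basically complete. Depending on your needs you can add further features, for example getting the page title (the contents of <title>):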

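    public function getTitle() {
        $source = $this->response;
        $start = stripos($source, '<title');
        if ($start === false) {
            return '';                              // the page has no <title> tag
        }
        $source = substr($source, $start);          // cut everything before <title
        $start = stripos($source, '>') + 1;         // first character after the opening tag
        $end = stripos($source, '<', $start);       // start of </title>
        return substr($source, $start, $end - $start);
    }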
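  As an alternative to the position arithmetic in getTitle, the same extraction can be sketched with a regular expression. extractTitle is a hypothetical standalone function, not part of the class; it returns null when no title is found and decodes HTML entities, which the original method does not do.

```php
<?php
// Hypothetical alternative to getTitle(): regex-based title extraction.
function extractTitle(string $html): ?string
{
    // i = case-insensitive tag name, s = let "." match newlines inside the title.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
    }
    return null;    // no <title> element present
}

$html = '<html><head><TITLE lang="zh"> Baidu &amp; Co </TITLE></head></html>';
var_dump(extractTitle($html));
```

  A regular expression is enough for a single well-formed tag like this; for anything more involved, an HTML parser such as DOMDocument is the safer choice.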

 

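  Getting the cookies as a string. (cURL offers a quick and convenient way to handle cookies: set CURLOPT_COOKIEJAR with curl_setopt and the cookies are written to the specified file, then set CURLOPT_COOKIEFILE to that file to send them with later requests. Out of personal habit, though, I prefer to extract the cookies as a string and put them in the Cookie field of $header, which feels more flexible. The function below extracts the cookies and returns them as a string of the form [cookie1=value1; cookie2=value2; …]):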

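    public function getCookie() {
        $content = $this->response;        // $response must include the response headers
        $start = 0;
        $rt = '';
        // Search for each Set-Cookie field. The comparison must be !== false,
        // because a match at position 0 would otherwise end the loop.
        while (($start = stripos($content, 'Set-Cookie: ', $start)) !== false) {
            $start += 12;        // skip the 12 characters of "Set-Cookie: "
            $end = stripos($content, ';', $start);
            if ($end === false) {
                break;           // no terminating ';' after this cookie
            }
            $rt .= substr($content, $start, $end - $start) . '; ';
        }
        return substr($rt, 0, -2);        // drop the trailing semicolon and space
    }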
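  getCookie relies on every Set-Cookie line containing a semicolon. As a sketch of a more tolerant variant, the hypothetical function below uses preg_match_all to take the name=value part of each Set-Cookie line, whether or not attributes follow it. The sample response text is invented for illustration.

```php
<?php
// Hypothetical alternative to getCookie(): collect name=value pairs
// from all Set-Cookie header lines in a raw response.
function extractCookies(string $rawResponse): string
{
    $pairs = [];
    // m = "^" matches at each line start, i = case-insensitive header name.
    // Capture everything up to the first ';' or the end of the line.
    if (preg_match_all('/^Set-Cookie:\s*([^;\r\n]+)/mi', $rawResponse, $matches)) {
        foreach ($matches[1] as $pair) {
            $pairs[] = trim($pair);
        }
    }
    return implode('; ', $pairs);
}

$sample = "HTTP/1.1 200 OK\r\n"
        . "Set-Cookie: SID=abc123; Path=/; HttpOnly\r\n"
        . "Set-Cookie: lang=zh\r\n"
        . "Content-Type: text/html\r\n\r\n<html></html>";

echo extractCookies($sample), "\n";
```

  The returned string can be sent back on the next request as a 'Cookie: …' element of $header.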

  Usage looks like this:

    $client = new RequestClient('URL goes here');
    $client->setHeader($header);          // header array
    $client->setParameter($parameter);    // parameter array or string
    $client->get();                       // or: $client->post();

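  That completes the class. One last point: the design and the code are distilled from my own experience, so there are bound to be some mistakes and rough edges. In real use, adapt it to your own needs and add features so that it fits your program better.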

 

  The complete code:

<?php
    class RequestClient {
        private $response = null;
        private $url;
        private $header = null;           // type: array of 'Name: value' strings
        private $parameter = null;        // type: string
        private $proxy = null;            // proxy
        private $method = 'GET';          // default GET method
        private $state = null;

        // static factory: create a client from a url, parameters and headers
        public static function newClient($url, $parameter=null, $header=null) {
            $client = new RequestClient($url);
            $client->setParameter($parameter);
            $client->setHeader($header);
            return $client;
        }

        // constructor, with url as its only parameter
        public function __construct($url) {
            $this->url = $url;
        }

        public function __destruct() {
            $this->clear();
        }

        // setters
        public function setUrl($url) {
            $this->url = $url;
        }

        public function setHeader($header) {
            $this->header = $header;
        }

        public function setProxy($proxy) {
            $this->proxy = $proxy;
        }

        // collect the cookies from the response headers as "c1=v1; c2=v2"
        public function getCookie() {
            $content = $this->response;
            $start = 0;
            $rt = '';
            while (($start = stripos($content, 'Set-Cookie: ', $start)) !== false) {
                $start += 12;             // skip "Set-Cookie: "
                $end = stripos($content, ';', $start);
                if ($end === false) {
                    break;                // no terminating ';' after this cookie
                }
                $rt .= substr($content, $start, $end - $start) . '; ';
            }
            return substr($rt, 0, -2);    // drop the trailing "; "
        }

        public function setParameter($parameter = null, $encode = true) {
            if (is_array($parameter)) {   // convert an array to a string
                $pairs = array();
                foreach ($parameter as $key => $value) {
                    $pairs[] = $encode ? urlencode($key) . '=' . urlencode($value) : "$key=$value";
                }
                $this->parameter = implode('&', $pairs);
            } elseif (is_string($parameter)) {
                $this->parameter = $parameter;
            }
        }

        // send a GET request, store the response in $this->response and return it
        public function get($timeout=5) {
            $this->method = 'GET';
            $curl = curl_init();
            curl_setopt($curl, CURLOPT_URL, $this->handleParameter());
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
            if ($this->header != null) {
                curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
            }
            if ($this->proxy != null) {
                curl_setopt($curl, CURLOPT_PROXY, $this->proxy);
            }
            curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
            $this->response = curl_exec($curl);
            $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
            curl_close($curl);
            return $this->response;
        }

        // send a POST request, store the response in $this->response and return it
        public function post($timeout=5) {
            $this->method = 'POST';
            $curl = curl_init();
            curl_setopt($curl, CURLOPT_URL, $this->url);
            curl_setopt($curl, CURLOPT_HEADER, 1);    // include the response headers in the response
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
            curl_setopt($curl, CURLOPT_POSTFIELDS, $this->parameter);
            if ($this->header != null) {
                curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
            }
            if ($this->proxy != null) {
                curl_setopt($curl, CURLOPT_PROXY, $this->proxy);
            }
            curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
            $this->response = curl_exec($curl);
            $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
            curl_close($curl);
            return $this->response;
        }

        // get the page title
        public function getTitle() {
            $source = $this->response;
            $start = stripos($source, '<title');
            if ($start === false) {
                return '';                // no <title> tag in the response
            }
            $source = substr($source, $start);
            $start = stripos($source, '>') + 1;
            $end = stripos($source, '<', $start);
            return substr($source, $start, $end - $start);
        }

        // reset the state of the object; only the url remains
        public function clear() {
            $this->parameter = null;
            $this->header = null;
            $this->response = null;
            $this->state = null;
            $this->proxy = null;
            $this->method = 'GET';
        }

        // getters
        public function getUrl() { return $this->url; }
        public function getParameter() { return $this->parameter; }
        public function getHeader() { return $this->header; }
        public function getProxy() { return $this->proxy; }
        public function getMethod() { return $this->method; }
        public function getState() { return $this->state; }
        public function getResponse() { return $this->response; }

        // private: return the url to request, with the parameters appended when the
        // method is 'GET'; $this->url itself is left unchanged so repeated calls are safe
        private function handleParameter() {
            $url = $this->url;
            if ($this->parameter != null && $this->method == 'GET') {
                $url .= (strpos($url, '?') === false ? '?' : '&') . $this->parameter;
            }
            return $url;
        }
    }
?>