Python Web Crawling with the urllib Module: p

Published: 2019-06-30 16:53:12 | Editor: auto | Views: 2368

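This program uses 'http://httpbin.org/post' as the example target.

Steps:

Import urllib.request

Import urllib.parse

URL-encode the form data, then convert it to UTF-8 bytes: data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')

Open the target page: response = urllib.request.urlopen('URL', data = data)

Read the response body: html = response.read()

Print it:

1. Without decode

print(html) # prints the bytes literal b'...' on a single line, with line breaks shown as \n escapes, which is hard to read

2. With decode

print(html.decode()) # prints the text with real line breaks and indentation, like neatly formatted code, which is much easier to read

Check the result of the request:

a. response.status # 200 means the request succeeded. For an error status such as 404 (page not found), urlopen raises urllib.error.HTTPError instead of returning a response

b. response.getcode() # returns the same status code as response.status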
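The steps above can be collected into one small function. This is only a sketch: the name post_form and its opener parameter are hypothetical (not part of urllib), and opener exists solely so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.parse
import urllib.request


def post_form(url, params, opener=urllib.request.urlopen):
    """POST form fields to url and return (status_code, decoded_body).

    post_form and the opener parameter are illustrative names,
    not part of the urllib API.
    """
    # urlencode returns a str such as 'word=hello'; urlopen needs bytes.
    data = urllib.parse.urlencode(params).encode('utf-8')
    try:
        with opener(url, data=data) as response:
            return response.status, response.read().decode('utf-8')
    except urllib.error.HTTPError as err:
        # Error statuses such as 404 arrive as an exception, not a response.
        return err.code, err.read().decode('utf-8', errors='replace')
```

With network access, status, text = post_form('http://httpbin.org/post', {'word': 'hello'}) should give the status code and the decoded response body in one call.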

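1. The program without decode:

import urllib.request
import urllib.parse

# URL-encode the form fields, then convert the resulting string to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
# Passing data makes urlopen send a POST request
response = urllib.request.urlopen('http://httpbin.org/post', data = data)
html = response.read()

print(html)  # raw bytes, shown as a single-line b'...' literal
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())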

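Output: the response body is printed as a single-line bytes literal (b'...'), followed by the two separator lines and the status code printed twice.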

2. The program with decode:

import urllib.request
import urllib.parse

# URL-encode the form fields, then convert the resulting string to UTF-8 bytes
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
# Passing data makes urlopen send a POST request
response = urllib.request.urlopen('http://httpbin.org/post', data = data)
html = response.read()

print(html.decode())  # bytes decoded to str (UTF-8 by default)
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())

Output:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "word": "hello"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Connection": "close", 
    "Content-Length": "10", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.4"
  }, 
  "json": null, 
  "origin": "106.14.17.222", 
  "url": "http://httpbin.org/post"
}

------------------------------------------------------------------
------------------------------------------------------------------
200
200
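To make the difference between the two print calls concrete without a network request, here is a small sketch. The bytes value is a shortened, hand-written stand-in for what response.read() returns, not a captured response. Since the body is JSON, the decoded text can also be parsed with json.loads.

```python
import json

# Shortened stand-in for the bytes returned by response.read() (assumption).
html = (b'{\n  "form": {\n    "word": "hello"\n  }, \n'
        b'  "url": "http://httpbin.org/post"\n}\n')

print(html)            # one line: a bytes literal with \n escapes
text = html.decode()   # bytes -> str, UTF-8 by default
print(text)            # the \n escapes become real line breaks

# The decoded body is JSON, so it can be turned into a dict.
result = json.loads(text)
print(result['form']['word'])
```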

Why convert with bytes?

Because the request fails without it:

data = urllib.parse.urlencode({'word': 'hello'})  # no bytes() conversion, so data is a str
response = urllib.request.urlopen('http://httpbin.org/post', data = data )
html = response.read()

Error message:

Traceback (most recent call last):
  File "/usercode/file.py", line 15, in <module>
    response = urllib.request.urlopen('http://httpbin.org/post', data = data )
  File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 453, in open
    req = meth(req)
  File "/usr/lib/python3.4/urllib/request.py", line 1104, in do_request_
    raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.

As this shows, a POST request needs its body encoded as bytes, not passed as a str.
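A short sketch of the conversion itself, runnable offline: urlencode returns a str, and either bytes(..., encoding='utf-8') or str.encode('utf-8') turns it into the bytes that urlopen accepts. Building a urllib.request.Request object sends nothing over the network; it is used here only to show that supplying data switches the method to POST.

```python
import urllib.parse
import urllib.request

query = urllib.parse.urlencode({'word': 'hello'})  # a str
data_a = bytes(query, encoding='utf-8')            # the form used in this post
data_b = query.encode('utf-8')                     # equivalent spelling

print(type(query).__name__, type(data_b).__name__)
print(data_a == data_b)
print(len(data_b))  # number of bytes that would be sent as the body

# Constructing a Request does not open a connection.
req = urllib.request.Request('http://httpbin.org/post', data=data_b)
print(req.get_method())
```

The body length here is 10 bytes, which matches the "Content-Length": "10" header echoed back in the output above.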

class bytes([source[, encoding[, errors]]])

Return a new "bytes" object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray – it has the same non-mutating methods and the same indexing and slicing behavior.

Accordingly, constructor arguments are interpreted as for bytearray().
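A few ways of calling the constructor described above, as an illustrative sketch; the variable names are arbitrary.

```python
# From a str: an encoding is required.
from_text = bytes('hello', encoding='utf-8')
# From an iterable of integers in range(256).
from_ints = bytes([104, 101, 108, 108, 111])
# From an integer: that many zero bytes.
zeros = bytes(3)

print(from_text == from_ints)
print(from_text[0])     # indexing yields an int
print(from_text[1:3])   # slicing yields bytes
print(zeros)

# bytes is immutable, so item assignment fails.
try:
    from_text[0] = 72
except TypeError as err:
    print('immutable:', err)

# bytearray is the mutable counterpart.
buf = bytearray(from_text)
buf[0] = 72
print(bytes(buf))
```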


                        
                        
                            