Published: 2019-06-30 16:53:12 | Editor: auto
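This example sends a POST request to 'http://httpbin.org/post'.
Outline:
Import urllib.request
Import urllib.parse
URL-encode the form data, then convert it to UTF-8 bytes: data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
Open the page: response = urllib.request.urlopen('URL', data = data)
Read the response body: html = response.read()
Print it, in one of two ways:
1. Without decode
print(html) # prints the bytes repr: the whole body on one line, with line breaks shown as \n, which is hard to read
2. With decode
print(html.decode()) # prints a str: the body is laid out line by line, like well-formatted code, and is easy to read
Check the result of the request:
a. response.status # the HTTP status code; 200 means the request succeeded (for a 404, page not found, urlopen raises urllib.error.HTTPError instead of returning a response)
b. response.getcode() # returns the same status code as response.status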
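The encoding step in the outline can be checked without touching the network. The following is a minimal sketch; the helper name encode_form is hypothetical and is not part of urllib. It also suggests where the Content-Length of 10 in the sample response further down comes from: the encoded body is 10 bytes long.

```python
import urllib.parse


def encode_form(fields):
    """URL-encode a dict of form fields and return the result as UTF-8 bytes.

    encode_form is a hypothetical helper used only for this illustration.
    """
    query = urllib.parse.urlencode(fields)  # a str such as 'word=hello'
    # Same result as bytes(query, encoding='utf-8').
    return query.encode("utf-8")


body = encode_form({"word": "hello"})
print(body)       # the bytes that would be sent as the POST body
print(len(body))  # number of bytes in the body
```

Calling str.encode("utf-8") and calling bytes(..., encoding="utf-8") are interchangeable here; which one to use is a matter of taste.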
1. The program without decode:
import urllib.request
import urllib.parse
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data = data)
html = response.read()
print(html)
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())
Output: the response body is printed as a single bytes literal (b'...') with line breaks shown as \n escapes, followed by the two separator lines and the status code.
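The difference between printing the raw bytes and printing the decoded text can be reproduced offline. In this sketch the variable raw is a made-up stand-in for what response.read() returns; it is not real httpbin output.

```python
# Hypothetical sample bytes standing in for response.read().
raw = b'{\n  "form": {\n    "word": "hello"\n  }\n}\n'

# Printing bytes shows their repr: everything on one line,
# with each newline displayed as the two characters \ and n.
print(raw)

# decode() uses UTF-8 by default and returns a str,
# so the newlines are printed as real line breaks.
text = raw.decode()
print(text)
```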

2. The program with decode:
import urllib.request
import urllib.parse
data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding = 'utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data = data)
html = response.read()
print(html.decode())
print("------------------------------------------------------------------")
print("------------------------------------------------------------------")
print(response.status)
print(response.getcode())
Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "word": "hello"
  },
  "headers": {
    "Accept-Encoding": "identity",
    "Connection": "close",
    "Content-Length": "10",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Python-urllib/3.4"
  },
  "json": null,
  "origin": "106.14.17.222",
  "url": "http://httpbin.org/post"
}
------------------------------------------------------------------
------------------------------------------------------------------
200
200
Why is the bytes conversion needed?
Because without it, the request fails:
data = urllib.parse.urlencode({'word': 'hello'}) # not converted to bytes
response = urllib.request.urlopen('http://httpbin.org/post', data = data)
html = response.read()
Error message:
Traceback (most recent call last):
File "/usercode/file.py", line 15, in <module>
response = urllib.request.urlopen('http://httpbin.org/post', data = data )
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 453, in open
req = meth(req)
File "/usr/lib/python3.4/urllib/request.py", line 1104, in do_request_
raise TypeError(msg)
TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str.
As this shows, the body of a POST request must be passed to urlopen as bytes, not as a str.
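One way to keep the str-versus-bytes mistake from reaching urlopen is to build the request in a single place that always encodes the body. The sketch below is only an illustration under that assumption; build_post_request is a hypothetical helper name. It constructs a urllib.request.Request without sending it, so it runs offline.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Return a Request whose body is the URL-encoded form as bytes.

    build_post_request is a hypothetical helper, not part of urllib.
    """
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying data is what makes urllib treat the request as a POST.
    return urllib.request.Request(url, data=body)


req = build_post_request("http://httpbin.org/post", {"word": "hello"})
print(req.get_method())  # the HTTP method urllib would use
print(type(req.data))    # the body is bytes, not str
```

The resulting object can be passed to urllib.request.urlopen(req) in place of the URL and data arguments.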
The Python documentation describes bytes as follows:
class bytes([source[, encoding[, errors]]])
Return a new “bytes” object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray: it has the same non-mutating methods and the same indexing and slicing behavior.
Accordingly, constructor arguments are interpreted as for bytearray().
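The properties in that description can be observed directly. This short sketch assumes nothing beyond the built-in bytes and bytearray types; the variable names are only for illustration.

```python
# Two equivalent ways to turn a str into bytes.
b1 = bytes("word=hello", encoding="utf-8")
b2 = "word=hello".encode("utf-8")
print(b1 == b2)

# Indexing a bytes object yields integers in the range 0..255.
first_four = list(b1[:4])
print(first_four)

# bytes is immutable: item assignment raises TypeError.
immutable = False
try:
    b1[0] = 87
except TypeError:
    immutable = True
print(immutable)

# bytearray is the mutable counterpart.
mutable = bytearray(b1)
mutable[0] = 87  # 87 is the code for 'W'
print(mutable)
```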