Published: 2020-10-19 13:05:23 · Editor: admin · Reads: 4760
So far, the data scraped with Scrapy has been written to JSON files; now it needs to be written to MySQL instead.
In items.py there are two main fields:

import scrapy

class CityItem(scrapy.Item):
    name = scrapy.Field()
    url = scrapy.Field()
MySQL server IP: 192.168.0.3
Username: root
Password: abcd@1234

Create the database:
CREATE DATABASE qunar CHARACTER SET utf8 COLLATE utf8_general_ci;
Create the table test:
CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(64) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Edit settings.py and add the connection parameters:
MYSQL_HOST = "192.168.0.3"
MYSQL_PORT = 3306
MYSQL_DBNAME = "qunar"
MYSQL_USER = "root"
MYSQL_PASSWORD = "abcd@1234"
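One step that is easy to miss: the pipeline only runs if it is registered in ITEM_PIPELINES in settings.py. A minimal sketch, assuming the Scrapy project module is named `qunar` (adjust the dotted path to your actual project):

```python
# Register the pipeline so Scrapy actually calls it.
# "qunar.pipelines" is an assumed module path -- replace it with
# your project's real <project>.pipelines module.
ITEM_PIPELINES = {
    "qunar.pipelines.LvyouPipeline": 300,  # lower number = earlier in the chain
}
```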
Edit pipelines.py as follows:
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
import pymysql
from twisted.enterprise import adbapi


# Asynchronous insert pipeline
class LvyouPipeline(object):

    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        """
        Build the database connection pool.
        :param settings: Scrapy settings
        :return: pipeline instance
        """
        # Fixed method name: Scrapy calls it and passes in the settings directly
        adbparams = dict(
            host=settings['MYSQL_HOST'],
            port=settings['MYSQL_PORT'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            password=settings['MYSQL_PASSWORD'],
            cursorclass=pymysql.cursors.DictCursor  # cursor type
        )
        # ConnectionPool: the actual connections are made by pymysql (or MySQLdb)
        dbpool = adbapi.ConnectionPool('pymysql', **adbparams)
        return cls(dbpool)

    def process_item(self, item, spider):
        """
        Use twisted to make the MySQL insert asynchronous: run the
        operation through the connection pool, which returns a Deferred.
        """
        query = self.dbpool.runInteraction(self.do_insert, item)
        # Error handling: failures are delivered to an errback, not a callback
        query.addErrback(self.handle_error)
        return item

    def do_insert(self, cursor, item):
        # No explicit commit needed: runInteraction commits automatically
        insert_sql = """
            insert into test(name, url) VALUES (%s, %s)
        """
        cursor.execute(insert_sql, (item['name'], item['url']))

    def handle_error(self, failure):
        # Log the error
        print(failure)

Note: adjust the INSERT statement to match your own table and columns.
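Stripped of the Twisted machinery, do_insert is an ordinary parameterized INSERT: the values are passed separately from the SQL so the driver escapes them. A minimal stand-alone sketch of the same pattern, using sqlite3 only so it runs without a MySQL server (sqlite uses ? placeholders where pymysql uses %s), with a made-up item for illustration:

```python
import sqlite3

# In-memory stand-in for the MySQL table from this post
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE test ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, url TEXT)"
)

# Hypothetical item, shaped like CityItem (name, url)
item = {"name": "Beijing", "url": "https://travel.qunar.com/p-cs299914"}

# Same parameterized-insert pattern as do_insert: placeholders plus a
# tuple of values, never string formatting, so values are escaped safely
cur.execute(
    "INSERT INTO test(name, url) VALUES (?, ?)",
    (item["name"], item["url"]),
)
conn.commit()

cur.execute("SELECT name, url FROM test")
print(cur.fetchone())  # → ('Beijing', 'https://travel.qunar.com/p-cs299914')
```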
Finally, run the spider, and the data will be written to the database.
Reference:
https://www.cnblogs.com/knighterrant/p/10783634.html