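Logging in to Renren and Scraping the News Feed with Python

Logging in to Renren and scraping its news feed with Python can be broken down into the following steps:

1. Import the requests and BeautifulSoup modules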

import requests
from bs4 import BeautifulSoup

2. Fetch the login page, analyze its HTML structure, and extract the data that needs to be POSTed

login_url = 'http://www.renren.com/ajaxLogin/login'
session = requests.session()
response = session.get('http://www.renren.com/')
soup = BeautifulSoup(response.text, 'html.parser')
form = soup.find('form', attrs={'id': 'loginForm'})
action_url = form['action']
lt = form.find('input', attrs={'name': 'lt'})['value']
execution = form.find('input', attrs={'name': 'execution'})['value']
psp = form.find('input', attrs={'name': 'psp'})['value']  # password encryption scheme
_data = {
    'email': 'your account',
    'password': 'your password',
    'icode': '',
    'origURL': 'http://www.renren.com/home',
    'domain': 'renren.com',
    'key_id': '1',
    'captcha_type': 'web_login',
    'rt': '',
    'execution': execution,
    '_eventId': 'submit',
    'lt': lt
}
response = session.post(action_url, data=_data)
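
The hard-coded lookups above raise a `TypeError` if the form or one of its inputs is missing from the page. As a minimal sketch, assuming the login form still carries `id="loginForm"`, the form's action and named inputs can be collected in one pass. The names `FormInputCollector` and `collect_form` are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup here so the snippet runs without extra dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormInputCollector(HTMLParser):
    """Collect the action and named <input> values of one form (hypothetical helper)."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and attrs.get('id') == self.form_id:
            self.in_form = True
            self.action = attrs.get('action')
        elif tag == 'input' and self.in_form and attrs.get('name'):
            # Inputs without a value attribute are recorded as empty strings.
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form':
            self.in_form = False


def collect_form(html, base_url, form_id='loginForm'):
    """Return (absolute_action_url, fields); raise ValueError if nothing usable is found."""
    parser = FormInputCollector(form_id)
    parser.feed(html)
    if parser.action is None:
        raise ValueError('form {!r} not found or has no action'.format(form_id))
    # urljoin leaves absolute actions untouched and resolves relative ones.
    return urljoin(base_url, parser.action), parser.fields
```

With this in place, `collect_form(response.text, response.url)` would return the URL to post to together with a dict of fields, which can then be updated with the account and password before calling `session.post`.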

3. Send repeated requests to fetch the news feed, then parse the HTML and extract the content you need

for i in range(10):  # fetch ten pages of news-feed items
    url = 'http://www.renren.com/feedretrieve?requestToken=-136 ed017-c5c0-4dd2-a69e-5c17bfdc5afd&start={}&limit=10&publisher=0&_=1510234173952'\
          .format(10*i)
    response = session.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    for item in soup.find_all('div', attrs={'class': 'content'}):
        print(item.text.strip())
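
The `requestToken` and the trailing `_` timestamp in the URL above look like values captured from a single browser session, so a hard-coded copy is unlikely to keep working. A minimal sketch, assuming the endpoint accepts the same query parameters as the URL above: `feed_url` is a hypothetical helper, and the token has to be supplied by the caller (how to obtain it is not covered in this article).

```python
import time
from urllib.parse import urlencode

FEED_ENDPOINT = 'http://www.renren.com/feedretrieve'


def feed_url(request_token, page, limit=10, timestamp_ms=None):
    """Build the feed URL for a zero-based page index (hypothetical helper)."""
    if timestamp_ms is None:
        # Default to the current time in milliseconds, like the value in the original URL.
        timestamp_ms = int(time.time() * 1000)
    params = [
        ('requestToken', request_token),
        ('start', page * limit),
        ('limit', limit),
        ('publisher', 0),
        ('_', timestamp_ms),
    ]
    # urlencode percent-escapes any characters that are not valid in a query string.
    return FEED_ENDPOINT + '?' + urlencode(params)
```

Inside the loop, `session.get(feed_url(token, i))` would then replace the formatted string literal.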

Here are two examples:

Example 1:

If you want to save the fetched news-feed items to a file, you can use the following code:

with open('news.txt', 'w', encoding='utf-8') as f:
    for i in range(10):
        url = 'http://www.renren.com/feedretrieve?requestToken=-136 ed017-c5c0-4dd2-a69e-5c17bfdc5afd&start={}&limit=10&publisher=0&_=1510234173952'\
              .format(10*i)
        response = session.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        for item in soup.find_all('div', attrs={'class': 'content'}):
            f.write(item.text.strip() + '\n')  # write each news-feed item to the file on its own line
        print('Page {} of the news feed saved'.format(i+1))
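
An item's text can itself contain line breaks, in which case appending `'\n'` no longer gives exactly one item per line. A minimal sketch of one way to handle this, collapsing all whitespace runs before writing; `normalize_item` and `save_items` are hypothetical helper names, not part of the original code.

```python
def normalize_item(text):
    """Collapse every run of whitespace (including newlines) into one space."""
    return ' '.join(text.split())


def save_items(items, path):
    """Write non-empty items to path, one per line; return how many were written."""
    written = 0
    with open(path, 'w', encoding='utf-8') as f:
        for text in items:
            line = normalize_item(text)
            if line:  # skip items that contain only whitespace
                f.write(line + '\n')
                written += 1
    return written
```

A caller could gather `item.text` from every page into a list and pass it to `save_items(items, 'news.txt')` once, instead of writing inside the page loop.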

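Example 2:

If you want to store the fetched news-feed items in a database, you can use the following code:

First, install the pymysql module:

pip install pymysql

Then place the following code inside the for loop from step 3, in place of the inner loop that prints each item: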

import pymysql

# Connect to the database
db = pymysql.connect(host='localhost', user='root', password='password', database='test', charset='utf8mb4', cursorclass=pymysql.cursors.DictCursor)
cursor = db.cursor()
for item in soup.find_all('div', attrs={'class': 'content'}):
    # Pass the text as a query parameter so quotes in it cannot break the statement
    sql = "INSERT INTO `news` (`content`) VALUES (%s)"
    cursor.execute(sql, (item.text.strip(),))
db.commit()
db.close()
print('Page {} of the news feed saved to the database'.format(i+1))

This stores the fetched news-feed items in a table named news.
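
Feed text may contain quotes, so values are best passed to the database driver as parameters rather than formatted into the SQL string. The sketch below shows that pattern with the standard library's `sqlite3` as a stand-in for MySQL, so it runs without a server; note that `sqlite3` uses `?` placeholders where pymysql uses `%s`. The `news` table layout and the `store_items` helper are assumptions made for illustration.

```python
import sqlite3


def store_items(conn, items):
    """Insert each item as one row of news; return the number of rows inserted."""
    rows = [(text,) for text in items]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany('INSERT INTO news (content) VALUES (?)', rows)
    return len(rows)


conn = sqlite3.connect(':memory:')  # in-memory database, for demonstration only
conn.execute('CREATE TABLE news (id INTEGER PRIMARY KEY, content TEXT)')
inserted = store_items(conn, ["It's a quote", 'plain text'])
stored = [row[0] for row in conn.execute('SELECT content FROM news ORDER BY id')]
conn.close()
```

Opening one connection before the page loop and inserting each page's items in a batch also avoids reconnecting to the database on every page.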
