Fetching All Articles from a WeChat Official Account with Python

Fetching every article published by an official account breaks down into the following steps:

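Step 1: Get the official account's cookies

Before you can fetch the articles, you need the cookies of a logged-in session. To get them:
1. Open a browser and go to the WeChat Official Accounts Platform (mp.weixin.qq.com).
2. Log in to your own official account.
3. After logging in, press F12 to open the developer tools.
4. Click the "Application" tab and expand the "Cookies" item.
5. Under "Cookies", select the mp.weixin.qq.com entry, find the two cookies wxuin and wxsid, and note their values for later use.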
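If you copy the Cookie request header as a single string rather than noting the two values separately, it has to be split into name/value pairs before use. The following is a minimal sketch, assuming a string in the usual "name=value; name=value" form. parse_cookie_header is a hypothetical helper name, and the values are placeholders.

```python
def parse_cookie_header(raw):
    """Split a raw 'name=value; name=value' Cookie header into a dict."""
    cookies = {}
    for part in raw.split(';'):
        # partition splits on the first '=' only, so '=' inside a value survives
        name, sep, value = part.strip().partition('=')
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder values, not real credentials
cookie = parse_cookie_header('wxuin=1234567; wxsid=abc=def')
print(cookie)
```

The resulting dict has the same shape as the cookie dict used in the request code.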

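Step 2: Send the request with Python

Use Python's requests library to send a request to the account's article-list endpoint, passing along the cookies from Step 1, to retrieve the article data: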

import requests

# Cookie values recorded in Step 1 (placeholders)
cookie = {'wxuin': 'xxxx', 'wxsid': 'xxxx'}
url = 'https://mp.weixin.qq.com/mp/profile_ext'
# count is the page size, offset is where the page starts
params = {'action': 'getmsg', 'count': '10', 'f': 'json', 'offset': '0', 'uin': 'xxx'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
# Let requests build the Cookie header from the dict
response = requests.get(url, headers=headers, params=params, cookies=cookie, timeout=10)
response.raise_for_status()  # fail early on an HTTP error status

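Here, cookie holds the values recorded in Step 1, url is the article-list endpoint, params are the query parameters, and headers are the request headers. Note that none of the parameters shown identifies a particular account; in practice this endpoint is usually also given a __biz parameter for that purpose, which the example here leaves out.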
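To make the roles of params and the cookies concrete, here is a small sketch using only the standard library that shows what the final URL and Cookie header look like. build_request and BASE_URL are hypothetical names introduced for illustration, and all values are the same placeholders as above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://mp.weixin.qq.com/mp/profile_ext'


def build_request(cookie, offset=0, count=10, uin='xxx'):
    """Return the full URL and Cookie header for one page of results."""
    params = {'action': 'getmsg', 'count': str(count), 'f': 'json',
              'offset': str(offset), 'uin': uin}
    full_url = BASE_URL + '?' + urlencode(params)
    # Cookie pairs are conventionally separated by '; '
    cookie_header = '; '.join(f'{k}={v}' for k, v in cookie.items())
    return full_url, cookie_header


full_url, cookie_header = build_request({'wxuin': 'xxxx', 'wxsid': 'xxxx'})
print(full_url)
print(cookie_header)
```

Changing offset while keeping count fixed is what moves the request from one page of results to the next.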

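Step 3: Parse the response data

Use the json library to parse the data returned by the request in Step 2 and extract each article's title, link, and other information: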

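import json

json_data = json.loads(response.text)
# general_msg_list is itself a JSON string, so decode it a second time
article_list = json.loads(json_data['general_msg_list'])['list']
for article in article_list:
    # Skip entries without article metadata instead of raising KeyError
    info = article.get('app_msg_ext_info')
    if not info:
        continue
    title = info['title']
    article_url = info['content_url']
    print(title, article_url)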

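Here, json.loads() converts the JSON string in the response into a Python dict. The general_msg_list field is itself a JSON string, so it is decoded a second time to obtain the article list. The loop then walks that list and extracts the title and link of each article.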
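The double decoding can be tried without a network request. The sketch below runs the same parsing on a hand-made sample shaped like the response described above. extract_articles is a hypothetical helper name, and the HTML-unescaping of the link is an assumption about how the links may be delivered, not something the original code does.

```python
import html
import json


def extract_articles(response_text):
    """Return (title, url) pairs from one JSON response body."""
    outer = json.loads(response_text)
    # general_msg_list is a JSON string nested inside the outer JSON
    inner = json.loads(outer.get('general_msg_list', '{"list": []}'))
    results = []
    for entry in inner.get('list', []):
        info = entry.get('app_msg_ext_info')
        if not info:
            continue  # entry carries no article metadata
        # Assumption: links may arrive HTML-escaped (e.g. '&amp;')
        results.append((info['title'], html.unescape(info['content_url'])))
    return results


# Hand-made sample, not real account data
sample_inner = {'list': [
    {'app_msg_ext_info': {'title': 'Hello',
                          'content_url': 'http://example.com/s?a=1&amp;b=2'}},
    {'comm_msg_info': {'id': 1}},  # no app_msg_ext_info: skipped
]}
sample = json.dumps({'ret': 0, 'general_msg_list': json.dumps(sample_inner)})
print(extract_articles(sample))
```

Returning a list instead of printing inside the loop makes the result easier to filter or save afterwards.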

Example 1: Fetch the article links of the official account "伊索" (as written, one page of up to 10 entries starting at offset 0)

import requests
import json

# Cookie values recorded in Step 1 (placeholders)
cookie = {'wxuin': 'xxxx', 'wxsid': 'xxxx'}
url = 'https://mp.weixin.qq.com/mp/profile_ext'
params = {'action': 'getmsg', 'count': '10', 'f': 'json', 'offset': '0', 'uin': 'xxx'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
response = requests.get(url, headers=headers, params=params, cookies=cookie, timeout=10)
response.raise_for_status()  # fail early on an HTTP error status

json_data = json.loads(response.text)
# general_msg_list is itself a JSON string, so decode it a second time
article_list = json.loads(json_data['general_msg_list'])['list']
for article in article_list:
    # Skip entries without article metadata instead of raising KeyError
    info = article.get('app_msg_ext_info')
    if not info:
        continue
    title = info['title']
    article_url = info['content_url']
    print(title, article_url)
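A single request with offset 0 and count 10 returns only one page, so collecting every article means repeating the request with a growing offset. The following is a rough sketch of such a loop, not a definitive implementation. It assumes the response carries can_msg_continue and next_offset fields to signal further pages; these field names are assumptions not shown in the code above. fetch_all and fake_fetch_page are hypothetical names, and the fake function stands in for the real HTTP call so the loop can run offline.

```python
import json
import time


def fetch_all(fetch_page, count=10, delay=0.0, max_pages=1000):
    """Collect entries page by page until no further page is reported.

    fetch_page(offset, count) must return the decoded outer JSON dict.
    """
    entries = []
    offset = 0
    for _ in range(max_pages):
        data = fetch_page(offset, count)
        page = json.loads(data['general_msg_list'])['list']
        entries.extend(page)
        # Assumed field names: can_msg_continue, next_offset
        if not data.get('can_msg_continue') or not page:
            break
        offset = data.get('next_offset', offset + count)
        time.sleep(delay)  # pause between requests
    return entries


def fake_fetch_page(offset, count):
    """Stand-in for the real request: serves 25 fake entries."""
    total = 25
    items = [{'app_msg_ext_info': {'title': f'post {i}', 'content_url': ''}}
             for i in range(offset, min(offset + count, total))]
    end = offset + len(items)
    return {'general_msg_list': json.dumps({'list': items}),
            'can_msg_continue': 1 if end < total else 0,
            'next_offset': end}


all_entries = fetch_all(fake_fetch_page)
print(len(all_entries))
```

With a real account, fetch_page would wrap the requests.get call and pass the offset through params. A non-zero delay keeps the request rate modest.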

Example 2: Fetch links to articles recently published by the official account "机器之心"

import requests
import json

# Cookie values recorded in Step 1 (placeholders)
cookie = {'wxuin': 'xxxx', 'wxsid': 'xxxx'}
url = 'https://mp.weixin.qq.com/mp/profile_ext'
params = {'action': 'getmsg', 'count': '10', 'f': 'json', 'offset': '0', 'uin': 'xxx'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
response = requests.get(url, headers=headers, params=params, cookies=cookie, timeout=10)
response.raise_for_status()  # fail early on an HTTP error status

json_data = json.loads(response.text)
# general_msg_list is itself a JSON string, so decode it a second time
article_list = json.loads(json_data['general_msg_list'])['list']
for article in article_list:
    info = article.get('app_msg_ext_info')
    # Keep only articles whose title contains the keyword
    if info and '机器之心' in info['title']:
        title = info['title']
        article_url = info['content_url']
        print(title, article_url)

In Example 2, we iterate over the returned article entries and use an if statement to keep only those whose title contains the keyword "机器之心". Because offset is 0 and count is 10, the entries examined are those on the first page of results.
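Example 2 is described as fetching recently published articles, while its filter works on the title. If recency is what matters, one option is to filter on the publication time instead. The sketch below assumes each entry carries a comm_msg_info dict whose datetime field is a Unix timestamp in seconds; that field is an assumption and does not appear in the code above. recent_articles is a hypothetical helper name, and the sample data is hand-made.

```python
import time


def recent_articles(entries, days=7, now=None):
    """Return (title, url) for entries published within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    results = []
    for entry in entries:
        info = entry.get('app_msg_ext_info')
        # Assumption: comm_msg_info.datetime is a Unix timestamp in seconds
        published = entry.get('comm_msg_info', {}).get('datetime', 0)
        if info and published >= cutoff:
            results.append((info['title'], info['content_url']))
    return results


# Hand-made sample with a fixed "now" so the result is reproducible
NOW = 1_700_000_000
sample_entries = [
    {'comm_msg_info': {'datetime': NOW - 86400},
     'app_msg_ext_info': {'title': 'new', 'content_url': 'u1'}},
    {'comm_msg_info': {'datetime': NOW - 30 * 86400},
     'app_msg_ext_info': {'title': 'old', 'content_url': 'u2'}},
]
print(recent_articles(sample_entries, days=7, now=NOW))
```

Passing now explicitly is only for reproducibility; in normal use it can be left out so the current time is used.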

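Unless otherwise noted, articles on this site are original. If you repost this article, please credit the source: Fetching All Articles from a WeChat Official Account with Python - Python技术站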
