.NetCore承载系统
.NetCore的承载系统, 可以将长时间运行的服务承载于托管进程中, AspNetCore应用其实就是一个长时间运行的服务, 启动AspNetCore应用后, 它就会监听网络请求, 也就是开启了一个监听器, 监听器会将网络请求传递给管道进行处理, 处理后得到Http响应返回
有很多场景都会有服务承载的需求, 比如这篇博文要做的, 定时抓取华为论坛的文章点赞数
爬取文章点赞数
分析
比如这个链接 https://developer.huawei.com/consumer/cn/forum/topicview?tid=0201308791792470245&fid=23 , 点进去不难发现这是用angular做的一个页面, 既然是Angular, 那说明前后端分离了, 浏览器F12查看网络请求
找到对应api请求方法:
POST https://developer.huawei.com/consumer/cn/forum/mid/partnerforumservice/v1/open/getTopicDetail? HTTP/1.1
Host: developer.huawei.com
Content-Type: application/json
Content-Length: 33
{"topicId":"0201302923811480141"}
这里经过我的测试, Content-Type
和Content-Length
必须上面那样的值, 还有body, 你多一个空格请求都会失败
使用HttpClient请求数据
直接看代码吧, 这里使用了依赖注入来注入HttpClientFactory
, 还可以使用强类型的HttpClient
, 具体可以看文档和dudu
博客的这篇博文
工厂参观记:.NET Core 中 HttpClientFactory 如何解决 HttpClient 臭名昭著的问题
private readonly IHttpClientFactory _httpClientFactory;
public async Task<int> Crawl(string link)
{
using (var httpClient = _httpClientFactory.CreateClient())
{
var uri = new Uri(link);
uri.TryReadQueryAsJson(out var queryParams);
var topicId = queryParams["tid"].ToString();
int likeCount = -1;
if (!string.IsNullOrEmpty(topicId))
{
var body = JsonConvert.SerializeObject(
new { topicId },
Formatting.None);
uri = new Uri(_baseUrl);
var jsonContentType = "application/json";
var requestMessage = new HttpRequestMessage
{
RequestUri = uri,
Headers =
{
{ "Host", uri.Host }
},
Method = HttpMethod.Post,
Content = new StringContent(body)
};
requestMessage.Content.Headers.ContentType = new MediaTypeWithQualityHeaderValue(jsonContentType);
requestMessage.Content.Headers.ContentLength = body.Length;
var response = await httpClient.SendAsync(requestMessage);
if (response.StatusCode == HttpStatusCode.OK)
{
dynamic data = await response.Content.ReadAsAsync<dynamic>();
likeCount = data.result.likes;
}
}
return likeCount;
}
}
这里有更简洁的的写法, 使用_httpClient.PostAsJsonAsync()
, 但是考虑到可能需要自定义Content-Type这些请求头, 所以先这样写;
配置承载系统
class Program
{
static void Main()
{
new HostBuilder()
.ConfigureServices(services =>
{
services.AddHttpClient();
services.AddHostedService<LikeCountCrawler>();
})
.Build()
.Run();
}
}
LikeCountCrawler
实现了IHostedService
接口
IHostedService
接口
public interface IHostedService
{
/// <summary>
/// Triggered when the application host is ready to start the service.
/// </summary>
/// <param name="cancellationToken">Indicates that the start process has been aborted.</param>
Task StartAsync(CancellationToken cancellationToken);
/// <summary>
/// Triggered when the application host is performing a graceful shutdown.
/// </summary>
/// <param name="cancellationToken">Indicates that the shutdown process should no longer be graceful.</param>
Task StopAsync(CancellationToken cancellationToken);
}
LikeCountCrawler
在StartAsync
方法中, 设置开启了一个定时器, 定时器每次溢出, 都执行一次爬虫逻辑
private readonly Timer _timer = new Timer();
private readonly IEnumerable<string> _links = new string[]
{
"https://developer.huawei.com/consumer/cn/forum/topicview?tid=0201308791792470245&fid=23",
"https://developer.huawei.com/consumer/cn/forum/topicview?tid=0201303654965850166&fid=18",
"https://developer.huawei.com/consumer/cn/forum/topicview?tid=0201294272503450453&fid=24",
"https://developer.huawei.com/consumer/cn/forum/topicview?tid=0201294189025490019&fid=17"
};
private readonly string _baseUrl = "https://developer.huawei.com/consumer/cn/forum/mid/partnerforumservice/v1/open/getTopicDetail";
...
public Task StartAsync(CancellationToken cancellationToken)
{
_timer.Interval = 5 * 60 * 1000;
_timer.Elapsed += OnTimer;
_timer.AutoReset = true;
_timer.Enabled = true;
_timer.Start();
OnTimer(null, null);
return Task.CompletedTask;
}
public async Task Crawl(IEnumerable<string> links)
{
await Task.Run(() =>
{
Parallel.ForEach(links, async link =>
{
Console.WriteLine($"Crawling link:{link}, ThreadId:{Thread.CurrentThread.ManagedThreadId}");
var likeCount = await Crawl(link);
Console.WriteLine($"Succeed crawling likecount - {likeCount}, ThreadId:{Thread.CurrentThread.ManagedThreadId}");
});
});
}
private void OnTimer(object sender, ElapsedEventArgs args)
{
_ = Crawl(_links);
}
...
运行效果:
本站文章如无特殊说明,均为本站原创,如若转载,请注明出处:NetCore控制台程序-使用HostService和HttpClient实现简单的定时爬虫 - Python技术站