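A Hands-On Tutorial: Batch-Scraping Free Proxies in C# and Verifying That They Work
A proxy server can help work around certain access problems, such as reaching a site that expects an IP address from another region. Free proxies are unreliable, though, and the stable ones usually cost money. This tutorial shows how to scrape a batch of free proxies with C# and then check which of them actually work.
1. Pick a free proxy site
Before starting, choose a free proxy listing site you consider reliable. This tutorial uses the Xici proxy site as its example: https://www.xicidaili.com/. The URL and the table layout used below describe the site as it was when this was written; free listing sites change or disappear often, so check both before running the code.
2. Parse the HTML with HtmlAgilityPack
The HtmlAgilityPack library makes it quick and simple to parse HTML in C#. First, install the package from the NuGet Package Manager Console: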
Install-Package HtmlAgilityPack
Then download the page and parse it with the following code:
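// Requires the HtmlAgilityPack NuGet package
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// URL to scrape
string url = "https://www.xicidaili.com/";

// Fetch the page with HttpWebRequest
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3";
string html;
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    html = reader.ReadToEnd();
}

// Parse the HTML with HtmlAgilityPack
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// Locate the proxy table; SelectSingleNode and SelectNodes return null when nothing matches
var table = doc.DocumentNode.SelectSingleNode("//table[@id='ip_list']");
var rows = table?.SelectNodes(".//tr");
if (rows == null)
{
    Console.WriteLine("Proxy table not found; the page layout may have changed.");
    return;
}
foreach (var row in rows)
{
    var cells = row.SelectNodes(".//td");
    // cells[1] holds the IP and cells[2] the port, so at least three cells are required
    if (cells != null && cells.Count > 2)
    {
        Console.WriteLine(cells[1].InnerText.Trim() + ":" + cells[2].InnerText.Trim());
    }
}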
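The row-extraction logic can be tried without touching the network by feeding HtmlAgilityPack an inline HTML string. The sketch below is only an illustration: the `ProxyTableParser` class, its `Parse` method, and the sample markup (which uses documentation-range IP addresses) are assumptions made for this example, not part of the original code. It assumes the same column order as above, with the IP in the second cell and the port in the third.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ProxyTableParser
{
    // Hypothetical helper: returns "ip:port" strings found in the table with id 'ip_list'.
    public static List<string> Parse(string html)
    {
        var result = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@id='ip_list']//tr");
        if (rows == null)
        {
            return result;
        }

        foreach (var row in rows)
        {
            var cells = row.SelectNodes("./td");
            if (cells == null || cells.Count < 3)
            {
                continue; // header rows contain <th> cells only
            }

            // Decode HTML entities and strip surrounding whitespace.
            string ip = HtmlEntity.DeEntitize(cells[1].InnerText).Trim();
            string portText = HtmlEntity.DeEntitize(cells[2].InnerText).Trim();

            // Keep the row only if the port is a number in the valid TCP range.
            if (ip.Length > 0 && int.TryParse(portText, out int port) && port >= 1 && port <= 65535)
            {
                result.Add(ip + ":" + port);
            }
        }
        return result;
    }
}

public static class ParserDemo
{
    public static void Main()
    {
        // Made-up sample markup: one header row, one valid row, one row with a bad port.
        const string sample =
            "<table id='ip_list'>" +
            "<tr><th>Country</th><th>IP</th><th>Port</th></tr>" +
            "<tr><td></td><td>203.0.113.10</td><td> 8080 </td></tr>" +
            "<tr><td></td><td>198.51.100.7</td><td>n/a</td></tr>" +
            "</table>";

        foreach (string entry in ProxyTableParser.Parse(sample))
        {
            Console.WriteLine(entry);
        }
    }
}
```

Keeping the parsing in a method that takes a string makes it easy to re-check the XPath against a saved copy of the page whenever the site layout changes.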
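3. Verify that the proxies work
Once you have the proxy list, each entry needs to be checked. The example below tests HTTP proxies by sending a request through each one:
foreach (var row in rows)
{
    var cells = row.SelectNodes(".//td");
    if (cells == null || cells.Count <= 2)
    {
        continue; // header rows have no <td> cells
    }

    // Extract the proxy IP and port; skip rows whose port is not a number
    string ip = cells[1].InnerText.Trim();
    if (!int.TryParse(cells[2].InnerText.Trim(), out int port))
    {
        continue;
    }

    // Create the HTTP proxy
    WebProxy proxy = new WebProxy(ip, port);

    // Request Baidu through the proxy; if the request succeeds, the proxy is usable.
    // The target is HTTPS, so the proxy must also support CONNECT tunnelling.
    try
    {
        HttpWebRequest checkRequest = (HttpWebRequest)WebRequest.Create("https://www.baidu.com/");
        checkRequest.Proxy = proxy;
        checkRequest.Timeout = 5000; // milliseconds
        checkRequest.Method = "GET";
        // Dispose the response so the underlying connection is released
        using (HttpWebResponse checkResponse = (HttpWebResponse)checkRequest.GetResponse())
        {
            Console.WriteLine("Working proxy: " + ip + ":" + port);
        }
    }
    catch (Exception)
    {
        Console.WriteLine("Dead proxy: " + ip + ":" + port);
    }
}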
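Checking proxies one after another can be slow, because every dead proxy costs a full timeout. As a possible alternative, the sketch below runs the same kind of check concurrently with `HttpClient`, limited by a `SemaphoreSlim`. It is a swapped-in technique rather than the approach used in this tutorial, and the `ProxyChecker` class, its method names, and the `maxParallel` parameter are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ProxyChecker
{
    // Hypothetical helper: true if a GET through the proxy returns a success
    // status code before the timeout expires.
    public static async Task<bool> IsAliveAsync(string ip, int port, string testUrl, TimeSpan timeout)
    {
        try
        {
            var handler = new HttpClientHandler
            {
                Proxy = new WebProxy(ip, port),
                UseProxy = true
            };

            // One client per proxy, because the proxy is fixed on the handler.
            // Disposing the client also disposes the handler.
            using (var client = new HttpClient(handler) { Timeout = timeout })
            using (var response = await client.GetAsync(testUrl, HttpCompletionOption.ResponseHeadersRead))
            {
                return response.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException)
        {
            return false; // connection refused, proxy error, TLS failure, ...
        }
        catch (OperationCanceledException)
        {
            return false; // HttpClient signals a timeout by cancelling the task
        }
        catch (UriFormatException)
        {
            return false; // the scraped host text was not a valid address
        }
    }

    // Hypothetical helper: checks all candidates with at most maxParallel
    // requests in flight and returns the working ones as "ip:port" strings.
    public static async Task<List<string>> FilterAliveAsync(
        IEnumerable<(string Ip, int Port)> candidates, string testUrl, int maxParallel)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            var tasks = candidates.Select(async c =>
            {
                await gate.WaitAsync();
                try
                {
                    bool ok = await IsAliveAsync(c.Ip, c.Port, testUrl, TimeSpan.FromSeconds(5));
                    return ok ? c.Ip + ":" + c.Port : null;
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            string[] results = await Task.WhenAll(tasks);
            return results.Where(r => r != null).ToList();
        }
    }
}
```

A call such as `await ProxyChecker.FilterAliveAsync(candidates, "https://www.baidu.com/", 20)` would then return only the proxies that answered within five seconds. Note that `WebRequest.Create` is marked obsolete in .NET 6 and later, which is another reason to consider `HttpClient` on newer runtimes.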
Example 1: Scrape proxies and write the working ones to a file
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

namespace GetProxy
{
    class Program
    {
        static void Main(string[] args)
        {
            string url = "https://www.xicidaili.com/";
            string fileName = "proxy.txt";

            // Fetch the page with HttpWebRequest
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "GET";
            request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3";
            string html;
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
            {
                html = reader.ReadToEnd();
            }

            // Parse the HTML with HtmlAgilityPack
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Locate the proxy table; both calls return null when nothing matches
            var table = doc.DocumentNode.SelectSingleNode("//table[@id='ip_list']");
            var rows = table?.SelectNodes(".//tr");
            if (rows == null)
            {
                Console.WriteLine("Proxy table not found; the page layout may have changed.");
                return;
            }

            List<string> proxyList = new List<string>();
            foreach (var row in rows)
            {
                var cells = row.SelectNodes(".//td");
                // cells[1] is the IP and cells[2] the port, so three cells are required
                if (cells == null || cells.Count <= 2)
                {
                    continue;
                }

                string ip = cells[1].InnerText.Trim();
                if (!int.TryParse(cells[2].InnerText.Trim(), out int port))
                {
                    continue; // skip rows whose port is not a number
                }

                // Create the HTTP proxy
                WebProxy proxy = new WebProxy(ip, port);

                // Request Baidu through the proxy; if the request succeeds, the proxy is usable
                try
                {
                    HttpWebRequest request2 = (HttpWebRequest)WebRequest.Create("https://www.baidu.com/");
                    request2.Proxy = proxy;
                    request2.Timeout = 5000; // milliseconds
                    request2.Method = "GET";
                    using (HttpWebResponse response2 = (HttpWebResponse)request2.GetResponse())
                    {
                        Console.WriteLine("Working proxy: " + ip + ":" + port);
                        // Add it to the list of working proxies
                        proxyList.Add(ip + ":" + port);
                    }
                }
                catch (Exception)
                {
                    Console.WriteLine("Dead proxy: " + ip + ":" + port);
                }
            }

            // Write the working proxies to the file, one "ip:port" per line
            File.WriteAllLines(fileName, proxyList);
            Console.WriteLine("Proxies saved to file: " + fileName);
            Console.ReadKey();
        }
    }
}
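The file written by Example 1 contains one `ip:port` entry per line, so a later run can load it back instead of scraping again. The following is a minimal sketch under that assumption; the `ProxyFile` class and its `TryParseLine` and `Load` methods are hypothetical names for illustration, and the parser deliberately accepts IPv4-style entries only, since an IPv6 address contains extra colons.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class ProxyFile
{
    // Hypothetical helper: parses one "ip:port" line and returns false on malformed input.
    public static bool TryParseLine(string line, out WebProxy proxy)
    {
        proxy = null;
        if (string.IsNullOrWhiteSpace(line))
        {
            return false;
        }

        // Exactly one colon is expected, which rules out IPv6 literals.
        string[] parts = line.Trim().Split(':');
        if (parts.Length != 2)
        {
            return false;
        }

        if (!IPAddress.TryParse(parts[0], out IPAddress address))
        {
            return false;
        }

        if (!int.TryParse(parts[1], out int port) || port < 1 || port > 65535)
        {
            return false;
        }

        proxy = new WebProxy(address.ToString(), port);
        return true;
    }

    // Hypothetical helper: loads every well-formed entry from the file and skips the rest.
    public static List<WebProxy> Load(string path)
    {
        var proxies = new List<WebProxy>();
        foreach (string line in File.ReadLines(path))
        {
            if (TryParseLine(line, out WebProxy proxy))
            {
                proxies.Add(proxy);
            }
        }
        return proxies;
    }
}
```

Calling `ProxyFile.Load("proxy.txt")` would return `WebProxy` objects that can be assigned to `request.Proxy` directly. Free proxies tend to go offline quickly, so it is worth re-validating entries loaded from an old file before relying on them.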
Example 2: Scrape proxies and use them for HTTP requests
using HtmlAgilityPack;
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

namespace GetProxy
{
    class Program
    {
        static void Main(string[] args)
        {
            string url = "https://www.xicidaili.com/";

            // Fetch the page with HttpWebRequest
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "GET";
            request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3";
            string html;
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
            {
                html = reader.ReadToEnd();
            }

            // Parse the HTML with HtmlAgilityPack
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Locate the proxy table; both calls return null when nothing matches
            var table = doc.DocumentNode.SelectSingleNode("//table[@id='ip_list']");
            var rows = table?.SelectNodes(".//tr");
            if (rows == null)
            {
                Console.WriteLine("Proxy table not found; the page layout may have changed.");
                return;
            }

            foreach (var row in rows)
            {
                var cells = row.SelectNodes(".//td");
                // cells[1] is the IP and cells[2] the port, so three cells are required
                if (cells == null || cells.Count <= 2)
                {
                    continue;
                }

                string ip = cells[1].InnerText.Trim();
                if (!int.TryParse(cells[2].InnerText.Trim(), out int port))
                {
                    continue; // skip rows whose port is not a number
                }

                // Create the HTTP proxy
                WebProxy proxy = new WebProxy(ip, port);

                // Request Baidu through the proxy and print the response body
                try
                {
                    HttpWebRequest request2 = (HttpWebRequest)WebRequest.Create("https://www.baidu.com/");
                    request2.Proxy = proxy;
                    request2.Timeout = 5000; // milliseconds
                    request2.Method = "GET";
                    using (HttpWebResponse response2 = (HttpWebResponse)request2.GetResponse())
                    using (var reader2 = new StreamReader(response2.GetResponseStream(), Encoding.UTF8))
                    {
                        Console.WriteLine("Reached Baidu through proxy " + ip + ":" + port);
                        Console.WriteLine(reader2.ReadToEnd());
                    }
                }
                catch (Exception)
                {
                    Console.WriteLine("Failed to reach Baidu through proxy " + ip + ":" + port);
                }
            }
            Console.ReadKey();
        }
    }
}
The two examples above save the working proxies to a file and fetch Baidu through each proxy, respectively. Adapt and extend them to fit your own needs.