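Apache HttpClient is an open-source HTTP client library commonly used in Java applications to send HTTP requests and handle server responses. One common use is fetching the content of a web page. This article walks through two ways to do that with HttpClient.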
Method 1: Fetching a web page with HttpGet
The steps for fetching a page with an HTTP GET request are:
1. Add the dependency
First, add the HttpClient dependency. Using Maven as an example:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
2. Create the HttpClient object
CloseableHttpClient httpClient = HttpClients.createDefault();
3. Create the HttpGet object
HttpGet httpGet = new HttpGet("http://www.example.com");
4. Send the request and get the response
CloseableHttpResponse response = httpClient.execute(httpGet);
5. Parse the response and read the page content
HttpEntity entity = response.getEntity();
String content = EntityUtils.toString(entity, "UTF-8");
The content variable now holds the body of the web page as a string.
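In step 5, the "UTF-8" argument to EntityUtils.toString acts as a default: as far as the 4.5 Javadoc describes it, it applies when the entity itself does not declare a charset. The sketch below illustrates that fallback idea using only the JDK. It is not the library's implementation, and the class and method names (CharsetSketch, charsetOf) are hypothetical names chosen for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: picks a charset from a Content-Type header value,
// falling back to a caller-supplied default.
final class CharsetSketch {

    // Returns the charset named in the header value, or the fallback when the
    // header is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" parameter name.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", utf8));
        System.out.println(charsetOf("text/html", utf8));
    }
}
```

If a fetched page shows garbled characters, the server's declared charset (or the lack of one) is usually the first thing to check.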
Here is a complete example:
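import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpGetExample {
    public static void main(String[] args) {
        HttpGet httpGet = new HttpGet("http://www.example.com");
        // try-with-resources closes both the response and the client when done
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();
            String content = EntityUtils.toString(entity, "UTF-8");
            System.out.println(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}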
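The example above prints the body whatever the status code is, so an error page would be treated like real content. In HttpClient 4.5 the status code is available through response.getStatusLine().getStatusCode(). The sketch below shows one way to guard on it, assuming that only 2xx responses count as success. StatusCheckSketch, isSuccess, and bodyOrThrow are hypothetical names for this illustration, and the code is plain JDK so it runs without the library.

```java
// Illustration only: decides whether a response body should be used,
// based on the HTTP status code.
final class StatusCheckSketch {

    // True for status codes in the 2xx range.
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    // Returns the body for a 2xx status, otherwise throws.
    static String bodyOrThrow(int statusCode, String body) {
        if (!isSuccess(statusCode)) {
            throw new IllegalStateException("Unexpected HTTP status: " + statusCode);
        }
        return body;
    }

    public static void main(String[] args) {
        System.out.println(bodyOrThrow(200, "<html>ok</html>"));
        System.out.println(isSuccess(404));
    }
}
```

In real code, the first argument would come from response.getStatusLine().getStatusCode() and the second from EntityUtils.toString.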
Method 2: Fetching a web page with HttpPost
The steps for fetching a page with an HTTP POST request are:
1. Add the dependency
First, add the HttpClient dependency. Using Maven as an example:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
2. Create the HttpClient object
CloseableHttpClient httpClient = HttpClients.createDefault();
3. Create the HttpPost object
HttpPost httpPost = new HttpPost("http://www.example.com");
4. Set the POST parameters
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("username", "example"));
params.add(new BasicNameValuePair("password", "password"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(params, "UTF-8");
httpPost.setEntity(entity);
5. Send the request and get the response
CloseableHttpResponse response = httpClient.execute(httpPost);
6. Parse the response and read the page content
HttpEntity resEntity = response.getEntity();
String content = EntityUtils.toString(resEntity, "UTF-8");
The content variable now holds the body of the response as a string.
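Step 4 wraps the parameters in a UrlEncodedFormEntity, which sends them as an application/x-www-form-urlencoded request body. To make that wire format concrete, here is a minimal JDK-only sketch that builds a comparable body with java.net.URLEncoder. It approximates what the entity produces rather than reproducing HttpClient's own encoder, and FormBodySketch and its methods are hypothetical names for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: builds a form-urlencoded body from name/value pairs.
final class FormBodySketch {

    // Percent-encodes one name or value as UTF-8 form data.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    // Joins name=value pairs with '&', in the map's iteration order.
    static String encode(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(p.getKey())).append('=').append(enc(p.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, like the List in step 4.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("username", "example");
        params.put("password", "password");
        System.out.println(encode(params)); // username=example&password=password
    }
}
```

With this sketch, a value such as "p@ss word" would be sent as "p%40ss+word", which is why parameters should go through an encoder instead of being concatenated by hand.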
Here is a complete example:
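import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class HttpPostExample {
    public static void main(String[] args) {
        HttpPost httpPost = new HttpPost("http://www.example.com");
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        params.add(new BasicNameValuePair("username", "example"));
        params.add(new BasicNameValuePair("password", "password"));
        // try-with-resources closes both the response and the client when done
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            httpPost.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));
            try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
                HttpEntity resEntity = response.getEntity();
                String content = EntityUtils.toString(resEntity, "UTF-8");
                System.out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}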
Those are the two ways to fetch a web page with HttpClient.