Two Ways to Fetch Web Pages with HttpClient

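Apache HttpClient is an open-source HTTP client library, commonly used in Java applications to send HTTP requests and handle server responses. One common use is fetching the content of web pages. This article walks through two ways of doing that: with a GET request and with a POST request.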

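Method 1: Fetching a web page with HttpGet

These are the steps for fetching page content with an HTTP GET request:

1. Add the dependency

First, add the HttpClient dependency to the project. Maven is used here as the example: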

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.13</version>
</dependency>

2. Create the HttpClient object

CloseableHttpClient httpClient = HttpClients.createDefault();

3. Create the HttpGet object

HttpGet httpGet = new HttpGet("http://www.example.com");

4. Send the request and get the response

CloseableHttpResponse response = httpClient.execute(httpGet);

5. Parse the response and get the page content

HttpEntity entity = response.getEntity();
String content = EntityUtils.toString(entity, "UTF-8");

This gives us the content of the page as a string.
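The "UTF-8" argument passed to EntityUtils.toString in step 5 acts as a default: it applies when the response's Content-Type header does not name a charset. The sketch below imitates that decision using only the JDK, so it runs without HttpClient on the classpath. The class and method names are hypothetical, and the parsing is deliberately simplified; it is an illustration of the fallback idea, not HttpClient's actual implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackSketch {

    /**
     * Returns the charset named in a Content-Type header value, or the
     * fallback when the header is missing, has no charset parameter,
     * or names a charset this JVM does not support.
     */
    public static Charset resolveCharset(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            // Case-insensitive check for a "charset=" prefix
            if (part.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = part.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(resolveCharset("text/html", StandardCharsets.UTF_8));
    }
}
```

In practice this means a page served with an explicit charset is decoded with that charset, and the hard-coded "UTF-8" only matters for servers that omit it.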

Here is a complete example:

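import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

public class HttpGetExample {
    public static void main(String[] args) {
        HttpGet httpGet = new HttpGet("http://www.example.com");
        // try-with-resources closes the response and the client, releasing the connection
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();
            // Read the response body into a string
            String content = EntityUtils.toString(entity, "UTF-8");
            System.out.println(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}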

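Method 2: Fetching a web page with HttpPost

These are the steps for fetching page content with an HTTP POST request:

1. Add the dependency

As in Method 1, add the HttpClient dependency to the project. Maven is used here as the example: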

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.13</version>
</dependency>

2. Create the HttpClient object

CloseableHttpClient httpClient = HttpClients.createDefault();

3. Create the HttpPost object

HttpPost httpPost = new HttpPost("http://www.example.com");

4. Set the POST parameters

List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("username", "example"));
params.add(new BasicNameValuePair("password", "password"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(params, "UTF-8");
httpPost.setEntity(entity);
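To make it clearer what step 4 puts on the wire, the following JDK-only sketch builds the same kind of application/x-www-form-urlencoded body by hand with java.net.URLEncoder. The class and method names are hypothetical, and the output is only meant to approximate what UrlEncodedFormEntity produces for simple parameters like these.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodySketch {

    /** Joins name=value pairs with '&', percent-encoding each name and value. */
    public static String encodeForm(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(entry.getKey()))
                .append('=')
                .append(encode(entry.getValue()));
        }
        return body.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("username", "example");
        params.put("password", "password");
        System.out.println(encodeForm(params)); // prints username=example&password=password
    }
}
```

Unlike a GET request, where parameters would travel in the URL, this string is sent as the request body.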

5. Send the request and get the response

CloseableHttpResponse response = httpClient.execute(httpPost);

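6. Parse the response and get the page content

// Named resEntity because "entity" already holds the request body from step 4
HttpEntity resEntity = response.getEntity();
String content = EntityUtils.toString(resEntity, "UTF-8");

This gives us the content of the page as a string.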

Here is a complete example:

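import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HttpPostExample {
    public static void main(String[] args) {
        HttpPost httpPost = new HttpPost("http://www.example.com");
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("username", "example"));
        params.add(new BasicNameValuePair("password", "password"));
        // try-with-resources closes the client when the block exits
        try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
            // Send the parameters as a form-encoded request body
            httpPost.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));
            try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
                String content = EntityUtils.toString(response.getEntity(), "UTF-8");
                System.out.println(content);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}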

Those are the two ways to fetch web pages with HttpClient.

Unless otherwise noted, articles on this site are original. If you repost, please credit the source: Two Ways to Fetch Web Pages with HttpClient - Python技术站
