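Working with the HDFS file system from Java usually involves the following steps:
- Connect to HDFS
The static get() method of the FileSystem class returns an instance of the HDFS file system: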
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://localhost:9000");
FileSystem hdfs = FileSystem.get(conf);
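The value passed for fs.defaultFS is a URI: its scheme selects the file system implementation, and its authority names the NameNode host and port. As a small illustration, the sketch below takes such a value apart with the standard java.net.URI class. The class and method names (DefaultFsUriSketch, describe) are hypothetical and not part of Hadoop.

```java
import java.net.URI;

final class DefaultFsUriSketch {
    /** Returns "scheme|host|port" for a file-system URI such as hdfs://localhost:9000. */
    static String describe(String defaultFs) {
        URI uri = URI.create(defaultFs);
        return uri.getScheme() + "|" + uri.getHost() + "|" + uri.getPort();
    }

    public static void main(String[] args) {
        // Same address as in the connection snippet; adjust it to your own cluster.
        System.out.println(describe("hdfs://localhost:9000")); // prints hdfs|localhost|9000
    }
}
```

If the address does not match the NameNode's actual host and port, the connection attempt will fail, so this value is usually the first thing to check.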
- Create a file or directory
Calling create() or mkdirs() on the FileSystem instance creates a file or a directory in HDFS:
Path path = new Path("/hdfs/test.txt");
FSDataOutputStream outputStream = hdfs.create(path);
Path dir = new Path("/hdfs/test");
hdfs.mkdirs(dir);
- Upload a file
The write() method of the FSDataOutputStream returned by create() writes content into the HDFS file:
String content = "Hello World!";
outputStream.write(content.getBytes("UTF-8"));
outputStream.close();
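If write() throws, a plain close() call placed after it is never reached and the stream stays open. A minimal sketch of a safer pattern, assuming only java.io: FSDataOutputStream is a java.io.OutputStream, so a helper written against OutputStream should apply to it as well. The names WriteTextSketch and writeUtf8AndClose are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class WriteTextSketch {
    /** Writes text as UTF-8 and always closes the stream, even when write() throws. */
    static void writeUtf8AndClose(OutputStream out, String text) throws IOException {
        try (OutputStream o = out) {
            o.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the HDFS output stream here.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeUtf8AndClose(sink, "Hello World!");
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

With the variables from the snippets above, a call could look like writeUtf8AndClose(hdfs.create(path), content).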
- Download a file
The open() method of FileSystem opens a file stored in HDFS; the read() method of the returned FSDataInputStream then reads its content:
FSDataInputStream inputStream = hdfs.open(path);
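byte[] buffer = new byte[1024];
int bytesRead;
// Collect the raw bytes first (java.io.ByteArrayOutputStream); decoding each chunk
// on its own could split a multi-byte UTF-8 character across two reads.
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
// read() returns -1 at end of stream
while ((bytesRead = inputStream.read(buffer)) != -1) {
    bytes.write(buffer, 0, bytesRead);
}
String text = bytes.toString("UTF-8");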
inputStream.close();
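One pitfall when turning the bytes into text: a multi-byte UTF-8 character can straddle two read() calls, so decoding every chunk separately may corrupt it. The sketch below illustrates this with plain java.io, using an in-memory stream as a stand-in for FSDataOutputStream's counterpart FSDataInputStream (which is also a java.io.InputStream). The names Utf8ReadSketch, readAllUtf8, and readChunkwiseUtf8 are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class Utf8ReadSketch {
    /** Reads the stream to its end and decodes all bytes as UTF-8 in one step. */
    static String readAllUtf8(InputStream in, int bufferSize) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            bytes.write(buffer, 0, n);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Decodes every chunk on its own; included only to show the pitfall. */
    static String readChunkwiseUtf8(InputStream in, int bufferSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buffer = new byte[bufferSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            sb.append(new String(buffer, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two 3-byte characters (U+4F60, U+597D) followed by ASCII text.
        String original = "\u4f60\u597d, HDFS";
        byte[] data = original.getBytes(StandardCharsets.UTF_8);
        // A 2-byte buffer forces the 3-byte characters across chunk boundaries.
        System.out.println(original.equals(readAllUtf8(new ByteArrayInputStream(data), 2)));
        System.out.println(original.equals(readChunkwiseUtf8(new ByteArrayInputStream(data), 2)));
    }
}
```

For pure ASCII content both variants give the same result; the difference only shows up once the file contains multi-byte characters.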
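- Delete a file
The delete() method of FileSystem removes a file from HDFS; the second argument enables recursive deletion and only matters for directories: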
hdfs.delete(path, false);
Example 1: Upload a local file to HDFS
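import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://localhost:9000");
FileSystem hdfs = FileSystem.get(conf);
// The local file is read with plain java.io; only the target is an HDFS Path.
String localFile = "/home/user/test.txt";
Path hdfsPath = new Path("/hdfs/test.txt");
// try-with-resources closes both streams even if the copy fails
try (InputStream inputStream = new BufferedInputStream(new FileInputStream(localFile));
     FSDataOutputStream outputStream = hdfs.create(hdfsPath)) {
    byte[] buffer = new byte[1024];
    int bytesRead;
    // read() returns -1 at end of stream
    while ((bytesRead = inputStream.read(buffer)) != -1) {
        outputStream.write(buffer, 0, bytesRead);
    }
}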
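The copy loop in this example does not depend on HDFS: it only needs an InputStream and an OutputStream. As a rough sketch, it can be pulled out into a small helper and tried against in-memory streams. The names StreamCopySketch and copy are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopySketch {
    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000]; // stand-in for the local file's content
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println(copy(new ByteArrayInputStream(data), sink, 1024)); // prints 5000
    }
}
```

Hadoop also ships ready-made alternatives, such as FileSystem.copyFromLocalFile(Path, Path) and IOUtils.copyBytes in org.apache.hadoop.io, which may be preferable to a hand-written loop in real code.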
Example 2: List the files in an HDFS directory
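import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://localhost:9000");
FileSystem hdfs = FileSystem.get(conf);
Path path = new Path("/hdfs");
// true = also descend into subdirectories; only files are returned
RemoteIterator<LocatedFileStatus> it = hdfs.listFiles(path, true);
while (it.hasNext()) {
    LocatedFileStatus fileStatus = it.next();
    System.out.println(fileStatus.getPath().toString());
}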
Unless otherwise stated, articles on this site are original work; if you repost this one, please credit the source: Java操作hdfs文件系统过程 - Python技术站