Python实现将xml导入至excel

下面是Python实现将xml导入至excel的完整实例教程，步骤包括：

一、安装第三方库
我们需要使用两个第三方库：pandas、xml.etree.ElementTree。pandas是Python的数据分析库，可以将数据格式化输出到Excel表格中；xml.etree.ElementTree是Python的标准库，可以解析和导入xml文件。如果您还没有安装这两个库，请打开CMD或终端运行以下命令进行安装（需要联网）：

pip install pandas
pip install openpyxl

二、导入xml文件
接下来，我们需要向Python程序中导入xml文件。以下是xml文件的示例内容：

<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book>
    <id>1</id>
    <title>Python for beginners</title>
    <author>Lisa Smith</author>
    <price>19.99</price>
  </book>
  <book>
    <id>2</id>
    <title>Python tips and tricks</title>
    <author>Joel Williams</author>
    <price>24.99</price>
  </book>
</books>

此示例中，我们编写了一个books和两本书有关的信息。我们将其保存为一个名为“books.xml”的文件。

在Python代码中，我们需要使用ElementTree库来解析xml文件。以下是Python代码示例：

import xml.etree.ElementTree as ET

tree = ET.parse('books.xml')
root = tree.getroot()

代码解析：
a) 我们首先导入ElementTree库，并命名为ET。
b) 我们打开了文件“books.xml”，并使用ET.parse()方法将其解析为一个对象。
c) 我们使用tree.getroot()将解析器中的根元素提取出来，并将其存储在命名为“root”的变量中。

三、将xml文件导入至excel文件
现在，我们将使用pandas库将xml文件的数据导入到Excel文件中。以下是Python代码示例：

import pandas as pd

df = pd.DataFrame(columns=['id', 'title', 'author', 'price'])

for book in root.findall('book'):
    id = book.find('id').text
    title = book.find('title').text
    author = book.find('author').text
    price = book.find('price').text
    df = df.append(pd.Series([id, title, author, price], index=['id', 'title', 'author', 'price']), ignore_index=True)

df.to_excel('books.xlsx', index=False)

代码解析：
a) 我们首先导入pandas库，并将其命名为pd。
b) 我们创建了一个空的DataFrame，包含四个列：id、title、author和price。
c) 我们使用for循环遍历每本书，并使用find()方法从xml中获取书的详细信息。
d) 我们创建了一个Series，其中包含id、title、author和price，并将其添加到DataFrame中。
e) 我们使用df.to_excel()将DataFrame导出到Excel文件中，并将其命名为“books.xlsx”。

示例说明：
以上是基于上方的“books.xml”文件生成的Excel表格。请注意，所有数据均按正确排列在正确的列中。

四、导入具有属性的xml文件
在某些情况下，我们的XML可能包含有属性的标签。以下是带有属性的XML示例：

<?xml version="1.0" encoding="UTF-8"?>
<books>
  <book id="1">
    <title>Python for beginners</title>
    <author>Lisa Smith</author>
    <price currency="USD">19.99</price>
  </book>
  <book id="2">
    <title>Python tips and tricks</title>
    <author>Joel Williams</author>
    <price currency="EUR">24.99</price>
  </book>
</books>

此示例中，我们已将价格标记中的货币code属性添加到xml文件中。

为了将xml文件导入到Excel文件中，我们需要导入这些标记。以下是Python代码示例：

import pandas as pd

df = pd.DataFrame(columns=['id', 'title', 'author', 'price', 'currency'])

for book in root.findall('book'):
    id = book.get('id')
    title = book.find('title').text
    author = book.find('author').text
    price = book.find('price').text
    currency = book.find('price').get('currency')
    df = df.append(pd.Series([id, title, author, price, currency], index=['id', 'title', 'author', 'price', 'currency']), ignore_index=True)

df.to_excel('books.xlsx', index=False)

代码解析：
a) 我们首先导入pandas库，并将其命名为pd。
b) 我们创建了包含五列（id、title、author、price和currency）的空DataFrame。
c) 我们使用get()方法从book标记中获取id属性。
d) 我们使用find()方法从xml中获取title、author、price标记和currency属性。
e) 最后，我们将Series添加到DataFrame中，并使用df.to_excel()导出数据到Excel文件中。

示例说明：
以上是基于上方的带有属性的XML文件“books2.xml”生成的Excel文件。请注意，所有数据均按正确排列在正确的列中，包括货币代号。

本站文章如无特殊说明，均为本站原创，如若转载，请注明出处：Python实现将xml导入至excel - Python技术站

Python实现将xml导入至excel

相关文章