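A Step-by-Step Guide to Extracting Images from a PPT File with Python

This guide walks through extracting the images embedded in a PowerPoint file with Python, one step at a time.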

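1. Get the PPT file and import the required libraries

First, locate the PPT file to extract images from; Python's os or glob module can be used to find files. Next, import the pptx and PIL libraries. pptx (installed as python-pptx) is the main Python library for working with PowerPoint files and reads the .pptx format, not the legacy .ppt format. PIL (installed as Pillow) handles the images. The standard-library io module is also needed later, to wrap raw image bytes in a file-like object.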

import io
import os
from pptx import Presentation
from PIL import Image
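
The text above mentions os or glob for locating files. As a minimal sketch, assuming the presentations sit directly inside one folder, the standard-library pathlib module can collect them. The helper name find_pptx_files and the folder layout are illustrative assumptions, not part of the original steps.

```python
from pathlib import Path


def find_pptx_files(folder):
    """Return the .pptx files directly inside `folder`, sorted by path.

    Hypothetical helper for illustration. Names starting with "~$" are
    skipped because PowerPoint creates such lock files while a
    presentation is open.
    """
    folder = Path(folder)
    return sorted(
        path
        for path in folder.iterdir()
        if path.is_file()
        and path.suffix.lower() == ".pptx"
        and not path.name.startswith("~$")
    )
```

Each returned path can be converted with str() and handed to Presentation in the steps that follow.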

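2. Iterate over all slides in the PPT file

Iterating over slides uses the Presentation class from the pptx library, which opens a .pptx file and gives access to the content of its slides.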

ppt = Presentation('example.pptx')  # example.pptx is the file to extract images from
# Iterate over the slides
for slide in ppt.slides:
    # Your code goes here
    pass

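3. Iterate over all shapes on each slide

Images are found by walking every shape on every slide. In pptx, a slide's shapes are available through its shapes attribute, which returns a SlideShapes object (the slide's shape tree) containing all the shapes on that slide.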

# Iterate over the slides
for slide in ppt.slides:
    # Iterate over every shape on the slide
    for shape in slide.shapes:
        # Your code goes here
        pass

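4. Identify the picture shapes

With the shape loop in place, the next step is to decide which shapes are pictures. In pptx, a picture shape is a Picture object whose shape_type is MSO_SHAPE_TYPE.PICTURE (numeric value 13), so comparing shape.shape_type against that enum member is enough.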

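from pptx.enum.shapes import MSO_SHAPE_TYPE

# Iterate over the slides
for slide in ppt.slides:
    # Iterate over every shape on the slide
    for shape in slide.shapes:
        # Keep only picture shapes
        if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
            # Your code goes here
            pass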
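
A picture placed inside a group is not a direct child of slide.shapes, so a flat loop over the slide's shapes does not reach it. The sketch below, under the hypothetical name iter_pictures, descends into groups. It relies only on two attributes that python-pptx shapes expose (shape_type, plus shapes on a group) and compares against the integers 13 and 6, which are assumed here to match MSO_SHAPE_TYPE.PICTURE and MSO_SHAPE_TYPE.GROUP. With python-pptx imported, using the enum members directly is the clearer choice.

```python
# Integer values assumed to mirror pptx.enum.shapes.MSO_SHAPE_TYPE.
PICTURE = 13  # MSO_SHAPE_TYPE.PICTURE
GROUP = 6     # MSO_SHAPE_TYPE.GROUP


def iter_pictures(shapes):
    """Yield every picture shape in `shapes`, descending into groups."""
    for shape in shapes:
        if shape.shape_type == PICTURE:
            yield shape
        elif shape.shape_type == GROUP:
            # A group shape exposes its children through .shapes
            yield from iter_pictures(shape.shapes)
```

Inside the slide loop this would be used as `for shape in iter_pictures(slide.shapes):`, replacing the inner loop and the type check.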

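5. Save the images in the PPT as PNG files

Once the picture shapes are found, each one can be saved as a PNG image. shape.image gives the embedded image, its blob attribute holds the raw bytes, and PIL can open those bytes through an io.BytesIO wrapper and write them back out as PNG. Each file is named after the shape, so shapes that share a name (which is common across slides) overwrite one another.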

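import io
from pptx.enum.shapes import MSO_SHAPE_TYPE

# Iterate over the slides
for slide in ppt.slides:
    # Iterate over every shape on the slide
    for shape in slide.shapes:
        # Keep only picture shapes
        if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
            # Get the embedded image
            image = shape.image
            # Extract the raw image bytes
            byte_stream = image.blob
            # Wrap the bytes and open them as a PIL Image object
            img_stream = io.BytesIO(byte_stream)
            img = Image.open(img_stream)
            # Save the PIL Image object as a PNG file
            img.save(f"{shape.name}.png")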
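
Re-encoding through PIL converts every image to PNG. When the original file format should be kept instead, the raw bytes can be written out as they are. The sketch below assumes the object passed in has the blob and ext attributes that python-pptx's image object provides, where ext is the extension without a leading dot (for example png or jpg). The function name save_original and the out_dir parameter are illustrative, not part of any library.

```python
from pathlib import Path


def save_original(image, stem, out_dir="."):
    """Write image.blob unchanged to <out_dir>/<stem>.<image.ext>.

    `image` is expected to expose .blob (bytes) and .ext (e.g. "jpg").
    Returns the path that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{stem}.{image.ext}"
    target.write_bytes(image.blob)
    return target
```

In the loop above, `save_original(shape.image, shape.name)` would take the place of the BytesIO, Image.open and img.save lines. This also avoids depending on PIL being able to decode the embedded format.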

Example 1

Suppose all images need to be extracted from the file example.pptx. The following code does that:

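import io
import os
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
from PIL import Image

ppt = Presentation('example.pptx')

# Iterate over the slides
for slide in ppt.slides:
    # Iterate over every shape on the slide
    for shape in slide.shapes:
        # Keep only picture shapes
        if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
            # Get the embedded image
            image = shape.image
            # Extract the raw image bytes
            byte_stream = image.blob
            # Wrap the bytes and open them as a PIL Image object
            img_stream = io.BytesIO(byte_stream)
            img = Image.open(img_stream)
            # Save the PIL Image object as a PNG file
            img.save(f"{shape.name}.png")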
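
Naming files after shape.name alone can overwrite earlier output, since names such as "Picture 2" may repeat from slide to slide, and a shape name may contain characters that are not valid in file names. One possible way to build safer names, sketched here with a hypothetical helper called safe_name, is to prefix the slide number and replace problematic characters.

```python
import re


def safe_name(slide_no, shape_name, ext="png"):
    """Build a file name such as 'slide03_Picture_2.png'.

    Hypothetical helper: every run of characters other than word
    characters (letters, digits, '_') and '-' becomes a single '_'.
    """
    cleaned = re.sub(r"[^\w\-]+", "_", shape_name).strip("_")
    return f"slide{slide_no:02d}_{cleaned or 'image'}.{ext}"
```

To use it, the slide loop would become `for slide_no, slide in enumerate(ppt.slides, start=1):` and the save call `img.save(safe_name(slide_no, shape.name))`. Two pictures with the same name on the same slide would still collide, so a counter may be needed in that case.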

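Example 2

To extract images from several PPT files, wrap the code above in a for loop that goes through all the files.

For example, suppose images need to be extracted from two files, example1.pptx and example2.pptx. The following code does that: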

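import io
import os
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
from PIL import Image

# Process several files
files = ['example1.pptx', 'example2.pptx']

for file in files:
    ppt = Presentation(file)
    # Path of the file without its .pptx extension, used as the output prefix
    stem = os.path.splitext(file)[0]
    # Iterate over the slides
    for slide in ppt.slides:
        # Iterate over every shape on the slide
        for shape in slide.shapes:
            # Keep only picture shapes
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                # Get the embedded image
                image = shape.image
                # Extract the raw image bytes
                byte_stream = image.blob
                # Wrap the bytes and open them as a PIL Image object
                img_stream = io.BytesIO(byte_stream)
                img = Image.open(img_stream)
                # Save the PIL Image object as a PNG file
                img.save(f"{stem}_{shape.name}.png")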

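When the loop finishes, the extracted PNG images are in the same folder as each PPT file, because each output name is built from that file's path.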
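
A .pptx file is a ZIP archive, and embedded media are normally stored under ppt/media/ inside it. As an alternative to the shape-by-shape approach used in this guide (a different technique, named here only for comparison), the standard-library zipfile module can copy those files out directly. The function name extract_media is an assumption for illustration. This route keeps the original file formats but gives no information about which slide uses which image.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_media(pptx_path, out_dir):
    """Copy every file under ppt/media/ in the archive into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as archive:
        for member in archive.namelist():
            if member.startswith("ppt/media/") and not member.endswith("/"):
                # Keep only the base name so nothing lands outside out_dir
                target = out_dir / PurePosixPath(member).name
                target.write_bytes(archive.read(member))
                written.append(target)
    return written
```

A call such as `extract_media('example.pptx', 'example_media')` would return the list of files written. The media folder can also hold audio or video files, so filtering by extension may be useful.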
