Batch-Extracting Images from Word Documents with Python (Supports Multiple Word Files)

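❗❗❗This article was last updated 301 days ago, so some of its information may be out of date. If you spot a mistake, please leave a comment below the article ✅; corrections are welcome 🥰!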

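Project Background

A long time ago I heard about the benefits of an image host (saving server disk space, easier management once everything is hosted, and so on), so I started storing my article images in Tencent Cloud COS. Along the way I carelessly installed a pile of caching plugins, and the media-library images I had never migrated to COS were left in a mess. As a result the blog home page became very sluggish, and every page render took far too long (WordPress's recommended response threshold is 600 ms, while my blog was responding in 3000+ ms).

Later I also changed servers and domains once, so I decided to store all images in the media library and cut down on HTTPS requests. I had also installed a Redis object cache plugin along the way, so all that remains is to finish this one big job: replacing every image-host URL with the corresponding image in the media library.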

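Approach

The original drafts of my blog posts are Word files, and saving their images locally would mean using "Save as Picture" one image at a time. Already terminally lazy, I got lazier still, so I wrote a Python script that batch-extracts the images from Word documents. Once extraction is done, I can upload everything to the media library in one go, which is far more convenient.

I tried a number of third-party libraries (python-office, python-docx, docx2) and even had ChatGPT write a lot of code for me. I ran into plenty of pitfalls along the way, but eventually got the code running. As a final step, I replaced the hard-coded parameters with input() calls and packaged the script as an exe with pyinstaller. Help yourself to the code and the script if you need them.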
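
For readers who would rather avoid third-party libraries altogether: a .docx file is a ZIP archive, and Word conventionally stores embedded images under word/media/ inside it. The sketch below is not the script used in this post; it is a minimal standard-library alternative, and the function name extract_media_from_docx is a hypothetical one chosen for illustration. It assumes images are embedded rather than linked.

```python
import os
import zipfile


def extract_media_from_docx(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx archive into out_dir.

    Hypothetical helper for illustration; returns the written file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Assumption: embedded images live under word/media/
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            target = os.path.join(out_dir, os.path.basename(name))
            with archive.open(name) as src, open(target, "wb") as dst:
                dst.write(src.read())
            written.append(target)
    return written
```

This approach copies the media files as they are stored in the archive, so it cannot tell which images are actually referenced by the document body. That is one reason a relationship-based approach with python-docx may still be preferable.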

Code

import os
import docx
import re
import time


def words2imgs2(word_files_path, output_path):
    # Create the output directory if it does not exist yet
    os.makedirs(output_path, exist_ok=True)

    # Walk through every Word file under the input directory
    for root, _, files in os.walk(word_files_path):
        for file in files:
            # Match .docx case-insensitively and skip Word's "~$" lock files,
            # which are not valid documents
            if file.lower().endswith(".docx") and not file.startswith("~$"):
                word_path = os.path.join(root, file)
                # Create a directory named after the Word file
                word_folder_name = os.path.splitext(file)[0]
                word_output_path = os.path.join(output_path, word_folder_name)
                os.makedirs(word_output_path, exist_ok=True)

                # Extract the images from this Word file
                word2img2(word_path, word_output_path)


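def word2img2(word_path, result_path):
    doc = docx.Document(word_path)
    # Relationships of the main document part, keyed by relationship ID
    dict_rel = doc.part.rels
    for rel in dict_rel.values():
        # Linked (external) targets have no embedded part to read, so skip them
        if rel.is_external:
            continue
        if "image" in rel.target_ref:
            # target_ref looks like "media/image1.png"; keep only the file name
            img_name = os.path.basename(rel.target_ref)
            timestamp = int(time.time())  # current Unix timestamp, in seconds
            img_name = f'{timestamp}_{img_name}'  # prefix the name with the timestamp
            with open(os.path.join(result_path, img_name), "wb") as f:
                f.write(rel.target_part.blob)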


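def main():
    wordFiles = input("Enter the directory containing the Word documents: ")
    WordImgs = input("Enter the directory to save the extracted images to: ")
    # strip() drops stray whitespace and the quotes a pasted path may carry
    words2imgs2(
        wordFiles.strip().strip('"'),
        WordImgs.strip().strip('"'))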


if __name__ == '__main__':
    main()
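
One thing to be aware of: the timestamp prefix in word2img2 has one-second resolution, so running the script twice within the same second on the same document would write to identical file names and overwrite the earlier output. If that matters for your use, a small helper along these lines could pick a free name instead. This is only a sketch; unique_path is a hypothetical name and is not part of the script above.

```python
import os


def unique_path(directory, filename):
    """Return a path inside directory that does not exist yet.

    If filename is taken, try name_1.ext, name_2.ext, and so on.
    Hypothetical helper for illustration.
    """
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate
```

To use it, the open(...) call in word2img2 would take unique_path(result_path, img_name) in place of os.path.join(result_path, img_name).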

Script download: Word文档图片批量提取.exe
