Quick Start
Introduction
MarkItDown is a utility for converting various file types to Markdown (e.g., for indexing or text analysis). It supports:
- PowerPoint
- Word
- Excel
- Images (EXIF metadata and OCR)
- Audio (EXIF metadata and speech transcription)
- HTML
- Text-based formats (CSV, JSON, XML)
- ZIP files (iterates over contents)
URL
Quick Run
Download the source code:
git clone https://github.com/microsoft/markitdown.git
Note: replace the Dockerfile in the original repo with the Dockerfile provided here. It adds apt and pip mirrors to speed up the build.
Build and run:
docker build --network=host -t markitdown:latest .
docker run --network=host --rm -i markitdown:latest < ./assets/deepseek_v3.pdf > deepseek_v3.md
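The `docker run` command above converts one file by piping it through the container's standard input and output. To convert several files the same way, a small wrapper can help. The following is a minimal Python sketch, assuming the image has already been built with the `markitdown:latest` tag. The helper names (`docker_command`, `output_path`, `convert`) and the output directory are hypothetical and not part of MarkItDown.

```python
import subprocess
from pathlib import Path

IMAGE = "markitdown:latest"  # tag used in the build command above


def docker_command(image: str = IMAGE) -> list[str]:
    """Return the same `docker run` invocation as the shell command."""
    return ["docker", "run", "--network=host", "--rm", "-i", image]


def output_path(src: Path, out_dir: Path) -> Path:
    """Map e.g. assets/deepseek_v3.pdf to <out_dir>/deepseek_v3.md."""
    return out_dir / (src.stem + ".md")


def convert(src: Path, out_dir: Path, image: str = IMAGE) -> Path:
    """Pipe one file through the container and save the Markdown output."""
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = output_path(src, out_dir)
    # Equivalent to: docker run ... < src > dst
    with src.open("rb") as fin, dst.open("wb") as fout:
        subprocess.run(docker_command(image), stdin=fin, stdout=fout, check=True)
    return dst
```

For example, `convert(Path("assets/deepseek_v3.pdf"), Path("out"))` would write `out/deepseek_v3.md`, provided Docker is available and the image exists. Because `check=True` is set, a failed container run raises `subprocess.CalledProcessError` rather than silently leaving an incomplete file.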