| title | sidebarTitle | description |
|---|---|---|
Convert documents to markdown using Python and MarkItDown |
Convert docs to markdown |
Learn how to use Trigger.dev with Python to convert documents to markdown using MarkItDown. |
import PythonLearnMore from "/snippets/python-learn-more.mdx";
This project uses Trigger.dev v4 (which is currently in beta as of 28 April 2025). If you want to run this project you will need to [upgrade to v4](/upgrade-to-v4).Convert documents to markdown using Microsoft's MarkItDown library. This can be especially useful for preparing documents in a structured format for AI applications.
- A project with Trigger.dev initialized
- Python installed on your local machine. This example requires Python 3.10 or higher.
- A Trigger.dev task which downloads a document from a URL and runs the Python script which converts it to markdown
- A Python script to convert documents to markdown using Microsoft's MarkItDown library
- Uses our Python build extension to install dependencies and run Python scripts
<Card title="View the project on GitHub" icon="GitHub" href="https://github.com/triggerdotdev/examples/tree/main/python-doc-to-markdown-converter"
Click here to view the full code for this project in our examples repository on GitHub. You can fork it and use it as a starting point for your own project.
After you've initialized your project with Trigger.dev, add these build settings to your trigger.config.ts file:
import { pythonExtension } from "@trigger.dev/python/extension";
import { defineConfig } from "@trigger.dev/sdk/v3";
export default defineConfig({
runtime: "node",
project: "<your-project-ref>",
// Your other config settings...
build: {
extensions: [
pythonExtension({
// The path to your requirements.txt file
requirementsFile: "./requirements.txt",
// The path to your Python binary
devPythonBinaryPath: `venv/bin/python`,
// The paths to your Python scripts to run
scripts: ["src/python/**/*.py"],
}),
],
},
});This task uses the python.runScript method to run the markdown-converter.py script with the given document URL as an argument.
import { task } from "@trigger.dev/sdk/v3";
import { python } from "@trigger.dev/python";
import * as fs from "fs";
import * as path from "path";
import * as os from "os";
export const convertToMarkdown = task({
id: "convert-to-markdown",
run: async (payload: { url: string }) => {
const { url } = payload;
// STEP 1: Create temporary file with unique name
const tempDir = os.tmpdir();
const fileName = `doc-${Date.now()}-${Math.random().toString(36).substring(2, 7)}`;
const urlPath = new URL(url).pathname;
const extension = path.extname(urlPath) || ".docx";
const tempFilePath = path.join(tempDir, `${fileName}${extension}`);
// STEP 2: Download file from URL
const response = await fetch(url);
const buffer = await response.arrayBuffer();
await fs.promises.writeFile(tempFilePath, Buffer.from(buffer));
// STEP 3: Run Python script to convert document to markdown
const pythonResult = await python.runScript("./src/python/markdown-converter.py", [
JSON.stringify({ file_path: tempFilePath }),
]);
// STEP 4: Clean up temporary file
fs.unlink(tempFilePath, () => {});
// STEP 5: Process result
if (pythonResult.stdout) {
const result = JSON.parse(pythonResult.stdout);
return {
url,
markdown: result.status === "success" ? result.markdown : null,
error: result.status === "error" ? result.error : null,
success: result.status === "success",
};
}
return {
url,
markdown: null,
error: "No output from Python script",
success: false,
};
},
});Add the following to your requirements.txt file. This is required in Python projects to install the dependencies.
markitdown[all]The Python script uses MarkItDown to convert documents to Markdown format.
import json
import sys
import os
from markitdown import MarkItDown
def convert_to_markdown(file_path):
"""Convert a file to markdown format using MarkItDown"""
# Check if file exists
if not os.path.exists(file_path):
raise FileNotFoundError(f"File not found: {file_path}")
# Initialize MarkItDown
md = MarkItDown()
# Convert the file
try:
result = md.convert(file_path)
return result.text_content
except Exception as e:
raise Exception(f"Error converting file: {str(e)}")
def process_trigger_task(file_path):
"""Process a file and convert to markdown"""
try:
markdown_result = convert_to_markdown(file_path)
return {
"status": "success",
"markdown": markdown_result
}
except Exception as e:
return {
"status": "error",
"error": str(e)
}
if __name__ == "__main__":
# Get the file path from command line arguments
if len(sys.argv) < 2:
print(json.dumps({"status": "error", "error": "No file path provided"}))
sys.exit(1)
try:
config = json.loads(sys.argv[1])
file_path = config.get("file_path")
if not file_path:
print(json.dumps({"status": "error", "error": "No file path specified in config"}))
sys.exit(1)
result = process_trigger_task(file_path)
print(json.dumps(result))
except Exception as e:
print(json.dumps({"status": "error", "error": str(e)}))
sys.exit(1)- Create a virtual environment
python -m venv venv - Activate the virtual environment, depending on your OS: On Mac/Linux:
source venv/bin/activate, on Windows:venv\Scripts\activate - Install the Python dependencies
pip install -r requirements.txt. Make sure you have Python 3.10 or higher installed. - Copy the project ref from your Trigger.dev dashboard and add it to the
trigger.config.tsfile. - Run the Trigger.dev CLI
devcommand (it may ask you to authorize the CLI if you haven't already). - Test the task in the dashboard by providing a valid document URL.
- Deploy the task to production using the Trigger.dev CLI
deploycommand.
- Convert various file formats to Markdown:
- Office formats (Word, PowerPoint, Excel)
- PDFs
- Images (with optional LLM-generated descriptions)
- HTML, CSV, JSON, XML
- Audio files (with optional transcription)
- ZIP archives
- And more
- Preserve document structure (headings, lists, tables, etc.)
- Handle multiple input methods (file paths, URLs, base64 data)
- Optional Azure Document Intelligence integration for better PDF and image conversion