A powerful Go library and CLI tool for converting various document formats to Markdown. Marky makes it easy to extract and convert content from different file types into clean, readable Markdown format.
- Multiple Format Support: Convert CSV, EPUB, HTML, Jupyter Notebooks, Word, Excel, PDF, and PowerPoint files to Markdown
- CLI Tool: Easy-to-use command-line interface for quick conversions
- Go Library: Integrate conversion capabilities into your Go applications
- MCP Server: Model Context Protocol server for AI integration
- MIME Type Detection: Automatic file type detection for robust handling
- Extensible Architecture: Plugin-based loader system for easy format additions
Format | Extensions | MIME Types |
---|---|---|
CSV | .csv | text/csv, application/csv |
EPUB | .epub | application/epub+zip, application/epub, application/x-epub+zip |
HTML | .html, .htm | text/html |
Jupyter Notebook | .ipynb | application/x-ipynb+json, application/json |
Microsoft Word | .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
Microsoft Excel | .xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
PDF | .pdf | application/pdf |
Microsoft PowerPoint | .pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation |
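As a rough illustration of how the table above can be used, the following sketch looks up the listed MIME types by file extension. The map `mimeTypesByExt` and the helper `mimeTypesFor` are hypothetical names introduced here. They copy the table and are not Marky's internal registry or detection logic.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mimeTypesByExt is an illustrative copy of the supported-formats table.
// It is not part of Marky's API.
var mimeTypesByExt = map[string][]string{
	".csv":   {"text/csv", "application/csv"},
	".epub":  {"application/epub+zip", "application/epub", "application/x-epub+zip"},
	".html":  {"text/html"},
	".htm":   {"text/html"},
	".ipynb": {"application/x-ipynb+json", "application/json"},
	".docx":  {"application/vnd.openxmlformats-officedocument.wordprocessingml.document"},
	".xlsx":  {"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
	".pdf":   {"application/pdf"},
	".pptx":  {"application/vnd.openxmlformats-officedocument.presentationml.presentation"},
}

// mimeTypesFor returns the MIME types listed for the file's extension
// (case-insensitive), or nil when the extension is not in the table.
func mimeTypesFor(path string) []string {
	ext := strings.ToLower(filepath.Ext(path))
	return mimeTypesByExt[ext]
}

func main() {
	for _, name := range []string{"report.PDF", "notes.txt"} {
		if types := mimeTypesFor(name); types != nil {
			fmt.Printf("%s: %v\n", name, types)
		} else {
			fmt.Printf("%s: not a listed format\n", name)
		}
	}
}
```

An extension lookup like this is only a pre-check. Detection based on file content is more robust for files that are misnamed or have no extension.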
Install the CLI tool directly using Go:
go install github.com/flaviodelgrosso/marky/cmd/marky@latest
Install the MCP (Model Context Protocol) server:
go install github.com/flaviodelgrosso/marky/marky-mcp@latest
Add Marky to your Go project:
go get github.com/flaviodelgrosso/marky
Basic usage:
# Convert a file and output to console
marky document.pdf
# Convert a file and save to output file
marky document.docx --output converted.md
marky document.xlsx -o converted.md
# Examples with different formats
marky presentation.pptx -o slides.md
marky data.csv -o table.md
marky webpage.html -o content.md
The MCP server provides AI integration capabilities, allowing AI models to convert documents to Markdown through the Model Context Protocol.
# Start the MCP server
marky-mcp
The server exposes a convert_to_markdown tool with the following parameters:
- input (required): Path to the input file to convert to Markdown
- output (optional): Path to the output Markdown file (defaults to console output)
Configure your AI client (like Claude Desktop) to use the Marky MCP server by adding it to your MCP configuration. The server communicates via stdio and provides document conversion capabilities to AI models.
This is an example of registering the MCP server in your Visual Studio Code settings:
{
  "servers": {
    "marky-mcp": {
      "type": "stdio",
      "command": "marky-mcp"
    }
  }
}
package main

import (
	"fmt"
	"log"

	"github.com/flaviodelgrosso/marky"
)

func main() {
	// Initialize Marky with all available loaders
	m := marky.New()

	// Convert a document to Markdown
	result, err := m.Convert("document.pdf")
	if err != nil {
		log.Fatalf("Conversion failed: %v", err)
	}

	fmt.Println(result)
}
- Go 1.24.4 or later
- Make (optional, for using Makefile commands)
# Clone the repository
git clone https://github.com/flaviodelgrosso/marky.git
cd marky
# Build the CLI tool
make build
# OR
go build -o bin/marky cmd/marky/main.go
# Build the MCP server
go build -o bin/marky-mcp cmd/marky-mcp/main.go
# Run tests
make test
# OR
go test -v ./...
# Run linting (requires golangci-lint)
make lint
Run the test suite:
go test -v ./...
Test files for various formats are included in the test_files/ directory to ensure proper functionality across all supported document types.
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Run tests: make test
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
To add support for a new document format:
- Create a new loader in internal/loaders/
- Implement the DocumentLoader interface:

      type DocumentLoader interface {
          Load(path string) (string, error)
          CanLoadMimeType(mimeType string) bool
      }

- Register the loader in the New() function in lib.go
- Add tests for your new loader
This project is licensed under the ISC License. See the LICENSE file for details.
- html-to-markdown for HTML conversion
- pdf for PDF text extraction
- excelize for Excel file processing
- cobra for CLI framework
- mcp-go for Model Context Protocol implementation
- 🐛 Bug Reports: GitHub Issues
- 💡 Feature Requests: GitHub Issues
- 📧 Questions: Open a GitHub Discussion
Made with ❤️ by Flavio Del Grosso