feat: extract Markdown frontmatter metadata #11615

Open
gyx09212214-prog wants to merge 1 commit into deepset-ai:main from gyx09212214-prog:codex/markdown-frontmatter-metadata

@gyx09212214-prog

Summary

Adds optional YAML frontmatter extraction to MarkdownToDocument.

When extract_frontmatter=True, a Markdown file that begins with a --- ... --- block has that block parsed with PyYAML. If the block parses to a mapping, its entries are added to Document.meta, and the frontmatter block is removed before the document content is rendered. The default behavior is unchanged: unless users opt in, frontmatter stays in the converted content.
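As a rough illustration of the splitting step described above (not the code in this PR), the sketch below separates a leading frontmatter block from the Markdown body using only the standard library. The helper name `split_frontmatter` and the regular expression are assumptions made for this example; it also does not handle an empty frontmatter block.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a leading '---' line, the raw YAML payload,
# then a closing '---' line (optionally followed by a newline).
_FRONTMATTER_RE = re.compile(
    r"\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL
)


def split_frontmatter(text: str) -> Tuple[Optional[str], str]:
    """Return (raw_yaml, body); raw_yaml is None if the text has no leading block."""
    match = _FRONTMATTER_RE.match(text)
    if match is None:
        # No frontmatter at the very start: leave the content untouched.
        return None, text
    return match.group(1), text[match.end():]


raw, body = split_frontmatter("---\ntitle: Note\ndate: 2026-06-12\n---\n# Heading\n")
print(raw)   # the two YAML lines, without the '---' delimiters
print(body)  # the Markdown that remains for rendering
```

The raw block could then be handed to `yaml.safe_load`, with the result used only when it is a `dict`. Note that `safe_load` on its own turns `date: 2026-06-12` into a `datetime.date`, so some extra handling is needed to keep such values as strings.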

Metadata precedence is ByteStream.meta < frontmatter < run(meta=...), matching the existing behavior where explicit runtime metadata can override source metadata. Date-like YAML scalars are kept as strings so common note fields like date: 2026-06-12 remain JSON-serializable metadata.
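The precedence rule above can be sketched with plain dictionaries. This is a minimal illustration, not the PR's implementation: `merge_meta` and `stringify_dates` are hypothetical helper names, and converting parsed dates back to ISO strings is only one possible way to keep date-like values JSON-serializable.

```python
from datetime import date, datetime
from typing import Any, Dict, Optional


def stringify_dates(value: Any) -> Any:
    """Recursively replace date/datetime objects with ISO 8601 strings."""
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: stringify_dates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_dates(item) for item in value]
    return value


def merge_meta(
    bytestream_meta: Optional[Dict[str, Any]],
    frontmatter: Optional[Dict[str, Any]],
    run_meta: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge metadata layers; later layers override earlier ones."""
    merged: Dict[str, Any] = {}
    # Order encodes the precedence: ByteStream.meta < frontmatter < run(meta=...).
    for layer in (bytestream_meta, frontmatter, run_meta):
        merged.update(layer or {})
    return merged


meta = merge_meta(
    {"source": "a.md", "author": "x"},                          # ByteStream.meta
    stringify_dates({"author": "y", "date": date(2026, 6, 12)}),  # frontmatter
    {"ticker": "XYZ"},                                          # run(meta=...)
)
print(meta)
```

Here the frontmatter `author` replaces the one from `ByteStream.meta`, while any key passed at run time would override both.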

This is useful for Markdown/RAG ingestion pipelines where source notes carry fields like ticker, source, report date, author, or document id in frontmatter and downstream retrievers need them as metadata filters or citations.

Tests

  • python -m pytest test/components/converters/test_markdown_to_document.py -q
  • python -m py_compile haystack\components\converters\markdown.py test\components\converters\test_markdown_to_document.py
  • python -m ruff check haystack\components\converters\markdown.py test\components\converters\test_markdown_to_document.py
  • git diff --check

gyx09212214-prog requested a review from a team as a code owner June 12, 2026 17:13
gyx09212214-prog requested review from anakin87 and removed request for a team June 12, 2026 17:13
@vercel

vercel Bot commented Jun 12, 2026

@gyx09212214-prog is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLAassistant commented Jun 12, 2026

CLA assistant check
All committers have signed the CLA.
