Dzianis Kotau
About Me

My name is Dzianis Kotau. I'm a Solutions Architect and Zend Certified PHP Engineer. I'm a PHP evangelist and in love with the language.

Auto Generating Table of Contents in Markdown

Dzianis Kotau • January 26, 2020

php

Most popular PHP Markdown parsers, such as Markdown Extra (used by Sculpin) or Parsedown Extra (used by Jigsaw), can't generate a table of contents (TOC). Despite this, Sculpin can generate a TOC. I became curious and looked into how Sculpin does it.

Researching

While reading Sculpin's core code, I found the following class:

<?php

// vendor/sculpin/sculpin/src/Sculpin/Bundle/MarkdownBundle/MarkdownConverter.php

// ...

final class MarkdownConverter implements ConverterInterface, EventSubscriberInterface
{
    public function __construct(ParserInterface $markdown, array $extensions = [])
    {
        $this->markdown = $markdown;
        if ($this->markdown instanceof Markdown) {
            $this->markdown->header_id_func = [$this, 'generateHeaderId'];
        }
        $this->extensions = $extensions;
    }

    // ...

    /**
     * This method is called to generate an id="" attribute for a header.
     *
     * @internal
     *
     * @param string $headerText raw markdown input for the header name
     */
    public function generateHeaderId(string $headerText): string
    {

        // $headerText is completely raw markdown input. We need to strip it
        // from all markup, because we are only interested in the actual 'text'
        // part of it.

        // Step 1: Remove html tags.
        $result = strip_tags($headerText);

        // Step 2: Remove all markdown links. To do this, we simply remove
        // everything between ( and ) if the ( occurs right after a ].
        $result = preg_replace('%
            (?<= \\]) # Look behind to find ]
            (
                \\(     # match (
                [^\\)]* # match everything except )
                \\)     # match )
            )

            %x', '', $result);

        // Step 3: Convert spaces to dashes, and remove unwanted special
        // characters.
        $map = [
            ' ' => '-',
            '(' => '',
            ')' => '',
            '[' => '',
            ']' => '',
        ];
        return rawurlencode(strtolower(
            strtr($result, $map)
        ));
    }
}
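
To see what ids these three steps produce, here is a minimal standalone sketch of the same logic. The function name headerIdSketch() and the sample headers are assumptions made for illustration; the regular expression is a compact rewrite of the one above, not Sculpin's exact code.

```php
<?php

declare(strict_types=1);

// Standalone sketch of the steps in generateHeaderId().
// The name headerIdSketch() is hypothetical.
function headerIdSketch(string $headerText): string
{
    // Step 1: remove HTML tags.
    $result = strip_tags($headerText);

    // Step 2: drop the "(url)" part of Markdown links, i.e. any
    // parenthesised group that directly follows a "]".
    $result = preg_replace('/(?<=\])\([^)]*\)/', '', $result) ?? $result;

    // Step 3: spaces become dashes, leftover brackets are removed.
    $map = [' ' => '-', '(' => '', ')' => '', '[' => '', ']' => ''];

    return rawurlencode(strtolower(strtr($result, $map)));
}

echo headerIdSketch('Auto generating'), PHP_EOL;
// auto-generating
echo headerIdSketch('Read the [docs](https://example.com)'), PHP_EOL;
// read-the-docs
echo headerIdSketch('What is <em>TOC</em>?'), PHP_EOL;
// what-is-toc%3F
```

Note that punctuation is not removed but URL-encoded by rawurlencode(), which is why the question mark ends up as %3F.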

Look at this line of code:

$this->markdown->header_id_func = [$this, 'generateHeaderId'];

header_id_func is a callback that PHP Markdown uses to generate a header's id attribute:

<?php

// vendor/michelf/php-markdown/Michelf/Markdown.php

/**
 * Optional header id="" generation callback function.
 * @var callable|null
 */
public $header_id_func = null;

/**
 * If a header_id_func property is set, we can use it to automatically
 * generate an id attribute.
 *
 * This method returns a string in the form id="foo", or an empty string
 * otherwise.
 * @param  string $headerValue
 * @return string
 */
protected function _generateIdFromHeaderValue($headerValue) {
    if (!is_callable($this->header_id_func)) {
        return "";
    }

    $idValue = call_user_func($this->header_id_func, $headerValue);
    if (!$idValue) {
        return "";
    }

    return ' id="' . $this->encodeAttribute($idValue) . '"';
}

The generateHeaderId() method from Sculpin's core does the rest of the job.
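
The callback mechanism can be illustrated without installing any library. The sketch below uses a hypothetical TinyHeaderRenderer class that understands only ATX headers ("# Title"); it is a simplified stand-in, not the PHP Markdown implementation. Only the header_id_func property name and the is_callable() / call_user_func() pattern are taken from the code above.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a Markdown parser. It renders only ATX
// headers and ignores every other line.
final class TinyHeaderRenderer
{
    /** @var callable|null Optional header id generation callback. */
    public $header_id_func = null;

    public function render(string $markdown): string
    {
        $html = [];
        foreach (preg_split('/\R/', $markdown) as $line) {
            if (preg_match('/^(#{1,6})\s+(.+?)\s*$/', $line, $m) !== 1) {
                continue; // not a header, skipped in this sketch
            }
            $level = strlen($m[1]);
            $html[] = sprintf(
                '<h%d%s>%s</h%d>',
                $level,
                $this->idAttribute($m[2]),
                htmlspecialchars($m[2]),
                $level
            );
        }

        return implode("\n", $html);
    }

    // Returns ' id="..."' when a callback is set, or an empty string.
    private function idAttribute(string $headerValue): string
    {
        if (!is_callable($this->header_id_func)) {
            return '';
        }
        $id = call_user_func($this->header_id_func, $headerValue);

        return $id ? ' id="' . htmlspecialchars((string) $id) . '"' : '';
    }
}

$renderer = new TinyHeaderRenderer();
echo $renderer->render('# Auto generating'), PHP_EOL;
// <h1>Auto generating</h1>

// Plug in a simplified id generator (assumption: spaces to dashes only).
$renderer->header_id_func = static function (string $text): string {
    return rawurlencode(strtolower(str_replace(' ', '-', $text)));
};
echo $renderer->render("# Auto generating\n## Standalone library"), PHP_EOL;
// <h1 id="auto-generating">Auto generating</h1>
// <h2 id="standalone-library">Standalone library</h2>
```

Without a callback the headers are rendered with no id at all, which is why a TOC cannot link to them until something like generateHeaderId() is plugged in.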

Auto generating

PhpStorm Markdown Plugin

The good news is that you don't need to generate unique ids by hand. The PhpStorm IDE (which most PHP developers should use) with its bundled Markdown plugin will help you. PhpStorm uses the same algorithm for id generation as Sculpin's generateHeaderId() method. All you need to do is write your Markdown article with headers, create the TOC, and then use PhpStorm's autocompletion:

[Image: TOC auto-generation with PhpStorm's autocompletion]

By the way, GitHub generates header ids in much the same way, so you can use PhpStorm to easily create README.md and other Markdown files.
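
If you are not using an IDE, the TOC itself can also be scripted. The following is a minimal sketch, assuming headers are written in ATX style ("## Title"). The function names tocId() and buildToc() and the $minLevel parameter are hypothetical, and the id generation is simplified to "spaces to dashes, lowercase, URL-encode".

```php
<?php

declare(strict_types=1);

// Hypothetical helper: simplified id generation. It skips the tag and
// link stripping steps of generateHeaderId().
function tocId(string $headerText): string
{
    return rawurlencode(strtolower(str_replace(' ', '-', trim($headerText))));
}

// Hypothetical helper: builds a nested Markdown list from ATX headers.
// Setext headers and headers inside code blocks are not handled.
function buildToc(string $markdown, int $minLevel = 2): string
{
    $lines = [];
    foreach (preg_split('/\R/', $markdown) as $line) {
        if (preg_match('/^(#{1,6})\s+(.+?)\s*$/', $line, $m) !== 1) {
            continue;
        }
        $level = strlen($m[1]);
        if ($level < $minLevel) {
            continue; // e.g. skip the article title
        }
        $indent = str_repeat('    ', $level - $minLevel);
        $lines[] = sprintf('%s- [%s](#%s)', $indent, $m[2], tocId($m[2]));
    }

    return implode("\n", $lines);
}

$article = <<<'MD'
# Auto Generating Table of Contents in Markdown
## Researching
## Auto generating
### Standalone library
## Additional notes
MD;

echo buildToc($article), PHP_EOL;
// - [Researching](#researching)
// - [Auto generating](#auto-generating)
//     - [Standalone library](#standalone-library)
// - [Additional notes](#additional-notes)
```

This sketch does not deduplicate ids, so two headers with the same text would point to the same anchor.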

Standalone library

I decided to extract Sculpin's solution into a standalone library that anyone can use in their own work. I published the source on my GitHub. Please feel free to use it.

Additional notes

PHP Markdown's author holds that TOC generation is not the job of his parser, so it will never be included in his library.