About Me
My name is Dzianis Kotau. I'm a Solutions Architect and a Zend Certified PHP Engineer. I'm a PHP evangelist and in love with the language.
Auto Generating Table of Contents in Markdown
Dzianis Kotau • January 26, 2020
The most popular PHP Markdown parsers, such as Markdown Extra (used by Sculpin) and Parsedown Extra (used by Jigsaw), can't generate a table of contents (TOC). Despite this, Sculpin can generate one. I got curious and looked into how Sculpin does it.
Researching
While reading Sculpin's core code, I found this class:
<?php
// vendor/sculpin/sculpin/src/Sculpin/Bundle/MarkdownBundle/MarkdownConverter.php
// ...
final class MarkdownConverter implements ConverterInterface, EventSubscriberInterface
{
    public function __construct(ParserInterface $markdown, array $extensions = [])
    {
        $this->markdown = $markdown;
        if ($this->markdown instanceof Markdown) {
            $this->markdown->header_id_func = [$this, 'generateHeaderId'];
        }
        $this->extensions = $extensions;
    }

    // ...

    /**
     * This method is called to generate an id="" attribute for a header.
     *
     * @internal
     *
     * @param string $headerText raw markdown input for the header name
     */
    public function generateHeaderId(string $headerText): string
    {
        // $headerText is completely raw markdown input. We need to strip it
        // from all markup, because we are only interested in the actual 'text'
        // part of it.

        // Step 1: Remove html tags.
        $result = strip_tags($headerText);

        // Step 2: Remove all markdown links. To do this, we simply remove
        // everything between ( and ) if the ( occurs right after a ].
        $result = preg_replace('%
            (?<= \\]) # Look behind to find ]
            (
                \\(     # match (
                [^\\)]* # match everything except )
                \\)     # match )
            )
            %x', '', $result);

        // Step 3: Convert spaces to dashes, and remove unwanted special
        // characters.
        $map = [
            ' ' => '-',
            '(' => '',
            ')' => '',
            '[' => '',
            ']' => '',
        ];

        return rawurlencode(strtolower(
            strtr($result, $map)
        ));
    }
}
Look at this line of code:
$this->markdown->header_id_func = [$this, 'generateHeaderId'];
header_id_func is PHP Markdown's callback property for generating a header's id attribute:
<?php
// vendor/michelf/php-markdown/Michelf/Markdown.php

/**
 * Optional header id="" generation callback function.
 * @var callable|null
 */
public $header_id_func = null;

/**
 * If a header_id_func property is set, we can use it to automatically
 * generate an id attribute.
 *
 * This method returns a string in the form id="foo", or an empty string
 * otherwise.
 * @param string $headerValue
 * @return string
 */
protected function _generateIdFromHeaderValue($headerValue) {
    if (!is_callable($this->header_id_func)) {
        return "";
    }

    $idValue = call_user_func($this->header_id_func, $headerValue);
    if (!$idValue) {
        return "";
    }

    return ' id="' . $this->encodeAttribute($idValue) . '"';
}
The generateHeaderId() method from Sculpin's core does the rest of the job.
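To see what ids these three steps produce, here is a minimal standalone sketch. The function name headerId() is my own assumption and is not part of Sculpin or PHP Markdown. It only mirrors the steps of generateHeaderId() shown above, with the regular expression written in compact form.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone helper (the name is an assumption) that mirrors
 * the three steps of Sculpin's generateHeaderId().
 */
function headerId(string $headerText): string
{
    // Step 1: remove HTML tags.
    $result = strip_tags($headerText);

    // Step 2: drop the "(url)" part of Markdown links, i.e. a
    // parenthesised group that directly follows a "]".
    $result = preg_replace('/(?<=\])\([^)]*\)/', '', $result);

    // Step 3: spaces become dashes; brackets and parentheses are removed.
    $map = [' ' => '-', '(' => '', ')' => '', '[' => '', ']' => ''];

    return rawurlencode(strtolower(strtr($result, $map)));
}

echo headerId('Auto generating'), PHP_EOL;                      // auto-generating
echo headerId('Read the [docs](https://example.com)'), PHP_EOL; // read-the-docs
echo headerId('What is <em>new</em>?'), PHP_EOL;                // what-is-new%3F
```

Because header_id_func accepts any callable, a function like this could presumably be assigned to that property of a PHP Markdown parser directly, the same way Sculpin assigns its own method.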
Auto generating
The good news is that you don't need to generate unique ids by hand. The PhpStorm IDE (which most PHP developers should be using) with the bundled Markdown plugin will help you. PhpStorm uses the same algorithm for generating ids as Sculpin's generateHeaderId() method. All you need to do is write your Markdown article with headers, create the TOC, and then use PhpStorm's autocompletion for the links.
By the way, GitHub generates header anchors in much the same way, so you can use PhpStorm to easily create README.md and other Markdown files.
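If you are not using PhpStorm, the TOC list itself can also be produced by a short script. The sketch below is only an illustration under my own assumptions: the function names tocHeaderId() and buildToc() are hypothetical, only ATX-style headers ("## Title") of levels 2 to 6 are recognised, and lines inside code blocks that happen to start with "##" are not filtered out. The id steps are the same as in generateHeaderId().

```php
<?php
declare(strict_types=1);

// Hypothetical helper: same id steps as Sculpin's generateHeaderId().
function tocHeaderId(string $headerText): string
{
    $text = strip_tags($headerText);
    $text = preg_replace('/(?<=\])\([^)]*\)/', '', $text);
    $map = [' ' => '-', '(' => '', ')' => '', '[' => '', ']' => ''];

    return rawurlencode(strtolower(strtr($text, $map)));
}

/**
 * Hypothetical helper: builds a nested Markdown list that links to every
 * level 2-6 ATX header found in $markdown.
 */
function buildToc(string $markdown): string
{
    // Capture the run of "#" characters and the trimmed header text.
    preg_match_all('/^(#{2,6})[ \t]+(.+?)[ \t]*$/m', $markdown, $matches, PREG_SET_ORDER);

    $lines = [];
    foreach ($matches as [, $hashes, $title]) {
        // Level 2 headers are top-level items; deeper levels get indented.
        $indent = str_repeat('    ', strlen($hashes) - 2);
        $lines[] = sprintf('%s- [%s](#%s)', $indent, $title, tocHeaderId($title));
    }

    return implode("\n", $lines);
}

$article = <<<MD
# My article

## Getting started

### Install the tool

## Additional notes
MD;

echo buildToc($article), PHP_EOL;
// - [Getting started](#getting-started)
//     - [Install the tool](#install-the-tool)
// - [Additional notes](#additional-notes)
```

The level 1 header is skipped on purpose, on the assumption that it is the article title and does not belong in its own TOC.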
Standalone library
I decided to extract Sculpin's solution into a standalone library that anyone can use in their own work. I published the source on my GitHub. Please feel free to use it.
Additional notes
PHP Markdown's author takes the view that generating a TOC is not his parser's job, so it will never be included in his library.