Choosing a markdown parser, renderer, and syntax highlighter can be tricky. One constraint I had was that I wanted to work with React and render to React elements, rather than render everything to one big string and set innerHTML.
The main reason for this is security. Setting innerHTML (or React's dangerouslySetInnerHTML) opens the door to user-injected script tags that can compromise your users. This is mostly an issue when you accept content from some users and display it to others.
Syntax highlighting is one area where this becomes a real conflict. Syntax highlighting is hard because you need to parse and understand many different languages, and the result of that parsing has to be turned into DOM elements with CSS classes so that the CSS theme you've added applies.
There are several options, but the two main ones are https://highlightjs.org/ and https://prismjs.com/. I found that highlight.js has more support and examples but generally very poor support for JSX. So I decided to go with PrismJS, but ran into issues supporting it within a React environment.
That's where Unified, Remark, and Rehype come into play. Let's look at what each of these does.
Unified is an interface for processing text with syntax trees and transforming between them. Once the text has been parsed into an AST (abstract syntax tree), plugins can transform it. Remark is the markdown side of that ecosystem: it parses the markdown text into an AST that lets us apply other plugins and eventually produce output. In our case we then convert the Remark AST into a Rehype AST, which represents HTML. Finally we use the rehype-react plugin to turn that HTML AST into React elements rather than a raw HTML string.
This ends up being a series of functions that you pipe your content through.
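To make the "pipe" idea concrete, here is a tiny dependency-free sketch. It is not unified's implementation: createProcessor, shout, and the toy parse/compile functions are hypothetical. It only mirrors the general shape, where a parser builds a tree, each plugin returns a transformer that runs in order, and a compiler turns the final tree into output.

```javascript
// Hypothetical stand-in for a unified-style processor (NOT the real unified API).
// Each plugin is a function that returns a transformer: (tree) => tree | undefined.
function createProcessor({ parse, compile }) {
  const transformers = [];
  return {
    use(plugin, options) {
      transformers.push(plugin(options));
      return this; // allow chaining: .use(a).use(b)
    },
    processSync(text) {
      let tree = parse(text);
      for (const transform of transformers) {
        // A transformer may mutate the tree in place and return nothing.
        tree = transform(tree) || tree;
      }
      return compile(tree);
    },
  };
}

// Toy parser: every non-empty line becomes a paragraph node.
const parse = (text) => ({
  type: 'root',
  children: text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => ({ type: 'paragraph', value: line.trim() })),
});

// Toy compiler: turn the tree back into an HTML string.
const compile = (tree) =>
  tree.children.map((node) => `<p>${node.value}</p>`).join('');

// Toy plugin: upper-cases every paragraph in place.
const shout = () => (tree) => {
  for (const node of tree.children) node.value = node.value.toUpperCase();
};

const processor = createProcessor({ parse, compile }).use(shout);
console.log(processor.processSync('hello\n\nworld')); // <p>HELLO</p><p>WORLD</p>
```

The real pipeline swaps the toy stages for remark-parse, remark-rehype, a highlighting plugin, and rehype-react, but the flow of "parse, transform in order, compile" is the same idea.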
Now where does syntax highlighting fit into that flow? There is a plugin called rehype-prism, but it isn't recommended for browser use because it includes syntax highlighting for every language via refractor. It would increase your bundle size by 352kb (128kb gzipped). That's a lot!
So we’ll create our own refractor prism highlighter!
This module was 100% pulled from https://github.com/mapbox/rehype-prism and modified to support just the languages I needed. However, I'll still step through each piece that goes into creating it.
As a first step, let's follow refractor's recommendation and pull in just refractor/core plus the languages we want to support.
Some of these overlap. For example, jsx actually extends javascript. I aliased the jsx module to just js so that I get the benefits of JSX highlighting without changing much of my existing markdown.
This name is the language identifier you write after the opening triple backticks of a code block.
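A small, dependency-free sketch of what the alias buys you is below. It is not refractor's code: registerGrammar, addAlias, and resolveLanguage are hypothetical names, and the grammar objects are empty placeholders. It only shows how a block fenced as js can resolve to the jsx grammar.

```javascript
// Hypothetical registry illustrating alias lookup; NOT refractor's internals.
const grammars = new Map(); // canonical name -> grammar object
const aliases = new Map();  // alias -> canonical name

function registerGrammar(name, grammar) {
  grammars.set(name, grammar);
}

function addAlias(name, ...aliasNames) {
  for (const aliasName of aliasNames) aliases.set(aliasName, name);
}

// Take the fence's info string (e.g. "js" or "js title=demo") and resolve it.
function resolveLanguage(infoString) {
  const name = infoString.trim().split(/\s+/)[0].toLowerCase();
  if (grammars.has(name)) return name;
  if (aliases.has(name)) return aliases.get(name);
  return null; // unknown language: leave the block unhighlighted
}

registerGrammar('javascript', {}); // placeholder grammars
registerGrammar('jsx', {});
addAlias('jsx', 'js'); // blocks fenced as js now use the JSX grammar

console.log(resolveLanguage('js'));   // jsx
console.log(resolveLanguage('ruby')); // null
```

Returning null for unknown names is a design choice of this sketch: it lets the highlighter skip blocks it has no grammar for instead of throwing.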
Next we create our module. I won't go through the whole thing, but essentially the plugin is handed an AST. With the visit module we visit every node of type element and call a visitor function on each one. The visitor checks whether we're dealing with a code tag, which indicates a code block; otherwise it returns and we move on to the next node. Then we parse out the language we're dealing with. Finally, with refractor we apply the class names and highlight the string of code using the language we found.
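Here is a minimal, dependency-free sketch of that visitor logic, written from the steps described above rather than copied from rehype-prism. It assumes hast-shaped nodes, with the language stored as a language-name class on a code element inside a pre (the shape remark-rehype gives a fenced block). visitElements and fakeHighlight are hypothetical stand-ins for the visit module and refractor's highlighter. The extra check that the code element sits inside a pre, so inline code is left alone, is an assumption of this sketch.

```javascript
// Hypothetical stand-in for the visit module: calls visitor(node, parent)
// for every element node, depth first.
function visitElements(node, visitor, parent = null) {
  if (node.type === 'element') visitor(node, parent);
  // Children are read after the visitor runs, so replaced children are walked instead.
  for (const child of node.children || []) visitElements(child, visitor, node);
}

// Pull "js" out of className ["language-js"]; null when no language is set.
function getLanguage(node) {
  const classNames = (node.properties && node.properties.className) || [];
  for (const name of classNames) {
    if (typeof name === 'string' && name.startsWith('language-')) {
      return name.slice('language-'.length).toLowerCase();
    }
  }
  return null;
}

// Concatenate all text inside a node (the raw source of the code block).
function toText(node) {
  if (node.type === 'text') return node.value;
  return (node.children || []).map(toText).join('');
}

// Hypothetical highlighter: wraps the whole string in one token span.
// A real highlighter such as refractor produces one node per token.
function fakeHighlight(code, lang) {
  return [{
    type: 'element',
    tagName: 'span',
    properties: { className: ['token', lang] },
    children: [{ type: 'text', value: code }],
  }];
}

function highlightCodeBlocks(tree, highlight) {
  visitElements(tree, (node, parent) => {
    // Only <code> directly inside <pre> is treated as a code block.
    if (!parent || parent.tagName !== 'pre' || node.tagName !== 'code') return;
    const lang = getLanguage(node);
    if (lang === null) return;
    parent.properties = parent.properties || {};
    parent.properties.className = (parent.properties.className || []).concat('language-' + lang);
    node.children = highlight(toText(node), lang);
  });
  return tree;
}

// Usage: roughly what a js fenced block looks like in the HTML AST.
const tree = {
  type: 'root',
  children: [{
    type: 'element',
    tagName: 'pre',
    properties: {},
    children: [{
      type: 'element',
      tagName: 'code',
      properties: { className: ['language-js'] },
      children: [{ type: 'text', value: 'const a = 1;' }],
    }],
  }],
};
highlightCodeBlocks(tree, fakeHighlight);
console.log(tree.children[0].properties.className); // [ 'language-js' ]
```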
Okay, we've got our custom plugin created; now let's put together a Markdown rendering component.
We'll bring in unified to pipe all our plugins through. Then we register remark-parse to parse our markdown. Next we use remark-rehype to convert our markdown AST to a rehype AST. Then we apply the plugin we created, rehypePrism, to do our virtual syntax highlighting. Finally we use rehype-react to turn our final AST into React components.
The processor looks something like this.
Nothing will happen until we supply it with markdown to process. So we create a Markdown component that takes a prop we'll call source. We then take the processor we created and call processSync, since we want to process the markdown synchronously during render. Then we return the output wrapped in a Fragment, because I'm unsure whether contents could ever return multiple roots. Wrapping it in a Fragment is just a precaution.
You can grab your theme from wherever, but I grabbed mine from the Cloudflare CDN (cdnjs), which you can find at https://cdnjs.com/libraries/prism. Some themes also ship with prism itself, but it was easier to link to a hosted copy than to add a CSS loader to my existing setup.
Yes, all of that just to parse and highlight markdown, but unified supports many other powerful features once you are operating in this environment. For example, you can parse a table of contents out of your markdown.
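As a rough illustration of the table-of-contents idea, the sketch below walks a markdown AST and collects headings. It assumes mdast-shaped nodes ({ type: 'heading', depth, children }), the shape remark-parse produces; buildToc, textOf, and slugify are hypothetical helpers, and the slug rules are a simplification rather than what any particular plugin does. In a real pipeline you would run something like this as a transformer on the parsed tree.

```javascript
// Concatenate the text of a node (text and inlineCode nodes carry `value`).
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

// Simplified slug (hypothetical rules): lowercase, drop punctuation, spaces to hyphens.
function slugify(text) {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, '')
    .replace(/\s+/g, '-');
}

// Walk the tree depth first and collect headings up to maxDepth.
function buildToc(tree, maxDepth = 3) {
  const toc = [];
  (function walk(node) {
    if (node.type === 'heading' && node.depth <= maxDepth) {
      const text = textOf(node);
      toc.push({ depth: node.depth, text, id: slugify(text) });
    }
    for (const child of node.children || []) walk(child);
  })(tree);
  return toc;
}

// Usage with a hand-written tree shaped like remark-parse output.
const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Getting Started' }] },
    { type: 'paragraph', children: [{ type: 'text', value: 'Intro text.' }] },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Using ' }, { type: 'inlineCode', value: 'unified' }] },
    { type: 'heading', depth: 4, children: [{ type: 'text', value: 'Deep detail' }] },
  ],
};
const toc = buildToc(tree);
console.log(toc.map((entry) => entry.id)); // [ 'getting-started', 'using-unified' ]
```

The depth-4 heading is skipped because of the default maxDepth of 3; raising the limit includes it.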
You can find many plugins for the rehype/unified system here https://github.com/rehypejs/rehype/blob/master/doc/plugins.md.