Markdown Text Formatter
This is an open discussion with 49 replies, filed under Extensions.
It might be a good idea to open the issue tracker in the GitHub repo. Extension issues here on the Symphony website are really hard to keep track of.
(I noticed that the HTMLPurifier version which is bundled with the extension is rather outdated.)
Cheers. I have opened up the issue tracker on Git and posted this issue.
Could some sort of site root variable be parsed through Markdown, so that I could post links to images or even display images in a textarea that uses Markdown?
I was just looking at this entry on Skrivr.com about handling images, and they seem to use a {{page}} placeholder in double curly braces to specify it, I think.
HTML purifier came out on top in terms of catching invalid markup, maintaining data integrity after sanitisation and removing vulnerability exploits.
Confirmed. I researched a bunch of libraries for custom built systems (before I was introduced to Symphony). While HTML purifier was much slower and larger than other libraries, it was also the only one that really seemed to work reliably.
I think HTML Purifier as an event filter makes sense. HTML sanitisation is a relevant issue not just for Markdown but for all types of text formatters that must interface with frontend user-generated content.
Would be great. I also proposed something like this a while ago.
Since HTML sanitizing is fairly important to prevent XSS and broken pages, and since it's only available with the basic Markdown formatter, it sounds like Markdown Extra and SmartyPants aren't really supported by Symphony and won't be for some time.
So it probably doesn't matter that Markdown Extra has been updated recently, but I thought I'd throw it out there.
The integration branch of Markdown actually has updated these libraries:
- Update HTML Purifier to 4.4.0
- Update PHP Markdown to 1.2.5
Markdown Extra and SmartyPants are supported, just not in conjunction with the Purifier library. For most use cases where content editing is done by authenticated authors, sanitizing HTML is not an issue.
If you require Markdown Extra together with HTML purification, and want to accept content from frontend users, it would be possible to write an event filter extension that does so :)
@brendo
Thanks for the info.
For most use cases where content editing is done by authenticated authors, sanitizing HTML is not an issue.
I almost agree with you. XSS by authenticated authors shouldn't be an issue. What concerns me is the XML validity part of Allen's comment.
Let's take a common use case: blogging. Users range from novice to expert, and any of them might insert HTML. Novice users would probably be pasting links, images, or embeds, competent ones tweaking their formatting, extending the perks of Markdown Extra.
For those cases where validity is broken, the page fails. Not exactly a graceful result, one that looks bad for me, creates user frustration and confusion, and would likely result in any of these users reaching out to me for a fix, explanation, or training on how to get X done right. My mom would grab a can of Lysol thinking she had uploaded a virus to something.
If it simply resulted in a bit of ugly formatting or tags in the text, no biggie. But allowing a home page collection of blog posts a chance to go bonkers seems like a thing to avoid. If I were developing for expert users, I'd be OK shaming them into writing proper HTML, but that's not a reasonable expectation for novice users.
I'm not sure what HTML scenarios are causing this problem with Markdown, so maybe I'm worrying about a non-issue. If someone can post a link to more info, that might sort me.
@brendo
The XSS filter, for example, only subscribes to a frontend delegate, so I guess it doesn't fire on backend events? Would it be possible to add an event filter to backend events by subscribing to another delegate?
I would like to create an event filter for HTML Purifier (for frontend and backend events) and decouple Purifier from the markdown formatter, so it can be used with other formatters as well.
I'm not sure what HTML scenarios are causing this problem with Markdown, so maybe I'm worrying about a non-issue. If someone can post a link to more info, that might sort me.
Admittedly I tend to ignore these scenarios, as I don't believe that raw HTML should be used in these areas. In my experience it's the media embeds that carry the most risk of breaking, as some use boolean attributes (allowfullscreen), which XML does not accept. Media embed issues can be mitigated by using a dedicated field to do this job (oEmbed or Youtube/Vimeo). I'd be interested in hearing other scenarios though.
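To make the boolean attribute problem concrete, here is a minimal sketch in Python (not Symphony code, just a quick check with the standard library XML parser). The is_well_formed helper and the example.com URLs are made up for illustration. It shows that an embed with a bare allowfullscreen fails XML parsing, while the same embed with an explicit attribute value parses fine.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


# Typical HTML embed: the boolean attribute has no value, which XML rejects.
html_embed = '<iframe src="https://example.com/embed/1" allowfullscreen></iframe>'

# Same embed with the attribute given an explicit value, which XML accepts.
xml_embed = (
    '<iframe src="https://example.com/embed/1" '
    'allowfullscreen="allowfullscreen"></iframe>'
)

print(is_well_formed(html_embed))  # False
print(is_well_formed(xml_embed))   # True
```

Unclosed void tags such as a bare <br> fail the same check, which is another way pasted HTML can break an XML-based page.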
Would it be possible to add an event filter to backend events by subscribing to another delegate?
Yes, but it might require you to handle it a little differently. The EntryPreEdit (and EntryPreCreate) delegates receive the Entry object, which they can manipulate. You could subscribe to these and then iterate over $entry->getData() to sanitize the data. These delegates fire after a text formatter has done its job.
HTML Purifier was added when it was discovered that JS-based XSS attacks could be easily added through the frontend (this was before the XSS filter extension). Also, based on our experience with the Symphony forum, about 1% of content broke XML validity and rendered the affected pages unusable.
Alistair at the time picked out all the invalid content from the Symphony forum and ran it against a few sanitisers. HTML purifier came out on top in terms of catching invalid markup, maintaining data integrity after sanitisation and removing vulnerability exploits.
I remember having argued against integrating HTML Purifier with Markdown, since I also find having an external library that is as large as the entirety of Symphony itself to be at odds with Symphony's philosophies. I can no longer remember why I agreed to bundling HTML purifier with Markdown.
I think HTML Purifier as an event filter makes sense. HTML sanitisation is a relevant issue not just for Markdown but for all types of text formatters that must interface with frontend user-generated content.