El Dorado

A full-stack community web application written in Ruby/Rails
Tue, 27 May 2008, 1:41pm Updates for bbcodeizer »
irkenInvader
Member
Zim
Registered: May, 2008
Last visit: Wed, 04 Jun 2008
Posts: 85

Well, I think simple_format has an important purpose and does what it does well. The only problem is when you have blocks that don't need it's help mixed in with blocks that do need its help.

The line in question is in the bb function in app/helpers/application_helper.rb:
[code]
...
text = simple_format(bbcodeize(sanitize(h(text))))
...
[/code]

Let's break it out one function at a time, from the inside out.

the h() function (really, just an alias to html_escape()) will escape (dangerous!) html code we don't want our posters to be able to inject. So, any text like:
[code]
<p>This is a paragraph</p>
[/code]
gets translated into:
[code]
&lt;p&gt;This is a paragraph&lt;/p&gt;
[code]

This is a "good thing" and we don't want to get rid of it. To keep up with our current topic, h (or html_escape) doesn't touch any newlines in our post, though it does translate '<br />' or '<br>' into '&lt;br /&gt;' or '&lt;br&gt;'

Next up is sanitize(). sanitize() is designed to remove any html tags or attributes in those tags that the you don't want. You can explicitly allow some if you want, but by default is removes them all. This seems a little redundant as html_escape() is already covering this, but there might be some hacker-ish way of getting around the escaping being done and sanitize() should catch that and remove the "bad" tags.

After that, we pass the escaped and sanitized text to bbcodeizer to translate any bbcode tags into appropriate html tags to markup the post the way the poster wanted. We assume the html_escape() and sanitize() have "secured" our text and now we want bbcodizer to mark it up.

Finally, simple_format is called on the resulting text of the post to inject '<br />' and '<p>...</p>' elements in the "right" places. This makes it easy for us to hook into those elements with CSS if we want and "pretty"s up the post for browser consumption.

The problem comes with certain HTML elements that do need "pretty"fying and other's that don't. For example, the bbcode [code] tag gets translated into '<pre>...</pre>' elements. Those tag names are short for "pre-formated" and are designed to allow an author who has already formated his text with appropriate whitespace and newlines to ignore the browser's desire to "squash" all the whitespace. This is a "good thing" for preformated text, but simple_format doesn't distinguish between <pre> tags and non <pre> tags. Therefore it just translates all newlines, inside or outside of <pre> elements, from "\r\n" to "\r\n<br />", which is (usually) the opposite of what the author intended when he used the <pre> element in the first place. Also, simple format will translate two newlines ("\r\n\r\n") into <p> (paragraph) elements. Again, this is a "good thing" when outside of a <pre> element, but not so much when the text is already "pre-formatted".

I thought strongly about modifying the simple_format function (overriding it) to check for those tags that shouldn't have <br> elements injected and ignoring them, but I felt like that was the wrong thing to do for many reasons. One, I don't dare presume to change what the rails developers have designed! Also, those tags I was having problems with are being generated in the bbcodeizer function in the first place. The problem was being introduced there so I felt like that's where it should be fixed.

I realized that well formed HTML concerning preformated code or lists or whatever shouldn't need any newlines at all unless explicitly desired. What I mean is, inside the bbcodeizer function I could "squash" all the newlines that weren't needed, making simple_format()'s job easier. If we have the following bbcode marked-up text:
[code]
[ code]\r\n
This is some pre-formatted text\r\n
I want it to look just like this\r\n
[ /code]
[code]
I could squash it down before it gets passed to simple_format(), like this:
[code]
[ code]This is some pre-formatted text\r\n
I want it to look just like this[ /code]
[/code]
thereby eliminating unnecessary newlines before simple_format() gets a hold of the post.

I hope all that (lengthy discussion!) makes sense.


Zim wrote:

Invader’s blood marches through my veins like giant, radioactive rubber pants! The pants command me! Do not ignore my veins!

Offline