Monday, June 13, 2011

News Forum Rich Text Sanitizing with Bleach

Bleach github
       - Is bleach already in django?
       - Example of bleach
- Make a news forum


Bleach
- Get the bleach module


(playdoh)haoqili@host-3-248:11:50:59:~/dev/playdoh/playdoh/playdoh$ ./manage.py shell
Python 2.7.1 (r271:86832, Jun  6 2011, 13:57:48)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import bleach
Traceback (most recent call last):
  File "<console>", line 1, in <module>
ImportError: No module named bleach
>>>
KeyboardInterrupt
>>>
(playdoh)haoqili@host-3-248:12:00:06:~/dev/playdoh/playdoh/playdoh$ pip install -e git://github.com/jsocol/bleach.git#egg=bleach
Obtaining bleach from git+git://github.com/jsocol/bleach.git#egg=bleach
  Cloning git://github.com/jsocol/bleach.git to /Users/haoqili/.virtualenvs/playdoh/src/bleach
  Running setup.py egg_info for package bleach
   
Downloading/unpacking html5lib (from bleach)
  Downloading html5lib-0.90.zip (99Kb): 99Kb downloaded
  Running setup.py egg_info for package html5lib
   
Installing collected packages: bleach, html5lib
  Running setup.py develop for bleach
   
    Creating /Users/haoqili/.virtualenvs/playdoh/lib/python2.7/site-packages/bleach.egg-link (link to .)
    Adding bleach 1.0.2 to easy-install.pth file
   
    Installed /Users/haoqili/.virtualenvs/playdoh/src/bleach
  Running setup.py install for html5lib
   
Successfully installed bleach html5lib
Cleaning up...
(playdoh)haoqili@host-3-248:12:00:54:~/dev/playdoh/playdoh/playdoh$ ./manage.py shell
Python 2.7.1 (r271:86832, Jun  6 2011, 13:57:48)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import bleach
>>> bleach.clean('an <script>evil()</script> example')
u'an &lt;script&gt;evil()&lt;/script&gt; example'
>>> bleach.linkify('an http://example.com url')
u'an <a href="http://example.com" rel="nofollow">http://example.com</a> url'

==========
jsocol: the "u" indicates that it's a unicode string instead of a simple bytestring, it's a python thing
==========

jsocol: haoqili: there's no Bleach() class anymore, it's just "import bleach" or "from bleach import clean, linkify"

haoqili: I'm looking at __init__.py
[12:25pm] jsocol: ok
[12:25pm] haoqili: What is the significance of TLDS = """ac ad ae aero af ag ai al am an ao aq ar arpa as asia at au aw ax az
[12:25pm] haoqili: ba bb bd be bf bg bh bi biz bj bm bn bo br bs bt bv bw by bz ca cat?
[12:25pm] haoqili: i.e. what is TLDS?
[12:26pm] jsocol: TLD = Top Level Domain, it's supposed to be an exhaustive list of all the current, valid TLDs
[12:26pm] jsocol: so that, for example, "example.com" or "example.co.uk" gets linkified, but "example.txt" does not
[12:26pm] haoqili: ah
[12:26pm] haoqili: okay
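
Note to self: a minimal sketch of the TLD idea jsocol describes, assuming the check comes down to "is the last dot-separated label a known TLD?". This is not Bleach's actual linkify code. TLDS_SKETCH and looks_linkable are names made up here for illustration, and the set is a tiny subset rather than the exhaustive list. Written for Python 3, standard library only.

```python
# Tiny illustrative subset; the real TLDS list is meant to be exhaustive.
TLDS_SKETCH = {'com', 'org', 'net', 'uk', 'biz'}


def looks_linkable(candidate):
    """Return True if the text after the last dot is a known TLD.

    Hypothetical helper: it only inspects the final label and does no
    other validation of the host name.
    """
    host, _dot, tld = candidate.rpartition('.')
    # rpartition returns an empty host when there is no dot at all.
    return bool(host) and tld.lower() in TLDS_SKETCH


print(looks_linkable('example.com'))    # True
print(looks_linkable('example.co.uk'))  # True
print(looks_linkable('example.txt'))    # False
```

So a file name like "example.txt" is left alone because "txt" is not in the list, while "example.co.uk" qualifies because "uk" is.
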
[12:27pm] haoqili: I'm also trying to understand how clean's ALLOWED_TAGS get added to the pre-defined ALLOWED_TAGS
[12:28pm] haoqili: is it added or replaced?
[12:28pm] haoqili: I guess it's replaced?
[12:28pm] jsocol: replaced. if you pass in a tags= kwarg, your list supersedes the default list, so you can be more restrictive
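
Note to self: a rough sketch of the "replaced, not added" behaviour, assuming the tags= kwarg is simply a keyword argument whose default is the module-level list. This is not Bleach's implementation and not a safe sanitizer. DEFAULT_TAGS_SKETCH and clean_sketch are hypothetical names, the default tag list is invented, and the regex only handles simple tags. Python 3, standard library only.

```python
import html
import re

# Hypothetical default list, NOT Bleach's real ALLOWED_TAGS.
DEFAULT_TAGS_SKETCH = ('a', 'b', 'em', 'strong')

# Matches simple opening/closing tags such as <b>, </b>, <a href="...">.
TAG_RE = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>')


def clean_sketch(text, tags=DEFAULT_TAGS_SKETCH):
    """Keep allowed tags (stripped of attributes), escape all other tags.

    Whatever the caller passes as tags replaces the default outright;
    nothing is merged with DEFAULT_TAGS_SKETCH.
    """
    allowed = {t.lower() for t in tags}

    def replace_tag(match):
        slash, name = match.group(1), match.group(2).lower()
        if name in allowed:
            # Re-emit a bare tag so no attributes pass through.
            return '<' + slash + name + '>'
        return html.escape(match.group(0))

    return TAG_RE.sub(replace_tag, text)


sample = '<b>bold</b> <em>hi</em>'
print(clean_sketch(sample))              # both tags kept
print(clean_sketch(sample, tags=['b']))  # <em> is now escaped
```

Passing tags=['b'] drops 'em' from the allowed set even though it is in the default, which is what makes it possible to be more restrictive than the default.
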
