Comment Example
===============

.. contents::

Introduction
------------

This is an example of how to write WSGI middleware with WebOb. The specific example adds a simple comment form to HTML web pages: any HTML page served through the middleware gets a comment form added to it and shows any existing comments.

Code
----

The finished code for this is available in `docs/comment-example-code/example.py <https://github.com/Pylons/webob/blob/master/docs/comment-example-code/example.py>`_ -- you can run that file as a script to try it out.

Instantiating Middleware
------------------------

Middleware of any complexity at all is usually best created as a class, with its configuration passed as arguments to the constructor.

Every middleware needs an application (``app``) that it wraps. This middleware also needs a location to store the comments; we'll put them all in a single directory.

.. code-block:: python

    import os

    class Commenter(object):
        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            # Create the comment directory if it does not exist yet
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

When you use this middleware, you'll use it like:

.. code-block:: python

    app = ... make the application ...
    app = Commenter(app, storage_dir='./comments')

For our application we'll use a simple static file server that is included with `Paste <http://pythonpaste.org>`_ (use ``easy_install Paste`` to install this). The setup is all at the bottom of ``example.py``, and looks like this:

.. code-block:: python

    if __name__ == '__main__':
        import optparse
        parser = optparse.OptionParser(
            usage='%prog --port=PORT BASE_DIRECTORY'
            )
        parser.add_option(
            '-p', '--port',
            default='8080',
            dest='port',
            type='int',
            help='Port to serve on (default 8080)')
        parser.add_option(
            '--comment-data',
            default='./comments',
            dest='comment_data',
            help='Place to put comment data into (default ./comments/)')
        options, args = parser.parse_args()
        if not args:
            parser.error('You must give a BASE_DIRECTORY')
        base_dir = args[0]
        from paste.urlparser import StaticURLParser
        app = StaticURLParser(base_dir)
        app = Commenter(app, options.comment_data)
        from wsgiref.simple_server import make_server
        httpd = make_server('localhost', options.port, app)
        print('Serving on http://localhost:%s' % options.port)
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print('^C')

I won't explain it in detail here, but basically it takes some options, creates an application that serves static files (``StaticURLParser(base_dir)``), wraps it with ``Commenter(app, options.comment_data)``, then serves that.

The Middleware
--------------

While we've created the class structure for the middleware, it doesn't actually do anything yet. Here's a minimal version of the middleware (using WebOb):

.. code-block:: python

    from webob import Request

    class Commenter(object):

        def __init__(self, app, storage_dir):
            self.app = app
            self.storage_dir = storage_dir
            if not os.path.exists(storage_dir):
                os.makedirs(storage_dir)

        def __call__(self, environ, start_response):
            req = Request(environ)
            # Call the wrapped application and capture its response
            resp = req.get_response(self.app)
            return resp(environ, start_response)

This doesn't modify the response in any way. You could write it like this without WebOb:

.. code-block:: python

    class Commenter(object):
        ...
        def __call__(self, environ, start_response):
            return self.app(environ, start_response)

But it won't be as convenient later.

First, let's create a little bit of infrastructure for our middleware.
We need to save and load per-URL data (the comments themselves). We'll keep them in pickles, where each URL has a pickle file named after the URL (but URL-quoted, so ``http://localhost:8080/index.html`` becomes ``http%3A%2F%2Flocalhost%3A8080%2Findex.html``).

.. code-block:: python

    import urllib
    from cPickle import load, dump

    class Commenter(object):
        ...

        def get_data(self, url):
            filename = self.url_filename(url)
            if not os.path.exists(filename):
                # No comments have been saved for this URL yet
                return []
            else:
                f = open(filename, 'rb')
                data = load(f)
                f.close()
                return data

        def save_data(self, url, data):
            filename = self.url_filename(url)
            f = open(filename, 'wb')
            dump(data, f)
            f.close()

        def url_filename(self, url):
            # Quoting every reserved character (safe='') makes the filename safe
            return os.path.join(self.storage_dir, urllib.quote(url, ''))

You can get the full request URL with ``req.url``, so to get the comment data with these methods you do ``data = self.get_data(req.url)``.

Now we'll update the ``__call__`` method to filter *some* responses, and get the comment data for those. We don't want to change responses that were error responses (anything but ``200``), nor do we want to filter responses that aren't HTML. So we get:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                return resp(environ, start_response)
            data = self.get_data(req.url)
            ... do stuff with data, update resp ...
            return resp(environ, start_response)

So far we're punting on actually adding the comments to the page. We also haven't defined what ``data`` will hold. Let's say it's a list of dictionaries, where each dictionary looks like ``{'name': 'John Doe', 'homepage': 'http://blog.johndoe.com', 'comments': 'Great site!'}``.

We'll also need a simple method to add stuff to the page. We'll use a regular expression to find the end of the page and put text in:

.. code-block:: python

    import re

    class Commenter(object):
        ...

        _end_body_re = re.compile(r'</body.*?>', re.I|re.S)

        def add_to_end(self, html, extra_html):
            """
            Adds extra_html to the end of the html page (before </body>)
            """
            match = self._end_body_re.search(html)
            if not match:
                # No closing body tag found; append at the very end
                return html + extra_html
            else:
                return html[:match.start()] + extra_html + html[match.start():]

And then we'll use it like:

.. code-block:: python

    data = self.get_data(req.url)
    body = resp.body
    body = self.add_to_end(body, self.format_comments(data))
    resp.body = body
    return resp(environ, start_response)

We get the body, update it, and put it back in the response. Assigning to ``resp.body`` also updates ``Content-Length``. Then we define:

.. code-block:: python

    import time
    from webob import html_escape

    class Commenter(object):
        ...

        def format_comments(self, comments):
            if not comments:
                return ''
            text = []
            text.append('<hr>')
            text.append('<h2><a name="comment-area"></a>Comments (%s):</h2>' % len(comments))
            for comment in comments:
                text.append('<h3><a href="%s">%s</a> at %s:</h3>' % (
                    html_escape(comment['homepage']), html_escape(comment['name']),
                    time.strftime('%c', comment['time'])))
                # Susceptible to XSS attacks!:
                text.append(comment['comments'])
            return ''.join(text)

We put in a header (with an anchor we'll use later), and a section for each comment. Note that ``html_escape`` behaves like ``cgi.escape`` and just turns ``&`` into ``&amp;``, etc.

Because we put in the comment text without quoting it, the page is susceptible to a `Cross-Site Scripting <http://en.wikipedia.org/wiki/Cross-site_scripting>`_ attack. Fixing that is beyond the scope of this tutorial; you could quote it or clean it with something like `lxml.html.clean <http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html>`_.

Accepting Comments
------------------

All of those pieces *display* comments, but still no one can actually make comments. To handle this we'll take a little piece of the URL space for our own, everything under ``/.comments``, so when someone POSTs there it will add a comment.

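
As a rough sketch of what claiming a URL prefix means at the plain WSGI level, the check below looks only at the first segment of ``PATH_INFO``. The helper names ``first_segment`` and ``claims_request`` are hypothetical and exist only for illustration; the middleware in this tutorial relies on WebOb's ``req.path_info_peek()`` instead.

.. code-block:: python

    def first_segment(environ):
        """Return the first segment of PATH_INFO without consuming it."""
        path = environ.get('PATH_INFO', '').lstrip('/')
        # '.comments/extra' -> '.comments'; an empty path gives ''
        return path.split('/', 1)[0]

    def claims_request(environ, prefix='.comments'):
        """True if the request falls under the prefix we reserve."""
        return first_segment(environ) == prefix

Only the first segment is compared, so a path such as ``/docs/.comments`` would still be passed through to the wrapped application.
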
When the request comes in there are two parts to the path: ``SCRIPT_NAME`` and ``PATH_INFO``. Everything in ``SCRIPT_NAME`` has already been parsed, and everything in ``PATH_INFO`` has yet to be parsed. That means that the URL *without* ``PATH_INFO`` is the path to the middleware; we can intercept anything else below ``SCRIPT_NAME`` but nothing above it. The name for the URL without ``PATH_INFO`` is ``req.application_url``. We have to capture it early to make sure it doesn't change (since the WSGI application we are wrapping may update ``SCRIPT_NAME`` and ``PATH_INFO``).

So here's what this all looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.path_info_peek() == '.comments':
                return self.process_comment(req)(environ, start_response)
            # This is the base path of *this* middleware:
            base_url = req.application_url
            resp = req.get_response(self.app)
            if resp.content_type != 'text/html' or resp.status_code != 200:
                # Not a successful HTML response, we don't want to
                # do anything to it
                return resp(environ, start_response)
            # Make sure the content isn't gzipped:
            resp.decode_content()
            comments = self.get_data(req.url)
            body = resp.body
            body = self.add_to_end(body, self.format_comments(comments))
            body = self.add_to_end(body, self.submit_form(base_url, req))
            resp.body = body
            return resp(environ, start_response)

``base_url`` is the URL where the middleware is located (if you run the example server, it will be ``http://localhost:PORT``). We use ``req.path_info_peek()`` to look at the next segment of the URL -- what comes after ``base_url``. If it is ``.comments`` then we handle it internally and don't pass the request on.

We also put in a little guard, ``resp.decode_content()``, in case the application returns a gzipped response.

Then we get the data, add the comments, add the *form* to make new comments, and return the result.

submit_form
~~~~~~~~~~~

Here's what the form looks like:

.. code-block:: python

    class Commenter(object):
        ...

        def submit_form(self, base_path, req):
            return '''<h2>Leave a comment:</h2>
            <form action="%s/.comments" method="POST">
            <input type="hidden" name="url" value="%s">
            <table width="100%%">
            <tr><td>Name:</td>
                <td><input type="text" name="name" style="width: 100%%"></td></tr>
            <tr><td>URL:</td>
                <td><input type="text" name="homepage" style="width: 100%%"></td></tr>
            </table>
            Comments:<br>
            <textarea name="comments" rows=10 style="width: 100%%"></textarea><br>
            <input type="submit" value="Submit comment">
            </form>
            ''' % (base_path, html_escape(req.url))

Nothing too exciting. It submits a form with the keys ``url`` (the URL being commented on), ``name``, ``homepage``, and ``comments``.

process_comment
~~~~~~~~~~~~~~~

If you look at the method call, what we do is call the method and then treat the result as a WSGI application:

.. code-block:: python

    return self.process_comment(req)(environ, start_response)

You could write this as:

.. code-block:: python

    response = self.process_comment(req)
    return response(environ, start_response)

A common pattern in WSGI middleware that *doesn't* use WebOb is to just do:

.. code-block:: python

    return self.process_comment(environ, start_response)

But the WebOb style makes it easier to modify the response if you want to; modifying a traditional WSGI response/application output requires changing your logic flow considerably.

Here's the actual processing code:

.. code-block:: python

    import time
    from webob import exc
    from webob import Response

    class Commenter(object):
        ...

        def process_comment(self, req):
            try:
                url = req.params['url']
                name = req.params['name']
                homepage = req.params['homepage']
                comments = req.params['comments']
            except KeyError as e:
                resp = exc.HTTPBadRequest('Missing parameter: %s' % e)
                return resp
            data = self.get_data(url)
            data.append(dict(
                name=name,
                homepage=homepage,
                comments=comments,
                time=time.gmtime()))
            self.save_data(url, data)
            # Redirect back to the commented page, at the comment anchor
            resp = exc.HTTPSeeOther(location=url+'#comment-area')
            return resp

We either give a Bad Request response (if the form submission is somehow malformed), or a redirect back to the original page.

The classes in ``webob.exc`` (like ``HTTPBadRequest`` and ``HTTPSeeOther``) are Response subclasses that can be used to quickly create responses for these non-200 cases where the response body usually doesn't matter much.

Conclusion
----------

This shows how to make response-modifying middleware, which is probably the most difficult kind of middleware to write with WSGI -- modifying the request is quite simple in comparison, as you simply update ``environ``.
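
For contrast, here is a minimal sketch of request-modifying middleware written without WebOb. The class name ``ForwardedScheme`` is hypothetical, and the sketch assumes a trusted front-end proxy sets the ``X-Forwarded-Proto`` header; it is an illustration of updating ``environ``, not part of ``example.py``.

.. code-block:: python

    class ForwardedScheme(object):
        """Hypothetical middleware that only touches the request."""

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            # Assumption: a trusted proxy supplies X-Forwarded-Proto
            scheme = environ.get('HTTP_X_FORWARDED_PROTO')
            if scheme in ('http', 'https'):
                environ['wsgi.url_scheme'] = scheme
            # The wrapped application's response passes through untouched
            return self.app(environ, start_response)

It wraps an application the same way ``Commenter`` does, for example ``app = ForwardedScheme(app)``. No response object is ever built, which is why this kind of middleware stays so short.
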