
Using Markdown as a Python Library
==================================

First and foremost, Python-Markdown is intended to be a Python library module
used by various projects to convert Markdown syntax into HTML.

The Basics
----------

To use markdown as a module:

    import markdown
    html = markdown.markdown(your_text_string)

Encoded Text
------------

Note that ``markdown()`` expects **Unicode** as input (although a simple ASCII
string should work) and returns output as Unicode. Do not pass encoded strings
to it! If your input is encoded, e.g. as UTF-8, it is your responsibility to
decode it first. For example:

    import codecs
    import markdown

    input_file = codecs.open("some_file.txt", mode="r", encoding="utf-8")
    text = input_file.read()  # decoded to Unicode by codecs
    html = markdown.markdown(text, extensions=[])  # list any extensions here

If you later want to write it to disk, you should encode it yourself:

    # codecs encodes the Unicode result as UTF-8 when writing.
    output_file = codecs.open("some_file.html", "w", encoding="utf-8")
    output_file.write(html)
    output_file.close()
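
The decode, convert, encode round trip described above can be sketched as one small helper. This is an illustration only: ``convert_text_file`` and its ``convert`` parameter are hypothetical names, not part of the library, and ``convert`` stands in for any callable that maps Unicode text to Unicode text, such as ``markdown.markdown``.

```python
import io


def convert_text_file(src_path, dst_path, convert, encoding="utf-8"):
    """Decode src_path, pass the Unicode text to convert, encode the result.

    Hypothetical helper: convert is any callable that takes and returns
    Unicode text (for example markdown.markdown).
    """
    # io.open decodes the bytes on disk into Unicode while reading.
    with io.open(src_path, mode="r", encoding=encoding) as input_file:
        text = input_file.read()

    html = convert(text)

    # Encoding happens only at the boundary, when writing back to disk.
    with io.open(dst_path, mode="w", encoding=encoding) as output_file:
        output_file.write(html)
    return html
```

With the library installed, a call might look like ``convert_text_file("some_file.txt", "some_file.html", markdown.markdown)``.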

More Options
------------

If you want to pass more options, you can create an instance of the ``Markdown``
class yourself and then use ``convert()`` to generate HTML:

    import markdown
    md = markdown.Markdown(
            extensions=['footnotes'],
            # Each extension maps to a list of (option, value) pairs.
            extension_configs={'footnotes': [('PLACE_MARKER', '~~~~~~~~')]},
            safe_mode=True,
            output_format='html4'
    )
    html = md.convert(some_text)

You should also use this method if you want to process multiple strings:

    md = markdown.Markdown()
    html1 = md.convert(text1)
    html2 = md.convert(text2)

Working with Files
------------------

While the Markdown class is only intended to work with Unicode text, some
encoding/decoding is required for the command line features. The file-handling
functions and methods described here are only intended to fit the common use case.

The ``Markdown`` class has the method ``convertFile``, which reads in a file and
writes the result to a file or file-like object:

    md = markdown.Markdown()
    md.convertFile(input="in.txt", output="out.html", encoding="utf-8")

The markdown module also includes a shortcut function, ``markdownFromFile``, that
wraps the above method:

    markdown.markdownFromFile(input="in.txt", 
                              output="out.html", 
                              extensions=[],
                              encoding="utf-8",
                              safe=False)

In either case, if the ``output`` keyword is passed a file name (e.g.
``output="out.html"``), it will try to write to a file by that name. If
``output`` is passed a file-like object (e.g. ``output=StringIO.StringIO()``),
it will attempt to write to that object. Finally, if ``output`` is
set to ``None``, it will write to ``stdout``.
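
The three cases for ``output`` can be pictured with the following sketch. It is not the library's implementation: ``write_output`` is a hypothetical helper that only mirrors the behaviour described above, and it assumes that file-like objects accept Unicode text directly.

```python
import io
import sys


def write_output(html, output=None, encoding="utf-8"):
    """Write html to a file name, a file-like object, or stdout.

    Hypothetical helper illustrating the dispatch on ``output``.
    """
    if output is None:
        # No target given: fall back to standard output.
        sys.stdout.write(html)
    elif hasattr(output, "write"):
        # File-like object: the caller owns it, so it is not closed here.
        output.write(html)
    else:
        # Anything else is treated as a file name.
        with io.open(output, mode="w", encoding=encoding) as output_file:
            output_file.write(html)
```

Checking for a ``write`` attribute rather than a specific class keeps the sketch usable with any file-like object, including in-memory buffers.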

Using Extensions
----------------

One of the parameters that you can pass is a list of extensions. Extensions
must be available as Python modules, either within the ``markdown.extensions``
package or on your ``PYTHONPATH`` with names starting with ``mdx_``, followed by
the name of the extension. Thus, ``extensions=['footnotes']`` will first look for
the module ``markdown.extensions.footnotes``, then a module named
``mdx_footnotes``. See the documentation specific to the extension you are
using for help in specifying configuration settings for that extension.
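
The lookup order can be sketched with the standard ``importlib`` module. This is not the library's own loader: ``candidate_module_names`` and ``find_extension_module`` are hypothetical names used only to show the order in which the two module names are tried.

```python
import importlib


def candidate_module_names(name):
    """Return the module names tried for an extension, in lookup order."""
    return ["markdown.extensions." + name, "mdx_" + name]


def find_extension_module(name):
    """Return the first importable candidate module, or None if none exists.

    Hypothetical helper; it only illustrates the lookup order.
    """
    for module_name in candidate_module_names(name):
        try:
            return importlib.import_module(module_name)
        except ImportError:
            # Not importable under this name; try the next candidate.
            continue
    return None
```

For example, ``candidate_module_names("footnotes")`` yields ``markdown.extensions.footnotes`` followed by ``mdx_footnotes``.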

Note that some extensions may need their state reset between each call to 
``convert``:

    html1 = md.convert(text1)
    md.reset()
    html2 = md.convert(text2)

Safe Mode
---------

If you are using Markdown on a web system that transforms text provided
by untrusted users, you may want to use the ``safe_mode`` option, which ensures
that the user's HTML tags are either replaced, removed or escaped. (Users can
still create links using Markdown syntax.)

* To replace HTML, set ``safe_mode="replace"`` (``safe_mode=True`` still works 
    for backward compatibility with older versions). The HTML will be replaced 
    with the text defined in ``markdown.HTML_REMOVED_TEXT`` which defaults to 
    ``[HTML_REMOVED]``. To replace the HTML with something else:

        markdown.HTML_REMOVED_TEXT = "--RAW HTML IS NOT ALLOWED--"
        md = markdown.Markdown(safe_mode="replace")

    **Note**: You could edit the value of ``HTML_REMOVED_TEXT`` directly in
    ``markdown/__init__.py`` but you will need to remember to do so every time you
    upgrade to a newer version of Markdown. Therefore, this is not recommended.

* To remove HTML, set ``safe_mode="remove"``. Any raw HTML will be completely 
    stripped from the text with no warning to the author.

* To escape HTML, set ``safe_mode="escape"``. The HTML will be escaped and 
    included in the document.
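
To make the three modes concrete, the sketch below shows what each one is described as doing to a single piece of raw HTML. It illustrates the behaviour only and is not Python-Markdown's implementation: ``apply_safe_mode`` is a hypothetical function, and ``HTML_REMOVED_TEXT`` here is a local constant that mirrors the default value quoted above.

```python
from html import escape

# Local stand-in mirroring the default of markdown.HTML_REMOVED_TEXT.
HTML_REMOVED_TEXT = "[HTML_REMOVED]"


def apply_safe_mode(raw_html, safe_mode):
    """Return what a given safe_mode is described as doing to raw HTML.

    Hypothetical illustration, not the library's code path.
    """
    if safe_mode in ("replace", True):
        # True is treated like "replace" for backward compatibility.
        return HTML_REMOVED_TEXT
    if safe_mode == "remove":
        # The raw HTML is dropped with no placeholder.
        return ""
    if safe_mode == "escape":
        # The markup stays visible as text instead of live HTML.
        return escape(raw_html, quote=False)
    # No safe mode: raw HTML passes through untouched.
    return raw_html
```

Under these assumptions, the ``"escape"`` mode is the only one that keeps what the author typed visible to readers.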

Output Formats
--------------

If Markdown is outputting (X)HTML as part of a web page, you will most likely
want the output to match the (X)HTML version used by the rest of your page or site.
Currently, Markdown offers two output formats out of the box: "HTML4" and
"XHTML1" (the default). Markdown will also accept the formats "HTML" and
"XHTML", which currently map to "HTML4" and "XHTML1" respectively. However,
you should use the more explicit keys, as the general keys may change in the
future if it makes sense at that time. The keys can be either lowercase or
uppercase.
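
The key handling can be sketched as a small lookup. The names ``FORMAT_ALIASES``, ``KNOWN_FORMATS`` and ``resolve_output_format`` are hypothetical and not taken from the library, and the sketch assumes the general key "XHTML" resolves to "XHTML1".

```python
# Hypothetical alias table: general keys map to explicit ones.
FORMAT_ALIASES = {
    "html": "html4",
    "xhtml": "xhtml1",
}
KNOWN_FORMATS = ("html4", "xhtml1")


def resolve_output_format(key="xhtml1"):
    """Map a user-supplied key to one of the explicit format names."""
    name = key.lower()  # keys may be given in either case
    name = FORMAT_ALIASES.get(name, name)
    if name not in KNOWN_FORMATS:
        raise ValueError("unknown output format: %r" % (key,))
    return name
```

Because the general keys go through the alias table, changing what "HTML" means later would touch only that table, which is why code that needs a stable result should pass the explicit keys.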

To set the output format do:

    html = markdown.markdown(text, output_format='html4')

Or, when using the Markdown class:

    md = markdown.Markdown(output_format='html4')
    html = md.convert(text)

Note that the output format is only set once for the class and cannot be 
specified each time ``convert()`` is called. If you really must change the
output format for the class, you can use the ``set_output_format`` method:

    md.set_output_format('xhtml1')