A Python port of twitter-text-rb, also available via `pip install twitter_text`.

A port of the Ruby gem twitter-text-rb to Python.

Changes in 2.0

You can either create a TwitterText object with the text of the tweet you want to process, for example TwitterText('twitter-text-py is #awesome'), or use any of the submodule objects directly (Autolink, Extractor, HitHighlighter or Validation), passing in the tweet text as an argument.

The library also contains a Django template filter that applies the auto_link method to the passed-in text. It can optionally apply the hit_highlight method as well. Example:

{% load twitterize %}

{{ obj.body|twitter_text }} <!-- just add the links -->
{{ obj.body|twitter_text:"my term" }} <!-- add the links and highlight the search term -->

You can test that the library is working correctly by running python inside the twitter_text directory.




This object modifies the text passed to it (and the parent TwitterText.text if present).


These may be overridden by kwargs on a particular method.


auto_link(self, **kwargs)

Add <a></a> tags around the usernames, lists, hashtags and URLs in the provided text. The <a> tags can be controlled with the following kwargs:

auto_link_usernames_or_lists(self, **kwargs)

Add <a></a> tags around the usernames and lists in the provided text. The <a> tags can be controlled with the following kwargs:

auto_link_hashtags(self, **kwargs)

Add <a></a> tags around the hashtags in the provided text. The <a> tags can be controlled with the following kwargs:

auto_link_urls_custom(self, **kwargs)

Add <a></a> tags around the URLs in the provided text. Any elements in kwargs (except @suppress_no_follow@) will be converted to HTML attributes and placed in the <a> tag. Unless kwargs contains @suppress_no_follow@, the rel="nofollow" attribute will be added.
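
The kwargs handling described above can be sketched in a few lines. This is only an illustration and not the library's code: the helper name build_link and its signature are hypothetical, and it assumes every remaining keyword argument maps directly to an HTML attribute.

```python
from html import escape


def build_link(url, **kwargs):
    """Hypothetical helper: wrap a URL in an <a> tag built from kwargs."""
    # suppress_no_follow is a control flag, so it never becomes an attribute.
    suppress = kwargs.pop("suppress_no_follow", False)

    attrs = {"href": url}
    attrs.update(kwargs)  # every other kwarg becomes an HTML attribute
    if not suppress:
        attrs["rel"] = "nofollow"

    rendered = " ".join(
        '{}="{}"'.format(name, escape(str(value), quote=True))
        for name, value in attrs.items()
    )
    return "<a {}>{}</a>".format(rendered, escape(url))


print(build_link("http://example.com", target="_blank"))
print(build_link("http://example.com", suppress_no_follow=True))
```

The first call adds rel="nofollow" after the caller's attributes; the second omits it because the flag is set.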


This object does not modify the text passed to it (or the parent TwitterText.text if present).



Extracts a list of all usernames mentioned in the Tweet text. If the text contains no username mentions an empty list will be returned.

If a transform is given, then it will be called with each username.


Extracts a list of all usernames mentioned in the Tweet text along with the indices for where the mention occurred in the format:

    'username': username_string,
    'indices': ( start_position, end_position )

If the text contains no username mentions, an empty list will be returned.

If a transform is given, then it will be called with each username, the start index, and the end index in the text.
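
As a rough illustration of the result shape and the transform callback, here is a minimal sketch. It is not the library's implementation: the regular expression is a deliberately simplified assumption (ASCII usernames of up to 20 characters), the function is a standalone stand-in rather than the Extractor method, and it assumes the indices span the leading @ as well as the username.

```python
import re

# Simplified assumption: "@" not preceded by a word character,
# followed by 1-20 ASCII letters, digits or underscores.
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,20})")


def extract_mentions_with_indices(text, transform=None):
    """Return one dict per mention; call transform for each one if given."""
    results = []
    for match in MENTION_RE.finditer(text):
        username = match.group(1)
        start, end = match.start(), match.end()  # span includes the "@"
        if transform is not None:
            transform(username, start, end)
        results.append({"username": username, "indices": (start, end)})
    return results


seen = []
mentions = extract_mentions_with_indices(
    "hi @alice and @bob_1",
    transform=lambda name, start, end: seen.append((name, start, end)),
)
print(mentions)
print(seen)
```

The lookbehind keeps addresses such as user@example.com from being treated as mentions.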


Extracts the first username replied to in the Tweet text. If the text does not contain a reply None will be returned.

If a transform is given then it will be called with the username replied to (if any).
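
A reply differs from an ordinary mention only in that the username must open the text. The sketch below shows that distinction under the same simplified username pattern; the function name and the regular expression are assumptions, not the library's own.

```python
import re

# Simplified assumption: optional leading whitespace, then "@username".
REPLY_RE = re.compile(r"^\s*@([A-Za-z0-9_]{1,20})")


def extract_reply_username(text, transform=None):
    """Return the username being replied to, or None if text is not a reply."""
    match = REPLY_RE.match(text)
    if match is None:
        return None
    username = match.group(1)
    if transform is not None:
        transform(username)
    return username


print(extract_reply_username("@alice thanks for the patch"))
print(extract_reply_username("thanks @alice"))
```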


Extracts a list of all URLs included in the Tweet text. If the text contains no URLs an empty list will be returned.

If a transform is given then it will be called for each URL.


Extracts a list of all URLs included in the Tweet text along with the indices in the format:

    'url': url_string,
    'indices': ( start_position, end_position )

If the text contains no URLs an empty list will be returned.

If a transform is given then it will be called for each URL, the start index, and the end index in the text.


Extracts a list of all hashtags included in the Tweet text. If the text contains no hashtags an empty list will be returned. The list returned will not include the leading # character.

If a transform is given then it will be called for each hashtag.


Extracts a list of all hashtags included in the Tweet text along with the indices in the format:

    'hashtag': hashtag_text,
    'indices': ( start_position, end_position )

If the text contains no hashtags an empty list will be returned. The list returned will not include the leading # character.

If a transform is given then it will be called for each hashtag.
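
The sketch below illustrates how the hashtag text can exclude the leading # while the indices still locate the tag in the original string. The pattern is a simplified ASCII-only assumption that requires at least one letter or underscore, the function name is a stand-in, and the transform argument is left out for brevity.

```python
import re

# Simplified assumption: "#" not preceded by a word character or "&",
# followed by word characters that include at least one letter or underscore.
HASHTAG_RE = re.compile(
    r"(?<![A-Za-z0-9_&])#([A-Za-z0-9_]*[A-Za-z_][A-Za-z0-9_]*)"
)


def extract_hashtags_with_indices(text):
    """Return one dict per hashtag, without the leading '#' in the text."""
    return [
        {"hashtag": match.group(1), "indices": (match.start(), match.end())}
        for match in HASHTAG_RE.finditer(text)
    ]


print(extract_hashtags_with_indices("twitter-text-py is #awesome"))
print(extract_hashtags_with_indices("issue #123 is not a hashtag"))
```

In this sketch the indices cover the # character even though the returned hashtag text does not.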



These may be overridden by kwargs on a particular method.


hit_highlight(self, query, **kwargs)

Add <em></em> tags around occurrences of query provided in the text except for occurrences inside hashtags.

The <em></em> tags or css class can be overridden using the highlight_tag and/or highlight_class kwarg. For example:

python> HitHighlighter('test hit here').hit_highlight('hit', highlight_tag = 'strong', highlight_class = 'search-term')
        => "test <strong class='search-term'>hit</strong> here"
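
To make the "except inside hashtags" rule concrete, here is a small standalone sketch. It is an approximation under stated assumptions, not the library's algorithm: it is a plain function rather than the HitHighlighter method, it matches case-insensitively, and it treats any # followed by word characters as a hashtag.

```python
import re


def hit_highlight(text, query, highlight_tag="em", highlight_class=None):
    """Wrap occurrences of query in a tag, leaving hashtags untouched."""
    if not query:
        return text
    if highlight_class:
        open_tag = "<{} class='{}'>".format(highlight_tag, highlight_class)
    else:
        open_tag = "<{}>".format(highlight_tag)
    close_tag = "</{}>".format(highlight_tag)

    # Alternation: a whole hashtag (group 1) is consumed first so that the
    # query (group 2) can never match inside it.
    pattern = re.compile(
        r"(#\w+)|(" + re.escape(query) + r")", re.IGNORECASE
    )

    def replace(match):
        if match.group(1) is not None:
            return match.group(1)  # hashtag: return unchanged
        return open_tag + match.group(2) + close_tag

    return pattern.sub(replace, text)


print(hit_highlight("test hit here", "hit",
                    highlight_tag="strong", highlight_class="search-term"))
print(hit_highlight("a #hit and a hit", "hit"))
```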




Returns the length of the string as it would be displayed. This is equivalent to the length of the string in Unicode Normalization Form C (NFC). Normalizing first is needed in order to calculate the length consistently, no matter which form was actually transmitted. For example:

U+0065 Latin Small Letter E
+ U+0301 Combining Acute Accent
= 2 code points (3 bytes in UTF-8), displayed as é (1 visual glyph)

The NFC of {U+0065, U+0301} is {U+00E9}, which is a single character with a display length of 1.

The string could also contain U+00E9 already, in which case the canonicalization will not change the value.
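
The same example can be reproduced with the standard library's unicodedata module. The function name display_length is a hypothetical stand-in for illustration, and the sketch covers only the NFC step described here.

```python
import unicodedata


def display_length(text):
    """Length of text after Unicode NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


decomposed = "e\u0301"  # U+0065 followed by U+0301
composed = "\u00e9"     # precomposed U+00E9

print(len(decomposed), display_length(decomposed))  # 2 1
print(len(composed), display_length(composed))      # 1 1
```

Both forms report the same display length, which is the consistency the normalization is meant to provide.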


Check the text for any reason that it may not be valid as a Tweet. This is meant as a pre-validation step before posting. A Tweet can still fail for several server-side reasons, but this pre-validation allows quicker feedback.

Returns false if this text is valid. Otherwise one of the following Symbols will be returned: