# better_profanity
*Blazingly fast cleaning swear words (and their leetspeak) in strings*
[](https://github.com/snguyenthanh/better_profanity/releases/latest)
[](https://travis-ci.com/snguyenthanh/better_profanity)

[](https://github.com/snguyenthanh/better_profanity/blob/master/LICENSE)
Inspired from package [profanity](https://github.com/ben174/profanity) of [Ben Friedland](https://github.com/ben174), this library is significantly faster than the original one, by using string comparison instead of regex.
It supports [modified spellings](https://en.wikipedia.org/wiki/Leet) (such as `p0rn`, `h4NDjob`, `handj0b` and `b*tCh`).
## Requirements
This package works with `Python 3.4+` and `PyPy3`.
## Installation
```
$ pip install better_profanity
```
## Unicode characters
Only Unicode characters from categories `Ll`, `Lu`, `Mc` and `Mn` are added. More on Unicode categories can be found [here][unicode category link].
[unicode category link]: https://en.wikipedia.org/wiki/Template:General_Category_(Unicode)
Not all languages are supported yet, such as *Chinese*.
## Usage
```
from better_profanity import profanity
if __name__ == "__main__":
profanity.load_censor_words()
text = "You p1ec3 of sHit."
censored_text = profanity.censor(text)
print(censored_text)
# You **** of ****.
```
All modified spellings of words in [profanity_wordlist.txt](./better_profanity/profanity_wordlist.txt) will be generated. For example, the word `handjob` would be loaded into:
```
'handjob', 'handj*b', 'handj0b', 'handj@b', 'h@ndjob', 'h@ndj*b', 'h@ndj0b', 'h@ndj@b',
'h*ndjob', 'h*ndj*b', 'h*ndj0b', 'h*ndj@b', 'h4ndjob', 'h4ndj*b', 'h4ndj0b', 'h4ndj@b'
```
The full mapping of the library can be found in [profanity.py](./better_profanity/better_profanity.py#L18-L26).
### 1. Censor swear words from a text
By default, `profanity` replaces each swear words with 4 asterisks `****`.
```
from better_profanity import profanity
if __name__ == "__main__":
text = "You p1ec3 of sHit."
censored_text = profanity.censor(text)
print(censored_text)
# You **** of ****.
```
### 2. Censor doesn't care about word dividers
The function `.censor()` also hide words separated not just by an empty space ` ` but also other dividers, such as `_`, `,` and `.`. Except for `@, $, *, ", '`.
```
from better_profanity import profanity
if __name__ == "__main__":
text = "...sh1t...hello_cat_fuck,,,,123"
censored_text = profanity.censor(text)
print(censored_text)
# "...****...hello_cat_****,,,,123"
```
### 3. Censor swear words with custom character
4 instances of the character in second parameter in `.censor()` will be used to replace the swear words.
```
from better_profanity import profanity
if __name__ == "__main__":
text = "You p1ec3 of sHit."
censored_text = profanity.censor(text, '-')
print(censored_text)
# You ---- of ----.
```
### 4. Check if the string contains any swear words
Function `.contains_profanity()` return `True` if any words in the given string has a word existing in the wordlist.
```
from better_profanity import profanity
if __name__ == "__main__":
dirty_text = "That l3sbi4n did a very good H4ndjob."
profanity.contains_profanity(dirty_text)
# True
```
### 5. Censor swear words with a custom wordlist
#### 5.1. Wordlist as a `List`
Function `load_censor_words` takes a `List` of strings as censored words.
The provided list will replace the default wordlist.
```
from better_profanity import profanity
if __name__ == "__main__":
custom_badwords = ['happy', 'jolly', 'merry']
profanity.load_censor_words(custom_badwords)
print(profanity.contains_profanity("Have a merry day! :)"))
# Have a **** day! :)
```
#### 5.2. Wordlist as a file
Function `load_censor_words_from_file takes a filename, which is a text file and each word is separated by lines.
```
from better_profanity import profanity
if __name__ == "__main__":
profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt')
```
### 6. Whitelist
Function `load_censor_words` and `load_censor_words_from_file` takes a keyword argument `whitelist_words` to ignore words in a wordlist.
It is best used when there are only a few words that you would like to ignore in the wordlist.
```
# Use the default wordlist
profanity.load_censor_words(whitelist_words=['happy', 'merry'])
# or with your custom words as a List
custom_badwords = ['happy', 'jolly', 'merry']
profanity.load_censor_words(custom_badwords, whitelist_words=['merry'])
# or with your custom words as a text file
profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt', whitelist_words=['merry'])
```
### 7. Add more censor words
```
from better_profanity import profanity
if __name__ == "__main__":
custom_badwords = ['happy', 'jolly', 'merry']
profanity.add_censor_words(custom_badwords)
print(profanity.contains_profanity("Happy you, fuck!"))
# **** you, ****!
```
## Limitations
1. As the library compares each word by characters, the censor could easily be bypassed by adding any character(s) to the word:
```
profanity.censor('I just have sexx')
# returns 'I just have sexx'
profanity.censor('jerkk off')
# returns 'jerkk off'
```
2. Any word in [wordlist](https://github.com/snguyenthanh/better_profanity/blob/master/better_profanity/profanity_wordlist.txt) that have non-space separators cannot be recognised, such as `s & m`, and therefore, it won't be filtered out. This problem was raised in [#5](https://github.com/snguyenthanh/better_profanity/issues/5).
## Testing
```
$ python tests.py
```
## Contributing
Please read [CONTRIBUTING.md](./CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us.
## License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details
## Special thanks to
- [Andrew Grinevich](https://github.com/Derfirm) - Add support for Unicode characters.
- [Jaclyn Brockschmidt](https://github.com/jcbrockschmidt) - Optimize string comparison.
## Acknowledgments
- [Ben Friedland](https://github.com/ben174) - For the inspiring package [profanity](https://github.com/ben174/profanity).