Murphy

Murphy is a text-processing library for working with Twitter data, built on top of Dask.

Murphy is broken down into several components and subcomponents:

  • Data Preprocessing: Scalable tools that help researchers prepare their data

    • NLP Tools: NLP cleaning tools for tokenization and lemmatization

    • Filters: Removing retweet strings, emojis, and other annoyances

    • Batch Processing: Applying batch-based workloads to make data processing easier

  • Applying AI Models: Using AI models built by us for various purposes

    • Sentiment Analysis: Predicting sentiment in tweets!

    • Emoji Prediction: Predicting which emojis would work best for a tweet (coming soon!)

    • And more! We’re still developing, so ideas and contributions are much appreciated!

[Figure: overview diagram (_images/overview-diagram.png)]
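To make the Filters component more concrete, here is a minimal sketch of what removing retweet strings and emojis can look like in plain Python. This is not Murphy's actual API: the function name `clean_tweet` and both regular expressions are assumptions made for illustration, and the emoji pattern only covers a rough range of common pictographs.

```python
import re

# Hypothetical patterns for illustration; not taken from Murphy itself.
# Matches a leading retweet marker such as "RT @user: ".
RETWEET_PREFIX = re.compile(r"^RT @\w+:\s*")
# Rough emoji range (common pictographs and symbols); real coverage is broader.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def clean_tweet(text: str) -> str:
    """Strip a leading retweet marker and common emojis, then collapse whitespace."""
    text = RETWEET_PREFIX.sub("", text)
    text = EMOJI.sub("", text)
    return " ".join(text.split())


cleaned = clean_tweet("RT @user: Dask is great 🚀 for big data")
print(cleaned)
```

A function like this works on one string at a time, which makes it easy to map over a larger collection of tweets later.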

Built on top of Dask

By building on top of Dask, we can run processing in parallel across cores or a cluster and scale to datasets that don’t fit in memory.

You also have access to individual Dask objects like Dask DataFrames and Dask Bags directly, so you can continue to experiment on your own after using our tools.