Twitter Sentiment Analysis Project

Readme

Hate Speech on Twitter: A Natural Language Processing Challenge

Want to look at the “official” presentation that goes with this project? Click here!

Overall Summary

About the Dataset:

This dataset contains 30,000 tweets, each pre-labeled as either 0 (not containing hate speech) or 1 (containing hate speech). Here, hate speech means language relating to sexism and racism. The aim is to use these labeled tweets to create predictive models that can identify such language in the future.
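To make the data format concrete, here is a minimal sketch of reading rows with the three columns the dataset ships with (id, label, tweet) and counting how many tweets carry each label. The sample rows and the names `SAMPLE_CSV`, `load_tweets`, and `label_counts` are invented for illustration; they are not taken from the challenge files or from the project code.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the challenge CSV. These rows are made up for illustration.
SAMPLE_CSV = """id,label,tweet
1,0,@user what a lovely morning #sunshine
2,1,@user placeholder text standing in for a hateful tweet
3,0,new blog post is up!
"""


def load_tweets(handle):
    """Read rows with id, label, and tweet columns; label becomes an int (0 or 1)."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {"id": int(row["id"]), "label": int(row["label"]), "tweet": row["tweet"]}
        )
    return rows


def label_counts(rows):
    """Count how many tweets carry each label."""
    return Counter(row["label"] for row in rows)


rows = load_tweets(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
print(counts)
```

With a real file, the same `load_tweets` function could be given an open file handle instead of the in-memory string.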

This dataset is available online as a part of the Analytics Vidhya challenge series.

Skills Used in This Project:
Over the course of this project I had the chance to strengthen the following skills:

Data Wrangling: The dataset came with only three features: id, label, and tweet. It was up to me to break the data down and make sense of it. This involved cleaning the tweets of extraneous characters and symbols, creating my own “stop word” list of common filler words that had little impact on the sentiment of a tweet, and analyzing the accuracy of the labels.
Natural Language Processing: Since this was my first time experimenting with Natural Language Processing, I really dove into how keywords can (or can’t) signify semantics (meaning) in text. Using scikit-learn, I processed the tweets into individual word features and looked at the relationship between groups of words and meaning.
Machine Learning/Predictive Modeling: While working through this project I quickly learned that using base keywords to create labels for machine learning can be completely inaccurate! Fun! This led me to create my own labels for the tweets using a list of my own keywords (which is ever growing).
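The wrangling and labeling steps described above can be sketched roughly as follows. This is a standard-library stand-in, not the project's actual code: the project used scikit-learn for the word features, while this sketch uses `collections.Counter` to show the same idea. The stop-word list, the keyword list, and all function names are hypothetical placeholders.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the project's own list is not shown here.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "it", "this"}

# Hypothetical keyword list for labeling; placeholder tokens, not real terms.
FLAG_KEYWORDS = {"slur_a", "slur_b"}


def clean_tweet(text):
    """Lowercase, then drop @mentions, URLs, and stray symbols."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)          # user mentions
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"[^a-z0-9_\s]", " ", text)  # punctuation, emoji, '#'
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split a cleaned tweet into words and remove stop words."""
    return [word for word in clean_tweet(text).split() if word not in STOP_WORDS]


def word_features(text):
    """Bag-of-words counts for one tweet."""
    return Counter(tokenize(text))


def keyword_label(text):
    """Return 1 if any flagged keyword appears in the tweet, else 0."""
    return int(any(word in FLAG_KEYWORDS for word in tokenize(text)))


print(tokenize("@user This is GREAT!!! #happy http://t.co/x"))
print(word_features("Great great day"))
print(keyword_label("you are such a slur_a"))
```

A keyword rule like `keyword_label` is easy to build, and it is also exactly the kind of labeling that can turn out to be inaccurate, since a single word says little about the context it appears in.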

Prerequisites/Project Process

Exploratory Data Analysis (EDA)
Please see the provided presentation for a full breakdown of my EDA process and (very pretty) visualizations!

Testing/Main Purpose of Project
Can we use our pre-labeled dataset to predict whether a tweet is hate speech or not?
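One way to frame that question in code is to train a classifier on labeled tweets and check its predictions on tweets it has not seen. The sketch below uses a small multinomial Naive Bayes model written with the standard library. That model choice is an assumption made for illustration, not necessarily what this project used, and the training tokens are invented placeholders.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (tokens, label) pairs, where label is 0 or 1."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-probability for the tokens."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids log(0).
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data: token lists with 0/1 labels.
train = [
    (["lovely", "morning"], 0),
    (["great", "game", "today"], 0),
    (["badword", "people"], 1),
    (["badword", "go", "away"], 1),
]
model = train_naive_bayes(train)
print(predict(model, ["badword", "today"]))
print(predict(model, ["lovely", "game"]))
```

On real data, the labeled tweets would be split into training and held-out sets, so that predictions are scored on tweets the model did not learn from.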

Lecture

Below is the slide deck for this project.

Code

Click here to see the original GitHub repository for this code!
