Ok, so the post above is a reference for this project, but I think I might tweak it a bit. Maybe use a different niche of data to gauge the mental state of the anime Twitter community? That sounds interesting and should not be too much of a challenge.

Let's try. (Update: it turned out to be a challenge without TWINT, so I am dropping that idea and using a pre-constructed dataset instead.)

2. Data Mining

First, I need to get to the data mining.

So TWINT looks like a useful tool for this process.

twintproject/twint

Factors associated with our analysis:

What are the linguistic factors associated with depression?

→ So, we need to approach this from a more psychological standpoint, since getting the data is one part and deciding which data is the right data is another.

Well, first I need to get the dataset. This is gonna be a long process but aight. I will probably find a lot of help online.

TWINT:

OK, so it looks like TWINT is broken. I will stop doing the hard stuff and move on with the pre-constructed dataset.

I tried looking into the Twitter API, but the application process is a drag, so I dropped it.

Data Preprocessing Phase

This will probably take the longest and be the most concept-heavy part.

Torchtext, a PyTorch library, should make preprocessing simple and efficient.

Referring to the following article :

Sentiment Analysis - TorchText

<aside> 💡 Typical components of classical NLP: 1. Preprocessing and tokenization → 2. Generating a vocabulary of unique tokens and converting words to indices → 3. Loading pretrained vectors like GloVe, Word2Vec, fastText → 4. Padding text with zeroes in case of variable lengths → 5. Data loading and batching → 6. Model creation and training

</aside>
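To get a feel for steps 1, 2, 4 and 5 before handing them to a library, here is a minimal sketch in plain Python. It is not Torchtext's API and not the method from the referenced article. The function names, the `<pad>` and `<unk>` tokens, and the example tweets are all made up for illustration. Loading pretrained vectors (step 3) and the model itself (step 6) are left out.

```python
from collections import Counter

# Special tokens are an assumption of this sketch: index 0 pads, index 1 marks unknown words.
PAD, UNK = "<pad>", "<unk>"


def tokenize(text):
    # Step 1: lowercase and split on whitespace (a real pipeline would use a proper tokenizer).
    return text.lower().split()


def build_vocab(token_lists, min_freq=1):
    # Step 2a: collect unique tokens that appear at least min_freq times.
    counts = Counter(tok for toks in token_lists for tok in toks)
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


def encode(tokens, vocab):
    # Step 2b: map each token to its index, falling back to the unknown index.
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


def pad_batch(sequences, pad_index=0):
    # Step 4: right-pad every sequence to the length of the longest one in the batch.
    max_len = max(len(s) for s in sequences)
    return [s + [pad_index] * (max_len - len(s)) for s in sequences]


def batches(items, batch_size):
    # Step 5: yield consecutive slices of batch_size items.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical example tweets, not taken from any dataset.
tweets = ["I feel fine today", "so tired of everything", "fine"]
tokenized = [tokenize(t) for t in tweets]
vocab = build_vocab(tokenized)
encoded = [encode(toks, vocab) for toks in tokenized]
padded_batches = [pad_batch(b, vocab[PAD]) for b in batches(encoded, batch_size=2)]
```

Padding per batch rather than over the whole dataset keeps short tweets from being stretched to the length of the longest tweet anywhere, which is roughly what a batching utility would do for you.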