OK, so the above post is a reference for this project, but I think I might twist it a bit. Maybe use a different niche of data, like determining the mental state of the anime Twitter community? That sounds interesting and should not be too much of a challenge.
Let's try. (Update: it is a challenge without TWINT, so I'm dropping that idea and using a pre-constructed dataset.)
First I need to get to the data mining.
TWINT looks like a useful tool for this process.
Factors associated with our analysis:
What are the linguistic factors associated with depression?
→ So we need to approach this from a more psychological standpoint, since getting the data is one part and deciding which data is the right data is another.
Well, first I need to get the dataset. This is gonna be a long process but aight. I will probably find a lot of help online.
OK, so it looks like TWINT is broken. I will just stop doing the hard stuff and move on with the pre-constructed dataset.
I tried looking into the Twitter API, but the application process is a drag, so I dropped it.
This will probably take the longest and be the most concept-heavy part.
Torchtext, a PyTorch library, should be used to make preprocessing simple and efficient.
Referring to the following article :
Sentiment Analysis - TorchText
<aside> 💡 Typical components of classical NLP: 1. Preprocessing and tokenization → 2. Generating a vocabulary of unique tokens and converting words to indices → 3. Loading pretrained vectors like GloVe, Word2vec, fastText → 4. Padding text with zeroes in case of variable lengths → 5. Data loading and batching → 6. Model creation and training
</aside>
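To get a feel for what those steps actually do before handing them to a library, here is a minimal sketch in plain Python covering steps 1, 2, 4 and 5 (tokenizing, building a vocabulary, padding, batching). It does not use Torchtext, and it skips step 3 (pretrained vectors) and step 6 (the model). The helper names, the `<pad>`/`<unk>` tokens, and the three example tweets are all made up for illustration and are not from the dataset or the referenced article.

```python
from collections import Counter
import re

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens for this sketch


def tokenize(text):
    """Step 1: lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(token_lists, min_freq=1):
    """Step 2: map every unique token to an integer index."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    # Most frequent tokens first; ties broken alphabetically so the result is stable.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def numericalize(tokens, vocab):
    """Convert tokens to indices, falling back to <unk> for unseen words."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


def pad_batch(sequences, pad_idx=0):
    """Step 4: right-pad every sequence with zeros to the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_idx] * (max_len - len(seq)) for seq in sequences]


def batches(examples, vocab, batch_size=2):
    """Step 5: yield (padded_ids, labels) pairs, batch_size examples at a time."""
    for i in range(0, len(examples), batch_size):
        chunk = examples[i:i + batch_size]
        ids = [numericalize(tokenize(text), vocab) for text, _ in chunk]
        labels = [label for _, label in chunk]
        yield pad_batch(ids), labels


# Made-up (text, label) pairs, only here to exercise the functions.
examples = [
    ("I feel so tired today", 1),
    ("New episode was great", 0),
    ("tired of everything", 1),
]

vocab = build_vocab(tokenize(text) for text, _ in examples)
all_batches = list(batches(examples, vocab, batch_size=2))

for padded_ids, labels in all_batches:
    print(padded_ids, labels)
```

In a real run the padded lists would be turned into tensors and fed to the model, and a library would handle most of this, but the shapes and the role of the pad index stay the same.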