From 2a88ff8bf78b7234471fc5a345f06d40ab1dae97 Mon Sep 17 00:00:00 2001
From: Hykilpikonna
Date: Fri, 5 Nov 2021 00:42:22 -0400
Subject: [PATCH] [+] Add some ambitious goals

---
 proposal/project_proposal.tex | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/proposal/project_proposal.tex b/proposal/project_proposal.tex
index 1e2bc9b..f6b8217 100644
--- a/proposal/project_proposal.tex
+++ b/proposal/project_proposal.tex
@@ -54,6 +54,8 @@ sorting=nyt
 We plan to use \textbf{matplotlib} to render charts as images or \textbf{plotly} to build web pages for data visualization. We plan to use \textbf{NumPy} for statistical calculations.
+To identify whether an article is about COVID, we currently use a keyword search. However, a keyword search might not be accurate now that COVID has become such a pervasive backdrop to our society (e.g., many articles that contain the word COVID are about something else). We might experiment with training a binary classification model with \textbf{Keras} and \textbf{scikit-learn} to classify COVID articles more accurately. We might also experiment with training autoencoders on vectorized word-occurrence data from COVID-related articles to find whether there are significant categories within them (e.g., some COVID articles might be about new COVID policies, while others might just be general updates relating to COVID). This could be an important insight because people's interest in these different types of COVID articles might differ.
+
 The primary type of graph we will use is a frequency histogram: the frequency with which an individual or a group mentions COVID-related topics will be graphed against the date, from January 1, 2020, to November 1, 2021. We will experiment with group sizes and classification methods to find which variables influence the frequency and which do not. (For example, we will group individuals by popularity and compare the groups to find whether popularity affects how frequently they mention COVID-related topics.)
 We also plan to overlay these charts to better visualize the statistical differences between them. Another variant of the frequency histogram will be plotted not against the date but against the country's confirmed case count, since people's anxiety might be influenced by rising or falling case numbers. We will also graph some data using this variant to look for further insights.
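
Note: the keyword search and the date-based frequency described in this change could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the keyword list, the function names `is_covid_related` and `daily_frequency`, and the sample posts are all hypothetical, and it uses only the Python standard library so that it runs without the plotting or machine-learning packages named in the proposal.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list; the proposal does not say which keywords are used.
COVID_KEYWORDS = ("covid", "coronavirus", "pandemic")


def is_covid_related(text, keywords=COVID_KEYWORDS):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def daily_frequency(posts, keywords=COVID_KEYWORDS):
    """Map each date to the fraction of that day's posts that match a keyword.

    `posts` is an iterable of (date, text) pairs.
    """
    totals = Counter()
    hits = Counter()
    for day, text in posts:
        totals[day] += 1
        if is_covid_related(text, keywords):
            hits[day] += 1
    # Sort by date so the result can be plotted directly as a time series.
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Made-up sample data, for illustration only.
sample_posts = [
    (date(2020, 3, 1), "New COVID policies announced today"),
    (date(2020, 3, 1), "Great weather for a walk"),
    (date(2020, 3, 2), "Coronavirus case counts are rising"),
    (date(2020, 3, 2), "Pandemic or not, I still miss concerts"),
]
frequencies = daily_frequency(sample_posts)
```

The last sample post matches a keyword even though it is not really about COVID, which is the kind of false positive that motivates the classifier experiment. The resulting date-to-frequency mapping could then be passed to a plotting library to draw the histogram.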