Changes from 28 commits
35 commits
9a9dd8c
Create doc folder
Mar 24, 2021
e1ced82
commit
Kevin487 Mar 30, 2021
44bf00e
stash push
Kevin487 Mar 31, 2021
1f50189
texting
Kevin487 Mar 31, 2021
8951713
Merge branch 'master' into issue#59
Kevin487 Mar 31, 2021
116ea32
explanation of the summary analyzer and the topic modeling analyzer
TheShiny1 Mar 31, 2021
e648d42
Merge branch 'issue#59' of github.com:Allegheny-Ethical-CS/GatorMiner…
TheShiny1 Mar 31, 2021
b27a702
kevin's text
Kevin487 Mar 31, 2021
7419a4c
Merge branch 'issue#59' of github.com:Allegheny-Ethical-CS/GatorMiner…
Kevin487 Mar 31, 2021
c9273ea
kevin's text
Kevin487 Mar 31, 2021
994c5f6
Description Feature Adding
Mar 31, 2021
8a93793
commit
Kevin487 Mar 31, 2021
bf51c76
DocumentSimilarity
Batmunkh0419 Apr 6, 2021
fe85e65
Update
Batmunkh0419 Apr 6, 2021
3b6854a
Merge branch 'master' into issue#59
Mai1902 Apr 7, 2021
8bb74bd
resolves change request
Apr 7, 2021
24c2a14
update
Apr 7, 2021
d197a24
update
Apr 7, 2021
7068775
frequency analysis text
Kevin487 Apr 14, 2021
b470d0a
edits of the summary and topic modeling markdown files
TheShiny1 Apr 14, 2021
39ebce0
frequency analysis doc edit
Kevin487 Apr 14, 2021
ec8fe5b
Update displaying of different type of frequency analysis descriptions
Apr 15, 2021
a9f1de5
pull request fix
Kevin487 Apr 21, 2021
8bebc88
Resolve change requested by breaking line in markdown and fix doc string
Apr 21, 2021
59db078
Resolve change requested by breaking line in markdown and fix doc string
Apr 21, 2021
0a5da63
Merge branch 'master' into issue#59
Mai1902 Apr 21, 2021
55b612f
pull request doc similarity fix
Kevin487 Apr 21, 2021
03e02d4
quickfix
Kevin487 Apr 21, 2021
07bfaff
Merge branch 'master' into issue#59
noorbuchi Apr 27, 2021
a64c893
Resolve change requested on Pipfile, RHistory and stremlit_web.py
Apr 27, 2021
a377899
Fixed the Document Similarity
Batmunkh0419 Apr 27, 2021
2a63408
Merge branch 'issue#59' of github.com:Allegheny-Ethical-CS/GatorMiner…
Batmunkh0419 Apr 27, 2021
cc094d7
pull request 69
Kevin487 Apr 27, 2021
ce5161d
Fixing markdown style
Apr 28, 2021
35c65d2
Merge branch 'master' into issue#59
corlettim Apr 29, 2021
Empty file added .Rhistory
Empty file.
28 changes: 6 additions & 22 deletions Pipfile.lock

Some generated files are not rendered by default.

11 changes: 11 additions & 0 deletions docs/document-similarity.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
Document similarity is a method to determine the similarity between sentences,
paragraphs, and documents.
It is widely used to identify plagiarism.
The program converts each document into a collection of strings and vectors:
similar words or sentences are represented as vectors and compared against the other document.
With this representation, we can calculate the angle between the document vectors.
The vector angle ranges from 0 to 90 degrees and indicates how similar the documents are.
If the angle is close to 0, the documents are very similar.
If the angle is close to 90, the documents are completely different.
For user convenience, we assign a different color to each end of the 0-90 range,
so the similarity can be read from the color shades.
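GatorMiner offers TF-IDF and spaCy back ends for this analysis; as a minimal sketch of the vector-angle idea described above (using raw term counts rather than TF-IDF, so the function name and tokenization are illustrative assumptions, not GatorMiner's actual implementation):

```python
import math
from collections import Counter


def cosine_angle(doc_a, doc_b):
    """Return the angle in degrees between two documents' term-count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in vocab)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    cos = dot / (norm_a * norm_b)
    # Clamp for floating-point safety before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
```

Identical documents give an angle of 0 degrees, while documents sharing no words give 90 degrees, matching the color scale described above.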
5 changes: 5 additions & 0 deletions docs/frequency-analysis/frequency-analysis-overall.md
@@ -0,0 +1,5 @@
The "overall" type of frequency analysis outputs, for each assignment,
a list of the most frequently used words in descending order.
The most used words appear at the top of the list and the least used words at the bottom.
The number of times each word was used is labeled on the x-axis.
A user can apply overall frequency analysis to see what each assignment is about and whether the majority of students understood the point of the assignment.
4 changes: 4 additions & 0 deletions docs/frequency-analysis/frequency-analysis-question.md
@@ -0,0 +1,4 @@
The "question" type of frequency analysis allows a user of GatorMiner to select questions in an assignment.
A user can select one question or as many questions as they want.
The output displays the most used words in the answers to each selected question.
It displays the information in the same fashion as the other types of frequency analysis, which is useful when a GatorMiner user wants to see whether submissions show an understanding of a particular question.
6 changes: 6 additions & 0 deletions docs/frequency-analysis/frequency-analysis-student.md
@@ -0,0 +1,6 @@
The "student" type of frequency analysis is very similar to the overall type.
It displays the same information in the same fashion.
However, a user of this type of frequency analysis can select as many students as
they want and compare the word usage between each student.
This type of frequency analysis can show a GatorMiner user
whether one submission is similar to another.
10 changes: 10 additions & 0 deletions docs/frequency-analysis/frequency-analysis.md
@@ -0,0 +1,10 @@
Frequency analysis can be used to count letters, but in our case
we count the frequency of meaningful words and phrases while filtering out stopwords.
It is a useful technique within the field of computer science,
historically because it can reveal the frequency of letters
or groups of letters in a ciphertext.

Frequency analysis is one of the studies available in GatorMiner.
A GatorMiner user can run frequency analysis by inputting the path_to_file
for each markdown assignment they would like to study.
The user can then run a frequency analysis study on the chosen markdown assignments.
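The core counting step can be sketched in plain Python (the stopword list, tokenization, and function name here are illustrative assumptions, not GatorMiner's actual implementation):

```python
from collections import Counter

# A tiny illustrative stopword list; real analyzers use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_frequencies(text, top_n=5):
    """Count meaningful words, filtering out stopwords, most frequent first."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(top_n)
```

The resulting (word, count) pairs are what the bar charts described in the other frequency-analysis pages visualize.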
12 changes: 12 additions & 0 deletions docs/interactive.md
@@ -0,0 +1,12 @@
This section provides an interactive platform for the user: the user can enter a text,
a paragraph, or an essay in the prompt box and choose which analyzer should evaluate their writing.

There are four provided analyzers:

1. Show tokens (returns the text as tokens)

2. Show named entities (returns the objects and entities stated in the text)

3. Show sentiment (returns the degree of negativity or positivity of the text)

4. Show summary (returns one short sentence summarizing the entered text)
8 changes: 8 additions & 0 deletions docs/sentiment-analysis.md
@@ -0,0 +1,8 @@
Sentiment analysis is the process of using text analysis to
identify and study the emotional states expressed in the input, which is subjective information.

Sentiment analysis can identify the amount of sentiment each user expresses
in the text being processed, assigning a number based on each user's sentiment.

For each user, a higher number indicates more positive sentiment and a lower number
indicates more negative sentiment. The numbers are then graphed per user.
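A minimal lexicon-based sketch of how such a per-user number could be produced (the word lists and scoring rule are illustrative assumptions, not GatorMiner's actual analyzer):

```python
# Tiny illustrative sentiment lexicons; real analyzers use much larger ones.
POSITIVE = {"good", "great", "enjoyed", "clear"}
NEGATIVE = {"bad", "confusing", "hard", "boring"}


def sentiment_score(text):
    """Score in [-1, 1]: share of positive minus negative lexicon words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

A score near 1 is strongly positive, near -1 strongly negative, and 0 neutral, which maps directly onto the per-user graph described above.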
6 changes: 6 additions & 0 deletions docs/summary.md
@@ -0,0 +1,6 @@
The summary analyzer does exactly what its name implies:
it summarizes the information that each person entered
in response to a specific prompt.
For instance, if there is a prompt in a file,
it provides the user with the answer to that prompt and organizes it in a manner
that can be reviewed for each given prompt.
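One simple way to extract such a summary sentence is frequency-based scoring; a minimal sketch under that assumption (not GatorMiner's actual summarizer, which may use a different method):

```python
import re
from collections import Counter


def summarize(text):
    """Return the sentence whose words have the highest total frequency in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    return max(sentences, key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())))
```

The sentence containing the most frequently repeated words is treated as the most representative answer for the prompt.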
11 changes: 11 additions & 0 deletions docs/topic-modelling.md
@@ -0,0 +1,11 @@
The topic modeling analyzer takes keywords and plots them
in a graph that shows how often each person used them.
Rather than only counting the total number of times those keywords were used,
it also records whether they were used by particular users.
If one of the keywords was used, it is placed into either a histogram or a scatter chart.
You can also change the number of topics and the number of words per topic.
For example, if we search for the keyword "keyword" in one of the scanned files,
the analyzer shows both the total number of times that word is used in that file
and which person used that specific word.
In addition, you can compare the number of times a person used the word "keyword"
across the other files being compared.
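The per-user keyword counting described above can be sketched as follows (the function name and data shapes are illustrative assumptions; the real analyzer derives its keywords from a fitted topic model):

```python
def keyword_usage(submissions, keywords):
    """Count, for each student, how often each keyword appears in their submission.

    submissions: dict mapping student name -> submission text
    keywords: iterable of keywords to look for
    """
    usage = {}
    for student, text in submissions.items():
        words = text.lower().split()
        usage[student] = {k: words.count(k) for k in keywords}
    return usage
```

The resulting per-student counts are what gets plotted in the histogram or scatter chart.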
Empty file added resources/.Rhistory
Empty file.
45 changes: 45 additions & 0 deletions streamlit_web.py
@@ -209,6 +209,12 @@ def df_preprocess(df):

def frequency():
"""main function for frequency analysis"""

# Create expandable container to show the description for frequency analyzer
freq_des = md.read_file('docs/frequency-analysis/frequency-analysis.md')
with st.beta_expander("Frequency Analysis Description"):
st.write(freq_des)

freq_type = st.sidebar.selectbox(
"Type of frequency analysis", ["Overall", "Student", "Question"]
)
@@ -264,6 +270,10 @@ def overall_freq(freq_range):
)
)

freq_overall_des = md.read_file('docs/frequency-analysis/frequency-analysis-overall.md')
with st.beta_expander("Overall Frequency Analysis Description"):
st.write(freq_overall_des)


def student_freq(freq_range):
"""page for individual student's word frequency"""
@@ -306,6 +316,9 @@ def student_freq(freq_range):
)
)

freq_student_des = md.read_file('docs/frequency-analysis/frequency-analysis-student.md')
with st.beta_expander("Frequency Analysis for Student Description"):
st.write(freq_student_des)

def question_freq(freq_range):
"""page for individual question's word frequency"""
@@ -353,9 +366,19 @@ def question_freq(freq_range):
)
)

freq_question_des = md.read_file('docs/frequency-analysis/frequency-analysis-question.md')
with st.beta_expander("Frequency Analysis for Question Description"):
st.write(freq_question_des)


def sentiment():
"""main function for sentiment analysis"""

# Create expandable container to show description for sentiment analysis
sent_des = md.read_file('docs/sentiment-analysis.md')
with st.beta_expander("Sentiment Analysis Description"):
st.write(sent_des)

senti_df = main_df.copy(deep=True)
# Initializing the new columns with a numpy array, so the entire series is returned
senti_df[cts.POSITIVE], senti_df[cts.NEGATIVE] = az.top_polarized_word(senti_df[cts.TOKEN].values)
@@ -438,6 +461,12 @@ def question_senti(input_df):

def summary():
"""Display summarization"""

# Create expandable container to show description for summary feature
summ_des = md.read_file('docs/summary.md')
with st.beta_expander("Summary Description"):
st.write(summ_des)

sum_df = preprocessed_df[
preprocessed_df[assign_id].isin(assignments)
].dropna(axis=1, how="all")
@@ -450,6 +479,11 @@ def summary():

def tpmodel():
"""Display topic modeling"""
    # Create expandable container to show description for topic modelling feature
topic_des = md.read_file('docs/topic-modelling.md')
with st.beta_expander("Topic Modelling Description"):
st.write(topic_des)

topic_df = main_df.copy(deep=True)
topic_df = topic_df[topic_df[assign_id].isin(assignments)]
# st.write(topic_df)
@@ -550,6 +584,11 @@ def scatter_tm(lda_model, corpus, overall_topic_df):

def doc_sim():
"""Display document similarity"""
# Create expandable container to show description for document similarity analyzer
docs_des = md.read_file('docs/document-similarity.md')
with st.beta_expander("Document Similarity Description"):
st.write(docs_des)

doc_df = main_df.copy(deep=True)
doc_sim_type = st.sidebar.selectbox(
"Type of similarity analysis", ["TF-IDF", "Spacy"]
@@ -622,6 +661,12 @@ def spacy_sim(doc_df):

def interactive():
"""Page to allow nlp analysis from user input"""

# Create expandable container to show description for interactive feature
inter_des = md.read_file('docs/interactive.md')
with st.beta_expander("Interactive Description"):
st.write(inter_des)

input_text = st.text_area("Enter text", "Type here")
token_cb = st.checkbox("Show tokens")
ner_cb = st.checkbox("Show named entities")