Changes from 28 commits
35 commits
9a9dd8c
Create doc folder
Mar 24, 2021
e1ced82
commit
Kevin487 Mar 30, 2021
44bf00e
stash push
Kevin487 Mar 31, 2021
1f50189
texting
Kevin487 Mar 31, 2021
8951713
Merge branch 'master' into issue#59
Kevin487 Mar 31, 2021
116ea32
explanation of the summary analyzer and the topic modeling analyzer
TheShiny1 Mar 31, 2021
e648d42
Merge branch 'issue#59' of github.com:Allegheny-Ethical-CS/GatorMiner…
TheShiny1 Mar 31, 2021
b27a702
kevin's text
Kevin487 Mar 31, 2021
7419a4c
Merge branch 'issue#59' of github.com:Allegheny-Ethical-CS/GatorMiner…
Kevin487 Mar 31, 2021
c9273ea
kevin's text
Kevin487 Mar 31, 2021
994c5f6
Description Feature Adding
Mar 31, 2021
8a93793
commit
Kevin487 Mar 31, 2021
bf51c76
DocumentSimilarity
Batmunkh0419 Apr 6, 2021
fe85e65
Update
Batmunkh0419 Apr 6, 2021
3b6854a
Merge branch 'master' into issue#59
Mai1902 Apr 7, 2021
8bb74bd
resolves change request
Apr 7, 2021
24c2a14
update
Apr 7, 2021
d197a24
update
Apr 7, 2021
7068775
frequency analysis text
Kevin487 Apr 14, 2021
b470d0a
edits of the summary and topic modeling markdown files
TheShiny1 Apr 14, 2021
39ebce0
frequency analysis doc edit
Kevin487 Apr 14, 2021
ec8fe5b
Update displaying of different type of frequency analysis descriptions
Apr 15, 2021
a9f1de5
pull request fix
Kevin487 Apr 21, 2021
8bebc88
Resolve change requested by breaking line in markdown and fix doc string
Apr 21, 2021
59db078
Resolve change requested by breaking line in markdown and fix doc string
Apr 21, 2021
0a5da63
Merge branch 'master' into issue#59
Mai1902 Apr 21, 2021
55b612f
pull request doc similarity fix
Kevin487 Apr 21, 2021
03e02d4
quickfix
Kevin487 Apr 21, 2021
07bfaff
Merge branch 'master' into issue#59
noorbuchi Apr 27, 2021
a64c893
Resolve change requested on Pipfile, RHistory and stremlit_web.py
Apr 27, 2021
a377899
Fixed the Document Similarity
Batmunkh0419 Apr 27, 2021
2a63408
Merge branch 'issue#59' of github.com:Allegheny-Ethical-CS/GatorMiner…
Batmunkh0419 Apr 27, 2021
cc094d7
pull request 69
Kevin487 Apr 27, 2021
ce5161d
Fixing markdown style
Apr 28, 2021
35c65d2
Merge branch 'master' into issue#59
corlettim Apr 29, 2021
Empty file added .Rhistory
Empty file.
28 changes: 6 additions & 22 deletions Pipfile.lock

Some generated files are not rendered by default.

11 changes: 11 additions & 0 deletions docs/document-similarity.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
Document similarity is a method to determine the similarity between sentences,
paragraphs, and documents.
It is widely used to identify plagiarism.
The program converts each document into a collection of strings and vectors:
similar words or sentences are represented as vectors and compared against the other document.
With this representation, we can calculate the angle between the document vectors.
The vector angle ranges from 0 to 90 degrees and indicates how similar the documents are.
If the angle is close to 0, the documents are very similar.
If the angle is close to 90, the documents are completely different.
For user convenience, we assign a different color to each end of the 0-90 range,
so the similarity can be read from the color shades.
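GatorMiner offers TF-IDF and spaCy back ends for this analysis; as a minimal sketch of the vector-angle idea described above (using raw term counts rather than TF-IDF, so the function name and tokenization are illustrative assumptions, not GatorMiner's actual implementation):

```python
import math
from collections import Counter


def cosine_angle(doc_a, doc_b):
    """Return the angle in degrees between two documents' term-count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in vocab)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    cos = dot / (norm_a * norm_b)
    # Clamp for floating-point safety before taking the arccosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
```

Identical documents give an angle of 0 degrees, while documents sharing no words give 90 degrees, matching the color scale described above.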
5 changes: 5 additions & 0 deletions docs/frequency-analysis/frequency-analysis-overall.md
@@ -0,0 +1,5 @@
The "overall" type of frequency analysis outputs, for each assignment,
a list of the most frequently used words in descending order.
The most used words appear at the top of the list and the least used words at the bottom.
The number of times each word was used is labeled on the x-axis.
A user can apply overall frequency analysis to see what each assignment is about and whether the majority of students understood the point of the assignment.
4 changes: 4 additions & 0 deletions docs/frequency-analysis/frequency-analysis-question.md
@@ -0,0 +1,4 @@
The "question" type of frequency analysis allows a user of GatorMiner to select questions in an assignment.
A user can select one question or as many questions as they want.
The output displays the most used words in the answers to each selected question.
It displays the information in the same fashion as the other types of frequency analysis, which is useful when a GatorMiner user wants to see whether submissions show an understanding of a particular question.
6 changes: 6 additions & 0 deletions docs/frequency-analysis/frequency-analysis-student.md
@@ -0,0 +1,6 @@
The "student" type of frequency analysis is very similar to the overall type.
It displays the same information in the same fashion.
However, a user of this type of frequency analysis can select as many students as
they want and compare the word usage between each student.
This type of frequency analysis can show a GatorMiner user
whether one submission is similar to another.
10 changes: 10 additions & 0 deletions docs/frequency-analysis/frequency-analysis.md
@@ -0,0 +1,10 @@
Frequency analysis can be used to count letters, but in our case
we count the frequency of meaningful words and phrases while filtering out stopwords.
It is a useful technique within the field of computer science,
historically because it can reveal the frequency of letters
or groups of letters in a ciphertext.

Frequency analysis is one of the studies available in GatorMiner.
A GatorMiner user can run frequency analysis by inputting the path_to_file
for each markdown assignment they would like to study.
The user can then run a frequency analysis study on the chosen markdown assignments.
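The core counting step can be sketched in plain Python (the stopword list, tokenization, and function name here are illustrative assumptions, not GatorMiner's actual implementation):

```python
from collections import Counter

# A tiny illustrative stopword list; real analyzers use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_frequencies(text, top_n=5):
    """Count meaningful words, filtering out stopwords, most frequent first."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(top_n)
```

The resulting (word, count) pairs are what the bar charts described in the other frequency-analysis pages visualize.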
12 changes: 12 additions & 0 deletions docs/interactive.md
@@ -0,0 +1,12 @@
This section provides an interactive platform for the user: the user can enter a text,
a paragraph, or an essay in the prompt box and choose which analyzer should evaluate their writing.

There are four provided analyzers:

1. Show tokens (returns the text as tokens)

2. Show named entities (returns the objects and entities stated in the text)

3. Show sentiment (returns the degree of negativity or positivity of the text)

4. Show summary (returns one short sentence summarizing the entered text)
8 changes: 8 additions & 0 deletions docs/sentiment-analysis.md
@@ -0,0 +1,8 @@
Sentiment analysis is the process of using text analysis to
identify and study the emotional states expressed in the input, which is subjective information.

Sentiment analysis can identify the amount of sentiment each user expresses
in the text being processed, assigning a number based on each user's sentiment.

For each user, a higher number indicates more positive sentiment and a lower number
indicates more negative sentiment. The numbers are then graphed per user.
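A minimal lexicon-based sketch of how such a per-user number could be produced (the word lists and scoring rule are illustrative assumptions, not GatorMiner's actual analyzer):

```python
# Tiny illustrative sentiment lexicons; real analyzers use much larger ones.
POSITIVE = {"good", "great", "enjoyed", "clear"}
NEGATIVE = {"bad", "confusing", "hard", "boring"}


def sentiment_score(text):
    """Score in [-1, 1]: share of positive minus negative lexicon words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

A score near 1 is strongly positive, near -1 strongly negative, and 0 neutral, which maps directly onto the per-user graph described above.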
6 changes: 6 additions & 0 deletions docs/summary.md
@@ -0,0 +1,6 @@
The summary analyzer does exactly what its name implies:
it summarizes the information that each person entered
in response to a specific prompt.
For instance, if there is a prompt in a file,
it provides the user with the answer to that prompt and organizes it in a manner
that can be reviewed for each given prompt.
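One simple way to extract such a summary sentence is frequency-based scoring; a minimal sketch under that assumption (not GatorMiner's actual summarizer, which may use a different method):

```python
import re
from collections import Counter


def summarize(text):
    """Return the sentence whose words have the highest total frequency in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    return max(sentences, key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())))
```

The sentence containing the most frequently repeated words is treated as the most representative answer for the prompt.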
11 changes: 11 additions & 0 deletions docs/topic-modelling.md
@@ -0,0 +1,11 @@
The topic modeling analyzer takes keywords and plots them
in a graph that shows how often each person used them.
Rather than only counting the total number of times those keywords were used,
it also records whether they were used by particular users.
If one of the keywords was used, it is placed into either a histogram or a scatter chart.
You can also change the number of topics and the number of words per topic.
For example, if we search for the keyword "keyword" in one of the scanned files,
the analyzer shows both the total number of times that word is used in that file
and which person used that specific word.
In addition, you can compare the number of times a person used the word "keyword"
across the other files being compared.
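The per-user keyword counting described above can be sketched as follows (the function name and data shapes are illustrative assumptions; the real analyzer derives its keywords from a fitted topic model):

```python
def keyword_usage(submissions, keywords):
    """Count, for each student, how often each keyword appears in their submission.

    submissions: dict mapping student name -> submission text
    keywords: iterable of keywords to look for
    """
    usage = {}
    for student, text in submissions.items():
        words = text.lower().split()
        usage[student] = {k: words.count(k) for k in keywords}
    return usage
```

The resulting per-student counts are what gets plotted in the histogram or scatter chart.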
Empty file added resources/.Rhistory
Empty file.
45 changes: 45 additions & 0 deletions streamlit_web.py
@@ -209,6 +209,12 @@ def df_preprocess(df):

def frequency():
"""main function for frequency analysis"""

# Create expandable container to show the description for frequency analyzer
freq_des = md.read_file('docs/frequency-analysis/frequency-analysis.md')
with st.beta_expander("Frequency Analysis Description"):
st.write(freq_des)

freq_type = st.sidebar.selectbox(
"Type of frequency analysis", ["Overall", "Student", "Question"]
)
@@ -264,6 +270,10 @@ def overall_freq(freq_range):
)
)

freq_overall_des = md.read_file('docs/frequency-analysis/frequency-analysis-overall.md')
with st.beta_expander("Overall Frequency Analysis Description"):
st.write(freq_overall_des)


def student_freq(freq_range):
"""page for individual student's word frequency"""
@@ -306,6 +316,9 @@ def student_freq(freq_range):
)
)

freq_student_des = md.read_file('docs/frequency-analysis/frequency-analysis-student.md')
with st.beta_expander("Frequency Analysis for Student Description"):
st.write(freq_student_des)

def question_freq(freq_range):
"""page for individual question's word frequency"""
@@ -353,9 +366,19 @@ def question_freq(freq_range):
)
)

freq_question_des = md.read_file('docs/frequency-analysis/frequency-analysis-question.md')
with st.beta_expander("Frequency Analysis for Question Description"):
st.write(freq_question_des)


def sentiment():
"""main function for sentiment analysis"""

# Create expandable container to show description for sentiment analysis
sent_des = md.read_file('docs/sentiment-analysis.md')
with st.beta_expander("Sentiment Analysis Description"):
st.write(sent_des)

senti_df = main_df.copy(deep=True)
# Initializing the new columns with a numpy array, so the entire series is returned
senti_df[cts.POSITIVE], senti_df[cts.NEGATIVE] = az.top_polarized_word(senti_df[cts.TOKEN].values)
@@ -438,6 +461,12 @@ def question_senti(input_df):

def summary():
"""Display summarization"""

# Create expandable container to show description for summary feature
summ_des = md.read_file('docs/summary.md')
with st.beta_expander("Summary Description"):
st.write(summ_des)

sum_df = preprocessed_df[
preprocessed_df[assign_id].isin(assignments)
].dropna(axis=1, how="all")
@@ -450,6 +479,11 @@ def summary():

def tpmodel():
"""Display topic modeling"""
    # Create expandable container to show description for topic modelling feature
topic_des = md.read_file('docs/topic-modelling.md')
with st.beta_expander("Topic Modelling Description"):
st.write(topic_des)

topic_df = main_df.copy(deep=True)
topic_df = topic_df[topic_df[assign_id].isin(assignments)]
# st.write(topic_df)
@@ -550,6 +584,11 @@ def scatter_tm(lda_model, corpus, overall_topic_df):

def doc_sim():
"""Display document similarity"""
# Create expandable container to show description for document similarity analyzer
docs_des = md.read_file('docs/document-similarity.md')
with st.beta_expander("Document Similarity Description"):
st.write(docs_des)

doc_df = main_df.copy(deep=True)
doc_sim_type = st.sidebar.selectbox(
"Type of similarity analysis", ["TF-IDF", "Spacy"]
@@ -622,6 +661,12 @@ def spacy_sim(doc_df):

def interactive():
"""Page to allow nlp analysis from user input"""

# Create expandable container to show description for interactive feature
inter_des = md.read_file('docs/interactive.md')
with st.beta_expander("Interactive Description"):
st.write(inter_des)

input_text = st.text_area("Enter text", "Type here")
token_cb = st.checkbox("Show tokens")
ner_cb = st.checkbox("Show named entities")