Cron job #898

Open: wants to merge 8 commits into base: master
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -12,3 +12,4 @@ courses20.xml
 compose-dev.yaml
 rpi_data/get-summer-2023-2.sh
 rpi_data/summer-20232.csv
+.venv
16 changes: 13 additions & 3 deletions docker-compose.development.yml
@@ -28,12 +28,12 @@ services:
       - ./src/web:/app
       - web_node_modules:/app/node_modules/
     environment:
-      - YACS_API_HOST=http://yacs_api:5000
+      - YACS_API_HOST=http://yacs_api:4000

   yacs_api:
-    command: /bin/bash -c "python tables/database_session.py && PYTHONPATH=. alembic upgrade head && uvicorn app:app --reload --host 0.0.0.0 --port 5000"
+    command: /bin/bash -c "python tables/database_session.py && PYTHONPATH=. alembic upgrade head && uvicorn app:app --reload --host 0.0.0.0 --port 4000"
     ports:
-      - 5000:5000
+      - 4000:4000
     volumes:
       - ./src/api:/usr/src
     environment:
@@ -55,3 +55,13 @@ services:
       - POSTGRES_DB=yacs
       - POSTGRES_USER=yacs
       - POSTGRES_PASSWORD=${DB_PASS:-easy_dev_pass}
+
+  yacs_cron:
+    ports:
+      - 4321:4321
+    volumes:
+      - ./src/cron:/usr/src
+    environment:
+      - YACS_API_HOST=http://yacs_api:4000
+      - GECKO_PATH=/usr/local/bin/geckodriver
+      - API_SIGN_KEY=${API_SIGN_KEY:-secretKey}
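Review note: the new `yacs_cron` service is handed three environment variables. A minimal sketch of how the cron-side Python could read them in one place is below. The fallbacks mirror the ones that appear in `no_login.py` in this PR; the `CronSettings` and `load_settings` names are hypothetical and not part of the PR.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CronSettings:
    """Hypothetical container for the yacs_cron environment (not in the PR)."""
    api_host: str
    gecko_path: str
    api_sign_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> CronSettings:
    """Read the three variables that docker-compose passes to yacs_cron."""
    return CronSettings(
        # Strip a trailing slash so that api_host + "/api/..." never doubles it.
        api_host=env.get("YACS_API_HOST", "http://yacs_api:4000").rstrip("/"),
        gecko_path=env.get("GECKO_PATH", "/usr/local/bin/geckodriver"),
        # No fallback here: a missing key stays None so callers can detect it.
        api_sign_key=env.get("API_SIGN_KEY"),
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real environment.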
6 changes: 6 additions & 0 deletions docker-compose.yml
@@ -30,3 +30,9 @@ services:
     container_name: yacs_db
     image: postgres:12-alpine

+  yacs_cron:
+    restart: unless-stopped
+    container_name: yacs_cron
+    build:
+      context: ./src/cron
+      dockerfile: Dockerfile
4 changes: 2 additions & 2 deletions rpi_data/modules/parse_runner.py
@@ -1,8 +1,8 @@
 #!/usr/bin/env python
 from selenium import webdriver
 from selenium.webdriver.firefox.options import Options
-import headless_login as login
-import new_parse as parser
+import cron.headless_login as login
+import cron.new_parse as parser
 import sys
 from datetime import datetime
 import pytz
4 changes: 2 additions & 2 deletions src/api/Dockerfile
@@ -3,7 +3,7 @@
RUN mkdir -p /usr/src
WORKDIR /usr/src
COPY ./requirements.txt /usr/src/
RUN apt-get update && apt-get install -y libpq-dev build-essential

[CodeFactor notice] src/api/Dockerfile#L6: Delete the apt-get lists after installing something. (DL3009)
RUN pip install --no-cache-dir -r requirements.txt
COPY . /usr/src/

-CMD [ "sh", "scripts/start.sh" ]
+CMD [ "sh", "scripts/start.sh" ]
23 changes: 23 additions & 0 deletions src/cron/Dockerfile
@@ -0,0 +1,23 @@
+# FROM selenium/standalone-firefox:latest
+FROM python:3.9-slim
+
+
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+        ca-certificates curl firefox-esr \
+    && rm -fr /var/lib/apt/lists/* \
+    && curl -L https://github.com/mozilla/geckodriver/releases/download/v0.30.0/geckodriver-v0.30.0-linux64.tar.gz | tar xz -C /usr/local/bin \
+    && apt-get purge -y ca-certificates curl
+
+RUN apt-get update && apt-get -y install cron vim

[CodeFactor notice] src/cron/Dockerfile#L12: Delete the apt-get lists after installing something. (DL3009)
+COPY crontab /etc/cron.d/crontab
+RUN chmod 0644 /etc/cron.d/crontab
+RUN touch /var/log/cron.log
+
+RUN mkdir -p /usr/src
+WORKDIR /usr/src
+COPY ./requirements.txt /usr/src/
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . /usr/src/
+
+CMD ["cron", "-f"]
File renamed without changes.
File renamed without changes.
@@ -16,7 +16,7 @@
from selenium.webdriver.common.keys import Keys
import goldy_parse as gp

'''

[CodeFactor notice] src/cron/courses_scraper.py#L19: String statement has no effect (pointless-string-statement)
AUGUST 2024 Course Catalog Scraper
Uses the RPI catalog website to search for individual courses, and then scrapes all of that data. Depends a lot on consistent catalog formatting,
so if that changes it will need work.
@@ -26,7 +26,7 @@
by Giancarlo Martinelli (gcm on discord)
'''

'''

[CodeFactor notice] src/cron/courses_scraper.py#L29: String statement has no effect (pointless-string-statement)
Formatting Regex. Replaces unicode characters and removes unnecessary spaces caused by messy scraping (Sorry.)
'''
def un_spaceify(string: str) -> str:
@@ -46,13 +46,13 @@
def re_spaceify(string: str) -> str:
string = re.sub(r".(?=\w)", ". ", string)

'''

[CodeFactor notice] src/cron/courses_scraper.py#L49: String statement has no effect (pointless-string-statement)
Splits a list into a list of n lists. Useful for multiprocessing.
'''
def split(a: list[str], n: int):
parts = []
[parts.append([]) for _ in range(n)]

[CodeFactor notice] src/cron/courses_scraper.py#L54: Expression "[parts.append([]) for _ in range(n)]" is assigned to nothing (expression-not-assigned)
for allocating in range(len(a)):

[CodeFactor notice] src/cron/courses_scraper.py#L55: Consider using enumerate instead of iterating with range and len (consider-using-enumerate)
parts[allocating % n].append(a[allocating])
return parts
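
Review note: the two CodeFactor notices on `split` could both be addressed at once. The sketch below keeps the same round-robin behaviour as the function above but builds the buckets with a plain comprehension and walks the input with `enumerate`. The name `split_round_robin` and the `n < 1` guard are additions for illustration, not part of the PR.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_round_robin(items: Sequence[T], n: int) -> List[List[T]]:
    """Deal items into n lists in round-robin order (hypothetical name)."""
    if n < 1:
        # Assumption: the original does not guard against n == 0.
        raise ValueError("n must be at least 1")
    # Build n independent empty buckets; the comprehension result is used.
    parts: List[List[T]] = [[] for _ in range(n)]
    for index, item in enumerate(items):
        parts[index % n].append(item)
    return parts
```

With five items and two buckets, the first bucket receives the items at even positions and the second the items at odd positions, which is what the `allocating % n` indexing in the original does.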

@@ -80,18 +80,18 @@
return [ele.rsplit("=", 1)[1], ele.rsplit("=", 2)[1].rsplit("&", 1)[0]]


'''

[CodeFactor notice] src/cron/courses_scraper.py#L83: String statement has no effect (pointless-string-statement)
Large scraping function. Goes to the search page of a single course, checks if the data exists, and then scrapes the course
'''
def scrape_single_course(prefix:str, code:str, nav: str, cat: str) -> dict:
try:

link = "https://catalog.rpi.edu/content.php?filter%5B27%5D={}&filter%5B29%5D={}&filter%5Bkeyword%5D=&filter%5B32%5D=1&filter%5Bcpage%5D=1&cur_cat_oid={}&expand=&navoid={}&search_database=Filter&filter%5Bexact_match%5D=1#acalog_template_course_filter".format(prefix, code, cat, nav)
r1 = requests.get(link)
content1 = r1.content
soup1 = bs(content1, "html.parser")
check = soup1.find("td", {"class": "block_content"})
'''

[CodeFactor notice] src/cron/courses_scraper.py#L94: String statement has no effect (pointless-string-statement)
Testing to see if the course exists. We need the webdriver waits so that selenium only does things when the necessary elements exist.
If they don't load in time we probably don't have a valid course.
'''
@@ -99,7 +99,12 @@
                 return dict()
             if "No courses found" in check.get_text(strip=True) or "" == check.get_text(strip=True):
                 return dict()
-            nopop = check.find("a", {"aria-expanded": "false"}).get("href") # gets the link to the nopopup page
+
+            element = check.find("a", {"aria-expanded": "false"})

[CodeFactor notice] src/cron/courses_scraper.py#L103: String statement has no effect (pointless-string-statement)
+            nopop = element.get("href") if element else None
+            if nopop is None:
+                return dict()
+            # nopop = check.find("a", {"aria-expanded": "false"}).get("href") # gets the link to the nopopup page
'''
Beautiful soup for the nopopup page
'''
@@ -133,7 +138,7 @@
description_html = parts[1]
description_html = description_html.split("<br/>", 3)[-1]
desc_soup = bs(description_html, "html.parser") # put back into beautiful soup to remove left over tags
description = un_spaceify(desc_soup.get_text())

[CodeFactor notice] src/cron/courses_scraper.py#L141: String statement has no effect (pointless-string-statement)
rest = s_string.removeprefix(description_html) # get rid of all of the description stuff to scrape remaining important labels
r_soup = bs(rest, "html.parser")
rest_text = r_soup.get_text() # get all of the text from the rest
@@ -149,9 +154,9 @@
if de in l: # check if our delimiter actually exists in the thing we just split
splitted[1] = de + splitted[1] # adds the delimiter back
t.append(splitted) # stores our splitted (or not splitted) parts for later

[CodeFactor notice] src/cron/courses_scraper.py#L157: Consider using enumerate instead of iterating with range and len (consider-using-enumerate)
built_list = list(chain.from_iterable(t)) # black magic which collapses our multidimensional list into a single dimension list
if len(built_list) != 0: # I honestly forgot why I added this. There's probably something useless in the first position of our built list.

[CodeFactor notice] src/cron/courses_scraper.py#L159: String statement has no effect (pointless-string-statement)
built_list.pop(0)

for i in range(len(built_list)):
@@ -187,7 +192,7 @@
looking = looking.split("Corequisites")[1]
else:
looking = ""
for course in courses_mentioned:

[CodeFactor notice] src/cron/courses_scraper.py#L195: String statement has no effect (pointless-string-statement)
if course in looking:
coreq_list.append(course)
return coreq_list
@@ -209,7 +214,7 @@
pre_looking = looking
for course in courses_mentioned:
if course in pre_looking:
prereq_list.append(course)

[CodeFactor notice] src/cron/courses_scraper.py#L217: String statement has no effect (pointless-string-statement)
if "Prerequisite" not in looking and prereq_list != []:
looking = "Prerequisites/Corequisites: " + looking.strip()
return [prereq_list, looking.strip().replace("\n", "")]
@@ -228,7 +233,7 @@
looking_co = i
for course in courses_mentioned:
if course in looking_cross:
crosslist_list.add(course)

[CodeFactor notice] src/cron/courses_scraper.py#L236: String statement has no effect (pointless-string-statement)
if course in looking_co:
crosslist_list.add(course)
return list(crosslist_list)
@@ -257,7 +262,7 @@
if "spring" in result["text"].lower():
result["spring"] = True
if "summer" in result["text"].lower():
result["summer"] = True

[CodeFactor notice] src/cron/courses_scraper.py#L265: String statement has no effect (pointless-string-statement)
if "availability of instructor" in result["text"].lower() or "upon availability" in result["text"].lower():
result["uia"] = True
return result
@@ -271,19 +276,19 @@
jsons_path = os.path.join(parent_path, "frontend", "src", "data", "json")
folder_title = "{}-{}".format(year - 1, year)
json_checking_path = os.path.join(jsons_path, folder_title, "pathways.json")
to_check = list()

[CodeFactor notice] src/cron/courses_scraper.py#L279: Expression "[to_check.append(i) for i in j[pathway]['Remaining'].values()]" is assigned to nothing (expression-not-assigned)
with open(json_checking_path, 'r') as f:
j = dict(json.load(f))

[CodeFactor notice] src/cron/courses_scraper.py#L281: Expression "[to_check.append(i) for i in j[pathway]['Required'].values()]" is assigned to nothing (expression-not-assigned)
for pathway in j.keys():
if "Remaining" in j[pathway].keys():

[CodeFactor notice] src/cron/courses_scraper.py#L283: Expression "[to_check.append(i) for i in j[pathway]['One Of0'].values()]" is assigned to nothing (expression-not-assigned)
[to_check.append(i) for i in j[pathway]["Remaining"].values()]
if "Required" in j[pathway].keys():

[CodeFactor notice] src/cron/courses_scraper.py#L285: Expression "[to_check.append(i) for i in j[pathway]['One Of1'].values()]" is assigned to nothing (expression-not-assigned)
[to_check.append(i) for i in j[pathway]["Required"].values()]
if "One Of0" in j[pathway].keys():

[CodeFactor notice] src/cron/courses_scraper.py#L287: Expression "[to_check.append(i) for i in j[pathway]['One Of2'].values()]" is assigned to nothing (expression-not-assigned)
[to_check.append(i) for i in j[pathway]["One Of0"].values()]
if "One Of1" in j[pathway].keys():
[to_check.append(i) for i in j[pathway]["One Of1"].values()]
if "One Of2" in j[pathway].keys():

[CodeFactor notice] src/cron/courses_scraper.py#L291: String statement has no effect (pointless-string-statement)
[to_check.append(i) for i in j[pathway]["One Of2"].values()]
to_check = list(set(to_check))
return to_check
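
Review note: the five `expression-not-assigned` notices above all stem from list comprehensions used only for their `append` side effect. One possible shape, sketched below, loops over the section keys and collects values into a set directly. It assumes, as the later `course.split(" ")` suggests, that the values are course-code strings. The names `SECTION_KEYS` and `collect_pathway_courses` are hypothetical, and the sorted return order is a change from the original `list(set(...))`, chosen only to make the output deterministic.

```python
from typing import Dict, List, Mapping

# Section names taken from the checks in the function above.
SECTION_KEYS = ("Remaining", "Required", "One Of0", "One Of1", "One Of2")


def collect_pathway_courses(
    pathways: Mapping[str, Mapping[str, Dict[str, str]]]
) -> List[str]:
    """Gather the unique course values from every known pathway section."""
    seen = set()
    for sections in pathways.values():
        for key in SECTION_KEYS:
            # A missing section contributes nothing.
            seen.update(sections.get(key, {}).values())
    # Sorting is an assumption for reproducibility; the original order is arbitrary.
    return sorted(seen)
```

Sections that are absent from a pathway are simply skipped, which matches the `if ... in j[pathway].keys()` checks in the original.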
@@ -297,7 +302,7 @@
to_check = check_to_scrape(year)
dir_path = os.path.dirname(os.path.realpath(__file__))
pdf_path = os.path.join(dir_path, 'pdfs', pdf_name)
cis = ci.parse_pdf(pdf_path)

[CodeFactor notice] src/cron/courses_scraper.py#L305: Expression "[to_check.append(prefix + ' ' + code.replace('X', str(i))) for i in range(0, 10)]" is assigned to nothing (expression-not-assigned)
all_courses = dict()
for course in to_check:
prefix, code = course.split(" ")
@@ -312,7 +317,7 @@
all_courses[course_data["name"]] = course_data

out = json.dumps(all_courses, indent= 4)
print(out)

[CodeFactor notice] src/cron/courses_scraper.py#L320: String statement has no effect (pointless-string-statement)
with open(json_path, 'w') as f:
f.write(out)
driver.quit()
@@ -324,7 +329,7 @@
to_check = check_to_scrape(year)
dir_path = os.path.dirname(os.path.realpath(__file__))
pdf_path = os.path.join(dir_path, 'pdfs', pdf_name)
cis = ci.parse_pdf(pdf_path) # uses the pdf scraper to find all communication intensive courses

[CodeFactor notice] src/cron/courses_scraper.py#L332: Expression "[to_check.append(prefix + ' ' + code.replace('X', str(i))) for i in range(0, 10)]" is assigned to nothing (expression-not-assigned)
all_courses = dict()
for course in to_check:
prefix, code = course.split(" ")
@@ -338,7 +343,7 @@

for res in results:
all_courses.update(res) # this should combine all of our multiprocess results together

[CodeFactor notice] src/cron/courses_scraper.py#L346: String statement has no effect (pointless-string-statement)
out = json.dumps(all_courses, indent= 4)
with open(json_path, 'w') as f: # dump to json file
f.write(out)
@@ -352,7 +357,7 @@
subject, code = course.split(" ")[0], course.split(" ")[1]
course_data = scrape_single_course(subject, code, nav, cat)
if len(course_data.keys()) == 0: # removes blank courses
result[subject + "-" + code] = {"description": "", "corequisites": [], "rawprecoreq" : "", "prerequisites": [], "cross listed" : [], "offered": {"text" : ""}}

[CodeFactor notice] src/cron/courses_scraper.py#L360: String statement has no effect (pointless-string-statement)
continue
result[course_data["subj"] + "-" + course_data["ID"]] = course_data # builds dictionary
return result
1 change: 1 addition & 0 deletions src/cron/crontab
@@ -0,0 +1 @@
+00 * * * * root /usr/local/bin/python3 /usr/src/no_login.py >> /var/log/cron.log 2>&1
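
Review note: files under `/etc/cron.d` carry a user field between the five schedule fields and the command, so this entry runs `no_login.py` as `root` at minute 0 of every hour. The sketch below only splits such a line into its parts to make that layout explicit. `CronEntry` and `parse_cron_d_line` are hypothetical helpers, not part of the PR, and the parser does not validate the schedule fields.

```python
from typing import NamedTuple


class CronEntry(NamedTuple):
    """Fields of one /etc/cron.d line (hypothetical helper type)."""
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    user: str
    command: str


def parse_cron_d_line(line: str) -> CronEntry:
    """Split a cron.d line into five schedule fields, a user, and the command."""
    # maxsplit=6 keeps the command, including its spaces and redirects, intact.
    fields = line.strip().split(None, 6)
    if len(fields) != 7:
        raise ValueError("expected 5 schedule fields, a user, and a command")
    return CronEntry(*fields)


CRON_LINE = "00 * * * * root /usr/local/bin/python3 /usr/src/no_login.py >> /var/log/cron.log 2>&1"
entry = parse_cron_d_line(CRON_LINE)
```

Here `entry.minute` is `"00"` and `entry.hour` is `"*"`, which is the hourly schedule described above.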
File renamed without changes.
File renamed without changes.
File renamed without changes.
34 changes: 30 additions & 4 deletions rpi_data/modules/no_login.py → src/cron/no_login.py
@@ -13,7 +13,7 @@
import regex as re
import os

'''

[CodeFactor notice] src/cron/no_login.py#L16: String statement has no effect (pointless-string-statement)
Finds all of the course codes for a given term and subject.
'''
def find_codes(term, subj):
@@ -30,26 +30,26 @@
print(len(elements))
pruned_elements = []
codes = []
for all in elements:

[CodeFactor notice] src/cron/no_login.py#L33: Redefining built-in 'all' (redefined-builtin)
element = all.find("a").text
pruned_elements.append(element)
codes.append(element[:9])

return codes

'''

[CodeFactor notice] src/cron/no_login.py#L40: String statement has no effect (pointless-string-statement)
Generates SIS links for a list of codes
'''
def generate_links(term, codes):
links = []
for all in codes:

[CodeFactor notice] src/cron/no_login.py#L45: Redefining built-in 'all' (redefined-builtin)
subj = all[:4]
code = all[5:]
single_course = "https://sis.rpi.edu/rss/bwckctlg.p_disp_listcrse?term_in={}&subj_in={}&crse_in={}&schd_in=L".format(term, subj, code)
links.append(single_course)
return links
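
Review note: `generate_links` slices each code at fixed offsets and uses `all` as the loop variable, which CodeFactor flags above. A sketch of an alternative is below. It assumes codes look like `"CHME 4980"` (subject, one space, number), which the `[:4]` and `[5:]` slices imply but the diff does not state outright. `build_sis_link` and `SIS_LISTING_URL` are hypothetical names; the query parameters are the ones in the PR's format string.

```python
from urllib.parse import urlencode

# Base URL copied from the format string in generate_links.
SIS_LISTING_URL = "https://sis.rpi.edu/rss/bwckctlg.p_disp_listcrse"


def build_sis_link(term: str, course: str) -> str:
    """Build the SIS listing link for one course code such as 'CHME 4980'."""
    # Assumption: subject and number are separated by a single space.
    subj, code = course.split(" ", 1)
    query = urlencode(
        {"term_in": term, "subj_in": subj, "crse_in": code, "schd_in": "L"}
    )
    return f"{SIS_LISTING_URL}?{query}"


def build_sis_links(term: str, courses: list) -> list:
    """Apply build_sis_link to every code without shadowing a built-in."""
    return [build_sis_link(term, course) for course in courses]
```

Using `urlencode` also escapes any unexpected characters in the subject or number, which plain `str.format` would pass through unchanged.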

'''

[CodeFactor notice] src/cron/no_login.py#L52: String statement has no effect (pointless-string-statement)
Scrapes all of the course information for a list of links.
'''
def scrape_all(links, term, major) -> list[Course]:
@@ -74,7 +74,7 @@
#info[15] - crosslist capacity, info[16] - crosslist enrolled, info[17] - crosslist seats left,
#info[18] are the profs, info[19] are days of the sem that the course spans, and info[20] is location
#Remove index[4] because most classes are on campus, with exceptions for some grad and doctoral courses.
'''

[CodeFactor notice] src/cron/no_login.py#L77: String statement has no effect (pointless-string-statement)
Main link scrape, which splits the page into individual courses and then scrapes each.
'''
def link_scrape(term, link, major) -> list[Course]:
@@ -104,7 +104,7 @@
if (len(titles) != len(bodies)):
raise RuntimeError("Titles do not equal bodies: "+ link)
courses = []
for i in range(len(titles)):

[CodeFactor notice] src/cron/no_login.py#L107: Consider using enumerate instead of iterating with range and len (consider-using-enumerate)
title = titles[i].text
split_title = title.rsplit(" - ", 3)
body_info = body_scrape(bodies[i])
@@ -121,10 +121,10 @@
course.append(split_title[0]) # NAME

formatted = format_and_order(body_info)
[courses.append(i) for i in formatted]

[CodeFactor notice] src/cron/no_login.py#L124: Expression "[courses.append(i) for i in formatted]" is assigned to nothing (expression-not-assigned)
return courses

'''

[CodeFactor notice] src/cron/no_login.py#L127: String statement has no effect (pointless-string-statement)
Scrapes the course occupancy information for a specific course from SIS.
'''
def get_slots(term, CRN):
@@ -149,7 +149,7 @@
scraped_table.pop(0)
return scraped_table[0]

'''

[CodeFactor notice] src/cron/no_login.py#L152: String statement has no effect (pointless-string-statement)
Scrapes info from main page for a single course.
'''
def body_scrape(body) -> list[list[str]]:
@@ -159,7 +159,7 @@
string_body = string_body.replace("<br>", "")
string_body = string_body.replace("<br/>", "")
split_body = string_body.split("\n")
credits = ""

[CodeFactor notice] src/cron/no_login.py#L162: Redefining built-in 'credits' (redefined-builtin)
for part in split_body:
if "Credits" in part:
part = part.replace("Credits", "")
@@ -180,7 +180,7 @@
course.append(credits)
return courses

'''

[CodeFactor notice] src/cron/no_login.py#L183: String statement has no effect (pointless-string-statement)
Scrapes a table element into a 2D string list.
'''
def table_scrape(table:bs) -> list[list[str]]:
@@ -195,7 +195,7 @@
scraped_table.append(stripped_row)
return scraped_table

'''

[CodeFactor notice] src/cron/no_login.py#L198: String statement has no effect (pointless-string-statement)
turns a term number into a human readable term
'''
def number_to_term(term) -> str:
@@ -220,7 +220,7 @@

#[crn, major, code, section, credits, name, days, stime, etime, max, curr, rem, profs, sdate, enddate, loc]

'''

[CodeFactor notice] src/cron/no_login.py#L223: String statement has no effect (pointless-string-statement)
Formats and orders the courses into the desired order.
'''
def format_and_order(courses:list[list[str]]) -> list[list[str]]:
@@ -245,7 +245,7 @@
course[6] = course[6].replace("(P)", "")
temp = course[6].split(" ")
f_temp = []
for x in range(len(temp)):

[CodeFactor notice] src/cron/no_login.py#L248: Consider using enumerate instead of iterating with range and len (consider-using-enumerate)
temp[x] = temp[x].strip()
if (temp[x] == ""):
continue
@@ -285,13 +285,14 @@
edate = "{0:%Y}-{0:%m}-{0:%d}".format(dt_end)
return sdate, edate

'''

[CodeFactor notice] src/cron/no_login.py#L288: String statement has no effect (pointless-string-statement)
Parent function that scrapes all courses for a given term and writes them to a CSV file.
'''
def no_login_scrape(term: str, num_browsers: int):
     options = Options()
+    services = webdriver.FirefoxService( executable_path=os.environ.get('GECKO_PATH', '/usr/local/bin/geckodriver') )
     options.add_argument("--headless")
-    driver = webdriver.Firefox(options=options) # starter code which uses selenium
+    driver = webdriver.Firefox(options=options, service=services) # starter code which uses selenium
     subjects = old.findAllSubjectCodes(driver) # finds all subject codes
     nav, cat = cs.navigate_to_course(driver, term) # finds the navigation and catalog ids, which are each used to build a course search query.
     driver.quit()
@@ -304,7 +305,7 @@
parts = list(cs.split(links, num_browsers))
temp_courses = pool.starmap(scrape_all, [(part, term, subject) for part in parts])
temp_courses = [i for sublist in temp_courses for i in sublist] # flattens the list
[i.addSchool(subjects[subject]) for i in temp_courses] # adds the school to each course

[CodeFactor notice] src/cron/no_login.py#L308: Expression "[i.addSchool(subjects[subject]) for i in temp_courses]" is assigned to nothing (expression-not-assigned)
temp_codes = list(set([i.major + " " + i.code for i in temp_courses]))
print(len(temp_codes))
extra_info = pre_req_scrape(temp_codes, nav, cat, num_browsers) # scrapes the extra info off of the catalog website
@@ -320,8 +321,8 @@
course.addReqs(extra_info[course.short]["prerequisites"], extra_info[course.short]["corequisites"], extra_info[course.short]["rawprecoreq"], extra_info[course.short]["description"])
course.frequency = extra_info[course.short]["offered"]["text"]

[courses.append(i) for i in temp_courses]

[CodeFactor notice] src/cron/no_login.py#L324: Expression "[courses.append(i) for i in temp_courses]" is assigned to nothing (expression-not-assigned)
'''

[CodeFactor notice] src/cron/no_login.py#L325: String statement has no effect (pointless-string-statement)
Check Professor Goldschmidt's information
'''
textTerm = number_to_term(term).lower().replace(" ", "")
@@ -335,8 +336,9 @@
     parent = os.path.abspath(os.path.join(dir_path, os.pardir))
     path = os.path.join(parent, number_to_term(term).lower().replace(" ", "-") + ".csv")
     old.writeCSV(courses, path)
+    return path

'''

[CodeFactor notice] src/cron/no_login.py#L341: String statement has no effect (pointless-string-statement)
Scrapes the prerequisites for multiple courses at once.
'''
def pre_req_scrape(codes: list[str], nav:str, cat:str, num_browsers: int):
@@ -348,7 +350,7 @@
all_courses.update(res)
return all_courses

'''

[CodeFactor notice] src/cron/no_login.py#L353: String statement has no effect (pointless-string-statement)
Edits a course using the information from Professor Goldschmidt's website.
'''
def add_goldy_info(course: Course, goldy_info: dict):
Expand All @@ -368,7 +370,31 @@
course.raw = "Prerequisites: " + goldy_info[checking]

 if __name__ == "__main__":
-    no_login_scrape("202409", 15)
-    #driver = webdriver.Firefox()
+    print("Our test works at", datetime.now())
+
+    # options = Options()
+    # services = webdriver.FirefoxService( executable_path=os.environ.get('GECKO_PATH', '/usr/local/bin/geckodriver') )
+    # options.add_argument("--headless")
+    # driver = webdriver.Firefox(options=options, service=services)
+
+    # print(cs.scrape_single_course(driver, "MANE", "6990", 202509))
+
+    file = no_login_scrape("202509", 15)
+    fileName = os.path.basename(os.path.normpath(file))
+    url = os.environ.get('YACS_API_HOST', 'http://yacs_api:4000')
+    payload = {'isPubliclyVisible': 'on'}
+
+    files=[
+        ('file',(fileName,open(file,'rb'),'text/csv'))
+    ]
+
+    headers = {
+        'X-API-KEY': os.environ.get('API_SIGN_KEY', None)
+    }
+
+    resp = requests.post(url + '/api/bulkCourseUpload', headers=headers, data=payload, files=files)
+    print(resp.text)
+
+    # driver = webdriver.Firefox()
     #print(cs.scrape_single_course(driver, "CSCI", "1100", 202409))
-    #print(link_scrape("202409", "https://sis.rpi.edu/rss/bwckctlg.p_disp_listcrse?term_in=202409&subj_in=CHME&crse_in=4980&schd_in=L", "CHME"))
+    #print(link_scrape("202409", "https://sis.rpi.edu/rss/bwckctlg.p_disp_listcrse?term_in=202409&subj_in=CHME&crse_in=4980&schd_in=L", "CHME"))
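
Review note: the upload step opens the CSV without closing it and sends the request even when `API_SIGN_KEY` is unset. Cron usually runs jobs with a reduced environment, so it may be worth failing fast in that case. A minimal sketch under those assumptions is below. The endpoint, header name, and form field are the ones in this PR; `build_upload_request`, `upload_csv`, and the 60-second timeout are hypothetical choices, not the PR's code.

```python
import os
from typing import Dict, Mapping, Tuple


def build_upload_request(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, payload) for the bulk upload, or raise if no key."""
    api_key = env.get("API_SIGN_KEY")
    if not api_key:
        # Fail before scraping results are sent with an empty credential.
        raise RuntimeError("API_SIGN_KEY is not set; refusing to upload")
    base = env.get("YACS_API_HOST", "http://yacs_api:4000").rstrip("/")
    url = base + "/api/bulkCourseUpload"
    headers = {"X-API-KEY": api_key}
    payload = {"isPubliclyVisible": "on"}
    return url, headers, payload


def upload_csv(csv_path: str, timeout: float = 60.0) -> str:
    """Post the scraped CSV and return the response body."""
    import requests  # third-party; imported here so the helper above needs only the stdlib

    url, headers, payload = build_upload_request()
    # The with-block closes the file handle once the request has been sent.
    with open(csv_path, "rb") as handle:
        files = [("file", (os.path.basename(csv_path), handle, "text/csv"))]
        resp = requests.post(
            url, headers=headers, data=payload, files=files, timeout=timeout
        )
    resp.raise_for_status()  # surface HTTP errors in /var/log/cron.log
    return resp.text
```

Keeping the request construction separate from the network call lets the URL and header logic be checked without a running `yacs_api` container.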
7 changes: 7 additions & 0 deletions src/cron/requirements.txt
@@ -0,0 +1,7 @@
+selenium==4.28.1
+beautifulsoup4==4.12.3
+bs4==0.0.2
+pypdf==5.1.0
+pandas==2.2.3
+requests==2.32.3
+regex==2024.11.6