SoftwareUnderstanding
diff --git a/README.md b/README.md
Lines changed: 198 additions & 48 deletions
diff --git a/setup.cfg b/setup.cfg
Lines changed: 4 additions & 4 deletions
diff --git a/src/SSKG/__init__.py b/src/RSEF/__init__.py
src/SSKG/__init__.py renamed to src/RSEF/__init__.py
diff --git a/src/SSKG/__main__.py b/src/RSEF/__main__.py
src/SSKG/__main__.py renamed to src/RSEF/__main__.py
Lines changed: 20 additions & 14 deletions
diff --git a/src/SSKG/download_pdf/__init__.py b/src/RSEF/download_pdf/__init__.py
src/SSKG/download_pdf/__init__.py renamed to src/RSEF/download_pdf/__init__.py
@@ -1,77 +1,227 @@
 
 
-# Research Software Extraction Framework (RSEF)  
-README IN PROGRESS  
-## Introduction  
-  
-This tool verifies the link between a scientific paper and a software repository. It accomplishes this by locating the URL of the software repository within the scientific paper. It then extracts the repository's metadata to find any URLs associated with scientific papers and checks if they lead back to the original paper. If a bidirectional link is established, it marks it as "bidirectional".  
 
-There is also a "unidirectional" metric, which finds a repository url and see's within the repository if the paper is named.
-  
-## Dependencies  
-- Python 3.10
-- Java 8 or above (please see [Tika requirements](https://tika.apache.org))  
-  
-## Installation  
-  
-Install the required dependencies by running:  
-```  
-pip install -r requirements.txt  
-```  
-Highly recommended steps:  
-  
-```text  
-somef configure  
-```  
-You will be asked to provide:  
-  
-* A GitHub authentication token [**optional, leave blank if not used**], which SOMEF uses to retrieve metadata from GitHub. If you don't include an authentication token, you can still use SOMEF. However, you may be limited to a series of requests per hour. For more information, see [https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line)  
+# Research Software Extraction Framework (RSEF)
+
 
-* The path to the trained classifiers (pickle files). If you have your own classifiers, you can provide them here. Otherwise, you can leave it blank  
+## Introduction
 
-### Docker
-TODO
+This tool verifies the link between a scientific paper and a software repository. It accomplishes this by locating the URL of the software repository within the scientific paper. It then extracts the repository's metadata to find any URLs associated with scientific papers and checks if they lead back to the original paper. If a bidirectional link is established, it marks it as "bidirectional".
 
-## Usage
 
-  To see an example of usage please look at [example.ipynb](./example/example.ipynb)
+
+There is also a "unidirectional" metric, which finds a repository URL in the paper and checks whether the paper is named within the repository.
+
+## Dependencies
+
+- Python 3.9
+
+- Java 8 or above (please see [Tika requirements](https://pypi.org/project/tika/))
+
+## Installation
+
+Install the required dependencies by running:
+
+```
+
+pip install -e .
+
+```
+
+Highly recommended steps:
+
+```text
+
+somef configure
+
+```
+
+You will be asked to provide:
+
+* A GitHub authentication token [**optional, leave blank if not used**], which SOMEF uses to retrieve metadata from GitHub. If you don't include an authentication token, you can still use SOMEF. However, you may be limited to a certain number of requests per hour. For more information, see [https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line)
+
+* The path to the trained classifiers (pickle files). If you have your own classifiers, you can provide them here. Otherwise, you can leave it blank
+
 
-### The repository is divided into the following directories:  
+
+## Usage
+
+```text
+
+Usage: rsef [OPTIONS] COMMAND [ARGS]...
+
+RRRRRRRRR     SSSSSSSSS    EEEEEEEEE  FFFFFFFFF  
+RRR    RRR   SSS     SSS   EEE        FFF  
+RRR    RRR   SSSS          EEE        FFF
+RRRRRRRRR     SSSSSSSSS    EEEEEEE    FFFFFFF  
+RRR    RRR          SSSS   EEE        FFF  
+RRR     RRR   SSS    SSS   EEE        FFF  
+RRR      RRR   SSSSSSSS    EEEEEEEEE  FFF  
   
-1. Download_pdf 
+Research Software Extraction Framework (RSEF)\n
+Find and assess Research Software within Research papers.
+
+Usage:
+1. (assess) Assess doi for unidirectionality or bidirectionality
+2. (download) Download PDF (paper) from a doi or list
+3. (process)  Process downloaded pdf to find urls and abstract
+
+Options:
+--version Show the version and exit.
+-h, --help  Show this message and exit.
+
+Commands:
+	assess
+	download
+	process
+ 
+```
+
+### Assess
+
+The assess command lets a user determine whether a given identifier (an arXiv ID or a DOI) is bidirectional or not.
+
+The command accepts a single DOI or arXiv ID, a list of identifiers given as a ```.txt``` file, or a ```processed_metadata.json``` file.
+
+
+```text
+rsef assess -h
+Usage: rsef assess [OPTIONS]
+
+Options:
+
+-i, --input <name> DOI, path to .txt list of DOIs or path to processed_metadata.json [required]
+
+-o, --output <path>  Output csv file  [default: output]
+
+-U, --unidir Unidirectionality
+
+-B, --bidir  Bidirectionality
+
+-h, --help Show this message and exit.
+```
+
+### Download
+
+The download command lets a user download a paper's PDF together with its metadata, given an identifier (an arXiv ID or a DOI). Alongside the PDFs folder there will be a `downloaded_metadata.json` file containing the title, DOI, arXiv ID, and file name/file path of each downloaded paper.
+```
+rsef download -h 
+Usage: rsef download [OPTIONS]
+
+Options:
+
+-i, --input <name> DOI or path to .txt list of DOIs  [required]
+
+-o, --output <path>  Output Directory [default: ./]
+
+-h, --help Show this message and exit.
+```
+
+### Process
+
+The process command takes an identifier or a downloaded paper and processes it to extract the abstract and any GitHub and Zenodo URLs. The results are saved in a JSON file named ```processed_metadata.json```
+```
+rsef process -h
+Usage: rsef process [OPTIONS]
+
+Options:
+
+-i, --input <name>  DOI, path to .txt list of DOIs or path to downloaded_metadata.json [required]
+
+-o, --output <path>  Output Directory [default: ./]
+
+-h, --help Show this message and exit.
+```
+
+
+
+
+### The repository is divided into the following directories:
+
+1. Download_pdf
+
 2. Metadata
+
 3. Extraction
-4. Object_creator  
+
+4. Object_creator
+
 5. Modelling
+
 6. Prediction
-  
-### Download_pdf
-Pertains to all the downloading of pdfs. 
-Downloaded_obj is a representation of downloaded papers which have not been processed yet.
+
+7. Utils
+
 
 ### Metadata
-TODO
-Encompasses petitions to OpenAlex for fetching the paper's metadata.
+
+
+Encompasses all requests to OpenAlex and other APIs for fetching the paper's metadata, as well as general-purpose requests.
+
 MetadataObj contains the metadata from  OpenAlex: doi, arxiv and its title.
 
+### Download_pdf
+
+Pertains to all the downloading of pdfs.
+
+Downloaded_obj is a representation of downloaded papers which have not been processed yet. 
+
+Contains:
+
+	- Title 
+	- DOI
+	- ArXiv
+	- file_path
+	- file_name
+
+These objects are normally saved into a `downloaded_metadata.json`
+
+  
+
 ### Extraction
-TODO
+
+
+
 Tika scripts to open a PDF and extract its URLs are also found within this module.
-PaperObj is created once the downloadedObj's pdf has been processed to locate all its urls. Contains: doi, arxiv, title, file_path, urls.
-Finally, the necessary functions dowloading a repository and extracting its metadata with SOMEF
+
+PaperObj is created once the downloadedObj's pdf has been processed to locate all its urls. 
+Contains: 
+- DOI
+- arXiv
+- Abstract
+- Title
+- File_path
+- File_name
+- URLs
+
+Finally, it contains the functions needed to download a repository and extract its metadata with SOMEF.
+
+  
 
 ### Modelling
-Contains all assessment of bidirectionality and unidirectionality. 
-Mainly receives a paperObj and a repository_metadata json.
+
+Contains all assessment of bi-directionality and uni-directionality.
+
+Receives a paperObj and a repository_metadata json.
+
+  
 
 ### Object Creator
-This is the pipeline broken down into its main parts. Please look at [pipeline.py](./object_creator/pipeline.py) and [example.ipynb](./example/example.ipynb) to view the execution process.
+
+This is the pipeline broken down into its main parts. Please look at [pipeline.py](./object_creator/pipeline.py) to view the execution process.
+
+  
 
 ### Prediction
+
 Assesses the program against its corpus. The corpus can be found in [corpus.csv](./predicition/corpus.csv), and the F1 score obtained for bidirectionality in [corpus_eval_bidir.json](./predicition/corpus_eval_bidir.json); the unidirectional results use the same naming with the `_unidir` suffix.
 
 
+## Tests
+
+Tests can be found in the `./tests` folder
 
-## License  
-  
-This project is licensed under the [MIT License](LICENSE).
+
+## License
+
+This project is licensed under the [MIT License](LICENSE).  
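
Note on the README hunk above: the bidirectional and unidirectional checks described in its Introduction can be illustrated with a minimal Python sketch. This is not RSEF's implementation. The function names, the DOI normalisation rule, and the shape of the inputs (`repo_text`, `repo_paper_urls`) are all assumptions made for illustration only.

```python
import re


def normalize_doi(text: str) -> str:
    """Hypothetical helper: lower-case a DOI and strip a doi.org URL prefix."""
    return re.sub(r"^(https?://)?(dx\.)?doi\.org/", "", text.strip().lower())


def is_unidirectional(paper_title: str, paper_doi: str, repo_text: str) -> bool:
    """Assumed check: the repository (e.g. its README text) names the paper,
    either by title or by DOI."""
    haystack = repo_text.lower()
    title = paper_title.strip().lower()
    doi = normalize_doi(paper_doi)
    # Guard against empty strings, which would match any text.
    return (bool(title) and title in haystack) or (bool(doi) and doi in haystack)


def is_bidirectional(paper_doi: str, repo_paper_urls: list[str]) -> bool:
    """Assumed check: one of the paper URLs found in the repository metadata
    leads back to the original paper's DOI."""
    target = normalize_doi(paper_doi)
    return any(normalize_doi(url) == target for url in repo_paper_urls)
```

For example, `is_bidirectional("10.1000/abc", ["https://doi.org/10.1000/ABC"])` holds because both sides reduce to the same normalised DOI, whereas a repository that only mentions the paper's title in free text would satisfy the unidirectional check alone.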
@@ -1,6 +1,6 @@
 [metadata]
-name = SSKG
-version = attr: SSKG.__version__
+name = RSEF
+version = attr: RSEF.__version__
 author =  Miguel Arroyo Márquez, Daniel Garijo
 author_email = daniel.garijo@upm.es
 description = TODO
@@ -16,7 +16,7 @@ package_dir =
     = src
 packages = find:
 include_package_data = True
-python_requires = >= 3.10.0
+python_requires = >= 3.9.0
 install_requires =
     somef >= 0.9.4
     arxiv   
@@ -34,4 +34,4 @@ where = src
 
 [options.entry_points]
 console_scripts =
-    sskg = SSKG.__main__:cli
+    rsef = RSEF.__main__:cli
@@ -11,13 +11,15 @@
 @click.version_option(__version__)
 def cli():
     """
-    ███████  ███████  ██   ██   ██████  \n
-    ██       ██       ██  ██   ██       \n
-    ███████  ███████  █████    ██   ███ \n
-         ██       ██  ██  ██   ██    ██ \n
-    ███████  ███████  ██   ██   ██████  \n
+    RRRRRRRRR   SSSSSSSSS  EEEEEEEEE   FFFFFFFFF\n
+    RRR   RRR  SSS    SSS  EEE         FFF\n
+    RRR   RRR  SSS         EEE         FFF\n
+    RRRRRRRRR  SSSSSSSSS   EEEEEEE     FFFFFFF\n
+    RRR RRR         SSSS   EEE         FFF\n
+    RRR  RRR  SSS    SSS   EEE         FFF\n
+    RRR   RRR  SSSSSSSSS   EEEEEEEEE   FFF\n
 
-    Scientific Software Knowledge Graphs (SSKG)\n
+    Research Software Extraction Framework (RSEF)\n
     Find and assess Research Software within Research papers.\n
 
     Usage:\n
@@ -50,11 +52,12 @@ def cli():
 #         exit(1)
 
 @cli.command()
-@click.option('--input','-i', required=True, help="DOI or path to .txt list of DOIs", metavar='<name>')
-@click.option('--output','-o', default="output", show_default=True, help="Output csv file", metavar='<path>')
+@click.option('--input', '-i', required=True, help="DOI, path to .txt list of DOIs or path to processed_metadata.json",
+              metavar='<name>')
+@click.option('--output', '-o', default="output", show_default=True, help="Output csv file", metavar='<path>')
 @click.option('--unidir', '-U', is_flag=True, default = False, help="Unidirectionality")
 @click.option('--bidir', '-B', is_flag=True, default = False, help="Bidirectionality")
-def assess(input, output,unidir,bidir):
+def assess(input, output, unidir, bidir):
     from .object_creator.pipeline import dois_txt_to_unidir_json, dois_txt_to_bidir_json, single_doi_pipeline_unidir, \
         single_doi_pipeline_bidir, papers_json_to_unidir_json, papers_json_to_bidir_json
     if unidir:
@@ -84,10 +87,11 @@ def assess(input, output,unidir,bidir):
 
 
 @cli.command()
-@click.option('--input','-i', required=True, help="DOI or path to .txt list of DOIs", metavar='<name>')
-@click.option('--output','-o', default="./", show_default=True, help="Output Directory ", metavar='<path>')
+@click.option('--input', '-i', required=True, help="DOI or path to .txt list of DOIs", metavar='<name>')
+@click.option('--output', '-o', default="./", show_default=True, help="Output Directory ", metavar='<path>')
 def download(input, output):
     from .object_creator.create_downloadedObj import doi_to_downloadedJson, dois_txt_to_downloadedJson
+    from .utils.regex import str_to_doiID
     if input.endswith(".txt") and os.path.exists(input):
         dois_txt_to_downloadedJson(dois_txt=input, output_dir=output)
     else:
@@ -97,17 +101,18 @@ def download(input, output):
             print(e)
         return
 @cli.command()
-@click.option('--input','-i', required=True, help="DOI or path to .txt list of DOIs", metavar='<name>')
+@click.option('--input', '-i', required=True, help="DOI, path to .txt list of DOIs or path to downloaded_metadata.json",
+              metavar='<name>')
 @click.option('--output','-o', default="./", show_default=True, help="Output Directory ", metavar='<path>')
-def process(input,output):
+def process(input, output):
     from .object_creator.downloaded_to_paperObj import dwnlddJson_to_paperJson, dwnldd_obj_to_paper_json
     from .object_creator.create_downloadedObj import pdf_to_downloaded_obj
 
     if os.path.isdir(input):
         _aux_pdfs_to_pp_json(input= input, output= output)
         return
     if input.endswith(".json") and os.path.exists(input):
-        dwnlddJson_to_paperJson(input,output)
+        dwnlddJson_to_paperJson(input, output)
     if input.endswith(".pdf") and os.path.exists(input):
         #TODO
         dwnldd = pdf_to_downloaded_obj(pdf= input, output_dir= output)
@@ -117,6 +122,7 @@ def process(input,output):
         print("Error")
         return
 
+
 def _aux_pdfs_to_pp_json(input, output):
     from .object_creator.create_downloadedObj import pdf_to_downloaded_obj
     from .object_creator.downloaded_to_paperObj import dwnldd_obj_to_paper_dic