Run the sample script `examples/example.py` to see typical usage of the code. See the example on Google Colab: <a href="https://colab.research.google.com/github/souradipp76/ReadMeReady/blob/main/examples/example.ipynb" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>

See detailed API references [here](https://souradipp76.github.io/ReadMeReady/reference/).
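
For orientation, below is a minimal sketch of what a typical usage script looks like. The class and field names are assumptions based on the API reference and may differ from the current release, so treat `examples/example.py` as the authoritative version.

```python
# Hypothetical sketch of typical readme_ready usage; names and fields are
# assumptions -- see examples/example.py and the API reference for the
# authoritative versions.
from readme_ready.index import index
from readme_ready.query import query
from readme_ready.types import (
    AutodocReadmeConfig,
    AutodocRepoConfig,
    AutodocUserConfig,
    LLMModels,
)

model = LLMModels.LLAMA2_7B_CHAT_GPTQ  # pick any supported model

repo_config = AutodocRepoConfig(
    name="my-project",                                    # placeholder values
    root="/path/to/my-project",
    repository_url="https://github.com/user/my-project",
    output="./output/my-project",
    llms=[model],
    # ...plus any remaining fields required by AutodocRepoConfig,
    # as shown in examples/example.py
)
user_config = AutodocUserConfig(llms=[model])
readme_config = AutodocReadmeConfig(
    headings="Description,Requirements,Installation,Usage,Contributing,License"
)

index.index(repo_config)                                  # index the repository
query.generate_readme(repo_config, user_config, readme_config)  # write README
```
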
For finetuning on custom datasets, follow the instructions below.
- Run the notebook file `scripts/data.ipynb` and follow the instructions in the file to generate a custom dataset from open-source repositories.
- Run the notebook file `scripts/fine-tuning-with-llama2-qlora.ipynb` and follow the instructions in the file to finetune custom LLMs (a minimal QLoRA sketch follows this list).
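
As a rough illustration of what the finetuning notebook sets up, here is a minimal QLoRA sketch assuming the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries; the base model and hyperparameters below are placeholders, and the notebook remains the authoritative recipe.

```python
# Minimal QLoRA setup sketch (assumes transformers, peft, bitsandbytes installed);
# the model name and hyperparameters are placeholders, not the notebook's values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-chat-hf"  # placeholder base model

# Load the base model in 4-bit precision (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Attach low-rank adapters so only a small fraction of parameters is trained
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                 # placeholder hyperparameters
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training itself would use transformers.Trainer or trl.SFTTrainer
# on the custom dataset produced by scripts/data.ipynb.
```
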
The results are reported in Table 1 and Table 2 under the "With FT" (with finetuning) columns, where the generated content is compared against each repository's original README file. BLEU scores range from 15 to 30, averaging 20, indicating that the generated text is understandable but requires substantial editing to be acceptable. BERT scores, in contrast, show a high semantic similarity to the original README content, with an average F1 score of ~85%.

### Table 1: BLEU Scores

| Repository | W/O FT | With FT |
|------------|--------|---------|
| allennlp   | 32.09  | 16.38   |
| autojump   | 25.29  | 18.73   |
| numpy-ml   | 16.61  | 19.02   |
| Spleeter   | 18.33  | 19.47   |
| TouchPose  | 17.04  | 8.05    |

### Table 2: BERT Scores

| Repository | P (W/O FT) | R (W/O FT) | F1 (W/O FT) | P (With FT) | R (With FT) | F1 (With FT) |
|------------|------------|------------|-------------|-------------|-------------|--------------|
Run the script `scripts/run_validate.sh` to generate BLEU and BERT scores for 5 sample repositories, comparing each repository's actual README file with the generated one. Note that reproducing the scores requires a GPU with 16 GB or more of memory.

```bash
$ chmod +x scripts/run_validate.sh
$ scripts/run_validate.sh
```
Alternatively, run the notebook `scripts/validate.ipynb` on Google Colab: <a href="https://colab.research.google.com/github/souradipp76/ReadMeReady/blob/main/scripts/validate.ipynb" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>
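
For reference, the comparison these scripts perform boils down to scoring a generated README against the original one. A minimal sketch using the `sacrebleu` and `bert-score` packages (an assumption about tooling, not necessarily what the validation scripts use) looks like this:

```python
# Sketch of scoring a generated README against the original one.
# Assumes the sacrebleu and bert-score packages; file paths are placeholders.
import sacrebleu
from bert_score import score

with open("README_original.md", encoding="utf-8") as f:
    reference = f.read()
with open("README_generated.md", encoding="utf-8") as f:
    candidate = f.read()

# Corpus-level BLEU between the generated text and the original README
bleu = sacrebleu.corpus_bleu([candidate], [[reference]])

# BERTScore precision/recall/F1 based on contextual embeddings
precision, recall, f1 = score([candidate], [reference], lang="en")

print(f"BLEU: {bleu.score:.2f}")
print(
    f"BERTScore P/R/F1: {precision.mean().item():.3f} / "
    f"{recall.mean().item():.3f} / {f1.mean().item():.3f}"
)
```
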

  title={Latent Predictor Networks for Code Generation},
  author={Ling, Wang and Grefenstette, Edward and Hermann, Karl Moritz and Ko{\v{c}}isk{\`y}, Tom{\'a}{\v{s}} and Senior, Andrew and Wang, Fumin and Blunsom, Phil},
  journal={arXiv preprint arXiv:1603.06744},
  year={2016},
  doi={10.48550/arXiv.1603.06744}
}

@article{yin2017syntactic,
  title={A Syntactic Neural Model for General-purpose Code Generation},

@article{radford2019language,
  title={Language {M}odels are {U}nsupervised {M}ultitask {L}earners},
  author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others},
  journal={OpenAI blog},
  volume={1},
}

@article{brown2020language,
  title={Language {M}odels are {F}ew-{S}hot {L}earners},
  author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others},
  journal={Advances in neural information processing systems},
  volume={33},
}

@article{vaswani2017attention,
  title={Attention is {A}ll {Y}ou {N}eed},
  author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
  journal={Advances in neural information processing systems},
  volume={30},
}

@article{lester2021power,
  title={The Power of Scale for Parameter-Efficient Prompt Tuning},
  author={Lester, Brian and Al-Rfou, Rami and Constant, Noah},