
Commit 03abaa6

Merge branch 'main' of https://github.com/souradipp76/ReadMeReady into app_dev
2 parents 3270106 + 987947d

8 files changed: +624 −57 lines changed

.github/workflows/rename_project.yml

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ jobs:
           echo "Renaming the project with -a(author) ${{ env.REPOSITORY_OWNER }} -n(name) ${{ env.REPOSITORY_NAME }} -u(urlname) ${{ env.REPOSITORY_URLNAME }}"
           .github/rename_project.sh -a ${{ env.REPOSITORY_OWNER }} -n ${{ env.REPOSITORY_NAME }} -u ${{ env.REPOSITORY_URLNAME }} -d "Awesome ${{ env.REPOSITORY_NAME }} created by ${{ env.REPOSITORY_OWNER }}"
 
-      - uses: stefanzweifel/git-auto-commit-action@v5
+      - uses: stefanzweifel/git-auto-commit-action@v6
         with:
           commit_message: "✅ Ready to clone and code."
           # commit_options: '--amend --no-edit'

README.md

Lines changed: 36 additions & 1 deletion
@@ -2,6 +2,7 @@
 
 [![codecov](https://codecov.io/gh/souradipp76/ReadMeReady/branch/main/graph/badge.svg?token=49620380-3fe7-4eb1-8dbb-3457febc6f78)](https://codecov.io/gh/souradipp76/ReadMeReady)
 [![CI](https://github.com/souradipp76/ReadMeReady/actions/workflows/main.yml/badge.svg)](https://github.com/souradipp76/ReadMeReady/actions/workflows/main.yml)
+[![DOI](https://joss.theoj.org/papers/10.21105/joss.07489/status.svg)](https://doi.org/10.21105/joss.07489)
 
 Auto-generate code documentation in Markdown format in seconds.
 
@@ -124,7 +125,7 @@ index.index(repo_config)
 query.generate_readme(repo_config, user_config, readme_config)
 ```
 
-Run the sample script in the `examples/example.py` to see a typical code usage. See example on Google Colab: <a src="https://colab.research.google.com/assets/colab-badge.svg" href="https://colab.research.google.com/github.com/souradipp76/ReadMeReady/blob/main/examples/example.ipynb" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>
+Run the sample script in the `examples/example.py` to see a typical code usage. See example on Google Colab: <a src="https://colab.research.google.com/assets/colab-badge.svg" href="https://colab.research.google.com/github/souradipp76/ReadMeReady/blob/main/examples/example.ipynb" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>
 
 See detailed API references [here](https://souradipp76.github.io/ReadMeReady/reference/).
 
@@ -135,6 +136,40 @@ For finetuning on custom datasets, follow the instructions below.
 - Run the notebook file `scripts/data.ipynb` and follow the instructions in the file to generate custom dataset from open-source repositories.
 - Run the notebook file `scripts/fine-tuning-with-llama2-qlora.ipynb` and follow the instructions in the file to finetune custom LLMs.
 
+The results are reported in Table 1 and Table 2, under the "With FT" or "With Finetuning" columns where the contents are compared with each repository's original README file. It is observed that BLEU scores range from 15 to 30, averaging 20, indicating that the generated text is understandable but requires substantial editing to be acceptable. Conversely, BERT scores reveal a high semantic similarity to the original README content, with an average F1 score of ~85%.
+
+### Table 1: BLEU Scores
+
+| Repository | W/O FT | With FT |
+|------------|--------|---------|
+| allennlp | 32.09 | 16.38 |
+| autojump | 25.29 | 18.73 |
+| numpy-ml | 16.61 | 19.02 |
+| Spleeter | 18.33 | 19.47 |
+| TouchPose | 17.04 | 8.05 |
+
+### Table 2: BERT Scores
+
+| Repository | P (W/O FT) | R (W/O FT) | F1 (W/O FT) | P (With FT) | R (With FT) | F1 (With FT) |
+|------------|------------|------------|-------------|-------------|-------------|--------------|
+| allennlp | 0.904 | 0.8861 | 0.895 | 0.862 | 0.869 | 0.865 |
+| autojump | 0.907 | 0.86 | 0.883 | 0.846 | 0.87 | 0.858 |
+| numpy-ml | 0.89 | 0.881 | 0.885 | 0.854 | 0.846 | 0.85 |
+| Spleeter | 0.86 | 0.845 | 0.852 | 0.865 | 0.866 | 0.865 |
+| TouchPose | 0.87 | 0.841 | 0.856 | 0.831 | 0.809 | 0.82 |
+
+
+### Validation
+
+Run the script `scripts/run_validate.sh` to generate BLEU and BERT scores for 5 sample repositories comparing the actual README file with the generated ones. Note that to reproduce the scores, a GPU with 16GB or more is required.
+
+```bash
+$ chmod +x scripts/run_validate.sh
+$ scripts/run_validate.sh
+```
+
+Alternatively, run the notebook `scripts/validate.ipynb` on Google Colab: <a src="https://colab.research.google.com/assets/colab-badge.svg" href="https://colab.research.google.com/github/souradipp76/ReadMeReady/blob/main/scripts/validate.ipynb" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>
+
 ### Supported models
 - TINYLLAMA_1p1B_CHAT_GGUF (`TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF`)
 - GOOGLE_GEMMA_2B_INSTRUCT_GGUF (`bartowski/gemma-2-2b-it-GGUF`)
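As an aside on the finetuning bullets in the hunk above: `scripts/fine-tuning-with-llama2-qlora.ipynb` performs QLoRA finetuning. The sketch below is a generic illustration of that technique with Hugging Face `transformers` and `peft`, not the notebook's actual code; the base checkpoint, target modules, and hyperparameters are placeholder assumptions.

```python
# Generic QLoRA sketch (illustrative only; see scripts/fine-tuning-with-llama2-qlora.ipynb
# for the repository's actual procedure). Model name and hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters trained on top of the quantized base (the "LoRA" part).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Training would then proceed with transformers.Trainer (or a similar trainer)
# on the dataset produced by scripts/data.ipynb.
```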

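Similarly, for the validation numbers added above (BLEU on a 0-100 scale, BERTScore P/R/F1): the repository's pipeline is `scripts/run_validate.sh` / `scripts/validate.ipynb`, but a minimal sketch of how such scores can be computed with the standard `sacrebleu` and `bert-score` packages is given below. The file paths here are assumptions, and this is not the repository's validation script.

```python
# Minimal metric sketch, not the repository's validation pipeline.
# Assumes `pip install sacrebleu bert-score` and placeholder file paths.
import sacrebleu
from bert_score import score as bert_score

with open("README.md", encoding="utf-8") as f:          # original README (reference)
    reference = f.read()
with open("output/README.md", encoding="utf-8") as f:   # generated README (assumed path)
    candidate = f.read()

# Corpus-level BLEU, reported on the 0-100 scale used in Table 1.
bleu = sacrebleu.corpus_bleu([candidate], [[reference]])
print(f"BLEU: {bleu.score:.2f}")

# BERTScore precision/recall/F1, as in Table 2.
P, R, F1 = bert_score([candidate], [reference], lang="en")
print(f"BERTScore  P={P.mean().item():.3f}  R={R.mean().item():.3f}  F1={F1.mean().item():.3f}")
```
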
paper/paper.bib

Lines changed: 23 additions & 23 deletions
@@ -13,7 +13,7 @@ @article{chomsky1956three
 }
 
 @article{miller2003cognitive,
-  title={The cognitive revolution: a historical perspective},
+  title={The cognitive revolution: {a} historical perspective},
   author={Miller, George A},
   journal={Trends in cognitive sciences},
   volume={7},
@@ -25,23 +25,23 @@ @article{miller2003cognitive
 }
 
 @article{graves2014neural,
-  title={Neural turing machines},
+  title={Neural {T}uring {M}achines},
   author={Graves, Alex and Wayne, Greg and Danihelka, Ivo},
   journal={arXiv preprint arXiv:1410.5401},
   year={2014},
   doi={10.48550/arXiv.1410.5401}
 }
 
 @article{bahdanau2014neural,
-  title={Neural machine translation by jointly learning to align and translate},
+  title={Neural Machine Translation by Jointly Learning to Align and Translate},
   author={Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua},
   journal={arXiv preprint arXiv:1409.0473},
   year={2014},
   doi={10.48550/arXiv.1409.0473}
 }
 
 @inproceedings{iyer2016summarizing,
-  title={Summarizing source code using a neural attention model},
+  title={Summarizing Source Code using a Neural Attention Model},
   author={Iyer, Srinivasan and Konstas, Ioannis and Cheung, Alvin and Zettlemoyer, Luke},
   booktitle={54th Annual Meeting of the Association for Computational Linguistics 2016},
   pages={2073--2083},
@@ -51,15 +51,15 @@ @inproceedings{iyer2016summarizing
 }
 
 @article{ling2016latent,
-  title={Latent predictor networks for code generation},
+  title={Latent Predictor Networks for Code Generation},
   author={Ling, Wang and Grefenstette, Edward and Hermann, Karl Moritz and Ko{\v{c}}isk{\`y}, Tom{\'a}{\v{s}} and Senior, Andrew and Wang, Fumin and Blunsom, Phil},
   journal={arXiv preprint arXiv:1603.06744},
   year={2016},
   doi={10.48550/arXiv.1603.06744}
 }
 
 @article{yin2017syntactic,
-  title={A syntactic neural model for general-purpose code generation},
+  title={A Syntactic Neural Model for General-purpose Code Generation},
   author={Yin, Pengcheng and Neubig, Graham},
   journal={arXiv preprint arXiv:1704.01696},
   year={2017},
@@ -77,15 +77,15 @@ @inproceedings{allamanis2013mining
 }
 
 @article{bhoopchand2016learning,
-  title={Learning python code suggestion with a sparse pointer network},
+  title={Learning Python Code Suggestion with a Sparse Pointer Network},
   author={Bhoopchand, Avishkar and Rockt{\"a}schel, Tim and Barr, Earl and Riedel, Sebastian},
   journal={arXiv preprint arXiv:1611.08307},
   year={2016},
   doi={10.48550/arXiv.1611.08307}
 }
 
 @inproceedings{oda2015learning,
-  title={Learning to generate pseudo-code from source code using statistical machine translation},
+  title={Learning to Generate Pseudo-Code from Source Code using Statistical Machine Translation},
   author={Oda, Yusuke and Fudaba, Hiroyuki and Neubig, Graham and Hata, Hideaki and Sakti, Sakriani and Toda, Tomoki and Nakamura, Satoshi},
   booktitle={2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
   pages={574--584},
@@ -104,7 +104,7 @@ @inproceedings{quirk2015language
 }
 
 @article{radford2019language,
-  title={Language models are unsupervised multitask learners},
+  title={Language {M}odels are {U}nsupervised {M}ultitask {L}earners},
   author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others},
   journal={OpenAI blog},
   volume={1},
@@ -114,7 +114,7 @@ @article{radford2019language
 }
 
 @article{brown2020language,
-  title={Language models are few-shot learners},
+  title={Language {M}odels are {F}ew-{S}hot {L}earners},
   author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others},
   journal={Advances in neural information processing systems},
   volume={33},
@@ -134,7 +134,7 @@ @article{ouyang2022training
 }
 
 @article{vaswani2017attention,
-  title={Attention is all you need},
+  title={Attention is {A}ll {Y}ou {N}eed},
   author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
   journal={Advances in neural information processing systems},
   volume={30},
@@ -143,7 +143,7 @@ @article{vaswani2017attention
 }
 
 @article{lester2021power,
-  title={The power of scale for parameter-efficient prompt tuning},
+  title={The Power of Scale for Parameter-Efficient Prompt Tuning},
   author={Lester, Brian and Al-Rfou, Rami and Constant, Noah},
   journal={arXiv preprint arXiv:2104.08691},
   year={2021},
@@ -255,15 +255,15 @@ @misc{sentence-transformers-all-mpnet-base-v2
 }
 
 @article{barone2017parallel,
-  title={A parallel corpus of python functions and documentation strings for automated code documentation and code generation},
+  title={A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation},
   author={Barone, Antonio Valerio Miceli and Sennrich, Rico},
   journal={arXiv preprint arXiv:1707.02275},
   year={2017},
   doi={10.48550/arXiv.1707.02275}
 }
 
 @article{malkov2018efficient,
-  title={Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs},
+  title={Efficient and robust approximate nearest neighbor search using {H}ierarchical {N}avigable {S}mall {W}orld graphs},
   author={Malkov, Yu A and Yashunin, Dmitry A},
   journal={IEEE transactions on pattern analysis and machine intelligence},
   volume={42},
@@ -275,21 +275,21 @@ @article{malkov2018efficient
 }
 
 @article{dettmers2023qlora,
-  title={QLoRA: Efficient Finetuning of Quantized LLMs},
+  title={QLo{RA}: {E}fficient {F}inetuning of {Q}uantized LLMs},
   author={Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke},
   journal={arXiv preprint arXiv:2305.14314},
   year={2023},
   doi={10.48550/arXiv.2305.14314}
 }
 
 @inproceedings{
-hu2022lora,
-title={Lo{RA}: Low-Rank Adaptation of Large Language Models},
-author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
-booktitle={International Conference on Learning Representations},
-year={2022},
-url={https://openreview.net/forum?id=nZeVKeeFYf9},
-doi={10.48550/arXiv.2106.09685}
+  hu2022lora,
+  title={Lo{RA}: Low-Rank Adaptation of Large Language Models},
+  author={Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen},
+  booktitle={International Conference on Learning Representations},
+  year={2022},
+  url={https://openreview.net/forum?id=nZeVKeeFYf9},
+  doi={10.48550/arXiv.2106.09685}
 }
 
 @article{zhang2023dynamically,
@@ -311,7 +311,7 @@ @article{makarychev2024single
 }
 
 @article{datta2024consistency,
-  title={On the consistency of maximum likelihood estimation of probabilistic principal component analysis},
+  title={On the {C}onsistency of {M}aximum {L}ikelihood {E}stimation of {P}robabilistic {P}rincipal {C}omponent {A}nalysis},
   author={Datta, Arghya and Chakrabarty, Sayak},
   journal={Advances in Neural Information Processing Systems},
   volume={36},
