Run the sample script `examples/example.py` to see typical usage of the code. See the example on Google Colab: <a href="https://colab.research.google.com/github/souradipp76/ReadMeReady/blob/main/examples/example.ipynb" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>

See detailed API references [here](https://souradipp76.github.io/ReadMeReady/reference/).
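
For orientation, below is a minimal sketch of what a typical usage script looks like. The class and field names are assumptions based on the API reference and may differ from the current release, so treat `examples/example.py` as the authoritative version.

```python
# Hypothetical sketch of typical readme_ready usage; names and fields are
# assumptions -- see examples/example.py and the API reference for the
# authoritative versions.
from readme_ready.index import index
from readme_ready.query import query
from readme_ready.types import (
    AutodocReadmeConfig,
    AutodocRepoConfig,
    AutodocUserConfig,
    LLMModels,
)

model = LLMModels.LLAMA2_7B_CHAT_GPTQ  # pick any supported model

repo_config = AutodocRepoConfig(
    name="my-project",                                    # placeholder values
    root="/path/to/my-project",
    repository_url="https://github.com/user/my-project",
    output="./output/my-project",
    llms=[model],
    # ...plus any remaining fields required by AutodocRepoConfig,
    # as shown in examples/example.py
)
user_config = AutodocUserConfig(llms=[model])
readme_config = AutodocReadmeConfig(
    headings="Description,Requirements,Installation,Usage,Contributing,License"
)

index.index(repo_config)                                  # index the repository
query.generate_readme(repo_config, user_config, readme_config)  # write README
```
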
For finetuning on custom datasets, follow the instructions below.
- Run the notebook file `scripts/data.ipynb` and follow the instructions in the file to generate a custom dataset from open-source repositories.
- Run the notebook file `scripts/fine-tuning-with-llama2-qlora.ipynb` and follow the instructions in the file to finetune custom LLMs (a minimal QLoRA sketch follows this list).
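
As a rough illustration of what the finetuning notebook sets up, here is a minimal QLoRA sketch assuming the Hugging Face `transformers`, `peft`, and `bitsandbytes` libraries; the base model and hyperparameters below are placeholders, and the notebook remains the authoritative recipe.

```python
# Minimal QLoRA setup sketch (assumes transformers, peft, bitsandbytes installed);
# the model name and hyperparameters are placeholders, not the notebook's values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-chat-hf"  # placeholder base model

# Load the base model in 4-bit precision (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Attach low-rank adapters so only a small fraction of parameters is trained
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                 # placeholder hyperparameters
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# Training itself would use transformers.Trainer or trl.SFTTrainer
# on the custom dataset produced by scripts/data.ipynb.
```
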
The results are reported in Table 1 and Table 2 under the "With FT" (with finetuning) columns, where the generated content is compared against each repository's original README file. BLEU scores range from 15 to 30, averaging 20, indicating that the generated text is understandable but requires substantial editing to be acceptable. BERT scores, in contrast, show a high semantic similarity to the original README content, with an average F1 score of ~85%.

### Table 1: BLEU Scores

| Repository | W/O FT | With FT |
|------------|--------|---------|
| allennlp   | 32.09  | 16.38   |
| autojump   | 25.29  | 18.73   |
| numpy-ml   | 16.61  | 19.02   |
| Spleeter   | 18.33  | 19.47   |
| TouchPose  | 17.04  | 8.05    |

### Table 2: BERT Scores

| Repository | P (W/O FT) | R (W/O FT) | F1 (W/O FT) | P (With FT) | R (With FT) | F1 (With FT) |
|------------|------------|------------|-------------|-------------|-------------|--------------|
Run the script `scripts/run_validate.sh` to generate BLEU and BERT scores for 5 sample repositories, comparing each repository's actual README file with the generated one. Note that reproducing the scores requires a GPU with 16 GB or more of memory.

```bash
$ chmod +x scripts/run_validate.sh
$ scripts/run_validate.sh
```
Alternatively, run the notebook `scripts/validate.ipynb` on Google Colab: <a href="https://colab.research.google.com/github/souradipp76/ReadMeReady/blob/main/scripts/validate.ipynb" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>
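
For reference, the comparison these scripts perform boils down to scoring a generated README against the original one. A minimal sketch using the `sacrebleu` and `bert-score` packages (an assumption about tooling, not necessarily what the validation scripts use) looks like this:

```python
# Sketch of scoring a generated README against the original one.
# Assumes the sacrebleu and bert-score packages; file paths are placeholders.
import sacrebleu
from bert_score import score

with open("README_original.md", encoding="utf-8") as f:
    reference = f.read()
with open("README_generated.md", encoding="utf-8") as f:
    candidate = f.read()

# Corpus-level BLEU between the generated text and the original README
bleu = sacrebleu.corpus_bleu([candidate], [[reference]])

# BERTScore precision/recall/F1 based on contextual embeddings
precision, recall, f1 = score([candidate], [reference], lang="en")

print(f"BLEU: {bleu.score:.2f}")
print(
    f"BERTScore P/R/F1: {precision.mean().item():.3f} / "
    f"{recall.mean().item():.3f} / {f1.mean().item():.3f}"
)
```
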

  title={Latent Predictor Networks for Code Generation},
  author={Ling, Wang and Grefenstette, Edward and Hermann, Karl Moritz and Ko{\v{c}}isk{\`y}, Tom{\'a}{\v{s}} and Senior, Andrew and Wang, Fumin and Blunsom, Phil},
  journal={arXiv preprint arXiv:1603.06744},
  year={2016},
  doi={10.48550/arXiv.1603.06744}
}

@article{yin2017syntactic,
  title={A Syntactic Neural Model for General-purpose Code Generation},

@article{radford2019language,
  title={Language {M}odels are {U}nsupervised {M}ultitask {L}earners},
  author={Radford, Alec and Wu, Jeffrey and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya and others},
  journal={OpenAI blog},
  volume={1},
}

@article{brown2020language,
  title={Language {M}odels are {F}ew-{S}hot {L}earners},
  author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others},
  journal={Advances in neural information processing systems},
  volume={33},
}

@article{vaswani2017attention,
  title={Attention is {A}ll {Y}ou {N}eed},
  author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
  journal={Advances in neural information processing systems},
  volume={30},
}

@article{lester2021power,
  title={The Power of Scale for Parameter-Efficient Prompt Tuning},
  author={Lester, Brian and Al-Rfou, Rami and Constant, Noah},