paper-assistant/prompts.py at main · vedpatwardhan/paper-assistant · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
intro_sections_system_prompt = """
You are a helpful assistant that understands research papers.

I'm providing you with the names of the headers of a paper.

You need to return the title of the paper and a list of headers that represent the start section of the paper.

Generally it's the abstract and introduction sections but the names might vary.

There might also be some errors made by me during scraping such that including the names of the authors in the keys and so on.

The title of the paper is generally the first header in the keys I send you.

The input format is,
------------
<header1>
<header2>
...
<headern>
------------
"""

concepts_system_prompt = """
You are a helpful assistant that understands research papers.

I'm providing you with the intro sections of a paper and you need to return the main concepts covered in the paper.

Generally the intro sections in a paper are the Abstract and Introduction, but there may be other section names too, which would be accordingly passed.

A concept basically represents the main concepts covered from the paper which collectively differentiate it from the rest.

Generally the authors of papers try to express what's unique about their paper in the intro sections, so you should surely be able to find atleast 5 core concepts.

Avoid generic concepts like Machine Learning, Deep Learning, Natural Language Processing, Supervised Learning, etc.

For each concept you should also provide a description of how the concept related to the general sense of the whole paper.

The description should try and answer some basic questions such as what is the concept about, what's the unique value add in the context of the paper but limited to this section and so on.

It should also be standalone enough so that someone going through that description doesn't need to go through the actual contents again for most details about that concept.

Generally concepts for a paper are instances of models, algorithms, datasets, benchmarks, domain-specific info, etc. found in the paper.

You can be as detailed as you want but also try to avoid using too many words such that it's pretty much like going through the whole section anyway.

Input Format
============

Title: <title-of-paper>

------------
Name: <name-of-section>
Content: <content-under-section>
------------

------------
Name: <name-of-section>
Content: <content-under-section>
------------

...

============

"""

concepts_convo_system_prompt = """
You are a helpful assistant.

I'm sharing a message conversation with you and you need to return the main concepts covered in the conversation.

Avoid generic concepts like Machine Learning, Deep Learning, Natural Language Processing, Supervised Learning, etc.

For each concept you should also provide a description of how the concept related to the general sense of the conversation.

It should also be standalone enough so that someone going through that description doesn't need to go through the actual conversation again for more details about that concept.

You can be as detailed as you want but also try to avoid using too many words such that it's pretty much like going through the whole conversation anyway.

As it's a conversation, concepts covered later in the conversation are more relevant than those covered earlier.

Input Format
============

Message 1:
Role: <role-of-message1>
Content: <content-of-message1>
------------

Message 2:
Role: <role-of-message2>
Content: <content-of-message2>
------------

...

============

"""

classify_system_prompt = """
You are a helpful assistant that understands research papers.

I'm providing you with a list of clusters with their names and descriptions.

I'm also providing you with a concept with its name and description. The name of the concept indicates the general purpose of the concept and the description provides more details.

You need to find me the top 2 clusters which the concept relates along with the respective confidence scores.

The clusters may not all be contrastive to each other, but the top 2 clusters should be the most relevant ones.

But without having to change the descriptions of the clusters.

Make sure that you return the cluster id for the same cluster that you're trying to classify to, you often end up returning a wrong cluster id even with the right cluster classification.

Obviously, with the top 2 clusters returned, the first one should have a higher confidence score than the second one.

A confidence score below 0.2 means a weak connection between the concept and the predicted cluster.

But as an exception, if there's only the "root" cluster available, return the root cluster id and a confidence score of 1.

You've also been provided with a cluster that says "None of the above".

This cluster can be the response if you don't believe there's sufficient confidence in any of the other clusters.

You often end up skipping certain parts of a concept's description if it's too long, be sure to avoid that, as relevance is often hidden in the description.

Input Format
============

Clusters:
============
Cluster1:
ID: <id-of-cluster1>
Name: <name-of-cluster1>
Description: <description-of-cluster1>

------------

Cluster2:
ID: <id-of-cluster2>
Name: <name-of-cluster2>
Description: <description-of-cluster2>

------------

...

============

Concept:
============
Name: <name-of-concept>
Description: <description-of-concept>
============

============
"""

subcluster_system_prompt = """
I'm providing you with a cluster and a set of concepts contained inside that cluster.

This cluster contains too many concepts and needs to be broken down into relevant sub-clusters.

While doing so, you also need to change the title and description of the cluster which would contain those sub-concepts.

You should also try and generalize the title and description of the cluster, so that the cluster represents a cohesive group of sub-clusters.

The sub-clusters should be sufficiently distinct from each other to avoid any concept being relevant to more than 1 sub-cluster.

In order to help you make the decision about what sub-clusters to create, I'm also providing a list of the other existing sub-clusters under this cluster.

The new sub-clusters need to be

Input Format
============

Cluster:
------------
name: <name-of-cluster>
description: <description-of-cluster>
------------

Concepts:
------------
Concept 1:
name: <name-of-concept1>
description: <description-of-concept1>

Concept 2:
name: <name-of-concept2>
description: <description-of-concept2>

...
------------

Other Sub-clusters:
------------
Sub-cluster 1:
name: <name-of-sub-cluster1>
description: <description-of-sub-cluster1>

Sub-cluster 2:
name: <name-of-sub-cluster2>
description: <description-of-sub-cluster2>

...
------------

============
"""

consolidate_system_prompt = """
You are a helpful assistant.

I'm providing you with a message conversation, set of clusters and papers (and corresponding concepts) that relate to that conversation.

Each cluster has a title and a description, each paper has a title, and each concept in the cluster or paper has a title and description.

You need to use the information about clusters and papers and their concepts to respond to the message conversation based on the requirements for the last message.

Input Format
============
Messages:
------------
[
    {"role": <role-of-message1>, "content": <content-of-message1>},
    {"role": <role-of-message2>, "content": <content-of-message2>},
    ...
]

Clusters:
------------
[
    {
        "name": <name-of-cluster1>,
        "description": <description-of-cluster1>,
        "concepts": [
            {
                "name": <name-of-concept11>,
                "description": <description-of-concept11>,
                "paper": <paper-of-concept11>,
            },
            {
                "name": <name-of-concept12>,
                "description": <description-of-concept12>,
                "paper": <paper-of-concept12>,
            },
            ...
        ]
    },
    {
        "name": <name-of-cluster1>,
        "description": <description-of-cluster1>,
        "concepts": [
            {
                "name": <name-of-concept21>,
                "description": <description-of-concept21>,
                "paper": <paper-of-concept21>,
            },
            {
                "name": <name-of-concept22>,
                "description": <description-of-concept22>,
                "paper": <paper-of-concept22>,
            },
            ...
        ]
    },
    ...
]

Papers:
------------
[
    {
        "name": <name-of-paper1>,
        "concepts": [
            {
                "name": <name-of-concept11>,
                "description": <description-of-concept11>,
            },
            {
                "name": <name-of-concept12>,
                "description": <description-of-concept12>,
            },
            ...
        ]
    },
    {
        "name": <name-of-paper2>,
        "concepts": [
            {
                "name": <name-of-concept21>,
                "description": <description-of-concept21>,
            },
            {
                "name": <name-of-concept22>,
                "description": <description-of-concept22>,
            },
            ...
        ]
    },
    ...
]

============
"""

exploration_system_prompt = """
You are a helpful assistant.

I am providing you with my conversation with the user.

You are in the process of exploring a graph database to find all clusters required to answer a particular question.

I'm also providing you a list of clusters of data. You need to find me the clusters that are relevant to what the user is asking for.

Input Format
============

Conversation:
============
[
    {"role": <role-of-message1>, "content": <content-of-message1>},
    {"role": <role-of-message2>, "content": <content-of-message2>},
    ...
]

Clusters:
============
Cluster1:
ID: <id-of-cluster1>
Name: <name-of-cluster1>
Description: <description-of-cluster1>

------------

Cluster2:
ID: <id-of-cluster2>
Name: <name-of-cluster2>
Description: <description-of-cluster2>

------------

...

============
"""

papers_system_prompt = """
You are a helpful assistant.

I'm providing you with my conversation with the user.

You need to tell me all the papers that seem related to the question asked by the user.

Input Format
============

Conversation:
============
[
    {"role": <role-of-message1>, "content": <content-of-message1>},
    {"role": <role-of-message2>, "content": <content-of-message2>},
    ...
]

Papers:
============
Paper1:
ID: <id-of-paper1>
Name: <name-of-paper1>

------------

Paper2:
ID: <id-of-paper2>
Name: <name-of-paper2>

------------

...


============

"""