This function transforms the MSMARCO triples data available at `triples <https://msmarco.blob.core.windows.net/msmarcoranking/triples.train.small.tar.gz>`_.

The data contains triplets in which the first entry is the query, the second is a context passage from which the query can be
answered (positive passage), and the third is a context passage from which the query cannot be answered (negative passage).
The data is transformed into sentence pair classification format, with each query-positive context pair labeled 1 (answerable)
and each query-negative context pair labeled 0 (non-answerable).
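The labeling scheme described above can be illustrated with a minimal sketch. It assumes each triple is a single tab-separated line (query, positive passage, negative passage); the helper name ``triple_to_pairs`` is hypothetical and is not part of this function's code.

```python
def triple_to_pairs(line):
    """Split one tab-separated triple into two labeled sentence pairs.

    Assumption: the line has exactly three tab-separated fields in the
    order query, positive passage, negative passage.
    """
    query, positive, negative = line.rstrip("\n").split("\t")
    return [
        (query, positive, 1),  # query can be answered from this passage
        (query, negative, 0),  # query cannot be answered from this passage
    ]


# Illustrative input, not taken from the MSMARCO data.
pairs = triple_to_pairs(
    "what is python\tPython is a programming language.\tParis is the capital of France.\n"
)
```

Each input triple therefore yields two rows, so the transformed file has twice as many rows as the triples that were kept.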

The following transformed files are written to wrtDir:

- Sentence pair transformed downsampled file.
- Sentence pair transformed train tsv file for the answerability task.
- Sentence pair transformed dev tsv file for the answerability task.
- Sentence pair transformed test tsv file for the answerability task.

To use this transform function, set ``transform_func`` : **msmarco_answerability_detection_to_tsv** in the transform file.

Args:
    dataDir (:obj:`str`) : Path to the directory containing the raw data files to be read.
    readFile (:obj:`str`) : The file currently being read and transformed by the function.
    wrtDir (:obj:`str`) : Path to the directory in which to save the transformed tsv files.
    transParamDict (:obj:`dict`, defaults to :obj:`None`): Dictionary of function-specific parameters. Optional for this transformation function.

        - ``data_frac`` (defaults to 0.01) : Fraction of the data to keep when downsampling, as the original data is too large.
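
As a rough illustration of what ``data_frac`` controls, the sketch below keeps a random fraction of the rows. How this function actually selects rows is not stated here, so the sampling strategy, the fixed seed, and the helper name ``downsample`` are all assumptions made for the example.

```python
import random


def downsample(rows, data_frac=0.01, seed=42):
    """Keep roughly data_frac of the rows, chosen at random.

    Assumption: uniform random sampling without replacement; the real
    function may select rows differently.
    """
    keep = int(len(rows) * data_frac)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(rows, keep)


# Illustrative rows standing in for lines of the triples file.
rows = [f"row-{i}" for i in range(1000)]
subset = downsample(rows, data_frac=0.01)
```

With the default of 0.01, about one row in a hundred is kept, which keeps the train, dev and test files derived from the subset at a manageable size.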