Read Tabix-indexed files using either .tbi or .csi indexes.
$ npm install @gmod/tabix
import { TabixIndexedFile } from '@gmod/tabix'
You can also use tabix-js without npm by loading tabix-bundle.js. See example/index.html in the example directory for usage with a script tag.
<script src="https://unpkg.com/@gmod/tabix/dist/tabix-bundle.js"></script>
Basic usage of TabixIndexedFile under Node.js supplies a path and optionally a tbiPath to the constructor. If no tbiPath is supplied, the index is assumed to be located at path + '.tbi'.
const tbiIndexed = new TabixIndexedFile({
path: 'path/to/my/file.gz',
tbiPath: 'path/to/my/file.gz.tbi',
})
You can also use CSI indexes:
const csiIndexed = new TabixIndexedFile({
path: 'path/to/my/file.gz',
csiPath: 'path/to/my/file.gz.csi',
})
For remote files, supply a url and optionally a tbiUrl or csiUrl:
const remoteTbiIndexed = new TabixIndexedFile({
url: 'http://yourhost/file.vcf.gz',
tbiUrl: 'http://yourhost/file.vcf.gz.tbi', // can also be csiUrl
})
You can also supply a filehandle-like object from generic-filehandle2:
import { RemoteFile } from 'generic-filehandle2'
const remoteTbiIndexed = new TabixIndexedFile({
filehandle: new RemoteFile('http://yourhost/file.vcf.gz'),
tbiFilehandle: new RemoteFile('http://yourhost/file.vcf.gz.tbi'), // can also be csiFilehandle
})
The main method this module provides is getLines. It reads text from the tabix
file (decompressing the bgzipped data) and passes it, one line at a time, to a
callback that you provide.
Important: the start and end values supplied to getLines are 0-based half-open
coordinates. This differs from the 1-based values supplied to the tabix command
line tool.
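As a minimal sketch of the coordinate difference, the helper below converts a tabix command line style region string into the arguments getLines expects. The function name parseTabixRegion is hypothetical and not part of this package, and it assumes a simple "name:start-end" region with no thousands separators.

```javascript
// Hypothetical helper (not part of @gmod/tabix): converts a region string
// such as "ctgA:201-300" (1-based, closed) into 0-based half-open values.
function parseTabixRegion(region) {
  const match = /^(.+):(\d+)-(\d+)$/.exec(region)
  if (!match) {
    throw new Error(`unrecognized region string: ${region}`)
  }
  const [, refName, startStr, endStr] = match
  return {
    refName,
    // a 1-based inclusive start becomes 0-based by subtracting one
    start: Number(startStr) - 1,
    // a 1-based inclusive end equals the 0-based exclusive end
    end: Number(endStr),
  }
}

const { refName, start, end } = parseTabixRegion('ctgA:201-300')
console.log(refName, start, end) // ctgA 200 300
```

The resulting values could then be passed as the first three arguments of getLines.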
const lines = []
await tbiIndexed.getLines(
'ctgA',
200,
300,
function (line, fileOffset, start, end) {
lines.push(line)
},
)
After running this, lines contains the matching lines from the file. The
callback receives:
- line: the raw line string
- fileOffset: virtual file offset, useful as a unique line identifier
- start / end: the parsed coordinates of that line (0-based half-open)
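The callback arguments listed above can be combined into a simple record per line. This is only a sketch: makeCollector is a hypothetical helper, and splitting on tabs assumes you want the raw columns of the tab-delimited file.

```javascript
// Hypothetical helper: builds a callback with the
// (line, fileOffset, start, end) signature and collects one record per line.
function makeCollector() {
  const records = []
  const lineCallback = (line, fileOffset, start, end) => {
    records.push({
      id: fileOffset, // virtual file offset used as a unique identifier
      start, // 0-based half-open, as passed to the callback
      end,
      fields: line.split('\t'), // raw tab-delimited columns
    })
  }
  return { records, lineCallback }
}

// The callback is invoked by hand here; in real use it would be given to getLines.
const { records, lineCallback } = makeCollector()
lineCallback('ctgA\t200\t300\tfeature1', 1234, 200, 300)
console.log(records[0].fields[3]) // feature1
```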
You can also pass an options object instead of a bare callback:
const lines = []
const aborter = new AbortController()
await tbiIndexed.getLines('ctgA', 200, 300, {
lineCallback: (line, fileOffset, start, end) => lines.push(line),
signal: aborter.signal, // an optional AbortSignal from an AbortController
})
Notes about the returned values of getLines:
- commented (meta) lines are skipped.
- line strings do not include any trailing whitespace characters.
- if getLines is called with an undefined end parameter, it returns all lines from start to the end of the contig, e.g.
const lines = []
await tbiIndexed.getLines('ctgA', 0, undefined, line => lines.push(line))
console.log(lines)
- constructor
- getLines
- getHeaderBuffer
- getHeader
- getReferenceSequenceNames
- checkLine
- lineCount
- readChunk
constructor
- args object
  - args.path string?
  - args.filehandle filehandle?
  - args.url string?
  - args.tbiPath string?
  - args.tbiUrl string?
  - args.tbiFilehandle filehandle?
  - args.csiPath string?
  - args.csiUrl string?
  - args.csiFilehandle filehandle?
  - args.yieldTime number? yield to the main thread after N milliseconds if reading features is taking a long time, to avoid hanging the main thread (optional, default 500)
  - args.renameRefSeqs function? optional function with signature string => string to transform reference sequence names for the purpose of indexing and querying. Note that the returned data is not altered, only the names of the reference sequences used for querying. (optional, default n=>n)
  - args.chunkCacheSize (optional, default 5*2**20)
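As an illustration of the renameRefSeqs option, here is one possible string => string function. The name stripChrPrefix is hypothetical, and the sketch assumes the function receives names as stored in the file and returns the names you want to query with.

```javascript
// Hypothetical renaming function: strips a leading "chr" so that a file
// whose sequences are named "chr1" could be queried as "1" (assumed direction).
const stripChrPrefix = refName => refName.replace(/^chr/, '')

console.log(stripChrPrefix('chr1')) // 1
console.log(stripChrPrefix('ctgA')) // ctgA

// Sketch of passing it to the constructor (the path is a placeholder):
// const file = new TabixIndexedFile({
//   path: 'path/to/my/file.gz',
//   renameRefSeqs: stripChrPrefix,
// })
```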
getLines
- refName string name of the reference sequence
- start (number | undefined) start of the region (0-based half-open)
- end (number | undefined) end of the region (0-based half-open)
- opts (GetLinesOpts | GetLinesCallback) callback invoked for each line, or an options object with lineCallback and optional signal
Returns a promise that resolves when the whole read is finished and rejects on error
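The options form of opts makes it possible to give up on a slow read. The sketch below is one way to do that with a timeout; getLinesWithTimeout is a hypothetical wrapper, it accepts any object with a getLines method as documented above, and how promptly an aborted read stops is left to the library.

```javascript
// Hypothetical wrapper: collects lines via getLines and signals an abort
// if the read has not finished after timeoutMs milliseconds.
async function getLinesWithTimeout(file, refName, start, end, timeoutMs) {
  const aborter = new AbortController()
  const timer = setTimeout(() => aborter.abort(), timeoutMs)
  const lines = []
  try {
    await file.getLines(refName, start, end, {
      lineCallback: line => lines.push(line),
      signal: aborter.signal,
    })
  } finally {
    // always clear the timer, whether the read succeeded or failed
    clearTimeout(timer)
  }
  return lines
}

// Possible usage, assuming tbiIndexed is a TabixIndexedFile:
// const lines = await getLinesWithTimeout(tbiIndexed, 'ctgA', 200, 300, 5000)
```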
getHeaderBuffer
Get a buffer containing the "header" region of the file, i.e. the bytes up to the first non-meta line.
- opts Options (optional, default {})
getHeader
Get a string containing the "header" region of the file, i.e. the portion up to the first non-meta line.
- opts Options (optional, default {})
Returns a Promise for a string
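Once you have the header string, you may want it as individual meta lines. The helper below is a sketch only: splitHeaderLines is a hypothetical name, and the '#' default is an assumption, since the actual meta character comes from the index metadata.

```javascript
// Hypothetical helper: splits header text into individual meta lines.
// The '#' default is an assumption; the real value is the index's metaChar.
function splitHeaderLines(headerText, metaChar = '#') {
  return headerText
    .split(/\r?\n/)
    .filter(line => line.startsWith(metaChar))
}

// Sample text standing in for the string that getHeader resolves to.
const sampleHeader = '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\n'
console.log(splitHeaderLines(sampleHeader).length) // 2
```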
getReferenceSequenceNames
Get an array of reference sequence names, in the order in which they occur in the file. Reference sequence renaming is not applied to these names.
- opts Options (optional, default {})
checkLine
- metadata object metadata object from the parsed index, containing columnNumbers, metaChar, and format
- regionRefName string
- regionStart number region start coordinate (0-based half-open)
- regionEnd number region end coordinate (0-based half-open)
- line string
Returns object like {startCoordinate, overlaps}. overlaps is a boolean, true if line is a data line that overlaps the given region
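To make the overlap condition concrete for 0-based half-open coordinates, here is an illustrative test. It is not the library's implementation of checkLine, and overlapsRegion is a hypothetical name.

```javascript
// Illustrative overlap test for 0-based half-open intervals.
function overlapsRegion(regionStart, regionEnd, featureStart, featureEnd) {
  // Half-open intervals overlap when each one starts before the other ends.
  return featureStart < regionEnd && featureEnd > regionStart
}

console.log(overlapsRegion(200, 300, 250, 260)) // true
console.log(overlapsRegion(200, 300, 300, 310)) // false
console.log(overlapsRegion(200, 300, 100, 200)) // false
```

The two false cases show that a feature touching the region only at an exclusive boundary does not count as overlapping.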
lineCount
Return the number of data lines in the given reference sequence.
- refName string reference sequence name
- opts Options (optional, default {})
Returns the number of data lines present on that reference sequence
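Combining getReferenceSequenceNames with lineCount gives a per-sequence summary of a file. This is a sketch under assumptions: countLinesPerReference is a hypothetical helper, it works with any object exposing those two methods, and it assumes no renameRefSeqs function is configured, since renaming is not applied to the names returned by getReferenceSequenceNames.

```javascript
// Hypothetical helper: maps each reference sequence name to its line count.
async function countLinesPerReference(file) {
  const counts = {}
  const refNames = await file.getReferenceSequenceNames()
  for (const refName of refNames) {
    // queried one at a time to keep the sketch simple
    counts[refName] = await file.lineCount(refName)
  }
  return counts
}

// Possible usage, assuming tbiIndexed is a TabixIndexedFile:
// const counts = await countLinesPerReference(tbiIndexed)
```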
readChunk
Read and uncompress the data in a chunk (composed of one or more contiguous bgzip blocks) of the file.
- c Chunk
- opts Options (optional, default {})
Trusted publishing via GitHub Actions.
npm version patch # or minor/major
This package was written with funding from the NHGRI as part of the JBrowse project. If you use it in an academic project that you publish, please cite the most recent JBrowse paper, which will be linked from jbrowse.org.
MIT © Robert Buels