Commit 7e5929f

update documentation
installation, MacOS notes, and some FAQ items
1 parent d76764a commit 7e5929f

3 files changed (+205 -61 lines changed)

docs/AdvancedInstallation.md

Lines changed: 101 additions & 47 deletions
@@ -20,9 +20,9 @@ accordingly for their system. If that doesn't describe you, skip ahead to the
 information.

 Install [HTSlib](https://github.com/samtools/htslib) for Bam file support. You may
-already have it; later versions may (usually) work just as well, but version 1.9
-is officially recommended. It defaults to installing in `/usr/local` but doesn't
-have to be; see below.
+already have it; version 1.9 is officially recommended, but later versions appear
+to work just fine and are preferred, as they include Cram support. It defaults to
+installing in `/usr/local`.

 curl -o htslib-1.19.tar.bz2 -L https://github.com/samtools/htslib/releases/download/1.19/htslib-1.19.tar.bz2
 tar -xf htslib-1.19.tar.bz2
@@ -118,10 +118,7 @@ switch between releases with a simple command. It also manages multiple `local::
 installations, in case you want to isolate packages.

 BioToolBox does not utilize threading (it uses forks for parallel execution), so if you
-have a choice, compile a non-threaded Perl for a (very) slight performance gain. For
-those adventurous to try, BioToolBox does work under [cperl](https://github.com/perl11/cperl),
-although installing some prerequisite modules is a trying experience (many failed
-tests and partial functionality).
+have a choice, compile a non-threaded Perl for a slight performance gain.

 ### System installation

@@ -283,6 +280,10 @@ An example for downloading on Linux:
 do curl -o $HOME/bin/$name http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/$name \
 && chmod +x $HOME/bin/$name; done;

+**NOTE** Current versions of these utilities do not support directly piping data into
+the utility using the `stdin` file name. You will need to either find an older binary
+version, compile your own from older source code (see below), or update BioToolBox;
+version 2.02 now supports a work-around.

 ## Legacy Perl modules

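The work-around mentioned in the note boils down to replacing a pipe into the utility with an intermediate file. Below is a minimal POSIX shell sketch of that idea. It assumes the utility takes its input file as the first argument, as `wigToBigWig in.wig chrom.sizes out.bw` does. The function name `pipe_via_tempfile` is hypothetical and is not taken from BioToolBox.

```shell
#!/bin/sh
# Hypothetical helper: feed standard input to a command that cannot read the
# special "stdin" file name, by saving the input to a temporary file first.
# Usage: producer | pipe_via_tempfile COMMAND [ARGS AFTER THE INPUT FILE...]
pipe_via_tempfile() {
    cmd=$1
    shift
    tmp=$(mktemp "${TMPDIR:-/tmp}/bigfile_input.XXXXXX") || return 1
    # Write everything arriving on stdin to the intermediate file.
    cat > "$tmp"
    # Pass the file path where "stdin" used to be given.
    "$cmd" "$tmp" "$@"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `some_wig_generator | pipe_via_tempfile wigToBigWig chrom.sizes out.bw`, where `some_wig_generator` is a placeholder for whatever produces the wig text. The trade-off is one extra disk write, which is what the `stdin` feature originally avoided.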
@@ -314,14 +315,16 @@ multi-user installations).
 The Bio::DB::Sam library _only_ works with the legacy Samtools version, which
 included both the C libraries, headers, and executables; use version
 [0.1.19](https://github.com/samtools/samtools/archive/0.1.19.tar.gz) for best
-results. You will need to compile the Samtools code, but you do not have to
-install it (the library is not linked). Before compiling, edit the Makefile to
-include the cflags `-fPIC` and (most likely) `-m64` for 64 bit OS. Export the
-`SAMTOOLS` environment variable to the path of the Samtools build directory, and
-then you can proceed to build the Perl module; it should find the necessary
-files using the `SAMTOOLS` environment variable. You may obtain the latest
-source from
-[here](https://github.com/GMOD/GBrowse-Adaptors/tree/master/Bio-SamTools).
+results. You will need to compile the Samtools code, but you do not have to install
+it (the library is not linked). Before compiling, edit the Makefile to include the
+cflags `-fPIC` and (most likely) `-m64` for 64 bit OS. Export the `SAMTOOLS`
+environment variable to the path of the Samtools build directory, and then you can
+proceed to build the Perl module; it should find the necessary files using the
+`SAMTOOLS` environment variable. You may obtain the latest source from
+[GitHub](https://github.com/GMOD/GBrowse-Adaptors/tree/master/Bio-SamTools) or by
+downloading a [tarball](https://github.com/GMOD/GBrowse-Adaptors/tarball/master).
+**Note** that this project and its tarball contain multiple Perl adapters and cannot be
+used directly with `cpanm`, for example.

 ### UCSC BigFile library

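Before running the Perl build, it may be worth confirming that `SAMTOOLS` really points at a compiled Samtools tree. This hedged sketch assumes the legacy 0.1.19 build leaves `bam.h` and the static library `libbam.a` at the top of the build directory, which is what its Makefile produces as far as I know. The function name `check_samtools_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sanity check for the SAMTOOLS build directory used by Bio::DB::Sam.
# Assumes a compiled legacy Samtools (0.1.19) tree with bam.h and libbam.a at its top level.
check_samtools_dir() {
    dir=${1:-$SAMTOOLS}
    if [ -z "$dir" ]; then
        echo "SAMTOOLS is not set" >&2
        return 1
    fi
    for f in bam.h libbam.a; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f (was Samtools compiled?)" >&2
            return 1
        fi
    done
    echo "ok: $dir"
}

# Example: export SAMTOOLS=/path/to/samtools-0.1.19 && check_samtools_dir
```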
@@ -330,40 +333,91 @@ The Bio::DB::BigWig and Bio::DB::BigBed modules are part of the same distributio
 use the code from the GitHub repository, as it should be compatible with recent UCSC
 libraries, whereas the distribution on CPAN is out of date.

+**NOTE** The UCSC library, when it encounters an error, will immediately terminate
+the Perl process, with no chance of trapping the error. The newer `libBigWig` C
+library used with Bio::DB::Big (detailed above) does not exhibit this behavior, plus
+it's considerably easier to install. Encountering errors rarely happens, however,
+because all bioinformatic data is always perfectly formatted and well behaved, right?
+
 You will need the UCSC source code; the
 [userApps](http://hgdownload.soe.ucsc.edu/admin/exe/) source code is sufficient, rather
-than the entire browser code. Version 375, at the time of this writing, works. This
-requires at least `OpenSSL` and `libpng` libraries to compile the required library; on
-MacOS, these need to be installed independently (see [Homebrew](https://brew.sh) for
-example). There are other requirements, such as MySQL client libraries, that are needed if
-you want to compile the actual command line utilities, if so desired.
-
-For purposes here, only the library needs to be compiled. It does not need to be
-installed, as nothing is linked. Therefore, you can safely ignore the main `Makefile`
-commands. Below are the steps for compiling just the requisite C library for installing
-the Perl module.
-
-Edit the file `kent/src/inc/common.mk`, and insert `-fPIC` into the `CFLAGS` variable. If
-you have installed any libraries in non-standard locations, e.g. `openssl` installed via
-HomeBrew on MacOS, then add these paths to the `HG_INC` variable. Save the file.
-
-To simplify compilation, you can skip the main Makefile and simply compile only the
-libraries that you need. First, export the `MACHTYPE` environment variable to an
-acceptable simple value, usually `x86_64`.
-
-Next, move to the included `kent/src/htslib` directory, and compile this library by
-issuing the `make` command.
-
-Move to the `kent/src/lib` directory, and compile the library by issuing the `make`
-command. If it compiles successfully, you should get a `jkweb.a` file in the `lib/x86_64`
-directory.
-
-Finally, you can return to the Perl module. First, set the `KENT_SRC` environment
-variable to the full path of the `kent/src` build directory (otherwise you will need to
-interactively provide the Perl module Build script this path). Then issue the standard
-`Build.PL` commands to build, test, and install the Perl modules.
-
-
+than the entire browser code. Versions 375 and 398, at the time of this writing, work
+successfully, but more recent versions appear to have increasing problems with
+successful compilation – YMMV.
+
+**NOTE** If you are compiling the command line utilities, such as `wigToBigWig`, be
+aware that in version 439 and later, these utilities no longer accept `stdin` as a
+file input. The Bio::ToolBox::big_helper module uses this feature for convenience in
+applications such as [bam2wig](apps/bam2wig.md). You can compile your own following
+these steps, but you do not need to install Bio::DB::BigWig.
+
+This requires at least `OpenSSL` and `libpng` libraries to compile the required
+library. For the command line utilities, if desired, you will also need MySQL
+libraries; MariaDB, for now, seems adequate as far as I can tell.
+
+On Linux, this is mostly not a problem as these libraries and development files are
+readily available through the package manager. If you're building on macOS, see the
+notes in the [macOS notes page](MacOSNotes.md).
+
+For purposes of installing the Perl adapter, only the library needs to be compiled.
+It does not need to be installed, as nothing is linked. So, you do not need to run
+the full `make` command. In the `userApps` folder, run
+
+cd path/to/userApps
+make installEnvironment
+
+This will generate `kent/src/inc/localEnvironment.mk` for your local machine. Edit this
+file and add at the end:
+
+CFLAGS = -fPIC
+
+If you have odd or non-standard locations for some libraries, for example in a
+computing cluster where the development files are brought in using environment
+[modules](https://modules.readthedocs.io/), you may be able to set additional paths
+in this `localEnvironment.mk` file or by directly hacking `kent/src/inc/common.mk`.
+**NOTE** Be careful about setting a generic path to the libraries, particularly if
+you also have `htslib` installed, since the UCSC userApps source provides its own (presumably
+modified) `htslib` library, which will conflict with a system-installed library.
+
+To compile the library, follow the steps below. Note the exported `KENT_SRC`
+environment variable, which aids in building the Perl adapter.
+
+cd path/to/kent/src
+export KENT_SRC=$PWD
+cd htslib
+make
+cd ../lib
+make
+cd
+
+To install the Perl adapter, download the tarball from GitHub (same source as
+Bio::DB::Sam above) and follow the steps below.
+
+curl -o GBrowse-Adaptors.tar.gz -L https://github.com/GMOD/GBrowse-Adaptors/tarball/master
+tar -xf GBrowse-Adaptors.tar.gz
+cd GBrowse-Adaptors-master-85c29de/Bio-BigFile
+export MACHTYPE=local
+perl Build.PL
+./Build
+./Build test
+./Build install
+
+If the environment variables have been set correctly and the library compiled
+successfully, then the Perl build process should proceed smoothly. There may be
+various warnings emitted during the build process, which can usually be ignored.
+
+Once you have compiled the main `kent/src/lib` library, you can proceed with
+compiling the command line utilities, if you desire. You can build just the ones you
+want by going into each utility subdirectory within `kent/src/utils/` and issuing
+`make`. For example:
+
+cd /path/to/userApps
+mkdir bin
+cd kent/src/utils/wigToBigWig
+make
+
+The compiled executable should be copied into `userApps/bin`, and you can move it from
+there to wherever you like.



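The `localEnvironment.mk` edit described above can be scripted. This minimal sketch appends a setting such as `CFLAGS = -fPIC` only if that exact line is not already present, so re-running it is harmless. The function name `add_make_setting` is hypothetical. The file path follows the text above.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a make include file (for example
# kent/src/inc/localEnvironment.mk) unless that exact line is already present.
add_make_setting() {
    file=$1
    line=$2
    # -x: whole-line match, -F: literal text (so "+=" and "." are not regex).
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # If the file does not end with a newline, add one so lines don't merge.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example, from the userApps directory after `make installEnvironment`:
# add_make_setting kent/src/inc/localEnvironment.mk 'CFLAGS = -fPIC'
```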
docs/FAQ.md

Lines changed: 29 additions & 5 deletions
@@ -7,6 +7,35 @@
 These may or may not have actually been asked, but it's a collection of hints that the
 programmer understands but a casual user might not, as well as rationale.

+- Certain UCSC utilities no longer support `stdin`.
+
+Several of the UCSC command line utilities for big files (bigWig and bigBed in
+particular) used to support a barely documented feature of using `stdin` or
+`stdout` as filenames, but more recent versions do not. This feature was used
+extensively by
+[Bio::ToolBox::big_helper](https://metacpan.org/pod/Bio::ToolBox::big_helper)
+when reading and/or writing big files as a way of offloading computational
+processing, avoiding intermediate disk writes, and for the very practical reason
+that the Perl adapters did not support direct writing of files. Version 2.02 of
+Bio::ToolBox (finally) checks the version of these utilities and adjusts its
+behavior as necessary, either writing directly to the utility or writing
+intermediate text files and then calling the utility.
+
+If you would like to use the older utilities that support `stdin`, at
+least for `wigToBigWig` (`bedToBigBed` hasn't supported this for a very, very
+long time), then you will need to compile your own. The change occurred in the
+`userApps.v439` release (October 2022), so you will need an earlier version. See
+the section on the UCSC library in the [Advanced Install](AdvancedInstallation.md)
+document for hints on compiling the utilities (you don't need to install the
+library).
+
+- Do you support `CSV` files?
+
+CSV files appear perfectly benign, but are in fact a can of worms: mandatory or
+optional quoting, empty or undefined values, spaces, character escaping, text
+encoding, and so on. This mostly affects reading files. Most (all?) bioinformatic
+text formats are tab-delimited, so CSV support is intentionally absent.
+
 - Programs don't recognize a UCSC gene table (refFlat, knownGene, genePred, etc)

 UCSC doesn't have official file extensions, and their downloads page just
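Here is a quick illustration of why naive CSV handling is fragile. Splitting on commas miscounts fields as soon as a quoted value contains a comma, while the same values tab-delimited split cleanly. The example uses plain `awk` field splitting purely for demonstration; it is not how BioToolBox parses files.

```shell
#!/bin/sh
# Three logical values, one of which contains a comma inside quotes.
# A naive comma split (as awk does here) counts four fields, not three.
csv_fields=$(printf 'chr1,"gene A, isoform 2",100\n' | awk 'BEGIN { FS = "," } { print NF }')

# The same three values, tab-delimited: the embedded comma is just text.
tsv_fields=$(printf 'chr1\tgene A, isoform 2\t100\n' | awk 'BEGIN { FS = "\t" } { print NF }')

echo "CSV, naive comma split: $csv_fields fields"   # 4
echo "TSV, tab split: $tsv_fields fields"           # 3
```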
@@ -128,11 +157,6 @@ programmer understands but a casual user might not, as well as rationale.
 (BED and refFlat are fastest), or import them into an annotation SQLite database
 file (see above).

-There is an (unofficial) [effort](http://perl11.org) to make Perl faster by tweaking
-the internals. BioToolBox will install under [cperl](http://perl11.org/cperl/),
-although getting the prerequisites installed is not a trivial task. The speed gain
-is modest.
-
 - Why do you fork instead of using threads?

 It's easier? Forking a child process is less complicated, as memory is automatically

docs/MacOSNotes.md

Lines changed: 75 additions & 9 deletions
@@ -17,6 +17,25 @@ need administrator permissions.

 $ xcode-select --install

+## Install Homebrew
+
+If you're not already familiar with [Homebrew](https://brew.sh), you should be. This
+is a package manager for installing an ever-increasing number of open source
+utilities and packages, notably packages available on Linux but missing from
+(default) macOS. Importantly, it will install these with minimal effort on your part,
+as opposed to the old-school method of downloading source code, configuring,
+compiling, and repeating for each required dependency, all while debugging
+errors. It's the modern equivalent of MacPorts or Fink, for those old enough to
+remember those projects.
+
+Follow the instructions on the [Homebrew](https://brew.sh) website to install. On
+older Intel Macs, this will install under `/usr/local`, which makes integrating with
+other software relatively painless. However, on modern Apple Silicon Macs, it
+installs under `/opt/homebrew`, which can sometimes make finding libraries and
+development headers, especially when building older software projects, a bit more
+challenging.
+
+
 ## Installing your own Perl

 Apple is generally a little slow in updating their Perl compared to the latest available
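Because the Homebrew prefix differs between Intel and Apple Silicon Macs, build scripts often need to pick the right one. Homebrew reports its own prefix via `brew --prefix`. The sketch below prefers that and otherwise falls back to the default macOS locations described above, based on `uname -m`. The function name `homebrew_prefix` is hypothetical. Its optional argument, which overrides the detected architecture in the fallback, is there only to make the logic easy to try.

```shell
#!/bin/sh
# Hypothetical helper: print the Homebrew installation prefix.
# Prefer asking Homebrew; otherwise fall back to the default macOS locations
# (/opt/homebrew on Apple Silicon, /usr/local on Intel).
homebrew_prefix() {
    if command -v brew >/dev/null 2>&1; then
        brew --prefix
        return
    fi
    case "${1:-$(uname -m)}" in
        arm64) echo /opt/homebrew ;;
        *)     echo /usr/local ;;
    esac
}

# Example: point a build at Homebrew's headers and libraries.
# prefix=$(homebrew_prefix)
# export CPPFLAGS="-I$prefix/include" LDFLAGS="-L$prefix/lib"
```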
@@ -63,6 +82,12 @@ for the source of the solution.

 $ install_name_tool -change libBigWig.so /path/to/lib/libBigWig.so blib/arch/auto/Bio/DB/Big/Big.bundle

+There are limitations to the length of the new path to be linked. For example, if you
+are attempting to link to a buried Alien::Lib path in a local library, it may fail
+with error messages ending in `because larger updated load commands do not fit`.
+Your best bet is to install it in `/usr/local`; that almost always works.
+
+
 ## DB_File errors

 There are [reports of issues](https://github.com/bioperl/bioperl-live/issues/267)
@@ -93,16 +118,18 @@ Set::IntervalTree, so force install ExtUtils::CppGuess and try re-installing
 Set::IntervalTree again – it will probably work.


-## libBigWig
+## Curl support in libBigWig

-When manually installing libBigWig on recent versions of macOS (observed with Sonoma,
-14.x and libBigWig v0.4.7), the compilation may fail at first. To check for libCurl
-dependencies, it attempts to compile a small test program and runs the command
-`mktemp --suffix=.c`. While that `--suffix` option is available to versions on Linux
-platforms, it is not available to the version on macOS, thus breaking the detection
-of libCurl. To work around this, we just have to tell it that, yes, we have libCurl.
-Edit the `Makefile` and comment out the five lines after
-`# Create a simple test-program...` and add a new line
+When manually installing recent versions of
+[libBigWig](https://github.com/dpryan79/libBigWig), v0.4.6 and later, the
+compilation may fail at first. To check for libcurl dependencies, it attempts to
+compile a small test program and runs the command `mktemp --suffix=.c`. While that
+`--suffix` option is available to versions on Linux platforms, it is not available to
+the version on macOS, thus breaking the detection of libcurl.
+
+To work around this, we just have to tell it that, yes, we have libcurl (it's part of
+Xcode). Edit the `Makefile` and comment out the five lines after `# Create a simple
+test-program...` and add a new line

 HAVE_CURL=YES

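For context on the failure: GNU `mktemp` accepts `--suffix`, while the BSD-derived `mktemp` on macOS does not. A portable way to get a temporary file name ending in `.c` is to create a plain temporary file and then rename it. This is only a sketch of the idea, not a patch to libBigWig's `Makefile`; the `HAVE_CURL=YES` work-around above remains the simpler fix. The function name `mktemp_c` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: create an empty temporary file whose name ends in ".c",
# without relying on the GNU-only `mktemp --suffix` option.
mktemp_c() {
    base=$(mktemp "${TMPDIR:-/tmp}/curltest.XXXXXX") || return 1
    # Renaming keeps the unique random part; refuse if the .c name already exists.
    if [ -e "$base.c" ]; then
        rm -f "$base"
        return 1
    fi
    mv "$base" "$base.c" && echo "$base.c"
}

# Example: src=$(mktemp_c) && echo "created $src"
```

The existence check and the `mv` are not atomic together, which is acceptable for a throwaway test program but not for security-sensitive temporary files.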
@@ -115,7 +142,46 @@ its testing, albeit through a fake remote test with Test::Fake::HTTPD (if it's
 installed). However, empirical testing with real remote data (via https) seems to work ok.


+## Compiling UCSC library
+
+Compiling the UCSC library and/or utilities on macOS requires additional libraries,
+which are best installed using [Homebrew](https://brew.sh).
+
+brew install libpng openssl
+
+If you're intending to compile the command line utilities, you will also need
+MySQL. For purposes here, MariaDB seems sufficient. You do not need to set up
+or configure a database.
+
+brew install mariadb
+
+Generate the `localEnvironment.mk` file:
+
+cd path/to/userApps
+make installEnvironment
+
+This will generate `kent/src/inc/localEnvironment.mk` for your local machine. If
+you are just building the library, edit this file to add at the end:
+
+CFLAGS = -fPIC
+L+=/opt/homebrew/lib/libssl.a
+L+=/opt/homebrew/lib/libcrypto.a
+PNGLIB=/opt/homebrew/lib/libpng.a
+PNGINCL=-I/opt/homebrew/include

+However, if you are intending to build the command line utilities, your best bet is
+to simply hack the `kent/src/inc/common.mk` file directly. This file performs a bunch of
+shenanigans to locate various libraries and set different compile options separately,
+_especially_ for MySQL, so there is no single variable you can simply define. The
+simplest approach is to search for the `/opt/local` path in the following variables and
+change it to `/opt/homebrew`. An example of what you need to change is below:

+L+=/opt/homebrew/lib/libssl.a
+L+=/opt/homebrew/lib/libcrypto.a
+PNGLIB=/opt/homebrew/lib/libpng.a
+PNGINCL=-I/opt/homebrew/include
+MYSQLINC=/opt/homebrew/include/mysql
+MYSQLLIBS=/opt/homebrew/lib/libmysqlclient.a

+The libraries and utilities should then compile ok.

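The `/opt/local` to `/opt/homebrew` substitution in `kent/src/inc/common.mk` can be done with `sed`. This sketch rewrites every occurrence in the file, not only the variables listed above, so review the result. The `-i.bak` form keeps a backup copy and is accepted by both the BSD `sed` on macOS and GNU `sed`. The function name `use_homebrew_paths` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace /opt/local paths (the MacPorts default prefix)
# with the Apple Silicon Homebrew prefix in a make include file, keeping a .bak copy.
use_homebrew_paths() {
    file=${1:-kent/src/inc/common.mk}
    sed -i.bak 's|/opt/local|/opt/homebrew|g' "$file" || return 1
    # Show what changed so it can be checked by eye (diff exits 1 when files differ).
    diff "$file.bak" "$file" || true
}

# Example, from the userApps directory:
# use_homebrew_paths kent/src/inc/common.mk
```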