compatibility for python2 and python3, including C API, and revisions as to how package built #29

evanbiederstedt · 2018-09-30T13:13:56Z

No description provided.

evanbiederstedt · 2018-10-01T20:30:35Z

Comments:

RE: C API, FORTRAN, and wext/setup.py

I wrote this to be both 2.7 and 3.x compatible. The C API changed a good deal in Python3.x, but it's quite slick. Hopefully these changes make sense.

I didn't touch the Fortran code, but I did revise wext/setup.py to compile both the C and Fortran code into separate modules. It appears to work on Travis, as well as the Linux server.

I also recommend that users simply install the package, so that the modules are individually importable.

RE: README

The Travis-CI config will have to be activated on your end. Otherwise, the badge will fail.

I wish the README contained a bit more motivation and detail about WExT, and why it should be used.

RE: py23 compatibility changes

--- Any '.iteritems()' became '.items()'.

--- We were previously using implicit relative imports. Python 3 does not support these; they need to be explicit relative imports, e.g.

https://github.com/evanbiederstedt/wext/blob/master/wext/exclusivity_tests.py#L5-L6

from .exact import exact_test

--- This needs to explicitly be a tuple() for Python3.x

https://github.com/evanbiederstedt/wext/blob/master/compute_mutation_probabilities.py#L49

--- The most interesting change is adding 'xrange()' via the future package (from past.builtins import xrange), which is now a required dependency.

https://github.com/evanbiederstedt/wext/blob/master/compute_mutation_probabilities.py#L12
https://github.com/evanbiederstedt/wext/blob/master/wext/mcmc.py#L7

Normally, I would simply convert this to range(). However, this would be a major issue for Python2.7 users in this case, as the 2.7 version would take a major hit in terms of performance:

https://github.com/raphael-group/wext/blob/master/compute_mutation_probabilities.py#L109

e.g. range(1, 2*10**9) will take minutes in python 2.7

It appears the way around this is to use xrange() for both versions 2 and 3, which is what I've done.
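The trade-off described above can be sketched as follows. This is only an illustration, not the code in the pull request: it falls back to the built-in names instead of importing `xrange` from `past.builtins`, so it runs without the future package, and `num_permutations` is a hypothetical stand-in for `args.num_permutations`.

```python
import random

try:
    lazy_range = xrange  # Python 2: lazy sequence, no list is built
except NameError:
    lazy_range = range   # Python 3: range is already lazy

num_permutations = 5  # hypothetical stand-in for args.num_permutations

# Sampling from a lazy range of about 2 billion values needs very little
# memory, whereas Python 2's range() would first build the whole list.
seeds = random.sample(lazy_range(1, 2 * 10**9), num_permutations)
```

Either way, `random.sample` only needs a sequence with a length and index access, which both `xrange` (Python 2) and `range` (Python 3) provide.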

zmiimz · 2018-12-03T11:07:16Z

thanks!
m.b. add future into requirements?

simple example works without error but pancan stops with

examples/generate_data.py", line 31, in generate_pancan_data
samples1 = [ 'd1-%s' % (i+1) for i in range(N_1)]
TypeError: 'float' object cannot be interpreted as an integer
Makefile:79: recipe for target '... examples/pancan/data/dataset%-aberrations.tsv' failed
'float' object cannot be interpreted as an integer

seems that all N/ divisions should be replaced with the int division operator // in examples/generate_data.py
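A small sketch of why the division operator matters here. The value of `N` is made up, and the variable names are illustrative rather than taken from examples/generate_data.py.

```python
N = 10  # hypothetical sample count

n_half_true = N / 2    # 5.0 under Python 3 (true division); 5 under Python 2
n_half_floor = N // 2  # 5 under both Python 2 and 3 (floor division)

try:
    samples = ['d1-%s' % (i + 1) for i in range(n_half_true)]
    float_error = None
except TypeError as err:
    # Python 3 only: 'float' object cannot be interpreted as an integer
    float_error = str(err)

# The floor-divided value is always an int, so range() accepts it.
samples = ['d1-%s' % (i + 1) for i in range(n_half_floor)]
```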

also in process_mutations.py
in process_maf function for arr

AttributeError: 'map' object has no attribute 'index'

possible solution?..

arr = list(arr)  # convert once: a map object can only be consumed a single time in Python 3
patient_index = arr.index('tumor_sample_barcode')
var_class_index = arr.index('variant_classification') if 'variant_classification' in arr else None
var_type_index = arr.index('variant_type') if 'variant_type' in arr else None
val_status_index = arr.index('validation_status') if 'validation_status' in arr else None
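One detail worth checking when wrapping a map object in list(...): in Python 3 the iterator can be consumed only once, so converting it a single time is the safer pattern. A minimal sketch, assuming `arr` is built by lower-casing header fields (the field names below are made up, and how process_maf actually builds `arr` is not shown in this thread):

```python
fields = ['Tumor_Sample_Barcode', 'Variant_Classification']  # made-up header fields

arr = map(str.lower, fields)
first = list(arr)   # consumes the iterator in Python 3
second = list(arr)  # [] in Python 3: nothing is left to read

# Converting once keeps the values available for every later lookup.
arr = list(map(str.lower, fields))
patient_index = arr.index('tumor_sample_barcode')
var_class_index = arr.index('variant_classification') if 'variant_classification' in arr else None
var_type_index = arr.index('variant_type') if 'variant_type' in arr else None
```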

-running experiments/eccb2016 in find_exclusive_sets.py

File "find_exclusive_sets.py", line 173, in run
print('* Using {} permuted matrix files'.format(len(permuted_files)))
TypeError: object of type 'zip' has no len()

fixed in def get_permuted_files(permuted_matrix_directories, num_permutations):

return list(zip(*permuted_directory_files))

etc.
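The `zip` behaviour behind this error can be reproduced in isolation. The file names are invented for illustration; only the `list(zip(*...))` pattern comes from the fix quoted above.

```python
permuted_directory_files = [
    ['dirA/perm-1.json', 'dirA/perm-2.json'],
    ['dirB/perm-1.json', 'dirB/perm-2.json'],
]  # made-up file names

lazy = zip(*permuted_directory_files)
try:
    len(lazy)
    zip_has_len = True   # Python 2: zip() returns a list
except TypeError:
    zip_has_len = False  # Python 3: zip() returns an iterator without len()

# Materializing the result works the same way on both versions.
permuted_files = list(zip(*permuted_directory_files))
```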

evanbiederstedt · 2018-12-03T22:09:55Z

Hi @zmiimz

Thanks for the comments!

> m.b. add future into requirements?

It is true that we should update the README if this pull request is accepted. Currently, it looks like I have future >= 0.16.0 in the requirements file. It might be easier to clone this repo: https://github.com/evanbiederstedt/wext

> simple example works without error but pancan stops with

Could you show me which commands you used to get this error? I just tried with Python 2.7.15 and Python 3.7.1, and I wasn't able to re-create this. I just followed the steps in the README, i.e.

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt

cd wext
python setup.py install

For all of the issues you discuss, could you please provide the version of python you used, and the command you used? That would be helpful moving forward---it's a bit difficult to follow what you've done so far.

Thanks, Evan

zmiimz · 2018-12-03T22:54:18Z

> Could you show me which commands you used to get this error?

started examples and eccb2016 experiment calculation

evanbiederstedt · 2018-12-04T00:08:17Z

Hi @zmiimz

Please provide as many details as possible so I can proceed to investigate the errors you report, i.e. the python version, the version of gcc, how you installed the package, and the commands you are running.

I'm running this on Mac OS 10.13.6

Versions of python and gcc

$ python --version
Python 3.7.1
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 10.0.0 (clang-1000.10.44.4)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
$

I've attached a text file of the commands I use for installation and running examples/pancan. I haven't run into your errors, so I will need more information about the versions of tools you are using, as well as the commands you have run.

wext_example_pancan.txt

Thanks, Evan

zmiimz · 2018-12-07T17:03:08Z

Dear Evan,
I repeated the installation and test run of WExT on a freshly installed Ubuntu 18.04 system and everything works as expected (I am still waiting for the eccb2016 run to finish, but the other tests, simple and pancan, finished, and that is enough for me for now). The errors I mentioned were observed on another (older) Linux OS with Python 2.7.13, but I still have to verify this assumption (in a few days).

evanbiederstedt · 2018-12-07T17:36:39Z

Hi @zmiimz

Thanks for the response. Please keep me updated.

I'm going to try out the eccb2016 run on my end as well. I'll post results.

Thanks, Evan

matthewreyna

Thank you for all of these changes. I tested the simple example, and the code successfully runs and returns equivalent results on Python 2 and 3. I made several comments but all for minor issues.

The package future is needed for both Python 2 and 3.
Most Python 2/3 functions work fine with iterables whether they are lists (like range(n) in Python 2) or lazy objects (like range(n) in Python 3) -- len is a notable exception for iterators such as zip and map objects. I flagged some of these because they make the code a little harder to read and slow it down at times, but you can either leave them as-is or change them as needed.
print(a, b, c) prints a tuple (a, b, c) in Python 2. Maybe replace with print(a + b + c) so that Python 2 and 3 return the same output.

README.md

matthewreyna · 2019-01-08T00:31:27Z

compute_mutation_probabilities.py

 this_dir = os.path.dirname(os.path.realpath(__file__))
 sys.path.append(this_dir)
 from wext import *
+from past.builtins import xrange


I don't think we need xrange.

I think my sole motivation here was consistency :)

Above I detail the memory/performance issue associated with range() vs xrange() between Python2.x and 3.x. So, I may have simply started using this everywhere to avoid any potential issues. (A downside to not taking the time to unit test this is that I didn't think deeply about each use of range(), unless it became obvious it was a problem.)

That's fine, and the use of xrange in

seeds = random.sample(xrange(1, 2*10**9), args.num_permutations)

is the only really important one.

Yes, I agree with this.

We can keep xrange here and other places -- not a big issue.

matthewreyna · 2019-01-08T00:34:46Z

experiments/eccb2016/scripts/helper.py

 #!/usr/bin/env python
+
 import numpy as np
+from past.builtins import xrange


Not needed.

You may be entirely correct, and I'm happy to accept this.

See previous comments on range() vs. xrange() and consistency.

Not a big issue. We can leave as-is.

matthewreyna · 2019-01-08T00:37:14Z

experiments/eccb2016/scripts/pairs_summary.py

-print '-' * 14, 'Correlation: WRE (Saddlepoint) and WRE (Recursive)', '-' * 14
-print 'All: \\rho={:.5}, P={:.5}'.format(*all_correlation)
-print '\Phi_WR < 10^-4: \\rho={:.5}, P={:.5}'.format(*tail_correlation)
+print('-' * 14, 'Correlation: WRE (Saddlepoint) and WRE (Recursive)', '-' * 14)


In Python 2, print(a, b, c) prints (a, b, c). Replace commas with plus signs?

> Replace commas with plus signs?

Good catch. Yes, let's use plus signs.
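A sketch of the plus-sign approach agreed on above, using the banner string from the diff. One thing to watch: commas in Python 3's print insert single spaces between arguments, so the spaces have to be added explicitly when concatenating, and any non-string operand would need str().

```python
title = 'Correlation: WRE (Saddlepoint) and WRE (Recursive)'

# With commas, Python 3 joins the arguments with single spaces, while
# Python 2 (without print_function) prints the repr of a 3-tuple.
# Building one string first gives identical output on both versions.
banner = '-' * 14 + ' ' + title + ' ' + '-' * 14
print(banner)
```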

matthewreyna · 2019-01-08T01:24:03Z

experiments/eccb2016/scripts/results_table.py

-    geneToLengthRank.update(zip(geneToLength.keys(), length_ranks))
-    threshold_gene = sorted(geneToLength.keys(), key=lambda g: geneToLengthRank[g])[args.length_threshold]
-    print 'Length of {} longest gene: {}'.format(args.length_threshold, geneToLength[threshold_gene])
+    geneToLengthRank.update(list(zip(list(geneToLength.keys()), length_ranks)))


Both lists unneeded here.

I missed this; you are correct, these are not necessary.
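For reference, a minimal sketch of the same update without the list() wrappers. The gene lengths are made up, and `length_ranks` is computed here only so the example is self-contained; in the script it comes from elsewhere.

```python
geneToLength = {'TP53': 1182, 'TTN': 107976, 'KRAS': 570}  # made-up values

# Hypothetical ranks (1 = longest), in the same order as geneToLength.keys()
lengths = sorted(geneToLength.values(), reverse=True)
length_ranks = [lengths.index(geneToLength[g]) + 1 for g in geneToLength.keys()]

geneToLengthRank = {}
# dict.update() accepts any iterable of (key, value) pairs, so the zip
# object and the keys view can be passed directly on Python 2 and 3.
geneToLengthRank.update(zip(geneToLength.keys(), length_ranks))
```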

matthewreyna · 2019-01-08T01:30:07Z

wext/enumerate_sets.py

-        setToPval.update(pval.items())
-        setToTime.update(time.items())
-        setToObs.update(obs.items())
+        setToPval.update(list(pval.items()))


Nice use of dict.update(iterator), but no list is needed here.

wext/exact.py

wext/mcmc.py

matthewreyna · 2019-01-08T01:37:02Z

wext/i_o.py

+            for M, pval in setToPval.items():
                if setToFDR[M]<=fdr_threshold:
                    X, T, Z, tbl = setToObs[M]
                    row = [ ', '.join(sorted(M)), pval, setToFDR[M], setToRuntime[M], T, Z ] + tbl


Error on line 66 if rows = []. Between lines 65 and 66, maybe add if rows: and indent lines 66-73 so no output if no results.

Ah, I see what you mean. Yes, I'll correct this.
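The guard suggested above might look roughly like this. The statement on line 66 of wext/i_o.py is not shown in this thread, so the function below is a hypothetical stand-in that simply assumes a step which fails on an empty list; `format_rows` and its row layout are invented for illustration.

```python
def format_rows(rows):
    """Hypothetical stand-in for the output step: return tab-separated lines."""
    lines = []
    if rows:  # produce no output when no set passes the FDR threshold
        num_columns = len(rows[0])  # would raise IndexError if rows == []
        for row in sorted(rows, key=lambda r: r[1]):  # sort by P-value
            lines.append('\t'.join(str(field) for field in row[:num_columns]))
    return lines
```

Called with an empty list, the function returns an empty result instead of raising; with rows present, it emits one line per row in order of increasing P-value.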

matthewreyna · 2019-01-08T01:56:34Z

README.md

-Latest tested version in parentheses.
+[![Build Status](https://api.travis-ci.org/raphael-group/wext.svg?branch=master)](https://travis-ci.org/raphael-group/wext?branch=master)

-1. Python (2.7.9)


future is also currently needed.

See comments above on the README; I think I was thinking the requirements.txt addressed all Python dependencies.

Something like pip install -r requirements.txt should address concerns of all users, I think. (Could be wrong, happy to change.)

Yes, but not everyone will think to check requirements.txt or run pip install -r requirements.txt. Others may think, justifiably, that running python setup.py install successfully means that everything is ready to go. For a real-life example, it crashed for me because I tried it in a new virtual machine with the usual dependencies but hadn't needed future yet.

If we add it to the README, then we'll save a few emails, GitHub issues, and StackOverflow searches, which is worth it for everyone.

> If we add it to the README, then we'll save a few emails, GitHub issues, and StackOverflow searches, which is worth it for everyone.

Absolutely, and I agree with this as a philosophy whole-heartedly. If there are any GitHub issues, it's 99% the fault of the developer, even if the problem is one of presentation.

> Yes, but not everyone will think to check requirements.txt or run pip install -r requirements.txt.

So, the motivation above wasn't to remove this information. Rather, I'm trying to cater to the lazy user (myself included) which reduces GitHub issues, e-mails, SO questions, etc.

When I look at new tools in bioinformatics, I want to see a succinct description of the algorithm/code within seconds in the README. This is needed for WExT, at the top of the README. In the current README, requirements are given first. The requirements should be made more comprehensive (e.g. list the Python libraries detailed in requirements.txt), and I think they shouldn't be the first thing seen in a README. We need to aim for both succinct and comprehensive :)

The rationale behind this is presentation for new users (as an invitation to use the code), as well as preventing unnecessary Github issues/e-mails.

Good idea. Let's move requirements lower in the README if they are too long.

matthewreyna · 2019-01-17T00:58:52Z

LGTM -- thanks for all of the changes! Let's fix the print functions so that the output looks right, run the simple example again to test for new errors, and merge.

evanbiederstedt and others added 30 commits August 11, 2018 21:37

Update README.md

282b3ba

revised requirements.txt, README

a4106f6

added travis config

8effca7

updated to python3.x in "examples", "experiments", "viz"

4ff1858

updated certain scripts to python3.x

f4c339c

updated certain scripts to python3.x

98086fc

ported scripts from "wext" to python3.x

e3a0a71

update travis config, compile 'wext' C/Fortran code

f613b30

revise travis config

e45829c

revise using python3.x syntax for explicit relative imports

4612a70

fixed relative import syntax for all scripts in "wext"

844ec70

attempt to fix issue with wext_exact_test

300f1d8

2nd attempt to fix issue with wext_exact_test

4a87caa

3rd attempt to fix issue with wext_exact_test

36a22bc

removed python3 shebangs

7c6427e

corrected structures for PyMethodDef

02e5586

check compiles under Python3.x

934904a

revise poibinmodule header file, defin py_pmf as static

602e301

try renaming to comet_exact_tests

49750b7

revised setup.py, module should be cpoibin

6b2cdda

corrected typo with "from wext_exact_test import triple_exact_test"

053e680

cannot find module, but it does install...

29de50b

revised __init__.py to correclty import modules

61fc045

run nosetests in different subdirectory

da5a5a9

try nosetests in upper subdirectory

2beb71f

revised __init__.py

8f646d4

changed __init__.py again, try explicit imports

c04fdc8

now change exact.py, from ..c import wext_exact_test

8254756

revised relative imports

7fa5cbb

try "from .wext_exact_test import * "

c43aa88

evanbiederstedt and others added 21 commits September 10, 2018 23:56

allow 2.7 builds with python

d9a03b7

revise how module named

48873ff

removed comments

3d74ad2

revised string handling, outside scripts

79f697b

revised string handling, outside scripts

e799a4f

first commit

ec674fc

revised external scripts

bb0d151

fixed performance issue, added future dependency

4158cbd

fixed performance issue, added future dependency

a60809a

revised experiments/eccb2016/scripts

e3a4355

revised experiments/eccb2016/scripts, permutation_helper

a508dac

revised travis config

82ff47c

revised README

3ddc71f

Merge pull request #1 from evanbiederstedt/speed_issue

f5ef9a2

Speed issue

revised source

31a8a28

remove debugging files

3abf0f6

source code revisions for py23 compatibility

990fdb2

Merge branch 'speed_issue' of https://github.com/evanbiederstedt/wext …

aad1b60

…into speed_issue merged

revised source for py23 compatibility

866d32e

revise source, use generator instead of converting to list()

3ee93a6

Merge pull request #2 from evanbiederstedt/speed_issue

411303d

Speed issue

matthewreyna approved these changes Jan 8, 2019
