-
Notifications
You must be signed in to change notification settings - Fork 63
JOSHUA-252 Make it possible to use Maven to build Joshua #12
Conversation
@lewismc Please merge this ASAP. |
My ant build is failing.. [ivy:resolve] http://ivyroundup.googlecode.com/svn/trunk/repo/modules/org.apache.commons/commons-cli/1.2/packager.xml [ivy:resolve] :::::::::::::::::::::::::::::::::::::::::::::: [ivy:resolve] :: UNRESOLVED DEPENDENCIES :: [ivy:resolve] :::::::::::::::::::::::::::::::::::::::::::::: [ivy:resolve] :: net.sourceforge.ant-doxygen#ant-doxygen;1.6.1: not found [ivy:resolve] :: org.apache.commons#commons-cli;1.2: not found [ivy:resolve] :::::::::::::::::::::::::::::::::::::::::::::: [ivy:resolve] [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS Merging this PR helps me to move forward for resolving other issues. |
The PR is not finished yet. There are a number of issues as highlighted above which need attention. |
Hey Lewis. I'm flying today but can take a look at getting the tests to On Sun, May 15, 2016 at 7:19 PM, Lewis John McGibbney <
|
ACK On Sun, May 15, 2016 at 7:25 PM, Kellen Sunderland <[email protected]
Lewis |
this is indeed a big change and I think it's better to split into multiple small PRs. Maybe we can have a 'maven' branch in parallel for a short transition time, |
I second @thammegowda's idea of merging this into a "maven" branch first which will also continue to pull the latest from master. This is a big change and I'm not going to have time to look at it too closely for a while yet (probably later next week). |
I agree as well
|
@mjpost do you know where I can find the class ArpaFile.java ? It is referenced within src/test/java/org/apache/joshua/decoder/ff/lm/ArpaFileTest.java| but not available within master source. |
I don't. However, the "Arpa" denotes the standard SRILM format for language model files (read by both BerkeleyLM and KenLM), so the file would have been very similar to this one in BerkeleyLM. As a warning, I have not touched many of those unit test files in years, and many of them may be way, way out of date. |
ACK @mjpost I knew this. I am tempted to merely skip the ones for which classes no longer exist. I'm going to try and stabilize the build tonight. |
No way, I actually found the file online |
Wow, nice. We might want to put some of that back in there for completeness / posterity. |
+1 |
Hi Folks, this is now a feature branch and can be found at https://github.com/apache/incubator-joshua/tree/JOSHUA-252 |
The next step here is for me to stabilize the test suite. This will involve putting the test resources in to src/test/resources and then removing the top level test directory. I'll work on this over the next few days. |
I didn't see your JOSHUA-252 branch. I just created a "maven" branch that pulls PRs 4 and 5 into this one. However, after pushing that up, I don't see that on github / incubator either. It seems that Apache is not pushing branches down to github? Do you know how to fix this? Everything seems to compile for me when I type the commands @thammegowda provided. However, I don't really know how to use Maven, and eclipse is totally broken. Can you add a BUILD file that describes how to compile and how to get things set up in eclipse? |
Yep I'll add all of this to the wiki space. In the meantime you can do mvn eclipse:eclipse Easy as that On Monday, May 16, 2016, Matt Post [email protected] wrote:
Lewis |
@lewismc I tried to create a training model from maven build. Found that
|
You can build it locally by simply using the following scope
I think you can remove the version element for the time being and try that out. We may need to create a Maven build for berkeley aligner and push it to Maven Central |
Okay, I just pushed up the changes after merging in master and making some other changes to that the tests will run (with manual calls). The code on master runs fine, so it must have been something in the 252 or 262 branches. I've done some initial work wrapping LOG.debug() calls in checks for LOG.isDebugEnabled(), but that didn't seem to help. All my tests were done with
I may have some more time to look at this tomorrow. |
What kind of 'tests' are we talking about here? The only tests which I know of are invoked by running One last thing. No I changed absolutely no code. I did however introduce some classes such as the ArpaFile, etc which were required to get the current unit tests running. |
On a hunch, I went through and removed all the calls LOG.*() in Decoder, DecoderThread, JoshuaDecoder, Translation, Chart, DotChart, and ComputeNodeResult. This fixed the problem; the decoder now runs as fast as before. |
I went into joshua/decoder/chart_parser/ComputeNodeResult; adding and removing the following lines makes a huge difference in speed. So it looks like logging is neither (a) time-safe nor (b) thread-safe. Or that something is wrong. I can't get the decoder to output any statements using the logging properties. It's sort of a pain to have to rebuild a tarball every time you want to test something... FYI, I won't have any more time to spend on this before Tuesday.
|
Okay, can't stop scratching this itch. On even further investigation, with some System.err.println()s, I discovered that the properties file isn't getting loaded correctly. Consequently, isDebugEnabled() is always returning true, but the logging statements are not printed anywhere visible. So the work is getting triggered, and is just not seen, and that is the problem. I've pushed up the merge. Can you folks take care of making sure that the logger file is running correctly? You can test this by running
Depending on your hardware, of course, that should run in just a few seconds tops. Incidentally, there are at least two other problems I am hoping you can help address?
Or I would be okay to switch to those values directly (e.g., "-v OFF"). But the former seems more Unix-like. Thanks again for all your work on this! matt |
@thammegowda please revert logging patch. |
Just to be clear, not sure it needs to be reverted. We just need to find a way to make sure the config file gets read and that logging gets set to INFO by default. And logging should all go to STDERR, I think, not STDOUT. In general, too, having INFO prepended to all informative lines is a bit ugly. Is this a useful convention for reasons I can't understand? It's often useful to see the decoder working with just these high-level labels, and I don't mind debug lines being prepended with DEBUG. |
Make an entry in logging file? |
Yes, I hope I was clear that this is all great work. We just have to figure out how to get log4j to read the config file, which it's currently not doing. Then, we can make sure the test cases pass, and merge back into master? |
Found the issue with logger config file. Seems like |
@mjpost I am unable to run Caused by: java.lang.RuntimeException: java.lang.UnsatisfiedLinkError: no ken in java.library.path at org.apache.joshua.decoder.ff.lm.KenLM.(KenLM.java:52) ... 10 more Caused by: java.lang.UnsatisfiedLinkError: no ken in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867) at java.lang.Runtime.loadLibrary0(Runtime.java:870) at java.lang.System.loadLibrary(System.java:1122) at org.apache.joshua.decoder.ff.lm.KenLM.(KenLM.java:43) ... 10 more Exception in thread "main" java.lang.RuntimeException: * FATAL: could not find a feature 'LanguageModel' How do I pass this ? |
OK @mjpost thanks |
Its more than overdue as well for me to say @thammegowda thank you. The Logging effort is absolutely paradise. I looked at a lot of code during reformat and your logging patches are very welcome. Thank you Sir. |
Okay, great! That worked. I assume the edits to $JOSHUA/bin/joshua mean that eclipse compiled files will override the jar? So I can do fast development in Eclipse? We still have to solve the KenLM problem. My tests ran because I linked $JOSHUA/ext and $JOSHUA/lib to directories on master. Do we have ideas for how to pull in KenLM and compile it? And BerkeleyLM? I think that's the big thing remaining. After that, I'll have to go over the log messages in a bit more detail. |
Yes all you need to execute in your terminal is
Yes we do :(
https://bitbucket.org/atlassian/bash-maven-plugin Honestly thats the most I can find right now without hacking a Maven Github download build plugin. I am fed up of XML @mjpost :0 :) :) :) |
Hi @lewismc Thanks so much.
Oh, I missed the licence header. Got it. I do not yet have write permissions to directly fix it, so guess I have to raise another PR to fix it.
Sorry, I am not clear what do you mean?
Thanks :-) |
Seems like you have hard-time with eclipse maven integration. P.S. I use Intellij Idea opensource edition and the maven integration is a breeze. |
yes @thammegowda the process is welcome and appreciated Sir. on a side note hove you tried th Docker containersTZ? |
@lewismc Thank you very much, Sir. I am much interested in your PR to Tika to integrate this translator 👍 |
ack |
Another issue: BerkeleyLM is currently pulled from the Maven repository, but it is very old (1.1.2, but 1.1.6 is available). I'm pretty sure its author has abandoned it; how can we get it updated? Joshua has a few bug fixes on it, even. One option: we could just roll the codebase into Joshua (it's Apache 2.0). |
So here is what is outstanding:
That way we don't have to mess with any shell scripts from Maven, which is not advised (because of portability concerns). matt |
Unless there are any objections, I'd like to merge this branch back into master. There are a few small issues, but none that can't be addressed with documentation. I'd love to do this either tonight or tomorrow, so please voice any objections as soon as possible. |
+1 great work! watch out there are conflicts to resolve before merging @mjpost |
Hi @mjpost can you point me at the latest BerkeleyLM codebase? I can Mavenize it and get it in to Maven Central so we can use it as a dependency in pom.xml |
@lewismc See https://github.com/joshua-decoder/berkeleylm/, which has the 1.1.6 release, plus a few changes that I committed. The only important change is an ability to recognize compressed files that don't happen to end with .gz, but if it's easier to just take the official 1.1.6, that's fine with me; I don't think the changes are crucial. |
Actually @chrismattmann it's just a fast-forward at this point — all is merged on JOSHUA-252, it's just not showing up on this PR. |
Hi Folks,
This PR is a beast. I think a Google Hangout would be best to talk through what is going on.
Basically it,
...
As I expected there are a bunch of files which include Objects which ni longer exist. If you pull this PR in to a branch and execute mvn clean install you will see that this is the case with around 6 or 7 test files. We need to collectively decide what to do with the missing imports.
Apart from that, work still t be done here is as follows
The first thing i think, is to get the test classes compiling. Once we have done that we can move on to getting the tests passing and stabilizing the Java build code.
After that we can move on to the C++ build challenges.
All in all, a VERY successful week for Joshua. Looking forward to stabilizing the Maven build as it will make things so much easier or us all moving forward.