This repository was archived by the owner on Sep 20, 2022. It is now read-only.

GeCo I2b2 data source: implementation of steps 1 and 2 #6

Open · wants to merge 82 commits into base: dev
Conversation

@mickmis (Contributor) commented Dec 23, 2021

Here is a working implementation of the i2b2-medco data source plugin for GeCo.

Summary:

  • implementation of an i2b2 docker image with test data
    • this new image does not include the i2b2 demo data (only the structure), so build/test/deployment is much faster; the CI is also configured to cache the layers of the docker builds
  • integration with the geco deployment, to easily reuse the same database
  • use of the data source plugin interface definition from GeCo's SDK package
  • implementation of an i2b2 XML API client enabling ontology browsing and explore queries (see the sketch after this list)
  • implementation of the GeCo data source interface enabling ontology browsing, explore queries, and cohort management
  • implementation of the database structure and operations for the data source's own database, which contains the explore query history and the saved cohorts
    • it loads its own structure at init when it finds the database to be empty
  • tests / CI / Makefile / deployment / etc.
    • notably, in internal: tests of the plugin through GeCo's data manager
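
As a rough illustration of the XML API client mentioned above, here is a minimal sketch; the message types, field names, and endpoint path are hypothetical stand-ins, not the plugin's actual ones:

```go
package i2b2

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"net/http"
)

// ontologyRequest and ontologyResponse are hypothetical stand-ins for the
// i2b2 XML messages; the real plugin defines its own types.
type ontologyRequest struct {
	XMLName xml.Name `xml:"message_body"`
	Parent  string   `xml:"get_children>parent"`
}

type ontologyResponse struct {
	XMLName  xml.Name `xml:"message_body"`
	Concepts []struct {
		Key  string `xml:"key"`
		Name string `xml:"name"`
	} `xml:"concepts>concept"`
}

// getOntologyChildren posts an XML request to the i2b2 ontology cell and
// decodes the XML response.
func getOntologyChildren(apiURL, parent string) (*ontologyResponse, error) {
	body, err := xml.Marshal(ontologyRequest{Parent: parent})
	if err != nil {
		return nil, err
	}

	resp, err := http.Post(apiURL+"/OntologyService/getChildren", "application/xml", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("i2b2 returned status %d", resp.StatusCode)
	}

	var parsed ontologyResponse
	if err := xml.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return nil, err
	}
	return &parsed, nil
}
```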

@mickmis changed the title from Step2 to GeCo I2b2 data source: implementation of steps 1 and 2 on Dec 23, 2021
@mickmis marked this pull request as ready for review on December 23, 2021 16:37
@f-marino left a comment


The core of what we agreed on has been implemented.
@mickmis I left just a few comments to address in the code. In addition to those, I think adding a description (even a small one) to the packages would be useful.


Next steps to have a fully working plugin:

  1. Implement search box and survival curves operations
  2. Factor out all GeCo dependencies
  3. Define how plugins are loaded in GeCo (a sketch using Go's standard plugin package follows below)
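
For step 3, one candidate is Go's standard plugin package, which matches the -buildmode=plugin Makefile target discussed further down. A minimal sketch of the loading side (the NewDataSource symbol name and its signature are a hypothetical convention; defining that convention is precisely what this step is about):

```go
package main

import (
	"fmt"
	"log"
	"plugin"
)

func main() {
	// Open the shared object produced by `go build -buildmode=plugin`.
	p, err := plugin.Open("./build/datasource.so")
	if err != nil {
		log.Fatalf("loading plugin: %v", err)
	}

	// Look up an exported symbol; "NewDataSource" and its signature are a
	// hypothetical convention that step 3 would have to define.
	sym, err := p.Lookup("NewDataSource")
	if err != nil {
		log.Fatalf("symbol lookup: %v", err)
	}

	factory, ok := sym.(func() (interface{}, error))
	if !ok {
		log.Fatalf("unexpected symbol type %T", sym)
	}

	ds, err := factory()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("loaded data source: %v\n", ds)
}
```

Note that Go's plugin package requires the host binary and the plugin to be built with the same toolchain, GOOS/GOARCH, and dependency versions, which is also why the cross-compilation question comes up in the Makefile review below.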

@@ -0,0 +1,3 @@
[submodule "third_party/geco"]
path = third_party/geco
url = [email protected]:ldsec/geco.git


To be deleted once the geco parts used by this plugin (i.e., as far as I understand, the dev deployment) are extracted to a public repo.

@@ -0,0 +1,35 @@
-- pl/pgsql function that returns (maximum lim) ontology elements whose paths contain the given search_string.

CREATE OR REPLACE FUNCTION i2b2metadata.get_ontology_elements(search_string varchar, lim integer DEFAULT 10)


this function is not up to date

@mickmis (Contributor, Author)


Indeed, none of the functions in 50-stored-procedures are currently used or up to date. I included them in anticipation of the next implementations, but they will likely need to be modified.

I can remove them for clarity if you prefer, or leave them here, maybe with a README saying what I just wrote.


Fine, don't touch them, I'll take care of it.


// ddlLoaded checks whether the data source's schema (and therefore its DDL) has already been created.
const ddlLoaded = `SELECT EXISTS (SELECT 1 FROM pg_namespace WHERE nspname = $1);`

// createDeleteSchemaFunctions defines helper functions to create and delete the data source's schema.
const createDeleteSchemaFunctions = `


I guess you want to create a schema every time you create a datasource, to support the case where multiple datasources' info is stored on the same DB. If this is the case, why aren't you creating the explore query and saved cohorts tables in the corresponding schema? And if it is not the case, why are you doing it?

@mickmis (Contributor, Author)


The intention was more to move the database structure creation logic into the code, rather than rely on the deployment to do so. I find this simplifies the deployment since:

  1. in the test deployment we have less control over the database, since its deployment is done through geco
  2. in the production deployment the plugin is loaded by geco and not as an independent runtime, and I wanted to avoid having to run devops tooling from geco

And yes, another objective was to support multiple data sources of the same type.

> why aren't you creating the explore query and saved cohorts tables in the corresponding schema

They are created, through a separate statement, ddlStatement.
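
A minimal sketch of that init logic, assuming the plugin uses database/sql with the lib/pq driver, and that the ddlLoaded and ddlStatement constants from the code under review are in scope (the initDatabase wrapper itself is hypothetical):

```go
package database

import (
	"database/sql"
	"fmt"

	"github.com/lib/pq"
)

// initDatabase loads the data source's database structure at init if its
// schema does not exist yet.
func initDatabase(db *sql.DB, schemaName string) error {
	var exists bool
	if err := db.QueryRow(ddlLoaded, schemaName).Scan(&exists); err != nil {
		return fmt.Errorf("checking schema existence: %w", err)
	}
	if exists {
		return nil
	}

	// Create the schema first; the tables in ddlStatement are created
	// unqualified and land in the schema configured through the
	// connection string (see the search_path discussion below).
	if _, err := db.Exec("CREATE SCHEMA IF NOT EXISTS " + pq.QuoteIdentifier(schemaName)); err != nil {
		return fmt.Errorf("creating schema: %w", err)
	}
	if _, err := db.Exec(ddlStatement); err != nil {
		return fmt.Errorf("loading DDL: %w", err)
	}
	return nil
}
```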


It's probably because of my limited proficiency in SQL, but in ddlStatement I don't see the schema in which the two tables are created. Aren't the tables created in the default schema (i.e., public) in this case?

@mickmis (Contributor, Author)


So for this, the connection string passed to the PostgreSQL driver contains the schema (here): if the schema is not specified in the SQL query, it defaults to the one passed to the driver. In this case, the name of the schema is configurable.
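
Concretely, lib/pq (and pgx) forward run-time parameters present in the connection string, including search_path, to PostgreSQL at connection startup, so unqualified table names resolve to the configured schema. A minimal sketch (the DSN values are placeholders):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

func main() {
	// search_path is forwarded to the server at startup, so unqualified
	// table names in queries resolve to this schema.
	dsn := "host=localhost dbname=geco user=geco password=geco sslmode=disable search_path=my_datasource_schema"

	db, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var searchPath string
	if err := db.QueryRow("SHOW search_path").Scan(&searchPath); err != nil {
		log.Fatal(err)
	}
	fmt.Println("effective search_path:", searchPath)
}
```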

Comment on lines +28 to +29
go-build-plugin:
go build -buildmode=plugin -v -o ./build/ ./cmd/...


Suggested change:

```diff
-go-build-plugin:
-	go build -buildmode=plugin -v -o ./build/ ./cmd/...
+go-build-plugin: export GOOS=linux
+go-build-plugin:
+	go build -buildmode=plugin -v -o ./build/ ./cmd/...
```

I guess here we should always cross-compile.

@mickmis (Contributor, Author)


I would say it depends on what is done on the geco side, as the produced binary for the plugin should be compatible with the geco binary.


Never mind; actually I had to work a lot to make it work with the dockerized geco. I'll take care of it.

@mickmis (Contributor, Author) commented Jan 18, 2022

@f-marino @romainbou
I'm done with the feedback on both PRs (modulo small additional things depending on @f-marino's answers); in terms of time spent, this was mostly producing additional documentation.

FYI I have about 1 hour left to spend. I don't know how you would like me to spend it: additional fixes, a meeting for a debrief/walkthrough of the code, or anything else.

@f-marino

There is only one last comment to address.
For the remaining hour, I'd say let's keep it there for the moment, and let's see how my work on the integration evolves, in case unexpected issues emerge that need @mickmis's experience.

@f-marino

@mickmis everything went smoothly enough with the integration, so I didn't need your help.

The last thing you can do for the remaining hour is add some comments/documentation about the i2b2 image (everything under /build/i2b2): a small comment for the most relevant files describing what they do, with the most important parameters to take into account in case we want to modify something, if any.

In particular, I noticed that the dump of the XML requests addressed to i2b2 is no longer in the i2b2 logs; could you tell us which parameter we have to tweak to get them back?

@mickmis (Contributor, Author) commented Feb 18, 2022

> @mickmis everything went smoothly enough with the integration, so I didn't need your help.

Great news!

> The last thing you can do for the remaining hour is add some comments/documentation about the i2b2 image (everything under /build/i2b2): a small comment for the most relevant files describing what they do, with the most important parameters to take into account in case we want to modify something, if any.

OK, this should be done now: I've added several READMEs that should contain all the information needed.

> In particular, I noticed that the dump of the XML requests addressed to i2b2 is no longer in the i2b2 logs; could you tell us which parameter we have to tweak to get them back?

If you mean the dump by i2b2 itself, it is controlled by the AXIS2_LOGLEVEL environment variable of the i2b2 docker image, which you can set to DEBUG. I don't recommend it though, as it is usually way too verbose.

If you mean the dump by the data source (which I recommend), it is logged at the debug level.
As you may have noticed, the logging is actually controlled outside of the data source through a provided Logger logrus.FieldLogger, so this logging level must be set by the component that inits the data source.
As an example, in the tests the level is set to debug, see pkg/datasource/datasource_test.go:31.
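
For reference, a minimal sketch of that wiring (logrus and its FieldLogger interface are real; the NewDataSource constructor here is a hypothetical stand-in for the plugin's actual one):

```go
package main

import (
	"github.com/sirupsen/logrus"
)

// NewDataSource is a hypothetical stand-in for the plugin's constructor,
// which receives a logrus.FieldLogger instead of creating its own logger.
func NewDataSource(logger logrus.FieldLogger) interface{} {
	logger.Debug("data source initialized")
	return struct{}{}
}

func main() {
	// The data source logs the i2b2 XML requests at debug level, so the
	// component that instantiates it decides whether they appear.
	logger := logrus.New()
	logger.SetLevel(logrus.DebugLevel)

	_ = NewDataSource(logger)
}
```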

@f-marino

In MedCo, we were able to see the XML requests in the logs when browsing the ontologies or performing queries (like the one in the picture below), but that is not the case here.

[image: screenshot of the MedCo logs showing a dumped XML request]

So setting AXIS2_LOGLEVEL to DEBUG should do the trick, right?

@mickmis (Contributor, Author) commented Feb 18, 2022

> So setting AXIS2_LOGLEVEL to DEBUG should do the trick, right?

Yes, however the output is pretty unreadable, so I suggest logging it at the data source level instead, cf. my previous response.
