Skip to content

Closes #2736 Import Geonames Data#2782

Merged
lbownik merged 1 commit into
developfrom
2736_Import_Geonames_Data_3
Jun 10, 2025
Merged

Closes #2736 Import Geonames Data#2782
lbownik merged 1 commit into
developfrom
2736_Import_Geonames_Data_3

Conversation

@lbownik

@lbownik lbownik commented May 17, 2025

Copy link
Copy Markdown
Contributor

No description provided.

@rscipien rscipien assigned diasf and unassigned rscipien May 21, 2025

@diasf diasf left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some tests. It's a good candidate for an integration test importing a sample file and querying the import.

At this stage GEONAME fields in datasets are indexed with only the ID of the geoname. So datasets won't be findable using any of the geoname text fields. Is this the desired behavior?

Comment thread dataverse-webapp/src/main/java/edu/harvard/iq/dataverse/api/GeoNameApi.java Outdated
Comment thread conf/solr/7.3.1/geonames/schema.xml
Comment thread dataverse-webapp/src/main/java/edu/harvard/iq/dataverse/api/GeoNameApi.java Outdated
Comment thread dataverse-webapp/src/main/java/edu/harvard/iq/dataverse/api/GeoNameApi.java Outdated
@diasf diasf assigned lbownik and unassigned diasf May 29, 2025
@lbownik lbownik requested a review from diasf June 3, 2025 11:39
// country codes repeat often so deduplicating them saves a lot of memory
private final Map<String, String> countryCodes = new HashMap<>();
// feature codes repeat often so deduplicating them saves a lot of memory
private final Map<String, String> featureCodes = new HashMap<>();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you will be able to import the whole file at once on most of our environments. Most have < 3GB of head room. So you would need to split the file. And you would also have to make sure you don't split it in the middle of a country, otherwise you will have wrong results.

As with the optimizations, I'm just concerned about readability. I mean, if they were obvious you wouldn't need extra commentary around them.

That being said, if you think the current approach is good enough, then let's leave it at that.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File isn't in this repo anynmore. Please sync with the main develop branch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread dataverse-webapp/src/main/webapp/metadataTab.xhtml
Comment thread dataverse-webapp/src/main/java/edu/harvard/iq/dataverse/api/GeoNameApi.java Outdated
public class GeoNamesIT extends WebappArquillianDeployment {

private GeoNameIndexingService indexer;
private GeoNameDataFinder finder;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't those injected as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that we can initialize them with
final SolrClient solr = new TestSolrClientFactory().produceGeoNameSolrClient();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Injection of @GeoNameSolrClient should also work in IT tests.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still be addressed, testing the correct wiring is part of IT tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread conf/solr/7.3.1/Dockerfile
Comment thread dataverse-webapp/src/test/resources/geonames/PL.txt Outdated
@lbownik lbownik requested a review from diasf June 6, 2025 09:04
private GeoNameDataFinder finder;
@Inject
@GeoNameSolrClient
private SolrClient solr;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Not used.


final SolrClient solr = Mockito.mock(SolrClient.class);
when(solr.addBeans(anyCollection())).thenThrow(RuntimeException.class);
this.indexer = new GeoNameIndexingService(solr);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a unit test for GeoNameIndexingService.

this.indexer.importNames(in);
fail("Exception expected");
} catch (Exception e) {
// pass

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's assertThatThrownBy which can be used in this case.

@lbownik lbownik force-pushed the 2736_Import_Geonames_Data_3 branch 2 times, most recently from 119e0b3 to ab1dec2 Compare June 10, 2025 10:36
@lbownik lbownik force-pushed the 2736_Import_Geonames_Data_3 branch from ab1dec2 to e7daa25 Compare June 10, 2025 10:39
@lbownik lbownik merged commit bd2a331 into develop Jun 10, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants