Meeting notes - February 20th, 2024 #27
unece-stat started this conversation in General
I have investigated the etymology of the English word "acquire" and the similar-sounding alquilar/alquiler in Spanish. The results of these investigations are:
Incorporating Machine Learning into GSBPM
The group looked at Juan’s proposed text on Machine Learning, which described uses such as Analytical, Editing and imputation, and Process optimisation.
Edgardo commented that the use of machine learning for classification and coding could be mentioned, while InKyung also mentioned co-piloting for code development. Chris mentioned that machine learning (cluster analysis) could be used to create classifications, rather than only to assign units to existing categories.
Edgardo pointed out that some machine learning techniques have existed for many decades, before the term “machine learning” was widely used. Chris commented that the boundary between machine learning and more traditional analysis can be a bit blurry (e.g. PCA for dimension reduction is sometimes considered to be a machine learning technique). Furthermore, there is a qualitative difference between machine learning techniques applied to numerical datasets and generative AI such as ChatGPT, even though both may use techniques such as neural networks/deep learning.
On AI, Edgardo pointed out that the limits of its use in GSBPM, where the outputs need to be reliable, are still unclear. He suggested that we should keep an eye on how ML/AI might develop in the future (perhaps in unexpected ways), and so we might avoid spending a large amount of time looking at it now. Edgardo suggested we could consider future work on explaining ML using GSBPM, but as a separate product, as has been done for GeoGSBPM. (Action: Juan to draft a single paragraph for the introduction that refers those specifically interested in geospatial data to the work on GeoGSBPM, and on linking GSIM to GSBPM.)
In this context, there was debate about whether to mention machine learning in the specific sub-processes where it may be applied, or whether to have a more general mention of it at the start of the GSBPM document. Chris suggested that the sub-processes for which machine learning could be used might be limited only by one’s imagination, and it could conceivably be used for fieldwork optimisation or even sample design.
It was noted that some GSBPM sub-processes (such as 5.2 Classify and Code) already make reference to machine learning. Edgardo suggested also adding a reference to the imputation sub-process. However, InKyung said that editing and imputation is an area where machine learning does not seem to be very promising.
However, we should avoid trying to be prescriptive about the sub-processes in which it can be used. GSBPM has not, for example, attempted to indicate all of the sub-processes for which geospatial analysis techniques might be applicable. InKyung suggested that the GSBPM sub-process descriptions should try to focus more on what is done than on how it is done.
Florian suggested a general comment at the beginning of the document, to avoid going too in-depth about specific or comprehensive applications of machine learning, noting that it is one of a wider selection of analytical methods that statisticians might choose to use. (Action: Florian and Juan to write one paragraph on machine learning.)
Design phase - Relationship Management
Feedback was received from New Zealand and the United Kingdom about managing the relationship with (survey) respondents to boost response rates, and designing the collection to be respondent-centred.
Chris expressed concern that such an approach should not introduce (non-sampling) bias into the collection, for example by targeting groups with known higher response rates.
It was agreed to make changes to the description of 2.3, possibly in the first and/or second paragraphs. Edgardo stressed the importance of not just the quality but also the efficiency of the data collection (e.g. not going to the same address 3 times to get a response).
Action: InKyung agreed to make a proposal for changing the text of subprocess 2.3 to incorporate putting respondents at the centre of data collection design.
There was also discussion about changing the name of sub-process 2.3 to something like “design acquisition strategy” to reflect the renaming of Phase 4 to “acquire”.
Action: Chris to investigate the etymology of the words acquire (English) and alquiler (Spanish).