
AI Makes New Scientific Discoveries by Analyzing Old Research Papers
"Unsupervised word embeddings capture latent knowledge from materials science literature"

https://thenewstack.io/ai-makes-new-scientific-discoveries-by-analyzing-old-research-papers/

original paper - https://www.researchgate.net/publication/334209824_Unsupervised_word_embeddings_capture_latent_knowledge_from_materials_science_literature

The team gathered 3.3 million abstracts from scientific papers published in over 1,000 journals between 1922 and 2018.
The algorithm then processed the roughly 500,000 unique words found in the abstracts
and transformed each into a 200-dimensional vector (an array of 200 numbers).
Though the AI had no prior training in materials science, after this process it was nevertheless able to ‘learn’
scientific concepts and
infer relationships between data points,
simply by analyzing the >>placement of words in the abstracts
and when they >>co-occur with one another.
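
A minimal sketch of the co-occurrence idea described above, assuming a simple count-based representation rather than the paper's actual training procedure. The helper names (`cooccurrence_vectors`, `cosine`) and the toy sentences are invented for illustration; real embeddings would be learned from millions of abstracts.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy "abstracts", made up for this example only.
toy_abstracts = [
    "thermoelectric bi2te3 converts heat",
    "thermoelectric pbte converts heat",
    "table salt nacl dissolves in water",
]

vecs = cooccurrence_vectors(toy_abstracts)
sim_close = cosine(vecs["bi2te3"], vecs["pbte"])  # words sharing the same contexts
sim_far = cosine(vecs["bi2te3"], vecs["nacl"])    # words sharing no contexts
print(sim_close, sim_far)
```

Words that appear in similar contexts end up with similar vectors, which is the property the notes above rely on: no materials science labels are supplied, only word placement.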

these embeddings capture complex materials science concepts
such as the underlying structure of the periodic table
and structure–property relationships in materials.

" you can use this algorithm to >> address gaps
in materials research, things that people
>>should study but haven’t studied so far.”

In particular, the algorithm proved that it could
>> predict novel thermoelectric materials,
which convert heat to electricity efficiently.
During the team’s tests, the algorithm came up with a variety of predictions for possible thermoelectric materials,
with the top ten predictions demonstrating higher-than-average thermoelectric properties.
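
One plausible way to produce such a ranked list, sketched under assumptions: score each candidate material by the cosine similarity between its embedding and the embedding of a target word such as "thermoelectric", then keep the top results. The notes do not spell out this step, so `rank_candidates`, the material names and the 3-dimensional vectors below are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(target, candidates, top_k=10):
    """Return the `top_k` (name, score) pairs most similar to `target`."""
    scored = [(name, cosine(vec, target)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up vectors; real embeddings would have far more dimensions.
target_vec = [1.0, 0.0, 0.0]  # stands in for the "thermoelectric" embedding
candidate_vecs = {
    "material_A": [0.9, 0.1, 0.0],
    "material_B": [0.1, 0.9, 0.0],
    "material_C": [0.5, 0.5, 0.0],
}

ranking = rank_candidates(target_vec, candidate_vecs, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```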

it works in an unsupervised capacity to find novel connections that might have been overlooked
— perhaps even years ahead of time.
Moreover, according to the team, this method could be used to automatically
extract knowledge that is still hidden in older scientific papers,
and which might not be apparent to human eyes.

Furthermore, we demonstrate that an unsupervised method
can recommend materials for functional applications several years before their discovery.
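
A claim like this can be checked retrospectively: build the model only from literature up to a cutoff year, then see how many of its recommendations show up in later publications. The sketch below assumes that setup; the record layout, `split_by_cutoff`, `hit_rate` and all values are hypothetical and not taken from the paper.

```python
def split_by_cutoff(records, cutoff_year):
    """Separate records into those available at the cutoff and those published later."""
    past = [r for r in records if r["year"] <= cutoff_year]
    future = [r for r in records if r["year"] > cutoff_year]
    return past, future


def hit_rate(predicted, future_records):
    """Fraction of predicted materials that are reported after the cutoff."""
    later_reported = {r["material"] for r in future_records}
    if not predicted:
        return 0.0
    return sum(1 for m in predicted if m in later_reported) / len(predicted)


# Invented records: (publication year, material reported for the application).
records = [
    {"year": 2005, "material": "material_A"},
    {"year": 2012, "material": "material_B"},
    {"year": 2016, "material": "material_C"},
]

past, future = split_by_cutoff(records, cutoff_year=2008)
# Hypothetical output of a model trained only on `past`.
predicted_from_past = ["material_B", "material_D"]
rate = hit_rate(predicted_from_past, future)
print(rate)
```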

>>>This suggests that LATENT KNOWLEDGE regarding future discoveries
>>>is to a large extent embedded in past publications.

Our findings highlight the possibility of extracting knowledge and relationships
from the massive body of scientific literature in a collective manner,
>>>>>and point towards a generalized approach to the mining of scientific literature.

-----------
Meta - biomedical

https://www.meta.org/how-it-works

-------------

SciBERT: A Pretrained Language Model for Scientific Text
Iz Beltagy, Kyle Lo, Arman Cohan

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive.
We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018)
to address the lack of high-quality, large-scale labeled scientific data.

SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications
to improve performance on downstream scientific NLP tasks.

We evaluate on a suite of tasks including sequence tagging, sentence classification and dependency parsing,
with datasets from a variety of scientific domains.
We demonstrate statistically significant improvements over BERT
and achieve new state-of-the-art results on several of these tasks.
The code and pretrained models are available at https://github.com/allenai/scibert

----------------

With so many research papers nowadays, the difference between making a discovery and not
increasingly depends on your ability to choose what to read.

--------------

https://arxiv.org/abs/1808.06640

Adversarial Removal of Demographic Attributes from Text Data

(this is also interesting for testing hypotheses because you can
1. remove human bias
2. run an experiment without one or many variables
3. simulate how a phenomenon would behave / what you learn about it if you remove a variable)

Recent advances in Representation Learning and Adversarial Training seem to succeed in
>>>removing unwanted features from the learned representation.

We show that demographic information of authors is encoded in -- and can be recovered from --
the intermediate representations learned by text-based neural classifiers.

The implication is that decisions of classifiers trained on textual data are not agnostic to
-- and likely condition on -- demographic attributes.

When attempting to remove such demographic information using adversarial training,
we find that while the adversarial component achieves chance-level development-set accuracy during training,
a post-hoc classifier, trained on the encoded sentences from the first part,
still manages to reach substantially higher classification accuracies on the same data.

This behavior is consistent across several tasks, demographic properties and datasets.
We explore several techniques to improve the effectiveness of the adversarial component.
Our main conclusion is a cautionary one:
do not rely on the adversarial training to achieve invariant representation to sensitive features.
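
The post-hoc check described in the abstract can be sketched as a probing classifier: freeze the encoder, train a fresh classifier on its outputs to predict the sensitive attribute, and compare held-out accuracy with chance. The code below is only an illustration of that check on synthetic data. It does not implement adversarial training, and `train_probe`, `make_encodings` and the `leak` parameter are assumptions made for the example.

```python
import random
from math import exp


def train_probe(features, labels, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + exp(-z)) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def accuracy(model, features, labels):
    """Share of examples whose predicted attribute matches the true one."""
    weights, bias = model
    correct = 0
    for x, y in zip(features, labels):
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        correct += int((z > 0) == (y == 1))
    return correct / len(labels)


def make_encodings(rng, n, leak):
    """Synthetic 'encoded sentences'; dimension 0 carries the attribute when leak > 0."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        signal = leak if y == 1 else -leak
        xs.append([signal + rng.gauss(0, 1), rng.gauss(0, 1)])
        ys.append(y)
    return xs, ys


rng = random.Random(0)
leaky_train = make_encodings(rng, 400, leak=3.0)
leaky_test = make_encodings(rng, 200, leak=3.0)
clean_train = make_encodings(rng, 400, leak=0.0)
clean_test = make_encodings(rng, 200, leak=0.0)

leaky_acc = accuracy(train_probe(*leaky_train), *leaky_test)
clean_acc = accuracy(train_probe(*clean_train), *clean_test)
print(f"probe accuracy on leaky encodings: {leaky_acc:.2f}")
print(f"probe accuracy on clean encodings: {clean_acc:.2f}")
```

If the probe scores well above chance on held-out data, the representation still encodes the attribute, whatever accuracy the adversary reached during training.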


--------

From Gil - Thoughtful AI

6.3. Initiative-driven ThAIs: Access to the scientific record

In order for ThAIs to acquire new knowledge about science, the scientific record should be made more accessible.
Current published articles do not contain the information necessary to understand what was done in enough detail that they can be reproduced [10].

Many scientists would like to improve the transparency and reproducibility of their papers,
but the best practices remain difficult to understand and follow in practice.
Inspired by and partnering with early career researchers, I developed the Scientific Paper of the Future (SPF) Initiative
to teach scientists how to write papers that
>>describe and cite explicitly not just data but also software and methods (workflows and provenance) [11].

scientist’s motivation for structuring and reporting their research products more thoroughly.

Most of the work focuses on extracting facts and claims.
>> Much remains to be done to address the challenges of extracting information about processes and methods.


--------

AI in scientific research - Royal Society + Alan Turing Institute

Understanding social history from archive material
Researchers are collaborating with curators to build new software to analyse data drawn initially
from millions of pages of out-of-copyright newspaper collections from within the British Library’s National Newspaper archive.
They will also draw on other digitised historical collections, most notably government-collected data,
such as the Census and registration of births, marriages and deaths.
The resulting new research methods will allow computational linguists and historians
to track societal and cultural change in new ways during the Industrial Revolution,
and the changes brought about by the advance of technology across all aspects
of society during this period.
Crucially, these new research methods will place the lives of ordinary people centre-stage.