
Commit 8a27a1d

[docs] more on learning + queries

1 parent 38a9a31 · commit 8a27a1d

8 files changed: +133 -29 lines

docs/src/api/public.md (+6 -3)

@@ -26,18 +26,21 @@ save_as_dot
 ## Learning Circuits
 
 ```@docs
-learn_parameters
-learn_chow_liu_tree_circuit
 learn_circuit
 learn_strudel
-learn_circuit_mixture
+estimate_parameters
+learn_parameters
+learn_chow_liu_tree_circuit
 ```
 
 ## Circuit Queries
 
 ```@docs
 marginal
 max_a_posteriori
+sample
+Expectation
+Moment
 ```
 
 ## Compilation

docs/src/manual/learning.md (+47 -8)

@@ -1,34 +1,73 @@
-# [Learning PCs](@id man-learning)
+# [Learning](@id man-learning)
 
 In this section we provide few learning scenarios for circuits. In general, learning tasks for PCs can be separted into two categories: paramter learning and structure learning.
 
-### Paramter Learning
 
-Given a fixed structure for the PC and the dataset, the goal of paramter learning is to estimate the parameters so that likelihood is maximized.
+## Learn a Circuit
+
+You can use [`learn_circuit`](@ref) to learn a probabilistic circuit from data (both parameter and structure learning).
+
+
+```@example learning
+using LogicCircuits
+using ProbabilisticCircuits
+train_x, valid_x, test_x = twenty_datasets("nltcs")
+
+pc = learn_circuit(train_x; maxiter=100);
 
-First, we load a dataset and initilize a PC with a fully factorized distribution:
+"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters. " *
+"Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
+```
+
+## Learn a mixture of circuits
+
+We also support learning a mixture of circuits using the Strudel algorithm ([`learn_strudel`](@ref)).
 
 ```@example learning
-using LogicCircuits #hide
-using ProbabilisticCircuits #hide
+using LogicCircuits
+using ProbabilisticCircuits
+using Statistics
+
 train_x, valid_x, test_x = twenty_datasets("nltcs")
+
+spc, component_weights, lls = learn_strudel(train_x; num_mix = 10, init_maxiter = 20, em_maxiter = 100);
+
+"SPC: $(num_nodes(spc)) nodes, $(num_parameters(spc)) parameters. " *
+"Train log-likelihood is $(mean(lls))"
+```
+
+## Misc Options
+
+In this section, we describe options that give more control over learning circuits, for example when we only want to do parameter learning.
+
+### Parameter Learning
+
+Given a fixed structure for the PC, the goal of parameter learning is to estimate the parameters so that the likelihood is maximized.
+
+First, initialize the PC structure with a balanced vtree representing a fully factorized distribution:
+
+```@example learning
 v = Vtree(num_features(train_x), :balanced)
 pc = fully_factorized_circuit(StructProbCircuit, v);
+
 "PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
 "Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
 ```
 
-Given fully observed data, we can do maximum likelihood estimatation (MLE) as follows:
+No parameter learning has been done yet; now let's do maximum likelihood estimation (MLE) using [`estimate_parameters`](@ref):
 
 ```@example learning
 estimate_parameters(pc, train_x; pseudocount=1.0);
+
+"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
 "Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
 ```
 
 As we see the likelihood improved, however we are still using a fully factorized distribution. There is room for improvement. For example, we can choose initial structure based on Chow-Liu Trees.
 
 ```@example learning
 pc, vtree = learn_chow_liu_tree_circuit(train_x)
+
 "PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
 "Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
 ```

@@ -52,6 +91,6 @@ pc = struct_learn(pc;
     maxiter=20)
 estimate_parameters(pc, train_x; pseudocount=1.0)
 
-"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
+"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters. " *
 "Training set log-likelihood is $(log_likelihood_avg(pc, train_x))"
 ```

docs/src/manual/queries.md (+58 -11)

@@ -43,12 +43,12 @@ First, we randomly make some features go `missing`:
 
 ```@example queries
 using DataFrames
-function make_missing(d::DataFrame;keep_prob=0.8)
-    m = missings(Bool, num_examples(d), num_features(d))
+function make_missing(d::DataFrame;keep_prob=0.8)
+    m = missings(Bool, num_examples(d), num_features(d))
     flag = rand(num_examples(d), num_features(d)) .<= keep_prob
-    m[flag] .= Matrix(d)[flag]
-    DataFrame(m)
-end;
+    m[flag] .= Matrix(d)[flag]
+    DataFrame(m)
+end;
 data_miss = make_missing(data[1:1000,:]);
 nothing #hide
 ```

@@ -72,7 +72,8 @@ probs_mar ≈ probs_evi
 
 ## Conditionals (CON)
 
-In this case, given observed features ``x^o``, we would like to compute ``p(Q \mid x^o)``, where ``Q`` is a subset of features disjoint with ``x^o``. We can leverage Bayes rule to compute conditionals as two seperate MAR queries as follows:
+In this case, given observed features ``x^o``, we would like to compute ``p(Q \mid x^o)``, where ``Q`` is a subset of features disjoint with ``x^o``.
+We can use Bayes' rule to compute conditionals as two separate MAR queries as follows:
 
 ```math
 p(q \mid x^o) = \cfrac{p(q, x^o)}{p(x^o)}

@@ -84,17 +85,63 @@ Currently, this has to be done manually by the user. We plan to add a simple API
 
 In this case, given the observed features ``x^o`` the goal is to fill out the missing features in a way that ``p(x^m, x^o)`` is maximized.
 
-
-We can use the [`MAP`](@ref) method to compute MAP, which outputs the states that maximize the probability and returns the probabilities themselves.
+We can use the [`MAP`](@ref) method to compute MAP, which returns the states that maximize the probability together with the log-likelihood of each state.
 
 ```@example queries
 data_miss = make_missing(data,keep_prob=0.5);
 states, probs = MAP(pc, data_miss);
 probs[1:3]
 ```
 
-## Probability of logical Events
+## Sampling
+
+We can also sample from the distribution ``p(x)`` defined by a probabilistic circuit. You can use [`sample`](@ref) for this task.
+
+```@example queries
+samples, lls = sample(pc, 100);
+lls[1:3]
+```
+
+Additionally, we can draw conditional samples ``x \sim p(x \mid x^o)``, where ``x^o`` are the observed features (``x^o \subseteq x``) and can be any arbitrary subset of features.
+
+```@example queries
+# random partial evidence for the example
+evidence = DataFrame(rand( (missing,true,false), (2, num_variables(pc))))
+
+samples, lls = sample(pc, 3, evidence);
+lls
+```
+
+## Expected Prediction (EXP)
+
+Expected Prediction (EXP) is the task of taking the expectation of a discriminative model w.r.t. a generative model, conditioned on evidence (a subset of observed features):
+
+``\mathbb{E}_{x^m \sim p(x^m \mid x^o)} [ f(x^o x^m) ]``
 
-## Expected Prediction
+When ``f`` and ``p`` are circuits and the pair satisfies certain structural constraints, we can compute this expectation and higher moments tractably.
+You can use [`Expectation`](@ref) and [`Moment`](@ref) to compute them.
 
-## Same Decision Probability
+```@example queries
+using DataFrames
+
+pc = zoo_psdd("insurance.psdd")
+rc = zoo_lc("insurance.circuit", 1)
+
+# Using samples from the circuit for the example; replace with real data
+data, _ = sample(pc, 10);
+data = make_missing(DataFrame(data));
+
+exps, exp_cache = Expectation(pc, rc, data)
+
+exps[1:3]
+```
+
+```@example queries
+second_moments, moment_cache = Moment(pc, rc, data, 2);
+second_moments[1:3]
+```
+
+```@example queries
+stds = sqrt.( second_moments - exps.^2 );
+stds[1:3]
+```
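
A minimal sketch of the manual two-MAR recipe described in the Conditionals (CON) section above (the commit itself adds no example for it). It assumes the `pc`, `data`, and `make_missing` helper from the same queries manual, and arbitrarily uses the first variable as the query ``Q``; these choices are for illustration only.

```julia
# Minimal sketch of the two-MAR conditional p(Q | x^o) described above (not from the commit).
# Assumes `pc`, `data`, and `make_missing` from the queries manual; using the first
# column as the query variable Q is an arbitrary, illustrative choice.
evidence = make_missing(data[1:100, :]; keep_prob=0.5)
evidence[!, 1] .= missing                # ensure Q is not part of the evidence x^o

query = copy(evidence)
query[!, 1] .= true                      # ask for Q = (variable 1 = true)

log_joint = marginal(pc, query)          # log p(q, x^o) per row
log_evidence = marginal(pc, evidence)    # log p(x^o) per row
cond = exp.(log_joint .- log_evidence)   # p(q | x^o) by Bayes' rule
cond[1:3]
```

As the manual notes, a dedicated API for conditionals is planned; until then, two marginal calls like the above are the manual workaround.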

src/Logistic/parameters.jl (+1 -1)

@@ -4,7 +4,7 @@ using CUDA
 using LoopVectorization: @avx, vifelse
 
 """
-Parameter learning through gradient descents
+LogisticCircuit parameter learning through gradient descent
 Note: data need to be DataFrame and Labels need to be in one-hot form.
 """
 function learn_parameters(lc::LogisticCircuit, nc::Int, data, labels; num_epochs=25, step_size=0.01)
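
The docstring above states that `data` must be a DataFrame and labels must be one-hot, but no usage example is added. A hypothetical sketch, assuming a LogisticCircuit `lc` over the same variables with `nc` classes and using fabricated labels purely for illustration; the exact label element type expected may differ.

```julia
# Hypothetical usage sketch for learn_parameters (not from the commit).
using LogicCircuits
using ProbabilisticCircuits

train_x, _, _ = twenty_datasets("nltcs")
nc = 2                                           # assumed number of classes
labels = rand(1:nc, num_examples(train_x))       # fabricated class labels, illustration only
labels_onehot = Float64.(labels .== (1:nc)')     # one-hot matrix: examples x classes

# `lc` is assumed to be a LogisticCircuit over the same variables with nc classes
learn_parameters(lc, nc, train_x, labels_onehot; num_epochs=25, step_size=0.01)
```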

src/mixtures/em.jl (+3 -4)

@@ -112,11 +112,10 @@ end
 
 
 """
-Learns a mixture of circuits
+    learn_strudel(train_x; num_mix = 5, init_maxiter = 10, em_maxiter=20)
 
-learn_strudel (train_x; init_maxiter = 10, em_maxiter=20)
-
-See Strudel: Learning Structured-Decomposable Probabilistic Circuits. https://arxiv.org/abs/2007.09331
+Learn a mixture of circuits
+See "Strudel: Learning Structured-Decomposable Probabilistic Circuits" [arxiv.org/abs/2007.09331](https://arxiv.org/abs/2007.09331).
 """
 function learn_strudel(train_x; num_mix = 5,
     pseudocount=1.0,

src/queries/expectation_rec.jl (+10 -1)

@@ -26,9 +26,13 @@ choose_cache = [ 1.0 * binomial(i,j) for i=0:max_k+1, j=0:max_k+1 ]
 end
 
 
-# On Tractable Computation of Expected Predictions (https://arxiv.org/abs/1910.02182)
 """
+    Expectation(pc::ProbCircuit, lc::LogisticCircuit, data)
+
+Compute the Expected Prediction of a Logistic/Regression Circuit w.r.t. a ProbabilisticCircuit
+
 Missing values should be denoted by missing
+See: On Tractable Computation of Expected Predictions [arxiv.org/abs/1910.02182](https://arxiv.org/abs/1910.02182)
 """
 function Expectation(pc::ProbCircuit, lc::LogisticCircuit, data)
     # 1. Get probability of each observation

@@ -49,6 +53,11 @@ function Expectation(pc::ProbCircuit, lc::LogisticCircuit, data)
     results, cache
 end
 
+"""
+    Moment(pc::ProbCircuit, lc::LogisticCircuit, data, moment::Int)
+
+Compute higher moments of the Expected Prediction for a Logistic/Regression Circuit and a ProbabilisticCircuit
+"""
 function Moment(pc::ProbCircuit, lc::LogisticCircuit, data, moment::Int)
     # 1. Get probability of each observation
     log_likelihoods = marginal(pc, data)

src/queries/marginal_flow.jl (+2)

@@ -52,6 +52,8 @@ Computes Marginal log likelhood of data.
 const MAR = marginal
 
 """
+    marginal_log_likelihood(pc, data)
+
 Compute the marginal likelihood of the PC given the data
 """
 marginal_log_likelihood(pc, data) = begin
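
The hunk above documents `marginal_log_likelihood` next to the per-example `marginal` (MAR) query. A hedged sketch of that relationship, computed manually from `marginal` rather than asserting the function's exact return value, and assuming `pc` and `data_miss` from the queries manual:

```julia
# Hedged sketch: aggregate the per-example MAR values by hand (not from the commit).
per_example = marginal(pc, data_miss)             # log p(x^o) for each row
total_ll = sum(per_example)                       # summed marginal log-likelihood
avg_ll = total_ll / num_examples(data_miss)       # average, comparable to log_likelihood_avg
```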

src/queries/sample.jl (+6 -1)

@@ -7,7 +7,12 @@ import Random: default_rng
 # Circuit sampling
 #####################
 
-"Sample states from the circuit distribution."
+"""
+    sample(pc::ProbCircuit, num_samples)
+    sample(pc::ProbCircuit, num_samples, evidences)
+
+Sample states from the probabilistic circuit distribution. Can also do conditional sampling when evidence is given (any subset of features).
+"""
 function sample(pc::ProbCircuit; rng = default_rng())
     states, prs = sample(pc, 1, [missing for i=1:num_variables(pc)]...; rng)
     return states[1,:], prs[1]
