
Commit 8a27a1d

[docs] more on learning + queries

1 parent 38a9a31 · commit 8a27a1d

8 files changed: +133 -29 lines

docs/src/api/public.md (+6 -3)

@@ -26,18 +26,21 @@ save_as_dot
 ## Learning Circuits
 
 ```@docs
-learn_parameters
-learn_chow_liu_tree_circuit
 learn_circuit
 learn_strudel
-learn_circuit_mixture
+estimate_parameters
+learn_parameters
+learn_chow_liu_tree_circuit
 ```
 
 ## Circuit Queries
 
 ```@docs
 marginal
 max_a_posteriori
+sample
+Expectation
+Moment
 ```
 
 ## Compilation

docs/src/manual/learning.md (+47 -8)

@@ -1,34 +1,73 @@
-# [Learning PCs](@id man-learning)
+# [Learning](@id man-learning)
 
 In this section we provide few learning scenarios for circuits. In general, learning tasks for PCs can be separted into two categories: paramter learning and structure learning.
 
-### Paramter Learning
 
-Given a fixed structure for the PC and the dataset, the goal of paramter learning is to estimate the parameters so that likelihood is maximized.
+## Learn a Circuit
+
+You can use [`learn_circuit`](@ref) to learn a probabilistic circuit from data (both parameter and structure learning).
+
+
+```@example learning
+using LogicCircuits
+using ProbabilisticCircuits
+train_x, valid_x, test_x = twenty_datasets("nltcs")
+
+pc = learn_circuit(train_x; maxiter=100);
 
-First, we load a dataset and initilize a PC with a fully factorized distribution:
+"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters. " *
+"Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
+```
+
+## Learn a mixture of circuits
+
+We also support learning a mixture of circuits using the Strudel algorithm ([`learn_strudel`](@ref)).
 
 ```@example learning
-using LogicCircuits #hide
-using ProbabilisticCircuits #hide
+using LogicCircuits
+using ProbabilisticCircuits
+using Statistics
+
 train_x, valid_x, test_x = twenty_datasets("nltcs")
+
+spc, component_weights, lls = learn_strudel(train_x; num_mix = 10, init_maxiter = 20, em_maxiter = 100);
+
+"SPC: $(num_nodes(spc)) nodes, $(num_parameters(spc)) parameters. " *
+"Train log-likelihood is $(mean(lls))"
+```
+
+## Misc Options
+
+In this section, we describe options that give more control over learning circuits, for example when we only want to do parameter learning.
+
+### Parameter Learning
+
+Given a fixed structure for the PC, the goal of parameter learning is to estimate the parameters so that the likelihood is maximized.
+
+First, initialize the PC structure with a balanced vtree representing a fully factorized distribution:
+
+```@example learning
 v = Vtree(num_features(train_x), :balanced)
 pc = fully_factorized_circuit(StructProbCircuit, v);
+
 "PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
 "Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
 ```
 
-Given fully observed data, we can do maximum likelihood estimatation (MLE) as follows:
+No parameter learning has been done yet; now let's do maximum likelihood estimation (MLE) using [`estimate_parameters`](@ref):
 
 ```@example learning
 estimate_parameters(pc, train_x; pseudocount=1.0);
+
+"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
 "Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
 ```
 
 As we see the likelihood improved, however we are still using a fully factorized distribution. There is room for improvement. For example, we can choose initial structure based on Chow-Liu Trees.
 
 ```@example learning
 pc, vtree = learn_chow_liu_tree_circuit(train_x)
+
 "PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
 "Train log-likelihood is $(log_likelihood_avg(pc, train_x))"
 ```

@@ -52,6 +91,6 @@ pc = struct_learn(pc;
     maxiter=20)
 estimate_parameters(pc, train_x; pseudocount=1.0)
 
-"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters." *
+"PC: $(num_nodes(pc)) nodes, $(num_parameters(pc)) parameters. " *
 "Training set log-likelihood is $(log_likelihood_avg(pc, train_x))"
 ```

docs/src/manual/queries.md (+58 -11)

@@ -43,12 +43,12 @@ First, we randomly make some features go `missing`:
 
 ```@example queries
 using DataFrames
-function make_missing(d::DataFrame;keep_prob=0.8)
-    m = missings(Bool, num_examples(d), num_features(d))
+function make_missing(d::DataFrame;keep_prob=0.8)
+    m = missings(Bool, num_examples(d), num_features(d))
     flag = rand(num_examples(d), num_features(d)) .<= keep_prob
-    m[flag] .= Matrix(d)[flag]
-    DataFrame(m)
-end;
+    m[flag] .= Matrix(d)[flag]
+    DataFrame(m)
+end;
 data_miss = make_missing(data[1:1000,:]);
 nothing #hide
 ```

@@ -72,7 +72,8 @@ probs_mar ≈ probs_evi
 
 ## Conditionals (CON)
 
-In this case, given observed features ``x^o``, we would like to compute ``p(Q \mid x^o)``, where ``Q`` is a subset of features disjoint with ``x^o``. We can leverage Bayes rule to compute conditionals as two seperate MAR queries as follows:
+In this case, given observed features ``x^o``, we would like to compute ``p(Q \mid x^o)``, where ``Q`` is a subset of features disjoint with ``x^o``.
+We can use Bayes' rule to compute conditionals as two separate MAR queries as follows:
 
 ```math
 p(q \mid x^o) = \cfrac{p(q, x^o)}{p(x^o)}

@@ -84,17 +85,63 @@ Currently, this has to be done manually by the user. We plan to add a simple API
 
 In this case, given the observed features ``x^o`` the goal is to fill out the missing features in a way that ``p(x^m, x^o)`` is maximized.
 
-
-We can use the [`MAP`](@ref) method to compute MAP, which outputs the states that maximize the probability and returns the probabilities themselves.
+We can use the [`MAP`](@ref) method to compute MAP, which returns the states that maximize the probability together with the log-likelihood of each state.
 
 ```@example queries
 data_miss = make_missing(data,keep_prob=0.5);
 states, probs = MAP(pc, data_miss);
 probs[1:3]
 ```
 
-## Probability of logical Events
+## Sampling
+
+We can also sample from the distribution ``p(x)`` defined by a probabilistic circuit. You can use [`sample`](@ref) for this task.
+
+```@example queries
+samples, lls = sample(pc, 100);
+lls[1:3]
+```
+
+Additionally, we can draw conditional samples ``x \sim p(x \mid x^o)``, where ``x^o`` are the observed features (``x^o \subseteq x``) and can be any arbitrary subset of features.
+
+```@example queries
+# random partial evidence for the example
+evidence = DataFrame(rand( (missing,true,false), (2, num_variables(pc))))
+
+samples, lls = sample(pc, 3, evidence);
+lls
+```
+
+## Expected Prediction (EXP)
+
+Expected Prediction (EXP) is the task of taking the expectation of a discriminative model w.r.t. a generative model, conditioned on evidence (a subset of observed features):
+
+``\mathbb{E}_{x^m \sim p(x^m \mid x^o)} [ f(x^o x^m) ]``
 
-## Expected Prediction
+When ``f`` and ``p`` are circuits and the pair satisfies certain structural constraints, we can compute this expectation and higher moments tractably.
+You can use [`Expectation`](@ref) and [`Moment`](@ref) to compute them.
 
-## Same Decision Probability
+```@example queries
+using DataFrames
+
+pc = zoo_psdd("insurance.psdd")
+rc = zoo_lc("insurance.circuit", 1)
+
+# Using samples from the circuit for the example; replace with real data
+data, _ = sample(pc, 10);
+data = make_missing(DataFrame(data));
+
+exps, exp_cache = Expectation(pc, rc, data)
+
+exps[1:3]
+```
+
+```@example queries
+second_moments, moment_cache = Moment(pc, rc, data, 2);
+second_moments[1:3]
+```
+
+```@example queries
+stds = sqrt.( second_moments - exps.^2 );
+stds[1:3]
+```
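
A minimal sketch of the manual two-MAR recipe described in the Conditionals (CON) section above (the commit itself adds no example for it). It assumes the `pc`, `data`, and `make_missing` helper from the same queries manual, and arbitrarily uses the first variable as the query ``Q``; these choices are for illustration only.

```julia
# Minimal sketch of the two-MAR conditional p(Q | x^o) described above (not from the commit).
# Assumes `pc`, `data`, and `make_missing` from the queries manual; using the first
# column as the query variable Q is an arbitrary, illustrative choice.
evidence = make_missing(data[1:100, :]; keep_prob=0.5)
evidence[!, 1] .= missing                # ensure Q is not part of the evidence x^o

query = copy(evidence)
query[!, 1] .= true                      # ask for Q = (variable 1 = true)

log_joint = marginal(pc, query)          # log p(q, x^o) per row
log_evidence = marginal(pc, evidence)    # log p(x^o) per row
cond = exp.(log_joint .- log_evidence)   # p(q | x^o) by Bayes' rule
cond[1:3]
```

As the manual notes, a dedicated API for conditionals is planned; until then, two marginal calls like the above are the manual workaround.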

src/Logistic/parameters.jl (+1 -1)

@@ -4,7 +4,7 @@ using CUDA
 using LoopVectorization: @avx, vifelse
 
 """
-Parameter learning through gradient descents
+LogisticCircuit parameter learning through gradient descent
 Note: data need to be DataFrame and Labels need to be in one-hot form.
 """
 function learn_parameters(lc::LogisticCircuit, nc::Int, data, labels; num_epochs=25, step_size=0.01)
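
The docstring above states that `data` must be a DataFrame and labels must be one-hot, but no usage example is added. A hypothetical sketch, assuming a LogisticCircuit `lc` over the same variables with `nc` classes and using fabricated labels purely for illustration; the exact label element type expected may differ.

```julia
# Hypothetical usage sketch for learn_parameters (not from the commit).
using LogicCircuits
using ProbabilisticCircuits

train_x, _, _ = twenty_datasets("nltcs")
nc = 2                                           # assumed number of classes
labels = rand(1:nc, num_examples(train_x))       # fabricated class labels, illustration only
labels_onehot = Float64.(labels .== (1:nc)')     # one-hot matrix: examples x classes

# `lc` is assumed to be a LogisticCircuit over the same variables with nc classes
learn_parameters(lc, nc, train_x, labels_onehot; num_epochs=25, step_size=0.01)
```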

src/mixtures/em.jl (+3 -4)

@@ -112,11 +112,10 @@ end
 
 
 """
-Learns a mixture of circuits
+    learn_strudel(train_x; num_mix = 5, init_maxiter = 10, em_maxiter=20)
 
-learn_strudel (train_x; init_maxiter = 10, em_maxiter=20)
-
-See Strudel: Learning Structured-Decomposable Probabilistic Circuits. https://arxiv.org/abs/2007.09331
+Learn a mixture of circuits
+See "Strudel: Learning Structured-Decomposable Probabilistic Circuits" [arxiv.org/abs/2007.09331](https://arxiv.org/abs/2007.09331).
 """
 function learn_strudel(train_x; num_mix = 5,
     pseudocount=1.0,

src/queries/expectation_rec.jl (+10 -1)

@@ -26,9 +26,13 @@ choose_cache = [ 1.0 * binomial(i,j) for i=0:max_k+1, j=0:max_k+1 ]
 end
 
 
-# On Tractable Computation of Expected Predictions (https://arxiv.org/abs/1910.02182)
 """
+    Expectation(pc::ProbCircuit, lc::LogisticCircuit, data)
+
+Compute the Expected Prediction of a Logistic/Regression Circuit w.r.t. a ProbabilisticCircuit
+
 Missing values should be denoted by missing
+See: On Tractable Computation of Expected Predictions [arxiv.org/abs/1910.02182](https://arxiv.org/abs/1910.02182)
 """
 function Expectation(pc::ProbCircuit, lc::LogisticCircuit, data)
     # 1. Get probability of each observation

@@ -49,6 +53,11 @@ function Expectation(pc::ProbCircuit, lc::LogisticCircuit, data)
     results, cache
 end
 
+"""
+    Moment(pc::ProbCircuit, lc::LogisticCircuit, data, moment::Int)
+
+Compute higher moments of the Expected Prediction for a Logistic/Regression Circuit and a ProbabilisticCircuit
+"""
 function Moment(pc::ProbCircuit, lc::LogisticCircuit, data, moment::Int)
     # 1. Get probability of each observation
     log_likelihoods = marginal(pc, data)

src/queries/marginal_flow.jl (+2)

@@ -52,6 +52,8 @@ Computes Marginal log likelhood of data.
 const MAR = marginal
 
 """
+    marginal_log_likelihood(pc, data)
+
 Compute the marginal likelihood of the PC given the data
 """
 marginal_log_likelihood(pc, data) = begin
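
The hunk above documents `marginal_log_likelihood` next to the per-example `marginal` (MAR) query. A hedged sketch of that relationship, computed manually from `marginal` rather than asserting the function's exact return value, and assuming `pc` and `data_miss` from the queries manual:

```julia
# Hedged sketch: aggregate the per-example MAR values by hand (not from the commit).
per_example = marginal(pc, data_miss)             # log p(x^o) for each row
total_ll = sum(per_example)                       # summed marginal log-likelihood
avg_ll = total_ll / num_examples(data_miss)       # average, comparable to log_likelihood_avg
```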

src/queries/sample.jl (+6 -1)

@@ -7,7 +7,12 @@ import Random: default_rng
 # Circuit sampling
 #####################
 
-"Sample states from the circuit distribution."
+"""
+    sample(pc::ProbCircuit, num_samples)
+    sample(pc::ProbCircuit, num_samples, evidences)
+
+Sample states from the probabilistic circuit distribution. Can also do conditional sampling when evidence is given (any subset of features).
+"""
 function sample(pc::ProbCircuit; rng = default_rng())
     states, prs = sample(pc, 1, [missing for i=1:num_variables(pc)]...; rng)
     return states[1,:], prs[1]
