Commit 427ddc0

Merge branch 'develop'
2 parents fccd74f + f841428 commit 427ddc0

504 files changed

+72623
-3
lines changed

MANIFEST.in

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
include LICENSE

README.md

Lines changed: 198 additions & 3 deletions
@@ -1,5 +1,200 @@
1-
# Pyconstruct
2-
A Python library for declarative, constrained, structured-output prediction.
1+
Pyconstruct
2+
===========
33

4-
## Coming soon
4+
<div align="center">
5+
<img height="300px" src="docs/_static/images/pyconstruct.png"><br><br>
6+
</div>
7+
8+
**Pyconstruct** is a Python library for declarative, constrained,
9+
structured-output prediction. When using Pyconstruct, the problem specification
10+
can be encoded in MiniZinc, a high-level constraint programming language. This
11+
means that domain knowledge can be declaratively included in the inference
12+
procedure as constraints over the optimization variables.
13+
14+
Sounds complicated? A simple example will clear up the doubts!
15+
16+
17+
Getting started
18+
---------------
19+
20+
In the following example we will implement a simple OCR (Optical Character
21+
Recognition) model in a few lines of MiniZinc.
22+
23+
First of all, let's fetch the data. Pyconstruct has a utility for getting some
24+
standard datasets:
25+
26+
```python
27+
from pyconstruct import datasets
28+
ocr = datasets.load('ocr')
29+
```
30+
31+
The first time the dataset is loaded it will actually be fetched from the web
32+
and stored locally. You can now see the description of the dataset:
33+
34+
```python
35+
print(ocr.descr)
36+
```
37+
38+
By default, structured objects are represented as Python dictionaries in
39+
Pyconstruct. Each object has several "attributes", each identified by a string.
40+
Each attribute value may be any basic Python data type: strings, integers,
41+
floats, lists or other dictionaries. In OCR, for instance, inputs `X` are
42+
represented as dictionaries containing two attributes: an integer containing the
43+
`length` of the word; a list of `16x8` matrices (`numpy.ndarray`) containing the
44+
bitmap images of each character in the word. The targets (labels) are also
45+
structured objects containing a single attribute `sequence`, a list of integers
46+
representing the letters associated with each image in the word. For instance:
47+
48+
```python
49+
print(ocr.data[0])
50+
print(ocr.targets[0])
51+
```
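
To make the layout described above concrete, here is a hand-built stand-in for one example. It is only a sketch: the pixel values and labels are made up, plain nested lists are used in place of `numpy.ndarray`, and the mapping of label `1` to the letter `a` is an assumption made for illustration, not something stated by the dataset.

```python
# Hypothetical stand-in for one OCR example (values are made up).
length = 3

# One blank 16x8 bitmap; the real dataset stores these as numpy arrays.
blank = [[0] * 8 for _ in range(16)]

# Input object: the word length plus one bitmap per character.
x = {
    "length": length,
    "images": [[row[:] for row in blank] for _ in range(length)],
}

# Target object: one integer label per character.
y = {"sequence": [3, 1, 20]}


def decode(sequence):
    """Turn integer labels into letters, ASSUMING 1 -> 'a', ..., 26 -> 'z'."""
    return "".join(chr(ord("a") + s - 1) for s in sequence)


word = decode(y["sequence"])
print(x["length"], len(x["images"]), word)
```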
52+
53+
After getting the data, we can start coding our problem. First of all, in
54+
Pyconstruct there are three main kinds of objects to interact with: Domains,
55+
Models and Learners. At a high level: a Domain defines the attributes and the
56+
constraints of the structured objects; a Model is an object containing some
57+
parameters that can be used to perform inference over a Domain; a Learner is an
58+
algorithm that can learn a Model from data. A Domain is also responsible for
59+
solving inference problems with respect to some Model, so the two classes are
60+
interdependent, but in general a Domain can be made to work with different
61+
Models.
62+
63+
Several Models and Learners are already defined by Pyconstruct. All that is
64+
required to start training a model, apart from the data, is a Domain encoded in
65+
MiniZinc, which defines how the attributes of the objects interact and what the
66+
constraints and the features of the objects are. To do so, we need to create a
67+
`ocr.pmzn` file:
68+
69+
```HTML+Django
70+
{% from 'pyconstruct.pmzn' import n_features, features, domain, solve %}
71+
72+
{{ n_features('16 * 8 * 26') }}
73+
74+
{% call domain(problem) %}
75+
76+
int: length;
77+
array[1 .. length, 1 .. 16, 1 .. 8] of {0, 1}: images;
78+
79+
array[1 .. length] of var 1 .. 26: sequence;
80+
81+
82+
{% call features(feature_type='int') %}
83+
[
84+
sum(e in 1 .. length)(images[e, i, j] * (sequence[e] == s))
85+
| i in 1 .. 16, j in 1 .. 8, s in 1 .. 26
86+
]
87+
{% endcall %}
88+
89+
{% endcall %}
90+
91+
{{ solve(problem, model, discretize=True) }}
92+
```
93+
94+
That's it! Now we can instantiate a `Domain` with our new `ocr.pmzn` file:
95+
96+
```python
97+
from pyconstruct import Domain
98+
ocr_dom = Domain('ocr.pmzn')
99+
```
100+
101+
If you know MiniZinc, the above code will probably look a bit odd. That is
102+
because Pyconstruct by default uses a superset of MiniZinc defined by the PyMzn
103+
library. Essentially, that is MiniZinc with some templating provided by the
104+
Jinja2 library. Check out PyMzn for an explanation of how to use it fully. Here
105+
we'll explain the basics.
106+
107+
The first line
108+
`{% from 'pyconstruct.pmzn' import n_features, features, domain, solve %}`
109+
imports a few useful macros from the `pyconstruct.pmzn` file.
110+
111+
The second line `{{ n_features('16 * 8 * 26') }}` calls the `n_features` macro,
112+
which compiles into:
113+
114+
```HTML+Django
115+
int: N_FEATURES = 16 * 8 * 26;
116+
set of int: FEATURES = 1 .. N_FEATURES;
117+
```
118+
119+
The MiniZinc code enclosed in the tags
120+
`{% call domain(problem) %} ... {% endcall %}` is processed on the basis of the
121+
value of `problem` the domain is called with. The variable `problem` is usually
122+
passed to the domain by an internal call of Pyconstruct through PyMzn. In this
123+
block goes the domain definition, including the variables and parameters of the
124+
objects, the constraints and the features. Notice that we have two MiniZinc
125+
parameters `length` and `images`, which match the attributes of the input
126+
objects of the OCR dataset, and one optimization variable `sequence` which
127+
matches the attribute of the output objects of the OCR dataset. This is valid
128+
for any problem: the examples are the inputs that are provided as dzn data,
129+
whereas the targets are the outputs of the model, which translate into
130+
optimization variables when solving inference.
131+
132+
Inside the domain call we also call the `features` macro, which compiles into:
133+
134+
```HTML+Django
135+
array[FEATURES] of var int: phi = [
136+
sum(e in 1 .. length)(images[e, i, j] * (sequence[e] == s))
137+
| i in 1 .. 16, j in 1 .. 8, s in 1 .. 26
138+
];
139+
```
140+
141+
These are typical OCR features: for each symbol `s` and each pixel `(i,
142+
j)`, the feature counts how many times in the sequence the `(i, j)`
143+
pixel is active for characters labeled with symbol `s`.
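
The same feature vector can be sketched in plain Python, which may help when checking what `phi` should contain for a given input and output. This is an illustration only, not how Pyconstruct computes features (it delegates to MiniZinc); the function name `ocr_features` is hypothetical, and the flattening order assumes the MiniZinc comprehension iterates its last generator (`s`) fastest.

```python
def ocr_features(images, sequence):
    """Count, for each pixel (i, j) and symbol s, how many characters
    labeled s have pixel (i, j) active.

    images:   list of 16x8 bitmaps (nested lists or arrays of 0/1)
    sequence: list of labels in 1..26, one per bitmap
    Returns a flat list of 16 * 8 * 26 integers, ordered like the
    comprehension above: i outermost, then j, then s.
    """
    phi = []
    for i in range(16):
        for j in range(8):
            for s in range(1, 27):
                phi.append(sum(
                    images[e][i][j] * int(sequence[e] == s)
                    for e in range(len(sequence))
                ))
    return phi


# Tiny made-up example: two characters, both labeled 3.
imgs = [[[0] * 8 for _ in range(16)] for _ in range(2)]
imgs[0][0][0] = 1      # top-left pixel active in the first character
imgs[1][0][0] = 1      # ... and in the second character
imgs[1][15][7] = 1     # bottom-right pixel active in the second character
phi = ocr_features(imgs, [3, 3])
print(len(phi), phi[2], phi[(15 * 8 + 7) * 26 + 2])
```

With this ordering, the entry for pixel `(i, j)` and symbol `s` (zero-based `i`, `j`; one-based `s`) sits at index `(i * 8 + j) * 26 + (s - 1)`.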
144+
145+
The last line calls the `solve` macro, which compiles to a different solve
146+
statement depending on the `problem` and `model`. Possible values for `problem`
147+
are, for instance, `map` to find the object with highest score (dot product
148+
between weights and features) or `phi` to compute the feature vector given an
149+
input and an output object. The `model` is a dictionary containing the model's
150+
parameters, such as the weights `w` for a `LinearModel`. This object is also
151+
usually passed to the domain by Pyconstruct.
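
To illustrate what the `map` problem asks for, the sketch below finds the highest-scoring sequence for a linear score (dot product between `w` and the features) in plain Python. It is not Pyconstruct's inference, which is solved through MiniZinc. It relies on the fact that, for the partial domain above, no constraint couples different positions, so the score decomposes per character and each label can be chosen independently. The name `map_sequence` and the weight layout (same flattening as the features, `s` varying fastest) are assumptions.

```python
def map_sequence(images, w):
    """Return the labels in 1..26 maximizing the dot product of w and phi.

    Valid only because the features decompose per position and the
    sketched domain has no constraints linking positions.
    """
    def weight(i, j, s):
        # Assumed layout: i outermost, then j, then s (one-based).
        return w[(i * 8 + j) * 26 + (s - 1)]

    sequence = []
    for img in images:
        # Score of assigning each symbol s to this character.
        scores = [
            sum(img[i][j] * weight(i, j, s)
                for i in range(16) for j in range(8))
            for s in range(1, 27)
        ]
        # Ties are broken in favor of the smallest label.
        sequence.append(scores.index(max(scores)) + 1)
    return sequence


# Made-up weights: pixel (0, 0) votes for symbol 5, pixel (1, 1) for symbol 10.
w = [0.0] * (16 * 8 * 26)
w[(0 * 8 + 0) * 26 + 4] = 1.0
w[(1 * 8 + 1) * 26 + 9] = 1.0

imgs = [[[0] * 8 for _ in range(16)] for _ in range(2)]
imgs[0][0][0] = 1
imgs[1][1][1] = 1
best = map_sequence(imgs, w)
print(best)
```

Once a domain adds constraints across positions, this shortcut no longer applies, which is where a constraint solver becomes useful.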
152+
153+
The above model is actually a partial example of the complete `ocr` domain
154+
available in Pyconstruct out-of-the-box. You can load the domain simply with:
155+
156+
```python
157+
ocr_dom = Domain('ocr')
158+
```
159+
160+
After defining the domain, using the predefined one or the `ocr.pmzn` file, we
161+
can start learning by instantiating a learner, say a `StructuredPerceptron`, and
162+
fitting the data:
163+
164+
```python
165+
from pyconstruct import StructuredPerceptron
166+
sp = StructuredPerceptron(domain=ocr_dom)
167+
sp.fit(ocr.data, ocr.targets)
168+
```
169+
170+
This will take a while... If you need a quick benchmark, Pyconstruct contains
171+
pretrained models for many domains and learners (link).
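
For intuition about what a structured perceptron does during fitting, here is a sketch of the textbook update rule. It is not taken from Pyconstruct's `StructuredPerceptron`, whose internals may differ; `perceptron_epoch`, `phi`, `infer` and the toy problem below are all hypothetical names and data introduced for illustration.

```python
def perceptron_epoch(data, targets, w, phi, infer, lr=1.0):
    """One pass of the textbook structured perceptron.

    phi(x, y)   -> feature vector (list of numbers) of an input/output pair
    infer(x, w) -> highest-scoring output for x under weights w
    Updates w in place and returns the number of mistakes made.
    """
    mistakes = 0
    for x, y in zip(data, targets):
        y_hat = infer(x, w)
        if y_hat != y:
            mistakes += 1
            f_true, f_pred = phi(x, y), phi(x, y_hat)
            # Move w toward the true features, away from the predicted ones.
            for k in range(len(w)):
                w[k] += lr * (f_true[k] - f_pred[k])
    return mistakes


# Toy problem: inputs and outputs are both 0 or 1.
def toy_phi(x, y):
    f = [0, 0, 0, 0]
    f[2 * x + y] = 1          # one-hot indicator of the (x, y) pair
    return f


def toy_infer(x, w):
    score = lambda y: sum(wk * fk for wk, fk in zip(w, toy_phi(x, y)))
    return max(range(2), key=score)   # ties go to the smaller label


w = [0.0] * 4
first = perceptron_epoch([0, 1], [1, 0], w, toy_phi, toy_infer)
second = perceptron_epoch([0, 1], [1, 0], w, toy_phi, toy_infer)
print(first, second, w)
```

In the OCR setting, `infer` would correspond to solving the `map` problem and `phi` to the `phi` problem described above.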
172+
173+
174+
Install
175+
-------
176+
Pyconstruct can be installed through `pip`:
177+
178+
```bash
179+
pip install pyconstruct
180+
```
181+
182+
Or by downloading the code from GitHub and running the following from the
183+
downloaded directory:
184+
185+
```bash
186+
python setup.py install
187+
```
188+
189+
After installing Pyconstruct you will need to install **MiniZinc** as well.
190+
Download the latest release of MiniZincIDE and follow the instructions.
191+
192+
Authors
193+
-------
194+
This project is developed at the SML research group at the University of Trento
195+
(Italy). Main developers and maintainers:
196+
197+
* Paolo Dragone
198+
* Stefano Teso (now at KU Leuven)
199+
* Andrea Passerini
5200

docs/.buildinfo

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
# Sphinx build info version 1
2+
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3+
config: d95d9fafc74e858d83bcd9e7baed02a8
4+
tags: 645f666f9bcd5a90fca523b33c5a78b7

docs/.nojekyll

Whitespace-only changes.
