Commit bddf71f

Merge pull request #658 from ixcat/v0.12-doc-update
[WIP] docs-parts updates for v0.12
2 parents 92989d0 + d821545 commit bddf71f

File tree

5 files changed: +126 -21 lines changed

README.md

Lines changed: 59 additions & 0 deletions
@@ -22,6 +22,65 @@ If you already have an older version of DataJoint installed using `pip`, upgrade
 ```bash
 pip3 install --upgrade datajoint
 ```
+## Python Native Blobs
+
+For the v0.12 release, the variable `enable_python_native_blobs` can be
+safely enabled for improved blob support of python datatypes if the following
+are true:
+
+* This is a new DataJoint installation / pipeline(s)
+* You have not used DataJoint prior to v0.12 with your pipeline(s)
+* You do not share blob data between Python and Matlab
+
+Otherwise, please read the following carefully:
+
+DataJoint v0.12 expands DataJoint's blob serialization mechanism with
+improved support for complex native python datatypes, such as dictionaries
+and lists of strings.
+
+Prior to DataJoint v0.12, certain python native datatypes such as
+dictionaries were 'squashed' into numpy structured arrays when saved into
+blob attributes. This facilitated easier data sharing between Matlab
+and Python for certain record types. However, this created a discrepancy
+between insert and fetch datatypes which could cause problems in other
+portions of users' pipelines.
+
+For v0.12, it was decided to remove the type squashing behavior, instead
+creating a separate storage encoding which improves support for storing
+native python datatypes in blobs without squashing them into numpy
+structured arrays. However, this change creates a compatibility problem
+for pipelines which previously relied on the type squashing behavior,
+since records saved via the old squashing format will continue to fetch
+as structured arrays, whereas new records inserted in DataJoint 0.12 with
+`enable_python_native_blobs` enabled are returned as the
+appropriate native python type (dict, etc). Read support for python
+native blobs is also not yet implemented in DataJoint for Matlab.
+
+To prevent data from being stored in mixed format within a table across
+upgrades from previous versions of DataJoint, the
+`enable_python_native_blobs` flag was added as a temporary guard measure
+for the 0.12 release. This flag will trigger an exception if any of the
+ambiguous cases are encountered during inserts in order to allow testing
+and migration of pre-0.12 pipelines to 0.12 in a safe manner.
+
+The exact process to update a specific pipeline will vary depending on
+the situation, but generally the following strategies may apply:
+
+* Altering code to directly store numpy structured arrays or plain
+multidimensional arrays. This strategy is likely the best one for those
+tables requiring compatibility with Matlab.
+* Adjust code to deal with both structured array and native fetched data.
+In this case, insert logic is not adjusted, but downstream consumers
+are adjusted to handle records saved under the old and new schemes.
+* Manually convert data using fetch/insert into a fresh schema.
+In this approach, DataJoint's create_virtual_module functionality would
+be used in conjunction with a fetch/convert/insert loop to update
+the data to the new native_blob functionality.
+* Drop/Recompute imported/computed tables to ensure they are in the new
+format.
+
+As always, be sure that your data is safely backed up before modifying any
+important DataJoint schema or records.
 
 ## Documentation and Tutorials
 A number of labs are currently adopting DataJoint and we are quickly getting the documentation in shape in February 2017.
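
As a rough illustration of the second migration strategy in the README changes above (leaving insert logic alone and teaching downstream consumers to handle both formats), a consumer could normalize fetched blobs before using them. This is only a sketch under stated assumptions: `as_native_dict` is a hypothetical helper, not part of DataJoint, and it assumes an old-style record arrives as a numpy structured array holding one record with one field per dictionary key. The sample values stand in for the results of a real fetch.

```python
import numpy as np


def as_native_dict(value):
    """Return a plain dict for a blob fetched as a dict or a structured array.

    Hypothetical helper for illustration; not part of DataJoint.
    """
    if isinstance(value, dict):
        # New-style record: already a native python dict.
        return value
    if isinstance(value, np.ndarray) and value.dtype.names is not None:
        # Old-style record. Assumption: the dict was squashed into a
        # structured array with a single record, one field per key.
        record = value.reshape(-1)[0]
        return {name: record[name] for name in value.dtype.names}
    raise TypeError("unexpected blob type: {}".format(type(value).__name__))


# Stand-ins for fetched values; real ones would come from a blob attribute.
old_style = np.array([(1, 2.5)], dtype=[("trial", "i8"), ("gain", "f8")])
new_style = {"trial": 1, "gain": 2.5}

assert as_native_dict(old_style) == as_native_dict(new_style)
```

Values taken from the structured array remain numpy scalars; a pipeline that needs exact python types would have to convert them further, which is left out here.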
Lines changed: 14 additions & 17 deletions
@@ -1,20 +1,17 @@
 .. code-block:: python
 
-    # default external storage
-    dj.config['external'] = dict(
-        protocol='s3',
-        endpoint='https://s3.amazonaws.com',
-        bucket = 'testbucket',
-        location = '/datajoint-projects/myschema',
-        access_key='1234567',
-        secret_key='foaf1234')
+    dj.config['stores'] = {
+        'external': dict(  # 'regular' external storage for this pipeline
+            protocol='s3',
+            endpoint='https://s3.amazonaws.com',
+            bucket = 'testbucket',
+            location = '/datajoint-projects/myschema',
+            access_key='1234567',
+            secret_key='foaf1234'),
+        'external-raw': dict(  # 'raw' storage for this pipeline
+            protocol='file',
+            location='/net/djblobs/myschema')
+    }
+    # external object cache - see fetch operation below for details.
+    dj.config['cache'] = '/net/djcache'
 
-    # raw data storage
-    dj.config['extnernal-raw'] = dict(
-        protocol='file',
-        location='/net/djblobs/myschema')
-
-    # external object cache - see fetch operation below for details.
-    dj.config['cache'] = dict(
-        protocol='file',
-        location='/net/djcache')
Lines changed: 27 additions & 1 deletion
@@ -1,3 +1,29 @@
+
+To remove only the tracking entries in the external table, call `delete`
+on the external table for the external configuration with the argument
+`delete_external_files=False`.
+
+.. note::
+
+    Currently, cleanup operations on a schema's external table are not 100%
+    transaction safe and so must be run when there is no write activity occurring
+    in tables which use a given schema / external store pairing.
+
 .. code-block:: python
 
-    >>> schema.external_table.delete_garbage()
+    >>> schema.external['external_raw'].delete(delete_external_files=False)
+
+To remove the tracking entries as well as the underlying files, call `delete`
+on the external table for the external configuration with the argument
+`delete_external_files=True`.
+
+.. code-block:: python
+
+    >>> schema.external['external_raw'].delete(delete_external_files=True)
+
+.. note::
+
+    Setting `delete_external_files=True` will always attempt to delete
+    the underlying data file, and so should not typically be used with
+    the `filepath` datatype.
+

docs-parts/admin/5-blob-config_lang5.rst

Lines changed: 0 additions & 3 deletions
This file was deleted.

docs-parts/intro/Releases_lang1.rst

Lines changed: 26 additions & 0 deletions
@@ -1,3 +1,29 @@
+0.12.0 -- October 1, 2019
+-------------------------
+* Dropped support for Python 3.4
+* Support secure connections with TLS (aka SSL) PR #620
+* Convert numpy array from python object to appropriate data type if all elements are of the same type (#587) PR #608
+* Remove expression requirement to have additional attributes (#604) PR #604
+* Support for filepath datatype (#481) PR #603, #659
+* Support file attachment datatype (#480, #592, #637) PR #659
+* Fetch returns a dict array when specifying `as_dict=True` for specified attributes. (#595) PR #593
+* Support of ellipsis in `proj`: `query_expression.proj(..., '-movie')` (#499) PR #578
+* Expand support of blob serialization (#572, #520, #427, #392, #244, #594) PR #577
+* Support for alter (#110) PR #573
+* Support for `conda install datajoint` via `conda-forge` channel (#293)
+* `dj.conn()` accepts a `port` keyword argument (#563) PR #571
+* Support for UUID datatype (#562) PR #567
+* `query_expr.fetch("KEY", as_dict=False)` returns results as `np.recarray` (#414) PR #574
+* `dj.ERD` is now called `dj.Diagram` (#255, #546) PR #565
+* `dj.Diagram` underlines "distinguished" classes (#378) PR #557
+* Accept alias for supported MySQL datatypes (#544) PR #545
+* Support for pandas in `fetch` (#459, #537) PR #534
+* Support for ordering by "KEY" in `fetch` (#541) PR #534
+* Improved external storage - a migration script needed from version 0.11 (#467, #475, #480, #497) PR #532
+* Increase default display rows (#523) PR #526
+* Bugfixes (#521, #205, #279, #477, #570, #581, #597, #596, #618, #633, #643, #644, #647, #648, #650, #656)
+* Minor improvements (#538)
+
 0.11.1 -- Nov 15, 2018
 ----------------------
 * Fix ordering of attributes in proj (#483 and #516)
