Python Reference

base.config
base.dataset
- class Dataset
base.file
- class File
- class Files
base.hash
- func calc_file_hash
base.parser
- class Parser
base.project

check_project_exists()

function base.config.check_project_exists(user_id="string", project_name="string")

Check whether a project already exists.

Parameters

user_id (string) - required
- user id acquired from environment variable or config file
project_name (string) - required
- target project name

Returns

project_exists (bool)
- project already exists or not

→ Back to top

delete_project_config()

function base.config.delete_project_config(user_id="string", project_name="string")

Delete config of specified project.

Parameters

user_id (string) - required
- user id acquired from environment variable or config file
project_name (string) - required
- target project name

→ Back to top

get_access_key()

function base.config.get_access_key()

Get the access key from the config file. If the 'BASE_ACCESS_KEY' environment variable is set, Base uses it.

Returns

access_key (string)
- API access key acquired from environment variable or config file
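
The lookup described above can be pictured with a minimal sketch. It assumes the environment variable takes precedence over the config file; `resolve_access_key` and the `"access_key"` config key are hypothetical names used only for illustration, not part of Base.

```python
import os


def resolve_access_key(config, environ=None):
    """Return an access key, preferring BASE_ACCESS_KEY over the config value.

    `config` is a plain dict standing in for the parsed config file
    (hypothetical layout, not the real Base config format).
    """
    environ = os.environ if environ is None else environ
    key = environ.get("BASE_ACCESS_KEY")
    if key:
        return key
    return config.get("access_key")


# Hypothetical values, for illustration only.
config = {"access_key": "key-from-config"}
from_env = resolve_access_key(config, environ={"BASE_ACCESS_KEY": "key-from-env"})
from_file = resolve_access_key(config, environ={})
print(from_env)
print(from_file)
```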

→ Back to top

get_project_uid()

function base.config.get_project_uid(user_id="string", project_name="string")

Get project uid from project name.

Parameters

user_id (string) - required
- user id acquired from environment variable or config file
project_name (string) - required
- target project name

Returns

project_uid (string)
- project uid of given project name

→ Back to top

get_user_id()

function base.config.get_user_id()

Get the user id from the config file. If the 'BASE_USER_ID' environment variable is set, Base uses it.

Returns

user_id (string)
- user id acquired from environment variable or config file

→ Back to top

get_user_id_from_db()

function base.config.get_user_id_from_db(access_key="string")

Get user id from remote db.

Parameters

access_key (string) - required
- API access key acquired from environment variable or config file

Returns

user_id (string)
- user id acquired from database

→ Back to top

register_access_key()

function base.config.register_access_key(access_key="string")

Parameters

access_key (string) - required
- API access key

→ Back to top

register_project_uid()

function base.config.register_project_uid(user_id="string", project="string", project_uid="string")

Parameters

user_id (string) - required
- user id acquired from environment variable or config file
project (string) - required
- target project name
project_uid (string) - required
- target project uid

→ Back to top

register_user_id()

function base.config.register_user_id(user_id="string")

Parameters

user_id (string) - required
- target user id

→ Back to top

update_project_info()

function base.config.update_project_info(user_id="string")

Update local project info with remote.

Parameters

user_id (string) - required
- user id acquired from environment variable or config file

→ Back to top

Dataset class

class base.dataset.Dataset

This is a middle-level (numpy or other) interface for datasets in Base. The Dataset class receives a Files object as an argument and processes each data file with the specified transform function. Using this Dataset object, you can build a high-level (torch tensor or other) dataset interface, such as PyTorch's DataLoader.

import base

project = base.Project("project-name")
files = project.files(conditions="string", query=["string"], sort_key="string")
dataset = base.Dataset(files=files, target_key="string", transform=None|Callable)

These are the available attributes:

transform (Callable)
- preprocess function
target_key (string)
- object variable for modeling
files (Files)
- inherited dataset interface

These are the available methods:

train_test_split()

train_test_split()

x_train, x_test, y_train, y_test = dataset.train_test_split(split_rate=float)

This method splits the dataset into two subsets (train and test). You can adjust the split ratio with the split_rate option.

Parameters

split_rate (float) - default 0.25
- the ratio of test set

Returns

x_train (list)
- transformed train data
x_test (list)
- transformed test data
y_train (list)
- train label specified as target_key in Dataset class initialization
y_test (list)
- test label specified as target_key in Dataset class initialization
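
To make the meaning of split_rate concrete, here is a minimal stand-in that splits parallel lists of samples and labels. It is not the Base implementation: `split_pairs`, the shuffling step, and the `seed` argument are assumptions made for this sketch.

```python
import random


def split_pairs(samples, labels, split_rate=0.25, seed=0):
    """Split parallel lists into train/test parts; split_rate is the test share."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption of this sketch
    n_test = int(round(len(indices) * split_rate))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical data: 8 file names with string labels.
samples = [f"file_{i}.png" for i in range(8)]
labels = [str(i % 2) for i in range(8)]
x_train, x_test, y_train, y_test = split_pairs(samples, labels, split_rate=0.25)
print(len(x_train), len(x_test))
```

With 8 samples and split_rate=0.25, two samples go to the test set and six to the train set.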

Usage
Using the index operator [] on the Dataset class object, you can get the data transformed by user-defined preprocessing functions and label specified by target key.

def preprocess_func(path):
    image = Image.open(path)
    image = image.resize((28, 28))
    image = np.array(image)
    return image

test_files = Project("mnist").files(conditions="test")
test_dataset = Dataset(test_files, target_key="label", transform=preprocess_func)

print(test_dataset[0])
>>> (array([[  0,   0, ...]]), '7')

If transform is not specified, local path is returned by default.

test_files = Project("mnist").files(conditions="test")
test_dataset = Dataset(test_files, target_key="label")

print(test_dataset[0])
>>> '/Users/user/dataset/mnist/test/7/4815.png', '7'

For example:

You can get X and y using for loops as in the following example.

import numpy as np
from PIL import Image
from base import Dataset, Project

def preprocess_func(path):
    image = Image.open(path)
    image = image.resize((28, 28))
    image = np.array(image)
    return image

def get_image_and_label(dataset, idx):
    X, label = dataset[idx]  # label = "0" or "1" or "2", ...
    # add a channel axis (assumes grayscale images): (28, 28) -> (28, 28, 1)
    X = X[..., np.newaxis]
    y = int(label)
    # create one-hot vector
    y = np.eye(10)[y]
    return X, y

test_files = Project("mnist").files(conditions="test")
test_dataset = Dataset(test_files, target_key="label", transform=preprocess_func)

X_test = np.empty((len(test_dataset), 28, 28, 1))
y_test = np.empty((len(test_dataset), 10))
for i in range(len(test_dataset)):
    X_test[i], y_test[i] = get_image_and_label(test_dataset, i)
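
Because a Dataset supports len() and the index operator, a higher-level interface can be layered on top of it. The following is a minimal sketch of a batching iterator; `iter_batches` and `ToyDataset` are hypothetical names, and `ToyDataset` is only a stand-in so the example runs without Base installed.

```python
def iter_batches(dataset, batch_size):
    """Yield (samples, labels) batches from any object with len() and [] access."""
    batch_x, batch_y = [], []
    for i in range(len(dataset)):
        x, y = dataset[i]
        batch_x.append(x)
        batch_y.append(y)
        if len(batch_x) == batch_size:
            yield batch_x, batch_y
            batch_x, batch_y = [], []
    if batch_x:  # last, possibly smaller, batch
        yield batch_x, batch_y


class ToyDataset:
    """Stand-in for a Dataset object: returns (sample, label) pairs by index."""

    def __init__(self, pairs):
        self._pairs = pairs

    def __len__(self):
        return len(self._pairs)

    def __getitem__(self, idx):
        return self._pairs[idx]


toy = ToyDataset([(f"img_{i}.png", str(i)) for i in range(5)])
batches = list(iter_batches(toy, batch_size=2))
print(len(batches))
```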

If you use train_test_split(), y_train and y_test are lists of the strings obtained with target_key by default.

files = Project("mnist").files()
dataset = Dataset(files, target_key="label", transform=preprocess_func)
X_train, X_test, y_train, y_test = dataset.train_test_split(0.25)

print(y_train)
>>> ["1", "3", "4",...]

→ Back to top

File class

class base.files.File

Using the index operator [] on the Files class object, you can get the File class object at a specific index.

print(files[0])
>>> "/home/xxxx/dataset/mnist/0/12909.png"

These are the available attributes:

path (string)

local filepath.

For example:

files[0].path
>>> "/home/xxxx/dataset/mnist/0/12909.png"

metadata (dict)

whole dict of attributes (metadata) related to this file.

For example:

files[0].metadata
>>> {
        "dataType": "train",
        "label": "0",
        "id": "12909"
    }

attrs (string)
- attributes (metadata) related to this file.
For example:
```
files[0].label
>>> "0"

files[0].id
>>> "12909"
```

→ Back to top

Files class

class base.files.Files

This is a low-level (file path) interface for datasets in Base. A Files object contains the File instances that match your dataset filter.

import base

project = base.Project("project-name")
files = project.files(conditions="string", query=["string"], sort_key="string")

You can filter data files and get a Files object by passing criteria to the files method of base.Project.

Using the index operator [] on the Files class object, you can get the File class object at a specific index.

For example:

files[0]
>>> "/home/xxxx/dataset/mnist/0/12909.png"

files[0].label
>>> "0"

files[0].id
>>> "12909"

These are the available attributes:

project_name (string)
- registered project name
user_id (string)
- registered user id
project_uid (string)
- project unique hash
conditions (string) - default None
- value to search for files
query (list) - default []
- expression of key and value to search for files
sort_key (string) - default None
- key to sort files
files (list)
- list of File class objects

result (list)

list of metadata_dict filtered by criteria

[
		{
				"FilePath": String,
				"MetaKey1": ...,
				...
		},
		...
]

paths (list)
- list of local filepaths
```
[
		"String",
		...
]
```
items (list)
- list of metadata_dict other than filepath
```
[
		{
				"MetaKey1": ...,
				...
		},
		...
]
```

These are the available methods:

filter()

filter()

files = files.filter(conditions="string", query=["string"], sort_key="string")

This method applies an additional filter to an already filtered Files object. You can call it repeatedly.

Parameters

conditions (string) - optional
- value to search for files.
For example:
```
conditions="0"
```
If you want to search by multiple criteria, you must provide comma (,) separated strings.

For example:
```
conditions="0,1,2"
```
You will get files that meet at least one of the criteria.

Note

There must be no single-byte spaces between values.
query (list) - default []
- expression of key and value to search for files.
For example:
```
query=["label == 0"]
```
You can use ==, !=, >, >=, <, <=, is, is not, in, and not in as operators.

If you want to search by multiple criteria, you must provide the list of expressions. For example:
```
query=["label == 0", "id >= 10000"]
```
You will get files that meet all the criteria.

Note

A single-byte space is required before and after the operator.
sort_key (string) - optional
- key to sort files.
For example:
```
sort_key="label"
```

Returns

Files class
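
The difference between conditions (any value may match) and query (every expression must hold) can be sketched on plain metadata dicts. This is a simplified stand-in, not the Base implementation: `filter_records` and `matches_query` are hypothetical helpers, only the comparison operators are handled, and the numeric coercion rule is an assumption.

```python
import operator

OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def _coerce(value):
    """Compare numerically when possible, otherwise as strings (assumption)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)


def matches_query(record, expression):
    # A single space is required around the operator, as noted above.
    key, op, raw = expression.split(" ", 2)
    if key not in record:
        return False
    left, right = _coerce(record[key]), _coerce(raw)
    if type(left) is not type(right):
        left, right = str(record[key]), raw
    return OPERATORS[op](left, right)


def filter_records(records, conditions=None, query=()):
    values = conditions.split(",") if conditions else None
    result = []
    for record in records:
        # conditions: keep the record if ANY metadata value matches
        if values is not None and not any(str(v) in values for v in record.values()):
            continue
        # query: keep the record only if ALL expressions hold
        if all(matches_query(record, expr) for expr in query):
            result.append(record)
    return result


# Hypothetical metadata records.
records = [
    {"dataType": "train", "label": "0", "id": "12909"},
    {"dataType": "train", "label": "1", "id": "500"},
    {"dataType": "test", "label": "0", "id": "42"},
    {"dataType": "test", "label": "2", "id": "20000"},
]
any_match = filter_records(records, conditions="0,1")
all_match = filter_records(records, query=["label == 0", "id >= 10000"])
print(len(any_match), len(all_match))
```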

These are the available operators:

+ (concatenation)
| (union)

+ (concatenation)

Return a new Files object that is the concatenation of the two Files objects. You can chain this operator.

This operation does not remove duplicates. If both Files objects contain the same File object, the resulting Files object contains it twice.

Expression

concated_files = files1 + files2

# You can operate recursively.
concated_files = files1 + files2 + files3
concated_files2 = concated_files + files4

Examples

files1 = project.files(conditions="0,1,2", query=['dataType == test'], sort_key="id")
files2 = project.files(conditions="0,1,2", query=['dataType == train'], sort_key="id")

files = files1 + files2
print(files)
>>> ======Files======
    Files1(project_name='mnist', conditions='0,1,2', query=['dataType == test'], sort_key='id', file_num=3148)
    Files2(project_name='mnist', conditions='0,1,2', query=['dataType == train'], sort_key='id', file_num=18624)
    ===Expressions===
    Files1 + Files2

print(len(files))
>>> 21772

| (union)

Return a new Files object that is the union of the two Files objects. You can chain this operator.

This operation guarantees that all File objects in the resulting Files object are unique.

Expression

union_files = files1 | files2

# You can operate recursively.
union_files = files1 | files2 | files3
union_files2 = union_files | files4

Examples

files1 = project.files(conditions="0,1,2", sort_key="id")
files2 = project.files(conditions="0", sort_key="id")

files = files1 | files2
print(files)
>>> ======Files======
    Files1(project_name='mnist', conditions='0,1,2', query=[], sort_key='id', file_num=21772)
    Files2(project_name='mnist', conditions='0', query=[], sort_key='id', file_num=6905)
    ===Expressions===
    Files1 or Files2

print(len(files))
>>> 21772
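
The contrast between the two operators can be illustrated with plain Python lists of paths. This is only an analogy for the semantics described above; the ordering of a real union result is not specified here, and the order-preserving de-duplication is an assumption of the sketch.

```python
# Hypothetical file paths standing in for File objects.
files1 = ["a.png", "b.png", "c.png"]
files2 = ["a.png", "d.png"]

# "+" keeps duplicates: "a.png" appears twice.
concatenated = files1 + files2

# "|" keeps each element once (order-preserving de-duplication here).
union = list(dict.fromkeys(files1 + files2))

print(len(concatenated), len(union))
```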

→ Back to top

calc_file_hash()

function base.hash.calc_file_hash(path="string", algorithm="md5"|"sha224"|"sha256"|"sha384"|"sha512"|"sha1", split_chunk=False|True, chunk_size=int)

Calculate the hash value of a file.

Parameters

path (string) - required
- target file path
algorithm (string) - default "sha256"
- hash algorithm name
split_chunk (bool) - default True
- if True, split large file to byte chunks
chunk_size (integer) - default 2048
- block byte size of chunk

Returns

digest (string)
- hash string of the input file
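
Chunked hashing, as suggested by the split_chunk and chunk_size options, can be sketched with the standard hashlib module. `hash_file_in_chunks` is a hypothetical helper written for this example and is not the Base implementation; it shows that reading a file in blocks yields the same digest as hashing the whole content at once.

```python
import hashlib
import os
import tempfile


def hash_file_in_chunks(path, algorithm="sha256", chunk_size=2048):
    """Hash a file block by block so large files never sit fully in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Write a temporary file larger than one chunk, hash it, then clean up.
payload = b"base-sdk-example" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name
try:
    chunked_digest = hash_file_in_chunks(tmp_path)
finally:
    os.remove(tmp_path)

whole_digest = hashlib.sha256(payload).hexdigest()
print(chunked_digest == whole_digest)
```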

→ Back to top

Parser class

class base.parser.Parser

This is a file path parser. When you call the add_datafiles method of base.Project, Base initializes a Parser object with the specified parsing rule and tries to extract metadata from each file path with the __call__ method.

from base.parser import Parser

parser = Parser(parsing_rule="string", sep=None|"string")
result = parser(path="string")

__init__()

Initialize self with parsing_rule and generate parser.

base.parser.Parser(parsing_rule="string", sep=None|"string")

Replace unused strings with {_} in parsing_rule
Extract keys enclosed in {}

Example of processing method

1. parsing_rule: hoge{num1}/fuga{num2}.txt
    -> {hoge}/{num1}/{fuga}/{num2}.txt

2. {_}/{num1}/{_}/{num2}.txt
    -> ["_", "num1", "_", "num2"]

Parameter

parsing_rule (string) - required
- specified parsing rule
  ex.) {}/{name}/{timestamp}/{sensor}-{condition}{iteration}.csv
sep (string) - optional
- the separator of the file path

__call__()

Parse your target path.

parser(path="string")

Convert file path string to parsable format.
Extract values enclosed in {} in the parsable formatted path.
Generate a dictionary from keys and values extracted with parsing_rule.

Example of processing method

1. path: mnist/train/0/12909.png
    -> {mnist}/{train}/{0}/{12909}.png

2. parsable format: {mnist}/{train}/{0}/{12909}.png
    -> ["mnist", "train", "0", "12909"]

3. keys  : ["_", "dataType", "label", "id"]
   values: ["mnist", "train", "0", "12909"]
    -> {"dataType": "train", "label": "0", "id": "12909"}

Parameters

path (string) - required
- the file path

Return

parsed_dict (dict)
- meta data dictionary
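
The idea of turning a parsing rule into a metadata dictionary can be sketched with a regular expression. This is a simplified stand-in, not the Parser implementation: `compile_rule` and `parse_path` are hypothetical helpers, "/" is assumed to be the separator, and each {key} is assumed to stay within one path component.

```python
import re


def compile_rule(parsing_rule):
    """Turn a rule such as '{_}/{label}/{id}.png' into (regex, keys)."""
    keys, pattern, pos = [], "", 0
    for match in re.finditer(r"\{([^{}]*)\}", parsing_rule):
        # literal text between placeholders is matched verbatim
        pattern += re.escape(parsing_rule[pos:match.start()])
        pattern += "([^/]+?)"  # one placeholder captures part of one path component
        keys.append(match.group(1))
        pos = match.end()
    pattern += re.escape(parsing_rule[pos:])
    return re.compile(pattern), keys


def parse_path(parsing_rule, path):
    """Return the extracted metadata dict, or None if the path does not fit the rule."""
    regex, keys = compile_rule(parsing_rule)
    match = regex.fullmatch(path)
    if match is None:
        return None
    # "{_}" and "{}" mark parts of the path that are not kept
    return {k: v for k, v in zip(keys, match.groups()) if k not in ("", "_")}


rule = "{_}/{dataType}/{label}/{id}.png"
parsed = parse_path(rule, "mnist/train/0/12909.png")
unparsable = parse_path(rule, "mnist/train/12909.png")
print(parsed)
print(unparsable)
```

A None result here plays the role of a False from is_path_parsable(): the path has fewer components than the rule expects.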

These are the available methods:

is_path_parsable()
update_rule()

is_path_parsable()

Verify that the specified parsing rule works properly for the given path. If not, return False.

parser.is_path_parsable(path="string")

Parameter

path (string) - required
- the file path.

Return

parsable_flag (bool)
- True if the file path is parsable

update_rule()

Generate a parser that takes into account the number of splitters, based on the parsing example.

Use this method when is_path_parsable("your-path") returns False.

parser.update_rule(parsing_rule="string")

Parameters

parsing_rule (string) - required
- detail parsing rule.
  ex.) {Origin}/{train}/{2022_04_05}-{dog}_{a01}.png

→ Back to top

Project class

class base.project.Project

The base class of a project. You must initialize it with an existing project name. If you specify a project name that you don't have, you will get a ValueError. In that case, retry after calling the base.project.create_project function.

import base

project = base.Project("project-name")

These are the available attributes:

project_name (string)
- registered project name
user_id (string)
- registered user id
project_uid (string)
- project unique hash

These are the available methods:

add_datafile()
add_datafiles()
add_member()
add_metafile()
extract_metafile()
estimate_join_rule()
files()
get_members()
get_metadata_summary()
link_datafiles()
remove_member()
update_member()

add_datafile()

Import meta data of one file.

project.add_datafile(file_path="string", attributes={"string":"string"})

Calculate the file hash.
Create meta data record with the file hash and attributes.
Add that record into project database table.

{
	"FileHash": String,
	"MetaKey1": ...,
	...
}

Parameters

file_path (string) - required
- the file path
attributes (dict) - default {}
- the extra meta data (attributes)

Raises

Exception
- raises if something went wrong on uploading request to server

add_datafiles()

Import meta data related with datafile paths.

project.add_datafiles(dir_path="string", extension="string", attributes={"string":"string"}, parsing_rule="string", detail_parsing_rule="string")

Calculate the file hash.
Parse the file path with parsing-rule.
Create meta data records with the file hash, attributes, and parsed path data.
Add those records into the project database table.

{
	"FileHash": String,
	"MetaKey1": ...,
	...
}

Parameters

dir_path (string) - required
- the root directory path for datafiles
extension (string) - required
- the extension of datafiles
attributes (dict) - default {}
- the extra meta data (attributes) combined with whole datafiles
parsing_rule (string) - optional
- the rule for extracting meta data from datafile path ex.) {_}/{disease}/{patient-id}-{part}-{iteration}.png
detail_parsing_rule (string) - optional
- detail information about parsing rule ex.) {_}/{CancerA}/{1-123}-{1}-{100}.png

Returns

file_num (integer)
- number of imported datafiles

Raises

ValueError
- raises if invalid parsing rule was specified
Exception
- raises if something went wrong on uploading request to server

add_member()

Invite a new project member.

project.add_member(member="string", permission_level="string")

Parameters

member (string) - required
- the user id of new member
permission_level (string) - required
- new member's permission level
  - Viewer can only read meta data in the project database. A viewer cannot import data files or external files and cannot control the permissions of other members.
  - Editor can read and write meta data in the project database. An editor cannot control the permissions of other members.
  - Admin can read and write meta data in the project database. An admin can also control the permissions of other members, but cannot transfer the Owner permission level.

Raises

ValueError
- raises if invalid permission level was specified
Exception
- raises if something went wrong on invite request to server
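
The permission levels above can be summarized as a small lookup table. The sketch below is illustrative only: the `PERMISSIONS` table, its capability names, and `check_permission_level` are hypothetical and not part of Base. It mirrors the documented behavior of raising ValueError for an invalid level.

```python
# Capabilities per level, paraphrased from the descriptions above.
PERMISSIONS = {
    "Viewer": {"read": True, "write": False, "manage_members": False},
    "Editor": {"read": True, "write": True, "manage_members": False},
    "Admin": {"read": True, "write": True, "manage_members": True},
}


def check_permission_level(permission_level):
    """Return the capabilities of a level, or raise ValueError if it is unknown."""
    if permission_level not in PERMISSIONS:
        raise ValueError(f"invalid permission level: {permission_level!r}")
    return PERMISSIONS[permission_level]


editor = check_permission_level("Editor")
try:
    check_permission_level("Guest")
    rejected = False
except ValueError:
    rejected = True
print(editor, rejected)
```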

add_metafile()

Import meta data from external file.

project.add_metafile(file_path=["string"], attributes={"string":"string"})

Parameters

file_path (list) - required
- list of the external file paths
attributes (dict) - default {}
- the extra meta data (attributes) combined with whole datafiles

Raises

ValueError
- raises if specified external file is not csv or excel file
Exception
- raises if something went wrong on uploading request to server

extract_metafile()

Only extract meta data from an external file.

project.extract_metafile(file_path="string", attributes={"string":"string"})

Parameters

file_path (string) - required
- the external file path
attributes (dict) - default {}
- the extra meta data (attributes) combined with whole datafiles

Returns

tables (list)
- list of table data extracted from external file

[
    [
        {
            "MetaKey1": ...,
            "MetaKey2": ...,
            ...
        },
        ...
    ],
    ...
]
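
To visualize the nested shape of tables (a list of tables, each a list of row dicts), here is a minimal sketch that builds the same shape from CSV text with the standard csv module. `csv_to_table` is a hypothetical helper and the CSV content is made up; how Base itself reads csv or excel files is not shown here.

```python
import csv
import io

# Hypothetical CSV content standing in for an external metadata file.
csv_text = "id,label,weight\n12909,0,1.5\n4815,7,2.0\n"


def csv_to_table(text):
    """Read CSV text into a list of row dicts keyed by the header names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# One CSV file yields one table; the outer list can hold several tables.
tables = [csv_to_table(csv_text)]
print(tables[0][0])
```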

Raises

ValueError
- raises if specified external file is not csv or excel file
Exception
- raises if something went wrong on uploading request to server

estimate_join_rule()

Only estimate the join rule from external file and existing table.

project.estimate_join_rule(file_path="string", tables=list)

Parameters

Either file_path or tables are required. If both are specified, tables take precedence.

file_path (string)
- the external file path
tables (list)
- output of base.Project().extract_metafile() method

Returns

join_rule (list)
- list of the join rule estimated from external file and existing table.

[
        {
            "new key1":"exist key1" ...,
            ...
        },
        ...
]

Raises

ValueError
- raises if specified external file is not csv or excel file
Exception
- raises if something went wrong on uploading request to server

files()

Return the Files class. You can filter files easily and simply by specified criteria.

files = project.files(conditions="string", query=["string"], sort_key="string")

Parameters

conditions (string) - optional
- value to search for files
For example:
```
conditions="0"
```
If you want to search by multiple criteria, you must provide comma (,) separated strings.

For example:
```
conditions="0,1,2"
```
You will get files that meet at least one of the criteria.
query (list) - default []
- expression of key and value to search for files
For example:
```
query=["label == 0"]
```
You can use ==, !=, >, >=, <, <=, is, is not, in, and not in as operator.

If you want to search by multiple criteria, you must provide the list of expressions.

For example:
```
query=["label == 0", "id >= 10000"]
```
You will get files that meet all the criteria.

Note

A single-byte space is required before and after the operator.
sort_key (string) - optional
- key to sort files.
For example:
```
sort_key="label"
```

Returns

Files class

get_members()

Get list of project members.

project.get_members()

Returns

member_list (list)
- list of each members information

[
    {
        "UserID": String,
        "UserRole": String,
        "CreatedTime": String of unix time
    },
    ...
]
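
Since CreatedTime is returned as a string of unix time, it usually needs converting before display or sorting. The sketch below assumes the value is in seconds (not milliseconds) and uses made-up member records; `created_at` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Hypothetical member records in the shape shown above.
members = [
    {"UserID": "user-b", "UserRole": "Viewer", "CreatedTime": "1650086400"},
    {"UserID": "user-a", "UserRole": "Owner", "CreatedTime": "1650000000"},
]


def created_at(member):
    """Convert the unix-time string (assumed to be seconds) to an aware datetime."""
    return datetime.fromtimestamp(float(member["CreatedTime"]), tz=timezone.utc)


oldest_first = sorted(members, key=created_at)
for member in oldest_first:
    print(member["UserID"], member["UserRole"], created_at(member).isoformat())
```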

Raises

Exception
- raises if something went wrong with request to server

get_metadata_summary()

Get list of meta data information.

project.get_metadata_summary()

Returns

key_list (list)
- list of information about each metadata key

[
    {
        "KeyHash": String,
        "KeyName": String,
        "ValueHash": String,
        "ValueType": String,
        "RecordedCount": Integer,
        "UpperValue": String,
        "LowerValue": String,
        "CreatedTime": String of unix time,
        "LastModifiedTime": String of unix time,
        "Creator": String,
        "LastEditor": String,
        "EditerList": List of String
    },
    ...
]

Raises

Exception
- raises if something went wrong with request to server

link_datafiles()

Create linker metadata to local datafiles.

project.link_datafiles(dir_path="string", extension="string")

Parameters

dir_path (string) - required
- the root directory path for datafiles
extension (string) - required
- the extension of datafiles

Returns

file_num (integer)
- number of linked datafiles

remove_member()

Remove project member.

project.remove_member(member=["string"]|"string")

Parameters

member (list or string) - required
- the target member for removing

Raises

Exception
- raises if something went wrong on removing request to server

update_member()

Update project member's permission.

project.update_member(member="string", permission_level="Viewer"|"Editor"|"Admin"|"Owner")

Parameters

member (string) - required
- the user id of existing member
permission_level (string) - required
- member's permission level for update
  - Viewer can only read meta data in the project database. A viewer cannot import data files or external files and cannot control the permissions of other members.
  - Editor can read and write meta data in the project database. An editor cannot control the permissions of other members.
  - Admin can read and write meta data in the project database. An admin can also control the permissions of other members, but cannot transfer the Owner permission level.
  - Owner can transfer the owner permission to others and delete the project completely.

Raises

ValueError
- raises if invalid permission level was specified
Exception
- raises if something went wrong on invite request to server

→ Back to top

archive_project()

function base.project.archive_project(user_id="string", project_name="string")

Archive project.

Parameters

user_id (string) - required
- registered user id
project_name (string) - required
- project name you want to archive

Raises

Exception
- raises if something went wrong on request to server

create_project()

function base.project.create_project(user_id="string", project_name="string", private=True|False)

Parameters

user_id (string) - required
- registered user id
project_name (string) - required
- project name which you want to create
private (bool) - default True
- specifies whether or not to allow public access into the project

Returns

project_uid (string)
- project unique hash

Raises

Exception
- raises if something went wrong on request to server

delete_project()

function base.project.delete_project(user_id="string", project_name="string")

Delete project.

Parameters

user_id (string) - required
- registered user id
project_name (string) - required
- archived project name you want to delete

Raises

Exception
- raises if something went wrong on request to server

get_projects()

function base.project.get_projects(user_id="string", archived=False|True)

Get list of projects.

Parameters

user_id (string) - required
- registered user id
archived (bool) - default False
- if False, return projects that are not archived. If True, return archived projects

Returns

project_list (list)
- list of the names of your projects

Raises

Exception
- raises if something went wrong on request to server

→ Back to top

summarize_keys_information()

function base.project.summarize_keys_information(metadata_summary="list")

Summarize information of keys on project for printing.

Parameters

metadata_summary (list) - required
- output of the base.Project().get_metadata_summary() method

Returns

summary_for_print (dict)
- summarized key information for printing

{
    "MaxRecordedCount": Integer,
    "UniqueKeyCount": Integer,
    "MaxCharCount": {
        "KEY NAME": Integer,
        "VALUE RANGE": Integer,
        "VALUE TYPE": Integer,
        "RECORDED COUNT": Integer
    },
    "Keys": [
        (
            KeyName: String,
            ValueRange: String,
            ValueType: String,
            RecordedCount: String
        ),
        ...
    ]
}
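
One plausible use of this structure is to print an aligned table. The sketch below assumes that the MaxCharCount values are meant as column widths, which is an interpretation rather than documented behavior; `format_summary` and the sample values are hypothetical.

```python
# Hypothetical summary in the shape shown above.
summary_for_print = {
    "MaxRecordedCount": 70000,
    "UniqueKeyCount": 2,
    "MaxCharCount": {
        "KEY NAME": 8,
        "VALUE RANGE": 13,
        "VALUE TYPE": 10,
        "RECORDED COUNT": 14,
    },
    "Keys": [
        ("label", "0 ~ 9", "str", "70000"),
        ("dataType", "test ~ train", "str", "70000"),
    ],
}


def format_summary(summary):
    """Build aligned text lines: one header row plus one row per key."""
    columns = list(summary["MaxCharCount"].keys())
    # never make a column narrower than its own header
    widths = [max(len(name), summary["MaxCharCount"][name]) for name in columns]
    lines = ["  ".join(name.ljust(w) for name, w in zip(columns, widths))]
    for row in summary["Keys"]:
        lines.append("  ".join(str(cell).ljust(w) for cell, w in zip(row, widths)))
    return lines


lines = format_summary(summary_for_print)
for line in lines:
    print(line)
```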

→ Back to top
