- base.config
- base.dataset
- base.file
- base.hash
- base.parser
- base.project
function base.config.check_project_exists(user_id="string", project_name="string")
Check whether the project already exists.
Parameters
- user_id (string) - required
- user id acquired from an environment variable or the config file
- project_name (string) - required
- target project name
Returns
- project_exists (bool)
- whether the project already exists
function base.config.delete_project_config(user_id="string", project_name="string")
Delete the config of the specified project.
Parameters
- user_id (string) - required
- user id acquired from an environment variable or the config file
- project_name (string) - required
- target project name
function base.config.get_access_key()
Get the access key from the config file. If the 'BASE_ACCESS_KEY' environment variable is set, Base uses it instead.
Returns
- access_key (string)
- API access key acquired from an environment variable or the config file
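The lookup order described above (environment variable first, then the config file) can be sketched as follows. This is only an illustration, not Base's implementation: resolve_access_key, the config dict, and its "access_key" field are hypothetical names. Only the BASE_ACCESS_KEY variable name comes from this reference.

```python
import os


def resolve_access_key(config, env=os.environ):
    """Return the access key, preferring the BASE_ACCESS_KEY environment variable.

    `config` is a hypothetical dict standing in for the parsed config file.
    """
    key = env.get("BASE_ACCESS_KEY")
    if key:
        # The environment variable wins when it is set and non-empty.
        return key
    # Otherwise fall back to the value stored in the config (may be None).
    return config.get("access_key")
```

Passing env explicitly keeps the function easy to test without modifying the real process environment.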
function base.config.get_project_uid(user_id="string", project_name="string")
Get the project uid from the project name.
Parameters
- user_id (string) - required
- user id acquired from an environment variable or the config file
- project_name (string) - required
- target project name
Returns
- project_uid (string)
- project uid of the given project name
function base.config.get_user_id()
Get the user id from the config file. If the 'BASE_USER_ID' environment variable is set, Base uses it instead.
Returns
- user_id (string)
- user id acquired from an environment variable or the config file
function base.config.get_user_id_from_db(access_key="string")
Get the user id from the remote database.
Parameters
- access_key (string) - required
- API access key acquired from an environment variable or the config file
Returns
- user_id (string)
- user id acquired from the database
function base.config.register_access_key(access_key="string")
Register the access key in the local config file.
Parameters
- access_key (string) - required
- API access key
function base.config.register_project_uid(user_id="string", project="string", project_uid="string")
Register the project uid in the local config file.
Parameters
- user_id (string) - required
- user id acquired from an environment variable or the config file
- project (string) - required
- target project name
- project_uid (string) - required
- target project uid
function base.config.register_user_id(user_id="string")
Register the user id in the local config file.
Parameters
- user_id (string) - required
- target user id
function base.config.update_project_info(user_id="string")
Update the local project info with the remote one.
Parameters
- user_id (string) - required
- user id acquired from an environment variable or the config file
class base.dataset.Dataset
This is a middle-level (NumPy or other) interface for datasets in Base. The Dataset class receives a Files object as an argument and processes each data file with the specified transform function. Using this Dataset object, you can build a high-level (torch tensor or other) dataset interface, such as PyTorch's DataLoader.
import base
project = base.Project("project-name")
files = project.files(conditions="string", query=["string"], sort_key="string")
dataset = base.Dataset(files=files, target_key="string", transform=None|Callable)
These are the available attributes:
- transform (Callable)
- preprocess function
- target_key (string)
- object variable for modeling
- files (Files)
- inherited dataset interface
These are the available methods:
x_train, x_test, y_train, y_test = dataset.train_test_split(split_rate=float)
This method splits the dataset into two parts, a train set and a test set. You can adjust the split ratio with the split_rate option.
Parameters
- split_rate (float) - default 0.25
- the ratio of test set
Returns
- x_train (list)
- transformed train data
- x_test (list)
- transformed test data
- y_train (list)
- train label specified as target_key in Dataset class initialization
- y_test (list)
- test label specified as target_key in Dataset class initialization
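To make the meaning of split_rate concrete, here is a small sketch of a ratio-based split over plain lists. It is not the code behind train_test_split: split_by_rate is a hypothetical helper, and both the rounding (round down) and the absence of shuffling are assumptions made for the illustration.

```python
def split_by_rate(samples, labels, split_rate=0.25):
    """Split two parallel lists into train and test parts.

    split_rate is the share of items that goes to the test set.
    """
    if not 0.0 <= split_rate <= 1.0:
        raise ValueError("split_rate must be between 0 and 1")
    n_test = int(len(samples) * split_rate)  # assumption: round down
    n_train = len(samples) - n_test
    # Assumption: no shuffling; the last n_test items form the test set.
    return (samples[:n_train], samples[n_train:],
            labels[:n_train], labels[n_train:])


x_train, x_test, y_train, y_test = split_by_rate(
    list(range(8)), list("abcdefgh"), split_rate=0.25
)
```

With 8 items and split_rate=0.25, 6 items land in the train part and 2 in the test part.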
Usage
Using the index operator [] on a Dataset object, you get the data transformed by the user-defined preprocessing function and the label specified by target_key.
import numpy as np
from PIL import Image
from base import Dataset, Project
def preprocess_func(path):
    # load the image, resize it to 28x28, and convert it to a NumPy array
    image = Image.open(path)
    image = image.resize((28, 28))
    image = np.array(image)
    return image
test_files = Project("mnist").files(conditions="test")
test_dataset = Dataset(test_files, target_key="label", transform=preprocess_func)
print(test_dataset[0])
>>> (array([[ 0, 0, ...]]), '7')
If transform is not specified, the local path is returned by default.
test_files = Project("mnist").files(conditions="test")
test_dataset = Dataset(test_files, target_key="label")
print(test_dataset[0])
>>> '/Users/user/dataset/mnist/test/7/4815.png', '7'
You can get X and y using a for loop, as in the following example.
def preprocess_func(path):
    image = Image.open(path)
    image = image.resize((28, 28))
    image = np.array(image)
    return image
def get_image_and_label(dataset, idx):
    X, label = dataset[idx]  # label = "0" or "1" or "2", ...
    # add a channel axis so X fits the (28, 28, 1) slots of X_test
    # (assumes grayscale images, i.e. np.array(image) has shape (28, 28))
    X = X[..., np.newaxis]
    y = int(label)
    # create one-hot vector
    y = np.eye(10)[y]
    return X, y
test_files = Project("mnist").files(conditions="test")
test_dataset = Dataset(test_files, target_key="label", transform=preprocess_func)
X_test = np.empty((len(test_dataset), 28, 28, 1))
y_test = np.empty((len(test_dataset), 10))
for i in range(len(test_dataset)):
    X_test[i], y_test[i] = get_image_and_label(test_dataset, i)
If you use train_test_split(), y_train and y_test are lists of the strings obtained with target_key by default.
files = Project("mnist").files()
dataset = Dataset(files, target_key="label", transform=preprocess_func)
X_train, X_test, y_train, y_test = dataset.train_test_split(0.25)
print(y_train)
>>> ["1", "3", "4",...]
class base.files.File
Using the index operator [] on a Files object, you can get the File object at a specific index.
print(files[0])
>>> "/home/xxxx/dataset/mnist/0/12909.png"
These are the available attributes:
- path (string)
- local filepath
For example:
files[0].path
>>> "/home/xxxx/dataset/mnist/0/12909.png"
- metadata (dict)
- whole dict of attributes (metadata) related to this file
For example:
files[0].metadata
>>> { "dataType": "train", "label": "0", "id": "12909" }
- attrs (string)
- individual attributes (metadata) related to this file
For example:
files[0].label
>>> "0"
files[0].id
>>> "12909"
class base.files.Files
This is a low-level (file path) interface for datasets in Base. A Files object contains the File instances that match your dataset filter.
import base
project = base.Project("project-name")
files = project.files(conditions="string", query=["string"], sort_key="string")
You can filter data files and get a Files object by specifying criteria in the files method of base.Project.
Using the index operator [] on a Files object, you can get the File object at a specific index.
For example:
files[0]
>>> "/home/xxxx/dataset/mnist/0/12909.png"
files[0].label
>>> "0"
files[0].id
>>> "12909"
These are the available attributes:
- project_name (string)
- registered project name
- user_id (string)
- registered user id
- project_uid (string)
- project unique hash
- conditions (string) - default None
- value to search for files
- query (list) - default []
- expression of key and value to search for files
- sort_key (string) - default None
- key to sort files
- files (list)
- list of File class objects
- result (list)
- list of metadata_dict filtered by criteria
[ { "FilePath": String, "MetaKey1": ..., ... }, ... ]
- paths (list)
- list of local filepaths
[ "String", ... ]
- items (list)
- list of metadata_dict other than filepath
[ { "MetaKey1": ..., ... }, ... ]
These are the available methods:
files = files.filter(conditions="string", query=["string"], sort_key="string")
This method applies an additional filter to an already filtered Files object. You can call this method repeatedly.
Parameters
- conditions (string) - optional
- value to search for files.
For example:
conditions="0"
If you want to search by multiple criteria, you must provide comma (,) separated strings.
For example:
conditions="0,1,2"
You will get files that meet at least one of the criteria.
Note
There must be no single-byte spaces between values.
- query (list) - default []
- expression of key and value to search for files.
For example:
query=["label == 0"]
You can use ==, !=, >, >=, <, <=, is, is not, in, and not in as operators.
If you want to search by multiple criteria, you must provide a list of expressions.
For example:
query=["label == 0", "id >= 10000"]
You will get files that meet all the criteria.
Note
A single-byte space is required before and after the operator.
- sort_key (string) - optional
- key to sort files.
For example:
sort_key="label"
Returns
- Files class
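The two matching modes described above (conditions match when at least one value matches, query matches only when every expression holds) can be sketched on plain metadata dicts. This is a hypothetical illustration, not Base's filtering code: matches_conditions, matches_query, and the numeric-first comparison are assumptions, and only the comparison operators are handled.

```python
import operator

# Subset of the documented operators, mapped to Python functions.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def _coerce(value):
    """Try to read a value as a number; otherwise keep it as a string."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return str(value)


def matches_conditions(meta, conditions):
    """True if any metadata value equals one of the comma-separated values."""
    if not conditions:
        return True
    wanted = set(conditions.split(","))
    return any(str(v) in wanted for v in meta.values())


def matches_query(meta, query):
    """True only if every 'key op value' expression holds."""
    for expr in query:
        key, op, value = expr.split(" ", 2)  # single spaces around the operator
        if key not in meta:
            return False
        left, right = _coerce(meta[key]), _coerce(value)
        if type(left) is not type(right):
            # Mixed number/string: fall back to comparing as strings.
            left, right = str(meta[key]), value
        if not OPS[op](left, right):
            return False
    return True


records = [
    {"dataType": "train", "label": "0", "id": "12909"},
    {"dataType": "test", "label": "1", "id": "4815"},
    {"dataType": "train", "label": "2", "id": "20001"},
]
selected = [
    m for m in records
    if matches_conditions(m, "0,1") and matches_query(m, ["id >= 10000"])
]
```

Here only the first record survives: the second fails the query and the third matches none of the condition values.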
These are the available operators:
The + operator returns a new Files object that is the concatenation of two Files objects. You can chain this operator.
This operation does not remove duplicates. If both Files objects contain the same File object, the resulting Files object contains it twice.
Expression
concated_files = files1 + files2
# You can chain the operator.
concated_files = files1 + files2 + files3
concated_files2 = concated_files + files4
Examples
files1 = project.files(conditions="0,1,2", query=['dataType == test'], sort_key="id")
files2 = project.files(conditions="0,1,2", query=['dataType == train'], sort_key="id")
files = files1 + files2
print(files)
>>> ======Files======
Files1(project_name='mnist', conditions='0,1,2', query=['dataType == test'], sort_key='id', file_num=3148)
Files2(project_name='mnist', conditions='0,1,2', query=['dataType == train'], sort_key='id', file_num=18624)
===Expressions===
Files1 + Files2
print(len(files))
>>> 21772
The | operator returns a new Files object that is the union of two Files objects. You can chain this operator.
This operation guarantees that all File objects in the resulting Files object are unique.
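The difference between concatenation and union can be illustrated with plain lists of file paths. This is only an analogy for the behavior described here, not the Files implementation. Keeping the first-seen order in the union is an assumption of the sketch.

```python
files1 = ["a.png", "b.png", "c.png"]
files2 = ["c.png", "d.png"]

# Concatenation keeps duplicates: "c.png" appears twice.
concatenated = files1 + files2

# Union keeps each path once (dict.fromkeys preserves first-seen order).
union = list(dict.fromkeys(files1 + files2))
```

The concatenation has five entries and the union has four, which mirrors the duplicate handling of the two operators.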
Expression
union_files = files1 | files2
# You can operate recursively.
union_files = files1 | files2 | files3
union_files2 = union_files | files4
Examples
files1 = project.files(conditions="0,1,2", sort_key="id")
files2 = project.files(conditions="0", sort_key="id")
files = files1 | files2
print(files)
>>> ======Files======
Files1(project_name='mnist', conditions='0,1,2', query=[], sort_key='id', file_num=21772)
Files2(project_name='mnist', conditions='0', query=[], sort_key='id', file_num=6905)
===Expressions===
Files1 or Files2
print(len(files))
>>> 21772
function base.hash.calc_file_hash(path="string", algorithm="md5"|"sha224"|"sha256"|"sha384"|"sha512"|"sha1", split_chunk=False|True, chunk_size=int)
Calculate the hash value of a file.
Parameters
- path (string) - required
- target file path
- algorithm (string) - default "sha256"
- hash algorithm name
- split_chunk (bool) - default True
- if True, split a large file into byte chunks
- chunk_size (integer) - default 2048
- block byte size of a chunk
Returns
- digest (string)
- hash string of the input file
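Chunked hashing with the parameters listed above can be sketched with the standard hashlib module. This is not the source of calc_file_hash: file_digest is a hypothetical stand-in that mirrors the documented parameter names and defaults, and the temporary file exists only for the demonstration.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", split_chunk=True, chunk_size=2048):
    """Return the hex digest of a file, optionally reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        if split_chunk:
            # Read fixed-size blocks so large files never sit fully in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hasher.update(chunk)
        else:
            hasher.update(f.read())
    return hasher.hexdigest()


# Demonstration on a throwaway file: both modes give the same digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 5000)
    demo_path = tmp.name
chunked = file_digest(demo_path, split_chunk=True)
whole = file_digest(demo_path, split_chunk=False)
os.remove(demo_path)
```

Chunking changes only memory use, not the result, because a hash object accumulates the same byte stream either way.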
class base.parser.Parser
This is a file path parser. When you call the add_datafiles method of base.Project, Base initializes a Parser object with the specified parsing rule and tries to extract meta data from each file path with the __call__ method.
from base.parser import Parser
parser = Parser(parsing_rule="string", sep=None|"string")
result = parser(path="string")
Initialize self with parsing_rule and generate the parser.
base.parser.Parser(parsing_rule="string", sep=None|"string")
- Replace unused strings with {_} in parsing_rule
- Extract keys enclosed in {}
- Example of processing method
1. parsing_rule: hoge{num1}/fuga{num2}.txt -> {hoge}/{num1}/{fuga}/{num2}.txt
2. {_}/{num1}/{_}/{num2}.txt -> ["_", "num1", "_", "num2"]
Parameter
- parsing_rule (string) - required
- specified parsing rule
ex.) {}/{name}/{timestamp}/{sensor}-{condition}{iteration}.csv
- sep (string) - optional
- the separator of the file path
Parse your target path.
parser(path="string")
- Convert the file path string to a parsable format.
- Extract values enclosed in {} in the parsable formatted path.
- Generate a dictionary from the keys and values extracted with parsing_rule.
- Example of processing method
1. path: mnist/train/0/12909.png -> {mnist}/{train}/{0}/{12909}.png
2. parsable format: {mnist}/{train}/{0}/{12909}.png -> ["mnist", "train", "0", "12909"]
3. keys : ["_", "dataType", "label", "id"]
values: ["mnist", "train", "0", "12909"]
-> {"dataType": "train", "label": "0", "id": "12909"}
Parameters
- path (string) - required
- the file path
Return
- parsed_dict (dict)
- meta data dictionary
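The idea of turning a parsing rule into keys and then pairing them with values from a path can be sketched with a regular expression. This is an alternative technique chosen for brevity, not the brace-conversion procedure that Parser uses. build_parser is a hypothetical helper, and it assumes that no extracted value contains a "/".

```python
import re


def build_parser(parsing_rule):
    """Compile a rule such as '{_}/{label}/{id}.png' into a parse function."""
    keys, pattern, pos = [], "", 0
    for m in re.finditer(r"\{([^{}]*)\}", parsing_rule):
        # Literal text between placeholders must match exactly.
        pattern += re.escape(parsing_rule[pos:m.start()]) + r"([^/]+?)"
        keys.append(m.group(1) or "_")  # '{}' is treated like '{_}'
        pos = m.end()
    pattern += re.escape(parsing_rule[pos:])
    regex = re.compile(pattern + r"$")

    def parse(path):
        match = regex.search(path)
        if match is None:
            return None  # the path does not fit the rule
        # Drop values captured by the unused '_' placeholder.
        return {k: v for k, v in zip(keys, match.groups()) if k != "_"}

    return parse


parse = build_parser("{_}/{dataType}/{label}/{id}.png")
parsed = parse("mnist/train/0/12909.png")
unparsable = parse("mnist/train/0.png")
```

For the path used in the processing example above, parsed holds the same three keys (dataType, label, id), while a path with too few components yields None.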
These are the available methods:
Verify that the specified parsing rule works properly for a path. If not, return False.
parser.is_path_parsable(path="string")
Parameter
- path (string) - required
- the file path.
Return
- parsable_flag (bool)
- True if the file path is parsable
Generate a parser that takes into account the number of splitters, based on a parsing example.
Use this method when is_path_parsable("your-path") returns False.
parser.update_rule(parsing_rule="string")
Parameters
- parsing_rule (string) - required
- detail parsing rule.
ex.) {Origin}/{train}/{2022_04_05}-{dog}_{a01}.png
class base.project.Project
The base class for a project. You have to initialize it with an existing project name. If you specify a project name that you do not have, you will get a ValueError. In that case, retry after calling the base.project.create_project function.
import base
project = base.Project("project-name")
These are the available attributes:
- project_name (string)
- registered project name
- user_id (string)
- registered user id
- project_uid (string)
- project unique hash
These are the available methods:
- add_datafile()
- add_datafiles()
- add_member()
- add_metafile()
- extract_metafile()
- estimate_join_rule()
- files()
- get_members()
- get_metadata_summary()
- link_datafiles()
- remove_member()
- update_member()
Import meta data of one file.
project.add_datafile(file_path="string", attributes={"string":"string"})
- Calculate the file hash.
- Create meta data record with the file hash and attributes.
- Add that record into project database table.
{
"FileHash": String,
"MetaKey1": ...,
...
}
Parameters
- file_path (string) - required
- the file path
- attributes (dict) - default {}
- the extra meta data (attributes)
Raises
- Exception
- raises if something went wrong on uploading request to server
Import meta data related with datafile paths.
project.add_datafiles(dir_path="string", extension="string", attributes={"string":"string"}, parsing_rule="string", detail_parsing_rule="string")
- Calculate the file hash.
- Parse the file path with parsing_rule.
- Create meta data records with the file hash, attributes, and parsed path data.
- Add those records into the project database table.
{
"FileHash": String,
"MetaKey1": ...,
...
}
Parameters
- dir_path (string) - required
- the root directory path for datafiles
- extension (string) - required
- the extension of datafiles
- attributes (dict) - default {}
- the extra meta data (attributes) combined with whole datafiles
- parsing_rule (string) - optional
- the rule for extracting meta data from datafile path ex.) {_}/{disease}/{patient-id}-{part}-{iteration}.png
- detail_parsing_rule (string) - optional
- detail information about parsing rule ex.) {_}/{CancerA}/{1-123}-{1}-{100}.png
Returns
- file_num (integer)
- number of imported datafiles
Raises
- ValueError
- raises if invalid parsing rule was specified
- Exception
- raises if something went wrong on uploading request to server
Invite a new project member.
project.add_member(member="string", permission_level="string")
Parameters
- member (string) - required
- the user id of the new member
- permission_level (string) - required
- new member's permission level
- Viewer can only read meta data in the project database. A Viewer cannot import data files or external files and cannot control the permissions of other members.
- Editor can read and write meta data in the project database. An Editor cannot control the permissions of other members.
- Admin can read and write meta data in the project database. An Admin can also control the permissions of other members, but cannot transfer the Owner permission level.
Raises
- ValueError
- raises if invalid permission level was specified
- Exception
- raises if something went wrong on invite request to server
Import meta data from external file.
project.add_metafile(file_path=["string"], attributes={"string":"string"})
Parameters
- file_path (list) - required
- list of the external file paths
- attributes (dict) - default {}
- the extra meta data (attributes) combined with whole datafiles
Raises
- ValueError
- raises if specified external file is not csv or excel file
- Exception
- raises if something went wrong on uploading request to server
Only extract meta data from an external file.
project.extract_metafile(file_path="string", attributes={"string":"string"})
Parameters
- file_path (string) - required
- the external file path
- attributes (dict) - default {}
- the extra meta data (attributes) combined with whole datafiles
Returns
- tables (list)
- list of table data extracted from external file
[
[
{
"MetaKey1": ...,
"MetaKey2": ...,
...
},
...
],
...
]
Raises
- ValueError
- raises if specified external file is not csv or excel file
- Exception
- raises if something went wrong on uploading request to server
Only estimate the join rule between an external file and the existing table.
project.estimate_join_rule(file_path="string", tables=list)
Parameters
Either file_path or tables is required. If both are specified, tables takes precedence.
- file_path (string)
- the external file path
- tables (list)
- output of base.Project().extract_metafile() method
Returns
- join_rule (list)
- list of the join rule estimated from external file and existing table.
[
{
"new key1":"exist key1" ...,
...
},
...
]
Raises
- ValueError
- raises if specified external file is not csv or excel file
- Exception
- raises if something went wrong on uploading request to server
Return the Files class.
You can filter files easily and simply by specified criteria.
files = project.files(conditions="string", query=["string"], sort_key="string")
Parameters
- conditions (string) - optional
- value to search for files
For example:
conditions="0"
If you want to search by multiple criteria, you must provide comma (,) separated strings.
For example:
conditions="0,1,2"
You will get files that meet at least one of the criteria.
- query (list) - default []
- expression of key and value to search for files
For example:
query=["label == 0"]
You can use ==, !=, >, >=, <, <=, is, is not, in, and not in as operators.
If you want to search by multiple criteria, you must provide a list of expressions.
For example:
query=["label == 0", "id >= 10000"]
You will get files that meet all the criteria.
Note
A single-byte space is required before and after the operator.
- sort_key (string) - optional
- key to sort files.
For example:
sort_key="label"
Returns
- Files class
Get list of project members.
project.get_members()
Returns
- member_list (list)
- list of each member's information
[
{
"UserID": String,
"UserRole": String,
"CreatedTime": String of unix time
},
...
]
Raises
- Exception
- raises if something went wrong with request to server
Get list of meta data information.
project.get_metadata_summary()
Returns
- key_list (list)
- list of each key's information
[
{
"KeyHash": String,
"KeyName": String,
"ValueHash": String,
"ValueType": String,
"RecordedCount": Integer,
"UpperValue": String,
"LowerValue": String,
"CreatedTime": String of unix time,
"LastModifiedTime": String of unix time,
"Creator": String,
"LastEditor": String,
"EditerList": List of String
},
...
]
Raises
- Exception
- raises if something went wrong with request to server
Create linker meta data for local datafiles.
project.link_datafiles(dir_path="string", extension="string")
Parameters
- dir_path (string) - required
- the root directory path for datafiles
- extension (string) - required
- the extension of datafiles
Returns
- file_num (integer)
- number of linked datafiles
Remove project member.
project.remove_member(member=["string"]|"string")
Parameters
- member (list or string) - required
- the member or members to remove
Raises
- Exception
- raises if something went wrong on removing request to server
Update project member's permission.
project.update_member(member="string", permission_level="Viewer"|"Editor"|"Admin"|"Owner")
Parameters
- member (string) - required
- the user id of the existing member
- permission_level (string) - required
- the member's new permission level
- Viewer can only read meta data in the project database. A Viewer cannot import data files or external files and cannot control the permissions of other members.
- Editor can read and write meta data in the project database. An Editor cannot control the permissions of other members.
- Admin can read and write meta data in the project database. An Admin can also control the permissions of other members, but cannot transfer the Owner permission level.
- Owner can transfer the Owner permission to others and delete the project completely.
Raises
- ValueError
- raises if invalid permission level was specified
- Exception
- raises if something went wrong on invite request to server
function base.project.archive_project(user_id="string", project_name="string")
Archive a project.
Parameters
- user_id (string) - required
- registered user id
- project_name (string) - required
- name of the project you want to archive
Raises
- Exception
- raises if something went wrong on request to server
function base.project.create_project(user_id="string", project_name="string", private=True|False)
Create a project.
Parameters
- user_id (string) - required
- registered user id
- project_name (string) - required
- name of the project you want to create
- private (bool) - default True
- specifies whether or not to allow public access into the project
Returns
- project_uid (string)
- project unique hash
Raises
- Exception
- raises if something went wrong on request to server
function base.project.delete_project(user_id="string", project_name="string")
Delete a project.
Parameters
- user_id (string) - required
- registered user id
- project_name (string) - required
- name of the archived project you want to delete
Raises
- Exception
- raises if something went wrong on request to server
function base.project.get_projects(user_id="string", archived=False|True)
Get the list of projects.
Parameters
- user_id (string) - required
- registered user id
- archived (bool) - default False
- if False, return projects that are not archived. If True, return archived projects
Returns
- project_list (list)
- list of the names of the projects you have
Raises
- Exception
- raises if something went wrong on request to server
function base.project.summarize_keys_information(metadata_summary="list")
Summarize the information of the keys in a project for printing.
Parameters
- metadata_summary (list) - required
- output of the base.Project().get_metadata_summary() method
Returns
- summary_for_print (dict)
- summarized key information for printing
{
"MaxRecordedCount": Integer,
"UniqueKeyCount": Integer,
"MaxCharCount": {
"KEY NAME": Integer,
"VALUE RANGE": Integer,
"VALUE TYPE": Integer,
"RECORDED COUNT": Integer
},
"Keys": [
(
KeyName: String,
ValueRange: String,
ValueType: String,
RecordedCount: String
),
...
]
}