
NUDB Use is a usage package for the cloud data of the Norwegian National Education Database, for both data consumers and data deliverers. It requires access to NUDB's shared data in most instances.


statisticsnorway/ssb-nudb-use

SSB-NUDB-USE


Description

NUDB is the National Education Database of Norway. It is operated by Statistics Norway, section 360. This package is the main "usage package" for those who want to use NUDB data or deliver data to NUDB.

NUDB's data is kept as parquet files in GCP, and you will need separate access to this data to use this package. Some features in this package might require access to other data, such as BRREG (Brønnøysundregisteret), BOF (befolkningsregisteret) and VOF (virksomhetsregisteret), among others.

Installation

You can install SSB Nudb Use via poetry from PyPI:

poetry add ssb-nudb-use

Dependencies

This package depends on the package "ssb-nudb-config", which contains metadata and also points to content in other metadata systems such as Vardef, Klass and Datadoc.

Usage

Please see the Reference Guide for details.

Usage for extraction (data from NUDB)

Find the latest path for each shared file:

from nudb_use import latest_shared_paths
latest_shared_paths()
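
To make the idea of "latest" concrete, here is a minimal standalone sketch of one possible reading: keeping the highest version of each file. It assumes file names end in a `_v<number>` suffix before the extension. The helper `latest_versions` and the example paths are hypothetical, and this is not necessarily how `latest_shared_paths` works internally.

```python
import re
from pathlib import PurePosixPath

# Assumption: the file stem ends with "_v<number>", e.g. "name_p2020_v2".
_VERSION_RE = re.compile(r"^(?P<base>.+)_v(?P<version>\d+)$")


def latest_versions(paths: list[str]) -> dict[str, str]:
    """Map each versionless file stem to the path with the highest version."""
    latest: dict[str, tuple[int, str]] = {}
    for path in paths:
        match = _VERSION_RE.match(PurePosixPath(path).stem)
        if match is None:
            # Skip files that do not carry a version suffix.
            continue
        base, version = match["base"], int(match["version"])
        # Compare versions numerically so that v10 beats v9.
        if base not in latest or version > latest[base][0]:
            latest[base] = (version, path)
    return {base: path for base, (_, path) in latest.items()}
```

Comparing the version as an integer rather than as text matters once a file passes version 9, since "v10" sorts before "v9" as a string.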

Extract the periods from any path that follows the SSB naming standard:

from nudb_use import get_periods_from_path
get_periods_from_path(path)
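
As an illustration of what period extraction involves, the following standalone sketch pulls period segments out of a file name. It assumes periods appear as `_p<year>` segments, optionally extended with `-MM` or `-MM-DD`, and it does not cover other period forms. The helper `periods_from_filename` and the example path are hypothetical; the real `get_periods_from_path` may behave differently and return a different type.

```python
import re
from pathlib import PurePosixPath

# Assumption: a period segment is "_p" + a four-digit year, optionally followed
# by "-MM" or "-MM-DD", and is terminated by "_" or the end of the file stem.
_PERIOD_RE = re.compile(r"_p(\d{4}(?:-\d{2}){0,2})(?=_|$)")


def periods_from_filename(path: str) -> list[str]:
    """Return the period strings found in the file stem, in order."""
    stem = PurePosixPath(path).stem  # drop directories and the extension
    return _PERIOD_RE.findall(stem)


# Hypothetical path, for illustration only.
example = "folder/some_dataset_p2020_p2021_v1.parquet"
periods = periods_from_filename(example)
```

For the example path above, `periods` holds the two year strings from the `_p2020` and `_p2021` segments.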

Variables that are not stored in the data can be derived with the derive module:

from nudb_use import derive
df = derive.utd_skoleaar_slutt(df)

Usage for delivery (data to NUDB)

We renamed many of our variables while transitioning from the old on-prem systems. If you are looking for the new or old name of a variable, you can use the find_var or find_vars functions:

from nudb_use import find_vars
find_vars(["snr", "sosbak"])
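
The idea of looking a variable up by either its old or its new name can be sketched with a plain dictionary. This is only a conceptual illustration: the mapping below uses placeholder names rather than real NUDB variables, `lookup_names` is a hypothetical helper, and the return format of `find_vars` is not shown here.

```python
# Placeholder old -> new pairs; these are NOT real NUDB variable names.
OLD_TO_NEW = {"old_name_a": "new_name_a", "old_name_b": "new_name_b"}


def lookup_names(names: list[str], old_to_new: dict[str, str]) -> dict:
    """Resolve each name to its old/new pair, whichever side it matches."""
    # Invert the mapping once so new names can be looked up as well.
    new_to_old = {new: old for old, new in old_to_new.items()}
    result = {}
    for name in names:
        if name in old_to_new:
            result[name] = {"old": name, "new": old_to_new[name]}
        elif name in new_to_old:
            result[name] = {"old": new_to_old[name], "new": name}
        else:
            result[name] = None  # unknown in both directions
    return result


found = lookup_names(["old_name_a", "new_name_b", "unknown"], OLD_TO_NEW)
```

Building the inverted dictionary up front keeps each lookup a constant-time operation in both directions.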

Find the dtype and length (char-width) of strings using a dataset name:

from nudb_use import look_up_dtype_length_for_dataset
print(look_up_dtype_length_for_dataset("igang_videregaaende"))

If you want to update the column names in a pandas dataframe to the new column names, there is a function for that:

from nudb_use import update_colnames
df = update_colnames(df)

After renaming, you can get the pandas dtypes the columns should have with get_dtypes:

from nudb_use import get_dtypes
dtypes = get_dtypes(df)
df = df.astype(dtypes)

If you are delivering to NUDB, we want you to run our quality suite before sharing the data with us:

from nudb_use import run_quality_suite
run_quality_suite(df, "avslutta")

A delivery name such as "avslutta" only becomes available in this function after the metadata about that delivery has been entered into the ssb-nudb-config package and released. Contact the NUDB team to define a new delivery.

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, SSB Nudb Use is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from Statistics Norway's SSB PyPI Template.
