Releases: The-Academic-Observatory/observatory-platform
0.6.0
What's Changed
- Fix seed_db service in docker-compose by @aroelo in #573
- Inf 418/add dag tags to workflows by @tuanchien in #572
- jsonl comparison fix by @keegansmith21 in #575
- Updated docker compose process to pipe its stderr to stdout by @keegansmith21 in #577
- precommit rebase fix by @keegansmith21 in #579
- INF-547: Random_id generator to use hostname of machine by @alexmassen-hane in #581
- Added prefix functionality to ObservatoryEnvironment class by @alexmassen-hane in #578
- INF-558: Added exception handling for deleting datasets and buckets produced by unit tests by @alexmassen-hane in #585
- Config default update by @keegansmith21 in #561
- Increase BQ bytes budget for new DOI workflow by @jdddog in #587
- Log BulkIndexErrors by @jdddog in #588
- BAD-308 schema and table description updates by @keegansmith21 in #589
- API core version by @jdddog in #594
- Updated github unit tests to only run on push by @keegansmith21 in #593
- View creation function update by @keegansmith21 in #596
- INF-588: HTTP Request Updates by @keegansmith21 in #597
- Remove project and workflow generation by @jdddog in #599
- INF-597: Terraform config files not updating and Flower docker container not building successfully. by @alexmassen-hane in #598
- Feature/remove query api by @jdddog in #600
- Remove Elastic and Kibana from local platform by @jdddog in #603
- Upgrade to Docker Compose V2 by @jdddog in #602
- Update to response logging criteria by @keegansmith21 in #605
- Fix: Only delete non hidden files in terraform directory. by @alexmassen-hane in #604
- Move helper functions from Thoth Telescope to common Observatory Platform utilities by @alexmassen-hane in #607
- Change table_id to table_name by @jdddog in #608
- Add constraint for alembic by @jdddog in #609
- Fix/dependencies by @jdddog in #612
- Add POSTGRES_USER environment variable by @jdddog in #613
- INF-465: Cloud Endpoints Portal Deprecation - move observatory-api container by @alexmassen-hane in #610
- Fix/Change apiserver container docker network by @alexmassen-hane in #615
- INF-609: Add limits to Bigquery for per user per day and project per day. by @alexmassen-hane in #616
- Fix api and workers network settings by @jdddog in #617
- Use config file to specify workflows by @jdddog in #606
- Added .env to gitignore by @keegansmith21 in #619
- Terraform deploy may 2023 fixes by @jdddog in #620
- Fix issues with Docker and remove unneeded code by @jdddog in #622
- Feature/Add FTP Server by @alexmassen-hane in #621
- Changed default write disposition by @keegansmith21 in #623
- Added function to list blobs in gcs bucket by @keegansmith21 in #624
- Added multi-uri table load by @keegansmith21 in #626
- Feature/Match on multiple keys for upserts and deletes in Bigquery. by @alexmassen-hane in #625
- Upgrade to Airflow 2.6.3 by @jdddog in #627
- Fix Google Cloud Storage based logging by @jdddog in #629
- Fix/Logs disappear when tasks are in "up for retry" state by @alexmassen-hane in #630
- Upgrade Terraform and Packer by @alexmassen-hane in #634
- Make dag_run_id nullable in openapi.yaml by @jdddog in #631
- hmac key by @keegansmith21 in #632
- Added glob match functionality to list_blobs by @keegansmith21 in #633
- Fix/Update VM Template File by @alexmassen-hane in #636
- Feature/Add functionality to bq_load_from_memory function by @alexmassen-hane in #637
- Feature/json custom datetime format by @jdddog in #638
- Add functions required to make use of Airflow TaskGroups and test them by @jdddog in #642
- Fix/Install Packer plugins when building the observatory image by @alexmassen-hane in #641
- Feature/Create buckets with roles for unit tests by @alexmassen-hane in #639
- Added readthedocs config file by @keegansmith21 in #643
- python 3.10 by @keegansmith21 in #628
- Fix SlackWebhookHook missing 1 required keyword-only argument: 'slack_webhook_conn_id' by @jdddog in #644
- Update unittest os to ubuntu-latest by @keegansmith21 in #645
- Updates to contributing.md and readme by @keegansmith21 in #640
- Moving logging for compare_lists_of_dicts into its own function by @jdddog in #647
- Feature: Add bq functions to the platform by @alexmassen-hane in #646
New Contributors
- @alexmassen-hane made their first contribution in #581
Full Changelog: 0.5.0...0.6.0
0.5.0
What's Changed
- streamtelescope: add diff merge by lex order by @tuanchien in #557
- Add functionality to use BigQuery snapshots by @aroelo in #559
- Add api server isolation and db seeding by @tuanchien in #562
- Add dag tags to Workflow class by @tuanchien in #565
- Parametrise host api port by @tuanchien in #564
- Create directories that are mounted as volumes with Docker by @aroelo in #563
- Separate config loading from config use by @jdddog in #560
- Use find_free_port in more unit tests by @tuanchien in #569
- Raise exception if dataset missing when adding releases by @tuanchien in #566
- Add api_port to config generation by @tuanchien in #570
- Propagate dag tag to telescopes by @tuanchien in #571
- Update to Elastic & Kibana v8 by @jdddog in #568
Full Changelog: 0.4.0...0.5.0
0.4.0
What's Changed
- Add a DOI badge to README.md by @aroelo in #549
- Create .zenodo.json by @jdddog in #550
- Add functionality to use a schema when creating table from query by @aroelo in #551
- MEL-798 added data trust zenodo community by @kathrynnapier in #552
- api: add local api server by @tuanchien in #553
- Add dataset release utils by @tuanchien in #540
- Fix/api update by @jdddog in #554
- Fix dag-delete error by @jdddog in #556
- Build Observatory API Image & Push to Google Cloud Artifact Registry by @jdddog in #558
New Contributors
- @kathrynnapier made their first contribution in #552
Full Changelog: 0.3.0...0.4.0
0.3.0
What's Changed
- Fix potential duplicates in table after merge using stream telescope by @aroelo in #496
- Fix double bucket delete race by @tuanchien in #510
- Remove bq merge days functionality by @aroelo in #512
- Inf 32/installer script after repo separation by @tuanchien in #509
- installer script doc fixes by @tuanchien in #514
- Add global prefix_dir override by @tuanchien in #515
- disable download timeout by @tuanchien in #517
- INF-166/airflow 2.2 by @jdddog in #516
- add get_airflow_connection_login by @tuanchien in #519
- Enable Airflow operators to be added directly as tasks by @jdddog in #518
- remove unused wos/scopus code by @tuanchien in #520
- installer: add https/ssh clone option by @tuanchien in #521
- Fix ModuleNotFoundError: No module named 'wtforms.compat' error by @jdddog in #522
- Upgrade apache-airflow to 2.2.1 by @jdddog in #523
- Enable a bucket path to be specified in azure_to_google_cloud_storage_transfer by @jdddog in #524
- Inf 65/mag update by @jdddog in #525
- Update requirements.txt by @jdddog in #527
- Inf 278/workflow xcom cleanup by @tuanchien in #528
- Inf 279/port vm create destroy template by @tuanchien in #529
- Fix xcom topic by @jdddog in #530
- Add Dataset, DatasetRelease, DatasetStorage API extension by @tuanchien in #526
- Fix VM warning message on slack by @aroelo in #532
- Add ignore_unknown_values by @jdddog in #531
- on_failure_callback: handle exception value that is a string by @jdddog in #535
- Add bigquery bytes processed tripwire by @tuanchien in #533
- Remove description fields from 401 error by @jdddog in #536
- BigQuery bytes processed by @jdddog in #537
- Add functionality to use multiple instances for Elastic Import workflow by @aroelo in #538
- Parameterise select table shard date limit by @tuanchien in #539
- OpenAlex telescope changes by @aroelo in #542
- Add check_blob_hash parameter by @jdddog in #544
- Ensure that the blob name is unique across tables with the same name … by @jdddog in #548
Full Changelog: 0.2.1...0.3.0
0.3.0-dev
0.2.1
This release includes the following bugfix in the Dockerfile:
- Install apache-airflow-providers-google==5.1.0 with --no-deps so that pip doesn't spend forever trying to resolve dependencies for the package, which we only use for remote logging and secret manager backend in the cloud deployment. The google-cloud-secret-manager Python package is added as a dependency in requirements.txt.
0.2.0
This release includes the following changes / new features:
- Upgrade to Airflow 2.1.4.
- Stream Telescope: remove use of XComs so that it is easier to maintain.
- Updated documentation.
- download_files: uses DownloadInfo class and prefix_dir parameter to allow prefixing the filename paths.
- Remove third party get_file and _hash_file functions as they are replaced by get_file_hash and download_file.
- Command line interface: added generate workflow and project commands.
- Added OrganisationTelescope.
And the following bugfixes:
- Docker Compose file: rename deprecated Airflow config environment variables.
- Docker Compose file: change AIRFLOW__SECRETS__BACKEND to use the class installed from the apache-airflow-providers-google package and remove the airflow subpackage as it is no longer required.
- Fix typo in config.yaml.jinja2.
- Fix on_failure_callback function.
0.1.1
This release includes the following bugfixes:
- Sdist building:
- added missing data_files in config.cfg.
- Docker Compose / Airflow 2:
- Received the error "daemonic processes are not allowed to have children" when running tasks that use multiprocessing.Pool; to address it, added AIRFLOW__CORE__EXECUTE_TASKS_NEW_PYTHON_INTERPRETER to the Docker Compose file. This is the same error described in this Stack Overflow post.
- Set Docker Compose volume paths correctly for editable workflows packages when deployed to Terraform.
- Terraform:
- Fix Terraform config where Google Cloud Secrets were created with their value set to the secret key instead of the secret value.
- Update TerraformBuilder so that it builds with the latest changes.
- Observatory API:
- postgres connection prefix deprecated in PostgresSQL 1.4, so changed in Terraform file to postgresql.
- Address inconsistent use of dates:
- Change type hints pendulum.datetime to pendulum.DateTime (the class, not function).
- Change datetime.datetime calls to pendulum.datetime.
- Make select_table_shard_dates return List[pendulum.Date].
- Add a make_release_date function, which returns a pendulum.DateTime instance, which is required for some of the downstream functions that use it.
- get_airflow_connection_url: call get_uri to get the uri.
And the following new features:
- Added black to precommit config.
- load_dags.py:
- When DagBag has import errors, raise an exception that has a message with all of the errors so that the Dag import errors are visible in the
- Testing:
- Add simple threaded httpserver for testing use
- Utilities
- Add get_observatory_http_header to create simple header dict using custom user agent
- Add get_filename_from_url to get a filename from an HTTP URL
- Add get_chunks function to split lists into constant size (unless last chunk) chunks.
- Add get_airflow_connection_url to pull a url from an airflow connection, validate it, and add trailing "/" if necessary.
- Add converter function for csv to jsonl files.
- Add http get response functions for simple interfaces to standardise getting http raw text response, xml -> dict, json->dict.
- Add AsyncHttpFileDownloader with download_file and download_files interfaces for downloading files using http. Allows custom headers to be used in http connection.
- download_files allows concurrent downloading through asyncio and aiohttp. Supports retry on failure with exponential backoff.
- download_file piggybacks off download_files. No speed benefit from asyncio, but provides a simpler interface.
- Add get_airflow_connection_password
- Add unzip_files function
- Add find_replace_file (sed CLI replacement)
- Add function to wrap shell command calls; treats a non-zero exit code as an error.
- Snapshot telescope:
- Add upload_downloaded as a snapshot telescope task. Many of the upload_downloaded tasks in snapshot telescopes were identical in implementation: they upload the download_files list of files from the release object to the download_bucket in the cloud. Since this is a standard pattern we have adopted, it is now part of the snapshot telescope implementation.
- Add download, extract, transform tasks to template.
- Stream telescope:
- Add download, upload_downloaded, extract, transform tasks to template.
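
The Utilities entries in the 0.1.1 notes describe splitting a list into constant-size chunks (with a possibly shorter last chunk) and concurrent downloads that retry on failure with exponential backoff. The following is a minimal, standard-library-only sketch of those two ideas; it is not the platform's code. The names `chunked`, `retry_with_backoff` and `download_all` are hypothetical, and the `fetch` callable stands in for a real HTTP request (the notes mention aiohttp, which is not used here).

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


def chunked(items: List[T], chunk_size: int) -> List[List[T]]:
    """Split items into chunks of chunk_size; only the last chunk may be shorter.

    Hypothetical helper illustrating the behaviour described for get_chunks.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [items[i : i + chunk_size] for i in range(0, len(items), chunk_size)]


async def retry_with_backoff(
    task: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.01,
) -> T:
    """Await task(), retrying on any exception.

    The wait between attempts doubles each time: base_delay, 2 * base_delay, ...
    The last failure is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await task()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")


async def download_all(
    names: List[str], fetch: Callable[[str], Awaitable[T]]
) -> List[T]:
    """Run fetch(name) for every name concurrently, each with its own retries.

    fetch is an assumed placeholder for a real asynchronous download call.
    """
    # Bind name as a default argument so each task keeps its own value.
    tasks = [retry_with_backoff(lambda name=name: fetch(name)) for name in names]
    return await asyncio.gather(*tasks)
```

For example, `chunked([1, 2, 3, 4, 5], 2)` returns `[[1, 2], [3, 4], [5]]`. Running the retries per item, rather than around the whole batch, means one flaky download does not force the others to be repeated.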