Loading files and removing whole users, repos and orgs #19

Open
wants to merge 94 commits into
base: enterprise-2.2
94 commits
6ba3e2e
removing whole users and orgs wip
karolselak Sep 13, 2021
153b89b
conflict in run method solved
karolselak Sep 14, 2021
6ef6906
merge
karolselak Sep 21, 2021
e841b37
removing users and orgs - wip
karolselak Sep 22, 2021
adf10b2
factories separated and removing methods wip
karolselak Sep 22, 2021
b8fd13d
factories for user with all dependencies
karolselak Sep 24, 2021
92fb92e
wip
karolselak Sep 25, 2021
43b8545
working user_with_all_dependencies factory
karolselak Sep 27, 2021
8013b52
wip
karolselak Sep 27, 2021
fc2e8c4
factories with siblings
karolselak Sep 27, 2021
316a895
lighter factories with safe dependencies
karolselak Sep 27, 2021
0468a22
unnecessary code from models removed
karolselak Sep 27, 2021
1f4f027
remove_user_with_dependencies first version
karolselak Sep 29, 2021
694eee5
comments removed
karolselak Sep 29, 2021
64d6f12
wip
karolselak Sep 29, 2021
a792294
remove_user_with_dependencies removes data with exceptions
karolselak Sep 30, 2021
27349ec
refactoring
karolselak Sep 30, 2021
d08172a
wip
karolselak Oct 1, 2021
65edde7
Merge branch 'oom_problem' into removing_whole_users_and_orgs
karolselak Oct 5, 2021
93ba9d2
filter_dependent_on_strangers! fixed
karolselak Oct 6, 2021
8300c6f
unused code removed, tests fixed
karolselak Oct 6, 2021
5d95d85
wip
karolselak Oct 7, 2021
5d1437f
old version of ids_of_all_dependencies
karolselak Oct 7, 2021
6c1666b
ids_of_all_dependencies starts filtering from parent of filtered asso…
karolselak Oct 7, 2021
b32b70a
move_wrongly_assigned_to_main
karolselak Oct 8, 2021
9348599
refactoring
karolselak Oct 8, 2021
c16f454
refactoring
karolselak Oct 8, 2021
2381717
refactoring, unused code removed
karolselak Oct 8, 2021
90e7011
saving to file in remove_user_with_dependencies wip
karolselak Oct 11, 2021
d14d149
saving to file in remove_user_with_dependencies works with tests
karolselak Oct 11, 2021
0cde4e7
removing organizations wip
karolselak Oct 11, 2021
7e346da
test fixed
karolselak Oct 12, 2021
ae988b3
Merge branch 'fix' into removing_whole_users_and_orgs
karolselak Oct 12, 2021
9dc29c7
wip
karolselak Oct 12, 2021
e81e18a
wip
karolselak Oct 12, 2021
c528448
remove_org_with_dependencies works with tests
karolselak Oct 12, 2021
af5d758
removing repositories works
karolselak Oct 12, 2021
7efd47b
removing with dependencies in dry mode
karolselak Oct 13, 2021
e42b80a
removing with dependencies saves files only when if_backup is true
karolselak Oct 13, 2021
6c05abb
README updated
karolselak Oct 13, 2021
795ea25
should do -> does in tests
karolselak Oct 13, 2021
26d094f
tests refactored
karolselak Oct 13, 2021
7542801
IdHash
karolselak Oct 14, 2021
3c50ecd
saving heavy data to file tests wip
karolselak Oct 14, 2021
6a36a1b
saving heavy data to files
karolselak Oct 14, 2021
f75926a
wip
karolselak Oct 17, 2021
0bb88ea
remove_repo_builds tested
karolselak Oct 18, 2021
6e5ac35
remove_repo_requests tested
karolselak Oct 18, 2021
02866d6
bug fixed
karolselak Oct 19, 2021
41ff7f2
load from files mode in Config and Backup; bare LoadFromFiles class
karolselak Oct 19, 2021
d9e3b0d
LoadFromFiles
karolselak Oct 21, 2021
42f5da2
_dependencies_ in file format removed
karolselak Oct 21, 2021
9698c76
loading tested with cooperation with removing, bugs fixed
karolselak Oct 22, 2021
177c8f9
fix
karolselak Oct 25, 2021
6e8b9dc
fix
karolselak Oct 25, 2021
49c5408
wip
karolselak Oct 25, 2021
db41db9
Merge branch 'enterprise-2.2' into loading_files_and_removing_whole_u…
karolselak Oct 25, 2021
07391e2
fix
karolselak Oct 25, 2021
0bed840
fix
karolselak Oct 25, 2021
ecabe1d
checking db diff instead of db state in tests
karolselak Nov 2, 2021
fea3c3b
nullifying builds dependencies instead of removing them in remove orp…
karolselak Nov 3, 2021
c7a9e38
nullifying dependencies as a part of model
karolselak Nov 3, 2021
753262f
orphans_table option in config
karolselak Nov 9, 2021
8fe769c
removing orphans from specified table works
karolselak Nov 9, 2021
7b55aca
wip
karolselak Nov 9, 2021
149e535
spec for second filtering strategy
karolselak Nov 9, 2021
16ad3cc
wip
karolselak Nov 9, 2021
433bcea
wip
karolselak Nov 9, 2021
c0516f6
two filtering strategies together wip
karolselak Nov 9, 2021
07e26e5
two filtering strategies together
karolselak Nov 9, 2021
4e18121
refactoring
karolselak Nov 9, 2021
68c9271
refactoring
karolselak Nov 10, 2021
470701c
refactoring
karolselak Nov 10, 2021
8d7cb18
remove_entry_with_dependencies fixed
karolselak Nov 10, 2021
30b08a9
dependency tree
karolselak Nov 10, 2021
6eae488
remove_repo_with_dependencies tested using dependency tree and with n…
karolselak Nov 11, 2021
d33bd5c
removing users and orgs with new way of testing
karolselak Nov 11, 2021
8a4eacc
new expected files for removing whole users, orgs and repos
karolselak Nov 11, 2021
225dbef
all tests in remove_specified_spec with new expected files and with n…
karolselak Nov 12, 2021
cd102a6
during merge (wip)
karolselak Nov 13, 2021
a16f25b
merge done, process_ids_to_remove method
karolselak Nov 14, 2021
f8d0f87
bug fixed
karolselak Nov 15, 2021
3fad799
commented code removed
karolselak Nov 15, 2021
bd43fce
vision of backuping orphans in README, few minor changes
karolselak Nov 15, 2021
812f9a0
saving nullified relationships while removing specified
karolselak Nov 16, 2021
0eeec4f
saving data during removing orphans, bugs fixed
karolselak Nov 16, 2021
db7d6e9
saving data during removing orphans tests wip
karolselak Nov 16, 2021
67bbfc7
wip
karolselak Nov 16, 2021
03bc1d0
bug fixed
karolselak Nov 16, 2021
f202adf
nullifying all dependencies during removing build orphans
karolselak Nov 17, 2021
ae879ad
all models required in travis-backup.rb
karoltravis Dec 9, 2021
7fc47bc
bugs fixed
karoltravis Dec 9, 2021
bef7f65
load_nullified_relationships
karoltravis Dec 9, 2021
70c08b5
fix
karoltravis Dec 9, 2021
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ dump/*
!log/.keep
!dump/.keep
*.gem
.byebug_history
18 changes: 14 additions & 4 deletions README.md
@@ -1,6 +1,6 @@
# README

*travis-backup* is an application that helps with housekeeping and backup for Travis CI database v2.2 and with migration to v3.0 database. By default it removes requests and builds with their corresponding jobs and logs, as long as they are older than given threshold says (and backups them in files, if this option is active). Although it can be also run with special modes: `move_logs`, for moving logs from one database to another, and `remove_orphans`, for deleting all orphaned data.
*travis-backup* is an application that helps with housekeeping and backup of the Travis CI v2.2 database and with migration to the v3.0 database. By default it removes requests and builds, together with their corresponding jobs and logs, once they are older than the given threshold (and backs them up to files if that option is active). It can also be run in special modes to perform other specific tasks.

### Installation and run

@@ -29,6 +29,9 @@ All arguments:
--move_logs # run in move logs mode - move all logs to database at destination_db_url URL
--destination_db_url URL # URL for moving logs to
--remove_orphans # run in remove orphans mode
--orphans_table # name of the table we will remove orphans from (if not defined, all tables are considered)
--load_from_files # loads files stored in files_location to the database
--id_gap # used when loading files: the gap between the highest id in the database and the lowest id assigned to loaded data (leaves room for rows inserted by other users while the load is running; 1000 by default)
```

Or inside your app:
@@ -60,9 +63,13 @@ backup.run(repo_id: 1)

Using `--move_logs` flag you can move all logs to database at `destination_db_url` URL (which is required in this case). When you run gem in this mode no files are created and no other tables are being touched.

Using `--remove_orphans` flag you can remove all orphaned data from tables. When you run gem in this mode no files are created.
Using the `--remove_orphans` flag you can remove all orphaned data from the tables. You can pick a specific table with the `--orphans_table` flag or leave it undefined to have all tables processed. It can be combined with the `--backup` flag to save the removed data to files.

Using `--dry_run` flag you can check which data would be removed by gem, but without removing them actually. Instead of that reports will be printed on standard output. This flag can be also combined with `--move_logs` or `--remove_orphans`.
Using the `--user_id`, `--org_id` or `--repo_id` flag without setting `--threshold` removes the specified user, organization or repository together with all of its dependencies. It can be combined with the `--backup` flag to save the removed data to files.

Using the `--load_from_files` flag you can restore dumped data from the files located at the path given by `--files_location`. For each table, the gap defined by `--id_gap` (1000 by default) is kept between the highest id already in the database and the lowest id assigned to the data loaded from files.

Using the `--dry_run` flag you can check which data would be removed by the gem without actually removing it. Reports are printed to standard output instead. This flag can also be combined with the special modes.

### Configuration options

@@ -80,9 +87,12 @@ backup:
repo_id: 1 # run only for given repository
move_logs: false # run in move logs mode - move all logs to database at destination_db_url URL
remove_orphans: false # run in remove orphans mode
orphans_table: 'builds' # name of the table we will remove orphans from (if not defined, all tables are considered)
load_from_files: false # loads files stored in files_location to the database
id_gap: 1500 # used when loading files: the gap between the highest id in the database and the lowest id assigned to loaded data (leaves room for rows inserted by other users while the load is running; 1000 by default)
```
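
The `id_gap` arithmetic can be illustrated with a small standalone sketch. It mirrors the formula in `set_id_offsets` from `lib/backup/load_from_files.rb` in this PR (`db_max - file_min + id_gap`); the method names `id_offset` and `common_offset` and the sample numbers are made up for illustration and are not part of the gem.

```ruby
DEFAULT_ID_GAP = 1000 # default stated in the README

# Offset added to every id loaded for one table (hypothetical helper).
# db_max   - highest id currently in the database table
# file_min - lowest id found in the dump files for that table
def id_offset(db_max, file_min, id_gap = DEFAULT_ID_GAP)
  db_max - file_min + id_gap
end

# builds and jobs get one common offset: the larger of the two.
def common_offset(builds_offset, jobs_offset)
  [builds_offset, jobs_offset].max
end

offset = id_offset(5000, 120)       # 5000 - 120 + 1000
lowest_loaded_id = 120 + offset     # lands exactly id_gap above db_max

puts offset             # 5880
puts lowest_loaded_id   # 6000
puts common_offset(offset, id_offset(7000, 50)) # 7950
```

With this offset the lowest loaded id always equals `db_max + id_gap`, so rows inserted by other users during the load have `id_gap - 1` free ids before they would collide with loaded data.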

You can also set these properties using env vars corresponding to them: `IF_BACKUP`, `BACKUP_DRY_RUN`, `BACKUP_LIMIT`, `BACKUP_THRESHOLD`, `BACKUP_FILES_LOCATION`, `BACKUP_USER_ID`, `BACKUP_ORG_ID`, `BACKUP_REPO_ID`, `BACKUP_MOVE_LOGS`, `BACKUP_REMOVE_ORPHANS`.
You can also set these properties using env vars corresponding to them: `IF_BACKUP`, `BACKUP_DRY_RUN`, `BACKUP_LIMIT`, `BACKUP_THRESHOLD`, `BACKUP_FILES_LOCATION`, `BACKUP_USER_ID`, `BACKUP_ORG_ID`, `BACKUP_REPO_ID`, `BACKUP_MOVE_LOGS`, `BACKUP_REMOVE_ORPHANS`, `BACKUP_ORPHANS_TABLE`, `BACKUP_LOAD_FROM_FILES`, `BACKUP_ID_GAP`.
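
A rough sketch of how the three new variables could be interpreted follows. The helper `new_settings_from_env` is hypothetical and is not how the gem's `Config` class is necessarily written; in particular, treating the string `'true'` as the only truthy value is an assumption. Only the variable names and the defaults come from the README.

```ruby
# Hypothetical helper, not part of travis-backup: reads the three new
# settings from an ENV-like object and applies the documented defaults.
def new_settings_from_env(env = ENV)
  {
    # nil means "consider all tables"
    orphans_table: env['BACKUP_ORPHANS_TABLE'],
    # assumption: the flag is passed as the string 'true'
    load_from_files: env['BACKUP_LOAD_FROM_FILES'] == 'true',
    # README default: 1000
    id_gap: Integer(env.fetch('BACKUP_ID_GAP', '1000'))
  }
end

settings = new_settings_from_env(
  'BACKUP_LOAD_FROM_FILES' => 'true',
  'BACKUP_ID_GAP' => '1500'
)
p settings
```

Passing a plain hash instead of `ENV` keeps the sketch easy to try without touching the real environment.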

You should also specify your database url. You can do this the standard way in `config/database.yml` file, setting the `database_url` hash argument while creating `Backup` instance or using the `DATABASE_URL` env var. Your database should be consistent with the Travis 2.2 database schema.

245 changes: 245 additions & 0 deletions lib/backup/load_from_files.rb
@@ -0,0 +1,245 @@
# frozen_string_literal: true

require 'json'
require 'id_hash'

class Backup
  class LoadFromFiles
    # Raw JSON text of a dump file. It is parsed lazily so that the
    # regex-based lookups below can run without parsing the whole file.
    # NOTE: #hash shadows String#hash, so instances must not be used as
    # Hash keys.
    class JsonContent < String
      def hash
        @hash ||= JSON.parse(self).symbolize_keys
      end
    end

    class DataFile
      attr_accessor :content

      def initialize(json_content)
        @content = json_content
      end

      def table_name
        @content.match(/"table_name":\s?"(\w+)"/)[1]
      end

      def table_name_sym
        table_name.to_sym
      end

      def full_hash
        @content.hash
      end
    end

    class EntryFile < DataFile
      def ids
        @content.scan(/"id":\s?(\d+)/).flatten.map(&:to_i)
      end

      def lowest_id
        ids.min
      end

      def entries
        full_hash[:data]
      end
    end

    class RelationshipFile < DataFile
      def relationships
        @relationships ||= full_hash[:nullified_relationships].map(&:symbolize_keys)
      end
    end

    def initialize(config, dry_run_reporter = nil)
      @config = config
      @dry_run_reporter = dry_run_reporter
      @touched_models = []
    end

    def run
      set_id_offsets
      load_data_with_offsets
      cancel_offset_for_foreign_data
      set_id_sequences
      load_nullified_relationships
    end

    private

    def load_nullified_relationships
      relationship_files.each do |file|
        # Parent ids were shifted while loading; fall back to 0 when no
        # offset was computed for the parent table.
        offset = @id_offsets[file.table_name_sym] || 0

        file.relationships.each do |rel|
          ActiveRecord::Base.connection.execute(%{
            update #{rel[:related_table]}
            set #{rel[:foreign_key]} = #{rel[:parent_id].to_i + offset}
            where id = #{rel[:related_id].to_i};
          })
        end
      end
    end

    def set_id_sequences
      # A model is recorded once per file, so deduplicate first.
      @touched_models.uniq.each do |model|
        last_id = model.last&.id
        next unless last_id # the table is still empty, nothing to restart

        set_sequence("#{model.table_name}_id_seq", last_id + 1)
      end

      set_shared_builds_tasks_seq
    end

    def set_shared_builds_tasks_seq
      value = [Build.last&.id || -1, Job.last&.id || -1].max + 1

      if value > 0
        set_sequence("shared_builds_tasks_seq", value)
      end
    end

    def set_sequence(seq, value)
      ActiveRecord::Base.connection.execute("alter sequence #{seq} restart with #{value};")
    end

    # Foreign keys were shifted on the assumption that the parent row comes
    # from the files too. When the parent cannot be found under the shifted
    # id, restore the original foreign key value.
    def cancel_offset_for_foreign_data
      @loaded_entries.each do |entry|
        entry.class.reflect_on_all_associations.select { |a| a.macro == :belongs_to }.each do |association|
          foreign_key = association.foreign_key.to_sym
          if entry.send(association.name).nil? && entry.send(foreign_key).present?
            entry_hash = entry.attributes.symbolize_keys
            table_name = get_table_name(entry_hash, association)
            offset = @id_offsets[table_name.to_sym]
            next if offset.nil?

            proper_id = entry.send(foreign_key) - offset
            entry.update(foreign_key => proper_id)
          end
        end
      end
    end

    def load_data_with_offsets
      @repository_files = []

      # Repository files are set aside and loaded after all other tables.
      @loaded_entries = entry_files.map do |data_file|
        model = Model.get_model_by_table_name(data_file.table_name)

        if model == Repository
          @repository_files << data_file
          next
        end

        load_file(model, data_file)
      end.flatten.compact

      repository_entries = @repository_files.map do |data_file|
        load_file(Repository, data_file)
      end.flatten.compact

      @loaded_entries.concat(repository_entries)
    end

    def load_file(model, data_file)
      @touched_models << model

      data_file.entries&.map do |entry_hash|
        load_entry(model, entry_hash)
      end
    end

    def load_entry(model, entry_hash)
      entry_hash.symbolize_keys!
      entry_hash[:id] += @id_offsets[model.table_name.to_sym]
      add_offset_to_foreign_keys!(model, entry_hash)
      model.create(entry_hash)
    end

    def add_offset_to_foreign_keys!(model, entry_hash)
      model.reflect_on_all_associations.select { |a| a.macro == :belongs_to }.each do |association|
        foreign_key_sym = association.foreign_key.to_sym
        next unless entry_hash[foreign_key_sym]

        table_name = get_table_name(entry_hash, association)
        entry_hash[foreign_key_sym] += @id_offsets[table_name.to_sym] || 0
      end
    end

    def get_table_name(entry_hash, association)
      if association.polymorphic?
        type_symbol = association.foreign_key.gsub(/_id$/, '_type').to_sym
        class_name = entry_hash[type_symbol]
      else
        class_name = association.class_name
      end

      Model.get_model(class_name).table_name
    end

    def file_contents
      @file_contents ||= Dir["#{@config.files_location}/**/*.json"].map do |file_path|
        JsonContent.new(File.read(file_path))
      end
    end

    def entry_files
      @entry_files ||= file_contents.map do |content|
        next unless content.hash[:data]

        EntryFile.new(content)
      end.compact
    end

    def relationship_files
      @relationship_files ||= file_contents.map do |content|
        next if content.hash[:data]

        RelationshipFile.new(content)
      end.compact
    end

    def find_lowest_ids_from_files
      @lowest_ids_from_files = HashOfArrays.new

      entry_files.each do |data_file|
        table_name = data_file.table_name_sym
        min_id = data_file.lowest_id
        @lowest_ids_from_files.add(table_name, min_id) if min_id
      end

      @lowest_ids_from_files = @lowest_ids_from_files.map { |k, arr| [k, arr.min] }.to_h
    end

    def find_highest_ids_from_db
      @highest_ids_from_db = {}

      Model.subclasses.each do |model|
        table_name = model.table_name.to_sym
        @highest_ids_from_db[table_name] = model.order(:id).last&.id || 0
      end
    end

    def set_id_offsets
      find_lowest_ids_from_files
      find_highest_ids_from_db

      @id_offsets = @lowest_ids_from_files.map do |key, file_min|
        # Treat a table without a known model as empty.
        db_max = @highest_ids_from_db[key] || 0
        offset = db_max - file_min + @config.id_gap
        [key, offset]
      end.to_h

      make_offset_common_for_builds_and_jobs
    end

    def make_offset_common_for_builds_and_jobs
      if @id_offsets[:builds] && @id_offsets[:jobs]
        offset = [@id_offsets[:builds], @id_offsets[:jobs]].max
        @id_offsets[:builds] = offset
        @id_offsets[:jobs] = offset
      end
    end
  end
end
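
To see what `DataFile#table_name` and `EntryFile#ids` extract, here is a small standalone sketch that applies the same two regular expressions to a sample dump. The JSON layout is an assumption inferred from the keys the loader reads (`table_name`, `data`); real dump files may contain more fields, and the sample values are made up.

```ruby
require 'json'

# Assumed dump layout, inferred from the keys the loader reads.
sample = JSON.generate({
  'table_name' => 'builds',
  'data' => [
    { 'id' => 7, 'repository_id' => 3 },
    { 'id' => 9, 'repository_id' => 3 }
  ]
})

# Same patterns as DataFile#table_name and EntryFile#ids.
table_name = sample.match(/"table_name":\s?"(\w+)"/)[1]
ids = sample.scan(/"id":\s?(\d+)/).flatten.map(&:to_i)

puts table_name  # builds
p ids            # [7, 9]
p ids.min        # 7, the value EntryFile#lowest_id would return
```

The pattern requires a quote directly before `id`, so keys such as `"repository_id"` are skipped. It would, however, also match an `"id"` key inside a nested object, so a column that stores JSON with its own `id` field would contribute to the list as well.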