| 1 | +GOBBLIN 0.7.0 |
| 2 | +------------- |
| 3 | + |
| 4 | +Created Date: 05/11/2016 |
| 5 | + |
| 6 | +## NEW FEATURES |
| 7 | + |
| 8 | +* [Hive Registration] [PR 651] Hive registration initial commit |
| 9 | +* [Runtime] [PR 674] Lifecycle Events for JobListeners |
| 10 | +* [Hive Registration] [PR 684] Add inline Hive registration to Gobblin job |
| 11 | +* [SFTP] [PR 686] Modified the SFTP extractor to also use password for connecting |
| 12 | +* [Hive Registration] [PR 701] Register compacted datasets in Hive |
| 13 | +* [Retention] [PR 716] Use configClient to configure retention jobs |
| 14 | +* [Hive Distcp] [PR 728] Hive dataset implementation for distcp. |
| 15 | +* [Hive Distcp] [PR 749] Hivesource copyentity |
| 16 | +* [Hive Distcp] [PR 757] Hive distcp: check target metastore to perform table syncs. |
| 17 | +* [Hive Registration] [PR 773] Refactoring Hive registration to allow query-based approach |
| 18 | +* [Config Management] [PR 774] Add HDFS config deployment tool |
| 19 | +* [Avro to ORC] [PR 780] Flatten Avro Schema to make it optimal for ORC |
| 20 | +* [Hive Distcp] [PR 801] Implemented Hive registration steps in Hive distcp. |
| 21 | +* [Hive Registration] [PR 803] Add snapshot Hive registration policy |
| 22 | +* [YARN] [PR 828] Add ZooKeeper-based job lock for Gobblin YARN |
| 23 | +* [Kafka] [PR 835] Add Kafka simple JSON source |
| 24 | +* [Metrics] [PR 863] Metric reporters (Graphite, InfluxDB) |
| 25 | +* [JDBC Writer] [PR 893] JDBC Writer |
| 26 | +* [Config Management] [PR 928] Substitution of system and env variable in config management |
| 27 | +* [Core] [PR 942] Allow disabling state store. |
| 28 | +* [Avro to ORC] [PR 972] Avro2orc Source/Converter/Extractor/Publisher |
| 29 | + |
| 30 | +## BUG FIXES |
| 31 | + |
| 32 | +* [Distcp] [PR 645] Fix parent directory creation in distcp-ng |
| 33 | +* [Admin Dashboard] [PR 646] Downgraded jetty version to be java 7 compatible |
| 34 | +* [Admin Dashboard] [PR 648] Excluded old version of servlet-api artifact from Hadoop 2 dependencies |
| 35 | +* [State Store] [PR 655] Fix hanging StateStoreCleaner |
| 36 | +* [Publisher] [PR 657] Issue #561 - fix for BaseDataPublisher to mark WorkingState correctly |
| 37 | +* [Core] [PR 661] Change ParallelRunner.close to wait for all futures to finish |
| 38 | +* [Core] [PR 663] ParallelRunner catches exceptions correctly and has failure policies. |
| 39 | +* [Build] [PR 664] Fix broken Gobblin version resolution (fixes #662) |
| 40 | +* [Build] [PR 665] Gobblin-compaction tarball doesn't contain gobblin-compaction.jar |
| 41 | +* [Core] [PR 670] Fixing FindBugs warnings |
| 42 | +* [Core] [PR 676] Ensure that parallel runner waits for the underlying tasks to finish |
| 43 | +* [Core] [PR 677] Fix race condition in FsStateStore |
| 44 | +* [Compaction] [PR 680] Fix a ConcurrentModificationException in MRCompactor |
| 45 | +* [Admin Dashboard] [PR 681] Fixed off by one issue when listing the job executions in Admin UI |
| 46 | +* [Config Management] [PR 682] Various bug fixes found during integration testing with the HDFS store |
| 47 | +* [Core] [PR 690] Add missing jar to MR runner script |
| 48 | +* [Distcp] [PR 691] Fix permissions for directories in distcp. |
| 49 | +* [Core] [PR 700] Add missing jars to gobblin mapreduce runner, sort. |
| 50 | +* [Core] [PR 706] Fixing CliOptions config file fs |
| 51 | +* [Core] [PR 722] Fixing FindBugs warnings in gobblin-compaction |
| 52 | +* [Build] [PR 743] Fixing skipTestGroup option |
| 53 | +* [Build] [PR 775] Fix javadoc warnings by only adding linksOffline to projects that the current project depends on. |
| 54 | +* [Core] [PR 797] Fixing Fork + Task Retry Logic #776 |
| 55 | +* [Distcp] [PR 884] Fix issue with replicating owner and permission of system directories in distcp |
| 56 | +* [Data Management] [PR 887] Fix NPE in DateTimeDatasetVersionFinder |
| 57 | +* [Data Management] [PR 888] Fix NPE in datasetversion finder |
| 58 | +* [Core] [PR 903] The underlying Avro CodecFactory only matches lowercase codecs, so we should make sure they are lowercase before trying to find one |
| 59 | +* [Compaction] [PR 952] Unified way to execute Hive and MR-based compaction jobs |
| 60 | +* [Core] [PR 958] Fix parallelization of renameRecursively in PathUtils. |
| 61 | +* [YARN] [PR 962] Clean up the Helix job when closing the GobblinHelixJobLauncher |
| 62 | + |
| 63 | +## IMPROVEMENTS |
| 64 | + |
| 65 | +* [Distcp] [PR 647] Add option to set group for distcp-ng |
| 66 | +* [Build] [PR 650] Javadoc task should pick up system proxy settings |
| 67 | +* [Distcp] [PR 669] Parallelized copy listing generation in distcp. |
| 68 | +* [Data Management] [PR 671] Added ConfigurableCleanableDatasetFinder. Renamed some CleanableDatasets for clarification |
| 69 | +* [Admin Dashboard] [PR 687] Enable AdminUI when running gobblin under yarn |
| 70 | +* [Job Exec History] [PR 688] Added a log line when starting to write job execution history |
| 71 | +* [Build] [PR 694] Adding throttled upload of sonatype packages |
| 72 | +* [Metrics] [PR 698] Log which custom metric reporter class is wired up |
| 73 | +* [Documentation] [PR 704] Remove @link tags from @see javadoc tags |
| 74 | +* [Job History Store] [PR 705] Improve database history store performance |
| 75 | +* [YARN] [PR 708] Fixed the file mode of the gobblin-yarn.sh script to match the other scripts. |
| 76 | +* [Core] [PR 713] Don't send an email on shutdown when email notifications are disabled. |
| 77 | +* [Admin Dashboard] [PR 717] More flexible Admin configuration |
| 78 | +* [Core] [PR 727] Added a configuration to skip the previous run during FileBasedExtraction for full load |
| 79 | +* [Core] [PR 733] Add ability to configure the encryption_key_loc filesystem |
| 80 | +* [Build] [PR 737] Better travis scripts which support test error reporting |
| 81 | +* [Core] [PR 741] Fix #740 for FsStateStore.createAlias and removing usage of FileUtil.copy |
| 82 | +* [Core] [PR 759] Allow downloading other filetypes in FileBasedExtractor |
| 83 | +* [Data Management] [PR 760] Per dataset retention blacklist |
| 84 | +* [Retention] [PR 764] Ensure that jobs clean up correctly |
| 85 | +* [Core] [PR 766] Create GZIPFileDownloader.java |
| 86 | +* [YARN] [PR 768] Switch LogCopier from ScheduledExecutorService to HashedWheelTimer |
| 87 | +* [Core] [PR 772] Upgrading and re-enabling Findbugs |
| 88 | +* [Kafka] [PR 777] Adding Parallelization to WorkUnit Creation in KafkaSource |
| 89 | +* [Documentation] [PR 788] Initial commit for mkdocs and readthedocs integration |
| 90 | +* [Kafka] [PR 789] Parallelize late data copy |
| 91 | +* [Config Management] [PR 794] Read current version of config store from metadata file |
| 92 | +* [Build] [PR 799] (#798) Adding JaCoCo and Coveralls support for code coverage analysis |
| 93 | +* [Core] [PR 808] (#792, #802) Adding ApplicationLauncher to manage app services, including GobblinMetrics lifecycle |
| 94 | +* [Data Management] [PR 812] Make generic version, version finder, version selection policy |
| 95 | +* [Hive Registration] [PR 815] Improve Hive registration performance |
| 96 | +* [Core] [PR 829] Adds support to `HadoopUtils` for overwriting files |
| 97 | +* [Build] [PR 832] (#830) excluding hive-exec from gobblin-compaction |
| 98 | +* [YARN] [PR 834] Enable the maximum log file size for Gobblin Yarn LogCopier to be configured |
| 99 | +* [Compaction] [PR 847] Change default value of compaction.job.avro.single.input.schema to true |
| 100 | +* [Distcp] [PR 849] Distcp partition filter and kerberos authentication |
| 101 | +* [Kafka] [PR 856] Clean up KafkaSource |
| 102 | +* [Core] [PR 872] Change BoundedBlockingRecordQueue to be backed by ArrayBlockingQueue |
| 103 | +* [Distcp] [PR 873] Implement simulate mode in distcp. |
| 104 | +* [Distcp] [PR 877] Stream datasets to distcp. |
| 105 | +* [Hive Distcp] [PR 878] Distcp on Hive supports predicates for fast partition skips, and supports copying full directories recursively |
| 106 | +* [Hive Registration] [PR 885] Add locking to Hive registration |
| 107 | +* [Distcp] [PR 886] Purge distcp persist directory at the beginning of publish phase. |
| 108 | +* [Distcp] [PR 889] Avro schema modification in distcp is executed only for URLs in the origin schema and authority |
| 109 | +* [Hive Distcp] [PR 890] Dynamic partition filtering for distcp Hive. |
| 110 | +* [Hive Registration] [PR 894] Enable multiple db and table names in Hive registration |
| 111 | +* [Core] [PR 897] Make it possible to disable publishing in job by specifying empty job data publisher |
| 112 | +* [Core] [PR 902] Make it possible to specify empty job data publisher |
| 113 | +* [Distcp] [PR 906] Maximum size for distcp CopyContext cache. |
| 114 | +* [Retention] [PR 908] Add typesafe support to glob version finder for audit retention |
| 115 | +* [Core] [PR 913] Job state stored in distributed cache in MR mode. |
| 116 | +* [Data Management] [PR 926] Make NewestKSelectionPolicy use Java Generics instead of FileSystemDatasetVersion |
| 117 | +* [Core] [PR 932] Separate jobstate from taskstate and datasetstate |
| 118 | +* [Documentation] [PR 937] Add documentation for topic specific partitioning configuration |
| 119 | +* [Hive Distcp] [PR 940] Distcp hive registration metadata |
| 120 | +* [Hive Distcp] [PR 941] Delete empty parent directories on Hive de-registration. Optimize deregistration |
| 121 | +* [Distcp] [PR 944] Bin pack distcp-ng work units. |
| 122 | +* [Data Management] [PR 947] Make VersionSelectionPolicy work with any DatasetVersion |
| 123 | +* [Distcp] [PR 949] Parallelize renameRecursively for distcp. |
| 124 | +* [Hive Distcp] [PR 950] Add delete methods when deregistering Hive partitions in distcp. |
| 125 | +* [Data Management] [PR 951] Moving NonNewestKSelectionPolicy logic to NewestKSelectionPolicy |
| 126 | +* [Hive Distcp] [PR 953] Added instrumentation to Hive copy. |
| 127 | +* [Config Management] [PR 956] Make the default store for SimpleHDFSConfigStoreFactory configurable |
| 128 | +* [Hive Distcp] [PR 959] Remove checksum from HiveDistcp copy listing. |
| 129 | +* [Hive Distcp] [PR 960] Accelerate path diff in HiveCopyEntityHelper by reusing FileStatus. |
| 130 | +* [Distcp] [PR 966] Set max work units per multiworkunit for distcp. |
| 131 | +* [Core] [PR 970] Fixing rest of findbugs warnings, and setting findbugs to fail the build on new warnings |
| 132 | +* [Distcp] [PR 971] Distcp-ng handles directory structure copy |
| 133 | +* [Core] [PR 974] Deprecating and removing support for Hadoop versions other than 2.x.x |
| 134 | +* [Hive Distcp] [PR 975] Added whitelist and blacklist capabilities to HiveDatasetFinder. |
| 135 | + |
| 1 | 136 | GOBBLIN 0.6.2 |
| 2 | 137 | ============= |
| 3 | 138 | |