Commit 8bada10

Adding 0.6.0 CHANGELOG
Author: Chavdar Botev
1 parent 8232ddd, commit 8bada10

1 file changed: +65 -0 lines changed

CHANGELOG

@@ -0,0 +1,65 @@
GOBBLIN 0.6.0
--------------

NEW FEATURES

* [Compaction] Added M/R compaction/de-duping for hourly data
* [Compaction] Added late data handling for hourly and daily M/R compaction: https://github.com/linkedin/gobblin/wiki/Compaction#handling-late-records; added support for triggering M/R compaction if late data exceeds a threshold
* [I/O] Added support for using Hive SerDes through HiveWritableHdfsDataWriter
* [I/O] Added the concept of data partitioning to writers: https://github.com/linkedin/gobblin/wiki/Partitioned-Writers
* [Runtime] Added CliLocalJobLauncher for launching single jobs from the command line.
* [Converters] Added AvroSchemaFieldRemover that can remove specific fields from a (possibly recursive) Avro schema.
* [DQ] Added new row-level policies RecordTimestampLowerBoundPolicy and AvroRecordTimestampLowerBoundPolicy for checking if a record timestamp is too far in the past.
* [Kafka] Added schema registry API to KafkaAvroExtractor which enables support for various Kafka schema registry implementations (e.g. Confluent's schema registry).
* [Build/Release] Added build instrumentation to publish artifacts to Maven Central

BUG FIXES

* [Retention management] Trash now correctly handles deletion of files that already exist in the trash.
* [Kafka] Fixed an issue that could cause the Kafka adapter to miss data if the fork fails.

OTHER IMPROVEMENTS

* [Runtime] Added metrics for job executions
* [Metrics] Added a root metric context to keep track of GC of metrics and metric contexts and make sure those are properly reported
* [Compaction] Improved topic isolation in MRCompactor
* [Build/Release] Java version compatibility raised to Java 7.
* [Runtime] Deprecated COMMIT_ON_PARTIAL_SUCCESS and added a new policy for successful extracts
* [Retention management] Async trash implementation for parallel deletions.
* [Metrics] Added tracking events emission when data gets published
* [Retention management] Added support for parallel execution to the dataset cleaner
* [Runtime] Job execution info in the execution history store is now updated upon every task completion

INCUBATION

Note: these are new features which are under active development and may be subject to significant changes.

* [gobblin-ce] Added support for Gobblin Continuous Execution on Yarn
* [distcp-ng] Started work on bulk transfer (file copies) using Gobblin
* [distcp-ng] Added a light-weight Hadoop FileSystem implementation for file transfer from SFTP
* [gobblin-config] Added API for dataset driven

EXTERNAL CONTRIBUTIONS

We would like to thank all our external contributors for helping improve Gobblin.

* kadaan, joel.baranick:
  - Separate publisher filesystem from writer filesystem
  - Support for generating Idea projects with the correct language level (Java 7)
  - Fixed yarn conf path in gobblin-yarn.sh
* mwol (Maurice Wolter)
  - Implemented new class AvroCombineFileSplit which stores the Avro schema for each split, determined by the corresponding input file.
* cheleb (NOUGUIER Olivier)
  - Add support for maven install
* dvenkateshappa
  - Bugfix to RestApiExtractor.java
  - Added an excluding column list, which can be used for Salesforce configuration with a huge list of columns.
* klyr (Julien Barbot)
  - Bugfix to gobblin-mapreduce.sh
* gheo21
  - Bumped Kafka dependency to 2.11
* ahollenbach (Andrew Hollenbach)
  - Configuration improvements for standalone mode
* lbendig (Lorand Bendig)
  - Fixed a bug in DatasetState creation
