Commit f2f2a1d

Release 0.8.0
1 parent ad8b4be commit f2f2a1d

1 file changed

+174
-5
lines changed

CHANGELOG.md

+174-5
@@ -1,3 +1,177 @@
1+
GOBBLIN 0.8.0
2+
-------------
3+
4+
#### Created Date: 08/22/2016
5+
6+
## Highlights
7+
8+
* Gobblin can now convert Avro files to ORC through Hive. Documentation: http://gobblin.readthedocs.io/en/latest/adaptors/Hive-Avro-To-ORC-Converter/.
9+
* Gobblin can now write data to Kafka using a new `KafkaWriter`. Documentation: http://gobblin.readthedocs.io/en/latest/sinks/Kafka/.
10+
* Gobblin distcp can now replicate Hive tables between different Hive Metastores. Documentation: http://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/.
11+
* Gobblin can now support Hive-based retention. Documentation: http://gobblin.readthedocs.io/en/latest/data-management/Gobblin-Retention/.
12+
* Gobblin can now support job templates, which reduce the effort of writing a Gobblin job.
13+
Documentation: http://gobblin.readthedocs.io/en/latest/user-guide/Gobblin-template/.
14+
15+
## NEW FEATURES
16+
17+
* [Kafka] [PR 1016] Integration with Confluent Schema Registry, Confluent Deserializers, and Kafka Deserializers
18+
* [Avro to ORC] [PR 1031] Adding Avro To ORC conversion logic and related framework modifications
19+
* [General FileSystem Support] [PR 1066] Config file monitor for general file system
20+
* [Avro to ORC] [PR 1068] Nested Avro to Nested ORC conversion support
21+
* [General FileSystem Support] [PR 1073] extension of loading config file from general file system
22+
* [AWS] [PR 1088] Gobblin on AWS
23+
* [Kafka Writer] [PR 1089] Kafka writer
24+
* [JDBC Extractor] [PR 1090] Teradata JDBC Extractor and Source
25+
* [Avro to ORC] [PR 1093] Support for schema evolution, staging, selective column projection and compatibility check for Avro to ORC
26+
* [Hive Retention] [PR 1106] Hive Based Retention
27+
* [Job Templates] [PR 1145] Initial commit for job configuration template
28+
* [Http Writer] [PR 1186] HttpWriter including SalesForceRestWriter, ThrottleWriter, etc
29+
* [Avro to ORC] [PR 1188] Avro to orc data validation
30+
* [Job Templates] [PR 1197] Kafka-template
31+
* [Job Launcher] [PR 1203] New std driver2
32+
* [Core] [PR 1216] Adding a simple console writer to gobblin
33+
34+
## BUG FIXES
35+
36+
* [YARN] [PR 982] Using new zk port numbers for unit tests
37+
* [Kafka] [PR 996] Fix offset related bug in KafkaSource
38+
* [Core] [PR 999] distcp-ng throws UnsupportedOperationException
39+
* [Build] [PR 1001] Setting heaps size for gobblin-runtime tests due to OOM in some cases
40+
* [Core] [PR 1002] Set explicit 755 permissions to state store
41+
* [Core] [PR 1005] Fixing SOURCE_QUERYBASED_LOW_WATERMARK_BACKUP_SECS no default value
42+
* [Config Management] [PR 1043] Fix includes order
43+
* [JDBC Writer] [PR 1050] JDBCWriter. Bug fix on SQL statements. Bug fix on data type mapping.
44+
* [Data Management] [PR 1051] Fix default blacklist key
45+
* [Salesforce] [PR 1069] Adding security token to Salesforce bulk API login
46+
* [Runtime] [PR 1078] Fixing possible NPE in SourceDecorator
47+
* [Documentation] [PR 1081] Fixing search for Gobblin ReadTheDocs
48+
* [Documentation] [PR 1107] Minor text formatting fix for README.md
49+
* [Salesforce] [PR 1118] gobblin salesforce update to new proxy
50+
* [Config Management] [PR 1135] Revert changes to ConfigUtils
51+
* [Utility] [PR 1147] Capture exceptions correctly in HadoopUtilsTest.testSafeRenameRecursively
52+
* [Salesforce] [PR 1152] Updated gobblin salesforce to resolve entity.source and extract.table.name
53+
* [Build] [PR 1153] Make sure maven central repo is first; bug fixes
54+
* [Utility] [PR 1154] Fix for failing createProxiedFileSystemUsingToken
55+
* [Avro to ORC] [PR 1155] Changed Hive validation to make it compatible with old Hive version with auth turned on, and Hive query generation compile with new Hive version
56+
* [Build] [PR 1156] Upgrade wix-embedded-mysql
57+
* [Runtime] [PR 1157] Move test MR jobs dir to /tmp to avoid issues with DistributedCache
58+
* [Distcp] [PR 1160] Fixed a race condition on CopyDataPublisher.
59+
* [Metrics] [PR 1170] Not fail the task if metricsReport failed to be stopped
60+
* [Metrics] [PR 1176] Added a backwards compatible constructor to SchemaRegistryVersionWriter
61+
* [Retention] [PR 1182] Throw exception when retention dataset finder fails to initialize
62+
* [Retention] [PR 1202] Bug fix - Retention does not blacklist dataset
63+
* [Runtime] [PR 1215] Fixed silent failures and hung application when a standalone service fails to initialize.
64+
* [Example] [PR 1217] Fixing console writer example
65+
66+
## IMPROVEMENTS
67+
68+
* [YARN] [PR 978] Initial commit for gobblin-cluster; gobblin-yarn refactoring
69+
* [Core] [PR 979] Initial commit for HTTP Writer APIs
70+
* [Core] [PR 980] Add metadata after completion of job to a specific metadata directory
71+
* [Hive Distcp] [PR 983] need to deregister existing table
72+
* [Documentation] [PR 988] Adding documentation page for Gobblin Distcp
73+
* [Documentation] [PR 989] Added retention docs
74+
* [Documentation] [PR 991] Add Hive registration doc
75+
* [Kafka] [PR 992] Making kafka metadata read more resilient to issues with the brokers
76+
* [Documentation] [PR 993] open source wiki for config management
77+
* [Data Management] [PR 998] Merge the two LongWatermarks
78+
* [Hive Distcp] [PR 1003] Added the predicate check to skip full table diff if the existing table's registration time > source table's mod time
79+
* [Distcp] [PR 1008] ETL-4470: Implementation of HTTP file puller using Distcp-ng
80+
* [Documentation] [PR 1012] Document changes in PR#952
81+
* [Documentation] [PR 1013] Update documents
82+
* [Build] [PR 1023] Adding parallel test Travis VMs
83+
* [Hive Registration] [PR 1027] Added configuration to Hive client for getting credentials.
84+
* [Hive Registration] [PR 1034] Hive metastore initialization should support empty HCat uri, i.e. default to platform defaults
85+
* [Avro to ORC] [PR 1035] Use table schema and partition schema
86+
* [Avro to ORC] [PR 1036] Hive metastore connection pool optimization, Fixes for: backward compatibility for Hive in AvroToOrc, schema parser deserialization from schema literal, database name in Hive DDL query generation, Hive metastore connection pool initialization NPE if Hcat uri is platform provided
87+
* [Avro to ORC] [PR 1037] Add sla events for avro to orc conversion
88+
* [Hive Registration] [PR 1038] Made Hive metastore connection auto returnable to connection pool after Hive dataset discovery
89+
* [Avro to ORC] [PR 1044] Made HiveAvroToOrcConverter compatible with Hive v0.13 version
90+
* [Hive Distcp] [PR 1045] Add bootstrap low watermark support for HiveSource in data management
91+
* [Avro to ORC] [PR 1046] Mark all workunits of a dataset as failed if one task fails
92+
* [Hive Distcp] [PR 1053] Add lookback days for HiveSource
93+
* [Hive Registration] [PR 1054] Converted Hive dereg / registration to post publish steps, fixed missing fileset.
94+
* [Distcp] [PR 1055] Parallelize commit rebased
95+
* [Hive Distcp] [PR 1056] Add lastDataPublishTime in hive table/partition properties
96+
* [Runtime] [PR 1060] MR launcher does not write tasks to the jobstate file in HDFS.
97+
* [Hive Distcp] [PR 1062] Enable AvroSchemaManager to read schema from Kafka schema registry
98+
* [Hive Distcp] [PR 1067] Add a backfill hive source that does not check watermarks
99+
* [Data Management] [PR 1071] Add ConvertibleHiveDataset and config store support to HiveDatasetFinder
100+
* [Documentation] [PR 1082] Updating the README and other outdated docs to encourage use of Gobblin Releases
101+
* [Avro to ORC] [PR 1087] Add support for nested and flattened orc conversion configuration
102+
* [Kafka] [PR 1091] Confluent schema registry example for kafka writer
103+
* [Json Converter] [PR 1092] Added JsonConverter to parse Json files to a format such that JsonIntermediateToAvro converter can parse
104+
* [Avro to ORC] [PR 1095] Refactored to rename HiveAvroORCQueryUtils to HiveAvroORCQueryGenerator
105+
* [Compaction] [PR 1096] Added simulate mode in Hive JDBC Connector to simulate query execution
106+
* [Avro to ORC] [PR 1097] Added limit clause to Hive query generation to enable conversion validation of sample subset
107+
* [Avro to ORC] [PR 1098] Added Azkaban job that can validate conversion result by comparing source and target Hive tables
108+
* [Core] [PR 1102] Intern strings in deserialized States to reduce memory usage.
109+
* [Documentation] [PR 1104] Added powered by section in wiki for companies using Gobblin
110+
* [Documentation] [PR 1105] Added Gobblin meetup June 2016 presentations on Talks and Tech Blogs wiki
111+
* [Documentation] [PR 1109] Updating the code contributions documentation
112+
* [Documentation] [PR 1110] Added videos from June 2016 meetup to talks-and-tech-blogs wiki page
113+
* [Documentation] [PR 1111] Made order of presentations chronological in talks-and-tech-blogs wiki page
114+
* [Documentation] [PR 1112] Update Gobblin on AWS video presentation link with right start time in playback
115+
* [Documentation] [PR 1113] Added Paypal to powered by wiki page
116+
* [Documentation] [PR 1115] Adding Sandia National Labs to Powered-By page
117+
* [Avro to ORC] [PR 1119] Changed concatenated queries string to list in Hive converter publisher
118+
* [Avro to ORC] [PR 1120] Added Hive query generation to optionally support explicit database names
119+
* [Avro to ORC] [PR 1122] Made changes to handle Hive-6129 (inverted exchange partition bug) and corresponding support for backward incompatible changes in Hive
120+
* [Hive Distcp] [PR 1126] Make distcp publisher safer: renameRecursively fails appropriately, hive registration fails if location doesn't exist.
121+
* [Avro to ORC] [PR 1127] Drop hourly partitions when daily data gets converted to ORC
122+
* [Hive Registration] [PR 1128] Added events in hive-registration
123+
* [Avro to ORC] [PR 1138] Change Hive Avro to ORC publish to use Gobblin constructs instead of Hive exchange partition query
124+
* [Avro to ORC] [PR 1139] Added support to escape the Hive nested field names when derived from destination table as raw string
125+
* [Data Management] [PR 1140] Moved WhitelistBlacklist from data-management to utility.
126+
* [Avro to ORC] [PR 1141] Renamed partitionDir.prefixLocationHint to source.dataPathIdentifier to be more consistent with naming across Hive data conversion
127+
* [Build] [PR 1142] Add gradle property withFindBugsXmlReport to enable XML FindBugs reports
128+
* [Avro to ORC] [PR 1148] Support for distcp-ng registration time in isOlderThanLookback check and minor refactoring
129+
* [Avro to ORC] [PR 1151] Changed Hive conversion validation job to use HIVE_DATASET_CONFIG_PREFIX consistent with HiveAvroToOrcSource
130+
* [Avro to ORC] [PR 1163] Fail avro to orc validation job on at least one failure
131+
* [Hive Registration] [PR 1165] Add create time to newly registered Hive tables and partitions.
132+
* [Hive Distcp] [PR 1167] Adding options in watermarkCopyableFileFilter and some refactoring
133+
* [Metrics] [PR 1169] Gobblin metrics registers the base schemas instead of inferring them from events.
134+
* [Avro to ORC] [PR 1171] Added more SLA event metadata to Avro to Orc conversion job
135+
* [Avro to ORC] [PR 1172] Use camel case for event names
136+
* [Avro to ORC] [PR 1173] Parallelize Avro to Orc validation job
137+
* [Utility] [PR 1175] Schema files (schema.avsc) will be written with 774 permission.
138+
* [Hive Distcp] [PR 1180] Add createtime when altering a table.
139+
* [Job Templates] [PR 1183] change the key name of required.attributes
140+
* [Job Templates] [PR 1184] Fixed name of ResourceBasedTemplate.
141+
* [Job Templates] [PR 1185] Fix naming of template and template class file.
142+
* [Avro to ORC] [PR 1189] cache data modTime to reduce too many HDFS calls
143+
* [Hive Retention] [PR 1190] Add logs to hive retention. Support more DatasetFinder constructors
144+
* [Data Management] [PR 1192] Add config store uri builder for hive datasets
145+
* [Core] [PR 1204] Refactor methods between HadoopFsHelper and AvroFsHelper
146+
* [Avro to ORC] [PR 1205] AvroToOrc - Implemented a per partition watermark
147+
* [Job Launcher] [PR 1206] Refactored SchedulerUtils into a new PullFileLoader that uses Config to load pull files.
148+
* [Documentation] [PR 1207] template wiki doc added
149+
* [Kafka] [PR 1210] Make topic suffix configurable for lookup in Confluent Schema Registry
150+
* [Job Templates] [PR 1211] Restored template functionality removed accidentally. Add unit test for the functionality.
151+
* [Kafka] [PR 1218] Making Kafka consumer configurable for Kafka extract
152+
* [Runtime] [PR 1220] Refactored MR mode to use GobblinInputFormat.
153+
* [Kafka Writer] [PR 1226] Making kafka writer more robust, adding tests
154+
* [Job Templates] [PR 1228] Templates use config instead of properties.
155+
156+
## EXTERNAL CONTRIBUTIONS
157+
158+
We would like to thank all our external contributors for helping improve Gobblin.
159+
160+
* singhd10:
161+
-Add metadata after completion of job to a specific metadata directory (PR 980)
162+
* shelocks:
163+
-Fixing SOURCE_QUERYBASED_LOW_WATERMARK_BACKUP_SECS no default value (PR 1005)
164+
* lbendig, Lorand Bendig:
165+
-Document changes in PR#952 (PR 1012)
166+
-Make topic suffix configurable for lookup in Confluent Schema Registry (PR 1210)
167+
* jinhyukchang, Jinhyuk Chang:
168+
-JDBCWriter. Bug fix on SQL statements. Bug fix on data type mapping. (PR 1050)
169+
-HttpWriter including SalesForceRestWriter, ThrottleWriter, etc (PR 1186)
170+
* ypopov, Eugene Popov:
171+
-Teradata JDBC Extractor and Source (PR 1090)
172+
* pldash:
173+
-Added JsonConverter to parse Json files to a format such that JsonIntermediateToAvro converter can parse (PR 1092)
174+
1 175
GOBBLIN 0.7.0
2 176
-------------
3 177

@@ -48,9 +222,7 @@ GOBBLIN 0.7.0
48 222
* [Publisher] [PR 657] Issue #561 - fix for BaseDataPublisher to mark WorkingState correctly
49 223
* [Core] [PR 661] Change ParallelRunner.close to wait for all futures to finish
50 224
* [Core] [PR 663] ParallelRunner catches exceptions correctly and has failure policies.
51-
* [Build] [PR 664] Fix broken Gobblin version resolution ( fixes #662 )
52 225
* [Build] [PR 665] Gobblin-compaction tarball doesn't contain gobblin-compaction.jar
53-
* [Core] [PR 670] Fixing FindBugs warnings
54 226
* [Core] [PR 676] Ensure that parallel runner waits for the underlying tasks to finish
55 227
* [Core] [PR 677] Fix race condition in FsStateStore
56 228
* [Compaction] [PR 680] Fix a ConcurrentModificationException in MRCompactor
@@ -60,9 +232,6 @@ GOBBLIN 0.7.0
60 232
* [Distcp] [PR 691] Fix permissions for directories in distcp.
61 233
* [Core] [PR 700] Add missing jars to gobblin mapreduce runner, sort.
62 234
* [Core] [PR 706] Fixing CliOptions config file fs
63-
* [Core] [PR 722] Fixing FindBugs warnings in gobblin-compaction
64-
* [Build] [PR 743] Fixing skipTestGroup option
65-
* [Build] [PR 775] Fix javadoc warnings by only adding linksOffline to projects that the current project depends on.
66 235
* [Core] [PR 797] Fixing Fork + Task Retry Logic #776
67 236
* [Distcp] [PR 884] Fix issue with replicating owner and permission of system directories in distcp
68 237
* [Data Management] [PR 887] Fix NPE in DateTimeDatasetVersionFinder
