CHANGELOG.md (+174 -5)
@@ -1,3 +1,177 @@
+GOBBLIN 0.8.0
+-------------
+
+#### Created Date: 08/22/2016
+
+## Highlights
+
+* Gobblin can now convert Avro files to ORC through Hive. Documentation: http://gobblin.readthedocs.io/en/latest/adaptors/Hive-Avro-To-ORC-Converter/.
+* Gobblin can now write data to Kafka using a new `KafkaWriter`. Documentation: http://gobblin.readthedocs.io/en/latest/sinks/Kafka/.
+* Gobblin distcp can now replicate Hive tables between different Hive Metastores. Documentation: http://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/.
+* Gobblin now supports Hive-based retention. Documentation: http://gobblin.readthedocs.io/en/latest/data-management/Gobblin-Retention/.
+* Gobblin now supports job templates, which reduce the effort of writing a Gobblin job.
+*[Salesforce][PR 1069] Adding security token to Salesforce bulk API login
+*[Runtime][PR 1078] Fixing possible NPE in SourceDecorator
+*[Documentation][PR 1081] Fixing search for Gobblin ReadTheDocs
+*[Documentation][PR 1107] Minor text formatting fix for README.md
+*[Salesforce][PR 1118] Update Gobblin Salesforce to the new proxy
+*[Config Management][PR 1135] Revert changes to ConfigUtils
+*[Utility][PR 1147] Capture exceptions correctly in HadoopUtilsTest.testSafeRenameRecursively
+*[Salesforce][PR 1152] Updated Gobblin Salesforce to resolve entity.source and extract.table.name
+*[Build][PR 1153] Make sure Maven Central repo is first; bug fixes
+*[Utility][PR 1154] Fix for failing createProxiedFileSystemUsingToken
+*[Avro to ORC][PR 1155] Changed Hive validation to be compatible with old Hive versions with auth turned on, and made Hive query generation compile with the new Hive version
+*[Build][PR 1156] Upgrade wix-embedded-mysql
+*[Runtime][PR 1157] Move test MR jobs dir to /tmp to avoid issues with DistributedCache
+*[Distcp][PR 1160] Fixed a race condition on CopyDataPublisher.
+*[Metrics][PR 1170] Do not fail the task if metricsReport fails to stop
+*[Metrics][PR 1176] Added a backwards compatible constructor to SchemaRegistryVersionWriter
+*[Retention][PR 1182] Throw exception when retention dataset finder fails to initialize
+*[Retention][PR 1202] Bug fix - Retention does not blacklist dataset
+*[Runtime][PR 1215] Fixed silent failures and hung application when a standalone service fails to initialize.
+*[Example][PR 1217] Fixing console writer example
+
+## IMPROVEMENTS
+
+*[YARN][PR 978] Initial commit for gobblin-cluster; gobblin-yarn refactoring
+*[Core][PR 979] Initial commit for HTTP Writer APIs
+*[Core][PR 980] Add metadata after completion of job to a specific metadata directory
+*[Hive Distcp][PR 983] Need to deregister existing table
+*[Documentation][PR 988] Adding documentation page for Gobblin Distcp
+*[Kafka][PR 992] Making Kafka metadata reads more resilient to issues with the brokers
+*[Documentation][PR 993] Open source wiki for config management
+*[Data Management][PR 998] Merge the two LongWatermarks
+*[Hive Distcp][PR 1003] Added the predicate check to skip full table diff if the existing table's registration time > source table's mod time
+*[Distcp][PR 1008] ETL-4470: Implementation of HTTP filer puller using Distcp-ng
+*[Documentation][PR 1012] Document changes in PR#952
+*[Documentation][PR 1013] Update documents
+*[Build][PR 1023] Adding parallel test Travis VMs
+*[Hive Registration][PR 1027] Added configuration to Hive client for getting credentials.
+*[Hive Registration][PR 1034] Hive metastore initialization should support empty HCat URI, i.e., default to platform defaults
+*[Avro to ORC][PR 1035] Use table schema and partition schema
+*[Avro to ORC][PR 1036] Hive metastore connection pool optimization; fixes for: backward compatibility for Hive in AvroToOrc, schema parser deserialization from schema literal, database name in Hive DDL query generation, Hive metastore connection pool initialization NPE if HCat URI is platform provided
+*[Avro to ORC][PR 1037] Add SLA events for Avro to ORC conversion
+*[Hive Registration][PR 1038] Made Hive metastore connection auto returnable to connection pool after Hive dataset discovery
+*[Avro to ORC][PR 1044] Made HiveAvroToOrcConverter compatible with Hive v0.13
+*[Hive Distcp][PR 1045] Add bootstrap low watermark support for HiveSource in data management
+*[Avro to ORC][PR 1046] Mark all workunits of a dataset as failed if one task fails
+*[Hive Distcp][PR 1053] Add lookback days for HiveSource
+*[Hive Registration][PR 1054] Converted Hive dereg / registration to post publish steps, fixed missing fileset.
+*[Distcp][PR 1055] Parallelize commit rebased
+*[Hive Distcp][PR 1056] Add lastDataPublishTime in Hive table/partition properties
+*[Runtime][PR 1060] MR launcher does not write tasks to the jobstate file in HDFS.
+*[Hive Distcp][PR 1062] Enable AvroSchemaManager to read schema from Kafka schema registry
+*[Hive Distcp][PR 1067] Add a backfill Hive source that does not check watermarks
+*[Data Management][PR 1071] Add ConvertibleHiveDataset and config store support to HiveDatasetFinder
+*[Documentation][PR 1082] Updating the README and other outdated docs to encourage use of Gobblin Releases
+*[Avro to ORC][PR 1087] Add support for nested and flattened ORC conversion configuration
+*[Kafka][PR 1091] Confluent schema registry example for Kafka writer
+*[Json Converter][PR 1092] Added JsonConverter to parse Json files to a format such that JsonIntermediateToAvro converter can parse
+*[Avro to ORC][PR 1095] Refactored to rename HiveAvroORCQueryUtils to HiveAvroORCQueryGenerator
+*[Compaction][PR 1096] Added simulate mode in Hive JDBC Connector to simulate query execution
+*[Avro to ORC][PR 1097] Added limit clause to Hive query generation to enable conversion validation of sample subset
+*[Avro to ORC][PR 1098] Added Azkaban job that can validate conversion result by comparing source and target Hive tables
+*[Core][PR 1102] Intern strings in deserialized States to reduce memory usage.
+*[Documentation][PR 1104] Added powered by section in wiki for companies using Gobblin
+*[Documentation][PR 1105] Added Gobblin meetup June 2016 presentations on Talks and Tech Blogs wiki
+*[Documentation][PR 1109] Updating the code contributions documentation
+*[Documentation][PR 1110] Added videos from June 2016 meetup to talks-and-tech-blogs wiki page
+*[Documentation][PR 1111] Made order of presentations chronological in talks-and-tech-blogs wiki page
+*[Documentation][PR 1112] Update Gobblin on AWS video presentation link with right start time in playback
+*[Documentation][PR 1113] Added PayPal to powered by wiki page
+*[Documentation][PR 1115] Adding Sandia National Labs to Powered-By page
+*[Avro to ORC][PR 1119] Changed concatenated queries string to list in Hive converter publisher
+*[Avro to ORC][PR 1120] Added Hive query generation to optionally support explicit database names
+*[Avro to ORC][PR 1122] Made changes to handle HIVE-6129 (inverted exchange partition bug) and corresponding support for backward incompatible changes in Hive
+*[Hive Distcp][PR 1126] Make distcp publisher safer: renameRecursively fails appropriately, Hive registration fails if location doesn't exist.
+*[Avro to ORC][PR 1127] Drop hourly partitions when daily data gets converted to ORC
+*[Hive Registration][PR 1128] Added events in hive-registration
+*[Avro to ORC][PR 1138] Change Hive Avro to ORC publish to use Gobblin constructs instead of Hive exchange partition query
+*[Avro to ORC][PR 1139] Added support to escape the Hive nested field names when derived from destination table as raw string
+*[Data Management][PR 1140] Moved WhitelistBlacklist from data-management to utility.
+*[Avro to ORC][PR 1141] Renamed partitionDir.prefixLocationHint to source.dataPathIdentifier to be more consistent with naming across Hive data conversion
+*[Build][PR 1142] Add gradle property withFindBugsXmlReport to enable XML FindBugs reports
+*[Avro to ORC][PR 1148] Support for distcp-ng registration time in isOlderThanLookback check and minor refactoring
+*[Avro to ORC][PR 1151] Changed Hive conversion validation job to use HIVE_DATASET_CONFIG_PREFIX consistent with HiveAvroToOrcSource
+*[Avro to ORC][PR 1163] Fail Avro to ORC validation job on at least one failure
+*[Hive Registration][PR 1165] Add create time to newly registered Hive tables and partitions.
+*[Hive Distcp][PR 1167] Adding options in watermarkCopyableFileFilter and some refactoring
+*[Metrics][PR 1169] Gobblin metrics registers the base schemas instead of inferring them from events.
+*[Avro to ORC][PR 1171] Added more SLA event metadata to Avro to ORC conversion job
+*[Avro to ORC][PR 1172] Use camel case for event names
+*[Avro to ORC][PR 1173] Parallelize Avro to ORC validation job
+*[Utility][PR 1175] Schema files (schema.avsc) will be written with 774 permission.
+*[Hive Distcp][PR 1180] Add createtime when altering a table.
+*[Job Templates][PR 1183] Change the key name of required.attributes
+*[Job Templates][PR 1184] Fixed name of ResourceBasedTemplate.
+*[Job Templates][PR 1185] Fix naming of template and template class file.
+*[Avro to ORC][PR 1189] Cache data modTime to reduce the number of HDFS calls
+*[Hive Retention][PR 1190] Add logs to Hive retention. Support more DatasetFinder constructors
+*[Data Management][PR 1192] Add config store URI builder for Hive datasets
+*[Core][PR 1204] Refactor methods between HadoopFsHelper and AvroFsHelper
+*[Avro to ORC][PR 1205] AvroToOrc - Implemented a per partition watermark
+*[Job Launcher][PR 1206] Refactored SchedulerUtils into a new PullFileLoader that uses Config to load pull files.
+*[Documentation][PR 1207] Template wiki doc added
+*[Kafka][PR 1210] Make topic suffix configurable for lookup in Confluent Schema Registry
+*[Job Templates][PR 1211] Restored template functionality removed accidentally. Add unit test for the functionality.
+*[Kafka][PR 1218] Making Kafka consumer configurable for Kafka extract
+*[Runtime][PR 1220] Refactored MR mode to use GobblinInputFormat.
+*[Kafka Writer][PR 1226] Making Kafka writer more robust, adding tests
+*[Job Templates][PR 1228] Templates use config instead of properties.
+
+## EXTERNAL CONTRIBUTIONS
+
+We would like to thank all our external contributors for helping improve Gobblin.
+
+* singhd10:
+  - Add metadata after completion of job to a specific metadata directory (PR 980)
+* shelocks:
+  - Fixing SOURCE_QUERYBASED_LOW_WATERMARK_BACKUP_SECS no default value (PR 1005)
+* lbendig, Lorand Bendig:
+  - Document changes in PR#952 (PR 1012)
+  - Make topic suffix configurable for lookup in Confluent Schema Registry (PR 1210)
+* jinhyukchang, Jinhyuk Chang:
+  - JDBCWriter. Bug fix on SQL statements. Bug fix on data type mapping. (PR 1050)
+  - HttpWriter including SalesForceRestWriter, ThrottleWriter, etc (PR 1186)
+* ypopov, Eugene Popov:
+  - Teradata JDBC Extractor and Source (PR 1090)
+* pldash:
+  - Added JsonConverter to parse Json files to a format such that JsonIntermediateToAvro converter can parse (PR 1092)
+
 GOBBLIN 0.7.0
 -------------
 
@@ -48,9 +222,7 @@ GOBBLIN 0.7.0
 *[Publisher][PR 657] Issue #561 - fix for BaseDataPublisher to mark WorkingState correctly
 *[Core][PR 661] Change ParallelRunner.close to wait for all futures to finish
 *[Core][PR 663] ParallelRunner catches exceptions correctly and has failure policies.