Conversation

@loserwang1024
Owner

No description provided.

JingsongLi and others added 30 commits April 6, 2023 16:33
…y and SinkFormatFactory


We improved the interfaces with the following changes:
1. Introduce a common interface DynamicTableSource.Context, make the Context of ScanTableSource and LookupTableSource extend it, and rename them to ScanContext and LookupContext, respectively
2. Change the parameter of ScanFormat.createScanFormat from ScanTableSource.Context to DynamicTableSource.Context
3. Rename ScanFormat.createScanFormat to DecodingFormat#createRuntimeDecoder()
4. Rename SinkFormat.createSinkFormat to EncodingFormat#createRuntimeEncoder()
5. Rename ScanFormatFactory to DecodingFormatFactory
6. Rename SinkFormatFactory to EncodingFormatFactory

This closes #12320
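
For orientation, a simplified sketch of the renamed interface shape (illustrative only, not the exact Flink signatures; the context type and the Object-typed data type parameter are stand-ins):

```java
// Simplified sketch of the renamed interfaces; not the exact Flink signatures.
public interface FormatInterfacesSketch {

    /** Common parent context; ScanContext and LookupContext extend it in Flink. */
    interface Context {}

    /** Formerly ScanFormat (its factory, ScanFormatFactory, becomes DecodingFormatFactory). */
    interface DecodingFormat<I> {
        // formerly ScanFormat.createScanFormat(ScanTableSource.Context, ...)
        I createRuntimeDecoder(Context context, Object producedDataType);
    }

    /** Formerly SinkFormat (its factory, SinkFormatFactory, becomes EncodingFormatFactory). */
    interface EncodingFormat<O> {
        // formerly SinkFormat.createSinkFormat(...)
        O createRuntimeEncoder(Context context, Object consumedDataType);
    }
}
```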
… format


The JSON format currently parses and generates timestamps using RFC-3339, which is not the SQL standard. This commit changes the default behavior to parse/generate timestamps according to the SQL standard. It also introduces an option "json.timestamp-format.standard" to allow falling back to the ISO-8601 standard.

This closes #12661
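
A minimal usage sketch of the new option, assuming it accepts the values 'SQL' (the new default) and 'ISO-8601' (the fallback); the table name, schema, and connector options are made up:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JsonTimestampStandardExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical table; the interesting part is the format option that switches
        // timestamp parsing back to ISO-8601 instead of the new SQL default.
        tEnv.executeSql(
                "CREATE TABLE events (\n"
                        + "  id BIGINT,\n"
                        + "  ts TIMESTAMP(3)\n"
                        + ") WITH (\n"
                        + "  'connector' = 'kafka',\n"
                        + "  'topic' = 'events',\n"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                        + "  'format' = 'json',\n"
                        + "  'json.timestamp-format.standard' = 'ISO-8601'\n"
                        + ")");
    }
}
```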
…d as TypeInformation

Introduces a WrapperTypeInfo that can replace most (if not all) TypeInformation classes
in the Blink planner. It is backed by logical types and uses internal serializers.

This closes #12852.
TypeInformation is a legacy class for the sole purpose of creating a
TypeSerializer. Instances of TypeInformation are not required in the
table ecosystem but sometimes enforced by interfaces of other modules
(such as org.apache.flink.api.dag.Transformation). Therefore, we
introduce InternalTypeInfo which acts as an adapter whenever type
information is required. Instances of InternalTypeInfo should only
be created when passing them to interfaces that require type information.
The class should not be used as a replacement for a LogicalType.
Information such as the arity of a row type, field types, field names, etc.
should be derived from the LogicalType directly.

This closes #12900.
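
A small usage sketch of the intended pattern (the import paths are assumed from current Flink and may differ from the code at the time of this commit): derive structural information from the LogicalType, and only wrap it when an interface insists on TypeInformation.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.BigIntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class InternalTypeInfoExample {
    public static void main(String[] args) {
        // The LogicalType is the source of truth: arity, field names, field types.
        RowType rowType =
                RowType.of(
                        new LogicalType[] {new BigIntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                        new String[] {"id", "name"});

        // Only wrap it when an interface (e.g. a Transformation) demands TypeInformation.
        TypeInformation<RowData> typeInfo = InternalTypeInfo.of(rowType);

        // Derive structural information from the LogicalType, not from the TypeInformation.
        System.out.println(rowType.getFieldCount()); // 2
        System.out.println(typeInfo);
    }
}
```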
…'s IDENTITY config is not FULL

This commit adds documentation for this case and includes a guiding message in the exception.

This closes #13019
…essage is received

Just skip the tombstone messages

This closes #13019
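
The gist of the fix as a simplified sketch (not the actual Flink deserialization class; names are illustrative): a tombstone arrives with a null value and simply produces no output.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Simplified sketch of skipping Kafka tombstone messages during deserialization. */
public class TombstoneSkippingDeserializer {

    /** A tombstone is a record whose value is null; it carries no change event to parse. */
    public void deserialize(byte[] message, List<String> out) {
        if (message == null || message.length == 0) {
            // Skip tombstone messages instead of failing with a parse error.
            return;
        }
        out.add(new String(message, StandardCharsets.UTF_8));
    }
}
```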
This commit upgrades the default version of Avro that flink-avro will use. It should be possible to downgrade the Avro version in a user job, as the binary format is compatible and we do not expose any Avro dependencies in the API.

Additionally, this commit fixes the handling of the logical types time-micros and timestamp-micros, as well as the interpretation of timestamp-millis, in the AvroRowDataDeserializationSchema.
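
For reference, the micros-based logical types carry a long count of microseconds; a minimal conversion sketch in plain Java, independent of the Flink classes involved:

```java
import java.time.Instant;
import java.time.LocalTime;

public class AvroMicrosConversion {
    public static void main(String[] args) {
        // time-micros: microseconds since midnight.
        long timeMicros = 45_296_123_456L; // 12:34:56.123456
        LocalTime time = LocalTime.ofNanoOfDay(timeMicros * 1_000L);

        // timestamp-micros: microseconds since the epoch.
        long timestampMicros = 1_600_000_000_123_456L;
        Instant microsInstant =
                Instant.ofEpochSecond(
                        Math.floorDiv(timestampMicros, 1_000_000L),
                        Math.floorMod(timestampMicros, 1_000_000L) * 1_000L);

        // timestamp-millis: milliseconds since the epoch.
        long timestampMillis = 1_600_000_000_123L;
        Instant millisInstant = Instant.ofEpochMilli(timestampMillis);

        System.out.println(time + " / " + microsInstant + " / " + millisInstant);
    }
}
```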
…c database and table for canal-json format

This closes #13294
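
A usage sketch, under the assumption that this behavior is exposed through format options named 'canal-json.database.include' and 'canal-json.table.include' (the option names, table schema, and connector options here are illustrative):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CanalJsonIncludeExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Hypothetical table; the relevant part is the two include options that restrict
        // parsing to changelog entries of one database and one table.
        tEnv.executeSql(
                "CREATE TABLE orders (\n"
                        + "  order_id BIGINT,\n"
                        + "  amount DECIMAL(10, 2)\n"
                        + ") WITH (\n"
                        + "  'connector' = 'kafka',\n"
                        + "  'topic' = 'mysql-binlog',\n"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                        + "  'format' = 'canal-json',\n"
                        + "  'canal-json.database.include' = 'mydb',\n"
                        + "  'canal-json.table.include' = 'orders'\n"
                        + ")");
    }
}
```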
…nformation in sources and sinks

This is not a compatible change. But given that those interfaces are still relatively new and not many people have migrated to the new sources/sinks, we should make this change now rather than never and avoid @SuppressWarnings in almost all implementations.
…ormat deserialization

Never modify or prefix the field name; instead, we now use {rowName}_{fieldName} as the nested row type name, because an Avro schema does not allow row types with the same name but different schemas.
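
A tiny sketch of the naming scheme (hypothetical helper, not the actual converter code): the nested record name is derived from the enclosing row name plus the field name, so two nested rows with different schemas never collide on the same Avro record name.

```java
/** Sketch of deriving unique Avro record names for nested row types. */
public class NestedRowNaming {

    /** e.g. rowName = "order", fieldName = "address" -> "order_address". */
    static String nestedRowTypeName(String rowName, String fieldName) {
        return rowName + "_" + fieldName;
    }

    public static void main(String[] args) {
        // Two different nested rows under the same parent get distinct names, which Avro
        // requires because identical record names must have identical schemas.
        System.out.println(nestedRowTypeName("order", "shipping_address"));
        System.out.println(nestedRowTypeName("order", "billing_address"));
    }
}
```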
… deserialization

* Fix the TIME schema precision to 3
* Fix the nullability of the types TIMESTAMP_WITHOUT_TIME_ZONE, DATE, TIME_WITHOUT_TIME_ZONE,
  DECIMAL, MAP, ARRAY
* The table schema row type should always be non-nullable
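
For context, nullability in Avro is expressed as a union with "null"; a minimal sketch using Avro's SchemaBuilder (illustrative, not the Flink converter code):

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class NullableAvroSchemaExample {
    public static void main(String[] args) {
        // A nullable field becomes a union of "null" and the actual type.
        Schema nullableLong = SchemaBuilder.unionOf().nullType().and().longType().endUnion();

        // A non-nullable field is just the plain type; the top-level record
        // (the table schema row type) stays non-nullable as well.
        Schema record =
                SchemaBuilder.record("row")
                        .fields()
                        .requiredString("id")
                        .name("optional_count").type(nullableLong).withDefault(null)
                        .endRecord();

        System.out.println(record.toString(true));
    }
}
```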
AHeise and others added 28 commits October 9, 2024 11:54
Make AssertJ print full stack traces when encountering unexpected exceptions.
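
One way to do this with AssertJ, assuming the commit relies on the setMaxStackTraceElementsDisplayed setting (treat this as a sketch; the actual wiring may differ):

```java
import org.assertj.core.api.Assertions;

/** Sketch: configure AssertJ once, e.g. in a test base class or JUnit extension. */
public class AssertJStackTraceConfig {
    static {
        // By default AssertJ truncates stack traces of unexpected exceptions in failure
        // messages; raise the limit so the full trace is shown.
        Assertions.setMaxStackTraceElementsDisplayed(Integer.MAX_VALUE);
    }
}
```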
A job may be in various states during shutdown, including race conditions. Ideally, the framework would provide idempotence, but we can work around that by ignoring specific exceptions.
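
A sketch of the workaround idea (a hypothetical helper, not the actual test utility): treat a small allow-list of shutdown-related exceptions as success.

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: run a shutdown-time action and ignore expected race-condition failures. */
public final class IgnoreExpectedExceptions {

    @FunctionalInterface
    interface ThrowingRunnable {
        void run() throws Exception;
    }

    static void runIgnoring(ThrowingRunnable action, Class<?>... ignored) throws Exception {
        try {
            action.run();
        } catch (Exception e) {
            for (Class<?> clazz : ignored) {
                if (clazz.isInstance(e)) {
                    return; // the job may already be gone; that is fine for this test
                }
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // Example: cancelling a job that may already have finished.
        runIgnoring(() -> { throw new IllegalStateException("Job not found"); },
                IllegalStateException.class, TimeoutException.class);
    }
}
```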
For non-transactional producers, a notifyCheckpointCompleted after finishOperator sets the transaction inside the 2PCSinkFunction to null, so that on close the producer is leaked. Since the transactional producer stores its transactions in pendingTransactions before that, we only need to fix the cases where we don't preCommit/commit. The easiest solution is to actually close the producer on finishOperator, since no new record can arrive.
Add leak check in all relevant tests.
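
A sketch of what such a leak check can look like, assuming the producer's IO thread name starts with "kafka-producer-network-thread" (the test wiring is illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch of a producer-leak check to run after each test. */
public class KafkaProducerLeakCheck {

    /** Returns the names of live Kafka producer network threads. */
    static List<String> findLeakedProducerThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(Thread::isAlive)
                .map(Thread::getName)
                // Assumption: the producer's IO thread is named "kafka-producer-network-thread | <client.id>".
                .filter(name -> name.startsWith("kafka-producer-network-thread"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> leaks = findLeakedProducerThreads();
        if (!leaks.isEmpty()) {
            throw new AssertionError("Leaked Kafka producers: " + leaks);
        }
    }
}
```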
The test tried to assert on byte counts, which are written asynchronously. This commit adds flushing and establishes a baseline so that metadata requests don't interfere with the assertions.
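
The flush-then-baseline pattern, sketched with entirely hypothetical writer and metric accessors (none of these names come from the connector):

```java
/** Sketch of the flush-then-baseline pattern for asserting on async byte counters. */
public class ByteCountAssertionSketch {

    /** Hypothetical stand-ins for the component under test and its metric. */
    interface Writer {
        void write(byte[] record);
        void flush();        // force pending async sends out before reading the counter
        long numBytesOut();  // hypothetical metric accessor
    }

    static void assertBytesWritten(Writer writer, byte[] record) {
        // Flush and record a baseline first, so earlier traffic (e.g. metadata requests)
        // does not leak into the assertion.
        writer.flush();
        long baseline = writer.numBytesOut();

        writer.write(record);
        writer.flush();

        long delta = writer.numBytesOut() - baseline;
        if (delta < record.length) {
            throw new AssertionError("Expected at least " + record.length + " bytes, got " + delta);
        }
    }
}
```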
Also
- adds the migration support tests up to 1.20.
- bumps Kafka-client to 3.6.2
Add more test coverage for unchained cases and separate the behavioral components from data capture and assertions. Also reduces the need to convey information with fields.
To test recovery in future PRs, it's important to decompose the #createWriter methods into common cases and advanced cases that may require some additional setup.
Since Java 20, Thread.stop no longer works (it throws UnsupportedOperationException), so we just remember old leaks to avoid failing subsequent tests.
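
A sketch of the remember-old-leaks idea in plain Java (leaked threads stay alive because they can no longer be stopped, so they must be excluded from later checks):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Sketch: fail only on threads leaked by the current test, not by earlier ones. */
public class KnownLeakTracker {

    private static final Set<Thread> KNOWN_LEAKS = ConcurrentHashMap.newKeySet();

    static Set<Thread> findNewLeaks(String threadNamePrefix) {
        Set<Thread> leaks = Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(threadNamePrefix))
                .filter(t -> !KNOWN_LEAKS.contains(t))
                .collect(Collectors.toSet());
        // Remember them: they cannot be killed and must not fail subsequent tests.
        KNOWN_LEAKS.addAll(leaks);
        return leaks;
    }

    public static void main(String[] args) {
        Set<Thread> newLeaks = findNewLeaks("kafka-producer-network-thread");
        if (!newLeaks.isEmpty()) {
            throw new AssertionError("Test leaked producer threads: " + newLeaks);
        }
    }
}
```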
Move FlinkKafkaInternalProducer and TransactionalIdFactory to internal. All other classes are potentially leaked through the generics and signatures of the KafkaSink(Builder).
Split the easy case of non-transactional writer from the transactional writer to simplify reasoning about the state (e.g. which fields are used when).
Backchannel provides a way for the committer to communicate to the writer even in (simple) non-chained cases thanks to colocation constraints. It's the same trick that is employed in statefun. A backchannel is stateless, however, because its state can be entirely derived from committer state. Thus, it's much easier to handle than a statefun backchannel.

Backchannel will be used to communicate the committed transactions to the writer in future commits.
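
A minimal sketch of such a backchannel (illustrative only; the class and method names are made up, not the connector's actual API): because colocation puts committer and writer in the same JVM, a shared in-memory queue keyed by an identifier is enough.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Illustrative in-memory backchannel from committer to writer (names are made up). */
public final class BackchannelSketch {

    // One queue per (transactionalIdPrefix, subtaskId); both sides resolve the same instance
    // because colocation guarantees they live in the same JVM.
    private static final Map<String, Queue<String>> CHANNELS = new ConcurrentHashMap<>();

    private final Queue<String> queue;

    private BackchannelSketch(String key) {
        this.queue = CHANNELS.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>());
    }

    static BackchannelSketch forKey(String transactionalIdPrefix, int subtaskId) {
        return new BackchannelSketch(transactionalIdPrefix + "-" + subtaskId);
    }

    /** Committer side: announce a committed transactional id. */
    void signalCommitted(String transactionalId) {
        queue.add(transactionalId);
    }

    /** Writer side: drain committed ids, e.g. to recycle their producers. */
    String pollCommitted() {
        return queue.poll();
    }

    public static void main(String[] args) {
        BackchannelSketch committerSide = BackchannelSketch.forKey("sink-prefix", 0);
        BackchannelSketch writerSide = BackchannelSketch.forKey("sink-prefix", 0);

        committerSide.signalCommitted("sink-prefix-0-42");
        System.out.println(writerSide.pollCommitted()); // sink-prefix-0-42
    }
}
```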
Add a first-class producer pool that self-manages all resources and allows recycling producers by transactional id.
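
A simplified sketch of the pooling idea (hypothetical types; the real pool manages FlinkKafkaInternalProducer instances and their lifecycle): producers are handed out per transactional id and returned for reuse once their transaction completes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustrative producer pool that recycles producers by transactional id. */
public class ProducerPoolSketch implements AutoCloseable {

    /** Hypothetical stand-in for the Kafka producer wrapper. */
    static class PooledProducer {
        String transactionalId;
        void initTransactions(String transactionalId) { this.transactionalId = transactionalId; }
        void close() { /* release network resources */ }
    }

    private final Deque<PooledProducer> idle = new ArrayDeque<>();
    private final List<PooledProducer> all = new ArrayList<>();

    /** Reuse an idle producer if possible, otherwise create a new one. */
    PooledProducer getForTransaction(String transactionalId) {
        PooledProducer producer = idle.poll();
        if (producer == null) {
            producer = new PooledProducer();
            all.add(producer);
        }
        producer.initTransactions(transactionalId);
        return producer;
    }

    /** Return a producer whose transaction finished so it can be handed out again. */
    void recycle(PooledProducer producer) {
        idle.push(producer);
    }

    /** The pool owns all producers and closes them, preventing leaks. */
    @Override
    public void close() {
        all.forEach(PooledProducer::close);
        idle.clear();
        all.clear();
    }
}
```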