Kudu
Kudu 源连接器
支持 Kudu 版本
- 1.11.1/1.12.0/1.13.0/1.14.0/1.15.0
支持这些引擎
Spark
Flink
SeaTunnel Zeta
关键特性
描述
用于从 Kudu 读取数据。
测试的 kudu 版本是 1.11.1。
数据类型映射
| Kudu 数据类型 | SeaTunnel 数据类型 |
|---|---|
| BOOL | BOOLEAN |
| INT8 INT16 INT32 | INT |
| INT64 | BIGINT |
| DECIMAL | DECIMAL |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| STRING | STRING |
| UNIXTIME_MICROS | TIMESTAMP |
| BINARY | BYTES |
源选项
| 参数名 | 类型 | 必须 | 默认值 | 描述 |
|---|---|---|---|---|
| kudu_masters | String | 是 | - | Kudu master 地址。用 ',' 分隔,例如 '192.168.88.110:7051'。 |
| table_name | String | 是 | - | Kudu 表的名称。 |
| client_worker_count | Int | 否 | 2 * Runtime.getRuntime().availableProcessors() | Kudu worker 数量。默认值是当前 CPU 核心数的两倍。 |
| client_default_operation_timeout_ms | Long | 否 | 30000 | Kudu 普通操作超时时间。 |
| client_default_admin_operation_timeout_ms | Long | 否 | 30000 | Kudu 管理操作超时时间。 |
| enable_kerberos | Bool | 否 | false | Kerberos principal 启用。 |
| kerberos_principal | String | 否 | - | Kerberos principal。注意所有 zeta 节点都需要有此文件。 |
| kerberos_keytab | String | 否 | - | Kerberos keytab。注意所有 zeta 节点都需要有此文件。 |
| kerberos_krb5conf | String | 否 | - | Kerberos krb5 conf。注意所有 zeta 节点都需要有此文件。 |
| scan_token_query_timeout | Long | 否 | 30000 | 连接扫描令牌的超时时间。如果未设置,将与 operationTimeout 相同。 |
| scan_token_batch_size_bytes | Int | 否 | 1024 * 1024 | Kudu 扫描字节数。一次读取的最大字节数,默认为 1MB。 |
| filter | String | 否 | - | Kudu 扫描过滤表达式,例如 id > 100 AND id < 200。 |
| schema | Map | 否 | 1024 * 1024 | SeaTunnel Schema。 |
| table_list | Array | 否 | - | 要读取的表列表。您可以使用此配置代替 table_path,例如:table_list = [{ table_name = "kudu_source_table_1"},{ table_name = "kudu_source_table_2"}] |
| common-options | 否 | - | 源插件通用参数,请参考 源通用选项 详见。 |
任务示例
简单
以下示例针对名为 "kudu_source_table" 的 Kudu 表,目标是在控制台打印此表中的数据并写入 kudu 表 "kudu_sink_table"
# 定义运行时环境
env {
parallelism = 2
job.mode = "BATCH"
}
source {
# 这是一个示例源插件 **仅用于测试和演示源插件功能**
kudu {
kudu_masters = "kudu-master:7051"
table_name = "kudu_source_table"
plugin_output = "kudu"
enable_kerberos = true
kerberos_principal = "xx@xx.COM"
kerberos_keytab = "xx.keytab"
}
}
transform {
}
sink {
console {
plugin_input = "kudu"
}
kudu {
plugin_input = "kudu"
kudu_masters = "kudu-master:7051"
table_name = "kudu_sink_table"
enable_kerberos = true
kerberos_principal = "xx@xx.COM"
kerberos_keytab = "xx.keytab"
}
}
多表
env {
# 您可以在此处设置引擎配置
parallelism = 1
job.mode = "STREAMING"
checkpoint.interval = 5000
}
source {
# 这是一个示例源插件 **仅用于测试和演示源插件功能**
kudu{
kudu_masters = "kudu-master:7051"
table_list = [
{
table_name = "kudu_source_table_1"
},{
table_name = "kudu_source_table_2"
}
]
plugin_output = "kudu"
}
}
transform {
}
sink {
Assert {
rules {
table-names = ["kudu_source_table_1", "kudu_source_table_2"]
}
}
}
变更日志
Change Log
| Change | Commit | Version |
|---|---|---|
| [Chore] fix typos filed -> field (#9757) | https://github.com/apache/seatunnel/commit/e3e1c67d29 | 2.3.12 |
| [Improve][Core] Update apache common to apache common lang3 (#9694) | https://github.com/apache/seatunnel/commit/6e5737c1ec | 2.3.12 |
| [Improve][API] Optimize the enumerator API semantics and reduce lock calls at the connector level (#9671) | https://github.com/apache/seatunnel/commit/9212a77140 | 2.3.12 |
| [Feature][connector-kudu] implement the filter (#9405) | https://github.com/apache/seatunnel/commit/2714dd1105 | 2.3.12 |
| [Feature][Checkpoint] Add check script for source/sink state class serialVersionUID missing (#9118) | https://github.com/apache/seatunnel/commit/4f5adeb1c7 | 2.3.11 |
| [Improve] kudu options (#9162) | https://github.com/apache/seatunnel/commit/e7edafdbac | 2.3.11 |
| [Improve] restruct connector common options (#8634) | https://github.com/apache/seatunnel/commit/f3499a6eeb | 2.3.10 |
| [Improve][Transform] Rename sql transform table name from 'fake' to 'dual' (#8298) | https://github.com/apache/seatunnel/commit/e6169684fb | 2.3.9 |
| [Improve][dist]add shade check rule (#8136) | https://github.com/apache/seatunnel/commit/51ef800016 | 2.3.9 |
| [Improve][API] Unified tables_configs and table_list (#8100) | https://github.com/apache/seatunnel/commit/84c0b8d660 | 2.3.9 |
[Feature][Core] Rename result_table_name/source_table_name to plugin_input/plugin_output (#8072) | https://github.com/apache/seatunnel/commit/c7bbd322db | 2.3.9 |
| [Feature][Restapi] Allow metrics information to be associated to logical plan nodes (#7786) | https://github.com/apache/seatunnel/commit/6b7c53d03c | 2.3.9 |
| [Improve][Connector] Add multi-table sink option check (#7360) | https://github.com/apache/seatunnel/commit/2489f6446b | 2.3.7 |
| [Feature][Core] Support using upstream table placeholders in sink options and auto replacement (#7131) | https://github.com/apache/seatunnel/commit/c4ca74122c | 2.3.6 |
| correct the typo of kudu kerberos config (#6905) | https://github.com/apache/seatunnel/commit/fcb8554972 | 2.3.6 |
| [Fix][KuduCatalogFactory]: Fix KuduCatalogFactory.optionRule() will throw an Exception (#6787) | https://github.com/apache/seatunnel/commit/45a4e1532d | 2.3.6 |
| [Feature][Engine] Unify job env parameters (#6003) | https://github.com/apache/seatunnel/commit/2410ab38f0 | 2.3.4 |
| [Feature][Connector-V2] Support multi-table sink feature for kudu (#5951) | https://github.com/apache/seatunnel/commit/82460c0bf0 | 2.3.4 |
| [Feature] Add unsupported datatype check for all catalog (#5890) | https://github.com/apache/seatunnel/commit/b9791285a0 | 2.3.4 |
| [Feature][Kudu] Support multi-table source read (#5878) | https://github.com/apache/seatunnel/commit/8d9a0b7d11 | 2.3.4 |
| [Improve][Common] Introduce new error define rule (#5793) | https://github.com/apache/seatunnel/commit/9d1b2582b2 | 2.3.4 |
| [Feature][Connector-V2] Support TableSourceFactory/TableSinkFactory on kudu (#5789) | https://github.com/apache/seatunnel/commit/10e791d60a | 2.3.4 |
[Improve] Remove use SeaTunnelSink::getConsumedType method and mark it as deprecated (#5755) | https://github.com/apache/seatunnel/commit/8de7408100 | 2.3.4 |
| [Feature][Kudu] Refactor Kudu functionality and Sink support CDC data. (#5437) | https://github.com/apache/seatunnel/commit/22110eb7b3 | 2.3.4 |
| [Improve][build] Give the maven module a human readable name (#4114) | https://github.com/apache/seatunnel/commit/d7cd601051 | 2.3.1 |
| [Improve][Project] Code format with spotless plugin. (#4101) | https://github.com/apache/seatunnel/commit/a2ab166561 | 2.3.1 |
| [Hotfix][Connector-V2] Fix connector source snapshot state NPE (#4027) | https://github.com/apache/seatunnel/commit/e39c4988cc | 2.3.1 |
| [Feature][Connector] add get source method to all source connector (#3846) | https://github.com/apache/seatunnel/commit/417178fb84 | 2.3.1 |
| [Feature][API & Connector & Doc] add parallelism and column projection interface (#3829) | https://github.com/apache/seatunnel/commit/b9164b8ba1 | 2.3.1 |
| [Hotfix][OptionRule] Fix option rule about all connectors (#3592) | https://github.com/apache/seatunnel/commit/226dc6a119 | 2.3.0 |
| [Improve][Connector-V2] Bad smell ToArrayCallWithZeroLengthArrayArgument: (#3577) | https://github.com/apache/seatunnel/commit/cc448d98c4 | 2.3.0 |
| [Improve][Connector-V2][Kudu] Unified exception for kudu source & sink connector (#3564) | https://github.com/apache/seatunnel/commit/273418ddc9 | 2.3.0 |
| [Connector][Dependency] Add Miss Dependency Cassandra And Change Kudu Plugin Name (#3432) | https://github.com/apache/seatunnel/commit/6ac6a0a0cd | 2.3.0 |
| [Feature][Connector V2] expose configurable options in Kudu (#3365) | https://github.com/apache/seatunnel/commit/c422210e2c | 2.3.0 |
| [Feature][Core][Connector-V2] Unified The way of setting JobName (#2908) | https://github.com/apache/seatunnel/commit/bf2c97484b | 2.3.0-beta |
| remove duplicate ExceptionUtil class (#3037) | https://github.com/apache/seatunnel/commit/c9dc7c50c2 | 2.3.0-beta |
| [Improve][all] change Log to @Slf4j (#3001) | https://github.com/apache/seatunnel/commit/6016100f12 | 2.3.0-beta |
| [Improve][Connector-V2]Kudu Sink Connector Support to upsert row | https://github.com/apache/seatunnel/commit/1ece805ab1 | 2.3.0-beta |
| [DEV][Api] Replace SeaTunnelContext with JobContext and remove singleton pattern (#2706) | https://github.com/apache/seatunnel/commit/cbf82f755c | 2.2.0-beta |
| [#2606]Dependency management split (#2630) | https://github.com/apache/seatunnel/commit/fc047be69b | 2.2.0-beta |
| [Connector-V2] Add Kudu source and sink connector (#2254) | https://github.com/apache/seatunnel/commit/0483cbc2df | 2.2.0-beta |