Replies: 1 comment
Hi everyone,
The biggest challenge was understanding the exact boundary between what the enumerator manages and what Flink's framework handles automatically, particularly around reader failure and split reassignment, which required careful study before the design became clear.
Hi Apache IoTDB Community,
My name is Arpit Saha, and I am an undergrad Information and Communication Technology student applying for Google Summer of Code 2026. I am writing to introduce myself, share the technical research I have conducted so far, and present my current progress toward the Flink Connector for Apache IoTDB 2.X Table Mode project. I have been in contact with mentor Haonan Hou, who has been kind enough to guide me toward the relevant resources and codebase. While I have not yet made direct contributions to the IoTDB repository, I have been investing significant effort into understanding the codebase, the existing connector limitations, and the architectural requirements of this project.
Background
I have prior open source experience with Apache Gravitino, including a merged PR and an ongoing contribution, which gave me familiarity with Java-based codebases and the Apache contribution and review workflow.
Codebase Analysis
I studied the existing Flink-IoTDB connectors in `iotdb-extras`, along with the `flink-tsfile-connector`, and identified the following gaps:
- `flink-iotdb-connector` (tree mode) uses the deprecated `SourceFunction` API with a hardcoded SQL string, has no split-enumerator architecture, no TAG/FIELD awareness, and no fault tolerance, making it insufficient for table mode.
- `flink-sql-iotdb-connector` delegates execution to Flink's SQL planner via a factory/provider pattern; studying it deepened my understanding of how Flink's SQL/Table API layer integrates with IoTDB as a registered table source.
- I also read through the `flink-tsfile-connector` to understand how a more complete connector is structured: how the base classes, test infrastructure, execution environments, and input formats are organized. This gave me a much clearer picture of what a well-structured connector looks like before I began designing my own.

The existing `flink-iotdb-connector` was built entirely around tree-mode path semantics and the deprecated `SourceFunction` API. It has no awareness of IoTDB 2.X's table model, TAG/FIELD structure, or modern streaming source architecture, which is precisely the gap this project addresses.

Understanding FLIP-27 and Why It Matters
Going through the FLIP-27 documentation was the most valuable part of my research. The core insight is the clean separation between the `SplitEnumerator`, which dynamically generates time-range splits as new IoTDB data arrives, and the `SourceReader`, which independently executes bounded queries per split and emits records downstream.

Splitting by timestamp boundaries maps naturally onto IoTDB's time-series model and enables true parallelism: multiple readers processing different time windows simultaneously. Fault tolerance follows directly from this design. Since each split tracks its own progress independently, the framework checkpoints every reader separately and resumes from exactly where it left off after a crash, something the older `SourceFunction` approach cannot do reliably.
Current Progress
I built a preliminary prototype using the older `SourceFunction` API first, not as the final approach, but to validate my understanding of IoTDB session interactions, table queries, and basic execution flow in a controlled setting: https://github.com/A0R0P0I7T/Flink-IoTDB-Table-Connector

I have now begun the FLIP-27 based POC, starting with `IoTDBSplit` implementing `SourceSplit`, with a unique `splitId` for identification and `startTime`/`endTime` boundaries as the unit of parallelism.

The immediate next steps are building the `SourceReader` with proper `RowData` emission and the dynamic `SplitEnumerator` that continuously generates splits as new time ranges become available. After that, the focus shifts to TAG-based filter and projection pushdown, proper schema mapping, and checkpointing validation. I will have a working end-to-end POC within the next 24 hours.

I am excited about both the database internals and distributed systems aspects of this project and welcome any feedback or suggestions from the community.
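As a concrete sketch of the split design described above: an `IoTDBSplit` only needs a stable identifier plus its time boundaries. The skeleton below is an illustration under my current design assumptions, not the final code; in the real connector the class would implement Flink's `org.apache.flink.api.connector.source.SourceSplit`, so a local stub interface stands in for it here to keep the sketch self-contained.

```java
// Stub of Flink's SourceSplit interface so this sketch compiles standalone.
// In the connector, IoTDBSplit would implement the real Flink interface instead.
interface SourceSplit {
    String splitId();
}

public class IoTDBSplit implements SourceSplit {

    private final String splitId;  // unique identifier, used by checkpointing
    private final long startTime;  // inclusive lower bound of the time window
    private final long endTime;    // exclusive upper bound of the time window

    public IoTDBSplit(long startTime, long endTime) {
        // Deriving the id from the boundaries makes it deterministic and
        // unique per window, which simplifies state recovery after a crash.
        this.splitId = startTime + "-" + endTime;
        this.startTime = startTime;
        this.endTime = endTime;
    }

    @Override
    public String splitId() {
        return splitId;
    }

    public long getStartTime() {
        return startTime;
    }

    public long getEndTime() {
        return endTime;
    }
}
```

A reader receiving this split would issue a bounded IoTDB table query restricted to `[startTime, endTime)` and emit the resulting rows downstream.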
Thank you for your time.
Arpit Saha
P.S. — I am really enjoying diving deep into the core fundamentals of connector architecture through this research — understanding how each piece from split design to fault tolerance fits together is proving to be one of the more rewarding learning experiences I have had, and it is making me even more motivated to build this the right way.