Replies: 1 comment
Hi everyone,
The biggest challenge was understanding the exact boundary between what the enumerator manages and what Flink's framework handles automatically, particularly around reader failure and split reassignment, which required careful study before the design became clear.
Hi Apache IoTDB Community,
My name is Arpit Saha, and I am an undergrad Information and Communication Technology student applying for Google Summer of Code 2026. I am writing to introduce myself, share the technical research I have conducted so far, and present my current progress toward the Flink Connector for Apache IoTDB 2.X Table Mode project. I have been in contact with mentor Haonan Hou, who has been kind enough to guide me toward the relevant resources and codebase. While I have not yet made direct contributions to the IoTDB repository, I have been investing significant effort into understanding the codebase, the existing connector limitations, and the architectural requirements of this project.
Background
I have prior open source experience with Apache Gravitino, including a merged PR and an ongoing contribution, which gave me familiarity with Java-based codebases and the Apache contribution and review workflow.
Codebase Analysis
I studied the existing Flink-IoTDB connectors in `iotdb-extras`, along with the `flink-tsfile-connector`, and identified the following gaps:
- `flink-iotdb-connector` (tree mode) uses the deprecated `SourceFunction` API with a hardcoded SQL string, has no split-enumerator architecture, no TAG/FIELD awareness, and no fault tolerance, making it insufficient for table mode.
- `flink-sql-iotdb-connector` delegates execution to Flink's SQL planner via a factory/provider pattern; studying it deepened my understanding of how Flink's SQL/Table API layer integrates with IoTDB as a registered table source.
- I also read through the `flink-tsfile-connector` to understand how a more complete connector is structured: how the base classes, test infrastructure, execution environments, and input formats are organized. This gave me a much clearer picture of what a well-structured connector looks like before I began designing my own.

The existing `flink-iotdb-connector` was built entirely around tree-mode path semantics and the deprecated `SourceFunction` API. It has no awareness of IoTDB 2.X's table model, TAG/FIELD structure, or modern streaming source architecture, which is precisely the gap this project addresses.

Understanding FLIP-27 and Why It Matters
Going through the FLIP-27 documentation was the most valuable part of my research. The core insight is the clean separation between the `SplitEnumerator`, which dynamically generates time-range splits as new IoTDB data arrives, and the `SourceReader`, which independently executes bounded queries per split and emits records downstream.

Splitting by timestamp boundaries maps naturally onto IoTDB's time-series model and enables true parallelism: multiple readers processing different time windows simultaneously. Fault tolerance follows directly from this design. Since each split tracks its own progress independently, the framework checkpoints every reader separately and resumes from exactly where it left off after a crash, something the older `SourceFunction` approach cannot do reliably.
Current Progress
I built a preliminary prototype using the older `SourceFunction` API first, not as the final approach, but to validate my understanding of IoTDB session interactions, table queries, and basic execution flow in a controlled setting: https://github.com/A0R0P0I7T/Flink-IoTDB-Table-Connector

I have now begun the FLIP-27 based POC, starting with `IoTDBSplit` implementing `SourceSplit`, with a unique `splitId` for identification and `startTime`/`endTime` boundaries as the unit of parallelism.

The immediate next steps are building the `SourceReader` with proper `RowData` emission and the dynamic `SplitEnumerator` that continuously generates splits as new time ranges become available. After that, the focus shifts to TAG-based filter and projection pushdown, proper schema mapping, and checkpointing validation. I will have a working end-to-end POC within the next 24 hours.

I am excited about both the database internals and distributed systems aspects of this project and welcome any feedback or suggestions from the community.
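As a concrete sketch of the split design described above: an `IoTDBSplit` only needs a stable identifier plus its time boundaries. The skeleton below is an illustration under my current design assumptions, not the final code; in the real connector the class would implement Flink's `org.apache.flink.api.connector.source.SourceSplit`, so a local stub interface stands in for it here to keep the sketch self-contained.

```java
// Stub of Flink's SourceSplit interface so this sketch compiles standalone.
// In the connector, IoTDBSplit would implement the real Flink interface instead.
interface SourceSplit {
    String splitId();
}

public class IoTDBSplit implements SourceSplit {

    private final String splitId;  // unique identifier, used by checkpointing
    private final long startTime;  // inclusive lower bound of the time window
    private final long endTime;    // exclusive upper bound of the time window

    public IoTDBSplit(long startTime, long endTime) {
        // Deriving the id from the boundaries makes it deterministic and
        // unique per window, which simplifies state recovery after a crash.
        this.splitId = startTime + "-" + endTime;
        this.startTime = startTime;
        this.endTime = endTime;
    }

    @Override
    public String splitId() {
        return splitId;
    }

    public long getStartTime() {
        return startTime;
    }

    public long getEndTime() {
        return endTime;
    }
}
```

A reader receiving this split would issue a bounded IoTDB table query restricted to `[startTime, endTime)` and emit the resulting rows downstream.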
Thank you for your time.
Arpit Saha
P.S. — I am really enjoying diving deep into the core fundamentals of connector architecture through this research — understanding how each piece from split design to fault tolerance fits together is proving to be one of the more rewarding learning experiences I have had, and it is making me even more motivated to build this the right way.