
Add Flux cluster support #700

Open

izzet wants to merge 2 commits into dask:main from izzet:feature/flux-cluster

Conversation

@izzet izzet commented Mar 7, 2026

This pull request adds support for the Flux resource manager to the dask_jobqueue package, enabling users to launch Dask clusters on systems managed by Flux. The main changes include the introduction of a new FluxCluster and FluxJob implementation, updates to configuration files, and comprehensive tests for the new functionality.

Flux resource manager integration

  • Added a new module flux.py implementing FluxCluster and FluxJob, including job script generation, walltime normalization, and handling of Flux-specific job directives.
  • Updated the package initialization in __init__.py to expose FluxCluster for import.

Configuration updates

  • Added a new flux section to jobqueue.yaml with relevant options for Flux jobs, such as queue, account, walltime, job_nodes, and job directives.
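Based on the options listed above, the new flux section presumably has a shape like the following. This is an illustrative sketch only: the key names are taken from the bullet above, but the exact spelling and defaults in the PR's jobqueue.yaml may differ.

```yaml
jobqueue:
  flux:
    queue: null            # Flux queue to submit to
    account: null          # account/bank used for accounting
    walltime: "00:30:00"   # normalized to a Flux duration at submit time
    job-nodes: 1           # nodes requested per submitted Flux job
    job-directives: []     # extra "# flux:" directives for the batch script
```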

Testing

  • Introduced a comprehensive test suite in test_flux.py covering job script generation, header construction, walltime normalization, directive skipping, and configuration handling for Flux jobs and clusters.

Implement a Flux-backed JobQueueCluster with script generation and tests so Flux jobs can launch Dask workers, including walltime normalization for Flux batch directives.
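The "walltime normalization" mentioned above is needed because Flux batch directives take durations in Flux Standard Duration form (a number with an optional s/m/h/d suffix) rather than the HH:MM:SS strings used by SLURM/PBS configs. A minimal sketch of such a conversion follows; the helper name is hypothetical and the PR's actual implementation may differ:

```python
import re


def normalize_walltime(walltime: str) -> str:
    """Convert an HH:MM:SS walltime into Flux Standard Duration seconds.

    Values that do not match HH:MM:SS (e.g. "90m", already in Flux form)
    are passed through unchanged.
    """
    match = re.fullmatch(r"(\d+):(\d{1,2}):(\d{1,2})", walltime)
    if not match:
        return walltime
    hours, minutes, seconds = (int(g) for g in match.groups())
    return f"{hours * 3600 + minutes * 60 + seconds}s"


print(normalize_walltime("01:30:00"))  # -> "5400s"
```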
ocaisa (Member) commented Mar 7, 2026

@vsoch, you had a previous go at this in #605; your eyes on this would be helpful for a review, I expect.

vsoch (Contributor) commented Mar 7, 2026

I'm headed to bed, but a quick question: how is memory being specified here? Flux does not have any flags for controlling or requesting memory.

```python
def job_script(self):
    worker_command = self._command_template
    if self.job_nodes > 1:
        worker_command = "flux run -N {nodes} -n {tasks} {command}".format(
            # ... (remainder of the diff excerpt truncated in the review view)
```

Is the standard for dask to call workload managers on the command line vs. using the Python SDK?

izzet (Author) commented Mar 7, 2026

Hi @vsoch, good question. In the Flux implementation, memory is not translated into a Flux scheduler resource request because Flux does not provide a native memory request flag analogous to what SLURM/PBS expose, as you noted.

The parameter is still required by the shared JobQueueCluster/Job base classes. In dask-jobqueue, memory is used to compute the Dask worker --memory-limit, so in the Flux case it currently represents the Dask-side memory budget per submitted job rather than a Flux-enforced allocation.

I’ve updated the FluxCluster docstring and the docs to make this explicit, since otherwise it’s easy to assume it behaves like the memory options in SLURM/PBS.
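To make the distinction concrete, here is a minimal sketch of where memory actually goes in this design. The helper name is hypothetical; in the real code base the computation lives in the shared Job base class, and the memory string is parsed to bytes upstream (via dask.utils.parse_bytes):

```python
# `memory` never reaches Flux as a resource request; the base Job class
# uses it to size the Dask worker's --memory-limit, split across the
# worker processes in each submitted job.

def dask_side_memory_limit(memory_bytes: int, processes: int) -> int:
    """Per-process memory budget enforced by Dask itself, not by Flux."""
    return memory_bytes // processes


# e.g. memory="16GB" parsed to 16_000_000_000 bytes before this point
limit = dask_side_memory_limit(16_000_000_000, 2)
worker_args = ["--memory-limit", str(limit)]
```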
