Cortex-M: Enable full MobileNetV2 lowering to CMSIS-NN backend via Aot Compiler script #17075
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17075
Note: Links to docs will display an error until the docs builds have been completed.
❌ 7 New Failures, 3 Cancelled Jobs, 4 Unrelated Failures (as of commit b222911 with merge base f48a600)
NEW FAILURES - The following jobs have failed.
CANCELLED JOBS - The following jobs were cancelled. Please retry.
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were present on the merge base.
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 39666cd to 7f14a9d
Force-pushed 1b64ef3 to 41462be
```python
    )

    # Cortex-m ops are never included in vgf or direct-drive
    if args.target != "vgf" and not args.direct_drive:
```
Should TOSA targets even have a CortexM fallback? (--target=u55/u85 → TOSA delegation)
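A minimal sketch of the stricter guard this question points at, assuming the `args.target` / `args.direct_drive` flags from the diff above and that the Cortex-M target is spelled `cortex-m` as in the test plan below; this is illustrative only, not the PR's implementation:

```python
# Hypothetical tightening of the guard: only enable the CortexM fallback for an
# explicit cortex-m target, so u55/u85 (TOSA-delegated) builds never pick it up.
if args.target == "cortex-m" and not args.direct_drive:
    ...  # register/convert cortex_m::* ops here
```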
Pull request overview
This PR enables full MobileNetV2 lowering to the CMSIS-NN backend for Cortex-M microcontrollers by implementing comprehensive support for quantized operations through a dedicated compilation path. The changes replace the previous delegation-based approach with a portable kernel-based architecture that converts all quantized operations to cortex_m::* operators.
Changes:
- Added a dedicated Cortex-M compilation path (`to_edge_cortex_m`) in the AOT compiler with CortexMQuantizer-based quantization
- Implemented addmm operator support for decomposed linear layers through the new `_get_addmm_replacement` method
- Enhanced quantization parameter propagation with the new `PropagateQParamsPass` and passthrough op handling in `FoldAndAnnotateQParamsPass`
- Extended the quantizer to mark parameter nodes as annotated and added passthrough ops (hardtanh, max_pool2d, dropout)
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| examples/arm/aot_arm_compiler.py | Adds to_edge_cortex_m function for Cortex-M compilation path using CortexMQuantizer and removes old transform_for_cortex_m_backend function |
| backends/cortex_m/quantizer/quantizer.py | Adds _mark_param_node_as_annotated method and extends passthrough ops list for MobileNetV2 support |
| backends/cortex_m/passes/propagate_qparams_pass.py | New pass to propagate qparams through passthrough ops (transpose/permute) to consumer nodes like addmm |
| backends/cortex_m/passes/cortex_m_pass_manager.py | Adds PropagateQParamsPass and DecomposeAdaptiveAvgPool2dPass to pass list, adds skip_passes parameter to __init__ |
| backends/cortex_m/passes/convert_to_cortex_m_pass.py | Implements _get_addmm_replacement method to convert decomposed linear (addmm) operations to cortex_m.quantized_linear |
| backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py | Adds passthrough ops (hardtanh, relu, clamp) support and second-pass qparams propagation logic |
Cortex-M: Enable full MobileNetV2 lowering to CMSIS-NN backend

This PR enables end-to-end export of MobileNetV2 to the CMSIS-NN backend for Cortex-M targets. All quantized operations (conv2d, depthwise conv2d, linear/addmm, activations) are now lowered to cortex_m::quantized_* operators, enabling efficient inference on resource-constrained microcontrollers.

Test Plan:

python3 -m examples.arm.aot_arm_compiler -m mv2 --target=cortex-m --quantize --intermediates=./mv2_intermediates --output=./mv2_cortex_m.pte
cat ./mv2_intermediates/delegation_info.txt

Delegation info:
Total delegated subgraphs: 0
Number of delegated nodes: 0
Number of non-delegated nodes: 72

Delegation table:

|    | op_type | occurrences_in_delegated_graphs | occurrences_in_non_delegated_graphs |
|----|---------|---------------------------------|-------------------------------------|
| 0  | aten_as_strided_copy_default | 0 | 1 |
| 1  | aten_mean_dim | 0 | 1 |
| 2  | aten_view_copy_default | 0 | 1 |
| 3  | cortex_m_dequantize_per_tensor_default | 0 | 2 |
| 4  | cortex_m_quantize_per_tensor_default | 0 | 2 |
| 5  | cortex_m_quantized_add_default | 0 | 10 |
| 6  | cortex_m_quantized_conv2d_default | 0 | 35 |
| 7  | cortex_m_quantized_depthwise_conv2d_default | 0 | 17 |
| 8  | cortex_m_quantized_linear_default | 0 | 1 |
| 9  | dim_order_ops__clone_dim_order_default | 0 | 1 |
| 10 | Total | 0 | 71 |

Note: E2E inference tested on an Alif E8 board.

Reviewers:
Subscribers:
Tasks:
Tags:
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
```python
output_scale = node.meta["output_qparams"][0].scale
output_zp = node.meta["output_qparams"][0].zp
output_min = node.meta["output_qparams"][0].qmin
output_max = node.meta["output_qparams"][0].qmax
```
Copilot AI — Feb 6, 2026
The code accesses node.meta["output_qparams"][0] without first verifying that "output_qparams" exists or that it contains an entry at index 0. This could raise a KeyError if output_qparams is missing or empty. Consider adding validation similar to the input_qparams validation above (lines 96-108) with a helpful error message to aid debugging.
Suggested change (replacing the four lines above):

```python
output_qparams = node.meta.get("output_qparams", {})
if 0 not in output_qparams:
    raise RuntimeError(
        f"Missing output activation qparams at index 0 for addmm node '{node.name}'. "
        f"Available output_qparams keys: {list(output_qparams.keys()) if hasattr(output_qparams, 'keys') else output_qparams}. "
        "Ensure the model is properly quantized and that qparams were propagated to outputs."
    )
output_scale = output_qparams[0].scale
output_zp = output_qparams[0].zp
output_min = output_qparams[0].qmin
output_max = output_qparams[0].qmax
```
```python
# Additional passthrough ops for MobileNetV2 and similar architectures
torch.ops.aten.hardtanh.default,
torch.ops.aten.hardtanh_.default,
torch.ops.aten.max_pool2d.default,
```
Copilot AI — Feb 6, 2026
The operator torch.ops.aten.max_pool2d.default appears twice in the SHARED_QSPEC_OPS_DEFAULT list (at line 455 and line 482). This duplication is unnecessary and should be removed to maintain clean code.
Suggested change: remove the duplicate `torch.ops.aten.max_pool2d.default` entry.
```python
torch.ops.aten.hardtanh_.default,
torch.ops.aten.max_pool2d.default,
torch.ops.aten.dropout.default,
```
Copilot AI — Feb 6, 2026
The newly added ops hardtanh.default, hardtanh_.default, max_pool2d.default, and dropout.default are added to SHARED_QSPEC_OPS_DEFAULT but only hardtanh.default is included in the PASSTHROUGH_OPS set in fold_qdq_with_annotated_qparams_pass.py. Consider whether the other ops (especially hardtanh_.default) should also be added to PASSTHROUGH_OPS for consistency, since they likely preserve quantization parameters in the same way.
Context from `examples/arm/aot_arm_compiler.py` (`to_edge_cortex_m`):

```python
def to_edge_cortex_m(
    exported_program: ExportedProgram,
    args,
    model: GraphModule,
    example_inputs: Tuple[torch.Tensor],
):
    """
    Export and lower model for Cortex-M target using CMSIS-NN portable kernels.

    This function:
    1. Quantizes the model using CortexMQuantizer
    2. Re-exports the quantized model
    3. Lowers to edge IR
    4. Applies CortexMPassManager transforms to convert ops to cortex_m::* ops

    No delegation is used - all ops run as portable kernels on the Cortex-M target.
    """
    logging.info("Using Cortex-M/CMSIS-NN compilation path (no delegation)")

    model_quant = None

    if args.quantize:
        logging.info("Quantizing with CortexMQuantizer")

        # Convert model to channels_last memory format for optimal Cortex-M performance
        model_channels_last = model.to(memory_format=torch.channels_last)
        example_inputs_cl = tuple(
            x.to(memory_format=torch.channels_last) if x.dim() == 4 else x
            for x in example_inputs
        )

        # Use CortexMQuantizer for INT8 quantization
        quantizer = CortexMQuantizer()
        prepared = prepare_pt2e(model_channels_last, quantizer)

        dataset = get_calibration_data(
            args.model_name, example_inputs_cl, args.evaluate, args.evaluate_config
        )

        if isinstance(dataset, DataLoader):
            for sample, _ in dataset:
                if isinstance(sample, torch.Tensor) and sample.dim() == 4:
                    sample = sample.to(memory_format=torch.channels_last)
                prepared(sample)
        else:
            dataset_cl = tuple(
                (
                    x.to(memory_format=torch.channels_last)
                    if isinstance(x, torch.Tensor) and x.dim() == 4
                    else x
                )
                for x in dataset
            )
            prepared(*dataset_cl)

        model_quant = convert_pt2e(prepared)

        exported_program = torch.export.export(
            model_quant, example_inputs_cl, strict=args.strict_export
        )
    else:
        logging.warning(
            "Quantization is DISABLED. Cortex-M typically requires quantization."
        )

    edge = to_edge_transform_and_lower(
        exported_program,
        compile_config=EdgeCompileConfig(_check_ir_validity=False),
    )

    # Build pass instances from CortexMPassManager.pass_list
    pass_instances = []
    for pass_cls in CortexMPassManager.pass_list:
        sig = inspect.signature(pass_cls.__init__)
        if "exported_program" in sig.parameters:
            pass_instances.append(pass_cls(edge.exported_program()))
        else:
            pass_instances.append(pass_cls())

    # Apply transforms
    edge = edge.transform(pass_instances)

    # Log cortex_m ops summary
    cortex_m_ops = {}
    for node in edge.exported_program().graph.nodes:
        target_str = str(node.target)
        if "cortex_m" in target_str:
            op_name = target_str.split(".")[-1] if "." in target_str else target_str
            cortex_m_ops[op_name] = cortex_m_ops.get(op_name, 0) + 1

    logging.info("Cortex-M ops summary:")
    for op_name, count in sorted(cortex_m_ops.items()):
        logging.info(f"  - {op_name}: {count}")

    return model_quant, edge
```
Copilot AI — Feb 6, 2026
The PR description mentions enabling full MobileNetV2 lowering to CMSIS-NN, and includes a test plan showing MobileNetV2 compilation. However, there is no automated test added for MobileNetV2 in the test suite (backends/cortex_m/test/models/). Consider adding a test similar to test_mobilenet_v3.py to ensure MobileNetV2 continues to work correctly and to prevent regressions. This would provide better test coverage for the new addmm handling and complete lowering path.
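A rough sketch of what such a test could look like, mirroring the steps in `to_edge_cortex_m` above. The import paths for `CortexMQuantizer` and `CortexMPassManager` are inferred from the file table earlier in this PR, the `prepare_pt2e`/`convert_pt2e` import path may differ from what the backend actually uses, and the harness in `test_mobilenet_v3.py` is not shown here, so treat this as an assumption-laden illustration rather than the repository's actual test API:

```python
import inspect

import torch
import torchvision
from executorch.backends.cortex_m.passes.cortex_m_pass_manager import CortexMPassManager
from executorch.backends.cortex_m.quantizer.quantizer import CortexMQuantizer
from executorch.exir import EdgeCompileConfig, to_edge_transform_and_lower
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e


def test_mobilenet_v2_lowers_to_cortex_m_ops():
    model = torchvision.models.mobilenet_v2(weights=None).eval()
    example_inputs = (torch.randn(1, 3, 224, 224),)

    # Capture, quantize with the Cortex-M quantizer, and convert (as in to_edge_cortex_m).
    captured = torch.export.export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(captured, CortexMQuantizer())
    prepared(*example_inputs)  # single calibration pass, enough for a smoke test
    quantized = convert_pt2e(prepared)

    exported = torch.export.export(quantized, example_inputs)
    edge = to_edge_transform_and_lower(
        exported, compile_config=EdgeCompileConfig(_check_ir_validity=False)
    )

    # Apply the CortexM passes the same way the AOT compiler path above does.
    pass_instances = []
    for pass_cls in CortexMPassManager.pass_list:
        sig = inspect.signature(pass_cls.__init__)
        if "exported_program" in sig.parameters:
            pass_instances.append(pass_cls(edge.exported_program()))
        else:
            pass_instances.append(pass_cls())
    edge = edge.transform(pass_instances)

    # The fully lowered graph should contain cortex_m::* ops and no delegate calls.
    targets = [str(node.target) for node in edge.exported_program().graph.nodes]
    assert any("cortex_m" in t for t in targets)
    assert not any("executorch_call_delegate" in t for t in targets)
```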
```python
def _mark_param_node_as_annotated(self, node: Node) -> None:
    """
    Mark a weight/bias parameter node as annotated.

    This is necessary for FoldAndAnnotateQParamsPass to recognize the node
    as part of a quantized computation path. The ARM quantizer does this
    via mark_annotated=True in _QuantProperty.
    """
    if Q_ANNOTATION_KEY not in node.meta:
        node.meta[Q_ANNOTATION_KEY] = QuantizationAnnotation()
    node.meta[Q_ANNOTATION_KEY]._annotated = True
    annotation_info = ArmAnnotationInfo(quantized=True)
    meta_custom = node.meta.get("custom", {})
    meta_custom[ArmAnnotationInfo.CUSTOM_META_KEY] = dict(annotation_info)
    node.meta["custom"] = meta_custom
```
Copilot AI — Feb 6, 2026
The implementation of _mark_param_node_as_annotated duplicates the exact logic from mark_node_as_annotated in backends/arm/quantizer/arm_quantizer_utils.py. Consider importing and reusing the existing function instead of duplicating the code to improve maintainability and reduce the risk of divergence.
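A small sketch of that reuse, assuming `mark_node_as_annotated` is importable from `backends/arm/quantizer/arm_quantizer_utils.py` as the comment states and takes the node as its only argument (its exact signature is not shown in this PR):

```python
# Hedged sketch: delegate to the existing Arm helper instead of duplicating its logic.
from executorch.backends.arm.quantizer.arm_quantizer_utils import mark_node_as_annotated


def _mark_param_node_as_annotated(self, node: Node) -> None:
    """Mark a weight/bias parameter node as annotated (reuses the Arm quantizer helper)."""
    mark_node_as_annotated(node)
```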
```python
# Build pass instances from CortexMPassManager.pass_list
pass_instances = []
for pass_cls in CortexMPassManager.pass_list:
    sig = inspect.signature(pass_cls.__init__)
    if "exported_program" in sig.parameters:
        pass_instances.append(pass_cls(edge.exported_program()))
    else:
        pass_instances.append(pass_cls())

# Apply transforms
edge = edge.transform(pass_instances)
```
Copilot AI — Feb 6, 2026
This manual pass instantiation logic duplicates the pass instantiation and transformation logic already present in CortexMPassManager.transform() (lines 83-96 in cortex_m_pass_manager.py). Consider refactoring to use CortexMPassManager(edge.exported_program()).transform() directly instead of manually iterating through pass_list and calling edge.transform(pass_instances). This would eliminate code duplication and ensure consistent pass handling.
Suggested change (replacing the manual loop above):

```python
# Apply CortexM passes using CortexMPassManager to avoid duplicating pass logic
edge = CortexMPassManager(edge.exported_program()).transform()
```
Hi, this PR needs major changes I'm afraid.
- The changes to fold_qdq_with_annotated_qparams_pass and propagate_qparams_pass are very likely not needed; rather, they are masking a faulty implementation of either the add_mm lowering or its integration in the aot_arm_compiler.
- The addition of the add_mm is a significant change that should be made in a separate PR and properly tested with unit tests, as is done for all other ops.
- It would be great to add mv2 as a pytest similar to mv3 as well. In fact, I would suggest starting by getting that working before adding support in the aot_arm_compiler, since the compilation pipeline is guaranteed to be working there.
Sure - I agree with the approach; I just wanted to share the work I've been up to recently.

Context on the design choice: the Cortex-M backend keeps addmm directly (vs ARM's decomposition to Conv2D) to leverage CMSIS-NN's optimized linear kernel. When PyTorch decomposes nn.Linear to the edge dialect, the weight flows through a transpose before reaching addmm. FoldAndAnnotateQParamsPass folds the DQ into the permute, but output_qparams remains empty (no Q node after the permute).

Proposed approach: let me get the unit tests in first, so we have proper test coverage before discussing the implementation details.
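For readers unfamiliar with the decomposition being described, here is a minimal, hedged illustration (plain PyTorch, not part of this PR) showing that a lowered nn.Linear reaches addmm through a transposed weight; exact node names depend on the decomposition table in use:

```python
import torch


class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)


ep = torch.export.export(TinyLinear().eval(), (torch.randn(1, 8),))
# With the default decomposition table, aten.linear is typically rewritten into a
# transpose/permute of the weight followed by aten.addmm - the pattern discussed above.
print(ep.run_decompositions().graph_module.graph)
```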
Sounds good!
I think the issue here is that you are not using the EdgeCompileConfig used in the tester. When linear is not decomposed, you avoid the issues around q/dq folding. In general, the design philosophy is that we want the decompositions and annotations to produce correct q/dq values directly rather than handling special cases in the folding, as that gets complex very quickly, based on our previous experience in the arm backend.
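One hedged way to illustrate the "don't decompose linear" idea in isolation; the tester's actual EdgeCompileConfig is not shown in this thread, and the decomposition-table API varies across PyTorch/ExecuTorch versions, so this is an assumption-labeled sketch rather than the tester's configuration:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 4)).eval()  # stand-in model
example_inputs = (torch.randn(1, 8),)

# Keep aten.linear as a single node by dropping it from the default decomposition
# table before lowering; the weight then never flows through a separate permute into
# addmm, so the q/dq folding special cases discussed above do not arise.
decomp_table = torch.export.default_decompositions()
decomp_table.pop(torch.ops.aten.linear.default, None)
ep = torch.export.export(model, example_inputs).run_decompositions(decomp_table)
print(ep.graph_module.graph)  # aten.linear should still appear as a single node
```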
Summary: MobileNetV2 Fully Lowered to CMSIS-NN
Cortex-M: Enable full MobileNetV2 lowering to CMSIS-NN backend
This PR enables end-to-end export of MobileNetV2 to the CMSIS-NN backend for Cortex-M
targets. All quantized operations (conv2d, depthwise conv2d, linear/addmm, activations)
are now properly lowered to cortex_m::quantized_* operators, enabling efficient inference
on resource-constrained microcontrollers.
Test Plan:
python3 -m examples.arm.aot_arm_compiler -m mv2 --target=cortex-m --quantize --intermediates=./mv2_intermediates --output=./mv2_cortex_m.pte
cat mv2_intermediates/delegation_info.txt
Delegation info:
Total delegated subgraphs: 0
Number of delegated nodes: 0
Number of non-delegated nodes: 68
Delegation table:
╒════╤═════════════════════════════════════════════╤═══════════════════════════════════╤═══════════════════════════════════════╕
│ │ op_type │ occurrences_in_delegated_graphs │ occurrences_in_non_delegated_graphs │
╞════╪═════════════════════════════════════════════╪═══════════════════════════════════╪═══════════════════════════════════════╡
│ 0 │ aten_view_copy_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 1 │ cortex_m_dequantize_per_tensor_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 2 │ cortex_m_quantize_per_tensor_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 3 │ cortex_m_quantized_add_default │ 0 │ 10 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 4 │ cortex_m_quantized_avg_pool2d_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 5 │ cortex_m_quantized_conv2d_default │ 0 │ 35 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 6 │ cortex_m_quantized_depthwise_conv2d_default │ 0 │ 17 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 7 │ cortex_m_quantized_linear_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 8 │ dim_order_ops__clone_dim_order_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 9 │ Total │ 0 │ 68 │
╘════╧═════════════════════════════════════════════╧═══════════════════════════════════╧═══════════════════════════════════════╛
cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai @AdrianLundell