Cortex-M: Enable full MobileNetV2 lowering to CMSIS-NN backend via Aot Compiler script #17075
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17075
Note: Links to docs will display an error until the docs builds have been completed.
❌ 7 New Failures, 3 Cancelled Jobs, 4 Unrelated Failures (as of commit b222911 with merge base f48a600)
NEW FAILURES - The following jobs have failed.
CANCELLED JOBS - The following jobs were cancelled. Please retry.
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were present on the merge base.
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 39666cd to 7f14a9d
Force-pushed 1b64ef3 to 41462be
```python
    )

    # Cortex-m ops are never included in vgf or direct-drive
    if args.target != "vgf" and not args.direct_drive:
```
Should TOSA targets even have a CortexM fallback? (--target=u55/u85 → TOSA delegation)
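A minimal sketch of the stricter guard this question points at, assuming the `args.target` / `args.direct_drive` flags from the diff above and that the Cortex-M target is spelled `cortex-m` as in the test plan below; this is illustrative only, not the PR's implementation:

```python
# Hypothetical tightening of the guard: only enable the CortexM fallback for an
# explicit cortex-m target, so u55/u85 (TOSA-delegated) builds never pick it up.
if args.target == "cortex-m" and not args.direct_drive:
    ...  # register/convert cortex_m::* ops here
```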
Pull request overview
This PR enables full MobileNetV2 lowering to the CMSIS-NN backend for Cortex-M microcontrollers by implementing comprehensive support for quantized operations through a dedicated compilation path. The changes replace the previous delegation-based approach with a portable kernel-based architecture that converts all quantized operations to cortex_m::* operators.
Changes:
- Added a dedicated Cortex-M compilation path (`to_edge_cortex_m`) in the AOT compiler with CortexMQuantizer-based quantization
- Implemented addmm operator support for decomposed linear layers through the new `_get_addmm_replacement` method
- Enhanced quantization parameter propagation with the new `PropagateQParamsPass` and passthrough op handling in `FoldAndAnnotateQParamsPass`
- Extended the quantizer to mark parameter nodes as annotated and added passthrough ops (hardtanh, max_pool2d, dropout)
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| examples/arm/aot_arm_compiler.py | Adds to_edge_cortex_m function for Cortex-M compilation path using CortexMQuantizer and removes old transform_for_cortex_m_backend function |
| backends/cortex_m/quantizer/quantizer.py | Adds _mark_param_node_as_annotated method and extends passthrough ops list for MobileNetV2 support |
| backends/cortex_m/passes/propagate_qparams_pass.py | New pass to propagate qparams through passthrough ops (transpose/permute) to consumer nodes like addmm |
| backends/cortex_m/passes/cortex_m_pass_manager.py | Adds PropagateQParamsPass and DecomposeAdaptiveAvgPool2dPass to pass list, adds skip_passes parameter to __init__ |
| backends/cortex_m/passes/convert_to_cortex_m_pass.py | Implements _get_addmm_replacement method to convert decomposed linear (addmm) operations to cortex_m.quantized_linear |
| backends/arm/_passes/fold_qdq_with_annotated_qparams_pass.py | Adds passthrough ops (hardtanh, relu, clamp) support and second-pass qparams propagation logic |
Cortex-M: Enable full MobileNetV2 lowering to CMSIS-NN backend

This PR enables end-to-end export of MobileNetV2 to the CMSIS-NN backend for Cortex-M targets. All quantized operations (conv2d, depthwise conv2d, linear/addmm, activations) are now lowered to cortex_m::quantized_* operators, enabling efficient inference on resource-constrained microcontrollers.

Test Plan:

python3 -m examples.arm.aot_arm_compiler -m mv2 --target=cortex-m --quantize --intermediates=./mv2_intermediates --output=./mv2_cortex_m.pte
cat ./mv2_intermediates/delegation_info.txt

Delegation info:
Total delegated subgraphs: 0
Number of delegated nodes: 0
Number of non-delegated nodes: 72

Delegation table:

|    | op_type | occurrences_in_delegated_graphs | occurrences_in_non_delegated_graphs |
|----|---------|---------------------------------|-------------------------------------|
| 0  | aten_as_strided_copy_default | 0 | 1 |
| 1  | aten_mean_dim | 0 | 1 |
| 2  | aten_view_copy_default | 0 | 1 |
| 3  | cortex_m_dequantize_per_tensor_default | 0 | 2 |
| 4  | cortex_m_quantize_per_tensor_default | 0 | 2 |
| 5  | cortex_m_quantized_add_default | 0 | 10 |
| 6  | cortex_m_quantized_conv2d_default | 0 | 35 |
| 7  | cortex_m_quantized_depthwise_conv2d_default | 0 | 17 |
| 8  | cortex_m_quantized_linear_default | 0 | 1 |
| 9  | dim_order_ops__clone_dim_order_default | 0 | 1 |
| 10 | Total | 0 | 71 |

Note: E2E inference tested on an Alif E8 board.

Reviewers:
Subscribers:
Tasks:
Tags:
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
```python
output_scale = node.meta["output_qparams"][0].scale
output_zp = node.meta["output_qparams"][0].zp
output_min = node.meta["output_qparams"][0].qmin
output_max = node.meta["output_qparams"][0].qmax
```
Copilot AI — Feb 6, 2026
The code accesses node.meta["output_qparams"][0] without first verifying that "output_qparams" exists or that it contains an entry at index 0. This could raise a KeyError if output_qparams is missing or empty. Consider adding validation similar to the input_qparams validation above (lines 96-108) with a helpful error message to aid debugging.
Suggested change (replacing the four lines above):

```python
output_qparams = node.meta.get("output_qparams", {})
if 0 not in output_qparams:
    raise RuntimeError(
        f"Missing output activation qparams at index 0 for addmm node '{node.name}'. "
        f"Available output_qparams keys: {list(output_qparams.keys()) if hasattr(output_qparams, 'keys') else output_qparams}. "
        "Ensure the model is properly quantized and that qparams were propagated to outputs."
    )
output_scale = output_qparams[0].scale
output_zp = output_qparams[0].zp
output_min = output_qparams[0].qmin
output_max = output_qparams[0].qmax
```
```python
# Additional passthrough ops for MobileNetV2 and similar architectures
torch.ops.aten.hardtanh.default,
torch.ops.aten.hardtanh_.default,
torch.ops.aten.max_pool2d.default,
```
Copilot AI — Feb 6, 2026
The operator torch.ops.aten.max_pool2d.default appears twice in the SHARED_QSPEC_OPS_DEFAULT list (at line 455 and line 482). This duplication is unnecessary and should be removed to maintain clean code.
Suggested change: remove the duplicate `torch.ops.aten.max_pool2d.default` entry.
```python
torch.ops.aten.hardtanh_.default,
torch.ops.aten.max_pool2d.default,
torch.ops.aten.dropout.default,
```
Copilot AI — Feb 6, 2026
The newly added ops hardtanh.default, hardtanh_.default, max_pool2d.default, and dropout.default are added to SHARED_QSPEC_OPS_DEFAULT but only hardtanh.default is included in the PASSTHROUGH_OPS set in fold_qdq_with_annotated_qparams_pass.py. Consider whether the other ops (especially hardtanh_.default) should also be added to PASSTHROUGH_OPS for consistency, since they likely preserve quantization parameters in the same way.
Context from `examples/arm/aot_arm_compiler.py` (`to_edge_cortex_m`):

```python
def to_edge_cortex_m(
    exported_program: ExportedProgram,
    args,
    model: GraphModule,
    example_inputs: Tuple[torch.Tensor],
):
    """
    Export and lower model for Cortex-M target using CMSIS-NN portable kernels.

    This function:
    1. Quantizes the model using CortexMQuantizer
    2. Re-exports the quantized model
    3. Lowers to edge IR
    4. Applies CortexMPassManager transforms to convert ops to cortex_m::* ops

    No delegation is used - all ops run as portable kernels on the Cortex-M target.
    """
    logging.info("Using Cortex-M/CMSIS-NN compilation path (no delegation)")

    model_quant = None

    if args.quantize:
        logging.info("Quantizing with CortexMQuantizer")

        # Convert model to channels_last memory format for optimal Cortex-M performance
        model_channels_last = model.to(memory_format=torch.channels_last)
        example_inputs_cl = tuple(
            x.to(memory_format=torch.channels_last) if x.dim() == 4 else x
            for x in example_inputs
        )

        # Use CortexMQuantizer for INT8 quantization
        quantizer = CortexMQuantizer()
        prepared = prepare_pt2e(model_channels_last, quantizer)

        dataset = get_calibration_data(
            args.model_name, example_inputs_cl, args.evaluate, args.evaluate_config
        )

        if isinstance(dataset, DataLoader):
            for sample, _ in dataset:
                if isinstance(sample, torch.Tensor) and sample.dim() == 4:
                    sample = sample.to(memory_format=torch.channels_last)
                prepared(sample)
        else:
            dataset_cl = tuple(
                (
                    x.to(memory_format=torch.channels_last)
                    if isinstance(x, torch.Tensor) and x.dim() == 4
                    else x
                )
                for x in dataset
            )
            prepared(*dataset_cl)

        model_quant = convert_pt2e(prepared)

        exported_program = torch.export.export(
            model_quant, example_inputs_cl, strict=args.strict_export
        )
    else:
        logging.warning(
            "Quantization is DISABLED. Cortex-M typically requires quantization."
        )

    edge = to_edge_transform_and_lower(
        exported_program,
        compile_config=EdgeCompileConfig(_check_ir_validity=False),
    )

    # Build pass instances from CortexMPassManager.pass_list
    pass_instances = []
    for pass_cls in CortexMPassManager.pass_list:
        sig = inspect.signature(pass_cls.__init__)
        if "exported_program" in sig.parameters:
            pass_instances.append(pass_cls(edge.exported_program()))
        else:
            pass_instances.append(pass_cls())

    # Apply transforms
    edge = edge.transform(pass_instances)

    # Log cortex_m ops summary
    cortex_m_ops = {}
    for node in edge.exported_program().graph.nodes:
        target_str = str(node.target)
        if "cortex_m" in target_str:
            op_name = target_str.split(".")[-1] if "." in target_str else target_str
            cortex_m_ops[op_name] = cortex_m_ops.get(op_name, 0) + 1

    logging.info("Cortex-M ops summary:")
    for op_name, count in sorted(cortex_m_ops.items()):
        logging.info(f"  - {op_name}: {count}")

    return model_quant, edge
```
Copilot AI — Feb 6, 2026
The PR description mentions enabling full MobileNetV2 lowering to CMSIS-NN, and includes a test plan showing MobileNetV2 compilation. However, there is no automated test added for MobileNetV2 in the test suite (backends/cortex_m/test/models/). Consider adding a test similar to test_mobilenet_v3.py to ensure MobileNetV2 continues to work correctly and to prevent regressions. This would provide better test coverage for the new addmm handling and complete lowering path.
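A rough sketch of what such a test could look like, mirroring the steps in `to_edge_cortex_m` above. The import paths for `CortexMQuantizer` and `CortexMPassManager` are inferred from the file table earlier in this PR, the `prepare_pt2e`/`convert_pt2e` import path may differ from what the backend actually uses, and the harness in `test_mobilenet_v3.py` is not shown here, so treat this as an assumption-laden illustration rather than the repository's actual test API:

```python
import inspect

import torch
import torchvision
from executorch.backends.cortex_m.passes.cortex_m_pass_manager import CortexMPassManager
from executorch.backends.cortex_m.quantizer.quantizer import CortexMQuantizer
from executorch.exir import EdgeCompileConfig, to_edge_transform_and_lower
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e


def test_mobilenet_v2_lowers_to_cortex_m_ops():
    model = torchvision.models.mobilenet_v2(weights=None).eval()
    example_inputs = (torch.randn(1, 3, 224, 224),)

    # Capture, quantize with the Cortex-M quantizer, and convert (as in to_edge_cortex_m).
    captured = torch.export.export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(captured, CortexMQuantizer())
    prepared(*example_inputs)  # single calibration pass, enough for a smoke test
    quantized = convert_pt2e(prepared)

    exported = torch.export.export(quantized, example_inputs)
    edge = to_edge_transform_and_lower(
        exported, compile_config=EdgeCompileConfig(_check_ir_validity=False)
    )

    # Apply the CortexM passes the same way the AOT compiler path above does.
    pass_instances = []
    for pass_cls in CortexMPassManager.pass_list:
        sig = inspect.signature(pass_cls.__init__)
        if "exported_program" in sig.parameters:
            pass_instances.append(pass_cls(edge.exported_program()))
        else:
            pass_instances.append(pass_cls())
    edge = edge.transform(pass_instances)

    # The fully lowered graph should contain cortex_m::* ops and no delegate calls.
    targets = [str(node.target) for node in edge.exported_program().graph.nodes]
    assert any("cortex_m" in t for t in targets)
    assert not any("executorch_call_delegate" in t for t in targets)
```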
```python
def _mark_param_node_as_annotated(self, node: Node) -> None:
    """
    Mark a weight/bias parameter node as annotated.

    This is necessary for FoldAndAnnotateQParamsPass to recognize the node
    as part of a quantized computation path. The ARM quantizer does this
    via mark_annotated=True in _QuantProperty.
    """
    if Q_ANNOTATION_KEY not in node.meta:
        node.meta[Q_ANNOTATION_KEY] = QuantizationAnnotation()
    node.meta[Q_ANNOTATION_KEY]._annotated = True
    annotation_info = ArmAnnotationInfo(quantized=True)
    meta_custom = node.meta.get("custom", {})
    meta_custom[ArmAnnotationInfo.CUSTOM_META_KEY] = dict(annotation_info)
    node.meta["custom"] = meta_custom
```
Copilot AI — Feb 6, 2026
The implementation of _mark_param_node_as_annotated duplicates the exact logic from mark_node_as_annotated in backends/arm/quantizer/arm_quantizer_utils.py. Consider importing and reusing the existing function instead of duplicating the code to improve maintainability and reduce the risk of divergence.
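A small sketch of that reuse, assuming `mark_node_as_annotated` is importable from `backends/arm/quantizer/arm_quantizer_utils.py` as the comment states and takes the node as its only argument (its exact signature is not shown in this PR):

```python
# Hedged sketch: delegate to the existing Arm helper instead of duplicating its logic.
from executorch.backends.arm.quantizer.arm_quantizer_utils import mark_node_as_annotated


def _mark_param_node_as_annotated(self, node: Node) -> None:
    """Mark a weight/bias parameter node as annotated (reuses the Arm quantizer helper)."""
    mark_node_as_annotated(node)
```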
```python
# Build pass instances from CortexMPassManager.pass_list
pass_instances = []
for pass_cls in CortexMPassManager.pass_list:
    sig = inspect.signature(pass_cls.__init__)
    if "exported_program" in sig.parameters:
        pass_instances.append(pass_cls(edge.exported_program()))
    else:
        pass_instances.append(pass_cls())

# Apply transforms
edge = edge.transform(pass_instances)
```
Copilot AI — Feb 6, 2026
This manual pass instantiation logic duplicates the pass instantiation and transformation logic already present in CortexMPassManager.transform() (lines 83-96 in cortex_m_pass_manager.py). Consider refactoring to use CortexMPassManager(edge.exported_program()).transform() directly instead of manually iterating through pass_list and calling edge.transform(pass_instances). This would eliminate code duplication and ensure consistent pass handling.
Suggested change (replacing the manual loop above):

```python
# Apply CortexM passes using CortexMPassManager to avoid duplicating pass logic
edge = CortexMPassManager(edge.exported_program()).transform()
```
Hi, this PR needs major changes I'm afraid.
- The changes to fold_qdq_with_annotated_qparams_pass and propagate_qparams_pass are very likely not needed; rather, they are masking a faulty implementation of either the add_mm lowering or its integration in the aot_arm_compiler.
- The addition of the add_mm is a significant change that should be made in a separate PR and properly tested with unit tests, as is done for all other ops.
- It would be great to add mv2 as a pytest similar to mv3 as well. In fact, I would suggest starting by getting that working before adding support in the aot_arm_compiler, since the compilation pipeline is guaranteed to be working there.
Sure - I agree with the approach; I just wanted to share the work I've been up to recently.

Context on the design choice: the Cortex-M backend keeps addmm directly (vs ARM's decomposition to Conv2D) to leverage CMSIS-NN's optimized linear kernel. When PyTorch decomposes nn.Linear to the edge dialect, the weight flows through a transpose before reaching addmm. FoldAndAnnotateQParamsPass folds the DQ into the permute, but output_qparams remains empty (no Q node after the permute).

Proposed approach: let me get the unit tests in first, so we have proper test coverage before discussing the implementation details.
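For readers unfamiliar with the decomposition being described, here is a minimal, hedged illustration (plain PyTorch, not part of this PR) showing that a lowered nn.Linear reaches addmm through a transposed weight; exact node names depend on the decomposition table in use:

```python
import torch


class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)


ep = torch.export.export(TinyLinear().eval(), (torch.randn(1, 8),))
# With the default decomposition table, aten.linear is typically rewritten into a
# transpose/permute of the weight followed by aten.addmm - the pattern discussed above.
print(ep.run_decompositions().graph_module.graph)
```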
Sounds good!
I think the issue here is that you are not using the EdgeCompileConfig used in the tester. When linear is not decomposed, you avoid the issues around q/dq folding. In general, the design philosophy is that we want the decompositions and annotations to produce correct q/dq values directly rather than handling special cases in the folding, as that gets complex very quickly, based on our previous experience in the arm backend.
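One hedged way to illustrate the "don't decompose linear" idea in isolation; the tester's actual EdgeCompileConfig is not shown in this thread, and the decomposition-table API varies across PyTorch/ExecuTorch versions, so this is an assumption-labeled sketch rather than the tester's configuration:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(8, 4)).eval()  # stand-in model
example_inputs = (torch.randn(1, 8),)

# Keep aten.linear as a single node by dropping it from the default decomposition
# table before lowering; the weight then never flows through a separate permute into
# addmm, so the q/dq folding special cases discussed above do not arise.
decomp_table = torch.export.default_decompositions()
decomp_table.pop(torch.ops.aten.linear.default, None)
ep = torch.export.export(model, example_inputs).run_decompositions(decomp_table)
print(ep.graph_module.graph)  # aten.linear should still appear as a single node
```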
Summary: MobileNetV2 Fully Lowered to CMSIS-NN
Cortex-M: Enable full MobileNetV2 lowering to CMSIS-NN backend
This PR enables end-to-end export of MobileNetV2 to the CMSIS-NN backend for Cortex-M
targets. All quantized operations (conv2d, depthwise conv2d, linear/addmm, activations)
are now properly lowered to cortex_m::quantized_* operators, enabling efficient inference
on resource-constrained microcontrollers.
Test Plan:
python3 -m examples.arm.aot_arm_compiler -m mv2 --target=cortex-m --quantize --intermediates=./mv2_intermediates --output=./mv2_cortex_m.pte
cat mv2_intermediates/delegation_info.txt
Delegation info:
Total delegated subgraphs: 0
Number of delegated nodes: 0
Number of non-delegated nodes: 68
Delegation table:
╒════╤═════════════════════════════════════════════╤═══════════════════════════════════╤═══════════════════════════════════════╕
│ │ op_type │ occurrences_in_delegated_graphs │ occurrences_in_non_delegated_graphs │
╞════╪═════════════════════════════════════════════╪═══════════════════════════════════╪═══════════════════════════════════════╡
│ 0 │ aten_view_copy_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 1 │ cortex_m_dequantize_per_tensor_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 2 │ cortex_m_quantize_per_tensor_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 3 │ cortex_m_quantized_add_default │ 0 │ 10 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 4 │ cortex_m_quantized_avg_pool2d_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 5 │ cortex_m_quantized_conv2d_default │ 0 │ 35 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 6 │ cortex_m_quantized_depthwise_conv2d_default │ 0 │ 17 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 7 │ cortex_m_quantized_linear_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 8 │ dim_order_ops__clone_dim_order_default │ 0 │ 1 │
├────┼─────────────────────────────────────────────┼───────────────────────────────────┼───────────────────────────────────────┤
│ 9 │ Total │ 0 │ 68 │
╘════╧═════════════════════════════════════════════╧═══════════════════════════════════╧═══════════════════════════════════════╛
cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai @AdrianLundell