
fix(deps): update dependency transformers to v5 #317

Open
dreadnode-renovate-bot[bot] wants to merge 1 commit into main from renovate/transformers-5.x

Conversation


@dreadnode-renovate-bot dreadnode-renovate-bot bot commented Jan 28, 2026

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

| Package      | Change                              |
| ------------ | ----------------------------------- |
| transformers | >=4.41.0,<5.0.0 → >=5.1.0,<5.2.0    |

Release Notes

huggingface/transformers (transformers)

v5.1.0: EXAONE-MoE, PP-DocLayoutV3, Youtu-LLM, GLM-OCR

Compare Source

New Model additions

EXAONE-MoE

K-EXAONE is a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.

PP-DocLayoutV3

PP-DocLayoutV3 is a unified and high-efficiency model designed for comprehensive layout analysis. It addresses the challenges of complex physical distortions—such as skewing, curving, and adverse lighting—by integrating instance segmentation and reading order prediction into a single, end-to-end framework.

Youtu-LLM

Youtu-LLM is a new, small, yet powerful LLM: it contains only 1.96B parameters, supports 128k long context, and has native agentic talents. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in Commonsense, STEM, Coding, and Long Context capabilities; in agent-related testing, Youtu-LLM surpasses larger-sized leaders and is genuinely capable of completing multiple end-to-end agent tasks.

GlmOcr

GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It introduces Multi-Token Prediction (MTP) loss and stable full-task reinforcement learning to improve training efficiency, recognition accuracy, and generalization. The model integrates the CogViT visual encoder pre-trained on large-scale image–text data, a lightweight cross-modal connector with efficient token downsampling, and a GLM-0.5B language decoder. Combined with a two-stage pipeline of layout analysis and parallel recognition based on PP-DocLayout-V3, GLM-OCR delivers robust and high-quality OCR performance across diverse document layouts.

Breaking changes

  • 🚨 T5Gemma2 model structure (#43633) - Makes sure that the attn implementation is set on all sub-configs. The config.encoder.text_config was not getting its attn implementation set because it isn't passed to PreTrainedModel.__init__. Since we can't change the model structure without breaking, we manually re-added a call to self.adjust_attn_implemetation in the modeling code.

  • 🚨 Generation cache preparation (#​43679) - Refactors cache initialization in generation to ensure sliding window configurations are now properly respected. Previously, some models (like Afmoe) created caches without passing the model config, causing sliding window limits to be ignored. This is breaking because models with sliding window attention will now enforce their window size limits during generation, which may change generation behavior or require adjusting sequence lengths in existing code.

  • 🚨 Delete duplicate code in backbone utils (#43323) - This PR cleans up backbone utilities. Specifically, we currently have 5 different config attributes that decide which backbone to load, most of which are redundant and can be merged into one.
    After this PR, there is only one config.backbone_config as a single source of truth. Models load the backbone from_config and load pretrained weights only if the checkpoint has any weights saved. The overall idea is the same as in other composite models. A few config arguments are removed as a result.

  • 🚨 Refactor DETR to updated standards (#41549) - Standardizes the DETR model to bring it closer to other vision models in the library.

  • 🚨 Fix floating-point precision in JanusImageProcessor resize (#43187) - Replaces an int() with round(); expect slight numerical differences.

  • 🚨 Remove deprecated AnnotionFormat (#42983) - Removes a misnamed class in favour of AnnotationFormat.

Bugfixes and improvements

Significant community contributions

The following contributors have made significant changes to the library over the last release:

v5.0.0: Transformers v5

Compare Source

Transformers v5 release notes

  • Highlights
  • Significant API changes: dynamic weight loading, tokenization
  • Backwards Incompatible Changes
  • Bugfixes and improvements

We have a migration guide, continuously updated on the main branch; please check it out if you're facing issues: migration guide.

Highlights

We are excited to announce the initial release of Transformers v5. This is the first major release in five years, and the release is significant: 1200 commits have been pushed to main since the latest minor release. This release removes a lot of long-overdue deprecations, introduces several refactors that significantly simplify our APIs and internals, and comes with a large number of bug fixes.

We give an overview of our focus for this release in the following blogpost. In these release notes, we'll focus directly on the refactors and new APIs coming with v5.

This release is the full V5 release. It sets in motion something bigger: going forward, starting with v5, we'll now release minor releases every week, rather than every 5 weeks. Expect v5.1 to follow next week, then v5.2 the week that follows, etc.

We're moving forward with this change to ensure you have access to models as soon as they're supported in the library, rather than a few weeks after.

In order to install this release, please do so with the following:

pip install transformers

For us to deliver the best package possible, it is imperative that we have feedback on how the toolkit is currently working for you. Please try it out, and open an issue if you run into an inconsistency or a bug.

Transformers version 5 is a community endeavor, and we couldn't have shipped such a massive release without the help of the entire community.

Significant API changes

Dynamic weight loading

We introduce a new weight loading API in transformers, which significantly improves on the previous API. This
weight loading API is designed to apply operations to the checkpoints loaded by transformers.

Instead of loading the checkpoint exactly as it is serialized within the model, these operations can reshape, merge,
and split the layers according to how they're defined in this new API. These operations are often a necessity when
working with quantization or parallelism algorithms.

This new API is centered around the new WeightConverter class:

class WeightConverter(WeightTransform):
    operations: list[ConversionOps]
    source_keys: Union[str, list[str]]
    target_keys: Union[str, list[str]]

The weight converter is designed to apply a list of operations on the source keys, resulting in target keys. A common
operation done on the attention layers is to fuse the query, key, values layers. Doing so with this API would amount
to defining the following conversion:

conversion = WeightConverter(
    ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],  # The input layers
    "self_attn.qkv_proj",  # The single layer as output
    operations=[Concatenate(dim=0)],
)

In this situation, we apply the Concatenate operation, which accepts a list of layers as input and returns a single
layer.

This allows us to define a mapping from architecture to a list of weight conversions. Applying those weight conversions
can apply arbitrary transformations to the layers themselves. This significantly simplified the from_pretrained method
and helped us remove a lot of technical debt that we accumulated over the past few years.
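
For illustration only, such an architecture-level mapping could be sketched as a plain list of converters; the constant name and the idea of importing WeightConverter and Concatenate directly from transformers are assumptions for illustration, not the documented API:

# Hypothetical list of conversions for a single architecture; WeightConverter and
# Concatenate are the classes described above, assumed to be importable from transformers.
LLAMA_LIKE_CONVERSIONS = [
    WeightConverter(
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
        "self_attn.qkv_proj",
        operations=[Concatenate(dim=0)],  # fuse Q, K, V into a single projection
    ),
    WeightConverter(
        ["mlp.gate_proj", "mlp.up_proj"],
        "mlp.gate_up_proj",
        operations=[Concatenate(dim=0)],  # fuse gate and up projections
    ),
]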

This results in several improvements:

  • Much cleaner definition of transformations applied to the checkpoint
  • Reversible transformations, so loading and saving a checkpoint should result in the same checkpoint
  • Faster model loading thanks to scheduling of tensor materialization
  • Enables complex mix of transformations that wouldn't otherwise be possible (such as quantization + MoEs, or TP + MoEs)

Linked PR: #​41580

Tokenization

Just as we moved towards a single backend library for model definition, we want our tokenizers, and the Tokenizer object, to be a lot more intuitive. With v5, tokenizer definition is much simpler; you can now initialize an empty LlamaTokenizer and train it directly on your corpus.

Defining a new tokenizer object should be as simple as this:

from transformers import TokenizersBackend
from tokenizers import pre_tokenizers, Tokenizer
from tokenizers.models import BPE


class Llama5Tokenizer(TokenizersBackend):
    def __init__(self, unk_token="<unk>", bos_token="<s>", eos_token="</s>", vocab=None, merges=None):
        if vocab is None:
            # Minimal vocabulary containing only the special tokens.
            self._vocab = {
                str(unk_token): 0,
                str(bos_token): 1,
                str(eos_token): 2,
            }
        else:
            self._vocab = vocab
        self._merges = merges if merges is not None else []

        self._tokenizer = Tokenizer(
            BPE(vocab=self._vocab, merges=self._merges, fuse_unk=True)
        )
        # prepend_scheme is hard-coded here; the original snippet derived it via
        # _get_prepend_scheme(self.add_prefix_space, self).
        self._tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(
            replacement="▁", prepend_scheme="first", split=False
        )
        super().__init__(
            tokenizer_object=self._tokenizer,
            unk_token=unk_token,
            bos_token=bos_token,
            eos_token=eos_token,
        )

Once the tokenizer is defined as above, you can instantiate it with Llama5Tokenizer(). Doing this gives you an empty, trainable tokenizer that follows the definition of the authors of Llama5 (it does not exist yet 😉).

The above is the main motivation towards refactoring tokenization: we want tokenizers to behave similarly to models: trained or empty, and with exactly what is defined in their class definition.
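
As a minimal sketch of that workflow, assuming the train_new_from_iterator helper from earlier transformers versions carries over to these tokenizer objects:

from transformers import LlamaTokenizer

corpus = ["hello world", "hello there", "tokenizers are fun"]

# Empty, trainable tokenizer following the Llama class definition.
tokenizer = LlamaTokenizer()

# Train a new tokenizer of the same type on the corpus
# (method name assumed to carry over from the v4 fast-tokenizer API).
trained = tokenizer.train_new_from_iterator(iter(corpus), vocab_size=64)
print(trained.tokenize("hello world"))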

Backend Architecture Changes: moving away from the slow/fast tokenizer separation

Up to now, transformers maintained two parallel implementations for many tokenizers:

  • "Slow" tokenizers (tokenization_<model>.py) - Python-based implementations, often using SentencePiece as the backend.
  • "Fast" tokenizers (tokenization_<model>_fast.py) - Rust-based implementations using the 🤗 tokenizers library.

In v5, we consolidate to a single tokenizer file per model: tokenization_<model>.py. This file will use the most appropriate backend available:

  1. TokenizersBackend (preferred): Rust-based tokenizers from the 🤗 tokenizers library. In general it provides optimal performance, and it also offers many features that are commonly adopted across the ecosystem:
  • handling additional tokens
  • a full Python API for setting and updating
  • automatic parallelization
  • automatic offsets
  • customization
  • training
  2. SentencePieceBackend: for tokenizers requiring the sentencepiece library. It inherits from PythonBackend.
  3. PythonBackend: a Python implementation of the features provided by tokenizers. Basically allows adding tokens.
  4. MistralCommonBackend: relies on MistralCommon's tokenization library. (Previously known as the MistralCommonTokenizer.)

The AutoTokenizer automatically selects the appropriate backend based on available files and dependencies. This is transparent, you continue to use AutoTokenizer.from_pretrained() as before. This allows transformers to be future-proof and modular to easily support future backends.
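
For example, loading code stays exactly as before; the checkpoint below is only a familiar placeholder:

from transformers import AutoTokenizer

# The backend (TokenizersBackend, SentencePieceBackend, ...) is chosen from the
# files present in the checkpoint and the installed dependencies.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
print(type(tokenizer).__name__)
print(tokenizer("hey how are you?"))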

Defining a tokenizer outside of the existing backends

We enable users and tokenizer builders to define their own tokenizers from top to bottom. Tokenizers are usually defined using a backend such as tokenizers, sentencepiece or mistral-common, but we offer the possibility to design the tokenizer at a higher-level, without relying on those backends.

To do so, you can import the PythonBackend (which was previously known as PreTrainedTokenizer). This class encapsulates all the logic related to added tokens, encoding, and decoding.
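
As a rough sketch, assuming the subclassing hooks of the former PreTrainedTokenizer (get_vocab, _tokenize, _convert_token_to_id, _convert_id_to_token) carry over to PythonBackend, a whitespace tokenizer could look like this:

from transformers import PythonBackend  # formerly PreTrainedTokenizer


class WhitespaceTokenizer(PythonBackend):
    # Hooks below follow the v4 PreTrainedTokenizer pattern; assumed to carry over.
    def __init__(self, vocab=None, unk_token="<unk>", **kwargs):
        self._vocab = vocab or {str(unk_token): 0}
        self._ids_to_tokens = {i: t for t, i in self._vocab.items()}
        super().__init__(unk_token=unk_token, **kwargs)

    @property
    def vocab_size(self):
        return len(self._vocab)

    def get_vocab(self):
        return dict(self._vocab)

    def _tokenize(self, text):
        # Split on whitespace; the backend handles added/special tokens around this.
        return text.split()

    def _convert_token_to_id(self, token):
        return self._vocab.get(token, self._vocab[str(self.unk_token)])

    def _convert_id_to_token(self, index):
        return self._ids_to_tokens.get(index, str(self.unk_token))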

If you want something even higher up the stack, then PreTrainedTokenizerBase is what PythonBackend inherits from. It contains the very basic tokenizer API features:

  • encode
  • decode
  • vocab_size
  • get_vocab
  • convert_tokens_to_ids
  • convert_ids_to_tokens
  • from_pretrained
  • save_pretrained
  • among a few others
API Changes
1. Direct tokenizer initialization with vocab and merges

Starting with v5, we now enable initializing blank, untrained tokenizers-backed tokenizers:

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer()

This tokenizer will therefore follow the definition of the LlamaTokenizer as defined in its class definition. It can then be trained on a corpus as can be seen in the tokenizers documentation.

These tokenizers can also be initialized from vocab and merges (if necessary), like the previous "slow" tokenizers:

from transformers import LlamaTokenizer

vocab = {"<unk>": 0, "<s>": 1, "</s>": 2, "hello": 3, "world": 4}
merges = [("h", "e"), ("l", "l"), ("o", " ")]

tokenizer = LlamaTokenizer(vocab=vocab, merges=merges)

This tokenizer will behave as a Llama-like tokenizer, with an updated vocabulary. This allows comparing different tokenizer classes with the same vocab, thereby enabling the comparison of different pre-tokenizers, normalizers, etc.

⚠️ The vocab_file (as in, a path towards a file containing the vocabulary) cannot be used to initialize the LlamaTokenizer as loading from files is reserved to the from_pretrained method.

2. Simplified decoding API

The batch_decode and decode methods have been unified to mirror the behavior of the encode method. Both single and batch decoding now use the same decode method. See an example of the new behavior below:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("t5-small") 
inputs = ["hey how are you?", "fine"]
tokenizer.decode(tokenizer.encode(inputs))

Gives:

- 'hey how are you?</s> fine</s>'
+ ['hey how are you?</s>', 'fine</s>']

We expect encode and decode to behave as two sides of the same coin: encode, process, decode should just work.

[!NOTE]
A common use-case is: encode, model.generate, decode. However, generate returns list[list[int]], which was previously incompatible with the single-sequence decode.

3. Unified encoding API

The encode_plus method is deprecated in favor of the single __call__ method.
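
Migrating is mechanical; the arguments stay the same and only the method call changes:

# v4
encoded = tokenizer.encode_plus("hey how are you?", truncation=True, max_length=32)

# v5
encoded = tokenizer("hey how are you?", truncation=True, max_length=32)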

4. apply_chat_template returns BatchEncoding

Previously, apply_chat_template returned input_ids for backward compatibility. Starting with v5, it now consistently returns a BatchEncoding dict like other tokenizer methods.

# v5
messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"}
]

# Now returns BatchEncoding with input_ids, attention_mask, etc.
outputs = tokenizer.apply_chat_template(messages, return_tensors="pt")
print(outputs.keys())  # dict_keys(['input_ids', 'attention_mask'])
5. Removed legacy configuration file saving

We simplify the serialization of tokenization attributes:

  • special_tokens_map.json - special tokens are now stored in tokenizer_config.json.
  • added_tokens.json - added tokens are now stored in tokenizer.json.
  • added_tokens_decoder is only stored when there is no tokenizer.json.

When loading older tokenizers, these files are still read for backward compatibility, but new saves use the consolidated format. We're gradually moving towards consolidating attributes to fewer files so that other libraries and implementations may depend on them more reliably.
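
A quick way to see the consolidated format in practice (the output directory is just a placeholder, and the exact file list depends on the model):

import os
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
tokenizer.save_pretrained("./t5-tokenizer")

# Expect tokenizer_config.json and tokenizer.json among the saved files;
# special_tokens_map.json and added_tokens.json are no longer written by new saves.
print(sorted(os.listdir("./t5-tokenizer")))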

6. Model-Specific Changes

Several models that had identical tokenizers now import from their base implementation:

  • LayoutLM → uses BertTokenizer
  • LED → uses BartTokenizer
  • Longformer → uses RobertaTokenizer
  • LXMert → uses BertTokenizer
  • MT5 → uses T5Tokenizer
  • MVP → uses BartTokenizer

These modules will eventually be removed altogether.

Removed T5-specific workarounds

The internal _eventually_correct_t5_max_length method has been removed. T5 tokenizers now handle max length consistently with other models.

Testing Changes

A few testing changes specific to tokenizers have been applied:

  • Model-specific tokenization test files now focus on integration tests.
  • Common tokenization API tests (e.g., add_tokens, encode, decode) are now centralized and automatically applied across all tokenizers. This reduces test duplication and ensures consistent behavior.

For legacy implementations, the original BERT Python tokenizer code (including WhitespaceTokenizer, BasicTokenizer, etc.) is preserved in bert_legacy.py for reference purposes.

7. Deprecated / Modified Features

Special Tokens Structure:

  • SpecialTokensMixin: Merged into PreTrainedTokenizerBase to simplify the tokenizer architecture.
  • special_tokens_map: Now only stores named special token attributes (e.g., bos_token, eos_token). Use extra_special_tokens for additional special tokens (formerly additional_special_tokens). all_special_tokens includes both named and extra tokens.
# v4
tokenizer.special_tokens_map  # Included 'additional_special_tokens'

# v5
tokenizer.special_tokens_map  # Only named tokens
tokenizer.extra_special_tokens  # Additional tokens
  • special_tokens_map_extended and all_special_tokens_extended: Removed. Access AddedToken objects directly from _special_tokens_map or _extra_special_tokens if needed.
  • additional_special_tokens: Still accepted for backward compatibility but is automatically converted to extra_special_tokens.

Deprecated Methods:

  • sanitize_special_tokens(): Already deprecated in v4, removed in v5.
  • prepare_seq2seq_batch(): Deprecated; use __call__() with text_target parameter instead.
# v4
model_inputs = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, max_length=128)

# v5
model_inputs = tokenizer(src_texts, text_target=tgt_texts, max_length=128, return_tensors="pt")
model_inputs["labels"] = model_inputs.pop("input_ids_target")
  • BatchEncoding.words(): Deprecated; use word_ids() instead.

Removed Methods:

  • create_token_type_ids_from_sequences(): Removed from base class. Subclasses that need custom token type ID creation should implement this method directly.
  • prepare_for_model(), build_inputs_with_special_tokens(), truncate_sequences(): Moved from tokenization_utils_base.py to tokenization_python.py for PythonBackend tokenizers. TokenizersBackend provides model-ready input via tokenize() and encode(), so these methods are no longer needed in the base class.
  • _switch_to_input_mode(), _switch_to_target_mode(), as_target_tokenizer(): Removed from base class. Use __call__() with text_target parameter instead.
# v4
with tokenizer.as_target_tokenizer():
    labels = tokenizer(tgt_texts)

# v5
labels = tokenizer(text_target=tgt_texts)


---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://redirect.github.com/renovatebot/renovate).

@dreadnode-renovate-bot dreadnode-renovate-bot bot added the area/python (Changes to Python package configuration and dependencies) and type/digest (Dependency digest updates) labels Jan 28, 2026
| datasource | package      | from   | to    |
| ---------- | ------------ | ------ | ----- |
| pypi       | transformers | 4.57.1 | 5.1.0 |
@dreadnode-renovate-bot dreadnode-renovate-bot bot force-pushed the renovate/transformers-5.x branch from 9f77e00 to 157a706 on February 11, 2026 at 00:31