Hy-MT1.5-1.8B-2bit: Tencent's 2-Bit On-Device Translation Model That Beats 72B Giants
TL;DR
- Hy-MT1.5-1.8B-2bit is Tencent Hunyuan Team's breakthrough 2-bit quantized translation model that compresses a 3.3GB FP16 model down to just 574MB while maintaining world-class translation quality
- Built on Tencent's proprietary Stretched Elastic Quantization (SEQ) technology, part of the AngelSlim compression toolkit
- Supports 33 languages, 5 dialects/minority languages, and 1,056 translation directions with only 1.8B parameters
- Comprehensively outperforms models with up to 40x more parameters (Tower-Plus-72B, Qwen3-32B) and leading commercial APIs
- Deployable fully offline on mobile devices: Apple M4, vivo X300, and Android phones with Snapdragon 865 or newer
- Android APK demo available with background word extraction mode that works across any app without internet connection
Table of Contents
- What is Hy-MT1.5-1.8B-2bit?
- How the 2-bit Quantization Works
- Translation Quality Benchmarks
- On-Device Deployment & Privacy
- Speed Performance
- How to Download and Use
- Under the Hood: AngelSlim Toolkit
- Comparison with Alternatives
- FAQ
- Summary
What is Hy-MT1.5-1.8B-2bit?
Hy-MT1.5-1.8B-2bit is Tencent's latest open-source translation model, representing a major leap in efficient on-device AI. Developed by the Tencent Hunyuan Team, this model delivers translation quality that rivals or exceeds models with up to 40 times more parameters, all running locally on your phone with no internet required.
At its core, Hy-MT1.5-1.8B-2bit is built upon the Hy-MT1.5-1.8B foundation model, which was developed through a holistic multi-stage training pipeline:
- MT-oriented pre-training: Building strong multilingual foundations
- Supervised fine-tuning (SFT): Aligning outputs with human-quality translations
- On-policy distillation: Transferring knowledge from larger teacher models
- Reinforcement learning (RL): Optimizing for translation-quality rewards
This pipeline produces a model that natively supports 33 languages, 5 dialects/minority languages, and an astonishing 1,056 translation directions, all within a 1.8B parameter footprint.
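The 1,056 figure is not arbitrary: it is simply every ordered source→target pair among the 33 primary languages.

```python
# 1,056 translation directions = every ordered pair of the 33 languages
languages = 33
directions = languages * (languages - 1)
print(directions)  # 1056
```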
The "2bit" in the model name refers to its weight quantization format. The original 3.3GB FP16 model is compressed to just 574MB, an 82% reduction in size, while the companion 1.25-bit variant (Hy-MT1.5-1.8B-1.25bit) shrinks further to just 440MB.
Pro Tip: If you need the GGUF format for CPU inference with llama.cpp or similar frameworks, check out the AngelSlim GGUF variant on Hugging Face.
How the 2-bit Quantization Works
The secret sauce behind Hy-MT1.5-1.8B-2bit's remarkable efficiency is Stretched Elastic Quantization (SEQ), Tencent's proprietary quantization algorithm published in the AngelSlim Technical Report (arXiv:2602.21233).
Traditional quantization typically maps floating-point weights to a small set of discrete values. Most 2-bit quantization schemes use a symmetric grid like {-1, 0, 1} (ternary) or {-1, 1} (binary). The problem? These coarse grids cause significant information loss, especially for outlier weights that don't fit the grid well.
SEQ breaks this limitation by stretching the quantization grid to {-1.5, -0.5, 0.5, 1.5}, an arrangement with no zero level whose points better match the actual statistical distribution of transformer weights. This "stretched elastic" approach:
- Preserves weight-magnitude information that coarser ternary grids destroy
- Handles outlier weights more gracefully without wrecking the entire activation
- Works synergistically with quantization-aware distillation (QAD): the model is trained to anticipate quantization errors during fine-tuning
The result is a 2-bit model that doesn't feel like a 2-bit model. On the Flores-200 benchmark for Chinese-foreign language translation, Hy-MT1.5-1.8B-2bit scores within striking distance of the full-precision 3.3GB base while being 82% smaller.
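To make the idea concrete, here is a minimal numeric sketch of grid quantization onto the stretched {-1.5, -0.5, 0.5, 1.5} levels. The per-group max scaling and the group size of 128 are illustrative assumptions on my part; the real SEQ algorithm (and its QAD training loop) is considerably more sophisticated.

```python
import numpy as np

# Illustrative only: NOT Tencent's actual SEQ implementation.
GRID = np.array([-1.5, -0.5, 0.5, 1.5])

def quantize_2bit(w, group_size=128):
    """Map each weight group onto the nearest stretched-grid point."""
    w = w.reshape(-1, group_size)
    # Per-group scale so the grid spans each group's weight range (assumption)
    scale = np.abs(w).max(axis=1, keepdims=True) / 1.5
    scale = np.where(scale == 0, 1.0, scale)
    # Nearest grid point for every normalized weight
    idx = np.abs(w[..., None] / scale[..., None] - GRID).argmin(axis=-1)
    return idx.astype(np.uint8), scale  # 2-bit codes plus FP scales

def dequantize_2bit(idx, scale):
    """Reconstruct approximate weights from codes and scales."""
    return GRID[idx] * scale
```

Because the grid has unit spacing after normalization, the reconstruction error per weight is bounded by half a grid step times the group scale.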
Quantization Specifications
| Property | Full Precision (FP16) | 2-bit (Hy-MT1.5-1.8B-2bit) | 1.25-bit (Hy-MT1.5-1.8B-1.25bit) |
|---|---|---|---|
| Model Size | 3.3GB | 574MB | 440MB |
| Compression Ratio | 1x | ~5.7x | ~7.5x |
| Quantization Grid | N/A | {-1.5, -0.5, 0.5, 1.5} | {-1.25, -0.25, 0.25, 1.25} |
| Quality Retention | 100% | ~97%+ | ~95%+ |
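A quick back-of-envelope check on the table's sizes: 1.8B weights at 2 bits each come to 450MB of raw codes. The gap to the published 574MB plausibly comes from per-group scales, embeddings, and other tensors kept at higher precision; that breakdown is my assumption, not a published spec.

```python
params = 1.8e9  # 1.8B weights

raw_2bit_mb = params * 2 / 8 / 1e6      # pure 2-bit codes: 450.0 MB
raw_125bit_mb = params * 1.25 / 8 / 1e6  # pure 1.25-bit codes: 281.25 MB
print(raw_2bit_mb, raw_125bit_mb)  # 450.0 281.25
```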
Translation Quality Benchmarks
This is where Hy-MT1.5-1.8B-2bit truly shines. Despite being a 574MB model, it comprehensively outperforms:
- Tower-Plus-72B: A 72-billion-parameter commercial-grade translation model
- Qwen3-32B: Alibaba's 32-billion-parameter multilingual model
- Microsoft Translator: A major commercial translation API
- Doubao Translator: ByteDance's translation service
On the Flores-200 benchmark (the industry standard for multilingual translation quality assessment), Hy-MT1.5-1.8B-2bit scores at or near the top across Chinese-foreign language pairs. The model's quality advantage is particularly strong on:
- Chinese → English and English → Chinese translation
- Southeast Asian languages (Vietnamese, Thai, Indonesian)
- Low-resource language pairs where larger models often struggle
This means a 1.8B parameter model trained specifically for translation can actually out-translate generic large language models up to 40x its size. The lesson? Domain-specific training plus careful quantization beats generic scaling.
On-Device Deployment & Privacy
One of the most compelling aspects of Hy-MT1.5-1.8B-2bit is its ability to run entirely on-device. The model is optimized for:
- Apple M-series chips (M4, M3, M2) with Arm SME2 instructions
- Android devices with Snapdragon 865+ and 8GB+ RAM
- vivo X300 series and other flagship Android phones
Privacy by Design
When translation happens on your device, your data never leaves your phone. This is fundamentally different from cloud-based translation APIs where:
- Your text is sent to third-party servers
- Conversation data may be logged or used for model training
- You need a stable internet connection
With Hy-MT1.5-1.8B-2bit, the entire inference pipeline runs locally. Browse foreign websites, chat with international friends, read documents in other languages, all with zero network latency and complete data privacy.
Android Demo App
Tencent provides a ready-to-use Android APK demo that showcases two key features:
- Translation Demo: Type or paste text and get instant translations (demoed on a Snapdragon 865 device with 8GB RAM)
- Background Word Extraction Mode: A system-wide overlay that translates text from any app without switching applications. Read foreign-language emails, webpages, or chat messages with translations floating right where you need them.
One-time APK download, permanent offline use. No account, no data collection.
Speed Performance
Tencent's benchmarks show impressive inference speeds on SME2 (Scalable Matrix Extension 2) capable hardware. The 2-bit model runs significantly faster than the full-precision variant because:
- Smaller memory footprint: Faster memory reads (574MB vs 3.3GB)
- Bit-wise operations: 2-bit weights can be processed more efficiently on dedicated silicon
- SME2 optimization: Arm's newer instruction-set extension is purpose-built for matrix operations
On SME2 kernels, the 2-bit model achieves real-time translation speeds on mobile-class hardware. The Neon kernel baseline (standard ARMv8) is slower but still usable for non-real-time scenarios.
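Why the smaller file translates into speed: autoregressive decoding is memory-bound, so each generated token must stream roughly the whole weight file through memory. A rough ceiling, assuming a hypothetical 50 GB/s of LPDDR5-class bandwidth (an assumption for illustration, not a measured figure for any device mentioned here):

```python
bandwidth_gb_s = 50.0   # assumed mobile memory bandwidth (illustrative)
size_2bit_gb = 0.574
size_fp16_gb = 3.3

# Upper bound on decode speed = bandwidth / bytes streamed per token
ceiling_2bit = bandwidth_gb_s / size_2bit_gb
ceiling_fp16 = bandwidth_gb_s / size_fp16_gb
print(round(ceiling_2bit), round(ceiling_fp16))  # 87 15
```

Real throughput lands below these ceilings, but the roughly 5.7x gap between the two variants tracks the size ratio.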
How to Download and Use
Model Weights
| Variant | Format | Size | Hugging Face Link |
|---|---|---|---|
| Hy-MT1.5-1.8B-2bit | Safetensors | 574MB | Model |
| Hy-MT1.5-1.8B-2bit | GGUF | ~574MB | GGUF |
| Hy-MT1.5-1.8B-1.25bit | Safetensors | 440MB | Model |
| Hy-MT1.5-1.8B-1.25bit | GGUF | ~440MB | GGUF |
Using with Transformers
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "AngelSlim/Hy-MT1.5-1.8B-2bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Translate English to Chinese
inputs = tokenizer("The weather is great today.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Using with llama.cpp (GGUF)
```bash
# Download and run with llama-cli
./llama-cli -m Hy-MT1.5-1.8B-2bit-Q4_0.gguf \
  -p "Translate to Chinese: The weather is great today."
```
Under the Hood: AngelSlim Toolkit
Hy-MT1.5-1.8B-2bit is built using Tencent's AngelSlim model compression toolkit, an open-source project that supports compression for models at all scales, from small 1B models to large 100B+ VLMs and audio models.
Key AngelSlim Components
- SEQ (Stretched Elastic Quantization): The core 2-bit quantization algorithm
- Sherry: Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification (see arXiv:2601.07892)
- Eagle3: Training and deployment support for all-scale LLMs/VLMs/Audio models
The AngelSlim project is actively maintained by Tencent's Hunyuan AI Infra Team, with new features and model support released regularly.
Related Repositories
- AngelSlim GitHub: https://github.com/Tencent/AngelSlim
- HY-MT GitHub: https://github.com/Tencent-Hunyuan/HY-MT
- Documentation: https://angelslim.readthedocs.io/
Comparison with Alternatives
| Model | Parameters | Size | Languages | Deployment | Commercial API |
|---|---|---|---|---|---|
| Hy-MT1.5-1.8B-2bit | 1.8B | 574MB | 33 + 5 dialects | On-device (mobile) | No |
| Tower-Plus-72B | 72B | ~144GB | 200+ | Cloud only | Yes (paid) |
| Qwen3-32B | 32B | ~64GB | 100+ | Cloud / GPU | API |
| Google Translate API | N/A | N/A | 130+ | Cloud | Yes (paid) |
| Microsoft Translator | N/A | N/A | 100+ | Cloud | Yes (paid) |
Key takeaway: Hy-MT1.5-1.8B-2bit is the only option that delivers competitive translation quality in an on-device, privacy-preserving, zero-cost package. If you need the absolute best quality and cost is no object, Tower-Plus or Google Translate are options. But for offline mobile use, embedded applications, or privacy-sensitive scenarios, nothing else comes close.
FAQ
Q: What does "2-bit" quantization mean practically?
A: Each model weight (normally stored as a 16-bit or 32-bit floating-point number) is compressed to just 2 bits. Instead of 65,536 possible FP16 values, each weight can only take one of four values: -1.5, -0.5, 0.5, or 1.5, times a shared scale. This 8x reduction in bit-width, minus the overhead of scales and the tensors kept at higher precision, yields an 82% smaller model file.
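Storage-wise, four of those 2-bit codes fit into a single byte. A toy packing routine shows the mechanics (illustrative only; it does not reflect the actual on-disk layout of the released weights):

```python
def pack_2bit(codes):
    """Pack four 2-bit codes (each 0-3) into every byte, low bits first."""
    assert len(codes) % 4 == 0
    out = bytearray()
    for a, b, c, d in zip(*[iter(codes)] * 4):
        out.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(out)

def unpack_2bit(data):
    """Recover the 2-bit codes from packed bytes."""
    return [(byte >> shift) & 0b11 for byte in data for shift in (0, 2, 4, 6)]
```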
Q: How much quality is lost compared to the full-precision model?
A: Based on Tencent's benchmarks on the Flores-200 dataset, the quality loss is minimal, typically less than 3% on standard translation metrics (BLEU, COMET). For many language pairs, the difference is statistically indistinguishable from the FP16 base model in human evaluation.
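For readers unfamiliar with BLEU: it scores n-gram overlap between a candidate translation and a reference. A toy single-sentence version sketches the idea (real evaluations use sacrebleu or COMET with proper tokenization and smoothing; this sketch is mine, not the harness Tencent used):

```python
import math
from collections import Counter

def toy_bleu(hyp, ref, max_n=4):
    """Single-sentence BLEU with no smoothing; illustrative only."""
    h_tok, r_tok = hyp.split(), ref.split()
    precisions = []
    for n in range(1, max_n + 1):
        h = Counter(tuple(h_tok[i:i + n]) for i in range(len(h_tok) - n + 1))
        r = Counter(tuple(r_tok[i:i + n]) for i in range(len(r_tok) - n + 1))
        overlap = sum((h & r).values())            # clipped n-gram matches
        precisions.append(overlap / max(sum(h.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(r_tok) / len(h_tok)))
    return 100.0 * brevity * geo_mean
```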
Q: Can this run on iPhone?
A: Currently, Tencent's optimized binaries target ARM SME2-capable Android devices and Apple M-series chips (Mac/iPad). iPhone deployment would require Core ML conversion or similar optimization, which isn't officially provided yet. The GGUF format can be run on Apple Silicon Macs via llama.cpp.
Q: What languages does Hy-MT1.5-1.8B-2bit support?
A: 33 primary languages including English, Chinese (Simplified & Traditional), Spanish, French, German, Japanese, Korean, Arabic, Russian, Portuguese, Italian, Dutch, Polish, Vietnamese, Thai, Indonesian, and more. Plus 5 dialects/minority language variants and support for 1,056 directional language pairs.
Q: Is the model open-source?
A: Yes. The model weights and the AngelSlim toolkit are open-source. The code is released under the AngelSlim License. Both the standard Safetensors format and GGUF format are freely available on Hugging Face.
Q: How does it compare to GPT-4 / Claude for translation?
A: On standard translation benchmarks, Hy-MT1.5-1.8B-2bit matches or exceeds commercial APIs. However, it is a dedicated translation model: it cannot handle general Q&A, code generation, or other non-translation tasks. For pure translation quality vs. size efficiency, it is currently one of the best open-source options available.
Summary
Hy-MT1.5-1.8B-2bit represents a new paradigm in machine translation: domain-specific training, aggressive quantization, and mobile-first deployment, all in one open-source package. Tencent's AngelSlim toolkit demonstrates that extreme quantization (2-bit, 1.25-bit) doesn't have to mean catastrophic quality loss, thanks to techniques like Stretched Elastic Quantization and quantization-aware distillation.
For developers building translation-powered applications, embedded systems, privacy-sensitive tools, or offline mobile experiences, Hy-MT1.5-1.8B-2bit is worth serious consideration. The combination of:
- 574MB model size (or 440MB at 1.25-bit)
- 33 languages, 1,056 translation directions
- Fully offline, on-device inference
- Zero API costs and complete privacy
- Competitive quality against 72B models
...makes it a uniquely practical achievement in the LLM compression space.
Links:
- Model: https://huggingface.co/tencent/Hy-MT1.5-1.8B-2bit
- AngelSlim: https://github.com/Tencent/AngelSlim
- Android Demo APK: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF/resolve/main/Hy-MT-demo.apk
- AngelSlim Report (arXiv:2602.21233): https://arxiv.org/abs/2602.21233
- HY-MT1.5 Technical Report (arXiv:2512.24092): https://arxiv.org/abs/2512.24092
Originally published at CurateClick