
Hy-MT1.5-1.8B-2bit: Tencent's 2-Bit On-Device Translation Model That Beats 72B Giants

🎯 TL;DR

  • Hy-MT1.5-1.8B-2bit is Tencent Hunyuan Team's breakthrough 2-bit quantized translation model that compresses a 3.3GB FP16 model down to just 574MB while maintaining world-class translation quality
  • Built on Tencent's proprietary Stretched Elastic Quantization (SEQ) technology, part of the AngelSlim compression toolkit
  • Supports 33 languages, 5 dialects/minority languages, and 1,056 translation directions with only 1.8B parameters
  • Comprehensively outperforms models with up to 40x more parameters (Tower-Plus-72B, Qwen3-32B) and leading commercial APIs
  • Deployable fully offline on mobile devices: Apple M4, vivo x300, and Android phones with Snapdragon 865+
  • Android APK demo available with a background word extraction mode that works across any app without an internet connection

Table of Contents

  1. What is Hy-MT1.5-1.8B-2bit?
  2. How the 2-bit Quantization Works
  3. Translation Quality Benchmarks
  4. On-Device Deployment & Privacy
  5. Speed Performance
  6. How to Download and Use
  7. Under the Hood: AngelSlim Toolkit
  8. Comparison with Alternatives
  9. FAQ
  10. Summary

What is Hy-MT1.5-1.8B-2bit?

Hy-MT1.5-1.8B-2bit is Tencent's latest open-source translation model, representing a major leap in efficient on-device AI. Developed by the Tencent Hunyuan Team, this model delivers translation quality that rivals or exceeds models with up to 40 times more parameters, all running locally on your phone with no internet required.

At its core, Hy-MT1.5-1.8B-2bit is built upon the Hy-MT1.5-1.8B foundation model, which was developed through a holistic multi-stage training pipeline:

  • MT-oriented pre-training - Building strong multilingual foundations
  • Supervised fine-tuning (SFT) - Aligning outputs with human-quality translations
  • On-policy distillation - Transferring knowledge from larger teacher models
  • Reinforcement learning (RL) - Optimizing for translation quality rewards

This pipeline produces a model that natively supports 33 languages, 5 dialects/minority languages, and an astonishing 1,056 translation directions (every ordered pair among the 33 primary languages, since 33 × 32 = 1,056), all within a 1.8B parameter footprint.

The "2bit" in the model name refers to its weight quantization format. The original 3.3GB FP16 model is compressed to just 574MB, a 82% reduction in size, while the companion 1.25-bit variant (Hy-MT1.5-1.8B-1.25bit) shrinks further to just 440MB.

💡 Pro Tip: If you need the GGUF format for CPU inference with llama.cpp or similar frameworks, check out the AngelSlim GGUF variant on Hugging Face.


How the 2-bit Quantization Works

The secret sauce behind Hy-MT1.5-1.8B-2bit's remarkable efficiency is Stretched Elastic Quantization (SEQ), Tencent's proprietary quantization algorithm published in the AngelSlim Technical Report (arXiv:2602.21233).

Traditional quantization typically maps floating-point weights to a small set of discrete values. Most 2-bit quantization schemes use a symmetric grid like {-1, 0, 1} (ternary) or {-1, 1} (binary). The problem? These coarse grids cause significant information loss, especially for outlier weights that don't fit the grid well.

SEQ breaks this limitation by stretching the quantization grid to {-1.5, -0.5, 0.5, 1.5}, a symmetric grid with no zero level that better matches the actual statistical distribution of transformer weights. This "stretched elastic" approach:

  1. Preserves weight magnitude information that coarse ternary and binary grids destroy
  2. Handles outlier weights more gracefully without wrecking the entire activation
  3. Works synergistically with quantization-aware distillation (QAD), where the model is trained to anticipate quantization errors during fine-tuning

The result is a 2-bit model that doesn't feel like a 2-bit model. On the Flores-200 benchmark for Chinese-foreign language translation, Hy-MT1.5-1.8B-2bit scores within striking distance of the full-precision 3.3GB base while being 82% smaller.
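
To make the grid idea concrete, here is a minimal, self-contained sketch of 2-bit quantization onto the stretched grid. It is illustrative only: the per-group max-abs scale rule and the group size below are assumptions, not the exact SEQ procedure from the AngelSlim report.

```python
import numpy as np

# Stretched 2-bit grid from the article; each weight becomes a 2-bit index into it.
GRID = np.array([-1.5, -0.5, 0.5, 1.5])

def quantize_group(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one group of FP weights to 2-bit codes plus a single FP scale."""
    scale = np.abs(weights).max() / GRID.max()   # assumed max-abs scale, not SEQ's scale search
    codes = np.abs(weights / scale - GRID[:, None]).argmin(axis=0)  # nearest grid point
    return codes.astype(np.uint8), float(scale)

def dequantize_group(codes: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate FP weights from 2-bit codes and the group scale."""
    return GRID[codes] * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
codes, scale = quantize_group(w)
w_hat = dequantize_group(codes, scale)
print("mean |w - w_hat|:", float(np.abs(w - w_hat).mean()))
```

Because the grid contains no zero level, every quantized weight keeps a sign and a non-zero magnitude, which is the property the article credits for preserving magnitude information.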

Quantization Specifications

| Property | Full Precision (FP16) | 2-bit (Hy-MT1.5-1.8B-2bit) | 1.25-bit (Hy-MT1.5-1.8B-1.25bit) |
| --- | --- | --- | --- |
| Model Size | 3.3GB | 574MB | 440MB |
| Compression Ratio | 1x | ~5.7x | ~7.5x |
| Quantization Grid | N/A | {-1.5, -0.5, 0.5, 1.5} | {-1.25, -0.25, 0.25, 1.25} |
| Quality Retention | 100% | ~97%+ | ~95%+ |

Translation Quality Benchmarks

This is where Hy-MT1.5-1.8B-2bit truly shines. Despite being a 574MB model, it comprehensively outperforms:

  • Tower-Plus-72B - A 72 billion parameter commercial-grade translation model
  • Qwen3-32B - Alibaba's 32 billion parameter multilingual model
  • Microsoft Translator - Major commercial translation API
  • Doubao Translator - ByteDance's translation service

On the Flores-200 benchmark (the industry standard for multilingual translation quality assessment), Hy-MT1.5-1.8B-2bit scores at or near the top across Chinese-foreign language pairs. The model's quality advantage is particularly strong on:

  • Chinese → English and English → Chinese translation
  • Southeast Asian languages (Vietnamese, Thai, Indonesian)
  • Low-resource language pairs where larger models often struggle

This means a 1.8B parameter model trained specifically for translation can actually out-translate generic large language models up to 40x its size. The lesson: domain-specific training plus proper quantization beats generic scaling.


On-Device Deployment & Privacy

One of the most compelling aspects of Hy-MT1.5-1.8B-2bit is its ability to run entirely on-device. The model is optimized for:

  • Apple M-series chips (Arm SME2 instructions on M4; standard Neon kernels on earlier M3/M2)
  • Android devices with Snapdragon 865+ and 8GB+ RAM
  • vivo x300 series and other flagship Android phones

Privacy by Design

When translation happens on your device, your data never leaves your phone. This is fundamentally different from cloud-based translation APIs where:

  • Your text is sent to third-party servers
  • Conversation data may be logged or used for model training
  • You need a stable internet connection

With Hy-MT1.5-1.8B-2bit, the entire inference pipeline runs locally. Browse foreign websites, chat with international friends, read documents in other languages: all of it with zero network latency and complete data privacy.

Android Demo App

Tencent provides a ready-to-use Android APK demo that showcases two key features:

  1. Translation Demo - Type or paste text and get instant translations (Demo: Snapdragon 865, 8GB RAM)
  2. Background Word Extraction Mode - A system-wide overlay that translates text from any app without switching applications. Read foreign-language emails, webpages, or chat messages with translations floating right where you need them.

One-time APK download, permanent offline use. No account, no data collection.


Speed Performance

Tencent's benchmarks show impressive inference speeds on SME2 (Scalable Matrix Extension 2) capable hardware. The 2-bit model runs significantly faster than the full-precision variant because:

  1. Smaller memory footprint → Faster memory reads (574MB vs 3.3GB)
  2. Bit-wise operations → 2-bit weights can be processed more efficiently on dedicated silicon
  3. SME2 optimization → Arm's newer instruction set extension is purpose-built for matrix operations

On SME2 kernels, the 2-bit model achieves real-time translation speeds on mobile-class hardware. The Neon kernel baseline (standard ARMv8) is slower but still usable for non-real-time scenarios.
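
As a rough illustration of the first point, on-device decoding tends to be memory-bandwidth bound, so a smaller weight file directly raises the ceiling on tokens per second. The 50 GB/s bandwidth below is an assumed generic mobile LPDDR figure, not a Tencent benchmark.

```python
# Back-of-envelope ceiling on decode speed, assuming every generated token
# requires one full read of the weights. The bandwidth value is an assumption.
BANDWIDTH_GB_PER_S = 50.0

for name, size_gb in [("FP16, 3.3 GB", 3.3), ("2-bit, 574 MB", 0.574), ("1.25-bit, 440 MB", 0.44)]:
    ceiling = BANDWIDTH_GB_PER_S / size_gb
    print(f"{name}: ~{ceiling:.0f} tokens/s upper bound")
```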


How to Download and Use

Model Weights

| Variant | Format | Size | Hugging Face Link |
| --- | --- | --- | --- |
| Hy-MT1.5-1.8B-2bit | Safetensors | 574MB | Model |
| Hy-MT1.5-1.8B-2bit | GGUF | ~574MB | GGUF |
| Hy-MT1.5-1.8B-1.25bit | Safetensors | 440MB | Model |
| Hy-MT1.5-1.8B-1.25bit | GGUF | ~440MB | GGUF |
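
If you prefer scripting the download, a minimal huggingface_hub sketch is shown below; the repo id is taken from the Transformers example in the next subsection, so verify the exact repository paths on Hugging Face before relying on them.

```python
from huggingface_hub import snapshot_download

# Repo id assumed from the Transformers example; confirm it on Hugging Face.
local_dir = snapshot_download(repo_id="AngelSlim/Hy-MT1.5-1.8B-2bit")
print("Model files downloaded to:", local_dir)
```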

Using with Transformers

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name = "AngelSlim/Hy-MT1.5-1.8B-2bit"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Translate English to Chinese. Check the model card for the expected
    # prompt/language-tag format; a decoder-only release would need
    # AutoModelForCausalLM instead of AutoModelForSeq2SeqLM.
    inputs = tokenizer("The weather is great today.", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Using with llama.cpp (GGUF)

    # Download the GGUF file, then run it with llama-cli
    ./llama-cli -m Hy-MT1.5-1.8B-2bit-Q4_0.gguf -p "Translate to Chinese: The weather is great today."

Under the Hood: AngelSlim Toolkit

Hy-MT1.5-1.8B-2bit is built using Tencent's AngelSlim model compression toolkit, an open-source project that supports compression for models at all scales, from small 1B models to large 100B+ VLMs and audio models.

Key AngelSlim Components

  • SEQ (Stretched Elastic Quantization) - The core 2-bit quantization algorithm
  • Sherry - Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification (see arXiv:2601.07892)
  • Eagle3 - Speculative decoding support, including training and deployment, for LLMs/VLMs/audio models at all scales

The AngelSlim project is actively maintained by Tencent's Hunyuan AI Infra Team, with new features and model support released regularly.


Comparison with Alternatives

| Model | Parameters | Size | Languages | Deployment | Commercial API |
| --- | --- | --- | --- | --- | --- |
| Hy-MT1.5-1.8B-2bit | 1.8B | 574MB | 33 + 5 dialects | On-device (mobile) | No |
| Tower-Plus-72B | 72B | ~144GB | 200+ | Cloud only | Yes (paid) |
| Qwen3-32B | 32B | ~64GB | 100+ | Cloud / GPU | API |
| Google Translate API | N/A | N/A | 130+ | Cloud | Yes (paid) |
| Microsoft Translator | N/A | N/A | 100+ | Cloud | Yes (paid) |

Key takeaway: Hy-MT1.5-1.8B-2bit is the only option that delivers competitive translation quality in an on-device, privacy-preserving, zero-cost package. If you need the absolute best quality and cost is no object, Tower-Plus or Google Translate are options. But for offline mobile use, embedded applications, or privacy-sensitive scenarios, nothing else comes close.


🤔 FAQ

Q: What does "2-bit" quantization mean practically?

A: Each model weight (normally stored as a 16-bit or 32-bit floating-point number) is compressed to just 2 bits. Instead of 65,536 possible values, each weight can only be one of 4 values: -1.5, -0.5, 0.5, or 1.5. This 8x reduction in bit-width for the weights, minus a small overhead for quantization metadata and higher-precision components, produces an 82% smaller model file.
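
For intuition on where the size reduction comes from, here is an illustrative sketch (not the actual on-disk format) of packing four 2-bit weight codes into one byte; real formats also store scales and other metadata, which is why the file is 574MB rather than exactly one eighth of 3.3GB.

```python
import numpy as np

# Four 2-bit codes (indices into {-1.5, -0.5, 0.5, 1.5}) packed into one byte.
codes = np.array([0, 3, 1, 2], dtype=np.uint8)
packed = int(codes[0]) | (int(codes[1]) << 2) | (int(codes[2]) << 4) | (int(codes[3]) << 6)

# Unpacking recovers the original codes.
unpacked = [(packed >> shift) & 0b11 for shift in (0, 2, 4, 6)]
assert unpacked == codes.tolist()
print(f"4 weights in one byte: 0b{packed:08b}")
```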

Q: How much quality is lost compared to the full-precision model?

A: Based on Tencent's benchmarks on the Flores-200 dataset, the quality loss is minimal: typically less than 3% on standard translation metrics (BLEU, COMET). For many language pairs, the difference is statistically indistinguishable from the FP16 base model in human evaluation.

Q: Can this run on iPhone?

A: Currently, Tencent's optimized binaries target ARM SME2-capable Android devices and Apple M-series chips (Mac/iPad). iPhone deployment would require Core ML conversion or similar optimization, which isn't officially provided yet. The GGUF format can be run on Apple Silicon Macs via llama.cpp.

Q: What languages does Hy-MT1.5-1.8B-2bit support?

A: 33 primary languages including English, Chinese (Simplified & Traditional), Spanish, French, German, Japanese, Korean, Arabic, Russian, Portuguese, Italian, Dutch, Polish, Vietnamese, Thai, Indonesian, and more. Plus 5 dialects/minority language variants and support for 1,056 directional language pairs.

Q: Is the model open-source?

A: Yes. The model weights and the AngelSlim toolkit are open-source. The code is released under the AngelSlim License. Both the standard Safetensors format and GGUF format are freely available on Hugging Face.

Q: How does it compare to GPT-4 / Claude for translation?

A: On standard translation benchmarks, Hy-MT1.5-1.8B-2bit matches or exceeds commercial APIs. However, it is a dedicated translation model: it cannot handle general Q&A, code generation, or other non-translation tasks. For pure translation quality vs. size efficiency, it is currently one of the best open-source options available.


Summary

Hy-MT1.5-1.8B-2bit represents a new paradigm in machine translation: domain-specific training, aggressive quantization, and mobile-first deployment, all in one open-source package. Tencent's AngelSlim toolkit demonstrates that extreme quantization (2-bit, 1.25-bit) doesn't have to mean catastrophic quality loss, thanks to techniques like Stretched Elastic Quantization and quantization-aware distillation.

For developers building translation-powered applications, embedded systems, privacy-sensitive tools, or offline mobile experiences, Hy-MT1.5-1.8B-2bit is worth serious consideration. The combination of:

  • 574MB model size (or 440MB at 1.25-bit)
  • 33 languages, 1,056 translation directions
  • Fully offline, on-device inference
  • Zero API costs and complete privacy
  • Competitive quality against 72B models

...makes it a uniquely practical achievement in the LLM compression space.



Originally published at CurateClick
