MamayLM can now see! We are releasing MamayLM v1.0, the best-performing efficient Ukrainian language model, which surpasses all similar-sized models in both English and Ukrainian while matching or outperforming models up to 5x larger.
We are delighted to announce the release of MamayLM v1.0, a new state-of-the-art LLM targeting the Ukrainian language. We are releasing the model in two sizes - 4B and 12B - both of which are cost-efficient, fast, multimodal and can be run on 1 GPU, yet are effective in both Ukrainian and English. The model comes with strong capabilities, outpacing open models of similar sizes in both languages while matching or comparing favourably against much larger models. MamayLM is a result of the research done at the INSAIT Institute - the first MamayLM v0.1 release saw wide adoption with more than 10,000 downloads and many positive reviews, providing a foundation for further multilingual development. The new version has the following updates:
In our v0.1 release we successfully adapted Gemma 2 to the Ukrainian language, based on our research on language transfer.
In the previous version, our Ukrainian pre-training data was based on the FineWeb2 dataset.
During pre-training, we used best-fit packing, which reduces unnecessary document truncation when forming fixed-length training sequences.
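To illustrate the idea, below is a minimal sketch of best-fit packing (greedy best-fit-decreasing bin packing of document chunks into fixed-length sequences); it is illustrative only and not our actual data pipeline:

def best_fit_pack(doc_lengths, seq_len):
    # Place each document chunk (assumed to be at most seq_len tokens) into the
    # open sequence with the least remaining space that still fits it;
    # otherwise open a new sequence. This minimizes unnecessary truncation.
    bins = []  # each bin: [remaining_space, [doc_ids]]
    for doc_id, length in sorted(enumerate(doc_lengths), key=lambda x: -x[1]):
        best = None
        for b in bins:
            if b[0] >= length and (best is None or b[0] < best[0]):
                best = b
        if best is None:
            bins.append([seq_len - length, [doc_id]])
        else:
            best[0] -= length
            best[1].append(doc_id)
    return [b[1] for b in bins]

# Example: group documents of various token counts into 8192-token sequences.
packed = best_fit_pack([5000, 3000, 7000, 1200], seq_len=8192)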
As in the v0.1 release, for the post-training stage we extracted topics relevant to Ukrainian history and culture, which enabled the generation of a synthetic dataset of Ukrainian QA pairs via knowledge distillation from a larger model. We also employed our LLM-based translation pipeline to translate domain-specific data into Ukrainian, enhancing both the quantity and quality of data in the target language.
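As a rough illustration of the distillation step, the sketch below generates Ukrainian QA pairs per extracted topic; teacher_generate is a placeholder for the larger teacher model (or its API), and the prompt is illustrative rather than the one we actually used:

QA_PROMPT = (
    "Generate {n} question-answer pairs in Ukrainian about the topic: {topic}. "
    "Answers must be factually accurate and detailed."
)

def distill_qa_pairs(topics, teacher_generate, n_per_topic=5):
    # For each topic relevant to Ukrainian history and culture, ask the teacher
    # model for synthetic QA pairs; parsing and filtering are omitted here.
    dataset = []
    for topic in topics:
        raw = teacher_generate(QA_PROMPT.format(n=n_per_topic, topic=topic))
        dataset.append({"topic": topic, "qa_raw": raw})
    return dataset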
Our instruction-tuning dataset incorporates various open-source datasets, such as the Nemotron SFT and Post-Training datasets, the OpenCoder (OPC) SFT dataset, the Aya Collection and more. We acknowledge the significant contributions of the Ukrainian open-source community, particularly the creators of Spivavtor, UAlpaca, UA-Squad, Ukrainian StackExchange, the Crimean Tatar Parallel Corpora and UA-Lawyer QA, which amplify the potential of Ukrainian post-training.
In the pre-training stage, we split the corpus into two parts built around different massive web-sourced datasets and re-introduced the smaller domain-specific datasets in both splits. We then trained on each split separately and applied model souping to the resulting checkpoints, which improved pre-trained model performance dramatically.
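The souping step itself amounts to parameter averaging of checkpoints trained on the different splits; below is a minimal, illustrative sketch using PyTorch state dicts (not our exact training code):

import torch

def soup(state_dicts, weights=None):
    # Uniform (or weighted) parameter averaging of several checkpoints
    # trained on different pre-training data splits ("model souping").
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    souped = {}
    for key in state_dicts[0]:
        avg = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        souped[key] = avg.to(state_dicts[0][key].dtype)
    return souped

# Hypothetical usage: average two checkpoints trained on different splits.
# sd_a = torch.load("checkpoint_split_a.pt")
# sd_b = torch.load("checkpoint_split_b.pt")
# merged = soup([sd_a, sd_b])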
In the post-training stage, we trained English-focused and Ukrainian-focused instruction-tuned models separately and later combined them into a single, stronger model. This separation lets us raise performance in both languages even further, since each model is trained on data targeted at its specific language. We also applied an advanced model merging technique inspired by Layer Swapping.
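Below is a minimal sketch of a layer-level merge in this spirit; the model.layers.{i} key pattern matches Gemma-style checkpoints, but the layer indices are illustrative and this is not the exact recipe we used:

def layer_wise_merge(sd_en, sd_uk, uk_layers):
    # Build a merged state dict: parameters of transformer blocks listed in
    # uk_layers come from the Ukrainian-focused model, everything else from
    # the English-focused one.
    merged = {}
    for key, tensor in sd_en.items():
        take_uk = any(f"model.layers.{i}." in key for i in uk_layers)
        merged[key] = sd_uk[key].clone() if take_uk else tensor.clone()
    return merged

# Hypothetical usage: swap in the first and last few blocks from the
# Ukrainian-focused model (indices chosen only for illustration).
# merged_sd = layer_wise_merge(sd_en, sd_uk, uk_layers={0, 1, 2, 45, 46, 47})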
Our pipeline enables us not just to preserve visual and long-context capabilities, but even to improve them for both languages without any datasets targeted at those domains. We believe that multilingual visual performance depends strongly on the model's linguistic capabilities in the given languages, which is why we observe improvements on visual benchmarks without training on image-text data.
MamayLM v1.0 now supports visual input alongside text, thanks to the multimodal support of Gemma 3 models. This is a significant advancement over the previous version, which was limited to text-only processing. Even though our training corpus consisted only of text data, MamayLM inherited the visual understanding capabilities of the base model, which we managed to preserve during our training. As a result, the tuned model shows improved results on visual evaluations for both English and Ukrainian without including any image training data! This can be explained by the model architecture: multimodal capabilities rely mostly on the language performance of the text model itself, while the vision tower is only used to convert visual input into a format understandable to the main language model. The enhanced multimodal capabilities of MamayLM v1.0 open up new possibilities for applications that require understanding and generating content based on both text and visual inputs, useful for administrative adoption and various other use cases.
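For illustration, image-plus-text inference follows the standard Gemma 3 pattern in recent transformers versions; the image URL and prompt below are placeholders, and we assume the model repository ships the usual Gemma 3 processor configuration:

import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0"
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/document.png"},  # placeholder URL
        {"type": "text", "text": "Опиши, що зображено на цьому зображенні."},  # "Describe what is shown in this image."
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))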
We evaluated MamayLM on a set of standard English benchmarks, a translated version of them in Ukrainian, as well as Ukrainian-specific benchmarks we collected:
We took on the challenge of identifying the best translation method for the English-only benchmarks. Although some effort has been made in this direction, existing approaches often translate questions and answers in isolation, losing important context.
To address these issues, we developed a translation framework that preserves the context of both questions and answers. It also employs multisampling and scoring of translation candidates to optimize the balance between machine translation quality and human involvement, ensuring maximum efficiency. All adapted benchmarks for Ukrainian are available in the corresponding GitHub repository.
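The candidate-selection idea can be sketched as follows; translate and quality_score are placeholders for the LLM translator and the scoring model, not real APIs:

def best_translation(question, answers, translate, quality_score, n_samples=4):
    # Sample several context-aware translations of a full QA item (question
    # together with its answer options) and keep the highest-scoring candidate;
    # low-scoring items can be escalated to human reviewers.
    source = {"question": question, "answers": answers}
    candidates = [translate(source) for _ in range(n_samples)]
    return max(candidates, key=lambda cand: quality_score(source, cand))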
As illustrated by the figures below, across all benchmarks, MamayLM outperforms all similarly sized models (even outperforming much bigger 70B models on Ukrainian!). It does so in both English and Ukrainian, thanks to the particular method used to train MamayLM (mentioned above).
We also evaluated MamayLM v1.0 against current state-of-the-art LLMs. Impressively, our model outperforms models up to 6 times larger across various benchmarks, including those specific to Ukrainian contexts, as shown in the figure below.
Importantly, as the figure below shows, MamayLM v1.0 achieves the highest score on the ZNO (Ukrainian national high school exams) amongst similarly sized models, while outperforming much larger models, including Gemma 3 27B, Llama 3.1 70B and Qwen 2.5 72B.
The results show that MamayLM models lead the evaluations in Ukrainian language and cultural understanding. While version v0.1 achieved an outstanding score that remains difficult to surpass, our new version delivers overall performance gains across modalities and now includes enhanced visual capabilities.
We also evaluated MamayLM v1.0 on visual benchmarks, where it demonstrates strong performance in both Ukrainian and English. The model's ability to understand and generate text based on visual inputs highlights its versatility and effectiveness across different modalities.
To assess English performance, we used the original MMMU benchmark.
To monitor Ukrainian visual performance, we used ZNO-Vision.
Beyond benchmark evaluations, we assessed the generative capabilities of MamayLM v1.0 on a set of 500 complex questions. The results demonstrate that our model consistently outperforms significantly larger models, excelling both in the linguistic quality of the generated Ukrainian text and the accuracy of its content. To ensure unbiased and high-quality evaluations, we relied on Gemini 2.0 Flash, which has strong proficiency in Ukrainian and a deep understanding of its cultural and linguistic nuances.
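As a rough sketch of the judging setup, the snippet below builds a pairwise comparison prompt; call_judge is a placeholder for a Gemini 2.0 Flash API call, and both the prompt and the parsing are illustrative:

JUDGE_PROMPT = (
    "You are an impartial judge. Compare the two Ukrainian answers to the question below, "
    "considering factual accuracy and the linguistic quality of the Ukrainian text. "
    "Reply with exactly one of: A, B, tie.\n\n"
    "Question: {q}\n\nAnswer A: {a}\n\nAnswer B: {b}"
)

def judge_pair(question, answer_a, answer_b, call_judge):
    # Returns "A", "B", or "tie" according to the judge model's verdict.
    verdict = call_judge(JUDGE_PROMPT.format(q=question, a=answer_a, b=answer_b))
    return verdict.strip()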
We evaluated model performance on factual Ukrainian QA data, where our model compares favourably against much larger models, as well as GPT-5-mini and Claude 3.7 Sonnet.
We also checked model performance on the Ukrainian subset of m-ArenaHard, designed to evaluate more domain-specific knowledge in math and coding, where our model again performs well against much larger models.
We assessed the capabilities of MamayLM v1.0 4B using the same benchmarks, which target text generation, comprehension, and domain-specific knowledge in both Ukrainian and English. The model performs strongly against similarly sized models, demonstrating its effectiveness across a range of tasks.
Furthermore, MamayLM v1.0 4B achieves 50% accuracy on the ZNO benchmark, showing promising performance on Ukrainian-focused tasks for a small model.
In the current technological landscape, the need for fast, adaptable, and locally optimized solutions has become critical. Available in 4B and 12B sizes, MamayLM is relatively compact and consistently outperforms models up to 5x larger in Ukrainian, while simultaneously maintaining competitive performance in English. Its ability to operate on a single GPU allows for faster adaptation, lower operational costs, and simpler deployment, making it particularly well-suited for environments with limited resources and evolving demands. Moreover, the new version now has visual and long-context capabilities, with increased performance in both languages.
This offers significant advantages for Ukrainian local businesses and government institutions, which can integrate advanced AI technologies without the prohibitive costs or complex technical requirements typically associated with larger systems. Having a smaller size option allows for more flexibility in deployment and scaling for smaller businesses that lack extensive infrastructure. Additionally, the model's bilingual capabilities support its application in sectors such as education and healthcare, where addressing language barriers can have a meaningful impact. In particular, it helps meet immediate needs in Ukraine by enhancing service delivery across critical areas.
We make full-precision and quantized versions of MamayLM available on HuggingFace, alongside a detailed description of how to use them for inference:
You can load the model locally with the transformers library using the following code:
import torch
from transformers import AutoModelForCausalLM

# Load MamayLM v1.0 12B in bfloat16 with FlashAttention 2, sharded across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    "INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
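A minimal, hypothetical usage example with the model loaded above (the prompt is illustrative; chat formatting is applied via the tokenizer's chat template):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("INSAIT-Institute/MamayLM-Gemma-3-12B-IT-v1.0")

messages = [{"role": "user", "content": "Розкажи коротко про Тараса Шевченка."}]  # "Tell me briefly about Taras Shevchenko."
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))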
The Ukrainian benchmarks are available in the corresponding GitHub repository.
Check out our previous version MamayLM v0.1 9B (Gemma 2) available at this link. If you use our models, please consider citing our work (citation below).
For any questions on MamayLM, please contact us at contact@insait.ai.
INSAIT is a world-class computer science and AI research institute, which is part of Sofia University, located in Sofia, Bulgaria. INSAIT was created in 2022, in partnership with Switzerland's ETH Zurich and EPFL. It is a strategic institution for Bulgaria, funded with an initial endowment of around 100M USD by the Bulgarian government, over a period of 10 years, and is generously supported with donations of roughly 15M USD from SiteGround, Google, AWS, VMware and other big-tech companies. INSAIT is the first center of its kind in Eastern Europe, structured according to top Western computer science and AI institutions – it provides world-class packages and conditions for outstanding tenure-track and tenured faculty, research scientists, post-docs, PhDs and many other positions. Currently, INSAIT hosts researchers from more than 23 nationalities and does research in areas spanning foundational models, safe and secure AI, robotics, computer vision, quantum computing, algorithms, information security, and other key areas.
For attribution in academic contexts, please cite this work as
"MamayLM v1.0: An efficient state-of-the-art multimodal Ukrainian LLM", 2025.
BibTeX citation
@misc{MamayLMv1,
  title={MamayLM v1.0: An efficient state-of-the-art multimodal Ukrainian LLM},
  author={Yukhymenko, Hanna and Alexandrov, Anton and Vechev, Martin},
  year={2025},
}
This blog was based on The Distill Template by Leandro von Werra.