
FINEST: Fine-Tuning for Specialized Translation
(2025 – 2028)
Research Study: “AI-based language processing environment”
The objective of the project is the scientific investigation and prototypical validation of methods for the domain-specific adaptation and augmentation of Large Language Models (LLMs), with the aim of systematically improving machine translation in the context of the German Bundeswehr. The central question is whether, and to what extent, LLM-based approaches can achieve qualitative and quantitative improvements over existing neural machine translation (NMT) systems. On this basis, the project is intended to provide a sound foundation for decisions on possible later use within the Bundeswehr's IT environment.
Tasks and objectives
Current LLMs show considerable potential for language processing tasks, but have been trained predominantly on general, civilian data. Military domains differ significantly from this: they employ distinct terminology, specific text types, and particular registers, and they are subject to special requirements for data protection, traceability, and IT security. These characteristics are insufficiently represented in existing models. Translation processes in a military context therefore place increased demands on both quality and security. While the use of LLMs may offer advantages, it requires systematic evaluation, targeted domain adaptation, and careful consideration of potential risks. The project therefore investigates under which conditions LLM-based methods offer measurable added value compared to conventional NMT and where their limitations lie.
Methodologically, the project follows a multi-stage, empirical approach. First, a secure study and evaluation environment is established, relevant specialist domains and language pairs are selected, and baseline experiments with pre-trained models are conducted. Building on this foundation, domain-specific adaptation and augmentation of the models are carried out. Various approaches are systematically compared, in particular prompt-based extensions incorporating domain-specific resources, Retrieval-Augmented Generation (RAG) in different configurations, and fine-tuning using domain-specific translation data.
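One of the approaches named above, prompt-based extension with domain-specific resources, can be sketched as follows: glossary entries matching a source sentence are retrieved and injected into the translation prompt, a minimal retrieval-augmented configuration. The glossary contents, prompt wording, and function names below are illustrative placeholders, not project data or the project's actual implementation.

```python
# Hypothetical German->English term base; real domain glossaries would be
# far larger and curated by the Bundessprachenamt.
GLOSSARY = {
    "Lagebild": "situational picture",
    "Fernmeldeverbindung": "signal communications link",
}

def retrieve_terms(source: str, glossary: dict) -> dict:
    """Return glossary entries whose source-language term occurs in the sentence."""
    return {de: en for de, en in glossary.items() if de in source}

def build_prompt(source: str, glossary: dict) -> str:
    """Compose a translation prompt with the retrieved terminology prepended."""
    terms = retrieve_terms(source, glossary)
    term_block = "\n".join(f"- {de} -> {en}" for de, en in terms.items())
    return (
        "Translate the following German sentence into English.\n"
        "Use this terminology where applicable:\n"
        f"{term_block}\n\n"
        f"Sentence: {source}"
    )

prompt = build_prompt("Das Lagebild wird stündlich aktualisiert.", GLOSSARY)
```

In an evaluation setting, the same source sentences would be translated once with and once without the injected terminology, so that the contribution of the retrieved resources can be measured against the baseline.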
In parallel, an evaluation environment is being developed that combines automated metrics and human assessments in order to measure improvements over the baselines in a transparent and reproducible manner. Additional aspects are also being investigated, including the impact of anonymization on translation quality, approaches to the cyclical generation of training data, comparisons of different model sizes and architectures, and the integration of automated post-editing procedures.
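The automated side of such an evaluation environment typically rests on surface-overlap metrics computed against reference translations. As a simplified illustration of the principle (not the project's metric suite, which would rely on established implementations), the following sketch computes a character n-gram F-score in the spirit of chrF:

```python
from collections import Counter

def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_like(hypothesis: str, reference: str, n: int = 3, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score (chrF-style, recall-weighted by beta)."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # n-grams shared between hypothesis and reference
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

Scores of this kind are cheap and reproducible, which makes them suitable for tracking improvements over baselines across many system configurations, while the human assessments mentioned above remain necessary to judge adequacy and terminological correctness.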
The results obtained in the project will ultimately be consolidated into coherent recommendations for the reliable and security-compliant use of AI-supported language processing within the Bundeswehr. The project will thus deliver evidence-based statements on the actual benefit of LLMs for military specialized translation, methodologically robust evaluation and benchmarking approaches, and scientific foundations for further research, implementation, and strategic decision-making in the field of AI-supported language processing.
The FINEST project is a joint initiative of the Bundessprachenamt, the Center for Digitalization of the Bundeswehr, and the University of the Bundeswehr Munich.