HAIF

Research groups

HAIF will establish a novel doctoral training scheme that connects the University of Turku’s Faculties of Technology, Social Sciences, Law, Humanities and Medicine. By bringing together 15 University of Turku professors and 11 research groups and topics, HAIF creates a scientific community with unprecedented interdisciplinary knowledge and global partner networks.

HAIF doctoral researchers will perform novel research under scientifically distinguished and experienced supervisors. HAIF doctoral researchers get access to world-class research infrastructure such as high-performance computing offered by the Finnish CSC – IT Center for Science Ltd.

Below you can find a short description of each participating research group.

In the HAIF second call, you can apply to the following research groups: Materials in Health Technology, Materials Informatics Laboratory, Turku Natural Language Processing Group (TurkuNLP) – Computer Science, Intelligent Health, and Health Technology Research Group.

Materials in Health Technology – APPLY NOW

Materials Informatics Laboratory – APPLY NOW

Turku Natural Language Processing Group (TurkuNLP) – Computer Science – APPLY NOW

Intelligent Health – APPLY NOW

Health Technology Research Group – APPLY NOW

Materials in Health Technology – APPLY NOW

Group leader (PI) Assoc. Prof. Emilia Peltola

Visit the website: Materials in Health Technology

Visit the LinkedIn: Materials Engineering at the University of Turku

In the Materials in Health Technology Group, we work at the intersection of artificial intelligence and biomedical materials. Our goal is to design next-generation sensors for healthcare by teaching AI to understand and predict how materials behave at the human–technology interface. We generate rich datasets through high-throughput in situ experiments and computational modelling, enabling machine learning to uncover the key surface properties that drive sensor performance, biocompatibility and sustainability. Students in our group can contribute on multiple fronts: developing algorithms and data-driven models, building computational pipelines, and, if they wish, conducting experimental work to produce the very data that powers AI discovery. This unique combination allows us to bridge the gap between computation and experiment while advancing human health and a sustainable future for medical technology.

Potential dissertation topics:

AI for Sustainable Material Discovery
Develop machine learning algorithms to screen and predict environmentally friendly materials as alternatives to traditional toxic reagents in nanoparticle synthesis. Students can focus on computational model development or complement it with experimental synthesis to validate AI predictions.

Predicting Synthesis Outcomes with AI
Build predictive models that forecast the morphological and chemical characteristics of nanoparticles synthesized with alternative, greener chemicals. This enables scientists to select safe and efficient synthesis routes early in the research process. Experimental-minded students can generate new synthesis datasets, while computationally oriented students can focus on modelling and validation.

AI-Driven Electrode Design for Biomedical Applications
Use artificial intelligence to design advanced electrode surfaces with optimized electrocatalysis and biocompatibility. Projects can involve algorithm and pipeline development, computational screening, and/or high-throughput in situ experiments to generate data that trains and tests the models.

Machine Learning for Cyclic Voltammetry Analysis
Apply regression models and advanced machine learning techniques to analyse cyclic voltammetry data. The goal is to identify and quantify signals related to neurotransmitters and hormones, paving the way for AI-enhanced biosensors. Students can work on developing novel ML approaches for time-series electrochemical data, or complement modelling with experiments to generate training datasets from cyclic voltammetry measurements.
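As a rough illustration of what regression on voltammetry data can look like, the sketch below fits a synthetic voltammogram as a weighted sum of known peak shapes plus a linear background, using ordinary least squares. Everything in it is an assumption made for illustration: the Gaussian peak shapes, the peak positions, the noise level and the names `gaussian_peak`, `templates` and `true_weights` are hypothetical, and real cyclic voltammetry signals are considerably more complex.

```python
import numpy as np

def gaussian_peak(potential, centre, width):
    """Idealised (hypothetical) oxidation peak shape along the potential axis."""
    return np.exp(-0.5 * ((potential - centre) / width) ** 2)

# Potential sweep in volts (assumed range, for illustration only).
potential = np.linspace(-0.4, 1.0, 281)

# Design matrix: one column per assumed analyte peak, plus a linear background.
templates = np.column_stack([
    gaussian_peak(potential, 0.20, 0.08),  # hypothetical analyte A
    gaussian_peak(potential, 0.45, 0.10),  # hypothetical analyte B
    np.ones_like(potential),               # constant background current
    potential,                             # sloping background current
])

# Synthetic measurement: known weights plus Gaussian noise (not real data).
true_weights = np.array([1.5, 0.8, 0.3, 0.5])
rng = np.random.default_rng(1)
current = templates @ true_weights + 0.02 * rng.normal(size=potential.size)

# Ordinary least squares recovers the contribution of each component.
weights, *_ = np.linalg.lstsq(templates, current, rcond=None)
print("estimated peak amplitudes:", weights[:2])
```

In this toy setting the fitted amplitudes play the role of analyte signals; a dissertation project would replace the fixed templates with learned models and measured data.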

Keywords: Human-centric AI, sensors, experimental–computational gap, biomedical applications, nanoparticles, sustainability

Materials Informatics Laboratory – APPLY NOW

Group leader (PI) Asst. Prof. Milica Todorović

Visit the website: Materials Informatics Laboratory Group

This computational group combines AI and data science with materials datasets to optimise the functional properties of materials and boost their performance in technological devices. We typically deploy Python-based supervised learning to map materials structure or processing conditions to functional properties, so that we can infer which materials design choices produce the optimal functionality. We also develop Bayesian optimisation active learning tools for materials, which allow us to build explainable surrogate models for materials properties. Recently, our focus has been shifting to multi-modal AI with multi-task and GPU-powered sparse Gaussian processes (GPs). We are keen to merge information from text (NLP), images and scientific expertise (human-in-the-loop) with numerical models for materials properties to accelerate the search for the best solutions.
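The Bayesian optimisation active learning loop mentioned above can be sketched in a few lines. This is a minimal, generic example and not the group's own tools: the objective `toy_property`, the fixed kernel length scale, the upper-confidence-bound acquisition rule and all numerical settings are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_property(x):
    """Hypothetical stand-in for an expensive experiment or simulation."""
    return (np.exp(-((x - 0.7) ** 2) / 0.02)
            + 0.3 * np.exp(-((x - 0.2) ** 2) / 0.01))

def bayesian_optimise(objective, n_iter=15, kappa=2.0):
    """Repeatedly sample where the upper confidence bound is largest."""
    grid = np.linspace(0.0, 1.0, 201)         # candidate design variable values
    x_train = np.array([0.05, 0.5, 0.95])     # small initial design
    y_train = objective(x_train)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_train, y_train, grid)
        x_next = grid[np.argmax(mean + kappa * std)]  # explore and exploit
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, objective(x_next))
    best = np.argmax(y_train)
    return x_train[best], y_train[best]

best_x, best_y = bayesian_optimise(toy_property)
print(f"best design variable: {best_x:.3f}, property value: {best_y:.3f}")
```

The surrogate's mean and uncertainty are what make the search inspectable: at every step one can see both what the model predicts and how confident it is, which is also the natural place to inject expert preference or text-derived knowledge.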

Potential dissertation topics:

PhD topic: Human-in-the-loop active learning for materials design
Expert opinion could be exploited to inform and guide the active learning search for materials with optimal properties, making materials informatics more human-centric. We recently demonstrated that Projective Preferential Bayesian Optimisation for integrating human preference into materials models helps to find optimal solutions faster, but many questions remain. What kind of human opinion is most informative for guiding materials design? What questions and interfaces are best suited to recording it? How do we best encode human knowledge and what are the best approaches for integrating it into data-driven models for materials properties? Can we accelerate the search without introducing harmful human bias?
Keywords: human-in-the-loop, multi-task Bayesian optimisation, active learning, technological materials

PhD topic: NLP-informed materials design
In tandem with the TurkuNLP group, we developed NLP tools for extracting information from materials research literature, providing us with a database of materials structures and properties. Such information could be used to guide materials design towards previously unknown compounds with superior functional properties, but how? How could we organise the text-extracted knowledge and encode it into a type of information that is complementary to numerical structure-property data? What are the best approaches for integrating this information into structure-property materials models? How do we account for positive bias and the lack of negative outcomes in published texts? Could we discover previously unknown compounds that are both stable in nature and synthesizable?
Keywords: NLP, multi-task active learning, transfer learning, technological materials

Turku Natural Language Processing Group (TurkuNLP) – Computer Science – APPLY NOW

Group leader (PI) Prof. Filip Ginter

Visit the website: TurkuNLP

The research group focuses on algorithmic and computational aspects of natural language processing (NLP), one of the cornerstones of modern AI. Our research is centered around machine learning for NLP, in combination with work on very large-scale corpora, regularly involving collections in the 100+ billion word range. Our most recent work includes training generative large language models in a high-performance computing environment, cross-lingual meaning models in historical language corpora and other noisy datasets, as well as the development of core language technology for Finnish and numerous other languages. By its nature, our research involves topics from deep neural network training to highly scalable algorithms for indexing and matching meaning across languages. We make extensive use of the national computing resources and have been selected for piloting the two most recent generations of GPU-accelerated supercomputers, including LUMI, the largest European supercomputer.

Potential dissertation topics:

LLM-driven information extraction from web-scale corpora:
Methods for efficient application of LLMs to information extraction and document categorization tasks at a very large scale. These may include research in efficient inference methodology in the technical sense, as well as applications of LLMs in specific tasks of interest.

Curriculum learning in LLM training:
Methods for data-driven selection of LLM training examples so as to optimize training time and resulting model quality, with a particular focus on highly multilingual settings. A spectrum of topics is possible, both in terms of methodology and dataset creation and filtering.

NLP on noisy data:
NLP methods to utilize and standardize corpora that are affected by noise, especially OCR artefacts in historical documents but also other types of noise and discontinuities emerging from web scraping. These may include both LLM-driven error correction and embedding and retrieval methods resilient to noise.

Context-driven OCR/HTR:
Methods for incorporating external knowledge and context into optical character recognition (OCR) and handwritten text recognition (HTR), especially in historical documents and cultural history artifacts, but not necessarily restricted to this domain. This external knowledge and context can, for instance, relate to layout information, writer information, or prior knowledge on constraints in tabular data.

User-guided document embeddings:
Methods for induction of flexible, user-guided document and text passage embeddings that take into account or disregard specific aspects of meaning in a controlled, promptable manner. Large-scale applications of such methods in document retrieval and retrieval-augmented generation.

Tracing the emergence and flow of ideas in multilingual corpora:
Methods for identifying and tracing passages of related meaning in very large multilingual text collections, tracing the development and mutual interaction of ideas, especially in historical texts but not limited to this domain.

Text-based explainability:
Methods for text-based explainable NLP/AI, moving beyond feature heatmap approaches.

Cross-Lingual Knowledge Compression for Modular Reasoning Agents:
Investigate how to compress and transfer knowledge from large multilingual reasoning models into smaller, modular agents that retain reasoning abilities while being efficient enough for deployment in resource-constrained environments.

Multilingual Multi-Agent Collaboration with Dynamic Task Delegation:
Design multilingual LLM-based agent systems that collaborate across languages to solve complex tasks, dynamically delegating work based on each agent’s linguistic capability, reasoning strengths, and computational efficiency.

Patient-Centric, Trust-Aware LLMs for Health Information Retrieval and Clinical Reasoning:
Research on LLM-based systems that retrieve, reason over, and explain health information in a way that is accurate, trustworthy, and adapted to patients’ literacy levels. The model will integrate evidence from clinical guidelines, scientific literature, and patient-reported data, producing explanations that are medically correct yet understandable for non-expert users.

Keywords: NLP, large language models, large corpora, deep learning, multilingual methods, reasoning agents, health applications of NLP

Intelligent Health – APPLY NOW

Group leaders (PIs) Prof. Anna Axelin and Docent Tella Lantta

Visit the website: Intelligent Health

The research group focuses on advancing the use of digital and social innovations to create value for health. We are dedicated to advancing mental health promotion and care through the development of highly personalized digital tools for early detection, prevention and care across a range of symptoms, problems and disorders. We have a strong stakeholder approach and conduct research in partnership with patients, families, professionals, healthcare organizations and companies. We are looking for a person with a background in nursing or health sciences.

Potential dissertation topic:

Health literacy in the use of AI
This PhD project will explore and define health literacy in communication with interactive AI language models. The concept will be co-created with young adults who have previously used such models to seek information related to their mental health concerns. In addition, the project will explore prompt optimization techniques to support positive mental health outcomes. As a result, this PhD project will develop best-practice guidelines for promoting individual mental health literacy to ensure safer interactions with AI models.

Keywords: Mental health, Health promotion, Personalized care, Large language models, Health literacy

Health Technology Research Group – APPLY NOW

Group leader (PI) Prof. Pasi Liljeberg

Visit the website: Digital Health Technology

The Health Technology research unit stands at the intersection of technology and medicine. We develop data-analytics-based solutions to support health and well-being that can serve both medical professionals and the public. Our research focuses on human sensing solutions, wearable technologies, and machine-learning-based data analytics methods to help collect and utilise data in a user-friendly, safe, and cost-effective manner. Like many other fields of science, the healthcare sector is moving towards data as the foundation of its decision making. The group leverages its research to develop AI-based applications that utilise health data for preventive and personalised healthcare.

Potential dissertation topics:

Trustworthy Non-contact Biosignal Monitoring for Improving Maternal Health
The health of pregnant women and newborn babies can be monitored via biosignals. These signals serve as indicators of maternal well-being, fetal health, and potential complications, such as stress, hypoxia, and cardiovascular issues, which can adversely affect pregnancy outcomes. The research will explore non-contact methods for measuring human physiological parameters, including heart rate (HR), heart rate variability (HRV), and blood oxygen saturation (SpO2). Remote photoplethysmography (rPPG) is a non-contact method that applies machine learning to video recordings of the subject’s face. By analysing the subtle colour changes in the skin caused by blood flow, rPPG enables the estimation of HR, HRV, and SpO2 without the need for physical sensors. Despite its potential, this method faces many challenges in practical use. The goal of this proposal is to build a trustworthy method for monitoring the health of pregnant women and newborn babies using rPPG.

Measuring Haemoglobin without Needles: Exploring Non-invasive Methods for Haemoglobin Level Estimation
The accurate measurement of haemoglobin levels is essential for diagnosing conditions such as anaemia, thalassemia, and polycythemia. Conventional methods, though reliable, require invasive blood sampling, which can be inconvenient, painful, and difficult to repeat frequently. This project aims to develop a non-invasive, point-of-care solution for haemoglobin estimation using photoplethysmography (PPG). PPG measures blood volume changes through light absorption and reflection, offering a painless approach to physiological monitoring. Building on this, we will further explore remote photoplethysmography (rPPG), a non-contact technique that extracts physiological signals from facial video recordings. By leveraging advances in signal processing and machine learning, the goal is to create a practical, non-invasive tool for haemoglobin monitoring that can improve accessibility and reduce dependence on invasive testing.
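To make the rPPG idea used in both topics more concrete, the sketch below estimates heart rate from a per-frame average of the green colour channel by locating the dominant frequency in a plausible pulse band. It is a simplified classical baseline under stated assumptions, not the group's method: the function name `estimate_heart_rate`, the 0.7 to 4.0 Hz band, the 30 fps frame rate and the synthetic trace (a 1.2 Hz pulse with slow drift and noise) are all chosen for illustration, and face detection, motion handling and machine learning are left out.

```python
import numpy as np

def estimate_heart_rate(green_trace, fps, band=(0.7, 4.0)):
    """Estimate heart rate in beats per minute from a mean green-channel trace."""
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()                         # remove the constant skin-tone level
    x = x * np.hanning(len(x))               # taper to limit spectral leakage
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])  # plausible pulse rates
    peak_freq = freqs[in_band][np.argmax(power[in_band])]
    return 60.0 * peak_freq

# Synthetic 20-second trace at 30 frames per second (not real video data).
fps = 30.0
t = np.arange(600) / fps
rng = np.random.default_rng(0)
trace = (120.0
         + 0.5 * np.sin(2 * np.pi * 1.2 * t)   # pulse component at 1.2 Hz
         + 2.0 * np.sin(2 * np.pi * 0.1 * t)   # slow illumination drift
         + 0.2 * rng.normal(size=t.size))      # sensor noise

bpm = estimate_heart_rate(trace, fps)
print(f"estimated heart rate: {bpm:.1f} bpm")
```

With a 20-second window the frequency resolution is 0.05 Hz, or 3 beats per minute, which hints at one of the practical challenges: short or motion-corrupted recordings limit how precisely such parameters can be estimated.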

Keywords: Health technology, biosignals, Remote Photoplethysmography, non-contact health monitoring, machine learning, maternal health, Haemoglobin