RITES – Resilient IT Infrastructures

Research area

Our society is becoming increasingly dependent on complex IT infrastructures and services. IT systems are already the most complex systems built by mankind, and their scale and complexity are increasing all the time. Complexity is escalating at all levels of the technology stack, from the application and service level down to the underlying implementation technology. The underlying technologies (i.e. processor architectures and silicon technologies) are currently undergoing big leaps: from comparatively straightforward single-core architectures with deterministic behaviour to complex heterogeneous many-core architectures with non-deterministic behaviour, caused among other things by changes in silicon yield.

TUCS is in a unique position to address these challenges in a unified way, since its competence is cross-cutting and spans from the highly abstract service modelling level down to the hardware implementation platform. The partners of the research programme have a strong track record in the corresponding fields.

Research goal

The research programme aims to address the challenges of complex IT infrastructures by targeting solutions along the following important themes, for which the participating research groups have a long track record of research achievements.

Adaptability

The challenge in future ICT systems is that their application loads will vary widely while the available computing capabilities also change over time. To achieve efficiency in different areas (energy, performance, cost, etc.), the application must be able to adapt itself to the environment and to the capabilities available. The main challenges in this area lie in designing adaptable system architectures and in devising good adaptation strategies.

Future multi-core systems will exploit heterogeneous networked architectures and run dynamic loads created by future application scenarios. Run-time management of these systems requires a high degree of scalability and adaptivity from both the underlying hardware platform and the operating system or middleware of the platform. Distributed (decentralized) solutions need to be found for implementing such management functions. We focus on self-aware computing platforms and embedded storage systems that are energy proportional, i.e., their energy efficiency remains high regardless of the offered load. More specifically, we develop adaptive control and management approaches that enable efficient self-adaptation of the platform to the dynamically changing computational load.

Efficiency

Efficiency is a key constraint in the construction of future ICT systems. The construction itself must be cost-efficient (lean): one should both build the right system and build the system in the right way, and avoid all unnecessary costs to achieve a good time to market. This requires lean processes, but also cost-effective approaches to verify and validate the proposed solutions. Lean processes, model-driven design, and formal methods are all part of the palette of approaches that have been successfully applied to system construction by the partners.

Efficiency is also important in the deployment of the system. This is especially challenging in the case of adaptable systems that need to provision resources dynamically based on current needs. The system must be energy-efficient, and it must neither underprovision nor overprovision resources: underprovisioning leads to low performance, and overprovisioning leads to high operating costs.
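As an illustration of the provisioning trade-off, the following minimal Python sketch picks the number of service replicas for a measured load. It is not taken from the programme's own work: the function name `replicas_needed`, its parameters, and the target-utilization policy are assumptions chosen only to make the idea concrete.

```python
import math


def replicas_needed(load, capacity_per_replica, target_utilization=0.75,
                    min_replicas=1, max_replicas=100):
    """Return a replica count that keeps utilization near the target.

    Hypothetical helper: all names and the policy are illustrative.
    `load` and `capacity_per_replica` use the same unit (e.g. requests/s).
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Rounding up avoids underprovisioning (low performance).
    needed = math.ceil(load / (capacity_per_replica * target_utilization))
    # Clamping to the upper bound limits overprovisioning (high cost);
    # the lower bound keeps the service available when idle.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    for load in (0, 300, 301, 10**6):
        print(load, replicas_needed(load, capacity_per_replica=100))
```

A real controller would also smooth the load signal and account for energy use; this sketch only shows how a single policy bounds both failure modes.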

Foundations of Software Engineering

Theoretically well-founded techniques for software construction, both in the large and in the small, are a necessity for the production of highly reliable and functionally correct software systems. Our research in this area concerns program correctness, semantics, and formal methods. We focus on two main techniques: invariant-based programming, a correct-by-construction imperative programming methodology, and stepwise feature introduction, a rigorous extension mechanism for layered software architectures. These techniques are based on lattice theory, and provide a sound theoretical foundation on which applied software engineering methods can be built.
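The flavour of invariant-based programming can be hinted at with a small Python sketch in which the loop invariant is written down first and the code is then made to maintain it. This is only an approximation under stated assumptions: the function `array_sum` is a hypothetical example, it is meant for lists of integers, and Python `assert` statements check the invariant at run time, whereas the methodology itself calls for proving it.

```python
def array_sum(xs):
    """Sum a list of integers, with the invariant stated explicitly.

    Illustrative only: the invariant and variant are checked with
    run-time assertions instead of being formally proved.
    """
    n = len(xs)
    i, total = 0, 0
    # Invariant: 0 <= i <= n and total == sum(xs[:i])
    assert 0 <= i <= n and total == sum(xs[:i])  # holds on entry
    while i < n:
        variant = n - i  # termination measure, must decrease
        total += xs[i]
        i += 1
        assert 0 <= i <= n and total == sum(xs[:i])  # invariant preserved
        assert 0 <= n - i < variant                  # variant decreased
    # Invariant together with i == n gives the postcondition.
    assert total == sum(xs)
    return total


if __name__ == "__main__":
    print(array_sum([3, 1, 4]))
```

The assertions recompute the prefix sum on every iteration, so the sketch trades speed for making each proof obligation visible.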

Teaching Programming

New correctness-oriented programming paradigms (such as invariant-based programming), coupled with powerful automatic reasoning tools, promise to increase the quality of software, but at the same time they demand stronger mathematical and logical reasoning skills from the practitioner than traditional approaches do. Our focus in this area is on increasing the role of formal specification and verification techniques in the skillset of the next generation of software engineers. We strive towards this goal by teaching hands-on methods for correct-by-construction program development as early as possible, and by coupling them with strong theorem prover support to automate the verification process as far as possible.

Intentionality

To achieve efficiency in design and implementation, the description languages and abstractions used for describing systems need to be more problem-oriented. The description language is the user interface to the problem domain, and it must be able to capture all the knowledge about the system available to the designer. More importantly, it must be possible to describe the designer's intent. Therefore, new paradigms for programming (e.g. dataflow languages, PRAM models) and new paradigms for system specification (e.g. Event-B, metamodelling-based DSLs) have been introduced. Such paradigms are being actively developed within the programme, and they form one of the basic building blocks for the efficient development of scalable, resilient and adaptable IT systems.

Scalability

Scalability is the ability of a system to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth. The scalability requirement is challenging because, for a system to be scalable on both of these axes, the system platform must provide efficient ways of adapting to the current workload, which requires run-time monitoring and learning from past behaviour, while at the same time being implemented so that the system automatically takes new capacity (hardware) into account when it is added. Solutions already exist for some of these issues in the domain of web services, while in other areas, e.g. radio algorithms, the problem is still unsolved. For radio algorithms in particular, improvements in architecture and hardware capacity usually result in a redesign of the whole system. Indeed, one of the big challenges in this area is performance portability: the ability to provide solutions that retain their performance over several hardware generations.

Resilience

Resilience is a central issue for dependable systems. It is on the one hand a design problem, where one needs to handle system complexity to secure safety-critical and fault-tolerant functioning, and on the other hand a platform problem, where the system should provide a number of implementation primitives to handle faults.

The work in this theme will focus on modelling safety-critical and fault-tolerant systems from various domains, from traditional control systems to self-adapting multi-agent applications. We work on interfacing formal models with safety analysis techniques, on creating patterns and process guidelines for modelling various aspects of dependability, and on proof-based verification and model checking of essential dependability properties. We are also actively involved in extending refinement-based development methods with stochastic reasoning and integrating them with probabilistic model checking. Furthermore, the work will involve research on specific implementation techniques that the platform can provide to guarantee certain resilience properties. Such techniques include forms of task migration and run-time updating, virtualization, and hardware fault detection.

Programme leaders

Juha Plosila and Ivan Porres

Participating TUCS Research Units

Steering group

Ralph-Johan Back, Ville Leppänen, Johan Lilius, Luigia Petre, Juha Plosila, Ivan Porres, Elena Troubitsyna, Marina Waldén, Jan Westerholm

Activities
