In this section, we will discuss how we can compose integrated testing strategies, that is, how we can combine the different components of our test strategy in an intelligent way. There are two main principles I would like to demonstrate here. One is based on pathophysiology, on the key events which happen in order to come to a certain adverse outcome, the mechanisms of toxicity. Here the buzzword of adverse outcome pathways, or AOPs, will be one of the concepts we have to introduce. And I will give you an example from one of the projects I have been involved with, which is called ReProTect, a project which unfortunately did not really develop an integrated testing strategy, but rather a battery approach. But I still think it is a very nice example of how adverse outcomes, our understanding of physiology, can help us to develop a battery of tests. The second concept is different. It is not based on our understanding of how the hazard manifests; it is not based on physiology. It is based on which findings have in the past led to a classification. It is prevalence-driven: what are the things we saw in our rats and our rabbits which led to classifications? And it is an alert-driven type of system: which alerts were indicative of some type of problem? Because we often measure 40, 60, 80 different things in an animal treated with chemicals. About 40 endpoints in a repeated-dose test like a 28-day study, about 60 different endpoints in a cancer bioassay, and about 80 endpoints in a two-generation reproductive toxicity test. This idea has been laid out, for those interested in more details, in the paper on the development of new concepts for assessing reproductive toxicity applicable to large scale toxicological programs, which we published in 2007. And this is one of the key figures from this paper. It showed our analysis of reproductive toxicity tests, in which 41 chemicals had been tested in two-generation studies.
And these two-generation studies are very extensive assays. They cost almost half a million per study, and a lot of things are measured. But we asked very simply: what is the reason for coming to the conclusion that a substance is a reproductive toxicant? And you can see that two things were most important, and these are testis weight and spermatogenesis. They outperformed all of the other endpoints. So obviously, testis toxicity is something which very often leads to the classification of a substance as a reproductive toxicant. Any test strategy would therefore be very well advised to include a strong component on testis toxicity, because it would already give us, for one of the most important and most frequent health effects within reproductive toxicity, an idea of what to use. And this can guide us purely based on experience, without any understanding of why the chemical is hitting this component. The ReProTect project was attempting a bit of a mission impossible. It asked how we can, with a battery of in vitro tests, predict whether a substance is a reproductive toxicant. When we were designing this project in late 2002 and early 2003, nobody had proposed a test not relying on animals which could call a substance a reproductive toxicant or not. So we asked: what is the physiology of reproduction? It is the reproductive cycle. A cycle which starts with the production of gametes and their release, the fertilization of the egg, the implantation of the egg in the uterus, the early prenatal development, the late prenatal development, and then all of the postnatal development, and in the end, the offspring which can enter the next reproductive cycle. While no single test was proposed to cover all of this, we found a lot of tests which were relevant for these elements. We found these in areas which were not necessarily in toxicology and safety assessment.
We found things which were used for the breeding of animals, where the quality of sperm was being analyzed. We found elements in the pharmaceutical search for contraceptives. We found elements in basic research which studied the implantation process of an embryo into the uterus. And we found assays which are actually stem cell based, which reflect the early stages of embryonic development. There are also assays which more generally address the reproductive cycle, because they are relevant for the endocrine system, which delicately regulates the entire reproductive process and has a strong impact on fertility and on the probability of the correct development of an embryo. This project ran from 2004 to 2009 and received about $12 million in funding from the European Commission. At the very end of this project, we had a group of experts test ten substances blindly, not knowing the identity of these substances, with a battery which was the result of this research program. These assays were grouped. Some of them addressed female fertility, some addressed male fertility, and some addressed developmental toxicity, so you can say teratogenesis, the malformation of embryos. And these 10 substances, as you can see, were quite well predicted by these different assay systems. Substance 10 gave some unclear results; it was partially predicted. But altogether, we were pretty happy with these results, where positive and negative identifications took place. It was a very, very encouraging result that with a small set of cellular assays, it was possible to identify quite a few of these substances, which work on very different principles of perturbation of the reproductive cycle. And this has been, in the meantime, the basis for several follow-up projects of similar size, which are trying to optimize this battery further and develop it into integrated testing strategies.
And this should show you that it is possible to actually combine a number of, in this case, in vitro tests, but also non-test methods like computer programs, in order to cover physiological processes. In the years since 2009, when I moved to Johns Hopkins, we have continued to work on integrated testing strategies. And again, I would like to invite you to look into this article. We have been working ourselves on integrated testing strategies for sensitization of the skin, that is, contact dermatitis, allergic skin reactions, which are very important health effects of chemicals, and I will come to these in a second. We have also been working on eye irritation, which I will not cover today. We commissioned a very important white paper on integrated testing strategies, which I will show you on the next slide. And we held the workshop of 2013, which I already showed earlier, in order to create consensus between experts about the role of integrated testing strategies, their composition, and their validation. The white paper, which we commissioned from Joanna Jaworska from Procter & Gamble and Sebastian Hoffmann from our team, on integrated testing strategies as opportunities to better use existing data and guide future testing in toxicology, was published in 2010. This paper was, from my point of view, a tipping point in the development of integrated testing strategies. It resulted in recommendations on how to use interim decision points, introduced ideas of probabilistic methods, Bayesian types of mathematical approaches, and also stressed how computer modeling can optimize testing strategies and how machine learning approaches help us come to the overall result of these types of strategies. I will come back to this in a second. A very important element of this is that toxicology as a whole is at the moment moving to a different level of resolution.
And this is shown in this iconographic picture, where the entire organism dissolves into the network of molecular interactions on which the body is based. So from the phenomenological observation of what we call apical endpoints, what is happening in animals, death, changes in size and quality of organs, we are moving to modes of action, to toxicity pathways, the adverse outcome pathways. And we are becoming more and more molecular, and we understand more and more of these networks. But to be clear, the current state of development is that we are starting to understand pathways, starting to embrace them. The adverse outcome pathways are where we are. It is still a highly academic research area to go the next few steps toward molecular pathways and networks. And we will discuss some of these approaches in this lecture series. A very important role is played by the OECD, the Organisation for Economic Co-operation and Development, which is the international platform for the harmonization of chemical safety testing. They came up in 2012 with a proposal of adverse outcome pathways, and this proposal is shown here. It is the idea that we can break down, in many instances, how a toxicant leads to its bad, hazardous effects. A toxicant is first of all characterized by its chemical properties. And these chemical properties allow for an interaction with some macromolecular structures, the targets. These can be receptors with their ligands, DNA binding, or protein oxidation; it is the interaction of the chemical with the biology. This physical-chemical interaction leads to cellular responses. The cell reacts to the insult. It reacts to the injury by gene activation, changing the pattern of proteins produced, alterations in cell signalling. So if the cell is not dying instantly, it shows some type of response in order to adapt, to defend itself, to react to what this chemical interaction is doing to it.
And as the cells are part of the organ, of the tissue, this impacts organ physiology. It can disrupt the homeostasis of the tissue, and the function or the development of the tissue can be affected. These are the organ manifestations of the stress, of the toxicant, of the hazard which is manifesting. And if the organs are changing functionality, there are organism responses. If overwhelmed, this will be lethality. It could be impaired development, in the case of reproductive toxicants for example, or it could be impaired reproduction, to give an example of even very complex and social activities which can be impacted by a substance. And then there is not just the organism. Especially in ecotoxicology, we are concerned also about the population. So the structure of an ecosystem, the structure of the population, how many adults compared to juveniles, the extinction of species from an ecosystem: these are all effects which we can measure on the population level. And the idea of the adverse outcome pathway is that this is an interconnected flow of events, which is more or less escalating. It starts somewhere with the initiating event, the molecular initiating event, and is then driven by a series of key events which, if not stopped, lead ultimately to these apical endpoints. The adverse outcome pathway concept has actually been combined with the idea of integrated testing strategies. However, the OECD did introduce a new name for integrated testing strategies, which we have to introduce here. They speak about IATAs, the integrated approaches to testing and assessment. An IATA is a little bit more than an integrated testing strategy, but its core part is the ITS, the integrated testing strategy. An IATA, the box to the right, comprises integrated testing strategies, but it also has elements of the kinetics of a substance. So, is this substance actually reaching its targets? It has elements of risk assessment; that is the A in IATA, the assessment.
And it has some elements of exposure considerations. What do we know about the substance's actual interaction with organisms? How much of it do we have in air, in food, and in water; what are we exposed to? So the IATA answers an information need. We want to know whether a certain substance poses a risk for a health endpoint, such as, let's say, carcinogenicity. The IATA would then describe what type of information we need to satisfy the information need. What are the tests, and how do we combine them to identify hazard? What are the measurements to make on exposure, or what is the knowledge on exposure we need, to see whether this is relevant? And how are these two combined with kinetics? The basic idea of combining this concept of IATA with the adverse outcome pathway concept is that our understanding of toxicology, of both the substances and their biology, of the way the hazard manifests, can actually inform how to build integrated testing strategies and IATAs. A few thoughts, a few considerations from the paper I just showed, about marrying the adverse outcome pathways and the IATA concept. First of all, we should be clear: IATAs have been and can be designed without any knowledge about the adverse outcome pathway. We do not need mechanistic knowledge, but the belief is that adverse outcome pathways improve IATAs, because they are then based on mechanism, which makes them scientifically sound and explains why the different components work so well together. So AOPs can guide the design of an IATA, and the improvement of AOPs can guide its redesign. If we learn more about how a health effect, let's say skin sensitization, comes about, we can improve how we test for it. As shown in the last picture, ideally, an IATA converts the knowledge organized by the adverse outcome pathway into answers to regulatory information needs. So we want to know: is this substance a skin sensitizer?
We have an understanding of how skin sensitization comes about, and the IATA tells us how to use this knowledge in order to come to decisions. If you want to make this connection, it is important to say what the different building blocks are and how they correspond to the elements of the AOP, namely the molecular initiating event and the key events. So to have these two systems talk to each other, and depending on our understanding, we need to classify them, for a given hazard, as necessary, as sufficient, or as alternative key events, for example. Or as something modulating, which positively or negatively impacts the overall adverse outcome. And the IATA outcome also informs us retrospectively about the quality of our AOP. If even an IATA which reflects an adverse outcome pathway does not lead to good results, we might need to revise our AOP; we might not actually be covering the biology properly. For this reason, the coverage of the adverse outcome pathway, of the elements which are critical for the manifestation of the toxic property, represents a quality criterion, a validity criterion, for an IATA. An IATA, an Integrated Approach to Testing and Assessment, is made from building blocks, like a testing strategy, and these building blocks include the tests: the in chemico, in vitro, and high-throughput screening types of approaches, but also possibly animal tests, and the non-test information of computer modeling, existing information, all of this. Like an integrated testing strategy, it uses algorithms, which can combine the different elements and test results in probabilistic, Bayesian ways, or in a Boolean way of ANDs and ORs, to convert all of the pieces of test evidence into an answer to the information need. There is also the use of computational tools such as modeling, machine learning, and sensitivity analysis, which can help us to optimize an ITS or IATA.
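The probabilistic, Bayesian way of combining test evidence can be sketched in a few lines. This is a minimal illustration, not any actual IATA software: it assumes we know each assay's sensitivity and specificity, that the assays are independent given the true hazard, and all numbers are invented for the example.

```python
# Naive Bayes combination of several test results into a hazard probability.
# Sensitivities, specificities, and the prior are illustrative assumptions.

def combine_evidence(prior, results):
    """Update the prior probability of hazard with a list of
    (sensitivity, specificity, positive_result) tuples."""
    odds = prior / (1.0 - prior)
    for sensitivity, specificity, positive in results:
        if positive:
            lr = sensitivity / (1.0 - specificity)  # likelihood ratio of a positive call
        else:
            lr = (1.0 - sensitivity) / specificity  # likelihood ratio of a negative call
        odds *= lr                                  # each test shifts the odds
    return odds / (1.0 + odds)                      # convert odds back to probability

# Hypothetical battery: three assays (80% sensitive, 90% specific),
# two positive and one negative result, 30% prevalence as the prior.
probability = combine_evidence(0.30, [(0.8, 0.9, True),
                                      (0.8, 0.9, True),
                                      (0.8, 0.9, False)])
```

Note how no single test decides the call: each result merely moves the probability up or down, which is exactly the contrast to a Boolean AND/OR combination.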
And this is only in its infancy, but we are very much convinced that it is possible to use this to come faster to better test strategies. Ideally, IATAs are flexible enough to exchange building blocks, because a test might not be applicable to a given substance, or might not be available in a laboratory which wants to run an IATA, for many kinds of reasons. So flexibility and exchange of building blocks is something we should care for, and we recommend a result-guided sequence, where some building blocks are omitted if not necessary for coming to an overall result, in order to achieve maximum efficiency. IATAs offer the opportunity to move to a probabilistic type of risk assessment, by combining these elements, for example, in probabilistic ways, in which every element does not change the call in a black-and-white way but changes the probability of the hazard manifesting. We come to a view of things which is not black and white, not "this is a carcinogen, this is a non-carcinogen", but something which says: in this scenario, the probability of the substance being a carcinogen is 80%. And we might decide that this is too much for putting it into baby food, but we might find it acceptable to put it into a household cleaner. And this would open up combining it with other areas, like exposure, kinetic modeling, and hazard assessment, which is often already done in probabilistic ways, where we have distributions of the probability of exposure, or of certain concentrations being reached. And these different types of probabilistic information could quite easily be combined. The confidence in an adverse outcome pathway, so the quality of our understanding of the biology, the pathophysiology, and its coverage in the current testing strategy, very much determines the extent to which we need to test. Because if we are very confident that we have the key elements, we do not need a lot of additional testing to confirm and satisfy our information needs.
If our understanding is poor, and the coverage of what we know in the testing strategy is poor, we will typically need a lot of different information sources to satisfy an information need. And last, there is also the question of what makes a good IATA. Here, we can optimize for different elements. We can ask how much it reduces animal use. We can also ask how cost-efficient it is, or how time-efficient: how fast do I get the results in order to come to my conclusion? Is it something which is suitable for immediate responses? But we can also ask what the predictivity of the approach is, which is ultimately most important: is it predicting the reactions of animals or humans? Is it predicting the right substances as hazardous, those of which we know that they have such properties? And also the coverage of the biology, the coverage of the adverse outcome, is a value which we should bring into the assessment of the quality of an IATA. This next slide shows you some concepts which illustrate that for a given substance, we can have very different situations regarding the precision of hazard and exposure estimates. If the exposure, shown in green, overlaps the hazard manifestation covered in the adverse outcome, we obviously have a high priority for doing assessments and testing on that substance. If they are distinct, but both of these estimates are very uncertain, with very broad distributions spanning low and high exposures, we do not really know how much they can possibly overlap. And the biology is also not very clear; we do not know at which concentration some effects have to be expected. This gives us an intermediate prioritization for testing. But if we have situations, and this actually can be the case, where there is a very strong separation between the potential exposure and the potential hazard concentrations, then we have a low prioritization for any type of confirmation and further testing of biological tolerance.
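The prioritization logic just described can be made concrete with a small Monte Carlo sketch: sample from an exposure distribution and a hazard-concentration distribution, and estimate how often exposure reaches the hazardous range. The lognormal shapes and all parameter values below are assumptions for illustration, not values from the lecture.

```python
# Monte Carlo estimate of the overlap between an exposure distribution and
# a hazard-concentration distribution; distributions and parameters invented.

import random

def exceedance_probability(exp_mu, exp_sigma, haz_mu, haz_sigma,
                           n=50_000, seed=42):
    """Fraction of draws in which sampled exposure reaches the sampled
    hazard concentration; parameters are log-scale mean/sd of lognormals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        exposure = rng.lognormvariate(exp_mu, exp_sigma)
        hazard = rng.lognormvariate(haz_mu, haz_sigma)
        if exposure >= hazard:
            hits += 1
    return hits / n

# Widely separated, precise estimates: low priority for further testing.
low_priority = exceedance_probability(-3.0, 0.5, 2.0, 0.5)
# Broad, overlapping estimates: higher priority for further testing.
high_priority = exceedance_probability(0.0, 1.0, 0.5, 1.0)
```

The same number that drives prioritization here is also the kind of quantity that a probabilistic risk assessment would carry forward instead of a yes/no hazard call.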
And this element shows that exposure and hazard are two concepts which talk to each other. The better and more precise either of these determinations is, the more likely we have a situation where we need little additional information, as shown in the AOP 3 case, where we can say on safe grounds that we do not need to worry about exposures possibly reaching the concentrations which are relevant for developing a health effect. The paper we were discussing from this workshop considered adverse outcome pathways for the endocrine system. Endocrine disruption, as you well know, is a big concern: chemicals could interact with the estrogen system, the androgen system, or the thyroid system, as the most prominent endocrine health effects. What we noted, applying some of these criteria to the endocrine disruptor concept, is that for the estrogens we have a very good understanding of the adverse outcome pathways, with even quantitative descriptions; we can model them quantitatively, and we also have in vitro, high-throughput screening test systems which are in use. And actually, as you have heard from David Dix in one of the previous lectures, the endocrine disruptor screening program in the US has most recently accepted in vitro and high-throughput screening assays to replace animal testing, because we have such a good understanding and predictive models for the estrogenic branch. And we are very optimistic that in the near future, the androgenic disturbance from chemicals can also be handled by in vitro, high-throughput screening assays and their respective models. This is quite different for the thyroid system. The thyroid pathway is quantitatively described, but it lacks modeling opportunities. And we do not have the full coverage in in vitro, high-throughput screening assays. We only have it for a few of the molecular initiating events, but we do not know how to cover many others.
And there are strong efforts to complement the testing battery of Tox21 in order to allow, in the not too distant future, the replacement of animal testing for thyroid effects by non-animal methods. The important point is that for estrogenic endocrine disruption, already now, the completeness of the adverse outcome pathway, of our test battery, of our IATA, allows decisions to be taken based on molecular initiating events and key events alone, for example by proper QSARs. But different degrees of completeness of, and confidence in, adverse outcome pathways impact, as shown in this example, the IATA and its usefulness. This is taken from a review paper by Kalberg et al.; you will find something similar in most textbooks. This is our current understanding of how contact dermatitis comes about. The substance typically enters via the skin; this could be a cosmetic product. It is a substance which has allergenic properties. The first thing we would need to observe is that this substance can actually penetrate the skin. So this is the first step. And then, the molecular initiating event is haptenation. What is haptenation? Haptenation refers to the small molecule, the chemical, binding to proteins. Most chemicals are too small for our immune system to detect them. The immune system only detects them after they have bound to large molecules such as proteins, and this process is called haptenation. So only chemicals which have sufficient chemical reactivity to bind to proteins, or can be activated by metabolism to be reactive, can actually initiate contact dermatitis and an allergic skin reaction. The recognition by the immune cells then leads to epidermal inflammation and the activation of Langerhans cells, the dendritic cells of the skin. These migrate into the lymph node, where these dendritic cells interact with T-cells and stimulate the proliferation of T-cells.
By doing so, they increase the number of T-cells which are specific for this antigen, for this allergenic compound. And re-exposure to the same chemical will lead to a much stronger inflammatory signal and to T-cell mediated inflammation. So this type of sentinel cell, the Langerhans cell, the dendritic cell, is first exposed, moves into the lymph node, and leads to an expansion of the specific T-cells which are responsive to this antigen. These T-cells move into the skin and are prepared for a strong inflammatory reaction if the substance comes again and again, increasing the risk. This slide shows the adverse outcome pathway for skin sensitization, which was the poster child for adverse outcome pathway development. What you can see is the sequence from chemical structure and properties to the molecular initiating event, to the key events of cellular responses, organ responses, and finally organism responses. You see just the same elements we discussed on the textbook slide on contact dermatitis, now grouped as elements of this pathway. You can also see that the terminology of toxicity pathway, mode of action, and adverse outcome pathway is attributed to it: the toxicity pathway referring to what happens from the chemical to the cell, the mode of action describing how an organ is impacted by the chemical, and the adverse outcome pathway going further to the whole organism responses, the apical endpoints, including the population responses not shown here. So you see here the different elements of the adverse outcome pathway applied to the pathophysiology of contact dermatitis. The development of such an integrated test strategy framework for guiding our testing historically took place in several steps: first by combining things in a type of test battery, and then by starting to put them into a testing strategy for skin sensitization, which was actually driven by the adverse outcome pathway.
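The sequence of events just walked through can be written down as a simple ordered chain, which is essentially all an AOP is as a data structure. This is a toy representation for illustration, with descriptions paraphrased from the lecture, not an official AOP encoding.

```python
# The skin sensitization AOP as an ordered chain of events, from molecular
# initiating event to organism-level outcome. Purely illustrative.

from dataclasses import dataclass

@dataclass
class KeyEvent:
    level: str          # molecular / cellular / organ / organism
    description: str

skin_sensitization_aop = [
    KeyEvent("molecular", "haptenation: covalent binding to skin proteins"),
    KeyEvent("cellular", "epidermal inflammation, dendritic cell activation"),
    KeyEvent("organ", "T-cell proliferation in the lymph node"),
    KeyEvent("organism", "allergic contact dermatitis on re-exposure"),
]

# The ordering is what lets a testing strategy map each assay to an event.
levels = [event.level for event in skin_sensitization_aop]
```

Mapping each building block of a testing strategy onto one of these events is exactly the AOP-to-IATA correspondence discussed earlier.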
The first conceptual approach, published by Jowsey in 2006, simply said: let us put together a couple of elements which are relevant for skin sensitization, measure all of them, and look where a chemical causes disturbances. The second generation then went a step further, toward the end of the last decade. A couple of assays were combined in a type of algorithm where a majority vote was taken, or, for example, a statistical regression-based analysis led to making a call on the basis of different types of tests. But the full beauty of it all came when we moved to the adverse outcome pathways, and used these different elements in a Bayesian network, where the quantitative assessments of the various components and their contributions to informing us were actually combined. And here, the hallmark paper of Joanna Jaworska and colleagues has to be mentioned. This paper, which describes the integration of non-animal test information into an adaptive testing strategy using skin sensitization as proof of concept, as shown here, was published in 2010 in ALTEX, the journal for alternatives to animal experimentation. It defined an AOP. It started off with about 20 different cellular tests and non-test information, that is, computer programs. It used as points of comparison both human data and local lymph node assay data, so animal test data for skin sensitization. And it was the first example of a Bayesian network informing and taking decisions, a hallmark on the way. This work then led, in 2013, to the next step. While the first model only said whether a substance is a sensitizer, whether it could produce allergic reactions in humans, the next step was potency testing. So again, Joanna Jaworska and colleagues developed this approach further, and showed for the first time that it is possible, from a combination of similar assays, to actually predict how strong a sensitizer is.
Whether it is a weak substance or a strong substance, whether we will need to be very restrictive or less restrictive in the use of these substances. My own group has built on this, and here is a publication from 2015 where we optimized this approach further, using probabilistic hazard assessment for skin sensitization based on the models developed by Joanna Jaworska, and moving them to the next level of quantitative structure-activity relationships. These are some of the key results of this work, which are meant to illustrate how these methodologies are progressing further; how at this very moment, it is not only the building blocks, the in vitro cell systems, which are getting better and better, but also our way of combining them. What you can see on the left side is a chemical similarity map. For the expert, this is based on a Tanimoto index, which describes how many functional groups two molecules share. The closer these molecules are on the map, the more similar they are. And we can see that in our substance set, there are quite a few which are chemically similar. But there are also, at the lower end of this panel, a lot of substances where there is hardly any similar chemistry involved. You can also see very nicely that structure says very little about whether a substance is a skin sensitizer or not: green being a non-sensitizer, red being a strong sensitizer, orange a mild sensitizer. So structural information alone does not help us. We get some ideas, there are some clusters of green, there are some clusters of red, but it alone does not let us make a call on whether a substance is a skin sensitizer. We demonstrated in our paper that feature elimination, a statistical approach to focus on the most important properties, was as good as or better than former QSAR approaches. We optimized our approach to make the decision on whether a chemical is a sensitizer or not by using supervised machine learning approaches, very sophisticated computer programs.
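The Tanimoto index behind such a similarity map is simple to state: the number of shared features divided by the number of features in either molecule. Here is a minimal sketch using plain sets of hypothetical functional-group keys rather than real cheminformatics fingerprints.

```python
# Tanimoto (Jaccard) similarity on structural fingerprints, represented here
# as sets of invented functional-group labels for illustration.

def tanimoto(fp_a, fp_b):
    """Shared features divided by the union of features."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

mol_a = {"aromatic_ring", "aldehyde", "chloro"}
mol_b = {"aromatic_ring", "aldehyde", "hydroxyl"}
similarity = tanimoto(mol_a, mol_b)   # 2 shared features out of 4 total = 0.5
```

In real cheminformatics toolkits, the same formula is applied to fingerprint bit vectors with thousands of positions, but the index itself is no more than this ratio.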
So that in the end, we were able to predict skin sensitization potencies similarly to what had been done before. But we optimized this by using tools like hidden Markov chains; I will not go into any detail here, but they improved the model by reducing extreme misclassifications. In the end, we were about 60 to 70% accurate, and for more than 90% of the substances, we were only one class away. This means that a severe skin sensitizer would at most be classified as a strong sensitizer, and a non-sensitizer would at most be classified as a mild sensitizer. And most importantly, we carried out for the very first time a cross-validation. This cross-validation showed that the approach is actually robust, that our way of combining different assays gives us this quality of prediction, and not just by chance for one given dataset. The work is continuing, and I would like to introduce a most recent publication originating from the US Interagency Coordinating Committee on the Validation of Alternative Methods and the National Toxicology Program which serves it. They have summarized here very nicely their work on integrated decision strategies for skin sensitization. I will not go into the details of their abstract and their results here; the key results are summarized at the lower end. They achieved 79% accuracy with a test battery; calling a substance a skin sensitizer if more than two of the assays were positive increased the accuracy to 83%. And then they improved this even further with artificial neural networks and support vector machines, to achieve in the end around 90% or more accuracy in their predictions. Interestingly, this is as good as a pure reproduction of the animal test, so we cannot get any better than this. Which means that with the combination of good in vitro tests and ever better analyses and algorithms to make decisions on our predictions, we can achieve the quality the animal test itself offers.
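Two of the evaluation ideas mentioned above can be sketched directly: the "more than two assays positive" battery call, and the "within one potency class" score used to report that over 90% of substances were at most one class off. The assay results and class encoding below are invented for illustration.

```python
# Two simple evaluation utilities: a Boolean battery call and a
# within-one-class accuracy score. All data below is illustrative.

def battery_call(assay_results, threshold=2):
    """Call the substance a sensitizer if more than `threshold` assays
    in the battery gave a positive result."""
    return sum(assay_results) > threshold

def within_one_class(predicted, actual):
    """Fraction of predictions at most one potency class away from the
    reference, with classes encoded as 0=non, 1=mild, 2=strong, 3=severe."""
    close = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= 1)
    return close / len(predicted)

is_sensitizer = battery_call([True, True, True, False])       # 3 positives
score = within_one_class([0, 1, 2, 3, 2], [0, 2, 2, 2, 0])    # 4 of 5 close
```

The contrast between these few lines and a Bayesian network or support vector machine is exactly the progression the lecture traces: from fixed Boolean rules to learned, probabilistic combinations of the same assay outputs.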
And this is the reason why skin sensitization and its testing strategies are at this very moment under consideration by the European Chemicals Agency and other authorities as replacements for the respective animal tests. And we are waiting for this good news to come in the near future.