OK so, at the end of the last section, I canceled our party: I said that we have this algorithm that works really well in practice on theoretical spectra, but that when we move to experimental spectra we're going to have issues, because experimental spectra often contain errors. Here are the theoretical and an experimental spectrum for the peptide NQEL. The theoretical spectrum is on the top, and below it we're giving you a hypothetical experimental spectrum for the same peptide. This experimental spectrum highlights two different ways that an experimental spectrum can be flawed. It can contain “false” masses. These are masses that are present in the experimental spectrum but that don't correspond to any mass in the theoretical spectrum. Examples are 99 and 299 here. We also have “missing” masses. These are masses that are in the theoretical spectrum of the correct peptide but that don't actually show up in the experimental spectrum, so they're “missing”.
This is a problem, because our current implementation only considers a peptide correct if its theoretical spectrum exactly matches the experimental spectrum that we have, and that's clearly not the case here. The question then is what do we do. Instead of requiring the spectra to match exactly, we're going to “score” a peptide based on how similar its theoretical spectrum is to the experimental spectrum. For example, here, the theoretical spectrum of NQEL is still pretty similar to the experimental spectrum that we have. Notice that they share 11 masses. So, we're going to say that the score of NQEL against this experimental spectrum is equal to 11. We need to use this scoring function to say that certain peptides are better than others, and to use that to modify the branch and bound algorithm that we have in order to make it a little bit better.
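To make the scoring idea concrete, here is a minimal Python sketch. The function names `cyclic_spectrum` and `score` are made up for illustration, and peptides are represented as tuples of integer amino acid masses. The experimental spectrum below is hypothetical: it contains the false masses 99 and 299 mentioned above, and which three masses are missing is an assumption of this sketch, chosen so that 11 masses are shared.

```python
from collections import Counter

def cyclic_spectrum(peptide):
    """Theoretical spectrum of a cyclic peptide given as a tuple of integer masses."""
    n = len(peptide)
    if n == 0:
        return [0]
    doubled = peptide + peptide  # lets subpeptides wrap around the end
    spectrum = [0, sum(peptide)]
    for length in range(1, n):
        for start in range(n):
            spectrum.append(sum(doubled[start:start + length]))
    return sorted(spectrum)

def score(peptide, experimental):
    """Number of masses shared by the two spectra, counting repeated masses."""
    shared = Counter(cyclic_spectrum(peptide)) & Counter(experimental)
    return sum(shared.values())

NQEL = (114, 128, 129, 113)  # integer masses of N, Q, E, L
theoretical = cyclic_spectrum(NQEL)

# Hypothetical experimental spectrum: false masses 99 and 299 added, and
# (an assumption of this sketch) 129 and both copies of 242 missing.
experimental = [0, 99, 113, 114, 128, 227, 257, 299, 355, 356, 370, 371, 484]

print(score(NQEL, theoretical))   # 14
print(score(NQEL, experimental))  # 11
```

Intersecting two `Counter` objects counts a repeated mass such as 242 only as many times as it appears in both spectra, which is one reasonable reading of "shared masses".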
The idea that I want to use here is similar to something called a “cut” in a golf tournament. Now I'm a huge golf fan, so maybe I'm just partial to this idea, but I thought it fit pretty well. What the cut does is reduce the field to only the players that are deemed to be in contention for the tournament. So, if there are players who are not doing too well after a couple of rounds, then they say, OK, well, we're just going to let them go home, because we want to continue the tournament only with the players who have a chance of actually winning it. Here's a hypothetical leaderboard, and we may say, in this hypothetical small tournament, that we want to keep the top three players on the leaderboard. Keeping in mind that low golf scores are good, we put the cut line here. But this would be unfair to this golfer, right, because they're tied with the third-place golfer, and so we need to say, well, we keep the top three players… as well as ties. So, keep the top three players “with ties”. After this cut, we have four players remaining. This is going to motivate the bound step that we do.
The new algorithm we're going to call LeaderboardCyclopeptideSequencing. How it works is that we start off with a leaderboard containing just the “zero peptide”. This is an empty string, essentially. Then we extend each peptide on the leaderboard in each of 18 possible directions: we add each of the 18 different amino acid masses to each peptide that we currently have on the leaderboard. Then we have this “cut” step, which is the new “bound step”, and it cuts the low-scoring peptides. It says: assign everything a score, see what the score of every peptide is, and then cut the low-scoring ones. Get them out of there, so that we keep the number of candidates that we consider from growing, and so that we keep only the high-scoring peptides.
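The cut "with ties" can be sketched in a few lines of Python. The name `trim` and its parameters are hypothetical, and the scores in the example are invented. Unlike golf, a higher score is better here, which matches peptide scoring.

```python
def trim(leaderboard, score_of, n):
    """Keep the n highest-scoring entries, plus anything tied with the n-th one."""
    if len(leaderboard) <= n:
        return list(leaderboard)
    ranked = sorted(leaderboard, key=score_of, reverse=True)
    cutoff = score_of(ranked[n - 1])  # score of the entry sitting on the cut line
    return [entry for entry in ranked if score_of(entry) >= cutoff]

# Invented scores: C and D are tied for third place, so both survive the cut.
scores = {"A": 9, "B": 7, "C": 5, "D": 5, "E": 2}
survivors = trim(list(scores), scores.get, 3)
print(survivors)  # ['A', 'B', 'C', 'D']
```

Comparing against the score on the cut line, instead of slicing off the first `n` entries, is what lets a tied entry stay in.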
So, we're going to keep the top N peptides “with ties”. Then, the next step is to update the “leader peptide”. We're going to keep track of what the leader peptide should be: if there's a peptide on the leaderboard whose mass is equal to the parent mass and whose score is higher than the current leader's, then it becomes the new leader. By the “parent mass” I mean the mass of the entire peptide. We said that we're always going to be able to know in advance what the correct mass of the entire peptide should be, so we always want to check whether the candidate solutions that we have have a mass that's equal to that total mass. And then, we're going to eliminate any peptides whose mass has grown too big. If their mass exceeds the parent mass, then we don't want to extend them any more, so we get rid of them, too.
Once we have these steps, we're just going to iterate them over and over again. So, we have a branch step, and then a bound step. And then we have an “update step”, where we say: is there a new leader? And can we get rid of anything whose mass has gotten too big? As we branch and bound and branch and bound, every peptide keeps getting heavier, so the procedure is actually guaranteed to end with an empty leaderboard. So, at step 6, after we've gone through steps 2 through 5 enough times, we're going to have an empty leaderboard. And at that time we return our leader, the best peptide that we've encountered so far. So, we have this updated method. But I do want to give you the warning that it's what's called a “heuristic”.
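Putting the branch, cut, and update steps together, a rough Python sketch of the leaderboard procedure might look like the following. All names are hypothetical, and several details are assumptions of this sketch: peptides are tuples of integer masses, partial peptides are scored against the spectrum as if they were cyclic, the parent mass is passed in explicitly, and over-mass peptides are dropped before the cut so that they do not take up leaderboard slots.

```python
from collections import Counter

# The 18 distinct integer amino acid masses.
AMINO_ACID_MASSES = [57, 71, 87, 97, 99, 101, 103, 113, 114,
                     115, 128, 129, 131, 137, 147, 156, 163, 186]

def cyclic_spectrum(peptide):
    """Theoretical spectrum of a cyclic peptide given as a tuple of masses."""
    n = len(peptide)
    if n == 0:
        return [0]
    doubled = peptide + peptide  # lets subpeptides wrap around the end
    spectrum = [0, sum(peptide)]
    for length in range(1, n):
        for start in range(n):
            spectrum.append(sum(doubled[start:start + length]))
    return sorted(spectrum)

def score(peptide, spectrum):
    """Number of masses shared with the spectrum, counting repeats."""
    return sum((Counter(cyclic_spectrum(peptide)) & Counter(spectrum)).values())

def trim(leaderboard, spectrum, n):
    """Cut step: keep the top n peptides with ties."""
    if len(leaderboard) <= n:
        return leaderboard
    scored = sorted(((score(p, spectrum), p) for p in leaderboard), reverse=True)
    cutoff = scored[n - 1][0]
    return [p for s, p in scored if s >= cutoff]

def leaderboard_cyclopeptide_sequencing(spectrum, n, parent_mass):
    leaderboard = [()]           # start with just the empty ("zero") peptide
    leader, leader_score = (), 0
    while leaderboard:
        # Branch: extend every peptide by each of the 18 masses.
        expanded = [p + (m,) for p in leaderboard for m in AMINO_ACID_MASSES]
        leaderboard = []
        for peptide in expanded:
            mass = sum(peptide)
            if mass > parent_mass:
                continue         # too heavy: never extend it again
            if mass == parent_mass:
                s = score(peptide, spectrum)
                if s > leader_score:
                    leader, leader_score = peptide, s  # update the leader
            leaderboard.append(peptide)
        # Bound: the cut.
        leaderboard = trim(leaderboard, spectrum, n)
    return leader

spectrum = cyclic_spectrum((114, 128, 129, 113))  # error-free spectrum of NQEL
leader = leaderboard_cyclopeptide_sequencing(spectrum, 10, 484)
print(leader, score(leader, spectrum))
```

The loop stops on its own because every extension adds at least 57 to a peptide's mass, so eventually everything exceeds the parent mass and the leaderboard empties.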
It's a heuristic because it only keeps the high-scoring peptides at each step. There's a chance that an initially low-scoring peptide would wind up eventually being the highest-scoring peptide, so that it eventually would be our leader, but because we cut it at an early stage, because we said it's not in the top N peptides, we throw it out. And so there is a possibility that we throw out the correct peptide, or what would eventually become the correct peptide. That's the price we pay for having a quick method, and for this reason it's a heuristic.
Let's test it though, because remember, we're considering how well these algorithms work in practice. So, here is a hypothetical spectrum. It has 10% false and missing masses. I'll highlight where the false and missing masses are. So, here are the false masses, and here are the missing masses. Now, we're not going to know which of these are false and missing to begin with. In particular, we're not going to have the missing masses present. That's the whole point. They're not present in the experimental spectrum. So, we'll have this picture, but we won't know which of the masses should be green. And then, we say, let's run our leaderboard algorithm on this hypothetical data set, and see what it churns out. And it returns the correct Tyrocidine B1 peptide, which is where this spectrum came from, so we're happy.
Then we say, well, maybe 10% false and missing masses was a little bit optimistic. So, let's introduce a slightly noisier Tyrocidine B1 spectrum that has 25% false and missing masses. And again, I'll show you where the false and missing masses are (even though in practice we won't know this), so you can believe me. Now, when we run that same algorithm on this noisier spectrum, we're going to see that the peptide that it produces is actually not Tyrocidine B1.
It's close, but the correct amino acid here has been replaced with an AD, so this is bad. In this case, we wouldn't be able to reconstruct the correct peptide.