[MUSIC] Hi everyone, my name is Pimlapas Leekitcharoenphon. I'm a researcher from the Research Group for Genomic Epidemiology at the National Food Institute, Technical University of Denmark. Today, I'm going to talk about MLST typing: a description of the MLST tool and its application to many species.
Multilocus sequence typing, or MLST, is considered to be a gold standard of typing. Traditionally, it is performed in an expensive and time-consuming manner. For example, you have to PCR the seven housekeeping genes, do Sanger sequencing, and then analyze what the ST type of your bacterial species is. But right now, the cost of whole genome sequencing continues to decline, and it is becoming increasingly available to scientists and routine diagnostic laboratories. Currently, the cost of whole genome sequencing is actually below the cost of traditional MLST.
Now you might wonder why we have to do MLST, and why it has become the gold standard of typing. I'll give you one example from one of my papers, on the genomics of an emerging clone of Salmonella Typhimurium ST313 from Nigeria and Congo. The story is that Salmonella Typhimurium is everywhere in the world, and it has many clones, many substrains. But there is one particular strain, sequence type ST313, that actually causes more deaths and is more invasive than the other Typhimurium. So if you work in public health or with surveillance and you find a Typhimurium with ST313, then you need to act very quickly, because it is more invasive than the normal Typhimurium that you might usually find in your country or in your city, for example. This is why we need to know the ST type.
Okay, let's go back to the MLST tool that we're going to talk about today, which uses whole genome sequencing. In our group, we developed a web-based method for MLST typing based on whole genome sequencing data.
And, of course, the MLST tool has a database behind it, which we download monthly from PubMLST. You can also go to PubMLST and get the MLST database yourself. So what does that database look like? It is composed of two things. First, it contains the ST profiles: for example, the seven housekeeping genes and the variant number of each. From the combination of variant numbers you get the ST type, so this is the ST profile. The other part is the sequences of all the variants of each of the seven genes, so this is the other component of the database.
Let me give you an example of how you actually get the ST type. Every species has its own scheme, and this is an example for one species. Normally, it has seven genes, and for each of the genes you see that there are different numbers. For example, this one has variant number 1, variant number 4. They might differ by only one or two nucleotides, but anyway, each variant at each of the housekeeping gene loci has its own number. When you search your genome, you get this profile, the variant at each of the loci, and from the unique combination of these seven numbers you can identify the ST number. This is what is already in the PubMLST database and in the database of our tool. To identify the numbers, we use BLAST, so the tool only looks at the best-matching MLST allele and gives its number here.
The tool has been tested with multiple datasets. The first one is assembled genomes of 336 isolates covering 56 MLST schemes. The second dataset is raw read data from 387 isolates covering ten schemes. And we have a small test set of raw read data from 29 isolates, which we actually tested in the lab to identify the ST types.
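The lookup from allele numbers to a sequence type described above can be sketched in a few lines. This is only an illustration, not the tool's code: the locus names `geneA` to `geneG`, the profile table, and the function `sequence_type` are all made up for the example, and real schemes and profiles would come from PubMLST.

```python
from typing import Dict, Optional, Tuple

# Hypothetical seven-locus scheme; the names are placeholders, not a real scheme.
LOCI: Tuple[str, ...] = (
    "geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG",
)

# Toy ST profile table: each unique combination of seven allele numbers
# maps to one sequence type. The numbers are invented for illustration.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 4, 2, 1, 1, 3, 1): 2,
    (2, 4, 2, 1, 5, 3, 1): 3,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a full allele profile, or None if it is unknown.

    `alleles` maps each locus name to the number of its best-matching allele.
    A missing locus or an unseen combination gives None.
    """
    try:
        key = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        return None  # at least one locus was not found in the genome
    return PROFILES.get(key)


if __name__ == "__main__":
    found = {"geneA": 1, "geneB": 4, "geneC": 2, "geneD": 1,
             "geneE": 1, "geneF": 3, "geneG": 1}
    print(sequence_type(found))
```

A single changed allele number gives a different key, which is why two isolates that differ by one nucleotide in one housekeeping gene can end up with different sequence types, or with no known type at all.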
So we have the lab results to confirm the whole genome sequencing results, and the results show that the MLST tool can determine the sequence types of all the isolates that we have in this test set.
The idea of the MLST tool is that, as a scientist or researcher, when you have whole genome sequencing data, you submit it to the MLST tool at CGE, and it determines the ST type for you. The database, as I told you, contains the different allele sequences from a publicly available database, PubMLST, and as a user, you submit your unknown sequences either as assembled genomes or as raw reads. If you submit assembled genomes, the tool uses BLAST for the alignment. If you submit raw reads, the tool does not do any assembly; it maps your raw reads directly to the allele sequences using KMA, the k-mer alignment tool. And what you get back is the sequence type of your submitted genome.
This is a link to the MLST tool, and this is the front page of the MLST website. First things first: the first thing you have to choose is the MLST configuration, which is the species of your strain. That means that before using MLST, you have to know the species of your unknown strain. You can see there is a long list of species; the tool is only available for species that have an MLST scheme. Some very rare species do not have a scheme, but most are already here. For some of them you can see scheme number one and scheme number two. For example, Escherichia coli has scheme number one and scheme number two. The difference between one and two might be that number one has seven housekeeping genes and number two has eight housekeeping genes, and of course they give different ST types, okay? But most people are likely to use the scheme with seven housekeeping genes.
So the next thing you have to choose is: what is the type of your input data?
So for example, if you choose assembled genome or contig data, then your input needs to be in FASTA format. This is an example of FASTA format: first the ID and then the sequence. If your data is what we call raw reads, or raw sequencing data, it needs to be in FASTQ format like this. Once you have chosen everything here, you can upload your data. You click here to upload your data, and again, this tool only accepts one strain per submission. You cannot submit multiple strains in one submission. Okay, so once you have the strain here, you click to upload the data.
Once the data has been uploaded completely, you get this page. It is like every tool that we have. If you want to wait for the result, just stay here, but you also have the option to put your email here and click notify me via email. When the tool is done, it will send the output link to your email.
Let's have a look at the output of MLST. This is an example. Of course it tells you which MLST scheme you chose and what the organism is, and then here is the output that you need from the MLST tool: the sequence type of your input genome. You also see more detail about your result. You see the seven housekeeping genes, the seven loci, and then you see the allele numbers. The combination of these allele numbers determines your sequence type. Moreover, you see how good the alignment of your data is, for example the coverage. What does the coverage mean? The coverage is the length of the alignment between the best-matching allele and the corresponding sequence in the genome, as a percentage of the allele length. If you have 100%, what does it mean? Let's say this is the gene and this is your sequence. If your sequence aligns completely, from the first position to the last position, this means 100% coverage.
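As a rough illustration of the coverage figure just described, the sketch below computes coverage as alignment length divided by allele length, together with percent identity, taken here as the share of aligned positions that are identical. The function names and the pairwise-string input are assumptions made for the example; they are not the tool's own code or output format, and counting every alignment column (gaps included) toward the alignment length is a simplification.

```python
from typing import Tuple


def coverage_and_identity(
    aligned_query: str, aligned_allele: str, allele_length: int
) -> Tuple[float, float]:
    """Return (coverage %, identity %) for one pairwise alignment.

    Both strings are the aligned region only, of equal length, with '-'
    marking gaps. `allele_length` is the full length of the database allele.
    Assumption of this sketch: alignment length = number of columns.
    """
    if len(aligned_query) != len(aligned_allele):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_allele or allele_length <= 0:
        raise ValueError("alignment and allele length must be non-empty")

    alignment_length = len(aligned_allele)
    # A column counts as a match only if both bases are equal and not a gap.
    matches = sum(
        1
        for q, a in zip(aligned_query.upper(), aligned_allele.upper())
        if q == a and q != "-"
    )
    coverage = 100.0 * alignment_length / allele_length
    identity = 100.0 * matches / alignment_length
    return coverage, identity


def is_perfect_hit(coverage: float, identity: float) -> bool:
    """A hit that spans the whole allele with no mismatches."""
    return coverage == 100.0 and identity == 100.0


if __name__ == "__main__":
    allele = "ACGTACGTAC"  # toy 10-base allele
    print(coverage_and_identity("ACGTACGTAC", allele, len(allele)))  # (100.0, 100.0)
    print(coverage_and_identity("ACGTACGTAA", allele, len(allele)))  # (100.0, 90.0)
    print(coverage_and_identity("ACGTACGT", allele[:8], len(allele)))  # (80.0, 100.0)
```

The third call shows a hit that is identical over the part that aligned but covers only 8 of the 10 allele positions, so it is not a perfect hit.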
So you can see it from here: if the alignment length is equal to the size of the allele, that means it aligned from the first position of the gene to the last position. And if every match is identical, with no mismatches, you get 100% identity. So if you get 100% identity and 100% coverage, that means a perfect hit, and we can trust the result completely.
If you want to see more, we have more options over here. For example, you can get all of this output as a text file, as a TSV file, and you can get the hits in your genome. Let's say the hit for this gene: which part of your input genome is similar to this gene, or to all the genes here. You can get it as genome sequences, and it looks like this: the parts of your genome that are similar to the genes identified over here. You can also get the MLST allele sequences of these seven [INAUDIBLE] genes if you click over here, and you can click extended output to see the alignment in detail, if you want to see it. For example, you might want to see if there are any mismatches in the alignment; you can see that from this option.
If you want to read more about this tool, you can go to this paper, and if you use this tool, you can cite this paper as well. If you want to know more about the database of this MLST tool, here on the front page you can see the link to the software and the link to the database version. And by the way, for most of the CGE tools, we have a web-based version and also a standalone version that you can download and install on your own server or your own computer, but you need a little bit of Unix skills or bioinformatics skills to be able to install and run the program. This is an example: if you want to download and install the tool yourself, you can go to this website. We have a Bitbucket, under Genomic Epidemiology, here. That's all for the MLST tool, and thank you very much. [MUSIC]