We begin this module with a quick overview of the various steps involved in ETL processing that have been examined in earlier modules. We use this review to frame an example of applying the process to converting the MIMIC patients table to the person table. The key tasks of ETL can be summarized in seven steps.

Step one is reviewing the documentation for both the target and source data models. This includes looking at entity relationship diagrams, data dictionaries, and any other source of insights into how others have interpreted the fields and values in the two data models. Step two is to profile the source data tables. Data profiling provides insights into the data fields that are actually populated and a sense of the data quality in terms of completeness and plausibility. It makes no sense to spend time writing code to convert a data table or column that either has no data or has data of little value due to poor data quality. Step three is to create the ETL mapping, which can be divided into two sub-steps: first, mapping the data elements from the tables and columns in the source data model to the right tables and columns in the target data model, and second, mapping source values into valid target values. Step four is to write the actual ETL code. Some people like to work down the list of source tables; others like to work down the list of target tables. There is no one right way to do this task as long as all data that needs to be transformed gets appropriate ETL code written. Step five involves actually executing the ETL code to create the transformed data tables in the target data model's format and values. Step six is to perform data quality assessments that can be compared to the data quality findings obtained from the source data during the data profiling task in step two. Finally, step seven involves creating the ETL documentation package, which includes at least a description of what was done and what decisions or assumptions were made, especially for unusual situations or bad data, often called edge cases. These decisions often need to be revisited based on changes in the data model or a better understanding of the intended use cases. Having them documented allows future ETL developers and data users to understand why the data look as they do post-ETL. Sketches of what a few of these steps can look like for the MIMIC patients-to-person conversion appear at the end of this overview.

This alternative view of the ETL process includes some steps that we did not cover, like exporting the data out of the existing data source and loading the data back into the target database. We use this diagram to illustrate the various ETL tools that we have used in the course and where they sit relative to the ETL process. White Rabbit is used very early in the process, for profiling the source data. Rabbit in a Hat is used to design the source-to-target mappings; terminology mappings also occur here. Finally, SQL scripts and the Achilles data quality tools appear in the later stages of the ETL processing cycle. Feel free to go back and review previous modules and videos on these tools.

We list here some of the work products that are produced during the ETL process. In many cases, the same documents are appended to or modified as the work is completed. For example, the pre-ETL specification document is used as the starting point for the post-ETL specification document, with the final set of deviations, conventions, and any unresolved issues added to it.
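To make the profiling step (step two) concrete, the sketch below shows the kind of completeness and plausibility check one might run by hand against the MIMIC patients table. The source table and column names (patients, subject_id, gender, dob) are assumptions for illustration and should be adjusted to your own schema; in practice White Rabbit produces a much richer scan report automatically.

```sql
-- Minimal profiling sketch (step two): how complete are the fields we plan to use?
-- Assumes a source table named patients with subject_id, gender, and dob columns.
SELECT
    COUNT(*)               AS total_rows,
    COUNT(subject_id)      AS populated_subject_id,
    COUNT(gender)          AS populated_gender,
    COUNT(dob)             AS populated_dob,
    COUNT(DISTINCT gender) AS distinct_gender_values,
    MIN(dob)               AS earliest_dob,  -- implausible birth dates stand out here
    MAX(dob)               AS latest_dob
FROM patients;
```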
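And to show what the mapping and coding steps (three through five) can look like once the design is settled, here is a minimal sketch of an ETL script that populates the OMOP person table from that same assumed patients table. The gender concept IDs (8507 for male, 8532 for female) are the standard OMOP concepts, but the source column names and the convention of defaulting unknown gender, race, and ethnicity to concept 0 are assumptions of the sketch; exactly these kinds of decisions would be recorded in the ETL documentation package.

```sql
-- Minimal ETL sketch (steps three to five): source patients rows become person rows.
-- Source column names are assumed; target columns are standard OMOP CDM person fields.
INSERT INTO person (
    person_id,
    gender_concept_id,
    year_of_birth,
    month_of_birth,
    day_of_birth,
    race_concept_id,
    ethnicity_concept_id,
    person_source_value,
    gender_source_value
)
SELECT
    p.subject_id                      AS person_id,
    CASE UPPER(p.gender)              -- value mapping: source codes to standard concepts
        WHEN 'M' THEN 8507            -- OMOP standard concept for male
        WHEN 'F' THEN 8532            -- OMOP standard concept for female
        ELSE 0                        -- edge case: unknown gender, documented as a convention
    END                               AS gender_concept_id,
    EXTRACT(YEAR  FROM p.dob)         AS year_of_birth,
    EXTRACT(MONTH FROM p.dob)         AS month_of_birth,
    EXTRACT(DAY   FROM p.dob)         AS day_of_birth,
    0                                 AS race_concept_id,      -- not recorded in this source (assumed)
    0                                 AS ethnicity_concept_id, -- not recorded in this source (assumed)
    CAST(p.subject_id AS VARCHAR(50)) AS person_source_value,
    p.gender                          AS gender_source_value
FROM patients p;
```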
Code scripts and data quality results are crucial for future ETL developers and data users to understand the details of exactly which data elements in the source system were used to create data elements in the target system, and how source values were mapped to target values. Quite often, these scripts are revised in light of later data model changes, or changes in how data users want to use a common data model. A complete and comprehensive set of ETL documents enables the ETL transformation to evolve over time.
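As one final sketch of the kind of data quality result (step six) that gets saved alongside the scripts, the query below compares the assumed source patients table against the resulting person table and flags rows whose gender could not be mapped, so the numbers can be set against the profiling results gathered before the ETL. Tools like Achilles perform far more extensive checks; this is only a hand-written example, and the exact syntax for scalar subqueries varies slightly by database.

```sql
-- Minimal data quality sketch (step six): compare the load against the source profile.
SELECT
    (SELECT COUNT(*) FROM patients)                           AS source_patient_rows,
    (SELECT COUNT(*) FROM person)                             AS target_person_rows,
    (SELECT COUNT(*) FROM person WHERE gender_concept_id = 0) AS persons_with_unmapped_gender,
    (SELECT COUNT(*) FROM person WHERE year_of_birth IS NULL) AS persons_missing_birth_year;
```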