Welcome to Lesson two of Module three on Data Warehouse Design Practices and Methodologies. I'm going to start with an important design question that I want you to think about throughout this lesson. Why is historical integrity important only for selected columns in dimension tables? Lesson two builds on the basic concepts from lesson one. Lesson two presents table design or schema patterns that are typically used for data warehouse designs. You will learn about three schema patterns for dimension of fact tables. As well as history representation for dimension of fact tables. You have three learning objectives in this lesson. You should be able to explain table design patterns for relationships among dimension and fact tables and history representation of dimension and fact tables. As a more reflective goal, you should think about the importance of historical integrity for dimension tables. The traditional schema pattern for a data warehouse is known as a star schema, consisting of one fact table surrounded by dimension tables in one-to-many relationships. The star schema can represent one data cube. This ERD consists of four dimension entity types, Item, Customer, Store, and TimeDim, along with one transaction fact entity type called Sales. When converted to a table design, the Sales table has foreign keys to each dimension table, Item, Customer, Store, and TimeDim. In some designs the fact entity type depends on the related dimension entity types for its primary key. Since fact tables can have many relationships, it is generally preferred to have an artificial identifier rather than a large combined primary key. In this example, SalesNo is an artificial identifier for the SalesFact table. The previous star schema only represents a data cube for sales tracking. Additional star schemas may be required for data cubes involving shipping and inventory. For related business processes that share some of the dimension tables, a star schema can be extended into a constellation schema with multiple fact entity types, as shown in this ERD. When converted to a table design, the inventory entity type becomes a fact table and one of many relationships become foreign keys in the fact table. The inventory entity type contains a number of measures, including the quantity on hand of an item, the cost of an item and a quantity returned. All dimension tables are shared among both fact tables, except for the supplier and customer tables. The constellation pattern is important because it shows shared dimensions to represent multiple data cubes. A data warehouse for a medium size business may have many data cubes. A snowflake schema has multiple levels of dimension tables related to one or more fact tables. In this CRD the store dimension has been split into two entity types. Store and division along with a one to many relationship from division to store. You should consider the snowflake schema instead of the star schema for small dimension tables that are not fully normalized. The Store table in the previously shown star schema is not fully normalized because division id determines division name and division manager. Since the Store table is relatively small query performance will not suffer much with the need to join the division table in a snowflake design. For large dimension tables such as customer however query performance may suffer with extra joint operations required in a snowflake schema. Time representation is crucial for data warehouses, because most data warehouse queries use time in conditions. The principle usage of time is to record to occurrence of facts. The simplest representation is a time stamp data type for a column and a fact table. In place of a timestamp column, many data warehouses use a foreign key to a time dimension table, as shown in previous examples in this lesson. Using a time dimension table supports convenient representation of organization's specific calendar features, such as holidays, fiscal years and week numbers that are not represented in time stamp data types. The granularity of the time dimension table is usually in days. If time of day is also required for a fact table, it can be added as a column in the fact table to augment the foreign key to the time table. A variation identified by Kimball in 2003, is the accumulating fact table. It records the status of multiple events, rather than one event. For example, a fact table containing a snapshot of order processing, would include order date, shipment date, delivery date, payment date, and so on. Each event occurrence column can be represented by a foreign key to the timetable, along with a time of day column if needed. For dimension tables, time representation involves a level of historical integrity. An issue for updates to dimension tables. When a dimension table is updated, related fact table rows are no longer historically accurate. For example, if the city column of customer row changes, the related sales rows are no longer historically accurate. Time representation's only important for selected columns that change quickly, such as credit score for customer. History is not typically necessary for most columns that are relatively stable. Query results usually involve row summaries rather than individual rows, so some inaccuracy can be tolerated. Ralph Kimball, an early data warehouse consultant and author, proposed three alternatives for historical integrity, with two alternatives providing history representation. Type one has no historical integrity, as all values are rewritten. In a type one example the customer table has no changes for history representation. Type two uses version numbers for unlimited history. In the type two example the customer's table contains multiple rows for the same customer but the entire history is represented. The primary key of the customer table is a combination of cust id and version number Type III uses additional columns for a limited history representation. In a Type III example, the customer table contains just a single row for each customer, but only a limited history can be represented. Three states of CustCity are preserved, the current state and two past states. However, to find the correct city value for fact table row, a more complex query is required to match the time associated with the fact row, to the effective date columns in the customer table. Lesson two extended your background about data based design concepts in Lesson one. Lesson two covered schema patterns for dimension of fact tables. As well as time representation. You learned about three schema patterns. The star schema, constellation schema, and the snowflake schema. And time representation for fact and dimension tables. Other lessons in module three will extend beyond your background for ability problems and practice problems. In answer to the opening question, historical integrity is important, only for selective columns and dimension tables due to the nature of data warehouse queries. Most queries contain gross summaries in the result, so some level of inaccuracy can be tolerated. Even for fast changing columns, such as credit rating, the inaccuracy may be tolerable. The impact on data warehouse query results depends on the magnitude of changes and the frequency updating this column in the customer dimension table. Even though credit rating changes daily, the updates may only be propagated to a data warehouse monthly. In addition, typical change magnitudes may be small.