Imagine yourself three years from now. By some estimates, 90 percent of all the data we will have in the world then doesn't even exist yet. We're talking big data, and it's getting bigger at an accelerating pace. In this big data world, there's a growing need for skilled data analysts who can gain insights from all this data and use it to answer vital questions. In this Coursera specialization, Modern Big Data Analysis with SQL, we'll give you an opportunity to learn and practice those skills. I'm Ian Cook. I'm Glynn Durham. And in this specialization, we'll teach you the essential skills for working with large-scale data using SQL. Isn't it pronounced S-Q-L? “Sequel,” S-Q-L: either way is fine. That's true. Maybe you're already familiar with SQL. You might have used it to query smaller-scale data stored in relational database systems like MySQL, PostgreSQL, Oracle, or SQL Server, or with data warehouses from vendors like Teradata, IBM, and Microsoft. SQL is a standard language for accessing data, and it's used with all these tools. Or maybe you're new to SQL and you want to learn the basics. SQL is ubiquitous, and it's a fundamental skill for data analysts to have. Whether or not you have any prior experience with SQL, if you're interested in gaining the skills necessary to query big data using modern SQL engines, this specialization is for you. Most courses that teach SQL focus on how it's used with traditional relational databases. But today, more and more of the data that's being generated is too big to be stored there, and it's growing too quickly to be efficiently stored in commercial data warehouses. Instead, it's increasingly stored in distributed clusters and cloud storage. These data stores are cost-efficient and infinitely scalable. To query these huge datasets in clusters and cloud storage, you need a newer breed of SQL engine: distributed query engines like Hive, Impala, Presto, and Drill. 
These are open-source SQL engines capable of querying enormous datasets almost instantaneously. We'll focus on Hive and Impala, which are the most widely deployed of these query engines. They are the industry-standard big data SQL engines, and as data volumes grow, they are increasingly disrupting some of the traditional solutions. In the first course of this specialization, I'll teach you the conceptual foundations of relational databases, SQL, and big data. You'll understand how databases provide structure to data and how this has changed as the volume and variety of data have increased. Toward the end of the first course, you'll set up your own computer with the same software that data analysts use on some of the largest datasets in the world. We've scaled it down for you so that it can work on most laptops. But please look carefully at the hardware requirements, because we want you to get hands-on experience with the actual tools big data analysts use in production. In the second course, I'll teach you the fundamentals of the SQL SELECT statement, which is the one part of the SQL language that's essential for doing data analysis. You can use SQL SELECT statements to query data of all sizes across numerous different systems. In this course, you'll gain skills that apply to all of these systems, but the emphasis is on distributed SQL engines like Hive and Impala that can query extremely large datasets. If you're new to SQL and databases, these first two courses are a great place to start. In the first course, you'll learn the important concepts and terminology. Then in the second course, you'll learn how to write queries in SQL to answer real-world types of questions. And if you already have some experience with SQL, these first two courses will show you how you can update your existing skills to work with large-scale data using modern analytic database systems. 
If you want to set yourself apart as a big data analyst, you need to go beyond just querying data that's already prepared for you. So in the third course in this specialization, we'll teach you how to manage big datasets, how to load them into clusters and cloud storage, and how to apply structure to the data so that you can run queries on it using distributed SQL engines like Hive and Impala. You'll learn how to choose the right data types, storage systems, and file formats based on which tools you'll use and what performance you need. Finally, in the fourth course, we'll build on the foundation of these first three courses to teach you advanced techniques for analyzing data with SQL. You'll learn how to use subqueries, windowing and analytic functions, and text analysis functions to answer more complex questions about more diverse datasets. You'll learn about the differences between SQL engines, how to choose the best one for a particular job, and how to extend their built-in functionality. The sequence of courses in this specialization is designed to provide excellent preparation for the Cloudera Certified Associate Data Analyst exam. This certification was created to identify qualified data analysts with the talent for using SQL to analyze big data. Earning this certification is a great way to stand out and be recognized by potential employers. You can earn it by taking a hands-on practical exam using the same SQL engines that this specialization teaches: Hive and Impala. We've designed the honors track of this specialization to provide you with some additional information and skills to help you prepare for the certification exam. This Cloudera Certified Associate credential is different from the Coursera certificate you'll receive if you complete the courses in this specialization. But if you complete the courses, including the honors lessons, and earn a specialization certificate with honors, then you should be well-equipped to take the certification exam. 
It's a challenging exam, and we can't guarantee you'll pass it, but if you do, then you'll be able to add this Cloudera certification to your resume. And more important than that, you'll have the skills you'll need to develop real mastery in the analysis of big data. We can all benefit from the new insights you'll deliver from the growing world of data in the years to come, and we hope you'll join us.