Data Science Specialization

Started March 20

Launch Your Career in Data Science

A nine-course introduction to data science, developed and taught by leading professors.

About This Specialization

Ask the right questions, manipulate data sets, and create visualizations to communicate results. This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data. At completion, students will have a portfolio demonstrating their mastery of the material.

10 courses

Follow the suggested order or choose your own.

Projects

Designed to help you practice and apply the skills you have learned.

Certificates

Highlight your new skills on your résumé or on LinkedIn.

Courses
Beginner Specialization.
No prior experience required.
  1. COURSE 1

    The Data Scientist’s Toolbox

    Current session: March 20 to April 24.
    Commitment
    1-4 hours/week
    Subtitles
    English, French, Chinese (Simplified), Greek, Italian, Portuguese (Brazilian), Russian, Turkish, Hebrew

    About the course

    In this course you will get an introduction to the main tools and ideas in the data scientist's toolbox. The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program, such as version control, Markdown, Git, GitHub, R, and RStudio.
  2. COURSE 2

    R Programming

    Current session: March 20 to April 24.
    Subtitles
    English, French, Chinese (Simplified)

    About the course

    In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment and describe generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.
  3. COURSE 3

    Getting and Cleaning Data

    Current session: March 20 to April 24.
    Subtitles
    English, Russian, Chinese (Simplified)

    About the course

    Before you can work with data you have to get some. This course will cover the basic ways that data can be obtained. The course will cover obtaining data from the web, from APIs, from databases and from colleagues in various formats. It will also cover the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.
  4. COURSE 4

    Exploratory Data Analysis

    Current session: March 20 to April 24.
    Subtitles
    English, Chinese (Simplified)

    About the course

    This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
  5. COURSE 5

    Reproducible Research

    Current session: March 20 to April 24.
    Commitment
    4-9 hours/week
    Subtitles
    English

    About the course

    This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows for people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available. This course will focus on literate statistical analysis tools which allow one to publish data analyses in a single document that allows others to easily execute the same analysis to obtain the same results.
  6. COURSE 6

    Statistical Inference

    Current session: March 20 to April 24.
    Subtitles
    English

    About the course

    Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, …) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies, and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. After taking this course, students will understand the broad directions of statistical inference and use this information to make informed choices in analyzing data.
  7. COURSE 7

    Regression Models

    Current session: March 20 to April 24.
    Subtitles
    English

    About the course

    Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares, and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well. Analysis of residuals and variability will be investigated. The course will cover modern thinking on model selection and novel uses of regression models, including scatterplot smoothing.
  8. COURSE 8

    Practical Machine Learning

    Current session: March 20 to April 24.
    Subtitles
    English

    About the course

    Among the most common tasks performed by data scientists and data analysts are prediction and machine learning. This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation.
  9. COURSE 9

    Developing Data Products

    Current session: March 20 to April 24.
    Subtitles
    English

    About the course

    A data product is the production output of a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data-informed model, algorithm, or inference. This course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course will focus on the statistical fundamentals of creating data products that can be used to tell a story about data to a mass audience.
  10. COURSE 10

    Data Science Capstone

    Upcoming session: May 1 to June 26.
    Commitment
    4-9 hours/week
    Subtitles
    English

    About the Capstone Project

    The capstone project class will allow students to create a usable, public data product that can be used to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners.

Creators

  • Johns Hopkins University

    Johns Hopkins University is recognized as a destination for excellent, ambitious scholars and a world leader in teaching and research. The mission of The Johns Hopkins University is to educate its students and cultivate their capacity for life-long learning, to foster independent and original research, and to bring the benefits of discovery to the world.

  • Roger D. Peng, PhD

    Associate Professor, Biostatistics
  • Brian Caffo, PhD

    Professor, Biostatistics
  • Jeff Leek, PhD

    Associate Professor, Biostatistics
