About this course
Learn to analyze big data using Apache Spark's distributed computing framework. In a series of focused, practical tasks, you will start by launching a Spark cluster on Amazon's EC2 cloud computing platform. As you progress to working with real data, you will gain exposure to a variety of useful tools, including RDFLib and SPARQL. The practical tasks in this course use the Gutenberg Project data, the world's largest open collection of ebooks, which offers plenty of scope for engaging and novel analyses.

Because the taught material and example code are given in Python, it is strongly recommended that all students have previous Python programming experience. Launching and interacting with a cluster on EC2 also requires basic knowledge of the Unix command line, and some experience with a command-line editor such as vim or nano would be advantageous. With these minimal prerequisites, the course is designed to get you up and running in Spark as quickly and painlessly as possible, so that by the end you will be comfortable and competent enough to start engineering your own big data solutions.
100% online course
Start now and learn on your own schedule.

Intermediate level

Approx. 23 hours to complete
Recommended: 4 weeks of study, 3-6 hours/week

Language: English
Subtitles: English

Syllabus - What you will learn from this course

Section 1
9 hours to complete

Getting Started in Spark on EC2

This week, you'll gain essential background knowledge along with the practical skills needed to run applications in Apache Spark. You'll also take the steps necessary to launch a Spark cluster on the Amazon EC2 cloud computing platform....
8 videos (Total 47 min), 13 readings, 11 quizzes

8 videos
  • Introduction (2 min)
  • Create a normal AWS account (6 min)
  • Launch a Spark cluster on EC2 (12 min)
  • What is Spark? (6 min)
  • Fundamentals (10 min)
  • Setting up your development environment (4 min)
  • Summary (0 min)

13 readings
  • About this course (10 min)
  • Week 1 Resource zip (10 min)
  • Tips for following this lesson (10 min)
  • Create a normal AWS account with billing alarm (10 min)
  • Tips for following this video (10 min)
  • Launch a Spark cluster on EC2 (10 min)
  • Additional guidance for starter accounts (10 min)
  • Accessing the pyspark interactive shell (10 min)
  • Tips for following this lesson (10 min)
  • How to install Spark locally (10 min)
  • Tips for following this lesson (10 min)
  • Setting up your development environment (10 min)
  • Submitting applications to a cluster (10 min)

8 practice exercises
  • Prerequisite Skills Quiz (8 min)
  • Week 1 Introduction Quiz (6 min)
  • Lesson 1.1 Practice Quiz (8 min)
  • Lesson 1.2 Practice Quiz (8 min)
  • Lesson 1.3 Practice Quiz (4 min)
  • Lesson 1.4 Practice Quiz (6 min)
  • Lesson 1.5 Practice Quiz (4 min)
  • Week 1 Summary Quiz (22 min)

Section 2
4 hours to complete

Reading and Writing Data

This week you'll learn how to read and write data in Spark. The techniques you'll be shown can be used with data stored locally, or in partnership with the Amazon S3 cloud storage facility. To help get you started, we'll also show you how to upload a subset of the Gutenberg Project dataset onto Amazon S3....
4 videos (Total 20 min), 8 readings, 6 quizzes

4 videos
  • Reading and writing RDDs (7 min)
  • Reading data from Amazon S3 with boto3 (5 min)
  • 2.4 Writing objects to Amazon S3 (Spark methods) (5 min)

8 readings
  • Week 2 Resources zip (10 min)
  • Get the Gutenberg project dataset (10 min)
  • Tips for following this lesson (10 min)
  • Using Spark methods to read and write data on S3 (10 min)
  • Tips for following this lesson (10 min)
  • Using boto3 to read data from Amazon S3 (10 min)
  • Tips for following this lesson (10 min)
  • Configuring Spark for accessing S3 (10 min)

5 practice exercises
  • Lesson 2.1 Practice Quiz (4 min)
  • Lesson 2.2 Practice Quiz (6 min)
  • Lesson 2.3 Practice Quiz (6 min)
  • Lesson 2.4 Practice Quiz (6 min)
  • Week 2 Summary Quiz (16 min)

Section 3
3 hours to complete

Tools for Working with Data

This week you'll get to grips with some useful tools in preparation for working with the Gutenberg Project dataset. In this week's assessment, you will exercise your data wrangling skills to produce a catalogue index file from the Gutenberg Project metadata, a resource that should prove useful in your final assessment....
4 videos (Total 23 min), 3 readings, 5 quizzes

4 videos
  • What is RDF? (5 min)
  • Using RDFLib (8 min)
  • Summary (0 min)

3 readings
  • Week 3 Resources zip (10 min)
  • Tips for following this lesson (10 min)
  • Tips for following this lesson (10 min)

4 practice exercises
  • Lesson 3.1 Practice Quiz (4 min)
  • Lesson 3.2 Practice Quiz (4 min)
  • Lesson 3.3 Practice Quiz (4 min)
  • Week 3 Summary Quiz (20 min)

Section 4
4 hours to complete

Programming in Spark

This week you'll learn Spark programming in some detail, in preparation for working with the Gutenberg collection of ebooks. The areas that will be covered should lead you to write much more efficient and successful Spark applications....
8 videos (Total 42 min), 5 readings, 7 quizzes

8 videos
  • 4.1 Working with data frames (9 min)
  • Pipelines and caching (7 min)
  • Spark performance (6 min)
  • Spark configuration (5 min)
  • Spark examples (8 min)
  • Summary (0 min)
  • Summary (0 min)

5 readings
  • Week 4 Resources zip (10 min)
  • Tips for following this lesson (10 min)
  • Tips for following this lesson (10 min)
  • Tips for following this lesson (10 min)
  • Tips for following this lesson (10 min)

6 practice exercises
  • Lesson 4.1 Practice Quiz (6 min)
  • Lesson 4.2 Practice Quiz (4 min)
  • Lesson 4.3 Practice Quiz (4 min)
  • Lesson 4.4 Practice Quiz (4 min)
  • Lesson 4.5 Practice Quiz (4 min)
  • Week 4 Summary Quiz (20 min)
3.6

Top Reviews

By CC, May 30th 2018

Good Practice session on AWS platform and thorough explanation from the mentors.

Thanks a lot.

About University of London

The University of London is a federal university that includes 18 world-leading Colleges. Our distance learning programmes were founded in 1858 and have enriched the lives of thousands of students, delivering high-quality University of London degrees wherever our students are across the globe. Our alumni include 7 Nobel Prize winners. Today, we are a global leader in distance and flexible study, offering degree programmes to over 50,000 students in over 180 countries. To find out more about studying for one of our degrees where you are, visit www.london.ac.uk

Frequently Asked Questions

  • Once you enroll for a Certificate, you’ll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • If you pay for this course, you will have access to all of the features and content you need to earn a Course Certificate. If you complete the course successfully, your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. Note that the Course Certificate does not represent official academic credit from the partner institution offering the course.

  • Yes! Coursera provides financial aid to learners who would like to complete a course but cannot afford the course fee. To apply for aid, select "Learn more and apply" in the Financial Aid section below the "Enroll" button. You'll be prompted to complete a simple application; no other paperwork is required.

More questions? Visit the Learner Help Center