Course Summary: Advanced Data Analytics with Spark

Our Advanced Data Analytics with Spark cohort is a four-week evening course.

Apache Spark is a fast, general-purpose engine for large-scale data processing. It was developed as an alternative to the traditional MapReduce processing paradigm. By keeping data in memory, Spark can run up to 100x faster than Hadoop MapReduce, and 10x faster when running on disk. Spark is well suited to iterative processing, which many machine learning algorithms rely on.
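
To make the point about iterative processing concrete, here is a minimal plain-Python sketch. It is not Spark code, and the `CountingSource` class and both functions are hypothetical names invented for illustration. It only counts how often an expensive "load" step runs when an iterative algorithm re-reads its input on every pass versus keeping it in memory. In PySpark, the rough equivalent of the second approach is calling `cache()` on an RDD before iterating over it.

```python
# Plain-Python illustration (not Spark): count how often an expensive
# "load" step runs during an iterative computation.


class CountingSource:
    """Hypothetical stand-in for a dataset that is costly to read."""

    def __init__(self, values):
        self._values = list(values)
        self.loads = 0  # number of times the data was re-read

    def load(self):
        self.loads += 1
        return list(self._values)


def iterate_without_cache(source, iterations):
    """Re-read the data on every pass, as a disk-based pipeline would."""
    estimate = 0.0
    for _ in range(iterations):
        data = source.load()
        # Move the estimate halfway toward the mean of the data.
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate


def iterate_with_cache(source, iterations):
    """Read the data once and keep it in memory for every pass."""
    data = source.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate += 0.5 * (sum(data) / len(data) - estimate)
    return estimate


uncached = CountingSource([1.0, 2.0, 3.0, 4.0])
cached = CountingSource([1.0, 2.0, 3.0, 4.0])

result_a = iterate_without_cache(uncached, iterations=10)
result_b = iterate_with_cache(cached, iterations=10)

# Both approaches compute the same estimate; only the number of loads differs.
print(uncached.loads, cached.loads)
```

Both functions return the same estimate; the difference is that the first reads the data once per iteration and the second reads it once in total. That saving grows with the number of iterations and the cost of each read.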

Spark runs on top of Hadoop, as a standalone platform, or in the cloud. It is easy to use, fast, and has a powerful stack of libraries, including SQL and DataFrames. This course requires some experience programming in Python.

Course Details: Advanced Data Analytics with Spark

Week 1 : Spark Fundamentals
  • C: Introduction to Spark
  • C: Why Spark?
  • C: Introduction to RDDs
  • C: Data sharing
  • C: Data Partitioning
Week 2 : Spark SQL
  • C: Working with the Spark Shell
  • C: What is Spark SQL?
  • C: Spark SQL vs Spark Core
  • C: DataFrames API
Week 3 : Spark Streaming
  • C: DStreams
  • C: Transformations: Stateless and Stateful
  • C: Checkpointing and Output Operations
  • C: Tuning and Debugging Spark
Learning Objectives: Advanced Data Analytics with Spark
  • Become familiar with Spark fundamentals. Learn about the different components of Spark.
  • Use Spark on an HDFS cluster. Gain experience working with RDDs.
  • Learn how to tune and debug Spark.
  • Tools used : Python, Spark

Next Steps:

  • Drop us a note to schedule an interview and see if this course is a good fit for you.
  • Enroll@bitbootcamp.com


Next Cohort

  • January 10th, 2017 - February 2nd, 2017
    Tuesday and Thursday: 6:30 PM to 9:30 PM


  • $2,500 USD


Financing options available with Pave