
DATA101 – Big Data Boot Camp
Price: $1,795.00
Title: Big Data Boot Camp DATA101; 2 Days, Instructor-led
Description: This fast-paced big data training course provides a functional, results-oriented overview of the big data workspace, with an emphasis on the Apache Hadoop ecosystem and related technologies. You will go back to work with a detailed understanding of the overall big data space, the tools involved, and how to use them in your own situation. You will be exposed to real-world use cases that cement your understanding of what’s actually possible with big data. Prepare to build your own big data projects as you walk through live demonstrations of Hadoop, HDFS, MapReduce, HBase, SPARK, YARN, PIG, HIVE, OOZIE, and FLUME. Big Data Boot Camp is a pragmatic introduction to the toolset that uses scalable, distributed cloud computing to process today’s mountains of data efficiently. The course is taught by highly experienced engineers with a proven track record of delivering strategic value to clients using big data technologies.
At Course Completion:
Audience & Prerequisites:
• Data Analysts
• Database Administrators
• Developers and Team Leads
• Application Teams
• Software Engineers
• Project Managers
• Business Analysts
• System Analysts
Course Outline Details:
1. Introduction to Big Data
• Academic
• Early web
• Web scale
  o 1994 – 2012
  o 2016
  o 2020
2. Sources (Examples)
• Internet
• Transport systems
• Medical, healthcare
• Insurance
• Military and others
3. Hadoop – the free platform for working with big data
• History
• Yahoo incubation
• Platform fragmentation
• The current usage, small-scale to enterprise
4. How to apply the concepts of big data
• Load data how you find it
• Process it when you can
• Project it into various schemas on the fly
• Push it back to where you need it
5. The basics
• What it’s good for
• What can’t it do
• Disadvantages and opportunities
• Key big data use cases
6. Introduction to HDFS
• HDFS walkthrough
• Using HDFS
• Robustness
• Data Replication
• Gotchas
7. MapReduce – the core big data function
• Map explained
• Sort and shuffle
• Reduce
• A few practical applications
Exercise – Hadoop, HDFS, and MapReduce: Let’s try it!
8. YARN
• How it fits
• How it works
• Resource Manager
• Application Master
9. PIG
• What it is
• How it works
• Compatibilities
• Advantages
• Disadvantages
Exercise – YARN and PIG: Let’s try it!
10. Processing Data
• The Piggy Bank
• Loading and Illustrating the data
• Writing a Query
• Storing the Result
11. HIVE
• Data warehousing
• What it is, what it’s not
• Language compatibilities
• Advantages and disadvantages
• An advance look at HIVE on Spark
Exercise – HIVE: Let’s try it!
• Cloud demonstration: Contextual advertising
12. OOZIE
• What it is
• Complex workflow environments
• Reducing time-to-market
• Frequency execution
• How it works with other big data tools
• Cloud demonstration: How to run a job
13. FLUME – stream, collect, store & analyze high-volume log data
How it works:
• Event
• Source
• Sink
• Channel
• Agent and client
• Cloud demonstration: FLUME in action
14. Spark
• Move over 2012 tools: introducing Apache Spark
• The new open source cluster framework
• How SPARK performs 100 times faster
• Performance comparison of Spark and Hadoop
• What else can it do?
15. HBASE
• How HBASE works
• Common use cases
16. Using External Tools
Start Date: 04/29/2019
End Date: 04/30/2019
Location: Virtual
Brand: Professional Development