Hadoop For Administrators
Corso
Online
Hai bisogno di un coach per la formazione?
Ti aiuterà a confrontare vari corsi e trovare l'offerta formativa più conveniente.
Descrizione
-
Tipologia
Corso
-
Metodologia
Online
-
Inizio
Scegli data
Apache Hadoop è il framework più popolare per l'elaborazione di Big Data su cluster di server In questo corso di tre (opzionalmente, quattro) giorni, i partecipanti apprenderanno i vantaggi aziendali e i casi d'uso di Hadoop e del suo ecosistema, come pianificare la distribuzione e la crescita dei cluster, come installare, mantenere, monitorare, risolvere e ottimizzare Hadoop Praticheranno inoltre il carico dei dati di massa dei cluster, acquisiranno familiarità con le varie distribuzioni Hadoop e pratichino l'installazione e la gestione degli strumenti dell'ecosistema Hadoop Il corso termina con la discussione sulla messa in sicurezza del cluster con Kerberos " I materiali erano ben preparati e coperti a fondo Il laboratorio è stato molto utile e ben organizzato " - Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising Pubblico Amministratori Hadoop Formato Lezioni frontali e esercitazioni pratiche, bilancio approssimativo 60% lezioni frontali, 40% laboratori .
Machine Translated
Sedi e date
Luogo
Inizio del corso
Inizio del corso
Profilo del corso
comfortable with basic Linux system administration
basic scripting skills
Knowledge of Hadoop and Distributed Computing is not required, but will be introduced and explained in the course.
Lab environment
Zero Install : There is no need to install hadoop software on students’ machines! A working hadoop cluster will be provided for students.
Students will need the following
an SSH client (Linux and Mac already have ssh clients, for Windows Putty is recommended)
a browser to access the cluster. We recommend Firefox browser with FoxyProxy extension installed
Opinioni
Programma
- Introduction
- Hadoop history, concepts
- Ecosystem
- Distributions
- High level architecture
- Hadoop myths
- Hadoop challenges (hardware / software)
- Labs: discuss your Big Data projects and problems
- Planning and installation
- Selecting software, Hadoop distributions
- Sizing the cluster, planning for growth
- Selecting hardware and network
- Rack topology
- Installation
- Multi-tenancy
- Directory structure, logs
- Benchmarking
- Labs: cluster install, run performance benchmarks
- HDFS operations
- Concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring
- Command-line and browser-based administration
- Adding storage, replacing defective drives
- Labs: getting familiar with HDFS command lines
- Data ingestion
- Flume for logs and other data ingestion into HDFS
- Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
- Hadoop data warehousing with Hive
- Copying data between clusters (distcp)
- Using S3 as complementary to HDFS
- Data ingestion best practices and architectures
- Labs: setting up and using Flume, the same for Sqoop
- MapReduce operations and administration
- Parallel computing before mapreduce: compare HPC vs Hadoop administration
- MapReduce cluster loads
- Nodes and Daemons (JobTracker, TaskTracker)
- MapReduce UI walk through
- Mapreduce configuration
- Job config
- Optimizing MapReduce
- Fool-proofing MR: what to tell your programmers
- Labs: running MapReduce examples
- YARN: new architecture and new capabilities
- YARN design goals and implementation architecture
- New actors: ResourceManager, NodeManager, Application Master
- Installing YARN
- Job scheduling under YARN
- Labs: investigate job scheduling
- Advanced topics
- Hardware monitoring
- Cluster monitoring
- Adding and removing servers, upgrading Hadoop
- Backup, recovery and business continuity planning
- Oozie job workflows
- Hadoop high availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: set up monitoring
- Optional tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5)
- Ambari for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)
Hai bisogno di un coach per la formazione?
Ti aiuterà a confrontare vari corsi e trovare l'offerta formativa più conveniente.
Hadoop For Administrators