Hadoop For Administrators

Corso

Online

Prezzo da consultare

Chiama il centro

Hai bisogno di un coach per la formazione?

Ti aiuterà a confrontare vari corsi e trovare l'offerta formativa più conveniente.

Descrizione

  • Tipologia

    Corso

  • Metodologia

    Online

  • Inizio

    Scegli data

Apache Hadoop è il framework più popolare per l'elaborazione di Big Data su cluster di server In questo corso di tre (opzionalmente, quattro) giorni, i partecipanti apprenderanno i vantaggi aziendali e i casi d'uso di Hadoop e del suo ecosistema, come pianificare la distribuzione e la crescita dei cluster, come installare, mantenere, monitorare, risolvere e ottimizzare Hadoop Praticheranno inoltre il carico dei dati di massa dei cluster, acquisiranno familiarità con le varie distribuzioni Hadoop e pratichino l'installazione e la gestione degli strumenti dell'ecosistema Hadoop Il corso termina con la discussione sulla messa in sicurezza del cluster con Kerberos " I materiali erano ben preparati e coperti a fondo Il laboratorio è stato molto utile e ben organizzato " - Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising Pubblico Amministratori Hadoop Formato Lezioni frontali e esercitazioni pratiche, bilancio approssimativo 60% lezioni frontali, 40% laboratori .
Machine Translated

Sedi e date

Luogo

Inizio del corso

Online

Inizio del corso

Scegli dataIscrizioni aperte

Profilo del corso

comfortable with basic Linux system administration
basic scripting skills
Knowledge of Hadoop and Distributed Computing is not required, but will be introduced and explained in the course.
Lab environment
Zero Install : There is no need to install hadoop software on students’ machines! A working hadoop cluster will be provided for students.
Students will need the following
an SSH client (Linux and Mac already have ssh clients, for Windows Putty is recommended)
a browser to access the cluster. We recommend Firefox browser with FoxyProxy extension installed

Domande e risposte

Aggiungi la tua domanda

I nostri consulenti e altri utenti potranno risponderti

Chi vuoi che ti risponda?

Inserisci i tuoi dati per ricevere una risposta

Pubblicheremo solo il tuo nome e la domanda

Opinioni

Programma

  • Introduction
    • Hadoop history, concepts
    • Ecosystem
    • Distributions
    • High level architecture
    • Hadoop myths
    • Hadoop challenges (hardware / software)
    • Labs: discuss your Big Data projects and problems
  • Planning and installation
    • Selecting software, Hadoop distributions
    • Sizing the cluster, planning for growth
    • Selecting hardware and network
    • Rack topology
    • Installation
    • Multi-tenancy
    • Directory structure, logs
    • Benchmarking
    • Labs: cluster install, run performance benchmarks
  • HDFS operations
    • Concepts (horizontal scaling, replication, data locality, rack awareness)
    • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
    • Health monitoring
    • Command-line and browser-based administration
    • Adding storage, replacing defective drives
    • Labs: getting familiar with HDFS command lines
  • Data ingestion
    • Flume for logs and other data ingestion into HDFS
    • Sqoop for importing from SQL databases to HDFS, as well as exporting back to SQL
    • Hadoop data warehousing with Hive
    • Copying data between clusters (distcp)
    • Using S3 as complementary to HDFS
    • Data ingestion best practices and architectures
    • Labs: setting up and using Flume, the same for Sqoop
  • MapReduce operations and administration
    • Parallel computing before mapreduce: compare HPC vs Hadoop administration
    • MapReduce cluster loads
    • Nodes and Daemons (JobTracker, TaskTracker)
    • MapReduce UI walk through
    • Mapreduce configuration
    • Job config
    • Optimizing MapReduce
    • Fool-proofing MR: what to tell your programmers
    • Labs: running MapReduce examples
  • YARN: new architecture and new capabilities
    • YARN design goals and implementation architecture
    • New actors: ResourceManager, NodeManager, Application Master
    • Installing YARN
    • Job scheduling under YARN
    • Labs: investigate job scheduling
  • Advanced topics
    • Hardware monitoring
    • Cluster monitoring
    • Adding and removing servers, upgrading Hadoop
    • Backup, recovery and business continuity planning
    • Oozie job workflows
    • Hadoop high availability (HA)
    • Hadoop Federation
    • Securing your cluster with Kerberos
    • Labs: set up monitoring
  • Optional tracks
    • Cloudera Manager for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Cloudera distribution environment (CDH5)
    • Ambari for cluster administration, monitoring, and routine tasks; installation, use. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0)

Chiama il centro

Hai bisogno di un coach per la formazione?

Ti aiuterà a confrontare vari corsi e trovare l'offerta formativa più conveniente.

Hadoop For Administrators

Prezzo da consultare