2015年版 Hadoopを10分で試す〜Docker編〜

今年は死ぬほど忙しいので、アドベントカレンダー全部俺は断念しました。。。

Hadoopを10分で(Dockerで)試す

が、例年「Hadoopを10分で試す」というブログを書いてきたので、今年も書いておきましょう。ちょうどいいタイミングでDockerのブログが公開されていたので、今年はDockerで試してみます。

MacOSにDocker環境を準備する

WindowsやMacでDockerを使う場合、Docker Toolboxを利用することができます。
Docker Toolbox
今回はMacOSに環境を作成し、その上でHadoopを動かします。

Docker Toolboxのインストール

パッケージをダウンロードしてインストールを開始します。
docker1「続ける」をクリック
docker2概要を読んだら「続ける」をクリック
docer3インストール先を選択し、「続ける」をクリック
docker4インストールの種類もデフォルトのままインストール
docker5「続ける」をクリック
docker6完了しました。特に引っかかるところはありません。

Docker Quickstart Terminalを実行

docker6.5無事に起動しました。

Cloudera QuickstartをDockerで試す

さて、Dockerの環境ができたので、Cloudera Quickstartを実行します。
[code]
## .
## ## ## ==
## ## ## ## ## ===
/"""""""""""""""""\___/ ===
~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ / ===- ~~~
\______ o __/
\ \ __/
\____\_______/
docker is configured to use the default machine with IP 192.168.99.100
For help getting started, check out the docs at https://docs.docker.com
mbp:~ kawasaki$
[/code]

Docker Imageを取得

docker pull で最新のイメージを取得します。
[code language=”shell”]
mbp: :~ kawasaki$ docker pull cloudera/quickstart:latest
latest: Pulling from cloudera/quickstart
7e0ff6dfbec6: Pull complete
Digest: sha256:a317b8a32591990eb2ee52149ead298b1c387242a3a36da48cdc2133e04a9881
Status: Downloaded newer image for cloudera/quickstart:latest
[/code]

取得したイメージを確認します。

[code language=”shell”]
mbp:~ kawasaki$ docker images
REPOSITORY            TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
cloudera/quickstart   latest              7e0ff6dfbec6        32 hours ago        6.213 GB
hello-world           latest              0a6ba66e537a        7 weeks ago         960 B
mbp:~ kawasaki$
[/code]
IMAGE ID は 7e0ff6dfbec6 でした。

Dockerを実行する

このIDを引数に指定して、新しいコンテナでQuickstartを実行します。
[code language=”shell”]
mbp:~ kawasaki$ docker run –privileged=true –hostname=quickstart.cloudera -t -i 7e0ff6dfbec6  /usr/bin/docker-quickstart
Starting mysqld:                                           [  OK  ]
if [ "$1" == "start" ] ; then
    if [ "${EC2}" == ‘true’ ]; then
        FIRST_BOOT_FLAG=/var/lib/cloudera-quickstart/.ec2-key-installed
        if [ ! -f "${FIRST_BOOT_FLAG}" ]; then
            METADATA_API=http://169.254.169.254/latest/meta-data
            KEY_URL=${METADATA_API}/public-keys/0/openssh-key
            SSH_DIR=/home/cloudera/.ssh
            mkdir -p ${SSH_DIR}
            chown cloudera:cloudera ${SSH_DIR}
            curl ${KEY_URL} >> ${SSH_DIR}/authorized_keys
            touch ${FIRST_BOOT_FLAG}
        fi
    fi
    if [ "${DOCKER}" != ‘true’ ]; then
        if [ -f /sys/kernel/mm/redhat_transparent_hugepage/defrag ]; then
            echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
        fi
        cloudera-quickstart-ip
        HOSTNAME=quickstart.cloudera
        hostname ${HOSTNAME}
        sed -i -e "s/HOSTNAME=.*/HOSTNAME=${HOSTNAME}/" /etc/sysconfig/network
    fi
    (
        cd /var/lib/cloudera-quickstart/tutorial;
        nohup python -m SimpleHTTPServer 80 &
    )
    # TODO: check for expired CM license and update config.js accordingly
fi
+ ‘[‘ start == start ‘]’
+ ‘[‘ ” == true ‘]’
+ ‘[‘ true ‘!=’ true ‘]’
+ cd /var/lib/cloudera-quickstart/tutorial
+ nohup python -m SimpleHTTPServer 80
nohup: appending output to `nohup.out’
JMX enabled by default
Using config: /etc/zookeeper/conf/zoo.cfg
Starting zookeeper … STARTED
starting datanode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-datanode-quickstart.cloudera.out
Started Hadoop datanode (hadoop-hdfs-datanode):            [  OK  ]
starting journalnode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-journalnode-quickstart.cloudera.out
Started Hadoop journalnode:                                [  OK  ]
starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-quickstart.cloudera.out
Started Hadoop namenode:                                   [  OK  ]
starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-quickstart.cloudera.out
Started Hadoop secondarynamenode:                          [  OK  ]
Setting HTTPFS_HOME:          /usr/lib/hadoop-httpfs
Using   HTTPFS_CONFIG:        /etc/hadoop-httpfs/conf
Sourcing:                    /etc/hadoop-httpfs/conf/httpfs-env.sh
Using   HTTPFS_LOG:           /var/log/hadoop-httpfs/
Using   HTTPFS_TEMP:           /var/run/hadoop-httpfs
Setting HTTPFS_HTTP_PORT:     14000
Setting HTTPFS_ADMIN_PORT:     14001
Setting HTTPFS_HTTP_HOSTNAME: quickstart.cloudera
Setting HTTPFS_SSL_ENABLED: false
Setting HTTPFS_SSL_KEYSTORE_FILE:     /var/lib/hadoop-httpfs/.keystore
Setting HTTPFS_SSL_KEYSTORE_PASS:     password
Using   CATALINA_BASE:       /var/lib/hadoop-httpfs/tomcat-deployment
Using   HTTPFS_CATALINA_HOME:       /usr/lib/bigtop-tomcat
Setting CATALINA_OUT:        /var/log/hadoop-httpfs//httpfs-catalina.out
Using   CATALINA_PID:        /var/run/hadoop-httpfs/hadoop-httpfs-httpfs.pid
Using   CATALINA_OPTS:       
Adding to CATALINA_OPTS:     -Dhttpfs.home.dir=/usr/lib/hadoop-httpfs -Dhttpfs.config.dir=/etc/hadoop-httpfs/conf -Dhttpfs.log.dir=/var/log/hadoop-httpfs/ -Dhttpfs.temp.dir=/var/run/hadoop-httpfs -Dhttpfs.admin.port=14001 -Dhttpfs.http.port=14000 -Dhttpfs.http.hostname=quickstart.cloudera -Dhttpfs.ssl.enabled=false -Dhttpfs.ssl.keystore.file=/var/lib/hadoop-httpfs/.keystore -Dhttpfs.ssl.keystore.pass=password
Using CATALINA_BASE:   /var/lib/hadoop-httpfs/tomcat-deployment
Using CATALINA_HOME:   /usr/lib/bigtop-tomcat
Using CATALINA_TMPDIR: /var/run/hadoop-httpfs
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /usr/lib/bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /var/run/hadoop-httpfs/hadoop-httpfs-httpfs.pid
Started Hadoop httpfs (hadoop-httpfs):                     [  OK  ]
starting historyserver, logging to /var/log/hadoop-mapreduce/mapred-mapred-historyserver-quickstart.cloudera.out
Started Hadoop historyserver:                              [  OK  ]
starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-quickstart.cloudera.out
Started Hadoop nodemanager:                                [  OK  ]
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-quickstart.cloudera.out
Started Hadoop resourcemanager:                            [  OK  ]
starting master, logging to /var/log/hbase/hbase-hbase-master-quickstart.cloudera.out
Started HBase master daemon (hbase-master):                [  OK  ]
starting rest, logging to /var/log/hbase/hbase-hbase-rest-quickstart.cloudera.out
Started HBase rest daemon (hbase-rest):                    [  OK  ]
starting thrift, logging to /var/log/hbase/hbase-hbase-thrift-quickstart.cloudera.out
Started HBase thrift daemon (hbase-thrift):                [  OK  ]
Starting Hive Metastore (hive-metastore):                  [  OK  ]
Started Hive Server2 (hive-server2):                       [  OK  ]
Starting Sqoop Server:                                     [  OK  ]
Sqoop home directory: /usr/lib/sqoop2
Setting SQOOP_HTTP_PORT:     12000
Setting SQOOP_ADMIN_PORT:     12001
Using   CATALINA_OPTS:       -Xmx1024m
Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE:   /var/lib/sqoop2/tomcat-deployment
Using CATALINA_HOME:   /usr/lib/bigtop-tomcat
Using CATALINA_TMPDIR: /var/tmp/sqoop2
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /usr/lib/bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /var/run/sqoop2/sqoop-server-sqoop2.pid
Starting Spark history-server (spark-history-server):      [  OK  ]
Starting Hadoop HBase regionserver daemon: starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-quickstart.cloudera.out
hbase-regionserver.
Starting hue:                                              [FAILED]
Started Impala State Store Server (statestored):           [  OK  ]
Setting OOZIE_HOME:          /usr/lib/oozie
Sourcing:                    /usr/lib/oozie/bin/oozie-env.sh
  setting JAVA_LIBRARY_PATH="$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native"
  setting OOZIE_DATA=/var/lib/oozie
  setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
  setting CATALINA_TMPDIR=/var/lib/oozie
  setting CATALINA_PID=/var/run/oozie/oozie.pid
  setting CATALINA_BASE=/var/lib/oozie/tomcat-deployment
  setting OOZIE_HTTPS_PORT=11443
  setting OOZIE_HTTPS_KEYSTORE_PASS=password
  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.port=${OOZIE_HTTPS_PORT}"
  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.keystore.pass=${OOZIE_HTTPS_KEYSTORE_PASS}"
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
  setting OOZIE_CONFIG=/etc/oozie/conf
  setting OOZIE_LOG=/var/log/oozie
Using   OOZIE_CONFIG:        /etc/oozie/conf
Sourcing:                    /etc/oozie/conf/oozie-env.sh
  setting JAVA_LIBRARY_PATH="$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native"
  setting OOZIE_DATA=/var/lib/oozie
  setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat
  setting CATALINA_TMPDIR=/var/lib/oozie
  setting CATALINA_PID=/var/run/oozie/oozie.pid
  setting CATALINA_BASE=/var/lib/oozie/tomcat-deployment
  setting OOZIE_HTTPS_PORT=11443
  setting OOZIE_HTTPS_KEYSTORE_PASS=password
  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.port=${OOZIE_HTTPS_PORT}"
  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.keystore.pass=${OOZIE_HTTPS_KEYSTORE_PASS}"
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
  setting OOZIE_CONFIG=/etc/oozie/conf
  setting OOZIE_LOG=/var/log/oozie
Setting OOZIE_CONFIG_FILE:   oozie-site.xml
Using   OOZIE_DATA:          /var/lib/oozie
Using   OOZIE_LOG:           /var/log/oozie
Setting OOZIE_LOG4J_FILE:    oozie-log4j.properties
Setting OOZIE_LOG4J_RELOAD:  10
Setting OOZIE_HTTP_HOSTNAME: quickstart.cloudera
Setting OOZIE_HTTP_PORT:     11000
Setting OOZIE_ADMIN_PORT:     11001
Using   OOZIE_HTTPS_PORT:     11443
Setting OOZIE_BASE_URL:      http://quickstart.cloudera:11000/oozie
Using   CATALINA_BASE:       /var/lib/oozie/tomcat-deployment
Setting OOZIE_HTTPS_KEYSTORE_FILE:     /var/lib/oozie/.keystore
Using   OOZIE_HTTPS_KEYSTORE_PASS:     password
Setting OOZIE_INSTANCE_ID:       quickstart.cloudera
Setting CATALINA_OUT:        /var/log/oozie/catalina.out
Using   CATALINA_PID:        /var/run/oozie/oozie.pid
Using   CATALINA_OPTS:        -Doozie.https.port=11443 -Doozie.https.keystore.pass=password -Xmx1024m -Doozie.https.port=11443 -Doozie.https.keystore.pass=password -Xmx1024m -Dderby.stream.error.file=/var/log/oozie/derby.log
Adding to CATALINA_OPTS:     -Doozie.home.dir=/usr/lib/oozie -Doozie.config.dir=/etc/oozie/conf -Doozie.log.dir=/var/log/oozie -Doozie.data.dir=/var/lib/oozie -Doozie.instance.id=quickstart.cloudera -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=quickstart.cloudera -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://quickstart.cloudera:11000/oozie -Doozie.https.keystore.file=/var/lib/oozie/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native
Using CATALINA_BASE:   /var/lib/oozie/tomcat-deployment
Using CATALINA_HOME:   /usr/lib/bigtop-tomcat
Using CATALINA_TMPDIR: /var/lib/oozie
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /usr/lib/bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /var/run/oozie/oozie.pid
Starting Solr server daemon:                               [  OK  ]
Using CATALINA_BASE:   /var/lib/solr/tomcat-deployment
Using CATALINA_HOME:   /usr/lib/solr/../bigtop-tomcat
Using CATALINA_TMPDIR: /var/lib/solr/
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /usr/lib/solr/../bigtop-tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /var/run/solr/solr.pid
Started Impala Catalog Server (catalogd) :                 [  OK  ]
Started Impala Server (impalad):                           [  OK  ]
[root@quickstart /]#
[/code]
無事に起動しました!(が、デフォルトのメインメモリ2GBでは重すぎた。使い物にならん。。。。が、メモリを増やしても改善しない?)
 
なお、Cloudera Quickstart Docker Imageのドキュメントはこちらから。
https://hub.docker.com/r/cloudera/quickstart/
 
#Dockerの書籍をお探しなら@enakai00さんの「Docker実践入門」がオススメです!(宣伝)
 
 

コメント