Installing Hadoop and Hive on a MacBook
    brew install hadoop
    brew install hive
Configuring Hadoop
The Hadoop version installed here is 3.3.4.
Add a hostname mapping to /etc/hosts (the file is only writable by root, hence sudo tee):
    echo "127.0.0.1 wanghongxing" | sudo tee -a /etc/hosts
The install directory is /usr/local/Cellar/hadoop/3.3.4/. Go into that directory and create the working directories:
    cd /usr/local/Cellar/hadoop/3.3.4/
    mkdir tmp
    mkdir -p dfs/name
    mkdir hadoop
Go into the libexec directory and edit the configuration files under etc/hadoop:
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
Edit core-site.xml. This sets Hadoop's temporary directory and default file system; localhost:9000 refers to the local machine. To use a remote host instead, replace it with the corresponding IP address; if you fill in the remote host's domain name, you also need to add a DNS mapping for it in /etc/hosts. Configure core-site.xml as follows:
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/Cellar/hadoop/3.3.4/tmp</value>
      </property>
    </configuration>
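To sanity-check that Hadoop actually picks up this setting (a quick check, assuming the Homebrew-installed hadoop binaries are on your PATH), you can query the effective configuration:

    hdfs getconf -confKey fs.defaultFS
    # expected output: hdfs://localhost:9000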
Edit hdfs-site.xml as follows; note that the name and data paths must both be replaced with your own local paths:
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/Cellar/hadoop/3.3.4/dfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/Cellar/hadoop/3.3.4/hadoop</value>
      </property>
    </configuration>
Edit mapred-site.xml. In older Hadoop releases, etc/hadoop only ships a mapred-site.xml.template; if that is what you have, copy it to mapred-site.xml (in Hadoop 3.x the file is usually already present). Either way, set YARN as the data-processing framework:
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
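One property that is not in the original configuration but is often needed on Hadoop 3.x: jobs submitted to YARN must be able to find the MapReduce jars, otherwise the bundled example jobs can fail with class-not-found errors. A commonly added entry (only add it if you hit that error) is:

    <property>
      <name>mapreduce.application.classpath</name>
      <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>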
Edit yarn-site.xml to configure YARN as the processing framework:
    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8088</value>
      </property>
    </configuration>
NameNode setup: format the NameNode:
    hdfs namenode -format
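The post goes straight from formatting to verification, so for completeness, start the daemons first (a minimal sketch; it assumes you run it from /usr/local/Cellar/hadoop/3.3.4/libexec, where Homebrew keeps the sbin scripts):

    cd /usr/local/Cellar/hadoop/3.3.4/libexec
    sbin/start-all.sh   # starts NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
    jps                 # each of the daemons above should show up in this list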
Verifying Hadoop
Access Hadoop in a browser. The default port for the Hadoop web UI is 9870 (older versions used 50070). Use the following URL to reach it: http://localhost:9870/
Verify the cluster's applications. The default port for viewing all applications on the cluster is 8088. Use the following URL to access it: http://localhost:8088/
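To check that HDFS and YARN work end to end, you can submit one of the example jobs shipped with Hadoop and watch it show up in the 8088 UI. This is not in the original post; the jar path below assumes the standard Hadoop 3.3.4 layout under the Homebrew libexec directory:

    cd /usr/local/Cellar/hadoop/3.3.4/libexec
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 5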
Restarting and shutting down
    sbin/stop-all.sh
    rm -rf tmp/dfs      # clears the temporary DFS data (destroys HDFS contents) for a clean restart
    sbin/start-all.sh
At this point, Hadoop looked fine.
Configuring Hive
Add the following to ~/.bash_profile:
    # Setting PATH for Hive
    export HIVE_HOME=/usr/local/Cellar/hive/3.1.3/libexec
    export PATH=$PATH:$HIVE_HOME/bin
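After editing, reload the profile and confirm the hive launcher resolves from the new path (a quick sanity check, assuming the Homebrew path used above):

    source ~/.bash_profile
    which hive    # should print /usr/local/Cellar/hive/3.1.3/libexec/bin/hive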
MySQL: create the metastore database and a user for Hive. The user name must match javax.jdo.option.ConnectionUserName in hive-site.xml below:
    CREATE DATABASE hivestore;
    CREATE USER 'hadoop'@'%' IDENTIFIED BY '123456';
    GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'%' WITH GRANT OPTION;
    FLUSH PRIVILEGES;
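Before wiring this account into Hive, it is worth confirming the credentials actually work (assuming MySQL runs locally on the default port 3306):

    mysql -u hadoop -p123456 -e "SHOW DATABASES;"    # hivestore should appear in the list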
Edit the Hive configuration file:
    cd $HIVE_HOME/conf
    cp hive-default.xml.template hive-site.xml
    vim hive-site.xml
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hadoop</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hivestore</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
      </property>
      <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/usr/local/Cellar/hive/3.1.3/libexec/iotmp</value>
      </property>
      <property>
        <name>hive.querylog.location</name>
        <value>/usr/local/Cellar/hive/3.1.3/libexec/iotmp</value>
      </property>
      <property>
        <name>hive.downloaded.resources.dir</name>
        <value>/usr/local/Cellar/hive/3.1.3/libexec/iotmp</value>
      </property>
    </configuration>
Download the MySQL connector
Get mysql-connector from https://dev.mysql.com/downloads/connector/j/ , choosing "Platform Independent" as the operating system. After unpacking it, copy the jar file into /usr/local/Cellar/hive/3.1.3/libexec/lib.
Create an iotmp folder inside /usr/local/Cellar/hive/3.1.3/libexec/ (i.e. $HIVE_HOME).
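Spelled out as commands (the connector jar file name and download path below are only examples; substitute whatever version you actually downloaded):

    mkdir -p $HIVE_HOME/iotmp
    cp ~/Downloads/mysql-connector-j-8.0.33/mysql-connector-j-8.0.33.jar $HIVE_HOME/lib/   # example path and version
    ls $HIVE_HOME/lib | grep -i mysql    # confirm the driver is in place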
Initialize the metastore schema
In the /usr/local/Cellar/hive/3.1.3/libexec/bin directory:
    schematool -initSchema -dbType mysql
Check the initialization info:
    schematool -dbType mysql -info
Starting Hive
    wanghongxing:~ whx$ hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/Cellar/hive/3.1.3/libexec/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/3.3.4/libexec/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Hive Session ID = 8a401692-9f09-41f6-babb-0ac77fd8eb16
    Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/3.1.3/libexec/lib/hive-common-3.1.3.jar!/hive-log4j2.properties Async: true
    Hive Session ID = 380b0568-e219-453f-abf8-12eb4d7cf331
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> show databases;
    OK
    default
    Time taken: 0.798 seconds, Fetched: 1 row(s)
    hive>
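A small smoke test that only exercises the metastore (so it works even if MapReduce execution later turns out to be broken); the database and table names here are arbitrary examples:

    hive> create database demo;
    hive> use demo;
    hive> create table t (id int);
    hive> show tables;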
After fiddling with this for quite a while I went back over Hadoop, and startup kept failing. Since my MacBook has an ARM (Apple Silicon) chip, I decided to switch to Docker instead.
I cloned the code from https://github.com/big-data-europe/docker-hadoop.
Looking at its image-build script, running make build is enough to build all the images.
    DOCKER_NETWORK = docker-hadoop_default
    ENV_FILE = hadoop.env
    current_branch := $(shell git rev-parse --abbrev-ref HEAD)

    build:
        docker build -t bde2020/hadoop-base:$(current_branch) ./base
        docker build -t bde2020/hadoop-namenode:$(current_branch) ./namenode
        docker build -t bde2020/hadoop-datanode:$(current_branch) ./datanode
        docker build -t bde2020/hadoop-resourcemanager:$(current_branch) ./resourcemanager
        docker build -t bde2020/hadoop-nodemanager:$(current_branch) ./nodemanager
        docker build -t bde2020/hadoop-historyserver:$(current_branch) ./historyserver
        docker build -t bde2020/hadoop-submit:$(current_branch) ./submit

    wordcount:
        docker build -t hadoop-wordcount ./submit
        docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -mkdir -p /input/
        docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -copyFromLocal -f /opt/hadoop-3.2.3/README.txt /input/
        docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} hadoop-wordcount
        docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -cat /output/*
        docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -rm -r /output
        docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -rm -r /input
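Once the images are built, the repository's docker-compose.yml is the usual way to bring the cluster up; a minimal session, assuming the default compose file from that repo, looks like this:

    docker-compose up -d
    docker ps    # should show namenode, datanode, resourcemanager, nodemanager, historyserver containers
    # NameNode web UI: http://localhost:9870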
Also, the base Dockerfile is built on Debian, where installing an arm64 JDK is awkward, so I changed the base image to openjdk:11 and dropped the corresponding JDK installation steps; while at it I bumped Hadoop to 3.2.3.
    FROM openjdk:11
    MAINTAINER Ivan Ermilov <ivan.s.ermilov@gmail.com>
    MAINTAINER Giannis Mouchakis <gmouchakis@iit.demokritos.gr>

    RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
          net-tools \
          curl \
          netcat \
          gnupg \
          libsnappy-dev \
        && rm -rf /var/lib/apt/lists/*

    # ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

    RUN curl -O https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
    RUN gpg --import KEYS

    ENV HADOOP_VERSION 3.2.3
    ENV HADOOP_URL https://www.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz

    RUN set -x \
        && curl -fSL "$HADOOP_URL" -o /tmp/hadoop.tar.gz \
        && curl -fSL "$HADOOP_URL.asc" -o /tmp/hadoop.tar.gz.asc \
        && gpg --verify /tmp/hadoop.tar.gz.asc \
        && tar -xvf /tmp/hadoop.tar.gz -C /opt/ \
        && rm /tmp/hadoop.tar.gz*

    RUN ln -s /opt/hadoop-$HADOOP_VERSION/etc/hadoop /etc/hadoop
    RUN mkdir /opt/hadoop-$HADOOP_VERSION/logs
    RUN mkdir /hadoop-data

    ENV HADOOP_HOME=/opt/hadoop-$HADOOP_VERSION
    ENV HADOOP_CONF_DIR=/etc/hadoop
    ENV MULTIHOMED_NETWORK=1
    ENV USER=root
    ENV PATH $HADOOP_HOME/bin/:$PATH

    ADD entrypoint.sh /entrypoint.sh
    RUN chmod a+x /entrypoint.sh

    ENTRYPOINT ["/entrypoint.sh"]
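To confirm the rebuilt base image really comes out as arm64 (the image name and tag here simply mirror the Makefile above; this check is not part of the original post):

    docker build -t bde2020/hadoop-base:local ./base
    docker image inspect --format '{{.Architecture}}' bde2020/hadoop-base:local    # expect arm64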
With this in place Hadoop itself worked fine, but as soon as I added Hive there were always problems.
So I took out my Intel MacBook and did it that way instead.