Setting Up a Single-Node Local Spark Environment

My system is macOS; the setup steps are as follows:

  • Download Spark

Download page: http://spark.apache.org/downloads.html

The version I used: https://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz

  • Extract the archive and change into the directory:

    tar zxvf spark-2.1.1-bin-hadoop2.7.tgz
    cd spark-2.1.1-bin-hadoop2.7

  • Start the shell

    ./bin/spark-shell
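
The shell automatically creates a SparkContext and binds it to the name sc (Spark 2.x also exposes a SparkSession named spark); the statements in the next step use sc directly. As an optional sanity check, you can ask it which master it is connected to and which Spark version it is running; for a plain local start the master should be local[*]:

    scala> sc.master
    scala> sc.version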

  • Enter the following statements in the shell, one at a time, and observe the results:

    scala> val textFile = sc.textFile("README.md")
    scala> textFile.count()
    scala> textFile.first()
    scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
    scala> textFile.filter(line => line.contains("Spark")).count()
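
If you want to experiment a bit further, these optional statements (modeled on the word-count examples in the official quick start) use map, reduce, flatMap, and reduceByKey to find the word count of the longest line and to build a simple per-word count:

    scala> textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
    scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    scala> wordCounts.collect()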

  • Run a standalone program

First, install sbt. sbt is to Scala what Maven is to Java: it manages project dependencies and builds the project. On macOS, running brew install sbt is all it takes.

Because sbt can be extremely slow at downloading dependencies from its default repositories, switch it to a mirror before you start using it. This is easy: create a file named repositories in the ~/.sbt directory with the following content:

    [repositories]
    #local
    public: http://maven.aliyun.com/nexus/content/groups/public/
    typesafe: http://dl.bintray.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
    ivy-sbt-plugin: http://dl.bintray.com/sbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
    sonatype-oss-releases
    sonatype-oss-snapshots

Now go back to the spark-2.1.1-bin-hadoop2.7 directory and create the following directory structure:

    ./src
    ./src/main
    ./src/main/scala
    ./src/main/scala/example.scala
    ./simple.sbt
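
One way to create this layout, assuming you are still inside spark-2.1.1-bin-hadoop2.7 and using a standard Unix shell, is:

    mkdir -p src/main/scala
    touch src/main/scala/example.scala simple.sbt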

The file simple.sbt contains:

    name := "Simple Project"

    version := "1.0"

    scalaVersion := "2.11.8"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1"
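
A note on the last line: in sbt, %% appends the project's Scala binary version to the artifact name, so with scalaVersion 2.11.8 the dependency above resolves to spark-core_2.11. It is equivalent to writing the artifact name out explicitly:

    libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.1"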

The file example.scala contains:

    /* example.scala */
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf

    object SimpleApp {
      def main(args: Array[String]) {
        val logFile = "./README.md"  // the README.md shipped with the Spark distribution
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        // Read the file as an RDD of lines and cache it, since it is scanned twice below
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println(s"Lines with a: $numAs, Lines with b: $numBs")
        sc.stop()
      }
    }

Then run the command sbt package; it compiles the source and packages it into a jar under target/scala-2.11/.

Then submit the resulting jar to Spark:

    ./bin/spark-submit --class "SimpleApp" target/scala-2.11/simple-project_2.11-1.0.jar
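
spark-submit also accepts a --master option if you want to be explicit about where the job runs; for example, --master "local[4]" runs it locally with four worker threads (optional here, since with no master configured spark-submit already defaults to local mode):

    ./bin/spark-submit --class "SimpleApp" --master "local[4]" target/scala-2.11/simple-project_2.11-1.0.jar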

If everything worked, somewhere in the fairly verbose log output you should see the line printed by the program, of the form:

    Lines with a: <count>, Lines with b: <count>

At this point, the single-node local Spark environment is up and running!
