The code that pulls data with Spark can be implemented inside a Controller:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.ModelAndView;

/**
 * @author flash胜龙
 */
@RestController
public class DataFigureController {

    @RequestMapping("/dataimportlocal.html")
    public ModelAndView dataimportlocal() {
        // Point Hadoop at a local installation directory and set the Hadoop user name
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.7.2test");
        System.setProperty("HADOOP_USER_NAME", "hadoop");

        // Start (or reuse) a local Spark session
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("Word Count")
                .config("spark.sql.warehouse.dir", "file:///d:/tmp")
                .getOrCreate();

        // Read the CSV file, treating the first line as the header
        Dataset<Row> df = spark.read().option("header", true).csv("D:\\book.csv");
        df.show();

        return new ModelAndView("dataimport");
    }
}
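When the data in a Spark RDD is wrapped into Row objects, date type conversion becomes an issue.
For date columns, rows built with org.apache.spark.sql.RowFactory accept only java.sql.Date by default, not java.util.Date: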
// util.Date to sql.Date
java.util.Date utilDate = new java.util.Date(); // current time
java.sql.Date sqlDate = new java.sql.Date(utilDate.getTime());

// sql.Date to util.Date
java.sql.Date sqlDate1 = new java.sql.Date(new java.util.Date().getTime());
java.util.Date utilDate1 = new java.util.Date(sqlDate1.getTime());
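The conversions above can be collected into a small helper. This is only a minimal sketch: the class name DateConversions and its method names are hypothetical and not part of Spark or of the original project, and parseIsoDate assumes the CSV stores dates as yyyy-MM-dd, which the original does not state. The sample date is made up.

```java
import java.sql.Date;
import java.time.LocalDate;

/** Hypothetical helper; not part of Spark or of the original project. */
final class DateConversions {

    private DateConversions() {}

    /** java.util.Date to java.sql.Date, keeping the same epoch milliseconds. */
    static Date toSqlDate(java.util.Date utilDate) {
        return utilDate == null ? null : new Date(utilDate.getTime());
    }

    /** java.sql.Date to java.util.Date, keeping the same epoch milliseconds. */
    static java.util.Date toUtilDate(Date sqlDate) {
        return sqlDate == null ? null : new java.util.Date(sqlDate.getTime());
    }

    /** Parses an ISO "yyyy-MM-dd" string (assumed CSV format) into java.sql.Date. */
    static Date parseIsoDate(String text) {
        return Date.valueOf(LocalDate.parse(text.trim()));
    }

    public static void main(String[] args) {
        java.util.Date now = new java.util.Date();
        Date sqlNow = toSqlDate(now);
        System.out.println(sqlNow.getTime() == now.getTime()); // prints "true"
        System.out.println(parseIsoDate("2017-03-01"));        // prints "2017-03-01"
    }
}
```

A value returned by toSqlDate or parseIsoDate can then be passed as a field when building a row, for example through RowFactory.create.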
The Maven configuration is shown below (Maven is pronounced ['meɪv(ə)n] in British English and ['mevn] in American English).
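There are quite a few pitfalls here. The first is dependency conflicts: Hadoop, Spark, and Spring Boot each pull in their own set of logger implementation packages, and these conflict when everything is compiled and run together, so some of those dependencies have to be removed with exclusions. The second is versioning: whichever version you use must be consistent across the whole project. If one dependency references version A and another references version B, things break; either exclude A and use only B, or find some other workaround: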
Project
    modelVersion: 4.0.0
    coordinates: my.groud.id:sparkuitest:0.0.1-SNAPSHOT (packaging: jar)
    name: sparkuitest
    url: http://maven.apache.org

Repositories
    central: http://maven.aliyun.com/nexus/content/groups/public/ (true)
    maven2: http://repo1.maven.org/maven2 (true)

Properties
    project.build.sourceEncoding: UTF-8
    scala.version: 2.11.8
    spark.version: 2.0.0
    hadoop.version: 2.6.0
    junit.version: 4.12
    jackson.version: 2.6.5
    remaining values: UTF-8, 1.8, 1.8 (a second encoding setting and the Java version settings)

Dependencies
    com.opencsv:opencsv:4.1
    com.fasterxml.jackson.core:jackson-core:${jackson.version}
    com.fasterxml.jackson.core:jackson-databind:${jackson.version}
    com.fasterxml.jackson.core:jackson-annotations:${jackson.version}
    org.springframework.boot:spring-boot-starter-test:1.4.2.RELEASE
    org.springframework.boot:spring-boot-starter-jdbc:1.4.2.RELEASE
    com.h2database:h2:1.3.156
    mysql:mysql-connector-java:5.1.27
    org.springframework.boot:spring-boot-starter-web:1.4.2.RELEASE
    com.alibaba:druid:1.0.11
    org.scala-lang:scala-library:${scala.version}
    com.typesafe:config:1.2.1
    org.apache.spark:spark-core_2.11:${spark.version}
        exclusions: org.slf4j:slf4j-log4j12, log4j:log4j
    org.apache.spark:spark-yarn_2.11:${spark.version} (scope: provided)
    org.apache.hadoop:hadoop-client:${hadoop.version}
        exclusions: org.slf4j:slf4j-log4j12, log4j:log4j, org.mortbay.jetty:jetty-util, javax.servlet:servlet-api
    javax.servlet:javax.servlet-api:3.1.0
    org.apache.spark:spark-hive_2.11:${spark.version} (scope: provided)
    org.apache.spark:spark-sql_2.11:${spark.version}
    org.apache.spark:spark-streaming_2.11:${spark.version} (scope: provided)
    org.apache.spark:spark-streaming-kafka-0-8_2.11:${spark.version} (scope: provided)
    com.mchange:c3p0:0.9.5.2
    junit:junit:${junit.version} (scope: test)

Build plugins
    org.springframework.boot:spring-boot-maven-plugin:1.4.2.RELEASE
        goal: repackage
    org.apache.maven.plugins:maven-compiler-plugin:2.3.2
        encoding: ${project.build.sourceEncoding}
    org.apache.maven.plugins:maven-resources-plugin:2.4.3
        encoding: ${project.build.sourceEncoding}
        phase: compile
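When chasing the package conflicts described above, it can help to check which jar a given class is actually loaded from at runtime. The following is only a rough sketch: JarLocator is a hypothetical helper, and the default class names are merely examples of logging and servlet classes that tend to appear more than once on such a classpath; replace them with whatever your own error messages mention.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper; the default class names are examples only. */
final class JarLocator {

    private JarLocator() {}

    /** Returns the jar or directory a class is loaded from, or a short note if unknown. */
    static String locate(String className) {
        try {
            // Load without initializing, so no static initializers run
            Class<?> type = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "(no code source, e.g. a core JDK class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on the classpath)";
        }
    }

    public static void main(String[] args) {
        // Example class names (assumptions); pass your own as program arguments
        String[] names = args.length > 0 ? args : new String[] {
            "org.slf4j.impl.StaticLoggerBinder",
            "org.apache.log4j.Logger",
            "javax.servlet.Servlet"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If a class resolves to a jar you did not expect, that jar's artifact is a likely candidate for an exclusion in the dependency that pulls it in.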