目录

java.lang.String-is-not-a-valid-external-type-for-schema-of-int

java.lang.String is not a valid external type for schema of int

关于"java.lang.String is not a valid external type for schema of int"解决方法。

问题描述

#sparksession.sql(select * from …)出现java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.String is not a valid external type for schema of int

解决方法

是根据一个博主的帖子得到的启发,例子如下:

val spark = SparkSession.builder()
  .appName("Duplicate Detection")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val rawData = spark.sparkContext.textFile("src/test/Resources/DataSets/Emp.txt")
val rowRD = rawData.map(line => line.split("-"))
val rowRDD = rowRD.map(r => Row(r(0).toInt, r(1), r(2).toInt, r(3))
)
// rowRDD.collect().foreach(println)
val schema = StructType(Array(
StructField(id, IntegerType, true),
StructField(name, StringType, true),
StructField(age, IntegerType, true),
StructField(sal, IntegerType, true)
))
val df = spark.createDataFrame(rowRDD, schema)
df.printSchema()
df.show()

1.执行上述代码时,会出现报错:

18/04/21 20:21:15 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 2)
java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.lang.String is not a valid external type for schema of int
if (assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object).isNullAt) null else validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, id), IntegerType) AS id#0
± if (assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object).isNullAt) null else validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object), 0, id), IntegerType)
:- assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object).isNullAt
: :- assertnotnull(input[0, org.apache.spark.sql.Row, true], top level row object)
: : ± input[0, org.apache.spark.sql.Row, true]
: ± 0
:- null

2.分析出错的原因

val rowRDD = rowRD.map(r => Row(r(0).toInt, r(1), r(2).toInt, r(3))

上述代码的r(3)并未进行类型转换,所以仍是String类型,但后面schema中对应的类型已经是IntegerType,所以导致代码运行出错。

3.改正

val rowRDD = rowRD.map(r => Row(r(0).toInt, r(1), r(2).toInt, r(3).toInt)

总结

应该优先检查代码中是不是在类型转换处出错,以至于类型对应不起来。