What do the number prefixes mean in the explain operator?
What do (1), (6) and (3) mean in the following output of explain? The Spark version is 2.3.1.
apache-spark apache-spark-sql
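Since the plan itself is not shown above, here is a minimal sketch of a query whose explain output carries such numbered prefixes. It assumes a plain spark-shell session, and the exact (n) values depend on how the plan is split into codegen stages.

val df = spark.range(10)
  .groupBy('id % 3 as "g")
  .count()
  .join(spark.range(10), 'g === 'id)

df.explain()
// Codegen-capable operators are printed with a *(n) prefix, for example
// *(1) Range ..., *(2) HashAggregate ..., *(3) BroadcastHashJoin ...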
edited Nov 21 at 11:47 by Jacek Laskowski
asked Nov 20 at 19:47 by user10349797
Strange thing, I've run several explain calls against synthetic data frames and don't see these numbers. Can you please share a code snippet so I can reproduce this in my environment? – morsik, Nov 20 at 20:51
@morsik The numbers were added around Spark 2.2. What's your Spark version? – Jacek Laskowski, Nov 21 at 14:35
I use 2.2.0 too. – morsik, Nov 21 at 16:01
1 Answer
I think it was around Spark 2.0 when Spark SQL started generating Java code for some parts of structured queries. That feature is called Whole-Stage Java Code Generation (aka Whole-Stage CodeGen).
Whole-Stage Java Code Generation (aka Whole-Stage CodeGen) is simply a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.
You can see which parts of a structured query are covered by generated Java code using the explain operator.
val q = spark.range(5)
.groupBy('id % 2 as "g")
.agg(collect_list('id) as "ids")
.join(spark.range(5))
.where('id === 'g)
scala> q.explain
== Physical Plan ==
*(3) BroadcastHashJoin [g#1266L], [id#1272L], Inner, BuildRight
:- *(3) Filter isnotnull(g#1266L)
: +- ObjectHashAggregate(keys=[(id#1264L % 2)#1278L], functions=[collect_list(id#1264L, 0, 0)])
: +- Exchange hashpartitioning((id#1264L % 2)#1278L, 200)
: +- ObjectHashAggregate(keys=[(id#1264L % 2) AS (id#1264L % 2)#1278L], functions=[partial_collect_list(id#1264L, 0, 0)])
: +- *(1) Range (0, 5, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *(2) Range (0, 5, step=1, splits=8)
As you noticed, the query has three starred, round-bracketed numbers. These adornments (the star and the numbers) are all part of the Whole-Stage Java Code Generation optimization.
The numbers denote the WholeStageCodegen subtrees for which Spark SQL generates separate Java functions; together, these functions make up the code that Spark SQL uses to execute the query.
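A quick way to confirm that the stars and the numbers come from this optimization is to switch whole-stage codegen off and look at the plan again. This is a small sketch, assuming the same spark-shell session and the q defined above; spark.sql.codegen.wholeStage is the configuration flag that controls the feature.

// Turn whole-stage code generation off for this session (it is on by default).
spark.conf.set("spark.sql.codegen.wholeStage", false)

q.explain
// The same physical operators now print without the *(n) prefixes,
// e.g. "BroadcastHashJoin ..." instead of "*(3) BroadcastHashJoin ...".

// Switch it back on afterwards.
spark.conf.set("spark.sql.codegen.wholeStage", true)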
You can see the generated code and the subtrees using the debug implicit interface.
scala> q.queryExecution.debug.codegen
Found 3 WholeStageCodegen subtrees.
== Subtree 1 / 3 ==
*(1) Range (0, 5, step=1, splits=8)
Generated code:
...
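The same generated code can also be printed straight from the Dataset via the debug implicits; a minimal sketch, assuming the org.apache.spark.sql.execution.debug package that ships with Spark SQL:

import org.apache.spark.sql.execution.debug._

// Prints the WholeStageCodegen subtrees and their generated Java code,
// equivalent to the q.queryExecution.debug.codegen call above.
q.debugCodegen()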
answered Nov 21 at 11:46 by Jacek Laskowski
I was indeed not entirely off the mark. – thebluephantom, Nov 21 at 13:01
@thebluephantom Sure, you were not. Eventually you end up with RDDs and tasks, as for most structured queries. – Jacek Laskowski, Nov 21 at 13:39
Thank you @JacekLaskowski – user10349797, Nov 21 at 14:23