What do the number prefixes mean in explain operator?












0














What does (1), (6) and (3) mean in the following output of explain. Spark version is 2.3.1.



enter image description here










share|improve this question
























  • Strange thing, I've ran several explain against synthetic data frames and don't see these number. Can you share code snippet please, so I can reproduce this on my env?
    – morsik
    Nov 20 at 20:51












  • @morsik The numbers were added around Spark 2.2. What's your Spark version?
    – Jacek Laskowski
    Nov 21 at 14:35










  • I use 2.2.0 too
    – morsik
    Nov 21 at 16:01
















0














What does (1), (6) and (3) mean in the following output of explain. Spark version is 2.3.1.



enter image description here










share|improve this question
























  • Strange thing, I've ran several explain against synthetic data frames and don't see these number. Can you share code snippet please, so I can reproduce this on my env?
    – morsik
    Nov 20 at 20:51












  • @morsik The numbers were added around Spark 2.2. What's your Spark version?
    – Jacek Laskowski
    Nov 21 at 14:35










  • I use 2.2.0 too
    – morsik
    Nov 21 at 16:01














0












0








0







What does (1), (6) and (3) mean in the following output of explain. Spark version is 2.3.1.



enter image description here










share|improve this question















What does (1), (6) and (3) mean in the following output of explain. Spark version is 2.3.1.



enter image description here







apache-spark apache-spark-sql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 21 at 11:47









Jacek Laskowski

43.2k17126256




43.2k17126256










asked Nov 20 at 19:47









user10349797

82




82












  • Strange thing, I've ran several explain against synthetic data frames and don't see these number. Can you share code snippet please, so I can reproduce this on my env?
    – morsik
    Nov 20 at 20:51












  • @morsik The numbers were added around Spark 2.2. What's your Spark version?
    – Jacek Laskowski
    Nov 21 at 14:35










  • I use 2.2.0 too
    – morsik
    Nov 21 at 16:01


















  • Strange thing, I've ran several explain against synthetic data frames and don't see these number. Can you share code snippet please, so I can reproduce this on my env?
    – morsik
    Nov 20 at 20:51












  • @morsik The numbers were added around Spark 2.2. What's your Spark version?
    – Jacek Laskowski
    Nov 21 at 14:35










  • I use 2.2.0 too
    – morsik
    Nov 21 at 16:01
















Strange thing, I've ran several explain against synthetic data frames and don't see these number. Can you share code snippet please, so I can reproduce this on my env?
– morsik
Nov 20 at 20:51






Strange thing, I've ran several explain against synthetic data frames and don't see these number. Can you share code snippet please, so I can reproduce this on my env?
– morsik
Nov 20 at 20:51














@morsik The numbers were added around Spark 2.2. What's your Spark version?
– Jacek Laskowski
Nov 21 at 14:35




@morsik The numbers were added around Spark 2.2. What's your Spark version?
– Jacek Laskowski
Nov 21 at 14:35












I use 2.2.0 too
– morsik
Nov 21 at 16:01




I use 2.2.0 too
– morsik
Nov 21 at 16:01












1 Answer
1






active

oldest

votes


















2














I think it was around Spark 2.0 when Spark SQL started generating Java code for some parts of structured queries. That feature is called Whole-Stage Java Code Generation (aka Whole-Stage CodeGen).



Whole-Stage Java Code Generation (aka Whole-Stage CodeGen) is simply a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.



You can learn about that Java-generated code parts of a structured query using explain operator.



val q = spark.range(5)
.groupBy('id % 2 as "g")
.agg(collect_list('id) as "ids")
.join(spark.range(5))
.where('id === 'g)
scala> q.explain
== Physical Plan ==
*(3) BroadcastHashJoin [g#1266L], [id#1272L], Inner, BuildRight
:- *(3) Filter isnotnull(g#1266L)
: +- ObjectHashAggregate(keys=[(id#1264L % 2)#1278L], functions=[collect_list(id#1264L, 0, 0)])
: +- Exchange hashpartitioning((id#1264L % 2)#1278L, 200)
: +- ObjectHashAggregate(keys=[(id#1264L % 2) AS (id#1264L % 2)#1278L], functions=[partial_collect_list(id#1264L, 0, 0)])
: +- *(1) Range (0, 5, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *(2) Range (0, 5, step=1, splits=8)


As you noticed, I have a query with 3 starred round-bracketed numbers. These adornment (the star and the numbers) are all part of whole-stage java code generation optimization.



The numbers denote WholeStageCodegen subtrees for which Spark SQL generates separate functions that all together are the underlying code that Spark SQL uses to execute the query.



You can see the code and the subtrees using debug implicit interface.



scala> q.queryExecution.debug.codegen
Found 3 WholeStageCodegen subtrees.
== Subtree 1 / 3 ==
*(1) Range (0, 5, step=1, splits=8)

Generated code:
...





share|improve this answer

















  • 1




    I was indeed not entirely off the mark.
    – thebluephantom
    Nov 21 at 13:01










  • @thebluephantom Sure. You were not. Eventually you end up with RDDs and tasks as for most structured queries.
    – Jacek Laskowski
    Nov 21 at 13:39






  • 1




    Thank you @JacekLaskowski
    – user10349797
    Nov 21 at 14:23











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53400493%2fwhat-do-the-number-prefixes-mean-in-explain-operator%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









2














I think it was around Spark 2.0 when Spark SQL started generating Java code for some parts of structured queries. That feature is called Whole-Stage Java Code Generation (aka Whole-Stage CodeGen).



Whole-Stage Java Code Generation (aka Whole-Stage CodeGen) is simply a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.



You can learn about that Java-generated code parts of a structured query using explain operator.



val q = spark.range(5)
.groupBy('id % 2 as "g")
.agg(collect_list('id) as "ids")
.join(spark.range(5))
.where('id === 'g)
scala> q.explain
== Physical Plan ==
*(3) BroadcastHashJoin [g#1266L], [id#1272L], Inner, BuildRight
:- *(3) Filter isnotnull(g#1266L)
: +- ObjectHashAggregate(keys=[(id#1264L % 2)#1278L], functions=[collect_list(id#1264L, 0, 0)])
: +- Exchange hashpartitioning((id#1264L % 2)#1278L, 200)
: +- ObjectHashAggregate(keys=[(id#1264L % 2) AS (id#1264L % 2)#1278L], functions=[partial_collect_list(id#1264L, 0, 0)])
: +- *(1) Range (0, 5, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *(2) Range (0, 5, step=1, splits=8)


As you noticed, I have a query with 3 starred round-bracketed numbers. These adornment (the star and the numbers) are all part of whole-stage java code generation optimization.



The numbers denote WholeStageCodegen subtrees for which Spark SQL generates separate functions that all together are the underlying code that Spark SQL uses to execute the query.



You can see the code and the subtrees using debug implicit interface.



scala> q.queryExecution.debug.codegen
Found 3 WholeStageCodegen subtrees.
== Subtree 1 / 3 ==
*(1) Range (0, 5, step=1, splits=8)

Generated code:
...





share|improve this answer

















  • 1




    I was indeed not entirely off the mark.
    – thebluephantom
    Nov 21 at 13:01










  • @thebluephantom Sure. You were not. Eventually you end up with RDDs and tasks as for most structured queries.
    – Jacek Laskowski
    Nov 21 at 13:39






  • 1




    Thank you @JacekLaskowski
    – user10349797
    Nov 21 at 14:23
















2














I think it was around Spark 2.0 when Spark SQL started generating Java code for some parts of structured queries. That feature is called Whole-Stage Java Code Generation (aka Whole-Stage CodeGen).



Whole-Stage Java Code Generation (aka Whole-Stage CodeGen) is simply a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.



You can learn about that Java-generated code parts of a structured query using explain operator.



val q = spark.range(5)
.groupBy('id % 2 as "g")
.agg(collect_list('id) as "ids")
.join(spark.range(5))
.where('id === 'g)
scala> q.explain
== Physical Plan ==
*(3) BroadcastHashJoin [g#1266L], [id#1272L], Inner, BuildRight
:- *(3) Filter isnotnull(g#1266L)
: +- ObjectHashAggregate(keys=[(id#1264L % 2)#1278L], functions=[collect_list(id#1264L, 0, 0)])
: +- Exchange hashpartitioning((id#1264L % 2)#1278L, 200)
: +- ObjectHashAggregate(keys=[(id#1264L % 2) AS (id#1264L % 2)#1278L], functions=[partial_collect_list(id#1264L, 0, 0)])
: +- *(1) Range (0, 5, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *(2) Range (0, 5, step=1, splits=8)


As you noticed, I have a query with 3 starred round-bracketed numbers. These adornment (the star and the numbers) are all part of whole-stage java code generation optimization.



The numbers denote WholeStageCodegen subtrees for which Spark SQL generates separate functions that all together are the underlying code that Spark SQL uses to execute the query.



You can see the code and the subtrees using debug implicit interface.



scala> q.queryExecution.debug.codegen
Found 3 WholeStageCodegen subtrees.
== Subtree 1 / 3 ==
*(1) Range (0, 5, step=1, splits=8)

Generated code:
...





share|improve this answer

















  • 1




    I was indeed not entirely off the mark.
    – thebluephantom
    Nov 21 at 13:01










  • @thebluephantom Sure. You were not. Eventually you end up with RDDs and tasks as for most structured queries.
    – Jacek Laskowski
    Nov 21 at 13:39






  • 1




    Thank you @JacekLaskowski
    – user10349797
    Nov 21 at 14:23














2












2








2






I think it was around Spark 2.0 when Spark SQL started generating Java code for some parts of structured queries. That feature is called Whole-Stage Java Code Generation (aka Whole-Stage CodeGen).



Whole-Stage Java Code Generation (aka Whole-Stage CodeGen) is simply a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.



You can learn about that Java-generated code parts of a structured query using explain operator.



val q = spark.range(5)
.groupBy('id % 2 as "g")
.agg(collect_list('id) as "ids")
.join(spark.range(5))
.where('id === 'g)
scala> q.explain
== Physical Plan ==
*(3) BroadcastHashJoin [g#1266L], [id#1272L], Inner, BuildRight
:- *(3) Filter isnotnull(g#1266L)
: +- ObjectHashAggregate(keys=[(id#1264L % 2)#1278L], functions=[collect_list(id#1264L, 0, 0)])
: +- Exchange hashpartitioning((id#1264L % 2)#1278L, 200)
: +- ObjectHashAggregate(keys=[(id#1264L % 2) AS (id#1264L % 2)#1278L], functions=[partial_collect_list(id#1264L, 0, 0)])
: +- *(1) Range (0, 5, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *(2) Range (0, 5, step=1, splits=8)


As you noticed, I have a query with 3 starred round-bracketed numbers. These adornment (the star and the numbers) are all part of whole-stage java code generation optimization.



The numbers denote WholeStageCodegen subtrees for which Spark SQL generates separate functions that all together are the underlying code that Spark SQL uses to execute the query.



You can see the code and the subtrees using debug implicit interface.



scala> q.queryExecution.debug.codegen
Found 3 WholeStageCodegen subtrees.
== Subtree 1 / 3 ==
*(1) Range (0, 5, step=1, splits=8)

Generated code:
...





share|improve this answer












I think it was around Spark 2.0 when Spark SQL started generating Java code for some parts of structured queries. That feature is called Whole-Stage Java Code Generation (aka Whole-Stage CodeGen).



Whole-Stage Java Code Generation (aka Whole-Stage CodeGen) is simply a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.



You can learn about that Java-generated code parts of a structured query using explain operator.



val q = spark.range(5)
.groupBy('id % 2 as "g")
.agg(collect_list('id) as "ids")
.join(spark.range(5))
.where('id === 'g)
scala> q.explain
== Physical Plan ==
*(3) BroadcastHashJoin [g#1266L], [id#1272L], Inner, BuildRight
:- *(3) Filter isnotnull(g#1266L)
: +- ObjectHashAggregate(keys=[(id#1264L % 2)#1278L], functions=[collect_list(id#1264L, 0, 0)])
: +- Exchange hashpartitioning((id#1264L % 2)#1278L, 200)
: +- ObjectHashAggregate(keys=[(id#1264L % 2) AS (id#1264L % 2)#1278L], functions=[partial_collect_list(id#1264L, 0, 0)])
: +- *(1) Range (0, 5, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *(2) Range (0, 5, step=1, splits=8)


As you noticed, I have a query with 3 starred round-bracketed numbers. These adornment (the star and the numbers) are all part of whole-stage java code generation optimization.



The numbers denote WholeStageCodegen subtrees for which Spark SQL generates separate functions that all together are the underlying code that Spark SQL uses to execute the query.



You can see the code and the subtrees using debug implicit interface.



scala> q.queryExecution.debug.codegen
Found 3 WholeStageCodegen subtrees.
== Subtree 1 / 3 ==
*(1) Range (0, 5, step=1, splits=8)

Generated code:
...






share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 21 at 11:46









Jacek Laskowski

43.2k17126256




43.2k17126256








  • 1




    I was indeed not entirely off the mark.
    – thebluephantom
    Nov 21 at 13:01










  • @thebluephantom Sure. You were not. Eventually you end up with RDDs and tasks as for most structured queries.
    – Jacek Laskowski
    Nov 21 at 13:39






  • 1




    Thank you @JacekLaskowski
    – user10349797
    Nov 21 at 14:23














  • 1




    I was indeed not entirely off the mark.
    – thebluephantom
    Nov 21 at 13:01










  • @thebluephantom Sure. You were not. Eventually you end up with RDDs and tasks as for most structured queries.
    – Jacek Laskowski
    Nov 21 at 13:39






  • 1




    Thank you @JacekLaskowski
    – user10349797
    Nov 21 at 14:23








1




1




I was indeed not entirely off the mark.
– thebluephantom
Nov 21 at 13:01




I was indeed not entirely off the mark.
– thebluephantom
Nov 21 at 13:01












@thebluephantom Sure. You were not. Eventually you end up with RDDs and tasks as for most structured queries.
– Jacek Laskowski
Nov 21 at 13:39




@thebluephantom Sure. You were not. Eventually you end up with RDDs and tasks as for most structured queries.
– Jacek Laskowski
Nov 21 at 13:39




1




1




Thank you @JacekLaskowski
– user10349797
Nov 21 at 14:23




Thank you @JacekLaskowski
– user10349797
Nov 21 at 14:23


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53400493%2fwhat-do-the-number-prefixes-mean-in-explain-operator%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

404 Error Contact Form 7 ajax form submitting

How to know if a Active Directory user can login interactively

TypeError: fit_transform() missing 1 required positional argument: 'X'