What do the number prefixes mean in the explain operator?
What do (1), (6) and (3) mean in the following output of explain? The Spark version is 2.3.1.
apache-spark apache-spark-sql
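Since the plan itself is not shown above, here is a minimal sketch of a query whose explain output carries such numbered prefixes. It assumes a plain spark-shell session, and the exact (n) values depend on how the plan is split into codegen stages.

val df = spark.range(10)
  .groupBy('id % 3 as "g")
  .count()
  .join(spark.range(10), 'g === 'id)

df.explain()
// Codegen-capable operators are printed with a *(n) prefix, for example
// *(1) Range ..., *(2) HashAggregate ..., *(3) BroadcastHashJoin ...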
edited Nov 21 at 11:47 by Jacek Laskowski
asked Nov 20 at 19:47 by user10349797
Strange thing, I've run several explain calls against synthetic data frames and don't see these numbers. Can you please share a code snippet so I can reproduce this in my environment? – morsik, Nov 20 at 20:51
@morsik The numbers were added around Spark 2.2. What's your Spark version? – Jacek Laskowski, Nov 21 at 14:35
I use 2.2.0 too. – morsik, Nov 21 at 16:01
1 Answer
I think it was around Spark 2.0 when Spark SQL started generating Java code for some parts of structured queries. That feature is called Whole-Stage Java Code Generation (aka Whole-Stage CodeGen).
Whole-Stage Java Code Generation (aka Whole-Stage CodeGen) is simply a physical query optimization in Spark SQL that fuses multiple physical operators (as a subtree of plans that support code generation) together into a single Java function.
You can see which parts of a structured query are covered by generated Java code using the explain operator.
val q = spark.range(5)
.groupBy('id % 2 as "g")
.agg(collect_list('id) as "ids")
.join(spark.range(5))
.where('id === 'g)
scala> q.explain
== Physical Plan ==
*(3) BroadcastHashJoin [g#1266L], [id#1272L], Inner, BuildRight
:- *(3) Filter isnotnull(g#1266L)
: +- ObjectHashAggregate(keys=[(id#1264L % 2)#1278L], functions=[collect_list(id#1264L, 0, 0)])
: +- Exchange hashpartitioning((id#1264L % 2)#1278L, 200)
: +- ObjectHashAggregate(keys=[(id#1264L % 2) AS (id#1264L % 2)#1278L], functions=[partial_collect_list(id#1264L, 0, 0)])
: +- *(1) Range (0, 5, step=1, splits=8)
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
+- *(2) Range (0, 5, step=1, splits=8)
As you noticed, the query has three starred, round-bracketed numbers. These adornments (the star and the numbers) are all part of the Whole-Stage Java Code Generation optimization.
The numbers denote the WholeStageCodegen subtrees for which Spark SQL generates separate Java functions; together, these functions make up the code that Spark SQL uses to execute the query.
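A quick way to confirm that the stars and the numbers come from this optimization is to switch whole-stage codegen off and look at the plan again. This is a small sketch, assuming the same spark-shell session and the q defined above; spark.sql.codegen.wholeStage is the configuration flag that controls the feature.

// Turn whole-stage code generation off for this session (it is on by default).
spark.conf.set("spark.sql.codegen.wholeStage", false)

q.explain
// The same physical operators now print without the *(n) prefixes,
// e.g. "BroadcastHashJoin ..." instead of "*(3) BroadcastHashJoin ...".

// Switch it back on afterwards.
spark.conf.set("spark.sql.codegen.wholeStage", true)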
You can see the generated code and the subtrees using the debug implicit interface.
scala> q.queryExecution.debug.codegen
Found 3 WholeStageCodegen subtrees.
== Subtree 1 / 3 ==
*(1) Range (0, 5, step=1, splits=8)
Generated code:
...
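The same generated code can also be printed straight from the Dataset via the debug implicits; a minimal sketch, assuming the org.apache.spark.sql.execution.debug package that ships with Spark SQL:

import org.apache.spark.sql.execution.debug._

// Prints the WholeStageCodegen subtrees and their generated Java code,
// equivalent to the q.queryExecution.debug.codegen call above.
q.debugCodegen()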
answered Nov 21 at 11:46 by Jacek Laskowski
I was indeed not entirely off the mark. – thebluephantom, Nov 21 at 13:01
@thebluephantom Sure, you were not. Eventually you end up with RDDs and tasks, as for most structured queries. – Jacek Laskowski, Nov 21 at 13:39
Thank you @JacekLaskowski – user10349797, Nov 21 at 14:23