PDI /Kettle - Passing data from previous hop to database query











up vote
0
down vote

favorite












I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.



I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).



My first hurdle selecting only the vendor name from 8 fields in the CSV



The second hurdle is how to use that vendor name as a variable in a database query.



My third issue is what type of step to use for the database lookup.



I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.



The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.



My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?



How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?










share|improve this question






















  • How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
    – AlainD
    Nov 20 at 8:46










  • Which step do you call "Dynamic query". For the Dynamic SQL row, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table you'll loose all the field which are not in the table.
    – AlainD
    Nov 20 at 8:58










  • @AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
    – lucholland
    Nov 20 at 9:54










  • @AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
    – lucholland
    Nov 20 at 10:03










  • Dynamic SQL Row could be an answer. If for example you need to SELECT SUM(field) but, generally speaking, it requires some work to built a String containing the query.
    – AlainD
    Nov 21 at 10:29















up vote
0
down vote

favorite












I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.



I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).



My first hurdle selecting only the vendor name from 8 fields in the CSV



The second hurdle is how to use that vendor name as a variable in a database query.



My third issue is what type of step to use for the database lookup.



I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.



The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.



My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?



How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?










share|improve this question






















  • How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
    – AlainD
    Nov 20 at 8:46










  • Which step do you call "Dynamic query". For the Dynamic SQL row, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table you'll loose all the field which are not in the table.
    – AlainD
    Nov 20 at 8:58










  • @AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
    – lucholland
    Nov 20 at 9:54










  • @AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
    – lucholland
    Nov 20 at 10:03










  • Dynamic SQL Row could be an answer. If for example you need to SELECT SUM(field) but, generally speaking, it requires some work to built a String containing the query.
    – AlainD
    Nov 21 at 10:29













up vote
0
down vote

favorite









up vote
0
down vote

favorite











I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.



I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).



My first hurdle selecting only the vendor name from 8 fields in the CSV



The second hurdle is how to use that vendor name as a variable in a database query.



My third issue is what type of step to use for the database lookup.



I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.



The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.



My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?



How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?










share|improve this question













I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.



I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).



My first hurdle selecting only the vendor name from 8 fields in the CSV



The second hurdle is how to use that vendor name as a variable in a database query.



My third issue is what type of step to use for the database lookup.



I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.



The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.



My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?



How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?







pentaho kettle






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 20 at 2:17









lucholland

302311




302311












  • How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
    – AlainD
    Nov 20 at 8:46










  • Which step do you call "Dynamic query". For the Dynamic SQL row, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table you'll loose all the field which are not in the table.
    – AlainD
    Nov 20 at 8:58










  • @AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
    – lucholland
    Nov 20 at 9:54










  • @AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
    – lucholland
    Nov 20 at 10:03










  • Dynamic SQL Row could be an answer. If for example you need to SELECT SUM(field) but, generally speaking, it requires some work to built a String containing the query.
    – AlainD
    Nov 21 at 10:29


















  • How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
    – AlainD
    Nov 20 at 8:46










  • Which step do you call "Dynamic query". For the Dynamic SQL row, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table you'll loose all the field which are not in the table.
    – AlainD
    Nov 20 at 8:58










  • @AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
    – lucholland
    Nov 20 at 9:54










  • @AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
    – lucholland
    Nov 20 at 10:03










  • Dynamic SQL Row could be an answer. If for example you need to SELECT SUM(field) but, generally speaking, it requires some work to built a String containing the query.
    – AlainD
    Nov 21 at 10:29
















How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 at 8:46




How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 at 8:46












Which step do you call "Dynamic query". For the Dynamic SQL row, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table you'll loose all the field which are not in the table.
– AlainD
Nov 20 at 8:58




Which step do you call "Dynamic query". For the Dynamic SQL row, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table you'll loose all the field which are not in the table.
– AlainD
Nov 20 at 8:58












@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 at 9:54




@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 at 9:54












@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 at 10:03




@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 at 10:03












Dynamic SQL Row could be an answer. If for example you need to SELECT SUM(field) but, generally speaking, it requires some work to built a String containing the query.
– AlainD
Nov 21 at 10:29




Dynamic SQL Row could be an answer. If for example you need to SELECT SUM(field) but, generally speaking, it requires some work to built a String containing the query.
– AlainD
Nov 21 at 10:29












2 Answers
2






active

oldest

votes

















up vote
1
down vote



accepted










Due to the fact that




  1. There is at most one vendorrating per vendor.

  2. You have to do something if there is no match.


I suggest the following flow:enter image description here



Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".



Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.






share|improve this answer





















  • that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
    – lucholland
    Nov 21 at 12:41










  • Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration, J. van Dongen, R. Bouman, M. Casters.
    – AlainD
    Nov 21 at 13:40


















up vote
1
down vote













The best practice is a Stream lookup. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).



enter image description here



First "hurdle" : Once the path of the csv file defined, press the Get field button.



It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.



If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.



If there is more than one header row or other complexities, use the Text file input.



The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.






share|improve this answer





















  • I'll update once I test this out. Thanks!
    – lucholland
    Nov 20 at 10:13











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385287%2fpdi-kettle-passing-data-from-previous-hop-to-database-query%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










Due to the fact that




  1. There is at most one vendorrating per vendor.

  2. You have to do something if there is no match.


I suggest the following flow:enter image description here



Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".



Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.






share|improve this answer





















  • that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
    – lucholland
    Nov 21 at 12:41










  • Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration, J. van Dongen, R. Bouman, M. Casters.
    – AlainD
    Nov 21 at 13:40















up vote
1
down vote



accepted










Due to the fact that




  1. There is at most one vendorrating per vendor.

  2. You have to do something if there is no match.


I suggest the following flow:enter image description here



Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".



Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.






share|improve this answer





















  • that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
    – lucholland
    Nov 21 at 12:41










  • Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration, J. van Dongen, R. Bouman, M. Casters.
    – AlainD
    Nov 21 at 13:40













up vote
1
down vote



accepted







up vote
1
down vote



accepted






Due to the fact that




  1. There is at most one vendorrating per vendor.

  2. You have to do something if there is no match.


I suggest the following flow:enter image description here



Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".



Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.






share|improve this answer












Due to the fact that




  1. There is at most one vendorrating per vendor.

  2. You have to do something if there is no match.


I suggest the following flow:enter image description here



Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".



Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 21 at 10:48









AlainD

4,12331026




4,12331026












  • that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
    – lucholland
    Nov 21 at 12:41










  • Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration, J. van Dongen, R. Bouman, M. Casters.
    – AlainD
    Nov 21 at 13:40


















  • that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
    – lucholland
    Nov 21 at 12:41










  • Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration, J. van Dongen, R. Bouman, M. Casters.
    – AlainD
    Nov 21 at 13:40
















that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41




that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41












Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 at 13:40




Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft. Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 at 13:40












up vote
1
down vote













The best practice is a Stream lookup. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).



enter image description here



First "hurdle" : Once the path of the csv file defined, press the Get field button.



It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.



If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.



If there is more than one header row or other complexities, use the Text file input.



The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.






share|improve this answer





















  • I'll update once I test this out. Thanks!
    – lucholland
    Nov 20 at 10:13















up vote
1
down vote













The best practice is a Stream lookup. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).



enter image description here



First "hurdle" : Once the path of the csv file defined, press the Get field button.



It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.



If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.



If there is more than one header row or other complexities, use the Text file input.



The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.






share|improve this answer





















  • I'll update once I test this out. Thanks!
    – lucholland
    Nov 20 at 10:13













up vote
1
down vote










up vote
1
down vote









The best practice is a Stream lookup. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).



enter image description here



First "hurdle" : Once the path of the csv file defined, press the Get field button.



It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.



If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.



If there is more than one header row or other complexities, use the Text file input.



The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.






share|improve this answer












The best practice is a Stream lookup. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).



enter image description here



First "hurdle" : Once the path of the csv file defined, press the Get field button.



It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.



If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.



If there is more than one header row or other complexities, use the Text file input.



The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 20 at 9:16









AlainD

4,12331026




4,12331026












  • I'll update once I test this out. Thanks!
    – lucholland
    Nov 20 at 10:13


















  • I'll update once I test this out. Thanks!
    – lucholland
    Nov 20 at 10:13
















I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13




I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385287%2fpdi-kettle-passing-data-from-previous-hop-to-database-query%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

404 Error Contact Form 7 ajax form submitting

How to know if a Active Directory user can login interactively

Refactoring coordinates for Minecraft Pi buildings written in Python