PDI /Kettle - Passing data from previous hop to database query
up vote
0
down vote
favorite
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
pentaho kettle
add a comment |
up vote
0
down vote
favorite
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
pentaho kettle
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 at 8:46
Which step do you call "Dynamic query". For theDynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For theInput table
you'll loose all the field which are not in the table.
– AlainD
Nov 20 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 at 10:03
Dynamic SQL Row could be an answer. If for example you need toSELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.
– AlainD
Nov 21 at 10:29
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
pentaho kettle
I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.
I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).
My first hurdle selecting only the vendor name from 8 fields in the CSV
The second hurdle is how to use that vendor name as a variable in a database query.
My third issue is what type of step to use for the database lookup.
I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.
The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.
My best effort was to use a dynamic query using:
SELECT * FROM VENDORRATINGS WHERE VENDOR = ?
How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?
pentaho kettle
pentaho kettle
asked Nov 20 at 2:17
lucholland
302311
302311
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 at 8:46
Which step do you call "Dynamic query". For theDynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For theInput table
you'll loose all the field which are not in the table.
– AlainD
Nov 20 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 at 10:03
Dynamic SQL Row could be an answer. If for example you need toSELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.
– AlainD
Nov 21 at 10:29
add a comment |
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 at 8:46
Which step do you call "Dynamic query". For theDynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For theInput table
you'll loose all the field which are not in the table.
– AlainD
Nov 20 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 at 10:03
Dynamic SQL Row could be an answer. If for example you need toSELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.
– AlainD
Nov 21 at 10:29
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 at 8:46
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 at 8:46
Which step do you call "Dynamic query". For the
Dynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table
you'll loose all the field which are not in the table.– AlainD
Nov 20 at 8:58
Which step do you call "Dynamic query". For the
Dynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table
you'll loose all the field which are not in the table.– AlainD
Nov 20 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 at 9:54
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 at 10:03
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 at 10:03
Dynamic SQL Row could be an answer. If for example you need to
SELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.– AlainD
Nov 21 at 10:29
Dynamic SQL Row could be an answer. If for example you need to
SELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.– AlainD
Nov 21 at 10:29
add a comment |
2 Answers
2
active
oldest
votes
up vote
1
down vote
accepted
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 at 13:40
add a comment |
up vote
1
down vote
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 at 13:40
add a comment |
up vote
1
down vote
accepted
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 at 13:40
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
Due to the fact that
- There is at most one vendorrating per vendor.
- You have to do something if there is no match.
I suggest the following flow:
Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".
Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.
answered Nov 21 at 10:48
AlainD
4,12331026
4,12331026
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 at 13:40
add a comment |
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.
– AlainD
Nov 21 at 13:40
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41
that works perfectly. Thank you! Do you have any suggestions for books / tutorials on Kettle or PDI?
– lucholland
Nov 21 at 12:41
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.
Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.– AlainD
Nov 21 at 13:40
Although it is not in SOF policy to give such advise, the best one is from Matt Casters, the author of the soft.
Pentaho Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
, J. van Dongen, R. Bouman, M. Casters.– AlainD
Nov 21 at 13:40
add a comment |
up vote
1
down vote
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13
add a comment |
up vote
1
down vote
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13
add a comment |
up vote
1
down vote
up vote
1
down vote
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
The best practice is a Stream lookup
. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).
First "hurdle" : Once the path of the csv file defined, press the Get field
button.
It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.
If the name is not on the first line, uncheck the Header row present
, press the Get field
button, and then change the name on the panel.
If there is more than one header row or other complexities, use the Text file input
.
The same is valid for the lookup step
: use the Get lookup field
button and delete the fields you do not need.
answered Nov 20 at 9:16
AlainD
4,12331026
4,12331026
I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13
add a comment |
I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13
I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13
I'll update once I test this out. Thanks!
– lucholland
Nov 20 at 10:13
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53385287%2fpdi-kettle-passing-data-from-previous-hop-to-database-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?
– AlainD
Nov 20 at 8:46
Which step do you call "Dynamic query". For the
Dynamic SQL row
, you need to built the SELECT...WHERE Vendor="<value>" in a field. For theInput table
you'll loose all the field which are not in the table.– AlainD
Nov 20 at 8:58
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..
– lucholland
Nov 20 at 9:54
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).
– lucholland
Nov 20 at 10:03
Dynamic SQL Row could be an answer. If for example you need to
SELECT SUM(field)
but, generally speaking, it requires some work to built a String containing the query.– AlainD
Nov 21 at 10:29