AWS SQS with a single worker?
I'm struggling to set up a queue in an AWS environment where the tasks are consumed by a single Lambda worker.
AWS Lambda scales automatically, but I don't want that. The trouble is that the function makes several complex changes to a database, and there can be race conditions. Unfortunately this is out of my control.
It is therefore easier to ensure there is only one worker than to solve the complex SQL issues. What I want is this: whenever there are messages in the queue, a single worker receives them and completes the tasks sequentially. Order does not matter.
amazon-web-services aws-lambda amazon-sqs serverless
In theory, SQS messages are consumed by just one consumer, aren't they?
– Héctor
Nov 20 at 8:21
When I connect it to my Lambda, it just scales horizontally and the messages are processed concurrently... or is it in parallel? This will cause race conditions for me.
– hendry
Nov 20 at 8:37
asked Nov 20 at 8:15
hendry
2 Answers
Set the concurrency limit on the Lambda function to 1.
answered Nov 20 at 14:23
Mark B
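In infrastructure-as-code terms, this is the function's reserved concurrency. A minimal AWS SAM fragment as an illustration (the resource names, handler, and runtime here are hypothetical, and `TaskQueue` is assumed to be defined elsewhere in the same template):

```yaml
Resources:
  Worker:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.12
      # At most one instance of this function ever runs at a time.
      ReservedConcurrentExecutions: 1
      Events:
        Tasks:
          Type: SQS
          Properties:
            Queue: !GetAtt TaskQueue.Arn
            BatchSize: 1  # hand the worker one message per invocation
```

The same setting is available in the console under the function's concurrency configuration.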
Good point, although I'd always understood that if you do this, SQS still dispatches the messages five at a time, and four of them will fail and be re-driven, which (depending on your configuration) will end up with them just being dumped into the DLQ (ref: jeremydaly.com/…)
– thomasmichaelwallace
Nov 20 at 15:22
As you've noticed, the 'built-in' SQS integration starts with a minimum of five workers and scales up.
I have two suggestions for you, however:
- If you only have one shard, then Kinesis (with a batch size of one item) will ensure sequential, ordered execution. This is because Kinesis is parallel by shard (and one shard can take 1000 records/second, so it's probably fine to have only one!), and the built-in Lambda trigger takes a customisable batch size (which can be 1) and waits for each batch to complete before taking the next.
- If you need to use SQS, then the "old" way of integrating (prior to the SQS trigger) will give you "most likely one" worker and sequential execution. This means triggering your Lambda on a scheduled CloudWatch Event, so that a single Lambda checks the queue every X (configured by you). The challenge is that if X is shorter than the time it takes to process a message, a second Lambda will run in parallel (there are patterns such as setting X to the timeout of your Lambda, and having your Lambda run for 5 minutes going through the queue one message at a time).
answered Nov 20 at 9:40
thomasmichaelwallace
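The scheduled-poll pattern in the second suggestion can be sketched as a small drain loop. This is only an illustrative sketch: the `handle` callback and queue URL are hypothetical, and the SQS client is injected (in real use it would be a boto3 SQS client, and this loop would form the body of the scheduled Lambda handler):

```python
import time

def drain_queue(sqs, queue_url, deadline, handle):
    """Receive and process messages one at a time until the queue is
    empty or `deadline` (epoch seconds) is reached. Returns the number
    of messages processed."""
    processed = 0
    while time.time() < deadline:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,  # strictly one message at a time
            WaitTimeSeconds=1,      # short long-poll
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue drained
        msg = messages[0]
        handle(msg["Body"])  # the task logic (hypothetical callback)
        # Delete only after successful processing, so failures are retried
        # once the visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
        processed += 1
    return processed
```

In the pattern described above, `deadline` would be set slightly below the Lambda's own timeout, so the function returns cleanly before being killed.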
Thanks, I am now looking into Kinesis Data Stream. Another question I have is how to avoid duplicate records in the stream?
– hendry
Nov 20 at 10:16
Btw, the Lambda could take a large batch and process it one by one; I think that is a better approach than each task triggering a Lambda execution. Lambda's 15-minute timeout should be more than enough for the typical workloads expected.
– hendry
Nov 20 at 10:21
That's up to you (there's unlikely to be much change in total execution time, and thus cost), but note that Kinesis cannot "acknowledge" individual records, so on error you either retry (or dump) the whole batch; which makes one-by-one sound more suitable for what I understand of your use case.
– thomasmichaelwallace
Nov 20 at 10:23
As for duplicates: if you need to de-duplicate, possibly the best pattern (noting that neither SNS nor SQS does this either) is to use DynamoDB. Given that a task can be uniquely identified by an id, you can write the task to DynamoDB and then act only on the 'INSERT' events on the DynamoDB stream (i.e. ignore UPDATE/DELETE), which occur only the first time a unique task id is written.
– thomasmichaelwallace
Nov 20 at 10:26
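The conditional-write half of this DynamoDB pattern can be sketched as below. This is a sketch, not the real API: with DynamoDB you would call `put_item` with `ConditionExpression='attribute_not_exists(id)'`, which raises `ConditionalCheckFailedException` on a duplicate; here the write function is injected so the logic is self-contained:

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""

def record_task_once(put_if_absent, task_id, payload):
    """Attempt a conditional insert keyed by task id.
    Returns True on the first insert, False for a duplicate."""
    try:
        put_if_absent({"id": task_id, "payload": payload})
        return True
    except ConditionalCheckFailed:
        return False

def make_memory_table():
    """In-memory stand-in for a table with a conditional put."""
    items = {}
    def put_if_absent(item):
        if item["id"] in items:
            raise ConditionalCheckFailed(item["id"])
        items[item["id"]] = item
    return items, put_if_absent
```

Downstream, only the first (successful) insert appears as an INSERT event on the table's stream, so duplicates never reach the worker.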
There's no need to switch from SQS to Kinesis for this, or to stop using the built-in SQS/Lambda integration. You simply need to set the concurrency limit to 1 in the Lambda function's settings.
– Mark B
Nov 20 at 15:10