Uploading a file to S3 via CarrierWave creates multiple notification events
I found this duplication last week and wrote this article to note what I did.
First of all
I use two gems in my project, carrierwave and carrierwave backgrounder, and this article is about them.
Background
I use carrierwave backgrounder to upload audio files. Last week, I implemented transcription for those audio files and set up an event notification on the S3 bucket to trigger a Lambda function for the transcription. But I noticed that a single audio file produced two events, as shown below.
# 1st
{
"Messages": [
{
"MessageId": "ac5e9675-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"ReceiptHandle": "AQEB...ABCD",
"MD5OfBody": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"Body": "{
"Records": [
{
"eventVersion":"2.1",
"eventSource":"aws:s3",
"awsRegion":"xxxxxxxxx",
"eventTime":"2023-11-16T16:40:47.303",
"eventName":"ObjectCreated:Copy",
"userIdentity":{...},
"requestParameters":{...},
"responseElements":{...},
"s3":{...},
"object":{"key":"path/to/audio_file.mp3"}
}
]
}"
}
]
}
# 2nd
{
"Messages": [
{
"MessageId": "ac5e9675-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"ReceiptHandle": "AQEB...ABCD",
"MD5OfBody": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"Body": "{
"Records": [
{
"eventVersion":"2.1",
"eventSource":"aws:s3",
"awsRegion":"xxxxxxxxx",
"eventTime":"2023-11-16T16:41:04.593Z",
"eventName":"ObjectCreated:Copy",
"userIdentity":{...},
"requestParameters":{...},
"responseElements":{...},
"s3":{...},
"object":{"key":"path/to/audio_file.mp3"}
}
]
}"
}
]
}
This was a problem because the transcription, and the processing after it, ran twice for a single file, which created duplicate records in the database. So I started to investigate why it happened.
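To compare messages like the two above, it can help to pull only the interesting fields out of each body. This is a minimal sketch using Ruby's standard JSON library. The `summarize_records` helper and the sample body are hypothetical, and the sketch assumes the usual S3 event layout where the object key is nested under `s3` and `object`.

```ruby
require "json"

# Hypothetical, trimmed-down message body for illustration only.
# Real S3 event records carry many more fields than this.
body = JSON.generate(
  "Records" => [
    {
      "eventName" => "ObjectCreated:Copy",
      "eventTime" => "2023-11-16T16:41:04.593Z",
      "s3" => { "object" => { "key" => "path/to/audio_file.mp3" } }
    }
  ]
)

# Hypothetical helper: returns one small hash per record in the body.
def summarize_records(body)
  JSON.parse(body).fetch("Records", []).map do |record|
    {
      event_name: record["eventName"],
      event_time: record["eventTime"],
      key: record.dig("s3", "object", "key")
    }
  end
end

summary = summarize_records(body)
puts summary.inspect
```

Printing this summary for each message makes it easy to see that both events share the same key and event name and differ only in the event time.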
What I did
First of all, the ObjectCreated:Copy event looked odd to me, because I expected ObjectCreated:Put or ObjectCreated:Post for an upload. So I had a quick look at the carrierwave gem and found the part that handles the copy.
The cache method calls the store method internally. If the file passed in is an instance of the same class, the store method calls the copy method of fog, which creates an ObjectCreated:Copy event in S3. At this point, I suspected there were two events because the first one was created for the cache file and the second for the final output file. From here, I tried to prove my hypothesis.
Judging from what I had found so far, I started to assume that there could be two different classes for the cache and the final output file. Based on that hypothesis, I looked into the code related to the cache and how it works.
Long story short, once self.class.cache_storage is set, carrierwave uses different classes for the cache and the final output file, so the upload creates ObjectCreated:Put or ObjectCreated:Post instead of a copy. I checked how to set the value and found the option in the README.
It’s super easy: just add config.cache_storage = :file somewhere in the configuration. In my case, I put the setting in the audio uploader itself and tried uploading a file. Finally, I got only one ObjectCreated:Put event, as shown below.
{
"Messages": [
{
"MessageId": "380d9b0c-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"ReceiptHandle": "AQEB...ABCD",
"MD5OfBody": "0bedxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"Body": "{
"Records": [
{
"eventVersion":"2.1",
"eventSource":"aws:s3",
"awsRegion":"xxxxxxxxx",
"eventTime":"2023-11-16T22:43:59.082Z",
"eventName":"ObjectCreated:Put",
"userIdentity":{...},
"requestParameters":{...},
"responseElements":{...},
"s3":{...},
"object":{"key":"path/to/audio_file.mp3"}
}
]
}"
}
]
}
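For reference, here is a minimal sketch of the two places where the cache storage setting can go. It assumes the carrierwave gem is already loaded, for example in a Rails app, and the `AudioUploader` class name is hypothetical. The rest of the fog configuration is omitted.

```ruby
# Option 1: set it globally, for example in an initializer.
CarrierWave.configure do |config|
  # Keep cached files on the local disk instead of in remote storage.
  config.cache_storage = :file
end

# Option 2: set it only for one uploader (AudioUploader is a hypothetical name).
class AudioUploader < CarrierWave::Uploader::Base
  storage :fog
  # Only the final file goes to S3; the cache stays local.
  cache_storage CarrierWave::Storage::File
end
```

Either option should be enough on its own. The per-uploader form keeps the change limited to audio files.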
In the end, I thought, “It seems my hypothesis is correct,” and then I looked at the carrierwave backgrounder README, where I found the following comment.
# This is required if you are using S3 or any other remote storage.
# CarrierWave's default is to store the cached file remotely which is slow and uses bandwidth.
# By setting this to File, it will only store on saving of the record.
cache_storage CarrierWave::Storage::File
So what I found was already written in the README. Hahaha. Next time, I’ll read the READMEs of the gems I use first.
That’s it!