Amazon S3 provides high availability by storing multiple copies of your data across different Availability Zones within an AWS Region for each successful object creation.
In this diagram, when a user initiates a new file upload, the file is first stored in one Availability Zone, and replicas are then created in the other AZs for each successful upload operation.
As of now, Amazon S3 supports two data consistency models. These are:
- Read-after-write consistency
- Eventual consistency
The S3 consistency model started with eventual consistency only. Currently, S3 supports read-after-write consistency alongside eventual consistency.
1. Read-after-write consistency
Read-after-write consistency makes a new object immediately visible to all clients, and it is stronger than eventual consistency.
Suppose you initiate the creation or upload of a new object in an S3 bucket: you can read the object as soon as the upload completes. In other words, when you PUT a new object, you can GET it as soon as you receive a success response.
When a write operation starts, S3 does not wait for all the copies (six in the N. Virginia Region, for example) to be replicated across every Availability Zone. As soon as the object is stored in any one of the AZs, you can read it, which is more consistent behaviour.
Amazon S3 provides read-after-write consistency for PUTs of new objects.
- A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
The caveat is that if you make a HEAD or GET request to the key name (to check whether the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
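The read-after-write behaviour above can be sketched with a tiny in-memory model (a hypothetical simulation, not the real S3 API or replication protocol): a new-object PUT is not acknowledged until every replica can serve it, so an immediate GET from any AZ succeeds.

```python
import random

class NewObjectStore:
    """Toy model of read-after-write consistency for new-object PUTs.
    Purely illustrative; real S3 replication works very differently."""

    def __init__(self, num_azs=3):
        # One key/value store per simulated Availability Zone.
        self.azs = [dict() for _ in range(num_azs)]

    def put_new(self, key, body):
        # A new-object PUT is not acknowledged until the object is
        # readable, so here we write it to every AZ before returning.
        for az in self.azs:
            az[key] = body

    def get(self, key):
        # A GET may be served by any AZ.
        return random.choice(self.azs).get(key)

store = NewObjectStore()
store.put_new("report.txt", b"hello")
# The new object is visible immediately, whichever AZ serves the read.
assert all(store.get("report.txt") == b"hello" for _ in range(10))
```

The key point the sketch captures is that the success response for a new object only comes back once a read can succeed.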
2. Eventual Consistency
Amazon S3 offers eventual consistency for overwrite PUTs and DELETEs.
As the name suggests, an overwrite PUT means an object with the same name already exists when you upload or create the new one, and a DELETE means you are removing an existing object from S3.
To understand this, consider what happens when an overwrite PUT occurs: the change becomes final only after all copies have been replicated across every Availability Zone. Some time therefore passes between the moment the overwrite PUT is issued and the moment the change has completed across all the AZs in your Region, and the data is not guaranteed to be consistent during that window.
For example, suppose you have an object with some content and you issue an overwrite PUT with the same object name but modified content. It takes some time for the new version to propagate across all the AZs in your Region, replacing the existing copies. If you try to read the object during this propagation, you may get either the existing or the new content. Under this consistency model, a read is guaranteed to return the new object only after the copy has fully propagated to every AZ.
- A process replaces an existing object and immediately tries to read it. Until the change is fully propagated, Amazon S3 might return the previous data.
- A process deletes an existing object and immediately tries to read it. Until the deletion is fully propagated, Amazon S3 might return the deleted data.
- A process deletes an existing object and immediately lists keys within its bucket. Until the deletion is fully propagated, Amazon S3 might list the deleted object.
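The scenarios above can be mimicked with a small simulation (again hypothetical, not the real S3 internals): an overwrite or delete is applied to one AZ first and only later propagated to the rest, so a read in between may return stale data.

```python
import random

class EventualStore:
    """Toy model of eventual consistency for overwrite PUTs and DELETEs.
    Purely illustrative; real S3 replication works very differently."""

    _TOMBSTONE = object()  # marks a delete awaiting propagation

    def __init__(self, num_azs=3):
        self.azs = [dict() for _ in range(num_azs)]
        self.pending = {}  # key -> new body (or tombstone) not yet propagated

    def put(self, key, body):
        if any(key in az for az in self.azs):
            # Overwrite PUT: only the first AZ sees the change right away.
            self.azs[0][key] = body
            self.pending[key] = body
        else:
            # New-object PUT: readable everywhere on success.
            for az in self.azs:
                az[key] = body

    def delete(self, key):
        # DELETE is also eventually consistent in this model.
        self.azs[0].pop(key, None)
        self.pending[key] = self._TOMBSTONE

    def get(self, key):
        # A read served by a not-yet-updated AZ returns stale data.
        return random.choice(self.azs).get(key)

    def propagate(self):
        # Background replication finishes: all AZs converge.
        for key, value in self.pending.items():
            for az in self.azs:
                if value is self._TOMBSTONE:
                    az.pop(key, None)
                else:
                    az[key] = value
        self.pending.clear()

store = EventualStore()
store.put("a.txt", "v1")
store.put("a.txt", "v2")                   # overwrite PUT
assert store.get("a.txt") in ("v1", "v2")  # stale or fresh, depending on AZ
store.propagate()
assert store.get("a.txt") == "v2"          # consistent once fully propagated
store.delete("a.txt")
assert store.get("a.txt") in ("v2", None)  # deleted data may still be returned
store.propagate()
assert store.get("a.txt") is None
```

Each assertion corresponds to one of the bulleted scenarios: until `propagate()` runs, a read may observe the previous or the deleted data.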
How is read-after-write better than eventual consistency?
For a new-object PUT operation, a read does not have to wait until the object has been copied across all the AZs in your Region. That means if the object has been stored in any one of the AZs and you immediately initiate a read or GET operation, you can read or get the object.
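Because overwrites and deletes are only eventually consistent, a common client-side workaround is to poll until the expected content appears. A minimal sketch of such a helper (the `fetch` callable, retry counts, and delays are illustrative assumptions, not an AWS API):

```python
import time

def read_until(fetch, expected, attempts=5, delay=0.01):
    """Call fetch() until it returns `expected` or attempts run out.
    A simple client-side workaround for eventually consistent reads."""
    value = None
    for _ in range(attempts):
        value = fetch()
        if value == expected:
            break
        time.sleep(delay)
    return value

# Fake eventually consistent read: stale for the first two calls,
# then the propagated content becomes visible.
calls = {"n": 0}
def fetch():
    calls["n"] += 1
    return "old" if calls["n"] <= 2 else "new"

assert read_until(fetch, "new") == "new"
```

The same pattern applies whatever storage client sits behind `fetch`; the helper simply retries until propagation has (apparently) completed or it gives up and returns the last value seen.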
I hope this blog helps you understand the data consistency models in S3. Please comment below if you have any concerns or questions related to this blog.