Exam Notes – S3

S3 is one of the oldest services. It will definitely feature in the exam.

S3 = Simple Storage Service. 

Provides:

  • secure
  • durable
  • highly scalable
  • object storage
  • web services interface to store and retrieve any amount of data from anywhere on the web

Data is spread across multiple devices and facilities.

Facts:

  • object based
  • file size = 0 bytes – 5 TB
  • unlimited storage (but you pay by the Gig)
  • folders are called buckets
  • buckets have a universal namespace
  • a bucket has a DNS address: https://s3-<region>.amazonaws.com/<bucket name>
    • e.g. https://s3-eu-west-1.amazonaws.com/mybucket
  • HTTP 200 code for successful uploads
  • Built for 99.99% availability – SLA guarantees 99.9% availability
  • 11 9s (99.999999999%) durability – close to 100%, varying slightly by storage class
  • Tiered storage available
  • Lifecycle Management supported (move objects between storage tiers)
  • Versioning supported
  • Encryption supported (different types)
  • Data can be secured using ACLs and Bucket Policies

Data consistency model for S3

  • Read after Write consistency for PUTS of new objects
  • Eventual consistency for overwrite PUTs and DELETES

Simple Key / Value store

  • Key (name of the file)
  • Value (data – sequence of bytes)
  • Version ID
  • Metadata (data about the data, like tags)
  • Subresources
    • Access Control Lists (ACLs)

S3 Storage Classes

  • S3 Standard
    • 99.99% availability
    • 11 9s durability 
    • stored on multiple devices (disks)
    • > 2 facilities used (AZs)
    • designed to sustain the loss of 2 facilities concurrently
  • S3 IA
    • Less frequent but still fast access
    • Lower storage fee
    • Includes retrieval fee
  • S3 One Zone IA
    • Single zone, so no multi-AZ data resilience
    • ok for data that can be easily recreated
  • Glacier
    • Very cheap
    • Archival purposes
    • Longer retrieval times
 
Note: There is no retrieval fee for S3 Standard; for every other class there is one.
 
S3 Charges
  • You will be charged for:
    • Storage (by the Gig)
    • Requests (if we have an object in S3 and 1000 people download it, we will be charged for each download)
    • Storage Management Pricing (if you put tags on an object you will be charged for it)
    • Data transfer pricing (CRR)
    • Transfer Acceleration

Exam Tips
  • S3 is object based
  • file size 0 bytes to 5 TB
  • unlimited storage
  • files stored in bucket
  • bucket name must be unique globally (universal namespace)
  • remember the bucket URL (DNS) format
  • Remember consistency models of S3
  • Remember Storage Classes
    • S3 (durable, available immediately, frequently accessed)
    • S3 IA (durable, immediately available, infrequent access)
    • S3 One Zone IA (even cheaper than IA, but only in one AZ)
    • Glacier – archiving, longer (3-5 hours) retrieval times
    • Remember core fundamentals of S3
      • Key, value, version ID, metadata, subresources (ACLs)
  • You can add Tags to buckets – After you activate cost allocation tags, AWS uses the tags to organize your resource costs on your cost allocation report. Cost allocation tags can only be used to label buckets.
 

To Remember:

  • Buckets are a universal namespace
  • Successful uploads return an HTTP 200 code
  • S3, S3 IA, S3 one zone IA
  • Encryption
    • client side
    • server side
      • SSE-S3
      • SSE-KMS
      • SSE-C
  • Control access to buckets with ACLs or bucket policies
  • BY DEFAULT BUCKETS ARE PRIVATE AND ALL OBJECTS STORED INSIDE THEM ARE PRIVATE

Notes:

  • AWS has changed how objects are made public.
  • Previously, you only had to go to the object itself and make it public.
  • Now you need to start with your account-level Block Public Access settings. These were ON after I created my root account; I needed to switch them off.
  • Then you need to go to the bucket permissions and untick the boxes there.
  • Only then can you go to the file and make it public.
    • Any files loaded into this bucket after this, you only need to deal with the file itself
  • However, if you create a new bucket, you need to start the above again, starting from the account settings

You switch on versioning under bucket properties. Once on, it cannot be switched off, only suspended.
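
As a sketch, the same thing can be done from the CLI (the bucket name is a placeholder):

    # Enable versioning on a bucket (can later only be suspended, not disabled)
    aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Enabled

    # Suspend it again later if needed
    aws s3api put-bucket-versioning --bucket <bucket-name> --versioning-configuration Status=Suspended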

Changes to files are uploaded to S3 as new files with the same name but a later version. 

Every file uploaded to S3 needs to be made public before it can be viewed anonymously. The same applies to every new version of the file.

Deleting a file does not actually delete it; instead a delete marker is created as the newest version on top of the latest existing version. The way to restore a file is to delete the delete marker.

If you really want to delete the entire file, you need to delete all versions.
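
A sketch of how that looks from the CLI (bucket and file names are placeholders):

    # List all versions and delete markers for a file
    aws s3api list-object-versions --bucket <bucket-name> --prefix <file-name>

    # Restore the file by deleting its delete marker (version ID taken from the output above)
    aws s3api delete-object --bucket <bucket-name> --key <file-name> --version-id <delete-marker-version-id>

    # Truly deleting the file means running delete-object with --version-id for every version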

NOTES:

  • Every version is a new file. Think about this for larger files. Space requirements need to be considered.
  • It is possible to implement MFA for deletes

Exam tips

  • Stores all versions, even deletes
  • Great backup tool
  • Once enabled, cannot be disabled, only suspended
  • Integrates with lifecycle management
  • Has MFA delete capability (see the sketch below and the AWS web pages on this)
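
A rough sketch of enabling MFA delete – it has to be done via the CLI as the root user, and the MFA device ARN and code here are placeholders:

    # Enable MFA delete together with versioning (root credentials required)
    aws s3api put-bucket-versioning --bucket <bucket-name> \
        --versioning-configuration Status=Enabled,MFADelete=Enabled \
        --mfa "<arn-of-mfa-device> <mfa-code>"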

For cross region replication to work, versioning needs to be enabled on both buckets.

Only new objects or changes to existing objects are replicated, not the list of objects existing at the time replication is enabled

If you want to copy the contents of one bucket to another, you need to do it another way.

–> Use a CLI tool –> google “AWS CLI tools” –> one of the first links

  • Install the Windows 64-bit version
  • Open a terminal
  • Type "aws configure"
  • Copy/paste the access key + secret access key of a user
  • Give a default region (like eu-west-2)
  • Then type "aws s3 ls" to list your buckets
  • Type "aws s3 cp s3://<source-bucket> s3://<destination-bucket> --recursive" to copy the contents of one bucket to another

Permissions are not replicated

New versions are replicated but deletion of versions is not

Exam Tips

  • Versioning must be enabled on both source and target buckets
  • Regions must be unique, cannot replicate within the same region
  • Source and destination buckets do not have to have the same storage class. They can also have different owners.
  • Existing files are NOT replicated automatically
  • New files and updates to existing files ARE replicated automatically
  • Cannot replicate to multiple buckets
  • Delete markers are NOT replicated
  • Deletion of individual versions or delete markers is NOT replicated
  • UNDERSTAND CROSS REGION AT A HIGH LEVEL – NEEDS TO BE TURNED ON
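
A rough sketch of turning replication on from the CLI – the bucket names and the IAM role (which S3 assumes to do the copying) are placeholders:

    # replication.json – one rule replicating all new objects to the destination bucket
    {
      "Role": "arn:aws:iam::<account-id>:role/<replication-role>",
      "Rules": [
        {
          "Status": "Enabled",
          "Prefix": "",
          "Destination": { "Bucket": "arn:aws:s3:::<destination-bucket>" }
        }
      ]
    }

    # Apply it to the (already versioned) source bucket
    aws s3api put-bucket-replication --bucket <source-bucket> --replication-configuration file://replication.json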

Scenario:

  • Data only relevant for 30 days
  • After 30 days (from creation date) it can be stored in S3 IA (saves cost) – there is also a rule that objects smaller than 128 KB are not transitioned to IA
  • S3 One Zone IA is an option here as well
  • After 60 days (from creation date) it can be sent to Glacier

After bucket creation you can click on "Management" and set up lifecycle rules. It's a way to automate transition to tiered storage (e.g. S3 to One Zone IA to Glacier, and even to Expiration).

It's also possible to have lifecycle rules for file versions, e.g. older versions are moved according to a schedule while the current version stays put.

Also, you cannot enable "clean up expired object delete markers" if you also enable Expiration.
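
A sketch of the scenario above as a CLI lifecycle configuration (the bucket name is a placeholder, and the 90-day expiration is just an added example):

    # lifecycle.json – IA after 30 days, Glacier after 60, delete after 90
    {
      "Rules": [
        {
          "ID": "tier-down",
          "Filter": {},
          "Status": "Enabled",
          "Transitions": [
            { "Days": 30, "StorageClass": "STANDARD_IA" },
            { "Days": 60, "StorageClass": "GLACIER" }
          ],
          "Expiration": { "Days": 90 }
        }
      ]
    }

    aws s3api put-bucket-lifecycle-configuration --bucket <bucket-name> --lifecycle-configuration file://lifecycle.json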

Exam Tips

  • Used with or without versioning
  • can apply to current and previous versions
  • transition to IA storage X days after creation
  • Archive to Glacier Y days (default 30) after creation
  • Lifecycle can be used to permanently delete objects as well (expiration)

A content delivery network (CDN) is a system of distributed servers that delivers webpages and other web content to a user based on the geographic location of the user, the origin of the content, and a content delivery server.

CloudFront solves the problem of latency – the different latencies experienced depending on where you are and where the content you are requesting is.

In the second picture, the user who requests the data first gets no benefit from CloudFront. However, the subsequent users (green) get a substantial benefit.

Key terminology:

  • Edge location – location where the content will be cached (separate to an AWS Region/AZ). There are over 50 edge locations in the world currently
  • Origin – the origin of all the files that the CDN will distribute. This can be an S3 bucket, an EC2 instance, an Elastic Load Balancer or Route 53. An origin does not even have to be with AWS; you can have your own custom origin server
  • Distribution (name given to CDN which consists of a collection of edge locations)
  • Distribution TTL (Time To Live – determines how long the data should be cached for on the edge location)
  • Web distribution (generally used for websites) / RTMP distribution (used for media streaming)

Exam Tips

  • Understand the above and how they work together
  • Edge locations are not just read only, you can write to them too (put an object on them)
  • Objects are cached for the life of the TTL
  • You can clear cached objects but you will be charged (like if you have a new video and you want it live straight away – you will need to clear the cache and you will be charged)

CDN

Creation of:

  • AWS console –> CloudFront
  • Click “Create Distribution”
  • Get Started on Web distribution
  • Origin Domain Name – where is our origin coming from
  • Origin Path – you can have multiple origins in a distribution. This could point to a folder within a bucket
  • Origin ID – user-defined name to distinguish this origin from other origins in a distribution
  • Restrict Bucket Access – yes means that the content URL can no longer be used directly and that all requests must go through CloudFront
  • Grant Read Permissions on Bucket – basically means that read permissions are granted as part of this process
  • Viewer Protocol Policy – what is allowed? HTTP and HTTPS, HTTPS only, or redirect HTTP to HTTPS
  • Allowed HTTP Methods – GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE
  • TTL Settings – in seconds. Min, Max, Default (preset as 86400 seconds (24 hours))
  • Restrict Viewer Access – this uses signed URLs or signed cookies. Important if your content is, for example, restricted to a particular audience and isn't for everybody. Find out more about this …
  • Price Class – can restrict to specific geographies
  • Alternate Domain Names – CNAMEs, used instead of the long string generated by AWS. More readable
  • SSL Certificates – there is a default or you could upload your own
  • Logging – on/off, select a bucket to store logs, make up a log prefix for easy management (the log prefix creates a folder inside your chosen bucket using the name of the prefix). Cookie logging
  • Distribution State – Enabled / Disabled

It can take 5-10 minutes for the distribution to deploy. It can also be deleted, which can take up to 15 minutes as well.
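
For reference, a minimal sketch of creating a distribution from the CLI – it only sets the origin and leaves everything else at defaults, and the bucket name is a placeholder:

    # Create a web distribution with an S3 bucket as the only origin
    aws cloudfront create-distribution --origin-domain-name <bucket-name>.s3.amazonaws.com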

In the distribution settings, you can:

  • create more origins for this distribution
  • set up behaviours – for example, you can set it up so that specific file types are retrieved from a specific bucket (origin)
  • create geo-restrictions, which means you can create a blacklist or a whitelist (not both) and select countries from a list
  • create invalidations – this is how you remove an object from an edge location instead of waiting for the TTL to expire (this is charged – see the command sketch after this list)
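
A sketch of an invalidation from the CLI (the distribution ID is a placeholder; "/*" invalidates everything):

    # Remove cached objects from all edge locations without waiting for the TTL – charged
    aws cloudfront create-invalidation --distribution-id <distribution-id> --paths "/*"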

CloudFront can be used to deliver your entire website, including dynamic, static, streaming, and interactive content, using a global network of edge locations. Requests for your content are automatically routed to the nearest edge location, so content is delivered with the best possible performance.

Amazon CloudFront is optimized to work with other Amazon Web Services like Amazon S3, Amazon EC2, Elastic Load Balancing and Amazon Route 53. Amazon CloudFront also works seamlessly with any non-AWS origin server, which stores the original, definitive versions of the files.

How to secure your buckets. All new buckets are private.

  • You can use ACLs (Access Control Lists) or Bucket Policies
    • Bucket Policies are bucket wide
    • ACLs may be applied down to object level (folders, files) – you can make one object public while the rest of the bucket is private
  • S3 bucket can be configured to log bucket access
    • You can log to the same bucket, to another bucket, or even to a bucket owned by a different AWS account
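
A sketch of a bucket policy that makes every object in a bucket publicly readable (the bucket name is a placeholder):

    # policy.json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "PublicReadGetObject",
          "Effect": "Allow",
          "Principal": "*",
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::<bucket-name>/*"
        }
      ]
    }

    # Attach it to the bucket
    aws s3api put-bucket-policy --bucket <bucket-name> --policy file://policy.json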

Encryption:

  1. In Transit
    • data to and from the bucket
    • Secured using SSL / TLS (TLS is basically the replacement to SSL) – means using https
  2. At Rest
    • Server Side
      • S3 managed keys (SSE-S3)
        • each object is encrypted with a unique key employing strong multi-factor encryption
        • As an extra safeguard, Amazon encrypts this unique key itself with a master key which is regularly rotated. Amazon handles all the keys for you
        • It is AES-256 (Advanced Encryption Standard 256 bit) – to use it, you just click on the object and select encrypt. Easy.
      • AWS key management service (Managed keys SSE-KMS) –
        • KMS = Key Management Service.
        • Similar to SSE-S3 but with a few additional benefits (and also some additional charges for using it)
        • There are separate permissions for an envelope key (a key which protects your data encryption key) – an extra layer against unauthorised access to your objects
        • This method provides an audit trail of when the key was used and who was using the key – additional level of transparency as to who is decrypting what and when.
        • You can create and manage encryption keys yourself, or you can use the default key which is unique to you, the service that you are using as well as the region you are working in.
      • Server side encryption with customer provided keys (SSE-C) –
        • Customer provided keys.
        • Amazon still manages the encryption and decryption of the data but the keys are provided by the customer
    • Client Side
      • encrypt at client side and upload to S3
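
A sketch of requesting server side encryption when uploading via the CLI (the bucket name is a placeholder):

    # SSE-S3 (AES-256, Amazon managed keys)
    aws s3 cp file.txt s3://<bucket-name>/ --sse AES256

    # SSE-KMS (uses the default KMS key unless you specify one)
    aws s3 cp file.txt s3://<bucket-name>/ --sse aws:kms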

How AWS Storage Gateway Works (Architecture)

AWS Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure. The service enables you to securely store data in the AWS cloud for scalable and cost-effective storage.

It is a virtual appliance, connected to your data centre, that asynchronously propagates/replicates your data up to AWS (could be S3 or Glacier – whichever is right for you).

Basically this is where a company's operations and IT are separated from the company storage. The link to the storage is via this gateway.

AWS Storage Gateway's software appliance is available for download as a VM image that you install on a host in your datacenter. Storage Gateway supports either VMware ESXi or Microsoft Hyper-V. Once you've installed your gateway and associated it with your AWS account through the activation process, you can use the AWS Management Console to create the storage gateway option that is right for you.

The storage gateway is an application which needs to be downloaded and installed on a host inside the company datacenter.

4 types of Storage Gateway

  1. File Gateway – NFS (newest one – flat files in S3)
  2. Volume Gateway (iSCSI) – block based storage (usually from a virtual hard disk) – Stored Volumes
    • stores the entire dataset on premises (used to be called gateway-stored volumes)
  3. Volume Gateway (iSCSI) – block based storage (usually from a virtual hard disk) – Cached Volumes
    • stores only recently accessed data (the rest of the data is backed up into AWS) (used to be called gateway-cached volumes)
  4. Tape Gateway (VTL) – create tapes and send to AWS (used to be called gateway virtual tape library)

FILE GATEWAY

Files are stored as objects in your S3 bucket, accessed through a Network File System (NFS) mount point. Ownership, permissions, and timestamps are durably stored in S3 in the user-metadata of the object associated with the file. Once objects are transferred to S3, they can be managed as native S3 objects, and bucket policies such as versioning, lifecycle management and cross region replication apply directly to objects stored in your bucket.

If we are using a VPC it means that the Storage Gateway application is also in AWS – everything in AWS.

VOLUME GATEWAY

  • The volume interface presents your applications with disk volumes using the iSCSI block protocol. You can store databases on it, you can run applications on it. It's like a virtual hard disk
  • Data written to these volumes (virtual hard disks) can be asynchronously backed up as point-in-time snapshots of your volumes, and stored in the cloud as Amazon EBS (Elastic Block Store) snapshots
  • Snapshots are incremental backups that capture only changed blocks. All snapshot storage is also compressed to minimise your storage charges
  • 2 Types
    • Stored Volumes
      • lets you store your primary data locally, while asynchronously backing up that data to AWS. Stored volumes provide your on-premises applications with low-latency access to their entire datasets while providing durable off-site backups. You can create storage volumes and mount them as iSCSI devices from your on-premises application servers. Data written to your stored volumes is stored on your on-premises storage hardware. This data is asynchronously backed up to Amazon Elastic Block Store (Amazon EBS) snapshots. 1GB – 16 TB in size for Stored Volumes.
    • Cached Volumes
      • lets you use S3 as your primary data storage while retaining frequently accessed data locally in your storage gateway. Cached volumes minimize the need to scale your on-premises storage infrastructure while still providing your applications with low-latency access to their frequently accessed data. You can create storage volumes up to 32 TiB in size and attach them as iSCSI devices from your on-premises application servers. Your gateway stores data that you write to these volumes in Amazon S3 and retains recently read data in your on-premises storage gateway's cache and upload buffer storage – every time you read data it is stored in the cache storage of your application server. 1 GB – 32 TB in size for Cached Volumes.

TAPE GATEWAY

  • Virtual Tape Library (VTL – old name)
  • Supported by many backup applications

 – Exam Tips

  • File GW – flat files only, no data stored locally at all, only files in S3
  • Volume GW
  • 2 types
    • Stored Volumes
    • Cached Volumes
  • VTL
  • The exam has scenario questions where you need to choose which storage approach is appropriate

Before Snowball AWS provided Import/Export disks. This was a problem as people sent in all kinds of disk types, sizes and interface types.

Types of Snowballs

  1. Snowball
  2. Snowball edge
  3. Snowmobile

 – Snowball

  • Petabyte-scale data transport solution. Very secure
  • 80 TB Snowballs available in all regions
  • Tamper-resistant enclosures
  • 256-bit encryption (AES)
  • Uses TPM (Trusted Platform Module) to ensure security and full chain of custody of your data – used on Kindles, for example
  • After upload, software erasure of the Snowball appliance is run

 – Snowball Edge

  • Looks the same as the previous one
  • 100 TB of data
  • on-board storage and compute capabilities
    • it's like bringing a little AWS data center on premises
    • you can run Lambda functions on your data and store the results of these in S3 also
  • Connects to local storage the same as Snowball (using standard storage interfaces) but it can also process your data on premises – ensures applications can run even when not connected to the cloud

 – Snowmobile

  • Exabyte levels of data
  • driven around on a truck
  • one Snowmobile can take 100 PB – 10 trucks to make an exabyte
  • If a company has this amount of data, it would take about 6 months to transfer using Snowmobiles (~25 years over the wire)

 – Exam Tips

  • What is a snowball
  • what happened before there were snowballs
  • Snowball can
    • import to S3
    • export from S3

 – LAB

  • You need a client to access the snowball – you get it from the console. It is not under storage, it is under  .
  • when ordering the snowball you need to define your S3 buckets
  • When starting up you will need a manifest file and an unlock code –> credentials
  • There is a specific command line you need to run to kick off the Snowball – it uses the manifest file and unlock code (rough sketch below)
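
Roughly, with the legacy Snowball client, it looked like the sketch below – check the AWS docs for the exact syntax of your client version; the IP, paths and code here are placeholders taken from the console credentials:

    # Unlock/start the appliance using the manifest file and unlock code
    snowball start -i <snowball-ip> -m <path-to-manifest-file> -u <unlock-code>

    # Then copy data onto it
    snowball cp <local-file> s3://<bucket-name>/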

Basically, instead of uploading directly to an S3 bucket, you upload to an edge location. From there AWS uses its fast, optimised backbone network to transfer the files to S3.

“Create bucket –> properties –> Transfer Acceleration –> ENABLE”

You get a URL for transfer acceleration to this bucket. Take note of the new domain name: <bucket-name>.s3-accelerate.amazonaws.com.

Also here you can compare transfer acceleration per region – there is a link you can click. It will tell you the percentage gain you get by accessing the bucket from different regions if transfer acceleration is turned on.

Basically what happens is that you use CloudFront and the edge location nearest to you to accelerate your data uploads.

The further away you are from your bucket, the greater the benefit of using transfer acceleration.
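
A sketch of enabling and using transfer acceleration from the CLI (the bucket name is a placeholder):

    # Enable transfer acceleration on the bucket
    aws s3api put-bucket-accelerate-configuration --bucket <bucket-name> --accelerate-configuration Status=Enabled

    # Upload through the accelerate endpoint instead of the normal one
    aws s3 cp file.txt s3://<bucket-name>/ --endpoint-url https://s3-accelerate.amazonaws.com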

Static – no dynamic content

  • “create bucket –> properties –> static web site hosting”
    • If you decide to point a real domain name to this bucket, you need to use Route 53 (where you buy domain names)
    • Then your domain name and your bucket name have to be the same for Route 53 to work with S3
  • –> options
  • –> end point (used in Route 53)

Common exam question is around the format of the endpoint URL

http://<bucket name>.s3-website-us-east-1.amazonaws.com

s3-website = service

us-east-1 = region
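
A sketch of enabling website hosting from the CLI (the bucket name and document names are placeholders; objects still need to be publicly readable, e.g. via a bucket policy):

    # Turn a bucket into a static website
    aws s3 website s3://<bucket-name> --index-document index.html --error-document error.html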

Summary – Checklist

  1. Object based storage (block based = EBS). What can go here, what cannot? Lifecycle rules?
  2. Files sizes allowed
  3. Storage limitations
  4. buckets (explain)
  5. bucket names and urls
  6. consistency model for S3
  7. S3 storage tiers or classes – S3, S3 IA, S3 One Zone IA, Glacier – (availability, durability, redundancy, retrieval time/costs, scalability) (One Zone replaces RRS (reduced redundancy))
  8. Core fundamentals of S3 (key/value store + version ID + metadata + ACLs)
  9. Versioning, what does it allow, how do you enable it? Advantages? Disadvantages? What to watch out for? Where does MFA fit in? How to use with lifecycle rules?
  10. Cross region replication – prerequisites?
  11. Lifecycle management and versioning. Transfer, where to where (also to deletion)? What are the rules around file sizes and durations (30 + 30 + 30) on each class?
  12. CloudFront – edge locations. What is an origin – what can it be (4 options)? What is a distribution? For each distribution, you can have multiple origins (an origin does not even need to be inside AWS)
    • types of distributions (2 types)
    • read / write / both
    • What is TTL? Units? Defaults?
    • How does clearing cached objects work?
  13. Bucket security. Defaults? How to manage security (2 types)
    • Access logs. How do they work?
  14. Encryption types (2 types) – In transit (SSL / TLS) + At rest (SSE-S3 / SSE-KMS / SSE-C). Other option – client side encryption
  15. Storage Gateway – File GW + Volume GWs (x2) + VTL GW
  16. Snowball – types (x3) and what their capabilities are. Snowmobiles? Understand what Import/Export is (old version). Snowball can import to S3 and export from S3
  17. S3 transfer acceleration – how does it work?
  18. Static websites – scaling – serverless
  19. S3 successful writes
  20. Multipart uploads
  21. READ S3 FAQ