Tuesday, April 17, 2012

Instagram - the architecture that is now worth $1B

  • An Amazon shop: they use many of Amazon's services. With only 3 engineers, they don’t have the time to look at self-hosting.
  • 100+ EC2 instances total for various purposes.
  • Ubuntu Linux 11.04 (“Natty Narwhal”). It has been solid; other Ubuntu versions froze on them.
  • Amazon’s Elastic Load Balancer routes requests and 3 nginx instances sit behind the ELB.
  • SSL terminates at the ELB, which lessens the CPU load on nginx.
  • Amazon’s Route53 for the DNS.
  • 25+ Django application servers on High-CPU Extra-Large machines.
  • Traffic is CPU-bound rather than memory-bound, so High-CPU Extra-Large machines are a good balance of memory and CPU.
  • Gunicorn as their WSGI server. Apache was harder to configure and more CPU-intensive.
  • Fabric is used to execute commands in parallel on all machines. A deploy takes only seconds.
  • PostgreSQL (users, photo metadata, tags, etc) runs on 12 Quadruple Extra-Large memory instances.
  • Twelve PostgreSQL replicas run in a different availability zone.
  • PostgreSQL instances run in a master-replica setup using Streaming Replication. EBS is used for snapshotting, to take frequent backups.
  • EBS is deployed in a software RAID configuration. Uses mdadm to get decent IO.
  • All of their working set is stored in memory. EBS doesn’t support enough disk seeks per second.
  • Vmtouch (portable file system cache diagnostics) is used to manage what data is in memory, especially when failing over from one machine to another, where there is no active memory profile already.
  • XFS as the file system. Used to get consistent snapshots by freezing and unfreezing the RAID arrays when snapshotting.
  • Pgbouncer is used to pool connections to PostgreSQL.
  • Several terabytes of photos are stored on Amazon S3.
  • Amazon CloudFront as the CDN.
  • Redis powers their main feed, activity feed, sessions system, and other services.
  • Redis runs on several Quadruple Extra-Large Memory instances. They occasionally shard across instances.
  • Redis runs in a master-replica setup. Replicas constantly save to disk, and EBS snapshots back up the DB dumps. Dumping the DB on the master was too taxing.
  • Apache Solr powers the geo-search API. They like its simple JSON interface.
  • 6 memcached instances for caching. They connect using pylibmc & libmemcached. Amazon's ElastiCache service isn't any cheaper.
  • Gearman is used to asynchronously share photos to Twitter, Facebook, etc.; to notify real-time subscribers of a newly posted photo; and for feed fan-out.
  • 200 Python workers consume tasks off the Gearman task queue.
  • Pyapns (Apple Push Notification Service) handles over a billion push notifications. Rock solid.
  • Munin to graph metrics across the system and alert on problems. They write many custom plugins using Python-Munin to graph signups per minute, photos posted per second, etc.
  • Pingdom for external monitoring of the service.
  • PagerDuty for handling notifications and incidents.
  • Sentry for Python error reporting.
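
The XFS freeze-and-snapshot step mentioned in the list above could look roughly like the following. This is a minimal sketch, not Instagram's actual script: the function name snapshot_frozen, the FREEZE_CMD override (handy for a dry run without root), and the snapshot command passed as arguments are all hypothetical. Only xfs_freeze -f (freeze) and xfs_freeze -u (unfreeze) are the real CLI.

```shell
#!/bin/sh
# Sketch: freeze an XFS mount point, run a snapshot command, always unfreeze.
# FREEZE_CMD is a hypothetical override so the flow can be dry-run without root.
FREEZE_CMD="${FREEZE_CMD:-xfs_freeze}"

snapshot_frozen() {
    mount_point="$1"
    shift
    # Suspend writes so the on-disk state is consistent.
    $FREEZE_CMD -f "$mount_point" || return 1
    # Run the caller-supplied snapshot command and remember its exit status.
    if "$@"; then status=0; else status=$?; fi
    # Resume writes even if the snapshot command failed.
    $FREEZE_CMD -u "$mount_point" || return 1
    return "$status"
}

# Example (mount point and snapshot script are assumptions):
#   snapshot_frozen /mnt/pgdata /usr/local/bin/take-ebs-snapshot.sh
```

Unfreezing regardless of the snapshot result matters here: a file system left frozen blocks every writer until someone unfreezes it by hand.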

RAMFS vs TMPFS on Linux

Using ramfs or tmpfs, you can allocate part of the physical memory to be used as a partition. You can mount this partition and read and write files on it just as you would on a hard disk partition. Because reads and writes go to RAM rather than to disk, they are faster.

When a vital process becomes drastically slow because of disk writes, you can use either a ramfs or a tmpfs file system so that it writes its files to RAM instead.


Both tmpfs and ramfs mounts give you fast reads and writes, because the files live in primary memory. When you test this on a small file, you may not see a huge difference. You’ll notice the difference only when you write a large amount of data to a file alongside some other processing overhead, such as network.
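
One rough way to see the difference for yourself is to time a large sequential write into a RAM-backed directory and into a disk-backed one. The sketch below is only an illustration: write_test and the file name ddtest.bin are hypothetical, the example paths are assumptions, and bs=1M and conv=fsync assume GNU dd.

```shell
#!/bin/sh
# Sketch: write count_mb MiB of zeros into dir and let dd report the throughput.
# write_test and ddtest.bin are hypothetical names; bs=1M and conv=fsync assume GNU dd.
write_test() {
    dir="$1"
    count_mb="$2"
    # conv=fsync flushes the data before dd exits, so a disk-backed
    # directory is not flattered by the page cache.
    dd if=/dev/zero of="$dir/ddtest.bin" bs=1M count="$count_mb" conv=fsync
}

# Example (paths are assumptions; remove ddtest.bin afterwards):
#   write_test /mnt/tmp 16    # RAM-backed mount
#   write_test /var/tmp 16    # disk-backed directory
```

Keep the write size below the limit of the RAM-backed mount, otherwise the test fails with a space error instead of reporting a speed.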

1. How to mount Tmpfs

# mkdir -p /mnt/tmp
# mount -t tmpfs -o size=20m tmpfs /mnt/tmp

The last line in the following df -k shows the above mounted /mnt/tmp tmpfs file system.

# df -k
Filesystem      1K-blocks  Used     Available Use%  Mounted on
/dev/sda2       32705400   5002488  26041576  17%   /
/dev/sda1       194442     18567    165836    11%   /boot
tmpfs           517320     0        517320    0%    /dev/shm
tmpfs           20480      0        20480     0%    /mnt/tmp

2. How to mount Ramfs

# mkdir -p /mnt/ram
# mount -t ramfs -o size=20m ramfs /mnt/ram

The last line in the following mount command shows the above mounted /mnt/ram ramfs file system.

# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
tmpfs on /mnt/tmp type tmpfs (rw,size=20m)
ramfs on /mnt/ram type ramfs (rw,size=20m)

You can mount ramfs and tmpfs during boot time by adding an entry to the /etc/fstab.
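
As a sketch of what such an /etc/fstab entry might look like, the helper below prints one line per mount using the same mount points and size as the commands above. The function name fstab_entry is hypothetical, and the trailing "0 0" (no dump, no fsck pass) is an assumption that suits a RAM-backed file system.

```shell
#!/bin/sh
# Sketch: print an /etc/fstab line for a RAM-backed file system.
# Fields: device, mount point, type, options, dump, fsck pass.
fstab_entry() {
    fs_type="$1"      # tmpfs or ramfs
    mount_point="$2"  # e.g. /mnt/tmp
    size="$3"         # e.g. 20m; only tmpfs enforces this limit
    printf '%s %s %s size=%s 0 0\n' "$fs_type" "$mount_point" "$fs_type" "$size"
}

fstab_entry tmpfs /mnt/tmp 20m
fstab_entry ramfs /mnt/ram 20m
```

Review the printed lines before appending them to /etc/fstab; a malformed entry there can interfere with booting.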

3. Ramfs vs Tmpfs

Ramfs and tmpfs do essentially the same thing, with a few differences.

  • Ramfs will grow dynamically, so you need to control the process that writes the data to make sure ramfs doesn’t exceed the RAM available in the system. Say you have 2 GB of RAM on your system, and you created a 1 GB ramfs and mounted it as /tmp/ram. When the total size of /tmp/ram crosses 1 GB, you can still write data to it. The system will not stop you, because ramfs does not enforce the size given at mount time. However, when it goes above the total RAM size of 2 GB, the system may hang, as there is no place left in RAM to keep the data.
  • Tmpfs will not grow beyond the size you specified while mounting it. Once that limit is reached, further writes fail with an error similar to “No space left on device”, so you don’t need to worry about controlling the process that writes the data.
  • Tmpfs uses swap: its pages can be swapped out when memory is tight.
  • Ramfs does not use swap.
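
Because ramfs enforces no limit, the writing process has to police itself. One possible guard, sketched below, checks the directory's current usage against a budget before each write. The function name within_budget and the budget value are hypothetical, and du -sk reports usage in 1 KB blocks.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) only while dir uses no more than budget_kb kilobytes.
# within_budget is a hypothetical helper, not part of any standard tool.
within_budget() {
    dir="$1"
    budget_kb="$2"
    # du -sk prints "<kilobytes> <path>"; keep only the number.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    [ "$used_kb" -le "$budget_kb" ]
}

# Example (mount point and 1 GB budget are assumptions):
#   within_budget /mnt/ram 1048576 || echo "ramfs over budget, not writing" >&2
```

With tmpfs this check is unnecessary, since the kernel rejects writes past the mounted size on its own.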

4. Disadvantages of Ramfs and Tmpfs

Since both ramfs and tmpfs keep their data in system RAM, that data is lost when the system reboots or crashes. So, you should write a process that copies the data from ramfs/tmpfs to disk at periodic intervals. You can also write a process that saves the data from ramfs/tmpfs to disk while the system is shutting down, but this will not help you in the event of a system crash.
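
A periodic copy of that kind can be as small as the sketch below. The function name persist_ram_dir, the backup path, and the cron schedule in the comment are all hypothetical; cp -a assumes GNU cp.

```shell
#!/bin/sh
# Sketch: copy everything under a RAM-backed directory to a directory on disk.
# persist_ram_dir is a hypothetical helper; cp -a assumes GNU cp.
persist_ram_dir() {
    src="$1"   # e.g. /mnt/tmp
    dest="$2"  # on-disk backup directory (an assumption)
    mkdir -p "$dest" || return 1
    # -a preserves modes and timestamps; "$src/." copies the contents,
    # including dotfiles, rather than the directory itself.
    cp -a "$src/." "$dest/"
}

# Example: run every 5 minutes from cron (script path is an assumption):
#   */5 * * * * /usr/local/bin/persist-ram.sh
# where persist-ram.sh calls: persist_ram_dir /mnt/tmp /var/backups/ramdata
```

Note that this only adds and overwrites files; anything deleted from the RAM-backed directory stays in the backup. If that matters, rsync -a --delete is one alternative.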

Table: Comparison of ramfs and tmpfs
Experimentation                           Tmpfs                Ramfs
Fill maximum space and continue writing   Will display error   Will continue writing
Fixed Size                                Yes                  No
Uses Swap                                 Yes                  No
Volatile Storage                          Yes                  Yes

If you want your process to write faster, tmpfs is the better choice, provided you take precautions against losing data in a system crash.