Why BootDev — Controlling Cache headers

According to the AWS documentation, when you want objects from S3 or CloudFront to be cached in the browser while still supporting CORS resources such as fonts, you can use the MaxAgeSeconds parameter: http://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html

In all the tests I tried, Chrome does not really respect MaxAgeSeconds; you still need the traditional Cache-Control: max-age=xxx AND Expires headers. When you use AWS CloudFront as your edge cache / CDN, and especially with S3 as your origin, you need to take special care of your cache headers.
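
As a rough illustration of keeping the two headers in step, here is a minimal Python sketch that derives Cache-Control and Expires from a single lifetime. The function name build_cache_headers and the one-day lifetime are assumptions made for this example; they do not come from the AWS documentation.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def build_cache_headers(max_age_seconds, now=None):
    """Return Cache-Control and Expires values that describe the same lifetime."""
    if now is None:
        now = datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # usegmt=True renders the HTTP date format, which ends in "GMT"
        "Expires": format_datetime(expires_at, usegmt=True),
    }


if __name__ == "__main__":
    # Fixed timestamp so the printed values are reproducible
    fixed_now = datetime(2015, 7, 10, 12, 0, 0, tzinfo=timezone.utc)
    for name, value in build_cache_headers(86400, now=fixed_now).items():
        print(f"{name}: {value}")
```

The two values can then be stored as metadata on the S3 object, so that CloudFront and the browser both see them.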

You can use the API, CLI, or UI to change the cache headers in the metadata section of the S3 object.


Then, in your bucket's permissions, set the CORS configuration.
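
For reference, a CORS rule for fonts might look roughly like the structure below, built in Python and printed as JSON. The key names follow the S3 CORS configuration elements (AllowedOrigins, AllowedMethods, AllowedHeaders, MaxAgeSeconds); the origin, the allowed methods, and the 3000-second value are placeholders chosen for this sketch.

```python
import json


def font_cors_rule(allowed_origin, max_age_seconds):
    """Build one CORS rule that lets a single origin fetch objects such as fonts."""
    return {
        "AllowedOrigins": [allowed_origin],
        "AllowedMethods": ["GET", "HEAD"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": max_age_seconds,
    }


# Placeholder origin; replace it with the site that loads the fonts
cors_configuration = {
    "CORSRules": [font_cors_rule("http://xxxx.example.com", 3000)]
}

if __name__ == "__main__":
    print(json.dumps(cors_configuration, indent=2))
```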


Once you have set these up successfully, you can use curl -I -H to test your settings. If you use Chrome to test, REMEMBER:

  1. DON'T click refresh
  2. DON'T press CMD+R
  3. Click another link in your website to test

Otherwise, you will end up very confused!

Run the command:

curl -I http://xxxxxx.example.com/fonts/Neutra2Display-Titling.woff -H "Origin: xxxx.example.com"


The first time, you will see "Miss from cloudfront". If this is your production site URL, you may wonder why, since many people should already be requesting this object. Because the request headers differ from those of a normal browser, CloudFront treats it as a new object, so there is nothing to worry about.

The second time you curl, you will see "Hit from cloudfront". With this setup, your resource (a font in this case) stays cached on CloudFront for a long time, and once it is downloaded to the browser it is cached locally for as long as Cache-Control: max-age says.
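
If you want to check this from a script rather than by eye, the sketch below classifies the X-Cache header from the text that curl -I prints. The two sample responses are made up for illustration; only the "Miss from cloudfront" and "Hit from cloudfront" values come from the test described above.

```python
def parse_headers(raw):
    """Turn raw `curl -I` output into a dict with lower-cased header names."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def cloudfront_cache_status(raw):
    """Return 'hit', 'miss', or 'unknown' based on the X-Cache header."""
    x_cache = parse_headers(raw).get("x-cache", "").lower()
    if x_cache.startswith("hit"):
        return "hit"
    if x_cache.startswith("miss"):
        return "miss"
    return "unknown"


# Hypothetical responses, shortened to the headers that matter here
FIRST_RESPONSE = """HTTP/1.1 200 OK
Cache-Control: max-age=31536000
X-Cache: Miss from cloudfront
"""

SECOND_RESPONSE = """HTTP/1.1 200 OK
Cache-Control: max-age=31536000
X-Cache: Hit from cloudfront
"""

if __name__ == "__main__":
    print(cloudfront_cache_status(FIRST_RESPONSE))
    print(cloudfront_cache_status(SECOND_RESPONSE))
```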

P.S. CloudFront respects Cache-Control, so how long your browser caches the object = how long the object stays on CloudFront.

With MaxAgeSeconds only, the browser keeps the resource but gets it back with a 304.

With the Cache-Control and Expires headers, the browser serves the resource as 200 (from cache).

Question: so what does MaxAgeSeconds do here? Is there any special requirement where we would always want a 304 rather than a 200 (from cache)? I need someone to answer this for me as well 🙂

Why Bootdev — Dynamic CDN

In the old days, we put images, CSS / JS, WOFF fonts and other assets on a CDN, so that clients could download them from somewhere geographically optimised.

Around the end of 2012 to early 2013, a new idea came up: put everything behind the CDN, so that we can simplify complex architectures built from memcache, a page cache cluster (Varnish), or even microcache. Just one cache layer, with everything on the CDN, as in the architecture below.

[Figure: Dynamic CDN architecture]

Your website's domain name points directly to the CDN with a CNAME, and the CDN points to your load balancer address or your web server, so it works like a proxy. When you CREATE, the request bypasses the CDN completely and goes to the web server. When you UPDATE, the CDN updates from the web server and then invalidates itself. When you DELETE, the CDN passes the request to the web server and invalidates its cache. When you READ, you read from the CDN, not from the web server. That way all CRUD actions are covered.
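
The CRUD flow can be sketched with a toy in-memory proxy. This is only a model of the idea, not how CloudFront is implemented: the ToyCdn class, the make_origin helper, and the choice to refill the cache lazily on the next read after a write are all assumptions made for the example.

```python
class ToyCdn:
    """Serve reads from a cache; pass writes to the origin and invalidate."""

    def __init__(self, origin):
        self.origin = origin  # callable(method, path, body) -> response body
        self.cache = {}
        self.origin_hits = 0

    def _forward(self, method, path, body=None):
        self.origin_hits += 1
        return self.origin(method, path, body)

    def request(self, method, path, body=None):
        if method == "GET":
            # READ: answer from the cache, filling it on the first request
            if path not in self.cache:
                self.cache[path] = self._forward("GET", path)
            return self.cache[path]
        # CREATE / UPDATE / DELETE: always go to the origin, then invalidate
        response = self._forward(method, path, body)
        self.cache.pop(path, None)
        return response


def make_origin():
    """A stand-in web server backed by a dict."""
    pages = {}

    def origin(method, path, body):
        if method in ("POST", "PUT"):
            pages[path] = body
            return "saved"
        if method == "DELETE":
            pages.pop(path, None)
            return "deleted"
        return pages.get(path, "not found")

    return origin


if __name__ == "__main__":
    cdn = ToyCdn(make_origin())
    cdn.request("POST", "/node/1", "v1")
    print(cdn.request("GET", "/node/1"), cdn.origin_hits)
    print(cdn.request("GET", "/node/1"), cdn.origin_hits)
```

Repeated reads of the same path leave origin_hits unchanged, which is the whole point of the architecture: only writes and the first read after a write reach the web server.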

You will need your own cache invalidation strategy, such as sending invalidation requests to CloudFront or using versioned objects or URLs.
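
One common way to do the versioning is to put a content hash into the file name, so that a changed file gets a new URL and the old cached copy is simply never requested again. The helper below is a minimal sketch of that idea; the name versioned_path and the 8-character digest length are arbitrary choices for the example.

```python
import hashlib
import posixpath


def versioned_path(path, content):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    root, ext = posixpath.splitext(path)
    return f"{root}.{digest}{ext}"


if __name__ == "__main__":
    print(versioned_path("/css/site.css", b"body { color: black; }"))
    print(versioned_path("/css/site.css", b"body { color: navy; }"))
```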

Here is a sample configuration showing how we let some URLs bypass the CDN and go to the web server, which makes Drupal work.

[Screenshots: sample configuration]

With AWS CloudFront, you can forward the Origin header so that you can perform CORS actions. You can also use a similar header forwarding feature to detect mobile / PC. With such an architecture set up well, you can theoretically serve unlimited page views, as your server won't really be hit. Your only bound will be database writes, which is not a concern in most cases.

If you don't want to understand all this, but want to lower your cost, handle higher traffic, and respond faster, contact bootdev at founders@bootdev.com! We can deploy Dynamic CDN for you in minutes whether or not you are using AWS. We can point our CloudFront account to your server, be it on Azure, Linode, or any bare-metal machine. It just needs to be Drupal, and you can enjoy the best performance ever.

ref: https://media.amazonwebservices.com/blog/cloudfront_dynamic_web_sites_full_1.jpg

Why bootdev — Nginx conf extended

Recently, one of our Drupal stacks showed really high CPU usage. New Relic showed its RPM (requests per minute) growing about 30% per month, but I don't think that alone should raise CPU by something like 40%. During my investigation, I found that:

  1. Robot scans can eat your CPU. For example, many robots look for wp-login because they guess you are a WordPress site.
  2. Your CDN may pull your data from a wrong URL and cause an ERROR in your php-fpm log.
  3. Robots from the same IP address(es) keep scanning your site.

We added some configuration that brought about a 6-7% CPU decrease for a 300-400 RPM Drupal site.

# Set a connection limit zone
limit_conn_zone $binary_remote_addr zone=gulag:5m;

# Check for robots
map $http_user_agent $is_bot {
    default '';
    ~*crawl|goog|yahoo|yandex|spider|bot|tracker|click|parser is_bot;
}

### Send all known bots to $args free URLs.
location @nobots {
    if ($is_bot) {
        rewrite ^ $scheme://$host$uri? permanent;
    }
    rewrite ^/(.*)$ /index.php?q=$1 last;
}

# In the CDN config and AdvAgg conf of the Drupal nginx setup,
# add the line below to allow at most 32 connections per IP
limit_conn gulag 32;
# Block the connection if it is a robot (put it anywhere you like)
try_files $uri @nobots;

# Block WordPress scan attacks
location ~ ^/(wp-admin|wp-login\.php) {
    deny all;
}

The WordPress scan block can be extended with anything strange that you find in php-fpm/error.log, just by changing the path pattern.
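
Before changing the patterns on a live server, it can help to try them against a few sample user agents and paths. The sketch below reuses the two regular expressions from the configuration in Python; the function names and sample strings are invented for the example, and Python's re module is only an approximation of the PCRE matching nginx performs.

```python
import re

# Same alternatives as the nginx map; ~* means case-insensitive
BOT_PATTERN = re.compile(
    r"crawl|goog|yahoo|yandex|spider|bot|tracker|click|parser", re.IGNORECASE
)
# Same pattern as the "Block WordPress scan attacks" location
BLOCKED_PATH = re.compile(r"^/(wp-admin|wp-login\.php)")


def is_bot(user_agent):
    """True if the user agent would be flagged by the map."""
    return BOT_PATTERN.search(user_agent) is not None


def is_blocked_path(path):
    """True if the request path would hit the `deny all` location."""
    return BLOCKED_PATH.search(path) is not None


if __name__ == "__main__":
    print(is_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(is_blocked_path("/wp-login.php"))
    print(is_blocked_path("/node/1"))
```

To block another path seen in the error log, add it as one more alternative inside the parentheses and rerun the check.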

I have a load balancer set up with two web servers, one with and one without the configuration above. The CPU difference on an Amazon m3.large is about 6% for a 300-400 RPM website, with both servers handling the same RPM.

Of course, the settings above have been merged into the BootDev Nginx conf.

Enjoy 🙂

Why bootdev — Caching (2)

Last time we talked about the three levels of cache in Drupal + AWS.

This time we go into more detail on APC (Alternative PHP Cache). There are several kinds of code-level (opcode) cache for PHP. The most common are OPcache (bundled by default with PHP 5.5), XCache, and APC. Most Drupal sites use APC.

There is also a Drupal APC module, but in most cases it does not need to be installed unless you want to put some Drupal cache tables into APC. At bootdev, we don't do this. Cache tables should live in a standalone cache rather than on the web server, so that when you add servers, the cached data can be shared across them and your system is ready for horizontal scaling.

You can simply install APC with pecl, the package manager for PHP extensions:

pecl install apc

Here is the APC configuration for an EC2 m3.large machine, which has 7.5 GB of memory:

; configuration for php apc module
;Relative to the number of cached files (you may need to watch your stats for a day or two to find out a good number)

;Relative to the size of site

;The number of seconds a cache entry is allowed to idle in a slot before APC dumps the cache

;Setting this to 0 will give you the best performance, as APC will
;not have to check the IO for changes. However, you must clear 
;the APC cache to recompile already cached files. If you are still
;developing, updating your site daily in WP-ADMIN, and running W3TC
;set this to 1

;This MUST be 0, WP can have errors otherwise! Drupal can set to 1

;Only set to 1 while debugging

;Allow 2 seconds after a file is created before it is cached to prevent users from seeing half-written/weird pages

;Leave at 2M or lower. Site doesn't have any file sizes close to 2M

apc.rfc1867_prefix = upload_

You can find apc.php somewhere under /usr/share/php-apc, depending on which server you are using.

Copy this file to your web root and open it in a browser to see the performance after tuning.


Why bootdev — Caching

Cache structure

For an LNMP stack, you will probably have three layers of caching to protect your servers from being hit too hard.

Page cache: MicroCache / Varnish / Cloudfront Dynamic Content

PHP cache: APC / OPcache (PHP 5.5+)

In-memory cache: ElastiCache (memcache) / ElastiCache (Redis)

At BootDev, we chose MicroCache + APC + memcache, which is the traditional approach. We also have a REVOLUTION plan that relies mostly on the CloudFront Dynamic Content service, but it is still experimental, so let's leave that for later.

Page cache

Cache invalidation sucks. To avoid it, we use the microcache built into nginx rather than Varnish. With Varnish, of course, you can speed up every single request, but you need to invalidate the cache when content is updated. With microcache, we simply expire the content every 40s, so that from the SECOND user onward, anyone hitting the same page gets the speed-up. We only render a page on the first request, so we use less memory and still get the same performance if your site has enough traffic.
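
The behaviour is easy to model: keep each rendered page for a short, fixed time and never invalidate explicitly. Below is a minimal Python sketch of that expiry logic. The MicroCache class and its injected clock are assumptions made for this example; nginx's real cache also handles locking, stale responses, and disk storage, which the sketch ignores.

```python
import time


class MicroCache:
    """Cache rendered pages for a short TTL instead of invalidating them."""

    def __init__(self, render, ttl_seconds=40, clock=time.monotonic):
        self.render = render      # callable(key) -> page body
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the expiry can be tested
        self.entries = {}         # key -> (stored_at, body)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1], "HIT"
        # First request, or the entry is older than the TTL: render again
        body = self.render(key)
        self.entries[key] = (now, body)
        return body, "MISS"


if __name__ == "__main__":
    cache = MicroCache(render=lambda key: f"page for {key}")
    print(cache.get("/node/1"))
    print(cache.get("/node/1"))
```

With steady traffic, at most one request per page per TTL window reaches PHP; everyone else is served from the cache.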

If you want to make sure every request HITs the cache, we suggest adding a cache warmer, which is also built into the @bootdev management machine (Chef server).

Here is some basic configuration. Reference: https://github.com/perusio/drupal-with-nginx

# http context
fastcgi_cache_path /dev/shm/microcache levels=1:2 keys_zone=MYAPP:5M max_size=256M inactive=2h;
fastcgi_cache_key "$scheme$request_method$host$request_uri";
add_header X-Cache $upstream_cache_status;
map $http_cookie $cache_uid {
    default nil; # homage to Lisp 🙂
    ~SESS[[:alnum:]]+=(?<session_id>[[:alnum:]]+) $session_id;
}
map $request_method $no_cache {
    default 1;
    GET 0;
}

# server context
# Cache everything by default
set $no_cache 0;
# Don't cache POST requests
if ($request_method = POST) {
    set $no_cache 1;
}
# Don't cache if the URL contains a query string
if ($query_string != "") {
    set $no_cache 1;
}
# Don't cache the following URLs
if ($request_uri ~* "/(administrator/|login.php)") {
    set $no_cache 1;
}
# Don't cache if there is a cookie called PHPSESSID
if ($http_cookie ~* "PHPSESSID") {
    set $no_cache 1;
}

location ~ \.php$ {
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    # NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    fastcgi_pass unix:/var/run/php-fpm-www.sock;
    fastcgi_read_timeout 40;
    fastcgi_cache MYAPP;
    fastcgi_cache_valid 200 301 30s;
    fastcgi_cache_bypass $no_cache;
    fastcgi_no_cache $no_cache;

    # Cache lifetimes for other response codes
    fastcgi_cache_valid 302     1m;
    fastcgi_cache_valid 404     1s;
    fastcgi_cache_min_uses 1;
    fastcgi_cache_use_stale error timeout invalid_header updating http_500;
    fastcgi_ignore_headers Cache-Control Expires;
    fastcgi_pass_header Set-Cookie;
    fastcgi_pass_header Cookie;

    ## Add a cache miss/hit status header.
    add_header X-Micro-Cache $upstream_cache_status;

    ## To avoid any interaction with the cache control headers we expire
    ## everything on this location immediately.
    expires epoch;

    ## Cache locking mechanism for protecting the backend of too many
    ## simultaneous requests.
    fastcgi_cache_lock on;
}

The first few lines tell nginx where to store the cache and how large it may grow. The configuration also tries to handle authenticated users (this needs detailed tuning per site), but in general the lines above will be enough.
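
To reason about which requests end up in the microcache, the bypass rules can be restated as a plain function. The sketch below is an approximation in Python, not nginx itself, and it assumes that the PHPSESSID rule is meant to match any Cookie header that contains that name. The function names are invented for the example.

```python
import re

# Same pattern as the "Don't cache the following URLs" rule (~* is case-insensitive)
UNCACHED_URI = re.compile(r"/(administrator/|login.php)", re.IGNORECASE)


def should_skip_cache(method, request_uri, cookie_header=""):
    """Mirror the $no_cache rules: True means the request bypasses the cache."""
    if method == "POST":
        return True
    _, _, query = request_uri.partition("?")
    if query:
        return True
    if UNCACHED_URI.search(request_uri):
        return True
    if "PHPSESSID" in cookie_header:
        return True
    return False


def cache_key(scheme, method, host, request_uri):
    """Mirror fastcgi_cache_key "$scheme$request_method$host$request_uri"."""
    return f"{scheme}{method}{host}{request_uri}"


if __name__ == "__main__":
    print(should_skip_cache("GET", "/node/1"))
    print(should_skip_cache("GET", "/search?keys=cache"))
    print(cache_key("http", "GET", "example.com", "/node/1"))
```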

Remember to set "cgi.fix_pathinfo = 0;" in php.ini.

PHP Cache

For the code cache, we use APC, and we will later switch to OPcache, since APC is for PHP 5.4 and below and OPcache is for PHP 5.5+.

Here is part of our php-fpm.d/www.conf. The right values really depend on your server resources, your own calculations, and your load test results.

; Using a .sock file is faster than port 9001; it works when your PHP server and nginx are on the same server.
listen = /var/run/php-fpm-www.sock

pm = dynamic

pm.max_children = 50
pm.start_servers = 4
pm.min_spare_servers = 4
pm.max_spare_servers = 10

request_terminate_timeout = 40

In php.ini we have

memory_limit = 256M

The configuration above is based on load test results with APC enabled, running on an EC2 m3.large with 2000 users from BlazeMeter.
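
The "calculation" part is usually a memory budget: divide the RAM you can give to PHP by the average size of one PHP-FPM worker. The sketch below shows the arithmetic. The 1536 MB reserved for the OS and nginx and the 120 MB average worker size are made-up figures for illustration, not measurements from this setup; only the 7.5 GB, the 50 children, and the 256M memory_limit come from the configuration above.

```python
def max_children(total_ram_mb, reserved_mb, avg_process_mb):
    """Estimate pm.max_children from a simple memory budget."""
    return max(1, (total_ram_mb - reserved_mb) // avg_process_mb)


def worst_case_mb(children, memory_limit_mb):
    """Memory used if every worker hit memory_limit at the same time."""
    return children * memory_limit_mb


if __name__ == "__main__":
    total_ram_mb = 7680  # m3.large: 7.5 GB
    # Hypothetical inputs: measure your own average worker size under load
    print(max_children(total_ram_mb, reserved_mb=1536, avg_process_mb=120))
    # 50 workers x 256 MB is more than the machine has, so memory_limit is
    # a per-request ceiling rather than the figure to plan with
    print(worst_case_mb(50, 256))
```

Whatever the estimate says, a load test remains the final check.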

In memory cache

I see that some people now use Redis over memcache. At bootdev, we use memcache as it is fast and simple enough.

Redis gives you more advanced features, such as querying it like a NoSQL store. But we don't need these, since the in-memory cache for Drupal is only used to cache blocks, views, and other things from the cache API. You should install the php-memcache library too.

With the Drupal memcache module installed, add the following to your Drupal settings so that you get a memcache backend.

$conf['cache_backends'][] = 'sites/all/modules/contrib/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
$conf['memcache_key_prefix'] = 'your_prefix';
// Placeholders: use a distinct host:port per entry, because identical
// array keys would overwrite each other.
$conf['memcache_servers'] = array(
  'yourCacheNode1:11211' => 'default',
  'yourCacheNode2:11211' => 'others',
);

$conf['memcache_bins'] = array(
  'cache' => 'default',
  'cache_block' => 'others',
  'cache_content' => 'others',
  'cache_filter' => 'others',
  'cache_form' => 'others',
  'cache_menu' => 'others',
  'cache_page' => 'others',
  'cache_update' => 'others',
  'cache_views' => 'others',
  'cache_views_data' => 'others',
);

You can send different types of cache to different cache nodes. Of course, you need to prepare the cache nodes in AWS first. And that is why bootdev exists: one click for both the server architecture and the related configuration 🙂
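
The two arrays work together: each bin names a cluster, and each server belongs to a cluster. The Python sketch below models that lookup to show where a given bin ends up. It is a simplified model rather than the module's code; the host names are placeholders, and falling back to the 'default' cluster for unlisted bins is my understanding of the module's behaviour, so treat it as an assumption.

```python
def servers_for_bin(bin_name, servers, bins):
    """Return the servers whose cluster matches the cluster of the bin."""
    cluster = bins.get(bin_name, "default")  # assumed fallback for unlisted bins
    return [server for server, name in servers.items() if name == cluster]


# Placeholder nodes, one per cluster
MEMCACHE_SERVERS = {
    "cache-node-1:11211": "default",
    "cache-node-2:11211": "others",
}

MEMCACHE_BINS = {
    "cache": "default",
    "cache_page": "others",
    "cache_views": "others",
}

if __name__ == "__main__":
    for bin_name in ("cache", "cache_page", "cache_bootstrap"):
        print(bin_name, servers_for_bin(bin_name, MEMCACHE_SERVERS, MEMCACHE_BINS))
```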

Now you have a better understanding of caching in bootdev (or Drupal + AWS).

Why BootDev — For website builders’ Backend as a service (wBaaS)

Recently, we launched a project, bootdev, which deploys a configured Drupal site (Drucloud) onto a predefined architecture in the user's own AWS account. We call it website backend as a service (wBaaS). With this approach, we can deliver our hard-earned experience to other business owners and developers without them having to REDO what we did.

What we deliver is just configuration and knowledge. After the deployment, we don't host the site; users host it in their own AWS account. We also provide a technical foundation / playground for people to add features and experience best practices.

Currently available with Drupal 7.

With just one click, you get the following preconfigured:

  • Caching
    • Nginx micro-cache
    • Cache pre-warmer
    • PHP APC
    • Memcache (AWS ElastiCache)
    • MySQL Query Cache
  • Database
    • AWS RDS configuration for Drupal
    • Multi-AZ
    • Secured inside an AWS VPC
  • Web server
    • Nginx Drupal configuration
      • advagg css / js compression support
      • Image style caching
      • Micro cache
      • AWS Cloudfront CDN support (far future expire)
      • Cache control headers
    • PHP-FPM
      • APC
      • Max client configurations under EC2 m3.large
  • SOLR Search
    • SOLR index cron job
    • Drupal SOLR integration
  • DevOps
    • Cloudformation
    • Chef
    • Auto-scaling
    • Git deploy with bitbucket private repo + deploy key
  • Maintenance
    • System maintenance cron job
    • xmlsitemap
    • social network stat
  • Email
    • SPAM email control
    • Mass mail support (AWS SES support)
  • MAP
    • Google Map integration
  • Social
    • Social network meta-tag
  • File handling
    • Drupal S3 integration
    • Push to CDN
    • Separate file handling with other Drupal services
  • CDN
    • Drupal CDN configuration
    • Support CORS
  • Server architecture
    • 2* EC2 m3.large web server
    • 1* EC2 m1.small chef server
    • 1* Multi-AZ m3.large RDS
    • 1* EC2 m3.large SOLR server
    • 2* ElastiCache node m1.medium
    • Cloudfront CDN
    • Amazon S3
    • Auto-scaling with ELB
    • 2* EC2 m1.small GlusterFS
    • Inside VPC

In upcoming posts, I will explain why you need each of these configurations to make your site awesome.