keithyau’s Incredible 2015

For the past 30 years, I haven’t minded getting my hands dirty (**applies to my keyboard/software only). I don’t refuse, but I try to avoid touching hardware (wood/electronics) ~

  1. The reason is I had a super dad who did everything for me, from homework to toys to experiments to home furniture …
  2. I think it is already hard enough for one person to do one thing well, so if I’m good at software I should stay focused on it…

Although my world is not small (no comparison here), it looks like I was keeping away from many good things due to 1) and 2).

At the end of Nov 2015, I made my first DIY IKEA standing desk & MacBook dock. A good start at getting my hands raw and training my execution power. Later on, I will build more DIY tools ~

Hey dad, your son has grown up! After making this, I feel really happy. DIY is King!

My crew and I did lots of stuff in 2015 ~ The remarkable ones are below.

  1. Thanks to Byron & Spencer for their trust (if they read this post), we hosted Coconuts media with 10M+ PV traffic, a peak of 20k users, and an SLA of around 99.99% uptime. Before 2015, it was about 1M-2M PV. Also, thanks to the editors team for bringing in the traffic so that I could have the challenge; not everyone gets to meet real traffic.
  2. Kevin Tsang, Jacky Chan and I, Keith Yau, founded a team, BootDev, originally from Mobingi and now focused on AWS/Drupal-related community business, pushing the Drupal and AWS user groups forward in China. This year, we did lots of community work in China & APAC. Thanks to the AWS China team for promoting me as an AWS Community Hero! It is my honor.
  3. Congrats to the former Mobingi for getting into 500 Startups, and thanks to Wayland Zhang for proving the concept works. I will supercharge and make another, better thing.
  4. May 2015, thanks to the Drupal Association for buying me tickets to LA to join DrupalCon. As a Drupal scholarship winner, I see Drupal and the community from a totally different angle.
  5. Thanks to my crew, Jacky & Kevin ~ for traveling with me and enjoying this crazy year! We have been to Bangkok, Singapore, Taipei, Beijing, Shanghai, San Francisco, LA and …. We did lots of presentations and pushed our community business forward!
  6. Also in May 2015, my first time in Canada and my first time seeing wild animals in a national park. Thanks to mum, uncle & sisters for everything.
  7. Also in May 2015, thanks to Spencer & his grandma, my first time in San Francisco, the city of tech and everyone’s dream (mine included)! We visited some companies, got into a local SF home and supermarket, and learned a lot! I will be there! Officially!
  8. We launched a Chinese website about how we do Scaling as a Service. Thanks to Strikingly for their super tool that let us build a website in a few hours!
  9. Jun 2015, Becky’s and my first relocation attempt to Thailand. We visited lots of apartments and tried to rent one ~ It failed, but planning to live outside our country was a really nice experience. P.S. I WILL RETRY!
  10. We organized a Drupal tech meetup in late Nov, supported by China Accelerator, which was really successful.
  11. Drupal 8 was released in Nov 2015, finally.
  12. Oct 2015, thanks Becky Zhang! Brave girl! She presented us on stage in front of hundreds of people and won a prize for BootDev!
  13. Nov 2015, got my first TV + PS4 at my own place, 48 inch 4K! Feels sweet, like home.
  14. Dec 2015, the AWS Shanghai User Group launched! The first user group in China!
  15. Dec 2015, the best workspace ever in my life!

Hopefully in 2016, Coconuts can grow to 100M+ traffic and BootDev can get 500 developers to join both our Drupal and AWS user groups!

Let’s see how far we can go in 2016! And what a remarkable 2015!

Why BootDev — Controlling Cache headers

According to the AWS documentation, when you want to cache objects in the browser, from S3 or from CloudFront, and at the same time support CORS resources like fonts, you can use a parameter called MaxAgeSeconds: http://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html

In all the tests I tried, Chrome doesn’t really respect MaxAgeSeconds; you still need the traditional Cache-Control: max-age=xxx AND Expires: headers. When using AWS CloudFront as your edge cache / CDN, and especially when adding S3 as your origin, you need to take special care of your cache headers.

You can use the API / CLI / UI to change the cache headers in the metadata section of S3.

[Screenshot: S3 object metadata settings]
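To keep Cache-Control and Expires consistent with each other, both values can be derived from one number. Below is a minimal Python sketch, assuming a hypothetical helper named cache_headers (it is not part of any AWS SDK). It only builds the header values that you would then enter as S3 object metadata.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(max_age_seconds, now=None):
    """Return Cache-Control and Expires values that agree with each other.

    `now` is injectable so the result can be checked with a fixed date.
    """
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # usegmt=True renders the HTTP date format that ends in "GMT"
        "Expires": format_datetime(expires_at, usegmt=True),
    }


if __name__ == "__main__":
    # One day of caching, counted from a fixed example date
    fixed = datetime(2015, 7, 10, tzinfo=timezone.utc)
    print(cache_headers(86400, now=fixed))
```

Note that Expires is an absolute date, so it has to be refreshed whenever the object is re-uploaded, while max-age is relative to when the response was fetched.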

And in your bucket’s permissions, set CORS.

[Screenshot: S3 bucket CORS configuration]

Once you succeed in setting those things up, you can use curl -I -H to test your settings. If you use Chrome to test, REMEMBER

  1. DON’T click refresh
  2. DON’T press CMD + R
  3. Click another link on your website to test

Otherwise, you will end up in lots of confusion !

Run the command:

curl -I http://xxxxxx.example.com/fonts/Neutra2Display-Titling.woff -H "Origin: xxxx.example.com"

[Screenshots: curl response headers, first and second request]

The first time, you will see “Miss from cloudfront”. If it is your production site URL, you may ask why, since many people should already be visiting this object. Because the request headers are different from those of a normal browser, CloudFront treats it as a new object. So, no worries.

The second time you curl, you will see “Hit from cloudfront”. So with this setup your resource (a font this time) will be cached on CloudFront for a long time, and once downloaded to the browser, it will be cached locally for as long as Cache-Control: max-age says.

P.S. CloudFront respects Cache-Control, so how long your browser caches = how long your object stays on CloudFront.

With MaxAgeSeconds only, your resource is kept in the browser with a 304.

With the Cache-Control and Expires headers, your resource is served as 200, from cache.

Question: So what does MaxAgeSeconds do here? Is there any special requirement where we would always want 304 rather than 200 from cache? I need someone to answer me as well 🙂
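The difference between the 304 case and the “200, from cache” case can be sketched as a small decision function. This is a simplified model of a browser cache, not Chrome’s actual implementation: the function name and return strings are made up for illustration, and heuristic freshness is ignored.

```python
def browser_decision(age_seconds, max_age_seconds, has_validator):
    """Return what a browser cache typically does with a stored response.

    age_seconds:     how long ago the response was stored
    max_age_seconds: freshness lifetime from Cache-Control/Expires, or None
    has_validator:   True if the response carried ETag or Last-Modified
    """
    if max_age_seconds is not None and age_seconds < max_age_seconds:
        # Still fresh: no request leaves the browser at all
        return "200 (from cache), no request sent"
    if has_validator:
        # Stale or no lifetime: ask the server whether the copy is still good
        return "conditional request, 304 if unchanged"
    return "full request, 200 from network"


if __name__ == "__main__":
    print(browser_decision(10, 3600, True))   # fresh copy
    print(browser_decision(10, None, True))   # no Cache-Control/Expires
```

In this model a 304 still costs a round trip to CloudFront, while “200, from cache” costs none. A manual refresh typically forces revalidation even for fresh entries, which fits the advice above about not clicking refresh while testing.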

Why Bootdev — Dynamic CDN

In the old days, we put images, CSS / JS, woff and any other assets on a CDN, so that clients could download them from somewhere geographically optimised.

Around the end of 2012 to early 2013, a new idea came out: we should CDN everything, so that we can reduce a complex architecture with memcache, a page cache cluster (Varnish) or even microcache etc. Just one cache layer, with everything on the CDN. Like the architecture below.

[Diagram: dynamic CDN architecture]

Your website domain name points directly to the CDN with a CNAME, and the CDN points to your load balancer address or your web server, so it works like a proxy. When you CREATE, the request 100% bypasses the CDN and goes to the web server. When you UPDATE, the CDN updates from the web server and then invalidates itself. When you DELETE, the CDN passes the request to the web server and invalidates its cache. When you READ, it reads from the CDN, not from the web server. So all CRUD actions can be completed.
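The CRUD flow described above can be modelled in a few lines of Python. This is a toy sketch, not how CloudFront is implemented: the class name, the status strings and the dict standing in for the web server are all assumptions for illustration. In practice the invalidation on write is something you have to trigger yourself.

```python
class DynamicCdn:
    """Toy model of a CDN sitting in front of an origin web server."""

    def __init__(self, origin):
        self.origin = origin   # hypothetical origin: dict of path -> body
        self.cache = {}        # edge cache: path -> body

    def handle(self, method, path, body=None):
        if method == "GET":
            # READ: serve from the edge when possible
            if path in self.cache:
                return "HIT", self.cache[path]
            if path not in self.origin:
                return "MISS", None
            self.cache[path] = self.origin[path]
            return "MISS", self.cache[path]

        # CREATE / UPDATE / DELETE always go through to the origin
        if method in ("POST", "PUT"):
            self.origin[path] = body
        elif method == "DELETE":
            self.origin.pop(path, None)
        else:
            raise ValueError(f"unsupported method: {method}")

        # Drop the stale edge copy so the next READ refetches it
        self.cache.pop(path, None)
        return "BYPASS", body


if __name__ == "__main__":
    cdn = DynamicCdn({"/node/1": "v1"})
    print(cdn.handle("GET", "/node/1"))         # first read fills the edge
    print(cdn.handle("GET", "/node/1"))         # second read is served by the edge
    print(cdn.handle("PUT", "/node/1", "v2"))   # write goes to the origin
    print(cdn.handle("GET", "/node/1"))         # edge refetches the new version
```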

You will need your own cache invalidation strategy, like sending invalidation requests to CloudFront or using versioned objects or URLs.
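As one example of the versioned URL approach, a short content hash can be appended to the asset URL so that a changed file gets a new URL and the old cached copy is simply never requested again. This is a minimal sketch; the helper name versioned_url and the v parameter are assumptions, and it only works if the CDN is configured to include the query string in its cache key.

```python
import hashlib


def versioned_url(path, content):
    """Append a short content hash so a changed file gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    # Respect an existing query string if the path already has one
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}v={digest}"


if __name__ == "__main__":
    print(versioned_url("/sites/default/files/site.css", b"body { color: red }"))
```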

Here is a sample conf showing how we let some URLs bypass the CDN and go to the web server, so that Drupal works.

[Screenshots: sample configuration]

With AWS CloudFront, you can forward the Origin header so that you can perform CORS actions. You can also use a similar header-forwarding feature to detect mobile/PC. With such an architecture well set up, theoretically you can have unlimited PV, as your server won’t really be hit. Your only bound will be DB writes, which is not a concern in most cases.

If you don’t want to understand all this, but want to lower your cost and have higher traffic and faster responses, contact BootDev at founders@bootdev.com! We can deploy Dynamic CDN for you in minutes, whether you are using AWS or not. We can point our CloudFront account to your server; it can be Azure, Linode, or any bare metal. It just needs to be Drupal, and you can enjoy the best performance ever.

ref: https://media.amazonwebservices.com/blog/cloudfront_dynamic_web_sites_full_1.jpg

why BootDev — AWS management (EC2 decrease performance by time)

In our experience, if your EC2 instance has had resources left unused for a period of time, let’s say 1-2 months, we always experience a performance drop, even though AWS says this does not happen. For example, server response time degrades from 400ms to 600ms, or average CPU rises from 40% to 70%.

In many cases, users do not consume 100% of their server resources 24 * 7. So my guess, from experience, is that AWS wants to reduce cost and quietly reallocates some resources to other clients. If your server really has low utilization, that is OK. But in our situation, we need fast response times and we reserve the CPU for that, so we feel the performance drop.

How do we deal with that? That is what an auto-scaling group is for. In an auto-scaling group, your servers keep renewing themselves: old servers are turned off and new servers are started, so the EC2 resource allocation stays at its best performance.

This issue is especially obvious in Drupal. A Drupal PHP request involves many hooks / functions / third-party calls etc., so the server CPU requirement is higher than for other PHP web applications. So, renewing servers helps keep performance up.
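The renewal idea boils down to picking the servers that have been running longer than some age limit and replacing them, oldest first. Here is a minimal sketch of just that selection step, assuming a hypothetical mapping of instance names to launch times. It does not call any AWS API, and the 30-day limit is only an example value.

```python
from datetime import datetime, timedelta, timezone


def instances_to_recycle(launch_times, max_age_days, now=None):
    """Return ids of instances older than max_age_days, oldest first.

    launch_times: dict of instance id -> timezone-aware launch datetime
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    old = [(launched, name) for name, launched in launch_times.items()
           if launched < cutoff]
    return [name for launched, name in sorted(old)]


if __name__ == "__main__":
    utc = timezone.utc
    launches = {
        "web-a": datetime(2015, 9, 1, tzinfo=utc),
        "web-b": datetime(2015, 11, 20, tzinfo=utc),
        "web-c": datetime(2015, 10, 1, tzinfo=utc),
    }
    today = datetime(2015, 12, 1, tzinfo=utc)
    print(instances_to_recycle(launches, 30, now=today))
```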

How to deploy an auto-scaling group? Call BootDev 🙂

Hope the above information helps anyone who feels AWS performance sometimes drops over time 🙂

Why bootdev — Nginx conf extended

Recently, one of our Drupal stacks had really high CPU. Its RPM (requests per minute) in New Relic did increase by 30% monthly, but I didn’t think that alone should raise CPU by something like 40%. During my investigation, I found:

  1. Some robot scans can somehow eat your CPU. For example, many robots look for wp-login because they guess you are a WordPress site, and more.
  2. Your CDN may pull your data from a wrong URL and leave an ERROR in your php-fpm log.
  3. Robots from the same IP address(es) keep scanning your site.

Here we added some conf which gave about a 6-7% CPU decrease for a 300-400 RPM Drupal site.

# Set a limit connection zone (http context)
limit_conn_zone $binary_remote_addr zone=gulag:5m;

# Check robot (http context): $is_bot stays empty for normal visitors
map $http_user_agent $is_bot {
  default  '';
  ~*crawl|goog|yahoo|yandex|spider|bot|tracker|click|parser is_bot;
}

###
### Send all known bots to $args free URLs.
###
location @nobots {
  if ($is_bot) {
    rewrite ^ $scheme://$host$uri? permanent;
  }
  rewrite ^/(.*)$  /index.php?q=$1 last;
}

# In the CDN and Advagg locations of the Drupal nginx conf,
# add the line below so that each IP is allowed only 32 connections
limit_conn gulag 32;
# Route requests through the robot check (put it in any location you like)
try_files  $uri @nobots;

# Block WordPress scan attacks
location ~ ^/(wp-admin|wp-login\.php) {
  deny all;
}

The WordPress scan block can cover anything strange that you find in php-fpm/error.log, just by changing the path pattern.
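Before changing the patterns on a live server, it can help to try them against sample user agents and paths. The sketch below copies the two regular expressions from the conf above into Python. It assumes Python’s re module behaves like nginx’s PCRE for patterns this simple, and the function names are made up for illustration.

```python
import re

# Same alternation as the nginx map; "~*" in nginx means case-insensitive
BOT_PATTERN = re.compile(
    r"crawl|goog|yahoo|yandex|spider|bot|tracker|click|parser", re.IGNORECASE
)
# Same pattern as the WordPress scan location; "~" is case-sensitive
BLOCKED_PATH = re.compile(r"^/(wp-admin|wp-login\.php)")


def is_bot(user_agent):
    """True if the user agent would set $is_bot in the map above."""
    return BOT_PATTERN.search(user_agent) is not None


def is_blocked_path(path):
    """True if the path would hit the 'deny all' location above."""
    return BLOCKED_PATH.search(path) is not None


if __name__ == "__main__":
    print(is_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(is_blocked_path("/wp-login.php"))
    print(is_blocked_path("/user/login"))
```

The user agent pattern is a plain substring match, so it is worth checking that none of the words appear in the user agents of your real visitors.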

I have a load balancer set up with 2 web servers, one with and one without the config above. The difference on an Amazon m3.large is about 6% CPU for a 300-400 RPM website, with both servers handling the same RPM.

Of course, the above settings have been added to the BootDev Nginx conf.

Enjoy 🙂

Why BootDev — Nginx Config

We have spent a lot of effort on Nginx configuration. Unlike Apache + mod_php5, Nginx + php-fpm needs much more detailed configuration, and the Nginx config is Drupal-module dependent: some Drupal modules, like the CDN module and the Advagg module, require support from the Nginx configuration.

There is a GitHub project about Drupal + Nginx, but it is a lot to take in, and you will need to pick out the parts necessary for your project.

Here I share the main Nginx configuration.

server {
  server_name *.compute.amazonaws.com;
  root   /opt/source/app;
  access_log  /var/log/nginx/access.log;
  error_log  /var/log/nginx/error.log;

  #include /etc/nginx/apps/drupal/drupal.conf;
  #Cache everything by default
  set $no_cache 0;
  #Don't cache POST requests
  if ($request_method = POST)
  {
    set $no_cache 1;
  }

  #Don't cache if the URL contains a query string
  if ($query_string != "")
  {    
    set $no_cache 1;
  }

  #Don't cache the following URLs
  if ($request_uri ~* "/(administrator/|login.php)")
  {
    set $no_cache 1;
  }

  #Don't cache if there is a cookie called PHPSESSID
  if ($http_cookie ~* "PHPSESSID")
  {
    set $no_cache 1;
  }

  # Enable compression, this will help if you have for instance advagg module
  # by serving Gzip versions of the files.
  gzip_static on;

  location = /favicon.ico {
    log_not_found off;
    access_log off;
  }

  location = /robots.txt {
    allow all;
    log_not_found off;
    access_log off;
  }

  # This matters if you use drush
  location = /backup {
    deny all;
  }

  # Very rarely should these ever be accessed outside of your lan
  location ~* \.(txt|log)$ {
    deny all;
  }

  location ~ \..*/.*\.php$ {
    return 403;
  }

  location / {
    # This is cool because no php is touched for static content
    try_files $uri @rewrite;
  }

  location @rewrite {
    # Some modules enforce no slash (/) at the end of the URL
    # Else this rewrite block wouldn't be needed (GlobalRedirect)
    rewrite ^/(.*)$ /index.php?q=$1;
  }

  location ~ \.php$ {
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    #NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    fastcgi_pass unix:/var/run/php-fpm-www.sock;
    fastcgi_read_timeout 40;
    fastcgi_cache MYAPP;
    fastcgi_cache_valid 200 301 30s;
    fastcgi_cache_bypass $no_cache;
    fastcgi_no_cache $no_cache;

    # Set cache key to include identifying components
    fastcgi_cache_valid 302     1m;
    fastcgi_cache_valid 404     1s;
    fastcgi_cache_min_uses 1;
    fastcgi_cache_use_stale error timeout invalid_header updating http_500;
    fastcgi_ignore_headers Cache-Control Expires;
    fastcgi_pass_header Set-Cookie;
    fastcgi_pass_header Cookie;

    ## Add a cache miss/hit status header.
    add_header X-Micro-Cache $upstream_cache_status;

    ## To avoid any interaction with the cache control headers we expire
    ## everything on this location immediately.
    expires epoch;

    ## Cache locking mechanism for protecting the backend of too many
    ## simultaneous requests.
    fastcgi_cache_lock on;
  }

  # Catch image styles for D7.
  location ~ ^/sites/.*/files/ {
    try_files $uri @rewrite;
  }

  # Catch image styles for AmazonS3 D7.
  location ~ ^/system/files/styles/ {
    try_files $uri @rewrite;
  }

  location ~* \.(js|css|png|jpg|jpeg|gif|ico|svg)$ {
    expires max;
    log_not_found off;
  }

  location ~* \.(eot|ttf|woff|svg) {
    add_header Access-Control-Allow-Origin *;
    try_files $uri @rewrite;
  }

  ##   
  # Advanced Aggregation module CSS
  ##   
  # http://drupal.org/project/advagg.
  ##    
  location ^~ /sites/default/files/advagg_css/ {
    expires max;
    add_header ETag '';
    add_header Last-Modified 'Wed, 20 Jan 1988 04:20:42 GMT';
    add_header Accept-Ranges '';
    add_header Access-Control-Allow-Origin *;
    
    location ~* /sites/default/files/advagg_css/css__[[:alnum:]-_]+\.css$ {
      access_log off;
      try_files $uri @rewrite;
    }
  }

  ###
  ### CDN Far Future expiration support.
  ###
  location ^~ /cdn/farfuture/ {
    tcp_nodelay   off;
    access_log    off;
    log_not_found off;
    etag          off;
    gzip_http_version 1.0;
    if_modified_since exact;
    location ~* ^/cdn/farfuture/.+\.(?:css|js|jpe?g|gif|png|ico|bmp|svg|swf|pdf|docx?|xlsx?|pptx?|tiff?|txt|rtf|class|otf|ttf|woff|eot|less)$ {
      expires max;
      add_header X-Header "CDN Far Future Generator 1.0";
      add_header Cache-Control "no-transform, public";
      add_header Last-Modified "Wed, 20 Jan 1988 04:20:42 GMT";
      rewrite ^/cdn/farfuture/[^/]+/[^/]+/(.+)$ /$1 break;
      try_files $uri @nobots;
    }
    location ~* ^/cdn/farfuture/ {
      expires epoch;
      add_header X-Header "CDN Far Future Generator 1.1";
      add_header Cache-Control "private, must-revalidate, proxy-revalidate";
      rewrite ^/cdn/farfuture/[^/]+/[^/]+/(.+)$ /$1 break;
      try_files $uri @nobots;
    }
    try_files $uri @nobots;
  }

}

The idea of this config file is to support the CDN far future, CDN, Advagg, Drupal image styles, AmazonS3 and microcache features. You need to catch different URL patterns for different purposes.

For microcache, we put the cache into memory and expire it every 30s, so that in each 30s window only the first visitor hitting a page has it generated by PHP; the 2nd to Nth users hit the microcache. With this approach, we can support a high-traffic website and at the same time avoid handling the cache invalidation problem.
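The 30s behaviour can be illustrated with a tiny in-memory cache. This is only a sketch of the idea, not what nginx does internally: the class name is made up, the clock is injectable so the example can be checked without waiting, and an expired entry is reported as a plain MISS here.

```python
import time


class MicroCache:
    """Keep each rendered page for a short, fixed time."""

    def __init__(self, ttl_seconds=30, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}   # cache key -> (stored_at, body)

    def get(self, key, render):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return "HIT", entry[1]      # served without touching PHP
        body = render()                 # only this visitor pays for PHP
        self.entries[key] = (now, body)
        return "MISS", body


if __name__ == "__main__":
    fake_now = [0.0]
    cache = MicroCache(30, clock=lambda: fake_now[0])
    page = lambda: "<html>front page</html>"
    print(cache.get("GET/", page))   # first visitor renders the page
    fake_now[0] = 10.0
    print(cache.get("GET/", page))   # later visitors within the window reuse it
    fake_now[0] = 31.0
    print(cache.get("GET/", page))   # after the window the page is rendered again
```

Nothing ever has to be invalidated by hand: a stale page lives for at most the TTL.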

Here I also share the cache config, in which we put the Nginx cache into memory, which gives better performance.

fastcgi_cache_path /dev/shm/microcache levels=1:2 keys_zone=MYAPP:5M max_size=256M inactive=2h;
fastcgi_cache_key "$scheme$request_method$host$request_uri";
add_header X-Cache $upstream_cache_status;
map $http_cookie $cache_uid {
  default nil; # hommage to Lisp :)
  ~SESS[[:alnum:]]+=(?<session_id>[[:alnum:]]+) $session_id;
}
map $request_method $no_cache {
  default 1;
  HEAD 0;
  GET 0;
}

You can read the comments inside the config file for a more detailed explanation.
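When debugging, it is sometimes handy to know which file under /dev/shm/microcache holds a given page. As documented for nginx, the file name is the MD5 of the cache key, and with levels=1:2 the directories are taken from the end of that hash. The sketch below rebuilds the path for the fastcgi_cache_key shown above; the function name is an assumption, and it does not read or validate any real cache file.

```python
import hashlib


def cache_file_path(scheme, method, host, request_uri,
                    root="/dev/shm/microcache"):
    """Rebuild the on-disk path nginx uses for a cached response.

    Mirrors fastcgi_cache_key "$scheme$request_method$host$request_uri"
    and fastcgi_cache_path ... levels=1:2.
    """
    key = f"{scheme}{method}{host}{request_uri}"
    digest = hashlib.md5(key.encode()).hexdigest()
    level1 = digest[-1]      # last character of the hash
    level2 = digest[-3:-1]   # the two characters before it
    return f"{root}/{level1}/{level2}/{digest}"


if __name__ == "__main__":
    print(cache_file_path("http", "GET", "example.com", "/node/1"))
```

Because the scheme and method are part of the key, the http and https versions of the same page are cached separately.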

This config needs to work together with a PHP-FPM config so that memory is optimized. Then you can estimate how many requests per second your server can serve. I will talk about that next time.

Why bootdev — Caching (2)

Last time we talked about the 3 levels of cache in Drupal + AWS.

This time we go into more detail on APC (Alternative PHP Cache). There are several kinds of code-level cache (opcode cache) for PHP. The most common are OPcache (default since PHP 5.5), XCache and APC. Most Drupal sites use APC.

There is also a Drupal APC module, but in most cases it does not need to be installed, unless you want to put some Drupal cache tables into APC. For BootDev, we don’t do this. Cache tables should live in a standalone cache instead of on the web server, so that when you scale out, the cached data can be shared across servers and your system is ready for horizontal scaling.

You can simply install APC with pecl, the package manager for PHP extensions.

pecl install apc

Here I share the APC configuration for an EC2 m3.large machine, which has 7.5GB of memory.

; configuration for php apc module
extension=apc.so
apc.enabled=1
apc.shm_segments=1
apc.shm_size="256M"
;Relative to the number of cached files (you may need to watch your stats for a day or two to find out a good number)
apc.num_files_hint=7000

;Relative to the size of site
apc.user_entries_hint=4096

;The number of seconds a cache entry is allowed to idle in a slot before APC dumps the cache
apc.ttl=7200
apc.user_ttl=7200
apc.gc_ttl=3600

;Setting this to 0 will give you the best performance, as APC will
;not have to check the IO for changes. However, you must clear 
;the APC cache to recompile already cached files. If you are still
;developing, updating your site daily in WP-ADMIN, and running W3TC
;set this to 1
apc.stat=0

;This MUST be 0, WP can have errors otherwise! Drupal can set to 1
apc.include_once_override=1

;Only set to 1 while debugging
apc.enable_cli=1

;Allow 2 seconds after a file is created before it is cached to prevent users from seeing half-written/weird pages
apc.file_update_protection=2

;Leave at 2M or lower. Site doesn't have any file sizes close to 2M
apc.max_file_size=2M

apc.cache_by_default=1
apc.use_request_time=1
apc.slam_defense=0
apc.mmap_file_mask=/tmp/php_apc.XXXXXX
apc.stat_ctime=0
apc.canonicalize=1
apc.write_lock=1
apc.report_autofilter=0
apc.rfc1867=0
apc.rfc1867_prefix =upload_
apc.rfc1867_name=APC_UPLOAD_PROGRESS
apc.rfc1867_freq=0
apc.rfc1867_ttl=3600
apc.lazy_classes=0
apc.lazy_functions=0

You can find apc.php somewhere under /usr/share/php-apc, depending on which server you are using.

Copy this file to your web root and access it; you can see the performance after tuning.

[Screenshot: apc.php statistics page]