WordPress bulk image optimization and offload to S3 CDN


Current situation and goal

The current situation is unfavorable: all images are stored on and served from the webserver. This puts a lot of extra load and bandwidth usage on the webserver, which we want to avoid.

Also, making snapshots or backups of a server holding 10,000+ images (actually 60,000+ once all media sizes are counted) is inconvenient, to say the least.

  • Optimize image load times
  • Reduce webserver load
  • Reduce used bandwidth
  • Reduce webserver disk space

How to implement this solution

Using plugins

We have tried most, if not all, of the available plugins to achieve this goal. The issues with most of them are vendor lock-in, pricing, problems with existing media, and so on.

Initially we did not want to create our own implementation, so we went ahead with the Spaces Sync plugin, which does one thing well: it hooks into the add_attachment and delete_attachment hooks to copy media to, or remove it from, S3-compatible storage.

However, this plugin runs during uploads, which slowed down uploading and managing media on the site. Very unfavorable.

CDN compatibility with other plugins

First pitfall.

Not all media plugins work well with offsite images. Think of regeneration plugins, crop plugins, or any plugin that expects the file to be stored locally.

For example, the Fly Dynamic Image Resizer plugin is a great tool to cut down on generated image sizes, especially when dealing with huge banner images. It creates the required image size on the fly, only when it is actually requested. Call me pedantic, but I refuse to add an additional image size for every media attachment when that size is rarely used.

Anyway, when an image is stored offsite, the image resizer is unable to create the required image from the source. The first option would be to ask the plugin author to fetch the file URL instead of the file path. Alternatively, we can hook into the get_attached_file filter and return our full CDN URL.

add_filter( 'get_attached_file', 'my_get_attached_file', 1, 2 );
function my_get_attached_file( $file, $attachment_id )
{
    if ( ! file_exists( $file ) ) # we will improve this later in this blog post :)
    {
        //map the local uploads directory to the CDN url instead of hard-coding the server path
        $uploads = wp_upload_dir();
        return str_replace( $uploads['basedir'], 'https://cdn.url.com', $file );
    }
    return $file;
}

We use file_exists to check if the file should be loaded from CDN. Simple and robust.

Blocking or Non-blocking

Second pitfall.

Some plugins block while uploading or removing media. Under normal conditions this should not be an issue, but what if the CDN responds slowly? Everything would be blocked until the full job has completed.

For example, we use a Lightroom export plugin which adds media to and removes media from WordPress. We found that Spaces Sync blocks while syncing to S3 Spaces, which leads to huge delays when adding or removing attachments through Lightroom.

This means the CDN synchronization should run either as (scheduled) mass uploads or as non-blocking background tasks. At this point, after testing many plugins, we had had enough. Let's just create our own implementation.

Implementing the offload to CDN

Mass upload server side solution

First we need to know how to interact with the CDN. After some research we found s3cmd. There are also alternative tools, s4cmd and s5cmd, which are able to move significantly more data in a shorter time frame.

First we created a bash script that mass-processes all existing local files. For the full script, see the GitHub repository below; it mass-optimizes, syncs and purges local files.

#full sync script
https://github.com/opicron/cli-spaces-sync/blob/master/spaceput.sh

I won't go into detail about this script, but we used it as a server-only solution while working on the refined implementation. See it as 1) an initialization script and 2) a brute-force way to keep the offsite images in sync. The script runs the following jobs:

  • Optimize all local images and save paths of optimized images in purgelist
  • Copy purgelist images to CDN (halt on fail, to avoid removing local images on errors in last step)
  • Purge cache from CDN in batches based on purgelist
  • Remove local images

You can add the script to a daily cron job, or call it from the add_attachment WordPress hook to sync new files. Either way, all current and future images will be accessible through the CDN.
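To make the "halt on fail" ordering of the copy, purge and remove jobs concrete, here is a minimal bash sketch. It is not the spaceput.sh script from the repository: upload_one and purge_cache are hypothetical stand-ins, BUCKET is a placeholder, and paths are assumed to be relative to the uploads directory so they double as bucket keys.

```shell
#!/usr/bin/env bash
# Minimal sketch of the "halt on fail" job order; NOT the spaceput.sh script.
# upload_one and purge_cache are hypothetical stand-ins, BUCKET is a placeholder.

# copy one uploads-relative path to the bucket under the same key
upload_one() { s3cmd put "$1" "s3://BUCKET/$1" --acl-public; }

# placeholder for the batched CDN cache purge
purge_cache() { :; }

# usage: offload_files FILE... (run from the uploads directory)
offload_files() {
  local f
  # 1) copy every file first and stop at the first failed upload
  for f in "$@"; do
    if ! upload_one "$f"; then
      echo "upload failed for $f, keeping all local files" >&2
      return 1
    fi
  done
  # 2) purge the CDN cache for the uploaded paths
  purge_cache "$@" || return 1
  # 3) remove local copies only after every upload succeeded
  rm -f -- "$@"
}
```

For example, `cd wp-content/uploads && offload_files 2021/05/photo.jpg 2021/05/photo-300x200.jpg`. Checking every upload before the first `rm` means a half-finished sync leaves all local originals in place.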

Optimizing images

The common tools to optimize media are jpegoptim (for JPEG) and optipng (for PNG). The commands to optimize and strip unused data are:

optipng -preserve -strip all "image"
jpegoptim -s -p --all-progressive "image"
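A sketch of running those two commands over the whole uploads tree while collecting the purge list, assuming optipng and jpegoptim are installed. The function name optimize_tree and the one-path-per-line list format are illustrative choices, not taken from the repository script; `-quiet` and `-q` only silence the tools' normal output.

```shell
#!/usr/bin/env bash
# Sketch: optimize every JPEG/PNG below a directory and record the
# uploads-relative paths in a purge list file (one path per line).
# optimize_tree is a hypothetical helper; requires bash 4+ for ${var,,}.

# usage: optimize_tree UPLOADS_DIR PURGELIST_FILE
optimize_tree() {
  local dir="${1%/}" list="$2" img
  : > "$list"                                  # start with an empty purge list
  while IFS= read -r -d '' img; do
    case "${img,,}" in                         # lower-cased copy for the extension test
      *.png)        optipng -quiet -preserve -strip all "$img" ;;
      *.jpg|*.jpeg) jpegoptim -q -s -p --all-progressive "$img" ;;
      *)            continue ;;
    esac || { echo "optimizing $img failed" >&2; continue; }
    # only successfully optimized files end up in the purge list
    printf '%s\n' "${img#"$dir"/}" >> "$list"
  done < <(find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) -print0)
}
```

Usage: `optimize_tree /path/to/wp-content/uploads /tmp/purgelist.txt`. Files the optimizer fails on are reported and left out of the list, so later steps do not touch them.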

Send media to S3 CDN

Note: s3cmd needs a configuration file (by default ~/.s3cfg) in which the following fields are defined:

website_endpoint
secret_key
host_base
host_bucket
access_key

The command to send a file to the CDN with s3cmd is as follows. Don't forget the --acl-public flag, which makes the image publicly readable instead of storing it as private:

s3cmd put "file" s3://BUCKET/"file" --acl-public
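Feeding the purgelist from the job list above to s3cmd could look like the sketch below. It assumes the list holds one uploads-relative path per line and that the same relative path is used as the object key, which is what lets the uploads-to-CDN URL rewrite work later on. put_from_list and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: upload every path in a purge list, keeping the uploads-relative
# path as the object key. put_from_list is a hypothetical helper.

# usage: put_from_list UPLOADS_DIR LIST_FILE BUCKET
put_from_list() {
  local dir="${1%/}" list="$2" bucket="$3" rel
  while IFS= read -r rel; do
    [ -n "$rel" ] || continue                  # skip empty lines
    # </dev/null keeps s3cmd from ever reading the list on stdin;
    # stop at the first failed upload so nothing gets purged by mistake
    s3cmd put "$dir/$rel" "s3://$bucket/$rel" --acl-public < /dev/null || return 1
  done < "$list"
}
```

Usage: `put_from_list /path/to/wp-content/uploads /tmp/purgelist.txt BUCKET`.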

Purge cache from DigitalOcean CDN

In the GitHub script you will find that the images are not only copied to the CDN but also have their CDN cache purged. Without this, changes such as new crops or adjustments will not show up until the cached copy expires.

The CDN API takes a list of files per purge request, and large lists can time out, so we split the purge list into batches as shown below.

#group per chunk to avoid timeout on api
g=20
for ((i=0; i < ${#purge[@]}; i+=g))
do
  part=( "${purge[@]:i:g}" )
  #json purgelist, e.g. {"files":["2021/05/a.jpg","2021/05/b.jpg"]}
  purgelist=$(printf '%s\n' "${part[@]}" | jq -R . | jq -c -s '{"files":.}')
  echo "$purgelist" | jq .

  #purge cdn api (prints the HTTP status code)
  if [ "${DRYRUN}" -eq 0 ]; then
    curl -X DELETE -H "Authorization: Bearer <<YOUR_DO_TOKEN_HERE>>" \
      -H "Content-Type: application/json" \
      "https://api.digitalocean.com/v2/cdn/endpoints/<<END_POINT_ID_HERE>>/cache" \
      -s -o /dev/null -w "%{http_code}" \
      -d "$purgelist"
  fi
done
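The same batching can be wrapped in functions that check the API's answer instead of only printing the status code. This is a sketch under assumptions: DO_TOKEN and ENDPOINT_ID are placeholder variables, jq and curl are installed, and a successful purge is taken to return HTTP 204 (No Content), which is our reading of DigitalOcean's API documentation for this endpoint.

```shell
#!/usr/bin/env bash
# Sketch: purge the CDN cache in batches and fail on any non-204 answer.
# DO_TOKEN and ENDPOINT_ID are placeholders; 204 on success is an assumption.
BATCH_SIZE="${BATCH_SIZE:-20}"

# print the JSON body for one batch, e.g. {"files":["a.jpg","b.jpg"]}
purge_body() {
  printf '%s\n' "$@" | jq -R . | jq -c -s '{files: .}'
}

# send one purge request; return non-zero unless the API answers 204
purge_request() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
    -H "Authorization: Bearer ${DO_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$1" \
    "https://api.digitalocean.com/v2/cdn/endpoints/${ENDPOINT_ID}/cache")
  [ "$code" = "204" ] || { echo "purge failed with HTTP $code" >&2; return 1; }
}

# purge all paths given as arguments, BATCH_SIZE at a time
purge_all() {
  local files=("$@") i
  for ((i = 0; i < ${#files[@]}; i += BATCH_SIZE)); do
    purge_request "$(purge_body "${files[@]:i:BATCH_SIZE}")" || return 1
  done
}
```

Calling `purge_all "${purge[@]}" || exit 1` makes a failed purge stop the script before the local files are removed.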

Serving the correct image in WordPress

Now that the media is on the CDN, we need a function that checks whether an image is on the CDN.

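function is_cdn_image( $post_id )
{
    //skip if attachment is not an image
    if ( ! wp_attachment_is_image( $post_id ) )
        return false;

    //local path, unfiltered so our get_attached_file CDN filter is not applied
    $file = get_attached_file( $post_id, true );

    //on CDN if the local file is gone (or the meta does not exist)
    return ! $file || ! file_exists( $file );
}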

Then we add some filters to make sure WordPress uses the CDN URLs when images are displayed or requested. We found the following three filters are required for good compatibility: wp_get_attachment_url, wp_get_attachment_image_src and wp_calculate_image_srcset.

//helper function: rewrite an uploads URL to the CDN URL
function to_cdn( $url ) {

    $needle = trailingslashit( 'wp-content/uploads/' );
    $pos = strpos( $url, $needle );

    //skip if needle is not found
    if ( $pos === false )
        return $url;

    $filepath = substr( $url, $pos + strlen( $needle ) );
    $url = trailingslashit( 'https://bucket.cdn.digitaloceanspaces.com/' ) . $filepath;

    return $url;
}
add_filter( 'wp_get_attachment_url', 'clrs_get_attachment_url', 999, 2 );
function clrs_get_attachment_url( $url, $post_id ) {

    if ( is_cdn_image( $post_id ) )
    {
        $url = to_cdn( $url );
    }
    return $url;
}

add_filter( 'wp_get_attachment_image_src', 'test_get_attachment_image_src', 10, 4 );
function test_get_attachment_image_src( $image, $attachment_id, $size, $icon ) {

    if ( ! is_array( $image ) )
        return $image;

    if ( is_cdn_image( $attachment_id ) )
    {
        //only swap the URL; keep width, height and is_intermediate as WordPress set them
        $image[0] = to_cdn( $image[0] ); // To CDN
    }

    return $image;
}

add_filter( 'wp_calculate_image_srcset', 'test_calculate_image_srcset', 10, 5 );
function test_calculate_image_srcset( $sources, $size_array, $image_src, $image_meta, $attachment_id ) {

    if ( is_array( $sources ) && is_cdn_image( $attachment_id ) )
    {
        //keep the width keys and descriptors, only rewrite each URL
        foreach ( $sources as $width => $source ) {
            $sources[ $width ]['url'] = to_cdn( $source['url'] ); // To CDN
        }
    }

    return $sources;
}
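For spot-checking from the shell, for example before removing local files, the rewrite that to_cdn() performs can be mirrored in bash and combined with a HEAD request. CDN_BASE is a placeholder for your own Spaces CDN endpoint, and to_cdn_url and cdn_has are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: shell mirror of to_cdn() plus a reachability check.
# CDN_BASE is a placeholder for your own Spaces CDN endpoint.
CDN_BASE="${CDN_BASE:-https://bucket.cdn.digitaloceanspaces.com}"

# print the CDN URL for a URL containing wp-content/uploads/ (first match,
# like strpos in to_cdn); other URLs are printed unchanged
to_cdn_url() {
  local url="$1" needle="wp-content/uploads/"
  case "$url" in
    *"$needle"*) printf '%s/%s\n' "${CDN_BASE%/}" "${url#*"$needle"}" ;;
    *)           printf '%s\n' "$url" ;;
  esac
}

# succeed only if the CDN answers 200 to a HEAD request for the URL
cdn_has() {
  [ "$(curl -s -o /dev/null -I -w '%{http_code}' "$1")" = "200" ]
}
```

Usage: `cdn_has "$(to_cdn_url "https://example.com/wp-content/uploads/2021/05/photo.jpg")" && echo "served from CDN"`.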

Removing images from CDN

When using the above script, keep in mind that images removed from the WordPress media library are not removed from the offsite storage. The second implementation uses the delete_attachment hook to remove the offsite images as well.

add_action('delete_attachment','schedule_delete_attachment',999 ,1);
function schedule_delete_attachment( $post_id )
{
  //remove image and all media sizes from CDN
}
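As a stopgap for the empty hook body above, a shell sketch can remove an attachment and its generated sizes from the bucket. It assumes WordPress's default size naming (name-WIDTHxHEIGHT.ext), object keys without spaces, and a configured s3cmd; size_variants and delete_from_cdn are hypothetical helpers. Other derived files, such as the -scaled copies WordPress makes of very large uploads, are not matched by this pattern.

```shell
#!/usr/bin/env bash
# Sketch: delete an attachment and its WIDTHxHEIGHT sizes from the bucket.
# Assumes default WordPress size naming and keys without spaces.

# read `s3cmd ls` lines on stdin, print the keys that belong to KEY
size_variants() {
  local key="$1" dir base ext
  dir="${key%/*}"; base="${key##*/}"; ext="${base##*.}"; base="${base%.*}"
  awk -v stem="$dir/$base" -v ext="$ext" '
    {
      uri = $NF
      sub("^s3://[^/]+/", "", uri)            # strip the s3://bucket/ prefix
      if (uri == stem "." ext) { print uri; next }
      if (index(uri, stem "-") == 1) {
        rest = substr(uri, length(stem "-") + 1)
        if (rest ~ ("^[0-9]+x[0-9]+[.]" ext "$")) print uri
      }
    }'
}

# usage: delete_from_cdn BUCKET KEY   e.g. delete_from_cdn BUCKET 2021/05/photo.jpg
delete_from_cdn() {
  local bucket="$1" key="$2" k
  s3cmd ls "s3://$bucket/${key%/*}/" | size_variants "$key" |
    while IFS= read -r k; do
      s3cmd del "s3://$bucket/$k" < /dev/null
    done
}
```

Remember to purge the deleted paths from the CDN cache as well, otherwise cached copies stay reachable until they expire.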

Back up CDN

The reason we wanted the images off the webserver was to make snapshots quicker. But now the images are not backed up anywhere. To create a backup and keep the destination in sync with the origin, we can use rclone. Of course, this requires a destination bucket.

As with s3cmd, rclone needs both the origin and the destination to be configured, in the ~/.config/rclone/rclone.conf file.

[spacesorigin]
  type=s3
  env_auth=false
  access_key_id=
  secret_access_key=
  endpoint=
  acl=private
[spacesdest]
  ..

Once the configuration is done, we run the following command:

rclone sync spacesorigin:bucket spacesdest:bucket --progress ...

It runs pretty slowly, so let's increase the speed of the sync:

--max-backlog= --transfers=100 --checkers=200 ..
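Wrapped in a small function with the transfer settings above, the sync can first be run as a dry run. This is a sketch: spacesorigin:bucket and spacesdest:bucket are the placeholders used in this post, and backup_cdn is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: back up the CDN bucket using the remotes from rclone.conf.
# "bucket" is a placeholder for the real bucket names.

# usage: backup_cdn [extra rclone flags, e.g. --dry-run]
backup_cdn() {
  # rclone sync makes the destination identical to the source, so files
  # deleted from the origin are also deleted from the backup
  rclone sync spacesorigin:bucket spacesdest:bucket \
    --transfers=100 --checkers=200 --progress "$@"
}
```

Run `backup_cdn --dry-run` first, then `backup_cdn`. Because sync mirrors deletions, a dry run is a cheap way to catch a misconfigured origin before it empties the backup bucket.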

Conclusion

Mass upload only

This mass offload to CDN solution works fine and is very robust. We also tried implementing a fine-grained solution that handles individual edits and uploads, but that proved to be finicky. In the end we run the mass offload on the server daily, and it does what it needs to do.
