Real-Time UC

A Universal Communications Blog by Office Apps and Services MVP Michael LaMontagne

Azure Hugo

The Tale of Migrating from WordPress to a Static Site in Azure

A little over a month ago this blog went through a forced migration from one web hosting provider to another due to an acquisition. WordPress data migrated successfully and some of the other sites I host came up without issue. https://realtimeuc.com previously had a dedicated IP, which allowed me to force HTTPS and bring my own certificate. During the migration the site was placed under a shared IP that apparently was directing all HTTPS requests to some professional cleaning and restoration company.

After hours with no response to my support ticket, and a few Direct Messages on Twitter asking if I was shifting my career from IT to construction, I figured it was a good time to force DNS back to the old IP, knowing that my blog could go down at any time or, worse, that I could end up with two siloed versions of WordPress. I couldn’t bring myself to disable SSL/TLS, and doing so would have broken most of the links to my content anyway.

I’ve been toying with the idea of moving away from WordPress and going with a Static Site Generator (SSG), which generates static HTML files based on a theme and posts written in Markdown. Don’t get me wrong, WordPress is an awesome Content Management System (CMS) that I’ve been using for about 9 years, but it is very bloated for my needs. A static site, by comparison, offers:

  • Little to no dynamic content, which results in amazing performance.
  • No databases
  • No interpreters
  • No security updates
  • Site portability
  • Host anywhere that can serve up a static HTML page, and for really cheap!
    • Azure Blob Storage
    • GitHub
    • GitLab
    • Amazon S3
    • Countless other options…

I’m not ashamed to admit it, but I’m one of ‘those people’ who sees an interesting article and keeps it in an open browser tab for eternity, usually as a reference for a future blog post, an issue, or a project to tackle if I ever have free time. One such tab was: Building a static website with Jekyll and GitHub Pages. I was about to go down this rabbit hole, but hit pause and figured I should do a bit more research on SSGs: StaticGen | Top Open-Source Static Site Generators.

  1. Jekyll
  2. Hugo

After reading a few pros/cons articles for each, with most of the points summed up in Hugo vs. Jekyll: Comparing the leading static website generator, I landed on Hugo. Hugo just seemed simpler and easier to customize. After reviewing both theme galleries, Hugo also had more themes to my liking.

I was a bit torn between leveraging GitHub and Azure Blob Storage.

You will save yourself countless hours of pain if you spend a bit of time up front to learn:

  1. Grab the latest build for Windows 64-bit from: https://github.com/gohugoio/hugo/releases

  2. Extract the zip file and add the path for ‘hugo.exe’ to the Windows Path environment variable

  3. Create a new base site by running from CMD:

    hugo new site <sitename>
    
  4. Select and download a theme from https://themes.gohugo.io/ and place it into the ‘themes’ folder in the new site hierarchy

  5. Run a local copy of the site by running from CMD:

    hugo server
    
  6. Open a web browser and navigate to http://localhost:1313

  7. Have fun tweaking the theme

  8. Generate the static site by running from CMD:

    hugo
    

This was a fun task. The old server had all access disabled, and the new server would allow FTP access but I couldn’t get into the WordPress admin panel. I attempted a few export plugins without success before finding ExitWP for Hugo. This Python script was created by Arjan Wooning, who describes it in his corresponding blog post: Conversion tools from Wordpress to Hugo.

Steps:

  1. Export WordPress using the default WordPress exporter under Tools/Export in the WordPress admin panel.

  2. Add the following namespace attribute to the rss tag within the XML export:

    xmlns:atom="http://www.w3.org/2005/Atom"
    
  3. Download the ‘ExitWP for Hugo’ project from GitHub.

  4. Move the WordPress XML export file to the ‘wordpress-xml’ folder.

  5. I attempted to run the Python script from a Windows Virtual Machine without success. A colleague of mine had a Linux box kicking around and ran the following:

    sudo apt-get install python-yaml python-bs4 python-html2text
    sudo pip install --upgrade -r pip_requirements.txt
    sudo apt-get install libyaml-dev python-dev build-essential
    ./exitwp.py
    
  6. Next, I zipped the build folder and transferred it to my machine for further tweaking.

  7. I ran the following PowerShell to rename all the .markdown files to .md (the dot is escaped and the pattern anchored so that only the file extension is replaced):

    Get-ChildItem -Filter "*.markdown" -Recurse | Rename-Item -NewName { $_.Name -replace '\.markdown$', '.md' }
    
  8. I spent a few hours going through all of my posts, stripping any remaining HTML tags and tweaking some of the Markdown for lists, URLs and images.

  9. Renamed the ‘_post’ folder to ‘post’ and copied it into the ‘content’ folder in the Hugo site hierarchy.

  10. Legacy media: I cheated a bit here by copying the complete wp-content folder (WordPress media folder) into the ‘static’ folder in the Hugo site hierarchy. This saved me from having to go through all my posts and adjust the reference paths for images.
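
Steps 2 and 7 above are mechanical, so they could also be scripted. The following is a minimal Python sketch, assuming the export's root element starts with `<rss` and does not already declare the Atom namespace. The function names `add_atom_namespace` and `rename_markdown_files` are hypothetical and are not part of ExitWP.

```python
import re
from pathlib import Path

# The namespace attribute from step 2.
ATOM_NS = 'xmlns:atom="http://www.w3.org/2005/Atom"'


def add_atom_namespace(xml_text):
    """Add the Atom namespace to the opening <rss ...> tag if it is missing."""
    if "xmlns:atom=" in xml_text:
        return xml_text  # already declared, leave the export untouched
    # Insert the attribute right after "<rss" in the first opening tag only.
    return re.sub(r"<rss\b", "<rss " + ATOM_NS, xml_text, count=1)


def rename_markdown_files(root):
    """Rename every *.markdown file under root to *.md and return the new paths."""
    renamed = []
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(root).rglob("*.markdown")):
        target = path.with_suffix(".md")  # only the final extension changes
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling `rename_markdown_files("build")` from the folder that contains the ExitWP output would cover the same ground as the PowerShell one-liner; the `build` folder name is taken from step 6.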

  1. I have an existing Azure subscription, so I created a new Resource Group to contain my blog.
  2. Added a Storage Account. I just used Storage V2, Locally-redundant storage (LRS), Standard performance and the Hot access tier.
  3. Created a new container in the Storage Account named ‘Blog’ with anonymous read access.
  4. Added a Content Delivery Network (CDN). I used the Premium Verizon pricing tier and created the CDN Endpoint using Origin type: ‘Storage’ and Origin hostname: <Storage Account>. Microsoft has announced that it will be providing its own CDN, which is currently in preview and doesn’t have a premium option to support custom rules. The Microsoft standard CDN is a fraction of the cost of the Verizon and Akamai SKUs (https://azure.microsoft.com/en-us/blog/announcing-microsoft-s-own-cdn-network/).
    Azure CDN Pricing
  5. Modified the Endpoint and set the Origin path to the folder path within the container that will be the root of the site.
  6. I added a bunch of custom domains under the Endpoint. Because my blog was still live, I leveraged CNAME verify records to authorize the domains.
    CNAME Verify Custom Domains
  7. For each custom domain I turned on HTTPS. After completing the ownership validation email from DigiCert and waiting for the process to complete, I was in business.
    Custom Domains HTTPS
  8. The reason I needed to go with the Premium Verizon CDN option was to access the CDN Rules Engine. This was actually the most painful part of the process… I wasted days troubleshooting, adjusting and tweaking rules. The CDN Manage site says “Approval of new Rules takes up to 4 hours.” and https://docs.microsoft.com/en-us/azure/cdn/cdn-rules-engine says “Rules changes can take up to 90 minutes to propagate through the CDN.” I actually ran into a syntax error that prevented rules from activating for over 24 hours. To top that off, I was battling an issue that was actually caused by stale data in the CDN cache: I was using purge, but it wasn’t actually working, even though it showed as successful. I ended up with the following two rules:
  • Redirect HTTP to HTTPS

    HTTP to HTTPS

    Request Scheme; Http
    Redirect Source: origin-path/(.*)
    Redirect Destination: https://%{host}/$1
    
  • Rewrite URLs to fetch the corresponding HTML files, excluding the EdgeCast user-agent to allow for purging of the CDN

    Rewrite URLs

    Request Header Wildcard; Name: User-Agent, Does Not Match: ECPurge/*, Ignore Case
    Rewrite #1 Source: ((?:[^\?]*/)?)($|\?.*)
    Rewrite #1 Destination: $1index.html$2
    Rewrite #2 Source: ((?:[^\?]*/)?[^\?/.]+)($|\?.*)
    Rewrite #2 Destination: $1/index.html$2
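
To see what the two rewrite rules do to an incoming request, here is a minimal Python sketch that applies the same patterns to a few paths. It makes some assumptions: the source pattern has to match the whole path and query string, the rules are tried in order, and paths are written relative to the site root. The helper name `rewrite` and the example paths are hypothetical, and the CDN's own matching semantics may differ.

```python
import re
from fnmatch import fnmatchcase

# The two rewrite rules from above, with $1/$2 written as \g<1>/\g<2>.
REWRITE_RULES = [
    # Rule 1: a path ending in "/" (optionally followed by a query string).
    (re.compile(r"((?:[^\?]*/)?)($|\?.*)"), r"\g<1>index.html\g<2>"),
    # Rule 2: a path whose last segment has no dot, i.e. no file extension.
    (re.compile(r"((?:[^\?]*/)?[^\?/.]+)($|\?.*)"), r"\g<1>/index.html\g<2>"),
]


def rewrite(path, user_agent=""):
    """Return the path that would be requested from the origin (hypothetical helper)."""
    # Purge requests (User-Agent matching ECPurge/*, ignoring case) are not rewritten.
    if fnmatchcase(user_agent.lower(), "ecpurge/*"):
        return path
    for pattern, template in REWRITE_RULES:
        match = pattern.fullmatch(path)  # assumption: the rule must match the whole URL
        if match:
            return match.expand(template)
    return path  # anything with a file extension is left alone


for example in ["/", "/post/example/", "/post/example", "/post/example?x=1", "/css/site.css"]:
    print(example, "->", rewrite(example))
```

Under these assumptions, directory-style URLs with or without a trailing slash both resolve to the `index.html` file Hugo generates for them, while requests for assets such as stylesheets pass through unchanged.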
    

    Note: There is no support for custom 404 pages, but support for this in Static websites was announced at Microsoft Build 2018 and will be going into preview (~10 minute mark).

The beauty of Markdown is you can simply use any text editor to create your posts. I started out using Notepad++ with the MarkdownViewer++ plugin. I’ve now moved to using Atom Editor (https://atom.io), with the following packages:

This allows me to see syntax highlighting, preview my rendered page and also kick off custom scripts like starting/stopping Hugo.
Atom Editor

To upload only the updated/new content to Azure Storage, I’ve modified Floris van der Ploeg’s ‘Back up files to Azure Blob Storage PowerShell script’ (https://gallery.technet.microsoft.com/scriptcenter/Back-up-files-to-Azure-b9e863d0). Some of the modifications were around default parameters, logging and file paths, but the biggest change was adding the Content Type to the files:

If ($copyblob -eq $true) {
    # Blob doesn't exist, upload the blob with lastwrite metadata
    Write-Log -Value "Copying local file $($file.Name) to blob $blobname in container $Container"

    # Need to set the Content Type, based on the file extension
    $Extn = [IO.Path]::GetExtension($file.FullName)
    $ContentType = ""
    # Types missing altogether? Add them below
    switch ($Extn) {
        ".html" { $ContentType = "text/html" }
        ".htm" { $ContentType = "text/html" }
        ".css" { $ContentType = "text/css" }
        ".txt" { $ContentType = "text/plain" }
        ".xml" { $ContentType = "application/xml" }
        ".json" { $ContentType = "application/json" }
        ".js" { $ContentType = "application/javascript" }
        ".svg" { $ContentType = "image/svg+xml" }
        ".png" { $ContentType = "image/png" }
        ".jpg" { $ContentType = "image/jpeg" }
        ".ico" { $ContentType = "image/x-icon" }
        Default { $ContentType = "" }
    }
    try {
        # -ErrorAction Stop turns a failed upload into a terminating error, so the catch block actually runs
        $output = Set-AzureStorageBlobContent -File $file.FullName -Blob $blobname -Container $Container -Context $context -Properties @{"ContentType" = $ContentType} -Metadata @{"lastwritetime" = $file.LastWriteTimeUtc.Ticks} -Force -ErrorAction Stop
    } catch {
        Write-Log -Value "ERROR: Could not copy file to Azure blob $($blobname): $($_.Exception.Message)" -Color Red
    }
}
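
The extension lookup is the part of this change that is easiest to get subtly wrong, so here is the same mapping as a small Python sketch that can be checked in isolation. It is illustrative only and not part of the upload script; the names `CONTENT_TYPES` and `content_type_for` are hypothetical. Lower-casing the extension mirrors the PowerShell `switch`, which compares strings case-insensitively by default.

```python
from pathlib import PurePath

# Same extension-to-type table as the PowerShell switch above.
CONTENT_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".css": "text/css",
    ".txt": "text/plain",
    ".xml": "application/xml",
    ".json": "application/json",
    ".js": "application/javascript",
    ".svg": "image/svg+xml",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".ico": "image/x-icon",
}


def content_type_for(filename, default=""):
    """Return the content type for a file name, or the default for unknown extensions."""
    extension = PurePath(filename).suffix.lower()
    return CONTENT_TYPES.get(extension, default)


print(content_type_for("index.html"))
print(content_type_for("LOGO.PNG"))
print(repr(content_type_for("archive.zip")))
```

Unknown extensions fall through to an empty string here, matching the `Default` branch of the script; any extension the site uses that is not in the table would need to be added in both places.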

To purge the CDN, I just run the following PowerShell one-liner:

Login-AzureRMAccount; (get-AzureRmCdnProfile).where({$_.name -eq '<CDN Profile Name>'}) | Get-AzureRmCdnEndpoint | Unpublish-AzureRmCdnEndpointContent -PurgeContent "/*"

On my task list is to add the file upload and CDN purge scripts to the Atom Process Palette. This would allow a single window to manage the full lifecycle of my blog.

  1. Pingdom (https://tools.pingdom.com): 33% increase for ‘Faster than’ and almost 1.75 seconds faster load time.
    Pingdom
  2. Google PageSpeed Insights (https://developers.google.com/speed/pagespeed/insights): 40 point increase on Mobile and 22 point increase on Desktop.
    PageSpeed
  3. Bonus: realtimeuc.com has HTTP/2 support out of the box (https://tools.keycdn.com/http2-test).
  4. Final Result:
    New Site Old Site
  5. First 30 days of Azure hosting cost ($0.61 CAD):
    Azure Charges
