It's not every day you need to read your server's access logs, and when you do, it's not because you want to. Maybe your site is running slow, or your form submissions and node saves are starting to time out. That's probably a good time to take a look at your logs. Generally the cause is some kind of SEO or search engine crawler, but it could also be a rogue bot from China or even a real hacker trying to brute force their way into your site.
You don't have time to comb through thousands of lines of access logs, and that's only for the last couple of hours! Lucky for us, the awesome people of the internet have made some really great tools to help us analyze and decode these log files. In this post, I'm going to walk you through a couple of scripts and tools you can use to make pretty reports and easily find what ails you.
I'm not going to go into much detail on Homebrew, but if you're developing on a Mac and haven't heard of it, I highly recommend you install it. As the self-described "missing package manager for macOS", it's a super easy way to install the system packages and libraries you often need. From wget to the next tool in our list, this will definitely change your workflow. Download it now by visiting http://brew.sh.
Terminus by Pantheon
We're no strangers to our friends over at Pantheon, a leading Drupal hosting company, so much so that we now host our own website on their platform instead of the old independent Rackspace server we had to keep up with. In addition to their intuitive user interface and top-notch support, they've also created some useful platform tools like Quicksilver and Terminus, which is what we'll be using today. Terminus is their command-line (CLI) tool that "enables you to do almost everything in a terminal that you can do in the Dashboard, and much more." In our case, we'll be using it to access a specific site using a machine name and environment tag. More on this later, but if you haven't installed it yet, head over to their GitHub repo for installation instructions, or install it using Homebrew.
brew tap homebrew/dupes
brew tap homebrew/versions
brew tap homebrew/php
brew install homebrew/php/terminus
GoAccess is the primary tool you'll need, and it will change the way you read access logs. It's an open source, real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser. It provides fast and valuable HTTP statistics for system administrators, or people like us, who need a visual server report on the fly.
The great thing about GoAccess is that you can use it for more than just this tutorial. You can either download and consolidate log files to parse locally, or install it directly on your own server. In this tutorial, we're only going to use it locally, but I highly recommend you check out their documentation for different use cases. Download from their website, or once again, use Homebrew to install GoAccess. At the time of this writing, we're using version 1.1.1.
brew install goaccess
Pantheon Nginx Access - Go Access Easy Pull
This is the magic script created by Albert Causing at Pantheon. In short, it helps to orchestrate the retrieval of access logs from the Pantheon site, then pass them to GoAccess and spit out a nice and pretty HTML report you can share with your friends (or boss, probably to explain it wasn't your fault). Head to the GitHub repo for more information.
Steps to analyze your site.
By following the instructions in the GitHub repo from the previous section, you should have the script installed and ready to go. One thing to be especially aware of: make sure you uncomment and add the log-format lines in your goaccess.conf file. Read the troubleshooting section if you run into any issues.
From here, all you'll need is the site key of your Pantheon instance and the environment you want to pull logs from (probably live). In this example, we'll be using our own site key, levelten, and the live environment flag.
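A quick way to confirm the config edit took effect is to list the directives that are no longer commented out. This is only a minimal sketch: the function name `active_log_formats` is made up for illustration, and the path in the example line is an assumption about where Homebrew puts the file, so point it at wherever your goaccess.conf actually lives.

```shell
# List the uncommented log-format, date-format and time-format
# directives in a GoAccess config file. Lines that start with "#"
# are skipped because the pattern is anchored to the line start.
# Returns non-zero if the file is missing or no directive is active.
active_log_formats() {
  conf="$1"
  if [ ! -f "$conf" ]; then
    echo "config not found: $conf" >&2
    return 1
  fi
  grep -E '^[[:space:]]*(log-format|date-format|time-format)[[:space:]]' "$conf"
}

# Example (the path is an assumption, adjust to your install):
# active_log_formats /usr/local/etc/goaccess.conf
```

If nothing is printed, the format lines are probably still commented out, which is the first thing to check before digging into the troubleshooting section.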
cd #Go to ~/home or whatever directory you want.
mkdir goaccess #Make temporary directory to store site logs. You'll see why.
access_getlogs --site=levelten --env=live #Creates directory and starts process.
After the script runs, you'll see access logs being gathered and output similar to this.
total size is 11,229,878 speedup is 1.16
[+] DECOMPRESS GZ FILES
[+] CONSOLIDATING NGINX-ACCESS LOGS
[+] EXPORTING GOACCESS REPORT HTML
[+] DONE, LAUNCHING: [15,276/s]
If your terminal doesn't launch a browser automatically, you can look inside the directory based on your site key and find an HTML report.
[12-27 14:44] - ~/goaccess/levelten
developer@levelten$ ls -la
drwxr-xr-x 20 developer staff 680 Dec 27 12:23 as_220.127.116.11
drwxr-xr-x 20 developer staff 680 Dec 27 12:23 as_18.104.22.168
-rw-r--r-- 1 developer staff 164594492 Dec 27 12:23 consolidated_nginx_access.log
-rw-r--r-- 1 developer staff 858469 Dec 27 12:23 levelten_report.html
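With the consolidated log sitting in that directory, a quick command-line check can complement the HTML report when you suspect a crawler. The sketch below counts the most frequent user agents. It assumes a combined-style log line where the user agent is the third double-quoted field (request, referer, user agent), and `top_user_agents` is just an illustrative name, so verify the field position against a line from your own log first.

```shell
# Print the N most frequent user agents in an access log (default 10).
# Assumption: the user agent is the third double-quoted field on each
# line, which is field 6 when the line is split on double quotes.
top_user_agents() {
  log="$1"
  n="${2:-10}"
  awk -F'"' '{ print $6 }' "$log" | sort | uniq -c | sort -rn | head -n "$n"
}

# Example: top_user_agents consolidated_nginx_access.log 5
```

Each output line is a request count followed by the user agent string, with the busiest client first.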
And that's it! Now you have a beautifully crafted report for your access logs that looks something like this. You'll notice you also have the individual log files downloaded, so you can use GoAccess to create individual reports as well.
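For those individual reports, something along these lines should work. Treat it as a sketch under a few assumptions: `per_log_reports` is a hypothetical helper name, the glob in the example line guesses at how the downloaded files are named, and GoAccess is expected to pick up the log-format settings from your goaccess.conf. The `-f` (input file) and `-o` (output file) options are standard GoAccess flags.

```shell
# Build one HTML report per log file passed as an argument.
# Each report is written next to its log, e.g. foo.log -> foo.html.
per_log_reports() {
  for log in "$@"; do
    [ -f "$log" ] || continue   # skip arguments that are not files
    goaccess -f "$log" -o "${log%.log}.html"
  done
}

# Example (file names are an assumption, check your directories):
# per_log_reports as_*/nginx-access.log
```

Comparing the per-server reports against the consolidated one can show whether the traffic is spread evenly or hitting one server harder than the other.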
I hope you found this blog post useful, and if you'd like a more visual tutorial, you can watch me walk through this in the YouTube video below. If you have any comments or questions, post them below!