Mike Ginley

Use Python To Evaluate SEO Performance



So Python has been all the rage in SEO over the past year (if not longer). The language lets you combine data from multiple sources in one place to help fuel strategic decisions. It is also a great way to automate the tedious tasks that SEO is full of, though I'm not that good at that part yet. The opportunities with Python for SEO are endless, and this is my first step toward mastering it. I wouldn't have gotten this far without awesome tutorials from JC Chouinard, who has a great Python for SEO Guide, and Hamlet Batista, who authors numerous Python articles on Search Engine Journal. There are tons of sources and people I need to thank for helping me get to this point, and I will try to call a couple out in each post.

My First Python SEO Script

So I've been trying to learn data science for a while now, and the typical Udemy or Coursera courses just weren't doing it for me. They are amazing courses, but I learn better by doing, so I needed real-life examples to really understand this programming language. One problem I have always had is comparing Screaming Frog crawls against the other data I use to make optimizations. I'm sure there are plenty of ways to do this with Excel magic, but those aren't fun. That brought me to my first Python script: a simple way to combine a crawl from Screaming Frog with data from Google Analytics and Google Search Console. This should give any SEO Specialist or Digital Marketer enough ammo to make some great optimizations.

What You Will Need

  1. Screaming Frog Site Crawl – I exported the entire crawl to make sure nothing was left out. I’ve also done partial crawls with larger sites and it worked just as well. Saved as ‘Crawl’.
  2. Google Analytics Data – I went with 1 full year of organic data to make sure I had a good sample size. Saved as ‘GA’.
  3. Google Search Console Data – Same thing as GA, a full year to eliminate any concerns. Saved as ‘GSC’.

Next, I fired up Jupyter Notebook through Anaconda. It's what I practiced with and what I've gotten the most comfortable with, though there are lots of other options out there that I'm still exploring.

Step 1: Import Libraries

# pandas does all the heavy lifting in this script; the rest are handy for
# further analysis but aren't actually used in the steps below
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import seaborn as sns
import scipy.stats as stats

Step 2: Pull In The Screaming Frog Crawl Data

crl = pd.read_excel(r'C:\Users\mike.ginley\Desktop\crawl.xlsx')
crl.head()

*NaN means the field is blank; you will run into this when a cell has no content or when there is no matching row.
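If you want to keep the merge manageable, you can also trim the crawl down to just the columns this analysis ends up using. A minimal sketch, assuming the default Screaming Frog column names (they can differ by version, so check your export):

cols = ['Address', 'Title 1', 'Word Count', 'Inlinks', 'Crawl Depth']  # assumed column names
crl = crl[cols]  # keep only the columns needed for the merge and the analysis below
crl.head()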

Step 3: Pull In The Google Analytics Data

ga = pd.read_excel(r'C:\Users\mike.ginley\Desktop\GA.xlsx')
ga.head()

Step 4: Merge The Crawl & Google Analytics Data

merged = pd.merge(crl, ga, left_on='Address', right_on='Landing Page', how='outer')
merged.head()
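Before moving on, it can be worth checking how many URLs actually matched between the two files. This is not part of the original steps, just a quick sanity check using pandas' merge indicator:

check = pd.merge(crl, ga, left_on='Address', right_on='Landing Page', how='outer', indicator=True)
check['_merge'].value_counts()  # 'both' = matched, 'left_only'/'right_only' = URLs found in only one file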

Step 5: Pull In The Google Search Console Data

gsc = pd.read_excel(r'C:\Users\mike.ginley\Desktop\gsc.xlsx')
gsc.head()

Step 6: Combine The Merged Data From Step 4 With The GSC Data

combine = pd.merge(merged, gsc, left_on='Address', right_on='Page', how='outer')
combine.head()

Final Step: Save The Data

combine.to_csv('combine.csv', encoding='utf-8')
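If you don't want the pandas row index showing up as the first column of the export, you can drop it – just a preference, not part of the original script:

combine.to_csv('combine.csv', encoding='utf-8', index=False)  # same export without the row index column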

Python SEO Export Results

The final step should save a CSV on the C: drive under your user folder (wherever the notebook is running from). Open it up and see if your experiment worked. If it did, you will see the Screaming Frog crawl first with all the typical crawl data, followed by the Google Analytics data and then Google Search Console. If those columns are blank, it is most likely because the URLs didn't match for one reason or another – think VLOOKUP. The URLs need to match exactly for the merge to line up the data. I ran into issues with capitalization, so I just used Notepad++ to lowercase all the URLs. If that is your problem, make sure all the URLs use the same casing, save the files again, and start over from step one (or handle it in pandas, as shown below).
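If you'd rather not round-trip through Notepad++, the same lowercasing can be done in pandas right after loading each file (Steps 2, 3, and 5) and before the merges. A sketch, assuming the column names used above:

# normalize URL casing so the merges line up – my workaround, not part of the original steps
crl['Address'] = crl['Address'].str.lower()
ga['Landing Page'] = ga['Landing Page'].str.lower()
gsc['Page'] = gsc['Page'].str.lower()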

What Did I Find?

So I sorted by sessions to see which pages were the top organic performers over the past year. The results were pretty obvious, but what I had never noticed when reviewing each data source separately was what all the top pages had in common: many of them have numerous inlinks pointing at them. Inlinks are a vote of confidence in the eyes of the search engines, so your top pages should have a good amount of internal links. The cells highlighted in red were URLs with fewer than 50 inlinks across the entire site crawl. Some pages have performed well without many inlinks, but I'm sure we could boost performance with more votes of confidence.
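Here is roughly how that check looks in pandas instead of eyeballing highlighted cells. A sketch, assuming the GA export has a 'Sessions' column and the crawl has the default 'Inlinks' column:

top_pages = combine.sort_values('Sessions', ascending=False).head(25)  # top organic performers
low_inlinks = top_pages[top_pages['Inlinks'] < 50]                     # strong pages with few internal links
low_inlinks[['Address', 'Sessions', 'Inlinks']]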

Another area I noticed was word count. Again, a lot of the top pages had more content than the others. The yellow cells are pages with fewer than 250 words of content. Similar to the inlinks issue, some of these have done well even with thin content, but adding more content could definitely help, and it would open up more internal linking opportunities throughout the site. One thing to note: this was a text-only crawl on a site that renders a lot behind JavaScript, so the word counts and possible links are a little thrown off in this report. I didn't want to drag my computer down with a full JavaScript crawl of the site – ain't nobody got time for that. The thought process still stands: the top pages will most likely have something in common.
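The same idea works for thin content. A quick sketch, again assuming the default 'Word Count' column name:

thin = combine[combine['Word Count'] < 250].sort_values('Sessions', ascending=False)  # thin pages, busiest first
thin[['Address', 'Sessions', 'Word Count']].head(25)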

The final area I have used so far is crawl depth, another area I like to focus on. If a page sits too deep in the site structure, it loses a lot of perceived importance and Google won't crawl it very often. Luckily, on this fairly large site the deepest pages are only at crawl depth 3 – thank you, front-end sitemap. If crawl depth goes above 10 I would say there are big issues, but it all depends on the structure of the site.
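A one-liner also shows how deep the site actually goes, assuming the default 'Crawl Depth' column from Screaming Frog:

combine['Crawl Depth'].value_counts().sort_index()  # how many URLs sit at each depth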

What’s Next?

Now that we've combined all this great SEO data, we can see which areas of the site need to be optimized. Even better, we have the data in the same sheet to back up any of our recommendations. What are the important SEO metrics you follow or report on? Can you combine them with crawl data or GSC data? Will that give you enough info to show that 'x' works in one area, so maybe it's worth trying in another area that is struggling?

The beauty of this script is that it is now saved in Jupyter Notebook, so it can be replicated across all clients. All you need to do is save the crawl (or whatever data you're analyzing) in one place, update the file paths, and you're done. Like I said earlier, this could probably all be done with VLOOKUPs or some crazy Excel magic, but this way is definitely much faster.

I am going to continue to research different ways to use this awesome programming language. I know very little about it right now and it has already helped me a ton, so I am excited to see where else it will take me.
