How to Graph the Instructables
by matseng in Circuits > Websites
2730 Views, 4 Favorites, 0 Comments
How to Graph the Instructables
I was curious of how many instructables that are published each month and what the growth rate looks like. Since I couldn't find any statistics here I decided to do it myself.
Sorry, there are no photos, just some screenshots.
Tools needed:
The Google account can also be replaced by Excel, Open Office or any other decent spreadsheet. I wanted to try the Google Docs so I used that here.
Sorry, there are no photos, just some screenshots.
Tools needed:
- Access to a Linux shell prompt
- A Google account
The Google account can also be replaced by Excel, Open Office or any other decent spreadsheet. I wanted to try the Google Docs so I used that here.
Finding the Data
Ok, the first step of making a graph is obviously to collect the information you want to created the graph from.
In the Explore-menu on the site there's a entry called Most recent that shows all instructables in groups of 15. The URL for that page looks like this: https://www.instructables.com/tag/type:instructable/?sort=RECENT . After clicking on one of the numbered pages at the bottom the string "&offset=15" is added to end of the URL.
On each of the 15 entries on the pages there are a line that displays the author and the date of that particular instructable. "posted by Parker955 on Aug 4, 2007"
So we now know the URL for getting the data that we want to use and we also have a string to search to extract the data from the page.
In the Explore-menu on the site there's a entry called Most recent that shows all instructables in groups of 15. The URL for that page looks like this: https://www.instructables.com/tag/type:instructable/?sort=RECENT . After clicking on one of the numbered pages at the bottom the string "&offset=15" is added to end of the URL.
On each of the 15 entries on the pages there are a line that displays the author and the date of that particular instructable. "posted by Parker955 on Aug 4, 2007"
So we now know the URL for getting the data that we want to use and we also have a string to search to extract the data from the page.
Getting the Data
Here's the BASH script that I wrote to get the data. I think the comments in in code itself is descriptive enough so I won't explain them again here. I'd be happy to explain any particular part of the code that you don't understand, just ask in the comment section.
Save the file as instructables.sh and make it runnable by giving the command "chmod 0755 instructables.sh" .
Run the program by typing "./instrucables.sh"
Save the file as instructables.sh and make it runnable by giving the command "chmod 0755 instructables.sh" .
Run the program by typing "./instrucables.sh"
#!/bin/bash# Start from the beginningoffset=0# Continue getting pages until the offset is larger than 5731while [ $offset -lt 5731 ] do # Be nice to the webserver! sleep 1 # Show te user the progress echo $offset # Make the URL to retrieve url='https://www.instructables.com/tag/type:instructable/?sort=RECENT&offset='$offset # Get the page as viewable text (not html) by using the -dump flag # Then let grep remove all lines not containing the phrase "posted by" # And finally concatenate all lines into the log-file for later processing lynx -dump $url | grep "posted by" >> instructables.log # Increment the offset with 15 to get to the next 15 instructables offset=$((offset+15))done# Now we've got the dates of all instructables in the file named instrucatbles.log # It't time to count the number of lines for each combination of lines that contains# a particular year and month combo. (Like Mar 2007). By counting them we know how# many instructables that was publised during that month.# The outer loop handles the years that we are interested infor year in 2005 2006 2007do # The inner loop handles the months for month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec do # First search for all lines containign the year. # Then pipe that result into a second grep that searches those lines for the month # And finally count the number of resulting lines count=$(grep $year instructables.log | grep $month | wc --lines) # Show the information to the user to we can make a graph of it later echo $year $month,$count donedone
The Output
This program will run for quite a while since I've added a pause for 1 second between each page request. Without that delay the script would hammer the webserver with a shitload of requests in a very short period of time and that would probably incur the wrath of Mr. ewilhelm, something that we all should avoid. :-)
When the program is done it will show a list of months+years and the count of the number of instructables written that month. It will look like this:
As can be seen in this list there are no data before August 2005, and since I'm running this in the first week of August 2007 the count for this month is really low and should be discarded.
When the program is done it will show a list of months+years and the count of the number of instructables written that month. It will look like this:
2005 Jan,02005 Feb,02005 Mar,02005 Apr,02005 Jun,02005 Jul,02005 Aug,752005 Sep,472005 Oct,232005 Nov,222005 Dec,442006 Jan,302006 Feb,442006 Mar,1582006 Apr,1462006 May,1422006 Jun,2072006 Jul,2832006 Aug,2282006 Sep,2802006 Oct,3172006 Nov,2532006 Dec,2572007 Jan,3202007 Feb,3522007 Mar,3672007 Apr,4272007 May,4482007 Jun,6102007 Jul,6362007 Aug,472007 Sep,02007 Oct,02007 Nov,02007 Dec,0
As can be seen in this list there are no data before August 2005, and since I'm running this in the first week of August 2007 the count for this month is really low and should be discarded.
Day-by-Day Info
To extract day by day data from the file generated by the script in step 2 the following script can be run.
To see the graphs attached to this step in the instructable you need to click on the (i) -symbol in the upper left corner of the image and then on the link with the size information ( (880x585) 24 KB) just below the image. You will then get a bigger and unclipped version of the image.
I've also included the excel spreadsheet with the day-by-day data that generated the charts on this step.
#!/bin/bash# The outer loop handles the years that we are interested infor year in 2005 2006 2007do # The middle loop handles the months for month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec do # The inner loop handles the days day=1 maxdays=31 if [ "$month" == "Feb" ]; then maxdays=28; fi if [ "$month" == "Apr" ]; then maxdays=30; fi if [ "$month" == "Jun" ]; then maxdays=30; fi if [ "$month" == "Sep" ]; then maxdays=30; fi if [ "$month" == "Nov" ]; then maxdays=30; fi while [ $day -le $maxdays ] do # Search for the month day year-combo in the already existing log file # and count how many instructables there are for that particular day count=$(grep "$month $day, $year" instructables.log | wc --lines) # Show the information to the user to we can make a graph of it later echo $day $month $year,$count day=$((day+1)) done donedone
To see the graphs attached to this step in the instructable you need to click on the (i) -symbol in the upper left corner of the image and then on the link with the size information ( (880x585) 24 KB) just below the image. You will then get a bigger and unclipped version of the image.
I've also included the excel spreadsheet with the day-by-day data that generated the charts on this step.
Downloads
Google Docs
Here I'm showing how to do in using Google Docs. If you're using Excel or OO you're on your own.
Start by going to Google Docs. If you don't have a Google Account you need to Sign Up first. It's free and gives you a lot of nice features.
When you are at Google Docs click on the New-button and select Spreadsheet
Start by going to Google Docs. If you don't have a Google Account you need to Sign Up first. It's free and gives you a lot of nice features.
When you are at Google Docs click on the New-button and select Spreadsheet
The Spreadsheet
Copy all lines from the output if the Linux script that you want to include in the graph.( I copied from August 2005 to July 2007.)
To paste the data you need to select the A1-cell and then press ctrl-v. Note! You can't click on the clipboard or right click and select paste or even use the Edit | Paste menu of the Firefox. You must use ctrl-v to paste external data.
Now we need to separate the dates from the counts and put these separately into the B and C columns.
In the B1-cell type "=MID(A1,1, 8)" and press Enter
In the C1-cell type "=value(MID(A1,10,999))" and press Enter
( Note: Don't type the "-marks )
After entering the formulas in B1 & C1 we need to copy them to the rest of the rows. Mark both B1 & C1 by clicking and holding on B1 and drag over to C1. Then right-click and select Copy.
Mark the entire destination area by clicking and holding on cell B2 and drag down to C24. Then right-click on select Paste.
We are now ready with the copying of the data and it's time to create the graph.
To paste the data you need to select the A1-cell and then press ctrl-v. Note! You can't click on the clipboard or right click and select paste or even use the Edit | Paste menu of the Firefox. You must use ctrl-v to paste external data.
Now we need to separate the dates from the counts and put these separately into the B and C columns.
In the B1-cell type "=MID(A1,1, 8)" and press Enter
In the C1-cell type "=value(MID(A1,10,999))" and press Enter
( Note: Don't type the "-marks )
After entering the formulas in B1 & C1 we need to copy them to the rest of the rows. Mark both B1 & C1 by clicking and holding on B1 and drag over to C1. Then right-click and select Copy.
Mark the entire destination area by clicking and holding on cell B2 and drag down to C24. Then right-click on select Paste.
We are now ready with the copying of the data and it's time to create the graph.
Making the Graph
Creating the graph from the data is the easiest step. Click on the graph symbol and fill in the form that pops up.
There you should select the Columns type of chart. Enter some descriptions for the legends and labels. Tell the chart that you want to use the area B1 to C24 as the data for the graph by typing B1:C24 in the area field. And finally tell the graph that you have the labels (the dates) in column B.
Please note the screenshot below incorrectly shows C22 as the last cell instead if C24. I forgot the month of May on my first run. :-(
Click on Save Chart and you're done here. The graph will be nicely inserted into the spreadsheet and you can save the graph as an image by clicking on the Graph-dropdown and select Save Image.
Easy as pie!
There you should select the Columns type of chart. Enter some descriptions for the legends and labels. Tell the chart that you want to use the area B1 to C24 as the data for the graph by typing B1:C24 in the area field. And finally tell the graph that you have the labels (the dates) in column B.
Please note the screenshot below incorrectly shows C22 as the last cell instead if C24. I forgot the month of May on my first run. :-(
Click on Save Chart and you're done here. The graph will be nicely inserted into the spreadsheet and you can save the graph as an image by clicking on the Graph-dropdown and select Save Image.
Easy as pie!
The Final Result
The final result of this exercise is the image below.
There it can be clearly seen that we have an increasing rate of new instructables almost every month. Between April and June 2007 we got a 50% increase! And looking back further to February / March 2006 the increase was even larger. On the graph in looks like 300%.
Note: The 3D graph is done in Excel.
There it can be clearly seen that we have an increasing rate of new instructables almost every month. Between April and June 2007 we got a 50% increase! And looking back further to February / March 2006 the increase was even larger. On the graph in looks like 300%.
Note: The 3D graph is done in Excel.