Reflections on Authoring a Minimum Viable Book
As the production of the second edition of Mining the Social Web nears completion with an estimated publishing date of mid-September 2013, I wanted to share some thoughts and reflections on what it has been like to write and then (mostly) re-write and re-launch a book. Like anything else, there’s always a backstory, and this post is an abridged version of that backstory that I hope you enjoy.
The First Edition: Two Years of Lessons Learned
I spent the vast majority of my nights and weekends during 2010 authoring Mining the Social Web, and it officially debuted at the inaugural Strata Conference in February 2011. At the time, I was quite happy with the book. I had managed to demonstrate how to collect and analyze disparate data from popular social websites in ways that could be used for business intelligence, using what seemed to be a fairly accessible toolchain. Moreover, all of the source code for the first edition was hosted on GitHub, which, believe it or not, was a bit of an innovation at the time. (The source code for the second edition is managed in a separate GitHub repository.)
Each example listing in the first edition has a hyperlink to the hosted source file, which allows (ebook) readers to trivially navigate from the book to the latest, bug-fixed code. From there, the reasoning was that they could easily pull down the entire repository and execute standalone scripts for each example. In general, this approach seems to have worked well, although I had overestimated the proportion of ebook readers relative to paperback readers.
One particular consequence was that paperback readers tended not to visit the URLs in the example captions, since doing so involved considerably more effort than just clicking a link. As a result, it wasn’t always obvious to them that the latest code fixes for Twitter API changes (to name one common source of strife) were already available. On more than one occasion, I received a negative review from a reader about how the content that they had just purchased was obsolete, even though updates to the code had been in place for months. One of the key lessons learned here is that the content in the printed book should have been updated more regularly rather than assuming that all readers would take advantage of the GitHub repository. After all, using GitHub as a common medium for managing a book’s source wasn’t, and still isn’t, as normal a practice as I hope it will soon be.
Without getting into all of the other details, it simply turned out that I’d expected my readers to go to far too much trouble to enjoy the book. Even though my original intention was to write a how-to manual on social web mining targeted primarily at software developers, I’d inadvertently tailored the content for a niche audience of mostly Linux and Mac OS X developers who were familiar with Python toolchains, could wade through installing and configuring some fairly complex software dependencies, and could generally make their way around a terminal environment to troubleshoot problems.
One of the primary consequences of the assumptions I’d made is that I’d effectively limited (and frustrated) much of my readership by relegating Windows users to a kind of second-class “you are mostly supported and smart enough to figure it out on your own” status. Check the stats sometime. Regardless of your personal preferences, Windows users still constitute the vast majority of the desktop/laptop market. Treat them with second-class status at your own risk.
…Windows users still constitute the vast majority of the desktop/laptop market. Treat them with second-class status at your own risk…
To sum a lot of this up, I’d done my best to keep the GitHub repository up to date with code fixes and had completed a couple of significant revisions to the text itself to keep it relevant, but I’d done little else to help myself with the first edition of Mining the Social Web. In hindsight, I was more or less treating the book as a static product when I could have taken a much more proactive stance toward improving it as part of a continuous cycle driven by customer feedback.
Late last year, as v1.1 of Twitter’s API approached and menaced me with widespread breakage of code examples across three chapters, I committed to applying lean principles to the task of producing a second edition. What I thought would be a series of light revisions and a new chapter later proved to be a much more arduous task that once again consumed the majority of my nights and weekends across the first half of this year. The primary difference this time around, however, was that I was working smarter.
The Second Edition: Build-Measure-Learn
Although I was (and still am) proud of what I had accomplished with the first edition of Mining the Social Web, a lot has changed over the past ~2.5 years, presenting the opportunity to release a new edition that is superior to the original in nearly every way. Having learned a lot about the lean startup over the past year or so, I committed early this year to applying the lean principle of build-measure-learn to the task of producing a second edition. It’s still early, but the continuous cycle of gathering feedback, measuring it, and learning from it seems to have made a critical difference.
Any book, including a book about technology, needs to tell an entertaining story to be successful; if the reader isn’t learning and having some fun along the way, it’s hard to imagine that you’ll enjoy very much success as an author. For a technology book like Mining the Social Web, I tend to think of the ability to seamlessly follow along with the code examples as you read as one of the most critical components of the book’s user experience. Just like anything else, a poor user experience will result in a poor product regardless of how good you think the product really is.
…Any book, including a book about technology, needs to tell an entertaining story to be successful…
Two of the more notable developments in technology that have impacted the user experience for the second edition of Mining the Social Web are Vagrant and IPython Notebook. Vagrant is essentially a way to box up a development environment and distribute it, and IPython Notebook is an interactive browser-based client for Python (and other programming languages) that lowers the bar for conducting and sharing data science experiments. It might be helpful to think of it this way:
- Vagrant provides a virtualized server environment, configured with all of the necessary dependencies so that you don’t have to install and configure them yourself.
- IPython Notebook is a web application running on that server that serves as your user interface for Python programming.
The net benefit to the reader is that these two innovations together trivialize the effort required to run the code examples. While each of these technologies is powerful independent of the other, there is a synergy between them that makes the whole greater than the sum of the parts, and I have been calling that collective synergy the “virtual machine experience” for Mining the Social Web. Let’s consider each of these items in more detail.
UX Improvement #1: Configuration Management (Vagrant)
Whereas the first edition of Mining the Social Web included a GitHub repository that I tried to keep up to date, it had no configuration management at all and did nothing to ease the pains involved in installing and configuring software across multiple platforms and versions. Even though the example code was almost entirely Python, it wasn’t trivial to get all of the third-party package dependencies installed even if you were a Python developer. The inclusion of multiple database technologies and their own third-party add-ons only made matters worse in some circumstances. In retrospect, there was just too much disparate technology crammed into the book, and none of it was being properly handled with configuration management.
Applying lean practices while working on the second edition, I’ve actively solicited feedback from early readers and reviewers, parsed it, and generalized some patterns as part of trying to develop a “minimum viable book.” All of the source code is still managed on GitHub, but included with the source code is configuration management software that bootstraps a virtual machine, which guarantees that all of your software dependencies will be versioned and installed appropriately. Effectively, this allows me as an author to control the user experience for readers and all other consumers of the code who choose to use the virtual machine.
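To make the idea of versioned dependencies concrete, here is a small, hypothetical Python sketch of the kind of check that configuration management automates for you; the package names and version numbers are illustrative only, not the book’s actual requirements.

```python
# Hypothetical sketch: the kind of version pinning that the virtual
# machine's configuration management handles automatically. The
# package names and versions below are illustrative examples.

PINNED = {
    "nltk": "2.0.4",
    "networkx": "1.8.1",
}

def parse_requirements(text):
    """Parse simple 'name==version' lines into a {name: version} dict."""
    reqs = {}
    for line in text.strip().splitlines():
        name, _, version = line.partition("==")
        if name and version:
            reqs[name.strip()] = version.strip()
    return reqs

def find_mismatches(installed, pinned):
    """Return packages whose installed version differs from the pin."""
    return {name: (installed.get(name), want)
            for name, want in pinned.items()
            if installed.get(name) != want}

# Without configuration management, a reader's environment can drift:
installed = parse_requirements("nltk==2.0.4\nnetworkx==1.7")
print(find_mismatches(installed, PINNED))  # {'networkx': ('1.7', '1.8.1')}
```

Baking checks like this into a provisioned virtual machine means the reader never has to perform them by hand, which is precisely the pain the first edition left unaddressed.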
There is minimal effort required to activate and manage the virtual machine, and in addition to writing an appendix describing the few steps involved, I’ve also produced a ~3 minute screencast that visually steps through the process, which you might enjoy watching:
The goal is simple: anyone on any platform should be able to install the virtual machine. No advanced skills should be required; just point and click. To be perfectly clear with a concrete example: a user who has never worked with developer tools, typed in a terminal, or even programmed before should be able to follow the instructions and be on equal footing with an advanced software developer who can wade through all of the idiosyncrasies in setting up a development environment.
…The goal is simple: anyone on any platform should be able to install the virtual machine…
UX Improvement #2: Graphical User Interface (IPython Notebook)
Just providing a Vagrant-backed virtual machine would be an innovation for a tech book in and of itself, and while it does provide a better user experience for the reader (as I’ve framed it), the innovations stop once the virtual machine is installed. At that point, the reader has everything needed to start running the code, but nothing has been done to decrease the friction involved in actually running the code or authoring new code. Even with a terrific virtual machine, the reader is still expected to SSH into the virtual machine and start a Python interpreter session in a terminal. Given that everything has been mostly “point and click” up to this point, why re-introduce aggressive assumptions about readers’ skills by working exclusively with a command line interface again?
Fortunately, IPython Notebook is effectively just a web application, and once your virtual machine is up and running, you just point the same browser that you use for everything else to http://localhost:8080, and it presents all of the source code for the book along with the power of the full Python interpreter. You just select the chapter you want to work with, and from there, it is little more than “point and click” to follow along and execute the provided code, author new code, copy and paste examples to compose new programs, and otherwise do whatever you’d like.
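As an illustrative sketch of what following along feels like, here is the kind of short cell you might execute in a notebook; the tweets are made-up sample data standing in for live Twitter API results, so it runs without any credentials or network access.

```python
# Illustrative notebook-style cell: the tweet texts below are made-up
# sample data, not live API results, so no credentials are needed.
from collections import Counter

sample_tweets = [
    "Just finished reading Mining the Social Web #datascience",
    "Vagrant + IPython Notebook = painless setup #datascience #python",
    "Exploring social data with Python #python",
]

def extract_hashtags(text):
    """Pull #hashtag tokens out of a tweet's text, normalized to lowercase."""
    return [token.lower() for token in text.split() if token.startswith("#")]

counts = Counter(tag for tweet in sample_tweets
                     for tag in extract_hashtags(tweet))
print(counts.most_common(2))  # the two most frequent hashtags
```

The appeal is that a reader can run, tweak, and re-run a cell like this in the browser and see the result immediately, with no terminal session involved.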
I can’t think of a better way to motivate someone to program or learn more about data science than to give them a powerful development environment with a user interface like IPython Notebook. It provides turn-key starting templates for mining social data and allows a person to see the results of data science experiments before they necessarily even understand all of the nuts and bolts that are involved…and that’s the beauty of it.
After all, isn’t something like that what drew us in when we first wanted to become scientists or engineers or programmers in the first place? It was the awe of seeing some phenomenon that we didn’t fully understand but found so amazing that we wanted to commit to learning more about it. That’s where the digging in and learning began…
The GitHub source code repository for the second edition provides good instructions on getting started if you’d like to pull down the project and see all of this for yourself.
The virtual machine experience for Mining the Social Web is a powerful concept and raises the bar for just about any tech book involving code examples. However, it does require a change to existing patterns of behavior, and many readers may resist using it at first, positing that it might be more trouble than it’s worth to install. The current bet I’m making is that short ~3 minute screencasts will largely mitigate those kinds of concerns, amply demonstrate what can be accomplished with the virtual machine, and motivate adoption.
We will never become a society of programmers with development environments, and requiring users to have a development environment or even think in terms of a development environment is a sure way not to bring social web mining to the masses. What society really needs is a product that enables a mere mortal to transform curiosity about their social data into insights as easily as a person can currently interrogate the web with a document-oriented keyword search.
…What society really needs is a product that enables a mere mortal to transform curiosity about their social data into insights…
Admittedly, the virtual machine experience that I’m providing with Mining the Social Web is a development environment and is not the product I am alluding to. However, it’s a step in the right direction — the current incarnation of a certain minimum viable product — and I believe that it has the potential to significantly decrease the friction involved in conducting data science experiments and be an enabler of innovation and education in this space. Products built on top of it or an environment like it are sure to follow once a budding entrepreneur identifies the right problem and bolts a sufficient user experience onto it.