Declining Entrepreneurship in the US: Fact, Fiction or Some Third Thing?

Ian Hathaway and Robert Litan at the Brookings Institution just came out with an interesting working paper showing that new business creation and entrepreneurship in the US have been declining since the 1970s [PDF warning]. Since people at Brookings know how to write a good paper for public consumption, the lede is right there in Figure 1.

And since the Brookings Institution is also really good at press releases, this report has been part of today's Twitter tales. Plenty of people are blaming this on OBAMA, OBAMA TAXES, and OBAMA REGULATIONS.

I don't doubt the authors' analysis, but entrepreneurship data is tricky, tricky stuff. I've spent far too many hours yelling at my Excel spreadsheets, wondering why they don't add up, only to realize I was using the wrong definition.

What's a startup? What's a new firm? Are we talking about every new firm registered with the IRS? Firms that have incorporated with their Secretary of State? What about firms that have more than nominal turnover? Firms that employ more than just the founder? Are we counting farms? Franchises?

There are a lot of reasons why aggregate levels of entrepreneurship would fall over time. I am loath to blame regulations (of the more than 100 entrepreneurs I've interviewed as part of my research, I don't think any have ever mentioned regulations as a major challenge), but it's not like there are fewer building codes out there than in 1970. The idea that slightly higher marginal federal tax rates discourage entrepreneurs (who are unlikely to make a profit during the term of the president who raised or lowered the taxes) is such a stupid idea that it doesn't even merit a clever joke.

Rather, I think there are fewer opportunities for entrepreneurs out there. A Walmart in town means that there's no need for the local 5 and Dime or the appliance shop (and now that TVs are so cheap, no need for the TV repair shop). Now, consumer-facing small businesses are only a small part of the overall entrepreneurial scene. But it's hard to ignore the fact that major firms and franchises are able to outcompete most independent entrepreneurs in the same field. McDonald's franchise owners work hard, but they have a lower failure rate than any independent restaurant.

The authors say they've got another paper in progress that'll control for external economic factors and that the decline in entrepreneurship survives the addition of control variables. But something like this doesn't show up in any econometric variable I know of. It represents a structural change in the economy.

Mapping UK Startups

Since I first washed up on the chalky (more peaty, I guess) British shores, I've been doing my best to get an overview of the geography of UK startup activities. That's my job after all: to figure out where the entrepreneurship hot spots are and why those places are great areas for startups. I forgot about this for a while after being buried in other work and teaching, but I was reminded of it by a recent report by Startup Britain about the UK's entrepreneurial hotspots. They were kind enough to release the underlying dataset, which was produced by Companies House. The dataset reports how many new firms were registered in every postal code area in the UK.

This data set helped me rediscover the joy and the pain of making maps while watching re-runs of Law and Order.

Plugging the data from Startup Britain into QGIS (a nice, open source GIS platform that actually runs on OS X!) produces a nice visualization of where the UK's entrepreneurs are.

[Figure: UK Startups]

This is a pretty diverse geography of startups, but it's about what we'd expect. High levels of entrepreneurship in the Southwest and up into the Midlands, lower levels of entrepreneurship in the Northeast and in the Highlands.

We can make this a bit simpler to get an even broader overview of the UK's entrepreneurial geography. This is an equal area map of the average number of startups in the postal code areas contained within 25 km hexes. I think this is the prettiest map I've ever made.

With this, you can see a very clear pattern of high rates of startup activity in the area between London and Manchester, with less activity elsewhere.

But XKCD teaches us that most maps just map population.... XKCD teaches us every lesson.

So, we've got to control for population. This is where I ran into the wall of horrible data collection. It's pretty dang easy to get population for postal code areas in England and Wales from NOMIS. But, because of Events over the past 700 years, Scotland gets its own census, and it's not very good at showing what data it has and letting you have it. After several hours of yelling at the computer, I finally found what I needed and could make a map of the number of startups per 1000 people in every area code in the UK (except for Northern Ireland, Gibraltar, and the Channel Islands, because I just couldn't bring myself to care).

[Figure: Startups per 1000 people]
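The per-capita step itself is simple arithmetic. Here is a minimal sketch in Python, assuming each area comes with a count of new firms and a resident population. The area codes and numbers below are made up for illustration and are not the Companies House or census figures; the function name is hypothetical too.

```python
def startups_per_thousand(startups, population):
    """Return new firms per 1,000 residents, or None when population is missing or zero."""
    if not population:
        return None
    return 1000 * startups / population


# Hypothetical area records: (code, new firms, residents). Illustrative numbers only.
areas = [
    ("AA1", 250, 12000),
    ("BB2", 40, 8000),
    ("CC3", 15, 0),  # no resident population on record
]

rates = {code: startups_per_thousand(s, p) for code, s, p in areas}
for code, rate in rates.items():
    print(code, "n/a" if rate is None else round(rate, 2))
```

Returning None for areas with no recorded residents keeps them out of the averages instead of dividing by zero.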

This is... ummm... less interesting. London is really the only place where we see huge deviations from the mean of 20.66 new firms per 1000 people. Indeed, if we look at a histogram of the log of startups per capita, we see it's really concentrated around the mean. Has anyone written a history of histograms?
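For anyone who wants to try the log histogram without a plotting library, here is a rough sketch that just counts values into bins. It assumes base-10 logs and a bin width of 0.5, neither of which is stated above, and the rates are invented.

```python
import math
from collections import Counter


def log_histogram(rates, bin_width=0.5):
    """Count log10(rate) values into bins of the given width; non-positive rates are skipped."""
    counts = Counter()
    for rate in rates:
        if rate > 0:
            left_edge = math.floor(math.log10(rate) / bin_width) * bin_width
            counts[left_edge] += 1
    return dict(sorted(counts.items()))


# Hypothetical startups-per-1000 values, not the real data.
example_rates = [5.0, 18.0, 21.0, 25.0, 1200.0, 0]
for left_edge, n in log_histogram(example_rates).items():
    print(f"{left_edge:>4} {'#' * n}")
```

Each key is the left edge of a bin on the log scale, so most of the made-up values pile into the bin starting at 1.0 (rates between 10 and about 31.6).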

This is because there is a very clear relationship between the population of a postal code area and the number of startups. The correlation coefficient is 0.78! This is very apparent when you graph population against startups. The colors! From the graph, it's clear that there are very few regions that have exceptionally high rates of startups per capita, but there are plenty of regions in the North and the North West which have very low rates.
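A correlation coefficient like this can be computed by hand. The sketch below assumes the figure is a plain Pearson correlation between population and startup counts, which the post does not spell out, and uses made-up numbers.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical populations and startup counts for five areas.
population = [8000, 12000, 20000, 31000, 45000]
startups = [150, 260, 390, 700, 880]
print(round(pearson_r(population, startups), 3))
```

A value near 1 means bigger areas have more startups almost in lockstep, which is why the raw map mostly looks like a population map.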

This is even more apparent when we make a box plot of startups per capita by region. I guess it's more of a violin plot than a boxplot. London does have a lot of areas with exceptionally high levels of entrepreneurship per capita. Of the 6 area codes that have more than 1 reported startup per person, 5 are in London (EC1V, SW1Y, EC4A, W1B, W1S) and one is in Birmingham (B2). I imagine these codes are some weird corporate or historical zones where no one actually lives (maybe just the Queen and her Dogs), which totally throws off the per capita calculation. But even with that, the average startups per capita in London is still significantly higher than the national mean.
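One way to see how much those no-resident zones distort a regional average is to compare the mean with the median, which barely moves when one area has an absurd rate. This is a sketch with hypothetical records, not the analysis behind the plot; the function name is made up.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_region(records):
    """Map each region to (mean, median) of startups per person across its areas."""
    rates = defaultdict(list)
    for region, startups, population in records:
        if population > 0:  # skip areas with no recorded residents
            rates[region].append(startups / population)
    return {region: (mean(vals), median(vals)) for region, vals in rates.items()}


# Hypothetical records: (region, new firms, residents). Not the Companies House figures.
records = [
    ("London", 900, 600),      # a business district: more startups than residents
    ("London", 30, 1000),
    ("London", 50, 1000),
    ("North East", 10, 1000),
    ("North East", 20, 1000),
]

summary = summarize_by_region(records)
for region, (avg, mid) in summary.items():
    print(f"{region}: mean={avg:.3f} median={mid:.3f}")
```

In this toy data the London mean is pulled far above its median by a single area, while the North East mean and median agree.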

So, where do we go from here? The first thing I want to do is try to break this down by industry. In terms of economic development, not all new firms are created equal. A consulting LLC will likely never employ more than a few people, but a new manufacturing firm can employ many people and export products abroad. We also need to look at firm births as well as deaths. Which regions are gaining startups and which are losing them? We also need more data to figure out what's driving entrepreneurship. High populations do mean more economic activity, but this doesn't help policy makers figure out how to encourage entrepreneurship. We need to look at things like education, levels of immigration and migration, and that fun stuff.

So, I've got a lot of librarians and statisticians to yell at. I want to thank everyone on the Twittersphere who encouraged me to make these maps; it was a great excuse to learn some new tools and data sources.

Big Data and Deep Data

I'm officially done with my dissertation. It's been handed in to my committee and I couldn't change anything, even if I wanted to. This puts me in an odd position: for the past 24 months most of my days were spent working on my dissertation, either analyzing my interviews, outlining my ideas, writing or editing. Being done with this has left a pretty big hole in my daily schedule. I've started work on a few other projects to fill this gap, projects that have me working with entirely different types of data than in my dissertation. My dissertation research was interview based. I conducted 110 interviews which produced something like 70 hours of tape and over 3000 pages of transcripts. I have lots of detail on the 80 entrepreneurs I talked to. I know how and why they started their company, how they raised money from investors or why they've avoided it, the challenges they've faced and what they did to overcome them, if they've networked with other entrepreneurs and what they talked about.

This data is amazingly deep, but in the grand scheme of things it's very small. I talked to about 1/3 of the high-tech entrepreneurs in each city who happened to be on a business directory I used. So, when I found really cool things in my interviews, like the fact that most entrepreneurs in Waterloo actively searched through their own social networks to find mentors but those in Ottawa mostly relied on their parents or former business partners to provide business advice, it's hard to say if this is something True for everyone in the city or if it was just a coincidence. There are a few statistical tests to try to figure out what's real and what's an illusion, but they can only go so far.
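To give a flavour of what such a test looks like, here is a sketch of a two-sided Fisher exact test on a 2x2 table, a common choice for small samples. The post does not say which tests were actually used, and the counts below are invented, not the interview results.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with the margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every possible table that is at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts, NOT the interview data.
# Rows: Waterloo, Ottawa. Columns: searched own network, relied on family or partners.
p_value = fisher_exact_two_sided(12, 3, 4, 11)
print(round(p_value, 4))
```

A small p-value says a split this lopsided would be unusual if the two cities really behaved the same way, but with samples this size it still cannot say why they differ.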

The new project I'm working on gives me access to fantastic datasets about innovation and economic development in Canada. This includes the famous Dun and Bradstreet directory, which is the biggest dataset I've ever played with. Clocking in at 1.5 gigabytes, it contains information on more than 1.5 million Canadian firms. I would consider this to be on the very small end of 'big data.' For someone studying entrepreneurship, this is a godsend. I can now tell you, for instance, that between 2001 and 2006 there were 669 new high tech firms founded in Toronto* and that the average sales of these firms are around $360,000. I can also make really cool pictures like this, which shows that there is a positive relationship between the proportion of immigrants in a region and the proportion of high tech firms in every province except Saskatchewan and New Brunswick.

But as I work more and more with this data, I'm beginning to see its limitations. I know things about a whole lot of firms, but I don't know much about them. With the D&B data, I essentially know a firm's name, its address, what year it was founded, what industry it's in, how many employees it has and a guess about its sales number. In aggregate, these data can tell me many things: which regions have the most startups, which industries seem to grow the fastest, what's the relationship between workers and sales across the entire country. But it also raises lots of questions that the data can never answer.

Looking at one record at random, I know that Bait Consulting Inc. of Thornhill is a consulting company that was formed in 2001 and which has one employee and an estimated $120,000 in sales. But unlike in my dissertation research, I don't know anything more. I don't know why the company was founded, and I don't know why it was founded in Thornhill instead of Toronto or Mississauga or Cambridge. I don't know how its founder learns about the market or finds new customers. It's difficult to figure out from this data if a government policy is working, or how an entrepreneur is affected by where they live.

That's the big difference between big data and what I'd call deep data. Big data can tell you a small number of things about a whole lot of things. You can do a whole lot with this, but you always need to be aware of what it's not telling you. Only so many different questions can be asked on surveys: the more you ask, the fewer people will respond.

Qualitative data collected through long, semi-structured interviews is deep data. I know a lot about the people I talked to. Not everything, and many of the responses are biased by the respondent wanting me to think they are really skilled entrepreneurs. I know more than a binary variable: I know what they did, why they did it, and what that has caused. I can understand what practices they took to start and grow their firm and relate those back to their larger cultural context. But again, there's that tradeoff: I know a lot about a very small number of people. And I have it easy: people doing ethnography or observational research will have hundreds and hundreds of hours of recordings or notes about an even smaller range of people.

It would be nice to think that we can meet in the middle, but working with big qualitative datasets requires a totally different set of skills than working with big quantitative datasets. Very few people are equally able to produce a grounded analysis of a collection of interviews and a Bayesian analysis of a census dataset. But there is value in each, and the challenge is being able to figure out the right way to collect data to solve a problem. The platonic ideal is for quantitative and qualitative data to be used together to prove a larger point, but this kind of research is expensive and rare. Still, it might be the only way to get a real sense of what's going on in the world around us.

*This seems really low to me, and I'm already working with librarians and others to figure out what proportion of all firms the D&B directory accounts for.

New article: The sources of regional variation in Canadian self-employment

I just got the final version of my new paper in the International Journal of Entrepreneurship and Small Business (Vol 15, issue 3, pages 340-361 for those keeping track at home. E-mail me for a copy). This is my first solo paper and the first paper that I controlled from start to finish. It's not directly related to my dissertation, but rather an outcome of what I saw as a gap in the literature: the lack of any research on what regional economic and social factors are associated with local levels of entrepreneurship and self-employment. There is research on this topic from dozens of countries, but none yet in Canada.

I wanted to highlight two tables from the paper. The first was part of the lit review. Like I said, there have been dozens of papers since the 1980s that have examined the regional causes of entrepreneurial activity. Normally these are regressions based on census or tax data on a metropolitan level, but some of the more advanced work applies high-level statistical approaches to giant, micro-level datasets. But there has yet to be a serious attempt to synthesize this research. The challenge is that these papers employ a variety of datasets and examine a variety of countries at a variety of times, making it difficult to really compare. But after many, many hours spent reading articles and working with spreadsheets, I was able to create this table:

Significant findings of past research on regional entrepreneurial determinates

The big takeaway from this table is that things that proxy economic growth, like population growth and the presence of other startups, are fairly consistently associated with higher levels of entrepreneurial activity. We also see interesting differences between countries. Personal wealth has almost no effect on German entrepreneurship, but it is shown to cause it in countries like Sweden and the US. It's a difficult task to tease out if this is more related to differing national economies, or due to the different statistics and methods used by the various papers.

The second table shows the results from Canada.

Regression results of non-agricultural self-employment in Canadian census metropolitan areas

I argue in this paper that Canadian self-employment appears to be mostly driven by local economic growth. Population growth, a fairly good proxy for economic growth (people aren't moving to Fort McMurray for the culture), has a positive effect and unemployment has a negative effect. Nothing too surprising there. Barriers to entry are important too: economies dominated by a few large employers have less entrepreneurship than those with a pre-existing base of small businesses. Most surprising was the role of taxes: I found that areas with higher commercial-to-residential tax ratios had higher rates of self-employment than other regions. I don't know what to make of this last finding: it'll take some more work to figure out if this is a real issue or just a statistical artifact.
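For readers who have not run a regression like this, here is a bare-bones sketch of ordinary least squares with two regressors, solved through the normal equations. It is not the paper's model or data: the numbers are toy values built so that the true coefficients are known in advance, and the variable names are only stand-ins.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Toy data built from y = 2 + 3*x1 - 1*x2 (made-up numbers, not the paper's data).
# Think of x1 as a stand-in for population growth and x2 for unemployment.
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [3, 7, 6, 11, 9]
intercept, b_growth, b_unemployment = ols(X, y)
print(round(intercept, 3), round(b_growth, 3), round(b_unemployment, 3))
```

With real census data the fit would not be exact, and you would also want standard errors before calling any coefficient significant.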