In my last article I discussed how the failure to find the Heartbleed bug sooner was in some sense a failure to refine or deploy what is otherwise effective technology for static analysis. In particular, commercial static analysis tools will purposely ignore potential bugs so as to avoid reporting too many false alarms, i.e., favoring completeness over soundness. The companies that make these tools aim to provide a profitable service to a broad market, and their own investigations indicate soundness is not important for sales. Instead, to be viable, tools must help developers find real, important bugs efficiently, and not necessarily every bug. A challenge to researchers is to find ways to push the business proposition back toward soundness while retaining efficiency (and other desirable criteria); Andy Chou’s POPL’14 keynote outlines other useful challenges.
While Heartbleed is ostensibly about the adoption and improvement of static analysis, in this article I explore the related question of fostering the adoption of programming languages. I summarize impressive research by Leo Meyerovich and Ariel Rabkin on adoption research questions and adoption practices that appeared at OOPSLA’12 and OOPSLA’13, respectively. I think there are some interesting results here, with implications for improving the adoption of languages. Their results also raise new questions for further research (but too late for yesterday’s POPL deadline — good luck to all submitters!).
Research and language adoption
Researchers (and the government agencies and companies that fund them) have expended significant resources on the study, and invention, of programming languages and their features.
Some of the results have been adopted in practice. For example, the languages Haskell, Scala, and OCaml started as academic research projects and have seen mainstream use. Mainstream languages have also adopted researcher-designed features such as garbage collection, exceptions, closures, type inference, parametric polymorphism (generics), and more, though often after a decades-long delay. Nascent languages Swift and Rust have continued this trend, causing PL research icons like Bob Harper to express hope that good ideas from research eventually do take hold.
At the same time, academics often lament that languages with poor designs take hold far more successfully. For example, JavaScript’s language design creates all sorts of headaches for ensuring basic security properties, prompting much research in trying to fix it or work around it (cf. JavaScript: The Good Parts). When designing analysis tools for Ruby we found that Ruby contained several unfortunate misfeatures, such as highly ambiguous parsing and surprising control flow. Work from Jan Vitek’s group at Purdue found the popular statistical language R to be a “rather unlikely linguistic cocktail that probably never would have been prepared by computer scientists.”
The question is: Why do some languages succeed (in getting adopted) where others fail? Answering this question would help researchers do more impactful research, either by packaging their work better, or by changing it to address problems they hadn’t appreciated.
SocioPLT
Ariel Rabkin and Leo Meyerovich, together as graduate students at UC Berkeley, decided to attempt to answer this question. They refer to their investigations as SocioPLT.
At OOPSLA’12 they published a research agenda: the paper contains observations from PL history, some comparisons to related fields (such as the general theory of diffusion of innovations), and some particular hypotheses and research questions.
At OOPSLA’13 they published some results: this paper (which I summarize below) answers several questions posed in the first paper using survey data and source code analysis.
There were three surveys. One was carried out at the outset of a massive open online course (MOOC) on software as a service (SaaS), garnering 1,142 responses; one came from a website called The Hammer Principle, which allows respondents to compare languages in various ways, garnering roughly 13K responses; and one came from a link via Slashdot that announced a visualization of the data from the Hammer survey, garnering 1,679 responses. The survey participants were largely professional developers; the median age of the MOOC participants was 30, and that of the Slashdot participants was 37.
The source code analysis considered 217,368 projects hosted by SourceForge between 2000 and 2010, considering in particular project metadata including the programming languages used, the primary project category (e.g., accounting), date of creation, and the project’s owners. They also considered data from Ohloh, which tracks over 590,000 projects hosted on SourceForge, GitHub, and elsewhere, and supports fine-grained queries about project contents.
Which languages are most preferred?
The six most popular languages used in SourceForge projects are probably not surprising: Java, C++, PHP, C, Python, and C#. Overall, the use of languages follows a heavy-tailed power law, with the top six languages accounting for 75% of projects, and 20 languages accounting for 95% of projects. The top six are diverse in their character. Java and C# are statically typed (i.e., programs must be deemed type correct before they run), PHP and Python are dynamically typed (i.e., type errors are caught during execution), whereas C and C++ are weakly typed (i.e., objects of one type can be treated, perhaps incorrectly, as if they had another type). Java, C#, and C++ are all object-oriented languages.
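To make the static/dynamic distinction concrete, here is a minimal sketch in Python, one of the dynamically typed languages in the top six. The function names are hypothetical and chosen only for illustration. The ill-typed call is not rejected before the program starts; the error surfaces only when the offending line actually executes.

```python
def next_rank(rank):
    # Nothing checks the type of `rank` before the program runs;
    # the addition below fails only if `rank` does not support `+ 1`.
    return rank + 1


def describe_next_rank(value):
    # Call `next_rank` and report a type error if one is raised during execution.
    try:
        return "next rank: " + str(next_rank(value))
    except TypeError as exc:
        return "type error at run time: " + str(exc)


if __name__ == "__main__":
    print(describe_next_rank(6))    # well-typed call
    print(describe_next_rank("6"))  # ill-typed call, detected only when executed
```

In a statically typed language such as Java, a method declared to take an int would reject the string argument at compile time, before any code runs.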
Notably, no functional languages were in the top 20. I found this surprising (Tcl is more popular!) given various stories I’d heard, such as from the hype surrounding Microsoft’s release and support of F#, the use of OCaml at Citrix/Xen and Jane Street Capital, and the use of Haskell by financial firms like Morgan Stanley. Obviously I was overgeneralizing these data points. On the other hand, perhaps the data is somewhat stale: the SocioPLT top 20 comes largely from 2010 data, and arguably there has been an increase in interest in functional programming since that time.
What factors correlate with adoption?
The paper shows convincingly that the single factor that most strongly correlates with both preferring and actually using a language is good libraries, particularly open source libraries. This is not surprising to me. Imagine Ruby without Rails. Ruby was released in 1995 and Rails in 2004: had you heard of Ruby before you heard of Rails? Or, imagine Java without the collection libraries, or the more recent concurrency libraries. Using the language before these things existed was just a lot more painful.
One interesting result was that simplicity was the lowest ranked of the factors respondents deemed important, identified by about 25% of respondents, compared to 60% for libraries. Safety/correctness was deemed important by nearly 40% of respondents. So if we take this result at face value, programmers are willing to deal with a complicated language in order to get other benefits, like correctness. On the other hand, there are mixed messages here. For example, development speed was important to 40% of respondents, and we would think that simplicity would help that. Perhaps the definition of the word “simplicity” is key. The lambda calculus is very simple by one definition (syntax and semantics), but using it to write Windows is not a simple task!
Interestingly, a language’s performance was not in the top five factors when choosing it for a project. Instead, other extrinsic factors dominated, including the language used for existing code bases and the experience and comfort of programmers on the project development team. On the other hand, when asked why they prefer a language independent of its use for a particular project, respondents favored performance just below good library support. Perhaps these results are consistent in that many projects are written in high-performance languages, and many people are familiar with those languages, so the extrinsic factors tend to line up with high-performance languages. In general, developers claim to enjoy languages they believe are expressive and produce elegant code.
The PL research community thinks a lot about (static) types; e.g., see Benjamin Pierce’s well-respected book, Types and Programming Languages. The survey results show that developers place comparatively less value on static types. According to the MOOC survey, only 36% of respondents “see the value” of static types, and only 18% “enjoy using” static types. Unfortunately, the MOOC survey population is probably biased in favor of dynamic languages given the MOOC course topic on Software-as-a-Service. Indeed, the Hammer survey showed a more positive view of types, with statically typed languages strongly correlated with statements such as “If my code in this language successfully compiles there is a good chance my code is correct.” However, this survey agreed with the MOOC survey on the lower developer preference for statically typed languages.
Education had a strong influence on whether respondents knew functional or mathematical languages, but little influence on whether they knew imperative/OO or dynamic languages. For example, respondents who had seen functional programming in college claimed to know a functional language 40% of the time, whereas those who had not seen one in school only knew one 15% of the time. Those who had seen an imperative/OO language in college knew one 95% of the time, but those who hadn’t knew one 87% of the time. This makes sense, given the state of language popularity and the language decision making process: If most code is in Java/C/C++ (imperative/OO) and most new projects are strongly influenced by the language of past projects, then most developers will (by now) know Java/C/C++, and this familiarity will further strengthen the preference for Java/C/C++ in the future. This pattern could ensure that if you did not see a functional language in school, you might never see one.
What next?
There are many other results in the paper that I have not covered; I encourage you to read it. All of the results provide useful food for thought for PL researchers when aiming to increase adoption.
The most obvious thing to do is focus on libraries, broadly construed (e.g., think of Rails as a library). This is already happening in some cases: If a functional language is to break into the top 20, then perhaps it will be Haskell due to the rise of Hackage, or OCaml due to the rise of OPAM. Scala, which supports functional programming paradigms, almost certainly got a bump in popularity by interfacing easily with Java’s libraries.
Another thing to do is focus on education. In the modern on-line world, education does not necessarily mean the college classroom. MOOCs can be a good medium too; cf. Dan Grossman’s class on programming languages, which teaches using Standard ML, Racket (Scheme), and Ruby. Or, imagine a tutorial in the style of those at Codecademy, which teaches JavaScript, Python, Ruby, and more.
In terms of pushing the benefits of types, perhaps we can have our cake and eat it too: Aim for both the expressiveness of dynamic typing and the documentation and safety benefits of static typing (both of these benefits were recognized in the survey). One way to do this is to push research on scripts to programs and gradual typing, which aim to make static typing optional, but in a sensible way. Academic languages like Racket, and industrial languages like TypeScript, have adopted this approach.
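As a rough illustration of optional typing, here is a sketch using Python's type annotations, which this post does not discuss but which follow a similar idea: annotated and unannotated code coexist in one program. The function names are hypothetical. Python itself does not enforce annotations at run time, so the sketch assumes a separate static checker (mypy is one such tool) for the checking step.

```python
from typing import List


def mean_untyped(values):
    # Unannotated: a static checker treats this parameter as dynamically typed.
    return sum(values) / len(values)


def mean_typed(values: List[float]) -> float:
    # Annotated: a static checker can verify that callers pass a list of
    # numbers, and the signature documents the intended use.
    return sum(values) / len(values)


if __name__ == "__main__":
    # Both versions run identically; the annotations only matter to the checker.
    print(mean_untyped([36, 18]))
    print(mean_typed([36.0, 18.0]))
```

The design point is that a script can start out unannotated and gain annotations function by function, picking up the documentation and checking benefits incrementally.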
It is important to note that none of the results I’ve summarized consider the actual effectiveness of languages, just the state of their use and programmers’ stated preferences. It would be very interesting to attempt to gather evidence of effectiveness, and use that evidence to motivate change.
I’ve heard Joe Armstrong tell the story that in the early days of Erlang’s development at Ericsson they had two teams build the same system, one in Erlang and one in C++. The Erlang system was completed successfully and the C++ project kept missing deadlines and was ultimately abandoned. This experience led Erlang to be adopted company-wide (though that mandate has since lapsed). The motivation for the ICFP programming contest was in part to provide similar evidence, but I do not know if an analysis of outcomes has ever been done. Gathering evidence for effectiveness is also behind our Build-it, Break-it, Fix-it contest; we’ll see what happens there.
Another obvious next step is to continue to perform SocioPLT research and use it to motivate the technical research the PL community is already doing. There are many open questions in the OOPSLA’12 paper, and much validation still to be done on the results I’ve summarized above.
One very important caveat when considering the research mentioned is that it is ancient, by today’s standards. I don’t currently use SourceForge, and barely anything I use or do is SourceForge-based. The world has changed; in fact, anecdotally, I would guess that the world changed drastically between 2011 and the present (2014).
The action is all on GitHub now, and there is now widespread dissemination of information about choices (and much-improved ecosystems for libraries and documentation and community meetups and conferences). I would like to see an updated report based on analysis of GitHub.
It is not a coincidence that Swift and Rust have emerged just in recent years. I predict that in 5 years, no data from before 2010 will be relevant any longer.
It would be interesting to track the top-N languages on a year-by-year basis, so we can test your hypothesis. I have a feeling that SourceForge’10 top 10 languages is not too far off from the GitHub/Bitbucket’14 top 10. I also suspect that the correlation with libraries and adoption is a lasting result. But continuing the SocioPLT line of work, by re-running the surveys and keeping the source code analyses up to date, would provide evidence for such suspicions.
June 2014: RedMonk graphed rankings:
Besides the above plot, which can be difficult to parse even at full size, we offer the following numerical rankings. As will be observed, this run produced several ties which are reflected below.
1 Java / JavaScript
3 PHP
4 Python
5 C#
6 C++ / Ruby
8 CSS
9 C
10 Objective-C
11 Shell
12 Perl
13 R
14 Scala
15 Haskell
16 Matlab
17 Visual Basic
18 CoffeeScript
19 Clojure / Groovy
Read more: http://redmonk.com/sogrady/category/programming-languages/#ixzz37T5hhP9v
I agree that it would be great to repeat the analysis with current data, and ideally to do so regularly. But most of the data in the paper is post-2010. In particular, the surveys were conducted over the summer of 2012.
I really liked reading this post. I found the summaries very useful and interesting. One additional thought I had for next steps, along the lines of doing more SocioPLT: We PL enthusiasts would benefit from a better understanding of human psychology.
First, PLs are about interfacing computers to people, and in this sense, PLs are the original “Human-Computer Interfaces”. Understanding why a language is or is not pleasing to humans seems to be much more than a technical question; it seems to be a question (largely) of psychology.
Second, we need to understand how much to trust our own sense of self-awareness. It’s tempting to do SocioPLT research by asking people to systematize how they make their decisions (e.g., by giving them surveys that ask them to rank different factors); but, I suspect that this only tells the researchers about how people think that they think, and not necessarily how they actually act.
For those of us with little or no exposure to research in psychology, I would expect that more knowledge of the subject would lead to interesting surprises.
The research is naive at best and ivory tower at worst. Languages become popular for business reasons, by becoming the dominant choice for solving some problem. Libraries come later. Currently, no functional language will become mainstream until there is a business need for it. Shameless plug: see “How language becomes popular” http://t.co/JU2eju46Hy (read @AlessioStalla’s comments in reverse order).
Thanks for the link to your post; interesting observations there and by the commenter.
I’m not sure why what you are saying disagrees with the observation about libraries correlating with deployment. Business reasons are supported by strong libraries, are they not? That is, if there is more (hardened, well-designed) code in language A that is readily available and can help you build your product, and much less in language B, then to me it’s not surprising that people would be inclined to choose A. I see your point that there is a chicken-and-egg issue, though. The OOPSLA’13 paper makes the same observation, now that I think of it, claiming to report correlation, not causation.
Can you say why you think the research is naive or “ivory tower”? The paper is reporting outcomes based on surveys of 1,000s of developers, and analysis of even more actual projects. Why is this naive? Perhaps you think this methodology is flawed or its conclusions not supported by the data? What would you do differently to answer the question of adoption scientifically?
Hi Mike,
Mostly nailed it! Our reviewers rightly asked us to cross-validate our SourceForge result with other data sets, so we checked against more recent Ohloh/GitHub/etc. data for some of the more doable parts.
We had to work a lot on subjective vs. objective and causation vs. correlation (and all of the above are useful). Interestingly, SourceForge data and some of the survey questions gave us longitudinal data (“this then that”) beyond the more typical cross-sectional. Likewise, for the survey, we asked about recent decisions about recent projects. Focusing on these concrete instances gave significantly different results (and arguably more objective ones!) than when we asked about decisions in general.
Understanding languages in the small vs. in the large is fascinating, and part of why we used both repos and surveys. Going forward, I’d love to see more about feature evolution. In our Onward! paper, we used the example of the long road traveled by continuations/generators, and in the OOPSLA one, a bit on generics vs. templates. If we are to make a science out of language design, understanding individual features seems pretty foundational.
I couldn’t agree more about the need to study language features and their evolution. Our group also started looking into feature adoption trends, specifically for the Java language which we presented at ICSE this year. You can read the paper here: http://goo.gl/zwvgG3
What bothers me about this kind of study is the following: Let’s replace “adoption of programming languages” with “food people will eat.” If you survey a lot of people, they will say they like foods that taste good (i.e., sugar, fat, salt) even if there’s good evidence that it’s bad for them. The danger is drawing the conclusion that either (a) food that tastes good must be good for you because, well, lots of people eat it, or (b) the way to get people to eat really good food is to add lots of sugar, fat, and salt. You know, the way I deal with oatmeal or salads.
It’s a classic design fallacy that your customers know what they want, and it’s what they already have. We see that kind of stagnation in lots of areas (e.g., movie sequels, UIs, C syntax) because no one wants to take risks.
That’s not to say that these studies and articles are making this kind of mistake. Rather, I worry that this is the kind of sound bite people will take from it. That’s particularly dangerous for academic researchers who should be thinking far ahead of today’s practice. Well, really I mean that the *whole* research community shouldn’t be focusing on how to add the salad dressing, bacon bits, etc. to make it palatable today.
Good points, Greg. I think I basically agree with everything that you say. It would be bad for the takeaway to be the sound bite that you mention.
The way I look at this study is that it is trying to understand ground truth scientifically, rather than rely on impression and intuition. We might have had hypotheses for which programming languages were equivalent to salt and sugar, to use your analogy, and why. But now we have actual data, so we can speak with more confidence. I for one was surprised at several of the results, e.g., safety is more valued than simplicity, at least in people’s minds. Such results could lead to new research.
The question of what we do with SocioPLT information is very important. We certainly do not have to keep giving people the languages they currently claim to like. My post talked a lot about functional languages because I think they are very effective (which is why I use them, and teach them), despite the fact that most people appear not to use them in industry. We can use data like that in this study to hopefully figure out how to foster adoption of languages that, for reasons people surveyed may not yet appreciate, are the equivalent of fruits and vegetables and will lead to healthy code. 🙂
I agree pretty strongly with both of you. To Greg’s point, popularity is an unsatisfying goal for a researcher. For practicing language designers, however, understanding it is huge: they’re constantly balancing engineering with community building. I’ve talked with many now about frustration in knowing what’s actually happening with their users and how they should be managing their rollout, language/library/etc. evolution, etc.
For the theory of language design, I think something exciting is happening. An important category of languages is those that are designed for people, and more specifically, for groups of people. There’s a lot of room for foundational work and, I hope, in feature design.
How can PL research help with library design?
Good question! On quick thought: There’s an important connection between the language’s features and the libraries you can write in that language. CPS-style interfaces to libraries probably don’t work well in C because it doesn’t have closures. Possibly another advantage of Scala in terms of adoption (on top of its backward compatibility with Java) is being able to express functional and OO paradigms, and thus have a broader diversity of possible library designs.
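To illustrate the point about closures, here is a small sketch in Python of a library function with a continuation-passing-style interface. The names (`parse_int_cps`, `on_ok`, `on_err`, `parse_all`) are made up for this example. The caller's continuations are closures that capture the local list `results`; without closures, the caller would have to thread that state through by hand.

```python
def parse_int_cps(text, on_ok, on_err):
    # Instead of returning a value, pass the outcome to a continuation.
    try:
        value = int(text)
    except ValueError:
        return on_err(text)
    return on_ok(value)


def parse_all(texts):
    results = []  # captured by both closures below

    def on_ok(value):
        results.append(("ok", value))

    def on_err(text):
        results.append(("error", text))

    for text in texts:
        parse_int_cps(text, on_ok, on_err)
    return results


if __name__ == "__main__":
    print(parse_all(["2012", "oops", "2013"]))
```

In C, the same interface would typically need a function pointer plus an extra context argument to carry `results`, which is part of why such library designs feel heavier there.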
I know of one data point that suggests libraries are not enough for adoption.
In 2007, I used a language called Nemerle. I was coerced into doing so by one of its authors, but I ended up really enjoying it. Like Scala, a primary goal was to put together functional and object-oriented features. Like Scala, it provided easy access to lots of libraries, since it runs on .NET. It also has a macro system that lets you do insanely cool things.
Yet, pretty much everyone I talk to has never heard of it.
Agreed: important but not sufficient.
Most type systems are actually absolutely truly terrible and horrendous...
18% of programmers like to have a type system.
This is a really mind blowing number.
> Imagine Ruby without Rails. Ruby was released in 1995 and Rails in 2004: had you heard of Ruby before you heard of Rails?
Yes. I tried to look into it around ’99 but got nowhere due to the sparsity and low quality of information available in English. What I knew was that it was inspired by Python and Perl and could potentially become a viable competitor in the same niches as them. I don’t think I knew anything about the Smalltalk inspiration back then.
(I know this is being posted REALLY LATE but I just saw this post REALLY LATE.)
I am reminded of the philosophical issue in Linguistics, Phil of Science, and Phil of math:
Are these fields descriptive or prescriptive?
Your post and the articles it pointed to seem to be descriptive: this is what people use. You don’t seem to have an opinion on what people should use.
People in Phil of math sometimes sound rather odd since they are not mathematicians, yet they are telling people in math what they should do.
You are not in that category. You know more than the users, not less.
So what to do with your results: try to combine what people like in their programming langs, with what is good for them, and have some sort of combination.
An example from food: Yogurt is good for you (so they say) AND people like it.
bill g
Good comment, Bill! I think that some of the ideas in this research have been at least informally in the minds of current language designers, and they are bringing them to bear in their designs. My quick impression is that Scala is one example of a language that people like, and is good for you.
Great post! My three biggest takeaways are:
1. Language use and preference correlate heavily with good libraries (i.e., strong ecosystem).
2. Programmers are willing to deal with a complicated language in order to gain other benefits.
3. Developers enjoy languages that are expressive and produce elegant code.
I am profoundly saddened by the last two. IMO, language (syntax) simplicity is vitally important. It lessens the cognitive burden and makes programming less stressful and more enjoyable. After all, programming is a *human* activity. Shouldn’t it be made as pleasant as possible?
“Expressive” languages are typically more complicated because language designers choose to pile on feature after feature, rather than choosing a minimal set of powerful and orthogonal features. I prefer Smalltalk and Go because I adhere to the latter philosophy.