[Offtopic] Re: [LINK] R programming language

stephen at melbpc.org.au stephen at melbpc.org.au
Fri Jan 9 22:01:23 EST 2009


The follow-up NYTimes blog post on 'R' by the same journalist:

 http://bits.blogs.nytimes.com/2009/01/08/r-you-ready-for-r/

There seems to be a cathaRsis taking place. 

My story published Tuesday on the R programming language has generated a 
flood of reader e-mail messages. The story covers the software’s broad 
usage and vibrant developer community in detail, but, in short, R helps 
people deal with large volumes of data in a wide variety of industries, 
including pharmaceuticals, finance and oil and gas. 

Also of note, the software is open source, meaning people can pick it up 
for free and make their own changes to the code. Such flexibility has 
inspired statistically minded people of all stripes to get behind R and 
make it a real success story. 

Most of the people reacting to the story expressed pleasure at seeing R 
receive mainstream attention. People chimed in with the unique ways 
they’re using the technology. Vhayu Technologies talked about its passion 
for tweaking R to help traders on Wall Street, while others discussed its 
increasing adoption at universities for everything from biology to 
economics. 

There were also some complaints that my story did not focus enough on S, 
which was a precursor to R developed at Bell Labs. John Chambers, now a 
consulting professor of statistics at Stanford University, drove much of 
the early S work at Bell Labs and talked with me at length about S and R. 
Without question, R arrived as a result of the fine work done with S, but 
it’s the rapid rise of R, helped by its open-source nature, that has 
proved so gripping. 

Speaking of R, Mr. Chambers said, “It’s way beyond anything we could have 
imagined at Bell Labs.”

If you’d like some more of S’s history, you’ll find it at the end of Mr. 
Chambers’s new book, “Software for Data Analysis.” 

In addition, the commercial potential of R is worth some further 
discussion. 

Pfizer was a prominent R user mentioned in the story. The company relies 
on R for its nonclinical drug studies and has shied away from using the 
technology for clinical research that will ultimately be presented to 
regulators. For such work, Pfizer instead turns to software from SAS 
Institute, which brings in more than $2 billion a year in revenue from 
data analytics software and services. 

Were Pfizer to use R in clinical studies, it would run the risk of seeing 
its research questioned or even rejected by regulators doubting the 
veracity of results based on what they view as an unknown quantity. 

“It’s very hard to displace the industry standard in those types of 
cases,” said Max Kuhn, associate director of nonclinical statistics at 
Pfizer. 

Of course, the Linux operating system over the course of many years has 
managed to rise from an unknown entity to one that has gained top 
approval from governments around the world for everything from handling 
top-secret files to being used for processing tax data. So we’ll see what 
happens with R in the long run. 

Revolution Computing stands as one company trying to push the commercial 
agenda forward with R.

While the base software is free, Revolution offers ways to speed up the 
software on certain applications and to run it on large computers. In 
addition, Revolution provides support services to customers like Pfizer 
and Bank of America. Intel’s venture capital arm invested in Revolution 
last year. 

Lastly, some readers had questions on exactly how many people use R. A 
number of people interviewed, including those who work most closely with 
the software, estimated the R population at 250,000. 

Intel Capital has placed the number of R users at 1 million, and 
Revolution kicks the estimate all the way up to 2 million. 

Such disparity often accompanies open-source projects, where it’s 
difficult to tell just how far a piece of software’s tentacles spread and 
how active the users really, um, R.


> Data Analysts Captivated by R Power
> 
> By ASHLEE VANCE  www.nytimes.com  Published: January 6, 2009 
> 
> .. R is the name of a popular programming language used by a growing 
> number of data analysts inside corporations and academia. 
> 
> It's becoming the lingua franca partly because data mining has entered 
> a golden age, whether being used to set ad prices, find new drugs more 
> quickly or fine-tune financial models. Companies as diverse as Google, 
> Pfizer, Merck, Bank of America, the InterContinental Hotels Group and 
> Shell use it.
> 
> R has also quickly found a following because statisticians, engineers 
> and scientists without computer programming skills find it easy to use.
> 
> “R is really important to the point that it’s hard to overvalue it,”
> said Daryl Pregibon, a research scientist at Google, which uses the
> software widely. “It allows statisticians to do very intricate and
> complicated analyses without knowing the blood and guts of computing
> systems.”
> 
> It is also free. 
>
> R is an open-source program, and its popularity reflects 
> a shift in the type of software used inside corporations. Open-source 
> software is free for anyone to use and modify ..
> 
> R is similar to other programming languages, like C, Java and Perl, in 
> that it helps people perform a variety of computing tasks by giving 
> them access to various commands. 
> 
> For statisticians, however, R is particularly useful because it contains
> a number of built-in mechanisms for organizing data, running calculations
> on the information and creating graphical representations of data sets. 
> 
> Some people familiar with R describe it as a supercharged version of 
> Microsoft’s Excel spreadsheet software that can help illuminate data 
> trends more clearly than is possible by entering information into rows 
> and columns. 
> 
> What makes R so useful, and helps explain its quick acceptance, is that 
> statisticians, engineers and scientists can improve the software’s code 
> or write variations for specific tasks. Packages written for R add 
> advanced algorithms, colored and textured graphs and mining techniques to 
> dig deeper into databases. 
> 
> Close to 1,600 different packages reside on just one of the many Web 
> sites devoted to R, and the number of packages has grown exponentially. 
> 
> One package, called BiodiversityR, offers a graphical interface aimed
> at making calculations of environmental trends easier. 
> 
> Another package, called Emu, analyzes speech patterns, while GenABEL is 
> used to study the human genome. 
> 
> The financial services community has demonstrated a particular affinity 
> for R; dozens of packages exist for derivatives analysis alone. 
> 
> “The great beauty of R is that you can modify it to do all sorts of 
> things,” said Hal Varian, chief economist at Google. “And you have a
> lot of prepackaged stuff already available, so you’re standing on the 
> shoulders of giants.”
> 
> R first appeared in 1996, when the statistics professors Ross Ihaka and 
> Robert Gentleman of the University of Auckland in New Zealand released 
> the code as a free software package. 
> 
> According to them, the notion of devising something like R sprang up 
> during a hallway conversation. They wanted technology better suited 
> for their statistics students, who needed to analyze data and produce 
> graphical models of the information. Most comparable software had been 
> designed by computer scientists and proved hard to use. 
> 
> Lacking deep computer science training, the professors considered their 
> coding efforts more of an academic game than anything else. 
>
> Nonetheless, starting in about 1991, they worked on R full time. “We
> were pretty much inseparable for five or six years,” Mr. Gentleman
> said. “One person would do the typing and one person would do the
> thinking.”
> 
> Some statisticians who took an early look at the software considered it 
> rough around the edges. But despite its shortcomings, R immediately 
> gained a following with people who saw the possibilities in customizing 
> the free software. 
> 
> John M. Chambers, a former Bell Labs researcher who is now a consulting 
> professor of statistics at Stanford University, was an early champion. 
> 
> At Bell Labs, Mr. Chambers had helped develop S, another statistics 
> software project, which was meant to give researchers of all stripes an 
> accessible data analysis tool. It was, however, not an open-source 
> project. 
> 
> The software failed to generate broad interest, and ultimately the rights 
> to S ended up in the hands of Tibco Software. Now R is surpassing what 
> Mr. Chambers had imagined possible with S. 
> 
> “The diversity and excitement around what all of these people are doing 
> is great,” Mr. Chambers said.
> 
> While it is difficult to calculate exactly how many people use R, those 
> most familiar with the software estimate that close to 250,000 people 
> work with it regularly. 
> 
> The popularity of R at universities could threaten SAS Institute, the 
> privately held business software company that specializes in data 
> analysis software. SAS, with more than $2 billion in annual revenue,
> has been the preferred tool of scholars and corporate managers. 
> 
> “R has really become the second language for people coming out of grad 
> school now, and there’s an amazing amount of code being written for it,” 
> said Max Kuhn, associate director of nonclinical statistics at 
> Pfizer. “You can look on the SAS message boards and see there is a 
> proportional downturn in traffic.” (snip)
> 
> 
> “R is a real demonstration of the power of collaboration, and I don’t 
> think you could construct something like this any other way,” Mr. Ihaka 
> said. “We could have chosen to be commercial, and we would have sold
> five copies of the software.”
> 
> » A version of this article appeared in print on January 7, 2009, on
> page B6 of the New York edition.
> --
> 
> Cheers people
> Stephen Loosley
> Victoria, Australia

