*The role of spillovers in research and development expenditure in Australian industries* (which I will refer to as ‘the *Spillover* paper’). The paper describes an econometric model that uses data from Australian companies that conduct research and development (R&D), and looks at how the R&D activity of other firms and public institutions affects a firm’s own R&D expenditure, i.e. the effects of ‘**spillover**’ of R&D being conducted elsewhere. The paper also examines the impact of geographical proximity and clustering on these R&D spillovers.

Overall, the model indicates that there are positive effects on R&D expenditure due to spillovers from peers and clients to companies that are located nearby (within 25 or 50km). Furthermore, R&D expenditure by academia also has a positive influence on a company’s R&D expenditure within state boundaries. However, R&D spending by government bodies appears to have the opposite effect, seemingly ‘crowding out’ private R&D spending.

The study has important policy implications, because it suggests that public support for R&D, whether to private firms through grants and/or tax incentives, or through funding of research in universities and other public institutions, results in benefits not only to the organisation receiving the direct support, but also to other firms and institutions more broadly.

Significantly, the modelling provides further evidence that Australia’s reputation for having a woefully low level of industry/research collaboration (which is based on one rather dubious OECD data point) is largely undeserved. I have previously observed that Australian companies clustered geographically close to major academic institutions tend to file more patent applications, while research by IP Australia has shown a healthy rate of patent applications naming industry and research partners as co-applicants from Australia when compared with other OECD nations. The *Spillover* paper supports this by showing a positive correlation between academic and industry R&D spending, particularly for companies and institutions located within the same state (and, in practice, probably more closely than this, although the paper does not break down academic R&D expenditure below state level).

I discuss further details of the model, and the paper’s key findings, later in this article. If this is all you are interested in, feel free to skip ahead. But first I would like to take the opportunity to explain the general process of econometric modelling for readers who may be interested in better understanding how economists think about the kinds of questions addressed by this paper, and how to interpret their results.

#### Basics of Econometric Modelling

Econometrics is the term used to describe the application of statistical methods to economic data. An econometric model is thus a statistical model for a relationship between various quantities pertaining to some economic system under study. Before discussing the actual model employed in the *Spillover* paper, I will illustrate the general approach via a trivial model of R&D spending.

Suppose we have data on the sales **s** and R&D expenditure **r** of different companies. In Australia, this information is available, in principle, because companies have to file tax returns declaring their revenues, and may make claims for their R&D spend under the R&D tax incentive scheme. Our goal is to model the relationship between the quantities **r** and **s**. According to the simplest of all possible models, we might hypothesise a linear relationship, i.e. that the amount a company spends on R&D is proportional to the amount it makes through sales of its products or services. Of course, this will not hold exactly for every company in our data set, but **statistically speaking** it might turn out to be generally valid, and thus a useful model for the overall behaviour of companies operating within the Australian economy.

The diagram below shows some example data that I made up, and a straight-line fit that I also made up. In mathematical terms, our econometric model is r = a·s + b, where a and b are parameters that we can vary in order to find the ‘best fit’ of our simple linear model to the available data.

When we say ‘best fit’, what we usually mean is a selection of the parameters that minimises some measure of ‘error’ between the model and the actual data. A common measure of the overall error is the sum of the squares of the differences between the model values and the corresponding data values. When we do this for a simple linear model, like the trivial example above, it is commonly called ‘linear regression’ or ‘ordinary least squares’ (OLS).
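As a concrete sketch of this fitting process, the following Python snippet generates some made-up sales and R&D figures (everything here is invented purely for illustration) and finds the least-squares values of a and b:

```python
import numpy as np

# Made-up data: sales (s) and R&D spend (r) for 50 hypothetical companies.
rng = np.random.default_rng(0)
s = rng.uniform(1.0, 100.0, size=50)                      # sales (invented units)
r = 0.05 * s + 2.0 + rng.normal(0.0, 1.0, size=50)        # r = a*s + b, plus noise

# Ordinary least squares: choose a and b to minimise the sum of squared
# errors between the model values a*s + b and the observed r values.
A = np.column_stack([s, np.ones_like(s)])
(a, b), *_ = np.linalg.lstsq(A, r, rcond=None)

print(f"fitted a = {a:.3f} (true 0.05), fitted b = {b:.2f} (true 2.0)")
```

Because the data were generated with a = 0.05 and b = 2.0, the fitted values come out close to those numbers, with the residual noise accounting for the difference.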

#### Assessing Model Performance

Two other concepts you will often encounter in statistical modelling (including the *Spillover* paper) are **standard error** and **p-value**. The **standard error** for a particular model parameter is a statistical measure of how ‘uncertain’ the value of the parameter is as a result of the variations (or ‘dispersion’) of the available data. You can, for example, always draw a straight line through a cloud of data points, but the more spread out the cloud is, the less certain you can be that you have the ‘correct’ straight line, meaning that the standard errors for your a and b values will be larger. Note that a large standard error does not necessarily mean that the model is ‘wrong’. It might be that the data is inherently ‘noisy’, and that there are factors at play that cannot be modelled as anything other than random variation. Or the model may be incomplete, i.e. there may be unknown factors at work that just look like random noise, which does not necessarily mean that the model is invalid as far as it goes.
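For the toy straight-line fit, the standard errors of a and b can be computed directly from the classical OLS formulas. A minimal sketch, using invented data (all names and numbers are mine, for illustration only):

```python
import numpy as np

# Invented data, as before: r = 0.05*s + 2.0 plus random noise.
rng = np.random.default_rng(1)
s = rng.uniform(1.0, 100.0, size=50)
r = 0.05 * s + 2.0 + rng.normal(0.0, 1.0, size=50)

n = len(s)
a = np.cov(s, r, ddof=1)[0, 1] / np.var(s, ddof=1)   # OLS slope
b = r.mean() - a * s.mean()                           # OLS intercept

# Classical OLS standard errors: residual variance, scaled by the spread of s.
resid = r - (a * s + b)
sigma2 = resid @ resid / (n - 2)          # unbiased estimate of residual variance
ss = np.sum((s - s.mean()) ** 2)          # total spread of the s data
se_a = np.sqrt(sigma2 / ss)
se_b = np.sqrt(sigma2 * (1.0 / n + s.mean() ** 2 / ss))

print(f"a = {a:.3f} ± {se_a:.3f}, b = {b:.2f} ± {se_b:.2f}")
```

Notice that a more dispersed cloud of points (a larger residual variance) inflates both standard errors, exactly as described above.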

The **p-value** is a more difficult concept, even though many people compute it using established formulae and treat it as though it is simple and all-powerful. A p-value is supposed to be a measure of statistical significance – roughly speaking, how ‘confident’ we should feel that our model actually ‘explains’ the data. When we calculate a value using our fitted model, what we get typically differs from **all** of the actual points in the real data. Is this due to inevitable statistical variation, or are we erroneously trying to impose a form of order that bears no relationship to reality? The p-value addresses this question, and is often (mis)interpreted as the probability that the observed variation is not ‘merely’ a statistical fluctuation in the data but is, in fact, explained by the model/parameter in question. While this gives a nice ‘meaning’ to the number, a preferable interpretation is that the p-value measures how compatible the actual data are with a ‘null’ model in which the parameter in question has no effect. A smaller p-value (‘less compatible with the null’, and hence more suggestive of a real effect) is therefore better.

Researchers typically set a threshold p-value, below which you are supposed to accept that their model provides a sound explanation of the observed data. The threshold is often 0.05, but it may also be as high as 0.10, or less than 0.01. Sometimes it is all of the above, and asterisks are often used to distinguish between parameter values having different significance levels (e.g. ‘***’ for p<0.01, ‘**’ for p<0.05 and ‘*’ for p<0.10). You will see this convention used if you look at the tables of results in the *Spillover* paper.
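To make the asterisk convention concrete, here is a toy illustration (the data, variable names, and the normal approximation are all my own, not from the paper): the slope of the invented straight-line fit is tested against the ‘null’ that it is zero.

```python
import math
import numpy as np

# Toy illustration: a t-statistic and approximate two-sided p-value for the
# slope a, testing the 'null' hypothesis that a = 0.
rng = np.random.default_rng(2)
s = rng.uniform(1.0, 100.0, size=50)
r = 0.05 * s + 2.0 + rng.normal(0.0, 1.0, size=50)

n = len(s)
a = np.cov(s, r, ddof=1)[0, 1] / np.var(s, ddof=1)
b = r.mean() - a * s.mean()
resid = r - (a * s + b)
se_a = math.sqrt((resid @ resid / (n - 2)) / np.sum((s - s.mean()) ** 2))

t = a / se_a                              # how many standard errors a is from zero
p = math.erfc(abs(t) / math.sqrt(2.0))    # normal approximation to the t distribution

stars = "***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.10 else ""
print(f"a = {a:.3f}{stars} (t = {t:.1f}, p = {p:.2g})")
```

In practice, statistical packages compute these quantities automatically (using the exact t distribution rather than the normal approximation used here for simplicity).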

Despite their common use, you should approach p-values with caution. The p-value is **not** simple and all-powerful. It is a difficult concept that even scientists cannot easily explain. P-values are subject to misinterpretation and abuse. This does not mean that a smaller p-value is not ‘better’ than a larger one. But the chances are that, in a complex model, you do not really know what a particular p-value is actually telling you. Worse yet, it is possible that the researchers who developed the model do not know, either!

#### Interpreting Modelling Results

Returning to our trivial linear model, it looks like the simple straight-line fit might not fully explain my invented data. There is some indication that R&D spend does not continue to rise in proportion to sales indefinitely. It may be that there are other factors which lead to a suppression of R&D spending when sales are higher, and if we could add some of those other factors to the model, we may get a better fit to the data. But then we would have more than two parameters, and it would no longer be possible to illustrate the model on a two-dimensional graph. Practical models, such as the one described in the *Spillover* paper, have numerous parameters, which are fitted using many thousands of data points. These cannot be visualised graphically, and so we are forced to interpret them based upon the parameter values and statistical measures that they generate.

Suppose, for example, that our sales-based term a·s was just one element of a much more complex model (this is, in fact, the case in the *Spillover* paper). Although there would be many other parameters affecting R&D expenditure in this model, we might still find that the parameter a has a positive value, with a relatively small standard error and a low p-value. From this we might reasonably conclude that R&D spending is positively correlated with sales, i.e. companies that earn more revenue tend to spend more on R&D. Note that this says nothing about any direction of cause and effect, and in practice the relationship could be quite complex. Higher sales obviously imply more money to spend on R&D, but then more R&D may lead to new and better products that result in higher sales – a virtuous circle!

I mention this because the conclusions in the *Spillover* paper are mostly of this type. The authors’ goal is not to provide quantitative predictions of how much a company will spend on R&D, but rather to evaluate the factors that tend to result in increased or decreased R&D spending by Australian companies generally. Knowing these factors, and their relative importance, has obvious applications in developing innovation policy.

#### The R&D Spillover Model

The model developed in the *Spillover* paper is considerably more complex than the simple linear model discussed above. However, the principles remain the same: the model has a number of parameters, and a fitting process is employed to find a set of values for the parameters that minimises an objective function representing some ‘error’ between quantities calculated using the model and the corresponding real-world data.

The authors employ ten years’ worth of sales and R&D expenditure data, between 2001 and 2011, taking into account the fact that smaller firms, in particular, tend to enter and exit (and, in some cases, re-enter) the R&D ‘system’ over time, rather than investing continuously in R&D. An underlying hypothesis in the paper is that R&D spillover effects result from interactions between employees of different companies, and that the type and effect of such interactions may differ between peer companies (i.e. those operating or competing in similar markets and technologies), suppliers, and customers/clients. Thus the number of R&D employees in each company is included in the model, along with factors representing the opportunities for interactions to occur. It is not assumed that interactions will necessarily have a positive effect on R&D expenditure – some may actually reduce the need, incentives, and/or interest for a company to invest in R&D.

Annual spending on R&D by Federal and State governments is also incorporated into the model, as is spending by academic institutions. Here, it does not seem unreasonable to suppose that R&D expenditure of private companies may be affected either positively or negatively by R&D conducted by government authorities and academia.

Finally, the authors extend the model to account for geographic location, separating the effects of spillovers between ‘local’ companies (headquartered within 10km, 25km, or 50km of one another) and ‘remote’ companies. They also consider a three-zone model, comprising ‘local’ companies (within 10km), ‘regional’ companies (between 10km and 250km) and ‘remote’ companies (beyond 250km).

The model assumes that there is a delay between causes and their effects on R&D spending, and that whatever interactions occur in one year affect expenditure in the following year. Much of the computational complexity in the model arises from this lag, and from dealing with companies that enter or exit the system from year-to-year, which would introduce a selection bias if not managed carefully.
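The lag itself is conceptually simple, even if handling firm entry and exit is not. A hypothetical sketch of the alignment step (this is my own illustration, not the paper’s actual specification): each year’s explanatory variables are paired with the *following* year’s expenditure before fitting.

```python
import numpy as np

# Hypothetical one-year-lag alignment (invented data, not the paper's model).
years = np.arange(2001, 2012)                 # 2001..2011 inclusive
x = np.linspace(1.0, 2.0, len(years))         # invented explanatory series
y = 3.0 * np.roll(x, 1) + 0.5                 # invented expenditure, driven by
                                              # the previous year's x

x_lagged = x[:-1]   # explanatory values for 2001..2010
y_next = y[1:]      # expenditure for 2002..2011

# Fit a straight line to the lag-aligned pairs.
slope, intercept = np.polyfit(x_lagged, y_next, 1)
print(f"lagged slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Because the invented expenditure series depends exactly on the previous year’s explanatory value, the lag-aligned fit recovers the underlying relationship; misaligning the years would not.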

In all cases, running the model to fit all of the parameters results in a set of numbers indicating whether each factor tends to have a positive influence (increase), a negative influence (decrease), or little effect on R&D spending by private companies.

#### Key Findings

The main findings from the economic modelling are as follows.

- Geography plays an important role for R&D spillovers in Australia. For clients and peers, the model shows a positive effect of spillovers on firm-level R&D expenditure for companies located near each other, i.e. within 25km or 50km.
- However, the model shows that although spillover effects for suppliers are positive overall, they become negative with proximity. This seems strange, and the authors are not really able to offer a plausible explanation.
- Academic R&D expenditure has positive spillover effects on firm-level R&D expenditure.
- On the other hand, government R&D at both local and federal levels appears to have negative effects on firm-level R&D expenditure. As one possible explanation for this difference from the effects of academic R&D, the authors note that while more than 60 per cent of R&D expenditure in academia is for basic research, state and federal governments spend about 70 per cent of their total R&D expenditures on applied research. They suggest that this may imply that basic research generates more knowledge spillovers than applied research.
- The model shows a positive effect of clustering on R&D expenditure, i.e. companies spend more on R&D if they are located in proximity to other businesses operating in a similar field. The authors suggest that this may be a consequence of more intensive competition, and/or a result of collaborations within clusters.

#### Conclusion – More Evidence of Industry/Research Collaboration?

The above findings are consistent with my own observations derived from Australian patent filing data over a similar time period. When I looked specifically at the field of biotechnology, I found very strong evidence of geographic clustering of patent applicants, with the main clusters forming particularly around academic institutions and hospitals where there is active research in the field. And when I went looking for Australia’s most innovative postcodes, based on patent filings across all technologies, I again found evidence of clustering, with academic institutions figuring prominently within regions showing high levels of patent filing by commercial entities.

IP Australia also recently published results of a study conducted by its Office of the Chief Economist, in which collaboration between research institutes and business was assessed using data on patent co-applications. In line with my observations on geographic clustering, the results indicated a healthy level of interaction between businesses and academic research compared to other developed nations, which is at odds with an often-cited OECD measure that places Australia in a very woeful last place among OECD countries on industry/research collaboration.

The econometric model described in the *Spillover* paper is almost certainly incomplete and imperfect. But qualitatively, at least, it leads to findings that are consistent with other approaches to assessing the significance of geographic clustering, and the positive impact of academic research spending on R&D expenditure by private-sector companies. This study is thus another contribution to the mounting body of evidence suggesting that the OECD measure is simply wrong, and should be disregarded in any discussion of Australia’s performance on industry/research collaboration, and what should be done to improve in this area.

Tags: Australia, Economics, Public policy, Statistics
