
I recently generated an interactive map of Australia’s most innovative postcodes, using recent Australian patent application and maintenance data. While it is, of course, interesting to know whether one lives or works in a particularly innovative part of the country, the exercise of classifying activity by postcode is inherently artificial, in that it presupposes there is something ‘meaningful’ about the region covered by each particular postcode. There is, however, no reason to suppose that this is the case. There is nothing about the boundaries between adjacent postcodes, which are set by postal authorities for their own administrative purposes, that would necessarily lead to them being well-suited to the task of locating innovative activity.
A somewhat less arbitrary regional structure (from a socio-economic perspective) is defined by the Australian Bureau of Statistics (ABS), in the form of a hierarchy of ‘statistical areas’ (SAs). In conjunction with its Australian Geography of Innovative Entrepreneurship (2015) research paper, the Department of Industry, Innovation and Science produced its own interactive Innovation Map using the ABS SA3 definitions as the basic regional unit. Generally speaking, SA3s are regions with populations of between 30,000 and 130,000 persons that reflect regional identity in terms of geographic and socio-economic characteristics. Using them aggregates data over wider areas than individual postcodes. In Sydney, for example, the greatest number of patent filings is attributed to the Sydney Inner City SA3 region. At this level it is not possible to observe the particular concentration of activity around Macquarie University in the north of Sydney, noted in my postcode-based analysis, because it is ‘diluted’ by lower activity in other parts of the encompassing Ryde-Hunters Hill SA3 region.

The answer you receive thus depends upon the question you ask, e.g. how many patent applicants are located within a particular postcode, or within a particular SA3 region? In both of these cases, a set of geographic areas is imposed before even commencing the analysis, and the results are constrained by this choice. In the real world, however, innovation does not begin or end at some artificial boundary set by a postal officer or statistician. So how can we analyse the distribution of patent applicants objectively and without applying predetermined geographic constraints?
One approach to this problem is a technique known as cluster analysis, or clustering. The idea behind clustering is to apply an algorithm to automatically group elements in a data set according to a measure of similarity, such as geographic proximity. It can be regarded as a form of machine learning in which the algorithm is designed to ‘discover’ patterns in the data without explicit direction from a human operator.
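
To make ‘geographic proximity’ concrete, here is a minimal sketch in Python of one common way to measure it: the great-circle (haversine) distance between two latitude/longitude points. This is an illustration only, not necessarily the measure used in my analysis; the function name `haversine_km` and the approximate city coordinates are my own assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-centre coordinates, for illustration only.
sydney = (-33.87, 151.21)
melbourne = (-37.81, 144.96)
print(f"{haversine_km(*sydney, *melbourne):.0f} km")
```

The smaller the distance between two applicants, the more ‘similar’ they are for the purposes of grouping them geographically.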
In this article, I present some results of applying one of the most commonly used clustering algorithms, k-means, to an Australian patent application data set to analyse national and local distributions of patent applicants. This kind of analysis could be used, for example, to identify regions in which it could be most productive to invest in support for innovative industries, or to set up a business providing services to innovative companies, such as R&D tax advice or IP services.
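
For readers unfamiliar with k-means, the following is a minimal sketch of the basic algorithm (often called Lloyd's algorithm) in plain Python. It is not the implementation used to produce the results in this article. The function name `kmeans`, the approximate city coordinates, and the randomly generated ‘applicant’ locations are all illustrative assumptions, and the sketch treats latitude and longitude as flat planar coordinates, which is a simplification.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means on a list of 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved, so we have converged
            break
        centroids = updated
    return centroids, clusters


# Synthetic data only: 50 made-up 'applicants' scattered around each of three
# approximate city centres. These are NOT real patent applicant locations.
CITY_CENTRES = {
    "Sydney": (-33.87, 151.21),
    "Melbourne": (-37.81, 144.96),
    "Brisbane": (-27.47, 153.03),
}
rng = random.Random(42)
applicants = [
    (rng.gauss(lat, 0.2), rng.gauss(lon, 0.2))
    for lat, lon in CITY_CENTRES.values()
    for _ in range(50)
]

centroids, clusters = kmeans(applicants, k=3)
for (lat, lon), members in zip(centroids, clusters):
    print(f"centroid ({lat:.2f}, {lon:.2f}): {len(members)} applicants")
```

Two points are worth noting. First, the number of clusters, k, must be chosen in advance. Second, the outcome can depend on the random starting centroids, so in practice the algorithm is often run several times and the best result kept.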