Archival bias in recorded artworks on Wikidata

02 Nov 2019 | all notes

visualArtworksPerDecade <- WikidataQueryServiceR::query_wikidata('SELECT
  ?decade (COUNT(DISTINCT ?item) as ?count)
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q4502142 .
  ?item wdt:P571 ?inception .
  BIND (YEAR(?inception) AS ?year).
  BIND (10*FLOOR(?year / 10) AS ?decade).
}
GROUP BY ?decade
ORDER BY ?decade')
## 444 rows were returned by WDQS
plot(visualArtworksPerDecade, type='l', xlim=c(1000, 2010), ylim=c(0, max(visualArtworksPerDecade$count)), main='visual artworks per decade recorded in WikiData')

Assuming notable creative output per capita was stable, we should see a correlation between global human population and number of recorded artworks.

worldPopulation <- WikidataQueryServiceR::query_wikidata('SELECT
  ?year ?population
WHERE {
  wd:Q2 p:P1082 ?populationStatement .
  ?populationStatement    ps:P1082 ?population
    ;   pq:P585 ?date .
  BIND(YEAR(?date) AS ?year)             
  FILTER (?year >= 1000 && ?year <= 2010)
}
ORDER BY ?year')
## 23 rows were returned by WDQS
xlim <- c(1700, 2010)
plot(visualArtworksPerDecade, xlim=xlim, main='visual artworks per decade recorded in WikiData vs world population')
par(new = T)
plot(worldPopulation, xlim=xlim, ylim=c(0, max(worldPopulation$population)), pch=16, axes=F, xlab=NA, ylab=NA, cex=1.2)
axis(side = 4)
mtext(side = 4, line = 3, 'world population')

Hone in on target artworks (those with a title):

start <- '"1900-01-01"^^xsd:dateTime'
end <- '"2020-01-01"^^xsd:dateTime'

artworksPerYear <- WikidataQueryServiceR::query_wikidata(paste('SELECT
  ?year ?hasTitle (COUNT(DISTINCT ?item) as ?count)
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q4502142 .
  ?item wdt:P571 ?inception .
  BIND(YEAR(?inception) AS ?year)             
  FILTER(?inception >= ', start, ' && ?inception < ', end, ')
  OPTIONAL { ?item wdt:P1476 ?title . }
  BIND(BOUND(?title) AS ?hasTitle)
}
GROUP BY ?year ?hasTitle
ORDER BY ?year ?hasTitle'))
## 240 rows were returned by WDQS

The spikes occurring at 10 year intervals seem to have to do with the fact that works that can only be dated to a decade are conventionally marked as created in the first year of that decade (i.e. 1900, 1910, 1920 etc). The effect holds for both titled and untitled artworks, but it stops completely after 1950 (or perhaps 1955).

counts <- reshape(artworksPerYear, direction = "wide", idvar="year", timevar = "hasTitle")
countMatrix <- t(as.matrix(counts[,3:2]))
rownames(countMatrix) <- c('titled', 'untitled')
colnames(countMatrix) <- counts$year
barplot(countMatrix, main="titled and untitled visual artworks recorded in WikiData",
  xlab="year", ylab="number of recorded works", legend=TRUE) #col=c("darkblue","red"),

#plot(titledArtworksPerYear, ylim=c(0, max(titledArtworksPerYear$count)))
titlesforyear <- function(year) {
    return(WikidataQueryServiceR::query_wikidata(paste('SELECT
  DISTINCT ?item ?itemLabel ?title ?language
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q4502142 .
  ?item wdt:P1476 ?title .
  ?item wdt:P571 ?inception .
  FILTER(?inception >= "', year, '-01-01"^^xsd:dateTime && ?inception < "', year+1, '-01-01"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
  }', sep=''))
)
}
#foo <- titlesforyear(2003)
#titles <- lapply(1990:2000, titlesforyear)

Relevant WikiData classes

Other artwork APIs

Comments