If you run DSpace installations in production where, as in my case, the success of an item is measured by its number of downloads, you will sometimes be under pressure to deliver a quick answer about the number of downloads for a period of time, or for any other facet your clients have in mind. Believe me when I say that people’s creativity in asking for reports is almost infinite. I will not show you how to build reports; instead, I am going to show you a few Solr queries that help me answer these questions quickly.

The first query is simple: what is the total number of downloads that we have? Out of the box, DSpace only allows Solr queries from localhost, so I need to query Solr on the server itself; to do that I use the curl command in the server console. Solr queries are just URLs that call Solr’s REST API. Below you can see the query. The first part, “http://localhost:8080/solr/statistics/select”, is DSpace’s Solr endpoint for statistics; the second part of the URL is the query itself, “q=*:*&rows=0&indent=true”. In the query, “*:*” says give me everything, “rows=0” says that I don’t want to see the individual records, only the total (reported as numFound in the response), and “indent=true” returns the result in a human-friendly format.

curl 'http://localhost:8080/solr/statistics/select?q=*:*&rows=0&indent=true'
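With “rows=0”, the only figure of interest in the response is numFound. If you ask Solr for JSON by adding “wt=json” (a parameter not used elsewhere in this post), the number can be pulled out with standard tools. A minimal sketch, assuming a POSIX shell with sed; the function name num_found is my own and the sample response is made up for illustration:

```shell
# Print the numFound value from a Solr JSON response read on stdin.
# num_found is a hypothetical helper name, not part of Solr or DSpace.
num_found() {
  sed -n 's/.*"numFound": *\([0-9]*\).*/\1/p' | head -n 1
}

# Illustrative response only; the count is invented.
sample='{"responseHeader":{"status":0,"QTime":3},"response":{"numFound":12345,"start":0,"docs":[]}}'
printf '%s\n' "$sample" | num_found

# Against a live server it would be used like this:
#   curl -s 'http://localhost:8080/solr/statistics/select?q=*:*&rows=0&wt=json' | num_found
```

This keeps the answer to a single number, which is handy when you want to paste it straight into an email.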

The problem with the query above is that it returns all statistics, such as document downloads, thumbnail downloads and page views. So, if we want to know the downloads of files only, we can add the filter “bundleName:ORIGINAL”. This way we count only the downloads of the items’ files, not all the statistics data available in DSpace’s Solr.

curl 'http://localhost:8080/solr/statistics/select?q=bundleName:ORIGINAL&rows=0&indent=true'

The next step is to select the downloads of only one publication or, as they say in the DSpace world, the number of downloads by item. All items in DSpace have an id; once you have the id of the item, you can add the filter “containerItem:7043”. Note the boolean operator AND between the two clauses “bundleName” and “containerItem”; the “+” signs in the URL are encoded spaces.

curl 'http://localhost:8080/solr/statistics/select?q=bundleName:ORIGINAL+AND+containerItem:7043&rows=0&indent=true'
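If you get asked about many different items, retyping the URL gets tedious. Here is a small sketch of a shell helper that builds the same URL for any item id. The names SOLR_STATS_URL and item_downloads_url are my own, and the endpoint and field names are simply the ones used in this post:

```shell
# Base URL of the statistics core, as used throughout this post.
SOLR_STATS_URL='http://localhost:8080/solr/statistics/select'

# Build the query URL that counts ORIGINAL-bundle downloads for one item.
# item_downloads_url is a hypothetical helper name, not part of DSpace or Solr.
item_downloads_url() {
  item_id=$1
  printf '%s?q=bundleName:ORIGINAL+AND+containerItem:%s&rows=0&indent=true\n' \
    "$SOLR_STATS_URL" "$item_id"
}

# Prints the URL for item 7043; nothing is sent to the server here.
item_downloads_url 7043
```

On the server you would then run something like curl "$(item_downloads_url 7043)".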

Now that we have the number of downloads for one item, it would be great if we could filter by a range of dates, for example, how many downloads of item “7043” occurred from January 1st, 2016 up to now. Each statistics entry in DSpace’s Solr has a field called “time”, where the date of the event is stored in ISO 8601 format. To perform a range search on this field, we can use “time:[2016-01-01T00:00:00Z TO NOW]”. Instead of the word “NOW”, we can of course put any other date in the same format. Solr is strict about this format: the seconds take two digits and the trailing “Z” must be uppercase.

curl 'http://localhost:8080/solr/statistics/select?q=bundleName:ORIGINAL+AND+containerItem:7043+AND+time:[2016-01-01T00:00:00Z TO NOW]&rows=0&indent=true'

One last detail: if you try to execute the curl command above, you will get an error message, because it uses some characters that have to be escaped, such as the square brackets, which curl otherwise tries to interpret as a URL range. Below you can see the same command with those characters already escaped. Believe me, it is an easy topic, but it is a very powerful tool when you are in a rush to give quick answers to your clients.

curl 'http://localhost:8080/solr/statistics/select?q=bundleName:ORIGINAL+AND+containerItem:7043+AND+time:%5B2016-01-01T00:00:00Z+TO+NOW%5D&rows=0&indent=true'
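An alternative to escaping by hand is to let curl do the encoding. With “-G”, curl appends every “--data-urlencode” value to the URL as a percent-encoded query parameter, so the brackets and spaces can be written literally. A sketch under the same assumptions as the rest of this post (statistics core on localhost:8080, the “containerItem” and “time” fields); the function name count_item_downloads and its two arguments are my own invention:

```shell
# Count ORIGINAL-bundle downloads of one item since a given day.
# Usage: count_item_downloads ITEM_ID YYYY-MM-DD
# count_item_downloads is a hypothetical helper, not part of DSpace or Solr.
count_item_downloads() {
  item_id=$1
  since=$2
  # -G turns the --data-urlencode values into a URL query string,
  # so [ ] and spaces need no manual escaping.
  curl -s -G 'http://localhost:8080/solr/statistics/select' \
    --data-urlencode "q=bundleName:ORIGINAL AND containerItem:${item_id} AND time:[${since}T00:00:00Z TO NOW]" \
    --data-urlencode 'rows=0' \
    --data-urlencode 'indent=true'
}
```

Calling count_item_downloads 7043 2016-01-01 on the server should then send the same query as the escaped command above.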
