The life of an arXiv paper on INSPIRE

Dear INSPIRE blog readers,

I am a HEP arXiv paper and I was recently invited to tell you about my life on INSPIRE. This should help you understand the work the INSPIRE team does and explain why, for example, it may take some time for references to show up. So – here’s what happens to me:

After I first appear on arXiv, it takes INSPIRE about 2 hours to harvest my friends and me. This usually happens at 4 a.m. CET. INSPIRE extracts my plots and indexes my metadata and fulltext, which takes about 1-2 hours. When all this is done, I am visible to you users on INSPIRE. In the next step, something called the “reference extractor” is run on my PDF, and the references it extracts are linked via arXiv number, journal reference, report number or DOI to the corresponding existing INSPIRE records, where they are counted as citations.
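To give you a rough idea of how such linking might work, here is a small Python sketch. It is not INSPIRE's actual code: the index layout, the function names, the order in which identifiers are tried and all identifiers in the example are made up for illustration.

```python
from typing import Optional

# Hypothetical index: identifier type -> identifier value -> record ID.
# All values here are invented placeholders, not real records.
INDEX = {
    "arxiv": {"1001.0001": 101},
    "doi": {"10.1000/example.1": 102},
    "journal": {"Phys.Rev.,D1,1": 103},
    "report": {"CERN-TH-0000": 104},
}

# Assumed order in which the identifier types are tried.
ID_TYPES = ("arxiv", "doi", "journal", "report")


def link_reference(ref: dict, index: dict = INDEX) -> Optional[int]:
    """Return the ID of the first record matching any identifier of `ref`."""
    for id_type in ID_TYPES:
        value = ref.get(id_type)
        if value is not None and value in index.get(id_type, {}):
            return index[id_type][value]
    return None  # no existing record found; the reference stays unlinked


def cited_records(references: list, index: dict = INDEX) -> set:
    """Collect the records cited by one paper, each counted once."""
    linked = (link_reference(ref, index) for ref in references)
    return {record_id for record_id in linked if record_id is not None}


if __name__ == "__main__":
    refs = [
        {"arxiv": "1001.0001"},
        {"journal": "Phys.Rev.,D1,1"},
        {"doi": "10.9999/unknown"},
    ]
    print(cited_records(refs))
```

A reference that carries no known identifier simply stays unlinked in this sketch, which is one reason a citation may not show up right away.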

The main curation for my data – excluding references – is still done on SPIRES, and this will probably continue for the next 2 months. So later in the day, my INSPIRE record will be overwritten by the SPIRES record, which will add the BibTeX key to my metadata.

Based solely on author names, I will be assigned to likely author profiles. The next day, I will be assigned standardized keywords, which will be improved by physicists in the following weeks.

Since there’s a high chance I might be revised within my first week on arXiv, there will be no human curation on my record on INSPIRE during this time. Any revised version during this period will completely overwrite my record and my references will be re-extracted. After this embargo period, my metadata will be thoroughly curated: title, author names and references are corrected, and affiliations, report numbers, collaboration and experiment names are added. If I am a conference paper, the record will be linked to the corresponding entry in the conference database. Missing or wrong references can be added or corrected by you as INSPIRE users via a web interface. Using the additional information on affiliations, co-authors and collaboration names, the algorithmic matching of my author profiles will be refined. If I should be assigned to the wrong author, my authors can claim me as their own through a web interface.

After a few months as an arXiv paper on INSPIRE, I will most likely be published in a journal or conference proceedings. I will then be included in the feeds publishers give to INSPIRE, which are matched against INSPIRE records based on title and author names. Here it is important to have human intervention, as my title, or even my authors, might have been modified. Matching records are merged, and a publication note and DOI, or a link to the publisher web page, are added to my INSPIRE record. Citations I gain are from now on based on both my arXiv ID and my publication note.
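For the curious, matching on title and author names could be sketched in Python roughly as follows. This is only an illustration under my own assumptions, not the procedure INSPIRE actually uses: the equal weighting of title and authors, the 0.8 threshold, the record layout and all function names are hypothetical. The candidates it returns are meant to be checked by a human, not merged blindly.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Similarity of two titles between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def author_overlap(a: list, b: list) -> float:
    """Fraction of author names shared by both lists (Jaccard index)."""
    set_a = {normalize(name) for name in a}
    set_b = {normalize(name) for name in b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def match_candidates(feed_item: dict, records: list, threshold: float = 0.8) -> list:
    """Return (score, record ID) pairs at or above the threshold, best first."""
    scored = []
    for record in records:
        score = (0.5 * title_similarity(feed_item["title"], record["title"])
                 + 0.5 * author_overlap(feed_item["authors"], record["authors"]))
        if score >= threshold:
            scored.append((score, record["id"]))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Invented example records, for illustration only.
    records = [
        {"id": 1, "title": "Search for new physics in dijet events",
         "authors": ["Doe, J.", "Roe, R."]},
        {"id": 2, "title": "A completely different title about lattices",
         "authors": ["Poe, E."]},
    ]
    feed_item = {"title": "Search for New Physics in Dijet Events",
                 "authors": ["Doe, J.", "Roe, R."]}
    print(match_candidates(feed_item, records))
```

A title that was reworded during peer review lowers the score, so a real match can fall below the threshold. That is exactly where a human has to step in.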

For my relatives – papers from other sources like non-hep* arXiv categories, journals, conference proceedings, thesis servers – life on INSPIRE is a little bit more complicated. First, the ones relevant for HEP have to be selected; this is done semi-automatically with the aid of a script identifying core keywords in the fulltext. Then subject categories have to be assigned to them. If they are of immediate relevance to High Energy Physics, they are considered so-called “core” papers for the database and go through the same hand curation as me.
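I have never seen that script myself, but a keyword screen of this kind could look roughly like the Python sketch below. The keyword list, the function names and the rule of requiring two distinct keywords are all my assumptions. The sketch only suggests candidates, and a human still makes the final selection.

```python
import re

# Assumed example list; the real set of core keywords is not given here.
CORE_KEYWORDS = ("higgs", "quark", "supersymmetry", "neutrino")


def keyword_hits(fulltext: str, keywords: tuple = CORE_KEYWORDS) -> dict:
    """Count exact whole-word occurrences of each core keyword in the text."""
    words = re.findall(r"[a-z]+", fulltext.lower())
    return {kw: words.count(kw) for kw in keywords if kw in words}


def is_hep_candidate(fulltext: str, min_distinct: int = 2) -> bool:
    """Flag a paper for human review if enough distinct keywords appear."""
    return len(keyword_hits(fulltext)) >= min_distinct


if __name__ == "__main__":
    text = "The Higgs boson couples to the top quark. Higgs!"
    print(keyword_hits(text))
    print(is_hep_candidate(text))
```

Because the sketch matches whole words exactly, a plural such as "quarks" would be missed. A real script would need stemming or a richer keyword list.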

The INSPIRE team is constantly working on improving this workflow and adding new tools to make the process faster. And they are very happy to respond to the questions and comments you send to feedback@inspirehep.net.

I hope you liked this short insight into my life.

Enjoy working with INSPIRE!
Your HEP arXiv paper
