Dear INSPIRE blog readers,

I am a HEP arXiv paper and I was recently invited to tell you about my life on INSPIRE. This should help you understand the work the INSPIRE team does and explain why, for example, it may take some time for references to show up. So – here’s what happens to me:

After I first appear on arXiv, it takes INSPIRE about 2 hours to harvest my friends and me. This usually happens at 4 a.m. CET. INSPIRE extracts my plots and indexes my metadata and fulltext, which takes about 1-2 hours. When all this is done, I am visible to you, the users of INSPIRE. In the next step, a tool called the “reference extractor” is run on my PDF, and the references it extracts are linked via arXiv number, journal reference, report number or DOI to the corresponding existing INSPIRE records, where they are counted as citations.
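To make the linking step a little more concrete, here is a minimal Python sketch of how an extracted reference could be resolved to an existing record by its identifiers. It is only an illustration: the field names, the order in which identifiers are tried, and the function names `build_index` and `link_reference` are assumptions, not the actual reference extractor.

```python
from typing import Dict, Optional, Tuple

# Identifier types named in the text. The lookup order is an assumption.
ID_TYPES = ("arxiv", "journal_ref", "report_number", "doi")

Index = Dict[Tuple[str, str], int]


def build_index(records: Dict[int, Dict[str, str]]) -> Index:
    """Map every (identifier type, normalized value) pair to a record ID."""
    index: Index = {}
    for record_id, record in records.items():
        for id_type in ID_TYPES:
            value = record.get(id_type)
            if value:
                index[(id_type, value.strip().lower())] = record_id
    return index


def link_reference(reference: Dict[str, str], index: Index) -> Optional[int]:
    """Return the ID of the first record sharing an identifier, or None."""
    for id_type in ID_TYPES:
        value = reference.get(id_type)
        if value:
            record_id = index.get((id_type, value.strip().lower()))
            if record_id is not None:
                return record_id
    return None


# Placeholder identifiers, purely for illustration.
records = {
    1: {"arxiv": "0000.00001", "doi": "10.0000/example.1"},
    2: {"report_number": "EXAMPLE-CONF-0000-001"},
}
index = build_index(records)
```

A reference that matches none of the indexed identifiers returns `None`, which corresponds to a reference that does not (yet) count as a citation to any record.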

The main curation for my data – excluding references – is still done on SPIRES, and this will probably continue for the next 2 months. So later in the day, my INSPIRE record will be overwritten by the SPIRES record, which will add the BibTeX key to my metadata.

Based solely on author names, I will be assigned to likely author profiles. The next day, I will be assigned standardized keywords which will be improved by physicists in the following weeks.

Since there’s a high chance I might be revised within my first week on arXiv, there will be no human curation on my record on INSPIRE during this time. Any revised version during this period will completely overwrite my record and my references will be re-extracted. After this embargo period, my metadata will be thoroughly curated: title, author names and references are corrected; affiliations, report numbers, collaboration and experiment names are added. If I am a conference paper, the record will be linked to the corresponding entry in the conference database. Missing or wrong references can be added or corrected by you as INSPIRE users via a web interface. Using the additional information on affiliations, co-authors and collaboration names, the algorithmic matching of my author profiles will be refined. If I am assigned to the wrong author profile, my authors can claim me as their own through a web interface.

After a few months as an arXiv paper on INSPIRE, I will most likely be published in a journal or conference proceedings. I will then be included in the feeds publishers give to INSPIRE, which are matched against INSPIRE records based on title and author names. Here it is important to have human intervention, as my title, or even my authors, might have been modified. Matching records are merged, and the publication note and the DOI or a link to the publisher’s web page are added to my INSPIRE record. From now on, the citations I gain are counted based on both my arXiv ID and my publication note.
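The matching of publisher feeds against existing records can be pictured with a small Python sketch. It assumes a simple score built from title similarity (via the standard library's `difflib`) and author-name overlap, with a middle band that is handed to a human. The thresholds, the equal weighting, and all function names are hypothetical; the source does not describe how INSPIRE actually scores matches.

```python
from difflib import SequenceMatcher
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def author_overlap(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard overlap of the two normalized author-name sets."""
    set_a = {normalize(name) for name in a}
    set_b = {normalize(name) for name in b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def classify_match(feed_item: dict, record: dict,
                   accept: float = 0.9, review: float = 0.6) -> str:
    """Combine both signals; uncertain cases go to a human curator."""
    score = (0.5 * title_similarity(feed_item["title"], record["title"])
             + 0.5 * author_overlap(feed_item["authors"], record["authors"]))
    if score >= accept:
        return "match"
    if score >= review:
        return "needs human review"
    return "no match"
```

The middle band is the point of the sketch: a paper whose title or author list changed between the arXiv version and the published version tends to land there instead of being merged or rejected automatically.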

For my relatives – papers from other sources like non-hep* arXiv categories, journals, conference proceedings, thesis servers – life on INSPIRE is a little bit more complicated. First, the ones relevant for HEP have to be selected; this is done semi-automatically with the aid of a script identifying core keywords in the fulltext. Then subject categories have to be assigned to them. If they are of immediate relevance to High Energy Physics, they are considered so-called “core” papers for the database and go through the same hand curation as me.
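The semi-automatic selection can be illustrated with a short Python sketch that counts core keywords in the fulltext and suggests a paper for human review when enough distinct keywords appear. The keyword list, the threshold, and the function names are hypothetical stand-ins; the real script and its keyword set are not described here.

```python
import re
from typing import Dict, Sequence

# Hypothetical keyword list, for illustration only.
CORE_KEYWORDS = ("higgs", "quark", "supersymmetry", "lhc")


def keyword_hits(fulltext: str,
                 keywords: Sequence[str] = CORE_KEYWORDS) -> Dict[str, int]:
    """Count how often each core keyword occurs as a whole word."""
    words = re.findall(r"[a-z]+", fulltext.lower())
    return {k: words.count(k) for k in keywords if k in words}


def suggest_for_hep(fulltext: str, min_distinct: int = 2) -> bool:
    """Suggest the paper to a curator if enough distinct keywords appear."""
    return len(keyword_hits(fulltext)) >= min_distinct
```

The script only makes a suggestion; the final decision on relevance stays with a person, which is what makes the step semi-automatic.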

The INSPIRE team is constantly working on improving this workflow and adding new tools to make the process faster. And they are very happy to respond to the questions and comments you send to feedback@inspirehep.net.

I hope you liked this short insight into my life.

Enjoy working with INSPIRE!
Your HEP arXiv paper

BibTeX key generation has been improved and consolidated, and every paper on INSPIRE is now guaranteed to have a BibTeX key. BibTeX keys allow cross-referencing of bibliographic information in LaTeX documents. The keys are part of the BibTeX output format and are generated as follows: <family name of first author>:<year><3 random letters>

For collaboration papers without a list of authors upon ingestion to INSPIRE, the TeX key will be generated based on the collaboration name. Records with existing SPIRES keys are not affected by this and those keys will remain functional.
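As an illustration of the pattern above, here is a minimal Python sketch that builds a key of the form <family name of first author>:<year><3 random letters> and falls back to the collaboration name when no author is available. The function name `make_texkey`, the use of lowercase letters, and the removal of whitespace from the name are assumptions; the actual INSPIRE implementation may differ, for example in how it avoids collisions with existing keys.

```python
import random
import string
from typing import Optional


def make_texkey(family_name: Optional[str], year: int,
                collaboration: Optional[str] = None,
                rng: Optional[random.Random] = None) -> str:
    """Build a key like 'Name:2012abc' from author (or collaboration) and year."""
    rng = rng or random.Random()
    # Fall back to the collaboration name when there is no author list.
    stem = family_name or collaboration
    if not stem:
        raise ValueError("need a family name or a collaboration name")
    # Assumption: drop whitespace so the key is a single token.
    stem = "".join(stem.split())
    # Assumption: the three random letters are lowercase ASCII.
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(3))
    return f"{stem}:{year}{suffix}"


# Example calls with hypothetical inputs; the suffix differs on every run.
author_key = make_texkey("Doe", 2012)
collab_key = make_texkey(None, 2012, collaboration="ATLAS")
```

In a LaTeX document, such a key is what goes inside `\cite{...}` once the corresponding entry is in the .bib file.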

For comments on and suggestions for our services, don’t hesitate to write us at feedback@inspirehep.net.

The large collaborations at the LHC have an unusual intermediate form of publication: the conference note. These are significant results prepared by the collaboration for major international conferences (not to be confused with proceedings written by a conference attendee). They are heavily peer-reviewed within the collaboration, signed by the collaboration as a whole, and often precede submission to a journal. Moreover, these conference notes typically provide more detail than the documents submitted for publication, which makes them particularly valuable to anyone following the research closely.

However, finding these conference notes has confounded almost everyone who has looked for them. They are “catalogued” in a maze of wiki pages, plain HTML pages, and various categories in the CERN document server (CDS). While CDS is based on the same underlying Invenio technology as INSPIRE, it lacks much of the functionality that INSPIRE offers. In particular, there has been no way to easily navigate references, track citations, or generate bibliographic information.

This situation improved dramatically when both ATLAS and CMS agreed to put these conference notes into INSPIRE.  There are already more than 800 conference notes indexed, with many more to come!

For example, you can find the ATLAS conference notes with
find r atlas-conf-*
and the CMS Physics Analysis Summaries (PAS) with
find r cms-pas-*

Now, I can easily track citations to a recent conference note on the Higgs decaying to photons; perform a full text search for the word “asymptotic”; and see which ATLAS conference notes have been cited by CERN theorist Christophe Grojean.

As an author of several of these conference notes, I am particularly excited about the ability to generate standard bibliography entries. For example, I can easily export a .bib file for all the 2012 ATLAS conference notes. This will be a huge time saver for the collaborations and a great example of the impact an excellent literature database can have!