Dear INSPIRE blog readers,

I am a HEP arXiv paper and I was recently invited to tell you about my life on INSPIRE. This should help you understand the work the INSPIRE team does and explain why, for example, it may take some time for references to show up. So – here’s what happens to me:

After I first appear on arXiv, it takes INSPIRE about 2 hours to harvest my friends and me. This usually happens at 4 a.m. CET. INSPIRE extracts my plots and indexes my metadata and fulltext, which takes about 1-2 hours. When all this is done, I am visible to you, the users of INSPIRE. In the next step, a tool called the “reference extractor” is run on my PDF, and the references it extracts are linked via arXiv number, journal reference, report number or DOI to the corresponding existing INSPIRE records, where they are counted as citations.

The main curation for my data – excluding references – is still done on SPIRES, and this will probably continue for the next 2 months. So later in the day, my INSPIRE record will be overwritten by the SPIRES record which will add the BibTeX key to my metadata.

Based solely on author names, I will be assigned to likely author profiles. The next day, I will be assigned standardized keywords which will be improved by physicists in the following weeks.

Since there’s a high chance I might be revised within my first week on arXiv, there will be no human curation on my record on INSPIRE during this time. Any revised version during this period will completely overwrite my record and my references will be re-extracted. After this embargo period, my metadata will be thoroughly curated: title, author names and references are corrected, and affiliations, report numbers, collaboration and experiment names are added. If I am a conference paper, the record will be linked to the corresponding entry in the conference database. Missing or wrong references can be added or corrected by you as INSPIRE users via a web interface. Using the additional information on affiliations, co-authors and collaboration names, the algorithmic matching of my author profiles will be refined. If I am assigned to the wrong author, my authors can claim me as their own through a web interface.

After a few months as an arXiv paper on INSPIRE, I will most likely be published in a journal or conference proceedings. I will then be included in the feeds publishers give to INSPIRE, which are matched against INSPIRE records based on title and author names. Human intervention is important here, as my title, or even my authors, might have been modified. Matching records are merged, and the publication note and either the DOI or a link to the publisher web page are added to my INSPIRE record. From now on, the citations I gain are based on both my arXiv ID and my publication note.
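To give a feel for why title-and-author matching needs a human in the loop, here is a minimal sketch in Python. It is not INSPIRE's actual matching code: the record layout (dicts with `title` and `authors` keys), the equal weighting and the 0.8 threshold are all assumptions made for illustration. It only proposes candidates that a curator would still have to confirm.

```python
from difflib import SequenceMatcher


def _normalize(text):
    # Lower-case and collapse whitespace so trivial differences are ignored.
    return " ".join(text.lower().split())


def title_similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the normalized titles are identical.
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def author_overlap(authors_a, authors_b):
    # Compare surnames only, assuming names are written "Surname, Initials".
    surnames_a = {name.split(",")[0].strip().lower() for name in authors_a}
    surnames_b = {name.split(",")[0].strip().lower() for name in authors_b}
    if not surnames_a or not surnames_b:
        return 0.0
    return len(surnames_a & surnames_b) / len(surnames_a | surnames_b)


def candidate_matches(feed_item, records, threshold=0.8):
    # Hypothetical scoring: equal weight for title and author agreement.
    scored = []
    for record in records:
        score = 0.5 * title_similarity(feed_item["title"], record["title"]) \
            + 0.5 * author_overlap(feed_item["authors"], record["authors"])
        if score >= threshold:
            scored.append((score, record))
    # Best candidates first; a curator makes the final decision.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]
```

A title that was reworded for the journal version lowers the score, which is exactly the case where a purely automatic merge would be risky.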

For my relatives – papers from other sources like non-hep* arXiv categories, journals, conference proceedings, thesis servers – life on INSPIRE is a little bit more complicated. First the ones relevant for HEP have to be selected; this is done semi-automatically with the aid of a script identifying core keywords in the fulltext. Then subject categories have to be assigned to them. If they are of immediate relevance to High Energy Physics, they are considered so-called “core” papers for the database and go through the same hand curation as I do.

The INSPIRE team is constantly working on improving this workflow and adding new tools to make the process faster.

I hope you liked this short insight into my life.

Enjoy working with INSPIRE!
Your HEP arXiv paper

The large collaborations at the LHC have an unusual intermediate form of publication: the conference note.  These are significant results prepared by the collaboration for major international conferences (not to be confused with proceedings written by a conference attendee).  They are heavily peer-reviewed within the collaboration, signed by the collaboration as a whole, and often precede submission to a journal.  Moreover, these conference notes typically provide more detail than the documents submitted for publication, which makes them particularly valuable to anyone following the research closely.

However, finding these conference notes has confounded almost everyone who has looked for them.  They are “catalogued” in a maze of wiki pages, plain HTML pages, and various categories in the CERN document server (CDS).  While CDS is based on the same underlying Invenio technology as INSPIRE, it lacks much of the functionality that INSPIRE offers.  In particular, there has been no way to easily navigate references, track citations, or generate bibliographic information.

This situation improved dramatically when both ATLAS and CMS agreed to put these conference notes into INSPIRE.  There are already more than 800 conference notes indexed, with many more to come!

For example, you can find the ATLAS conference notes with
find r atlas-conf-*
and the CMS Physics Analysis Summaries (PAS) with
find r cms-pas-*

Now, I can easily track citations to a recent conference note on the Higgs decaying to photons; perform a full text search for the word “asymptotic”; and see which ATLAS conference notes have been cited by CERN theorist Christophe Grojean.

As an author of several of these conference notes, I am particularly excited about the ability to generate standard bibliography entries.  For example, I can easily export a .bib file for all the 2012 ATLAS conference notes.  This will be a huge time saver for the collaborations and a great example of the impact an excellent literature database can have!

Recently we reprocessed the citations of articles in the Journal of Physics. For historical reasons, each letter series of the Journal of Physics (A through G) was treated in SPIRES, and then INSPIRE, as a separate journal. For all the other journals in INSPIRE each letter series is simply treated as a volume of a single journal (for example, Nuclear Physics, Physical Review and Physics Letters). Because special exceptions had to be made in the database for how we handled the Journal of Physics, it was difficult to guarantee that searches, citation counts and even the display of the publication note always worked correctly.  INSPIRE contains almost 13,000 J.Phys. articles with over 100,000 citations. The re-indexing was completed a while ago but during clean-up you may have noticed a temporary fluctuation in the citation counts. However, everything is fixed now and our entries are much more consistent. In the process we saw citations to J.Phys. articles rise by several thousand.
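The change described above can be pictured as folding the series letter into the volume, the way it is already done for the other journals. The following Python sketch illustrates the idea only; the `(journal, volume)` tuple and the journal strings such as `"J.Phys.G"` are assumed representations, not the actual INSPIRE record format.

```python
import re

# Matches a Journal of Physics title with a series letter A through G,
# e.g. "J.Phys.G" or "J.Phys. A" (assumed spellings for illustration).
_JPHYS_SERIES = re.compile(r"^J\.Phys\.\s*([A-G])$")


def fold_series_into_volume(journal, volume):
    """Return (journal, volume) with the series letter moved into the volume."""
    match = _JPHYS_SERIES.match(journal.strip())
    if match is None:
        # Not a lettered J.Phys. series: leave the reference untouched.
        return journal, volume
    return "J.Phys.", match.group(1) + volume
```

With this convention, `("J.Phys.G", "35")` and `("J.Phys.", "G35")` end up as the same publication note, so citations written either way can be counted together.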

As we consolidate the move from SPIRES to INSPIRE we will continue to examine things that, though they once made sense in SPIRES, no longer need to be done the same way. One particularly important issue is the eprint number. In SPIRES, depending on where in the record it was stored, an eprint number could be written as hep-th/9711200, hep-th 9711200, hepth-9711200 or even arXiv:hep-th/9711200. Cleaning this up is sure to net some long-hidden citations!
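As an illustration of what such a clean-up involves, here is a small Python sketch that maps the four spellings mentioned above onto one canonical form. It is not the code used by INSPIRE; the regular expression and the small table of unhyphenated archive names are assumptions chosen to cover just these examples, and only old-style identifiers (archive name plus seven digits) are handled.

```python
import re

# Optional "arXiv:" prefix, an archive name with or without a hyphenated
# suffix, any mix of space, slash or hyphen as separator, then seven digits.
_OLD_STYLE = re.compile(
    r"^(?:arxiv:)?\s*([a-z]+(?:-[a-z]+)?)[\s/-]*(\d{7})$",
    re.IGNORECASE,
)

# Hypothetical lookup for archive names written without their hyphen.
_UNHYPHENATED = {
    "hepth": "hep-th",
    "hepph": "hep-ph",
    "hepex": "hep-ex",
    "heplat": "hep-lat",
}


def normalize_eprint(raw):
    """Return the eprint number as 'archive/NNNNNNN', or None if unrecognized."""
    match = _OLD_STYLE.match(raw.strip())
    if match is None:
        return None
    archive = match.group(1).lower()
    archive = _UNHYPHENATED.get(archive, archive)
    return f"{archive}/{match.group(2)}"


variants = [
    "hep-th/9711200",
    "hep-th 9711200",
    "hepth-9711200",
    "arXiv:hep-th/9711200",
]
canonical = {normalize_eprint(v) for v in variants}
```

Once every variant collapses to the same string, citations that were previously attached to different spellings can be counted against a single record.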

Following our users’ request to extend the historical content of INSPIRE, we now provide full text PDFs of contributions to the International Conferences on High-Energy Accelerators (HEACC). Conference proceedings are a cornerstone of communication for the Accelerator Physics community (as preprints are for the HEP community), and these proceedings are very valuable, as HEACC was the first important accelerator conference series. In total, 18 conferences took place – the first one in 1956 at CERN and the last one in 2001 in Tsukuba in Japan. The proceedings have been scanned and are searchable. Unfortunately, the contributions from the 11th (in Geneva in 1980) and the 15th (in Hamburg in 1992) conferences cannot be made available, except for preprints we already have in INSPIRE, as the copyright for these is owned by the publishers. The 2001 contributions are only available in digital form. Unfortunately, the links to the conference homepage are broken, but we will try to get in touch with the editors to make these contributions available. The conferences can be found in the conference database with the search find series HEACC. As part of each conference entry, you will find a link that leads you to a list of the contributions.

In addition to the conference proceedings, we have also scanned the Catalogues of High-Energy Accelerators published in conjunction with some of the HEACC conferences.

If you or your colleagues happen to have an electronic version of historical material that is listed on INSPIRE, we are happy to make it available.

Sometimes you only want, for example, published articles or theory papers. INSPIRE has two search terms to help with this:

1. type code, tc, lets you specify the type of paper:
b  Book
c  Conference paper
l  Lectures
p  Published
r  Review
t  Thesis
e.g.
find t quark and tc p and tc r
find cn atlas not tc c

2. field code, fc, lets you specify what field you are interested in (based on arXiv categories but extended to non-eprints):
a  Astrophysics
b  Accelerators
c  Computing
e  Experiment-HEP
g  Gravitation and Cosmology
i  Instrumentation
l  Lattice
m  Math and Math Physics
n  Theory-Nucl
o  Other
p  Phenomenology-HEP
q  General Physics
t  Theory-HEP
x  Experiment-Nucl
e.g.
find aff fermilab and fc b
find topcite 500+ and fc e
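
If you build such searches in a script, a small helper can keep the codes straight. The Python sketch below only assembles the search string from the tc and fc codes listed above; the function name `build_query` and its parameters are made up for this example and are not part of INSPIRE.

```python
# Codes copied from the two lists above.
TYPE_CODES = {
    "b": "Book", "c": "Conference paper", "l": "Lectures",
    "p": "Published", "r": "Review", "t": "Thesis",
}
FIELD_CODES = {
    "a": "Astrophysics", "b": "Accelerators", "c": "Computing",
    "e": "Experiment-HEP", "g": "Gravitation and Cosmology",
    "i": "Instrumentation", "l": "Lattice", "m": "Math and Math Physics",
    "n": "Theory-Nucl", "o": "Other", "p": "Phenomenology-HEP",
    "q": "General Physics", "t": "Theory-HEP", "x": "Experiment-Nucl",
}


def build_query(base, type_codes=(), field_codes=(), exclude_type_codes=()):
    """Assemble a 'find ...' search string (hypothetical helper)."""
    for code in list(type_codes) + list(exclude_type_codes):
        if code not in TYPE_CODES:
            raise ValueError(f"unknown type code: {code!r}")
    for code in field_codes:
        if code not in FIELD_CODES:
            raise ValueError(f"unknown field code: {code!r}")
    parts = ["find " + base]
    parts += [f"and tc {code}" for code in type_codes]
    parts += [f"and fc {code}" for code in field_codes]
    parts += [f"not tc {code}" for code in exclude_type_codes]
    return " ".join(parts)
```

For instance, `build_query("t quark", type_codes=["p", "r"])` gives back the first example search above, and an unknown code raises an error instead of silently producing a search that finds nothing.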

