Normally the process would be to follow the above order.
By "anomaly" we are referring to entries or groups of entries that do not correspond to the entries in the index or, infrequently, entries in the index that appear to be wrong. In the latter case the entries are left unchanged but we add information to indicate to researchers the possible correction.
The purpose of the Reports is to identify anomalies systematically, but anomalies may be identified in a number of other ways; for example, a researcher may report a missing entry, which results in a missing page being identified. The analysis and correction tools can also be used in these cases.
While the Reports and Analysis Tools relate to the current data, the Correction Tools will normally only apply after the next update (exceptions here are Postem and Scan corrections which take place immediately).
Not all facilities are freely available, in particular the Correction Tools normally require privileged access. However, many of the Correction Tools operate statically, that is they are files of correction data, and this data can be generated by anyone and then submitted to an administrator for inclusion.
An explanation of the concepts referred to on this page is given here.
This section describes tools that can be used in the quality assurance process. These tools are not necessarily specifically or solely designed for use in this process.
Probably the most important tool is Show File, which is used to display the contents of a particular file. It can be invoked directly by going to Show File and then filling in the details of the file in the form provided. However, it is also often available from Reports or other Analysis Tools as a link that goes directly to the file concerned; details are included with each Report or Tool. In this case it may also take you directly to one or more lines (identified by a grey background) that are the subject of the reference. Line numbers are included in the listing when this facility is used.
It is often important to go from a search result to the file that contains the entry, that is, the provenance of the entry. Clicking the button next to an entry displays the transcriptions that make up that entry on the Information page; shift+clicking on one of these transcriptions will take you to provenance.pl, which will display the usernames and files corresponding to the entry. Clicking on one of these will take you to the line in the file containing the entry (using Show File).
However, having done this once, the next time you go to the Information page you will find that the user/filename is displayed next to the entry and you can click on this to go directly to the entry in the file (using Show File).
Sometimes it is suspected that a transcription is for the wrong quarter, i.e. the year, quarter or event is wrong in the file header. However, it may be difficult to work out what it should be.
Predict from Volume and Page will go through a file and, using data about page ranges (see here), work out the possible quarters that fit. The quarters are presented with the highest probability first, each one showing the percentage of lines that would be correct for that quarter. Note that this can only be done when sufficient entries are available for a quarter to gather the data about page ranges, and the prediction needs to be checked against the actual source (e.g. a scan) to confirm which quarter it really should be.
It should also be noted that this facility is only available for files that contain entries from a single quarter and event (i.e. RANDOM and OneName files are probably not suitable).
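As an illustration of the prediction calculation, the following is a minimal sketch assuming a page_ranges table mapping (quarter, district) to an expected page range and a list of (district, page) pairs taken from the file; the names and data structures are assumptions, not the actual Predict from Volume and Page implementation.

```python
# Minimal sketch of the quarter prediction described above (illustrative only).
def predict_quarters(entries, page_ranges, quarters):
    """Return quarters ordered by the percentage of entries whose page falls
    inside the expected range for their district in that quarter."""
    scores = []
    for quarter in quarters:
        in_range = 0
        for district, page in entries:
            low, high = page_ranges.get((quarter, district), (None, None))
            if low is not None and low <= page <= high:
                in_range += 1
        scores.append((quarter, 100.0 * in_range / max(len(entries), 1)))
    # Highest-probability quarter first, as in the tool's output
    return sorted(scores, key=lambda s: s[1], reverse=True)

example = predict_quarters(
    entries=[("Aston", 312), ("Aston", 318)],
    page_ranges={("1863M3", "Aston"): (300, 350), ("1873M3", "Aston"): (200, 250)},
    quarters=["1863M3", "1873M3"],
)
print(example)  # [('1863M3', 100.0), ('1873M3', 0.0)]
```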
The Show Accession page enables the records in an accession to be displayed. The accession can be specified by Accession Number or by the Record Number of an entry in the Accession.
The Show Records in Chunk page enables the records in a chunk to be displayed. The chunk can be specified by Chunk Number or by the Record Number of an entry in the Chunk. Chunk numbers are shown in the Superchunk Report.
The Reports themselves can often be useful analysis tools, in the sense that they display information systematically, which can help in starting to determine the cause of an anomaly.
When making corrections the principle of Least Impact should apply. That is, if there is more than one way to effect a correction, the one that makes the least impact should be used. So if a change could be effected either by changing a OneName file to RANDOM or by removing the file, the former should be chosen because it is the more conservative option and has the least overall impact on the data in the system.
One of the most straightforward ways of implementing a correction is just to request that an entry is changed (which implies that it does not correspond to the entry in the index). On the Information page click on the link to submit a correction in the same way that any researcher could.
You can report that there is a systematic error (e.g. a computer-assisted error such as all the surnames being misspelt) by noting in the Source field of the correction that the amendment applies to a number of entries (normally you attach the correction to the first entry in error and refer to "also the next n entries").
The Make Alignments page is used to initiate and manage alignments, that is, to inform the update that two non-identical entries should be considered to be identical. There are several ways that such alignments can be specified and these are explained on the Make Alignments page.
Making alignments requires authorisation.
It is possible to omit the entries in a file for a particular quarter from the update, or, indeed, the whole file. This is normally only done for OneName files, but can also be done for RANDOM files, for the following reasons:
This facility is effected through entries in a file that have the following format:
<user>/<filename> [ yyyyeq [, yyyyeq ]* ]
and can span lines if the preceding line ends with a backslash (\). For example:
jonesa/jones_study 1837M3,1867B1,1901M3,1888D1, \ 1867M3
Comments start with hash (#).
Submit entries for quarters to be omitted, together with justification (preferably as a comment), to the update corrections coordinator.
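As an illustration, the following minimal sketch reads such a file, honouring '#' comments and backslash continuation; the filename and parsing details are assumptions rather than the actual update code.

```python
# Minimal sketch of reading a quarter-omission file (illustrative only).
def read_omissions(path):
    """Yield (user/filename, [quarters]) pairs; an empty quarter list means
    the whole file is to be omitted."""
    with open(path) as f:
        logical = ""
        for raw in f:
            line = raw.split("#", 1)[0].rstrip()   # strip comments
            if line.endswith("\\"):                # line continuation
                logical += line[:-1]
                continue
            logical += line
            if logical.strip():
                parts = logical.split(None, 1)
                quarters = parts[1] if len(parts) > 1 else ""
                yield parts[0], [q.strip() for q in quarters.split(",") if q.strip()]
            logical = ""

# "omit_quarters.txt" is a hypothetical filename for such a correction file
for target, quarters in read_omissions("omit_quarters.txt"):
    print(target, quarters)
```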
Where two chunks are contiguous but have not been joined together into a superchunk it is possible to force them to be joined. This may be done for the following reasons:
This facility is effected through entries in a file which have the following format:
<user1>/<filename1> [ ,<page1> ] <whitespace> <user2>/<filename2> [ ,<page2> ] <whitespace> [ <datetime> ]
which will force the chunk containing the first file (and page, if given) to be in a superchunk with the chunk containing the second file (and page, if given), provided neither file has been modified after the date and time (if present). If the first page is omitted the last page in the file is assumed; if the second page is omitted the first page in the file is assumed. The following is an example:
jonesa/1839M3A0024,25 jonesa/1839M3B0001 17/11/07 23:19
Comments start with hash (#).
If there are no page numbers on the +PAGE lines in a file the system assumes a starting page of 1, incrementing by 1.
Submit entries for chunks to be joined, together with justification (preferably as a preceding comment), to the update corrections coordinator.
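For illustration, a minimal sketch of parsing one such line follows; the function name is an assumption and a page value of None stands in for the defaults described above (last page of the first file, first page of the second).

```python
# Minimal sketch of parsing a force-join line (illustrative only).
from datetime import datetime

def parse_join_line(line):
    """Return ((file1, page1), (file2, page2), cutoff) for a line such as
    'jonesa/1839M3A0024,25 jonesa/1839M3B0001 17/11/07 23:19'."""
    fields = line.split("#", 1)[0].split()

    def target(token):
        name, _, page = token.partition(",")
        return name, (int(page) if page else None)   # None -> apply the default

    cutoff = None
    if len(fields) >= 4:                              # optional DD/MM/YY hh:mm
        cutoff = datetime.strptime(fields[2] + " " + fields[3], "%d/%m/%y %H:%M")
    return target(fields[0]), target(fields[1]), cutoff

print(parse_join_line("jonesa/1839M3A0024,25 jonesa/1839M3B0001 17/11/07 23:19"))
# (('jonesa/1839M3A0024', 25), ('jonesa/1839M3B0001', None),
#  datetime.datetime(2007, 11, 17, 23, 19))
```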
In order to align entries the system has to put accessions in the right order, that is the order they are in the index. It does this by sorting on the names in the file; normally this is just the first name but, if that is unreadable, entries are scanned until a suitable entry can be found.
Occasionally this does not produce the correct order because the entries are out of order in the index or because of a mis-transcription. To overcome this it is possible to tell the system to use an entry other than the first in the accession.
This facility is effected through entries in a file which have the following format:
<user>/<filename> [ ,<page> ] <whitespace> <entrynumber> <whitespace> [ <datetime> ]
and if <page> is omitted the first page in the file is assumed. Entries in the page are numbered from 1, and so sorting will be done on the <entrynumber> entry in the page, provided the modification timestamp of the file is before <datetime> (if present). The following is an example:
jonesa/1839M3A0024,25 5 17/11/07 23:19
Comments start with hash (#).
If there are no page numbers on the +PAGE lines the system assumes a starting page of 1, incrementing by 1.
Submit entries specifying the sort entry to be used, together with justification (preferably as a preceding comment), to the update corrections coordinator.
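As an illustration of how the override is applied, the following minimal sketch selects the entry used for sorting; the data structures are assumptions, with entries numbered from 1 as stated above.

```python
# Minimal sketch of choosing the entry an accession is sorted on (illustrative only).
def sort_key_for(accession_entries, override_entrynumber=None):
    """Return the entry the accession should be sorted on; the default is the
    first entry, otherwise the <entrynumber> entry from a sort-override line."""
    n = override_entrynumber or 1
    return accession_entries[n - 1]

page = ["?????,John,Aston,6a,312",                   # unreadable first entry
        "Chapman,Ann,Aston,6a,312"]
print(sort_key_for(page, override_entrynumber=2))    # 'Chapman,Ann,Aston,6a,312'
```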
The system will attempt to align two accessions if a sufficient proportion of the entries in the accessions are identical and do not contain UCF characters. The proportion can be adjusted but is currently set to 20%. Provided at least this proportion of the entries are identical the system will attempt to align the accessions, including using UCF comparison to determine if two entries are the same. Where UCF has been used in this way it is reported in the UCF Alignments Report (which is not normally used for Quality Assurance purposes).
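For illustration, a minimal sketch of this threshold test follows; the 20% figure is taken from the text, while the set of UCF characters and the data structures are assumptions.

```python
# Minimal sketch of the alignment threshold test (illustrative only).
THRESHOLD = 0.20
UCF_CHARS = set("?*_")   # assumed set of UCF characters, for illustration

def should_attempt_alignment(accession_a, accession_b):
    """True if at least THRESHOLD of the entries in accession_a are exactly
    matched, with no UCF characters, by an entry in accession_b."""
    clean_b = {e for e in accession_b if not UCF_CHARS & set(e)}
    identical = sum(1 for e in accession_a
                    if not UCF_CHARS & set(e) and e in clean_b)
    return identical / max(len(accession_a), 1) >= THRESHOLD

a = ["Bonus,John,Aston,6a,312", "Chapman,Ann,Aston,6a,312"]
b = ["Bonus,John,Aston,6a,312", "Chapm?n,Ann,Aston,6a,312"]
print(should_attempt_alignment(a, b))   # True: 1 of 2 entries (50%) identical
```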
Where less than this proportion is identical it is possible to instruct the system to attempt to align the accessions anyway. This can be done manually by using Make Alignments to align sufficient entries, but there is a simpler way with Accession Alignments. This facility is effected through entries in a file which have the following format:
<user1>/<filename1> [ ,<page1> ] <whitespace> <user2>/<filename2> [ ,<page2> ] <whitespace> [ <datetime> ]
and if <page1> or <page2> is omitted the first page in the file is assumed. Entries in the two pages will be aligned provided the modification timestamp of the file is before <datetime> (if present). The following is an example:
jonesa/1839M3A0024,25 thomas/39M30024 17/11/07 23:19
Comments start with hash (#).
If there are no page numbers on the +PAGE lines the system assumes a starting page of 1, incrementing by 1.
Submit entries for accessions to be aligned, together with justification (preferably as a preceding comment), to the update corrections coordinator.
Because of the frequency with which files have been incorrectly given a type of OneName or SEQUENCED (when they should be RANDOM or RANDOM/OneName respectively) there is a facility to force a file to be considered to be RANDOM or OneName irrespective of what its header says. This facility is effected through entries in a file which have the following format:
<user>/<filename>
Comments start with hash (#).
Submit entries for files to be forced to be RANDOM or OneName, together with justification (preferably as a preceding comment), to the update corrections coordinator. Please keep RANDOM and OneName separate and clearly indicate which is required.
This report contains a list of files where some aspect of the file name and/or its content suggests there may be an error in the file (or its name). The report is produced weekly (the date it was produced is given in the preamble).
Please note that many of these issues are also identified when a file is uploaded, so they should not occur for new files. However, the checking of uploaded files has tightened over the years and some old files may still have these issues.
The following is a list of the most common issues found.
Message | Meaning | Example |
---|---|---|
Year mismatch | The year in the file disagrees with the year implied by the file name | In file 1863M30001 the +S line specifies a year of 1873, i.e. +S,1873,Sep |
Quarter mismatch | The quarter (Mar,Jun,Sep,Dec) in the file disagrees with the quarter implied by the file name | In file 1863M30001 the +S line specifies a quarter of March, i.e. +S,1863,Mar |
Event mismatch | The event (Births,Deaths,Marriages) in the file disagrees with the event implied by the file name | In file 1863M30001 the +INFO line specifies an event of Births, i.e. +INFO,,,SEQUENCED,BIRTHS |
Page number mismatch | The page number in the file disagrees with the page number implied by the file name | In file 1863M30001 the first +PAGE line specifies a page number of 10, i.e. +PAGE,10 |
Pages outside range | The file contains a high percentage of lines in which the page is outside the range expected for the district (x - y). Note that this refers to the current update - changing the file will only take effect at the next update | |
No data between +PAGE lines | There are two consecutive +PAGE lines | |
Duplicate page number | The same number appears in two +PAGE lines | |
No page number at start of file | There is no +PAGE line at the start of the data | |
Too many entries between +PAGE | There should be a +PAGE at the start of each page of the index but there are more lines between +PAGE lines than could be on a page of the index | |
Age at Death (or DOB) missing from file | The Age at Death field (or Date of Birth) is missing from all records in a file from 1st Jan 1866 onwards | |
Possible alternative name (alias/or) | A name in the file contains the characters 'alias' or 'or' indicating an alternative name that should be transcribed as two entries. | Bonus or Chapman,John,Aston,6a,312 |
Since these errors relate to the content of the file without reference to other data (e.g. similar transcriptions) the normal corrective action is to arrange for the file to be changed.
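As an illustration, the following minimal sketch performs the filename-versus-header checks in the table above, assuming a yyyyEqpppp naming pattern (with an optional piece letter, as in 1839M3A0024) and the +S/+INFO/+PAGE lines shown in the examples; the regular expression and messages are illustrative, not the actual report code.

```python
# Minimal sketch of filename/header consistency checks (illustrative only).
import re

QUARTER_MONTH = {"1": "Mar", "2": "Jun", "3": "Sep", "4": "Dec"}
EVENT = {"B": "BIRTHS", "D": "DEATHS", "M": "MARRIAGES"}

def check_file(filename, s_line, info_line, first_page_line):
    """Return a list of mismatch messages like those in the report."""
    m = re.match(r"(\d{4})([BDM])(\d)[A-Z]?(\d{4})$", filename)
    if not m:
        return ["Filename not in the assumed yyyyEqpppp form"]
    year, event, quarter, page = m.groups()
    problems = []
    if f",{year}," not in s_line:
        problems.append("Year mismatch")
    if not s_line.endswith(QUARTER_MONTH[quarter]):
        problems.append("Quarter mismatch")
    if EVENT[event] not in info_line.upper():
        problems.append("Event mismatch")
    if first_page_line != f"+PAGE,{int(page)}":
        problems.append("Page number mismatch")
    return problems

print(check_file("1863M30001", "+S,1873,Sep", "+INFO,,,SEQUENCED,MARRIAGES", "+PAGE,1"))
# ['Year mismatch']
```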
The list of suspected duplicate files shows files that have been transcribed by different users but are identical. They are suspect because in many cases this results from the same file being uploaded by two different users (typically the user and their syndicate coordinator). A facility is available to exclude from this report files that are identical but have been individually keyed (a gratifyingly common occurrence); report such situations as described on the page.
A slightly more sophisticated check is done and presented at the end of the report: files whose previous versions were identical. This covers the case where a coordinator uploads the same file as a user but then makes some changes; because the previous versions were the same, this gets reported.
Corrections are normally done by the Syndicate Coordinator.
If the files are the same transcription, one of them needs to be removed; contact the Syndicate Coordinator.
Alternatively, if the two transcriptions are different, follow the instructions on the page.
Alignment is the process of merging the contents of two transcriptions that refer to the same data (normally to the same page of the index). If this has been done but some entries do not match then we get Misalignments.
See here for additional information on alignments.
The Misalignments Report contains, for each quarter, a list of entries that have not been aligned but, being in a similar position on the page, perhaps should be. This putative alignment is often right but sometimes wrong.
Use Make Alignments to align the entries (requires administrative privilege). The Misalignments Report gives access to Make Alignments by control+clicking on an entry, thus considerably simplifying the process of making alignments (see details in the report). Furthermore, cells that have been aligned, or are in the process of being aligned, are coloured.
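For illustration, a minimal sketch of how putative misalignments might be identified follows: entries at the same position on a page in two transcriptions that differ and have not already been aligned. This illustrates the idea rather than the report's actual algorithm.

```python
# Minimal sketch of flagging candidate misalignments (illustrative only).
def candidate_misalignments(transcription_a, transcription_b, aligned_pairs):
    """Pair entries by line position and flag non-identical, unaligned pairs."""
    candidates = []
    for pos, (a, b) in enumerate(zip(transcription_a, transcription_b), start=1):
        if a != b and (a, b) not in aligned_pairs:
            candidates.append((pos, a, b))
    return candidates

a = ["Bonus,John,Aston,6a,312", "Chapman,Ann,Aston,6a,312"]
b = ["Bonus,John,Aston,6a,312", "Chapmen,Ann,Aston,6a,312"]
print(candidate_misalignments(a, b, aligned_pairs=set()))
# [(2, 'Chapman,Ann,Aston,6a,312', 'Chapmen,Ann,Aston,6a,312')]
```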
See here for an explanation of superchunking.
The Superchunk Report lists all the quarters for which Superchunk information is available. Next to each quarter are two numbers in the form (n,m), where m is the number of chunks in the quarter (normally approximately the number of pages) and n is the number of superchunks. The objective is to get n down to 1.
The information about each quarter is arranged in columns:
Column | Contains |
---|---|
1 | the superchunk number |
2 | the files in the first chunk in the superchunk |
3 | the chunk number of the first chunk in the superchunk |
4 | the files in the last chunk in the superchunk |
5 | the chunk number of the last chunk in the superchunk |
The lists of files are arranged in order of SEQUENCED, OneName and then RANDOM. Within SEQUENCED the files are arranged in order of page number (although it is uncommon for a chunk to contain more than one page number, when it does occur it is very useful to have them ordered).
When examining the data we are looking to understand why chunks have not been joined into one superchunk. So typically we would look at the last chunk of one superchunk and the first chunk of the next and try to determine why they have not been joined. There can be numerous reasons which are explained below.
Take care with page numbers; the page number of the file may not be the same as the page number of the chunk, for example if it is the second page of a double page scan.
Postems are linked to entries through the content of the entry. It follows therefore that if an entry changes, any postem for that entry will no longer be linked to it. By far the most common cause of this is that the entry has been corrected. Unfortunately, although understandably, researchers often submit a correction and put the same information in a postem, so when the entry gets corrected the postem becomes unlinked. After each update there will be a new batch of unlinked postems.
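As an illustration of why this happens, the following minimal sketch assumes postems are keyed on the content of the entry; the use of a hash as the key is an assumption for illustration.

```python
# Minimal sketch of a postem becoming unlinked after a correction (illustrative only).
import hashlib

def entry_key(entry):
    return hashlib.sha1(entry.encode()).hexdigest()

postems = {entry_key("Bonus,John,Aston,6a,312"): "See also the Chapman entry"}

corrected = "Bonus,John,Aston,6a,512"        # page corrected from 312 to 512
print(entry_key(corrected) in postems)       # False: the postem is now unlinked
```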
When looking at the Postem listing unlinked postems are shown in red and it is possible to request a listing of only unlinked postems (check box "Unlinked only").
Unlinked postems can be:
In order to do this it is necessary to access the Postem listing with administrative privilege. In this mode facilities are provided to:
Where entries have been transcribed from scans held by FreeBMD the system attempts to link each search result to the scan of the page (or pages) containing the entry. However, this relies on transcribers putting the correct page number at the head of each transcription of a page and, inevitably, errors are made or the rules are not followed. As a result, the selection of the right scan is a complex process.
Where the wrong scan is selected by the system, or no appropriate scan can be found (even though scans are available), researchers are invited to leave feedback, either negative (the scan shown does not contain the entry) or positive (the scan does contain the entry). In addition researchers can attempt to find the right scan themselves and leave positive feedback when they find it.
Because of the way the linking process works, once one entry on a page has been linked to the correct scan (for example, through positive feedback), all other entries on the same page will be correctly linked.
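For illustration, a minimal sketch of this propagation follows; the page_of() lookup and data structures are assumptions, not the actual linking code.

```python
# Minimal sketch of scan links being shared by all entries on a page (illustrative only).
scan_links = {}                              # page identifier -> scan identifier

def page_of(entry):
    """Hypothetical lookup of the index page an entry belongs to (volume, page)."""
    volume, page = entry.rsplit(",", 2)[-2:]
    return (volume, page)

def record_positive_feedback(entry, scan_id):
    scan_links[page_of(entry)] = scan_id     # linking one entry links the whole page

def scan_for(entry):
    return scan_links.get(page_of(entry))

record_positive_feedback("Bonus,John,Aston,6a,312", "scan_0451")
print(scan_for("Chapman,Ann,Aston,6a,312"))  # 'scan_0451' - same volume/page
```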
Facilities are available for those with administrative privilege to view the linkages and provide corrections through the Manage Scan Links page. These facilities are used to correct mistakes made in the feedback process and to provide definitive linkages that feedback cannot change.
There are two formats in which transcribers can record transcriptions: the Standard file structure and the Flat file structure.
The key difference is that the Flat file structure has the quarter defined in each line whereas for the Standard file structure the quarter is defined by special lines that apply to all subsequent lines. Irrespective of the file structure, files may only contain events of one type.
Note that there is no mandatory connection between File Type and File Structure, even though, as recorded above, there is normally a relationship. Hence it is possible, for example, to find SEQUENCED files that use the Flat file structure.
Please see the description of system processes for information on what alignment means.
An accession is created for each set of contiguous entries in a file. For SEQUENCED files this normally corresponds to a page (although +BREAK would affect this). For OneName files it is a contiguous list of entries with the same surname (unless interrupted by +BREAK). For RANDOM files an accession is always a single entry.
It follows, therefore, that each file consists of one or more accessions.
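As an illustration, the following minimal sketch splits a transcription into accessions along the lines described above; the representation of entries and markers is an assumption, not the actual FreeBMD code.

```python
# Minimal sketch of splitting a transcription into accessions (illustrative only).
def split_into_accessions(file_type, lines):
    """lines: transcription entries in order, with '+PAGE'/'+BREAK' markers."""
    if file_type == "RANDOM":
        return [[e] for e in lines if not e.startswith("+")]   # one entry each

    accessions, current, last_surname = [], [], None
    for line in lines:
        if line.startswith("+"):                               # +PAGE, +BREAK, ...
            breaks = (line.startswith("+BREAK") or
                      (file_type == "SEQUENCED" and line.startswith("+PAGE")))
            if breaks and current:
                accessions.append(current)
                current, last_surname = [], None
            continue
        if file_type == "ONENAME":
            surname = line.split(",", 1)[0]
            if current and surname != last_surname:            # surname change
                accessions.append(current)
                current = []
            last_surname = surname
        current.append(line)
    if current:
        accessions.append(current)
    return accessions

print(split_into_accessions("ONENAME", [
    "+PAGE,1", "Jones,Ann,Aston,6a,312", "Jones,Eli,Aston,6a,313",
    "Smith,Tom,Aston,6a,314"]))
# [['Jones,Ann,Aston,6a,312', 'Jones,Eli,Aston,6a,313'], ['Smith,Tom,Aston,6a,314']]
```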
A chunk is a set of contiguous entries displayed in search results. A fuller explanation of chunks is given here.
Where the syntax of a line is specified above, the following conventions are used:
<datetime> is a date and time in the form DD/MM/YY hh:mm.