This page describes how the system processes work within FreeBMD. System processes are
those processes that provide the means to handle the more complex aspects of data
handling with the FreeBMD database. Some aspects of these processes are only available
to users with enhanced privilege.
System Entries
What are System Entries?
System entries are entries in the database that do not correspond to an entry
in the GRO index but which have been specifically inserted into the database to correct
an error that is presumed to exist in the GRO index. For example, where information from
a certificate indicates that the entry in the index is incorrect, a system entry can be
inserted to provide the correct information. Whilst this can sometimes be handled by a
postem, a system entry needs to be created where the error would make the entry
unsearchable, e.g. because the spelling of a name is wrong.
How do System Entries work?
A system entry in the database has a flag set. In the results listing it has a special
marker to indicate its special status. If the information button
for such an entry is clicked, an explanation is given of the special status of the
entry. If a system entry has a link associated with it, pointing to the entry
that is corrected, then the information will include a reference to the corrected
entry.
Where a link exists, then the entry referred to will have a special marker in the
search results and the information displayed will contain information about the system
entry, including a clickable link that will perform a search to locate the system entry.
How are System Entries created?
System entries are those entries that exist under special user accounts that have a
type of SystemEntry. Note that the type of a user account is different from
the role of a user account. The type defines how entries under that user
account are handled, whereas the role indicates the privileges that the account
has.
All entries in all files under a user account of type
SystemEntry are system entries. There is no facility to opt out.
In order to create the link mentioned above a special #THEORY line is used.
Should these special #THEORY lines occur in ordinary user accounts they
are treated as ordinary #THEORY lines. The format of these special
#THEORY lines is as follows:
#THEORY,LINK comment,year,quarter,event,record
Where
comment
is an optional comment that will be included
in the information about the system entry and the entry the link refers to
year,quarter,event
defines the year, quarter and event
of the entry being referred to - quarter and event can be alphabetic or numeric
record
is the entry being referred to - this must contain
all the fields, in the correct order, for the year, quarter and event for the entry
being referred to (which may be different from the system entry)
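As an illustration of the format above, the following minimal sketch splits a #THEORY,LINK line into its parts. It is not FreeBMD's own code. The names parse_theory_link and TheoryLink are hypothetical, the sample values are made up, and it assumes that the comment contains no commas and that an omitted comment leaves an empty first field.

```python
from typing import NamedTuple


class TheoryLink(NamedTuple):
    comment: str
    year: str
    quarter: str
    event: str
    record: str


PREFIX = "#THEORY,LINK"


def parse_theory_link(line: str) -> TheoryLink:
    """Split a #THEORY,LINK line into its five parts (illustrative only)."""
    if not line.startswith(PREFIX):
        raise ValueError("not a #THEORY,LINK line")
    rest = line[len(PREFIX):].lstrip()
    # Assumption: the comment contains no commas, so the first four commas
    # separate comment, year, quarter and event; the remainder is the record.
    parts = rest.split(",", 4)
    if len(parts) != 5:
        raise ValueError("expected comment,year,quarter,event,record")
    return TheoryLink(*(p.strip() for p in parts[:4]), parts[4])


# Made-up sample line; the field values are not taken from the index.
link = parse_theory_link(
    "#THEORY,LINK see certificate,1852,1,B,HOOF,John,Stepney,1c,25"
)
print(link.year, link.quarter, link.event, link.record)
```

Keeping the record as a single string reflects the rule that it must contain all the fields of the entry being referred to, in that entry's own order.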
It is expected that files containing System Entries will be ONENAME files,
although they could be RANDOM. They must not be SEQUENCED. The use of ONENAME
files means that where there is a block of System Entries the order will be preserved.
+BREAK should be used as appropriate, i.e. if there are contiguous entries in
the file for the same surname, but these are not contiguous in the index.
It is possible to have multiple links for a single System Entry. Just list
the #THEORY,LINK lines after the System Entry.
More than one link may refer to the same entry and thus there would be more
than one System Entry that is associated with the entry. This could occur
if the correction was not certain and there were two or more possible
corrections that should be shown.
System Entry special facilities
When editing or loading entries using FileManagement
under a user account with the type SystemEntry a special facility is
available to help with the creation of the #THEORY,LINK lines as
described below.
Open the file to contain the system entries using View/Edit
Insert a +U line to define the year and quarter of the entries (the +INFO line
of the file determines the type)
Insert the entries that are to become the system entries (e.g. paste them
from another file)
If the surname of the system entries is different from the entries inserted
(a common case) enter the revised surname in the Surname box
Select the entries
Click on the Insert button
This will cause #THEORY,LINK lines to be added after each entry. If
a surname was entered in the Surname box the surname of the original
entry (not the link) will be changed to this surname.
The #THEORY,LINK lines can now be edited, for example to add a comment
or to change the year and quarter.
The following example might help to clarify:
First the +U line is inserted
Then the entries are pasted in. These entries are in the index as HOOF
but they should be HOOK so we are creating System Entries for HOOK which will
be linked to these erroneous (we assert) HOOF entries.
To create the System Entries put HOOK in the Surname box, a
comment in the Comment box and select the entries.
Clicking on the Insert button has caused System Entries to be created
consisting of the original with the surname changed to HOOK and links from these
to the original entries.
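The effect of the Insert button in the HOOF/HOOK example can be sketched roughly as follows. This is an illustration, not the FileManagement code. The function insert_links is hypothetical, the sample entry and the quarter and event values are made up, and it assumes that the surname is the first comma-separated field of an entry and that an empty comment is simply omitted.

```python
def insert_links(entries, year, quarter, event, surname="", comment=""):
    """Return file lines: each entry (optionally re-surnamed) followed by a
    #THEORY,LINK line that refers back to the original entry."""
    lines = []
    for original in entries:
        fields = original.split(",")
        if surname:
            # Assumption: the surname is the first comma-separated field.
            fields[0] = surname
        lines.append(",".join(fields))
        # Assumption: with no comment the line starts "#THEORY,LINK,".
        prefix = f"#THEORY,LINK {comment}".rstrip()
        lines.append(f"{prefix},{year},{quarter},{event},{original}")
    return lines


# Made-up entry standing in for one of the erroneous HOOF entries.
for line in insert_links(
    ["HOOF,John,Stepney,1c,25"], "1852", "1", "B",
    surname="HOOK", comment="see certificate",
):
    print(line)
```

Note that only the system entry takes the new surname; the record inside the link keeps the original HOOF spelling so that it can resolve to the entry being corrected.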
System Entry errors
If the link (in #THEORY,LINK) does not resolve to another entry in the
database during the update, an entry is placed in a report that is produced. This
report is available here.
Assisted Alignments
What is Alignment?
Alignment is the process that takes place during an update to match entries
from different transcriptions to determine that they represent the same entry in the
index. Logically the process puts two transcriptions side by side and attempts to get
the best correspondence. The two transcriptions are then merged to give a composite
with equal entries shown as one double-keyed entry and differing entries shown separately
as single keyed. So in this example
most of the entries have been aligned except for the one where the given name is
spelt differently.
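The side-by-side merge described above can be illustrated with a small sketch. This is not the algorithm the update actually uses. It uses Python's standard difflib to find the best correspondence, the function name align is hypothetical, and the entries other than the two Austin ones are made up.

```python
from difflib import SequenceMatcher


def align(first, second):
    """Merge two transcriptions into (entry, times_keyed) pairs."""
    merged = []
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical in both transcriptions: one double-keyed entry.
            merged.extend((entry, 2) for entry in first[i1:i2])
        else:
            # Differing entries are kept separately as single-keyed.
            merged.extend((entry, 1) for entry in first[i1:i2])
            merged.extend((entry, 1) for entry in second[j1:j2])
    return merged


file_a = ["Austin, William", "Austin, Zechariah", "Avery, Ann"]
file_b = ["Austin, William", "Austin, Zachariah", "Avery, Ann"]
for entry, keyed in align(file_a, file_b):
    print(keyed, entry)
```

In this sketch the two spellings of the given name both survive as single-keyed entries, while the entries around them are merged.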
What is Misalignment?
Misalignment is where two entries look like they should be representing the
same entry but they are not the same. In the above example Austin, Zechariah and
Austin, Zachariah are misaligned. The update produces a
report that gives for each quarter the misalignments the system has found. Some
of these are accurate and some are not. Where there has been a simple typing error
then the report has good results but where a transcriber has re-ordered entries
it gives poor results.
What is Assisted Alignment?
Assisted Alignment is where the system is told to align two (or
more) entries that are not the same. In the above example it would be told to treat
Austin, Zechariah and Austin, Zachariah as if they were the same; it is also told
which one is to be displayed in the search results. Clicking on the
button will show the original transcriptions with
a note concerning the alignment of the entries.
How is Assisted Alignment done?
There are two methods for creating Assisted Alignments:
match two (or more) existing database entries
specify in a transcription file that an entry is to be aligned with another entry
which will cause the entries referenced to be aligned at the next update of the
database.
In order to match existing database entries go to
the alignments page
and follow the instructions there. Entries can be selected either from the
misalignments page or
from the search results.
In order to align with an entry in a file add the following after the entry
#THEORY,ALIGN,entry
where entry is the entry that the entry in the file is to be aligned
with. Both entries must be in the same quarter. The format for entry must be
the same as in the rest of the file, including the year, quarter and event
if present. The content of entry must be the same as the original transcription,
for example if the original has Roman numerals for the page then so must entry.
There are two exceptions to this; letter case does not have to be the same and any
number of spaces will match (hence Jack Jones will match with
jackjones).
For example, a #THEORY,ALIGN line that names the Zechariah entry, placed after a
Zachariah entry in the file, would cause an entry for Zachariah to be put in the
database and it would be aligned with the Zechariah entry. The entry being inserted
(the Zachariah one in this case) is considered to be the correct one
and is the one that will appear in the search results.
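The two exceptions to exact matching described above (letter case and spaces) can be sketched as a simple normalisation step. The function names are hypothetical and this is only an illustration of the rule, not the matching code used by the update.

```python
def normalise(entry: str) -> str:
    """Drop all whitespace and fold case so that only the content is compared."""
    return "".join(entry.split()).lower()


def same_entry(candidate: str, original: str) -> bool:
    """True if the two strings match apart from letter case and spacing."""
    return normalise(candidate) == normalise(original)


print(same_entry("Jack Jones", "jackjones"))
print(same_entry("Jack Jones", "Jack Johns"))
```

Everything else, such as Roman numerals for the page, is compared as written, so the sketch deliberately does no further conversion.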
UCF Alignments
We said under Misalignments above that entries will not be aligned by the system
unless they are the same. This is not exactly true. If two entries would be reported
as misaligned (Austin,Zechariah and Austin,Z_chariah in the above example) but
they match according to UCF rules then they are considered to be the same. So in our
example if Austin,Zechariah had been Austin,Z_chariah the entries would have been
automatically aligned.
The system produces a report of alignments achieved
in this way.
Note that UCF Alignments will not align entries unless those around them are aligned
in the normal way.
So if Austin,Zechariah and Austin,Z_chariah had been the only entries in each of
two files they would not have been UCF aligned.
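The UCF comparison in the example above can be sketched as follows. This covers only the underscore shown in the example and assumes that it stands for exactly one uncertain character. Any other UCF notation is outside the sketch, the function name ucf_match is hypothetical, and the requirement that surrounding entries are already aligned is not modelled.

```python
import re


def ucf_match(pattern: str, name: str) -> bool:
    """True if name fits pattern, where "_" stands for one uncertain character."""
    # Every other character is escaped so that it only matches itself.
    regex = "".join("." if ch == "_" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, name) is not None


print(ucf_match("Z_chariah", "Zechariah"))
print(ucf_match("Z_chariah", "Zachariah"))
print(ucf_match("Z_chariah", "Zchariah"))
```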
Superchunks
What is a chunk, let alone a superchunk?
Once the entries from files have been aligned the result is a chunk. In a simple
case the chunk would be slightly bigger than the biggest file (because some entries
that refer to the same entry in the index are not identical). However, if there are
several files involved the chunk could be larger, for example here we show a chunk
that is larger than any of the three constituent files:
When search results are presented there may be a change of colour which is described
in the legend as "a possible discontinuity" in the data. This discontinuity is where
data comes from different chunks because there is no guarantee that between two
adjacent chunks there is not a missing set of entries (e.g. a page of the index).
So, what's a superchunk?
Once the chunks have been produced the update tries to stitch them together into
superchunks consisting of a number of chunks. The rules for doing this are
relatively pragmatic, using such things as
No gaps between page numbers (taking into account suffixes, e.g. 44a)
No gaps between filenames (e.g. 1852B20025 and 1852B20026)
Same surname at the end of one file and start of another (provided it is not a
surname that has entries over more than a file)
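As a rough illustration of one of these rules, the sketch below groups chunks into superchunks using only the filename test. It is a simplification rather than the update's real logic. It treats each chunk as a single file, assumes that a filename ends in a run of digits that acts as a sequence number, and uses hypothetical function names.

```python
import re


def split_name(filename):
    """Split a name such as 1852B20025 into ("1852B", 20025)."""
    match = re.fullmatch(r"(.*?)(\d+)", filename)
    if match is None:
        raise ValueError(f"no trailing number in {filename!r}")
    return match.group(1), int(match.group(2))


def adjacent(first, second):
    """True if second directly follows first with no gap in the numbering."""
    prefix_a, number_a = split_name(first)
    prefix_b, number_b = split_name(second)
    return prefix_a == prefix_b and number_b == number_a + 1


def stitch(filenames):
    """Group an ordered list of chunk filenames into superchunks."""
    superchunks = []
    for name in filenames:
        if superchunks and adjacent(superchunks[-1][-1], name):
            superchunks[-1].append(name)
        else:
            superchunks.append([name])
    return superchunks


print(stitch(["1852B20025", "1852B20026", "1852B20028"]))
```

A break in the output, here before the last file, marks a place where a file may be missing or misnamed.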
Why is a superchunk useful?
The objective is to get every quarter to have just a single superchunk which would
mean that there were no pages missing and no extraneous pages (e.g. pages that
belong to another quarter). The update produces a
report that shows what superchunks have been
produced so we can investigate the places where superchunks have "broken" and thus
find gaps and errors in the transcriptions. The two figures after each quarter are
the number of superchunks and the number of chunks - the objective is to get the
first to be 1!
How is the number of superchunks reduced?
Basically by looking at the report, working out why the chunks have not been stitched
together and correcting the issue. Typical reasons are:
Transcription from the wrong quarter
Random transcriptions coded as Onename or Sequenced