This page describes the main format for submission of transcriptions to FreeBMD.
However, if you are using one of the add-on packages to do
transcription, for example WinBMD or MacBMD-X, you do not need to understand all the detail
on this page, although it will be a useful reference if you have problems.
For help with using one of the add-on packages, please see
these help instructions or use the package's own help.
If you already have entries which you need to submit to FreeBMD, you may find the
alternative flat format more convenient.
Help with transcribing
This page describes the format of data to be submitted to FreeBMD. You can produce
these formats using your favourite wordprocessor, spreadsheet or database. There
will almost certainly be a way to "save as" or "export" to comma delimited ASCII
text. Alternatively, use one of the previously mentioned add-on
packages written by FreeBMD volunteers.
If you are new to transcribing we suggest that you prepare a small number of entries
and try submitting them. This way, if you haven't quite achieved the correct layout,
there won't be too much to alter. The submission mechanism will also help, to some
extent, to identify any problems. If you can't resolve the
errors, check the Transcribers' Knowledge Base or email the
mailing list; see here for how to subscribe to the
FreeBMD Advice mailing list.
Definition of the Standard Format
The standard format consists of
a header
which gives information that applies to the whole file
one or more source information lines
each of which identifies the source and date of the
following data lines
data lines
which give the actual entries together with information about the
context of the entries (e.g. where pages start in the index)
Example
This is an example of a submission in the standard format (note that the ellipses (...)
indicate other entries omitted from the example).
Note that there is a variation of the standard format called (for reasons
lost in antiquity) the Flat Format. This format is particularly useful
for files that consist of unrelated individual entries and its definition is
here.
Conventions used in describing the Standard Format
In the following sections, looking at the different parts of the Standard Format,
the conventions used are these:
Text in italics should be typed exactly as shown.
Fields written in plain or underlined text are where
your information goes.
The underlined sections must be entered, or your submission
will be rejected.
EMail
is the email address of the transcriber and it is
optional (it can just be omitted).
Password
should not be used and should be omitted.
(This field is a hangover from early in the project's history.)
Sequenced
is one of SEQUENCED,
RANDOM or ONENAME.
This information is used to assist with the correlation of transcriptions: it determines which entries have been double keyed and so also helps to identify suspect entries.
SEQUENCED
Should be used to transcribe complete pages from the index. If only part of a page has been transcribed, +BREAK should be used to indicate where the index contains more entries than the transcription.
RANDOM
Should be used to transcribe entries that are not related to the location in the index page, for example where only isolated entries from a particular surname are being transcribed. UCF characters are not permitted in surnames in RANDOM files.
ONENAME
Should be used to transcribe sections of pages that relate to a single name. Normally the source specifier +B, which indicates a transcription from the paper indexes, should be used. Several years/quarters can be mixed in a file by using multiple +B lines. (Note that different events, i.e. Births, Deaths and Marriages, cannot be mixed in a file.) +BREAK should be used if a section of entries is omitted (e.g. between transcriptions of Brown, John and Brown, Martin). The system inserts an implicit +BREAK when the surname changes, and because of this UCF characters are not permitted in surnames in ONENAME files.
Please note:
Only transcriptions from index pages are allowed to be uploaded to FreeBMD.
The Flat File format can be used with any type of file, although it is most commonly used with RANDOM.
See below for use of +PAGE (required particularly for SEQUENCED).
For ONENAME and SEQUENCED, the entries in a sequence (between +BREAK or +PAGE) cannot all contain UCF characters.
RecordType
is one of BIRTHS,
MARRIAGES, or DEATHS.
CharacterSet
is a supported character set (ISO 8859-1 assumed if omitted).
FreeBMD's standard character set is ISO 8859-1, but most others are supported. In particular,
if you are transcribing on a DOS or Windows machine (unless it is Windows NT - except
in a DOS window under NT, of course!), it is pretty likely that you are using the character
set known as "code page 850" - in which case, use cp850 in this field.
Macintosh users should probably use macintosh, but this may vary according to
software used. This information is used to correctly recognise accented characters. This is a
complex area, so if you need to use some other character set, please contact us at for advice.
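As a rough illustration of the rule, noted under Sequenced above, that the entries in a sequence between +BREAK or +PAGE lines cannot all contain UCF characters, the Python sketch below splits a file's lines into sequences and reports any sequence in which every entry contains a UCF character. It is not FreeBMD's own checker: the function names are hypothetical, lines starting with + or # are assumed to be non-data lines, only the characters _ * [ ] { } are counted as UCF characters, and the implicit +BREAK at a surname change in ONENAME files is not modelled.

```python
import re

# Assumption: these characters mark UCF; "?" (an empty field) is not counted.
UCF_CHARS = re.compile(r"[_*\[\]{}]")


def is_data(line: str) -> bool:
    """Treat lines starting with + (directives) or # (comments) as non-data."""
    return not line.startswith(("+", "#"))


def split_sequences(lines):
    """Group data lines into sequences delimited by +BREAK or +PAGE lines."""
    groups, current = [], []
    for line in lines:
        if line.startswith(("+BREAK", "+PAGE")):
            if current:
                groups.append(current)
            current = []
        elif is_data(line):
            current.append(line)
    if current:
        groups.append(current)
    return groups


def all_ucf_sequences(lines):
    """Return the sequences in which every entry contains a UCF character."""
    return [
        seq
        for seq in split_sequences(lines)
        if all(UCF_CHARS.search(entry) for entry in seq)
    ]
```

With the hypothetical lines `["+PAGE,1", "SMITH,John", "SM_TH,Jane", "+BREAK", "JON*,Ann", "+PAGE,2"]`, only the one-entry sequence `["JON*,Ann"]` would be reported.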
When other people have been involved in producing the data, or the file,
then this field can be used to credit them. The information put here will be
available to users of the search system.
+CREDIT,Name,EMail,Comment
If there is a Credit line present then the entries will be credited to the
transcriber identified by the credit line, otherwise they will be credited
to the submitter.
Name
The name of the person who actually transcribed the entries
EMail
The email address of the person who actually transcribed
the entries
Comment
One of the following values:
CreditAnon
Don't report Name or EMail
Credit or CreditReport
Report only Name
CreditInvite
Report Name and EMail and invite research enquiries
Other
Credit line is ignored
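The Comment values above amount to a small lookup table. The Python sketch below shows one way to interpret a +CREDIT line; it is illustrative only, the function name is hypothetical, and matching the Comment value case-insensitively is an assumption rather than documented behaviour.

```python
import csv

# Assumption: the Comment value is matched case-insensitively.
# Each value maps to (report name, report email, invite enquiries).
CREDIT_RULES = {
    "creditanon": (False, False, False),
    "credit": (True, False, False),
    "creditreport": (True, False, False),
    "creditinvite": (True, True, True),
}


def parse_credit(line):
    """Interpret a +CREDIT line; return None when the line would be ignored."""
    fields = next(csv.reader([line]))
    if len(fields) != 4 or fields[0] != "+CREDIT":
        raise ValueError(f"not a +CREDIT line: {line!r}")
    _, name, email, comment = fields
    rule = CREDIT_RULES.get(comment.strip().lower())
    if rule is None:
        return None  # any other Comment value: the credit line is ignored
    report_name, report_email, invite = rule
    return {
        "name": name if report_name else None,
        "email": email if report_email else None,
        "invite": invite,
    }
```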
Source Information
Entries can be gathered, for example, from microfiche, microfilm or the original
index books and the source information defines the type of the source and some
additional information. For each type the following information is mandatory:
Year
The year of the transcribed information.
Quarter
The quarter in which the information appears in the index,
one of March, June, September
and December. However this field is empty if the year is from 1984 onwards
(when index pages are for the whole year).
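The relationship between Year and Quarter can be checked with a few lines of code. This Python sketch is a hypothetical helper, not part of FreeBMD, and it assumes the quarter names are spelled exactly as listed above.

```python
# Assumption: quarter names are compared exactly as spelled in this help page.
QUARTERS = {"March", "June", "September", "December"}


def quarter_is_valid(year: int, quarter: str) -> bool:
    """Check the Quarter field of a source line against its Year."""
    if year >= 1984:
        return quarter == ""  # index pages cover the whole year
    return quarter in QUARTERS
```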
Some types have optional fields as follows
Source
Where the fiche/film/book was accessed - this is to allow
the possibility in the future of identifying different versions of the sources, which
may be useful for error correction.
TranscriptionDate
The date the transcription was (most recently) created. Any format accepted for network time can be used but dd-mmm-yyyy (e.g. 07-Jun-2022) is recommended.
This source type is only for use with scans provided by the
FreeBMD project.
FreeBMDReference
is allocated by us to keep track of
the various scans and to ensure, when there is more than one set of scans,
we can tell which one was used for the transcription.
The FreeBMDReference
for a scan occurs between the month and either the range or scan file name, so for
It is permitted, although not required, to include the scan filename,
separated by space or / or \. Thus
LDS-211-000-0951147/1893b3-001.tif
is a valid FreeBMDReference,
although the filename must be the actual name of
the scan file, even if this differs
from the +PAGE value in the transcription.
When a file is uploaded, if FreeBMDReference does not conform
to these rules it will be ignored.
Each data line is transcribed using the rule "Type What You See", that is the
line should be an accurate representation of what is in the index. If you think
what is in the index is wrong you can add a #THEORY line
but the entry itself should still be what you see.
In applying "Type What You See" you do not transcribe:
Commas between fields;
The rows of identical dots that separate fields in the later printed index
(see below); or
Full stops after Age, Volume or Page Number.
These are all merely data separators, and carry no data value.
Note also:
Victorian handwriting used what looks to our 21st century eyes like "fs"
to represent "ss" - transcribe as "ss";
Raised letters, with or without dots beneath, are typographical conventions -
just transcribe the letter;
Apart from volume numbers in Roman numerals (which must be in upper case), the case of a letter does not affect the meaning - transcribing the case (upper
or lower case) as seen is preferable but not critical.
Alternatives (e.g. 1704 & 1836) or aliases (e.g. BONUS alias CHAPMAN) are normally transcribed as
two records; click on the appropriate link for more details.
Where a field contains a question mark (?), special rules apply.
Where surnames are only shown when they change, consider the surname to be at the start of
every line following until it changes.
Accented characters can be used in some fields (e.g. name fields). Here
is the standard character set, but
almost any known set can be used (see +INFO above).
Commas within fields are permitted so long as that's how they appear in
the source. Put the contents of the whole field in quotes. e.g. "St. Geo., Hanover Square"
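If you produce your file from a program rather than by hand, a standard CSV writer will add these quotes for you. The Python sketch below uses the standard `csv` module; the field order in the usage example (surname, forenames, district, volume, page) is hypothetical and only for illustration.

```python
import csv
import io


def data_line(fields):
    """Serialise one entry, quoting only fields that contain a comma."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()
```

For example, `data_line(["SMITH", "John", "St. Geo., Hanover Square", "1", "123"])` gives `SMITH,John,"St. Geo., Hanover Square",1,123` followed by a newline.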
If you are transcribing using transcription software, like WinBMD, you do not
need to be concerned about which fields are required because the software will
show the fields relevant to the event, year and quarter you are transcribing.
The one exception is Deaths entries before 1866, which do not have an
Age at Death field in the GRO index. However, for historical reasons, FreeBMD
transcriptions of Deaths before 1866 still have an Age at Death field, which is
transcribed as a blank field.
Over the years different fields have been present in the index and a list of
the differences can be found here.
Where there is a full stop at the end of a name that is not part of the row of
full stops that separates fields (in the later printed index), it should be transcribed.
Examples:
Smith John.......Aston,6d 999
Smith John J.....Aston,6d 999
These have NO full stop after the name, so none should be transcribed.
Smith John. .....Aston,6d 999
Smith John J. ...Aston,6d 999
These have a full stop after the name, which should be transcribed.
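The distinction in the examples above is whether a space separates the name from the row of dots. The Python sketch below applies that reading to a printed line; the function name is hypothetical, and it assumes the row of separator dots is always at least two dots long.

```python
import re

# The name is the shortest prefix ending in a non-space character that is
# followed (after optional spaces) by a row of two or more separator dots.
LEADER = re.compile(r"^(?P<name>.*?\S)\s*\.{2,}(?P<rest>.*)$")


def split_printed_line(line):
    """Split a printed index line into (name, rest), keeping a real full stop."""
    m = LEADER.match(line)
    if m is None:
        raise ValueError(f"no row of separator dots found in {line!r}")
    return m.group("name"), m.group("rest")
```

Applied to the four examples, the first two give names without a full stop (`Smith John`, `Smith John J`) and the last two keep it (`Smith John.`, `Smith John J.`).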
Start/End Of Page
This is used in a SEQUENCED dataset at the start and end of each
page to assist in collating entries, linking the transcription back
to its originating scan or fiche and estimating completeness. At the start of a
transcription, at the end of the transcription and at each new page within the transcription (if any) enter
+PAGE,PageNumber
where PageNumber is
Source
PageNumber
Scan
the last part of the name of the scan file, which is numeric possibly followed by a letter A or B, e.g. for a scan file named 1840M2-L-0243a.gif the page would be 243a. Please note:
ignore text such as "rescan" in the scan file name
only put in a letter suffix ("a" in the above example) if it occurs in the scan file name
if a scan is split across several images, normally suffixed w,x,y,z, transcribe the entries from all the images as one page with a page number without the suffix
if the scan is of a double page each page should start with +PAGE, the second page being one greater than the first
Fiche or film with sequential page numbers
sequential page number
Other
start with page number 1, increasing by 1
The page number should be the page number of the data that follows the +PAGE. If the +PAGE is at the end of the transcription it should be one greater than the last page number transcribed.
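The scan rules above can be sketched in Python as follows. This is an illustration under assumptions, not the logic used by FreeBMD or its transcription software: the function name is hypothetical, "rescan" is assumed to be removable wherever it appears in the name, and a w/x/y/z split-image suffix is assumed to follow the page number directly. The double-page rule (second page one greater than the first) is not covered.

```python
import re
from pathlib import PurePath


def page_from_scan(filename: str) -> str:
    """Derive the +PAGE number from a scan file name (illustrative sketch)."""
    stem = PurePath(filename).stem  # e.g. "1840M2-L-0243a"
    stem = re.sub("rescan", "", stem, flags=re.IGNORECASE)  # ignore "rescan"
    # Trailing digits, an optional A/B page suffix, then an optional
    # w/x/y/z split-image suffix (assumed to sit directly after the page).
    m = re.search(r"(\d+)([ab]?)[wxyz]?[-_ ]*$", stem, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"no page number in {filename!r}")
    digits, suffix = m.groups()
    # Drop leading zeros; keep the letter suffix only if it is in the name.
    return str(int(digits)) + suffix
```

For the example in the table, `page_from_scan("1840M2-L-0243a.gif")` returns `243a`.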
Comments
You may put in comments if you wish; there are three different types of comment
each of which has a different use.
Line starts with #COMMENT
Used to indicate that what has been transcribed differs in some way from what
is in the index, e.g. #COMMENT handwritten addendum says "see Mar 1887". This type of
comment will be accessible from the search results.
Line starts with #THEORY
Used to indicate that what has been transcribed is what is in the index
but there is reason to believe the index is wrong, e.g. #THEORY surname should probably be
Lane not Laine due to name sequence. The reason for the assertion should be given.
However, #THEORY should not be used to record possible errors in the district,
volume or page number; such errors are handled automatically by the system.
This type of comment will be accessible from the search results.
Line starts with # (not followed by COMMENT or
THEORY)
Used to give information about the transcription, e.g.
# scan got very faint at
this point. This type of comment will not be accessible from the search results.
Transcription software uses this type of comment to include information about
the transcription immediately after the header with a format of
#,volumeformat,username,syndicatename,filename,date,flag,flag,flag,flag,flag,softwareid
FreeBMD only takes note of the syndicatename and softwareid from this line. The syndicatename is the name of the syndicate for which the transcription was done. The softwareid identifies the transcription software by its name (optional; for historical reasons WinBMD is assumed if it is omitted) and its version number (a dot-separated version number with at least two components, e.g. 7.3.1). The name and version can be separated by a space or a dot, or by nothing if the name ends in a letter. The name is case insensitive.
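A minimal Python sketch of reading those two fields, assuming the line is plain comma-separated text with the field positions shown in the format above. The function names are hypothetical and the field values in the usage example are invented.

```python
import csv
import re

# Optional name, optional " " or "." separator, then a dotted version with
# at least two numeric components. The non-greedy name leaves all trailing
# digits to the version, so a name without a separator ends in a non-digit.
SOFTWARE_ID = re.compile(r"^(?P<name>.*?)[ .]?(?P<version>\d+(?:\.\d+)+)$")


def parse_software_id(software_id: str):
    """Return (name, version tuple), or None if the id is not recognised."""
    m = SOFTWARE_ID.match(software_id.strip())
    if m is None:
        return None
    name = m.group("name") or "WinBMD"  # historical default when omitted
    version = tuple(int(part) for part in m.group("version").split("."))
    return name, version


def read_software_comment(line: str):
    """Pick syndicatename (4th field) and softwareid (12th field) from the line."""
    fields = next(csv.reader([line]))
    if len(fields) < 12 or fields[0] != "#":
        return None
    return fields[3], parse_software_id(fields[11])
```

The name is returned as written; since it is case insensitive, compare it with `casefold()` on both sides.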
There is more information about how to write appropriate comments on
the Comments Help page.
Using #COMMENT
Note that the # must be the first character on the line and that the
comment applies to the immediately preceding entry. For a comment that
applies to the immediately preceding entry plus the entries that follow it, N entries
in total (including the preceding one), use the following form:
#COMMENT(N)
Using #THEORY
#THEORY (typed exactly as shown) is a special type of comment that you can use to
identify a record which you think might be wrong, that is the entry is perfectly
readable but you think the original source is wrong. Using #THEORY makes the record easy
to identify, and the theory is displayed with the information about the
entry.
For example, you might be transcribing a page of the name JONES, when you come across the following:
JONE, Albert, District, 1a, 123,
after which the name JONES continues.
Following the rule of 'type what you see', you should type JONE, but then if you wish, you can insert in the row immediately following:
#THEORY Surname should be JONES.
If more than one record is affected, say N records, use the following form: #THEORY(N)
So, continuing our previous example (but with three erroneous Jone entries):
Note that the #THEORY(N) is put after the first record but N is the total number of records including the first.
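The counting rule shared by #COMMENT(N) and #THEORY(N) (the entry immediately before the comment, plus the following entries, N in total) can be sketched in Python. The function names and the sample data lines are hypothetical; lines starting with # or + are assumed not to be entries.

```python
import re

# "#COMMENT" or "#THEORY", optionally followed by "(N)".
SCOPED = re.compile(r"^#(?:COMMENT|THEORY)(?:\((\d+)\))?")


def is_data(line: str) -> bool:
    """Treat lines starting with # (comments) or + (directives) as non-data."""
    return not line.startswith(("#", "+"))


def comment_scope(lines):
    """Map each #COMMENT/#THEORY line index to the data-line indices it covers."""
    data_idx = [i for i, line in enumerate(lines) if is_data(line)]
    scopes = {}
    for i, line in enumerate(lines):
        m = SCOPED.match(line)
        if m is None:
            continue
        n = int(m.group(1) or 1)
        # Position (within data_idx) of the entries that precede the comment.
        before = [k for k, d in enumerate(data_idx) if d < i]
        if not before:
            continue  # nothing precedes the comment
        start = before[-1]
        scopes[i] = data_idx[start:start + n]
    return scopes
```

With three JONE lines and a #THEORY(3) line after the first of them, the theory covers exactly those three entries and not the JONES entry that follows.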
Using #THEORY,REF
A special form of #THEORY is used where the data line contains a
late entry reference that does not
conform to a format recognised by FreeBMD. If the reference is not recognised,
a warning is given when the file is uploaded. An
example of a recognised late entry reference is see M/88.
If, however, it was see Septbr/95 then it would not be
recognised. To indicate the correct reference, a line with exactly the
following format is added after the entry:
#THEORY,REF,see S/95
where the quarter must be M, J, S or D and the year must be either two or four digits.
Multiple lines can be covered with #THEORY(N).
Uncertain Character Format cannot be used although multiple
#THEORY,REF lines can be used to cover alternatives.
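The required shape of a #THEORY,REF line is regular enough to check with a pattern. The Python sketch below is a hypothetical validator; in particular, placing (N) directly after THEORY, as in #THEORY(2),REF,see D/1895, is an assumption about how #THEORY(N) combines with REF.

```python
import re

# Assumption: "(N)" may follow THEORY, as in #THEORY(N),REF,...
THEORY_REF = re.compile(r"^#THEORY(?:\((\d+)\))?,REF,see ([MJSD])/(\d{2}|\d{4})$")


def parse_theory_ref(line: str):
    """Return (count, quarter, year) for a well-formed #THEORY,REF line, else None."""
    m = THEORY_REF.match(line.strip())
    if m is None:
        return None
    count, quarter, year = m.groups()
    return int(count or 1), quarter, year
```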
Data Breaks
In a SEQUENCED or ONENAME dataset, there
may be breaks in the data - that is, where there are entries in the index that
are not in the dataset you are submitting (where you broke off transcribing,
where part of your source was missing, or whatever). These are indicated with:
+BREAK
In a RANDOM dataset, there is implicitly a
+BREAK between each entry.
What should I call my upload file?
If you are using transcription software (such as WinBMD, MacBMD, etc.), the
filename will be generated for you in a format that complies with
the requirements of this section, and you need read no further.
The upload system will accept files with names in any format; however, it is good
practice, and helps those managing the system, if filenames comply with the
standards of this section. The standards below apply to SEQUENCED files (i.e.
transcriptions of pages of the index). ONENAME and RANDOM files, which normally
contain data for a one-name study, can be named as appropriate (e.g. the surname of the
study or the name of the researcher).
Standard format for SEQUENCED files
The standard format for the filename of a SEQUENCED transcription file has the
following layout
<year><event><quarter><letter><page>
where
<year>
is the four digit year
<event>
is the event, one of B, M or D
<quarter>
is the quarter, one of 1, 2, 3 or 4 (omitted for 1984 onward)
<letter>
is the first letter of the first surname in the transcription, in upper case; however, if the first surname is omitted (which is transcribed as "?") or is "Unknown", use "UNK" instead (although, due to software limitations, "U" is also acceptable)
<page>
is the page number of the scan from which the transcription was done
(prefixed with zeros to make four digits), including any letter suffix
Examples 1837M3P2034 1912B10034A 1988BJ0312
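The layout can be assembled mechanically from its parts. The Python sketch below is a hypothetical helper, not part of the transcription software; the surnames in the usage examples are invented, and a page's letter suffix is kept exactly as given.

```python
import re


def standard_filename(year: int, event: str, quarter, first_surname: str, page: str) -> str:
    """Build <year><event><quarter><letter><page> for a SEQUENCED file."""
    if event not in ("B", "M", "D"):
        raise ValueError("event must be B, M or D")
    if year >= 1984:
        if quarter is not None:
            raise ValueError("quarter is omitted from 1984 onward")
        q = ""
    else:
        if quarter not in (1, 2, 3, 4):
            raise ValueError("quarter must be 1, 2, 3 or 4")
        q = str(quarter)
    if first_surname == "?" or first_surname.lower() == "unknown":
        letter = "UNK"
    else:
        letter = first_surname[0].upper()
    m = re.fullmatch(r"(\d+)([A-Za-z]?)", page)
    if m is None:
        raise ValueError(f"bad page number {page!r}")
    digits, suffix = m.groups()
    # Zero-pad the numeric part to four digits, then append any suffix.
    return f"{year:04d}{event}{q}{letter}{digits.zfill(4)}{suffix}"
```

For instance, `standard_filename(1837, "M", 3, "Peters", "2034")` gives `1837M3P2034` and `standard_filename(1988, "B", None, "Jones", "312")` gives `1988BJ0312`.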
Legacy format for SEQUENCED files
This format is based on the DOS file format that was limited to 8 characters -
although deprecated this format is still allowed. The format for the filename
of a SEQUENCED transcription file is
<shortyear><event><quarter><page>
where
<shortyear>
is the last two digits of the year
<event>
is the event, one of B, M or D
<quarter>
is the quarter, one of 1, 2, 3 or 4 (omitted for 1984 onward)
<page>
is the page number of the scan from which the transcription was done
including any letter suffix (prefixed with zeros to make four characters)
Examples 36M32034 12B1034A
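For comparison, a hypothetical Python helper for the legacy layout. Note that the padding here counts any letter suffix as one of the four characters, unlike the standard format.

```python
def legacy_filename(year: int, event: str, quarter, page: str) -> str:
    """Build the deprecated <shortyear><event><quarter><page> name."""
    if event not in ("B", "M", "D"):
        raise ValueError("event must be B, M or D")
    q = "" if quarter is None else str(quarter)  # quarter omitted from 1984 onward
    # Zero-pad to four characters, counting any letter suffix.
    return f"{year % 100:02d}{event}{q}{page.rjust(4, '0')}"
```

This reproduces the examples: `legacy_filename(1936, "M", 3, "2034")` gives `36M32034` and `legacy_filename(1912, "B", 1, "34A")` gives `12B1034A`.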
Uploading using File Management
When you use
"Manage your files" and click on "Upload new file" there are 2 boxes at
the top:
File name
this is the name used to store your file in the FreeBMD system and, for
SEQUENCED files, should conform to the rules above. Typically the name you
put in here would be the same as the name of the file (see below) but
without the extension, e.g. if the file is 1876B3A0041.BMD you would put
1876B3A0041 in here.
Upload
this is where you specify the file you want uploaded from your computer,
for example 1876B3A0041.BMD in some folder on your system. The name is
not stored by the FreeBMD system.
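Deriving the File name box value from the local file is just a matter of dropping the extension. The Python sketch below does that and also reports whether the result fits the standard SEQUENCED layout; the function name and the pattern are illustrative assumptions, not checks performed by the upload system.

```python
import re
from pathlib import PurePath

# <year><event><quarter><letter><page>; quarter is absent from 1984 onward.
STANDARD_NAME = re.compile(r"^(\d{4})([BMD])([1-4]?)(UNK|[A-Z])(\d{4}[A-Za-z]?)$")


def upload_name(local_path: str):
    """Return (File name box value, whether it fits the SEQUENCED standard)."""
    name = PurePath(local_path).stem  # file name without its extension
    return name, STANDARD_NAME.match(name) is not None
```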
Changes in Entry Information
Over the years extra information was recorded in the indexes.
Births to June quarter 1911 and Marriages to December quarter 1911
Uncertain Character Format
_ (Underscore)
A single uncertain character. It could be anything but is definitely one character. It can be repeated for each uncertain character.
* (Asterisk)
Several adjacent uncertain characters. A single * is used when there are 1 or more adjacent uncertain characters. It is not used immediately before or after a _ or another *.
Note: If it is clear there is a space, then * * is used to represent 2 words, neither of which can be read.
[abc]
A single character that could be any one of the contained characters and only those characters. There must be at least two characters between the brackets. For example, [79] would mean either a 7 or a 9, whereas [C_] would mean a C or some other character.
{min,max}
Repeat count - the preceding character occurs somewhere between min and max times. max may be omitted, meaning
there is no upper limit. So _{1,} would be equivalent to *, and _{0,1} means that it is unclear if there
is any character. Ensure the complete field is enclosed in quotes to avoid the comma
being taken as a field separator, e.g. "williams{0,1}".
? (Question mark)
Only used where it is unambiguous that there are no characters in the field, e.g. a missing Volume.
The question mark must be the only character in the field.
Note: If it is unclear whether the field is empty or not _{0,1} is used.
Note: Using a single * is preferable to spending a long time trying to
decide the min and max values to use in the
_{min,max} format, which is more precise.
Technical note: Although this UCF format has many similarities to regular expressions (e.g.
Perl, Unix), it is not identical; in particular, there is no escape mechanism.
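Because of those similarities, a UCF field can be translated into a Python regular expression for matching candidate readings. The sketch below is a hypothetical translator, not how FreeBMD interprets UCF. It treats _ * [ ] { } as always special (there is no escape mechanism), assumes matching is case insensitive, and does not validate rules such as "at least two characters between the brackets".

```python
import re


def ucf_to_regex(field: str):
    """Translate a UCF field into a compiled regex (use with fullmatch)."""
    if field == "?":
        return re.compile("")  # field known to be empty
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "_":
            out.append(".")  # exactly one uncertain character
        elif ch == "*":
            out.append(".+")  # one or more uncertain characters
        elif ch == "[":
            end = field.index("]", i)  # raises ValueError if unclosed
            chars = field[i + 1:end]
            # [C_] means "C or some other character", i.e. any character.
            if "_" in chars:
                out.append(".")
            else:
                out.append("[" + "".join(re.escape(c) for c in chars) + "]")
            i = end
        elif ch == "{":
            end = field.index("}", i)
            low, _, high = field[i + 1:end].partition(",")
            out.append("{" + low + "," + high + "}")  # repeat preceding item
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.IGNORECASE)
```

For example, `ucf_to_regex("williams{0,1}").fullmatch("william")` succeeds, as does the same pattern against `williams`.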