FreeBMD Submission Formats

This page describes the main format for submitting transcriptions to FreeBMD. If you are using one of the add-on packages for transcription, for example WinBMD or SpeedBMD, you do not need to understand all the detail on this page, although it will be a useful reference if you have problems. For help with using one of the add-on packages, please see these help instructions or use the help provided with the package.

If you already have entries which you need to submit to FreeBMD, you may find the alternative flat format more convenient.

Help with transcribing

This page describes the format of data to be submitted to FreeBMD. You can produce these formats using your favourite wordprocessor, spreadsheet or database. There will almost certainly be a way to "save as" or "export" to comma delimited ASCII text. Alternatively, use one of the previously mentioned add-on packages written by FreeBMD volunteers.

If you are new to transcribing we suggest that you prepare a small number of entries and try submitting them. This way, if you haven't quite achieved the correct layout, there won't be too much to alter. You will find the submission mechanism will help to some extent in identifying any problems there might be. If you can't resolve the errors, check the Transcribers' Knowledge Base or email the mailing list; see here for how to subscribe to the Admins mailing list.

Definition of the Standard Format

The standard format consists of
a header
which gives information that applies to the whole file
one or more source information lines
each of which identifies the source and date of the following data lines
data lines
which give the actual entries together with information about the context of the entries (e.g. where pages start in the index)

Example

This is an example of a submission in the standard format (note that the ellipses (...) indicate other entries omitted from the example).
+INFO,[email protected],,SEQUENCED,BIRTHS,cp850
+CREDIT,Ben Laurie,[email protected],CREDIT
+F,1837,Sep,COL-GRE,2
+PAGE,123
Forden,Henry,Shaftsbury,8,69
Forden,male,Devizes,8,243
...
Giddins,Catherine,Manchester,20,290
Giddins,Emma,Hatfield&Welwyn,6,358
Giddins,George,Oxford,16,71
Giddins,George John,Hertford,6,381
Gidd_ns,Thomas,Oundle,15,203
+PAGE,124
+S,1837,Dec,ANC-01
+PAGE,1045
Powers,Edith Frances,Colne,8,213
Powers,George William,Devizes,8,243
...
Rushton,Martha Maria,Devizes,8,249
Rushton,Mary Ann,Huntingdon,14,145
+PAGE,1046
Rushton,Naomi,St. Neot's,14,168
Rushthorpe,John,Marylebone,1,123
+BREAK
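
As a rough illustration of the three-part structure described above, the following Python sketch sorts the lines of a submission into header, source, control, comment and data lines. The function name classify_line and the category labels are hypothetical and are not part of FreeBMD; the sketch only looks at the keyword at the start of each line and does not validate the fields.

```python
def classify_line(line):
    """Return a rough category for one line of a standard-format file."""
    text = line.strip()
    if not text:
        return "blank"
    if text.startswith("#"):
        # #COMMENT, #THEORY and plain # lines
        return "comment"
    if text.startswith("+"):
        keyword = text.split(",", 1)[0]
        if keyword in ("+INFO", "+CREDIT"):
            return "header"
        if keyword in ("+F", "+M", "+B", "+S", "+U"):
            return "source"
        if keyword in ("+PAGE", "+BREAK"):
            return "control"
        return "unknown"
    # Anything else is treated as an index entry.
    return "data"


sample = [
    "+INFO,,,SEQUENCED,BIRTHS,cp850",
    "+F,1837,Sep,COL-GRE,2",
    "+PAGE,123",
    "Forden,Henry,Shaftsbury,8,69",
    "+BREAK",
]
for line in sample:
    print(classify_line(line), line)
```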

Note that there is a variation of the standard format called (for reasons lost in antiquity) the Flat Format. This format is particularly useful for files that consist of unrelated individual entries and its definition is here.

Conventions used in describing the Standard Format

In the following sections, looking at the different parts of the Standard Format, the conventions used are these:

Header Information

The Information Line

+INFO,Email,Password,Sequenced,RecordType,CharacterSet

Email
is the email address of the transcriber and it is optional (it can just be omitted).
Password
should not be used and should be omitted. (This field is a hangover from early in the project's history.)
Sequenced
is one of SEQUENCED, RANDOM or ONENAME. This information is used to assist with the correlation of transcriptions. This determines what entries have been double keyed and thereby also assists in identifying suspect entries.

SEQUENCED
Should be used to transcribe complete pages from the index. If only part of a page has been transcribed, +BREAK should be used to indicate where the index contains more entries than the transcription.
RANDOM
Should be used to transcribe entries that are not related to the location in the index page, for example where only isolated entries from a particular surname are being transcribed. UCF characters are not permitted in surnames in RANDOM files.
ONENAME
Should be used to transcribe sections of pages that relate to a single name. Normally the source specifier +B should be used, this being where the transcription is from the paper indexes. Several years/quarters can be mixed in a file by using multiple +B lines. (Note that events (Births, Deaths, Marriages) cannot be mixed in a file.) +BREAK should be used if a section of surnames is omitted (e.g. between transcriptions of Brown, John and Brown, Martin). The system puts an implicit +BREAK when the surname changes and because of this UCF characters are not permitted in surnames in ONENAME files.
Please note:
  1. Only transcriptions from index pages are allowed to be uploaded to FreeBMD
  2. The Flat File format can be used with any type of file although it is most commonly used with RANDOM
  3. See below for use of +PAGE (required particularly for SEQUENCED)
  4. For ONENAME and SEQUENCED, the entries in a sequence (between +BREAK or +PAGE) cannot all contain UCF characters.
RecordType
is one of BIRTHS, MARRIAGES, or DEATHS.
CharacterSet
is a supported character set (ISO 8859-1 is assumed if omitted). FreeBMD's standard character set is ISO 8859-1, but most others are supported. In particular, if you are transcribing on a DOS or Windows machine (other than Windows NT, unless you are working in a DOS window under NT), it is likely that you are using the character set known as "code page 850", in which case use cp850 in this field. Macintosh users should probably use macintosh, but this may vary according to the software used. This information is used to recognise accented characters correctly. This is a complex area, so if you need to use some other character set, please contact us via Contact Support for advice.
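
The field rules for the +INFO line can be sketched in Python as below. This is only an illustration: parse_info is a hypothetical helper, and it assumes that the Sequenced and RecordType values must be written in upper case exactly as listed above, which this page does not state explicitly.

```python
import csv

SEQUENCE_TYPES = {"SEQUENCED", "RANDOM", "ONENAME"}
RECORD_TYPES = {"BIRTHS", "MARRIAGES", "DEATHS"}


def parse_info(line):
    """Split an +INFO line into its fields and apply the documented defaults."""
    fields = next(csv.reader([line]))
    if fields[0] != "+INFO":
        raise ValueError("not an +INFO line")
    # Pad so that omitted trailing fields read as empty strings.
    fields += [""] * (6 - len(fields))
    _, email, _password, sequenced, record_type, charset = fields[:6]
    if sequenced not in SEQUENCE_TYPES:
        raise ValueError("Sequenced must be SEQUENCED, RANDOM or ONENAME")
    if record_type not in RECORD_TYPES:
        raise ValueError("RecordType must be BIRTHS, MARRIAGES or DEATHS")
    return {
        "email": email,                        # optional
        "sequenced": sequenced,
        "record_type": record_type,
        "charset": charset or "ISO 8859-1",    # default when omitted
    }


print(parse_info("+INFO,,,SEQUENCED,BIRTHS,cp850"))
```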

Credit Line

When other people have been involved in producing the data, or the file, then this field can be used to credit them. The information put here will be available to users of the search system.

+CREDIT,Name,EMail,Comment

If there is a Credit line present then the entries will be credited to the transcriber identified by the credit line, otherwise they will be credited to the submitter.
Name
The name of the person who actually transcribed the entries
Email
The email address of the person who actually transcribed the entries
Comment
One of the following values:
CreditAnon
Don't report Name or EMail
Credit or CreditReport
Report only Name
CreditInvite
Report Name and EMail and invite research enquiries
Other
Credit line is ignored
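
The effect of the Comment value can be summarised with a small Python sketch. The helper credit_display is hypothetical. It compares the value without regard to case, which is an assumption based only on the example submission above using CREDIT in upper case.

```python
def credit_display(name, email, comment):
    """Return the (name, email) pair that would be reported for a +CREDIT line.

    Returns None when the Comment value is not recognised, in which case
    the credit line is ignored.
    """
    value = comment.strip().lower()   # case-insensitive match is an assumption
    if value == "creditanon":
        return ("", "")               # report neither Name nor EMail
    if value in ("credit", "creditreport"):
        return (name, "")             # report only Name
    if value == "creditinvite":
        return (name, email)          # report both and invite enquiries
    return None


print(credit_display("A Transcriber", "someone@example.org", "CreditInvite"))
```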

Source Information

Entries can be gathered, for example, from microfiche, microfilm or the original index books and the source information defines the type of the source and some additional information. For each type the following information is mandatory:

Year
The year of the transcribed information.
Quarter
The quarter in which the information appears in the index, one of March, June, September and December.

Some types have optional fields as follows

Source
where the fiche/film/book was accessed - this is to allow the possibility in the future of identifying different versions of the sources, which may be useful for error correction.
TranscriptionDate
the preferred format for the date is "day monthname year" (e.g. 25 March 1960).

Fiche Info

+F,Year,Quarter,FicheRange,FicheNumber,Source,TranscriptionDate
FicheRange
the start and end letters of the fiche separated by a hyphen, e.g. LAN-MON.
FicheNumber
the number as it appears on the fiche.

Microfilm Info

+M,Year,Quarter,FilmRange,FilmNumber,Source,TranscriptionDate

Book Info

+B,Year,Quarter,Source,TranscriptionDate

Scan Info

+S,Year,Quarter,FreeBMDReference,TranscriptionDate

This source type is only for use with scans provided by the FreeBMD project.

FreeBMDReference
is allocated by us to keep track of the various scans and to ensure, when there is more than one set of scans, we can tell which one was used for the transcription.

The FreeBMDReference for a scan occurs between the month and either the range or scan file name, so for

1840/Deaths/June/UKD-01/A-C/1840D2-A-C-0010.tif 

the FreeBMDReference is UKD-01, and for

1893/Births/September/LDS-211-000-0951147/1893b3-001.tif 

the FreeBMDReference is LDS-211-000-0951147.
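
Taking the two paths above as a guide, the reference is the fourth component of the scan path. A minimal Python sketch, assuming the path always begins with year, event and month as in these examples (freebmd_reference is a hypothetical name):

```python
def freebmd_reference(scan_path):
    """Pick the FreeBMDReference out of a scan path.

    Assumes the layout year/event/month/reference/... seen in the examples.
    """
    parts = scan_path.strip().replace("\\", "/").split("/")
    if len(parts) < 5:
        raise ValueError("path is too short to contain a reference")
    return parts[3]


print(freebmd_reference("1840/Deaths/June/UKD-01/A-C/1840D2-A-C-0010.tif"))
print(freebmd_reference("1893/Births/September/LDS-211-000-0951147/1893b3-001.tif"))
```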

Typical values are:

UKD-01
GRO-B2108
LDS-211-000-0951131 

It is permitted, although not required, to include the scan filename, separated by space or / or \. Thus

LDS-211-000-0951147/1893b3-001.tif 

is a valid FreeBMDReference, although it should be noted that the filename must be the actual name of the scan file, even if this differs from the +PAGE value in the transcription.

When a file is uploaded, if FreeBMDReference does not conform to these rules it will be ignored.

See the scan filename format for more information.

Unknown Source Info

+U,Year,Quarter
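
The five source line layouts can be collected into one table and parsed generically, as in the Python sketch below. The names SOURCE_FIELDS and parse_source are hypothetical. The sketch treats every field after Year and Quarter as optional and fills missing ones with empty strings, which is an assumption; this page only states that Year and Quarter are mandatory.

```python
import csv

# Field names for each source type, in the order given on this page.
SOURCE_FIELDS = {
    "+F": ["Year", "Quarter", "FicheRange", "FicheNumber", "Source", "TranscriptionDate"],
    "+M": ["Year", "Quarter", "FilmRange", "FilmNumber", "Source", "TranscriptionDate"],
    "+B": ["Year", "Quarter", "Source", "TranscriptionDate"],
    "+S": ["Year", "Quarter", "FreeBMDReference", "TranscriptionDate"],
    "+U": ["Year", "Quarter"],
}


def parse_source(line):
    """Return a dict of named fields for a source information line."""
    fields = next(csv.reader([line]))
    names = SOURCE_FIELDS.get(fields[0])
    if names is None:
        raise ValueError("not a source information line")
    values = fields[1:]
    if len(values) < 2 or not values[0] or not values[1]:
        raise ValueError("Year and Quarter are mandatory")
    if len(values) > len(names):
        raise ValueError("too many fields for " + fields[0])
    # Missing trailing fields are recorded as empty strings.
    values += [""] * (len(names) - len(values))
    return dict(zip(names, values))


print(parse_source("+F,1837,Sep,COL-GRE,2"))
print(parse_source("+S,1837,Dec,ANC-01"))
```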

Data line

Each data line is transcribed using the rule "Type What You See", that is the line should be an accurate representation of what is in the index. If you think what is in the index is wrong you can add a #THEORY line but the entry itself should still be what you see.

In applying "Type What You See" you do not transcribe:

  1. Commas between fields;
  2. The rows of identical dots that separate fields in the later printed index (see below); or
  3. Full stops after Age, Volume or Page Number. These are all merely data separators, and carry no data value.
Note also:
  1. Victorian handwriting used what looks to our 21st century eyes like "fs" to represent "ss" - transcribe as "ss";
  2. Raised letters, with or without dots beneath, are typographical conventions - just transcribe the letter;
  3. The case of a letter does not affect the meaning - transcribing the case (upper or lower case) as seen is preferable but not critical.
  4. Alternatives (e.g. 1704 & 1836) or aliases (e.g. BONUS alias CHAPMAN) are normally transcribed as two records; click on the appropriate link for more details.
  5. Where a field contains a question mark (?) special rules apply

Accented characters can be used in some fields (e.g. name fields). Here is the standard character set, but almost any known set can be used (see +INFO above).

Commas within fields are permitted so long as that's how they appear in the source. Put the contents of the whole field in quotes. e.g. "St. Geo., Hanover Square"
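
If you read or write data lines with a program, a standard CSV reader already handles the quoting rule. The following Python sketch uses the csv module from the standard library; the first sample line is made up for illustration and split_fields is a hypothetical helper.

```python
import csv


def split_fields(line):
    """Split one data line into fields, honouring double-quoted fields."""
    return next(csv.reader([line]))


# A made-up entry whose district contains a comma, so it is quoted.
print(split_fields('Smith,John,"St. Geo., Hanover Square",1a,123'))

# An apostrophe needs no quoting.
print(split_fields("Rushton,Naomi,St. Neot's,14,168"))
```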

Transcribing Full Stops (Periods)

Where there is a full stop at the end of a name that is not part of the row of full stops that separates fields (in the later printed index), it should be transcribed.

Examples:

Smith  John.......Aston,6d   999
Smith  John J.....Aston,6d   999
These are assumed to have NO full stop after the name, and none should be transcribed.

Smith  John. .....Aston,6d   999
Smith  John J. ...Aston,6d   999
These are assumed to have a full stop after the name, and it should be transcribed.

Start/End Of Page

This is used in a SEQUENCED dataset at the start and end of each page and is used to assist in collation of entries, linking the transcription back to its originating scan or fiche and to estimate completeness. At the start of a transcription, at the end of the transcription and at each new page enter

+PAGE,PageNumber

where PageNumber is

Source
PageNumber
Scan
the last part of the name of the scan file, which is numeric, possibly followed by a letter A or B, e.g. for a scan file named 1840M2-L-0243a.gif the page would be 243a. Please note:
  • ignore text such as "rescan" in the scan file name
  • only put in a letter suffix ("a" in the above example) if it occurs in the scan file name
  • if a scan is split across several images, normally suffixed w,x,y,z, transcribe the entries from all the images as one page with a page number without the suffix
  • if the scan is of a double page each page should start with +PAGE, the second page being one greater than the first
Fiche or film with sequential page numbers
sequential page number
Other
start with page number 1, increasing by 1
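
For scans, the page number can be derived from the file name along the lines of the Python sketch below. It is not FreeBMD's own code and page_from_scan_name is a hypothetical name. It assumes that "rescan" appears as a separate piece of text that can simply be removed, that leading zeros are dropped, and that any trailing letter other than a or b (such as w, x, y, z) is discarded.

```python
import os
import re


def page_from_scan_name(filename):
    """Derive the +PAGE value from a scan file name such as 1840M2-L-0243a.gif."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Ignore text such as "rescan" (its exact position in real names is assumed).
    stem = re.sub(r"[-_ ]?rescan", "", stem, flags=re.IGNORECASE)
    match = re.search(r"(\d+)([A-Za-z]?)$", stem)
    if match is None:
        raise ValueError("no page number found in " + filename)
    number = str(int(match.group(1)))      # drops leading zeros
    suffix = match.group(2)
    # Keep only an a/b suffix; split-image suffixes w, x, y, z are dropped.
    if suffix.lower() not in ("a", "b"):
        suffix = ""
    return number + suffix


print(page_from_scan_name("1840M2-L-0243a.gif"))
```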

It is important to include +PAGE,n at the beginning/end of each page. The page number should be the number of the page that follows. If the +PAGE is at the end of the dataset (or the complete volume), it should be one greater than the last page number transcribed.

Comments

You may put in comments if you wish; there are three different types of comment each of which has a different use.

Line starts with #COMMENT
Used to indicate that what has been transcribed differs in some way from what is in the index, e.g. #COMMENT handwritten addendum says "see Mar 1887". This type of comment will be accessible from the search results.
Line starts with #THEORY
Used to indicate that what has been transcribed is what is in the index but there is reason to believe the index is wrong, e.g. #THEORY surname should probably be Lane not Laine. This type of comment will be accessible from the search results.
Line starts with # (not followed by COMMENT or THEORY)
Used to give information about the transcription, e.g. # scan got very faint at this point. This type of comment will not be accessible from the search results. WinBMD and SpeedBMD use this type of comment to include information about the transcription immediately after the header with a format of #,volumeformat,username,syndicate,filename,date,flag,flag,... FreeBMD only takes note of the syndicate name from this line.

Using #COMMENT

Note that the # must be the first character on the line and the comment applies to the immediately preceding entry. For a comment that applies to the immediately preceding entry, plus entries following, to a total of N entries (including the one preceding) use the following form.

#COMMENT(N)

Using #THEORY

#THEORY (typed exactly as shown) is a special type of comment that you can use to identify a record which you think might be wrong, that is, the entry is perfectly readable but you think the original source is wrong. Using #THEORY makes it easy for the record to be identified, and the theory is displayed with the information about the entry.

For example, you might be transcribing a page of the name JONES, when you come across the following:
JONE, Albert, District, 1a, 123, 
after which the name JONES continues. Following the rule of 'type what you see', you should type JONE, but then if you wish, you can insert in the row immediately following:
#THEORY Surname should be JONES.

If there is more than one record affected, say N records, use the following form:
#THEORY(N)

So, continuing our previous example (but with three erroneous Jone entries):

JONE, Albert, District, 1a, 123
#THEORY(3) Surname should be JONES.
JONE, Charles, District, 3a, 324
JONE, David, District, 11a, 642
JONES, Edward, District, 9a, 912

Note that the #THEORY(N) is put after the first record but N is the total number of records including the first.
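
The counting rule for #THEORY(N) and #COMMENT(N) can be illustrated with a short Python sketch that works out which entries each marker covers. The function annotated_entries is hypothetical and only demonstrates the counting; it makes no claim about how the FreeBMD upload software is implemented.

```python
import re

MARKER = re.compile(r"^#(THEORY|COMMENT)(?:\((\d+)\))?")


def annotated_entries(lines):
    """Pair each #THEORY/#COMMENT line with the data entries it covers."""
    entries = []   # data lines seen so far
    pending = []   # (marker text, index of first covered entry, count)
    for line in lines:
        match = MARKER.match(line)
        if match:
            if not entries:
                raise ValueError("marker has no preceding entry")
            count = int(match.group(2) or 1)
            # The marker applies to the entry immediately before it.
            pending.append((line, len(entries) - 1, count))
        elif line.startswith(("#", "+")) or not line.strip():
            continue   # other comments, control lines and blank lines
        else:
            entries.append(line)
    return [(text, entries[first:first + count]) for text, first, count in pending]


lines = [
    "JONE, Albert, District, 1a, 123",
    "#THEORY(3) Surname should be JONES.",
    "JONE, Charles, District, 3a, 324",
    "JONE, David, District, 11a, 642",
    "JONES, Edward, District, 9a, 912",
]
for marker, covered in annotated_entries(lines):
    print(marker)
    for entry in covered:
        print("  applies to:", entry)
```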

Using #THEORY,REF

A special form of #THEORY is used where the data line contains a late entry reference that does not conform to a format recognised by FreeBMD. That it has not been recognised will be shown by a warning being given when the file is uploaded. An example of a recognised late entry reference is see M/88. If, however, it was see Septbr/95 then it would not be recognised. To indicate the correct reference the following is used:
#THEORY,REF,see S/95

The quarter should be M, J, S or D and either two or four digits can be used for the year. Uncertain Character Format cannot be used although multiple #THEORY,REF lines can be used to cover alternatives.
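
A #THEORY,REF line can be checked against these rules with a regular expression, as in the Python sketch below. The pattern assumes the reference is always written as "see", a space, the quarter letter, a slash and the year, as in the example; parse_theory_ref is a hypothetical helper.

```python
import re

# Quarter letter M, J, S or D, then a two- or four-digit year.
REF_LINE = re.compile(r"^#THEORY,REF,see ([MJSD])/(\d{4}|\d{2})$")


def parse_theory_ref(line):
    """Return (quarter, year) for a well-formed #THEORY,REF line, else None."""
    match = REF_LINE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_theory_ref("#THEORY,REF,see S/95"))
print(parse_theory_ref("#THEORY,REF,see Septbr/95"))
```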

Data Breaks

In a SEQUENCED or ONENAME dataset, there will be breaks in the data - that is, where there are entries in the index that are not in the dataset you are submitting (where you broke off transcribing, where part of your source was missing, or whatever). These are indicated with:

+BREAK

In a RANDOM dataset, there is implicitly a +BREAK between each entry.

What should I call my Upload File

When you "Manage your files" and click on "Upload new file" there are 2 boxes at the top:

File name
this is the name used to store your file in the FreeBMD system and it must consist of alphanumeric or underscore characters (but must not contain the characters "." or "-"). Typically the name you put in here would be the same as the name of the file (see below) without the extension, e.g. if the file is 1876B3A0041.BMD you would put 1876B3A0041 in here.
Upload
this is where you specify the file you want uploaded from your computer. The name is not stored by the FreeBMD system.
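
The naming rule can be checked before uploading with a few lines of Python. This is a sketch only: stored_name is a hypothetical helper, and it reads "alphanumeric" as the unaccented letters A to Z and the digits 0 to 9, which is an assumption.

```python
import os
import re


def stored_name(upload_path):
    """Suggest a FreeBMD file name from a local file name, or raise ValueError."""
    stem = os.path.splitext(os.path.basename(upload_path))[0]
    # Only letters, digits and underscore are accepted; "." and "-" are not.
    if not re.fullmatch(r"[A-Za-z0-9_]+", stem):
        raise ValueError("name must be alphanumeric or underscore: " + stem)
    return stem


print(stored_name("1876B3A0041.BMD"))
```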

Changes in Entry Information

Over the years extra information was recorded in the indexes.

Births to June quarter 1911 and Marriages to December quarter 1911

Surname,GivenNames,District,Volume,Page

Births from September quarter 1911

Surname,GivenNames,MothersName,District,Volume,Page

Marriages from March quarter 1912

Surname,GivenNames,SpousesName,District,Volume,Page

Deaths to March Quarter 1969

Surname,GivenNames,AgeAtDeath,District,Volume,Page
Deaths to December quarter 1865 did not have an AgeAtDeath field - for those, leave it blank.

Deaths from June Quarter 1969

Surname,GivenNames,DateOfBirth,District,Volume,Page
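
The changes listed above can be expressed as a lookup from record type and quarter to the expected field layout. The Python sketch below is an illustration only; entry_fields is a hypothetical name, and the quarter abbreviations Mar and Jun are assumed by analogy with the Sep and Dec used in the example submission.

```python
QUARTERS = {"Mar": 1, "Jun": 2, "Sep": 3, "Dec": 4}
BASE = ["Surname", "GivenNames", "District", "Volume", "Page"]


def entry_fields(record_type, year, quarter):
    """Return the field names expected on a data line for the given quarter."""
    when = (year, QUARTERS[quarter])
    extra = None
    if record_type == "BIRTHS":
        if when >= (1911, 3):            # from September quarter 1911
            extra = "MothersName"
    elif record_type == "MARRIAGES":
        if when >= (1912, 1):            # from March quarter 1912
            extra = "SpousesName"
    elif record_type == "DEATHS":
        # AgeAtDeath is left blank up to December quarter 1865,
        # but the field is still present.
        extra = "DateOfBirth" if when >= (1969, 2) else "AgeAtDeath"
    else:
        raise ValueError("unknown record type: " + record_type)
    if extra is None:
        return list(BASE)
    return BASE[:2] + [extra] + BASE[2:]


print(entry_fields("BIRTHS", 1911, "Sep"))
```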

Uncertain character format

_ (Underscore) A single uncertain character. It could be anything but is definitely one character. It can be repeated for each uncertain character.
* (Asterisk) Several adjacent uncertain characters. A single * is used when there are 1 or more adjacent uncertain characters. It is not used immediately before or after a _ or another *.
Note: If it is clear there is a space, then * * is used to represent 2 words, neither of which can be read.
[abc] A single character that could be any one of the contained characters and only those characters. There must be at least two characters between the brackets.
For example, [79] would mean either a 7 or a 9, whereas [C_] would mean a C or some other character.
{min,max} Repeat count - the preceding character occurs somewhere between min and max times. max may be omitted, meaning there is no upper limit. So _{1,} would be equivalent to *, and _{0,1} means that it is unclear if there is any character. Ensure the complete field is enclosed in quotes to avoid the comma being taken as a field separator, e.g. "williams{0,1}".
? (Question mark) Only used where it is unambiguous that there are no characters in the field, e.g. a missing Volume. The question mark must be the only character in the field.
Note: If it is unclear whether the field is empty or not _{0,1} is used.

Note: Using a single * is preferable to spending a long time trying to decide the min and max values to use in the _{min,max} format, which is more precise.

Technical note: Although this UCF format has many similarities to regular expressions (e.g. Perl, Unix) it is not identical and in particular there is no escape mechanism.
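
To make the similarity with regular expressions concrete, the Python sketch below translates a UCF value into an equivalent Python pattern. It is an illustration of the rules above rather than FreeBMD's own matching code, and ucf_to_regex is a hypothetical name. It assumes the UCF value is well formed and treats a bracket group containing _ as "any single character".

```python
import re


def ucf_to_regex(ucf):
    """Translate an Uncertain Character Format value into a regex string."""
    if ucf == "?":
        return ""                       # the field is definitely empty
    out = []
    i = 0
    while i < len(ucf):
        ch = ucf[i]
        if ch == "_":
            out.append(".")             # exactly one unknown character
        elif ch == "*":
            out.append(".+")            # one or more unknown characters
        elif ch == "[":
            end = ucf.index("]", i)
            members = ucf[i + 1:end]
            if "_" in members:
                out.append(".")         # e.g. [C_]: a C or some other character
            else:
                out.append("[" + re.escape(members) + "]")
            i = end
        elif ch == "{":
            end = ucf.index("}", i)
            out.append(ucf[i:end + 1])  # {min,max} has the same meaning in regex
            i = end
        else:
            out.append(re.escape(ch))   # ordinary characters match themselves
        i += 1
    return "".join(out)


pattern = ucf_to_regex("Gidd_ns")
print(pattern, bool(re.fullmatch(pattern, "Giddins")))
```

Use re.fullmatch with the result so that the whole field has to match, as in the last line of the sketch.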

Search engine, layout and database Copyright © 1998-2011 The Trustees of FreeBMD (Ben Laurie, Graham Hart, Camilla von Massenbach, David Mayall and Allan Raymond), a charity registered in England and Wales, Number 1096940.
We make no warranty whatsoever as to the accuracy or completeness of the FreeBMD data.
Use of the FreeBMD website is conditional upon acceptance of the Terms and Conditions