This page describes the main format for submitting transcriptions to FreeBMD. If you are using one of the add-on packages for transcription, for example WinBMD or SpeedBMD, you do not need to understand all the detail on this page, although it will be a useful reference if you have problems. For help with using one of the add-on packages please see these help instructions or use the help with the package.
If you already have entries which you need to submit to FreeBMD, you may find the alternative flat format more convenient.
This page describes the format of data to be submitted to FreeBMD. You can produce these formats using your favourite wordprocessor, spreadsheet or database. There will almost certainly be a way to "save as" or "export" to comma delimited ASCII text. Alternatively, use one of the previously mentioned add-on packages written by FreeBMD volunteers.
If you are new to transcribing we suggest that you prepare a small number of entries and try submitting them. This way, if you haven't quite achieved the correct layout, there won't be too much to alter. You will find the submission mechanism will help to some extent in identifying any problems there might be. If you can't resolve the errors, check the Transcribers' Knowledge Base or email the mailing list; see here for how to subscribe to the Admins mailing list.
+INFO,[email protected],,SEQUENCED,BIRTHS,cp850
+CREDIT,Ben Laurie,[email protected],CREDIT
+F,1837,Sep,COL-GRE,2
+PAGE,123
Forden,Henry,Shaftsbury,8,69
Forden,male,Devizes,8,243
...
Giddins,Catherine,Manchester,20,290
Giddins,Emma,Hatfield&Welwyn,6,358
Giddins,George,Oxford,16,71
Giddins,George John,Hertford,6,381
Gidd_ns,Thomas,Oundle,15,203
+PAGE,124
+S,1837,Dec,ANC-01
+PAGE,1045
Powers,Edith Frances,Colne,8,213
Powers,George William,Devizes,8,243
....
Rushton,Martha Maria,Devizes,8,249
Rushton,Mary Ann,Huntingdon,14,145
+PAGE,1046
Rushton,Naomi,St. Neot's,14,168
Rushthorpe,John,Marylebone,1,123
+BREAK
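As a rough illustration of how the example file is laid out, the following Python sketch sorts each line into one of three kinds: control lines beginning with +, comment lines beginning with #, and ordinary entries. The function name classify_line is hypothetical and this is not the FreeBMD upload code; it only assumes that fields are comma separated and that quoted fields may contain commas.

```python
import csv

def classify_line(line):
    """Return (kind, fields) for one line of a standard-format file."""
    text = line.strip()
    if not text:
        return ("blank", [])
    if text.startswith("#"):
        # Comment lines are kept whole rather than split on commas.
        return ("comment", [text])
    # The csv module copes with quoted fields that contain commas.
    fields = next(csv.reader([text]))
    if text.startswith("+"):
        # Control line: the first field is the keyword, e.g. +PAGE.
        return (fields[0], fields[1:])
    return ("entry", fields)

sample = """+F,1837,Sep,COL-GRE,2
+PAGE,123
Forden,Henry,Shaftsbury,8,69
Gidd_ns,Thomas,Oundle,15,203
+BREAK"""

for line in sample.splitlines():
    print(classify_line(line))
```

A check like this can help spot a line with the wrong number of fields before a file is submitted.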
Note that there is a variation of the standard format called (for reasons lost in antiquity) the Flat Format. This format is particularly useful for files that consist of unrelated individual entries and its definition is here.
In the following sections, looking at the different parts of the Standard Format, the conventions used are these:
When other people have been involved in producing the data, or the file, then this field can be used to credit them. The information put here will be available to users of the search system.
+CREDIT,Name,EMail,Comment
If there is a Credit line present then the entries will be credited to the transcriber identified by the credit line, otherwise they will be credited to the submitter.

Entries can be gathered, for example, from microfiche, microfilm or the original index books and the source information defines the type of the source and some additional information. For each type the following information is mandatory:
Some types have optional fields as follows
This source type is only for use with scans provided by the FreeBMD project.
The FreeBMDReference for a scan occurs between the month and either the range or scan file name, so for
1840/Deaths/June/UKD-01/A-C/1840D2-A-C-0010.tif
the FreeBMDReference is UKD-01, and for
1893/Births/September/LDS-211-000-0951147/1893b3-001.tif
the FreeBMDReference is LDS-211-000-0951147.
Typical values are:
UKD-01 GRO-B2108 LDS-211-000-0951131
It is permitted, although not required, to include the scan filename, separated by space or / or \. Thus
LDS-211-000-0951147/1893b3-001.tif
is a valid FreeBMDReference, although it should be noted that the filename must be the actual name of the scan file even if this differs from the +PAGE value in the transcription.
When a file is uploaded, if FreeBMDReference does not conform to these rules it will be ignored.
See the scan filename format for more information.
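The rule for finding the FreeBMDReference in a scan path can be sketched in Python as follows. The helper names reference_from_scan_path and split_reference are hypothetical, and the sketch assumes every scan path starts with year, type and month components as in the two examples above; it does not reproduce the checks the upload process applies.

```python
import re

def reference_from_scan_path(scan_path):
    """Return the path component that follows year/type/month."""
    parts = re.split(r"[/\\]", scan_path)
    if len(parts) < 5:
        raise ValueError(f"unexpected scan path: {scan_path!r}")
    return parts[3]

def split_reference(value):
    """Split a FreeBMDReference into (reference, scan filename or None).

    The optional filename may be separated by a space, / or \\.
    """
    pieces = re.split(r"[ /\\]", value.strip(), maxsplit=1)
    return pieces[0], (pieces[1] if len(pieces) > 1 else None)

print(reference_from_scan_path("1840/Deaths/June/UKD-01/A-C/1840D2-A-C-0010.tif"))
print(split_reference("LDS-211-000-0951147/1893b3-001.tif"))
```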
In applying "Type What You See" you do not transcribe:
Accented characters can be used in some fields (e.g. name fields). Here is the standard character set, but almost any known set can be used (see +INFO above).
Commas within fields are permitted so long as that's how they appear in the source. Put the contents of the whole field in quotes, e.g. "St. Geo., Hanover Square"
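If you generate your file with a script rather than a spreadsheet, most CSV libraries apply this quoting automatically. The sketch below uses Python's standard csv module; the function name format_entry and the sample surname, forename, volume and page values are made up for illustration.

```python
import csv
import io

def format_entry(fields):
    """Join fields with commas, quoting any field that contains a comma."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue().rstrip("\n")

# Only the district field contains a comma, so only it is quoted.
print(format_entry(["Smith", "John", "St. Geo., Hanover Square", "1a", "5"]))
```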
Examples:
Smith John.......Aston,6d 999
Smith John J.....Aston,6d 999
These assume NO full stop and none should be transcribed.

Smith John. .....Aston,6d 999
Smith John J. ...Aston,6d 999
These assume full stop and should be transcribed.
This is used in a SEQUENCED dataset at the start and end of each page to assist in collating entries, to link the transcription back to its originating scan or fiche, and to estimate completeness. At the start of a transcription, at the end of the transcription and at each new page enter
+PAGE,PageNumber
where PageNumber is:

Source | PageNumber |
---|---|
Scan | the last part of the name of the scan file, which is numeric possibly followed by a letter A or B, e.g. for a scan file named 1840M2-L-0243a.gif the page would be 243a. Please note: |
Fiche or film with sequential page numbers | sequential page number |
Other | start with page number 1, increasing by 1 |
It is important to include +PAGE,n at the beginning/end of each page. The page number should be the number of the page that follows. If the +PAGE is at the end of the dataset (or the complete volume), it should be one greater than the last page number transcribed.
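For scans, the PageNumber rule in the table above can be sketched as a small Python function. The name page_from_scan_name is hypothetical, and the sketch assumes that the last hyphen-separated part of the file name carries the page and that leading zeros are dropped, as in the 243a example.

```python
import re

def page_from_scan_name(filename):
    """Derive the +PAGE value from a scan file name such as 1840M2-L-0243a.gif."""
    stem = filename.rsplit(".", 1)[0]      # drop the file extension
    last_part = stem.rsplit("-", 1)[-1]    # text after the final hyphen
    # Digits, optionally followed by a single letter A or B (either case).
    match = re.fullmatch(r"0*(\d+)([AaBb]?)", last_part)
    if match is None:
        raise ValueError(f"cannot find a page number in {filename!r}")
    return match.group(1) + match.group(2)

print("+PAGE," + page_from_scan_name("1840M2-L-0243a.gif"))
```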
You may put in comments if you wish; there are three different types of comment each of which has a different use.
Note that the # must be the first character on the line and the comment applies to the immediately preceding entry. For a comment that applies to the immediately preceding entry, plus entries following, to a total of N entries (including the one preceding) use the following form.
#COMMENT(N)

For example, suppose the index contains the entry

JONE, Albert, District, 1a, 123,

after which the name JONES continues. Following the rule of 'type what you see', you should type JONE, but then if you wish, you can insert in the row immediately following:
#THEORY Surname should be JONES.
If there is more than one record affected, say N records, use the following form:
#THEORY(N)
So, continuing our previous example (but with three erroneous Jone entries):
JONE, Albert, District, 1a, 123
#THEORY(3) Surname should be JONES.
JONE, Charles, District, 3a, 324
JONE, David, District, 11a, 642
JONES, Edward, District, 9a, 912
Note that the #THEORY(N) is put after the first record but N is the total number of records including the first.
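The counting rule for #THEORY(N) can be illustrated with a short Python sketch that works out which entries a theory applies to. The function name theories_by_entry is hypothetical, the sketch deliberately skips #THEORY,REF lines, and it assumes that other lines beginning with + or # do not interrupt the count; that last point is an assumption, not something stated on this page.

```python
import re

# Matches "#THEORY text" and "#THEORY(N) text" but not "#THEORY,REF,...".
THEORY = re.compile(r"#THEORY(?:\((\d+)\))?(?!,)\s*(.*)")

def theories_by_entry(lines):
    """Return (entries, notes); notes maps an entry index to its theory texts."""
    entries = []
    notes = {}
    pending = []  # [entries still to cover, theory text]
    for line in lines:
        text = line.strip()
        match = THEORY.match(text)
        if match:
            if not entries:
                continue  # no preceding entry to attach the theory to
            count = int(match.group(1) or 1)
            note = match.group(2)
            notes.setdefault(len(entries) - 1, []).append(note)
            if count > 1:
                pending.append([count - 1, note])
        elif text and not text.startswith(("#", "+")):
            entries.append(text)
            index = len(entries) - 1
            for item in pending:
                notes.setdefault(index, []).append(item[1])
                item[0] -= 1
            pending = [item for item in pending if item[0] > 0]
    return entries, notes

example = [
    "JONE, Albert, District, 1a, 123",
    "#THEORY(3) Surname should be JONES.",
    "JONE, Charles, District, 3a, 324",
    "JONE, David, District, 11a, 642",
    "JONES, Edward, District, 9a, 912",
]
entries, notes = theories_by_entry(example)
for index, entry in enumerate(entries):
    print(entry, "|", notes.get(index, []))
```

In the example the theory is attached to the three JONE entries and not to the JONES entry that follows them.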
see M/88.

If, however, it was see Septbr/95 then it would not be recognised. To indicate the correct reference the following is used:

#THEORY,REF,see S/95

The quarter should be M, J, S or D and either two or four digits can be used for the year. Uncertain Character Format cannot be used, although multiple #THEORY,REF lines can be used to cover alternatives.
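The shape of a #THEORY,REF line can be checked with a regular expression, as in the Python sketch below. The function name parse_theory_ref is hypothetical and the pattern is only inferred from the rules above (a quarter of M, J, S or D, then a two or four digit year); the checks applied when a file is uploaded may be stricter or looser.

```python
import re

# "see", a quarter letter, a slash, then a two or four digit year.
THEORY_REF = re.compile(r"#THEORY,REF,see ([MJSD])/(\d{2}|\d{4})")

def parse_theory_ref(line):
    """Return (quarter, year) for a well-formed line, otherwise None."""
    match = THEORY_REF.fullmatch(line.strip())
    return (match.group(1), match.group(2)) if match else None

print(parse_theory_ref("#THEORY,REF,see S/95"))
print(parse_theory_ref("#THEORY,REF,see Septbr/95"))
```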
In a SEQUENCED or ONENAME dataset, there will be breaks in the data - that is, where there are entries in the index that are not in the dataset you are submitting (where you broke off transcribing, where part of your source was missing, or whatever). These are indicated with:
+BREAK
In a RANDOM dataset, there is implicitly a +BREAK between each entry.
When you "Manage your files" and click on "Upload new file" there are 2 boxes at the top:
Over the years extra information was recorded in the indexes.

Format | Meaning |
---|---|
_ (Underscore) | A single uncertain character. It could be anything but is definitely one character. It can be repeated for each uncertain character. |
* (Asterisk) | Several adjacent uncertain characters. A single * is used when there are 1 or more adjacent uncertain characters. It is not used immediately before or after a _ or another *. Note: If it is clear there is a space, then * * is used to represent 2 words, neither of which can be read. |
[abc] | A single character that could be any one of the contained characters and only those characters. There must be at least two characters between the brackets. For example, [79] would mean either a 7 or a 9, whereas [C_] would mean a C or some other character. |
{min,max} | Repeat count - the preceding character occurs somewhere between min and max times. max may be omitted, meaning there is no upper limit. So _{1,} would be equivalent to *, and _{0,1} means that it is unclear if there is any character. Ensure the complete field is enclosed in quotes to avoid the comma being taken as a field separator, e.g. "williams{0,1}". |
? (Question mark) | Only used where it is unambiguous that there are no characters in the field, e.g. a missing Volume. The question mark must be the only character in the field. Note: If it is unclear whether the field is empty or not _{0,1} is used. |
Note: Using a single * is preferable to spending a long time trying to decide the min and max values to use in the _{min,max} format, which is more precise.
Technical note: Although this UCF format has many similarities to regular expressions (e.g. Perl, Unix) it is not identical and in particular there is no escape mechanism.
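To make the comparison with regular expressions concrete, here is a minimal Python sketch that translates a UCF field into a regular expression following the descriptions above. The function name ucf_to_regex is hypothetical, the translation is an interpretation of this page rather than the code FreeBMD uses for searching, and it performs no validation (for example, it does not reject a * placed next to a _).

```python
import re

def ucf_to_regex(ucf):
    """Translate an Uncertain Character Format field into a compiled regex."""
    if ucf == "?":
        return re.compile("")  # the field is known to be empty
    out = []
    i = 0
    while i < len(ucf):
        ch = ucf[i]
        if ch == "_":
            out.append(".")    # exactly one unknown character
            i += 1
        elif ch == "*":
            out.append(".+")   # one or more unknown characters
            i += 1
        elif ch == "[":
            end = ucf.index("]", i)
            members = ucf[i + 1:end]
            if "_" in members:
                out.append(".")  # e.g. [C_]: a C or some other character
            else:
                out.append("[" + "".join(re.escape(c) for c in members) + "]")
            i = end + 1
        elif ch == "{":
            end = ucf.index("}", i)
            out.append(ucf[i:end + 1])  # {min,max} has the same meaning in a regex
            i = end + 1
        else:
            out.append(re.escape(ch))   # UCF has no escapes, so escape for the regex
            i += 1
    return re.compile("".join(out))

print(bool(ucf_to_regex("Gidd_ns").fullmatch("Giddins")))
print(bool(ucf_to_regex("williams{0,1}").fullmatch("william")))
```

Use fullmatch rather than search with the result, so that the whole field has to fit the pattern.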
Search engine, layout and database Copyright © 1998-2011 The Trustees of FreeBMD (Ben Laurie, Graham Hart, Camilla von Massenbach, David Mayall and Allan Raymond), a charity registered in England and Wales, Number 1096940.
We make no warranty whatsoever as to the accuracy or completeness of the FreeBMD data. Use of the FreeBMD website is conditional upon acceptance of the Terms and Conditions.