- OFFICE ADDRESS
- 10037 – 20 Silver Fox Ave. New Minas, Nova Scotia B4N 5K1 CANADA
Customer Support
Published on: 21 September 2023
New Charting Companion “SQL query” feature
Charting Companion 8.2 introduces the SQL query feature, which enables you to extract custom or bespoke information from your genealogy database.
“SQL” stands for Structured Query Language. It is a set of general-purpose commands to change or retrieve information from any relational database. CC uses SQLite.
The SQL query operates on GEDCOM files, because each genealogy program’s native database design (called a schema) is different.
Charting Companion database schema
The best way to explain SQL is with some simple examples. First, you need to be familiar with Charting Companion’s schema. It is documented in detail at:
https://progenygenealogy.com/blogs/charting-companion-database-schema/
Your genealogy data is stored in three main tables: PERSONS, PLACES, EVENTS. There are additional tables IMAGES, NOTES, CITATIONS, AUTHORS.
Each table contains fields or columns. Columns can be numeric or text. The PERSONS table includes :
Examples
Keywords
So, to display a specific person’s information, click on the menu “Find ➔ SQL query”.
In the dialog, enter this command in the “SQL Query” box, then click “Execute”:
select PIN, NAME from PERSONS where PIN = 1
Charting Companion returns the PIN & NAME of the person with PIN #1.
This is not terribly useful, but it does introduce some SQL concepts:
Conditions
To list all men, enter:
select NAME from PERSONS where SEX = 1
To list all persons whose name contains “JONES”, enter:
select PIN, NAME from PERSONS where NAME like "%JONES%"
‘like’ means ‘contains’. The percent “%” symbol is a wildcard. It stands for any number of characters preceding or following the string JONES.
Put the two conditions together and find all men named Jones:
select PIN, NAME from PERSONS
where NAME like "%JONES%"
and SEX = 1
You can break up a command over multiple lines for clarity.
To search for men named “SMITH” or “SMYTH” or “SMYTHE”:
select PIN, NAME from PERSONS
where (NAME like "%SMITH%" or NAME like "%SMYTH%" or NAME like "%SMYTHE%")
and SEX = 1
Note the parentheses around the names, so they are all combined with the SEX condition. The order of precedence of ‘and/or’ is like multiplication/addition in arithmetic.
To specify in which order you want the results sorted, use ‘order’:
select PIN, NAME from PERSONS
where SEX = 1
order by NAME
The EVENTS table
To display all births between 1940 and 1950, enter:
select PIN, EVENTS.FROM_DATE
from EVENTS
where EVENTS.EV_TYPE = 1 and EVENTS.FROM_DATE >= 19400000 and EVENTS.FROM_DATE <= 19501231
Event type ‘1’ is a birth. The event types are documented in the link at the beginning of this article.
‘>=’ means “greater than or equal to”.
Dates are stored as ‘YYYYMMDD’, where the MM and DD can be zero if unknown.
That’s OK, but it would be nice to show the name as well. This is where things get interesting: we can combine two or more tables. To pull in the names, say:
select PIN, EVENTS.FROM_DATE,
(select PERSONS.NAME from PERSONS where PIN = EVENTS.PIN) as NAME
from EVENTS
where EVENTS.EV_TYPE = 1 and EVENTS.FROM_DATE >= 19400000 and EVENTS.FROM_DATE <= 19500000
Notice the second line, enclosed in parentheses. We are introducing a sub-clause where we are we are tying the PERSONS table, which contains the names, to the EVENTS table, which only has PINs. This is called a “normalized” database, where data is stored in only one place (a table column) to avoid redundancy and save space.
The ‘as’ keyword is called an alias. It assigns a shorter name to the column. Calling it “NAME” tells CC to reformat the name by stripping the delimiters.
The PLACES table
Now we’re getting serious. To find all births in Sydney, we combine PERSONS, EVENTS and PLACES:
select PIN,
(select PERSONS.NAME from PERSONS where PIN = EVENTS.PIN) as NAME,
(select PLACES.PLACE_NAME from PLACES where PLACE_NO = EVENTS.PLACE_NOB and PLACE_NO > 0) as Place_Name
from EVENTS
where EVENTS.EV_TYPE = 1 and Place_Name like "%Sydney%"
The third line ties EVENTS and PLACES. It specifies that the Place number in the EVENTS table (EVENTS.PLACE_NOB) is used to look up its corresponding PLACES record (PLACE_NO ) and its name in the PLACES table (PLACES.PLACE_NAME).
We assign an alias to the Place name so we can parse it for “Sydney” in the last line.
‘PLACE_NO > 0’ excludes events with no place specified.
Saving your results
Click “Save” to save the results to a comma-separated text file (.CSV), or a genuine Excel spreadsheet (.XLSX).
Click “Clipboard” to copy the results to your computer’s clipboard. You can paste them into a document or other spreadsheet.
Click on a column header to sort the results. When saving, the results will be written in the original order. Use the ‘order’ keyword if you want to save the results sorted your way.
Dates
To isolate the year, month or day, use:
Historical notes
The Charting Companion schema will strike the more experienced reader as somewhat spartan or terse. For example, the PLACES table serves multiple roles, also storing Source, Address, Repository, Publisher, Title, User-defined event descriptions and text dates. This complicates retrieval. There is a reason for this.
Back in the 90’s, CC used the xBase format, where each table was a separate file. Early Windows limited the number of file handles that a program could open, so we couldn’t get crazy with too many tables.
Between 1999 and 2010, Progeny published the Pedigree Resource file for the LDS Church. PRF was a collection of CDs and DVDs made from the GEDCOM files submitted to the Church by genealogists all over the World.
We squeezed 1M names on a CD, and 5M on a DVD. Each extra byte added to a Person record would cost us 1M of disk space. So we had to be frugal in the file design. The “Name” field is divided into six parts. Rather than squander separate fixed-length fields for each of six name parts, we combine them all into one field, delimited by binary codes. If a Person Name overflows, the remainder is stored in additional Person records. Similar for long Place Names.
We stored some values in binary format, rather than all-character as prescribed by xBase, to reduce space.
Fast GEDCOM
The new GEDCOM import, ten times faster than the original, takes advantage of technologies not available when PAF*Mate and GED*Mate, the grand-daddies of Charting Companion, were developed in 1995.
Currently, the Fast GEDCOM only uses the PERSON, PLACES, EVENTS and IMAGES tables. If you want to do queries on SOURCES, NOTES or CITATIONS, un-check the “Fast GEDCOM” option when opening a file.
Don’t worry about clobbering the database, it is re-created every time you import a newer copy of the .GED file. To force a re-build, hold down the Shift key when you click on the “Open” button.
We do plan on making the Fast GEDCOM import everything soon.
How to use
Don’t worry if you don’t follow the detailed explanations. Just copy & paste the examples, and modify them to suit your needs: names, dates, place names, etc.
If you have a query and you can’t figure out how to do it, contact VicGUM, or the Charting Companion User Group on Facebook, or Progeny, an expert might be able to conjure a spell for you :o)
Remember: with great power comes great responsibility…
Resources
For more information on SQL, see:
https://www.w3schools.com/sql/
https://www.tutorialspoint.com/sql/index.htm
https://mode.com/sql-tutorial/