Loom’s goal is to give our users the flexibility to slice and dice legal data in as many different ways as possible. However, finding structure in unstructured data isn’t always easy. Below, Loom’s Data Manager, Nicole Watts, describes the process of choosing our data sources and the early challenges we faced when deciding how to structure the Loom database.
Canadians are lucky to live in a country where there is a strong, nation-wide commitment to making legal data publicly accessible through venues such as CanLII. With an abundance of unstructured Canadian legal data openly available, the first thing we had to determine when beginning work on the Loom system was our starting point.
We chose to begin with the Ontario Superior Court of Justice for a few reasons. The data set for this court is the largest in Canada, with over 60,000 published decisions available on CanLII. Working off of the theory that it’s easier to scale down than to scale up, we decided that beginning with the largest data set would allow us to create an extensive database network containing as many relationships as possible. We would then be able to use this same structure and apply it to other provinces and court levels. Starting with a smaller data set would certainly have been easier, but going that route meant that we might run into major structural issues when we tried to apply this database structure to larger and potentially more complex data sets.
Another consideration was that the Law Society of Upper Canada is the largest law society in Canada with over 50,000 members. It was clear that beginning in Ontario would allow us to provide relevant data to the largest Canadian legal community. (That said, we are aware that over 60% of Canada’s population resides outside of Ontario and are actively working to provide data for the rest of the country as well.) Within the Ontario Superior Court of Justice, we found that half of the decisions were civil decisions, while Family Law and criminal decisions accounted for a smaller proportion of the data set. Again, taking the approach of providing data to the largest population possible, we decided to begin our data analysis in the civil practice area.
We have a team of lawyers and legal analysts whose job is to review each published decision and ensure that it is accurately categorized. We started out by asking them to track simple decision metrics such as the name of the judge who authored the decision, hearing dates, party types, counsel, case parameters, and which motion was brought. The more decisions we went through, the more relationships and trends we observed. While this was an exciting process, it was also an uphill battle, as we would discover new items that we wanted to track only after having already reviewed several thousand decisions. We then had to go back and review the exact same decisions again in order to track each new metric.
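To make the re-review problem concrete, here is a minimal sketch in Python. It is not Loom's actual schema. The class name `DecisionRecord`, the field names, the `extra_metrics` dictionary, the helper `needs_rereview`, and the metric name `"costs_awarded"` are all hypothetical, chosen only to mirror the metrics listed above. The idea is that any metric introduced after a decision was first reviewed is simply missing from that record, so the decision has to be queued for another pass.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DecisionRecord:
    """One reviewed decision (hypothetical structure, for illustration only)."""
    identifier: str               # placeholder ID, not a real citation
    judge: str
    hearing_dates: List[str]
    party_types: List[str]
    counsel: List[str]
    motion: str
    # Metrics introduced after the first review pass. A metric that is
    # absent from this dict has not been recorded for this decision yet.
    extra_metrics: Dict[str, str] = field(default_factory=dict)


def needs_rereview(records: List[DecisionRecord], metric: str) -> List[DecisionRecord]:
    """Return the decisions that have no value recorded for `metric`."""
    return [r for r in records if metric not in r.extra_metrics]


# Two decisions reviewed at different times (made-up placeholder values).
early = DecisionRecord("EXAMPLE-001", "Judge A", ["2015-01-10"],
                       ["corporation"], ["Counsel X"], "summary judgment")
later = DecisionRecord("EXAMPLE-002", "Judge B", ["2015-06-02"],
                       ["individual"], ["Counsel Y"], "motion to strike",
                       extra_metrics={"costs_awarded": "yes"})

backlog = needs_rereview([early, later], "costs_awarded")
print([r.identifier for r in backlog])
```

In this sketch only the earlier decision lands in the backlog, because it was reviewed before the new metric existed. With several thousand decisions already reviewed, every new metric produces a backlog of that size.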
We’ve found that we need to constantly restructure our database to account for new data and relationships as we come across them. We have discovered new relationships and patterns in Criminal and Family decisions and will be adding even more searchable fields for these practice areas to provide the most granular searches for the user. Overall, our goal is to track as many metrics as possible.
We are continually updating our data coverage, and new reports with new searchable fields will be added on a regular basis, so check back frequently to see what new searches you can perform.