Data Load & Data Migration

Choices and selection factors

Introduction

This document is relevant at the stages of planning, product selection, and architectural design. Its primary audience is decision makers, architects, and the curious.

There are three ways to load data into ShareDo. The choice of method is largely determined by factors such as data volume, the source of the data, and the data set involved (e.g. organisation, matter, pricing).

Method Overview
Data Load Tool: The extract, transform, and load of complex, large-volume data into ShareDo as part of an ETL activity. Please refer to the data loading principles and practices for details.

API: The ingress of (low-volume) data into an online ShareDo instance via a REST interface. This is outlined here: Calling ShareDo APIs.

Data Table Upload: The import of a ‘few rows’ of data into a single data set, sometimes performed by users. Please see Data Table Upload for details.

The methods are not mutually exclusive for any given client ShareDo deployment – it is viable to load data en masse from a legacy system and then adopt an API integration to keep data up to date as it changes in a parallel system.

An introductory video on data load and migration in ShareDo:

View the playlist on YouTube: Platform Deep Dive - ShareDoShow.

Data Load Tool

ShareDo provides a dedicated (bulk) data onboarding framework. It is designed to overcome the shortcomings of a pure ETL-based approach, which, with complex schemas such as ShareDo's, is prone to leaving data in an inconsistent state and is not easily extended to meet different client processing scenarios.

The data onboarding framework is extensible and accommodates different data domains, e.g.:

  • Work Items such as Statements of Work, Tasks, Matters, Proceedings, Offers, Key Dates, and their constituent parts.
  • Documents related to cases.
  • ODS Entities, e.g. People, Organisations, Teams.

For each of these items, the framework provides:

  • A Canonical SQL schema for the import of these entities. Note that the framework does NOT provide facilities for transforming data sources to this schema, as other (ETL) tools best serve this; it does, however, provide a framework to implement client-specific adapters if required.
  • A Configuration utility, enabling users to specify:
    • The validation rules that should be run against the information
    • The matching logic to apply when data already exists
    • Extensions to the matching and validation logic to cover bespoke requirements
  • A Loader utility that:
    • Enables control of a data load across the steps of validation, load, and unload
    • Supports large-volume (usually initial) data loads and subsequent follow-on loads
    • Enables loading of all primary data sets and many secondary sets
    • Allows extension to support bespoke data load requirements
    • Provides actionable details on validation errors
  • A set of reporting and analysis tools to assist with the onboarding of significant volumes of data
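
Note that source-to-canonical transformation happens outside the framework, in whatever ETL tooling the client prefers. As a minimal sketch of that separation of concerns (the staging table name, column mappings, and connection string below are assumptions for illustration, not the actual ShareDo canonical schema), a transformed legacy extract might be staged ahead of the Loader utility's validate and load steps like this:

```python
# Illustrative sketch only: stages transformed source rows into a hypothetical
# canonical staging table before the Loader utility's validate/load steps.
# Table name, column names, and connection string are assumptions, not the
# actual ShareDo canonical schema.
import pandas as pd
from sqlalchemy import create_engine

# Transformation happens outside the framework (e.g. in your ETL tool);
# here we simply reshape a legacy extract into the assumed staging shape.
legacy = pd.read_csv("legacy_matters.csv")
staged = pd.DataFrame({
    "Reference":   legacy["matter_no"],        # assumed column mappings
    "Title":       legacy["matter_title"],
    "WorkTypeKey": "matter",
    "OpenDate":    pd.to_datetime(legacy["opened_on"]),
})

engine = create_engine(
    "mssql+pyodbc://user:pass@staging-host/StagingDb?driver=ODBC+Driver+17+for+SQL+Server"
)
# Append into the (hypothetical) staging table; validation, matching, and the
# actual load are then driven from the Loader utility, not from this script.
staged.to_sql("stg_WorkItem", engine, if_exists="append", index=False)
```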

Determining Factors

The following table outlines the key design decisions concerning this data load method.

Data Sets That Can Be Handled: The Data Load Tool can handle large, complex data structures in a single load.
Skills: Good knowledge of SQL and of the source system is required.
Data Volumes: Can cope with very large data volumes – we have clients managing terabytes of data using this method.
Impact On Live Operations: Should typically not be used while large numbers of users are on the system.
Effort To Configure: Medium – some assistance may be required from ShareDo in the early stages. Most of the effort will be in mapping the data from the source system.
Effort To Run: Easy – once the data is staged, executing the load takes only a few clicks.
Frequency: One-off, to ‘seed’ the system or onboard large new sets of data.
Primary Advantage: Ability to run at scale and performantly.
Primary Disadvantage: Upfront build of ETL logic.

API

ShareDo is constructed on an ‘API first’ basis: every piece of functionality available in the ShareDo UX is also available via a REST API. There are currently more than 400 API endpoints, enabling everything that can be performed through the UI to also be achieved through system-to-system integration.

In addition, a subset of these APIs is also made available via a versioned public API, and it is typically these APIs that should be used when building external system integrations.

The /public/ segment in the URL indicates this and provides a clear separation between the 400+ internal APIs and the APIs intended for consumption by integrations. In addition, every API exposed under /public/ is versioned and provides Swagger/OpenAPI descriptors.

These APIs are designed for easy discovery and cohesion with a general URL nomenclature of:

/api/public/{area}/{version}/{resource}/{id}/{subResource}/{id}/{subResource}/{id}/…

and

/api/public/{area}/{version}/{rpc}

/api/public/{area}/{version}/{resource}/{id}/{rpc}
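
As an illustrative sketch of what calling one of these public APIs might look like (the area, resource path, payload fields, and authentication details below are assumptions chosen to match the nomenclature, not a documented ShareDo endpoint):

```python
# Illustrative sketch only: the endpoint path, payload shape, and auth scheme
# are assumptions following the /api/public/ nomenclature above, not a
# documented ShareDo endpoint.
import requests

BASE_URL = "https://your-tenant.sharedo.example"  # assumed tenant URL
TOKEN = "..."                                     # obtained from your identity provider

def create_person(first_name: str, surname: str) -> dict:
    """POST a new ODS person via a hypothetical versioned public endpoint."""
    response = requests.post(
        f"{BASE_URL}/api/public/ods/v1/person",   # {area}/{version}/{resource}
        json={"firstName": first_name, "surname": surname},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    print(create_person("Ada", "Lovelace"))
```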

Determining Factors

The following table outlines the key design decisions concerning this data load method.

Data Sets That Can Be Handled: Any – however, loading complex data sets may require calling multiple APIs.
Skills: Solid development skills are usually, but not necessarily, required, in any development language that can call REST APIs, e.g. Java, JavaScript, .NET.
Data Volumes: Ideal for small, incremental, real-time changes rather than large-scale onboarding.
Impact On Live Operations: Minimal – the REST APIs are designed to be used with the system online.
Effort To Configure: Easy – the API is well documented.
Effort To Run: Medium – APIs are precise in their execution and may require monitoring in the calling system to handle unplanned responses (retry, manual investigation, etc.).
Frequency: Near real-time.
Primary Advantage: Real-time.
Primary Disadvantage: Does not record historic dates on data.
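
Because the calling system owns error handling, a simple retry wrapper around such calls is often worthwhile. A sketch is shown below; the retried status codes and back-off policy are assumptions to be tuned to your own integration and monitoring requirements.

```python
# Illustrative retry sketch: the retried status codes and back-off policy are
# assumptions, not ShareDo guidance.
import time
import requests

RETRYABLE = {429, 502, 503, 504}

def post_with_retry(url: str, payload: dict, token: str, attempts: int = 5) -> requests.Response:
    for attempt in range(1, attempts + 1):
        response = requests.post(
            url,
            json=payload,
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        if response.status_code not in RETRYABLE:
            response.raise_for_status()   # surface non-retryable failures for investigation
            return response
        time.sleep(2 ** attempt)          # exponential back-off before retrying
    raise RuntimeError(f"Gave up after {attempts} attempts calling {url}")
```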

Data Table Upload

The Data Table Upload feature enables users to import structured data from an Excel or CSV file into the system. This feature offers a flexible method for uploading data while ensuring both consistency and accuracy. The upload process is based on a configurable framework that supports various Templates, which define the available upload types. 

Once an upload is initiated, a run-time engine processes the data, similar to the system’s Import/Export engine. The upload is executed through a workflow. 
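
As an illustration of the flat, single-sheet shape such an upload expects, the sketch below writes a CSV whose column names are tied to a hypothetical template, not to a shipped ShareDo template.

```python
# Illustrative sketch: writes a flat CSV in the single-sheet shape a Data Table
# Upload template might expect. The column names are assumptions.
import csv

rows = [
    {"Reference": "MAT-001", "Title": "Smith v Jones", "FeeEarner": "a.archer"},
    {"Reference": "MAT-002", "Title": "Lease renewal", "FeeEarner": "b.bishop"},
]

with open("matters_upload.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Reference", "Title", "FeeEarner"])
    writer.writeheader()
    writer.writerows(rows)
```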

Determining Factors

The following table outlines the key design decisions concerning this data load method.

Data Sets That Can Be Handled: Excel data loaders are designed to support only flat data structures, i.e. a single Excel sheet / CSV.
Skills: Knowledge of ShareDo workflows is required.
Data Volumes: Typically small batches of data.
Impact On Live Operations: Minimal (for small data sets) – file uploads are expected to run as part of normal operations.
Effort To Configure: Knowledge of workflows is required to configure custom uploads.
Effort To Run: Easy – it is run from within the ShareDo UI.
Frequency: Ad hoc, due to the manual nature.
Primary Advantage: Ability to run during the business day.
Primary Disadvantage: Workflows to handle complex data are time-consuming to build.

Choosing The Right Method

While this article describes the data load options, clients should consult ShareDo to verify assumptions and gain insights on usage.

A decision flow for the key differentiators would be:
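
As a rough sketch of that decision logic, drawing only on the factors in the tables above (the volume threshold and parameter names are illustrative assumptions, not ShareDo guidance):

```python
# Illustrative decision sketch based on the factors above; the volume
# threshold is an arbitrary assumption, not ShareDo guidance.
def suggest_load_method(rows: int, needs_real_time: bool, flat_structure: bool, user_driven: bool) -> str:
    if needs_real_time:
        return "API"                    # small, incremental, near real-time changes
    if user_driven and flat_structure and rows < 1_000:
        return "Data Table Upload"      # a 'few rows', run from the ShareDo UI
    return "Data Load Tool"             # bulk, complex, initial or large follow-on loads

# Example: a one-off migration of 250,000 matters from a legacy system
print(suggest_load_method(rows=250_000, needs_real_time=False, flat_structure=False, user_driven=False))
```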