Importing in Miranda

Latest revision as of 14:53, 10 January 2019

This page briefly outlines how we import records into the Miranda beta. Currently, there are two import methods: small batch importing and bulk importing. In both cases, we import a JSON file of Miranda records, which are created either by transforming current holdings records into Miranda records or by creating original Miranda records.

We will continue to refine and improve our importing process.

Small batch imports:

For a single record or a small number of Miranda records, typically a JSON file containing fewer than 300 records, we can import directly into Miranda from the system's web-based administrative interface. Through this interface, we can also validate records against the Miranda schema before importing, which helps ensure a successful import. The size limit of this importer, however, makes it impractical for larger imports.
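As a rough illustration of the choice between the two methods, the sketch below counts the records in an import file and suggests a method. It assumes the file is a single top-level JSON array of records, which this page does not state, and it treats the "typically under 300 records" guideline as a soft threshold. The names `SMALL_BATCH_LIMIT`, `load_records`, and `choose_import_method` are hypothetical and are not part of Miranda.

```python
import json

# Soft guideline taken from the text above ("typically ... under 300 records").
# This is an illustrative constant, not a documented Miranda setting.
SMALL_BATCH_LIMIT = 300


def load_records(json_text):
    """Parse an import file, assuming it is one top-level JSON array."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def choose_import_method(records):
    """Suggest 'small-batch' for files under the guideline, else 'bulk'."""
    return "small-batch" if len(records) < SMALL_BATCH_LIMIT else "bulk"
```

For example, `choose_import_method(load_records('[{"id": 1}]'))` suggests the small batch importer, while a file of 300 or more records would be routed to the bulk process.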

Bulk imports:

For larger files or a batch of multiple files, we use the bulk import process, which runs as an AWS ECS task started from the AWS Console. This decouples the import process from the administrative interface, allowing the import to operate as a long-running background process. In this method, files are first uploaded to a specified S3 bucket. The task then reads the files and imports them into Miranda. The process can be monitored through logs that are written to CloudWatch while the task runs.
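The bulk workflow above is run from the AWS Console. For readers who prefer to script the same two steps (upload to S3, then start the ECS task), a minimal sketch using boto3 might look like the following. This is a swapped-in technique, not the documented procedure. The bucket, prefix, cluster, task definition, and container names are placeholders, and the `IMPORT_BUCKET` and `IMPORT_PREFIX` environment variables are hypothetical: this page does not say how the task learns which files to read. Network configuration, which some launch types require, is omitted.

```python
import os


def build_run_task_request(cluster, task_definition, container_name, bucket, prefix):
    """Build keyword arguments for the boto3 ECS client's run_task call.

    The environment variable names are assumptions made for illustration;
    the real task may be configured differently.
    """
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    "environment": [
                        {"name": "IMPORT_BUCKET", "value": bucket},
                        {"name": "IMPORT_PREFIX", "value": prefix},
                    ],
                }
            ]
        },
    }


def upload_and_start(paths, bucket, prefix, cluster, task_definition, container_name):
    """Upload each file to S3, then start the import task and return task ARNs."""
    import boto3  # imported here so the request builder works without AWS access

    s3 = boto3.client("s3")
    for path in paths:
        key = f"{prefix}/{os.path.basename(path)}"
        s3.upload_file(path, bucket, key)

    ecs = boto3.client("ecs")
    request = build_run_task_request(
        cluster, task_definition, container_name, bucket, prefix
    )
    response = ecs.run_task(**request)
    return [task["taskArn"] for task in response["tasks"]]
```

Keeping the request builder separate from the AWS calls makes it possible to inspect what would be sent before anything is started.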

During both import processes, Miranda reads each import file and separates it into individual records for import. It evaluates each record against the schema and either accepts it or returns an error noting a validation issue with the record. If the record is accepted, three things then occur. First, the record is added to Miranda's data store. Second, if the record is properly flagged, it is added to Miranda's search index. Third, if the record references a file, such as an image, through the fileURL field, that URL is added to a queue to be copied into Miranda's own file system.
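The per-record flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Miranda's implementation: the validator is a stand-in that only checks for two hypothetical required fields rather than the real Miranda schema, the `searchable` flag is an invented name for the "properly flagged" condition, and the data store, search index, and copy queue are modeled with an in-memory dict, set, and deque. Only the `fileURL` field name comes from this page.

```python
from collections import deque

# Hypothetical required fields; the real Miranda schema is not shown on this page.
REQUIRED_FIELDS = ("id", "title")


def validate(record):
    """Stand-in schema check: return a list of problems (empty if valid)."""
    return [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in record
    ]


def import_records(records, store, index, copy_queue):
    """Validate each record, then store, index, and queue files as described."""
    errors = []
    for position, record in enumerate(records):
        problems = validate(record)
        if problems:
            # Rejected records are reported with their validation issues.
            errors.append((position, problems))
            continue
        store[record["id"]] = record           # 1. add to the data store
        if record.get("searchable"):           # 2. index only if flagged (assumed flag name)
            index.add(record["id"])
        if record.get("fileURL"):              # 3. queue the referenced file for copying
            copy_queue.append(record["fileURL"])
    return errors
```

A call such as `import_records(records, {}, set(), deque())` returns the list of rejected records, while accepted records are left in the three containers passed in.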

Future importing

In the next phase of the project, we plan to automate regular updates from our Voyager and other Folger systems, and to allow administrators to create Miranda records in-platform, among other tasks.