https://wiki.folger.edu/collections-architecture-portal/_mw/api.php?action=feedcontributions&user=NateParsons&feedformat=atomMiranda Behind-the-Scenes Portal - User contributions [en]2024-03-29T01:04:51ZUser contributionsMediaWiki 1.39.6https://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Data_model&diff=18Data model2017-10-21T00:14:02Z<p>NateParsons: /* Content Table */</p>
<hr />
<div>The Miranda Digital Asset Management Platform is designed to support an arbitrary number of content types, each defined by a JSON schema. Each content object is stored in a NoSQL-style JSON field within a PostgreSQL table. The table's other fields help manage the data access rules around that content and record when it was last imported from an external source or edited within the DAP.<br />
<br />
So each record consists of:<br />
* A row within the postgresql content table<br />
* Within that row, a field that contains the JSON for the content type<br />
* Within that JSON, <br />
** a set of core metadata fields specific to that content type<br />
** a set of fields identifying connections between this record and other records<br />
** a set of fields identifying connections between this record and any binary files managed by the system<br />
<br />
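As a rough illustration of the nesting described above, the sketch below models one record as a Python dictionary. The column names (id, id_dap, type, metadata) follow the Database Schema table; every field name inside metadata (title, creator, relationships, files) is a hypothetical placeholder chosen for this example, since the real names come from the JSON schema of the content type.

```python
import json
import uuid

# One row of the content table, with the JSON document nested in "metadata".
# Everything inside "metadata" is a made-up example, not the platform's schema.
record = {
    "id": 1,
    "id_dap": str(uuid.uuid4()),
    "type": "content",
    "metadata": {
        # core metadata fields specific to the content type
        "title": "Example manuscript",
        "creator": "Unknown",
        # connections between this record and other records
        "relationships": [{"relation": "part_of", "id_dap": str(uuid.uuid4())}],
        # connections between this record and binary files managed by the system
        "files": [{"role": "primary_image", "file_id": "example-file-0001"}],
    },
}


def metadata_as_json(row: dict) -> str:
    """Serialize the metadata document as it might be stored in the JSON field."""
    return json.dumps(row["metadata"], sort_keys=True)


print(metadata_as_json(record))
```

Keeping relationships and file references inside the same JSON document is only one possible layout; the text above says that such fields exist, not how they are named or shaped.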
=== Database Schema ===<br />
{| class="confluenceTable"<br />
!Name<br />
!Type<br />
!Description<br />
|-<br />
|id<br />
|integer<br />
|Serial identifier in the database for relational indexes. Not for use in API.<br />
|-<br />
|id_dap<br />
|uuid <guid><br />
|Universal identifier for the DAP record across services. Used to request records by ID in API. <br />
|-<br />
|date_created<br />
|datetime<br />
|On first insertion. Do not change this on updates.<br />
|-<br />
|date_updated<br />
|datetime<br />
|On first insertion set to the same exact value as date_created.<br />
|-<br />
|type<br />
|string<br />
|The record type allows well-indexed querying to differentiate by some top-level categorization of records. It is also a lookup identifier for record-specific configuration or logic.<br />
<br />
Types include: content (a content item), collection (a set of content items grouped together to present a related collection to users), and container (a means of organizing a subset of related items as part of a larger content record).<br />
|-<br />
|metadata<br />
|jsonb<br />
|The full schema-compliant metadata record. It should include everything needed for surfacing this discrete item in the API or pushing to the search index. (Excluding questions of related records.)<br />
|}</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Data_model&diff=17Data model2017-10-21T00:09:06Z<p>NateParsons: adding db schema stuff</p>
<hr />
<div>The Miranda Digital Asset Management Platform is designed to support an arbitrary number of content types, each defined by a JSON schema. Each content object is stored in a NoSQL-style JSON field within a PostgreSQL table. The table's other fields help manage the data access rules around that content and record when it was last imported from an external source or edited within the DAP.<br />
<br />
=== Database Schema ===<br />
<br />
==== Content Table ====<br />
{| class="confluenceTable"<br />
!Name<br />
!Type<br />
!Description<br />
|-<br />
|id<br />
|integer<br />
|Serial identifier in the database for relational indexes. Not for use in API.<br />
|-<br />
|id_dap<br />
|uuid <guid><br />
|Universal identifier for the DAP record across services. Used to request records by ID in API. <br />
|-<br />
|date_created<br />
|datetime<br />
|On first insertion. Do not change this on updates.<br />
|-<br />
|date_updated<br />
|datetime<br />
|On first insertion set to the same exact value as date_created.<br />
|-<br />
|type<br />
|string<br />
|The record type allows well-indexed querying to differentiate by some top-level categorization of records. It is also a lookup identifier for record-specific configuration or logic.<br />
<br />
Types include: content (a content item), collection (a set of content items grouped together to present a related collection to users), and container (a means of organizing a subset of related items as part of a larger content record).<br />
|-<br />
|metadata<br />
|jsonb<br />
|The full schema-compliant metadata record. It should include everything needed for surfacing this discrete item in the API or pushing to the search index. (Excluding questions of related records.)<br />
|}<br />
<br />
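The timestamp rules in the Content Table can be sketched as follows. This is an in-memory stand-in, not the platform's actual persistence code: the ContentRow class and the insert_row and update_row helpers are hypothetical names, and the sketch assumes that date_updated moves forward on every later change, which the table implies but does not state outright.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ContentRow:
    """Hypothetical in-memory stand-in for one row of the content table."""
    type: str
    metadata: dict
    id_dap: str = field(default_factory=lambda: str(uuid.uuid4()))
    date_created: Optional[datetime] = None
    date_updated: Optional[datetime] = None


def insert_row(type_: str, metadata: dict, now: datetime) -> ContentRow:
    # On first insertion both timestamps receive exactly the same value.
    return ContentRow(type=type_, metadata=metadata,
                      date_created=now, date_updated=now)


def update_row(row: ContentRow, metadata: dict, now: datetime) -> ContentRow:
    # Updates replace the metadata and move date_updated only;
    # date_created is never changed after insertion.
    row.metadata = metadata
    row.date_updated = now
    return row


created = datetime(2017, 10, 21, 0, 9, 6, tzinfo=timezone.utc)
row = insert_row("content", {"title": "Example"}, created)
same_on_insert = row.date_created == row.date_updated
update_row(row, {"title": "Example (revised)"},
           datetime(2017, 10, 22, tzinfo=timezone.utc))
```

Passing the current time in as an argument, rather than reading the clock inside the helpers, keeps the rule easy to test.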
==== Import Management Table ====<br />
{| class="confluenceTable"<br />
!Field Name<br />
!Field Type<br />
!Description<br />
|-<br />
|id<br />
|integer<br />
|Foreign key if this is a separate table.<br />
|-<br />
|id_dap<br />
|uuid <guid><br />
|Universal identifier for the DAP record across services. Used to request records by ID in API.<br />
|-<br />
|remote_system<br />
|string<br />
|Machine name by which the DAP identifies the system of record. It is used to look up details about the originating system, which may in turn configure update rules or specify an import or system-connection process.<br />
|-<br />
|remote_id<br />
|string<br />
|If the remote system uses a GUID or other ID scheme, that ID should be captured here.<br />
|-<br />
|date_last_import<br />
|datetime<br />
|This is only updated if the import succeeds. If an item is only ever updated via the importer, this will always be the same as the date_updated core column. If we allow records to be overridden or switched to manual management, this becomes a marker for divergence.<br />
|-<br />
|date_last_import_attempt<br />
|datetime<br />
|This is updated any time import of a specific item is attempted.<br />
|-<br />
|status_import<br />
|integer <smallint><br />
|Status code associated with the last import. This is used for report building, while general system logging and monitoring are used for audit.<br />
|}</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Data_model&diff=16Data model2017-10-21T00:00:23Z<p>NateParsons: Created page with "The Miranda Digital Assset Management Platform is designed to support an arbitrary number of different content types, each defined via a JSON schema. Each of these content ty..."</p>
<hr />
<div>The Miranda Digital Asset Management Platform is designed to support an arbitrary number of content types, each defined by a JSON schema. Each content object is stored in a NoSQL-style JSON field within a PostgreSQL table. The table's other fields help manage the data access rules around that content and record when it was last imported from an external source or edited within the DAP.</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Technical_Architecture&diff=15Technical Architecture2017-10-20T23:49:36Z<p>NateParsons: /* Content Types */</p>
<hr />
<div><br />
=== Platform Goals & Desired Outcomes ===<br />
<br />
* Modular / component based design that allows for the system to evolve over time<br />
* A system designed to plug in or take advantage of community services, software, and processes.<br />
* Flexibility to add, remove, and modify a diverse set of content types<br />
* Ability to easily improve and iterate on the user interface<br />
* Provide machine access to Folger data holdings to the public<br />
* Develop a platform with low cost of entry for other institutions to adopt.<br />
<br />
== Logical Architecture ==<br />
[[File:Logical Architecture V2.png|centre|thumb|800x800px]]<br />
<br />
== Platform Components ==<br />
<br />
=== Automated Data Import & Mapping Services ===<br />
The Automated data import and mapping services are made up of three major sub-components. These are a JSON importing web interface, a flexible data validation service written in Symfony, and a generalized system for capturing and storing binary assets that leverages PostgreSQL's ability to act as both a relational database and a NoSQL / document storage repository.<br />
<br />
=== Data Import ===<br />
The original roadmap envisioned a system that could be configured to request metadata directly from external systems, transform that data from its export format into the DAP's internal data format, and validate that the fields and data structures required for that DAP content type are present. During our prototyping we investigated the ETL operations required to import data exported from Ex Libris Voyager and the Luna Imaging server. <br />
<br />
This process helped identify the requirements and functionality needed for the general content import process pipeline, but also led us to modify the technical approach to focus on importing a more standardized JSON import format. This approach allows many more systems to have their data imported into the DAP without technical additions or developments needing to be deployed to the core DAP system, and helps establish a JSON data standard for various cultural institution content types.<br />
<br />
==== Content Types ====<br />
Different sorts of content can be configured within the DAP. Each content type is defined within the system by a JSON schema that identifies required fields, nullable fields, and field names for that content type. The system has been designed to allow DAP records to include additional fields that aren't part of this schema. These additional fields are exposed and queryable via the API, but typically will not be leveraged by the search client or other implementations that aren't aware of them. <br />
<br />
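The validation behaviour described above (required fields, nullable fields, and tolerance of extra fields) can be sketched in a few lines. The schema dictionary here uses a simplified, made-up shape with required, properties, and nullable keys; it is not full JSON Schema and not the DAP's actual validator, and the validate function and field names are hypothetical.

```python
def validate(record: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Every required field must be present in the record.
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    # A present field may only be null if the schema marks it nullable.
    for name, rule in schema.get("properties", {}).items():
        if name in record and record[name] is None and not rule.get("nullable", False):
            problems.append(f"field may not be null: {name}")
    # Fields that the schema does not mention are deliberately accepted.
    return problems


article_schema = {
    "required": ["title"],
    "properties": {
        "title": {"nullable": False},
        "subtitle": {"nullable": True},
    },
}

ok = validate({"title": "Hamlet", "subtitle": None, "shelfmark": "X"}, article_schema)
missing = validate({"subtitle": None}, article_schema)
null_title = validate({"title": None}, article_schema)
```

In this sketch the extra shelfmark field passes through untouched, mirroring the way additional fields are allowed in DAP records.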
==== Content Types Hierarchy ====<br />
While the DAP can be configured with a unique JSON schema for each content type, it is also possible to use a more generalized schema for a group of content types. Within the DAP configuration a hierarchy of content types can be defined, with each level of the hierarchy having a default schema that is used if a child content type does not specify its own validation schema. This allows a variety of logical content types that share field sets to be specified in the system. For example, if you wanted to store both press releases and news announcements within the DAP, and those content types had the same fields (or you created a content type with a superset of their fields), you could create an "article" schema and then define both News and Press Releases as children of Article. <br />
<br />
{| class="wikitable"<br />
!Default Content Type (Fallback Schema)<br />
* Articles (Article Schema)<br />
** News (No Schema defined)<br />
** Press Releases (No Schema defined)<br />
|}<br />
<br />
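The fallback lookup implied by this hierarchy can be sketched as a walk up the tree until a type with its own schema is found. The PARENTS and SCHEMAS dictionaries, the type keys, and the resolve_schema function are hypothetical names for this example; how the DAP actually stores its hierarchy configuration is not described here.

```python
FALLBACK = "default"

# child -> parent, mirroring the example hierarchy above
PARENTS = {
    "news": "article",
    "press_release": "article",
    "article": FALLBACK,
}

# Only some content types define their own schema.
SCHEMAS = {
    FALLBACK: "fallback-schema",
    "article": "article-schema",
}


def resolve_schema(content_type: str) -> str:
    """Walk up the hierarchy until a content type with a schema is found."""
    current = content_type
    seen = set()
    while current not in SCHEMAS:
        if current in seen:
            raise ValueError(f"cycle in content type hierarchy at {current!r}")
        seen.add(current)
        # Unknown or unconfigured types fall straight through to the default type.
        current = PARENTS.get(current, FALLBACK)
    return SCHEMAS[current]


print(resolve_schema("press_release"))
print(resolve_schema("picture"))
```

Here News and Press Releases resolve to the article schema because neither defines its own, while an unconfigured type such as "picture" resolves to the fallback schema.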
==== Schema Service ====<br />
To help publicize what content types have been configured within the DAP, the system includes a microservice API that announces which schemas are available and provides the schema that will be used to validate a particular content type. Using the example above, querying the service for the press releases content type would return the articles schema, querying it for the articles content type would return the articles schema, and querying it for an unconfigured content type such as "picture" would return the fallback schema.</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Technical_Architecture&diff=14Technical Architecture2017-10-20T23:36:23Z<p>NateParsons: /* Logical Architecture */</p>
<hr />
<div><br />
=== Platform Goals & Desired Outcomes ===<br />
<br />
* Modular / component based design that allows for the system to evolve over time<br />
* A system designed to plug in or take advantage of community services, software, and processes.<br />
* Flexibility to add, remove, and modify a diverse set of content types<br />
* Ability to easily improve and iterate on the user interface<br />
* Provide machine access to Folger data holdings to the public<br />
* Develop a platform with low cost of entry for other institutions to adopt.<br />
<br />
== Logical Architecture ==<br />
[[File:Logical Architecture V2.png|centre|thumb|800x800px]]<br />
<br />
== Platform Components ==<br />
<br />
=== Automated Data Import & Mapping Services ===<br />
The Automated data import and mapping services are made up of three major sub-components. These are a JSON importing web interface, a flexible data validation service written in Symfony, and a generalized system for capturing and storing binary assets that leverages PostgreSQL's ability to act as both a relational database and a NoSQL / document storage repository.<br />
<br />
=== Data Import ===<br />
The original roadmap envisioned a system that could be configured to request metadata directly from external systems, transform that data from its export format into the DAP's internal data format, and validate that the fields and data structures required for that DAP content type are present. During our prototyping we investigated the ETL operations required to import data exported from Ex Libris Voyager and the Luna Imaging server. <br />
<br />
This process helped identify the requirements and functionality needed for the general content import process pipeline, but also led us to modify the technical approach to focus on importing a more standardized JSON import format. This approach allows many more systems to have their data imported into the DAP without technical additions or developments needing to be deployed to the core DAP system, and helps establish a JSON data standard for various cultural institution content types.<br />
<br />
==== Content Types ====<br />
Different sorts of content can be configured within the DAP. The DAP import format is a JSON file that includes a few key fields that are identified by a JSON schema. Each content type configured in the DAP can either specify its own content-type-specific schema to use, or specify a schema higher up in <br />
<br />
==== Content Hierarchy ====<br />
<br />
==== Schema Service ====<br />
The DAP import format is a JSON file that includes a few key fields that are identified by a JSON schema. Each content type configured in the DAP can either specify a specific content-type specific schema to use, or specify a schema higher up in</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=File:Logical_Architecture_V2.png&diff=13File:Logical Architecture V2.png2017-10-20T22:54:49Z<p>NateParsons: </p>
<hr />
<div></div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Technical_Architecture&diff=6Technical Architecture2017-09-21T00:03:50Z<p>NateParsons: /* Logical Architecture */</p>
<hr />
<div><br />
=== Platform Goals & Desired Outcomes ===<br />
<br />
* Modular / component based design that allows for the system to evolve over time<br />
* A system designed to plug in or take advantage of community services, software, and processes.<br />
* Flexibility to add, remove, and modify a diverse set of content types<br />
* Ability to easily improve and iterate on the user interface<br />
* Provide machine access to Folger data holdings to the public<br />
* Develop a platform with low cost of entry for other institutions to adopt.<br />
<br />
== Logical Architecture ==</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Technical_Architecture&diff=5Technical Architecture2017-09-21T00:02:24Z<p>NateParsons: Created page with " === Platform Goals & Desired Outcomes === * Modular / component based design that allows for the system to evolve over time * A system designed to plug in or take advantage..."</p>
<hr />
<div><br />
<br />
=== Platform Goals & Desired Outcomes ===<br />
<br />
* Modular / component based design that allows for the system to evolve over time<br />
* A system designed to plug in or take advantage of community services, software, and processes.<br />
* Flexibility to add, remove, and modify a diverse set of content types<br />
* Ability to easily improve and iterate on the user interface<br />
* Provide machine access to Folger data holdings to the public<br />
* Develop a platform with low cost of entry for other institutions to adopt.<br />
<br />
== Logical Architecture ==</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Main_Page&diff=4Main Page2017-09-20T23:59:45Z<p>NateParsons: /* Table of contents */</p>
<hr />
<div>Welcome to the <strong>Folger's Digital Asset Platform Developer Portal!</strong><br />
<br />
This portal has been created to help technical and technically inclined folks learn how to interact with the Platform's APIs and data import processes, and learn more about how the platform itself works. Eventually this space will include directions on how to install and configure your very own copy of the Digital Asset Platform!<br />
<br />
== Table of contents ==<br />
* [[Introduction to the platform]]<br />
* [[Technical Architecture]]<br />
* [[Data model]]<br />
* [[Elasticsearch API]]<br />
* [[GraphQL API]]<br />
* [[Schema microservice]]<br />
* [[Reporting Bugs]]<br />
* [[Keep up with developments (Join our mailing list)]]<br />
* [[Development Roadmap]]</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Introduction_to_the_platform&diff=3Introduction to the platform2017-09-20T23:56:25Z<p>NateParsons: Overview of the DAP</p>
<hr />
<div><br />
== What is the Folger Digital Asset Platform? ==<br />
<br />
The Folger Digital Asset Platform (the D.A.P. or DAP) is an open source software system (licensed under the GNU General Public License v3.0) that allows for the storage of arbitrary binary files and assets and their associated metadata and inter-asset relationships. It has been designed to allow for flexible definition of the kinds of assets it stores, flexible binary file asset storage, enterprise class search, and is built from the ground up to leverage APIs. The DAP has an import component for rapidly ingesting data, an indexing service to manage what metadata is made available for public search, a GraphQL API for asset consumption, and a microservice to publicize configured asset types and their validation schemas. <br />
<br />
<br />
=== Why is it being built? ===<br />
<br />
<br />
The DAP is being built to help fill a need in the mid-sized cultural institution space that is currently not well served by existing open or closed source solutions. In particular, as “born digital” or “digital native” types of content such as databases, blog posts, mobile applications, and the like become more commonplace, the existing “record” management systems that libraries and cultural institutions have relied on are either too limited, too expensive, or too opinionated to fit mid-market organizations’ needs. <br />
<br />
<br />
==== Key business areas this impacted for the Folger included: ====<br />
<br />
* '''Enhanced remote access''' -- Not everyone can physically travel to the Folger Library in Washington, DC. Existing tools were designed to facilitate discovery of assets but were not as helpful for consuming them. <br />
* '''Audience expansion''' -- The Folger currently serves an audience of around 1 million people annually. However, many audiences could not be easily reached because of library- or institution-specific data formats, data exchange standards, and other technical roadblocks with high learning curves. By developing a system that uses widely adopted, well documented, and cross-sector supported standards and solutions, the DAP makes it much more likely that new audiences and organizations can and will leverage Folger assets.<br />
* '''International partnerships''' -- As the Folger seeks to develop strong partnerships with similar organizations across the globe it needed a non-proprietary mechanism for connecting with organizations that had chosen different technical infrastructures for managing their own internal assets. In many ways this is the Folger’s own internal use case for audience expansion, and the partnerships will allow the Folger to develop the DAP with real world experience in sharing, connecting, and juxtaposing their assets with other organizations’ holdings.<br />
* '''Digital Acquisition and Preservation''' -- While many of the original assets the Folger collected were very well understood and the universe of content types was very static (it’s rare that a new kind of 1500s-era manuscript is discovered), the digital world is in flux: new content types are defined yearly, and some recent content types are no longer being developed. In addition, the Folger itself is creating new kinds of digital assets out of its physical holdings, such as the Internet Shakespeare Editions, the Folger Digital Texts archive, and research products such as the Folger’s research into female owners of early modern books. <br />
<br />
<br />
== Major platform components ==<br />
<br />
=== Content Importer === <br />
Imports JSON structured content<br />
=== Content Validator === <br />
Validates imported JSON by matching it against an available configured schema.<br />
Allows for a hierarchy of validation schemas to be defined with fallbacks. <br />
=== Content Search === <br />
System allows individual content items to be flagged as “searchable” or not, allowing you to keep internal metadata in the system but not junk up your search results with it. <br />
=== GraphQL API === <br />
System allows individual content items to be flagged as “published” or not, allowing you to keep some data in the system as private / draft content.<br />
=== Schema Microservice === <br />
Allows developers to easily see what content types are configured in any particular installation of the DAP and to retrieve the JSON validation schema that is used to test each content type during import.<br />
=== Sample Web Client === <br />
Demonstrates how to build a web client that leverages both Elasticsearch and our GraphQL API.<br />
The client also demonstrates how various system integrations can happen in this client middleware. (For instance using 3rd party viewers to let web users browse and navigate visual assets in the client.)<br />
<br />
<br />
== What software and systems is the DAP built with? ==<br />
* PHP 7 - http://php.net/manual/en/<br />
* Symfony 3 - https://symfony.com/doc/current/index.html<br />
* GraphQL - http://graphql.org<br />
* JSON - http://www.json.org<br />
* PostgreSQL - https://www.postgresql.org<br />
* Elasticsearch - https://www.elastic.co/guide/index.html<br />
* Pattern Lab - http://patternlab.io</div>NateParsonshttps://wiki.folger.edu/collections-architecture-portal/_mw/index.php?title=Main_Page&diff=2Main Page2017-09-20T23:50:51Z<p>NateParsons: configuring top level nav</p>
<hr />
<div>Welcome to the <strong>Folger's Digital Asset Platform Developer Portal!</strong><br />
<br />
This portal has been created to help technical and technically inclined folks learn how to interact with the Platform's APIs and data import processes, and learn more about how the platform itself works. Eventually this space will include directions on how to install and configure your very own copy of the Digital Asset Platform!<br />
<br />
== Table of contents ==<br />
* [[Introduction to the platform]]<br />
* [[Data model]]<br />
* [[Importing content]]<br />
* [[Elasticsearch API]]<br />
* [[GraphQL API]]<br />
* [[Schema microservice]]<br />
* [[Reporting Bugs]]<br />
* [[Keep up with developments (Join our mailing list)]]<br />
* [[Development Roadmap]]<br />
<br />
</div>NateParsons