GraphQL Research Notes
Overview
GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data.
- A GraphQL server provides a client with a predefined schema written down in Schema Definition Language (SDL).
- The schema defines the queries that can be made.
- The SDL consists of Type definitions that describe the object types that can be queried on that server and the fields they have.
- It can return many resources in a single request, which can make it faster than a REST API.
An example of a type Project written using SDL is shown below:
type Project {
  name: String!
  id: Int!
}
The following shows an example of a GraphQL query that returns the name and ID of a Project.
Input:
{
  Project {
    name
    id
  }
}
Output:
{
  "data": {
    "Project": {
      "name": "ORCA",
      "id": 1
    }
  }
}
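The exchange above can be sketched in Python using only the standard library. This is a minimal sketch, assuming the common convention of sending the query as a JSON body with a "query" key and reading the "data" and "errors" keys from the response. The helper names build_payload and extract_data are illustrative and do not come from any library.

```python
import json

QUERY = """
{
  Project {
    name
    id
  }
}
"""

def build_payload(query, variables=None):
    """Serialize a GraphQL query into the JSON body usually sent via HTTP POST."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return json.dumps(body).encode("utf-8")

def extract_data(response_text):
    """Return the "data" member of a response, raising if "errors" is present."""
    response = json.loads(response_text)
    if response.get("errors"):
        messages = "; ".join(
            error.get("message", "unknown error") for error in response["errors"]
        )
        raise RuntimeError(f"GraphQL request failed: {messages}")
    return response["data"]

# Sample response text copied from the Output shown above.
sample_response = '{"data": {"Project": {"name": "ORCA", "id": 1}}}'
project = extract_data(sample_response)["Project"]
print(project["name"], project["id"])
```

Keeping the payload builder separate from the transport means the same helpers work whether the request is sent with urllib, requests, or from inside a Lambda.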
Pros and cons
Pros:
- Able to get many resources in a single request compared to REST API.
- Able to only fetch the needed information in a single request instead of fetching all the data.
- There is no need to validate the data format, as GraphQL does the validation. Developers only need to write resolvers, which define how the data is retrieved.
- A developer can view the available schemas before making the request.
- There is only one version of the GraphQL API, which allows for cleaner, more maintainable code.
- Error messages are detailed: they include all the resolvers and point to the exact part of the query that failed, which is useful during debugging.
- Centralizes our DB code, making it easier to switch or update DB libraries.
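To make the resolver idea and the "fetch only what is needed" point concrete, here is a toy sketch in plain Python. It is not a GraphQL implementation: a real server also parses, validates, and type-checks the query. The record, the resolver map, and the execute function are all hypothetical names chosen for illustration.

```python
# Sample record with more fields than a client typically needs (illustrative data).
PROJECT = {"id": 1, "name": "ORCA", "description": "example", "owner": "example-team"}

# One resolver per field: each function says how that field is retrieved.
RESOLVERS = {
    "id": lambda project: project["id"],
    "name": lambda project: project["name"],
    "description": lambda project: project["description"],
    "owner": lambda project: project["owner"],
}

def execute(selected_fields, project, resolvers=RESOLVERS):
    """Run only the resolvers for the fields the client asked for."""
    unknown = [field for field in selected_fields if field not in resolvers]
    if unknown:
        raise KeyError(f"Unknown field(s): {', '.join(unknown)}")
    return {field: resolvers[field](project) for field in selected_fields}

# The client asks for two fields and receives exactly those two.
print(execute(["name", "id"], PROJECT))
```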
Cons:
- Performance issues with complex queries: a server could slow down if a client asks for too many nested fields at once.
- Not recommended for small applications. Moreover, the learning curve is higher compared to other methods.
- File uploading is a bit complex.
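One common mitigation for the nested-query concern is to reject queries that nest too deeply before executing them. The following is a rough sketch that estimates depth by counting braces. It assumes the query contains no braces inside string literals or inline input objects, and a production server would normally inspect the parsed query instead. The function names and the default limit are illustrative.

```python
def query_depth(query):
    """Return the deepest brace nesting level found in a GraphQL query string."""
    depth = 0
    deepest = 0
    for char in query:
        if char == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif char == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("Unbalanced braces: unexpected '}'")
    if depth != 0:
        raise ValueError("Unbalanced braces: missing '}'")
    return deepest

def check_depth(query, max_depth=5):
    """Raise ValueError if the query nests deeper than max_depth, else return its depth."""
    depth = query_depth(query)
    if depth > max_depth:
        raise ValueError(f"Query depth {depth} exceeds limit {max_depth}")
    return depth

print(check_depth("{ Project { name id } }"))
```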
GraphQL Servers
There are numerous servers for GraphQL that support different programming languages. A list of all the servers can be seen here.
Prebuilt GraphQL servers
AWS AppSync
- AWS AppSync is a fully managed service for developing GraphQL APIs that handles the heavy lifting of securely connecting to data sources like AWS DynamoDB, Lambda, and more.
- Automatically scales the GraphQL API execution engine up and down to meet API request volumes.
- Pricing is $4.00 per million Query and Data Modification Operations and $0.08 per million minutes of connection to the AWS AppSync service. Details of pricing can be found here.
- Details on deploying AppSync GraphQL API using terraform can be found below.
As of 07/18/2021, AWS AppSync is not yet approved by NASA for use in the NGAP AWS account. However, it could be a good future approach once approved.
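The AppSync pricing quoted above can be turned into a rough monthly estimate. This sketch uses only the two rates listed in these notes and ignores any other charges or free-tier allowances, so treat the result as a ballpark figure. The function name and the sample volumes are illustrative.

```python
OPERATION_RATE_PER_MILLION = 4.00   # USD per million query/data modification operations
CONNECTION_RATE_PER_MILLION = 0.08  # USD per million connection minutes

def appsync_monthly_cost(operations, connection_minutes):
    """Estimate the monthly AppSync cost in USD from the two quoted rates."""
    operation_cost = operations / 1_000_000 * OPERATION_RATE_PER_MILLION
    connection_cost = connection_minutes / 1_000_000 * CONNECTION_RATE_PER_MILLION
    return operation_cost + connection_cost

# Example: 2 million operations and 5 million connection minutes in a month.
print(f"${appsync_monthly_cost(2_000_000, 5_000_000):.2f}")  # prints $8.40
```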
Hasura
- Hasura is an open source service that can create APIs without having to build, operate & scale a GraphQL server.
- Supports GraphQL on Postgres, AWS Aurora, Microsoft SQL server.
- Can be run in cloud (fastest way) or using docker locally.
- Comes with its own authentication and authorization features. To prevent the GraphQL endpoint from being publicly accessible, developers have to create an admin secret.
- Pricing: the fully managed cloud service is $99/month/project and supports up to 20GB of data, with $2 per additional GB.
- Written in the Haskell programming language.
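Once an admin secret is set, clients pass it in the x-hasura-admin-secret request header. The sketch below builds (but does not send) such a request with the standard library. It assumes a local instance serving GraphQL at http://localhost:8080/v1/graphql. The environment variable names HASURA_ENDPOINT and HASURA_ADMIN_SECRET and the helper name are hypothetical choices for this example.

```python
import json
import os
import urllib.request

def build_hasura_request(query, endpoint, admin_secret):
    """Create a POST request that carries the query and the admin secret header."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-hasura-admin-secret": admin_secret,
        },
        method="POST",
    )

request = build_hasura_request(
    "{ __typename }",
    # Assumed local endpoint and placeholder secret; override via the environment.
    endpoint=os.environ.get("HASURA_ENDPOINT", "http://localhost:8080/v1/graphql"),
    admin_secret=os.environ.get("HASURA_ADMIN_SECRET", "change-me"),
)
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Reading the secret from the environment keeps it out of source control, which matters because the admin secret grants full access to the endpoint.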
PostGraphile
- PostGraphile is similar to Hasura and can quickly create a GraphQL API from a PostgreSQL schema.
- Most operations can be performed using the CLI.
- Uses PostgreSQL's "role-based grant system" and "row-level security policies".
- Pricing is $25/month for PostGraphile Pro which has additional features compared to PostGraphile.
PostGraphile can be deployed using AWS lambda on Mac, Linux or Windows. Check this example.
Apollo Server
- Apollo Server is open source and uses JavaScript.
- Needs the apollo-server and graphql libraries as prerequisites.
- Pricing: $59/user/month.
- A good example of creating the server can be found [here](https://www.apollographql.com/docs/apollo-server/getting-started/).
- It can be deployed using lambda in AWS by utilizing serverless framework. A few examples are given below.
- Can be deployed faster using Heroku but has additional cost.
Building own GraphQL servers
- GraphQL.js: Server using JavaScript.
- Apollo Server: Server using JavaScript.
- Express GraphQL: Server using JavaScript.
- Graphene: Server using Python. The most popular option, with the most contributors.
- Ariadne: Server using Python.
- Strawberry: Server using Python.
- CMR-GraphQL: Currently being used by NASA GHRC developers. Check out the example of their GraphQL schema and resolver here.
GraphQL in ORCA
Some of the lambdas that might be affected are:
- post_copy_request_to_queue
- db_deploy
- post_to_database
- request_from_archive
- copy_from_archive
- request_status_for_granule
- request_status_for_job
Apart from updating those lambdas, developers need to create the GraphQL endpoint using terraform or AWS SAM, and update requirements.txt, requirements-dev.txt or bin/build.sh, bin/run_tests.sh by adding additional dependencies such as graphene when using Graphene, or npm install apollo-server graphql when using Apollo Server.
When using JavaScript libraries like Apollo Server, additional files and code are needed to write the schema and resolvers.
When using a Python library like Graphene, developers need to update .py files to import libraries, create classes and queries, and create the schema and resolvers.
A few suggestions are given below:
post_copy_request_to_queue
- Developers might need to modify get_metadata_sql(key_path) and use the GraphQL query. See this example.
- Update test_post_copy_request_to_queue.py based on changes in post_copy_request_to_queue.py. One test could be test_get_metadata_sql_happy_path().
- shared_recovery.update_status_for_file() and shared_recovery.post_entry_to_queue() functions for sending to SQS might need to be removed and replaced with code that leverages GraphQL to write to the database.
- Additional changes are expected.
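As a rough idea of what "code that leverages GraphQL to write to the database" could look like, the sketch below builds the request body for a status update. Everything about the schema is an assumption: the updateFileStatus mutation, its arguments, and its return fields are placeholders, since the actual ORCA GraphQL schema has not been designed. The helper name is also hypothetical.

```python
import json

# Hypothetical mutation: the operation, argument names, and return fields are placeholders.
UPDATE_FILE_STATUS = """
mutation UpdateFileStatus($jobId: String!, $granuleId: String!, $filename: String!, $status: String!) {
  updateFileStatus(jobId: $jobId, granuleId: $granuleId, filename: $filename, status: $status) {
    filename
    status
  }
}
"""

def build_update_status_payload(job_id, granule_id, filename, status):
    """Return the JSON body for the status-update mutation, with values passed as variables."""
    return json.dumps(
        {
            "query": UPDATE_FILE_STATUS,
            "variables": {
                "jobId": job_id,
                "granuleId": granule_id,
                "filename": filename,
                "status": status,
            },
        }
    )

payload = build_update_status_payload("job-1", "granule-1", "file.h5", "success")
print(json.loads(payload)["variables"])
```

Passing values as variables instead of formatting them into the query string avoids quoting problems and keeps the query text constant.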
copy_from_archive
- shared_recovery.update_status_for_file() and shared_recovery.post_entry_to_queue() functions for sending to SQS might need to be removed and replaced with code that leverages GraphQL to write to the database.
- Additional changes are expected.
request_from_archive
- shared_recovery.update_status_for_file(), shared_recovery.create_status_for_job() and shared_recovery.post_entry_to_queue() functions for sending to SQS might need to be removed and replaced with code that leverages GraphQL to write to the database.
- db_queue_url arg in inner_task() will not be needed if SQS is not used.
- Modify process_granule() function.
- Additional changes are expected.
request_status_for_granule
- Modify get_most_recent_job_id_for_granule() function to use GraphQL query.
- Modify get_status_totals_for_job() function to use GraphQL query.
- Additional changes are expected.
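One possible shape for such a change is sketched below. It is not the existing ORCA function: the signature, the mostRecentJob query, and its field names are all hypothetical. The GraphQL call is injected as an execute callable that takes a query and variables and returns the response "data" dictionary, so the function can be unit tested without a running server.

```python
from typing import Callable, Optional

# Hypothetical query: the field and argument names are placeholders, not a real schema.
MOST_RECENT_JOB_QUERY = """
query MostRecentJob($granuleId: String!) {
  mostRecentJob(granuleId: $granuleId) {
    jobId
  }
}
"""

def get_most_recent_job_id_for_granule(
    granule_id: str, execute: Callable[[str, dict], dict]
) -> Optional[str]:
    """Return the newest job ID for the granule, or None if no job exists."""
    data = execute(MOST_RECENT_JOB_QUERY, {"granuleId": granule_id})
    job = data.get("mostRecentJob")
    return job["jobId"] if job else None

def fake_execute(query, variables):
    """Stand-in for a real GraphQL client, returning canned data for one granule."""
    if variables["granuleId"] == "granule-1":
        return {"mostRecentJob": {"jobId": "job-42"}}
    return {"mostRecentJob": None}

print(get_most_recent_job_id_for_granule("granule-1", fake_execute))
```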
request_status_for_job
- Modify get_granule_status_entries_for_job() function to use GraphQL query.
db_deploy
- app_db_exists(), app_schema_exists(), app_version_table_exists() and get_migration_version() functions might need to be updated.
- Additional changes are expected.
post_to_database
- This will need to be removed if GraphQL is used since it will write to the DB instead of SQS.
GraphQL server recommendation
Based on this research, GraphQL has a steeper learning curve than other technologies, and developers will need time to learn and then implement it. If JavaScript libraries are used, developers should have a good background in that language. Using a prebuilt GraphQL server that automatically generates the GraphQL schema and resolvers could save time and simplify the design. Building a prototype in ORCA could reveal whether it is worth the effort and time. However, Lambda, API Gateway and SQS (the resources in the existing ORCA architecture) seem to have more resources and examples available online than GraphQL.
Recommendation #1- Hasura
- Hasura GraphQL engine can be deployed using Docker and Docker Compose or using Hasura cloud.
- The easiest way to set up Hasura GraphQL engine on local environment without any cost is using Docker.
- It supports GraphQL on Postgres, AWS Aurora and seems to be compatible with the current ORCA architecture.
- Cost to use the cloud is $99/month/project and supports up to 20GB data with $2/additional GB data.
- Creating the server using the given docker-compose.yml file will be easy, and the server can be queried from the Hasura console. Instructions to create the server can be found here.
- Instructions on connecting Postgres to the GraphQL server can be found here.
Hasura cloud service is not approved by NGAP, so it cannot be used for now. However, developers can use the docker version for testing.
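For planning purposes, the cloud pricing listed above can be expressed as a small calculation. This sketch assumes the 20GB allowance and the $2 overage rate apply per project and that partial gigabytes are billed proportionally. Neither assumption is confirmed by these notes, and the function name is illustrative.

```python
BASE_PRICE = 99.0      # USD per project per month
INCLUDED_GB = 20       # data included in the base price
EXTRA_GB_PRICE = 2.0   # USD per additional GB

def hasura_cloud_monthly_cost(data_gb):
    """Estimate the monthly Hasura cloud cost in USD for a single project."""
    extra_gb = max(0, data_gb - INCLUDED_GB)
    return BASE_PRICE + extra_gb * EXTRA_GB_PRICE

# Example: a project transferring 25GB pays for 5GB beyond the allowance.
print(f"${hasura_cloud_monthly_cost(25):.2f}")  # prints $109.00
```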
Practical Evaluation
- Setting up locally is the easiest out of all three recommendations.
- Only supports PostgreSQL, MS SQL Server, and Citus, with BigQuery in beta.
- Did not attempt to deploy to AWS.