GraphQL Best Practices
Introduction:
DataHub’s GraphQL API is designed to power the UI. The following guidelines are written with this use-case in mind.
General Best Practices
Query Optimizations
One of GraphQL's biggest advantages over a traditional REST API is its support for declarative data fetching. Each component can (and should) query exactly the fields it requires to render, with no superfluous data sent over the network. If instead your root component executes a single, enormous query to obtain data for all of its children, it might query on behalf of components that aren't even rendered given the current state. This can result in a delayed response, and it drastically reduces the likelihood that the query's result can be reused by a server-side response cache. [ref]
- Minimize over-fetching by only requesting data needed to be displayed.
- Limit result counts and use pagination (additionally see section below on
Deep Pagination
).- Avoid deeply nested queries and instead break out queries into separate requests for the nested objects.
Client-side Caching
Clients, such as Apollo Client (javascript, python apollo-client-python
), offer client-side caching to prevent requests to the service and are able to understand the content of the GraphQL query. This enables more advanced caching vs HTTP response caching.
Reuse Pieces of Query Logic with Fragments
One powerful feature of GraphQL that we recommend you use is fragments. Fragments allow you to define pieces of a query that you can reuse across any client-side query that you define. Basically, you can define a set of fields that you want to query, and reuse it in multiple places.
This technique makes maintaining your GraphQL queries much more doable. For example, if you want to request a new field for an entity type across many queries, you’re able to update it in one place if you’re leveraging fragments.
Search Query Best Practices
Deep Pagination: search vs scroll APIs
search*
APIs such as searchAcrossEntities
are designed for minimal pagination (< ~50). They do not perform well for deep pagination requests. Use the equivalent scroll*
APIs such as scrollAcrossEntities
when expecting the need to paginate deeply into the result set.
Note: that it is impossible to use search*
for paginating beyond 10k results.
Examples
In the following examples we demonstrate pagination for both scroll*
and search*
requests. This particular request is searching for two entities, Datasets and Charts, that
contain pet
in the entities' name or title. The results will only include the URN for the entities.
{
scrollAcrossEntities(
input: {
types: [DATASET, CHART]
count: 2
query: "*"
orFilters: [
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] },
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
]
}
) {
nextScrollId
searchResults {
entity {
... on Dataset {
urn
}
... on Chart {
urn
}
}
}
}
}
Page 1 Result:
{
"data": {
"scrollAcrossEntities": {
"nextScrollId": "eyJzb3J0IjpbMi4wNzk2ODc2LCJ1cm46bGk6ZGF0YXNldDoodXJuOmxpOmRhdGFQbGF0Zm9ybTpzbm93Zmxha2UsbG9uZ190YWlsX2NvbXBhbmlvbnMuYWRvcHRpb24ucGV0X3Byb2ZpbGVzLFBST0QpIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0=",
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
}
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pet_profiles,PROD)"
}
}
]
}
},
"extensions": {}
}
Page 2 Request:
{
scrollAcrossEntities(
input: {
scrollId: "eyJzb3J0IjpbMi4wNzk2ODc2LCJ1cm46bGk6ZGF0YXNldDoodXJuOmxpOmRhdGFQbGF0Zm9ybTpzbm93Zmxha2UsbG9uZ190YWlsX2NvbXBhbmlvbnMuYWRvcHRpb24ucGV0X3Byb2ZpbGVzLFBST0QpIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0="
types: [DATASET, CHART]
count: 2
query: "*"
orFilters: [
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] },
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
]
}
) {
nextScrollId
searchResults {
entity {
... on Dataset {
urn
}
... on Chart {
urn
}
}
}
}
}
Page 2 Result:
{
"data": {
"scrollAcrossEntities": {
"nextScrollId": "eyJzb3J0IjpbMS43MTg3NSwidXJuOmxpOmRhdGFzZXQ6KHVybjpsaTpkYXRhUGxhdGZvcm06c25vd2ZsYWtlLGxvbmdfdGFpbF9jb21wYW5pb25zLmFkb3B0aW9uLnBldHMsUFJPRCkiXSwicGl0SWQiOm51bGwsImV4cGlyYXRpb25UaW1lIjowfQ==",
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_status_history,PROD)"
}
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pets,PROD)"
}
}
]
}
},
"extensions": {}
}
{
searchAcrossEntities(
input: {
types: [DATASET, CHART]
count: 2,
start: 0
query: "*"
orFilters: [
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] },
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
]
}
) {
searchResults {
entity {
... on Dataset {
urn
}
... on Chart {
urn
}
}
}
}
}
Page 1 Response:
{
"data": {
"searchAcrossEntities": {
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
}
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pet_profiles,PROD)"
}
}
]
}
},
"extensions": {}
}
Page 2 Request:
{
searchAcrossEntities(
input: {
types: [DATASET, CHART]
count: 2,
start: 2
query: "*"
orFilters: [
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] },
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
]
}
) {
searchResults {
entity {
... on Dataset {
urn
}
... on Chart {
urn
}
}
}
}
}
Page 2 Response:
{
"data": {
"searchAcrossEntities": {
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_status_history,PROD)"
}
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pets,PROD)"
}
}
]
}
},
"extensions": {}
}
SearchFlags: Highlighting and Aggregation
When performing queries which accept searchFlags
and highlighting and aggregation is not needed, be sure to disable these flags.
- skipHighlighting: true
- skipAggregates: true
As a fallback, if only certain fields require highlighting use customHighlightingFields
to limit highlighting to the specific fields.
Example for skipping highlighting and aggregates, typically used for scrolling search requests.
{
scrollAcrossEntities(
input: {types: [DATASET], count: 2, query: "pet", searchFlags: {skipAggregates: true, skipHighlighting: true}}
) {
searchResults {
entity {
... on Dataset {
urn
}
}
matchedFields {
name
value
}
}
facets {
displayName
aggregations {
value
count
}
}
}
}
Response:
Note that a few matchedFields
are still returned by default [urn
, customProperties
]
{
"data": {
"scrollAcrossEntities": {
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
},
"matchedFields": [
{
"name": "urn",
"value": ""
},
{
"name": "customProperties",
"value": ""
}
]
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.pet_details,PROD)"
},
"matchedFields": [
{
"name": "urn",
"value": ""
},
{
"name": "customProperties",
"value": ""
}
]
}
],
"facets": []
}
},
"extensions": {}
}
Custom highlighting can be used for searchAcrossEntities when only a limited number of fields are useful for highlighting. In this example we specifically request highlighting for description
.
{
searchAcrossEntities(
input: {types: [DATASET], count: 2, query: "pet", searchFlags: {customHighlightingFields: ["description"]}}
) {
searchResults {
entity {
... on Dataset {
urn
}
}
matchedFields {
name
value
}
}
}
}
Response:
{
"data": {
"searchAcrossEntities": {
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
},
"matchedFields": [
{
"name": "urn",
"value": ""
},
{
"name": "customProperties",
"value": ""
},
{
"name": "description",
"value": "Table with all pet-related details"
}
]
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.pet_details,PROD)"
},
"matchedFields": [
{
"name": "urn",
"value": ""
},
{
"name": "customProperties",
"value": ""
}
]
}
]
}
},
"extensions": {}
}
Aggregation
When aggregation is required with searchAcrossEntities
, it is possible to set the count
to 0 to avoid fetching the top search hits, only returning the aggregations. Alternatively aggregateAcrossEntities provides counts and can provide faster results from server-side caching.
Request:
{
searchAcrossEntities(
input: {types: [DATASET], count: 0, query: "pet", searchFlags: {skipHighlighting: true}}
) {
searchResults {
entity {
... on Dataset {
urn
}
}
matchedFields {
name
value
}
}
facets {
displayName
aggregations {
value
count
}
}
}
}
Response:
{
"data": {
"searchAcrossEntities": {
"searchResults": [],
"facets": [
{
"displayName": "Container",
"aggregations": [
{
"value": "urn:li:container:b41c14bc5cb3ccfbb0433c8cbdef2992",
"count": 4
},
{
"value": "urn:li:container:701919de0ec93cb338fe9bac0b35403c",
"count": 3
}
]
},
{
"displayName": "Sub Type",
"aggregations": [
{
"value": "table",
"count": 9
},
{
"value": "view",
"count": 6
},
{
"value": "explore",
"count": 5
},
{
"value": "source",
"count": 4
},
{
"value": "incremental",
"count": 1
}
]
},
{
"displayName": "Type",
"aggregations": [
{
"value": "DATASET",
"count": 24
}
]
},
{
"displayName": "Environment",
"aggregations": [
{
"value": "PROD",
"count": 24
}
]
},
{
"displayName": "Glossary Term",
"aggregations": [
{
"value": "urn:li:glossaryTerm:Adoption.DaysInStatus",
"count": 1
},
{
"value": "urn:li:glossaryTerm:Ecommerce.HighRisk",
"count": 1
},
{
"value": "urn:li:glossaryTerm:Classification.Confidential",
"count": 1
}
]
},
{
"displayName": "Domain",
"aggregations": [
{
"value": "urn:li:domain:094dc54b-0ebc-40a6-a4cf-e1b75e8b8089",
"count": 6
},
{
"value": "urn:li:domain:7d64d0fa-66c3-445c-83db-3a324723daf8",
"count": 2
}
]
},
{
"displayName": "Owned By",
"aggregations": [
{
"value": "urn:li:corpGroup:Adoption",
"count": 5
},
{
"value": "urn:li:corpuser:shannon@longtail.com",
"count": 4
},
{
"value": "urn:li:corpuser:admin",
"count": 2
},
{
"value": "urn:li:corpGroup:Analytics Engineering",
"count": 2
},
{
"value": "urn:li:corpuser:avigdor@longtail.com",
"count": 1
},
{
"value": "urn:li:corpuser:prentiss@longtail.com",
"count": 1
},
{
"value": "urn:li:corpuser:tasha@longtail.com",
"count": 1
},
{
"value": "urn:li:corpuser:ricca@longtail.com",
"count": 1
},
{
"value": "urn:li:corpuser:emilee@longtail.com",
"count": 1
}
]
},
{
"displayName": "Platform",
"aggregations": [
{
"value": "urn:li:dataPlatform:looker",
"count": 8
},
{
"value": "urn:li:dataPlatform:dbt",
"count": 7
},
{
"value": "urn:li:dataPlatform:snowflake",
"count": 7
},
{
"value": "urn:li:dataPlatform:s3",
"count": 1
},
{
"value": "urn:li:dataPlatform:mongodb",
"count": 1
}
]
},
{
"displayName": "Tag",
"aggregations": [
{
"value": "urn:li:tag:prod_model",
"count": 3
},
{
"value": "urn:li:tag:pii",
"count": 2
},
{
"value": "urn:li:tag:business critical",
"count": 2
},
{
"value": "urn:li:tag:business_critical",
"count": 2
},
{
"value": "urn:li:tag:Tier1",
"count": 1
},
{
"value": "urn:li:tag:prod",
"count": 1
}
]
},
{
"displayName": "Type",
"aggregations": [
{
"value": "DATASET",
"count": 24
}
]
}
]
}
},
"extensions": {}
}
Limit Search Entity Types
When querying for specific entities, enumerate only the entity types required using types
, for example [DATASET
, CHART
]
Limit Results
Limit search results based on the amount of information being requested. For example, a minimal number of attributes can fetch 1,000 - 2,000 results in a single page, however as the number of attributes increases (especially nested objects) the count
should be lowered, 20-25 for very complex requests.
Lineage Query Best Practices
There are two primary ways to query lineage:
Search Across Lineage
searchAcrossLineage
/ scrollAcrossLineage
root query:
- Recommended for all lineage queries
- Only the shortest path is guaranteed to show up in
paths
- Supports querying indirect lineage (depth > 1)
- Depending on the fanout of the lineage, 3+ hops may not return data, use 1-hop queries for the fastest response times.
- Specify using a filter with name
"degree"
and values"1"
,"2"
, and / or"3+"
The following examples are demonstrated using sample data for urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)
.
The following example queries show UPSTREAM lineage with progressively higher degrees, first with degree ["1"]
and then ["1","2"]
.
1-Hop Upstreams:
Request:
{
searchAcrossLineage(
input: {urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", query: "*", count: 10, start: 0, direction: UPSTREAM, orFilters: [{and: [{field: "degree", condition: EQUAL, values: ["1"]}]}], searchFlags: {skipAggregates: true, skipHighlighting: true}}
) {
start
count
total
searchResults {
entity {
urn
type
... on Dataset {
name
}
}
paths {
path {
... on Dataset {
urn
}
}
}
degree
}
}
}
Response:
{
"data": {
"searchAcrossLineage": {
"start": 0,
"count": 10,
"total": 1,
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
"type": "DATASET",
"name": "SampleHdfsDataset"
},
"paths": [
{
"path": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
}
]
}
],
"degree": 1
}
]
}
},
"extensions": {}
}
Request:
{
searchAcrossLineage(
input: {urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", query: "*", count: 10, start: 0, direction: UPSTREAM, orFilters: [{and: [{field: "degree", condition: EQUAL, values: ["1","2"]}]}], searchFlags: {skipAggregates: true, skipHighlighting: true}}
) {
start
count
total
searchResults {
entity {
urn
type
... on Dataset {
name
}
}
paths {
path {
... on Dataset {
urn
}
}
}
degree
}
}
}
{
"data": {
"searchAcrossLineage": {
"start": 0,
"count": 10,
"total": 2,
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
"type": "DATASET",
"name": "SampleHdfsDataset"
},
"paths": [
{
"path": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
}
]
}
],
"degree": 1
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
"type": "DATASET",
"name": "SampleKafkaDataset"
},
"paths": [
{
"path": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)"
}
]
}
],
"degree": 2
}
]
}
},
"extensions": {}
}
Lineage Subquery
The previous query requires a root or starting node in the lineage graph. The following request offers a way to request lineage for multiple nodes at once with a few limitations.
lineage
query on EntityWithRelationship
entities:
- A more direct reflection of the graph index
- 1-hop lineage only
- Multiple URNs
- Should not be requested too many times in a single request. 20 is a tested limit
The following examples are based on the sample lineage graph shown here:
Example Request:
query getBulkEntityLineageV2($urns: [String!]! = ["urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)", "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)"]) {
entities(urns: $urns) {
urn
type
... on DataJob {
jobId
dataFlow {
flowId
}
properties {
name
}
upstream: lineage(input: {direction: UPSTREAM, start: 0, count: 10}) {
total
relationships {
type
entity {
urn
type
}
}
}
downstream: lineage(input: {direction: DOWNSTREAM, start: 0, count: 10}) {
total
relationships {
type
entity {
urn
type
}
}
}
}
}
}
Example Response:
{
"data": {
"entities": [
{
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)",
"type": "DATA_JOB",
"jobId": "task_123",
"dataFlow": {
"flowId": "dag_abc"
},
"properties": {
"name": "User Creations"
},
"upstream": {
"total": 1,
"relationships": [
{
"type": "Consumes",
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)",
"type": "DATASET"
}
}
]
},
"downstream": {
"total": 2,
"relationships": [
{
"type": "DownstreamOf",
"entity": {
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)",
"type": "DATA_JOB"
}
},
{
"type": "Produces",
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
"type": "DATASET"
}
}
]
}
},
{
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)",
"type": "DATA_JOB",
"jobId": "task_456",
"dataFlow": {
"flowId": "dag_abc"
},
"properties": {
"name": "User Deletions"
},
"upstream": {
"total": 2,
"relationships": [
{
"type": "DownstreamOf",
"entity": {
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)",
"type": "DATA_JOB"
}
},
{
"type": "Consumes",
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)",
"type": "DATASET"
}
}
]
},
"downstream": {
"total": 1,
"relationships": [
{
"type": "Produces",
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
"type": "DATASET"
}
}
]
}
}
]
},
"extensions": {}
}