Skip to main content
Version: Next

GraphQL Best Practices

Introduction:

DataHub’s GraphQL API is designed to power the UI. The following guidelines are written with this use-case in mind.

General Best Practices

Query Optimizations

One of GraphQL's biggest advantages over a traditional REST API is its support for declarative data fetching. Each component can (and should) query exactly the fields it requires to render, with no superfluous data sent over the network. If instead your root component executes a single, enormous query to obtain data for all of its children, it might query on behalf of components that aren't even rendered given the current state. This can result in a delayed response, and it drastically reduces the likelihood that the query's result can be reused by a server-side response cache. [ref]

  1. Minimize over-fetching by only requesting data needed to be displayed.
  2. Limit result counts and use pagination (additionally see section below on Deep Pagination).
  3. Avoid deeply nested queries and instead break out queries into separate requests for the nested objects.

Client-side Caching

Clients, such as Apollo Client (javascript, python apollo-client-python), offer client-side caching to prevent requests to the service and are able to understand the content of the GraphQL query. This enables more advanced caching vs HTTP response caching.

Reuse Pieces of Query Logic with Fragments

One powerful feature of GraphQL that we recommend you use is fragments. Fragments allow you to define pieces of a query that you can reuse across any client-side query that you define. Basically, you can define a set of fields that you want to query, and reuse it in multiple places.

This technique makes maintaining your GraphQL queries much more doable. For example, if you want to request a new field for an entity type across many queries, you’re able to update it in one place if you’re leveraging fragments.

Search Query Best Practices

Deep Pagination: search vs scroll APIs

search* APIs such as searchAcrossEntities are designed for minimal pagination (< ~50). They do not perform well for deep pagination requests. Use the equivalent scroll* APIs such as scrollAcrossEntities when expecting the need to paginate deeply into the result set.

Note: that it is impossible to use search* for paginating beyond 10k results.

Examples

In the following examples we demonstrate pagination for both scroll* and search* requests. This particular request is searching for two entities, Datasets and Charts, that contain pet in the entities' name or title. The results will only include the URN for the entities.

Page 1 Request:
{
scrollAcrossEntities(
input: {
types: [DATASET, CHART]
count: 2
query: "*"
orFilters: [
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] },
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
]
}
) {
nextScrollId
searchResults {
entity {
... on Dataset {
urn
}
... on Chart {
urn
}
}
}
}
}

Page 1 Result:

{
"data": {
"scrollAcrossEntities": {
"nextScrollId": "eyJzb3J0IjpbMi4wNzk2ODc2LCJ1cm46bGk6ZGF0YXNldDoodXJuOmxpOmRhdGFQbGF0Zm9ybTpzbm93Zmxha2UsbG9uZ190YWlsX2NvbXBhbmlvbnMuYWRvcHRpb24ucGV0X3Byb2ZpbGVzLFBST0QpIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0=",
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
}
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pet_profiles,PROD)"
}
}
]
}
},
"extensions": {}
}

Page 2 Request:

{
scrollAcrossEntities(
input: {
scrollId: "eyJzb3J0IjpbMi4wNzk2ODc2LCJ1cm46bGk6ZGF0YXNldDoodXJuOmxpOmRhdGFQbGF0Zm9ybTpzbm93Zmxha2UsbG9uZ190YWlsX2NvbXBhbmlvbnMuYWRvcHRpb24ucGV0X3Byb2ZpbGVzLFBST0QpIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0="
types: [DATASET, CHART]
count: 2
query: "*"
orFilters: [
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] },
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
]
}
) {
nextScrollId
searchResults {
entity {
... on Dataset {
urn
}
... on Chart {
urn
}
}
}
}
}

Page 2 Result:

{
"data": {
"scrollAcrossEntities": {
"nextScrollId": "eyJzb3J0IjpbMS43MTg3NSwidXJuOmxpOmRhdGFzZXQ6KHVybjpsaTpkYXRhUGxhdGZvcm06c25vd2ZsYWtlLGxvbmdfdGFpbF9jb21wYW5pb25zLmFkb3B0aW9uLnBldHMsUFJPRCkiXSwicGl0SWQiOm51bGwsImV4cGlyYXRpb25UaW1lIjowfQ==",
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_status_history,PROD)"
}
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pets,PROD)"
}
}
]
}
},
"extensions": {}
}
Page 1 Request:
{
searchAcrossEntities(
input: {
types: [DATASET, CHART]
count: 2,
start: 0
query: "*"
orFilters: [
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] },
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
]
}
) {
searchResults {
entity {
... on Dataset {
urn
}
... on Chart {
urn
}
}
}
}
}

Page 1 Response:

{
"data": {
"searchAcrossEntities": {
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
}
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pet_profiles,PROD)"
}
}
]
}
},
"extensions": {}
}

Page 2 Request:

{
searchAcrossEntities(
input: {
types: [DATASET, CHART]
count: 2,
start: 2
query: "*"
orFilters: [
{ and: [{ field: "name", condition: CONTAIN, values: ["pet"] }] },
{ and: [{ field: "title", condition: CONTAIN, values: ["pet"] }] }
]
}
) {
searchResults {
entity {
... on Dataset {
urn
}
... on Chart {
urn
}
}
}
}
}

Page 2 Response:

{
"data": {
"searchAcrossEntities": {
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_status_history,PROD)"
}
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.adoption.pets,PROD)"
}
}
]
}
},
"extensions": {}
}

SearchFlags: Highlighting and Aggregation

When performing queries which accept searchFlags and highlighting and aggregation is not needed, be sure to disable these flags.

  • skipHighlighting: true
  • skipAggregates: true

As a fallback, if only certain fields require highlighting use customHighlightingFields to limit highlighting to the specific fields.

Example for skipping highlighting and aggregates, typically used for scrolling search requests.

{
scrollAcrossEntities(
input: {types: [DATASET], count: 2, query: "pet", searchFlags: {skipAggregates: true, skipHighlighting: true}}
) {
searchResults {
entity {
... on Dataset {
urn
}
}
matchedFields {
name
value
}
}
facets {
displayName
aggregations {
value
count
}
}
}
}

Response:

Note that a few matchedFields are still returned by default [urn, customProperties]

{
"data": {
"scrollAcrossEntities": {
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
},
"matchedFields": [
{
"name": "urn",
"value": ""
},
{
"name": "customProperties",
"value": ""
}
]
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.pet_details,PROD)"
},
"matchedFields": [
{
"name": "urn",
"value": ""
},
{
"name": "customProperties",
"value": ""
}
]
}
],
"facets": []
}
},
"extensions": {}
}

Custom highlighting can be used for searchAcrossEntities when only a limited number of fields are useful for highlighting. In this example we specifically request highlighting for description.

{
searchAcrossEntities(
input: {types: [DATASET], count: 2, query: "pet", searchFlags: {customHighlightingFields: ["description"]}}
) {
searchResults {
entity {
... on Dataset {
urn
}
}
matchedFields {
name
value
}
}
}
}

Response:

{
"data": {
"searchAcrossEntities": {
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,long_tail_companions.analytics.pet_details,PROD)"
},
"matchedFields": [
{
"name": "urn",
"value": ""
},
{
"name": "customProperties",
"value": ""
},
{
"name": "description",
"value": "Table with all pet-related details"
}
]
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,long_tail_companions.analytics.pet_details,PROD)"
},
"matchedFields": [
{
"name": "urn",
"value": ""
},
{
"name": "customProperties",
"value": ""
}
]
}
]
}
},
"extensions": {}
}

Aggregation

When aggregation is required with searchAcrossEntities, it is possible to set the count to 0 to avoid fetching the top search hits, only returning the aggregations. Alternatively aggregateAcrossEntities provides counts and can provide faster results from server-side caching.

Request:

{
searchAcrossEntities(
input: {types: [DATASET], count: 0, query: "pet", searchFlags: {skipHighlighting: true}}
) {
searchResults {
entity {
... on Dataset {
urn
}
}
matchedFields {
name
value
}
}
facets {
displayName
aggregations {
value
count
}
}
}
}

Response:

{
"data": {
"searchAcrossEntities": {
"searchResults": [],
"facets": [
{
"displayName": "Container",
"aggregations": [
{
"value": "urn:li:container:b41c14bc5cb3ccfbb0433c8cbdef2992",
"count": 4
},
{
"value": "urn:li:container:701919de0ec93cb338fe9bac0b35403c",
"count": 3
}
]
},
{
"displayName": "Sub Type",
"aggregations": [
{
"value": "table",
"count": 9
},
{
"value": "view",
"count": 6
},
{
"value": "explore",
"count": 5
},
{
"value": "source",
"count": 4
},
{
"value": "incremental",
"count": 1
}
]
},
{
"displayName": "Type",
"aggregations": [
{
"value": "DATASET",
"count": 24
}
]
},
{
"displayName": "Environment",
"aggregations": [
{
"value": "PROD",
"count": 24
}
]
},
{
"displayName": "Glossary Term",
"aggregations": [
{
"value": "urn:li:glossaryTerm:Adoption.DaysInStatus",
"count": 1
},
{
"value": "urn:li:glossaryTerm:Ecommerce.HighRisk",
"count": 1
},
{
"value": "urn:li:glossaryTerm:Classification.Confidential",
"count": 1
}
]
},
{
"displayName": "Domain",
"aggregations": [
{
"value": "urn:li:domain:094dc54b-0ebc-40a6-a4cf-e1b75e8b8089",
"count": 6
},
{
"value": "urn:li:domain:7d64d0fa-66c3-445c-83db-3a324723daf8",
"count": 2
}
]
},
{
"displayName": "Owned By",
"aggregations": [
{
"value": "urn:li:corpGroup:Adoption",
"count": 5
},
{
"value": "urn:li:corpuser:shannon@longtail.com",
"count": 4
},
{
"value": "urn:li:corpuser:admin",
"count": 2
},
{
"value": "urn:li:corpGroup:Analytics Engineering",
"count": 2
},
{
"value": "urn:li:corpuser:avigdor@longtail.com",
"count": 1
},
{
"value": "urn:li:corpuser:prentiss@longtail.com",
"count": 1
},
{
"value": "urn:li:corpuser:tasha@longtail.com",
"count": 1
},
{
"value": "urn:li:corpuser:ricca@longtail.com",
"count": 1
},
{
"value": "urn:li:corpuser:emilee@longtail.com",
"count": 1
}
]
},
{
"displayName": "Platform",
"aggregations": [
{
"value": "urn:li:dataPlatform:looker",
"count": 8
},
{
"value": "urn:li:dataPlatform:dbt",
"count": 7
},
{
"value": "urn:li:dataPlatform:snowflake",
"count": 7
},
{
"value": "urn:li:dataPlatform:s3",
"count": 1
},
{
"value": "urn:li:dataPlatform:mongodb",
"count": 1
}
]
},
{
"displayName": "Tag",
"aggregations": [
{
"value": "urn:li:tag:prod_model",
"count": 3
},
{
"value": "urn:li:tag:pii",
"count": 2
},
{
"value": "urn:li:tag:business critical",
"count": 2
},
{
"value": "urn:li:tag:business_critical",
"count": 2
},
{
"value": "urn:li:tag:Tier1",
"count": 1
},
{
"value": "urn:li:tag:prod",
"count": 1
}
]
},
{
"displayName": "Type",
"aggregations": [
{
"value": "DATASET",
"count": 24
}
]
}
]
}
},
"extensions": {}
}

Limit Search Entity Types

When querying for specific entities, enumerate only the entity types required using types , for example [DATASET , CHART]

Limit Results

Limit search results based on the amount of information being requested. For example, a minimal number of attributes can fetch 1,000 - 2,000 results in a single page, however as the number of attributes increases (especially nested objects) the count should be lowered, 20-25 for very complex requests.

Lineage Query Best Practices

There are two primary ways to query lineage:

Search Across Lineage

searchAcrossLineage / scrollAcrossLineage root query:

  • Recommended for all lineage queries
  • Only the shortest path is guaranteed to show up in paths
  • Supports querying indirect lineage (depth > 1)
    • Depending on the fanout of the lineage, 3+ hops may not return data, use 1-hop queries for the fastest response times.
    • Specify using a filter with name "degree" and values "1" , "2", and / or "3+"

The following examples are demonstrated using sample data for urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD).

The following example queries show UPSTREAM lineage with progressively higher degrees, first with degree ["1"] and then ["1","2"].

1-Hop Upstreams:

Request:

{
searchAcrossLineage(
input: {urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", query: "*", count: 10, start: 0, direction: UPSTREAM, orFilters: [{and: [{field: "degree", condition: EQUAL, values: ["1"]}]}], searchFlags: {skipAggregates: true, skipHighlighting: true}}
) {
start
count
total
searchResults {
entity {
urn
type
... on Dataset {
name
}
}
paths {
path {
... on Dataset {
urn
}
}
}
degree
}
}
}

Response:

{
"data": {
"searchAcrossLineage": {
"start": 0,
"count": 10,
"total": 1,
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
"type": "DATASET",
"name": "SampleHdfsDataset"
},
"paths": [
{
"path": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
}
]
}
],
"degree": 1
}
]
}
},
"extensions": {}
}
1-Hop & 2-Hop Upstreams:

Request:

{
searchAcrossLineage(
input: {urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)", query: "*", count: 10, start: 0, direction: UPSTREAM, orFilters: [{and: [{field: "degree", condition: EQUAL, values: ["1","2"]}]}], searchFlags: {skipAggregates: true, skipHighlighting: true}}
) {
start
count
total
searchResults {
entity {
urn
type
... on Dataset {
name
}
}
paths {
path {
... on Dataset {
urn
}
}
}
degree
}
}
}
{
"data": {
"searchAcrossLineage": {
"start": 0,
"count": 10,
"total": 2,
"searchResults": [
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)",
"type": "DATASET",
"name": "SampleHdfsDataset"
},
"paths": [
{
"path": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
}
]
}
],
"degree": 1
},
{
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)",
"type": "DATASET",
"name": "SampleKafkaDataset"
},
"paths": [
{
"path": [
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)"
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:hdfs,SampleHdfsDataset,PROD)"
},
{
"urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,SampleKafkaDataset,PROD)"
}
]
}
],
"degree": 2
}
]
}
},
"extensions": {}
}

Lineage Subquery

The previous query requires a root or starting node in the lineage graph. The following request offers a way to request lineage for multiple nodes at once with a few limitations.

lineage query on EntityWithRelationship entities:

  • A more direct reflection of the graph index
  • 1-hop lineage only
  • Multiple URNs
  • Should not be requested too many times in a single request. 20 is a tested limit

The following examples are based on the sample lineage graph shown here:

Example Request:

query getBulkEntityLineageV2($urns: [String!]! = ["urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)", "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)"]) {
entities(urns: $urns) {
urn
type
... on DataJob {
jobId
dataFlow {
flowId
}
properties {
name
}
upstream: lineage(input: {direction: UPSTREAM, start: 0, count: 10}) {
total
relationships {
type
entity {
urn
type
}
}
}
downstream: lineage(input: {direction: DOWNSTREAM, start: 0, count: 10}) {
total
relationships {
type
entity {
urn
type
}
}
}
}
}
}

Example Response:

{
"data": {
"entities": [
{
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)",
"type": "DATA_JOB",
"jobId": "task_123",
"dataFlow": {
"flowId": "dag_abc"
},
"properties": {
"name": "User Creations"
},
"upstream": {
"total": 1,
"relationships": [
{
"type": "Consumes",
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)",
"type": "DATASET"
}
}
]
},
"downstream": {
"total": 2,
"relationships": [
{
"type": "DownstreamOf",
"entity": {
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)",
"type": "DATA_JOB"
}
},
{
"type": "Produces",
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
"type": "DATASET"
}
}
]
}
},
{
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_456)",
"type": "DATA_JOB",
"jobId": "task_456",
"dataFlow": {
"flowId": "dag_abc"
},
"properties": {
"name": "User Deletions"
},
"upstream": {
"total": 2,
"relationships": [
{
"type": "DownstreamOf",
"entity": {
"urn": "urn:li:dataJob:(urn:li:dataFlow:(airflow,dag_abc,PROD),task_123)",
"type": "DATA_JOB"
}
},
{
"type": "Consumes",
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)",
"type": "DATASET"
}
}
]
},
"downstream": {
"total": 1,
"relationships": [
{
"type": "Produces",
"entity": {
"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)",
"type": "DATASET"
}
}
]
}
}
]
},
"extensions": {}
}