Queries and Indexes

Every datastore query uses an index, a table that contains the results for the query in the desired order. An App Engine application defines its indexes in a configuration file named datastore-indexes.xml. The development web server can automatically generate suggestions for this file as it encounters queries that do not yet have indexes configured.

The index-based query mechanism supports most common kinds of queries, but it does not support some queries you may be used to from other database technologies. Restrictions on queries, and their explanations, are described below.

Introducing Queries

A query retrieves entities from the datastore that meet a set of conditions. The query specifies an entity kind, zero or more conditions based on entity property values (sometimes called "filters"), and zero or more sort order descriptions. When the query is executed, it fetches all entities of the given kind that meet all of the given conditions, sorted in the order described.

A query can also return just the keys of the result entities instead of the entities themselves.

JDO can perform queries for entities that meet certain criteria. You can also use a JDO Extent to represent the collection of every entity of a kind (every stored object of a class).

Queries with JDOQL

JDO includes a query language for retrieving objects that meet a set of criteria. This language, called JDOQL, refers to JDO data classes and fields directly, and includes type checking for query parameters and results. JDOQL is similar to SQL, but is more appropriate for object-oriented databases like the App Engine datastore. (The App Engine datastore does not support SQL queries with the JDO interface.)

The query API supports several calling styles. You can specify a complete query in a string, using the JDOQL string syntax. You can also specify some or all parts of the query by calling methods on the query object.

Here is a simple example of a query using the method style of calling, with one filter and one sort order, using parameter substitution for the value used in the filter. The Query object's execute() method is called with the values to substitute in the query, in the order they are declared.

import java.util.List;
import javax.jdo.Query;

// ...

    Query query = pm.newQuery(Employee.class);
    query.setFilter("lastName == lastNameParam");
    query.setOrdering("hireDate desc");
    query.declareParameters("String lastNameParam");

    try {
        List<Employee> results = (List<Employee>) query.execute("Smith");
        if (results.iterator().hasNext()) {
            for (Employee e : results) {
                // ...
            }
        } else {
            // ... no results ...
        }
    } finally {
        query.closeAll();
    }

Here is the same query using the string syntax:

    Query query = pm.newQuery("select from Employee " +
                              "where lastName == lastNameParam " +
                              "order by hireDate desc " +
                              "parameters String lastNameParam");

    List<Employee> results = (List<Employee>) query.execute("Smith");

You can mix these styles of defining the query. For example:

    Query query = pm.newQuery(Employee.class,
                              "lastName == lastNameParam order by hireDate desc");
    query.declareParameters("String lastNameParam");

    List<Employee> results = (List<Employee>) query.execute("Smith");

You can reuse a single Query instance with different values substituted for the parameters by calling the execute() method multiple times. Each call performs the query and returns the results as a collection.

The JDOQL string syntax supports value literals within the string for string values and numeric values. Surround strings in either single-quotes (') or double-quotes ("). All other value types must use parameter substitution. Here is an example using a string literal value:

    Query query = pm.newQuery(Employee.class,
                              "lastName == 'Smith' order by hireDate desc");

Query Filters

A filter specifies a field name, an operator, and a value. The value must be provided by the app; it cannot refer to another property, or be calculated in terms of other properties. The operator can be any of the following: < <= == >= >

Note: The Java datastore interface does not support the != and IN filter operators that are implemented in the Python datastore interface. (In the Python interface, these operators are implemented in the client-side libraries as multiple datastore queries; they are not features of the datastore itself.)

The subject of a filter can be any object field, including the primary key and the entity group parent (see Transactions).

An entity must match all filters to be a result. In the JDOQL string syntax, multiple filters are specified separated by && (logical "and"). Other logical combinations of filters (logical "or", "not") are not supported.

Due to the way the App Engine datastore executes queries, a single query cannot use inequality filters (< <= >= >) on more than one property. Multiple inequality filters on the same property (such as querying for a range of values) are permitted. See Restrictions on Queries.

    query.setFilter("lastName == 'Smith' && hireDate > hireDateMinimum");
    query.declareImports("import java.util.Date");
    query.declareParameters("Date hireDateMinimum");

Query Sort Orders

A sort order specifies a property and a direction, either ascending or descending. The results are returned sorted by the given orders, in the order they were specified. If no sort orders are specified for the query, the results are ordered by their entity keys.

Due to the way the App Engine datastore executes queries, if a query specifies inequality filters on a property and sort orders on other properties, the property used with the inequality filters must be ordered before the other properties. See Restrictions on Queries.

    query.setOrdering("hireDate desc, firstName asc");

Query Ranges

A query can specify a range of results to be returned to the application. The range specifies which result in the complete result set should be the first one returned, and which should be the last, using numeric indexes, starting with 0 for the first result. For example, a range of 5, 10 returns the 6th, 7th, 8th, 9th and 10th results.

The starting offset has implications for performance: the datastore must retrieve and then discard all results prior to the starting offset. For example, a query with a range of 5, 10 fetches 10 results from the datastore, then discards the first 5 and returns the remaining 5 to the application.

    query.setRange(5, 10);
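
The cost of the starting offset can be illustrated with a small simulation that uses no datastore at all. This is only a sketch: the class RangeSketch, the method applyRange() and the fetched counter are hypothetical names introduced for this example, and an in-memory list stands in for the ordered result set.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSketch {
    /** How many results the simulated datastore read during the last call. */
    static int fetched;

    /** Mimics setRange(from, to): 'from' is inclusive, 'to' is exclusive. */
    static <T> List<T> applyRange(List<T> orderedResults, int from, int to) {
        int end = Math.min(to, orderedResults.size());
        // Every result before 'to' is read, including those that get skipped.
        List<T> read = new ArrayList<>(orderedResults.subList(0, end));
        fetched = read.size();
        if (from >= read.size()) {
            return new ArrayList<>();
        }
        // The first 'from' results are discarded before returning.
        return new ArrayList<>(read.subList(from, read.size()));
    }

    public static void main(String[] args) {
        List<Integer> all = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            all.add(i); // the value i stands for the i-th result
        }
        System.out.println(applyRange(all, 5, 10)); // [6, 7, 8, 9, 10]
        System.out.println(fetched);                // 10
    }
}
```

In this model a range of 1000, 1010 would read 1,010 results to return 10, which is why large offsets are expensive.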

Extents

A JDO Extent represents every object in the datastore of a particular class.

You obtain an Extent by calling the PersistenceManager's getExtent() method, passing it the data class and a boolean that indicates whether instances of subclasses should be included. The Extent class implements the Iterable interface for accessing results. When you are done accessing results, you call the closeAll() method.

The following example iterates over every Employee object in the datastore:

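import javax.jdo.Extent;

// ...

    // The type parameter lets the for-each loop yield Employee objects directly.
    Extent<Employee> extent = pm.getExtent(Employee.class, false);
    try {
        for (Employee e : extent) {
            // ...
        }
    } finally {
        // Release the resources held by the extent, even if iteration fails.
        extent.closeAll();
    }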

An extent retrieves results in batches, and can exceed the 1,000-result limit that applies to queries.

Introducing Indexes

The App Engine datastore maintains an index for every query an application intends to make. As the application makes changes to datastore entities, the datastore updates the indexes with the correct results. When the application executes a query, the datastore fetches the results directly from the corresponding index.

An application has an index for each combination of kind, filter property and operator, and sort order used in a query. Consider the example query, stated in JDOQL:

select from Person where lastName == "Smith"
                      && height < 72
                order by height desc

The index for this query is a table of keys for entities of the kind Person, with columns for the values of the height and lastName properties. The index is sorted by height in descending order.

Two queries of the same form but with different filter values use the same index. For example, the following query uses the same index as the query above:

select from Person where lastName == "Jones"
                      && height < 64
                order by height desc

The datastore executes a query using the following steps:

  1. The datastore identifies the index that corresponds with the query's kind, filter properties, filter operators, and sort orders.
  2. The datastore starts scanning the index at the first entity that meets all of the filter conditions using the query's filter values.
  3. The datastore continues to scan the index, returning each entity, until it finds an entity that does not meet the filter conditions, reaches the end of the index, or has collected the maximum number of results requested by the query.
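
The scan described in these steps can be sketched with an in-memory list standing in for the index table. This is an illustration under stated assumptions, not the datastore's implementation: the names IndexScanSketch, Row and query() are hypothetical, and the sketch walks the list from the top where a real index would seek directly to the first matching row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IndexScanSketch {
    /** One row of the hypothetical index: property values plus the entity key. */
    static final class Row {
        final String lastName;
        final int height;
        final String key;

        Row(String lastName, int height, String key) {
            this.lastName = lastName;
            this.height = height;
            this.key = key;
        }
    }

    /** Row order of the index: lastName ascending, then height descending. */
    static final Comparator<Row> INDEX_ORDER = (a, b) -> {
        int c = a.lastName.compareTo(b.lastName);
        return c != 0 ? c : Integer.compare(b.height, a.height);
    };

    /** Simulates: where lastName == ? && height < ? order by height desc. */
    static List<String> query(List<Row> index, String lastName, int maxHeight, int limit) {
        List<String> keys = new ArrayList<>();
        boolean started = false;
        for (Row r : index) {
            boolean matches = r.lastName.equals(lastName) && r.height < maxHeight;
            if (matches) {
                started = true;
                keys.add(r.key);
                if (keys.size() == limit) {
                    break; // collected the maximum number of results
                }
            } else if (started) {
                break; // first non-matching row after the run ends the scan
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Row> index = new ArrayList<>();
        index.add(new Row("Smith", 65, "k4"));
        index.add(new Row("Young", 60, "k5"));
        index.add(new Row("Smith", 74, "k2"));
        index.add(new Row("Jones", 70, "k1"));
        index.add(new Row("Smith", 71, "k3"));
        index.sort(INDEX_ORDER);
        System.out.println(query(index, "Smith", 72, 10)); // [k3, k4]
    }
}
```

Because the rows are sorted in index order, the two matching rows are adjacent, and the scan can stop at the first row that fails the filters.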

An index table contains columns for every property used in a filter or sort order. The rows are sorted by the following aspects, in order:

  • ancestors
  • property values used in equality filters
  • property values used in inequality filters
  • property values used in sort orders

This puts all results for every possible query that uses this index in consecutive rows in the table.

This mechanism supports a wide range of queries and is suitable for most applications. However, it does not support some kinds of queries you may be used to from other database technologies.

Entities Without a Filtered Property Are Never Returned by a Query

An index only contains entities that have every property referred to by the index. If an entity does not have a property referred to by an index, the entity will not appear in the index, and will never be a result for the query that uses the index.

Note that the App Engine datastore distinguishes between an entity that does not possess a property and an entity that possesses the property with a null value. If you want every entity of a kind to be a potential result for a query, you can use a JDO or JPA data class, which always assigns a value to every property that corresponds to a field in the class.

Properties that Aren't Indexed

Property values that aren't indexed are not findable by queries. This includes properties that are marked as not indexed, as well as properties with values of the long text value type (Text) or the long binary value type (Blob).

A query with a filter or sort order on a property will never match an entity whose value for the property is a Text or Blob, or which was written with that property marked as not indexed. Properties with such values behave as if the property is not set with regard to query filters and sort orders.

Property Values of Mixed Types are Ordered By Type

When two entities have properties of the same name but of different value types, an index of the property sorts the entities first by value type, then by an order appropriate to the type. For example, if two entities each have a property named "age", one with an integer value and one with a string value, the entity with the integer value will always appear before the entity with the string value when sorted by the "age" property, regardless of the values themselves.

This is especially worth noting in the case of integers and floating point numbers, which are treated as separate types by the datastore. A property with the integer value 38 is sorted before a property with the floating point value 37.5, because all integers are sorted before floats.
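
The integer and float case can be sketched with a comparator that orders by type before value. The class MixedTypeOrderSketch and the rank numbers are assumptions made for this example; only the rule that integers sort before floats comes from the text above, with Long standing in for the integer type and Double for the floating point type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MixedTypeOrderSketch {
    /** Illustrative type rank: integers (Long) sort before floats (Double). */
    static int typeRank(Number v) {
        return (v instanceof Long) ? 0 : 1;
    }

    /** Compares by type rank first, then by value within the same type. */
    static final Comparator<Number> INDEX_ORDER = (a, b) -> {
        int byType = Integer.compare(typeRank(a), typeRank(b));
        if (byType != 0) {
            return byType;
        }
        return (a instanceof Long)
                ? Long.compare(a.longValue(), b.longValue())
                : Double.compare(a.doubleValue(), b.doubleValue());
    };

    static List<Number> sorted(List<Number> values) {
        List<Number> copy = new ArrayList<>(values);
        copy.sort(INDEX_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Number> values = Arrays.<Number>asList(37.5, 38L, 2.0, 12L);
        System.out.println(sorted(values)); // [12, 38, 2.0, 37.5]
    }
}
```

Note that 38 precedes 2.0 in the output even though it is numerically larger, because the type is compared before the value.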

(If you're using JDO or JPA, this situation does not arise unless you modify a field's type without updating existing entities in the datastore, or use the low-level datastore API or a non-Java API.)

Defining Indexes With Configuration

App Engine builds indexes for several simple queries by default. For other queries, the application must specify the indexes it needs in a configuration file named datastore-indexes.xml. If the application running under App Engine tries to perform a query for which there is no corresponding index (either provided by default or described in datastore-indexes.xml), the query will fail.

App Engine provides automatic indexes for the following forms of queries:

  • queries using only equality and ancestor filters
  • queries using only inequality filters (which can only be of a single property)
  • queries with only one sort order on a property, either ascending or descending

Other forms of queries require their indexes to be specified in datastore-indexes.xml, including:

  • queries with multiple sort orders
  • queries with a sort order on keys in descending order
  • queries with one or more inequality filters on a property and one or more equality filters over other properties
  • queries with inequality filters and ancestor filters

The development web server makes managing index configuration easy: Instead of failing to execute a query that does not have an index and requires it, the development web server can generate configuration for an index that would allow the query to succeed. If your local testing of your application calls every possible query the application will make (every combination of kind, ancestor, filter and sort order), the generated entries will represent a complete set of indexes. If your testing might not exercise every possible query form, you can review and adjust the index configuration before uploading the application.

You can define indexes manually using a configuration file named datastore-indexes.xml in the WEB-INF/ directory of your application's WAR. If you have automatic index configuration enabled (see below), the development server creates an index configuration in a file named datastore-indexes-auto.xml in a directory named WEB-INF/appengine-generated/, and uses both files to determine the full set of indexes.

Consider once again the following example query:

select from Person where lastName == 'Smith'
                      && height < 72
                order by height desc

The configuration for the index needed by this query would appear in datastore-indexes.xml as follows:

<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes
  xmlns="http://appengine.google.com/ns/datastore-indexes/1.0"
  autoGenerate="true">
    <datastore-index kind="Person" ancestor="false">
        <property name="lastName" direction="asc" />
        <property name="height" direction="desc" />
    </datastore-index>
</datastore-indexes>

If the <datastore-indexes> element in datastore-indexes.xml has the attribute autoGenerate="true" (as above) or if the app does not have a datastore-indexes.xml file, automatic index configuration is enabled. With automatic index configuration enabled, if the app performs this query in the development server and no configuration for the index exists, the server adds this index configuration to the datastore-indexes-auto.xml file.

For more information on datastore-indexes.xml and datastore-indexes-auto.xml, see Java Datastore Index Configuration.

Queries on Keys

Entity keys can be the subject of a query filter or sort order. In JDO, you refer to the entity key in the query using the primary key field of the object. The datastore considers the complete key value for such queries, including the entity's parent path, the kind, and the app-assigned key name string or system-assigned numeric ID.

Because an entity key is unique across all entities in the system, key queries make it easy to retrieve entities of a given kind in batches, such as for a batch dump of the contents of the datastore. Unlike JDOQL ranges, this works efficiently for any number of entities.
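
The batching pattern can be sketched without a datastore by letting a sorted map play the role of the key index. All names here (KeyBatchSketch, nextBatch(), allBatches()) are hypothetical. In a real application each call to nextBatch() would presumably correspond to a query that filters on the primary key field being greater than the last key seen, orders by that field, and limits the number of results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class KeyBatchSketch {
    /** Returns up to batchSize keys strictly greater than lastKey (from the start if null). */
    static List<String> nextBatch(NavigableMap<String, String> byKey, String lastKey, int batchSize) {
        NavigableMap<String, String> rest =
                (lastKey == null) ? byKey : byKey.tailMap(lastKey, false);
        List<String> batch = new ArrayList<>();
        for (String key : rest.keySet()) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(key);
        }
        return batch;
    }

    /** Walks the whole map in batches, resuming after the last key of each batch. */
    static List<List<String>> allBatches(NavigableMap<String, String> byKey, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        String lastKey = null;
        while (true) {
            List<String> batch = nextBatch(byKey, lastKey, batchSize);
            if (batch.isEmpty()) {
                return batches;
            }
            batches.add(batch);
            lastKey = batch.get(batch.size() - 1);
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> byKey = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c", "d", "e"}) {
            byKey.put(k, "entity-" + k);
        }
        System.out.println(allBatches(byKey, 2)); // [[a, b], [c, d], [e]]
    }
}
```

Each batch starts right after the previous one by seeking to a key, so no earlier results are read and discarded, unlike a range with a large starting offset.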

Keys are ordered first by parent path, then by kind, then by key name or ID. Kinds and key names are strings and are ordered by byte value. IDs are integers and are ordered numerically. If entities of the same parent and kind use a mix of key name strings and numeric IDs, entities with numeric IDs are considered to be less than entities with key name strings. Elements of the parent path are compared similarly: by kind (string), then by key name (string) or ID (number).
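
These ordering rules can be written as a pair of comparison functions. The sketch below models a key as a list of path elements; KeyOrderSketch and Element are hypothetical names, not classes from the App Engine API. Two points are assumptions: String.compareTo() compares UTF-16 code units, which agrees with byte order only for ASCII names, and a path that is a prefix of another (an ancestor) is taken to sort first.

```java
import java.util.Arrays;
import java.util.List;

public class KeyOrderSketch {
    /** One path element: a kind plus either a numeric ID or a key name. */
    static final class Element {
        final String kind;
        final Long id;     // null when the element uses a key name
        final String name; // null when the element uses a numeric ID

        private Element(String kind, Long id, String name) {
            this.kind = kind;
            this.id = id;
            this.name = name;
        }

        static Element withId(String kind, long id) {
            return new Element(kind, id, null);
        }

        static Element withName(String kind, String name) {
            return new Element(kind, null, name);
        }
    }

    /** Compares by kind, then numeric IDs before key names, then by ID or name. */
    static int compareElements(Element a, Element b) {
        int c = a.kind.compareTo(b.kind);
        if (c != 0) {
            return c;
        }
        if (a.id != null && b.id != null) {
            return Long.compare(a.id, b.id); // IDs are ordered numerically
        }
        if (a.id != null) {
            return -1; // a numeric ID is less than any key name
        }
        if (b.id != null) {
            return 1;
        }
        return a.name.compareTo(b.name);
    }

    /** Compares full key paths element by element, ancestors first. */
    static int comparePaths(List<Element> a, List<Element> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = compareElements(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size()); // assumption: shorter path sorts first
    }

    public static void main(String[] args) {
        Element byId = Element.withId("Employee", 900);
        Element byName = Element.withName("Employee", "aaa");
        System.out.println(compareElements(byId, byName) < 0); // true
        List<Element> parent = Arrays.asList(Element.withName("Company", "acme"));
        List<Element> child = Arrays.asList(Element.withName("Company", "acme"), byId);
        System.out.println(comparePaths(parent, child) < 0);   // true
    }
}
```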

Queries involving keys use indexes just like queries involving properties, with one minor difference: Unlike with a property, a query with an equality filter on the key that also has additional filters must use a custom index defined in the app's index configuration file. As with all queries, the development web server creates appropriate configuration entries in this file when a query that needs a custom index is tested.

Restrictions on Queries

The nature of the index query mechanism imposes a few restrictions on what a query can do.

Filtering Or Sorting On a Property Requires That the Property Exists

A query filter condition or sort order for a property also implies a condition that the entity have a value for the property.

A datastore entity is not required to have a value for a property that other entities of the same kind have. A filter on a property can only match an entity with a value for the property. Entities without a value for a property used in a filter or sort order are omitted from the index built for the query.

No Filter That Matches Entities That Do Not Have a Property

It is not possible to perform a query for entities that are missing a given property. One alternative is to create a fixed (modeled) property with a default value of null, then create a filter for entities with null as the property value.

Inequality Filters Are Allowed On One Property Only

A query may only use inequality filters (<, <=, >=, >) on one property across all of its filters.

For example, this query is allowed:

select from Person where birthYear >= minBirthYearParam
                      && birthYear <= maxBirthYearParam

However, this query is not allowed, because it uses inequality filters on two different properties in the same query:

select from Person where birthYear >= minBirthYearParam
                      && height >= minHeightParam   // ERROR

Filters can combine equal (==) comparisons for different properties in the same query, including queries with one or more inequality conditions on a property. This is allowed:

select from Person where lastName == lastNameParam
                      && city == cityParam
                      && birthYear >= minBirthYearParam

The query mechanism relies on all results for a query to be adjacent to one another in the index table, to avoid having to scan the entire table for results. A single index table cannot represent multiple inequality filters on multiple properties while maintaining that all results are consecutive in the table.
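
An application that builds filters dynamically could check this restriction before sending the query. The following is a minimal sketch of such a check; the class InequalityFilterCheck and its Filter type are hypothetical helpers introduced for this example and are not part of JDO or the App Engine API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InequalityFilterCheck {
    /** A filter reduced to the two parts that matter here: property and operator. */
    static final class Filter {
        final String property;
        final String operator;

        Filter(String property, String operator) {
            this.property = property;
            this.operator = operator;
        }
    }

    static boolean isInequality(String operator) {
        return operator.equals("<") || operator.equals("<=")
                || operator.equals(">=") || operator.equals(">");
    }

    /** Collects the distinct properties that appear in inequality filters. */
    static Set<String> inequalityProperties(List<Filter> filters) {
        Set<String> properties = new LinkedHashSet<>();
        for (Filter f : filters) {
            if (isInequality(f.operator)) {
                properties.add(f.property);
            }
        }
        return properties;
    }

    /** A query is allowed when at most one property has inequality filters. */
    static boolean isAllowed(List<Filter> filters) {
        return inequalityProperties(filters).size() <= 1;
    }

    public static void main(String[] args) {
        List<Filter> range = Arrays.asList(
                new Filter("birthYear", ">="), new Filter("birthYear", "<="));
        List<Filter> twoProperties = Arrays.asList(
                new Filter("birthYear", ">="), new Filter("height", ">="));
        System.out.println(isAllowed(range));         // true
        System.out.println(isAllowed(twoProperties)); // false
    }
}
```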

Properties In Inequality Filters Must Be Sorted Before Other Sort Orders

If a query has both a filter with an inequality comparison and one or more sort orders, the query must include a sort order for the property used in the inequality, and the sort order must appear before sort orders on other properties.

This query is not valid, because it uses an inequality filter and does not order by the filtered property:

select from Person where birthYear >= minBirthYearParam
                order by lastName                    // ERROR

Similarly, this query is not valid because it does not order by the filtered property before ordering by other properties:

select from Person where birthYear > minBirthYearParam
                order by lastName, birthYear         // ERROR

This query is valid:

select from Person where birthYear >= minBirthYearParam
                order by birthYear, lastName

To get all results that match an inequality filter, a query scans the index table for the first matching row, then returns all consecutive results until it finds a row that doesn't match. For the consecutive rows to represent the complete result set, the rows must be ordered by the inequality filter before other sort orders.
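
The sort-order rule can be checked in the same spirit. This sketch assumes the caller already knows which property, if any, carries the inequality filters; SortOrderCheck and isValid() are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortOrderCheck {
    /**
     * Returns true when the sort orders are compatible with the inequality property.
     * inequalityProperty is null when the query has no inequality filter.
     */
    static boolean isValid(String inequalityProperty, List<String> sortProperties) {
        if (inequalityProperty == null || sortProperties.isEmpty()) {
            return true; // the rule applies only when both are present
        }
        // The inequality property must be the first sort order.
        return sortProperties.get(0).equals(inequalityProperty);
    }

    public static void main(String[] args) {
        System.out.println(isValid("birthYear", Arrays.asList("lastName")));              // false
        System.out.println(isValid("birthYear", Arrays.asList("lastName", "birthYear"))); // false
        System.out.println(isValid("birthYear", Arrays.asList("birthYear", "lastName"))); // true
        System.out.println(isValid("birthYear", Collections.<String>emptyList()));        // true
    }
}
```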

Sort Orders and Properties With Multiple Values

Due to the way properties with multiple values are indexed, the sort order for these properties is unusual:

  • If the entities are sorted by a multi-valued property in ascending order, the value used for ordering is the smallest value.
  • If the entities are sorted by a multi-valued property in descending order, the value used for ordering is the greatest value.
  • Other values do not affect the sort order, nor does the number of values.
  • In the case of a tie, the key of the entity is used as the tie-breaker.

This sort order has the unusual consequence that [1, 9] comes before [4, 5, 6, 7] in both ascending and descending order.
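
The [1, 9] example can be reproduced with two comparators, one using the smallest value of each list and one using the greatest. This is a sketch of the ordering rule only: MultiValueSortSketch is a hypothetical name, and the tie-breaking by entity key is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MultiValueSortSketch {
    /** Ascending order: each entity is ranked by its smallest value. */
    static List<List<Integer>> sortAscending(List<List<Integer>> entities) {
        List<List<Integer>> copy = new ArrayList<>(entities);
        copy.sort((a, b) -> Integer.compare(Collections.min(a), Collections.min(b)));
        return copy;
    }

    /** Descending order: each entity is ranked by its greatest value. */
    static List<List<Integer>> sortDescending(List<List<Integer>> entities) {
        List<List<Integer>> copy = new ArrayList<>(entities);
        copy.sort((a, b) -> Integer.compare(Collections.max(b), Collections.max(a)));
        return copy;
    }

    public static void main(String[] args) {
        List<List<Integer>> entities = Arrays.asList(
                Arrays.asList(4, 5, 6, 7),
                Arrays.asList(1, 9));
        System.out.println(sortAscending(entities));  // [[1, 9], [4, 5, 6, 7]]
        System.out.println(sortDescending(entities)); // [[1, 9], [4, 5, 6, 7]]
    }
}
```

Ascending order compares 1 with 4, and descending order compares 9 with 7, so [1, 9] wins in both directions.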

One important caveat is queries with both an equality filter and a sort order on a multi-valued property. In those queries, the sort order is disregarded. For single-valued properties, this is a simple optimization. Every result would have the same value for the property, so the results do not need to be sorted further.

However, multi-valued properties may have additional values. Since the sort order is disregarded, the query results may be returned in a different order than if the sort order were applied. (Restoring the dropped sort order would be expensive and require extra indexes, and this use case is rare, so the query planner leaves it off.)

Big Entities and Exploding Indexes

As described above, every property (that doesn't have a Text or Blob value) of every entity is added to at least one index table, including a simple index provided by default, and any indexes described in the application's datastore-indexes.xml file that refer to the property. For an entity that has one value for each property, App Engine stores a property value once in its simple index, and once for each time the property is referred to in a custom index. Each of these index entries must be updated every time the value of the property changes, so the more indexes that refer to the property, the more time it will take to update the property.

To prevent the update of an entity from taking too long, the datastore limits the number of index entries that a single entity can have. The limit is large, and most applications will not notice. However, there are some circumstances where you might encounter the limit. For example, an entity with very many single-value properties can exceed the index entry limit.

Properties with multiple values store each value as a separate entry in an index. An entity with a single property with very many values can exceed the index entry limit.

Custom indexes that refer to multiple properties with multiple values can get very large with only a few values. To completely record such properties, the index table must include a row for every permutation of the values of every property for the index.

For example, the following index (described in datastore-indexes.xml syntax) includes the x and y properties for entities of the kind MyModel:

<?xml version="1.0" encoding="utf-8"?>
<datastore-indexes>
    <datastore-index kind="MyModel">
        <property name="x" direction="asc" />
        <property name="y" direction="asc" />
    </datastore-index>
</datastore-indexes>

The following code creates an entity with 2 values for the property x and 2 values for the property y:

        MyModel m = new MyModel();

        m.setX(Arrays.asList("one", "two"));
        m.setY(Arrays.asList("three", "four"));

        pm.makePersistent(m);

To accurately represent these values, the index must store 12 property values: 2 each for the built-in indexes on x and y, and 2 for each of the 4 permutations of x and y in the custom index. With many values of multi-valued properties, this can mean an index must store very many index entries for a single entity. You could call an index that refers to multiple properties with multiple values an "exploding index," because it can get very large with just a few values.
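
The arithmetic behind the figure of 12 can be written out as a small calculation. The sketch follows the counting used in the paragraph above: one built-in entry per value of each property, plus one stored value per property for every row of the custom index. IndexEntryCount and its method names are hypothetical.

```java
public class IndexEntryCount {
    /** Built-in index entries: one per value of each property. */
    static long builtInEntries(int[] valuesPerProperty) {
        long sum = 0;
        for (int n : valuesPerProperty) {
            sum += n;
        }
        return sum;
    }

    /** Custom index rows: one per combination of values across the properties. */
    static long customIndexRows(int[] valuesPerProperty) {
        long product = 1;
        for (int n : valuesPerProperty) {
            product *= n;
        }
        return product;
    }

    /** Total property values stored: built-in entries plus every value in every custom row. */
    static long storedPropertyValues(int[] valuesPerProperty) {
        return builtInEntries(valuesPerProperty)
                + customIndexRows(valuesPerProperty) * valuesPerProperty.length;
    }

    public static void main(String[] args) {
        System.out.println(storedPropertyValues(new int[] {2, 2})); // 12
        System.out.println(customIndexRows(new int[] {10, 10}));    // 100
        System.out.println(customIndexRows(new int[] {10, 10, 10})); // 1000
    }
}
```

The row count is a product, so adding a third property with 10 values raises the custom index from 100 rows to 1,000 for a single entity.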

If a put() would result in a number of index entries that exceeds the limit, the call will fail with an exception. If you create a new index that would contain a number of index entries that exceeds the limit for any entity when built, queries against the index will fail, and the index will appear in the "Error" state in the Admin Console.

You can avoid exploding indexes by avoiding queries that would require a custom index using a list property. As described above, this includes queries with multiple sort orders, a mix of equality and inequality filters, and ancestor filters.