Querying — peewee 3.17.5 documentation

This section will cover the basic CRUD operations commonly performed on a relational database:

  • Model.create(), for executing INSERT queries.

  • Model.save() and Model.update(), for executing UPDATE queries.

  • Model.delete_instance() and Model.delete(), for executing DELETE queries.

  • Model.select(), for executing SELECT queries.

Note

There is also a large collection of example queries taken from the Postgresql Exercises website. Examples are listed on the query examples document.

Creating a new record

You can use Model.create() to create a new model instance. This method accepts keyword arguments, where the keys correspond to the names of the model’s fields. A new instance is returned and a row is added to the table.

>>> User.create(username='Charlie')
<__main__.User object at 0x2529350>

This will INSERT a new row into the database. The primary key will automatically be retrieved and stored on the model instance.

Alternatively, you can build up a model instance programmatically and then call save():

>>> user = User(username='Charlie')
>>> user.save()  # save() returns the number of rows modified.
1
>>> user.id
1
>>> huey = User()
>>> huey.username = 'Huey'
>>> huey.save()
1
>>> huey.id
2

When a model has a foreign key, you can directly assign a model instance to the foreign key field when creating a new record.

>>> tweet = Tweet.create(user=huey, message='Hello!')

You can also use the value of the related object’s primary key:

>>> tweet = Tweet.create(user=2, message='Hello again!')

If you simply wish to insert data and do not need to create a model instance, you can use Model.insert():

>>> User.insert(username='Mickey').execute()
3

After executing the insert query, the primary key of the new row is returned.

Note

There are several ways you can speed up bulk insert operations. Check out the Bulk inserts recipe section for more information.

Bulk inserts

There are a couple of ways you can load lots of data quickly. The naive approach is to simply call Model.create() in a loop:

data_source = [
    {'field1': 'val1-1', 'field2': 'val1-2'},
    {'field1': 'val2-1', 'field2': 'val2-2'},
    # ...
]

for data_dict in data_source:
    MyModel.create(**data_dict)

The above approach is slow for a couple of reasons:

  1. If you are not wrapping the loop in a transaction then each call to create() happens in its own transaction. That is going to be really slow!

  2. There is a decent amount of Python logic getting in your way, and each InsertQuery must be generated and parsed into SQL.

  3. That’s a lot of data (in terms of raw bytes of SQL) you are sending to your database to parse.

  4. We are retrieving the last insert id, which causes an additional query to be executed in some cases.

You can get a significant speedup by simply wrapping this in a transaction with atomic().

# This is much faster.
with db.atomic():
    for data_dict in data_source:
        MyModel.create(**data_dict)

The above code still suffers from points 2, 3 and 4. We can get another big boost by using insert_many(). This method accepts a list of tuples or dictionaries, and inserts multiple rows in a single query:

data_source = [
    {'field1': 'val1-1', 'field2': 'val1-2'},
    {'field1': 'val2-1', 'field2': 'val2-2'},
    # ...
]

# Fastest way to INSERT multiple rows.
MyModel.insert_many(data_source).execute()

The insert_many() method also accepts a list of row-tuples, provided you also specify the corresponding fields:

# We can INSERT tuples as well...
data = [('val1-1', 'val1-2'),
        ('val2-1', 'val2-2'),
        ('val3-1', 'val3-2')]

# But we need to indicate which fields the values correspond to.
MyModel.insert_many(data, fields=[MyModel.field1, MyModel.field2]).execute()

It is also a good practice to wrap the bulk insert in a transaction:

# You can, of course, wrap this in a transaction as well:
with db.atomic():
    MyModel.insert_many(data, fields=fields).execute()

Note

SQLite users should be aware of some caveats when using bulk inserts. Specifically, your SQLite3 version must be 3.7.11.0 or newer to take advantage of the bulk insert API. Additionally, by default SQLite limits the number of bound variables in a SQL query to 999 for SQLite versions prior to 3.32.0 (2020-05-22) and 32766 for SQLite versions after 3.32.0.

Inserting rows in batches

Depending on the number of rows in your data source, you may need to break it up into chunks. SQLite in particular typically has a limit of 999 or 32766 variables-per-query (batch size would then be 999 // row length or 32766 // row length).
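
As a quick sketch of that arithmetic (the SQLITE_MAX_VARS name is just an illustration, not a peewee constant):

# Derive a safe batch size from SQLite's bound-variable limit. Use 999 for
# SQLite versions prior to 3.32.0, and 32766 for 3.32.0 and newer.
SQLITE_MAX_VARS = 999

fields = [MyModel.field1, MyModel.field2]
batch_size = SQLITE_MAX_VARS // len(fields)  # e.g. 999 // 2 == 499 rows per query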

You can write a loop to batch your data into chunks (in which case it is strongly recommended you use a transaction):

# Insert rows 100 at a time.
with db.atomic():
    for idx in range(0, len(data_source), 100):
        MyModel.insert_many(data_source[idx:idx+100]).execute()

Peewee comes with a chunked() helper function which you can use for efficiently chunking a generic iterable into a series of batch-sized iterables:

from peewee import chunked

# Insert rows 100 at a time.
with db.atomic():
    for batch in chunked(data_source, 100):
        MyModel.insert_many(batch).execute()

Alternatives

The Model.bulk_create() method behaves much like Model.insert_many(), but instead it accepts a list of unsaved model instances to insert, and it optionally accepts a batch-size parameter. To use the bulk_create() API:

# Read list of usernames from a file, for example.
with open('user_list.txt') as fh:
    # Create a list of unsaved User instances.
    users = [User(username=line.strip()) for line in fh.readlines()]

# Wrap the operation in a transaction and batch INSERT the users
# 100 at a time.
with db.atomic():
    User.bulk_create(users, batch_size=100)

Note

If you are using Postgresql (which supports the RETURNING clause), then the previously-unsaved model instances will have their new primary key values automatically populated.

In addition, Peewee also offers Model.bulk_update(), which can efficiently update one or more columns on a list of models. For example:

# First, create 3 users with usernames u1, u2, u3.
u1, u2, u3 = [User.create(username='u%s' % i) for i in (1, 2, 3)]

# Now we'll modify the user instances.
u1.username = 'u1-x'
u2.username = 'u2-y'
u3.username = 'u3-z'

# Update all three users with a single UPDATE query.
User.bulk_update([u1, u2, u3], fields=[User.username])

This will result in executing the following SQL:

UPDATE "users" SET "username" = CASE "users"."id" WHEN 1 THEN "u1-x" WHEN 2 THEN "u2-y" WHEN 3 THEN "u3-z" ENDWHERE "users"."id" IN (1, 2, 3);

Note

For large lists of objects, you should specify a reasonable batch_size and wrap the call to bulk_update() with Database.atomic():

with database.atomic():
    User.bulk_update(list_of_users, fields=['username'], batch_size=50)

Warning

Model.bulk_update() may not be the most efficient method for updating large numbers of records. This functionality is implemented such that we create a “mapping” of primary key to corresponding field values for all rows being updated using a SQL CASE statement.

Alternatively, you can use the Database.batch_commit() helper to process chunks of rows inside batch-sized transactions. This method also provides a workaround for databases besides Postgresql, when the primary-key of the newly-created rows must be obtained.

# List of row data to insert.
row_data = [{'username': 'u1'}, {'username': 'u2'}, ...]

# Assume there are 789 items in row_data. The following code will result in
# 8 total transactions (7x100 rows + 1x89 rows).
for row in db.batch_commit(row_data, 100):
    User.create(**row)

Bulk-loading from another table

If the data you would like to bulk load is stored in another table, you can also create INSERT queries whose source is a SELECT query. Use the Model.insert_from() method:

res = (TweetArchive
       .insert_from(
           Tweet.select(Tweet.user, Tweet.message),
           fields=[TweetArchive.user, TweetArchive.message])
       .execute())

The above query is equivalent to the following SQL:

INSERT INTO "tweet_archive" ("user_id", "message")
SELECT "user_id", "message" FROM "tweet";

Updating existing records

Once a model instance has a primary key, any subsequent call to save() will result in an UPDATE rather than another INSERT. The model’s primary key will not change:

>>> user.save()  # save() returns the number of rows modified.
1
>>> user.id
1
>>> user.save()
>>> user.id
1
>>> huey.save()
1
>>> huey.id
2

If you want to update multiple records, issue an UPDATE query. The following example will update all Tweet objects, marking them as published, if they were created before today. Model.update() accepts keyword arguments where the keys correspond to the model’s field names:

>>> today = datetime.today()
>>> query = Tweet.update(is_published=True).where(Tweet.creation_date < today)
>>> query.execute()  # Returns the number of rows that were updated.
4

For more information, see the documentation on Model.update(), Update and Model.bulk_update().

Note

If you would like more information on performing atomic updates (such as incrementing the value of a column), check out the atomic update recipes.

Atomic updates

Peewee allows you to perform atomic updates. Let’s suppose we need to update some counters. The naive approach would be to write something like this:

>>> for stat in Stat.select().where(Stat.url == request.url):
...     stat.counter += 1
...     stat.save()

Do not do this! Not only is this slow, but it is also vulnerable to race conditions if multiple processes are updating the counter at the same time.

Instead, you can update the counters atomically using update():

>>> query = Stat.update(counter=Stat.counter + 1).where(Stat.url == request.url)
>>> query.execute()

You can make these update statements as complex as you like. Let’s give all our employees a bonus equal to their previous bonus plus 10% of their salary:

>>> query = Employee.update(bonus=(Employee.bonus + (Employee.salary * .1)))
>>> query.execute()  # Give everyone a bonus!

We can even use a subquery to update the value of a column. Suppose we had a denormalized column on the User model that stored the number of tweets a user had made, and we updated this value periodically. Here is how you might write such a query:

>>> subquery = Tweet.select(fn.COUNT(Tweet.id)).where(Tweet.user == User.id)
>>> update = User.update(num_tweets=subquery)
>>> update.execute()

Upsert

Peewee provides support for varying types of upsert functionality. With SQLite prior to 3.24.0 and MySQL, Peewee offers the replace() method, which allows you to insert a record or, in the event of a constraint violation, replace the existing record. For Sqlite 3.24+ and Postgres, peewee provides full support for ON CONFLICT queries.

Example of using replace() and on_conflict_replace():

class User(Model):
    username = TextField(unique=True)
    last_login = DateTimeField(null=True)

# Insert or update the user. The "last_login" value will be updated
# regardless of whether the user existed previously.
user_id = (User
           .replace(username='the-user', last_login=datetime.now())
           .execute())

# This query is equivalent:
user_id = (User
           .insert(username='the-user', last_login=datetime.now())
           .on_conflict_replace()
           .execute())

Note

In addition to replace, SQLite, MySQL and Postgresql provide an ignore action (see: on_conflict_ignore()) if you simply wish to insert and ignore any potential constraint violation.

MySQL supports upsert via the ON DUPLICATE KEY UPDATE clause. For example:

class User(Model):
    username = TextField(unique=True)
    last_login = DateTimeField(null=True)
    login_count = IntegerField()

# Insert a new user.
User.create(username='huey', login_count=0)

# Simulate the user logging in. The login count and timestamp will be
# either created or updated correctly.
now = datetime.now()
rowid = (User
         .insert(username='huey', last_login=now, login_count=1)
         .on_conflict(
             preserve=[User.last_login],  # Use the value we would have inserted.
             update={User.login_count: User.login_count + 1})
         .execute())

In the above example, we could safely invoke the upsert query as many times as we wanted. The login count will be incremented atomically, the last login column will be updated, and no duplicate rows will be created.

Postgresql and SQLite (3.24.0 and newer) provide a different syntax that allows for more granular control over which constraint violation should trigger the conflict resolution, and what values should be updated or preserved.

Example of using on_conflict() to perform a Postgresql-style upsert (or SQLite 3.24+):

class User(Model):
    username = TextField(unique=True)
    last_login = DateTimeField(null=True)
    login_count = IntegerField()

# Insert a new user.
User.create(username='huey', login_count=0)

# Simulate the user logging in. The login count and timestamp will be
# either created or updated correctly.
now = datetime.now()
rowid = (User
         .insert(username='huey', last_login=now, login_count=1)
         .on_conflict(
             conflict_target=[User.username],  # Which constraint?
             preserve=[User.last_login],  # Use the value we would have inserted.
             update={User.login_count: User.login_count + 1})
         .execute())

In the above example, we could safely invoke the upsert query as many times as we wanted. The login count will be incremented atomically, the last login column will be updated, and no duplicate rows will be created.

Note

The main difference between MySQL and Postgresql/SQLite is that Postgresql and SQLite require that you specify a conflict_target.

Here is a more advanced (if contrived) example using the EXCLUDED namespace. The EXCLUDED helper allows us to reference values in the conflicting data. For our example, we’ll assume a simple table mapping a unique key (string) to a value (integer):

class KV(Model):
    key = CharField(unique=True)
    value = IntegerField()

# Create one row.
KV.create(key='k1', value=1)

# Demonstrate usage of EXCLUDED.
# Here we will attempt to insert a new value for a given key. If that
# key already exists, then we will update its value with the *sum* of its
# original value and the value we attempted to insert -- provided that
# the new value is larger than the original value.
query = (KV.insert(key='k1', value=10)
         .on_conflict(conflict_target=[KV.key],
                      update={KV.value: KV.value + EXCLUDED.value},
                      where=(EXCLUDED.value > KV.value)))

# Executing the above query will result in the following data being
# present in the "kv" table:
# (key='k1', value=11)
query.execute()

# If we attempted to execute the query *again*, then nothing would be
# updated, as the new value (10) is now less than the value in the
# original row (11).

There are several important concepts to understand when using ON CONFLICT:

  • conflict_target=: which column(s) have the UNIQUE constraint. For a user table, this might be the user’s email.

  • preserve=: if a conflict occurs, this parameter is used to indicate which values from the new data we wish to update.

  • update=: if a conflict occurs, this is a mapping of data to apply to the pre-existing row.

  • EXCLUDED: this “magic” namespace allows you to reference the new data that would have been inserted if the constraint hadn’t failed.

Full example:

class User(Model):
    email = CharField(unique=True)  # Unique identifier for user.
    last_login = DateTimeField()
    login_count = IntegerField(default=0)
    ip_log = TextField(default='')

# Demonstrates the above 4 concepts.
def login(email, ip):
    rowid = (User
             .insert({User.email: email,
                      User.last_login: datetime.now(),
                      User.login_count: 1,
                      User.ip_log: ip})
             .on_conflict(
                 # If the INSERT fails due to a constraint violation on the
                 # user email, then perform an UPDATE instead.
                 conflict_target=[User.email],
                 # Set the "last_login" to the value we would have inserted
                 # (our call to datetime.now()).
                 preserve=[User.last_login],
                 # Increment the user's login count and prepend the new IP
                 # to the user's ip history.
                 update={User.login_count: User.login_count + 1,
                         User.ip_log: fn.CONCAT(EXCLUDED.ip_log, ',', User.ip_log)})
             .execute())
    return rowid

# This will insert the initial row, returning the new row id (1).
print(login('test@example.com', '127.1'))

# Because test@example.com exists, this will trigger the UPSERT. The row id
# from above is returned again (1).
print(login('test@example.com', '127.2'))

u = User.get()
print(u.login_count, u.ip_log)
# Prints "2 127.2,127.1"

For more information, see Insert.on_conflict() and OnConflict.

Deleting records

To delete a single model instance, you can use the Model.delete_instance() shortcut. delete_instance() will delete the given model instance and can optionally delete any dependent objects recursively (by specifying recursive=True).

>>> user = User.get(User.id == 1)
>>> user.delete_instance()  # Returns the number of rows deleted.
1

>>> User.get(User.id == 1)
UserDoesNotExist: instance matching query does not exist:
SQL: SELECT t1."id", t1."username" FROM "user" AS t1 WHERE t1."id" = ?
PARAMS: [1]
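
Passing recursive=True will also remove rows that depend on the instance through foreign keys. A minimal sketch, assuming the Tweet.user foreign key from earlier:

# Delete the user along with dependent rows. Rows referencing the user via
# non-nullable foreign keys are deleted; nullable foreign keys are set to
# NULL by default.
user = User.get(User.username == 'Charlie')
user.delete_instance(recursive=True)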

To delete an arbitrary set of rows, you can issue a DELETE query. The following will delete all Tweet objects that are over one year old:

>>> query = Tweet.delete().where(Tweet.creation_date < one_year_ago)
>>> query.execute()  # Returns the number of rows deleted.
7

For more information, see the documentation on:

  • Model.delete_instance()

  • Model.delete()

  • DeleteQuery

Selecting a single record

You can use the Model.get() method to retrieve a single instance matching the given query. For primary-key lookups, you can also use the shortcut method Model.get_by_id().

This method is a shortcut that calls Model.select() with the given query, but limits the result set to a single row. Additionally, if no model matches the given query, a DoesNotExist exception will be raised.

>>> User.get(User.id == 1)
<__main__.User object at 0x25294d0>

>>> User.get_by_id(1)  # Same as above.
<__main__.User object at 0x252df10>

>>> User[1]  # Also same as above.
<__main__.User object at 0x252dd10>

>>> User.get(User.id == 1).username
u'Charlie'

>>> User.get(User.username == 'Charlie')
<__main__.User object at 0x2529410>

>>> User.get(User.username == 'nobody')
UserDoesNotExist: instance matching query does not exist:
SQL: SELECT t1."id", t1."username" FROM "user" AS t1 WHERE t1."username" = ?
PARAMS: ['nobody']

For more advanced operations, you can use SelectBase.get(). The following query retrieves the latest tweet from the user named charlie:

>>> (Tweet
...  .select()
...  .join(User)
...  .where(User.username == 'charlie')
...  .order_by(Tweet.created_date.desc())
...  .get())
<__main__.Tweet object at 0x2623410>

For more information, see the documentation on:

  • Model.get()

  • Model.get_by_id()

  • Model.get_or_none() - if no matching row is found, return None.

  • Model.select()

  • SelectBase.get()

  • SelectBase.first() - return first record of result-set or None.

Create or get

Peewee has one helper method for performing “get/create” type operations: Model.get_or_create(), which first attempts to retrieve the matching row. Failing that, a new row will be created.

For “create or get” type logic, typically one would rely on a unique constraint or primary key to prevent the creation of duplicate objects. As an example, let’s say we wish to implement registering a new user account using the example User model. The User model has a unique constraint on the username field, so we will rely on the database’s integrity guarantees to ensure we don’t end up with duplicate usernames:

try:
    with db.atomic():
        return User.create(username=username)
except peewee.IntegrityError:
    # `username` is a unique column, so this username already exists,
    # making it safe to call .get().
    return User.get(User.username == username)

You can easily encapsulate this type of logic as a classmethod on your own Model classes.
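
For instance, a hedged sketch of such a classmethod (the create_or_get name is illustrative, not a built-in peewee API; it assumes the usual from peewee import * and the db database from the snippets above):

class User(Model):
    username = TextField(unique=True)

    @classmethod
    def create_or_get(cls, username):
        # Attempt the INSERT first; fall back to a SELECT if the unique
        # constraint on username is violated.
        try:
            with db.atomic():
                return cls.create(username=username), True
        except IntegrityError:
            return cls.get(cls.username == username), False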

The above example first attempts creation, then falls back to retrieval, relying on the database to enforce a unique constraint. If you prefer to attempt to retrieve the record first, you can use get_or_create(). This method is implemented along the same lines as the Django function of the same name. You can use the Django-style keyword argument filters to specify your WHERE conditions. The function returns a 2-tuple containing the instance and a boolean value indicating if the object was created.

Here is how you might implement user account creation using get_or_create():

user, created = User.get_or_create(username=username)

Suppose we have a different model Person and would like to get or create a person object. The only conditions we care about when retrieving the Person are their first and last names, but if we end up needing to create a new record, we will also specify their date-of-birth and favorite color:

person, created = Person.get_or_create(
    first_name=first_name,
    last_name=last_name,
    defaults={'dob': dob, 'favorite_color': 'green'})

Any keyword argument passed to get_or_create() will be used in the get() portion of the logic, except for the defaults dictionary, which will be used to populate values on newly-created instances.
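
In other words, the Person example above behaves roughly like the following sketch (not peewee's literal implementation):

try:
    person = Person.get(
        (Person.first_name == first_name) &
        (Person.last_name == last_name))
    created = False
except Person.DoesNotExist:
    # Filters and defaults are combined when creating the new row.
    person = Person.create(first_name=first_name, last_name=last_name,
                           dob=dob, favorite_color='green')
    created = True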

For more details read the documentation for Model.get_or_create().

Selecting multiple records

We can use Model.select() to retrieve rows from the table. When you construct a SELECT query, the database will return any rows that correspond to your query. Peewee allows you to iterate over these rows, as well as use indexing and slicing operations:

>>> query = User.select()
>>> [user.username for user in query]
['Charlie', 'Huey', 'Peewee']

>>> query[1]
<__main__.User at 0x7f83e80f5550>

>>> query[1].username
'Huey'

>>> query[:2]
[<__main__.User at 0x7f83e80f53a8>, <__main__.User at 0x7f83e80f5550>]

Select queries are smart, in that you can iterate, index and slice the query multiple times but the query is only executed once.
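
A short sketch of what that caching means in practice (assuming the User table from above):

query = User.select()
first_pass = [user.username for user in query]   # Executes the SELECT and caches rows.
second_pass = [user.username for user in query]  # Re-uses the cached rows; no new query.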

In the following example, we will simply call select() and iterate over the return value, which is an instance of Select. This will return all the rows in the User table:

>>> for user in User.select():
...     print(user.username)
...
Charlie
Huey
Peewee

Note

Subsequent iterations of the same query will not hit the database as the results are cached. To disable this behavior (to reduce memory usage), call Select.iterator() when iterating.

When iterating over a model that contains a foreign key, be careful with the way you access values on related models. Accidentally resolving a foreign key or iterating over a back-reference can cause N+1 query behavior.
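
For example, a sketch of the N+1 pattern to watch for, using the Tweet.user foreign key from earlier:

# Accessing tweet.user here may issue one extra SELECT per tweet to fetch
# the related User -- N tweets can mean N additional queries.
for tweet in Tweet.select():
    print(tweet.user.username, tweet.message)

# Selecting the user columns in a JOIN avoids the extra queries:
for tweet in Tweet.select(Tweet, User).join(User):
    print(tweet.user.username, tweet.message)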

When you create a foreign key, such as Tweet.user, you can use the backref to create a back-reference (User.tweets). Back-references are exposed as Select instances:

>>> tweet = Tweet.get()
>>> tweet.user  # Accessing a foreign key returns the related model.
<tw.User at 0x7f3ceb017f50>

>>> user = User.get()
>>> user.tweets  # Accessing a back-reference returns a query.
<peewee.ModelSelect at 0x7f73db3bafd0>

You can iterate over the user.tweets back-reference just like any other Select:

>>> for tweet in user.tweets:
...     print(tweet.message)
...
hello world
this is fun
look at this picture of my food

In addition to returning model instances, Select queries can return dictionaries, tuples and namedtuples. Depending on your use-case, you may find it easier to work with rows as dictionaries, for example:

>>> query = User.select().dicts()
>>> for row in query:
...     print(row)

{'id': 1, 'username': 'Charlie'}
{'id': 2, 'username': 'Huey'}
{'id': 3, 'username': 'Peewee'}

See namedtuples(), tuples(), dicts() for more information.

Iterating over large result-sets

By default peewee will cache the rows returned when iterating over a Select query. This is an optimization to allow multiple iterations as well as indexing and slicing without causing additional queries. This caching can be problematic, however, when you plan to iterate over a large number of rows.

To reduce the amount of memory used by peewee when iterating over a query, use the iterator() method. This method allows you to iterate without caching each model returned, using much less memory when iterating over large result sets.

# Let's assume we've got 10 million stat objects to dump to a csv file.
stats = Stat.select()

# Our imaginary serializer class
serializer = CSVSerializer()

# Loop over all the stats and serialize.
for stat in stats.iterator():
    serializer.serialize_object(stat)

For simple queries you can see further speed improvements by returning rows as dictionaries, namedtuples or tuples. The following methods can be used on any Select query to change the result row type:

  • dicts()

  • namedtuples()

  • tuples()

Don’t forget to append the iterator() method call to also reduce memory consumption. For example, the above code might look like:

# Let's assume we've got 10 million stat objects to dump to a csv file.
stats = Stat.select()

# Our imaginary serializer class
serializer = CSVSerializer()

# Loop over all the stats (rendered as tuples, without caching) and serialize.
for stat_tuple in stats.tuples().iterator():
    serializer.serialize_tuple(stat_tuple)

When iterating over a large number of rows that contain columns from multiple tables, peewee will reconstruct the model graph for each row returned. This operation can be slow for complex graphs. For example, if we were selecting a list of tweets along with the username and avatar of the tweet’s author, Peewee would have to create two objects for each row (a tweet and a user). In addition to the above row-types, there is a fourth method objects() which will return the rows as model instances, but will not attempt to resolve the model graph.

For example:

query = (Tweet
         .select(Tweet, User)  # Select tweet and user data.
         .join(User))

# Note that the user columns are stored in a separate User instance
# accessible at tweet.user:
for tweet in query:
    print(tweet.user.username, tweet.content)

# Using ".objects()" will not create the tweet.user object and assigns all
# user attributes to the tweet instance:
for tweet in query.objects():
    print(tweet.username, tweet.content)

For maximum performance, you can execute queries and then iterate over the results using the underlying database cursor. Database.execute() accepts a query object, executes the query, and returns a DB-API 2.0 Cursor object. The cursor will return the raw row-tuples:

query = Tweet.select(Tweet.content, User.username).join(User)
cursor = database.execute(query)
for (content, username) in cursor:
    print(username, '->', content)

Filtering records

You can filter for particular records using normal python operators. Peewee supports a wide variety of query operators.

>>> user = User.get(User.username == 'Charlie')
>>> for tweet in Tweet.select().where(Tweet.user == user, Tweet.is_published == True):
...     print(tweet.user.username, '->', tweet.message)
...
Charlie -> hello world
Charlie -> this is fun

>>> for tweet in Tweet.select().where(Tweet.created_date < datetime.datetime(2011, 1, 1)):
...     print(tweet.message, tweet.created_date)
...
Really old tweet 2010-01-01 00:00:00

You can also filter across joins:

>>> for tweet in Tweet.select().join(User).where(User.username == 'Charlie'):
...     print(tweet.message)

hello world
this is fun
look at this picture of my food

If you want to express a complex query, use parentheses and python’s bitwise or and and operators:

>>> Tweet.select().join(User).where(
...     (User.username == 'Charlie') |
...     (User.username == 'Peewee Herman'))

Note

Note that Peewee uses bitwise operators (& and |) rather than logical operators (and and or). The reason for this is that Python coerces the return value of logical operations to a boolean value. This is also the reason why “IN” queries must be expressed using .in_() rather than the in operator.

Check out the table of query operations to see what types of queries are possible.

Note

A lot of fun things can go in the where clause of a query, such as:

  • A field expression, e.g. User.username == 'Charlie'

  • A function expression, e.g. fn.Lower(fn.Substr(User.username, 1, 1)) == 'a'

  • A comparison of one column to another, e.g. Employee.salary < (Employee.tenure * 1000) + 40000

You can also nest queries, for example tweets by users whose username starts with “a”:

# get users whose username starts with "a"
a_users = User.select().where(fn.Lower(fn.Substr(User.username, 1, 1)) == 'a')

# the ".in_()" method signifies an "IN" query
a_user_tweets = Tweet.select().where(Tweet.user.in_(a_users))

More query examples

Note

For a wide range of example queries, see the Query Examples document, which shows how to implement queries from the PostgreSQL Exercises website.

Get active users:

User.select().where(User.active == True)

Get users who are either staff or superusers:

User.select().where(
    (User.is_staff == True) |
    (User.is_superuser == True))

Get tweets by user named “charlie”:

Tweet.select().join(User).where(User.username == 'charlie')

Get tweets by staff or superusers (assumes FK relationship):

Tweet.select().join(User).where(
    (User.is_staff == True) |
    (User.is_superuser == True))

Get tweets by staff or superusers using a subquery:

staff_super = User.select(User.id).where(
    (User.is_staff == True) |
    (User.is_superuser == True))
Tweet.select().where(Tweet.user.in_(staff_super))

Sorting records

To return rows in order, use the order_by() method:

>>> for t in Tweet.select().order_by(Tweet.created_date):
...     print(t.pub_date)
...
2010-01-01 00:00:00
2011-06-07 14:08:48
2011-06-07 14:12:57

>>> for t in Tweet.select().order_by(Tweet.created_date.desc()):
...     print(t.pub_date)
...
2011-06-07 14:12:57
2011-06-07 14:08:48
2010-01-01 00:00:00

You can also use + and - prefix operators to indicate ordering:

# The following queries are equivalent:
Tweet.select().order_by(Tweet.created_date.desc())
Tweet.select().order_by(-Tweet.created_date)  # Note the "-" prefix.

# Similarly you can use "+" to indicate ascending order, though ascending
# is the default when no ordering is otherwise specified.
User.select().order_by(+User.username)

You can also order across joins. Assuming you want to order tweets by the username of the author, then by created_date:

query = (Tweet
         .select()
         .join(User)
         .order_by(User.username, Tweet.created_date.desc()))

SELECT t1."id", t1."user_id", t1."message", t1."is_published", t1."created_date"
FROM "tweet" AS t1
INNER JOIN "user" AS t2 ON t1."user_id" = t2."id"
ORDER BY t2."username", t1."created_date" DESC

When sorting on a calculated value, you can either include the necessary SQL expressions, or reference the alias assigned to the value. Here are two examples illustrating these methods:

# Let's start with our base query. We want to get all usernames and the number of
# tweets they've made. We wish to sort this list from users with most tweets to
# users with fewest tweets.
query = (User
         .select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User.username))

You can order using the same COUNT expression used in the select clause. In the example below we are ordering by the COUNT() of tweet ids descending:

query = (User
         .select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User.username)
         .order_by(fn.COUNT(Tweet.id).desc()))

Alternatively, you can reference the alias assigned to the calculated value in the select clause. This method has the benefit of being a bit easier to read. Note that we are not referring to the named alias directly, but are wrapping it using the SQL helper:

query = (User
         .select(User.username, fn.COUNT(Tweet.id).alias('num_tweets'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User.username)
         .order_by(SQL('num_tweets').desc()))

Or, to do things the “peewee” way:

ntweets = fn.COUNT(Tweet.id)
query = (User
         .select(User.username, ntweets.alias('num_tweets'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User.username)
         .order_by(ntweets.desc()))

Getting random records

Occasionally you may want to pull a random record from the database. You can accomplish this by ordering by the random or rand function (depending on your database):

Postgresql and Sqlite use the Random function:

# Pick 5 lucky winners:
LotteryNumber.select().order_by(fn.Random()).limit(5)

MySQL uses Rand:

# Pick 5 lucky winners:
LotteryNumber.select().order_by(fn.Rand()).limit(5)

Paginating records

The paginate() method makes it easy to grab a page of records. paginate() takes two parameters: page_number and items_per_page.

Attention

Page numbers are 1-based, so the first page of results will be page 1.

>>> for tweet in Tweet.select().order_by(Tweet.id).paginate(2, 10):
...     print(tweet.message)
...
tweet 10
tweet 11
tweet 12
tweet 13
tweet 14
tweet 15
tweet 16
tweet 17
tweet 18
tweet 19

If you would like more granular control, you can always use limit() and offset().
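
As a sketch, the paginate(2, 10) call above is roughly equivalent to the following limit/offset combination:

query = (Tweet
         .select()
         .order_by(Tweet.id)
         .limit(10)    # items_per_page
         .offset(10))  # (page_number - 1) * items_per_page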

Counting records

You can count the number of rows in any select query:

>>> Tweet.select().count()
100
>>> Tweet.select().where(Tweet.id > 50).count()
50

Peewee will wrap your query in an outer query that performs a count, which results in SQL like:

SELECT COUNT(1) FROM ( ... your query ... );

Aggregating records

Suppose you have some users and want to get a list of them along with the count of tweets in each.

query = (User
         .select(User, fn.Count(Tweet.id).alias('count'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User))

The resulting query will return User objects with all their normal attributes plus an additional attribute count which will contain the count of tweets for each user. We use a left outer join to include users who have no tweets.
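
For example, a short sketch of consuming this query (the count attribute comes from the alias in the select clause):

for user in query:
    # "count" is the aliased aggregate from the select clause.
    print(user.username, user.count)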

Let’s assume you have a tagging application and want to find tags that have a certain number of related objects. For this example we’ll use some different models in a many-to-many configuration:

class Photo(Model):
    image = CharField()

class Tag(Model):
    name = CharField()

class PhotoTag(Model):
    photo = ForeignKeyField(Photo)
    tag = ForeignKeyField(Tag)

Now say we want to find tags that have at least 5 photos associated with them:

query = (Tag
         .select()
         .join(PhotoTag)
         .join(Photo)
         .group_by(Tag)
         .having(fn.Count(Photo.id) > 5))

This query is equivalent to the following SQL:

SELECT t1."id", t1."name"FROM "tag" AS t1INNER JOIN "phototag" AS t2 ON t1."id" = t2."tag_id"INNER JOIN "photo" AS t3 ON t2."photo_id" = t3."id"GROUP BY t1."id", t1."name"HAVING Count(t3."id") > 5

Suppose we want to grab the associated count and store it on the tag:

query = (Tag
         .select(Tag, fn.Count(Photo.id).alias('count'))
         .join(PhotoTag)
         .join(Photo)
         .group_by(Tag)
         .having(fn.Count(Photo.id) > 5))

Retrieving Scalar Values

You can retrieve scalar values by calling Query.scalar(). For instance:

>>> PageView.select(fn.Count(fn.Distinct(PageView.url))).scalar()
100

You can retrieve multiple scalar values by passing as_tuple=True:

>>> Employee.select(
...     fn.Min(Employee.salary), fn.Max(Employee.salary)
... ).scalar(as_tuple=True)
(30000, 50000)

Window functions

A Window function refers to an aggregate function that operates on a sliding window of data that is being processed as part of a SELECT query. Window functions make it possible to do things like:

  1. Perform aggregations against subsets of a result-set.

  2. Calculate a running total.

  3. Rank results.

  4. Compare a row value to a value in the preceding (or succeeding!) row(s).

peewee comes with support for SQL window functions, which can be created by calling Function.over() and passing in your partitioning or ordering parameters.

For the following examples, we’ll use the following model and sample data:

class Sample(Model):
    counter = IntegerField()
    value = FloatField()

data = [(1, 10),
        (1, 20),
        (2, 1),
        (2, 3),
        (3, 100)]
Sample.insert_many(data, fields=[Sample.counter, Sample.value]).execute()

Our sample table now contains:

id | counter | value
---+---------+-------
1  | 1       | 10.0
2  | 1       | 20.0
3  | 2       | 1.0
4  | 2       | 3.0
5  | 3       | 100.0

Ordered Windows

Let’s calculate a running sum of the value field. In order for it to be a “running” sum, we need it to be ordered, so we’ll order with respect to the Sample’s id field:

query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(order_by=[Sample.id]).alias('total'))

for sample in query:
    print(sample.counter, sample.value, sample.total)

# 1    10.    10.
# 1    20.    30.
# 2    1.     31.
# 2    3.     34.
# 3    100.   134.

For another example, we’ll calculate the difference between the current value and the previous value, when ordered by the id:

difference = Sample.value - fn.LAG(Sample.value, 1).over(order_by=[Sample.id])
query = Sample.select(
    Sample.counter,
    Sample.value,
    difference.alias('diff'))

for sample in query:
    print(sample.counter, sample.value, sample.diff)

# 1    10.    NULL
# 1    20.    10.   -- (20 - 10)
# 2    1.     -19.  -- (1 - 20)
# 2    3.     2.    -- (3 - 1)
# 3    100.   97.   -- (100 - 3)

Partitioned Windows

Let’s calculate the average value for each distinct “counter” value. Notice that there are three possible values for the counter field (1, 2, and 3). We can do this by calculating the AVG() of the value column over a window that is partitioned depending on the counter field:

query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.AVG(Sample.value).over(partition_by=[Sample.counter]).alias('cavg'))

for sample in query:
    print(sample.counter, sample.value, sample.cavg)

# 1    10.    15.
# 1    20.    15.
# 2    1.     2.
# 2    3.     2.
# 3    100.   100.

We can use ordering within partitions by specifying both the order_by and partition_by parameters. For an example, let’s rank the samples by value within each distinct counter group.

query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.RANK().over(
        order_by=[Sample.value],
        partition_by=[Sample.counter]).alias('rank'))

for sample in query:
    print(sample.counter, sample.value, sample.rank)

# 1    10.    1
# 1    20.    2
# 2    1.     1
# 2    3.     2
# 3    100.   1

Bounded windows

By default, window functions are evaluated using an unbounded preceding start for the window, and the current row as the end. We can change the bounds of the window our aggregate functions operate on by specifying a start and/or end in the call to Function.over(). Additionally, Peewee comes with helper-methods on the Window object for generating the appropriate boundary references:

  • Window.CURRENT_ROW - attribute that references the current row.

  • Window.preceding() - specify number of row(s) preceding, or omit number to indicate all preceding rows.

  • Window.following() - specify number of row(s) following, or omit number to indicate all following rows.

To examine how boundaries work, we’ll calculate a running total of the value column, ordered with respect to id, but we’ll only look at the running total of the current row and its two preceding rows:

query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(
        order_by=[Sample.id],
        start=Window.preceding(2),
        end=Window.CURRENT_ROW).alias('rsum'))

for sample in query:
    print(sample.counter, sample.value, sample.rsum)

# 1    10.    10.
# 1    20.    30.   -- (20 + 10)
# 2    1.     31.   -- (1 + 20 + 10)
# 2    3.     24.   -- (3 + 1 + 20)
# 3    100.   104.  -- (100 + 3 + 1)

Note

Technically we did not need to specify the end=Window.CURRENT_ROW because that is the default. It was shown in the example for demonstration.

Let’s look at another example. In this example we will calculate the “opposite” of a running total, in which the total sum of all values is decreased by the value of the samples, ordered by id. To accomplish this, we’ll calculate the sum from the current row to the last row.

query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(
        order_by=[Sample.id],
        start=Window.CURRENT_ROW,
        end=Window.following()).alias('rsum'))

# 1    10.    134.  -- (10 + 20 + 1 + 3 + 100)
# 1    20.    124.  -- (20 + 1 + 3 + 100)
# 2    1.     104.  -- (1 + 3 + 100)
# 2    3.     103.  -- (3 + 100)
# 3    100.   100.  -- (100)

Filtered Aggregates

Aggregate functions may also support filter functions (Postgres and Sqlite 3.25+), which get translated into a FILTER (WHERE...) clause. Filter expressions are added to an aggregate function with the Function.filter() method.

For an example, we will calculate the running sum of the value field with respect to the id, but we will filter out any samples whose counter=2.

query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).filter(Sample.counter != 2).over(
        order_by=[Sample.id]).alias('csum'))

for sample in query:
    print(sample.counter, sample.value, sample.csum)

# 1    10.    10.
# 1    20.    30.
# 2    1.     30.
# 2    3.     30.
# 3    100.   130.

Note

The call to filter() must precede the call to over().

Reusing Window Definitions

If you intend to use the same window definition for multiple aggregates, you can create a Window object. The Window object takes the same parameters as Function.over(), and can be passed to the over() method in place of the individual parameters.

Here we’ll declare a single window, ordered with respect to the sample id, and call several window functions using that window definition:

win = Window(order_by=[Sample.id])
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.LEAD(Sample.value).over(win),
    fn.LAG(Sample.value).over(win),
    fn.SUM(Sample.value).over(win)).window(win)  # Include our window definition in query.

for row in query.tuples():
    print(row)

# counter  value  lead()  lag()  sum()
# 1        10.    20.     NULL   10.
# 1        20.    1.      10.    30.
# 2        1.     3.      20.    31.
# 2        3.     100.    1.     34.
# 3        100.   NULL    3.     134.

Multiple window definitions

In the previous example, we saw how to declare a Window definition and re-use it for multiple different aggregations. You can include as many window definitions as you need in your queries, but it is necessary to ensure each window has a unique alias:

w1 = Window(order_by=[Sample.id]).alias('w1')
w2 = Window(partition_by=[Sample.counter]).alias('w2')
query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(w1).alias('rsum'),  # Running total.
    fn.AVG(Sample.value).over(w2).alias('cavg')   # Avg per category.
).window(w1, w2)  # Include our window definitions.

for sample in query:
    print(sample.counter, sample.value, sample.rsum, sample.cavg)

# counter  value  rsum  cavg
# 1        10.    10.   15.
# 1        20.    30.   15.
# 2        1.     31.   2.
# 2        3.     34.   2.
# 3        100.   134.  100.

Similarly, if you have multiple window definitions that share similar definitions, it is possible to extend a previously-defined window definition. For example, here we will be partitioning the data-set by the counter value, so we’ll be doing our aggregations with respect to the counter. Then we’ll define a second window that extends this partitioning, and adds an ordering clause:

w1 = Window(partition_by=[Sample.counter]).alias('w1')

# By extending w1, this window definition will also be partitioned
# by "counter".
w2 = Window(extends=w1, order_by=[Sample.value.desc()]).alias('w2')

query = (Sample
         .select(Sample.counter, Sample.value,
                 fn.SUM(Sample.value).over(w1).alias('group_sum'),
                 fn.RANK().over(w2).alias('revrank'))
         .window(w1, w2)
         .order_by(Sample.id))

for sample in query:
    print(sample.counter, sample.value, sample.group_sum, sample.revrank)

# counter  value  group_sum  revrank
# 1        10.    30.        2
# 1        20.    30.        1
# 2        1.     4.         2
# 2        3.     4.         1
# 3        100.   100.       1

Frame types: RANGE vs ROWS vs GROUPS

Depending on the frame type, the database will process ordered groups differently. Let’s create two additional Sample rows to visualize the difference:

>>> Sample.create(counter=1, value=20.)
<Sample 6>
>>> Sample.create(counter=2, value=1.)
<Sample 7>

Our table now contains:

id | counter | value
---+---------+-------
1  | 1       | 10.0
2  | 1       | 20.0
3  | 2       | 1.0
4  | 2       | 3.0
5  | 3       | 100.0
6  | 1       | 20.0
7  | 2       | 1.0

Let’s examine the difference by calculating a “running sum” of the samples, ordered with respect to the counter and value fields. To specify the frame type, we can use either:

  • Window.RANGE

  • Window.ROWS

  • Window.GROUPS

The behavior of RANGE, when there are logical duplicates, may lead to unexpected results:

query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(
        order_by=[Sample.counter, Sample.value],
        frame_type=Window.RANGE).alias('rsum'))

for sample in query.order_by(Sample.counter, Sample.value):
    print(sample.counter, sample.value, sample.rsum)

# counter  value  rsum
# 1        10.    10.
# 1        20.    50.
# 1        20.    50.
# 2        1.     52.
# 2        1.     52.
# 2        3.     55.
# 3        100.   155.

With the inclusion of the new rows we now have some rows that have duplicate counter and value values. The RANGE frame type causes these duplicates to be evaluated together rather than separately.

The more expected result can be achieved by using ROWS as the frame-type:

query = Sample.select(
    Sample.counter,
    Sample.value,
    fn.SUM(Sample.value).over(
        order_by=[Sample.counter, Sample.value],
        frame_type=Window.ROWS).alias('rsum'))

for sample in query.order_by(Sample.counter, Sample.value):
    print(sample.counter, sample.value, sample.rsum)

# counter  value  rsum
# 1        10.    10.
# 1        20.    30.
# 1        20.    50.
# 2        1.     51.
# 2        1.     52.
# 2        3.     55.
# 3        100.   155.

Peewee uses these rules for determining what frame-type to use:

  • If the user specifies a frame_type, that frame type will be used.

  • If start and/or end boundaries are specified, Peewee will default to using ROWS (see the sketch after this list).

  • If the user did not specify frame type or start/end boundaries, Peewee will use the database default, which is RANGE.
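
A minimal sketch of the second rule, assuming the Sample model from above: because a start boundary is given without a frame_type, peewee defaults to ROWS, so the two expressions below should be equivalent.

# Implicit: a start boundary is given, so peewee uses the ROWS frame type.
rsum = fn.SUM(Sample.value).over(
    order_by=[Sample.id],
    start=Window.preceding(2))

# Explicit equivalent:
rsum = fn.SUM(Sample.value).over(
    order_by=[Sample.id],
    start=Window.preceding(2),
    frame_type=Window.ROWS)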

The Window.GROUPS frame type looks at the window range specification in terms of groups of rows, based on the ordering term(s). Using GROUPS, we can define the frame so it covers distinct groupings of rows. Let’s look at an example:

query = (Sample
         .select(Sample.counter, Sample.value,
                 fn.SUM(Sample.value).over(
                     order_by=[Sample.counter, Sample.value],
                     frame_type=Window.GROUPS,
                     start=Window.preceding(1)).alias('gsum'))
         .order_by(Sample.counter, Sample.value))

for sample in query:
    print(sample.counter, sample.value, sample.gsum)

# counter  value  gsum
# 1        10     10
# 1        20     50
# 1        20     50   (10) + (20+20)
# 2        1      42
# 2        1      42   (20+20) + (1+1)
# 2        3      5    (1+1) + 3
# 3        100    103  (3) + 100

As you can hopefully infer, the window is grouped by its ordering term, which is (counter, value). We are looking at a window that extends between one previous group and the current group.

Note

For information about the window function APIs, see:

  • Function.over()

  • Function.filter()

  • Window

For general information on window functions, read the postgres window functions tutorial.

Additionally, the postgres docs and the sqlite docs contain a lot of good information.

Retrieving row tuples / dictionaries / namedtuples

Sometimes you do not need the overhead of creating model instances and simply want to iterate over the row data without needing all the APIs provided by Model. To do this, use:

  • dicts()

  • namedtuples()

  • tuples()

  • objects() – accepts an arbitrary constructor function which is called with the row tuple.

stats = (Stat
         .select(Stat.url, fn.Count(Stat.url))
         .group_by(Stat.url)
         .tuples())

# iterate over a list of 2-tuples containing the url and count
for stat_url, stat_count in stats:
    print(stat_url, stat_count)

Similarly, you can return the rows from the cursor as dictionaries using dicts():

stats = (Stat
         .select(Stat.url, fn.Count(Stat.url).alias('ct'))
         .group_by(Stat.url)
         .dicts())

# iterate over a list of dicts containing the url and count
for stat in stats:
    print(stat['url'], stat['ct'])

Returning Clause

PostgresqlDatabase supports a RETURNING clause on UPDATE, INSERT and DELETE queries. Specifying a RETURNING clause allows you to iterate over the rows accessed by the query.

By default, the return values upon execution of the different queries are:

  • INSERT - auto-incrementing primary key value of the newly-inserted row. When not using an auto-incrementing primary key, Postgres will return the new row’s primary key, but SQLite and MySQL will not.

  • UPDATE - number of rows modified

  • DELETE - number of rows deleted

When a returning clause is used the return value upon executing a query will be an iterable cursor object.

Postgresql allows, via the RETURNING clause, returning data from the rows inserted or modified by a query.

For example, let’s say you have an Update that deactivates all user accounts whose registration has expired. After deactivating them, you want to send each user an email letting them know their account was deactivated. Rather than writing two queries, a SELECT and an UPDATE, you can do this in a single UPDATE query with a RETURNING clause:

query = (User
         .update(is_active=False)
         .where(User.registration_expired == True)
         .returning(User))

# Send an email to every user that was deactivated.
for deactivated_user in query.execute():
    send_deactivation_email(deactivated_user.email)

The RETURNING clause is also available on Insert and Delete. When used with INSERT, the newly-created rows will be returned. When used with DELETE, the deleted rows will be returned.

The only limitation of the RETURNING clause is that it can only consist of columns from tables listed in the query’s FROM clause. To select all columns from a particular table, you can simply pass in the Model class.

As another example, let’s add a user and set their creation-date to the server-generated current timestamp. We’ll create and retrieve the new user’s ID, Email and the creation timestamp in a single query:

query = (User
         .insert(email='foo@bar.com', created=fn.now())
         .returning(User))  # Shorthand for all columns on User.

# When using RETURNING, execute() returns a cursor.
cursor = query.execute()

# Get the user object we just inserted and log the data:
user = cursor[0]
logger.info('Created user %s (id=%s) at %s', user.email, user.id, user.created)

By default the cursor will return Model instances, but you can specify a different row type:

data = [{'username': 'charlie'}, {'username': 'huey'}, {'username': 'mickey'}]
query = (User
         .insert_many(data)
         .returning(User.id, User.username)
         .dicts())

for new_user in query.execute():
    print('Added user "%s", id=%s' % (new_user['username'], new_user['id']))

Just as with Select queries, you can specify various result row types.

Common Table Expressions

Peewee supports the inclusion of common table expressions (CTEs) in all types of queries. CTEs may be useful for:

  • Factoring out a common subquery.

  • Grouping or filtering by a column derived in the CTE’s result set.

  • Writing recursive queries.

To declare a Select query for use as a CTE, use the cte() method, which wraps the query in a CTE object. To indicate that a CTE should be included as part of a query, use the Query.with_cte() method, passing a list of CTE objects.

Simple Example

For an example, let’s say we have some data points that consist of a key and a floating-point value. Let’s define our model and populate some test data:

class Sample(Model):
    key = TextField()
    value = FloatField()

data = (
    ('a', (1.25, 1.5, 1.75)),
    ('b', (2.1, 2.3, 2.5, 2.7, 2.9)),
    ('c', (3.5, 3.5)))

# Populate data.
for key, values in data:
    Sample.insert_many([(key, value) for value in values],
                       fields=[Sample.key, Sample.value]).execute()

Let’s use a CTE to calculate, for each distinct key, which values were above-average for that key.

# First we'll declare the query that will be used as a CTE. This query
# simply determines the average value for each key.
cte = (Sample
       .select(Sample.key, fn.AVG(Sample.value).alias('avg_value'))
       .group_by(Sample.key)
       .cte('key_avgs', columns=('key', 'avg_value')))

# Now we'll query the sample table, using our CTE to find rows whose value
# exceeds the average for the given key. We'll calculate how far above the
# average the given sample's value is, as well.
query = (Sample
         .select(Sample.key, Sample.value)
         .join(cte, on=(Sample.key == cte.c.key))
         .where(Sample.value > cte.c.avg_value)
         .order_by(Sample.value)
         .with_cte(cte))

We can iterate over the samples returned by the query to see which samples had above-average values for their given group:

>>> for sample in query:
...     print(sample.key, sample.value)

# 'a', 1.75
# 'b', 2.7
# 'b', 2.9

Complex Example

For a more complete example, let’s consider the following query which uses multiple CTEs to find per-product sales totals in only the top sales regions. Our model looks like this:

class Order(Model):
    region = TextField()
    amount = FloatField()
    product = TextField()
    quantity = IntegerField()

Here is how the query might be written in SQL. This example can be found in the postgresql documentation.

WITH regional_sales AS (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
), top_regions AS (
    SELECT region
    FROM regional_sales
    WHERE total_sales > (SELECT SUM(total_sales) / 10 FROM regional_sales)
)
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

With Peewee, we would write:

reg_sales = (Order
             .select(Order.region,
                     fn.SUM(Order.amount).alias('total_sales'))
             .group_by(Order.region)
             .cte('regional_sales'))

top_regions = (reg_sales
               .select(reg_sales.c.region)
               .where(reg_sales.c.total_sales > (
                   reg_sales.select(fn.SUM(reg_sales.c.total_sales) / 10)))
               .cte('top_regions'))

query = (Order
         .select(Order.region,
                 Order.product,
                 fn.SUM(Order.quantity).alias('product_units'),
                 fn.SUM(Order.amount).alias('product_sales'))
         .where(Order.region.in_(top_regions.select(top_regions.c.region)))
         .group_by(Order.region, Order.product)
         .with_cte(reg_sales, top_regions))

Recursive CTEs

Peewee supports recursive CTEs. Recursive CTEs can be useful when, for example, you have a tree data-structure represented by a parent-link foreign key. Suppose, for example, that we have a hierarchy of categories for an online bookstore. We wish to generate a table showing all categories and their absolute depths, along with the path from the root to the category.

We’ll assume the following model definition, in which each category has a foreign-key to its immediate parent category:

class Category(Model):
    name = TextField()
    parent = ForeignKeyField('self', backref='children', null=True)

To list all categories along with their depth and parents, we can use a recursive CTE:

# Define the base case of our recursive CTE. This will be categories that
# have a null parent foreign-key.
Base = Category.alias()
level = Value(1).alias('level')
path = Base.name.alias('path')
base_case = (Base
             .select(Base.id, Base.name, Base.parent, level, path)
             .where(Base.parent.is_null())
             .cte('base', recursive=True))

# Define the recursive terms.
RTerm = Category.alias()
rlevel = (base_case.c.level + 1).alias('level')
rpath = base_case.c.path.concat('->').concat(RTerm.name).alias('path')
recursive = (RTerm
             .select(RTerm.id, RTerm.name, RTerm.parent, rlevel, rpath)
             .join(base_case, on=(RTerm.parent == base_case.c.id)))

# The recursive CTE is created by taking the base case and UNION ALL with
# the recursive term.
cte = base_case.union_all(recursive)

# We will now query from the CTE to get the categories, their levels, and
# their paths.
query = (cte
         .select_from(cte.c.name, cte.c.level, cte.c.path)
         .order_by(cte.c.path))

# We can now iterate over a list of all categories and print their names,
# absolute levels, and path from root -> category.
for category in query:
    print(category.name, category.level, category.path)

# Example output:
# root, 1, root
# p1, 2, root->p1
# c1-1, 3, root->p1->c1-1
# c1-2, 3, root->p1->c1-2
# p2, 2, root->p2
# c2-1, 3, root->p2->c2-1

Data-Modifying CTE

Peewee supports data-modifying CTEs.

Example of using a data-modifying CTE to move data from one table to an archive table, using a single query:

class Event(Model):
    name = CharField()
    timestamp = DateTimeField()

class Archive(Model):
    name = CharField()
    timestamp = DateTimeField()

# Move rows older than 24 hours from the Event table to the Archive.
cte = (Event
       .delete()
       .where(Event.timestamp < (datetime.now() - timedelta(days=1)))
       .returning(Event)
       .cte('moved_rows'))

# Create a simple SELECT to get the resulting rows from the CTE.
src = Select((cte,), (cte.c.id, cte.c.name, cte.c.timestamp))

# Insert into the archive table whatever data was returned by the DELETE.
res = (Archive
       .insert_from(src, (Archive.id, Archive.name, Archive.timestamp))
       .with_cte(cte)
       .execute())

The above corresponds to, roughly, the following SQL:

WITH "moved_rows" AS ( DELETE FROM "event" WHERE ("timestamp" < XXXX-XX-XXTXX:XX:XX) RETURNING "id", "name", "timestamp")INSERT INTO "archive" ("id", "name", "timestamp")SELECT "moved_rows"."id", "moved_rows"."name", "moved_rows"."timestamp"FROM "moved_rows";

For additional examples, refer to the tests in models.py and sql.py.

Foreign Keys and Joins

This section has been moved into its own document: Relationships and Joins.
