Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 605 Vote(s) - 3.43 Average
  • 1
  • 2
  • 3
  • 4
  • 5
What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?

#21
## ORM "N plus one" Problem

The "N plus one" problem is a common performance issue that can occur when using
Object-Relational Mapping (ORM) frameworks. ORM frameworks are tools used to map
database tables to objects in object-oriented programming languages. This
problem arises when retrieving data from a relational database using ORM in a
specific way.

To understand the "N plus one" problem, let's consider an example scenario where
you have two tables: `Customer` and `Order`. Each customer can have multiple
orders, and there is a one-to-many relationship between the `Customer` and
`Order` tables. In ORM, you define these relationships using object-oriented
concepts such as classes and references.

Now, let's say you want to retrieve all the customers along with their orders.
In ORM, you might use a query like this:

```python
customers = Customer.objects.all()

for customer in customers:
orders = customer.orders.all()
# Do something with the orders
```

In this code, you first retrieve all the customers using
`Customer.objects.all()`. Then, for each customer, you retrieve their orders
using `customer.orders.all()`.

The issue with this approach is that it results in multiple queries being
executed to the database. For example, if you have 100 customers, this code will
execute 101 queries: one to retrieve all the customers and 100 more to retrieve
the orders for each customer (hence the name "N plus one" problem). This can
significantly impact performance, especially when dealing with large datasets.

The "N plus one" problem arises because the ORM framework performs a separate
query for each customer's orders instead of fetching all the necessary data in a
single query. This behavior is often the default in ORM frameworks to avoid
unnecessarily loading all the associated data, which can be a performance
concern in other scenarios.

To mitigate the "N plus one" problem, ORM frameworks usually provide ways to
optimize data retrieval, such as **eager loading** or **explicit joins**. Eager
loading allows you to fetch the required data in a single query, reducing the
number of database round-trips. By specifying the relationships you want to
include, the ORM framework can generate a more efficient query that retrieves
all the necessary data at once.

**As a demonstration of the "N plus one" problem and its solution, the following
shows the actual SQL emitted from an ORM using SQLAlchemy.**

Original ORM query with the N plus one problem (1 query for customers and N for
each customer's order):

```python
with Session(engine) as session:
customers = session.scalars(select(Customer))
for customer in customers:
print(f"> Customer: #{customer.customer_id}")
for order in customer.orders:
print(f"> order #{order.order_id} at {order.order_datetime}")
```

```sql
-- This query gets all customers:
SELECT customer.customer_id, ...
FROM customer

-- The following SQL is executed once for each customer:
SELECT "order".order_id AS order_order_id, ...
FROM "order"
WHERE "order".customer_id = %(param_1)s
```

After specifying **eager loading** (with `selectinload()`), only 2 queries are
required:

```python
with Session(engine) as session:
customers = session.scalars(
select(Customer).options(selectinload(Customer.orders)))
for customer in customers:
print(f"> Customer: #{customer.customer_id}")
for order in customer.orders:
print(f"> order #{order.order_id} at {order.order_datetime}")
```

```sql
SELECT customer.customer_id, ...
FROM customer

-- This loads all the orders you need in one query:
SELECT "order".order_id AS order_order_id, ...
FROM "order"
WHERE "order".customer_id IN (%(primary_keys_1)s, %(primary_keys_2)s, ...)
```

Or, **explicitly join** and query the required fields (only 1 query is
required):

```python
with Session(engine) as session:
stmt = (
select(
Customer.customer_id,
Order.order_id,
Order.order_datetime,
)
.select_from(Customer)
.join(Customer.orders)
.order_by(Customer.customer_id)
)
results = session.execute(stmt)

current_customer_id = None
for row in results:
customer_id = row.customer_id
if current_customer_id != customer_id:
current_customer_id = customer_id
print(f"> Customer: #{current_customer_id}")
print(f"> order #{row.order_id} at {row.order_datetime}")
```

```sql
SELECT customer.customer_id, "order".order_id, ...
FROM customer
JOIN "order" ON customer.customer_id = "order".customer_id
ORDER BY customer.customer_id
```

In summary, the "N plus one" problem in ORM occurs when the framework executes
multiple queries to retrieve associated data for each item in a collection,
resulting in a significant performance overhead. Understanding and addressing
this problem by optimizing data retrieval strategies can help improve the
efficiency of ORM-based applications.
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through