Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 605 Vote(s) - 3.43 Average
  • 1
  • 2
  • 3
  • 4
  • 5
What is the "N+1 selects problem" in ORM (Object-Relational Mapping)?

#1
The "N+1 selects problem" is generally stated as a problem in Object-Relational mapping (ORM) discussions, and I understand that it has something to do with having to make a lot of database queries for something that seems simple in the object world.

Does anybody have a more detailed explanation of the problem?
Reply

#2
Suppose you have COMPANY and EMPLOYEE. COMPANY has many EMPLOYEES (i.e. EMPLOYEE has a field COMPANY_ID).

In some O/R configurations, when you have a mapped Company object and go to access its Employee objects, the O/R tool will do one select for every employee, wheras if you were just doing things in straight SQL, you could `select * from employees where company_id = XX`. Thus N (# of employees) plus 1 (company)

This is how the initial versions of EJB Entity Beans worked. I believe things like Hibernate have done away with this, but I'm not too sure. Most tools usually include info as to their strategy for mapping.
Reply

#3
SELECT
table1.*
, table2.*
INNER JOIN table2 ON table2.SomeFkId = table1.SomeId

That gets you a result set where child rows in table2 cause duplication by returning the table1 results for each child row in table2. O/R mappers should differentiate table1 instances based on a unique key field, then use all the table2 columns to populate child instances.

SELECT table1.*

SELECT table2.* WHERE SomeFkId = #

The N+1 is where the first query populates the primary object and the second query populates all the child objects for each of the unique primary objects returned.

Consider:

class House
{
int Id { get; set; }
string Address { get; set; }
Person[] Inhabitants { get; set; }
}

class Person
{
string Name { get; set; }
int HouseId { get; set; }
}

and tables with a similar structure. A single query for the address "22 Valley St" may return:

Id Address Name HouseId
1 22 Valley St Dave 1
1 22 Valley St John 1
1 22 Valley St Mike 1

The O/RM should fill an instance of Home with ID=1, Address="22 Valley St" and then populate the Inhabitants array with People instances for Dave, John, and Mike with just one query.

A N+1 query for the same address used above would result in:

Id Address
1 22 Valley St

with a separate query like

SELECT * FROM Person WHERE HouseId = 1

and resulting in a separate data set like

Name HouseId
Dave 1
John 1
Mike 1

and the final result being the same as above with the single query.

The advantages to single select is that you get all the data up front which may be what you ultimately desire. The advantages to N+1 is query complexity is reduced and you can use lazy loading where the child result sets are only loaded upon first request.


Reply

#4
The supplied link has a very simply example of the n + 1 problem. If you apply it to Hibernate it's basically talking about the same thing. When you query for an object, the entity is loaded but any associations (unless configured otherwise) will be lazy loaded. Hence one query for the root objects and another query to load the associations for each of these. 100 objects returned means one initial query and then 100 additional queries to get the association for each, n + 1.

[

[To see links please register here]

][1]


[1]:

[To see links please register here]

Reply

#5
In my opinion the article written in [Hibernate Pitfall: Why Relationships Should Be Lazy][1] is exactly opposite of real N+1 issue is.

If you need correct explanation please refer [Hibernate - Chapter 19: Improving Performance - Fetching Strategies][2]

> Select fetching (the default) is
> extremely vulnerable to N+1 selects
> problems, so we might want to enable
> join fetching


[1]:

[To see links please register here]

[2]:

[To see links please register here]

Reply

#6
We moved away from the ORM in Django because of this problem. Basically, if you try and do

for p in person:
print p.car.colour

The ORM will happily return all people (typically as instances of a Person object), but then it will need to query the car table for each Person.

A simple and very effective approach to this is something I call "**fanfolding**", which avoids the nonsensical idea that query results from a relational database should map back to the original tables from which the query is composed.

Step 1: Wide select

select * from people_car_colour; # this is a view or sql function

This will return something like

p.id | p.name | p.telno | car.id | car.type | car.colour
-----+--------+---------+--------+----------+-----------
2 | jones | 2145 | 77 | ford | red
2 | jones | 2145 | 1012 | toyota | blue
16 | ashby | 124 | 99 | bmw | yellow

Step 2: Objectify

Suck the results into a generic object creator with an argument to split after the third item. This means that "jones" object won't be made more than once.

Step 3: Render

for p in people:
print p.car.colour # no more car queries

See [this web page][1] for an implementation of **fanfolding** for python.


[1]:

[To see links please register here]

Reply

#7
Take Matt Solnit example, imagine that you define an association between Car and Wheels as LAZY and you need some Wheels fields. This means that after the first select, hibernate is going to do "Select * from Wheels where car_id = :id" FOR EACH Car.

This makes the first select and more 1 select by each N car, that's why it's called n+1 problem.

To avoid this, make the association fetch as eager, so that hibernate loads data with a join.

But attention, if many times you don't access associated Wheels, it's better to keep it LAZY or change fetch type with Criteria.
Reply

#8
I can't comment directly on other answers, because I don't have enough reputation. But it's worth noting that the problem essentially only arises because, historically, a lot of dbms have been quite poor when it comes to handling joins (MySQL being a particularly noteworthy example). So n+1 has, often, been notably faster than a join. And then there are ways to improve on n+1 but still without needing a join, which is what the original problem relates to.

However, MySQL is now a lot better than it used to be when it comes to joins. When I first learned MySQL, I used joins a lot. Then I discovered how slow they are, and switched to n+1 in the code instead. But, recently, I've been moving back to joins, because MySQL is now a heck of a lot better at handling them than it was when I first started using it.

These days, a simple join on a properly indexed set of tables is rarely a problem, in performance terms. And if it does give a performance hit, then the use of index hints often solves them.

This is discussed here by one of the MySQL development team:

[To see links please register here]


So the summary is: If you've been avoiding joins in the past because of MySQL's abysmal performance with them, then try again on the latest versions. You'll probably be pleasantly surprised.
Reply

#9
Supplier with a one-to-many relationship with Product. One Supplier has (supplies) many Products.

***** Table: Supplier *****
+-----+-------------------+
| ID | NAME |
+-----+-------------------+
| 1 | Supplier Name 1 |
| 2 | Supplier Name 2 |
| 3 | Supplier Name 3 |
| 4 | Supplier Name 4 |
+-----+-------------------+

***** Table: Product *****
+-----+-----------+--------------------+-------+------------+
| ID | NAME | DESCRIPTION | PRICE | SUPPLIERID |
+-----+-----------+--------------------+-------+------------+
|1 | Product 1 | Name for Product 1 | 2.0 | 1 |
|2 | Product 2 | Name for Product 2 | 22.0 | 1 |
|3 | Product 3 | Name for Product 3 | 30.0 | 2 |
|4 | Product 4 | Name for Product 4 | 7.0 | 3 |
+-----+-----------+--------------------+-------+------------+

Factors:

- Lazy mode for Supplier set to “true” (default)

- Fetch mode used for querying on Product is Select

- Fetch mode (default): Supplier information is accessed

- Caching does not play a role for the first time the

- Supplier is accessed

Fetch mode is Select Fetch (default)

// It takes Select fetch mode as a default
Query query = session.createQuery( "from Product p");
List list = query.list();
// Supplier is being accessed
displayProductsListWithSupplierName(results);

select ... various field names ... from PRODUCT
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
select ... various field names ... from SUPPLIER where SUPPLIER.id=?
select ... various field names ... from SUPPLIER where SUPPLIER.id=?

Result:

- 1 select statement for Product
- N select statements for Supplier

This is N+1 select problem!
Reply

#10
The issue as others have stated more elegantly is that you either have a Cartesian product of the OneToMany columns or you're doing N+1 Selects. Either possible gigantic resultset or chatty with the database, respectively.

I'm surprised this isn't mentioned but this how I have gotten around this issue... **I make a semi-temporary ids table**. [I also do this when you have the `IN ()` clause limitation][1].

This doesn't work for all cases (probably not even a majority) but it works particularly well if you have a lot of child objects such that the Cartesian product will get out of hand (ie lots of `OneToMany` columns the number of results will be a multiplication of the columns) and its more of a batch like job.

First you insert your parent object ids as batch into an ids table.
This batch_id is something we generate in our app and hold onto.

INSERT INTO temp_ids
(product_id, batch_id)
(SELECT p.product_id, ?
FROM product p ORDER BY p.product_id
LIMIT ? OFFSET ?);

Now for each `OneToMany` column you just do a `SELECT` on the ids table `INNER JOIN`ing the child table with a `WHERE batch_id=` (or vice versa). You just want to make sure you order by the id column as it will make merging result columns easier (otherwise you will need a HashMap/Table for the entire result set which may not be that bad).

Then you just periodically clean the ids table.

This also works particularly well if the user selects say 100 or so distinct items for some sort of bulk processing. Put the 100 distinct ids in the temporary table.

Now the number of queries you are doing is by the number of OneToMany columns.


[1]:

[To see links please register here]

Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through