Handling very large data with MySQL

#1
Sorry for the long post!

I have a database containing ~30 tables (InnoDB engine). Only two of these tables, namely "transaction" and "shift", are quite large (the first has 1.5 million rows and "shift" has 23k rows). Right now everything works fine and I don't have problems with the current database size.

However, we will have a similar database (same datatypes, design, etc.) but much larger: the "transaction" table will have about **1 billion records** (about 2.3 million transactions per day), and we are thinking about how to deal with such a volume of data in MySQL (it is both read- and write-intensive). I have read a lot of related posts to see whether MySQL (and more specifically the InnoDB engine) can perform well with billions of records, but I still have some questions. Some of the related posts I've read are the following:

- [Can MySQL reasonably perform queries on billions of rows?][1]
- [Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows?][2]
- [Best data store for billions of rows][3]
- [How big can a MySQL database get before performance starts to degrade][4]
- [Why MySQL could be slow with large tables?][5]
- [Can Mysql handle tables which will hold about 300 million records?][6]

What I've understood so far about improving performance for very large tables:

1. (for InnoDB tables, which is my case) increasing the `innodb_buffer_pool_size` (e.g., up to 80% of RAM). I also found some other MySQL performance tuning settings [here in the Percona blog][7]. (A sketch of points 1 and 2 follows this list.)
2. having proper indexes on the table (checking queries with EXPLAIN)
3. partitioning the table
4. MySQL sharding or clustering
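
To make points 1 and 2 concrete, here is a minimal sketch; the 8 GB value and the example query are placeholders, not tuned recommendations:

    -- Check the current buffer pool size (bytes).
    SELECT @@innodb_buffer_pool_size;

    -- Raise it; dynamic from MySQL 5.7.5, while older versions need a
    -- my.cnf change and a restart. 8 GB is only an example -- size it to
    -- roughly 70-80% of RAM on a dedicated database server.
    SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;

    -- Verify a typical query uses an index instead of a full table scan.
    EXPLAIN
    SELECT id, actual_amount
    FROM   `transaction`
    WHERE  start_fuel_time BETWEEN 1483228800 AND 1485907199;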

Here are my questions/confusions:

- About partitioning: I have some doubts about whether we should use it. On one hand, many people suggest it to improve performance when the table is very large. On the other hand, I've read many posts saying it does not improve query performance and does not make queries run faster (e.g., [here][8] and [here][9]). Also, I read in the [MySQL Reference Manual][10] that **InnoDB foreign keys and MySQL partitioning are not compatible** (and we have foreign keys). The first sketch after these questions shows what partitioning our table would actually involve.

- Regarding indexes: right now they perform well, but as far as I understand, for very large tables indexing is more restrictive (as Kevin Bedell mentioned in his answer [here][1]). Also, indexes speed up reads while slowing down writes (inserts/updates). So, for the new project where we will have this large DB, should we first insert/load all the data and then create the indexes, to speed up the inserts? (See the second sketch after these questions.)

- If we cannot use partitioning for our big table (the "transaction" table), what is an alternative option to improve performance (apart from MySQL variable settings such as `innodb_buffer_pool_size`)? Should we use MySQL Cluster? (We also have lots of joins.)
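
To make the partitioning question concrete, here is a hypothetical sketch of range-partitioning our "transaction" table by time. The month boundaries are placeholders, and note what it forces on our schema: the primary key must be widened and all foreign keys dropped first.

    -- Every UNIQUE key (including the PK) must contain the partitioning
    -- column, so the primary key has to be widened, and all foreign keys
    -- on the table have to be dropped beforehand.
    ALTER TABLE `transaction`
        DROP PRIMARY KEY,
        ADD PRIMARY KEY (id, start_fuel_time);

    ALTER TABLE `transaction`
        PARTITION BY RANGE (start_fuel_time) (
            PARTITION p2017_01 VALUES LESS THAN (1485907200),  -- < 2017-02-01
            PARTITION p2017_02 VALUES LESS THAN (1488326400),  -- < 2017-03-01
            PARTITION pmax     VALUES LESS THAN MAXVALUE
        );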
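And for the load-then-index question, the rough sequence we have in mind looks like this (a sketch: the file path and the set of secondary indexes are placeholders, and the table would be created with only its primary and unique keys first):

    -- 1) Create the table with only the PRIMARY KEY and the UNIQUE key.
    -- 2) Bulk-load the data.
    LOAD DATA INFILE '/tmp/transactions.csv'
        INTO TABLE `transaction`
        FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

    -- 3) Add the secondary indexes afterwards, all in one ALTER.
    ALTER TABLE `transaction`
        ADD INDEX start_fuel_time_idx (start_fuel_time),
        ADD INDEX fuel_terminal_idx (fuel_terminal_id),
        ADD INDEX gas_station_id (gas_station_id);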


**EDIT**


This is the `SHOW CREATE TABLE` output for our largest table, "transaction":

    CREATE TABLE `transaction` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `terminal_transaction_id` int(11) NOT NULL,
      `fuel_terminal_id` int(11) NOT NULL,
      `fuel_terminal_serial` int(11) NOT NULL,
      `xboard_id` int(11) NOT NULL,
      `gas_station_id` int(11) NOT NULL,
      `operator_id` text NOT NULL,
      `shift_id` int(11) NOT NULL,
      `xboard_total_counter` int(11) NOT NULL,
      `fuel_type` int(11) NOT NULL,
      `start_fuel_time` int(11) NOT NULL,
      `end_fuel_time` int(11) DEFAULT NULL,
      `preset_amount` int(11) NOT NULL,
      `actual_amount` int(11) DEFAULT NULL,
      `fuel_cost` int(11) DEFAULT NULL,
      `payment_cost` int(11) DEFAULT NULL,
      `purchase_type` int(11) NOT NULL,
      `payment_ref_id` text,
      `unit_fuel_price` int(11) NOT NULL,
      `fuel_status_id` int(11) DEFAULT NULL,
      `fuel_mode_id` int(11) NOT NULL,
      `payment_result` int(11) NOT NULL,
      `card_pan` text,
      `state` int(11) DEFAULT NULL,
      `totalizer` int(11) NOT NULL DEFAULT '0',
      `shift_start_time` int(11) DEFAULT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `terminal_transaction_id` (`terminal_transaction_id`,`fuel_terminal_id`,`start_fuel_time`) USING BTREE,
      KEY `start_fuel_time_idx` (`start_fuel_time`),
      KEY `fuel_terminal_idx` (`fuel_terminal_id`),
      KEY `xboard_idx` (`xboard_id`),
      KEY `gas_station_id` (`gas_station_id`) USING BTREE,
      KEY `purchase_type` (`purchase_type`) USING BTREE,
      KEY `shift_start_time` (`shift_start_time`) USING BTREE,
      KEY `fuel_type` (`fuel_type`) USING BTREE
    ) ENGINE=InnoDB AUTO_INCREMENT=1665335 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT

Thanks for your time,



[1]: [To see links please register here]
[2]: [To see links please register here]
[3]: [To see links please register here]
[4]: [To see links please register here]
[5]: [To see links please register here]
[6]: [To see links please register here]
[7]: [To see links please register here]
[8]: [To see links please register here]
[9]: [To see links please register here]
[10]: [To see links please register here]


#2
When collecting billions of rows, it is better (when possible) to consolidate, process, and summarize the data _before_ storing it. Keep the raw data in a file if you think you will need to get back to it.

Doing that will eliminate most of your questions and concerns, plus speed up the processing.
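
For example -- a sketch only, with made-up names -- an hourly rollup per station lets reports read a small table instead of the billion-row one:

    -- Hypothetical summary table: one row per station per hour.
    CREATE TABLE transaction_hourly (
        gas_station_id INT NOT NULL,
        hour_start     INT NOT NULL,       -- unix time, truncated to the hour
        tx_count       INT NOT NULL,
        total_amount   BIGINT NOT NULL,
        PRIMARY KEY (gas_station_id, hour_start)
    ) ENGINE=InnoDB;

    -- Roll up one batch window of raw rows (the timestamps are placeholders),
    -- accumulating into hours that already have a row.
    INSERT INTO transaction_hourly (gas_station_id, hour_start, tx_count, total_amount)
    SELECT gas_station_id,
           start_fuel_time - (start_fuel_time % 3600),
           COUNT(*),
           IFNULL(SUM(actual_amount), 0)
    FROM `transaction`
    WHERE start_fuel_time >= 1485907200 AND start_fuel_time < 1485910800
    GROUP BY 1, 2
    ON DUPLICATE KEY UPDATE
        tx_count     = tx_count + VALUES(tx_count),
        total_amount = total_amount + VALUES(total_amount);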

#3
I crossed 2.7 billion rows with a real-time VTS (vehicle tracking system) without trouble. The special aspect of that case is that the database does not merely store the data: real-time read availability is crucial too, otherwise the real-time tracking purpose is not met. These things helped the most:

1. Solid normalization;
2. Serious indexing;
3. InnoDB;
4. Calculated columns as a cache (see the sketch below);
5. Query optimization;
6. Summary tables for reporting & backlog.

It survives (so far) on 4 cores & 8 GB of RAM with an SSD (VPS).
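
By "calculated columns as a cache" (point 4) I mean persisting a derived value at write time so reads never recompute it. Here is a sketch against the OP's table; STORED generated columns need MySQL 5.7+, and on older versions a plain column maintained by the application or a trigger does the same job:

    -- Persist the fueling duration once, instead of computing it per query,
    -- and index it so filters on duration stay cheap.
    ALTER TABLE `transaction`
        ADD COLUMN fuel_duration INT
            AS (end_fuel_time - start_fuel_time) STORED,
        ADD INDEX fuel_duration_idx (fuel_duration);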

#4
* Can MySQL reasonably perform queries on billions of rows? -- MySQL can 'handle' billions of rows. "Reasonably" depends on the queries; let's see them.

* Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows? -- 5.7 has some improvements, but 5.5 is pretty good, in spite of being ~~nearly 6~~ 8 years old, and ~~on the verge of~~ no longer being supported.

* Best data store for billions of rows -- If you mean 'Engine', then InnoDB.

* How big can a MySQL database get before the performance starts to degrade? -- Again, it depends on the queries. I can show you a 1K-row table that will melt down; I have worked with billion-row tables that hum along.

* Why MySQL could be slow with large tables? -- range scans lead to I/O, which is the slow part.

* Can Mysql handle tables which will hold about 300 million records? -- again, yes. The limit is somewhere around a trillion rows.

* (for InnoDB tables, which is my case) increasing the innodb_buffer_pool_size (e.g., up to 80% of RAM). Also, I found some other MySQL performance tuning settings in the Percona blog -- yes

* having proper indexes on the table (using EXPLAIN on queries) -- well, let's see them. There are a lot of mistakes that can be made in this _critical_ area.

* partitioning the table -- "Partitioning is not a panacea!" I harp on that in _my blog_.

* MySQL Sharding -- Currently this is DIY

* MySQL clustering -- Currently the best answer is some Galera-based option (PXC, MariaDB 10, DIY w/Oracle). Oracle's "Group Replication" is a viable contender.

* Partitioning does not support `FOREIGN KEY` or "global" `UNIQUE`.

* UUIDs, at the scale you are talking about, will not just slow down the system, but actually kill it. _Type 1 UUIDs_ may be a workaround; see the first sketch at the end of this post.

* Insert and index-build speed -- There are too many variations to give a single answer. Let's see your tentative `CREATE TABLE` and how you intend to feed the data in.

* Lots of joins -- "Normalize, but don't over-normalize." In particular, do not normalize datetimes or floats or other "continuous" values.

* Do build _summary tables_.

* 2.3 million transactions per day -- If that is 2.3M _inserts_ (~30/sec), then there is not much of a performance problem. If it is more complex than that, then RAID, SSD, batching (second sketch at the end of this post), etc., may be necessary.

* deal with such volume of data -- If most activity is with the "recent" rows, then the buffer_pool will nicely 'cache' the activity, thereby avoiding I/O. If the activity is "random", then MySQL (or _anyone_ else) will have I/O issues.

* Shrinking the datatypes helps in a table like yours. I doubt you need 4 bytes to specify `fuel_type`; there are multiple 1-byte approaches (third sketch below).
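
Sketch of the Type-1 UUID workaround. MySQL 8.0's UUID_TO_BIN()/BIN_TO_UUID() swap flag does the bit rearranging for you; earlier versions need a home-grown function that does the same shuffle:

    -- MySQL's UUID() returns a version-1 (timestamp-based) UUID. Passing 1
    -- as the second argument to UUID_TO_BIN swaps the time-low and time-high
    -- fields, making the 16-byte value roughly time-ordered, so inserts
    -- cluster at the 'recent' end of the index instead of scattering.
    CREATE TABLE uuid_demo (
        id BINARY(16) NOT NULL PRIMARY KEY
    ) ENGINE=InnoDB;

    INSERT INTO uuid_demo (id) VALUES (UUID_TO_BIN(UUID(), 1));

    -- Read it back in the familiar text form.
    SELECT BIN_TO_UUID(id, 1) FROM uuid_demo;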
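Sketch of batching: one multi-row INSERT instead of a client round trip per row. Somewhere around 100-1000 rows per statement is the sweet spot; the table and values here are hypothetical:

    -- Hypothetical narrow staging table, just to show the pattern.
    CREATE TABLE tx_staging (
        terminal_transaction_id INT NOT NULL,
        fuel_terminal_id        INT NOT NULL,
        start_fuel_time         INT NOT NULL
    ) ENGINE=InnoDB;

    -- One statement, many rows: an order of magnitude faster than the
    -- equivalent single-row INSERTs.
    INSERT INTO tx_staging (terminal_transaction_id, fuel_terminal_id, start_fuel_time)
    VALUES (1001, 7, 1485907201),
           (1002, 7, 1485907244),
           (1003, 8, 1485907251);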
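And a sketch of the datatype shrinking; the target types are guesses, so pick them from your actual value ranges:

    -- TINYINT UNSIGNED holds 0-255 in 1 byte; SMALLINT UNSIGNED holds
    -- 0-65535 in 2 bytes. Each byte saved is saved in every row and in
    -- every index that contains the column.
    ALTER TABLE `transaction`
        MODIFY fuel_type     TINYINT UNSIGNED NOT NULL,   -- was int(11): 4 bytes
        MODIFY purchase_type TINYINT UNSIGNED NOT NULL,
        MODIFY fuel_mode_id  TINYINT UNSIGNED NOT NULL,
        MODIFY card_pan      VARCHAR(20) NULL;            -- a PAN is <= 19 digits; TEXT is overkill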


