Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 715 Vote(s) - 3.43 Average
  • 1
  • 2
  • 3
  • 4
  • 5
MyISAM versus InnoDB

#11
I've worked on a high-volume system using MySQL and I've tried both MyISAM and InnoDB.

I found that the table-level locking in MyISAM caused serious performance problems for our workload which sounds similar to yours. Unfortunately I also found that performance under InnoDB was also worse than I'd hoped.

In the end I resolved the contention issue by fragmenting the data such that inserts went into a "hot" table and selects never queried the hot table.

This also allowed deletes (the data was time-sensitive and we only retained X days worth) to occur on "stale" tables that again weren't touched by select queries. InnoDB seems to have poor performance on bulk deletes so if you're planning on purging data you might want to structure it in such a way that the old data is in a stale table which can simply be dropped instead of running deletes on it.

Of course I have no idea what your application is but hopefully this gives you some insight into some of the issues with MyISAM and InnoDB.
Reply

#12
Also check out some drop-in replacements for MySQL itself:

**MariaDB**

[To see links please register here]


MariaDB is a database server that offers drop-in replacement functionality for MySQL. MariaDB is built by some of the original authors of MySQL, with assistance from the broader community of Free and open source software developers. In addition to the core functionality of MySQL, MariaDB offers a rich set of feature enhancements including alternate storage engines, server optimizations, and patches.

**Percona Server**

[To see links please register here]


An enhanced drop-in replacement for MySQL, with better performance, improved diagnostics, and added features.
Reply

#13
For a load with more writes and reads, you will benefit from InnoDB. Because InnoDB provides row-locking rather than table-locking, your `SELECT`s can be concurrent, not just with each other but also with many `INSERT`s. However, unless you are intending to use SQL transactions, set the InnoDB commit flush to 2 ([innodb_flush_log_at_trx_commit][doc]). This gives you back a lot of raw performance that you would otherwise lose when moving tables from MyISAM to InnoDB.

Also, consider adding replication. This gives you some read scaling and since you stated your reads don't have to be up-to-date, you can let the replication fall behind a little. Just be sure that it can catch up under anything but the heaviest traffic or it will always be behind and will never catch up. If you go this way, however, I *strongly* recommend you isolate reading from the slaves and replication lag management to your database handler. It is so much simpler if the application code does not know about this.

Finally, be aware of different table loads. You will not have the same read/write ratio on all tables. Some smaller tables with near 100% reads could afford to stay MyISAM. Likewise, if you have some tables that are near 100% write, you may benefit from `INSERT DELAYED`, but that is only supported in MyISAM (the `DELAYED` clause is ignored for an InnoDB table).

But benchmark to be sure.

[doc]:

[To see links please register here]

Reply

#14
Slightly off-topic, but for documentation purposes and completeness, I would like to add the following.

In general using InnoDB will result in a much LESS complex application, probably also more bug-free. Because you can put all referential integrity (Foreign Key-constraints) into the datamodel, you don't need anywhere near as much application code as you will need with MyISAM.

Every time you insert, delete or replace a record, you will HAVE to check and maintain the relationships. E.g. if you delete a parent, all children should be deleted too. For instance, even in a simple blogging system, if you delete a blogposting record, you will have to delete the comment records, the likes, etc. In InnoDB this is done automatically by the database engine (if you specified the contraints in the model) and requires no application code. In MyISAM this will have to be coded into the application, which is very difficult in web-servers. Web-servers are by nature very concurrent / parallel and because these actions should be atomical and MyISAM supports no real transactions, using MyISAM for web-servers is risky / error-prone.

Also in most general cases, InnoDB will perform much better, for a multiple of reasons, one them being able to use record level locking as opposed to table-level locking. Not only in a situation where writes are more frequent than reads, also in situations with complex joins on large datasets. We noticed a 3 fold performance increase just by using InnoDB tables over MyISAM tables for very large joins (taking several minutes).

I would say that in general InnoDB (using a 3NF datamodel complete with referential integrity) should be the default choice when using MySQL. MyISAM should only be used in very specific cases. It will most likely perform less, result in a bigger and more buggy application.

Having said this. Datamodelling is an art seldom found among webdesigners / -programmers. No offence, but it does explain MyISAM being used so much.
Reply

#15
**Please note** that my formal education and experience is with Oracle, while my work with MySQL has been entirely personal and on my own time, so if I say things that are true for Oracle but are not true for MySQL, I apologize. While the two systems share a lot, the relational theory/algebra is the same, and relational databases are still relational databases, there are still plenty of differences!!

I particularly like (as well as row-level locking) that InnoDB is transaction-based, meaning that you may be updating/inserting/creating/altering/dropping/etc several times for one "operation" of your web application. The problem that arises is that if only *some* of those changes/operations end up being committed, but others do not, you will most times (depending on the specific design of the database) end up with a database with conflicting data/structure.

**Note:** With Oracle, create/alter/drop statements are called "DDL" (Data Definition) statements, and implicitly trigger a commit. Insert/update/delete statements, called "DML" (Data Manipulation), are *not* committed automatically, but only when a DDL, commit, or exit/quit is performed (or if you set your session to "auto-commit", or if your client auto-commits). It's imperative to be aware of that when working with Oracle, but I am not sure how MySQL handles the two types of statements. Because of this, I want to make it clear that I'm not sure of this when it comes to MySQL; only with Oracle.

An example of when transaction-based engines excel:
-----
Let's say that I or you are on a web-page to sign up to attend a free event, and one of the main purposes of the system is to only allow up to 100 people to sign up, since that is the limit of the seating for the event. Once 100 sign-ups are reached, the system would disable further signups, at least until others cancel.

In this case, there may be a table for guests (name, phone, email, etc.), and a second table which tracks the number of guests that have signed up. We thus have two operations for one "transaction". Now suppose that after the guest info is added to the GUESTS table, there is a connection loss, or an error with the same impact. The GUESTS table was updated (inserted into), but the connection was lost before the "available seats" could be updated.

Now we have a guest added to the guest table, but the number of available seats is now incorrect (for example, value is 85 when it's actually 84).

*Of course* there are many ways to handle this, such as tracking available seats with "100 minus number of rows in guests table," or some code that checks that the info is consistent, etc....
But with a transaction-based database engine such as InnoDB, either *ALL* of the operations are committed, or *NONE* of them are. This can be helpful in many cases, but like I said, it's not the ONLY way to be safe, no (a nice way, however, handled by the database, not the programmer/script-writer).

That's all "transaction-based" essentially means in this context, unless I'm missing something -- that either the whole transaction succeeds as it should, or *nothing* is changed, since making only partial changes could make a minor to SEVERE mess of the database, perhaps even corrupting it...

But I'll say it one more time, it's not the only way to avoid making a mess. But it is one of the methods that the engine itself handles, leaving you to code/script with only needing to worry about "was the transaction successful or not, and what do I do if not (such as retry)," instead of manually writing code to check it "manually" from outside of the database, and doing a lot more work for such events.

Lastly, a note about table-locking vs row-locking:
---------
**DISCLAIMER:** I may be wrong in all that follows in regard to MySQL, and the hypothetical/example situations are things to look into, but I may be wrong in what *exactly* is possible to cause corruption with MySQL. The examples are however very real in general programming, even if MySQL has more mechanisms to avoid such things...

Anyway, I am fairly confident in agreeing with those who have argued that how many connections are allowed at a time **does *not*** work around a locked table. In fact, multiple connections **are the entire point of locking a table!!** So that other processes/users/apps are not able to corrupt the database by making changes at the same time.

How would two or more connections working on the same row make a REALLY BAD DAY for you??
Suppose there are two processes both want/need to update the same value in the same row, let's say because the row is a record of a bus tour, and each of the two processes simultaneously want to update the "riders" or "available_seats" field as "the current value plus 1."

Let's do this hypothetically, step by step:

1. Process one reads the current value, let's say it's empty, thus '0' so far.
2. Process two reads the current value as well, which is still 0.
3. Process one writes (current + 1) which is 1.
4. Process two *should* be writing 2, but since it read the current value *before* process one write the new value, it too writes 1 to the table.

I'm *not certain* that two connections could intermingle like that, both reading before the first one writes... But if not, then I would still see a problem with:

1. Process one reads the current value, which is 0.
2. Process one writes (current + 1), which is 1.
3. Process two reads the current value now. But while process one DID write (update), it has not committed the data, thus only that same process can read the new value that it updated, while all others see the older value, until there is a commit.

Also, at least with Oracle databases, there are isolation levels, which I will not waste our time trying to paraphrase. Here is a good article on that subject, and each isolation level having it's pros and cons, which would go along with how important transaction-based engines may be in a database...

Lastly, there may likely be different safeguards in place within MyISAM, instead of foreign-keys and transaction-based interaction. Well, for one, there is the fact that an entire table is locked, which makes it less likely that transactions/FKs are *needed*.

And alas, if you are aware of these concurrency issues, yes you can play it less safe and just write your applications, set up your systems so that such errors are not possible (your code is then responsible, rather than the database itself). However, in my opinion, I would say that it is always best to use as many safeguards as possible, programming defensively, and always being aware that human error is impossible to completely avoid. It happens to everyone, and anyone who says they are immune to it must be lying, or hasn't done more than write a "Hello World" application/script. ;-)

I hope that SOME of that is helpful to some one, and even more-so, I hope that I have not just now been a culprit of assumptions and being a human in error!! My apologies if so, but the examples are good to think about, research the risk of, and so on, even if they are not potential in this specific context.

Feel free to correct me, edit this "answer," even vote it down. Just please try to improve, rather than correcting a bad assumption of mine with another. ;-)

This is my first response, so please forgive the length due to all the disclaimers, etc... I just don't want to sound arrogant when I am not absolutely certain!
Reply

#16
InnoDB offers:

ACID transactions
row-level locking
foreign key constraints
automatic crash recovery
table compression (read/write)
spatial data types (no spatial indexes)

In InnoDB all data in a row except for TEXT and BLOB can occupy 8,000 bytes at most. No full text indexing is available for InnoDB. In InnoDB the COUNT(*)s (when WHERE, GROUP BY, or JOIN is not used) execute slower than in MyISAM because the row count is not stored internally. InnoDB stores both data and indexes in one file. InnoDB uses a buffer pool to cache both data and indexes.

MyISAM offers:

fast COUNT(*)s (when WHERE, GROUP BY, or JOIN is not used)
full text indexing
smaller disk footprint
very high table compression (read only)
spatial data types and indexes (R-tree)

MyISAM has table-level locking, but no row-level locking. No transactions. No automatic crash recovery, but it does offer repair table functionality. No foreign key constraints. MyISAM tables are generally more compact in size on disk when compared to InnoDB tables. MyISAM tables could be further highly reduced in size by compressing with myisampack if needed, but become read-only. MyISAM stores indexes in one file and data in another. MyISAM uses key buffers for caching indexes and leaves the data caching management to the operating system.

Overall I would recommend InnoDB for most purposes and MyISAM for specialized uses only. InnoDB is now the default engine in new MySQL versions.
Reply

#17
People often talk about performance, reads vs. writes, foreign keys, etc. but there's one other must-have feature for a storage engine in my opinion: **atomic updates.**

Try this:

1. Issue an UPDATE against your MyISAM table that takes 5 seconds.
2. While the UPDATE is in progress, say 2.5 seconds in, hit Ctrl-C to interrupt it.
3. Observe the effects on the table. How many rows were updated? How many were not updated? Is the table even readable, or was it corrupted when you hit Ctrl-C?
4. Try the same experiment with UPDATE against an InnoDB table, interrupting the query in progress.
5. Observe the InnoDB table. *Zero* rows were updated. InnoDB has assured you have atomic updates, and if the full update could not be committed, it rolls back the whole change. Also, the table is not corrupt. This works even if you use `killall -9 mysqld` to simulate a crash.

Performance is desirable of course, but *not losing data* should trump that.
Reply

#18
bottomline: if you are working offline with selects on large chunks of data, MyISAM will probably give you better (much better) speeds.

there are some situations when MyISAM is infinitely more efficient than InnoDB: when manipulating large data dumps offline (because of table lock).

example: I was converting a csv file (15M records) from NOAA which uses VARCHAR fields as keys. InnoDB was taking forever, even with large chunks of memory available.

this an example of the csv (first and third fields are keys).

USC00178998,20130101,TMAX,-22,,,7,0700
USC00178998,20130101,TMIN,-117,,,7,0700
USC00178998,20130101,TOBS,-28,,,7,0700
USC00178998,20130101,PRCP,0,T,,7,0700
USC00178998,20130101,SNOW,0,T,,7,

since what i need to do is run a batch offline update of observed weather phenomena, i use MyISAM table for receiving data and run JOINS on the keys so that i can clean the incoming file and replace VARCHAR fields with INT keys (which are related to external tables where the original VARCHAR values are stored).
Reply

#19
Almost every time I start a new project I Google this same question to see if I come up with any new answers.

It eventually boils down to - I take the latest version of MySQL and run tests.

I have tables where I want to do key/value lookups... and that's all. I need to get the value (0-512 bytes) for a hash key. There is not a lot of transactions on this DB. The table gets updates occasionally (in it's entirety), but 0 transactions.

So we're not talking about a complex system here, we are talking about a simple lookup,.. and how (other than making the table RAM resident) we can optimize performance.

I also do tests on other databases (ie NoSQL) to see if there is anywhere I can get an advantage. The biggest advantage I have found is in key mapping but as far as the lookup goes, MyISAM is currently topping them all.

Albeit, I wouldn't perform financial transactions with MyISAM tables but for simple lookups, you should test it out.. typically 2x to 5x the queries/sec.

Test it, I welcome debate.

Reply

#20
If it is 70% inserts and 30% reads then it is more like on the InnoDB side.
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through