Create an account

Very important

  • To access the important data of the forums, you must be active in each forum and especially in the leaks and database leaks section, send data and after sending the data and activity, data and important content will be opened and visible for you.
  • You will only see chat messages from people who are at or below your level.
  • More than 500,000 database leaks and millions of account leaks are waiting for you, so access and view with more activity.
  • Many important data are inactive and inaccessible for you, so open them with activity. (This will be done automatically)


Thread Rating:
  • 466 Vote(s) - 3.47 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Translating a MySQL data/query-set into the equivalent Cassandra representation

#1
Consider a 500 million row MySQL table with the following table structure ...

CREATE TABLE foo_objects (
id int NOT NULL AUTO_INCREMENT,
foo_string varchar(32),
metadata_string varchar(128),
lookup_id int,
PRIMARY KEY (id),
UNIQUE KEY (foo_string),
KEY (lookup_id),
);

... which is being queried using only the following two queries ...

# lookup by unique string key, maximum of one row returned
SELECT * FROM foo_objects WHERE foo_string = ?;
# lookup by numeric lookup key, may return multiple rows
SELECT * FROM foo_objects WHERE lookup_id = ?;

Given those queries, how would you represent the given data-set using Cassandra?
Reply

#2
you have two options:

(1) is sort of traditional: have one CF (columnfamily) with your foo objects, one row per foo, one column per field. then create two index CFs, where the row key in one is the string values, and the row key in the other is lookup_id. Columns in the index rows are foo ids. So you do a GET on the index CF, then a MULTIGET on the ids returned.

Note that if you can make id the same as lookup_id then you have one less index to maintain.

High-level clients like Digg's lazyboy (

[To see links please register here]

) will automate maintaining the index CFs for you. Cassandra itself does not do this automatically (yet).

(2) is like (1), but you duplicate the entire foo objects into subcolumns of the index rows (that is, the index top-level columns are supercolumns). If you're not actually querying by the foo id itself, you don't need to store it in its own CF at all.
Reply



Forum Jump:


Users browsing this thread:
1 Guest(s)

©0Day  2016 - 2023 | All Rights Reserved.  Made with    for the community. Connected through