Wednesday, January 6. 2010Describing Drizzle's Development ProcessYesterday, I was working on a survey that Selena Deckelmann put together for open source databases. She will be presenting the results at Linux.conf.au this month. One of the questions on the survey was this: How would you describe your development process? followed by these answer choices:
I thought a bit about the question and then answered the following in the "Other, please specify:" area: Bit of a mix between all three above.
The more I think about it, the more I really do feel that Drizzle's development process is indeed a mixture of individuals, groups, and a Benevolent dictator. And I think it works pretty well. Who's the Benevolent Dictator of Drizzle?First, let me get the BDFL question out of the way. We've made a big deal in the Drizzle community and mailing lists that anyone and everyone is encouraged to participate in the development process — so why would I say that Drizzle has a benevolent dictator? Well, although he would probably disagree with the tile of BDFL, Brian Aker does have some dictator-like abilities with regards to the development process, and rightfully so. Brian came up with many of the concepts that Drizzle aspires to be, and Brian has more experience working on the code base than any other contributor.
After having worked closely with Brian now for 18 months or so, I can definitively say that Brian's brain works in a very, well, interesting way. Those of us who work with him understand that sometimes his brain works so fast, his typing fingers struggle to keep up, resulting in something I call "Krowspeak". It's kinda funny sometimes trying to translate With this wonderfully unique noodle, Brian tends to knock out large chunks of code at a time, and often he wants to push these chunks of code into our build and regression system and into trunk to see the results of his work quickly. Sometimes, this can cause other branches to get out of sync and get merge conflicts, and Brian will inform branch owners of the conflicts and work with them to resolve them. So, regarding dictator-like development processes, I suppose we have Brian acting as the merge dictator because he's got a lot of experience and understands best how both his code and other's code integrates. We tried a little while back having myself and Monty Taylor be merge captains, but that distribution of merge work actually created a number of other problems and we've since gone back to Brian being the merge captain by himself, with Lee, Monty, and myself improving our automated build and regression system to help Brian with the repetitive work. That said, what Brian does not do is make decisions in a dictator-like way. Decisions about the code style, reviews, features, syntax changes, etc are made on the mailing list by consensus vote. If a consensus is not reached, generally, no change is made which would depend on the decision. Brian does not influence the direction of the software or the source code style any more than anyone else on the mailing list which expresses an opinion about an issue; and for this, I greatly respect his wisdom to seek consensus in an open and community-oriented way. Groups Empowered to Make DecisionsI'm assuming that what Selena's "large/small group empowered to make decisions" answer meant was what is sometimes called "Cabal Leadership" of a project. In other words, there is some group which steers the project and makes decisions about the project which affect the rest of the project's contributors. Drizzle has at least one such group, the Sun Microsystems Drizzle Team, which is composed of Brian, Monty Taylor, Lee Bieber, Eric Day, Stewart Smith, and myself. One might call us the core committers for Drizzle. However, while the Sun Drizzle team certainly is empowered to guide development, it is no different than any other group of developers that choose to contribute to Drizzle. There isn't a "what the Sun Drizzle team decides" rule in effect. Our "power" in the development process is no greater or less than any other group of contributors. We act merely as a team of individuals who work on the Drizzle code and advocate for the project's goals. Individuals Empowered to Make DecisionsOne thing I've been impressed with in the past 18 months is how the Drizzle community has embraced the opinions and work of individual contributors. I believe Toru Maesaka, Andrew Hutchings, Diego Medina and Padraig O'Sullivan were among the first individuals to begin actively contributing to Drizzle. Since then, dozens of others have joined the developer and advocate community, with each individual carving out a piece of the source code or community activities that they want to work on. I have learned much from all these individuals over the last year or so, and I've tried my best to share knowledge and encourage others to do the same. Our IRC channel and mailing list are active places of discussion. Our code reviews are always completely open to the public for comments and discussed transparently on Launchpad, and this code review process has been a great mixing bowl of opinion, discussion, learning and debate. I love it. More and more we have developers showing up and taking ownership of a bug, a blueprint, or just a part of the code that interests them. And nobody stands in their way and says "Oh, no, you shouldn't work on that because <insert another contributor's name> owns that code." Instead, what you will more likely see on the lists or on IRC is a response like "hey, that's awesome! be sure to chat with <insert another contributor's name>. They are interested in that code, too, and you should share ideas!" This is incredibly refreshing to see.
In short, the Drizzle developer process is a nice mix of empowered individuals and groups, and a dash of dictatorship just to keep things moving efficiently. It's open, transparent, and fun to work on Drizzle. Come join us Tuesday, December 15. 2009Great Job, MySQL Engineering!Just a quick note to congratulate MySQL engineering on their next milestone release, MySQL 5.5. There seem to be some excellent new features, some of which have been hotly requested for quite some time:
On the subject of SIGNAL/RESIGNAL, Roland Bouman has, as usual, an excellent and informative article on the subject. My personal opinion on SIGNAL/RESIGNAL is that it is one of the most poorly-architected, clunky error-raising frameworks even invented, but alas, if you stick with ANSI standard SQL it's the only game in town. I look forward to seeing the progress on this, especially in relation to the addition of DIAGNOSTICS. Regarding semi-synchronous replication, Mark Callaghan had a quick write-up about it. Check out his short article or Giuseppe's 5.5 primer for information. Personally, I'm interested in this in Drizzle, because I'm working on the new replication system. Putting this functionality into Drizzle shouldn't actually be too difficult. I'm looking into it.
Anyway, nice job MySQL! Keep up the good work, and continue the milestone release model. It works. It would be nice to hear MySQL engineers blogging about what they are working on, though. Give the community a chance to comment on your work, etc, and drum up interest in the new features before they arrive. Hint, hint Tuesday, November 24. 2009A Laptop for Developers without paying The Windows TaxI find it amazing that the U.S. Department of Justice can continue to cover its eyes and ears while Microsoft is allowed to exert its monopolistic power over all hardware manufacturers. About 20 months ago, I was able to purchase a Lenovo Thinkpad T61 from the lenovo.com website without an operating system installed. Today, I went to purchase a new Lenovo Thinkpad laptop, again without having to pay the Windows Tax. Turns out Lenovo has stopped offering this option. What a complete PILE OF SHIT. Somebody in Microsoft's "Business Development" or "Partners" team must have told Lenovo to stop offering its customers a simple choice of not having to pay the OEM license fees for Windows. And there's nothing anyone can do about it. Microsoft is just too big and too pervasive for anybody to have a damn effect on them. Frankly, it's anti-choice, anti-competition, anti-innovation behaviour from Microsoft. And its ridiculous. Does anyone out there know how to get a decent laptop any more without having to fork over my money to a software giant that continues to bully all competition out of the market? Your suggestions are most welcome. P.S. Mac is not an option for me. Sorry. P.P.S The only thing this post has to do with MySQL is the general discussion on the acquisition of Sun by Oracle, and the pending investigation into possibly monopoly concerns by the EC...but of course I can't comment on that directly...grr.
UPDATE:
UPDATE 2:
Wednesday, November 18. 2009Macro Support in new Drizzle Client Console?Hi all! I've been reading through the requested features for the new client on the wiki here: I think all the stuff on that link is excellent so far. I'd also like to request a feature that I think will be a really cool timesaver for DBAs and developers using Drizzle. Macro Support Remember, "way back when" you used Microsoft Excel and were able to start recording your actions, then when you stopped recording, Excel would store a "macro" of your actions that you could subsequently replay? I think this would be incredibly useful for folks who do repetitive work in the console. Sure, I know, I know...the first reaction folks will say is "but HEY, you guys removed stored procedures!" Yeah, yeah... but the feature I'm proposing here is different from stored procedures in the following ways:
Imagine the following rough example interface...
drizzle> RECORD MACRO "sales_report_with_email" (to_email);
macro recording started.
drizzle> mode python;
in python mode.
python> import datetime
python> today= datetime.datetime.now().isoformat()
python> filename= "%s-%s-%s" % ("sales", to_email, today)
python> Ctrl-D
drizzle> SELECT * FROM sales
WHERE manager = @to_email; > csv(@filename);
drizzle> mode python;
In python mode.
python> report_txt= open(filename, "r+b").read()
python> import smtplib
python> mailserver = smtplib.SMTP('localhost')
python> mailserver.sendmail('theboss@company.com', to_email, report_txt)
python> mailserver.quit()
python> print "Mail sent to %s\n" % to_email
python> Ctrl-D
drizzle> STOP MACRO;
Macro "sales_report_with_email" saved.
drizzle> macro("sales_report_with_email", "myboss@company.com");
Mail sent to myboss@company.com
Pretty powerful, eh? If you follow the flow above, you will notice the only real trick to solve is passing the macro's arguments into the console's variable array, and from the console's variable array into the Python interpreter's variable scope. But this is a fairly simple problem to solve...
Thoughts? Suggestions? If you've got comments, please feel free to share here, or on the Drizzle Discussion mailing list, or even update the wiki pages posted above. Thanks! Tuesday, November 10. 2009Somebody in Marketing Needs to be FiredSneak Peek - Drizzle Transaction Log and INFORMATION_SCHEMAI've been coding up a storm in the last couple days and have just about completed coding on three new INFORMATION_SCHEMA views which allow anyone to query the new Drizzle transaction log for information about its contents. I've also finished a new UDF for Drizzle called PRINT_TRANSACTION_MESSAGE() that prints out the Transaction message's contents in a easy-to-read format. I don't have time for a full walk-through blog entry about it, so I'll just paste some output below and let y'all take a looksie. A later blog entry will feature lots of source code explaining how you, too, can easily add INFORMATION_SCHEMA views to your Drizzle plugins. Below is the results of the following sequence of actions:
Enjoy!
jpipes@serialcoder:~/repos/drizzle/replication-group-commit/tests$ ./dtr --mysqld="--default-replicator-enable"\
--mysqld="--transaction-log-enable"\
--mysqld="--transaction-log-enable-checksum"\
--start-and-exit
<snip>
Servers started, exiting
jpipes@serialcoder:~/repos/drizzle/replication-group-commit/tests$ ../client/drizzle --port=9306
Welcome to the Drizzle client.. Commands end with ; or \g.
Your Drizzle connection id is 2
Server version: 2009.11.1181 Source distribution (replication-group-commit)
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
drizzle> use test
Database changed
drizzle> CREATE TABLE t1 ( id INT NOT NULL PRIMARY KEY , padding VARCHAR(200) NOT NULL );
Query OK, 0 rows affected (0.01 sec)
drizzle> INSERT INTO t1 VALUES (1, "I love testing.");
Query OK, 1 row affected (0.01 sec)
drizzle> INSERT INTO t1 VALUES (2, "I hate testing.");
Query OK, 1 row affected (0.01 sec)
drizzle> UPDATE t1 SET padding="I love it when a plan comes together" WHERE id = 2;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0
drizzle> DROP TABLE t1;
Query OK, 0 rows affected (0.17 sec)
drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG\G
*************************** 1. row ***************************
FILE_NAME: transaction.log
FILE_LENGTH: 639
NUM_LOG_ENTRIES: 5
NUM_TRANSACTIONS: 5
MIN_TRANSACTION_ID: 0
MAX_TRANSACTION_ID: 9
MIN_END_TIMESTAMP: 1257888458463696
MAX_END_TIMESTAMP: 1257888473929116
1 row in set (0 sec)
drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG_ENTRIES;
+--------------+-------------+--------------+
| ENTRY_OFFSET | ENTRY_TYPE | ENTRY_LENGTH |
+--------------+-------------+--------------+
| 0 | TRANSACTION | 141 |
| 141 | TRANSACTION | 121 |
| 262 | TRANSACTION | 121 |
| 383 | TRANSACTION | 181 |
| 564 | TRANSACTION | 75 |
+--------------+-------------+--------------+
5 rows in set (0 sec)
drizzle> SELECT * FROM INFORMATION_SCHEMA.TRANSACTION_LOG_TRANSACTIONS;
+--------------+----------------+-----------+------------------+------------------+----------------+------------+
| ENTRY_OFFSET | TRANSACTION_ID | SERVER_ID | START_TIMESTAMP | END_TIMESTAMP | NUM_STATEMENTS | CHECKSUM |
+--------------+----------------+-----------+------------------+------------------+----------------+------------+
| 0 | 0 | 1 | 1257888458463668 | 1257888458463696 | 1 | 3275955647 |
| 141 | 7 | 1 | 1257888462222183 | 1257888462226990 | 1 | 407829420 |
| 262 | 8 | 1 | 1257888465371330 | 1257888465378423 | 1 | 4073072174 |
| 383 | 9 | 1 | 1257888470209443 | 1257888470215165 | 1 | 92884681 |
| 564 | 9 | 1 | 1257888473929111 | 1257888473929116 | 1 | 2850269133 |
+--------------+----------------+-----------+------------------+------------------+----------------+------------+
5 rows in set (0 sec)
drizzle> SELECT PRINT_TRANSACTION_MESSAGE("transaction.log", ENTRY_OFFSET) as trx
> FROM INFORMATION_SCHEMA.TRANSACTION_LOG_ENTRIES\G
*************************** 1. row ***************************
trx: transaction_context {
server_id: 1
transaction_id: 0
start_timestamp: 1257888458463668
end_timestamp: 1257888458463696
}
statement {
type: RAW_SQL
start_timestamp: 1257888458463676
end_timestamp: 1257888458463694
sql: "CREATE TABLE t1 ( id INT NOT NULL PRIMARY KEY , padding VARCHAR(200) NOT NULL )"
}
*************************** 2. row ***************************
trx: transaction_context {
server_id: 1
transaction_id: 7
start_timestamp: 1257888462222183
end_timestamp: 1257888462226990
}
statement {
type: INSERT
start_timestamp: 1257888462222185
end_timestamp: 1257888462226989
insert_header {
table_metadata {
schema_name: "test"
table_name: "t1"
}
field_metadata {
type: INTEGER
name: "id"
}
field_metadata {
type: VARCHAR
name: "padding"
}
}
insert_data {
segment_id: 1
end_segment: true
record {
insert_value: "1"
insert_value: "I love testing."
}
}
}
*************************** 3. row ***************************
trx: transaction_context {
server_id: 1
transaction_id: 8
start_timestamp: 1257888465371330
end_timestamp: 1257888465378423
}
statement {
type: INSERT
start_timestamp: 1257888465371332
end_timestamp: 1257888465378422
insert_header {
table_metadata {
schema_name: "test"
table_name: "t1"
}
field_metadata {
type: INTEGER
name: "id"
}
field_metadata {
type: VARCHAR
name: "padding"
}
}
insert_data {
segment_id: 1
end_segment: true
record {
insert_value: "2"
insert_value: "I hate testing."
}
}
}
*************************** 4. row ***************************
trx: transaction_context {
server_id: 1
transaction_id: 9
start_timestamp: 1257888470209443
end_timestamp: 1257888470215165
}
statement {
type: UPDATE
start_timestamp: 1257888470209446
end_timestamp: 1257888470215163
update_header {
table_metadata {
schema_name: "test"
table_name: "t1"
}
key_field_metadata {
type: INTEGER
name: "id"
}
set_field_metadata {
type: VARCHAR
name: "padding"
}
}
update_data {
segment_id: 1
end_segment: true
record {
key_value: "2"
key_value: "I love it when a plan comes together"
after_value: "I love it when a plan comes together"
}
}
}
*************************** 5. row ***************************
trx: transaction_context {
server_id: 1
transaction_id: 9
start_timestamp: 1257888473929111
end_timestamp: 1257888473929116
}
statement {
type: RAW_SQL
start_timestamp: 1257888473929113
end_timestamp: 1257888473929115
sql: "DROP TABLE `t1`"
}
5 rows in set (0.06 sec)
FYI, if you look closely, you'll see some odd things — namely that there is a transaction with an ID of zero. I'm aware of this and am working on fixing it Wednesday, November 4. 2009The Great EscapeThis week, I am working on putting together test cases which validate the Drizzle transaction log's handling of BLOB columns.
I ran into an interesting set of problems and am wondering how to go about handling them. Perhaps the LazyWeb will have some solutions. The problem, in short, is inconsistency in the way that the NUL character is escaped (or not escaped) in both the MySQL/Drizzle protocol and the MySQL/Drizzle client tools. And, by client tools, I mean both everyone's favourite little mysql command-line client, but also the mysqltest client, which provides infrastructure and runtime services for the MySQL and Drizzle test suites. Even within the server and client protocol, there appears to be some inconsistency in how and when things are escaped. Take a look at this interesting output from the drizzle client program (FYI, output is identical for mysql client, I checked...) drizzle> select 'test\0me'; +---------+ | test | +---------+ | test me | +---------+ 1 row in set (0 sec) You'll notice that in the first SELECT statement, the column header is cut off — i.e. the column header is not escaping the \0 NUL character in the string 'test\0me'. However, the result data does not truncate the string but replaces the NUL character with a space character. So, I came to the conclusion that the drizzle client does not escape column headers but does do some sort of escaping for the result data. Given this conclusion, you will understand my raised eyebrow when the following SELECT statement was displayed: drizzle> select 'test\0me' = 'test me'; +------------------------+ | 'test\0me' = 'test me' | +------------------------+ | 0 | +------------------------+ 1 row in set (0 sec) Hmmm...so maybe column headers are being escaped by the MySQL/Drizzle client? Clearly, the NUL character was escaped as the characters '\\' followed by the character '0' in the column header above. Indeed, quite puzzling. OK, so the above anomaly needs to be investigated. However, a similar issue exists for the mysqltest/drizzletest client program. To see the problem, check the following out. I create a simple test case with the following in it:
--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings
SELECT 'test\0me';
CREATE TABLE t1 (fld BLOB NULL);
INSERT INTO t1 VALUES ('test\0me');
SELECT COUNT(*) FROM t1;
DROP TABLE t1;
Now, what you would expect to see for the output of the above — at least if you expect results similar to the MySQL/Drizzle client output — is the following:
DROP TABLE IF EXISTS t1;
SELECT 'test\0me';
test
test me
CREATE TABLE t1 (fld BLOB NULL);
INSERT INTO t1 VALUES ('test\0me');
SELECT COUNT(*) FROM t1;
COUNT(*)
1
DROP TABLE t1;
That is what you would expect to see in the output of course... Here is what you actually get in the output: DROP TABLE IF EXISTS t1; SELECT 'test\0me'; test test So, the mysqltest/drizzletest client apparently does not escape the NUL character for the result data at all. It looks like it does do some escaping/replacing for the NUL character in the column header, though, otherwise the second "test" line would not appear. This leads to the result file being essentially truncated as soon as a NUL character is included in any output to the mysqltest/drizzletest client. This essentially makes the mysqltest/drizzletest client useless for testing and validating BLOB data. Possible Solutions?I think the cleanest solution would be to create a shared library of code that would be responsible for uniformly and consistently escaping data, and then linking the various clients (and server) with this library and removing all of the various escaping functions currently in the server. This would, of course, take some time, but would be the most future proof solution. Anyone else have ideas on solving the problem of being able to test and validate binary data via the test suite? Cheers! Thursday, October 29. 2009A Month of MilestonesI'm finding myself smiling today. I lay in bed last night thinking about a number of milestones that this month marks for me. October 15th marked four months since the last time I had a cigarette. I feel good about my chances at remaining smoke-free for the remainder of my life.
October 18th marked one year since I officially began working on the Drizzle project. Although, as Giuseppe can attest to, I had been contributing to Drizzle before October 18th, 2008, that date was the official start. I think about how much has been accomplished by the Drizzle community since that time. The Drizzle of October 2008 is barely recognizable now. Monty's incredible work on the build system, Stewart's continued removal of legacy Unireg code like the FRM files, Eric Day joining the Sun Drizzle team and contributing amazing work on the protocol and client libraries, Monty's reworking of the plugin system, new datetime (temporal) work, Padraig O'Sullivan's enormous contributions in the arena of the INFORMATION_SCHEMA, the optimizer, runtime, replication, and memcached, and an automation system that provides per-commit regression feedback. It's truly fantastic to be part of a living, breathing, active project with so many special contributors who bring infectious enthusiasm to the world of database development. I'm privileged to be a part of it. And finally, Tuesday the 28th marked the day that the new transactional replication system hit Drizzle's trunk. While a ton of work is of course left to be done on the replication system, Tuesday's code hitting trunk was a big milestone that I'm really happy about. Anyway, just wanted to share some happiness. Cheers. Wednesday, October 28. 2009If Ever There Was a Sure-fire Tenant...My wife and I have a double that we rent out to two couples. Luckily, one of these couples has been in one side of the double for a couple years now. They are quite stable, and are excellent renters, at least as much as a landlord likes. Stability == good for landlords. I was reviewing some code today, and a thought crossed my mind that sparked my landlord brain. I was staring at the copyright and license header in a Drizzle plugin and it struck me... If ever there was a steady, rock-solid tenant, I would guess that it would be the occupant of this address: 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Now, I don't know if the Free Software Foundation owns the address above, or if they rent. But, it occurred to me that if they do rent the fifth floor of that building, that the owner of that building must have one of the most trustworthy and reliable tenants ever.
I mean, we all know how much of a pain in the ass it is to move households. You've got to notify everyone about the new address, find friends who will help move furniture for the price of a six-pack, etc. But think of the giant problem the FSF would have if it ever decided to move. Think of the tens (hundreds?) of thousands of source code files which would suddenly have an erroneous address. I wonder if the owner of that Franklin Street building has thought of this, and has smiled, knowing just how much of a pain changing addresses would be for the FSF. Hmmm, food for thought. Tuesday, October 27. 2009Drizzle Replication - The Transaction LogIn this installment of my Drizzle Replication blog series, I'll be talking about the Transaction Log. Before reading this entry, you may want to first read up on the Transaction Message, which is a central concept to this blog entry. The transaction log is just one component of Drizzle's default replication services, but it also serves as a generalized log of atomic data changes to a particular server. In this way, it is only partially related to replication. The transaction log is used by components of the replication services to store changes made to a server's data. However, there is nothing that mandates that this particular transaction log be a required feature for Drizzle replication systems. For instance, Eric Lambert is currently working on a Gearman-based replication service which, while following the same APIs, does not require the transaction log to function. Furthermore, other, non-replication-related modules may use the transaction log themselves. For instance, a future Recovery and/or Backup module may just as easily use the transaction log for its own purposes as well. Before we get into the details, it's worth noting the general goals we've had for the transaction log, as these goals may help explain some of the design choices made. In short, the goals for the transaction log are:
Overview of the Transaction Log Structure
Each entry in the transaction log is preceded by a 4 bytes containing an integer code identifying the type of entry to follow. The bytes which follow this type header are interpreted based on the type of entry. For entries of type Transaction message, the graphics here show the layout of the entry in the log. First, a 4 byte length header is written, then the serialized Transaction message, then a 4 byte checksum of the serialized Transaction message. Details of the TransactionLog::apply() MethodFor those interested in how the transaction log is written to, I'm going to detail the apply() method of the TransactionLog class in /plugin/transaction_log/transaction_log.cc. The TransactionLog class is simply a subclass of plugin::TransactionApplier and therefore must implement the single pure virtual apply method of that class interface. The TransactionLog class has a private drizzled::atomic<off_t> called log_offset which is an offset into the transaction log file that is incremented with each atomic write to the log file. You will notice in the code below that this atomic off_t is stored locally, then incremented by the total length of the log entry to be written. A buffer is then written to the log file using pwrite() at the original offset. In this way, we completely avoid calling pthread_mutex_lock() or similar when writing to the log file, which should increase scalability of the transaction log.
Reading the Transaction LogOK, so the above code shows how the transaction log is written. What about reading the log file? Well, it's pretty simple. There is an example program in /drizzle/message/transaction_reader.cc which has code showing how to do this. Here's a snippet from that program:
Shortcomings of the Transaction LogSo far, we've generally focused on a scalable design for the transaction log and have not spent too much time on performance tuning the code — and yes, performance != scalability. There are a number of problems with the current code which we will address in future versions of the transaction log. Namely:
Summary and Request for CommentsThat's it for the discussion about the transaction log. I'll post some more code examples from the replication plugins which utilize the transaction log in a later blog entry.
What do you think of the design of the transaction log? What would you change? Comments are always welcome! Cheers. Drizzle Replication - Changes in API to support Group CommitHi all. It's been quite some time since my last article on the new replication system in Drizzle. My apologies for the delay in publishing the next article in the replication series. The delay has been due to a reworking of the replication system to fully support "group commit" behaviour and to support fully transactional replication. The changes allow replicator and applier plugins to understand much more about the actual changes which occurred on the server, and to understand the transactional container properly. The goals of Drizzle's replication system are as follows:
This article serves to build on the last article and explain the changes to the Google Protobuffer message definitions used in the replication API. The actual replication API described in the last article remains almost the same. However, instead of being named CommandApplier and CommandReplicator, those plugin base classes are now named TransactionApplier and TransactionReplicator respectively. And, instead of consuming a Command message, they consume Transaction messages.
New Message DefinitionsAs I mentioned above, the Command message previously discussed in the first replication article, has been changed in favour of a more space-efficient and transactional message format. The proto file is now called /drizzled/message/transaction.proto. You can look at the proto file online. The Command Message has become the Statement message, and a new Transaction message serves as a container for multiple Statement messages representing (for most cases) an atomic change in the state of the database server. I'll discuss later in the article those specific cases where a Transaction message's contents may contain only a partial atomic change to the server. The image to the right depicts the Transaction message container. As you can see, the Transaction message contains two things: a TransactionContext message and an array of one or more Statement messages. The TransactionContext MessageEach Transaction message contains a single TransactionContext message. The TransactionContext message contains information about the entire transaction. The data members of the TransactionContext are as follows:
Since TransactionContext is simply a Google Protobuffer message, accessing data members is simple and straightforward. If you're writing a replicator or applier, a reference to a const Transaction message will be supplied to you via the standard API. For instance, let's assume we're writing a replicator and we want to filter all messages that are from the server with a server_id of 100. Kind of a silly example, but nevertheless, it allows us to see some example code. As you may remember, the API for a replicator is dirt simple. There is a replicate() pure virtual method which accepts two parameters, the GPB message and a reference to the Applier which will "apply" the message to some target. The new function signature is the same as the last one, with the term "Command" replaced with the term "Transaction":
Suppose our replicator class is called MyReplicator. Here is how to query the transaction context of the Transaction message and filter out transactions coming from server #100.
See? Pretty darn simple. The Statement MessageAs noted above, the Transaction message contains an array of Statement messages. In Protobuffer terminology, the Transaction message contains a "repeated" Statement data member. The Statement message is an envelope containing the following information:
To access the Statement messages in a Transaction, use something like the following code, which loops over the Transaction message's vector of Statement messages:
Serialized Polymorphism with the type MemberThe type data member is of critical importance to the Statement message, as it allows us to have a sort of polymorphism serialized within the Statement message itself. This polymorphism allows the generic Statement message to contain specialized submessages depending on what type of event occurred on the server. The above paragraph probably sounds overly complicated, but in reality things are pretty simple. As usual, it's easiest to see what's going on by looking at an example in code. For our example, let's build out our fictional processStatement() method from the snippet above. The processStatement() method is basically a giant switch statement, switching off of the supplied Statement message parameter's type data member property. Here is the outline of the processStatement() method, with only our switch statement and some comments visible which should give you an idea of how we deal with specific types of Statements:
The /drizzled/message/transaction.proto file will always contain lots of documentation explaining how each of the specific submessages in the Statement message class are handled. To the right is a graphic depicting the InsertHeader and InsertData message classes which compose the "meat" of Statements that inserted new records into the database. Whenever the Statement message's type is INSERT, the Statement message will contain two submessages, one called insert_header and another called insert_data which will be populated with the InsertHeader and InsertData messages. The header message will contain information about the table and fields affected, while the data message will contain the values to be inserted into the table. Here is some example code which queries the header and data messages and constructs an SQL string from them:
The example code above is far from production-ready, of course. I don't take into account different field types, instead simply enclosing everything in single quotes. Also, I don't handle errors or escaping strings. The point isn't to be perfect, but to show you the general way to get information out of the Statement message... Partial Atomic Transactions
Above, I stated that the Transaction messages sent to Replicators and Appliers represent an atomic change to the state of a server. This is true, most of the time. There are times when it is simply inefficient or impossible to create a Transaction message that represents the actual atomic change on a server. For instance, imagine a table having 100 million records. Now, imagine issuing an UPDATE against that table that potentially affected every row in the table. In order to transmit to replicas the atomic change to the server, one gigantic Transaction message would need to be constructed on the master server. Not only is there a distinct chance that the master would run out of memory constructing such a large message object, but it's safe to say that the master server would suffer from performance degradation during this construction. There must, therefore, be a way to start streaming the changes made to the master server before the actual final commit has happened on the master. You may have noticed two data members of the InsertData message above named segment_id and end_segment. The first is of type uint32_t and the second is a bool. Together, these two data members fulfill the need to transmit transaction messages that are part of a bulk data modification. When a reader of a Transaction message sees that the end_segment data member is false, then the reader knows that another data segment will follow the current data message and will contain more inserts, updates, or deletes for the current transaction. Summary and Request for CommentsHopefully, I've explained the changes that have been made to Drizzle's replication system well enough above, but I understand the changes to the message definitions are substantial and am available at any time to discuss the changes and assist people with their code. You can find me on IRC, Freenode's #drizzle channel, via the Drizzle discussion mailing list, or via email joinfu@sun.com. I very much welcome comments. The new replication system is just finishing up the valgrind regression tests and should hit trunk later today. The next article covers the new Transaction Log, which is a serialized log of the Transaction messages used in the replication system. Friday, October 23. 2009Yet Another Post on REPLACESometimes, as Sergei rightly mentioned, I can be, well, "righteously indignant" about what I perceive to be a hack. In this case, after Sergei repeatedly tried to set me straight about what was going on "under the covers" during a REPLACE operation, I was still arguing that he was incorrect. Doh. I then realized that Sarah Sproenhle's original comment about my test table not having a primary key was the reason that I was seeing the behaviour that I had been seeing. My original test case was failing, expecting to see a DELETE + an INSERT, when a REPLACE INTO was issued against a table. When I placed the PRIMARY KEY on the table in my test case and re-ran the test case, it still failed because the DELETE still was not in the transaction log. Well, it turns out that the reason was because ha_update_row() was actually called and not ha_delete_row() + ha_write_row(). And, because of the documentation for the REPLACE command, I wasn't checking that ha_update_row() may have been called — since I didn't realize a REPLACE could actually do an UPDATE. Anyway, I wanted to post to say that most of this whole kerfuffle was my fault. Though I think that both the online and code documentation should reflect the fact that a REPLACE can do an UPDATE, the source of the failure was not what I originally wrote. In contrast, ha_write_row() does indeed return ER_FOUND_DUPP_KEY appropriately during a REPLACE call. Mmmmm, that piece of humble pie was delicious. Thursday, October 22. 2009The Deal with REPLACE .. Or Is It UPDATE?Yesterday, I posed a question to the ZanyWeb about what exactly a REPLACE statement does behind the scenes in the storage engine. There were many excellent comments and these comments exposed some misunderstandings (including some of my own misconceptions) about the REPLACE statement itself and what goes on behind the scenes in the storage engine. The question I asked was this: if I execute the following statements in a client, what would you expect would happen behind the scenes in the storage engine? CREATE TABLE t1 ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY , padding VARCHAR(200) NOT NULL ); INSERT INTO t1 VALUES (1, "I love testing."); INSERT INTO t1 VALUES (2, "I hate testing."); REPLACE INTO t1 VALUE (2, "I love testing."); Based purely on the manual, one would expect, as Ryan Thiessen expressed in his comment try to insert once, detect a failure and then delete/insert in failure case. or, as Ryan put it in pseudo-code:
INSERT INTO t1 VALUES (1, "I love testing.");
INSERT INTO t1 VALUES (2, "I hate testing.");
INSERT INTO t1 VALUE (2, "I love testing.");
if error() { DELETE FROM t1 WHERE id = 2; INSERT INTO t1 VALUE (2, "I love testing."); }
Unfortunately, this is not the case. At least, it is not always the case. ha_write_row() vs. ha_update_row()In MySQL, as well as in Drizzle, there is a pluggable storage engine API. This API consists primarily of two classes: In MySQL, one class is called handlerton and the other is called handler — in Drizzle, these classes are called StorageEngine and Cursor (Current Set Of Records), because that is what they actually represent. Without going into too many details, the Cursor (handler) interface has four calls which are relevant to this discussion:
OK, So What Does Happen?What actually happens behind the scenes for the code above is the following "optimized" execution path, again in pseudo-code: // (0) Enter mysql_insert() and prepare to do an INSERT mysql_insert(); // (1) tell engine that a write can replace an existing record Cursor::extra(HA_EXTRA_WRITE_CAN_REPLACE); // (2) tell engine to ignore duplicate keys Cursor::extra(HA_EXTRA_IGNORE_DUP_KEY); // (3) For each record to be inserted... for each record begin: // (4) Insert the record by calling ha_write_row() Cursor::ha_write_row(); // (5) Cursor tries to "insert" the record error= Cursor::write_row(); // (6) If the write fails, then DELETE, the record and try to INSERT again. if error is true: Cursor::ha_delete_row(); Cursor::ha_write_row(); The above code looks very similar to what Ryan has already said he expected the underlying code to look like, no? The problem is the calls to Cursor::extra() (steps 1 and 2) and what happens inside InnoDB (step 5) when Cursor::write_row() is called. InnoDB tries to insert the record and realizes that the new record violates an existing primary key value but since MySQL has already told it to ignore duplicate key violations, InnoDB updates the existing row and returns successfully from write_row()!. Unfortunately, because of this "optimization", Cursor::ha_write_row(), which is the kernel's wrapper around the virtualized storage engine's Cursor::write_row call is completely unaware that an UPDATE and not an INSERT has occurred. Why is this a problem? Well, what happens after a call to Cursor::write_row() succeeds? That's correct: post-change logging occurs. In other words, logging for row-based replication occurs. Well, Cursor::ha_write_row() then logs an INSERT and not the appropriate UPDATE. Oops. There are three major things to note about the above logic:
When Optimizations Aren'tThis kind of coding can be argued to be an "optimization". I will argue that it is not an optimization, but is overly clever and reduces the clarity of the code thereby making the code unnecessarily difficult to trace and follow. Personally, I wasted days of research time trying to understand how a call to ha_write_row() was actually updating an existing record. In my opinion, a proper optimization would be to:
Does REPLACE belong in Drizzle?One of the reasons that REPLACE may exist in MySQL is to make the INSERT statement idempotent when run on a replication slave when run in mixed or statement-based mode. Don't believe me? Here is a code snippet and comment from the sql/log_event.cc file in MySQL 5.4: Write_rows_log_event::do_before_row_operations(const Slave_reporting_capability *const) ... /** todo: to introduce a property for the event (handler?) which forces applying the event in the replace (idempotent) fashion. */ if (bit_is_set(slave_exec_mode, SLAVE_EXEC_MODE_IDEMPOTENT) == 1 || m_table->s->db_type()->db_type == DB_TYPE_NDBCLUSTER) { /* We are using REPLACE semantics and not INSERT IGNORE semantics when writing rows, that is: new rows replace old rows. We need to inform the storage engine that it should use this behaviour. */ /* Tell the storage engine that we are using REPLACE semantics. */ thd->lex->duplicates= DUP_REPLACE; /* Pretend we're executing a REPLACE command: this is needed for InnoDB and NDB Cluster since they are not (properly) checking the lex->duplicates flag. */ thd->lex->sql_command= SQLCOM_REPLACE; /* Do not raise the error flag in case of hitting to an unique attribute */ m_table->file->extra(HA_EXTRA_IGNORE_DUP_KEY); ... By contrast to MySQL 5.4, Drizzle's replication system is a reflection of the atomic changes made to the state of a server. These changes are, by definition, deterministic, and applying a transaction log to a Drizzle replica can result in one and only one state in the resulting server. In Drizzle, having the application of the transaction log be idempotent is not necessary, as the transaction log stream is always deterministic. For instance, if an INSERT ... ON DUPLICATE KEY UPDATE is issued against a Drizzle server, the replication system shall create either an InsertStatement or an UpdateStatement message, depending on what precisely occurred on the server, and these messages are logged in Drizzle's transaction log. There is no fudging of statements. What happened on the server is logged as an exact change in the state of the server, nothing more.
So...anyway, because of the above "optimization", I'm now left with a tricky problem: continue to pay the interest on this Technical Debt or rework the Cursor interface so that behaviour is always idempotent via the wrapper interface and has no side effects. I'm not sure what I will do...but this has delayed me substantially. Wednesday, October 21. 2009Pop Quiz - What Does REPLACE Do?Hi ZanyWeb. Here's a pop quiz for you, and the answer may surprise you. The MySQL manual states the following about the REPLACE statement: REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted. Sounds pretty clear to me. If a row with the same primary key exists, it is deleted and then a new row is inserted. So, given the above, if I execute the following statements in a client, what would you expect would happen behind the scenes in the storage engine? CREATE TABLE t1 ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY , padding VARCHAR(200) NOT NULL ); INSERT INTO t1 VALUES (1, "I love testing."); INSERT INTO t1 VALUES (2, "I hate testing."); REPLACE INTO t1 VALUE (2, "I love testing."); If you're like me, you would expect the above to actually execute the following changes: CREATE TABLE t1 ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY , padding VARCHAR(200) NOT NULL ); INSERT INTO t1 VALUES (1, "I love testing."); INSERT INTO t1 VALUES (2, "I hate testing."); DELETE FROM t1 WHERE id = 2; INSERT INTO t1 VALUE (2, "I love testing."); But, this is not actually what happens.
Would anyone like to guess what actually happens? Add your guess to the comments on this entry. I'll post the answer later today. Friday, August 14. 2009Drizzle Replication - The CommandReplicator and CommandApplier Plugin API
IMPORTANT:
OK, so here is the next installment in the Drizzle replication article series. Today, I'll be talking about the flow of the Command message object through the CommandReplicator and CommandApplier APIs. If you missed the first article about the structure of the Command message and Google Protobuffers, you may want to read that first. We'll only be talking in this article about what happens on one server. We will be discussing the Command Log in the next article, and then discuss how messages are passed from one server to another. But, before we discuss those things, it is critical to first understand the CommandReplicator and CommandApplier plugin classes, which are two of the abstract interfaces out of which replication modules can be built (and from which a number of modules have already been built).
This article is out of date and the replication API has been updated. Please see the follow-up article for the most up to date information! In this article, I'll be showing code that was written by myself and by Padraig O'Sullivan, another Drizzle contributor. Flow of Command messagesSIDEBAR: Namespaces in DrizzleThe astute reader may have noticed that there is quite a big difference in the organization of the Drizzle codebase versus MySQL. Drizzle uses C++ namespaces to make the code easier to read and understand. New code written for Drizzle is always namespaced in a namespace corresponding to the exact directory structure in which you find the source files for a particular class. For instance, all plugin interfaces classes (abstract base classes) are in the drizzled::plugin namespace and are found in /drizzled/plugin/. All protobuffer message classes are defined in /drizzled/message/ and are in the namespace drizzled::message. We feel organizing the code like so, and using C++ namespaces, makes the code clearer and easier to read. You can read more about Drizzle's coding style, including how we use namespaces on our wiki. Command messages are created by a component within the Drizzle kernel whenever any SQL statement which modifies the state of the server is executed. This component is called ReplicationServices. This component is a singleton object and manages communication between the kernel and two kinds of plugins, called CommandReplicator and CommandApplier. The component is declared in drizzled/replication_services.h and defined in drizzled/replication_services.cc.
The advantage of the ReplicationServices component frees replication module developers from having to understand anything about what's going on in the kernel. You don't have to understand anything about the Session object, how statements are executed in the kernel, or even how storage engines are actually applying changes to their data. All you need to learn is the structure of the Command message (covered in the previous article) and the very, very simple CommandReplicator and CommandApplier plugin APIs. So, let's look at those APIs. The CommandReplicator Plugin APIHere is the abstract base class drizzled::plugin::CommandReplicator, available in drizzled/plugin/command_replicator.h. /** * Class which replicates Command messages */ class CommandReplicator { public: CommandReplicator() {} virtual ~CommandReplicator() {} /** * Replicate a Command message to a CommandApplier. * * @note * * It is important to note that memory allocation for the * supplied pointer is not guaranteed after the completion * of this function -- meaning the caller can dispose of the * supplied message. Therefore, replicators and appliers * implementing an asynchronous replication system must copy * the supplied message to their own controlled memory storage * area. * * @param Pointer to the applier of the command message * @param Command message to be replicated */ virtual void replicate(CommandApplier *in_applier, drizzled::message::Command &to_replicate)= 0; };
See, I told you it was simple The replicate() method accepts only two parameters. The first is a pointer to an object which implements the CommandApplier interface (inherits from drizzled::plugin::CommandApplier). The second is the Command message that the ReplicationServices component constructs when a data-modification event occurs on the server. The CommandApplier Plugin APIBefore I show some implementation of CommandReplicators, let's first quickly take a look at the drizzled::plugin::CommandApplier interface. It is equally simple. /** * Base class for appliers of Command messages */ class CommandApplier { public: CommandApplier() {} virtual ~CommandApplier() {} /** * Apply something to a target. * * @note * * It is important to note that memory allocation for the * supplied pointer is not guaranteed after the completion * of this function -- meaning the caller can dispose of the * supplied message. Therefore, appliers which are * implementing an asynchronous replication system must copy * the supplied message to their own controlled memory storage * area. * * @param Command message to be applied */ virtual void apply(const drizzled::message::Command &to_apply)= 0; }; The CommandApplier quite predictably implements a single method: apply(), which accepts a single parameter of a Command message. Yes, it's that simple. Putting it All TogetherBefore we get to the example implementation of the above plugin interfaces, it's worth pointing out that Drizzle's replication system allows multiple CommandReplicator and CommandApplier plugins to register themselves with the ReplicationServices component. The ReplicationServices::push() internal method simply loops through the active replicators and appliers, calling the replicate() method, passing in a constructed Command message and a pointer to each registered applier. Here is the code showing this process:
void ReplicationServices::push(message::Command &to_push)
{
vector
Pretty simple, no? There's a couple big things to point out about the code you've seen so far. First, Drizzle is focused on providing clean, simple, and encapsulated interfaces for plugin developers. There's no need for plugin developers to understand the often-messy and poorly documented internals of the kernel. A plugin developer only needs to do two things:
Secondly, the interface allows individuals and companies to completely customize how Drizzle replication works for them. You want replication to do X, Y, and Z? Fine, no problem. We give you some example plugins which clearly demonstrate the interfaces, and you can take it from there, without worrying about messing up the kernel. So, now that you've seen the plugin interfaces, let's discuss how these plugin interfaces are implemented in a couple plugins that are distributed with Drizzle right now. The FilteredReplicator PluginAfter I wrote the default replicator plugin (which literally does nothing but the following:
void DefaultReplicator::replicate(plugin::CommandApplier *in_applier, message::Command &to_replicate)
{
/*
* We do absolutely nothing but call the applier's apply() method, passing
* along the supplied Command. Yep, told you it was simple...
*
* Perfectly fine to use const_cast<> below. All that does is allow the replicator
* to conform to the CommandApplier::apply() API call which dictates that the applier
* shall never modify the supplied Command message argument. Since the replicator
* itself *can* modify the supplied Command message, we use const_cast<> here to
* set the message to a readonly state that the compiler will like.
*/
in_applier->apply(const_cast
Padraig O'Sullivan took the reins and wrote a FilteredReplicator plugin which allows a DBA to filter Command messages by schema name, table name, and a regular expression. His FilteredReplicator is much more interesting than the default replicator, so let's take a look at some of his code. Here is the replicate() method from the FilteredReplicator plugin:
void FilteredReplicator::replicate(plugin::CommandApplier *in_applier,
message::Command &to_replicate)
{
string schema_name;
string table_name;
/*
* First, we check to see if the command consists of raw SQL. If so,
* we need to parse this SQL and determine whether to filter the event
* based on the information we obtain from the parsed SQL.
* If not raw SQL, check if this event should be filtered or not
* based on the schema and table names in the command message.
*/
if (to_replicate.type() == message::Command::RAW_SQL)
{
parseQuery(to_replicate.sql(),
schema_name,
table_name);
}
else
{
schema_name.assign(to_replicate.schema());
table_name.assign(to_replicate.table());
}
/*
* Convert the schema name and table name strings to lowercase so that it
* does not matter what case the table or schema name was specified in. We
* also keep all entries in the vectors of schemas and tables to filter in
* lowercase.
*/
std::transform(schema_name.begin(), schema_name.end(),
schema_name.begin(), ::tolower);
std::transform(table_name.begin(), table_name.end(),
table_name.begin(), ::tolower);
if (isSchemaFiltered(schema_name) ||
isTableFiltered(table_name))
{
return;
}
/*
* We can now simply call the applier's apply() method, passing
* along the supplied command.
*/
in_applier->apply(to_replicate);
}
The above code is quite easy to read and understand. I encourage you to check out the rest of Padraig's implementation in /plugin/filtered_replicator/filtered_replicator.cc and see just how easy it is to start writing new Drizzle replication modules. The CommandLog::apply() MethodTo demonstrate a CommandApplier implementation, I present the apply() method of the CommandLog plugin, which inherits from drizzled::plugin::CommandApplier. Without going into too much about the Command Log, which is the subject of the next article, I'll let you take a look at just the apply() method, to see how a CommandApplier does its main job (see /plugin/command_log/command_log.cc):
void CommandLog::apply(const message::Command &to_apply)
{
string buffer(""); /* Buffer we will write serialized command to */
static const uint32_t HEADER_TRAILER_BYTES= sizeof(uint64_t) + /* 8-byte length header */
sizeof(uint32_t); /* 4 byte checksum trailer */
size_t length;
ssize_t written;
off_t cur_offset;
to_apply.SerializeToString(&buffer);
length= buffer.length();
/*
* Do an atomic increment on the offset of the log file position
*/
cur_offset= log_offset.fetch_and_add((off_t) (HEADER_TRAILER_BYTES + length));
/*
* We adjust cur_offset back to the original log_offset before
* the increment above...
*/
cur_offset-= (off_t) (HEADER_TRAILER_BYTES + length);
/*
* Quick safety...if an error occurs below, the log file will
* not be active, therefore a caller could have been ready
* to write...but the log is crashed.
*/
if (unlikely(state == CRASHED))
return;
/* We always write in network byte order */
unsigned char nbo_length[8];
int8store(nbo_length, length);
/* Write the length header */
do
{
written= pwrite(log_file, nbo_length, sizeof(uint64_t), cur_offset);
}
while (written == -1 && errno == EINTR); /* Just retry the write when interrupted by a signal... */
if (unlikely(written != sizeof(uint64_t)))
{
errmsg_printf(ERRMSG_LVL_ERROR,
_("Failed to write full size of command. Tried to write %" PRId64 " bytes at offset %" PRId64 ", but only wrote %" PRId64 " bytes. Error: %s\n"),
sizeof(int64_t),
(int64_t) cur_offset,
(int64_t) written,
strerror(errno));
state= CRASHED;
/*
* Reset the log's offset in case we want to produce a decent error message including
* the original offset where an error occurred.
*/
log_offset= cur_offset;
is_active= false;
return;
}
cur_offset+= (off_t) written;
/*
* Quick safety...if an error occurs above in another writer, the log
* file will be in a crashed state.
*/
if (unlikely(state == CRASHED))
{
/*
* Reset the log's offset in case we want to produce a decent error message including
* the original offset where an error occurred.
*/
log_offset= cur_offset;
return;
}
/* Write the command message itself */
do
{
written= pwrite(log_file, buffer.c_str(), length, cur_offset);
}
while (written == -1 && errno == EINTR); /* Just retry the write when interrupted by a signal... */
if (unlikely(written != (ssize_t) length))
{
errmsg_printf(ERRMSG_LVL_ERROR,
_("Failed to write full serialized command. Tried to write %" PRId64 " bytes at offset %" PRId64 ", but only wrote %" PRId64 " bytes. Error: %s\n"),
(int64_t) length,
(int64_t) cur_offset,
(int64_t) written,
strerror(errno));
state= CRASHED;
log_offset= cur_offset;
is_active= false;
}
cur_offset+= (off_t) written;
/*
* Quick safety...if an error occurs above in another writer, the log
* file will be in a crashed state.
*/
if (unlikely(state == CRASHED))
{
/*
* Reset the log's offset in case we want to produce a decent error message including
* the original offset where an error occurred.
*/
log_offset= cur_offset;
return;
}
uint32_t checksum= 0;
if (do_checksum)
{
checksum= crc32(0L, (unsigned char *) buffer.c_str(), length);
}
/* We always write in network byte order */
unsigned char nbo_checksum[4];
int4store(nbo_checksum, checksum);
/* Write the checksum trailer */
do
{
written= pwrite(log_file, nbo_checksum, sizeof(uint32_t), cur_offset);
}
while (written == -1 && errno == EINTR); /* Just retry the write when interrupted by a signal... */
if (unlikely(written != (ssize_t) sizeof(uint32_t)))
{
errmsg_printf(ERRMSG_LVL_ERROR,
_("Failed to write full checksum of command. Tried to write %" PRId64 " bytes at offset %" PRId64 ", but only wrote %" PRId64 " bytes. Error: %s\n"),
(int64_t) sizeof(uint32_t),
(int64_t) cur_offset,
(int64_t) written,
strerror(errno));
state= CRASHED;
log_offset= cur_offset;
is_active= false;
return;
}
}
Conclusion, for Now...I do hope this article has been useful in getting to know how the Drizzle replication system passes the Command message around within a single server. You got to see the replicator and applier plugin APIs and example implementations of those APIs in the form of the FilteredReplicator and CommandLog classes. Next, we'll be looking in detail at the Command Log itself, its format, and writing/reading from it.
Please do email me (joinfu@sun.com) or feel free to post to the Drizzle Discussion list any criticism, suggestions, or requests. I'm eager to hear from you. Wednesday, August 12. 2009Drizzle Replication - The Command Message
IMPORTANT:
I wanted to start writing about how Drizzle's new replication system works, how its internals are structured, how logs are formatted, what are its (current) limitations, what are planned features, how you can get involved in development, and a lot more. Before jumping in, you may want to read a quick overview about the concepts of Drizzle replication here.
This article is out of date and the replication API has been updated. Please see the follow-up article for the most up to date information! Fortunately, some advice from my friend Edwin DeSouza got me back to reality: "Jay, do a series of small, targeted, easily digestible blog posts". And, so, this is the first in a series of quick blog posts about Drizzle replication. Today, I'll start at the very beginning and talk about the basic unit of "currency" in Drizzle's replication system: the Command Message. BackgroundWhen two servers in a replicated environment must communicate with each other about changes in the state of one server, they must do so by sending raw bytes of information across some port or socket. One server must "package up" information about what changed and send it to the other server. The receiving server must then "unpack" the data and interpret it, subsequently applying the data change to its own schema.
In the image above, all of the arrows represent Command Message objects being passed between components within a single server or across the wire to another server. This packing and unpacking of raw bytes is fairly complicated, and can get downright confusing when you take into account that different hardware architectures store raw bytes in different orders. Code needed to interpret raw streams of data is complex and, more importantly, makes the resulting code difficult to read. Furthermore, what we want to focus on in development is what to do with the unpacked data messages, not how to pack, unpack, and serialize the darn things. Google Protobuffers to the RescueAt Drizzle, we're all about laziness. We don't reinvent wheels if there is an excellent library that already does what we need. It just so happens that a library exists that can solve the low-level problems of packing (serializing), unpacking (deserializing or parsing), and "versioning" formatted binary streams of data so that raw data can change its structure without having to re-implement all new serialization and deserialization routines. Enter Google Protobuffers. The Google Protobuffers library (just GPB from now on) allows you to create a text file called a "proto" that contains a blueprint of a "message" class that represents some piece of data. This proto file is then consumed by the protoc program, which generates code files in a variety of programming languages. The code files contain class definitions of the message you define in your proto file. This may all seem a bit strange until I show you exactly how it all works...and I will do so shortly for a C++ example. But first, let's take a look at the proto file (the blueprint if you will) of the Command message, which stores information about data changes occurring on a server. Here is a section of the proto file, stored in /drizzled/message/replication.proto: import "table.proto"; package drizzled.message; option optimize_for = SPEED; /* Context for a transaction. */ message TransactionContext { required int32 server_id = 1; required uint64 transaction_id = 2; /* Globally-unique transaction ID */ } /* Insert one record into a single table. */ message InsertRecord { repeated Table.Field insert_field = 3; repeated bytes insert_value = 4; } /* Update one record in a single table. */ message UpdateRecord { repeated Table.Field update_field = 3; repeated bytes before_value = 4; repeated bytes after_value = 5; repeated Table.Field where_field = 6; repeated bytes where_value = 7; } /* Deletes one record in a single table */ message DeleteRecord { repeated Table.Field where_field = 3; repeated bytes where_value = 4; } /* A component of a transaction -- a single instruction or command */ message Command { enum Type { START_TRANSACTION = 0; /* A START TRANSACTION statement */ COMMIT = 1; /* A COMMIT statement */ ROLLBACK = 2; /* A ROLLBACK statement */ INSERT = 3; /* An insert of a single record */ DELETE = 4; /* A delete of a single record */ UPDATE = 5; /* An update of a single record */ RAW_SQL = 6; /* A raw SQL statement */ } required Type type = 1; required uint64 timestamp = 2; /* A nanosecond precision timestamp */ /* Transaction Context is duplicated here so that Commands may be sent over the wire separately from the rest of the Commands in a transaction. */ required TransactionContext transaction_context = 3; optional string schema = 4; /* The schema affected */ optional string table = 5; /* The table affected */ optional string sql = 6; /* May contain the actual SQL supplied for the original statement */ /* The below implement the actual change. Each Command will have zero or one of the below sub-messages defined. */ optional InsertRecord insert_record = 7; optional DeleteRecord delete_record = 8; optional UpdateRecord update_record = 9; } The above proto file format defines the members of the Command message and some sub-messages that are pieces of a Command message. There are exactly three required elements of a Command message:
There are a number of optional members of the Command class, including the actual SQL string used in the original statement, and a set of optional sub-message classes that contain specialized data for certain types of commands. Creating Generated Source FilesThere's probably a lot of readers currently asking themselves (or yelling at their blog reader) what the heck is the big deal about the above...I mean, it looks just like a simple definition of a POD (plain old data) class. Well, the real magic happens when you compile the above proto file into a source code file, and are able to take advantage of the GPB framework. Let's do that now. We use the protoc program, passing in a couple simple arguments. Here, I'll generate a source file with code creating C++ classes for the message definitions in our proto file. Drizzle keeps all the proto definition files for its message classes in the /drizzled/message/ subdirectory, so I'll change to that directory and generate the source files: jpipes@serialcoder:~/repos/drizzle/trunk$ cd drizzled/message/ jpipes@serialcoder:~/repos/drizzle/trunk/drizzled/message$ protoc --cpp_out=. replication.proto jpipes@serialcoder:~/repos/drizzle/trunk/drizzled/message$ ls -lah replication.* -rw-r--r-- 1 jpipes jpipes 95K 2009-08-12 06:48 replication.pb.cc -rw-r--r-- 1 jpipes jpipes 66K 2009-08-12 06:48 replication.pb.h -rw-r--r-- 1 jpipes jpipes 3.4K 2009-08-03 10:16 replication.proto The --cpp_out=. simply directs protoc to generate C++ files, and look for proto files in the current directory. As you can see, protoc generated two files: replication.pb.h and replication.pb.cc. Here's the beauty of GPB: there's absolutely no reason for you to ever look at the code in either of these generated files at all, unless you are just curious as to the style of code that GPB generates. The excellent online documentation for GPB contains all the information you'll ever need in order to use these generated code files. All C++ classes generated by GPB will always follow an identical interface, so there is no reason to look at the generated files at all. We can always look at our .proto file (which is much simpler and shorter) to determine what methods our generated classes will have for us to use. I already stated that the advantage of GPB is that serialization and deserialization is already done for you. Let's look at some code that is in the Drizzle distribution which demonstrates using the Command message. Constructing Command MessagesInside Drizzle's kernel, there is a component called drizzled::ReplicationServices which is responsible for constructing Command messages when data change events occur in the server and then passing these Command messages to replicators that are listening for change events. Here is some code from the drizzled::ReplicationServices::insertRecord() method which constructs a Command message.
#include <drizzled/replication.pb.h>
...
void ReplicationServices::setCommandTransactionContext(message::Command &in_command,
Session *in_session) const
{
message::TransactionContext *trx= in_command.mutable_transaction_context();
trx->set_server_id(in_session->getServerId());
trx->set_transaction_id(in_session->getTransactionId());
in_command.set_session_id((uint32_t) in_session->getSessionId());
}
...
void ReplicationServices::insertRecord(Session *in_session, Table *in_table)
{
...
message::Command command;
command.set_type(message::Command::INSERT);
command.set_timestamp(in_session->getCurrentTimestamp());
setCommandTransactionContext(command, in_session);
const char *schema_name= in_table->getShare()->db.str;
const char *table_name= in_table->getShare()->table_name.str;
command.set_schema(schema_name);
command.set_table(table_name);
/*
* Now we construct the specialized InsertRecord command inside
* the message::Command container...
*/
message::InsertRecord *change_record= command.mutable_insert_record();
Field *current_field;
Field **table_fields= in_table->field;
String *string_value= new (in_session->mem_root) String(ReplicationServices::DEFAULT_RECORD_SIZE);
string_value->set_charset(system_charset_info);
message::Table::Field *current_proto_field;
/* We will read all the table's fields... */
in_table->setReadSet();
while ((current_field= *table_fields++) != NULL)
{
current_proto_field= change_record->add_insert_field();
current_proto_field->set_name(current_field->field_name);
current_proto_field->set_type(message::Table::Field::VARCHAR); /* @TODO real types! */
string_value= current_field->val_str(string_value);
change_record->add_insert_value(string_value->c_ptr());
string_value->free();
}
push(command);
}
The code above shows just how easy it is to construct a Command message containing information about a new record that has been inserted into a schema on the server. We start by simply creating a message::Command variable on the stack, named "command". We then set the required member variables of the Command by calling the set_type() and set_timestamp() methods of the Command message and the set_server_id() and set_transaction_id() methods of the TransactionContext submessage of the Command object. When you create any GPB message class, any scalar member variables (variables that are not other message classes or arrays of things), the class will always have "setter" methods that follow the naming convention set_xxx(), where xxx is the exact name of the field. In the above case, you can clearly see the correlation between the "type" data member and the set_type() method. But, what about those methods which begin with mutable_ ? Those member methods return a pointer to a data member of a class whose type is another message class. For example, the mutable_transaction_context() method of the Command message returns a pointer to a TransactionContext object that may have its own data members set using setter methods. It is important to note that GPB-generated code follows strict RAII and encapsulation principles, like good C++ libraries always do. This means that the pointers returned by mutable_xxx() methods of a message class are managed by the class itself, and you do not need to manage the memory yourself. When the Command message's destructor is called, it will clean up any resources it has allocated for sub-messages like the TransactionContext class. Accessing Command Message data membersOK, so above you see example code which sets the data members of a Command message. Let's check out some code which accesses the data member of the Command message. Here is a snippet of code from the SubscriberApplier class in the async_replication module (/plugin/async_replication/subscriber_applier.cc):
void SubscriberApplier::apply(const message::Command &to_apply)
{
string query;
...
switch (to_apply.type())
{
...
case INSERT:
query.assign("INSERT INTO `");
query.append(to_apply.schema());
query.append("`.`");
query.append(to_apply.table());
query.append("` (");
const message::InsertRecord &record= to_apply.insert_record();
int32_t num_fields= record.insert_field_size();
int32_t x;
for (x= 0; x < num_fields; x++)
{
if (x != 0)
query.push_back(',');
const message::Table::Field f= record.insert_field(x);
query.push_back('`');
query.append(f.name());
query.push_back('`');
}
query.append(") VALUES ");
/*
* There may be an INSERT VALUES (),() type statement. We know the
* number of records is equal to the field_values array size divided
* by the number of fields.
*
* So, we do an inner and an outer loop. Outer loop is on the number
* of records and the inner loop on the number of fields. In this way,
* we know that record.field_values(outer_loop * num_fields) + inner_loop))
* always gives us our correct field value.
*/
int32_t num_records= (record.insert_value_size() / num_fields);
int32_t y;
for (x= 0; x < num_records; x++)
{
if (x != 0)
query.push_back(',');
query.push_back('(');
for (y= 0; y < num_fields; y++)
{
if (y != 0)
query.push_back(',');
query.push_back('"');
query.append(record.insert_value((x * num_fields) + y));
query.push_back('"');
}
query.push_back(')');
}
break;
...
}
result= dispatch_command(server_command, session, query.c_str(), query.length()) == false;
...
}
In the above code sample, you can see that "getter" methods for a Command class fall into a few different categories. For simple, scalar data members or sub-message data members, the getter method is simply the name of the data member itself, and these methods always return const references to their data members. For instance, Command::schema() returns a const std::string& with the the name of the schema and Command::transaction_context() returns a const message::TransactionContext& of the transaction_context message class data member. For optional data members, there will always be a has_xxx "check for existence" method. So, since the "sql" data member of the Command message is optional, one might do the following:
message::Command &command= get_some_command_object();
if (command.has_sql())
{
cout << command.sql() << endl;
}
Which would print out the SQL statement of the Command message if the sql data member has been set. For data members that are of the "repeated" type (for instance, the InsertRecord message's insert_value data member, which is of type repeated string), the number of individual elements in the vector is always accessed with a method called xxx_size(), where xxx is the exact name of the data member. You can see in the example code above calls to insert_record.insert_value_size(), which returns the count of the number of elements in the insert_value data member. To access specific elements of a repeated data member, simply call the method named for the data member, passing in the index of the element you wish to access. Conclusion, For Now...The above is just a quick introduction to the Command message, which is the basic unit of currency in Drizzle's replication system. The next article will show how the Command message is passed through the plugin::CommandReplicator and plugin::CommandApplier interfaces, the basic worker plugins of the Drizzle replication system. After that, we'll be discussing the format of the Command message log. Questions on anything above? Please do write to the drizzle-discuss@ mailing list or drop my an email to joinfu@sun.com. Cheers. UPDATE: Read the next article on the CommandReplicator and CommandApplier interfaces. Thursday, June 25. 2009Performance Tuning webinar today 1pm EST/ 10am PDTJust a quick note to say I'll be giving a 1 hour webinar on performance tuning practices for MySQL at 1pm EST, 10am PDT today. Here is a link to sign up for it: http://www.mysql.com/news-and-events/web-seminars/display-373.html
Hope to see you on the webinar Wednesday, June 10. 2009Drizzle Performance Regression Solved - TCMalloc vs. No TCMallocAs many of you who follow the drizzle-discuss mailing list know, for the last three months, Drizzle developers have been hot (and many times cold) on the tail of a performance problem that we were seeing when comparing Drizzle with MySQL (any version of MySQL, not just 5.4). Briefly, on certain machines, we were seeing Drizzle performing at approximately 50% of MySQL, with throughput measured in transactions per second on both a readonly and readwrite workload with Sysbench. The frustrating part of the results was that on other machines, even other machines with virtually identical architectures, compilers, and operating system, we were seeing Drizzle outperforming MySQL by around 20-30%. So, the Sun Drizzle team, friends in the Sun PAE (performance applications engineering?) team and various contributors set out to iteratively analyze exactly what was going on during sysbench runs in both MySQL and Drizzle. We analyzed the call stacks of both servers using callgrind, cachegrind, vmstat, and other tools, looking for differences which could explain the differences in performance. These efforts ultimately showed either red herrings, or showed results which did not indicate where such a dramatic performance difference was coming from. The efforts did, however, lead us to a much better understanding of the calling patterns, lock contention, and other parts of the servers, and this work will, I am sure, prove very valuable in the coming months as we continue to refine the execution of the server and remove more global contention points. So, what was the eventual culprit? Turns out that Drizzle was, by default, linking the TCMalloc library from Google (libgoogle-perftools-dev package on Debian systems) when it was found installed on the machine. This makes sense. We thought that the TCMalloc library would provide a benefit to our Session-mem_root-based allocation strategy adopted from the MySQL core kernel. Unfortunately, it turns out that TCMalloc dramatically degrades the throughput of the server. When we disabled tcmalloc in our build and re-ran the benchmarks, our numbers went through the roof. The machine these results are from has the following specifications:
The benchmarks were run for a minimum of 10 iterations of 60 seconds each on a readonly and readwrite workload with 1M rows in the table. The configuration was:
The Drizzle Automation Suite was used for benchmarking. Below, you'll see the before and afters of disabling TCMalloc on both a readonly and readwrite workload. We think you'll agree the results are, well, dramatic. READONLY workloadWith TCMalloc linked: +------+-------------+ | c | tps | +------+-------------+ | 2 | 1067.444444 | | 4 | 1435.190000 | | 8 | 1937.624444 | | 16 | 2601.817778 | | 32 | 3367.795556 | | 64 | 3930.240000 | | 128 | 3940.764444 | | 256 | 3071.503333 | | 512 | 2003.308889 | | 1024 | 1224.704444 | | 2048 | 530.994444 | +------+-------------+ Without TCMalloc linked: +------+-------------+ | c | tps | +------+-------------+ | 2 | 1511.681000 | | 4 | 2714.570000 | | 8 | 4408.986000 | | 16 | 5795.430000 | | 32 | 5619.712000 | | 64 | 4988.760000 | | 128 | 4483.512000 | | 256 | 3914.125000 | | 512 | 2541.946000 | | 1024 | 1325.511000 | | 2048 | 643.446000 | +------+-------------+ And for you pretty graph people: READWRITE workloadWith TCMalloc linked: +------+-------------+ | c | tps | +------+-------------+ | 2 | 589.006667 | | 4 | 842.576667 | | 8 | 1142.627778 | | 16 | 1561.532222 | | 32 | 2160.194444 | | 64 | 2169.077778 | | 128 | 1793.243333 | | 256 | 1241.846667 | | 512 | 860.450000 | | 1024 | 491.360000 | +------+-------------+ Without TCMalloc linked: +------+-------------+ | c | tps | +------+-------------+ | 2 | 676.238000 | | 4 | 1096.708000 | | 8 | 1661.204000 | | 16 | 2210.335000 | | 32 | 2353.749000 | | 64 | 2202.926000 | | 128 | 2087.273000 | | 256 | 1717.978000 | | 512 | 1361.468000 | | 1024 | 1000.169000 | | 2048 | 288.299000 | +------+-------------+ Again, pretty graph: Note: With TCMalloc linked, the readwrite workload would not complete at 2048 connections. Without TCMalloc linked, it did complete, although with a significant reduction in throughput. This is likely because of the known InnoDB issue regarding 1024 active transactions limit... Open Invitation to Benchmark Drizzle, MySQL, XtraDB and PostgreSQL
The above results compare only Drizzle to itself with and without TCMalloc. We benchmarked also against MySQL 5.4, but as we've previously stated, we don't think comparison numbers should be published unless by a third or objective party. This serves as an open invitation to benchmark Drizzle against MySQL 5.4, XTraDB, PostgreSQL 8.4 and anything else. We'd love to see a validation that our principles of smaller, cleaner code with fewer global contention points, using standard libraries and having features live in a module ecosystem truly does enable a faster, leaner query-running machine. By the same token, if published benchmarks identify cases where Drizzle underperforms compared to another RDBMS, we'd love to tackle the performance problems the benchmarks show. The more data, the better Monday, April 27. 2009Drizzle, MySQL and a Public Apology to Karen PadirOK, so those who know me well can attest to the fact that I can sometimes be a little quick to speak my mind. Sometimes I say snarky things without thinking much. Last week, I blogged a short, snide blog post that Drizzle was not the MySQL Drizzle project and that it was not "shepparded by MySQL" (and, yes, I spelled shepherded incorrectly. Thanks, PeterG). The blog post referred to Karen Padir's keynote on Tuesday morning, but I misquoted her. Karen said "shepherded by Sun Microsystems", not "shepherded by MySQL". I was actually quoting a blog post from the same day which had misquoted Karen as saying "shepherded by Sun/MySQL". Karen, I'm sorry I misquoted you. I let my emotions dictate my words, and I should have been more thoughtful about what I was trying to say. So, what was I trying to say? One of the things the Drizzle project has been successful in doing is opening up the development of the server to a wider audience. The community around Drizzle is extremely active, friendly, and contribution-focused. Ideas are expressed on open, public mailing lists, and all suggestions and input are debated and welcomed in a transparent manner. Although MySQL (note, not Sun Microsystems, but the MySQL engineering group) has tried to "open up" over the last year or so, there is still an undeniable Cathedral approach to software development in that organization. I hope that the MySQL engineering group will continue its movement towards openness, but fundamental change must occur in the mindset and management of the group. I believe that the Drizzle project has shown just how much appetite there is for open discussion and insight into the development of the database server. I simply don't understand why change in the MySQL engineering corps has been so slow. The pace of change frustrated me while I was at MySQL, and it continues to frustrate me as a member of the Sun Drizzle team. In addition to the above, various groups in the MySQL internal teams have actively sought to undermine the Drizzle project and continue to dismiss the project as without technical worth or merit. So, when I heard Drizzle being referred to as "the MySQL Drizzle project", it irked me. If the MySQL engineering group had embraced Drizzle and been active in contributing to it, I wouldn't have any problem with referring to the project in that way. Furthermore, Drizzle has a lot of contributors outside of Sun Microsystems. If MySQL wants to call it the MySQL Drizzle project, then they should be able to call it the Mixi Drizzle project or anything else they want to. Is it true that the Drizzle code base finds its lineage in MySQL? Certainly, but I don't believe at this point that the MySQL group has contributed to Drizzle enough to warrant changing its name. Karen, I hope you accept my apology for misquoting you, and I hope that this blog entry makes more clear what my original post failed to convey. As the leader of the MySQL engineering team, I hope you can find ways in which the database group can play a more active role in contributing to Drizzle. And, of course, I look forward to working with you to help you achieve this goal. Tuesday, April 21. 2009Sorry, but Drizzle is Not...the "MySQL Drizzle Project". Nor is it "shepparded by MySQL". Sorry, Karen, but in the future, MySQL may play more of a role in Drizzle, but currently, it doesn't. And it doesn't by choice. Fear Eats You Up Inside - A Tale of a Team in ChaosThis blog entry is a bit of a written confession from the Sun Drizzle team. It tells the story of the last few weeks, and how we didn't "Do the Right Thing". It's a tale which ends, hopefully, in our redemption through acknowledgement of what happened, a written commitment that it won't happen again, and a bit of a manual on how to have our actions match our team values. What Happened?A few weeks ago, we received an email from Allan Packer, who works in the Performance Engineering group at Sun. The email contained the results of some benchmarks that Allan had performed comparing Drizzle and a version of MySQL 5.1. The results for Drizzle weren't flattering. In fact, at lower concurrencies, the results were downright embarrassing, with some runs showing 5.1 outpacing Drizzle in throughput by 100%.
"How could this be?" we We're All Humans, After AllWhen confronted with a situation where you have been cast in a dim light compared to something else, it's human nature to go into defensive mode. We retreat within our shells, searching for answers and closing out the rest of the world. The Sun Drizzle team closed out others and retreated out of fear, afraid that "if the world knew this about us, everyone would walk away from Drizzle." Each member of the Sun Drizzle team went off in a different direction, trying to uncover why we were seeing the results we did. It was the antithesis of a concerted effort. Basically, everyone went tearing off, running benchmarks, profiling code, looking at call graphs, digging through commits, all without a plan and without the input of the Drizzle community. Suffice it to say that while we learned a bit about what pieces of the code were not the main culprits, the total effort was a chaotic, scrambling, mess. Just like telling a lie eats away at an individual until the truth comes out, fear also tends to rot you from the inside out. When you face your fears, tell the world about them, and ask for help, you turn the fear into something positive. It's taken us a few weeks to realize how silly we had acted, and Brian did a great job on Sunday pulling us together and realizing where we'd strayed from our commitment to open and public discourse. And so, yesterday, I sent an email to the drizzle-discuss mailing list explaining the results of the benchmarks, good and bad, and listed our methodology for running the benchmarks and what we have, so far, ruled out as culprits for the throughput differences we've seen. I wish we had done this on Day 1, after receiving the results from Allan. We would likely have been able to work through ideas much more quickly on the mailing list, and gotten many more suggestions from the community. We've learned our lesson in this regard. A Commitment to Open and Public DiscourseOn Sunday, we sat down and put together a few guidelines that we'll stick to when future situations like this happen — i.e. when someone publishes benchmarks which involve Drizzle or comes up with evidence that we fail in some way compared with another database.
All of the above may seem like common sense, but we figure it's best to put them in writing to remind ourselves when we do, inevitably, encounter similar situations. In SummarySo, this week I'll be working on #2 above; getting documentation up on the Drizzle.org domain about both the results of the benchmarks for all runs, and our methodology. I very much welcome anyone interested in benchmarking to offer suggestions, edit the wiki with input, and join us on the mailing list in our ongoing discussions about what may be the throughput-limiting factors in Drizzle. Thursday, April 2. 2009Towards a New Modular Replication ArchitectureOver the past week, I've been refactoring the way that the Drizzle kernel communicates with plugin modules that wish to implement functionality related to replication. There are many, many potholes in the current way that row-based replication works in Drizzle, and my refactoring efforts were solely focused on three things:
Let me expand on these two goals, and why they are critical to the success of a replication architecture. Simple, Clear Interfaces Designed for the Interface CallerI have a very strong belief that interfaces should always be designed from the perspective of the caller of the interface. By focusing on the caller of the interface, you produce interfaces that are inherently more stable and simpler than if you design an interface with the perspective of the consumer of the interface's objects — or the internal implementation's perspective. If you design an interface from the perspective of how the internals of a system work, then you invariably end up with interfaces that reflect the internal implementation of something. Interfaces should be generalized, not specific to an implementation. I'll expand on this with an example in the next section. Interfaces should be as simple and clear as possible. Why? Because people won't implement the darn interfaces if they can't understand them. Simple as that. Drizzle wants a vibrant ecosystem of plugins and modules. If developers have a difficult time understanding an interface, that means we get fewer plugins, and that just won't do. Interfaces Must Use Documented Data Structures and Standard ContainersThis may seem like a no-brainer, but if you look at the interfaces in MySQL, you will notice that so many of them pass pointers to internal MySQL objects (THD, st_table, TABLE_SHARE, etc). You will also notice that in order to use the interface, implementors must become intimately familiar with the custom iterators (List<Item> List_iterator<>, List_iterator_fast<>, etc) in order to really implement their plugins. Many of these custom objects and containers are poorly or not documented at all. This makes the plugin developer's life harder. This is why there are so very few plugins written for MySQL that have not been written by MySQL engineers who are familiar with the internal implementation of the server. The harder life is for the plugin developer, the smaller the ecosystem of plugins will be. Period. Now, why do I say "only standard containers"? Well, how many of you readers realized that List<Item> was not actually a list of Items; that in fact it is a list of Item pointers? Probably very few. By using the STL and its very-well-documented, standardized container classes in the interface, we remove the need of the plugin developer to have to learn Yet-Another-Custom template implementation. Easier for the plugin developer, so faster time-to-market for plugins. Laying Waste to Tight-Coupling of Replication with "Other Stuff" (Like Logging)It's no secret that MySQL's internal subsystems are very tightly-coupled. Back in 2005, when I wrote the system internals chapter in Pro MySQL, I noted that the internal subsystems of MySQL were interwoven, tightly-coupled, and a bit of a bowl of spaghetti. Very little substantial progress has been made since then on refactoring those systems into separate modules. Replication is no different. In MySQL, the replication subsystem is tightly coupled to a logging subsystem that most readers would be familiar with as "The Binlog". Here's the problem with that: it ties the implementation (the binlog) to the replication interface. Why is this bad? Well, it means that if you want to modularize the replication system, you can't do so without fiddling with the binlog. And likewise, if you want to improve the binlog, you invariably will be affecting the replication system. Tight-coupling of implementation to interfaces is disastrous from an architectural point of view. Touching each runs the risk of breaking the other. When these types of architectural faux-pas litter the code, it means that development and debugging life-cycles are extended well past what they should be, because developers working on one system tend to affect other subsystems that shouldn't really be affected by their changes. And so, one of the big goals of this refactoring work was to produce interfaces which did not interfere with each other. Changing an implementation shouldn't break the whole system. A New, Modular Replication ArchitectureOK, enough about the goals of the new replication architecture. Let's see some diagrams and some code, eh? In Drizzle, we strive to have a clear separation between the core drizzled kernel and modules which implement functionality. The module communicates with the kernel through an API. The goal of the this API is to shield the internal implementation of the core kernel from plugin/module developers. All plugin developers need to know is what will be passed to them from the kernel (documented data structures) and what to provide the kernel back as output (a return code, for instance). The kernel doesn't care how the plugin implements something. All it cares about is that the plugin provide methods which match the API's base class interfaces. There are a number of documented data structures which are used to in the API's calling interface. These data structures are all Google Protobuffer (GPB) Message derived classes. Because they are all GPB derived message classes, they all work in the same way: via the Protobuffer Message API, which is well-documented and automatically handles versioning when changes to these message classes are needed. The current list of Message classes are:
More Event specialty classes will be added as needed, but they will all follow the same interface as each other. In addition to the above data classes, for a replication architecture, what are the basic "worker classes" that plugins may implement? For the first phase of this new replication system, I have three basic class interfaces:
But you might be thinking... Jay, where's the Binlog? There isn't one in the interface. That's the whole point. A "binlog" is an implementation, not an interface. Need a more concrete vision of how a module would use such an interface? OK, let's imagine an example module, which we'll call "RecoveryModule". The purpose of this module would be to provide backup and restore capabilities to Drizzle. Let's see how the interface to the Drizzle kernel would be implemented in the module... The RecoveryModule Implementation OverviewAs noted, the purpose of this module is clear: to provide backup and restore functionality for Drizzle. We'll implement this functionality using a serialized transaction log. Here are the classes which we will map out for the module:
As you can see, all of the classes above extend (or more explicitly, inherit from a publicly defined interface class in /drizzled/plugin/) a base interface class. There is a clear separation of duties. Replicators replicate events. Appliers apply events. Readers read events. No mixing of one into the other. This way, a developer can work on the writing of records to a transaction log (the TrxLogWriter) and another developer can work on the implementation of applying those records during a recovery phase (TrxLogApplier). The two developers don't need to know anything about how the other class is implemented. The simply implement the interface of their class. In fact, the TrxLogWriter implementation might be in an entirely different module! The developer of the TrxLogApplier doesn't care, and isn't affected by that. All she needs to know is the interface of an EventReader and the EventApplier, and the details of the Event data structures (PODs). That's it. Clean separation. Here is a graphical overview of how such a module might look, with a separation of the components of the module, the plugin interface API, and the drizzled kernel. ![]() You will note that I also have put in a couple other classes in the module, called SqlLogWriter, which inherits from EventApplier, and SqlLogReader, which inherits from EventReader. This is to show that the module could implement the MySQL General Log easily using the existing API. A general log is merely an implementation detail. The module developer can create a general log implementation just by specializing the EventApplier class to write, for instance, raw SQL records...
Also note the AsynchReplicator class... Mark Callaghan, you could implement a SemiSynchReplicator if you wanted. Flow of Events from Drizzle Kernel to ModuleSo, how would the flow of events happen from the kernel to a module and back? I've put together a quick diagram of how the API would be used to communicate between the kernel and the module. You can see some API method calls (such as replicateEvents() or registerApplier(). These method names are of course still up for debate, but I think they adequately communicate what the API would look like and the purpose of the interface calls... ![]() OK, so in my next blog post I'll show some example code from my local branch which implements a module similar to the above. I'll end this one now because for the next blog post I need to get some code highlighting done to make it easier to read... Now Is the Time to Influence the Shape of the API
Robert Hodges, I'm lookin' at you, kid!
In all seriousness, this is the time when we are defining the interfaces between the kernel and the modules/plugins. If you have suggestions, want something included in the API, or in general want to tell me I'm full of shit, then join us on the Drizzle Discussion mailing list and let your thoughts be heard! Wednesday, April 1. 2009Libdrizzle Benchmarks - Massive Performance IncreasesLast night and today, I ran a series of benchmarks against Drizzle. These benchmarks were designed to isolate the performance improvement or regression from one change: using Eric Day's new libdrizzle client library instead of the legacy libdrizzleclient library from MySQL. The results are in, and they are stunning. Here is a graph showing the difference between Drizzle sysbench on a readonly workload with the only difference being sysbench using the libdrizzle driver versus using the libdrizzleclient (libmysql) driver for sysbench: As you can see, with libdrizzle, the throughput is dramatically increased, with Drizzle scaling to 8x the number of cores on the benchmark machine before a drop-off in throughput is seen. The grey vertical line in the chart is the number of cores on the machine used. The performance improvement varies from 2.80% improvement at 16 concurrent threads to 29.95% improvement at 128 threads and a whopping 41.16% increase at a mere 2 threads. Overall, there was an average performance increase across all concurrency levels of 16.95%. For a read-write workload, here are the results: Again, we see a massive increase in throughput at all concurrency levels. This time, the performance increase was pretty consistent across the concurrency levels, and we don't see a significant drop-off in throughput until 4x the number of processors. Across all concurrency levels, the performance increase of libdrizzle was 17.62%. What this means for MySQLAs you may or may not be aware, libdrizzle actually is able to communicate with both Drizzle and MySQL. Looking for an easy way to dramatically increase the performance and throughput of your application without tinkering with much stuff? Then get involved in the libdrizzle movement, download one of the connectors and join us on Freenode #drizzle to discuss how you can use and help us test the incredible new asynchronous, non-blocking communication library from Eric Day. What this means for DrizzleIt should be stressed that the above performance improvements are solely due to a replacement of the client library. However, Eric is now pulling apart the Protocol class in the main Drizzle server to take advantage of the asynchronous and non-blocking nature of libdrizzle. This means that you can expect to see some pretty good performance improvements on the server side as well as Eric continues his great work. Please give Eric a bunch of kudos if you see him online and go check out his talks at the MySQL and Percona conferences in a couple weeks. Monday, March 30. 2009Small but steady progress in improving Drizzle performanceWe're making steady progress in removing bottlenecks in the Drizzle code base. So far, a number of mutexes have been removed and we've begin to replace a number of contention points with atomic instructions which remove the need for a lock structure on platforms which support atomic fetch and store instructions. I'm pretty positive about the direction we are going so far. We're seeing the right trends in our scaling graphs, with very little performance drop off in read-only workloads up to 4X the number of cores on the machine, and little performance drop off on the read-write workloads up to 2X the number of cores, as you can see from the graphs below. It's a little difficult to see, but we've made a small but steady improvement from r950 to r968, with numbers increasing around 1-2% across most concurrency levels. You can see the raw numbers here: +--------------------------------+-------+-----+---------+----------+ | config_name | revno | c | tps | rwrps | +--------------------------------+-------+-----+---------+----------+ | drizzle_innodb_readonly_1000K | 950 | 2 | 710.67 | 9949.34 | | drizzle_innodb_readonly_1000K | 950 | 4 | 1163.77 | 16292.81 | | drizzle_innodb_readonly_1000K | 950 | 8 | 1692.59 | 23696.29 | | drizzle_innodb_readonly_1000K | 950 | 16 | 2470.31 | 34584.39 | | drizzle_innodb_readonly_1000K | 950 | 32 | 3104.98 | 43469.73 | | drizzle_innodb_readonly_1000K | 950 | 64 | 3376.98 | 47277.73 | | drizzle_innodb_readonly_1000K | 950 | 128 | 2986.91 | 41816.74 | | drizzle_innodb_readonly_1000K | 950 | 256 | 2657.54 | 37205.47 | | drizzle_innodb_readonly_1000K | 968 | 2 | 712.73 | 9978.25 | | drizzle_innodb_readonly_1000K | 968 | 4 | 1081.72 | 15144.10 | | drizzle_innodb_readonly_1000K | 968 | 8 | 1714.77 | 24006.77 | | drizzle_innodb_readonly_1000K | 968 | 16 | 2480.48 | 34726.77 | | drizzle_innodb_readonly_1000K | 968 | 32 | 3140.16 | 43962.29 | | drizzle_innodb_readonly_1000K | 968 | 64 | 3394.03 | 47516.32 | | drizzle_innodb_readonly_1000K | 968 | 128 | 3008.74 | 42122.30 | | drizzle_innodb_readonly_1000K | 968 | 256 | 2676.62 | 37472.65 | | drizzle_innodb_readwrite_1000K | 950 | 2 | 438.04 | 8322.77 | | drizzle_innodb_readwrite_1000K | 950 | 4 | 720.68 | 13692.98 | | drizzle_innodb_readwrite_1000K | 950 | 8 | 1068.65 | 20304.39 | | drizzle_innodb_readwrite_1000K | 950 | 16 | 1454.71 | 27639.47 | | drizzle_innodb_readwrite_1000K | 950 | 32 | 1699.74 | 32295.02 | | drizzle_innodb_readwrite_1000K | 950 | 64 | 1506.04 | 28614.71 | | drizzle_innodb_readwrite_1000K | 950 | 128 | 1341.46 | 25487.69 | | drizzle_innodb_readwrite_1000K | 950 | 256 | 1157.95 | 22001.16 | | drizzle_innodb_readwrite_1000K | 968 | 2 | 444.10 | 8437.81 | | drizzle_innodb_readwrite_1000K | 968 | 4 | 700.45 | 13308.53 | | drizzle_innodb_readwrite_1000K | 968 | 8 | 1075.59 | 20436.14 | | drizzle_innodb_readwrite_1000K | 968 | 16 | 1457.83 | 27698.76 | | drizzle_innodb_readwrite_1000K | 968 | 32 | 1732.04 | 32908.82 | | drizzle_innodb_readwrite_1000K | 968 | 64 | 1506.98 | 28632.61 | | drizzle_innodb_readwrite_1000K | 968 | 128 | 1355.17 | 25748.31 | | drizzle_innodb_readwrite_1000K | 968 | 256 | 1157.29 | 21988.59 | +--------------------------------+-------+-----+---------+----------+ Drizzle is all about open and transparent, and if you want to know how these numbers are generated, feel free to download the drizzle-automation project, a Python benchmarking and code coverage utility I've written over the past couple weeks. We use standard sysbench modified to use our client driver, with the following configuration in our /etc/drizzle-automation/bench.cnf file: [defaults] # Number of iterations the benchmark process should do for each level # of concurrency iterations= 5 # Comma-separate list of concurrency levels to run concurrency_levels= 2,4,8,16,32,64,128,256 # Options passed as-as to make when building server make_options= -j32 # Options passed as-is to configure when building server configure_options= [drizzle_innodb_readonly_1000K] # The program used to run the benchmark. If you use paths, they # will be relative to the sandbox directory in which the run is # being processed bench_cmd= sysbench # Options given to the benchmark program on every iteration bench_options= --max-time=60 --max-requests=0 --test=oltp \ --drizzle-db=test --drizzle-port=4427 --drizzle-host=127.0.0.1 \ --drizzle-user=root --db-ps-mode=disable --db-driver=drizzleclient \ --drizzle-table-engine=innodb --oltp-read-only=on --oltp-table-size=1000000 # The program used to start the server instance. If you use paths, they # will be relative to the sandbox directory in which the run is # being processed server_cmd= ./drizzled/drizzled # Options passed to the server on startup server_options= --port=4427 --datadir=/tmp --innodb-buffer-pool=4G \ --key-buffer-size=64M --scheduler=multi_thread --innodb_log_buffer_size=512M \ --innodb_additional_mem_pool_size=120M --table_open_cache=4096 & [drizzle_innodb_readwrite_1000K] # The program used to run the benchmark. If you use paths, they # will be relative to the sandbox directory in which the run is # being processed bench_cmd= sysbench # Options given to the benchmark program on every iteration bench_options= --max-time=60 --max-requests=0 --test=oltp --drizzle-db=test \ --drizzle-port=4427 --drizzle-host=127.0.0.1 --drizzle-user=root --db-ps-mode=disable \ --db-driver=drizzleclient --drizzle-table-engine=innodb --oltp-read-only=off \ --oltp-table-size=1000000 # The program used to start the server instance. If you use paths, they # will be relative to the sandbox directory in which the run is # being processed server_cmd= ./drizzled/drizzled # Options passed to the server on startup server_options= --port=4427 --datadir=/tmp --innodb-buffer-pool=4G \ --key-buffer-size=64M --scheduler=multi_thread --innodb_log_buffer_size=512M \ --innodb_additional_mem_pool_size=120M --table_open_cache=4096 & The machine we run the benchmarks on is an Intel Xeon QuadCore, and all tests fit into the available RAM on the box. These benchmarks, which we can run against a single revision or a range of revisions, are invaluable in allowing us to pinpoint the specific cause of a performance regression. Feel free to check out the automation work, contribute to the project, or suggest improvements to it on the Drizzle Discuss mailing list. Wednesday, March 4. 2009LCOV Code Coverage Pages for DrizzleYesterday, Monty and I were fussing around with lcov and genhtml trying to generate code coverage analysis for Drizzle. After a few hours, I was finally able to get some good output, and I've published the results temporarily on my website. We're currently at 70.4% code coverage which is less-than-ideal, but at least we now have a baseline from which to improve. We're all about making incremental improvements, and having statistics to tell us whether we're going in the right direction is important. This is a good first step. So, what exactly do these code coverage numbers mean?OK, so for those readers not familiar with gcov or lcov, here is what these code coverage numbers actually mean... They represent the percentage of executable source lines which are executed during a run of Drizzle's test suite. Basically, the percent gives us a rough idea of the percent of our code paths which are being verified by the test suite. What we do is have GCov produce coverage data during compile for each source code file. We then run our test suite, and then have LCov collect (capture) the source code lines in each file was executed during the run of the test suite. We then call on the genhtml utility to produce pretty HTML pages from the "info files" generated from GCov. Automating LCov ReportsThe goal is to have LCov reports run for each revision in trunk, and have the output automatically rsynced up to the Drizzle.org server. I'm working with Mike Shadle from Intel and Monty on getting this done, along with Doxygen output, and hope to get the automated test coverage reports done by the end of the week.
Part of the automation is producing a text-based report which I can then store in a database and keep historical data about our progress in improving test coverage. Of course, those historical graphs will also be published. How Can You Help Improve Drizzle's Code Coverage?Actually, this is a great starter task for new contributors to get involved in Drizzle development. The way to improve code coverage percentages is to analyze the blocks of code which are not covered in the following way:
Some more details on the above steps will come in another blog post later today... OK, gotta get back to work. Feel free to peruse the LCov HTML reports. There are some interesting things buried in there. Tuesday, March 3. 2009Signup for Drizzle Developer Day - Friday, April 24th - After MySQL Conference 2009We are having a developer day on the Friday after the MySQL Conference and Expo. Space is limited, so please be sure to sign up on the Drizzle.org wiki if you plan to attend. Here are some of the things I will be talking about:
See you there. Tuesday, February 17. 2009A Better Parser Needed?Taking a little break from refactoring temporal data handling this evening, I decided to run some profiles against both Drizzle and MySQL 5.1.33. I profiled the two servers with callgrind (a valgrind tool/skin) while running the drizzleslap/mysqlslap test case. In both cases, I had to make a small change to the drizzled/tests/test-run.pl Perl script.
For the MySQL build, I used the BUILD/compile-amd64-debug-max build script. For Drizzle, I used my standard build process which builds Drizzle with maximum debugging symbols and hooks. It's worth noting that the debug and build process for MySQL and Drizzle are very different, and the MySQL debug build contains hooks to the DBUG library, which you'll notice appear on the MySQL call graphs and consume a lot of the overall function calls. You won't see this in the Drizzle graphs because we do not use DBUG. For all intents and purposes, just ignore the calls to anything in the DBUG library in the MySQL graphs since in a non-debug build all that stuff is NOOPed out... FYI, the drizzleslap/mysqlslap test case is a decent one to run profiling against because it tests a range of different SQL statements in a concurrent environment, something you won't really see in the other tests cases. This is the reason I like using it when profiling with valgrind/callgrind/oprofile... OK, so on to the graphs... Drizzle Callgrind Profile -- The Function Calls
For the log_group_write_buf() function (defined in /storage/innobase/log/log0log.c) , this makes sense: Drizzle's default storage engine is InnoDB (yes, even for the INFORMATION_SCHEMA) and therefore you'll notice this function, which is responsible for writing to one of InnoDB's log group files. More interestingly, there is a fairly alarming 6.45% of the total execution time, for the get_text() static function (defined in drizzled/sql_lex.cc). If we look at this routine, it's fairly clear what the function does — it simply reads an unescaped text literal, without quotes. Nothing particularly fancy, although this comment above the routine might be telling: "Return an unescaped text literal without quotes. Fix sometimes to do only one scan of the string" Perhaps it's time to look into that single-scan thing...or perhaps not. The next "biggie" is the my_mb_wc_utf8mb4() routine, which is called an astonishing 20,779,971 times over the course of the execution of approximately 17,000 statements. How'd I come up with 17,000 statements? I just looked at the number of times DRIZZLEparse() is called, which is close enough...
my_mb_wc_utf8mb4() is called from a number of places, most notably from within the parser and lexer, and when converting to various string classes and primitives. It's really scary that this function is called so many times! Why? Well, one reason is that Drizzle does not support any other character set than UTF8 full 4-byte. Although we support many collations, we don't support the myriad character sets that MySQL does. So, it kind of makes sense that UTF8 routines would be called quite a bit. But...20M executions for 17K statements seems like there is an obvious inefficiency here.
This leads nicely to MySQL. If Drizzle is spending so much time in UTF8 routines, and MySQL doesn't by default use UTF8 as it's character set (and MySQL 5.1.33 doesn't support 4-byte UTF8, it would make sense that MySQL would NOT be spending nearly as much time executing character set conversion routines, right? Well, not so much. MySQL Callgrind Profile -- The Function Calls
What we notice in the MySQL calls is that pthread_getspecific is the number one execution time consumer, followed by a few of those DBUG library calls. I have a suspicion that the pthread_specific calls are actually related to the DBUG library calls, which track debugging information in the threads. I might be wrong about this but given the stark difference between pthread_specific()'s top spot in MySQL's call graph and it not appearing in Drizzle's graph, it makes sense that this is related to the DBUG library. So, I'll ignore it for now. So, after those, you'll notice a bunch of calls to memcpy() and the number of calls to memcpy() very closely matches the number of calls from the Drizzle graph. This makes sense. Drizzle's mechanism for transporting data across the wire and for translating record formats between the database kernel and the storage engine has not yet changed much, and this is where much of the calls to memcpy() are coming from. (this will change with ValueObjects, BTW, but more on that later...) After memcpy(), though, if you scan the function call list, you'll notice that MySQL, even with UTF8 not as the default character set, there is still a whole lot of calls, just like in Drizzle, to various character set routines — notably my_uni_utf8(), my_uft8_uni(), copy_and_convert(), my_mb_wc_latin1(), my_ismbchar_utf8(), and so on. It turns out that if you add up all the character set conversion and comparison routine executions in both MySQL and Drizzle, that all those function calls are taking up more than 12% of the total execution time for both Drizzle and MySQL! Call Trees Seem to Blame the Parser
I'm not going to go on too much further, as it's getting late and I'm tired, but I'm putting the call trees for Drizzle and MySQL for the profiling runs below. I think it's fairly clear that the parser is eating up a large chunk of execution time. Perhaps it's time to look into prototyping and benchmarking other parsers, or at the very least, looking into streamlining the existing parser to be more efficient when it comes to character set routines... Feel free to click on the images below for the fullsize versions. Cheers! -jay Here is the Drizzle call tree:
And here is the MySQL one:
Wednesday, February 11. 2009Drizzle, MySQL, and the mess that is Dates and SQL_MODEThe frustration builds. I have come to despise MySQL's sql_mode. It is a hack of the most gargantuan proportions. Basically, the optimizer just ignores the sql_mode whenever it is convenient for it to do so. More importantly, the optimizer silently ignores bad datetime input in various places. The reason for this is because of my statement above: sql_mode is a big ole' hack. Instead of fixing the runtime executor in MySQL to use real ValueObject types — that are immutable and know how to convert (and not convert) between each other, the runtime is a mess of checks for various runtime codes, warning modes, "count_cuted_field" crap and other miscellany that obfuscates the executor pipeline almost beyond recognition. Slowly, I am attacking the mess, but the executor is so fragile that even tiny changes can wreak havoc on the system, so the going is slow and painful. It's no wonder that the release cycle for the MySQL server is so bloody long... To give you an example of the symptoms which manifest themselves because of this hack, I present the following, executed in MySQL 5.0.67: mysql> create table t1(f1 int, f2 date); Query OK, 0 rows affected (0.03 sec) mysql> insert into t1 values(1,'2005-01-01'),(2,'2005-09-01'),(3,'2005-09-30'), (4,'2005-10-01'),(5,'2005-12-30'); Query OK, 5 rows affected (0.04 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> select @@sql_mode; +------------+ | @@sql_mode | +------------+ | | +------------+ 1 row in set (0.00 sec) mysql> select * from t1 where f2 <= '2005-09-31' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 1 | 2005-01-01 | | 2 | 2005-09-01 | | 3 | 2005-09-30 | +------+------------+ 3 rows in set (0.00 sec) OK, so far, everything is as "expected" when running in a non-strict mode. Even though "2005-09-31" is clearly an invalid date, MySQL in non-strict mode is expected to ignore such triviality and, we can assume, "corrects" the bad date to something else (maybe an integer representation of "2005-09-30"?) The MySQL manual is fairly clear about sql_mode=STRICT_ALL_TABLES and bad datetime values: Strict mode controls how MySQL handles input values that are invalid or missing. A value can be invalid for several reasons. For example, it might have the wrong data type for the column, or it might be out of range. So, we should expect the above statement to throw an error in STRICT_ALL_TABLES, right? Wrong: mysql> set @@sql_mode="STRICT_ALL_TABLES"; Query OK, 0 rows affected (0.00 sec) mysql> select * from t1 where f2 >= '0000-00-00' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 1 | 2005-01-01 | | 2 | 2005-09-01 | | 3 | 2005-09-30 | | 4 | 2005-10-01 | | 5 | 2005-12-30 | +------+------------+ 5 rows in set (0.00 sec) mysql> select * from t1 where f2 > '2005-09-99' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 4 | 2005-10-01 | | 5 | 2005-12-30 | +------+------------+ 2 rows in set, 1 warning (0.00 sec) mysql> show warnings; +---------+------+-------------------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------------------+ | Warning | 1292 | Incorrect date value: '2005-09-99' for column 'f2' at row 1 | +---------+------+-------------------------------------------------------------+ 1 row in set (0.00 sec) MySQL accepts the bad date, even in STRICT_ALL_TABLES mode, and does the same implicit conversion, giving us a warning when not the zero date but not telling us what conversion was done. Clearly, a conversion happened because MySQL winnowed the result set. But we have no way to know what the winnowing condition was on. Let's take this even further. The docs suggest that STRICT_ALL_TABLES must be coupled with NO_ZERO_IN_DATE if real validation on dates is to occur: Strict mode disallows invalid date values such as '2004-04-31'. It does not disallow dates with zero month or day parts such as '2004-04-00' or “zero” dates. To disallow these as well, enable the NO_ZERO_IN_DATE and NO_ZERO_DATE SQL modes in addition to strict mode. So, maybe with "STRICT_ALL_TABLES, NO_ZERO_IN_DATE" sql_mode, we'll actually get an error... mysql> set @@sql_mode="STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE"; Query OK, 0 rows affected (0.00 sec) mysql> select * from t1 where f2 > '2005-09-99' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 4 | 2005-10-01 | | 5 | 2005-12-30 | +------+------------+ 2 rows in set, 1 warning (0.00 sec) mysql> select * from t1 where f2 >= '0000-00-00' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 1 | 2005-01-01 | | 2 | 2005-09-01 | | 3 | 2005-09-30 | | 4 | 2005-10-01 | | 5 | 2005-12-30 | +------+------------+ 5 rows in set, 1 warning (0.00 sec) mysql> show warnings; +---------+------+-------------------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------------------+ | Warning | 1292 | Incorrect date value: '0000-00-00' for column 'f2' at row 1 | +---------+------+-------------------------------------------------------------+ 1 row in set (0.00 sec) Well, the above ridiculousness is the symptom of a hacked-up runtime engine switching on modes, return codes, and warning levels all over the place. The danger in the above is that we don't actually know how the silent truncation/conversion has affected our query results. What if the query results are not what we intended? A warning on bad input is simply NOT enough. Want to see another symptom of this kind of implicit truncation/conversion that is rampant in the runtime? Take a look at the following in MySQL: mysql> create table t1(f1 int, f2 date not null); Query OK, 0 rows affected (0.01 sec) mysql> insert into t1 values(1,'2005-01-01'),(2,'2005-09-01'),(3,'2005-09-30'), (4,'2005-10-01'),(5,'2005-12-30'); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> select * from t1 where f2 is null; Empty set (0.00 sec) mysql> explain extended select * from t1 where f2 is null; +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | Using where | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ 1 row in set, 1 warning (0.02 sec) mysql> show warnings; +-------+------+--------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+--------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`f1` AS `f1`,`test`.`t1`.`f2` AS `f2` from `test`.`t1` where (`test`.`t1`.`f2` = 0) | +-------+------+--------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) If you take a look at the above query, you will notice that the WHERE condition is asking for rows in which f2 IS NULL. However, f2 is defined as DATE NOT NULL. POP QUIZ: Why doesn't the optimizer display a Impossible WHERE in the Extra column and optimize away the query? ANSWER: Because of all those silly silent conversions. MySQL converts "IS NULL" to "= 0" because everything is designed around comparisons to "the zero date". So, even if you have STRICT_ALL_TABLES, NO_ZERO_DATE, NO_ZERO_IN_DATE mode on, MySQL is still using the zero date behind the scenes. Here is the relevant code in sql_select.cc:
/* fix to replace 'NULL' dates with '0' (shreeve@uci.edu) */
else if (
((field->type() == DRIZZLE_TYPE_DATE) || (field->type() == DRIZZLE_TYPE_DATETIME))
&& (field->flags & NOT_NULL_FLAG)
&& ! field->table->maybe_null)
{
COND *new_cond;
if ((new_cond= new Item_func_eq(args[0],new Item_int("0", 0, 2))))
{
cond= new_cond;
/*
Item_func_eq can't be fixed after creation so we do not check
cond->fixed, also it do not need tables so we use 0 as second
argument.
*/
cond->fix_fields(session, &cond);
}
}
Suffice it to say, this "fix" is gone now from Drizzle, and we're back to producing the appropriate plan: drizzle> use test Database changed drizzle> create table t1(f1 int, f2 date not null); Query OK, 0 rows affected (0.02 sec) drizzle> insert into t1 values(1,'2005-01-01'),(2,'2005-09-01'),(3,'2005-09-30'), (4,'2005-10-01'),(5,'2005-12-30'); Query OK, 5 rows affected (0.01 sec) Records: 5 Duplicates: 0 Warnings: 0 drizzle> select * from t1 where f2 is null; Empty set (0.00 sec) drizzle> explain extended select * from t1 where f2 is null; +----+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE | +----+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+ 1 row in set, 1 warning (0.00 sec) drizzle> show warnings; +-------+------+-----------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-----------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`f1` AS `f1`,`test`.`t1`.`f2` AS `f2` from `test`.`t1` where 0 | +-------+------+-----------------------------------------------------------------------------------+ 1 row in set (0.00 sec) The deeper I dig, the more scared I become. Thursday, February 5. 2009Where Drizzle Is SucceedingOver the past few weeks, I have been happy working on Drizzle. Why have I been happy? Is it because of some new incredible code that will revolutionize the database industry? Nope. Is it because we've been able to remove all the issues that plague the server core? Nope. Is it because I see Drizzle quickly morphing into a modular, standard-conforming super-kernel? Nope. So, why am I joyous? To paraphrase the late Charlton Heston: "[Drizzle] is people!" Recently, I've seen the fruit that transparent, open source development bears. This fruit takes the form of engaged, motivated, and humble individuals who wish to make their mark on a project. Whether it's on IRC on #drizzle, the drizzle-discuss mailing list (now with 354 active members), or via the platform which Launchpad.net provides our community, I've seen new developers scrambling to pick up blueprint tasks, tackle bugs (minor and major), and stamp their footprint on the code base. With each new face comes an entirely new perspective, a new angle, a different set of skills and experience. And I'm taking the time to chat and learn with each of them. It's a humbling experience for me, as I learn from each person who visits the ever-growing IRC channel and mailing list. It doesn't matter if it's the sage advice of folks like MySQL's Mats Kindahl, Bernt Johnsen and Roy Lyseng, or database veteran Jim Starkey. It doesn't matter if the new face is a college student wishing to help in any small way they can. Everyone makes a difference in their own way. So, just like Monty Widenius says about his new company, that all employees will share in the profit, so is the case with Drizzle, and truly open source development projects. Those who contribute share in the project's success and stamp their mark, forever, on its direction and shape. It is this fact that propels me in coding, and gives me joy when I log into IRC in the mornings. Just thought I would share that happiness. Cheers.
Posted by Jay Pipes
in MySQL, PostgreSQL, Drizzle, C/C++, Launchpad
at
10:55
| Comment (1)
| Trackbacks (0)
Monday, January 12. 2009More on Day Numbers - The Actual CodeLike I mentioned in my previous article, I've been working on refactoring the temporal data handling in Drizzle. The major problem I've been dealing with is poorly or non-documented code. The lack of documentation has led me to rely on the MySQL Manual in some cases, and then additional research and lastly, my own intuition as to what was going on. One of the earliest cases of me saying "WTF?" was when I was investigating how day numbers were calculated. Here is the original, unmodified code from MySQL (/libmysql/my_time.c:746-778). I've highlighted in blue the massive amount of comments explaining the inner workings of the function and what it is doing.
/*
Calculate nr of day since year 0 in new date-system (from 1615)
SYNOPSIS
calc_daynr()
year Year (exact 4 digit year, no year conversions)
month Month
day Day
NOTES: 0000-00-00 is a valid date, and will return 0
RETURN
Days since 0000-00-00
*/
long calc_daynr(uint year,uint month,uint day)
{
long delsum;
int temp;
DBUG_ENTER("calc_daynr");
if (year == 0 && month == 0 && day == 0)
DBUG_RETURN(0); /* Skip errors */
delsum= (long) (365L * year+ 31*(month-1) +day);
if (month <= 2)
year--;
else
delsum-= (long) (month*4+23)/10;
temp=(int) ((year/100+1)*3)/4;
DBUG_PRINT("exit",("year: %d month: %d day: %d -> daynr: %ld",
year+(month <= 2),month,day,delsum+year/4-temp));
DBUG_RETURN(delsum+(int) year/4-temp);
} /* calc_daynr */
I tend to like code that is well-commented. And by well-commented, I don't mean comments that simply repeat an if condition or state the obvious. I mean, of course, comments that elucidate non-obvious parts of the code. The non-obvious parts of the above code are, well, most of it. Let's start at the top and I'll let you know why this little bastard of a function caused me so much pain. OK, so the function header comment says that the function "Calculate nr of day since year 0 in new date-system (from 1615)". Well, knowing even the small amount of stuff I knew about calendars before I really got into this stuff, I knew that there was no "year 0" in a calendar. So...I immediately suspected something was up. But, I made a big mistake at this point. I read the end of the comment "new date-system (from 1615)" as meaning that this function calculated the Gregorian Day Number. I thought since October 15, 1582 was close to the year 1615, that the function author meant the Gregorian calendar. Of course, I was wrong. The function actually is intended to calculate the number of days since "year 0" of the SQL date range. Unfortunately, the SQL calendar contains years 1 - 9999. There is no "year 0". So, I now have to figure out why this function considers 00.00.00 to be a real date. I have yet to figure this out. Perhaps it is a throwback to the MySQL days of February 31st, I don't really know. Moving on through the function, the code contains a number of lines which just baffled me given the lack of comments. For instance:
delsum-= (long) (month*4+23)/10;
What is the number 23 doing? Is it a presumed constant in a known formula? Is it an adjustment number? Who knows... And here: temp=(int) ((year/100+1)*3)/4; The line above seems to be trying to calculate the number of leap days (which are any day divisible by 4 and if divisible by 100 must also be divisible by 400.) But without comments, it's tough to tell... OK, so with a little bit of throwing my hands up, I move on to the function which takes a day number and converts it to a date... Day Number to Date?And, going the other way around, here is how a date is calculated from a day number (/sql/time.cc:142-185). Again, the voluminous comments are highlighted.
/* Change a daynr to year, month and day */
/* Daynr 0 is returned as date 00.00.00 */
void get_date_from_daynr(long daynr,uint *ret_year,uint *ret_month,
uint *ret_day)
{
uint year,temp,leap_day,day_of_year,days_in_year;
uchar *month_pos;
DBUG_ENTER("get_date_from_daynr");
if (daynr <= 365L || daynr >= 3652500)
{ /* Fix if wrong daynr */
*ret_year= *ret_month = *ret_day =0;
}
else
{
year= (uint) (daynr*100 / 36525L);
temp=(((year-1)/100+1)*3)/4;
day_of_year=(uint) (daynr - (long) year * 365L) - (year-1)/4 +temp;
while (day_of_year > (days_in_year= calc_days_in_year(year)))
{
day_of_year-=days_in_year;
(year)++;
}
leap_day=0;
if (days_in_year == 366)
{
if (day_of_year > 31+28)
{
day_of_year--;
if (day_of_year == 31+28)
leap_day=1; /* Handle leapyears leapday */
}
}
*ret_month=1;
for (month_pos= days_in_month ;
day_of_year > (uint) *month_pos ;
day_of_year-= *(month_pos++), (*ret_month)++)
;
*ret_year=year;
*ret_day=day_of_year+leap_day;
}
DBUG_VOID_RETURN;
}
And that's about the point in time I gave up trying to figure out the existing code. Sunday, January 11. 2009So, What Is a Day Number?Over the past three weeks, I've been refactoring the way that Drizzle handles temporal, or time-related, values. It's been a fun little adventure into an area of the server which hasn't changed much in a long time. Today, I wanted to share a couple things I've learned about calendars, MySQL, and dealing with date-related calculations. To start the conversation, imagine a SQL request like the following:
SELECT TO_DAYS("2008-08-11");
My guess is that many people don't really understand what is going on behind the scenes in the SQL server when they issue a request like the above. Believe it or not, there is quite a bit of calculation going on for a simple request like the above. Furthermore, there are a number of caveats that revolve around the ranges of dates that the SQL server operates on. To demonstrate, without looking at the MySQL manual, can you answer the following questions?
The above questions, and their answers, expose a number of issues with temporal values. Let's take each question in order. What is "day number zero" according to the TO_DAYS() function?OK, so if you answered 0000-01-01 for question #1, you would be correct. The TO_DAYS(date) function returns the number of days from the "date" 0000-01-01 up until date. Now, why would I put the term "date" in quotes? Well, it's because 0000-01-01 isn't actually an accurate date. "Wait a minute, Jay, of course 0000-01-01 is a real date! It's the first date of the first year in the 'modern' era...you know, A.D.!", you are probably saying. But, the fact is that there isn't such as date in existence. Or, more precisely, there is no way to accurately know what date was the first day of the year of the 'modern era'. In fact, there isn't a reliable way of telling what a "date" is if that date occurred before October 15th, 1582. October 15th, 1582, is the start of the Gregorian calendar, which is the calendar system used by the "modern world". This calendar takes into account adjustments for leap years and leap seconds (yes, there is something called a leap second). Before this date, it's not possible to use a Gregorian date such as 0301-03-23 to reliably determine calendar information for that date. The reason is that the dates October 5 - 14, 1582 do not exist. These dates were removed from the Gregorian calendar to "make up for" leap years and seconds up until the start of the Gregorian calendar. Given the above information, check out the results of the below statement.
mysql> select to_days("1582-10-08");
+-----------------------+
| to_days("1582-10-08") |
+-----------------------+
| 578094 |
+-----------------------+
1 row in set (0.00 sec)
If the dates 5-14 of October, 1582 don't exist, how can the above function return a value? Weird, huh? Well, kind of. The thing is, the TO_DAYS() function and it's sister FROM_DAYS() operate on a pseudo-calendar that represents the SQL range of dates, which is year 0 to year 9999. This pseudo-calendar simply assumes that up until October 15th, 1582, each year was comprised of 365.25 days. Which leads us nicely to our second question... Does TO_DAYS() take into account leap years?
Here, the answer depends. Can you use the results of the TO_DAYS() function to calculate intervals between dates?The answer is both yes and no. For reliable calculations, the dates must be in the Gregorian calendar. If one date is not in the Gregorian calendar and one date is, the results may not be accurate. Speaking of accuracy...The astute among you may have noticed something peculiar about the result of the SQL statement above. I just got finished saying that TO_DAYS() considers all years before 1582 as containing 365.25 days. If that is so, we can calculate the TO_DAYS("1582-10-15") using the following calculation: >>> days_in_years= (1582 * 365.25) >>> days_in_months= (31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30) >>> days_in_october= 15 >>> to_days= days_in_years + days_in_months + days_in_october >>> print to_days 578113.5 Strangely, this is what I get via MySQL:
mysql> select to_days("1582-10-15");
+-----------------------+
| to_days("1582-10-15") |
+-----------------------+
| 578101 |
+-----------------------+
1 row in set (0.00 sec)
That's a difference of 12.5. Is this a bug? Is it an adjustment for the missing days? Probably, but it's also not a bug that is likely to surface in an application. The reason is that FROM_DAYS() and TO_DAYS() are generally used in combination and the bug in the code exists in both functions, therefore they usually offset each other. The second reason is that TO_DAYS() returns an abstract number of days. It's not used to calculate intervals between dates unless that interval is in the modern era, like this: SELECT * FROM Orders WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_created) >= 7; In this case, both abstract numbers returned by TO_DAYS() will be accurate relative to each other since both dates are in the Gregorian calendar and it is the difference between the two dates which is used in the range calculation. I'll leave you today with a couple oddities of the FROM_DAYS() and TO_DAYS() functions in MySQL:
mysql> select to_days("0000-01-01");
+-----------------------+
| to_days("0000-01-01") |
+-----------------------+
| -32212253 |
+-----------------------+
1 row in set (0.00 sec)
mysql> select to_days("0000-10-01");
+-----------------------+
| to_days("0000-10-01") |
+-----------------------+
| 274 |
+-----------------------+
1 row in set (0.00 sec)
mysql> select from_days(to_days("0000-10-01"));
+----------------------------------+
| from_days(to_days("0000-10-01")) |
+----------------------------------+
| 0000-00-00 |
+----------------------------------+
1 row in set (0.00 sec)
Monday, January 5. 2009Seven Things About Me You May Not Know (And Probably Won't Care About)Due to the efforts of Brian Moon and Michelangelo van Dam, I've been sucked into a meme started by Tony Bibbs. My initial reaction to this unfortunate event was ... (envision Steven Colbert, hands raised...) "Noooooo!!!" But I got over it. Hey, it's the holiday season, I might as well be a good boy and fulfill the modern-day geek's equivalent of a chain letter. So, without further ado, here is my list of seven things about me you probably could care less about and will skip over to see if you are on my list of tagged people. (Yeah, you know you will.)
Here are the folks I'd like to know a little more about:
And, of course, the rules in case anyone missed them:
Thursday, December 18. 2008I Didn't Say "Screw Windows"A number of commenters on my previous entry thought I was basically saying "Screw Windows". Lukas Smith and Bill Karwin, both of whom I respect enormously, noted that Windows is a dominant development platform for MySQL users, and that one of the reasons for MySQL's popularity was that it runs smoothly (for a while now) on Windows. Bill and Lukas: You are 100% correct. That said, what I wrote was this: Forget Windows for now: Use open source, community-maintained, and standardized libraries within the kernel. Don't rewrite libc and various other quality open source libraries because of Not Invented Here syndrome or because Windows lacks these things. Focus on the standards and don't bother with platforms that don't conform to POSIX. If Microsoft wants future MySQL versions to run on its platforms, partner with Microsoft and have them do the port. While you're at it, drop support for old platforms like Netware and other crap that is obselete. The above is advice to the team in charge of re-architecting the database server. What it boils down to is this: focus first and forement on using standard, community-maintained open source libraries and on creating a neatly-architected, clean and lean kernel. After that, worry about Windows ports. So, it's not about kicking Windows to the curb. It's about priorities. The priority should be clean code. Because having clean, easy-to-understand code leads to:
I posit that the above three things would make a Windows port much easier, and cleaner. More developers able to contribute, less time trying to figure out the spaghetti, and less chance the port would break tangential pieces of code. And thus, I believe a solid Windows port of a core kernel would be easier and faster to complete once a POSIX-compliant and clean kernel is completed. Monday, December 15. 2008My advice to MySQLHere is my advice to MySQL. Take it or leave it. Time will tell whether I'm full of shit. MySQL 5.1 is out the door. Awesome. Great job to all the folks who fixed the thousands of bugs over the last 3 years. MySQL 5.1 should be faster and more stable than 5.0 because of those bug fixes, and features like partitioning are welcome additions to the small percentage of MySQL users who need that functionality. And, even if there are some bugs in partitioning (what feature doesn't have any bugs?), the partitioning feature is as good or better than other competing products. Good job. However, going forward, here is my advice to MySQL engineering: stop all work on new 6.0 features entirely. Don't scrap the features, just stop development on them now. Take one month to figure out how to restructure MySQL engineering and priorities with the following steps: Suggested StepsDrop the current roadmap: Continuing down the current roadmap without addressing the core problems of a Frankenstein-like core database server kernel will mean that the current roadmap features will take 2-3 times as long to develop. Stop this now. Refocus on re-architecting the kernel to a 21st century, modular design. Tell sales and marketing that you are taking these steps to ensure the long-term viability of the MySQL name and product line. Make two teams: a maintenance team which maintains server versions <= 5.1 and a team which is dedicated entirely to redesigning the MySQL server kernel into a streamlined, black box. Reduce the headcount of the MySQL engineering team if necessary to contain only those engineers who have the ability to design modular, pluggable systems. Give up on backwards compatibility: To make the changes necessary without making the kernel even more complex than it already is, you will need to relinquish the idea that backwards compatibility is necessary. Guido van Rossum already made this decision for Python 3, recognizing the need for it. MySQL needs to do the same. SQL_MODE? Scrap it. Only do what is correct according to data integrity and SQL standards. Reduce the code complexity and code paths and you will find that features are easier to develop and fixes are easier to identify. Forget Windows for now: Use open source, community-maintained, and standardized libraries within the kernel. Don't rewrite libc and various other quality open source libraries because of Not Invented Here syndrome or because Windows lacks these things. Focus on the standards and don't bother with platforms that don't conform to POSIX. If Microsoft wants future MySQL versions to run on its platforms, partner with Microsoft and have them do the port. While you're at it, drop support for old platforms like Netware and other crap that is obselete. Make all decisions open and transparent: For the non-maintenance team, make a policy that all decisions about the kernel design be done in an open forum, with the community able to participate in the discussion. Have stewards that are willing to negotiate the design decisions with the community and do everything in a transparent manner. Focus all energy towards the APIs: Think of the server kernel as simply a provider of services. Clearly and consistently define these services (as interfaces and plugin APIs) and have the community of engineers vet the design of these APIs. Once these interfaces are clearly defined, document them on public wikis. Clean up the abysmal messiness of the code base: Refactor, decruft, and standardize the code base to a C99 (minimal), warning-free, environment that uses stdint, stdbool, proper STL templates, and other stuff that is now standard for 5+ years. Clear your heads of the premature optimization syndrome that infects the code base and makes it messy and cluttered. You will find that there are many community resources that would be happy to help in this effort. Once done, BSD the kernel and turn it over to the open source community: Once the above is done, BSD the kernel code base and let the community support it entirely. Then, focus your energies on creating value-added features as plugins around that community-supported core kernel. Use the resources and expertise in your engineering department to develop niche addons that paying customers want. Package branded versions of the MySQL server (closed or open source) that include a number of these value-added plugins that target a specific industry, such as data-warehousing or security-conscious environments. Sell those packages as Enterprise packages with an Enterprise price point. Provide all support and services for these Enterprise-branded MySQL server packages. How Long?Based on what the Drizzle project has been doing, I predict that doing ALL of the above steps would take approximately 12 months to achieve a version 1.0 of a stable, modular kernel. I believe that features could be developed as plugins to that kernel in less time than if the work was not done and the features for 6.x are developed as they currently are. PredictionsIf the above steps are taken, here is what I predict would be the outcome: Reduction in maintenance costs of the core server by 80%: By turning over maintenance costs to the community, there will be a reduction in maintenance costs. By simplifying the kernel code base, there will be an even bigger reduction in maintenance costs: since one "fix" won't break other things nearly as often as "fixes" do today. This reduction in maintenance costs means that Sun can allocate more of its internal engineering resources to developing value-added plugins which are sold to customers. Because more developer resources are now dedicated to revenue-producing activities, the long-term viability of the database engineering department is ensured. Sales and marketing efforts become easier: Currently, MySQL sales and marketing are undeniably hindered by two things:
By following the steps above, these problems are tackled. A simpler and community-supported kernel means a more stable kernel. A more stable kernel means a shorter, more incremental release cycle. The lack of differentiation is solved by MySQL now being able to focus on value-added plugins in branded MySQL packaging. These branded packages are much easier for a sales force to sell, since they represent clear, differentiated value to the customer. When sales and marketing of a product become easier, only one thing is bound to happen: a strong increase in sales. MySQL will once-again return to the Open Source community: Much has been made of the inability of community contributors to get contributions into the MySQL server in a reasonable timeframe. By opening up the design and development of the kernel to the community, MySQL would restore much of the trust it has lost in recent years. Instead of being seen as "throwing a bone" to the open source community every once in a while, Sun/MySQL engineering would be seen as an active and trusted partner in open source contributions, stewardship and development. ConclusionOr, I am completely full of it and the above is a waste of time. Saturday, November 22. 2008The Drizzle Snowman - WIN!Stewart, Brian and myself are having a little fun this morning. One of the niceties of having real UTF8 support in Drizzle is now we can really fun table names. Behold, the glory of Drizzle:
drizzle>> create table ☃ (a int not null);
Query OK, 0 rows affected (0.01 sec)
drizzle>> show create table ☃\G
*************************** 1. row ***************************
Table: ☃
Create Table: CREATE TABLE `☃` (
`a` int NOT NULL
) ENGINE=InnoDB
1 row in set (0.00 sec)
Yep, that's a snowman. MySQL? Well, not so much: mysql> select @@character_set_system; +------------------------+ | @@character_set_system | +------------------------+ | utf8 | +------------------------+ 1 row in set (0.00 sec) mysql> create table ☃ (a int not null); ERROR 1064 (42000): You have an error in your SQL syntax; \ check the manual that corresponds to your MySQL server version \ for the right syntax to use near '�� (a int not null)' at line 1 /me goes off to record snowman.test. UPDATE: There isn't an error apparently, in MySQL. As long as you set names UTF8 in the client, all works as expected. Friday, November 21. 2008Drizzle Cirrus Milestone - Moving Forward
Although the MySQL server does have community contributions in some of the releases, the Cirrus milestone marks something of a new day in MySQL-related development. Cirrus contains tasks which are actively being developed by external contributors. This may not sound like a huge deal, but it is. In the past, contributions have been included in the MySQL server, however these contributions have always been included after the code has been contributed. For instance, Jeremy Cole's SHOW PROFILES patch, although heavily modified from its original submitted form, was included in MySQL Community Server after a long period of code review and modification. However, to my knowledge, the code contributor community has never been actively involved in either ongoing feature development for a release, nor actively involved in the direction in which the server is developed. Cirrus marks a new day. Not only are tasks for Cirrus assigned to external contributors, but the decision-making and strategic power of the release is in the community's hands. The only reason a community member would not have a say in the direction of the server is if they don't speak up and share an opinion. As of this morning, there are 299 members of the drizzle-discuss mailing list. All of these members have a say in what gets done in Drizzle. This makes me a happy boy. A Note on What a "Release" IsBefore the emails start firing off about what's in the first release of Drizzle and when it will come, I'd like to note that we are not going for a "big bang" approach to releasing software. The tasks I outline below are targets for a milestone. These tasks do not mean that the first release of Drizzle will contain all of the listed items. In fact, to be sure, some of them likely won't make it into the first release, and other tasks not listed currently for the milestone will "make it in". Although the community will eventually decide the release model, most (all?) of the developers sitting at Brian's table agree that an Ubuntu-like release model leads to more stable and consistent releases. By "Ubuntu-like", I mean that it is the release date which is important to be kept stable, and not the list of features contained in the release. People want consistency in when to expect the next release; it makes it easy to look forward to a certain date. What is less important is what is included in the release. What counts is that each release is stable and demonstrates incremental improvements at a consistent rate. I'll be blogging more about this concept shortly and will start a discussion on the mailing list regarding possible release dates and a schedule for locking down commits before that date. Whatever is feature-complete at the time of lock-down goes into the release. Nothing more. Why? Because stability is more important. With a set release cycle, the feature that "missed the deadline" will eventually make it into the code base in a shorter amount of time, in a consistent and stable manner. Targets for CirrusThere are a number of major areas that Cirrus is targeting:
Many tasks in the "cleanup, reuse and refactor" category have already been completed, by Monty, Brian, myself, and community contributors such as Toru Maesaka, Patrick Galbraith, Eric Day, C.J. Collier, and Yoshinori Sano. These tasks are listed on the blueprints page starting with "code-cleanup". They are also not as dependent on each other as some of the other task areas. Feel free to click through on the various links to the milestone and blueprint tasks in this blog post, comment on the mailing lists, and be an active contributor. Nothing is off limits. Thursday, November 6. 2008On Bullsh*t Blog PostsWhen you write a blog post and tag it with something you know will allow it to be aggregated into PlanetMySQL (or any other technical aggregation service), ask yourself one thing: If I was a technical person interested in MySQL, would I want to read what I just wrote? If you answered "No" to the above question, don't click the Publish button Friday, October 24. 2008Drizzle Tests - Unearthing the Pompeii of MySQL
Like the bodies underneath the piles of ash in Pompeii, many of the individual tests in the MySQL test suite are frozen in time. In a way, this is understandable, for a few reasons. Developers, in general, hate writing test cases. Let's face it, it's a lot more fun to write code than write and run test cases. And, when something is more fun, we tend to devote more energy to that kind of something, and neglect other not-so-fun stuff. We also tend not to understand the value of tests. Most developers think of tests as a way of validating their work — did the code I write do what I think it should do? But tests are more than that. Among other things, tests fulfill all of the following:
So, tests are A Good Thing™. However, a poorly written test case can be deadly to the overall health of a project. Why? Because bad test cases often will pass when run in a test suite, but not actually test anything, or not properly test what they are supposed to. This gives the software developers the illusion of health, which is worse than tests failing and the developer knowing the code base is broken. Oh, and don't get me started on disabling test cases... I'll leave that for a later post. For now, let's focus on what makes a good test case. Good tests are not easy to write. Here are some things that I think make a test A Good Test. Tests should validate a single thingI have stressed this before in various performance tuning sessions that I've given. Suppose you run a benchmark and get 1000 queries/second throughput. Afterwards, you edit your my.cnf configuration file and change two variable settings. You then re-run the same benchmark and get 1200 queries/second throughput. Question: What does this tell you? Answer: Absolutely nothing. Why? Because you don't know what effect each of those two changes had on the performance of the server. The change of variable A could have resulted in a 50% performance improvement, while the change in variable B could have had a 20% negative impact on the performance. The point is, you don't know what the results of the benchmark mean. Similarly, if a test case attempts to validate more than a single thing, you can't count on the test case's results meaning anything. Here is a perfect example from the current MySQL test suite which, as I complained about last night, the Drizzle build is actually enabling and passing. Below, the bench_count_distinct.test, in its entirety.
#
# Test of count(distinct ..)
#
--disable_warnings
drop table if exists t1;
--enable_warnings
create table t1(n int not null, key(n)) delay_key_write = 1;
let $1=100;
disable_query_log;
while ($1)
{
eval insert into t1 values($1);
eval insert into t1 values($1);
dec $1;
}
enable_query_log;
select count(distinct n) from t1;
explain extended select count(distinct n) from t1;
drop table t1;
# End of 4.1 tests
What precisely does the above test case validate? Well, here is a list of answers I thought up:
Tests should be well commentedThis is a no-brainer, but it's simply amazing to me how few relevant comments are in the existing MySQL test suite. Like the bench_count_distinct.test above, most tests either have a useless top comment which typically is just the name of the test, or something like the following, taken from 1st.test file (yes, there is a test called 1st.test): # # Check that we haven't any strange new tables or databases # show databases; show tables in mysql; Unfortunately, the above is not a joke. What, precisely, does "any strange new tables" mean? Clarity should be king in test case comments (as in code comments). "Strange" is completely vague. Instead, the comment should specify something like: # Check that we only have two databases: "mysql" and "test" show databases; # Check that the tables in the "mysql" database match the correct # system tables for this version of MySQL show tables from mysql; Better still, the test should be completely scrapped in favor of a more traditional setup() type test block, which resets a test environment to a pristine condition. Here is an example of what an excellent test case comment looks like, taken from the analyze.test test case:
#
# Bug #14902 ANALYZE TABLE fails to recognize up-to-date tables
# minimal test case to get an error.
# The problem is happening when analysing table with FT index that
# contains stopwords only. The first execution of analyze table should
# mark index statistics as up to date so that next execution of this
# statement will end up with Table is up to date status.
#
create table t1 (a mediumtext, fulltext key key1(a)) charset utf8 collate utf8_general_ci engine myisam;
insert into t1 values ('hello');
analyze table t1;
analyze table t1;
drop table t1;
Although the comment is in a little broken English, it's fairly clear what it attempting to be validated: the commands below will test to see if a FULLTEXT index containing only stopwords will break the ANALYZE TABLE. Nicely, the relevant MySQL bug ID is included in the comment. Man, do I wish all the tests in the MySQL test suite were more like the above. Tests should not mix tests and assertionsMy wife, Julie, and I have a saying about why our marriage works well: "low expectations". In the case of tests, I would say the key to a good test is "no expectations". What is an assertion? Essentially, it's a declared expectation. You assert that the behaviour of a specific command or event will be the expected value. A test case should be free of any sign of known expectations. Expectations belong in a results file, not in the test file. If expectations are in the test file, the developer or writer of the test file has polluted the test itself with known expectations. Instead, test files should contain only the statements or commands to reproduce a scenario. Need an example? Take a look at the following, also from analyze.test: # # Bug #30495: optimize table t1,t2,t3 extended errors # create table t1(a int); --error 1064 analyze table t1 extended; --error 1064 optimize table t1 extended; drop table t1; The test looks simple enough, and indeed it is. However, the --error 1064 is mixing the assertion with the test statements. Instead, the results file should contain the assertion for a returned error code of 1064 (Syntax error). By including the assertion in the testing block, we tightly couple the execution of the test with the expected results. Why is this bad? What if for the given SQL statement, storage engine A instead returned error code 1067? I would now need to create an entirely different test file containing the same test with different error assertions. Indeed, this situation is quite common. Some storage engines behave differently than others. Results of the same series of statements can be different, and yet valid for the individual engine. The proper way to deal with these situations is to record a result file for the different storage engines, but keep a single test file. The new test framework allows for this kind of differentiation. Going ForwardMy work this weekend and next week will be focused around updating all the tests in the existing test suite, seeing what is obselete, correcting and cleaning tests, and better organizing test files into suites of related functionality. The organization of tests into suites has already been done on my local laptop, and I've added a command to the new Python test runner to display some suite statistics, as shown below... ok, more on testing later. I very much welcome input on the test and results file syntax on the wiki page. Also feel free to post to the Drizzle Discuss mailing list any thoughts you have on the testing infrastructure. Thanks in advance for any suggestions.
[537][jpipes@serialcoder: runner]$ python drizzle_test_runner.py --command=list-suites
--------------------------------------------------------------------------------
Suite Name # Tests # Results # Sub-suites
--------------------------------------------------------------------------------
clients 9 12 3
optimizer 9 9 0
storage 27 57 7
replication 176 181 2
types 18 18 0
vcol 19 15 1
variables 5 5 0
stress 5 5 1
binlog 27 27 1
charsets 8 9 0
information_schema 3 3 0
sql 166 155 0
functions 23 25 0
--------------------------------------------------------------------------------
495 521 28
--------------------------------------------------------------------------------
Tuesday, October 7. 2008Performance Tuning Webinar for Commercial Application Developers Tomorrow
Tomorrow, at 1pm EDT/10am PDT, I'll be giving a webinar on performance tuning MySQL for commercial application developers. The webinar is open to all participants. I'll be covering my normal Join-fu material but will try to tailor the talk to developers working on applications for the commercial market. I'm actually looking forward to the different questions this group might propose. I can't say I'll have answers to all the questions, but I'll certainly try my best! Thursday, October 2. 2008Character Sets, Collations and the JörmungandrOne of the (many) ongoing discussions in the Drizzle developer community is the level of support the database server kernel should provide for non-Unicode character set encodings. Actually, when I say non-Unicode, I actually mean non-UTF8, since we've stripped out all other character sets and "standardized" on 4-byte UTF8. I'll come back to why exactly I put standardized in quotes in just a bit...but to sum up, in childish terms, my thoughts after spending 4 hours tonight reading about character sets and collations, here is an exchange between Toru and myself on Freenode #drizzle: <jaypipes> tmaesaka: how do you write "I wish everyone would just speak English" in Japanese? A Little BackgroundFor those of you new to the world of character sets and collations, I'll briefly summarize the concepts and terms I'll talk about in this article. Incidentally, I consider myself to be in this crowd, since I've never really had to deal with anything more than a cursory knowledge of them in reference to how they work in MySQL (not the internals). Character Sets and EncodingsA character set, or character encoding scheme, is a system for matching characters — such as "A" or "み" or "ß" — with a machine-readable code for the character. This machine-readable code can be represented simply as a decimal number, or in more complex character sets, a hexidecimal number. The "encoding" of the character set is the protocol, or instructions, that the character set uses in order to enable the computer to understand a series of byte sequences and interpret the sequence as a specific character.
Other more-complex character encodings are localized for a specific language, or writing system. For instance, the Shift_JIS character encoding scheme encodes, in 2 bytes, the ASCII character set (with 2 exceptions), the "half-width Katakana" characters, and the JIS X 0208 set of kanji symbols. Sound complicated? It is. And it gets even more complicated the further down the rabbit-hole one goes. Which leads me to Unicode... What the Heck is Unicode and UTF?Many folks think that Unicode is merely another character set or encoding scheme. It's not. It's actually more than that. It's an entire system which endeavours to standardize the way that computers can read, sort, and transform characters encoded in various character sets. Actually, The Unicode standard according to Wikipedia ...consists of a repertoire of more than 100,000 characters, a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering and bidirectional display order... Got all that? So, Unicode is a set of standards for dealing with lots of varying languages and characters, and transcoding character codes from one encoding scheme to another. What, then, is UTF[8|16|32]? UTF stands for Unicode Transformation Format, and is a set of mapping methods for translating one of Unicode's 1,114,112 code points (characters or control sequences) to a hexadecimal number. UTF8 is a variably-sized mapping method, which uses between one and four bytes to represent one of the code points. ASCII and most Western character sets take up 1 byte of storage, whilst CJK (Chinese/Japanese/Korean) characters typically consume 3 bytes of space per character. It is important to note that this 3 bytes is one more byte per character than encoding schemes like Shift_JIS, which use either 1 or 2 bytes for characters. Yoshinori Matsunobu published a short article today on these storage space differences. UTF16 is a variable-width mapping scheme which uses the first 16 bits of the hexadecimal number to represent what "category" or "plane" of characters the code point belongs to. UTF16 generally uses a little bit less storage space for CJK characters versus UTF8. However, when analyzing actual CJK text, which includes spaces and other ASCII characters, the storage difference seems to be negligible. UTF32 is a fixed-length mapping method which uses 4 bytes to store each code point. UTF8 is dominant in the web space, with all modern browsers able to understand and encode for UTF8. OK, So What is a Collation?So, if a character encoding scheme, such as UTF8, is used to identify a set of characters and symbols as a machine-readable sequence of bytes, then what exactly is a collation, and why are they important? Glad you asked. A collation , or collating sequence, refers to the order in which different characters in a character set should appear when sorted in a list. The alphabetic collating sequence is the one some of us, in our little English-only world, are familiar with. But in various regions of the world, the same set of characters may be ordered differently when appearing in a list of characters. And therefore, even with a character encoding scheme like UTF8, one must also specify a collation when listing textual results in a specific order. In MySQL, as well as Drizzle, the method for ordering results by a specific collation is fairly simple: one merely specifies the collation in the ORDER BY clause, like the example below shows:
mysql> CREATE TABLE utf8_tests (
-> my_text VARCHAR(100) NOT NULL
-> ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO utf8_tests VALUES ('comb'),('cukor'),('csak'),('folyik'),('folyó'),('folyosó'),('fő'),('födém');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM utf8_tests ORDER BY my_text COLLATE utf8_general_ci;
+----------+
| my_text |
+----------+
| comb |
| csak |
| cukor |
| födém |
| fő |
| folyó |
| folyik |
| folyosó |
+----------+
8 rows in set (0.00 sec)
mysql> SELECT * FROM utf8_tests ORDER BY my_text COLLATE utf8_hungarian_ci;
+----------+
| my_text |
+----------+
| comb |
| csak |
| cukor |
| fő |
| födém |
| folyó |
| folyik |
| folyosó |
+----------+
8 rows in set (0.00 sec)
You'll notice that the words "fő" and "födém" are reversed depending on the collation used in the ORDER BY clause. Any Hungarians reading this article? If there are, you'll likely have already spotted the problem with the above output. The problem is that it's wrong. "csak" should appear after cukor, since "cs" is a digraph (two-characters interpreted as one) which comes after "c" in the Hungarian alphabet. The above behaviour is known bug in MySQL since August 2005, over three years. The above bug is something I noticed while reading up on collations and comparing what's going on in MySQL/Drizzle to what the standard expects. The ICU project has a set of HTML pages where you can type in a list of words in a language and sort by various collations, and it will show you the correct sort order. I ran into the bug above, as well as a new bug in the German collation I found today. Where Drizzle Is Right NowCurrently, all but the UTF8 character set have been removed from Drizzle. Furthermore, the UTF8 implementation in Drizzle is full 4-byte UTF8, which differs from the 3-byte variety used in MySQL <= 5.1. There are two major benefits that this decision and subsequent removal has given Drizzle:
So, it seems that although we've stripped out a lot of complexity by moving to only UTF8 and its collations, we've inherited a system that, frankly, was never designed to handle complex collations. Instead, it is designed to be fast, not entirely accurate. So, what is a project to do? We have a number of options, all of which we've been debating over on the mailing lists:
libICU is, frankly, quite a large library, and it's not certain that the performance of it would be satisfactory. However, I can certainly envision taking libICU's test case suite and converting it to the Drizzle test suite format. This would certainly poke holes in our current character set handling that need to be discovered. Although Yoshinori-san's objections about UTF8 storage requirements versus localized Japanese character sets are valid, I don't think at this point that we'll re-introduce non-UTF8 character sets into the server at this time. If there is a huge uproar over this, in the future, pluggable character sets are a possibility, after changes to the plugin API to enable it. Pluggable collations too... This last option is the one which interests me the most, and I find most appealing. In fact, I compiled a small test program based on the C++ <locale> facilities which actually produces the correct collation order for the bug demonstrated above:
Compiling and running the program shows the correct sorted order for the words: [518][jpipes@serialcoder: /home/jpipes/repos/drizzle/test-hun]$ g++ test.cc [519][jpipes@serialcoder: /home/jpipes/repos/drizzle/test-hun]$ ./a.out comb cukor csak I'm thinking that the refactoring work that still needs to be completed around CHARSET_INFO and MY_CHARSET_HANDLER should experiment with the technique above and verify any performance regression (or improvement) that may occur. Accuracy, in my opinion, and the ability to let a library not written by Drizzle developers do the heavy lifting, is more important than a small performance increase. The Edwin Strikes Back
Notable voices on the thread include Matz, creator of Ruby and a core influencer in its direction, and Tim Bray, of our own Sun Microsystems and XML fame. The original poster, one Michael Selig, began the thread, entitled Character encodings - a radical suggestion, with an ostensibly simple suggestion: Remove internal support for non-ASCII encodings completely, and when reading/writing UTF-16 (and UTF-32) files automatically transcode to/from UTF-8. Unfortunately for Michael, this small suggestion was the online equivalent of stepping in a pile of elephant dung.
Until reading the above-mentioned forum thread, I really had no idea about the complexities involved in character set handling, especially in the Asian countries. If you are interested in character sets, collations, and Unicode vs. local encodings, reading through the forum thread will truly enlighten you as to the various arguments for and against UTF8. It's highly recommended reading, but be warned, it may leave you gasping for breath at some points...enjoy. So Long, and Thanks for all the FishWell, as Giuseppe announced, I am leaving the MySQL Community Team after almost three years. I'll still be working at Sun, but as a staff engineer on the Drizzle project in the Sun CTO organization. We are looking for someone to pick up the reins in the North American MySQL community and assume the role as Community Relations manager. Interested? Get in touch with Giuseppe or myself after reading his article about the requirements of the job.
I should add that candidates should be advised about Giuseppe. As your team lead, he may subject you such horrors as excellent project and managerial skills, a kind and encouraging shoulder on which to vent, and a deep, heartfelt connection with open source and community issues. In addition, you can look forward to working with Kaj, Lenz, and Colin, three of the hardest-working people at MySQL which will eventually make you feel like you just can't do enough to keep up. So, MySQL Community, thanks for all the Fish! Of course, I won't be too far at all. Working on Drizzle, it's pretty likely you'll be hearing from me. That is, if Brian and Monty let me out of my coding rabbit-hole... Tuesday, September 30. 2008Yo! Get Your MySQL Conference Submissions In.Although Colin Charles has taken over the illustrious duties of Program Chair for the MySQL Conference and Expo 2009, I'm participating in the conference as a member of the submission voting panel along with over a dozen other folks. The deadline for submitting abstracts is October 22nd. Yes, that's 23 days away. So, if you haven't submitted yet, please consider doing so. Giuseppe and others have done a good job outlining guidelines for you to follow in order to get a submission accepted. There is an online form for submitting an abstract. Get your submissions in today! Thursday, September 25. 2008Another Quick Feature Added to MySQL Forge
Small Feature Addition to MySQL Forge
Monday, September 22. 2008A Contributor's Guide to Launchpad.net and Bazaar SlidesToday at the Riga Sun Database Group Developer Meeting, I'm giving a MySQL University session about using Launchpad.net and Bazaar for Contributors. Below, I've posted links to the slides.
A Contributor's Guide to Launchpad and Bazaar Open Office Impress slides
PDF slides
Topics included in the slides:
Thursday, September 18. 2008Slides from Drunken Query Master and Join-fu Talks at ZendCon
Below are the Open Office Impress and PDF versions of the slide decks for Legend of Drunken Query Master and Join-fu for ZendCon. Enjoy.
Legend of Drunken Query Master: The Apprentice's Journey Open Office Impress slides
PDF slides
Join-fu: The Art of SQL - ZendCon 2008 Open Office Impress slides
PDF slides
Topics included in the slide decks:
Wednesday, September 10. 2008Enabling and Fixing Drizzle Test CasesWhen Brian began the work on refactoring the MySQL 6.0 Server source code into what has now become the Drizzle Project, a number of code pieces were removed, including some major MySQL functionality such as stored procedures, server-side prepared statements, SQL Mode, some legacy code, and a variety of data types. The goal, of course, was to reduce the server code base down to a more streamlined and eventually modular kernel. Of course, that vision is great, but it's got some side effects! One of those side effects is a dramatic reduction in the number of test cases that pass the test suite in their current form, and an increase in the number of tests that have been disabled. I re-enabled and fixed a few tests yesterday, but as of this writing, there are only 54 of 408 tests currently passing in the test suite.
This is to be expected. You can't just go and strip a huge chunk of the parser and functionality out of the server and expect the original test suite to run without problems This article will explain the process of running the Drizzle test suite and identifying test cases which can be re-enabled or should be fixed. We'll focus on stuff that you can help with as a contributor who wants to start getting involved in Drizzle and making an impact without having C/C++ coding experience. If you haven't caught my previous articles on using Launchpad.net for code management, I'd suggest reading those now. In addition, although we won't be doing C/C++ coding, you'll need a build environment established in order to properly run the test suite. So, I'd also suggest reading my article on setting up a C/C++ development environment for Drizzle. The Test Suite Basics
NOTE: This section describes the Drizzle test suite. However, if you are contributing to the MySQL Server project, the instructions in this section are exactly the same if you are working with the MySQL Server. Just change dtr to mtr. The Drizzle test suite is a composite of a main Perl script — tests/test-run.pl — and a couple other tools. After you have built Drizzle with the standard build process, you will see a program in the /tests source directory called dtr. This is the test suite runner. When you issue the command:
make test
This test runner is called with some command-line options and a list of tests to run. You can verify this behaviour by looking at /tests/Makefile.am and seeing the actual command for the test make target. The general form for running a test case is the following:
cd tests
There are a number of command-line options that the test suite runner accepts, and I'll cover a smattering of them in this article. Running a Test
So, how do you know what the names of the tests are? Good question!
./dtr select
Note that you do not need to add the .test suffix. You should see results similar to the following: [505][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk/tests]$ ./dtr select Logging: ./dtr select <snip> MySQL Version 7.0.0 Using dynamic switching of binlog format Using MTR_BUILD_THREAD = 0 Using MASTER_MYPORT = 9306 Using MASTER_MYPORT1 = 9307 Using SLAVE_MYPORT = 9308 Using SLAVE_MYPORT1 = 9309 Using SLAVE_MYPORT2 = 9310 Killing Possible Leftover Processes Removing Stale Files Creating Directories Saving snapshot of installed databases ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- main.select [ pass ] 9673 ------------------------------------------------------- Stopping All Servers All 1 tests were successful. The servers were restarted 1 times Spent 9.673 of 21 seconds executing testcases As you can see, the test suite fires up a Drizzle server, loads the test file and performs the tests contained in the file. The tests in the file generally consist of SQL statements that are executed against one or more servers, but they can also be commands such as creating a new connection, logging output, and other things. For this article, we'll be focusing on the SQL command tests. In a followup article, I may highlight some of the other test-case commands available to you. Failing Test CasesWell, it's all fine and dandy if a test case succeeds like in the example above, but like I mentioned in the introduction of this article, we're focused on the test cases that aren't succeeding and getting these test cases to succeed! So, how do we find those tests which are failing? One method is to look at the Drizzle Build Farm and track down failures occurring in the test runs. Another way is to simply run a series of tests and see what fails. For simplicity's sake, I've done a little research already and know a number of tests that are failing. So, we'll go ahead and take a look at a test case that I know needs some TLC. The test case I've chosen is the func_math test from the main test suite. It's small and provides a good example of how we can work to fix up the failures. Here is what I get when running this test: [505][jpipes@serialcoder: tests]$ ./dtr func_math Logging: ./dtr func_math <snip> MySQL Version 7.0.0 Using dynamic switching of binlog format Using MTR_BUILD_THREAD = 0 Using MASTER_MYPORT = 9306 Using MASTER_MYPORT1 = 9307 Using SLAVE_MYPORT = 9308 Using SLAVE_MYPORT1 = 9309 Using SLAVE_MYPORT2 = 9310 Killing Possible Leftover Processes Removing Stale Files Creating Directories Saving snapshot of installed databases ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- main.func_math [ fail ] drizzletest: At line 134: query 'create table t1 (a varchar(90), ts datetime not null, index (a)) engine=innodb default charset=utf8' failed: 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'charset=utf8' at line 1 The result from queries just before the failure was: < snip > 656 405 122 405 645 405 INSERT INTO t1 VALUES (3); SELECT CAST(RAND(2) * 1000 AS UNSIGNED), CAST(RAND(a) * 1000 AS UNSIGNED) FROM t1; CAST(RAND(2) * 1000 AS UNSIGNED) CAST(RAND(a) * 1000 AS UNSIGNED) 656 405 122 405 645 405 858 656 354 906 SELECT CAST(RAND(2) * 1000 AS UNSIGNED), CAST(RAND(a) * 1000 AS UNSIGNED) FROM t1 WHERE a = 1; CAST(RAND(2) * 1000 AS UNSIGNED) CAST(RAND(a) * 1000 AS UNSIGNED) 656 405 122 405 645 405 DROP TABLE t1; create table t1 (a varchar(90), ts datetime not null, index (a)) engine=innodb default charset=utf8; More results from queries before failure can be found in /home/jpipes/repos/drizzle/trunk/tests/var/log/func_math.log Stopping All Servers Restoring snapshot of databases Resuming Tests ------------------------------------------------------- Stopping All Servers Failed 1/1 tests, 0.00% were successful. The log files in var/log may give you some hint of what went wrong. If you want to report this error, please read first the documentation at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html The servers were restarted 1 times Spent 0.000 of 7 seconds executing testcases mysql-test-run in default mode: *** Failing the test(s): main.func_math mysql-test-run: *** ERROR: there were failing test cases As you can see, the test fails and outputs the source of the failure. Fixing a Broken TestNow that we've identified a failing test, we need to follow a process in order to fix it. The process you should follow is this:
In this case, the failure is due to a mere syntax issue. We've removed character set support and standardized entirely on UTF8, and so the support in the parser syntax for the phrase DEFAULT CHARSET=utf8 is gone. To fix this test, we need to remove the pieces of the old MySQL syntax which are no longer supported in Drizzle. So, we pop open our favorite editor and open up the /tests/t/func_math.test file. Go ahead and remove all instances of default charset=utf8. And then re-run the test with the --record. You should see the following: [508][jpipes@serialcoder: tests]$ ./dtr --record func_math <snip> ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- main.func_math [ fail ] drizzletest: At line 160: query 'create table t1 (f1 varchar(32) not null, f2 smallint(5) unsigned not null, f3 int(10) unsigned not null default '0') engine=myisam' failed: 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(5) unsigned not null, f3 int(10) unsigned not null default '0') engine=myisam' at line 2 <snip> Again, it looks like we've run into another syntax problem. Above, the test case contains the old ZEROFILL syntax, which allows you to specify a number in parentheses after an integer data type. This functionality, a legacy from Unireg times, is not supported in Drizzle. So, we must remove it. After removing the (XX) ZEROFILL syntax from the CREATE TABLE definitions in the test case file, I re-run the test:
<snip>
=======================================================
TEST RESULT TIME (ms)
-------------------------------------------------------
main.func_math [ fail ]
drizzletest: At line 230: query 'CREATE TABLE t1(a SET('a','b','c'))' failed: 1064:
You have an error in your SQL syntax; check the manual that corresponds to your
MySQL server version for the right syntax to use near 'SET('a','b','c'))' at line 1
<snip>
Once again, we've run into a failure. This time, it's because of the SET data type. This data type has been removed from Drizzle. So, we must remove it from the test case here. After doing so, I re-run the test case, and finally we see a success: [509][jpipes@serialcoder: tests]$ ./dtr --record func_math Logging: ./dtr --record func_math <snip> ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- main.func_math [ pass ] 107 ------------------------------------------------------- Stopping All Servers All 1 tests were successful. The servers were restarted 1 times Spent 0.107 of 8 seconds executing testcases Cool. Looks good. Now we edit tests/Makefile.am and add the newly successful test to the make test target.
cd tests
Here is what the section in the Makefile.am looks like, with the bolded line being the line I add for our newly successful test:
test-drizzle:
$(PERL) -I$(top_srcdir)/tests/lib \
$(top_srcdir)/tests/test-run.pl --fast --reorder --force \
1st \
alter_table \
bench_count_distinct \
bulk_replace \
comment_column2 \
comments \
consistent_snapshot \
count_distinct \
count_distinct2 \
count_distinct3 \
create_select_tmp \
ctype_filename \
delete \
distinct \
drizzleslap \
endspace \
flush2 \
func_equal \
func_group_innodb \
func_isnull \
func_like \
func_math \
greedy_optimizer \
group_min_max_innodb \
heap_auto_increment \
Alright, cool. OK, now we simply need to verify our test case and result file changes, edit our make test target, and commit our changes. First, verification: [511][jpipes@serialcoder: tests]$ bzr status modified: tests/Makefile.am tests/r/func_math.result tests/t/func_math.test Looks good. The final step is committing our work and then pushing to a code branch on Launchpad.net. Below, I am pushing to the branch lp:~drizzle-developers/drizzle/enable-tests, which is a team branch used to push code for the various test cleanups. [514][jpipes@serialcoder: tests]$ bzr commit Makefile.am t/func_math.test \ > r/func_math.result -m "Fixed syntax errors in func_math test and re-enable \ > the test in the make test target" Committing to: /home/jpipes/repos/drizzle/trunk/ modified tests/Makefile.am modified tests/r/func_math.result modified tests/t/func_math.test Committed revision 405. [515][jpipes@serialcoder: tests]$ bzr push lp:~drizzle-developers/drizzle/enable-tests Pushed up to revision 405. And that's that! Test fixed, case, result and Makefile.am edited, and changes committed.
Posted by Jay Pipes
in Articles, MySQL, Drizzle, Launchpad, Bazaar
at
15:56
| Comments (0)
| Trackbacks (0)
Tuesday, September 9. 2008MySQL Articles in French - Hey, I'm Bilingual!OK, I'm not really bilingual at all. But, fortunately for our French dolphins out there, some French MySQL users have translated some articles on MySQL into French! Developpez.com, a French-language developer website, now has translated my Rolling Sums and Aggregates in MySQL article from a number of months ago. Thanks to Joris Crozier and Arnaud Feltz for doing the translation! If I could remember anything from my years of French class in high school, I suppose I could read the article, but alas, the mind has a habit of forgetting those things which we don't use on a daily basis.
Apart from my article, there are lots of other MySQL resources for French-speaking developers. Check em out. Wednesday, September 3. 2008Got .NET? Got MySQL? Be a ContributorReggie Burnett is one of the those engineers at MySQL who deserves greater recognition. At a company dominated by LAMP-stackers, Reggie stands up and shouts for what some inside MySQL would view as a minority: developers using Windows to develop MySQL applications. The funny thing about this is that Windows developers aren't really a minority at all are they? Both MySQL and Sun have statistics which show that more than fifty percent of developers use the Windows as their primary development platform. Whether or not the applications they develop get deployed on a Windows server, developers are actively utilizing a Windows development environment, typically using the Visual Studio IDE. So, one thing that bothers Reggie, and me, is the lack of quality resources that we have to help support the MySQL on Windows developers. Hopefully, together we can change all that. Woefully, the MySQL Developer Zone does not have a section to provide help for .NET and Windows developers. That is changing as we speak; our web team is adding a section to the zone for this. But what we need is help in writing articles which focus on Windows and .NET development with MySQL. And that is where you come in, dear community. I'm asking for any developer who has experience working with MySQL on Windows, and especially developers working with Connector/.NET and Connector/ODBC, to give your fellow developers a helping hand by sharing that experience with them. The community team wants to work with you to get articles and links to helpful content published on the MySQL Developer Zone's .NET/Windows section. So...if you can answer Yes! to any of the below questions, please contact me at SELECT REVERSE('moc.lqsym@yaj'); with your contact information please!
Besides the obvious worldwide recognition and a feel-good emotion you get from getting an article published on a top 500 website, I'll send you a MySQL Community Contributor polo shirt and blog about your contribution! Email me today to get this initiative started! MySQL and Drizzle Developers: Upgrade to Bazaar 1.6.1 Now
Ubuntu users: grab the 1.6.1 Bazaar package for your Ubuntu version from the Launchpad.net Bazaar Project Package Archive. Mac OSX users:: grab the disk image here, with these instructions. Windows users: grab the Windows installer for 1.6.1rc from the Launchpad repository. Friday, August 29. 2008Huge Performance Improvement for Bazaar Coming SoonIn writing my last article, I mentioned working with John Arbash Meinel, one of the lead developers of Bazaar, in attempting to diagnose and fix the performance bottlenecks apparent in using Bazaar with larger, history-rich projects like the MySQL Server. Well, after running some tests and building a custom branch of Bazaar that John pointed me to, I am happy to tell you that help is just around the corner. In my last article, you saw that doing a bzr branch lp:mysql-server took 91 minutes. This was a significant barrier to entry, I recognize. So, I think you'll be happy to see the results below, taken yesterday using John's patched-up Bazaar branch: [529][jpipes@serialcoder: /home/jpipes/repos/mysql-server/5.1]$ time test_bzr branch lp:mysql-server/5.1 trunk Branched 2677 revision(s). real 23m25.094s user 3m53.715s sys 0m7.788s
Great work, John! MySQL contributors and engineers, you should see these performance patches appear in Bazaar 1.6 (1.6.1rc?) shortly. That should make a bunch of folks happier. Right, Mark? Thursday, August 28. 2008A Contributor's Guide to Launchpad.net - Part 2 - Code ManagementIn this second part of my Launchpad guidebook series, I'll be covering the code management and repository features of Launchpad.net. If you missed the first part of my series, go check it out and get established on Launchpad.net. Then pop back to this article to dive into the magic of http://code.launchpad.net. In this article, we'll cover the following aspects of the code management pieces of Launchpad:
For the following article, we'll be acting as if you are contributing to the MySQL Server project and wish to create a patch to fix a bug in the MySQL server. We'll be working through all the steps to do so. If you are looking to contribute to a different project, or your own project, simply replace the names and URLs in the article with ones for your particular project. The Structure of Project Source Code on Launchpad.netProjects hosted on Launchpad.net are organized using the terminology branches, repositories and series. A branch is simply a Bazaar branch of the project's source code. A repository is a collection of a project's Bazaar branches. A series is a named branch which represents something special for the project — usually a tagged release or a development branch.
This pull, code, push and merge process is the recommended way to manage code changes for a project. It allows a core set of "merge captains" to review and check your code before merging your code into the active development branch. It is this process which I will be demonstrating in this article. Pulling Code into a Local RepositoryWhen you work on a project in Launchpad, you work on code in a local branch of the project. To get rolling, you will first want to set up a local repository if you haven't already done so. To do so, we use the bzr init-repo command: NOTE: You will need Bazaar installed to do so. Don't have Bazaar installed? See my previous article on Getting a C/C++ Development Environment Established. Not coding in C/C++? Don't worry, just read the section of that article on installing Bazaar.
cd ~/repos # Change this to the folder in which you plan to have bzr repositories...
At this point, a shared repository will be created. What the heck is a shared repository? Well, it's basically a special folder that Bazaar knows contains information about source code branches. It facilitates speedier branching and merging and is especially useful for larger source trees and changeset histories like the MySQL server. You can verify that Bazaar knows something about your newly created repository by checking for a .bzr hidden folder in your repository folder: [504][jpipes@serialcoder: /home/jpipes/repos]$ ls -la mysql-server-5.1/ total 12 drwxr-xr-x 3 jpipes jpipes 4096 2008-08-26 15:00 . drwxr-xr-x 7 jpipes jpipes 4096 2008-08-26 15:00 .. drwxr-xr-x 4 jpipes jpipes 4096 2008-08-26 15:00 .bzr The next step is pulling the active development series of your particular project. In our example, we'll pull the active development branch of the "5.1" series of the MySQL Server. To do so, we use the bzr branch command:
cd mysql-server-5.1/
WARNING: When doing the initial pull of the first branch in a shared repository, the branch command can take quite some time to execute, especially when pulling a branch of a project like the MySQL Server which has a huge history of changesets to it. Be prepared to wait a while, and if Bazaar looks like it's stuck doing stuff, just leave it alone. In the bzr branch command above, lp: designates we are looking for a branch residing on the Launchpad.net Bazaar servers. The colon is followed by the name of the project, in this case mysql-server, followed by a slash and the name of the series, in this case 5.1. You can always check the names of a project's series by going to the main code area of a project. For MySQL Server, that address would be http://code.launchpad.net/mysql-server When finished the initial branch, you'll see something like the following, shown with the time command to illustrate the amount of time you should expect for the MySQL Server initial branch: [511][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1]$ time bzr branch lp:mysql-server/5.1 trunk Server is too old for streaming pull, reconnecting. (Upgrade the server to Bazaar 1.2 to avoid this) Branched 2719 revision(s). real 91m30.337s user 14m6.825s sys 6m7.355s The total amount of "stuff" that is downloaded for the 5.1 server is around 600MB, so it shouldn't be surprising that it takes some time to do the initial branch...
You can ignore the message about upgrading the server to Bazaar 1.2; it's because I'm not using the very recent Bazaar 1.6 client. The performance of this first branch is currently being investigated by John Arbash Meinel, one of Bazaar's developers, whom I spent some "quality time" with on IRC today. Creating a Local Working Branch for Bug Fixing
OK, if you've gotten this far, then you will have a local shared repository that contains a single branch which contains the source code and changeset history for the 5.1 series of the MySQL Server. What we want to do now is fix a bug in the MySQL Server 5.1 locally on our workstation. This will be a fictitious bug called Bug#99999 - "authors.h file doesn't contain MY NAME!" Certainly, we could just start hacking up the code in the trunk branch we just pulled. But, that's not the most practical way of doing structured development on a local workstation. Instead, you should create a branch from trunk which will house only the changes specific to what you are working on: in this case, our fictitious bug#99999. Why is this a better practice than simply making the changes in the trunk branch? Well, a couple reasons:
OK, so hopefully I've convinced you to follow the advice of creating separate local working branches for actually changing source code.
bzr branch trunk bug99999-fix-authors-file
Once completed, you should see something like the following: [571][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1]$ bzr branch trunk bug99999-fix-authors-file Branched 2719 revision(s). We're now ready to start fixing our bug. Hop into the newly created working branch and open up the sql/authors.h file in your editor of choice (here, I'll use Vim)
cd bug99999-fix-authors-file
The "fix" for this bug is simply adding your name to the sql/authors.h file of course. Go ahead and add your name to the list in that file and save and close the file. Now, that has to be the easiest bug fix ever. No wait, I take that back. Before we commit anything, let's first check to see what changes we have made in the local branch. To do so, we use the bzr status command, like so:
bzr status
If you've done everything up until now, you should see something very similar to the below: [503][jpipes@serialcoder: /home/jpipes/repos/mysql-server-5.1/bug99999-fix-authors-file]$ bzr status modified: sql/authors.h The next thing we'll need to do is commit our changes to the local branch. For those of you used to CVS or Subversion, this step will look familiar, however remember that with Bazaar you are committing to the local branch, not a central repository. (This isn't always the case, but for now, assume it is...) Like in other source control systems, we'll use the commit command. There are a number of command options that you can use with the commit command, and I will outline two of them here. The most important is the -m option, which allows you to enter a string which will be the comment for the set of changes in this commit. This is an extremely useful option for smaller changesets. For larger ones, leave off the -m option and your environment's editor will pop up after hitting enter to allow you to enter in larger comments. TIP: Remember, a best practice whenever you commit source code to a revision control system is to make the changeset comments as descriptive as possible, so other developers can clearly tell what you were doing.
The second option I'll tell you about is a nifty one which integrates Bazaar with the Launchpad.net bug tracking system. If you have a Bug report that is managed by Launchpad.net, you can supply the bug number to the --fixes option and automatically close a bug report with your bzr commit. Pretty cool, eh? Below, I show how to use these two options with the bzr commit command:
bzr commit -m "Add my name to authors.h" --fixes lp:99999
You should see something like the following appear: Committing to: /home/jpipes/repos/mysql-server/bug99999-fix-authors-file/ modified sql/authors.h Committed revision 2720. Did you know that Bazaar supports more bug tracking systems than just Launchpad.net? Check out the Bazaar user guide section on Bug Tracking integration for more information. Pushing Code to LaunchpadNow that you've made your code changes, it's time to push those changes up to a branch on Launchpad.net. Why do we want to push the local branch changes to Launchpad.net, instead of sending the changes to another team member (using the bzr send or bzr export command)? Well, first of all, pushing the changes to Launchpad.net allows anyone to see and review your code changes, making the Launchpad.net website an easy and centralized place to do that. Secondly, having the branch on Launchpad.net allows you to get more out of the Launchpad.net platform and integrate your branch and code with other features of the platform, such as Bug Tracking and the Blueprints task management and milestone system. To get your local branch to Launchpad.net, you use the bzr push command. The push command takes as an argument the address of the branch to which you wish to push your local changes. If you are pushing changes to a new branch on Launchpad.net, the system will create that new branch for you automatically. If you are pushing to an existing branch, the system simply uploads your changesets and applies them to that branch. On Launchpad, there are a number of locations where we can push Bazaar branches. Each Launchpad user gets their own http://code.launchpad.net/~username area in which to put branches. Whenever you are a member of a project team, you also can push code into the code.launchpad.net/~username/projectname/ location. Also, each Launchpad user has a "Junk" area (code.launchpad.net/~username/+junk/ that they can post any old branch to. After either of these locations, you put the name of the branch you are pushing to (or creating). To see an example, let's use our "Junk" area for right now, and push our local bug99999-fix-authors-file branch to Launchpad.net. We want to push our local bug-fix branch to a general branch in our junk folder which will contain all of our bug fixing efforts. Why? Well, there's no need to create separate branches for each bug fix on Launchpad.net just so people can see the code in the single bug fix. We can push all our code changes to a general branch and then point reviewers to the specific revision we worked on. This saves a whole lot of time when pushing branches. So, here is how we push:
bzr push lp:~jaypipes/+junk/mysql-server # Of course, replace jaypipes with your own username!
After a while, you'll see the following: Created new branch. You can now go view your branch and it's associated code changes by visiting the branch's code URL, which will be http://code.launchpad.net/~yourusername/+junk/mysql-server if you've been following the steps in this article. Belonging to a Project TeamHaving a branch in your junk folder is fine, but Launchpad is all about belonging to a community of developers! When you belong to a project team, you are automatically able to push your branches to the project's code area. Importantly, if you belong to a project team, then you can use Launchpad's ability to propose a branch merge — something you cannot do if you push to your "Junk" folder. To push one of your local branches to a project, you would do:
cd ~/repos/projectname/branchname>
Once pushed, the branch will be visible to anyone when they look at the project's code branch listing, which is always at http://code.launchpad.net/projectname. If you click on your branch, you will go to the branch's main page. You'll notice links for a number of actions that you can take, including one called "Propose for merging into another branch". I'll be talking a lot about these options in future articles in this series, since many of them relate to the other parts of the Launchpad platform. Notifying a Merge Captain of Your Code Pushes
OK, now that your code is up in a branch, the next step is to ask for a review of your code to be merged into the development branch of the project. In this case, I'm pretty sure our Bug#99999 fix isn't going be passing any code reviews to get into the MySQL Server, but I'll explain the process anyway for reference. A merge captain is someone who chaperons a project's main development branch by being a gatekeeper for new code being merged into it. This is a critical role that some folks think is a fun or coveted job. It's not. As a merge captain for the Drizzle project, I can tell you that in fact it's mainly grunt work comprising tedious pulling, merging, building the code, and running of test suites. So, be nice to your merge captains! The best way to be nice to your merge captain is to follow these two simple rules:
In the case of the second rule, the standard process uses the Launchpad.net platform to notify the merge captain of a request — instead of, for instance, bugging the crap out of the captain on IRC to merge your branch. To notify the merge captain about your branch, navigate to the main code page of the branch you pushed. Remember that to do this step, you must be a member of the project team! On the branch's main page, look for the link "Propose merging into another branch" and click it. You will be taken to the Propose Branch for Merging page. Select the target branch. The default will be the active development branch for the project in question. You may also provide a branch name. In the "whiteboard" text area, you can provide a brief description of the changes contained in your branch. Typically, you will want to keep the option "Needs Review" selected, and then click the Register button. At this point, an email will be fired off to any subscribers for the project's trunk branch commits. The merge captain will, of course, be one of the ones which receives this email and will initiate the review of your code and the merging of it into trunk (or whichever target you specified). Keeping Trunk Up to DateThe final two parts of this article looks at how to keep your own local branches up to date with the development branch of your project and also shows you how to merge someone else's branch with your own. To pull all the latest changes from a remote branch into a local branch, the bzr pull command is used. You navigate to the local branch you wish to pull changes for and then issue the pull command. This will bring in all the merges and changesets of the branch you originally branched from into the local branch. In our case, the local "trunk" branch was branched originally from the branch at lp:mysql-server/5.1 and so if we do:
cd ~/repos/mysql-server-5.1/trunk
We will update our local trunk branch with the changes in the active development branch on Launchpad. Merging Local Code with a Trunk BranchA merge, as alluded to above, is simply when you want to combine the code changes of one branch with the code in another. We use the bzr merge command to do so. The merge command takes a single argument: the location of the branch you wish to merge into the current one. To demonstrate how the merge command is used, I'll just paste the work I just did for the Drizzle project in merging in a contributor's fixes into trunk: [509][jpipes@serialcoder: /home/jpipes/repos/drizzle]$ cd trunk/ [510][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk]$ bzr merge ../grant-bug261687/ M drizzled/sql_derived.cc All changes applied successfully. [511][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk]$ bzr commit -m \ "Merged Grant's fixes for sql_derived. Fixes bug#261687" --fixes lp:261687 Committing to: /home/jpipes/repos/drizzle/trunk/ modified drizzled/sql_derived.cc Committed revision 373. The above was executed after a pulled Grant's Launchpad.net branch called bug261687, built the code, and ran the test suite to ensure no failures. I change to my local branch of the drizzle project development branch (called "trunk" locally). I then merge Grant's branch into trunk locally with the bzr merge ../grant-bug261687 command. After merging, I must commit the trunk code. I do so, making a comment that this is a merge of Grant's work, and noting the bug # which the changeset fixes. Once I do this, I am free to bzr push lp:drizzle and push the local merge to Launchpad... ConclusionAs you can see, Launchpad and Bazaar is a feature-full code management system with a lot of bells and whistles. Hopefully, this article can get you started in your adventures in contributing to projects hosted on Bazaar/Launchpad.net. See you on Launchpad! Up next in this series: how to do task management with Launchpad's Blueprints system.
Posted by Jay Pipes
in Articles, MySQL, Drizzle, C/C++, Launchpad, Bazaar
at
20:07
| Comments (0)
| Trackbacks (0)
Friday, August 22. 2008A Contributor's Guide to Launchpad.net - Part 1 - Getting StartedThis post is the first in a series of articles which serves to highlight the services of the Launchpad platform which hosts the MySQL Server, MySQL Forge, MySQL Sandbox and Drizzle Server projects. I will be walking you through the various pieces of the platform and provide examples of using each of the services. I will cover in depth the source code management services which all three projects now rely upon. The code management services are the critical piece of the development platform. In addition, I will show you how to use the Blueprints, Bugs, Answers and Translations services that many MySQL ecosystem projects, including the MySQL Sandbox and MySQL Forge, use. In this first article, I will walk through the critical first step of establishing a Launchpad.net account and setting up the OpenSSH and OpenPGP keys for your account. In follow-up posts, I cover the code management system, blueprints system, and more. Creating Your Account on Launchpad.netThe first thing to do is obviously create your account on Launchpad.net. Doing so is trivially easy. Go to the registration page and enter in your email address. Launchpad will then email (subject line: "Finish your Launchpad registration") that address with a link to start the registration process. Click on the link in that email. You'll be asked to provide a Display Name and a password. You can choose to have your email address hidden from other users or not. After filling in the information, click the Continue button and you'll end up in your account profile area. Once in the profile area, go ahead and fill in any of the information you want to be public about yourself by clicking the "Change Details" link in the right of the page. You can upload an avatar image for yourself and fill in a little "Bio" section. In a sub-navigation area at the top of the page, you will see links to edit your Email Settings, SSH Keys, GPG Keys, and Passwords. We'll cover the SSH and GPG Keys in a bit. For now, explore this area and set your preferences the way you like. Also, at the very bottom of the Change Details page, you also have some other links to edit your IRC nicks and other stuff. Your OpenSSH KeysOne important thing to do when setting up your account is to upload your public SSH key to Launchpad.net. This helps the code management system by facilitating the bzr+ssh protocol and allowing you to push a bazaar branch to the Launchpad.net supermirror (more on that later). If you have already generated your public key, go ahead and skip the next subsection and proceed to "Upload your public key". To generate your SSH key, you will need to have OpenSSH installed. For Ubuntu users, simply do sudo apt-get install openssh. For other Linux users, use your package manager of choice. Windows users can use the PuTTY key generator, and should follow instructions on the excellent Launchpad.net Help wiki. Once you have OpenSSH installed, it's time to generate your key. Do so with the following in a terminal:
ssh-keygen -t dsa
When prompted, to accept the default key file names (~/.ssh/id_dsa.pub and ~/.ssh/id_dsa, and then a password for protecting the key file. Upload your public keyYou can output your public key using the following:
cat ~/.ssh/id_dsa.pub
Go ahead and copy the public key as-is; you'll need it soon. Now that you've generated your SSH keys, go ahead and click on the "SSH Keys" in the sub-navigation bar in your profile. Simply copy your public key into the text area marked "Add an SSH Key" and then click "Import Public Key". OK, all set, let's tackle the GPG Keys next. GPG Keys (Optional, but Recommended)Before we generate a GPG key and upload one to Launchpad.net, you will first want to ensure that you have a mail reader capable of decrypting PGP-encrypted emails. Personally, I use Thunderbird and the excellent Enigmail plugin for this, but you will want to use your own preferred MUA. Use this help article for assistance in setting up PGP for your mail client of choice. OK, next up we'll go ahead generate GnuPG keys for use with email security and the Launchpad.net mailing lists. To generate a GnuPG key pair, issue the following in a terminal:
gpg --gen-key
This will start a series of questions for you to answer, including which email address the key is for, your name, a passphrase and how many bits to make your key file (I chose 2048). Here is what the series will look like in a terminal:
[510][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --gen-key
gpg (GnuPG) 1.4.6; Copyright (C) 2006 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.
Please select what kind of key you want:
(1) DSA and Elgamal (default)
(2) DSA (sign only)
(5) RSA (sign only)
Your selection? 1
DSA keypair will have 1024 bits.
ELG-E keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048)
Requested keysize is 2048 bits
Please specify how long the key should be valid.
0 = key does not expire
OK, once generated, verify that everything worked as expected using the following:
gpg --list-keys
which should produce output similar to the following: [512][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --list-keys /home/jpipes/.gnupg/pubring.gpg ------------------------------- pub 1024D/9C5804A8 2008-08-22 uid Jay Pipes The next step is to push your public key to the Ubuntu key server — technically, you could push it to any old key server, but to keep things simple, just use the ubuntu key server. You'll need to take note of your public key ID, which is the 8-character hex number following 1024D in the list-keys output. In my case, that public key ID is "9C5804A8". Issue the following command, substituting your public key ID:
gpg --send-keys --keyserver keyserver.ubuntu.com @Your_Public_Key
The final step is to make Launchpad.net aware of your new GPG key. To do so, you need to send your GPG fingerprint to Launchpad. To grab your fingerprint, issue the following command:
gpg --fingerprint
which should produce something very similar to:
[514][jpipes@serialcoder: /home/jpipes/.gnupg]$ gpg --fingerprint
/home/jpipes/.gnupg/pubring.gpg
-------------------------------
pub 1024D/9C5804A8 2008-08-22
Key fingerprint = 16C5 50D4 7061 03A1 48CD 3826 0CAD 7BD9 9C58 04A8
uid Jay Pipes
Copy the key fingerprint and paste it into the text area in your Open GPG keys page in your profile, then click the "Import Key" button. Launchpad.net will email the address of the key with a confirmation message. Click the link in the email under "Please go here to finish adding the key to your Launchpad account:". And you are all done.
Now you have all the pieces set up to begin working with Launchpad.net effectively. In the next few posts in this series, I'll walk you through how to best use the platform to be a productive contributor to the MySQL ecosystem.
Posted by Jay Pipes
in Articles, MySQL, Drizzle, MySQL Forge, Launchpad, Bazaar
at
10:41
| Comment (1)
| Trackbacks (0)
Thursday, August 21. 2008Getting a Working C/C++ Development Environment for Developing DrizzleThis article explains how to set up a properly functioning C/C++ development environment on Linux. The article is aimed at developers interested in contributing to the Drizzle server project, but the vast majority of the content applies equally well to developers wishing to contribute to the MySQL server or any other open source project written in C/C++
IMPORTANT: This article doesn't get into any religious battles over IDEs or particular editors. IDEs and editors are what you use to edit code. What this article covers is the surrounding libraries, toolchain, and dependencies needed to get into the development or contirbution process. That said, go Vim. The examples shown use the Debian/Ubuntu methods of obtaining code packages and installing them — specifically, using apt-get install to install packages. If you are running on a RPM-based distro, simply change the commands to use your package manager of preference, for instance yum install. Solaris users should get as far as they can in the installation/setup process and hop over to the Freenode #drizzle channel for help from one of Solaris experts. OK, let's get started, shall we? Installing Bazaar and Some Bzr GoodiesSo, what is this Bazaar thing and why do you need it? Good question! Bazaar is the revision control system that the Drizzle development team (as well as the MySQL engineering team) uses for source code control. The good folks over at Canonical maintain and enhance the excellent Bazaar-NG system and have an online platform called Launchpad.net which is tightly integrated with Bazaar. Launchpad.net is kind of like SourceForge.net, only focused around Bazaar as the revision control system, and includes a number of nifty little features that make it easier to manage and maintain teams of developers working on the code base. The Drizzle Server project is hosted on Launchpad.net at http://launchpad.net/drizzle. To install Bazaar, issue the following:
sudo apt-get install bzr
Once installed, you might want to install a few more things that will make your bzr life easier. The bzrtools package is a collection of command-line and graphical utilities for bzr. meld is a graphical merge conflict resolution utility that I have found invaluable at times. PoEdit is an easy way to work with the GetText translation utilities. To install these tools, do:
sudo apt-get install bzrtools meld poedit
OK, that's it for Bazaar for now. Let's move on to getting your development toolchain installed. The Required Toolchain Packages and Library DependenciesIn order to compile the Drizzle server, you will need a working GNU Toolchain with the following development tools. I've noted recommended minimum versions for each.
To install the above toolchain, do the following:
sudo apt-get install libc-dev gcc g++ bison binutils automake autoconf m4 pkg-config libtool
Once apt-get finishes installing the above, you'll have a system capable of compiling C/C++ programs, for the most part. The Drizzle server needs some additional libraries and header files in order to compile. I list them here along with a brief description of the library or file.
The following command should install the required libraries with the exception of Google Proto Buffers, which is described in the following section.
sudo apt-get install libpcre3-dev libpam0g libncurses5-dev libpam0g-dev gettext libevent-dev libz-dev libreadline-dev uuid-dev
Installing Google Proto BuffersAfter installing the libraries and toolchain, you'll need to install the Google Proto Buffers library. Unfortunately, there are not currently distribution packages for this library, so we'll install it manually. Here are the steps to do this. Grab the tarball from the following location: http://code.google.com/p/protobuf/downloads/list. Once downloaded, untar the package. You can do all of it like so:
wget http://protobuf.googlecode.com/files/protobuf-2.0.0beta.tar.gz
Next, let's go ahead and compile the sources:
cd protobuf-2.0.0beta
Once this is done, Google Proto Buffers library (libprotobuf.so.0) will be installed, likely in /usr/local/lib unless you specified a different default directory. In addition, the program protoc, which is the .proto file compiler, will be installed.
IMPORTANT: As developers on a Linux distribution, we tend to forget that when installing a new shared library object, we need to run ldconfig after installation. Why do we forget this? Well, generally, our package manager runs ldconfig automatically after installing a package, so we tend to forget about it.
sudo ldconfig
OK, the proto buffers library and protoc should be installed correctly at this point. To verify, run the following:
sudo updatedb
It should produce something like the following: [504][jpipes@serialcoder: /home/jpipes/Desktop/protobuf-2.0.0beta]$ locate libprotobuf.so.0 /home/jpipes/Desktop/protobuf-2.0.0beta/src/.libs/libprotobuf.so.0 /home/jpipes/Desktop/protobuf-2.0.0beta/src/.libs/libprotobuf.so.0.0.0 /usr/local/lib/libprotobuf.so.0 /usr/local/lib/libprotobuf.so.0.0.0 Setting Up a Local Bazaar Repository for DrizzleNow that you've installed all the required toolchain and dependencies, it's time to use Bazaar to pull the development branch of Drizzle and compile the Drizzle server. The first step to do is to set up that local bzr repository. Myself, I have all my bzr repositories in a directory called ~/repos, and that is what the below examples show, but you are of course welcome to put your repos wherever you prefer. To set up a directory and a drizzle repo under your home directory, do the following:
cd ~
At this point, you have a local bzr repository. Let's now create a local branch of the development source code trunk that we can play with. To do so, we use the bzr branch command, like so:
bzr branch lp:drizzle trunk
This tells bzr to go grab the main development branch of the "drizzle" project that resides on the Launchpad.net servers (thus, the lp: prefix), and create a local branch called "trunk". The branch operation may take a little while to complete when you do it for the first time. Subsequent branch and merge operations are much, much quicker than the first branch into a repository. When the branch succeeds, go ahead and look at the files that have been downloaded into your "trunk" branch:
cd trunk
You should see something like the following: [520][jpipes@serialcoder: /home/jpipes/repos/drizzle/trunk]$ ls -la total 308 drwxr-xr-x 17 jpipes jpipes 4096 2008-08-21 12:56 . drwxr-xr-x 13 jpipes jpipes 4096 2008-08-19 14:38 .. -rw-r--r-- 1 jpipes jpipes 76502 2008-08-04 11:02 ABOUT-NLS -rw-r--r-- 1 jpipes jpipes 347 2008-07-18 16:25 AUTHORS drwxr-xr-x 5 jpipes jpipes 4096 2008-06-30 17:47 .bzr -rw-r--r-- 1 jpipes jpipes 2945 2008-08-21 12:56 .bzrignore drwxr-xr-x 2 jpipes jpipes 4096 2008-08-17 13:20 client drwxr-xr-x 2 jpipes jpipes 4096 2008-08-05 09:57 config -rw-r--r-- 1 jpipes jpipes 42397 2008-08-14 10:01 configure.ac -rw-r--r-- 1 jpipes jpipes 19071 2008-06-30 17:47 COPYING -rw-r--r-- 1 jpipes jpipes 56574 2008-08-13 17:47 Doxyfile drwxr-xr-x 5 jpipes jpipes 4096 2008-08-21 12:56 drizzled -rw-r--r-- 1 jpipes jpipes 5854 2008-07-22 19:34 DRIZZLE.FAQ -rw-r--r-- 1 jpipes jpipes 5139 2008-06-30 17:47 EXCEPTIONS-CLIENT drwxr-xr-x 2 jpipes jpipes 4096 2008-08-13 17:47 extra drwxr-xr-x 2 jpipes jpipes 4096 2008-08-19 13:14 libdrizzle drwxr-xr-x 2 jpipes jpipes 4096 2008-08-14 10:01 m4 -rw-r--r-- 1 jpipes jpipes 2435 2008-08-14 10:01 Makefile.am drwxr-xr-x 2 jpipes jpipes 4096 2008-08-14 10:01 mystrings drwxr-xr-x 2 jpipes jpipes 4096 2008-08-17 13:20 mysys -rw-r--r-- 1 jpipes jpipes 41 2008-07-21 11:38 NEWS -rw-r--r-- 1 jpipes jpipes 402 2008-08-13 17:47 notes.textile drwxr-xr-x 6 jpipes jpipes 4096 2008-08-04 11:02 plugin drwxr-xr-x 2 jpipes jpipes 4096 2008-08-21 12:56 po -rw-r--r-- 1 jpipes jpipes 0 2008-07-21 11:38 README drwxr-xr-x 8 jpipes jpipes 4096 2008-07-28 10:07 storage drwxr-xr-x 2 jpipes jpipes 4096 2008-08-14 10:01 support-files drwxr-xr-x 10 jpipes jpipes 4096 2008-08-19 13:14 tests drwxr-xr-x 2 jpipes jpipes 4096 2008-08-13 17:47 vio Compiling DrizzleOK, you are now ready to compile the server and client tools contained in your branch. The way to do so is the following:
./config/autorun.sh # This will set up the autoconf and make environment for drizzle
If all goes well, drizzle will compile and build, get installed, and output the (hopefully!) passing test results. The output at the end should be similar to the following: MySQL Version 7.0.0 Using dynamic switching of binlog format Using MTR_BUILD_THREAD = 0 Using MASTER_MYPORT = 9306 Using MASTER_MYPORT1 = 9307 Using SLAVE_MYPORT = 9308 Using SLAVE_MYPORT1 = 9309 Using SLAVE_MYPORT2 = 9310 Killing Possible Leftover Processes Removing Stale Files Creating Directories Saving snapshot of installed databases ======================================================= TEST RESULT TIME (ms) ------------------------------------------------------- main.lock_tables_lost_commit [ pass ] 15 main.information_schema [ pass ] 2132 main.alter_table [ pass ] 1066 main.delete [ pass ] 253 main.distinct [ pass ] 345 main.insert [ pass ] 1246 main.join [ pass ] 491 main.join_crash [ pass ] 25 main.join_nested [ pass ] 396 main.join_outer [ pass ] 635 main.join_outer_innodb [ pass ] 16 main.lock [ pass ] 93 main.mysqlslap [ pass ] 19564 main.select [ pass ] 7198 main.subselect [ pass ] 12230 main.subselect3 [ pass ] 379 main.subselect_innodb [ pass ] 174 main.type_newdecimal [ pass ] 413 main.update [ pass ] 536 ------------------------------------------------------- Stopping All Servers All 19 tests were successful. The servers were restarted 3 times Spent 47.207 of 62 seconds executing testcases One thing that is kind of bothersome is that building directly from within your branch directory means that all the files involved in compilation and linking will be sitting in your source branch directory. This makes things a little annoying when you then want to make changes to the source code and commit. bzr will notice a bunch of files it doesn't know about sitting in it's source code directory. To clean up these files, you can do:
make distclean
I have found, though, that the above commands don't always fully clean the source tree. If this is too annoying, you can adopt an approach that I myself use, which is to build drizzle into a directory on a /tmp disk and don't mess up your source branch directory. I use the following shell script to do this:
#!/bin/sh
# build_and_test.sh
build_dir=/tmp
if [ $# -ne 1 ]
then
echo "Usage $0 [build_target]"
exit 1
fi
build_target=$1
build_dest=$build_dir/$build_target
echo "Building $build_target to: $build_dest"
if [ -d $build_target ]
then
rm -rf $build_dest
cp -r $build_target $build_dest
cd $build_dest
./config/autorun.sh
./configure # --disable-pedantic-warnings # --enable-debug
make clean && make -j 2 && make install
cd tests
./dtr --force --reorder --fast \
alter_table \
join_crash \
join_nested \
join_outer_innodb \
delete \
join_outer \
distinct \
type_newdecimal \
join \
heap \
subselect \
subselect3 \
subselect_innodb \
insert \
select \
update \
information_schema \
mysqlslap \
lock \
lock_tables_lost_commit
else
echo "$build_target is not a directory."
exit 1
fi
exit 0
I place the above script in my drizzle repository root directory and of course chmod u+x it. Then, if I want to build a branch I am working on, say a branch called bug200500-fix-something, I simply call:
cd ~/repos/drizzle
And that's it. Sunday, August 17. 2008 |