Thursday, June 25. 2009Performance Tuning webinar today 1pm EST/ 10am PDTJust a quick note to say I'll be giving a 1 hour webinar on performance tuning practices for MySQL at 1pm EST, 10am PDT today. Here is a link to sign up for it: http://www.mysql.com/news-and-events/web-seminars/display-373.html
Hope to see you on the webinar Wednesday, June 10. 2009Drizzle Performance Regression Solved - TCMalloc vs. No TCMallocAs many of you who follow the drizzle-discuss mailing list know, for the last three months, Drizzle developers have been hot (and many times cold) on the tail of a performance problem that we were seeing when comparing Drizzle with MySQL (any version of MySQL, not just 5.4). Briefly, on certain machines, we were seeing Drizzle performing at approximately 50% of MySQL, with throughput measured in transactions per second on both a readonly and readwrite workload with Sysbench. The frustrating part of the results was that on other machines, even other machines with virtually identical architectures, compilers, and operating system, we were seeing Drizzle outperforming MySQL by around 20-30%. So, the Sun Drizzle team, friends in the Sun PAE (performance applications engineering?) team and various contributors set out to iteratively analyze exactly what was going on during sysbench runs in both MySQL and Drizzle. We analyzed the call stacks of both servers using callgrind, cachegrind, vmstat, and other tools, looking for differences which could explain the differences in performance. These efforts ultimately showed either red herrings, or showed results which did not indicate where such a dramatic performance difference was coming from. The efforts did, however, lead us to a much better understanding of the calling patterns, lock contention, and other parts of the servers, and this work will, I am sure, prove very valuable in the coming months as we continue to refine the execution of the server and remove more global contention points. So, what was the eventual culprit? Turns out that Drizzle was, by default, linking the TCMalloc library from Google (libgoogle-perftools-dev package on Debian systems) when it was found installed on the machine. This makes sense. We thought that the TCMalloc library would provide a benefit to our Session-mem_root-based allocation strategy adopted from the MySQL core kernel. Unfortunately, it turns out that TCMalloc dramatically degrades the throughput of the server. When we disabled tcmalloc in our build and re-ran the benchmarks, our numbers went through the roof. The machine these results are from has the following specifications:
The benchmarks were run for a minimum of 10 iterations of 60 seconds each on a readonly and readwrite workload with 1M rows in the table. The configuration was:
The Drizzle Automation Suite was used for benchmarking. Below, you'll see the before and afters of disabling TCMalloc on both a readonly and readwrite workload. We think you'll agree the results are, well, dramatic. READONLY workloadWith TCMalloc linked: +------+-------------+ | c | tps | +------+-------------+ | 2 | 1067.444444 | | 4 | 1435.190000 | | 8 | 1937.624444 | | 16 | 2601.817778 | | 32 | 3367.795556 | | 64 | 3930.240000 | | 128 | 3940.764444 | | 256 | 3071.503333 | | 512 | 2003.308889 | | 1024 | 1224.704444 | | 2048 | 530.994444 | +------+-------------+ Without TCMalloc linked: +------+-------------+ | c | tps | +------+-------------+ | 2 | 1511.681000 | | 4 | 2714.570000 | | 8 | 4408.986000 | | 16 | 5795.430000 | | 32 | 5619.712000 | | 64 | 4988.760000 | | 128 | 4483.512000 | | 256 | 3914.125000 | | 512 | 2541.946000 | | 1024 | 1325.511000 | | 2048 | 643.446000 | +------+-------------+ And for you pretty graph people: READWRITE workloadWith TCMalloc linked: +------+-------------+ | c | tps | +------+-------------+ | 2 | 589.006667 | | 4 | 842.576667 | | 8 | 1142.627778 | | 16 | 1561.532222 | | 32 | 2160.194444 | | 64 | 2169.077778 | | 128 | 1793.243333 | | 256 | 1241.846667 | | 512 | 860.450000 | | 1024 | 491.360000 | +------+-------------+ Without TCMalloc linked: +------+-------------+ | c | tps | +------+-------------+ | 2 | 676.238000 | | 4 | 1096.708000 | | 8 | 1661.204000 | | 16 | 2210.335000 | | 32 | 2353.749000 | | 64 | 2202.926000 | | 128 | 2087.273000 | | 256 | 1717.978000 | | 512 | 1361.468000 | | 1024 | 1000.169000 | | 2048 | 288.299000 | +------+-------------+ Again, pretty graph: Note: With TCMalloc linked, the readwrite workload would not complete at 2048 connections. Without TCMalloc linked, it did complete, although with a significant reduction in throughput. This is likely because of the known InnoDB issue regarding 1024 active transactions limit... Open Invitation to Benchmark Drizzle, MySQL, XtraDB and PostgreSQL
The above results compare only Drizzle to itself with and without TCMalloc. We benchmarked also against MySQL 5.4, but as we've previously stated, we don't think comparison numbers should be published unless by a third or objective party. This serves as an open invitation to benchmark Drizzle against MySQL 5.4, XTraDB, PostgreSQL 8.4 and anything else. We'd love to see a validation that our principles of smaller, cleaner code with fewer global contention points, using standard libraries and having features live in a module ecosystem truly does enable a faster, leaner query-running machine. By the same token, if published benchmarks identify cases where Drizzle underperforms compared to another RDBMS, we'd love to tackle the performance problems the benchmarks show. The more data, the better Monday, April 27. 2009Drizzle, MySQL and a Public Apology to Karen PadirOK, so those who know me well can attest to the fact that I can sometimes be a little quick to speak my mind. Sometimes I say snarky things without thinking much. Last week, I blogged a short, snide blog post that Drizzle was not the MySQL Drizzle project and that it was not "shepparded by MySQL" (and, yes, I spelled shepherded incorrectly. Thanks, PeterG). The blog post referred to Karen Padir's keynote on Tuesday morning, but I misquoted her. Karen said "shepherded by Sun Microsystems", not "shepherded by MySQL". I was actually quoting a blog post from the same day which had misquoted Karen as saying "shepherded by Sun/MySQL". Karen, I'm sorry I misquoted you. I let my emotions dictate my words, and I should have been more thoughtful about what I was trying to say. So, what was I trying to say? One of the things the Drizzle project has been successful in doing is opening up the development of the server to a wider audience. The community around Drizzle is extremely active, friendly, and contribution-focused. Ideas are expressed on open, public mailing lists, and all suggestions and input are debated and welcomed in a transparent manner. Although MySQL (note, not Sun Microsystems, but the MySQL engineering group) has tried to "open up" over the last year or so, there is still an undeniable Cathedral approach to software development in that organization. I hope that the MySQL engineering group will continue its movement towards openness, but fundamental change must occur in the mindset and management of the group. I believe that the Drizzle project has shown just how much appetite there is for open discussion and insight into the development of the database server. I simply don't understand why change in the MySQL engineering corps has been so slow. The pace of change frustrated me while I was at MySQL, and it continues to frustrate me as a member of the Sun Drizzle team. In addition to the above, various groups in the MySQL internal teams have actively sought to undermine the Drizzle project and continue to dismiss the project as without technical worth or merit. So, when I heard Drizzle being referred to as "the MySQL Drizzle project", it irked me. If the MySQL engineering group had embraced Drizzle and been active in contributing to it, I wouldn't have any problem with referring to the project in that way. Furthermore, Drizzle has a lot of contributors outside of Sun Microsystems. If MySQL wants to call it the MySQL Drizzle project, then they should be able to call it the Mixi Drizzle project or anything else they want to. Is it true that the Drizzle code base finds its lineage in MySQL? Certainly, but I don't believe at this point that the MySQL group has contributed to Drizzle enough to warrant changing its name. Karen, I hope you accept my apology for misquoting you, and I hope that this blog entry makes more clear what my original post failed to convey. As the leader of the MySQL engineering team, I hope you can find ways in which the database group can play a more active role in contributing to Drizzle. And, of course, I look forward to working with you to help you achieve this goal. Tuesday, April 21. 2009Sorry, but Drizzle is Not...the "MySQL Drizzle Project". Nor is it "shepparded by MySQL". Sorry, Karen, but in the future, MySQL may play more of a role in Drizzle, but currently, it doesn't. And it doesn't by choice. Fear Eats You Up Inside - A Tale of a Team in ChaosThis blog entry is a bit of a written confession from the Sun Drizzle team. It tells the story of the last few weeks, and how we didn't "Do the Right Thing". It's a tale which ends, hopefully, in our redemption through acknowledgement of what happened, a written commitment that it won't happen again, and a bit of a manual on how to have our actions match our team values. What Happened?A few weeks ago, we received an email from Allan Packer, who works in the Performance Engineering group at Sun. The email contained the results of some benchmarks that Allan had performed comparing Drizzle and a version of MySQL 5.1. The results for Drizzle weren't flattering. In fact, at lower concurrencies, the results were downright embarrassing, with some runs showing 5.1 outpacing Drizzle in throughput by 100%.
"How could this be?" we We're All Humans, After AllWhen confronted with a situation where you have been cast in a dim light compared to something else, it's human nature to go into defensive mode. We retreat within our shells, searching for answers and closing out the rest of the world. The Sun Drizzle team closed out others and retreated out of fear, afraid that "if the world knew this about us, everyone would walk away from Drizzle." Each member of the Sun Drizzle team went off in a different direction, trying to uncover why we were seeing the results we did. It was the antithesis of a concerted effort. Basically, everyone went tearing off, running benchmarks, profiling code, looking at call graphs, digging through commits, all without a plan and without the input of the Drizzle community. Suffice it to say that while we learned a bit about what pieces of the code were not the main culprits, the total effort was a chaotic, scrambling, mess. Just like telling a lie eats away at an individual until the truth comes out, fear also tends to rot you from the inside out. When you face your fears, tell the world about them, and ask for help, you turn the fear into something positive. It's taken us a few weeks to realize how silly we had acted, and Brian did a great job on Sunday pulling us together and realizing where we'd strayed from our commitment to open and public discourse. And so, yesterday, I sent an email to the drizzle-discuss mailing list explaining the results of the benchmarks, good and bad, and listed our methodology for running the benchmarks and what we have, so far, ruled out as culprits for the throughput differences we've seen. I wish we had done this on Day 1, after receiving the results from Allan. We would likely have been able to work through ideas much more quickly on the mailing list, and gotten many more suggestions from the community. We've learned our lesson in this regard. A Commitment to Open and Public DiscourseOn Sunday, we sat down and put together a few guidelines that we'll stick to when future situations like this happen — i.e. when someone publishes benchmarks which involve Drizzle or comes up with evidence that we fail in some way compared with another database.
All of the above may seem like common sense, but we figure it's best to put them in writing to remind ourselves when we do, inevitably, encounter similar situations. In SummarySo, this week I'll be working on #2 above; getting documentation up on the Drizzle.org domain about both the results of the benchmarks for all runs, and our methodology. I very much welcome anyone interested in benchmarking to offer suggestions, edit the wiki with input, and join us on the mailing list in our ongoing discussions about what may be the throughput-limiting factors in Drizzle. Thursday, April 2. 2009Towards a New Modular Replication ArchitectureOver the past week, I've been refactoring the way that the Drizzle kernel communicates with plugin modules that wish to implement functionality related to replication. There are many, many potholes in the current way that row-based replication works in Drizzle, and my refactoring efforts were solely focused on three things:
Let me expand on these two goals, and why they are critical to the success of a replication architecture. Simple, Clear Interfaces Designed for the Interface CallerI have a very strong belief that interfaces should always be designed from the perspective of the caller of the interface. By focusing on the caller of the interface, you produce interfaces that are inherently more stable and simpler than if you design an interface with the perspective of the consumer of the interface's objects — or the internal implementation's perspective. If you design an interface from the perspective of how the internals of a system work, then you invariably end up with interfaces that reflect the internal implementation of something. Interfaces should be generalized, not specific to an implementation. I'll expand on this with an example in the next section. Interfaces should be as simple and clear as possible. Why? Because people won't implement the darn interfaces if they can't understand them. Simple as that. Drizzle wants a vibrant ecosystem of plugins and modules. If developers have a difficult time understanding an interface, that means we get fewer plugins, and that just won't do. Interfaces Must Use Documented Data Structures and Standard ContainersThis may seem like a no-brainer, but if you look at the interfaces in MySQL, you will notice that so many of them pass pointers to internal MySQL objects (THD, st_table, TABLE_SHARE, etc). You will also notice that in order to use the interface, implementors must become intimately familiar with the custom iterators (List<Item> List_iterator<>, List_iterator_fast<>, etc) in order to really implement their plugins. Many of these custom objects and containers are poorly or not documented at all. This makes the plugin developer's life harder. This is why there are so very few plugins written for MySQL that have not been written by MySQL engineers who are familiar with the internal implementation of the server. The harder life is for the plugin developer, the smaller the ecosystem of plugins will be. Period. Now, why do I say "only standard containers"? Well, how many of you readers realized that List<Item> was not actually a list of Items; that in fact it is a list of Item pointers? Probably very few. By using the STL and its very-well-documented, standardized container classes in the interface, we remove the need of the plugin developer to have to learn Yet-Another-Custom template implementation. Easier for the plugin developer, so faster time-to-market for plugins. Laying Waste to Tight-Coupling of Replication with "Other Stuff" (Like Logging)It's no secret that MySQL's internal subsystems are very tightly-coupled. Back in 2005, when I wrote the system internals chapter in Pro MySQL, I noted that the internal subsystems of MySQL were interwoven, tightly-coupled, and a bit of a bowl of spaghetti. Very little substantial progress has been made since then on refactoring those systems into separate modules. Replication is no different. In MySQL, the replication subsystem is tightly coupled to a logging subsystem that most readers would be familiar with as "The Binlog". Here's the problem with that: it ties the implementation (the binlog) to the replication interface. Why is this bad? Well, it means that if you want to modularize the replication system, you can't do so without fiddling with the binlog. And likewise, if you want to improve the binlog, you invariably will be affecting the replication system. Tight-coupling of implementation to interfaces is disastrous from an architectural point of view. Touching each runs the risk of breaking the other. When these types of architectural faux-pas litter the code, it means that development and debugging life-cycles are extended well past what they should be, because developers working on one system tend to affect other subsystems that shouldn't really be affected by their changes. And so, one of the big goals of this refactoring work was to produce interfaces which did not interfere with each other. Changing an implementation shouldn't break the whole system. A New, Modular Replication ArchitectureOK, enough about the goals of the new replication architecture. Let's see some diagrams and some code, eh? In Drizzle, we strive to have a clear separation between the core drizzled kernel and modules which implement functionality. The module communicates with the kernel through an API. The goal of the this API is to shield the internal implementation of the core kernel from plugin/module developers. All plugin developers need to know is what will be passed to them from the kernel (documented data structures) and what to provide the kernel back as output (a return code, for instance). The kernel doesn't care how the plugin implements something. All it cares about is that the plugin provide methods which match the API's base class interfaces. There are a number of documented data structures which are used to in the API's calling interface. These data structures are all Google Protobuffer (GPB) Message derived classes. Because they are all GPB derived message classes, they all work in the same way: via the Protobuffer Message API, which is well-documented and automatically handles versioning when changes to these message classes are needed. The current list of Message classes are:
More Event specialty classes will be added as needed, but they will all follow the same interface as each other. In addition to the above data classes, for a replication architecture, what are the basic "worker classes" that plugins may implement? For the first phase of this new replication system, I have three basic class interfaces:
But you might be thinking... Jay, where's the Binlog? There isn't one in the interface. That's the whole point. A "binlog" is an implementation, not an interface. Need a more concrete vision of how a module would use such an interface? OK, let's imagine an example module, which we'll call "RecoveryModule". The purpose of this module would be to provide backup and restore capabilities to Drizzle. Let's see how the interface to the Drizzle kernel would be implemented in the module... The RecoveryModule Implementation OverviewAs noted, the purpose of this module is clear: to provide backup and restore functionality for Drizzle. We'll implement this functionality using a serialized transaction log. Here are the classes which we will map out for the module:
As you can see, all of the classes above extend (or more explicitly, inherit from a publicly defined interface class in /drizzled/plugin/) a base interface class. There is a clear separation of duties. Replicators replicate events. Appliers apply events. Readers read events. No mixing of one into the other. This way, a developer can work on the writing of records to a transaction log (the TrxLogWriter) and another developer can work on the implementation of applying those records during a recovery phase (TrxLogApplier). The two developers don't need to know anything about how the other class is implemented. The simply implement the interface of their class. In fact, the TrxLogWriter implementation might be in an entirely different module! The developer of the TrxLogApplier doesn't care, and isn't affected by that. All she needs to know is the interface of an EventReader and the EventApplier, and the details of the Event data structures (PODs). That's it. Clean separation. Here is a graphical overview of how such a module might look, with a separation of the components of the module, the plugin interface API, and the drizzled kernel. ![]() You will note that I also have put in a couple other classes in the module, called SqlLogWriter, which inherits from EventApplier, and SqlLogReader, which inherits from EventReader. This is to show that the module could implement the MySQL General Log easily using the existing API. A general log is merely an implementation detail. The module developer can create a general log implementation just by specializing the EventApplier class to write, for instance, raw SQL records...
Also note the AsynchReplicator class... Mark Callaghan, you could implement a SemiSynchReplicator if you wanted. Flow of Events from Drizzle Kernel to ModuleSo, how would the flow of events happen from the kernel to a module and back? I've put together a quick diagram of how the API would be used to communicate between the kernel and the module. You can see some API method calls (such as replicateEvents() or registerApplier(). These method names are of course still up for debate, but I think they adequately communicate what the API would look like and the purpose of the interface calls... ![]() OK, so in my next blog post I'll show some example code from my local branch which implements a module similar to the above. I'll end this one now because for the next blog post I need to get some code highlighting done to make it easier to read... Now Is the Time to Influence the Shape of the API
Robert Hodges, I'm lookin' at you, kid!
In all seriousness, this is the time when we are defining the interfaces between the kernel and the modules/plugins. If you have suggestions, want something included in the API, or in general want to tell me I'm full of shit, then join us on the Drizzle Discussion mailing list and let your thoughts be heard! Wednesday, April 1. 2009Libdrizzle Benchmarks - Massive Performance IncreasesLast night and today, I ran a series of benchmarks against Drizzle. These benchmarks were designed to isolate the performance improvement or regression from one change: using Eric Day's new libdrizzle client library instead of the legacy libdrizzleclient library from MySQL. The results are in, and they are stunning. Here is a graph showing the difference between Drizzle sysbench on a readonly workload with the only difference being sysbench using the libdrizzle driver versus using the libdrizzleclient (libmysql) driver for sysbench: As you can see, with libdrizzle, the throughput is dramatically increased, with Drizzle scaling to 8x the number of cores on the benchmark machine before a drop-off in throughput is seen. The grey vertical line in the chart is the number of cores on the machine used. The performance improvement varies from 2.80% improvement at 16 concurrent threads to 29.95% improvement at 128 threads and a whopping 41.16% increase at a mere 2 threads. Overall, there was an average performance increase across all concurrency levels of 16.95%. For a read-write workload, here are the results: Again, we see a massive increase in throughput at all concurrency levels. This time, the performance increase was pretty consistent across the concurrency levels, and we don't see a significant drop-off in throughput until 4x the number of processors. Across all concurrency levels, the performance increase of libdrizzle was 17.62%. What this means for MySQLAs you may or may not be aware, libdrizzle actually is able to communicate with both Drizzle and MySQL. Looking for an easy way to dramatically increase the performance and throughput of your application without tinkering with much stuff? Then get involved in the libdrizzle movement, download one of the connectors and join us on Freenode #drizzle to discuss how you can use and help us test the incredible new asynchronous, non-blocking communication library from Eric Day. What this means for DrizzleIt should be stressed that the above performance improvements are solely due to a replacement of the client library. However, Eric is now pulling apart the Protocol class in the main Drizzle server to take advantage of the asynchronous and non-blocking nature of libdrizzle. This means that you can expect to see some pretty good performance improvements on the server side as well as Eric continues his great work. Please give Eric a bunch of kudos if you see him online and go check out his talks at the MySQL and Percona conferences in a couple weeks. Monday, March 30. 2009Small but steady progress in improving Drizzle performanceWe're making steady progress in removing bottlenecks in the Drizzle code base. So far, a number of mutexes have been removed and we've begin to replace a number of contention points with atomic instructions which remove the need for a lock structure on platforms which support atomic fetch and store instructions. I'm pretty positive about the direction we are going so far. We're seeing the right trends in our scaling graphs, with very little performance drop off in read-only workloads up to 4X the number of cores on the machine, and little performance drop off on the read-write workloads up to 2X the number of cores, as you can see from the graphs below. It's a little difficult to see, but we've made a small but steady improvement from r950 to r968, with numbers increasing around 1-2% across most concurrency levels. You can see the raw numbers here: +--------------------------------+-------+-----+---------+----------+ | config_name | revno | c | tps | rwrps | +--------------------------------+-------+-----+---------+----------+ | drizzle_innodb_readonly_1000K | 950 | 2 | 710.67 | 9949.34 | | drizzle_innodb_readonly_1000K | 950 | 4 | 1163.77 | 16292.81 | | drizzle_innodb_readonly_1000K | 950 | 8 | 1692.59 | 23696.29 | | drizzle_innodb_readonly_1000K | 950 | 16 | 2470.31 | 34584.39 | | drizzle_innodb_readonly_1000K | 950 | 32 | 3104.98 | 43469.73 | | drizzle_innodb_readonly_1000K | 950 | 64 | 3376.98 | 47277.73 | | drizzle_innodb_readonly_1000K | 950 | 128 | 2986.91 | 41816.74 | | drizzle_innodb_readonly_1000K | 950 | 256 | 2657.54 | 37205.47 | | drizzle_innodb_readonly_1000K | 968 | 2 | 712.73 | 9978.25 | | drizzle_innodb_readonly_1000K | 968 | 4 | 1081.72 | 15144.10 | | drizzle_innodb_readonly_1000K | 968 | 8 | 1714.77 | 24006.77 | | drizzle_innodb_readonly_1000K | 968 | 16 | 2480.48 | 34726.77 | | drizzle_innodb_readonly_1000K | 968 | 32 | 3140.16 | 43962.29 | | drizzle_innodb_readonly_1000K | 968 | 64 | 3394.03 | 47516.32 | | drizzle_innodb_readonly_1000K | 968 | 128 | 3008.74 | 42122.30 | | drizzle_innodb_readonly_1000K | 968 | 256 | 2676.62 | 37472.65 | | drizzle_innodb_readwrite_1000K | 950 | 2 | 438.04 | 8322.77 | | drizzle_innodb_readwrite_1000K | 950 | 4 | 720.68 | 13692.98 | | drizzle_innodb_readwrite_1000K | 950 | 8 | 1068.65 | 20304.39 | | drizzle_innodb_readwrite_1000K | 950 | 16 | 1454.71 | 27639.47 | | drizzle_innodb_readwrite_1000K | 950 | 32 | 1699.74 | 32295.02 | | drizzle_innodb_readwrite_1000K | 950 | 64 | 1506.04 | 28614.71 | | drizzle_innodb_readwrite_1000K | 950 | 128 | 1341.46 | 25487.69 | | drizzle_innodb_readwrite_1000K | 950 | 256 | 1157.95 | 22001.16 | | drizzle_innodb_readwrite_1000K | 968 | 2 | 444.10 | 8437.81 | | drizzle_innodb_readwrite_1000K | 968 | 4 | 700.45 | 13308.53 | | drizzle_innodb_readwrite_1000K | 968 | 8 | 1075.59 | 20436.14 | | drizzle_innodb_readwrite_1000K | 968 | 16 | 1457.83 | 27698.76 | | drizzle_innodb_readwrite_1000K | 968 | 32 | 1732.04 | 32908.82 | | drizzle_innodb_readwrite_1000K | 968 | 64 | 1506.98 | 28632.61 | | drizzle_innodb_readwrite_1000K | 968 | 128 | 1355.17 | 25748.31 | | drizzle_innodb_readwrite_1000K | 968 | 256 | 1157.29 | 21988.59 | +--------------------------------+-------+-----+---------+----------+ Drizzle is all about open and transparent, and if you want to know how these numbers are generated, feel free to download the drizzle-automation project, a Python benchmarking and code coverage utility I've written over the past couple weeks. We use standard sysbench modified to use our client driver, with the following configuration in our /etc/drizzle-automation/bench.cnf file: [defaults] # Number of iterations the benchmark process should do for each level # of concurrency iterations= 5 # Comma-separate list of concurrency levels to run concurrency_levels= 2,4,8,16,32,64,128,256 # Options passed as-as to make when building server make_options= -j32 # Options passed as-is to configure when building server configure_options= [drizzle_innodb_readonly_1000K] # The program used to run the benchmark. If you use paths, they # will be relative to the sandbox directory in which the run is # being processed bench_cmd= sysbench # Options given to the benchmark program on every iteration bench_options= --max-time=60 --max-requests=0 --test=oltp \ --drizzle-db=test --drizzle-port=4427 --drizzle-host=127.0.0.1 \ --drizzle-user=root --db-ps-mode=disable --db-driver=drizzleclient \ --drizzle-table-engine=innodb --oltp-read-only=on --oltp-table-size=1000000 # The program used to start the server instance. If you use paths, they # will be relative to the sandbox directory in which the run is # being processed server_cmd= ./drizzled/drizzled # Options passed to the server on startup server_options= --port=4427 --datadir=/tmp --innodb-buffer-pool=4G \ --key-buffer-size=64M --scheduler=multi_thread --innodb_log_buffer_size=512M \ --innodb_additional_mem_pool_size=120M --table_open_cache=4096 & [drizzle_innodb_readwrite_1000K] # The program used to run the benchmark. If you use paths, they # will be relative to the sandbox directory in which the run is # being processed bench_cmd= sysbench # Options given to the benchmark program on every iteration bench_options= --max-time=60 --max-requests=0 --test=oltp --drizzle-db=test \ --drizzle-port=4427 --drizzle-host=127.0.0.1 --drizzle-user=root --db-ps-mode=disable \ --db-driver=drizzleclient --drizzle-table-engine=innodb --oltp-read-only=off \ --oltp-table-size=1000000 # The program used to start the server instance. If you use paths, they # will be relative to the sandbox directory in which the run is # being processed server_cmd= ./drizzled/drizzled # Options passed to the server on startup server_options= --port=4427 --datadir=/tmp --innodb-buffer-pool=4G \ --key-buffer-size=64M --scheduler=multi_thread --innodb_log_buffer_size=512M \ --innodb_additional_mem_pool_size=120M --table_open_cache=4096 & The machine we run the benchmarks on is an Intel Xeon QuadCore, and all tests fit into the available RAM on the box. These benchmarks, which we can run against a single revision or a range of revisions, are invaluable in allowing us to pinpoint the specific cause of a performance regression. Feel free to check out the automation work, contribute to the project, or suggest improvements to it on the Drizzle Discuss mailing list. Wednesday, March 4. 2009LCOV Code Coverage Pages for DrizzleYesterday, Monty and I were fussing around with lcov and genhtml trying to generate code coverage analysis for Drizzle. After a few hours, I was finally able to get some good output, and I've published the results temporarily on my website. We're currently at 70.4% code coverage which is less-than-ideal, but at least we now have a baseline from which to improve. We're all about making incremental improvements, and having statistics to tell us whether we're going in the right direction is important. This is a good first step. So, what exactly do these code coverage numbers mean?OK, so for those readers not familiar with gcov or lcov, here is what these code coverage numbers actually mean... They represent the percentage of executable source lines which are executed during a run of Drizzle's test suite. Basically, the percent gives us a rough idea of the percent of our code paths which are being verified by the test suite. What we do is have GCov produce coverage data during compile for each source code file. We then run our test suite, and then have LCov collect (capture) the source code lines in each file was executed during the run of the test suite. We then call on the genhtml utility to produce pretty HTML pages from the "info files" generated from GCov. Automating LCov ReportsThe goal is to have LCov reports run for each revision in trunk, and have the output automatically rsynced up to the Drizzle.org server. I'm working with Mike Shadle from Intel and Monty on getting this done, along with Doxygen output, and hope to get the automated test coverage reports done by the end of the week.
Part of the automation is producing a text-based report which I can then store in a database and keep historical data about our progress in improving test coverage. Of course, those historical graphs will also be published. How Can You Help Improve Drizzle's Code Coverage?Actually, this is a great starter task for new contributors to get involved in Drizzle development. The way to improve code coverage percentages is to analyze the blocks of code which are not covered in the following way:
Some more details on the above steps will come in another blog post later today... OK, gotta get back to work. Feel free to peruse the LCov HTML reports. There are some interesting things buried in there. Tuesday, March 3. 2009Signup for Drizzle Developer Day - Friday, April 24th - After MySQL Conference 2009We are having a developer day on the Friday after the MySQL Conference and Expo. Space is limited, so please be sure to sign up on the Drizzle.org wiki if you plan to attend. Here are some of the things I will be talking about:
See you there. Tuesday, February 17. 2009A Better Parser Needed?Taking a little break from refactoring temporal data handling this evening, I decided to run some profiles against both Drizzle and MySQL 5.1.33. I profiled the two servers with callgrind (a valgrind tool/skin) while running the drizzleslap/mysqlslap test case. In both cases, I had to make a small change to the drizzled/tests/test-run.pl Perl script.
For the MySQL build, I used the BUILD/compile-amd64-debug-max build script. For Drizzle, I used my standard build process which builds Drizzle with maximum debugging symbols and hooks. It's worth noting that the debug and build process for MySQL and Drizzle are very different, and the MySQL debug build contains hooks to the DBUG library, which you'll notice appear on the MySQL call graphs and consume a lot of the overall function calls. You won't see this in the Drizzle graphs because we do not use DBUG. For all intents and purposes, just ignore the calls to anything in the DBUG library in the MySQL graphs since in a non-debug build all that stuff is NOOPed out... FYI, the drizzleslap/mysqlslap test case is a decent one to run profiling against because it tests a range of different SQL statements in a concurrent environment, something you won't really see in the other tests cases. This is the reason I like using it when profiling with valgrind/callgrind/oprofile... OK, so on to the graphs... Drizzle Callgrind Profile -- The Function Calls
For the log_group_write_buf() function (defined in /storage/innobase/log/log0log.c) , this makes sense: Drizzle's default storage engine is InnoDB (yes, even for the INFORMATION_SCHEMA) and therefore you'll notice this function, which is responsible for writing to one of InnoDB's log group files. More interestingly, there is a fairly alarming 6.45% of the total execution time, for the get_text() static function (defined in drizzled/sql_lex.cc). If we look at this routine, it's fairly clear what the function does — it simply reads an unescaped text literal, without quotes. Nothing particularly fancy, although this comment above the routine might be telling: "Return an unescaped text literal without quotes. Fix sometimes to do only one scan of the string" Perhaps it's time to look into that single-scan thing...or perhaps not. The next "biggie" is the my_mb_wc_utf8mb4() routine, which is called an astonishing 20,779,971 times over the course of the execution of approximately 17,000 statements. How'd I come up with 17,000 statements? I just looked at the number of times DRIZZLEparse() is called, which is close enough...
my_mb_wc_utf8mb4() is called from a number of places, most notably from within the parser and lexer, and when converting to various string classes and primitives. It's really scary that this function is called so many times! Why? Well, one reason is that Drizzle does not support any other character set than UTF8 full 4-byte. Although we support many collations, we don't support the myriad character sets that MySQL does. So, it kind of makes sense that UTF8 routines would be called quite a bit. But...20M executions for 17K statements seems like there is an obvious inefficiency here.
This leads nicely to MySQL. If Drizzle is spending so much time in UTF8 routines, and MySQL doesn't by default use UTF8 as it's character set (and MySQL 5.1.33 doesn't support 4-byte UTF8, it would make sense that MySQL would NOT be spending nearly as much time executing character set conversion routines, right? Well, not so much. MySQL Callgrind Profile -- The Function Calls
What we notice in the MySQL calls is that pthread_getspecific is the number one execution time consumer, followed by a few of those DBUG library calls. I have a suspicion that the pthread_specific calls are actually related to the DBUG library calls, which track debugging information in the threads. I might be wrong about this but given the stark difference between pthread_specific()'s top spot in MySQL's call graph and it not appearing in Drizzle's graph, it makes sense that this is related to the DBUG library. So, I'll ignore it for now. So, after those, you'll notice a bunch of calls to memcpy() and the number of calls to memcpy() very closely matches the number of calls from the Drizzle graph. This makes sense. Drizzle's mechanism for transporting data across the wire and for translating record formats between the database kernel and the storage engine has not yet changed much, and this is where much of the calls to memcpy() are coming from. (this will change with ValueObjects, BTW, but more on that later...) After memcpy(), though, if you scan the function call list, you'll notice that MySQL, even with UTF8 not as the default character set, there is still a whole lot of calls, just like in Drizzle, to various character set routines — notably my_uni_utf8(), my_uft8_uni(), copy_and_convert(), my_mb_wc_latin1(), my_ismbchar_utf8(), and so on. It turns out that if you add up all the character set conversion and comparison routine executions in both MySQL and Drizzle, that all those function calls are taking up more than 12% of the total execution time for both Drizzle and MySQL! Call Trees Seem to Blame the Parser
I'm not going to go on too much further, as it's getting late and I'm tired, but I'm putting the call trees for Drizzle and MySQL for the profiling runs below. I think it's fairly clear that the parser is eating up a large chunk of execution time. Perhaps it's time to look into prototyping and benchmarking other parsers, or at the very least, looking into streamlining the existing parser to be more efficient when it comes to character set routines... Feel free to click on the images below for the fullsize versions. Cheers! -jay Here is the Drizzle call tree:
And here is the MySQL one:
Wednesday, February 11. 2009Drizzle, MySQL, and the mess that is Dates and SQL_MODEThe frustration builds. I have come to despise MySQL's sql_mode. It is a hack of the most gargantuan proportions. Basically, the optimizer just ignores the sql_mode whenever it is convenient for it to do so. More importantly, the optimizer silently ignores bad datetime input in various places. The reason for this is because of my statement above: sql_mode is a big ole' hack. Instead of fixing the runtime executor in MySQL to use real ValueObject types — that are immutable and know how to convert (and not convert) between each other, the runtime is a mess of checks for various runtime codes, warning modes, "count_cuted_field" crap and other miscellany that obfuscates the executor pipeline almost beyond recognition. Slowly, I am attacking the mess, but the executor is so fragile that even tiny changes can wreak havoc on the system, so the going is slow and painful. It's no wonder that the release cycle for the MySQL server is so bloody long... To give you an example of the symptoms which manifest themselves because of this hack, I present the following, executed in MySQL 5.0.67: mysql> create table t1(f1 int, f2 date); Query OK, 0 rows affected (0.03 sec) mysql> insert into t1 values(1,'2005-01-01'),(2,'2005-09-01'),(3,'2005-09-30'), (4,'2005-10-01'),(5,'2005-12-30'); Query OK, 5 rows affected (0.04 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> select @@sql_mode; +------------+ | @@sql_mode | +------------+ | | +------------+ 1 row in set (0.00 sec) mysql> select * from t1 where f2 <= '2005-09-31' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 1 | 2005-01-01 | | 2 | 2005-09-01 | | 3 | 2005-09-30 | +------+------------+ 3 rows in set (0.00 sec) OK, so far, everything is as "expected" when running in a non-strict mode. Even though "2005-09-31" is clearly an invalid date, MySQL in non-strict mode is expected to ignore such triviality and, we can assume, "corrects" the bad date to something else (maybe an integer representation of "2005-09-30"?) The MySQL manual is fairly clear about sql_mode=STRICT_ALL_TABLES and bad datetime values: Strict mode controls how MySQL handles input values that are invalid or missing. A value can be invalid for several reasons. For example, it might have the wrong data type for the column, or it might be out of range. So, we should expect the above statement to throw an error in STRICT_ALL_TABLES, right? Wrong: mysql> set @@sql_mode="STRICT_ALL_TABLES"; Query OK, 0 rows affected (0.00 sec) mysql> select * from t1 where f2 >= '0000-00-00' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 1 | 2005-01-01 | | 2 | 2005-09-01 | | 3 | 2005-09-30 | | 4 | 2005-10-01 | | 5 | 2005-12-30 | +------+------------+ 5 rows in set (0.00 sec) mysql> select * from t1 where f2 > '2005-09-99' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 4 | 2005-10-01 | | 5 | 2005-12-30 | +------+------------+ 2 rows in set, 1 warning (0.00 sec) mysql> show warnings; +---------+------+-------------------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------------------+ | Warning | 1292 | Incorrect date value: '2005-09-99' for column 'f2' at row 1 | +---------+------+-------------------------------------------------------------+ 1 row in set (0.00 sec) MySQL accepts the bad date, even in STRICT_ALL_TABLES mode, and does the same implicit conversion, giving us a warning when not the zero date but not telling us what conversion was done. Clearly, a conversion happened because MySQL winnowed the result set. But we have no way to know what the winnowing condition was on. Let's take this even further. The docs suggest that STRICT_ALL_TABLES must be coupled with NO_ZERO_IN_DATE if real validation on dates is to occur: Strict mode disallows invalid date values such as '2004-04-31'. It does not disallow dates with zero month or day parts such as '2004-04-00' or “zero” dates. To disallow these as well, enable the NO_ZERO_IN_DATE and NO_ZERO_DATE SQL modes in addition to strict mode. So, maybe with "STRICT_ALL_TABLES, NO_ZERO_IN_DATE" sql_mode, we'll actually get an error... mysql> set @@sql_mode="STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE"; Query OK, 0 rows affected (0.00 sec) mysql> select * from t1 where f2 > '2005-09-99' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 4 | 2005-10-01 | | 5 | 2005-12-30 | +------+------------+ 2 rows in set, 1 warning (0.00 sec) mysql> select * from t1 where f2 >= '0000-00-00' order by f2; +------+------------+ | f1 | f2 | +------+------------+ | 1 | 2005-01-01 | | 2 | 2005-09-01 | | 3 | 2005-09-30 | | 4 | 2005-10-01 | | 5 | 2005-12-30 | +------+------------+ 5 rows in set, 1 warning (0.00 sec) mysql> show warnings; +---------+------+-------------------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------------------+ | Warning | 1292 | Incorrect date value: '0000-00-00' for column 'f2' at row 1 | +---------+------+-------------------------------------------------------------+ 1 row in set (0.00 sec) Well, the above ridiculousness is the symptom of a hacked-up runtime engine switching on modes, return codes, and warning levels all over the place. The danger in the above is that we don't actually know how the silent truncation/conversion has affected our query results. What if the query results are not what we intended? A warning on bad input is simply NOT enough. Want to see another symptom of this kind of implicit truncation/conversion that is rampant in the runtime? Take a look at the following in MySQL: mysql> create table t1(f1 int, f2 date not null); Query OK, 0 rows affected (0.01 sec) mysql> insert into t1 values(1,'2005-01-01'),(2,'2005-09-01'),(3,'2005-09-30'), (4,'2005-10-01'),(5,'2005-12-30'); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> select * from t1 where f2 is null; Empty set (0.00 sec) mysql> explain extended select * from t1 where f2 is null; +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | t1 | ALL | NULL | NULL | NULL | NULL | 5 | Using where | +----+-------------+-------+------+---------------+------+---------+------+------+-------------+ 1 row in set, 1 warning (0.02 sec) mysql> show warnings; +-------+------+--------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+--------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`f1` AS `f1`,`test`.`t1`.`f2` AS `f2` from `test`.`t1` where (`test`.`t1`.`f2` = 0) | +-------+------+--------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) If you take a look at the above query, you will notice that the WHERE condition is asking for rows in which f2 IS NULL. However, f2 is defined as DATE NOT NULL. POP QUIZ: Why doesn't the optimizer display a Impossible WHERE in the Extra column and optimize away the query? ANSWER: Because of all those silly silent conversions. MySQL converts "IS NULL" to "= 0" because everything is designed around comparisons to "the zero date". So, even if you have STRICT_ALL_TABLES, NO_ZERO_DATE, NO_ZERO_IN_DATE mode on, MySQL is still using the zero date behind the scenes. Here is the relevant code in sql_select.cc:
/* fix to replace 'NULL' dates with '0' (shreeve@uci.edu) */
else if (
((field->type() == DRIZZLE_TYPE_DATE) || (field->type() == DRIZZLE_TYPE_DATETIME))
&& (field->flags & NOT_NULL_FLAG)
&& ! field->table->maybe_null)
{
COND *new_cond;
if ((new_cond= new Item_func_eq(args[0],new Item_int("0", 0, 2))))
{
cond= new_cond;
/*
Item_func_eq can't be fixed after creation so we do not check
cond->fixed, also it do not need tables so we use 0 as second
argument.
*/
cond->fix_fields(session, &cond);
}
}
Suffice it to say, this "fix" is gone now from Drizzle, and we're back to producing the appropriate plan: drizzle> use test Database changed drizzle> create table t1(f1 int, f2 date not null); Query OK, 0 rows affected (0.02 sec) drizzle> insert into t1 values(1,'2005-01-01'),(2,'2005-09-01'),(3,'2005-09-30'), (4,'2005-10-01'),(5,'2005-12-30'); Query OK, 5 rows affected (0.01 sec) Records: 5 Duplicates: 0 Warnings: 0 drizzle> select * from t1 where f2 is null; Empty set (0.00 sec) drizzle> explain extended select * from t1 where f2 is null; +----+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+ | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE | +----+-------------+-------+------+---------------+------+---------+------+------+----------+------------------+ 1 row in set, 1 warning (0.00 sec) drizzle> show warnings; +-------+------+-----------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+-----------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`f1` AS `f1`,`test`.`t1`.`f2` AS `f2` from `test`.`t1` where 0 | +-------+------+-----------------------------------------------------------------------------------+ 1 row in set (0.00 sec) The deeper I dig, the more scared I become. Thursday, February 5. 2009Where Drizzle Is SucceedingOver the past few weeks, I have been happy working on Drizzle. Why have I been happy? Is it because of some new incredible code that will revolutionize the database industry? Nope. Is it because we've been able to remove all the issues that plague the server core? Nope. Is it because I see Drizzle quickly morphing into a modular, standard-conforming super-kernel? Nope. So, why am I joyous? To paraphrase the late Charlton Heston: "[Drizzle] is people!" Recently, I've seen the fruit that transparent, open source development bears. This fruit takes the form of engaged, motivated, and humble individuals who wish to make their mark on a project. Whether it's on IRC on #drizzle, the drizzle-discuss mailing list (now with 354 active members), or via the platform which Launchpad.net provides our community, I've seen new developers scrambling to pick up blueprint tasks, tackle bugs (minor and major), and stamp their footprint on the code base. With each new face comes an entirely new perspective, a new angle, a different set of skills and experience. And I'm taking the time to chat and learn with each of them. It's a humbling experience for me, as I learn from each person who visits the ever-growing IRC channel and mailing list. It doesn't matter if it's the sage advice of folks like MySQL's Mats Kindahl, Bernt Johnsen and Roy Lyseng, or database veteran Jim Starkey. It doesn't matter if the new face is a college student wishing to help in any small way they can. Everyone makes a difference in their own way. So, just like Monty Widenius says about his new company, that all employees will share in the profit, so is the case with Drizzle, and truly open source development projects. Those who contribute share in the project's success and stamp their mark, forever, on its direction and shape. It is this fact that propels me in coding, and gives me joy when I log into IRC in the mornings. Just thought I would share that happiness. Cheers.
Posted by Jay Pipes
in MySQL, PostgreSQL, Drizzle, C/C++, Launchpad
at
10:55
| Comment (1)
| Trackbacks (0)
Monday, January 12. 2009More on Day Numbers - The Actual CodeLike I mentioned in my previous article, I've been working on refactoring the temporal data handling in Drizzle. The major problem I've been dealing with is poorly or non-documented code. The lack of documentation has led me to rely on the MySQL Manual in some cases, and then additional research and lastly, my own intuition as to what was going on. One of the earliest cases of me saying "WTF?" was when I was investigating how day numbers were calculated. Here is the original, unmodified code from MySQL (/libmysql/my_time.c:746-778). I've highlighted in blue the massive amount of comments explaining the inner workings of the function and what it is doing.
/*
Calculate nr of day since year 0 in new date-system (from 1615)
SYNOPSIS
calc_daynr()
year Year (exact 4 digit year, no year conversions)
month Month
day Day
NOTES: 0000-00-00 is a valid date, and will return 0
RETURN
Days since 0000-00-00
*/
long calc_daynr(uint year,uint month,uint day)
{
long delsum;
int temp;
DBUG_ENTER("calc_daynr");
if (year == 0 && month == 0 && day == 0)
DBUG_RETURN(0); /* Skip errors */
delsum= (long) (365L * year+ 31*(month-1) +day);
if (month <= 2)
year--;
else
delsum-= (long) (month*4+23)/10;
temp=(int) ((year/100+1)*3)/4;
DBUG_PRINT("exit",("year: %d month: %d day: %d -> daynr: %ld",
year+(month <= 2),month,day,delsum+year/4-temp));
DBUG_RETURN(delsum+(int) year/4-temp);
} /* calc_daynr */
I tend to like code that is well-commented. And by well-commented, I don't mean comments that simply repeat an if condition or state the obvious. I mean, of course, comments that elucidate non-obvious parts of the code. The non-obvious parts of the above code are, well, most of it. Let's start at the top and I'll let you know why this little bastard of a function caused me so much pain. OK, so the function header comment says that the function "Calculate nr of day since year 0 in new date-system (from 1615)". Well, knowing even the small amount of stuff I knew about calendars before I really got into this stuff, I knew that there was no "year 0" in a calendar. So...I immediately suspected something was up. But, I made a big mistake at this point. I read the end of the comment "new date-system (from 1615)" as meaning that this function calculated the Gregorian Day Number. I thought since October 15, 1582 was close to the year 1615, that the function author meant the Gregorian calendar. Of course, I was wrong. The function actually is intended to calculate the number of days since "year 0" of the SQL date range. Unfortunately, the SQL calendar contains years 1 - 9999. There is no "year 0". So, I now have to figure out why this function considers 00.00.00 to be a real date. I have yet to figure this out. Perhaps it is a throwback to the MySQL days of February 31st, I don't really know. Moving on through the function, the code contains a number of lines which just baffled me given the lack of comments. For instance:
delsum-= (long) (month*4+23)/10;
What is the number 23 doing? Is it a presumed constant in a known formula? Is it an adjustment number? Who knows... And here: temp=(int) ((year/100+1)*3)/4; The line above seems to be trying to calculate the number of leap days (which are any day divisible by 4 and if divisible by 100 must also be divisible by 400.) But without comments, it's tough to tell... OK, so with a little bit of throwing my hands up, I move on to the function which takes a day number and converts it to a date... Day Number to Date?And, going the other way around, here is how a date is calculated from a day number (/sql/time.cc:142-185). Again, the voluminous comments are highlighted.
/* Change a daynr to year, month and day */
/* Daynr 0 is returned as date 00.00.00 */
void get_date_from_daynr(long daynr,uint *ret_year,uint *ret_month,
uint *ret_day)
{
uint year,temp,leap_day,day_of_year,days_in_year;
uchar *month_pos;
DBUG_ENTER("get_date_from_daynr");
if (daynr <= 365L || daynr >= 3652500)
{ /* Fix if wrong daynr */
*ret_year= *ret_month = *ret_day =0;
}
else
{
year= (uint) (daynr*100 / 36525L);
temp=(((year-1)/100+1)*3)/4;
day_of_year=(uint) (daynr - (long) year * 365L) - (year-1)/4 +temp;
while (day_of_year > (days_in_year= calc_days_in_year(year)))
{
day_of_year-=days_in_year;
(year)++;
}
leap_day=0;
if (days_in_year == 366)
{
if (day_of_year > 31+28)
{
day_of_year--;
if (day_of_year == 31+28)
leap_day=1; /* Handle leapyears leapday */
}
}
*ret_month=1;
for (month_pos= days_in_month ;
day_of_year > (uint) *month_pos ;
day_of_year-= *(month_pos++), (*ret_month)++)
;
*ret_year=year;
*ret_day=day_of_year+leap_day;
}
DBUG_VOID_RETURN;
}
And that's about the point in time I gave up trying to figure out the existing code. Sunday, January 11. 2009So, What Is a Day Number?Over the past three weeks, I've been refactoring the way that Drizzle handles temporal, or time-related, values. It's been a fun little adventure into an area of the server which hasn't changed much in a long time. Today, I wanted to share a couple things I've learned about calendars, MySQL, and dealing with date-related calculations. To start the conversation, imagine a SQL request like the following:
SELECT TO_DAYS("2008-08-11");
My guess is that many people don't really understand what is going on behind the scenes in the SQL server when they issue a request like the above. Believe it or not, there is quite a bit of calculation going on for a simple request like the above. Furthermore, there are a number of caveats that revolve around the ranges of dates that the SQL server operates on. To demonstrate, without looking at the MySQL manual, can you answer the following questions?
The above questions, and their answers, expose a number of issues with temporal values. Let's take each question in order. What is "day number zero" according to the TO_DAYS() function?OK, so if you answered 0000-01-01 for question #1, you would be correct. The TO_DAYS(date) function returns the number of days from the "date" 0000-01-01 up until date. Now, why would I put the term "date" in quotes? Well, it's because 0000-01-01 isn't actually an accurate date. "Wait a minute, Jay, of course 0000-01-01 is a real date! It's the first date of the first year in the 'modern' era...you know, A.D.!", you are probably saying. But, the fact is that there isn't such as date in existence. Or, more precisely, there is no way to accurately know what date was the first day of the year of the 'modern era'. In fact, there isn't a reliable way of telling what a "date" is if that date occurred before October 15th, 1582. October 15th, 1582, is the start of the Gregorian calendar, which is the calendar system used by the "modern world". This calendar takes into account adjustments for leap years and leap seconds (yes, there is something called a leap second). Before this date, it's not possible to use a Gregorian date such as 0301-03-23 to reliably determine calendar information for that date. The reason is that the dates October 5 - 14, 1582 do not exist. These dates were removed from the Gregorian calendar to "make up for" leap years and seconds up until the start of the Gregorian calendar. Given the above information, check out the results of the below statement.
mysql> select to_days("1582-10-08");
+-----------------------+
| to_days("1582-10-08") |
+-----------------------+
| 578094 |
+-----------------------+
1 row in set (0.00 sec)
If the dates 5-14 of October, 1582 don't exist, how can the above function return a value? Weird, huh? Well, kind of. The thing is, the TO_DAYS() function and it's sister FROM_DAYS() operate on a pseudo-calendar that represents the SQL range of dates, which is year 0 to year 9999. This pseudo-calendar simply assumes that up until October 15th, 1582, each year was comprised of 365.25 days. Which leads us nicely to our second question... Does TO_DAYS() take into account leap years?
Here, the answer depends. Can you use the results of the TO_DAYS() function to calculate intervals between dates?The answer is both yes and no. For reliable calculations, the dates must be in the Gregorian calendar. If one date is not in the Gregorian calendar and one date is, the results may not be accurate. Speaking of accuracy...The astute among you may have noticed something peculiar about the result of the SQL statement above. I just got finished saying that TO_DAYS() considers all years before 1582 as containing 365.25 days. If that is so, we can calculate the TO_DAYS("1582-10-15") using the following calculation: >>> days_in_years= (1582 * 365.25) >>> days_in_months= (31 + 28 + 31 + 30 + 31 + 30 + 31 + 31 + 30) >>> days_in_october= 15 >>> to_days= days_in_years + days_in_months + days_in_october >>> print to_days 578113.5 Strangely, this is what I get via MySQL:
mysql> select to_days("1582-10-15");
+-----------------------+
| to_days("1582-10-15") |
+-----------------------+
| 578101 |
+-----------------------+
1 row in set (0.00 sec)
That's a difference of 12.5. Is this a bug? Is it an adjustment for the missing days? Probably, but it's also not a bug that is likely to surface in an application. The reason is that FROM_DAYS() and TO_DAYS() are generally used in combination and the bug in the code exists in both functions, therefore they usually offset each other. The second reason is that TO_DAYS() returns an abstract number of days. It's not used to calculate intervals between dates unless that interval is in the modern era, like this: SELECT * FROM Orders WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_created) >= 7; In this case, both abstract numbers returned by TO_DAYS() will be accurate relative to each other since both dates are in the Gregorian calendar and it is the difference between the two dates which is used in the range calculation. I'll leave you today with a couple oddities of the FROM_DAYS() and TO_DAYS() functions in MySQL:
mysql> select to_days("0000-01-01");
+-----------------------+
| to_days("0000-01-01") |
+-----------------------+
| -32212253 |
+-----------------------+
1 row in set (0.00 sec)
mysql> select to_days("0000-10-01");
+-----------------------+
| to_days("0000-10-01") |
+-----------------------+
| 274 |
+-----------------------+
1 row in set (0.00 sec)
mysql> select from_days(to_days("0000-10-01"));
+----------------------------------+
| from_days(to_days("0000-10-01")) |
+----------------------------------+
| 0000-00-00 |
+----------------------------------+
1 row in set (0.00 sec)
Monday, January 5. 2009Seven Things About Me You May Not Know (And Probably Won't Care About)Due to the efforts of Brian Moon and Michelangelo van Dam, I've been sucked into a meme started by Tony Bibbs. My initial reaction to this unfortunate event was ... (envision Steven Colbert, hands raised...) "Noooooo!!!" But I got over it. Hey, it's the holiday season, I might as well be a good boy and fulfill the modern-day geek's equivalent of a chain letter. So, without further ado, here is my list of seven things about me you probably could care less about and will skip over to see if you are on my list of tagged people. (Yeah, you know you will.)
Here are the folks I'd like to know a little more about:
And, of course, the rules in case anyone missed them:
Thursday, December 18. 2008I Didn't Say "Screw Windows"A number of commenters on my previous entry thought I was basically saying "Screw Windows". Lukas Smith and Bill Karwin, both of whom I respect enormously, noted that Windows is a dominant development platform for MySQL users, and that one of the reasons for MySQL's popularity was that it runs smoothly (for a while now) on Windows. Bill and Lukas: You are 100% correct. That said, what I wrote was this: Forget Windows for now: Use open source, community-maintained, and standardized libraries within the kernel. Don't rewrite libc and various other quality open source libraries because of Not Invented Here syndrome or because Windows lacks these things. Focus on the standards and don't bother with platforms that don't conform to POSIX. If Microsoft wants future MySQL versions to run on its platforms, partner with Microsoft and have them do the port. While you're at it, drop support for old platforms like Netware and other crap that is obselete. The above is advice to the team in charge of re-architecting the database server. What it boils down to is this: focus first and forement on using standard, community-maintained open source libraries and on creating a neatly-architected, clean and lean kernel. After that, worry about Windows ports. So, it's not about kicking Windows to the curb. It's about priorities. The priority should be clean code. Because having clean, easy-to-understand code leads to:
I posit that the above three things would make a Windows port much easier, and cleaner. More developers able to contribute, less time trying to figure out the spaghetti, and less chance the port would break tangential pieces of code. And thus, I believe a solid Windows port of a core kernel would be easier and faster to complete once a POSIX-compliant and clean kernel is completed. |