Comments on Code rant: The Database As Queue Anti-Pattern

You really want to make sure that both the data yo...

2019-11-04T07:34:45.662+00:00

You really want to make sure that storing the data and sending the message (or storing a message to be sent) happen in the same TRANSACTION, otherwise you might end up storing the data while the process never sends the message (or vice versa).
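
A minimal sketch of the "storing a message to be sent" option, assuming SQLite and two hypothetical tables, `orders` and `outbox` (none of these names come from the comment or the linked presentation). The order row and its outgoing message are written in one transaction, and a separate forwarder hands pending messages to the broker afterwards:

```python
import sqlite3


def make_db():
    """Create an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT)"
    )
    return conn


def place_order(conn, customer, total):
    """Store the order and its outgoing message in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute(
            "INSERT INTO orders (customer, total) VALUES (?, ?)",
            (customer, total),
        )
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.placed", str(order_id)),
        )
    return order_id


def forward_outbox(conn, send):
    """Pass pending messages to `send` (the broker client) and delete them."""
    with conn:
        rows = conn.execute(
            "SELECT id, topic, payload FROM outbox ORDER BY id"
        ).fetchall()
        for msg_id, topic, payload in rows:
            # If send() raises, the deletes are rolled back and the rows
            # are retried later, so a message may be delivered twice.
            send(topic, payload)
            conn.execute("DELETE FROM outbox WHERE id = ?", (msg_id,))
    return len(rows)
```

Because a failure in `send` leaves the rows in place, this gives at-least-once delivery, so consumers should tolerate duplicates.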

Here is a great presentation on the topic:

https://vimeo.com/111998645

The article is bang on - it only looks easy to cre...

2019-10-18T18:00:29.617+01:00

The article is bang on - it only looks easy to create queuing systems in a database.

Message brokers are not storing messages in a single shared table and performing searches through them - messages are going to specific queues. In other words, a broker does not have to search for messages to process.

A message broker can be used as a front-end to a database. This lets you decouple your application's performance from user load: if all of your database operations are driven out of message queues, your application will perform at a consistent level regardless of workload. Even better is if all of your workload, not just database operations, is driven out of message queues.

Monitoring progress using a message queue is super easy - have the processing jobs post messages to a monitoring queue which is consumed by your monitoring application.

Finally, using a message broker allows you to perform all processing asynchronously, yet allows callers to simulate synchronous calls if they would like to. If a client wants to execute a job synchronously, the producer for that job creates a callback queue and a consumer to monitor it. It then passes the callback queue name into the request. The consumer of that job sends a completion message to that callback queue when it finishes. When done properly, the resulting data can be fed back into the caller's output stream. In other words, the caller appears to have made a synchronous call, but it was actually processed asynchronously.
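
The callback-queue idea can be sketched in-process, assuming Python's standard `queue` and `threading` modules stand in for a real broker. The names `requests`, `worker` and `call_sync` are made up for illustration, and the request carries the reply queue object itself rather than a queue name, since there is no broker here to look a name up:

```python
import queue
import threading

requests = queue.Queue()  # stands in for the broker's job queue


def worker():
    """Consumer: process jobs and reply on the queue carried by each request."""
    while True:
        job = requests.get()
        if job is None:  # shutdown signal
            break
        payload, reply_to = job
        reply_to.put(payload.upper())  # placeholder for the real work


def call_sync(payload, timeout=5.0):
    """Producer: create a private callback queue, send the job, wait for the reply."""
    reply_to = queue.Queue(maxsize=1)
    requests.put((payload, reply_to))
    return reply_to.get(timeout=timeout)  # raises queue.Empty on timeout


def demo():
    """Start one worker, make one 'synchronous' call, then shut down."""
    t = threading.Thread(target=worker)
    t.start()
    try:
        return call_sync("hello")
    finally:
        requests.put(None)
        t.join()
```

From the caller's side `call_sync` looks like an ordinary blocking function, while the work itself happens on whichever consumer picks up the job.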

The hardest part about adopting a message broker is convincing people that it is easier than creating your own. And it actually is easier and far more useful in the long run. A message broker will quickly become a central part of the solution to almost all of your most complex processing problems.

Try it, you will like it!

That is exactly my thoughts on the subject. Unfort...

2016-11-11T19:15:42.298+00:00

Those are exactly my thoughts on the subject. Unfortunately I've inherited a terrible J2EE app that does exactly this. What a nightmare to fix and maintain!

Anonymous: 'Messaging systems take far more ti...

2016-02-05T15:36:17.788+00:00

Anonymous:
'Messaging systems take far more time to implement (read: have a higher impact on time to market) than simply doing it in the database, and if you're using a database engine that doesn't suck you're not going to have a performance problem.'

You refer to 'SQL Server' as the DB that sucks, yet you sing the praises of MySQL - I fell out of my seat laughing at that one.

Both your comment and the post seem to be based on outdated knowledge of SQL Server. As others have pointed out, most people with a modicum of knowledge would know to use Message Broker (or another message queue) for this sort of thing, and it doesn't rely on polling, which is definitely not a good method: it's like some annoying kid asking 'are we there yet?'

Almost only quotes "When all you know" d...

2015-06-05T21:40:57.053+01:00

Almost only quotes
"When all you know" deeply doesn't include "SQL Server" you can be sure that
"SQL Server is very efficient at insertions, or updates, or queries, but rarely all three on the same table."
and
"sharing a database between applications (or services) is a bad thing"

Made a queue in postgres the other day, spawned 10...

2015-05-18T21:51:56.100+01:00

Made a queue in postgres the other day, spawned 100 workers, PDO errored on too many connections and everything locked... googled... found this. Spot on.

If your message needs to belong to a particular un...

2015-03-01T12:38:39.103+00:00

If your message needs to belong to a particular unit of work, you must store it with that unit of work and forward it afterwards, unless your messaging service supports distributed transactions. Even then, distributed transactions can be kinda gross under failure scenarios, and you may still want to forward into a messaging service for aggregation, routing, and integration. But yeah, as others have pointed out, sometimes your problem IS a nail and you need the RDBMS hammer.

Your mom is an anti-pattern

2014-09-17T04:10:32.138+01:00

Your mom is an anti-pattern

While I agree that it is certainly overused, for E...

2014-09-16T13:12:05.120+01:00

While I agree that it is certainly overused, for ETL code it is generally preferable that the code runs entirely self-contained. Recently, I wrote some speed improvements that replaced a CURSOR loop with a table and MSSQL Jobs, enabling the task to run in parallel for the Staging part. The request database, which we have no control over, forces us to iterate over the inputs. Using a table as the task list ensures that the ETL is self-contained and mobile: I can be up and running in minutes and be almost guaranteed it will work.

For client applications, I prefer queues and please don't mention MSMQ, that thing is the anti-Christ.

I don't agree in general. Oracle AQ is a table...

2014-09-16T08:21:18.748+01:00

I don't agree in general. Oracle AQ is a table-based queue implementation that uses some advanced internal features that have been made available through general SQL in 10g (I believe) via the FOR UPDATE SKIP LOCKED clause.

While I wouldn't pass millions of message types through AQ, AQ is certainly capable of providing an easy transactional queue implementation for basic needs.

I recently posted an answer to a question on stack...

2014-01-30T09:44:43.472+00:00

I recently posted an answer to a question on stackoverflow in defense of messaging vs db integration if you’re interested:

http://stackoverflow.com/a/19229234/569662

It’s the first time I’ve been able to clearly articulate it.

RE: “When all you have is a hammer, every problem ...

2013-12-17T18:57:37.062+00:00

RE: “When all you have is a hammer, every problem looks like a nail.”

A good RDBMS is a toolbox. An ignoramus sees a hammer.

oversimplification aside, did you just equate work...

2013-07-01T18:06:12.374+01:00

oversimplification aside, did you just equate workflow to queue management?

Queues can be a way to implement portions of workflows, but semantically speaking, a workflow is more than just an assembly of queues.

There are at least two poor arguments against this...

2013-03-10T07:12:57.273+00:00

There are at least two poor arguments against this article:

1. A message queue needs a persistent store, therefore using a message queue means I am using an RDBMS.

Of the 500 ways to poke a hole in that argument, I will choose the easiest. While some message queues do *allow* you to use an RDBMS as the backing store (for instance, ActiveMQ on MySQL), it is always a less efficient choice than the highly tuned persistence mechanisms you could be using (like KahaDB). RabbitMQ doesn't even let you choose your store. Making the assumption that data storage for message queues is similar to data storage in a typical database is why database-as-queue is an anti-pattern.

2. If you are saying "well, I can design around this limitation or plan for this scale," you are missing the point. Yes, you *can* be successful using a database as a queue. You *can* put months of effort into tuning and plenty of money into scaling up hardware. You *can* design an elegant retry mechanism, the ability to wiretap messages, messaging patterns, content-based routing, or whatever you can imagine. Does it mean you should? By the time you are done (and have spent precious time fixing the bugs), you end up with what a message queue would have given you off the bat.

Put simply, an RDBMS is the right tool for some problems, but an insufficient solution for others. A message queue is a solution designed to fix a problem similar to the one described. To attempt to solve it with an RDBMS is not necessarily doomed to failure, but you are not using the tool best suited to the problem.

So, the next step is to draw up a Data Flow Diagra...

2013-01-17T11:45:38.467+00:00

So, the next step is to draw up a Data Flow Diagram to understand how the system should be refactored :)

Forgot to subscribe to replies...

2013-01-15T16:34:16.246+00:00

Forgot to subscribe to replies...

"...the fastest route through your system is ...

2013-01-15T16:33:06.929+00:00

"...the fastest route through your system is the sum of all the (long) intervals."

Shouldn't that read the *slowest* possible route through your system? That's analogous to hitting every light on the way to work just as it turns red.
The fastest possible route would be no wait at all, like hitting every single green light on the way to work.

I would say, the much more useful statistic is that the average time through the system would be the sum of half of each polling interval.
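
As a worked example of that arithmetic, here is a small sketch. It assumes each stage polls on a fixed interval, that work arrives at a uniformly random point within each interval, and that processing time itself is ignored; the function name and the example intervals are made up:

```python
def latency_bounds(poll_intervals):
    """Return (best, average, worst) total waiting time across polling stages."""
    best = 0.0  # every stage polls the instant the work arrives
    worst = float(sum(poll_intervals))  # every stage has only just polled
    average = sum(i / 2 for i in poll_intervals)  # half an interval per stage
    return best, average, worst


# Three stages polling every 60, 30 and 10 seconds.
print(latency_bounds([60, 30, 10]))  # (0.0, 50.0, 100.0)
```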

Regarding performance of a DB-driven queue, why is there an assumption that every exclusive lock will be a table lock? SQL Server's default behavior is row locks until the lock escalation threshold is met (roughly 5,000 locks taken by a single statement on one table). That means dozens of updates could be performed on a table, and dozens of other processes could be reading from it at the same time. WITH(READPAST) will skip rows that have locks on them.

Realistically speaking, I rarely have more than one consumer process looking at the table anyway, so locks are a non-issue. All the examples I've seen seem to imply that most database queues will have many concurrent workers. Is that usually the case for others?

I think the people talking about how you could use...

2012-11-06T09:30:35.584+00:00

I think the people talking about how you could use an RDBMS for queuing, or pointing out that some messaging systems use an RDBMS for persistence, have missed the point somewhat. In those cases your queuing infrastructure is encapsulated and abstracted away from your business logic. In the example given, your business logic is directly coupled to your queuing mechanism.

These two things are separate concerns and should be treated as such. It's easy to arrive at this design as it makes so much sense when you first start building your system. Things (let's say orders) arrive in the system and have a status of 0. You pick up all the orders at 0 and do something with them and update them to status 10. Repeat until they're at status 100 aka 'complete'.

That could even work long-term if you don't need to scale. If, however, the number of orders (or whatever it is you're processing) increases, you will hit a wall. It's very likely that this kind of table will require multiple indices, and it's certain that it will be constantly inserted into, updated and read from. That is not a good combination for database performance. The article mentions 'clearing down' records, but in these scenarios it's often extremely unclear when this should happen, as there are often business requirements about needing access to these records for a long time ('our customers need to be able to request a refund on their orders after they are complete').

I think though that the killer blow in the article is this line 'sharing a database between applications (or services) is a bad thing.' Yes, yes it is. This will slow your development and change becomes harder and harder as your applications ossify from the shared schema up.

How do you think persistent queues are implemented...

2012-10-31T20:54:34.295+00:00

How do you think persistent queues are implemented? I am marking this article as a FAIL!

Thomas Kejser, formerly of the SQLCAT team, blogge...

2012-05-29T02:33:59.926+01:00

Thomas Kejser, formerly of the SQLCAT team, blogged about this and how to eliminate the problems for creating queue tables in SQL Server:

http://blog.kejser.org/2012/05/25/implementing-message-queues-in-relational-databases/

It's a good read for anyone that is interested in how to scale this.

Using a database for integration isn't alway...

2012-05-14T05:52:28.224+01:00

Using a database for *integration* isn't always a good idea but using it as a queue may be fine.

It has to do with *how* you use it. What you are describing is most definitely not the correct way to implement the 'queue' concept.

However, using a single table to store messages and having a single process access that queue (even with threading) in a FIFO way, where messages being processed are removed from the queue, would work just fine. And it would be using the table as a real queue.
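
A rough sketch of that single-table, single-consumer arrangement, assuming SQLite and a hypothetical `queue` table with an auto-incrementing id to give FIFO order (the table and function names are illustrative only):

```python
import sqlite3


def make_queue():
    """Create an in-memory database holding the hypothetical queue table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
    )
    return conn


def enqueue(conn, body):
    """Append a message to the tail of the queue."""
    with conn:
        conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))


def dequeue(conn):
    """Remove and return the oldest message, or None if the queue is empty."""
    with conn:  # the delete commits only if nothing in the block raises
        row = conn.execute(
            "SELECT id, body FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return row[1]
```

The table is only ever read at its head and rows leave as soon as they are taken, so nothing queries it by status. With more than one consumer, the select-then-delete pair would need row locking that this sketch does not attempt.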

A queue should *never* be queried. Another reason why your example isn't really a queue.

The workflow type scenario you describe is fine if the processing of 'messages' does not happen based on the quasi-queue table. The table you have represents the state, which is fine. However, proper queueing techniques should be used for the actual messages even if the transport is a sql table.

Moreover, SQL Server has special support for "...

2012-05-11T14:56:27.172+01:00

Moreover, SQL Server has special support for "queuing tables": WITH(READPAST, UPDLOCK) helps you run multiple "consumers" on a queue.
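
To illustrate what skipping locked rows buys you, here is an in-memory simulation. It is not SQL Server code: a `threading.Lock` per row stands in for a row lock, and `Row`, `claim_next` and `finish` are hypothetical names. Each consumer takes the first row it can lock without waiting and passes over rows another consumer already holds:

```python
import threading


class Row:
    """One queued message plus a lock that stands in for a row lock."""

    def __init__(self, body):
        self.body = body
        self.lock = threading.Lock()
        self.done = False


def claim_next(rows):
    """Return the first unprocessed row whose lock is free, skipping locked rows."""
    for row in rows:
        if row.lock.acquire(blocking=False):  # never wait on a held row
            if not row.done:
                return row  # the caller keeps the lock until finish()
            row.lock.release()
    return None


def finish(row):
    """Mark the row processed and release its lock."""
    row.done = True
    row.lock.release()
```

Two consumers calling `claim_next` on the same list get different rows instead of queuing up behind each other, which is the behavior the hint combination is used for.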

On the other hand, as we found out, SQL Server's default statistics update is very bad for this sort of behavior: If a table goes from zero to more than zero records or vice versa, this will *always* trigger a statistics recomputation, and all your plans are gone ... So, disabling auto statistics on such tables is a must.

hmm... and a reliable queue has to use something ...

2012-05-02T23:58:40.245+01:00

hmm... and a reliable queue has to use something transactional and durable in which to keep the state of the queue! The point might be to delegate the use of the database to the queuing system instead of trying to do it yourself (b/c it's a lot more involved than it seems at first).

Rant indeed. No, this is a very useful pattern. ...

2012-05-01T18:50:53.154+01:00

Rant indeed. No, this is a very useful pattern. What you have posted is as though you just discovered proper message queues. The other commenters have pretty well documented some of the advantages of this GOOD pattern.

We all do not have the luxury of having access to ...

2012-05-01T14:18:58.454+01:00

We all do not have the luxury of having access to Messaging, Memcached and other integration solutions so we have to stick with a database.

We have used the "Poor Man's Job Queue" (http://ssmusoke.wordpress.com/2012/04/10/the-poor-mans-job-queue/) and it has held up under some peak upload loads.
