Slow advanced search

Hey!

I have successfully imported around 10000 items to Omeka. But now I have encountered a problem of slowness while doing advanced search. Everything is nicely fast with the regular search, doing the simple query on one of the data types is very slow.

It's impossible to do searches over two letters. If I do search for "Language" "that contains" "EN" it takes half a minute, but with larger strings, it just hangs. And also other parts of Omeka doesn't work as well. So I have to restart Apache..

It shouldn't be much more complicated than the simple search which works fine..

Any ideas where to look?

Thanks!
Timo

Thanks for bringing this to our attention. By any chance, do you have access to the slow query log on your server, or could you activate one? This would basically create a log of all queries to the database that exceed a reasonable time, which seems to be the problem. If we could identify those queries, then we could look at optimizing them in the software.

Instructions for doing this are also here: http://blog.highspeedweb.net/2008/07/18/linux-hosting-optimization-slow-query-log-for-mysql/, but they'd require some hefty admin access to your server.

If this is too difficult, perhaps you can turn on the basic SQL logger that comes packaged with Omeka. You'll edit the /application/config/config.ini file, changing "log.errors = false" to "log.errors = true", renaming /application/logs/errors.log.empty to application/logs/errors.log, and making it writable by the server. The next time you load a page in Omeka, all the SQL statements the system uses should be logged in that file.

Hello!

Thanks for answer! I turned on the slow query logging and this is the one that takes the most time:

# Time: 091214 13:44:40
# User@Host: omeka[omeka] @ localhost []
# Query_time: 308 Lock_time: 0 Rows_sent: 1 Rows_examined: 50841202
SELECT COUNT(DISTINCT(i.id)) FROM argosomeka_items AS i WHERE (i.public = 1) AND (i.id IN ( SELECT i.id FROM argosomeka_items i
LEFT JOIN argosomeka_element_texts etx
ON etx.record_id = i.id
LEFT JOIN argosomeka_record_types rty
ON etx.record_type_id = rty.id
WHERE etx.text LIKE '%case%'
AND rty.name = 'Item'
AND etx.element_id = 39 ));

The weird thing is when I add the same search value to "Search by keyword" in advanced search then is fast again. It seems that searching only specific field without the keyword makes it slow.

Hope this helps!
Timo

Looks like this query is probably slow because it is searching 50M rows. How many items do you have in your database? Another thing you can do to help us out is to run an EXPLAIN query for this one and post the results here. You can do that by pasting the query into MySQL but adding EXPLAIN to the front, so it looks like: EXPLAIN SELECT ...

Also, could you give us a bit more detail about when the search is fast and when it's slow? When you say, searching the specific field without the keyword is slow, but searching with the keyword is fast, do you mean that searching on the keyword *and* the specific field is fast, or just searching on the keyword? Thanks.

I think the speed might increase if we index the public column in the items table.

I think the speed might also increase if avoid using the IN clause and instead use a join.

I did some tests with the following results:

Total Number of Items: 2679
Total Number of Element Texts: 28,957
Total Number of Items with 'Rea' In An Element Text: 8

Original (9,306,846 rows, .22 sec):

SELECT COUNT(DISTINCT(i.id)) FROM omeka_items AS i WHERE (i.public = 1) AND (i.id IN ( SELECT i.id FROM omeka_items i
LEFT JOIN omeka_element_texts etx
ON etx.record_id = i.id
LEFT JOIN omeka_record_types rty
ON etx.record_type_id = rty.id
WHERE etx.text LIKE '%Rea%'
AND rty.name = 'Item'
AND etx.element_id = 50 ));

Option 1 (28,616 rows, .03 sec):

SELECT COUNT(DISTINCT(i.id)) FROM omeka_items AS i INNER JOIN
(SELECT DISTINCT(etx.record_id) FROM omeka_element_texts etx
LEFT JOIN omeka_record_types rty
ON etx.record_type_id = rty.id
WHERE etx.text LIKE '%Rea%'
AND rty.name = 'Item'
AND etx.element_id = 50) as bob
ON bob.record_id = i.id
WHERE (i.public = 1)

Option 2 (3,577 rows, .02 sec):

SELECT COUNT(DISTINCT(i.id)) FROM omeka_items i
INNER JOIN omeka_element_texts etx
ON etx.record_id = i.id
INNER JOIN omeka_record_types rty
ON etx.record_type_id = rty.id
WHERE etx.text LIKE '%Rea%'
AND rty.name = 'Item'
AND etx.element_id = 50
AND i.public = 1

Hey!

It's fast when I specify both the keyword AND the meta-data field. And also it's fast with just the keyword query and simple search as well.

I have 11000 items and 108000 element_texts.

The result of EXPLAIN query as CSV:

1,PRIMARY,i,ALL,,,,,11391,Using where
2,DEPENDENT SUBQUERY,rty,const,"PRIMARY,name",name,767,const,1,
2,DEPENDENT SUBQUERY,etx,ref,"record_id,record_type_id,element_id",element_id,4,const,4030,Using where
2,DEPENDENT SUBQUERY,i,eq_ref,PRIMARY,PRIMARY,4,func,1,Using where; Using index

Greets!
Timo

@wanderingwill: Is it something I could change in the core files for now that would make it faster? Could You give me a hint how to do it?

Thanks!

Hello,

Sorry to try to reawaken this thread, but I was wondering if a solution to this problem was found?

I am having a similar issue since I have 8133 items in Omeka with 147,990 element_texts. Quick search is fast, but Advanced Search is really slow - taking minutes for some queries - and it's not easy to see how to modify the underlying SQL to make it faster!

Any help or advice would be most appreciated.

Thanks,
Dee

Even with over 8,000 items, Omeka's advanced search should be faster than that on most servers.

One thing you can try to do is "optimize" your Omeka database tables. Over time, database files can get stored out-of-order and in fragments on the disk. Optimizing re-arranges your Omeka database so that it can process queries more efficiently.

From phpMyAdmin, you can do this on the "Structure" tab for your database. At the bottom of the page, you can click "Check all", then pick "Optimize table" from the drop-down box.

If you don't have phpMyAdmin, you can do the same thing from the command line. Using the same values from your Omeka db.ini, you can run the following command:

mysqlcheck -u <username> -p -o <database name>

It will prompt you for your database password, then optimize the tables.

@John Flatness,

Thanks for the prompt reply!

I think I may have improved the speed of my Advanced Search.

I wanted to disable the default stop word list in my MySQL, since some of these stop words were important in the site's search. So I took steps to do that, the restarted the MySQL server and repaired the fulltext search indexes of the omeka_element_texts table. The mysql command below:

REPAIR TABLE omeka_element_texts QUICK

Afterwards, the stop words were gone AND my Advanced Search seems to work!

I will also optimized the tables as you have suggested though, although so far so good. I'll report back with any further developments.

Thanks a million,

Dee

I had the same problem with really slow queries in advanced search (links on subject fields in item views), the repair table option from John did worked fine, it's now really fine.

We had a similar problem after loading about 8,000 new records, though the slowness occurred when navigating through the items/browse pages. One page to the next would take about 25 seconds, according to the MySQL slow query log.

Here's a typical slow query:

SELECT i.* FROM omeka_items AS i INNER JOIN omeka_collections AS c ON i.collection_id = c.id LEFT JOIN omeka_item_orders AS io ON i.id = io.item_id WHERE (c.id = 26) GROUP BY i.id ORDER BY ISNULL(io.order), io.order ASC, i.id DESC LIMIT 1;

Doing an "explain" command on a typical slow query revealed a NULL possible_key/key on the 'omeka_item_orders' table. So we ran the following command:

alter table omeka_item_orders add index item_id_index (item_id);

...and it instantly sped things up. Omeka Devs, please reply if you think this is inadvisable, but for now we're going to leave it as is (fast!).

Jeremy

I take it you have the Item Order plugin installed?

I'm not personally familiar with that plugin, but I assume this was just an oversight, and adding proper indexing sounds like the right course.

I'll file an issue against the plugin. An updated version would probably add the same or similar index, so be prepared to undo your change or have a possible conflict if you upgrade the plugin.

John,

Yes, sorry to have left that out...we do have the Item Order plugin installed. We'll definitely proceed carefully upon upgrading that plugin.

Jeremy

Hello,

We also seem to have some severe speed problems with the advanced search (in our omeka 2.0 installation). An advanced search on 1 field costs up to 15 seconds.

We have 50.000 items in our database with 950.000 element texts. We are supposed to add many more over the next years (about 100.000 items)

I identified 2 queries that take very long.
Next I will post the 2 queries that take very long:

---------- 5.2 seconds execution time:
2013-02-21T13:24:50+01:00 DEBUG (7): SELECT items.* FROM omeka_items AS items
LEFT JOIN omeka_element_texts AS et_sort ON et_sort.record_id = items.id AND et_sort.record_type = 'Item' AND et_sort.element_id = 40
INNER JOIN omeka_collections AS collections ON items.collection_id = collections.id WHERE (collections.id = '1') AND (items.id IN (
SELECT etx.record_id FROM omeka_element_texts etx
WHERE etx.text = 'mop'
AND etx.record_type = 'Item'
AND etx.element_id = 107 )) GROUP BY items.id,
items.id ORDER BY IF(ISNULL(et_sort.text), 1, 0) ASC, et_sort.text ASC LIMIT 50

--------- ~ 5 seconds execution time:
SELECT COUNT(DISTINCT(items.id)) FROM omeka_items AS items
LEFT JOIN omeka_element_texts AS et_sort ON et_sort.record_id = items.id AND et_sort.record_type = 'Item' AND et_sort.element_id = 40
INNER JOIN omeka_collections AS collections ON items.collection_id = collections.id WHERE (collections.id = '1') AND (items.id IN (
SELECT etx.record_id FROM omeka_element_texts etx
WHERE etx.text = 'mop'
AND etx.record_type = 'Item'
AND etx.element_id = 107 ))

What can I do to take this time down? Especially for the item count I think this search time is not necessary. Also, when we add another 100.000 items I fear that the whole system will become totally useless.

Iwe

Have you tried OPTIMIZE, as mentioned earlier in this thread, particularly on the element_texts table?

The process of adding, and particularly of editing items, tends to cause a great deal of churn in the element_texts table.

I have tried this but it makes no measurable difference at all. This is probably because I imported the 50.000 items in one go.

The 2 queries I showed earlier are still taking more than 5 seconds (each) on my local system, and more than 2 seconds on our production server. Are there any other possible things I could do to make this better? And what can you say about the prospect of having more than 100.000 items in the database?

Copying one of those queries and manually running it with "EXPLAIN" prepended should give a clue of what's making this run so slowly for you.

My assumption is that it's the subquery created by the search for "mop" that's the bottleneck here, but we'll see.

For the sake of comparison, the following would be an equivalent query with a join instead of a subquery:

SELECT items.* FROM omeka_items AS items
LEFT JOIN omeka_element_texts AS et_sort ON et_sort.record_id = items.id AND et_sort.record_type = 'Item' AND et_sort.element_id = 40
INNER JOIN omeka_collections AS collections ON items.collection_id = collections.id
INNER JOIN omeka_element_texts etx ON etx.record_id = items.id AND etx.record_type = 'Item' AND etx.element_id = 107
WHERE collections.id = 1 AND etx.text = 'mop'
GROUP BY items.id
ORDER BY ISNULL(et_sort.text) ASC, et_sort.text ASC LIMIT 50

Thanks for the EXPLAIN tip. I didn't know about this function. I will do some tests this week and post the results here. The last query you gave was faster but still took almost 2 seconds.

I suspect also that there have been some modifications to the database since the first beta version of Omeka that I havent implemented. I will make a completely new installation of 2.01, import all the data and do some new speed tests.

To be continued...

I think I have identified the source of the problem.

We had to import about 3000 images, video's and audio files. These files already had some metadata in the previous database so I bluntly copied them into the element_texts table, the ugly way I suppose.
About 20.000 queries like these:
INSERT INTO omeka_element_texts (record_id, record_type, element_id, html, text) VALUES ('2897', 'File', '43', '0', 'ScreenSlechteGRap.jpg')

Up until this time it was fast as a bullet. I'm not sure what to do to speed up the database again. I guess I messed up some kind of lookup. Any suggestions on how to fix this?

Just doing what you say shouldn't really have affected anything much differently than adding the same number of element texts "normally."

Can you share the EXPLAIN output for one of the slow queries? That'll help give an idea of what's making it so slow.

Since you already tried OPTIMIZE TABLE, you might also want to take a look at possible performance improvements for MySQL and InnoDB specifically. You could also try looking at tools that offer automated suggestions, like mysqltuner. In particular, I'd look at innodb_buffer_pool_size.

All searches on a specific field are slow so here's an EXPLAIN on a search for "TM 8071" on element_it 49.

Query 1 (fetching)

EXPLAIN SELECT items . *
FROM omeka_items AS items
WHERE ( items.id IN (
SELECT etx.record_id
FROM omeka_element_texts etx
WHERE etx.text = 'TM 8071'
AND etx.record_type = 'Item'
AND etx.element_id =49))
GROUP BY items.id
ORDER BY items.added DESC
LIMIT 25


id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY items ALL NULL NULL NULL NULL 51772 Using where; Using filesort
2 DEPENDENT SUBQUERY etx ref record_type_record_id,element_id,text record_type_record_id 156 const,func 7 Using where

Query 2 (counting)

EXPLAIN SELECT COUNT( DISTINCT (
items.id) ) FROM
omeka_items AS items
WHERE (items.id IN (
SELECT etx.record_id
FROM omeka_element_texts etx
WHERE etx.text = 'TM 8071'
AND etx.record_type = 'Item'
AND etx.element_id = 49 ))

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY items range NULL PRIMARY 4 NULL 46038 Using where; Using index for group-by (scanning)
2 DEPENDENT SUBQUERY etx ref record_type_record_id,element_id,text record_type_record_id 156 const,func 7 Using where

Interesting.

Both of these queries look like they're doing a full or nearly full scan of the "items" table, about 50,000 rows. The element_text lookup isn't actually being used to limit the number of items considered. If my math is right, MySQL is looking through 8,076,432 rows to complete the top query (the 156 element texts rows once for each item).

In this case, an "OPTIMIZE/ANALYZE" on the items table may do you some good, in the immediate term, if you'd previously been focusing on the element_texts table.

We'll look at these to see what we can do in terms of indexing and eliminating subqueries to speed things up. Can you also post EXPLAIN for the follwing "inner join" version of your query?

EXPLAIN SELECT items . *
FROM omeka_items AS items
INNER JOIN omeka_element_texts etx
ON etx.record_id = items.id AND etx.record_type = 'Item' AND etx.element_id = 49
WHERE etx.text = 'TM 8071'
GROUP BY items.id
ORDER BY items.added DESC
LIMIT 25

Here it is:

EXPLAIN SELECT items. *
FROM omeka_items AS items
INNER JOIN omeka_element_texts etx ON etx.record_id = items.id
AND etx.record_type = "Item"
AND etx.element_id =49
WHERE etx.text = "TM 8071"
GROUP BY items.id
ORDER BY items.added DESC
LIMIT 25

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE etx ref record_type_record_id,element_id,text text 62 const 16 Using where; Using temporary; Using filesort
1 SIMPLE items eq_ref PRIMARY PRIMARY 4 verhalenbank_omeka.etx.record_id 1

That version should be significantly faster: is it?

That output estimates MySQL as examining 16 rows, rather than the 8 million of the previous one.

Yes, this last query is extremely fast (0.0015 sec).

Optimizing and analyzing generates no measurable change in speed otherwise.

(Sorry for the slow communication, I guess we stop working when you start across the ocean)

We're working on some changes to make the items advanced search use the second form, using joins instead of subqueries. MySQL does particularly poorly with the "IN (SELECT ...)" style we're currently using.

You can take a look at the recent change to this behavior in the development version, or keep up with our occasional efforts on this issue generally on the issue tracker.

Thanks. I'm updating our installation from git every time there is a change. I suppose at some point I guess we'll see a sudden speedup in search actions.

If you'd like some more data on queries on large systems like ours, just ask.

Do you think you can give an estimate on when this will be addressed? I am thinking about starting my own endeavor on the database code myself to speed things up. Some sorting and searching in multiple metadata fields still takes up to 20 seconds.

But I'm kind of afraid I will miss all the other good stuff that you will do to that code.

The new joining-based queries are in: that change was part of Omeka 2.0.2.

Are you on 2.0.2 and still seeing the same slowness?

I thought I noticed some good speed ups on most searches (but I thought it was because I made some more indexes.

Now I find that when I do an advanced search on keywords like this: http://bookstore.ewi.utwente.nl/vb/items/browse?search=kaas+eten+water+dagelijks&advanced%5B0%5D%5Belement_id%5D=&advanced%5B0%5D%5Btype%5D=&advanced%5B0%5D%5Bterms%5D=&range=&collection=1&type=&user=&tags=&public=&featured=&geolocation-address=&geolocation-latitude=&geolocation-longitude=&geolocation-radius=10&geolocation-radius=10&exhibit=&submit_search=Zoeken

The query that performs slow is:

SELECT items.* FROM omeka_items AS items
INNER JOIN omeka_collections AS collections ON items.collection_id = collections.id
INNER JOIN (
SELECT etx.record_id AS item_id
FROM omeka_element_texts etx
WHERE etx.record_type = 'Item'
AND etx.text LIKE '%kaas water dagelijks%' UNION SELECT tg.record_id AS item_id FROM omeka_records_tags AS tg
INNER JOIN omeka_tags AS t ON t.id = tg.tag_id WHERE (t.name LIKE 'kaas') OR (t.name LIKE 'water') OR (t.name LIKE 'dagelijks') OR (t.name LIKE 'kaas water dagelijks') AND (tg.record_type = 'Item')) AS s ON s.item_id = items.id WHERE (collections.id = '1') GROUP BY items.id ORDER BY items.added DESC LIMIT 25

With EXPLAIN:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY collections const PRIMARY PRIMARY 4 const 1 Using index; Using temporary; Using filesort
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 2413
1 PRIMARY items eq_ref PRIMARY,collection_id PRIMARY 4 s.item_id 1 Using where
2 DERIVED etx ref record_type_record_id record_type_record_id 152 651947 Using where
3 UNION t range PRIMARY,name name 768 NULL 4 Using where; Using index
3 UNION tg index tag tag 160 NULL 653791 Using where; Using index; Using join buffer
NULL UNION RESULT <union2,3> ALL NULL NULL NULL NULL NULL

I got the same slowness with the COUNT query that comes after. (sorry for the mess)

Hmm, okay.

We haven't paid nearly as much attention to the keyword search for items since we added the sitewide search.

The old keyword search is bad in different ways from the "specific fields" one: as you can see, it does a subquery and it also adds a UNION and searches for tags within the given keyword string.

There's probably some pretty effective changes that could be made here also.

That's understandable. Busy times as it seems.
I found out that the all the rest is really a lot faster now.

I'll keep posting my speed findings here so you can see what to do with them.

I opened an issue for the slowness on simple search.

There's a commit, similar to the previous one for the advanced search, on there now, in master (along with a commit to fix an accidental testing statement that was left in the first one).

The LIKE %some term% search will always be comparatively slow because it doesn't take advantage of indexes, but this gets rid of the UNION and the subquery and should reduce the number of rows being examined.