IF YOU POST ABOUT SEARCH BEING BROKE, I'll ban you...

Feb 22, 2006 | 02:25 PM

#201

NoGamesLS1

TECH Resident

iTrader: (16)

Joined: Jul 2005

Posts: 800

Likes: 0

Every viagra contains one drop of Mr. T's sweat

Apr 20, 2006 | 12:35 PM

#202

King James

TECH Resident

iTrader: (13)

Joined: Feb 2004

Posts: 838

Likes: 0

From: Terre Haute, IN

Quote:

Originally Posted by Brains

Update:

LS1Tech and myself personally have split the cost of a new development server. The components arrived yesterday, and I spent yesterday evening assembling the new machine and installing the operating system. This system will be used to develop and stress test any and all new code destined for use on this (and affiliate) site(s).

Search is priority one. As is, adding in more posts than a couple months' worth really kills site performance -- more than it already is hurting. Some of you have caught wind of my efforts to port vBulletin to a more high-performance and enterprise-scale database server (PostgreSQL). This work is more or less finished, but needs to be final bug tested, load tested, and then transferred to the production servers. LOTS of work, but it IS progressing.

Any new updates? The search being down sucks ***!

Apr 20, 2006 | 11:51 PM

#203

blacksnake

On The Tree

Joined: Jul 2005

Posts: 160

Likes: 0

From: New Jersey

I personally have not seen any ETA on the return of the full search function.

May 17, 2006 | 01:30 AM

#204

horist

TECH Senior Member

Joined: Nov 2001

Posts: 7,036

Likes: 1

From: Lake Zurich, IL

TTT any updates? site is running fast.. but search sux ***

May 17, 2006 | 01:56 PM

#205

Brains

Thread Starter

TECH Senior Member

iTrader: (7)

Joined: Jan 2002

Posts: 12,754

Likes: 0

From: Katy, TX

Here's the latest "State of the Union" on search..

First off, let me state that I've been incrementally and slowly adding posts into the existing search index while monitoring server performance. I don't want to rob Peter to pay Paul, so to speak.

Second, here's where we stand overall. Based on the post-***** test session we had when the server blew up a while back, I'm confident the PostgreSQL port of vBulletin is solid. The only variable beyond that has been search, since I turned it off for the duration of that test (since there were no posts indexed). Here's a breakdown of what has transpired since then:

1. Loaded 3.5M posts into the "vBulletin" style post indexing scheme, under Postgres. This turned out to work reasonably well, but was somewhat hit or miss. It didn't tear down the server (since PG has MUCH more intelligent query and resource scheduling) but the search results wouldn't return in a consistent timeframe. Some took 2 seconds, some took 2+ minutes, for similar types of searches.

2. Wiped out the vB style index, and indexed the tables using the tSearch2 "FullText" index plugin for PG. This required some hacking of the vBulletin search subsystem to get working, since PG does fulltext a little differently than MySQL. End result: it worked a little faster than #1, but was still inconsistent in performance.

3. Went looking at the actual data and query structure of the search, and the PHP code that controlled everything. Found some HUGE problems with the way Jelsoft wrote the search system, mostly based on trying to offer too many options that truly don't make a lot of sense for us. The most glaring is the idea of caching the search results, so if someone else searches the same thing 20 minutes later, we can re-use the search results from the first guy. In theory, this sounds like a good idea. In practice, it's pretty half-baked. First, what percentage of people will search for the exact same thing? Not very high, I looked. Less than 1%. So even if it DID gain a ton of performance (it really doesn't, I'll explain why in a bit) there's very little chance it would actually be used. That brings us to the next stage.

4. The vB PHP code does some very inefficient things, due to the framework being written to support saved searches. The most glaring one is basically running every search TWICE (or more). Sure, some of that is handled by the DB engine query cache, but that also introduces its own issues. First off, if you expand the number of search results, you increase the amount of cache needed. This is why the current limit is 250 results. Any more than that, and it won't fit in the query cache at all, and it has to be paged to disk. That basically halts everything, since you're still fighting normal posting and reading queries. Once things start to back up, the site grinds to a halt. Now do that multiple times. Very bad. So, removing saved searches means it only runs once, which alleviates quite a bit of that right off the bat. The next problem is that the search engine actually runs the exact same search query every time you either return to the results page, or click between pages. So again, if the query doesn't fit in the query cache it has to be paged to disk, or run from scratch. Not very efficient.

5. This is the stage I'm in the middle of right now, which is a 100% total rewrite of the search engine. Without laying out all the details, it basically works by running the initial query, and storing the results list in a search results table. Since this table is nice and small (n results * x searches in the past y minutes) requerying it based on page movements or returning to the results list is incredibly fast. With realistic limits (1000 results, 30 searches per minute, 5 minute window) the table would grow to a fuzzy max of 150,000 rows. That's nothing for a database to query on, and even cache. There's still no cheap way of getting away from the initial impact of running the search, which can take up to 2 minutes in some cases. The only realistic way to speed that up is with a custom-built search server/cluster, which spreads the database over as many disks as possible. The more disk spindles there are, and the faster they spin, the faster it can scan all that data. It's not cheap to build and host a box like that obviously, so we'll do the best we can with the resources that make the most sense. If some 6 word, obscure search back to the beginning of time takes a couple minutes, so be it. Most typical 1 to 3 word searches complete in mere seconds, so most searches would never take that hit anyway.
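
For anyone who wants to see the idea in #5 in concrete terms, here's a rough sketch in Python with SQLite. It is only an illustration of "run the query once, store the hit list, page from the small table", not the actual rewrite: the table and column names (post, search_result, pos), the page size of 25, and the LIKE scan standing in for the expensive fulltext query are all assumptions. The limits (1000 results, 30 searches per minute, 5 minute window) come from the post above.

```python
import sqlite3
import time

# Limits quoted in the post above.
MAX_RESULTS = 1000
SEARCHES_PER_MINUTE = 30
WINDOW_MINUTES = 5
PAGE_SIZE = 25  # hypothetical page size, not from the post


def max_rows():
    """Rough upper bound on the results table: n results * x searches * y minutes."""
    return MAX_RESULTS * SEARCHES_PER_MINUTE * WINDOW_MINUTES


def open_db():
    """Create an in-memory database with hypothetical post and results tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE post (postid INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE search_result (
            searchid INTEGER,
            pos      INTEGER,   -- position of the hit within this search
            postid   INTEGER,
            created  REAL,      -- when the search ran, used for expiry
            PRIMARY KEY (searchid, pos)
        );
    """)
    return db


def run_search(db, searchid, term, now=None):
    """Run the expensive query ONCE and store the ordered hit list."""
    now = time.time() if now is None else now
    # The LIKE scan is a stand-in for the real (slow) fulltext query.
    hits = db.execute(
        "SELECT postid FROM post WHERE body LIKE ? "
        "ORDER BY postid DESC LIMIT ?",
        (f"%{term}%", MAX_RESULTS),
    ).fetchall()
    db.executemany(
        "INSERT INTO search_result VALUES (?, ?, ?, ?)",
        [(searchid, pos, postid, now) for pos, (postid,) in enumerate(hits)],
    )
    return len(hits)


def fetch_page(db, searchid, page):
    """Page through stored hits without re-running the original search."""
    rows = db.execute(
        "SELECT postid FROM search_result WHERE searchid = ? "
        "ORDER BY pos LIMIT ? OFFSET ?",
        (searchid, PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    return [postid for (postid,) in rows]


def purge_expired(db, now=None):
    """Drop stored results older than the window so the table stays small."""
    now = time.time() if now is None else now
    db.execute(
        "DELETE FROM search_result WHERE created < ?",
        (now - WINDOW_MINUTES * 60,),
    )
```

Clicking between pages or coming back to the results list only ever touches search_result, so the big post table gets hit once per search. With the limits above, max_rows() works out to 1000 * 30 * 5 = 150,000 rows, matching the "fuzzy max" in the post.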

May 17, 2006 | 02:11 PM

#206

Nautilus

9 Second Club

iTrader: (15)

Joined: Nov 2002

Posts: 3,132

Likes: 0

From: VT/SC

I have no f*#king clue what you just said.... But thank you for your hard work anyway...

May 17, 2006 | 02:44 PM

#207

NOSjohn

Restricted User

iTrader: (9)

Joined: Apr 2004

Posts: 2,692

Likes: 0

He said, he's looked at it, analyzed how it's working with various parameters and is now trying to optimize how it works with his own coding.

May 17, 2006 | 02:57 PM

#208

MrDude_1

TECH Junkie

iTrader: (4)

Joined: Feb 2002

Posts: 3,368

Likes: 5

From: Charleston, SC

cool.. thanks for the update.. makes perfect sense to me.

i have a smaller site, but i also noticed the search saving.. i thought it was kinda dumb, but i dont really have a need to re-write it...