Advanced search and query language. Search engine language
A query language is an artificial computer language used to make queries in databases and information systems.
In general, query languages can be classified by whether they serve databases or information retrieval systems. The difference is that a database query is made to obtain factual answers to the question posed, while an information retrieval system (such as a search engine) tries to find documents containing information relevant to the user's area of interest.
Database
The query languages for databases include the following examples:
- QL - object-oriented, refers to relational databases; successor to Datalog.
- Contextual Query Language (CQL) is a formal query representation language for information retrieval systems (such as web indexes or bibliographic catalogs).
- CQLF (CODASYL Query Language, Flat) - for CODASYL-type databases.
- Concept-oriented query language (COQL) - used in the concept-oriented model (COM). It is based on a data modeling construct called a concept and uses operations such as projection and de-projection for multidimensional analysis, analytical operations and inference.
- DMX - used for data mining models.
- Datalog is the language of queries to deductive databases.
- Gellish English is a language that can be used for queries in Gellish English databases; it supports dialogs (queries and answers) and also serves for information and knowledge modeling.
- HTSQL - translates HTTP requests to SQL.
- ISBL - used for PRTV (one of the first relational database management systems).
- LDAP is a protocol for querying directory services that runs over TCP/IP.
- MDX - a query language for OLAP databases.
Search engines
A search query language, in turn, is aimed at finding data in search engines. It differs in that queries often consist of plain text or hypertext with additional syntax (for example, "and" / "or"). This sets it apart from standard query languages, which are governed by strict command syntax rules or use positional parameters.
How are search queries classified?
There are three broad categories that cover most search queries: informational, navigational and transactional. Although this classification has no theoretical derivation, it has been confirmed empirically by the actual queries seen in search engines.
Informational queries cover broad topics (for example, a particular city or a model of truck) for which thousands of relevant results can be obtained.
Navigational queries look for a single site or web page on a specific subject (for example, YouTube).
Transactional queries reflect the user's intention to perform a certain action, for example, to buy a car or book a ticket.
Search engines often support a fourth type of query, which is used much less often. These are so-called connectivity queries, which report on the connectivity of the indexed web graph (for example, the number of links to a specific URL, or how many pages are indexed from a specific domain).
How is the information retrieval performed?
Most search engines do not disclose their search logs, so information about what users search for on the web is hard to come by. Nevertheless, the first scientific studies appeared in 1998. A follow-up study conducted in 2001 analyzed queries whose results were shown as highly relevant, and it also clarified how query language is used in search engines.
Interesting characteristics related to web search became known:
- The average length of a search query was 2.4 words.
- About half of the users sent one request, and a little less than a third of users made three or more unique queries one by one.
- Almost half of users viewed only the first one or two pages of the results.
- Less than 5% of users used advanced search features (for example, choosing certain categories or searching within results).
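Figures like the ones above are computed from a query log. The following is a minimal sketch of that arithmetic, assuming a log stored as (user, query) pairs; the sample entries and the function names are hypothetical and invented for illustration, not data from the studies mentioned.

```python
from collections import defaultdict

# Hypothetical sample log: (user_id, query) pairs, invented for illustration.
SAMPLE_LOG = [
    ("u1", "weather berlin"),
    ("u1", "weather berlin tomorrow"),
    ("u1", "berlin train tickets"),
    ("u2", "youtube"),
    ("u3", "cheap flights"),
    ("u3", "cheap flights to rome"),
]


def average_query_length(log):
    """Mean number of whitespace-separated words per query."""
    lengths = [len(query.split()) for _, query in log]
    return sum(lengths) / len(lengths)


def unique_queries_per_user(log):
    """Map each user to the number of distinct queries they sent."""
    seen = defaultdict(set)
    for user, query in log:
        seen[user].add(query)
    return {user: len(queries) for user, queries in seen.items()}


def share_of_users(log, predicate):
    """Fraction of users whose unique-query count satisfies the predicate."""
    counts = unique_queries_per_user(log)
    return sum(1 for n in counts.values() if predicate(n)) / len(counts)
```

For example, `share_of_users(SAMPLE_LOG, lambda n: n >= 3)` gives the share of users who made three or more unique queries.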
Features of custom actions
The study also found that 19% of queries contained a geographical term (for example, place names, postal codes, geographical features, etc.). It is also worth noting that, in addition to queries being short (that is, containing only a few terms), users often changed their search phrases according to predictable patterns.
It was also found that 33% of the queries from a single user are repeats, and in 87% of those cases the user clicks on the same result. This suggests that many users use repeated queries to revisit or re-find information.
Query frequency distributions
In addition, specialists confirmed that the frequency distributions of queries follow a power law. That is, in a large query log (for example, more than 100 million queries) a small share of keywords accounts for most of the requests, while the remaining phrases on the same subjects are each used far less often. This phenomenon is known as the Pareto principle (or the "80-20 rule"), and it has allowed search engines to apply optimization methods such as database indexing or partitioning, caching and prefetching, and has also helped to improve search engine query languages.
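To see why a power law makes caching attractive, the effect can be sketched numerically. The example below assumes a Zipf-like distribution with exponent 1 over 10,000 distinct queries; both numbers and the function names are illustrative assumptions, not measurements from any real search engine.

```python
def zipf_frequencies(n_queries, exponent=1.0):
    """Relative frequency of the query at each popularity rank (1 = most popular)."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_queries + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def cache_hit_rate(frequencies, cache_size):
    """Share of all requests served if the cache_size most popular queries are cached."""
    return sum(sorted(frequencies, reverse=True)[:cache_size])


# Assumed model: 10,000 distinct queries, cache holds the top 20% of them.
freqs = zipf_frequencies(10_000)
top_20_percent = cache_hit_rate(freqs, cache_size=2_000)
print(f"Top 20% of distinct queries cover {top_20_percent:.0%} of requests")
```

Under these assumptions, caching the answers to a minority of distinct queries serves the large majority of requests, which is the intuition behind the "80-20" description.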
In recent years, it has been found that the average query length has been growing steadily over time, so the average query in English has become longer. In this regard, Google introduced an update called "Hummingbird" (August 2013), which can process long search phrases written in conversational, "spoken" language (like "where is the nearest coffee house?").
Longer queries are processed by dividing them into phrases formulated in the standard query language; the answers to the different parts are then displayed separately.
Structured queries
Search engines that support logical operations and syntax use more extended query languages. A user who searches for documents covering several topics or facets can describe each of them with a logical combination of characteristic words. At its core, a logical query language is a collection of specific phrases and punctuation marks.
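A logical query of this kind can be sketched with set operations on an inverted index. In the example below each topic is a group of alternative words combined with OR, and the topics are combined with AND. The document collection and the names `build_index`, `any_of` and `all_topics` are hypothetical, and no stemming is applied, so "truck" and "trucks" are different words.

```python
from collections import defaultdict

# Hypothetical mini-collection, invented for illustration.
DOCS = {
    1: "used cars and trucks for sale",
    2: "vintage automobiles museum",
    3: "truck rental prices",
    4: "bicycle repair guide",
}


def build_index(docs):
    """Inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def any_of(index, *words):
    """OR: documents containing at least one of the words."""
    result = set()
    for word in words:
        result |= index.get(word, set())
    return result


def all_topics(index, *topics):
    """AND: documents matching every topic (each topic is a tuple of words)."""
    results = [any_of(index, *words) for words in topics]
    return set.intersection(*results) if results else set()


index = build_index(DOCS)
# (cars OR automobiles OR trucks) AND (sale OR rental)
matches = all_topics(index, ("cars", "automobiles", "trucks"), ("sale", "rental"))
```

Here `matches` contains only document 1: document 2 mentions automobiles but neither "sale" nor "rental", and document 3 says "truck", which this sketch does not equate with "trucks".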
What is advanced search?
The query language of "Yandex" and "Google" makes it possible to carry out a more narrowly focused search under specified conditions. Advanced search can look for part of a page name or a title prefix, as well as within certain categories and lists of names. It can also restrict the search to pages that contain specific words in the title or belong to certain topic groups. Used correctly, the query language can handle parameters far more complex than a basic search in most search engines, including words with variable endings and similar spellings. Advanced search results also include links to the relevant sections of each page.
It is also possible to search for all pages containing a certain phrase, whereas a standard query may not cover discussion pages. In many cases, the query language can also reach pages marked with noindex tags.
In some cases, a well-formed query makes it possible to find information containing special characters and letters of other writing systems (Chinese characters, for example).
How are the characters of the query language read?
Upper and lower case, as well as some diacritical marks (umlauts and accents), are not taken into account in searches. For example, a search for the keyword Citroen will also find pages containing the word "Citroën". Some ligatures also match their component letters. For example, a search for the word "Aeroskobing" will easily find pages containing "Ærøskøbing" (AE = Æ).
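This kind of matching is usually achieved by folding both the query and the page text into a common form before comparing them. The sketch below uses Python's standard `unicodedata` module; the small `LIGATURES` table and the function names are assumptions for illustration, and real search engines use much larger mappings.

```python
import unicodedata

# Assumed ligature / special-letter table; real engines use larger mappings.
LIGATURES = {"æ": "ae", "œ": "oe", "ø": "o"}


def fold(text):
    """Lowercase, expand ligatures and strip diacritical marks."""
    text = text.casefold()
    text = "".join(LIGATURES.get(ch, ch) for ch in text)
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query, page_text):
    """True if the folded query occurs in the folded page text."""
    return fold(query) in fold(page_text)
```

With this folding, `matches("Citroen", "The Citroën 2CV")` is true, and a name such as "Ærøskøbing" folds to plain ASCII letters. The explicit table is needed because letters like "æ" and "ø" have no Unicode decomposition into a base letter and an accent.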
Many non-alphanumeric characters are always ignored. For example, it is impossible to find information with a query containing the string |LT| (the letters between two vertical bars), despite the fact that this string is used in some conversion templates. The results will simply contain pages with "LT". Some characters and phrases are processed differently: the request “credit (Finance)” will display articles with the words “credit” and “finance”, ignoring the parentheses, even if there is an article with the exact name “credit (Finance)”.
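The effect can be illustrated with a simple tokenizer that keeps only runs of word characters. This is a minimal sketch of the general idea, not the tokenizer of any particular search engine; the function name is hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens, dropping punctuation.

    Note: \\w also matches digits and the underscore.
    """
    return re.findall(r"\w+", text.lower())
```

Both `tokenize("credit (Finance)")` and `tokenize("credit finance")` yield the same two tokens, so after tokenization the engine can no longer tell the parenthesized title apart from the plain words. Likewise, vertical bars around a string disappear and only the letters remain.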
There are many functions that can be used with the query language.
Syntax
The query language of "Yandex" and "Google" can use some punctuation marks to refine a search. An example is braces - {{search}}. The phrase enclosed in them will be searched for in its entirety, unchanged.
A phrase in double quotes lets you pin down the object of the search. For example, a word in quotation marks is recognized as being used in a figurative sense or as the name of a fictional character, while without quotation marks it is treated as more documentary information.
In addition, all major search engines support the “-” symbol for a logical “not”, as well as AND / OR. An exception is terms that cannot be split apart, such as words joined by a hyphen or a dash.
Inexact matching of the search phrase is marked with the symbol ~. For example, if you do not remember the exact wording of a term or name, you can enter it in the search bar with this symbol and get the results that are most similar to it.
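How such punctuation might be interpreted can be sketched with a small parser. The example assumes a simplified syntax in which double quotes mark an exact phrase, a leading "-" excludes a term and a trailing "~" requests inexact matching; actual search engines differ in the details. The function names are hypothetical, and inexact matching is approximated with `difflib.get_close_matches` from the Python standard library.

```python
import difflib
import re

# Either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    """Split a query into exact phrases, excluded, fuzzy and plain terms."""
    parsed = {"phrases": [], "excluded": [], "fuzzy": [], "terms": []}
    for phrase, word in TOKEN.findall(query):
        if phrase:
            parsed["phrases"].append(phrase)
        elif word.startswith("-") and len(word) > 1:
            parsed["excluded"].append(word[1:])
        elif word.endswith("~") and len(word) > 1:
            parsed["fuzzy"].append(word[:-1])
        else:
            parsed["terms"].append(word)
    return parsed


def fuzzy_candidates(term, vocabulary, cutoff=0.7):
    """Vocabulary words similar to term, best match first."""
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)
```

For instance, `parse_query('"query language" search -images retreival~')` separates the quoted phrase, the plain term, the excluded term and the misspelled fuzzy term. A hyphen inside a word, as in "well-known", is not treated as a "not" operator because only a leading "-" is checked.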
Customized search options
There are also search options such as intitle and incategory. They are colon-separated filters of the form “filter: query string”. The query string can contain the search term or phrase, or part or all of the page name.
The “intitle: query” filter gives priority to results that match the page title, but also shows the usual matches in page content. Several of these filters can be used simultaneously. How is this used in practice?
A request of the form “intitle: airport name” will display all articles with the name of the airport in the title. If it is formulated as “parking intitle: airport name”, the results are articles with the name of the airport in the title and a mention of parking in the text.
The “incategory: Category” filter first returns articles that belong to a particular group or list of pages. For example, a search query like “Temples incategory: History” will produce results on the history of temples. This filter can also be combined with other parameters for a more advanced search.
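The way such filters narrow a search can be sketched as follows. This is a simplified model under several assumptions: filters are written without a space after the colon, each filter value is a single word, filters are applied strictly rather than as a ranking priority, and the pages and function names are hypothetical.

```python
import re

FILTER = re.compile(r"(intitle|incategory):(\S+)")

# Hypothetical pages, invented for illustration.
PAGES = [
    {"title": "Heathrow Airport", "text": "Terminals, parking and public transport.",
     "categories": ["Airports"]},
    {"title": "Gatwick Airport", "text": "Railway station and terminals.",
     "categories": ["Airports"]},
    {"title": "History of temples", "text": "Temples in the ancient world.",
     "categories": ["History"]},
    {"title": "Parking", "text": "Parking near stations and at Heathrow.",
     "categories": ["Transport"]},
]


def parse(query):
    """Return ([(filter_name, value), ...], [plain terms])."""
    filters = FILTER.findall(query)
    terms = FILTER.sub(" ", query).split()
    return filters, terms


def search(query, pages):
    """Titles of pages that satisfy every filter and contain every plain term."""
    filters, terms = parse(query)
    results = []
    for page in pages:
        title = page["title"].lower()
        text = page["text"].lower()
        categories = [c.lower() for c in page["categories"]]
        ok = True
        for name, value in filters:
            value = value.lower()
            if name == "intitle" and value not in title:
                ok = False
            if name == "incategory" and value not in categories:
                ok = False
        if ok and all(t.lower() in title or t.lower() in text for t in terms):
            results.append(page["title"])
    return results
```

In this model, `search("parking intitle:heathrow", PAGES)` returns only the Heathrow page: the page titled "Parking" mentions Heathrow in its text, but not in its title, so the filter rejects it.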