JavaScript String Search Algorithms Explained

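String Search Overview

String search, also called string lookup or string matching, means locating a short piece of text (the pattern) inside a longer text and returning the position of the complete match. Many algorithms exist for this, including Boyer-Moore, Rabin-Karp, and KMP. The easiest to understand is naive search, a brute-force comparison whose worst-case time complexity is O(N * M), where N is the length of the text and M is the length of the pattern. Naive search serves as the introduction here.

The steps are:

1. Set up two loops: the outer loop walks the text being searched, and the inner loop walks the search string;

2. Compare the search string with the text character by character. On the first mismatch, break out of the inner loop, advance the text pointer by one position, and start comparing again from there;

3. If the inner loop finishes without any mismatch, the match succeeded: return the current index into the text. If the whole text is exhausted without a match, return -1.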

[Figure: step-by-step execution of the naive algorithm]

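Naive search implementation:

function find(str, content) {
  var i, contentLen = content.length
  var j, strLen = str.length
  // Two loops: the outer one walks the text, the inner one walks the search string.
  // Stop once the remaining text is shorter than the search string.
  for (i = 0; i <= contentLen - strLen; i++) {
    for (j = 0; j < strLen; j++) {
      // On the first mismatch, give up on this position and move to the next character of the text
      if (str[j] !== content[i + j]) {
        break
      }
    }
    // The inner loop ran to completion, so the whole search string matched
    if (j === strLen) {
      return i
    }
  }
  // The whole text was compared without finding a match
  return -1
}

find('ABC', 'ABABC') // 2
find('AAB', 'AAABC') // 1
find('ABC', 'AABAC') // -1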
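
For comparison, the overview names KMP as one of the faster alternatives to naive search. The following is a minimal sketch of KMP, not code from the original article; the names `buildFailureTable` and `kmpFind` are chosen here for illustration. It keeps the same argument order as `find` (search string first, text second) and avoids re-reading text characters by remembering how much of the search string is still matched after a mismatch.

```javascript
// failure[k] is the length of the longest proper prefix of pattern[0..k]
// that is also a suffix of pattern[0..k].
function buildFailureTable(pattern) {
  const failure = new Array(pattern.length).fill(0);
  let len = 0;
  for (let k = 1; k < pattern.length; k++) {
    while (len > 0 && pattern[k] !== pattern[len]) {
      len = failure[len - 1];
    }
    if (pattern[k] === pattern[len]) {
      len++;
    }
    failure[k] = len;
  }
  return failure;
}

// Returns the index of the first match of pattern in text, or -1.
function kmpFind(pattern, text) {
  if (pattern.length === 0) return 0;
  const failure = buildFailureTable(pattern);
  let matched = 0; // number of pattern characters currently matched
  for (let i = 0; i < text.length; i++) {
    // On a mismatch, fall back in the pattern instead of moving back in the text
    while (matched > 0 && text[i] !== pattern[matched]) {
      matched = failure[matched - 1];
    }
    if (text[i] === pattern[matched]) {
      matched++;
    }
    if (matched === pattern.length) {
      return i - pattern.length + 1;
    }
  }
  return -1;
}

kmpFind('ABC', 'ABABC'); // 2
kmpFind('AAB', 'AAABC'); // 1
kmpFind('ABC', 'AABAC'); // -1
```

The text index `i` never moves backwards, which is why KMP runs in O(N + M) time rather than O(N * M).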

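Many teachers run into the same problem when writing: how do you find a specific piece of text inside your documents? Today Teacher Pangpang recommends a handy tool: AnyTXT Searcher.

What is AnyTXT Searcher

AnyTXT Searcher is a powerful full-text search engine for local data, much like a Google search for your local disk. It is an ideal desktop content search tool.

AnyTXT Searcher has a powerful built-in document parsing engine that extracts text from common document types without any additional software installed, combined with a built-in high-speed indexing system that stores the text's metadata. With it you can quickly find any word that exists on your computer. It runs on Windows 10, 8, 7, Vista, XP, 2008, 2012, and 2016.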

Supported formats

Plain text formats (txt, cpp, html, etc.)
Microsoft Outlook (eml)
Microsoft Word (doc, docx)
Microsoft Excel (xls, xlsx)
Microsoft PowerPoint (ppt, pptx)
Portable Document Format (pdf) (beta)
E-book formats (mobi, epub, etc.), coming soon

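Key features

Microsoft Office support (doc, xls, ppt)
Microsoft Office 2007 support (docx, xlsx, pptx, docm, xlsm)
PDF support (beta)
Support for non-English files
Full-text search
Real-time search (beta)
SSD optimization
Fast indexing
Fast searching
Multi-language support
AES-256 encryption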

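Hands-on experience

Run the installer, or unzip the portable version, and the software is ready to use. On first launch it automatically builds a dynamic index (in under a minute), and afterwards it updates the index automatically as files are created.

Search works on text in many languages, and the results automatically show a text preview, which makes them easy to check.

From the toolbar you can also restrict the search to particular document types, which further improves the precision of the results.

More importantly, this search tool uses less than 30 MB of memory, so even a ten-year-old computer can run it comfortably (although an SSD and a 10th-generation Core processor will of course make it faster).

Summary

This handy tool is free and well worth trying. Remember to reply “查找” to get the download link.

That's all for today. Have a great weekend!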

One-sentence introduction

The fastest and most memory-flexible full-text search library for the web, with zero dependencies.
GitHub: https://github.com/nextapps-de/flexsearch


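Translated introduction

According to the project, FlexSearch outperforms every other search library in raw search speed and offers flexible search features such as multi-field search, phonetic transformations, and partial matching. Depending on the options used, it also provides the most memory-efficient index. FlexSearch introduces a new scoring algorithm called "contextual index", based on a pre-scored lexical dictionary architecture, which the project states performs queries up to 1,000,000 times faster than other libraries. FlexSearch also provides a non-blocking asynchronous processing model as well as Web Workers, so that updates and queries on the index can run in parallel through dedicated balanced threads.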

Installation

You can download the minified JS file from the project site, load it from a CDN, or install it with npm.

<!-- Latest version: -->
<script src="https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@master/dist/flexsearch.min.js"></script>
<!-- Or a specific version: -->
<script src="https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.3.51/dist/flexsearch.min.js"></script>

Install with npm

npm install flexsearch

Usage

Create an index

var index = new FlexSearch();
// or
var index = FlexSearch.create();
// or pass a preset
var index = new FlexSearch("speed");
// custom configuration
var index = new FlexSearch({
  // default values:
  encode: "balance",
  tokenize: "forward",
  threshold: 0,
  async: false,
  worker: false,
  cache: false
});
// or a preset combined with custom options
var index = new FlexSearch("memory", {
  encode: "balance",
  tokenize: "forward",
  threshold: 0
});
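
To give a feel for what the `tokenize: "forward"` option in the configuration above means, the sketch below indexes every word incrementally from its first character, so that a partial query such as "jo" can still match "John". This only illustrates the idea; `forwardTokens` is a hypothetical helper written for this example, not a FlexSearch function, and the library's internal implementation differs.

```javascript
// Hypothetical helper: list every prefix of every word in the text.
function forwardTokens(text) {
  const tokens = new Set();
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  for (const word of words) {
    for (let end = 1; end <= word.length; end++) {
      tokens.add(word.slice(0, end));
    }
  }
  return [...tokens];
}

const tokens = forwardTokens("John Doe");
// tokens is ["j", "jo", "joh", "john", "d", "do", "doe"]
```

The trade-off is memory: a word of length n produces n index entries instead of one, which is why stricter tokenizers use less memory but cannot match partial words.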

Add text to the index

Index.add(id, string)

index.add(10025, "John Doe");

Search

Index.search(string | options, <limit>, <callback>)

index.search("John");

Limit the number of results

index.search("John", 10);
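
Conceptually, `add` associates an id with the words of a text, and `search` returns the ids whose text matches the query, cut off at the limit. The sketch below shows that idea with a very small inverted index. `TinyIndex` is a hypothetical class written for this example and is not how FlexSearch is implemented; it matches whole words only and does no scoring.

```javascript
// Hypothetical minimal inverted index: word -> set of ids containing that word.
class TinyIndex {
  constructor() {
    this.postings = new Map();
  }

  add(id, text) {
    for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
      if (!this.postings.has(word)) {
        this.postings.set(word, new Set());
      }
      this.postings.get(word).add(id);
    }
  }

  // Returns the ids that contain every word of the query, at most `limit` of them.
  search(query, limit = 1000) {
    const words = query.toLowerCase().split(/\s+/).filter(Boolean);
    if (words.length === 0) return [];
    let ids = [...(this.postings.get(words[0]) || [])];
    for (const word of words.slice(1)) {
      const set = this.postings.get(word) || new Set();
      ids = ids.filter((id) => set.has(id));
    }
    return ids.slice(0, limit);
  }
}

const tiny = new TinyIndex();
tiny.add(10025, "John Doe");
tiny.add(10026, "John Smith");

tiny.search("John");     // [10025, 10026]
tiny.search("John", 1);  // [10025]
tiny.search("john doe"); // [10025]
```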

Asynchronous search

// Callback-based
index.search("John", function(result){
  // array of results
});
// Promise-based
index.search("John").then(function(result){
  // array of results
});
// async/await (ES2017)
async function search(query){
  const result = await index.search(query);
  // ...
}

Custom search

index.search({
  query: "John",
  limit: 1000,
  threshold: 5, // >= threshold
  depth: 3,     // <= depth
  callback: function(results){
    // ...
  }
});
// or
index.search("John", {
  limit: 1000,
  threshold: 5,
  depth: 3
}, function(results){
  // ...
});

Pagination

var response = index.search("John Doe", {
  limit: 5,
  page: true
});
// next page
index.search("John Doe", {
  limit: 10,
  page: response.next
});
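
The pagination calls above follow a cursor pattern: the first request asks for a page, and each response carries a `next` value that is passed back to fetch the following page. The sketch below imitates that pattern over a plain array. `paginate` is a hypothetical function, and the response shape (`page`, `next`, `result`) and the cursor format are assumptions made for this illustration, not a description of FlexSearch internals.

```javascript
// Hypothetical cursor-style pagination over an array of result ids.
// `page` is either true (first page) or a cursor returned as `next`.
function paginate(allResults, limit, page) {
  const start = page === true ? 0 : Number(page);
  const nextStart = start + limit;
  return {
    page: String(start),
    next: nextStart < allResults.length ? String(nextStart) : null,
    result: allResults.slice(start, nextStart),
  };
}

const ids = [1, 2, 3, 4, 5, 6, 7];
const first = paginate(ids, 5, true);        // result: [1, 2, 3, 4, 5], next: "5"
const second = paginate(ids, 5, first.next); // result: [6, 7], next: null
```

A `next` value of `null` signals that there are no further pages, so a caller can loop until it sees `null`.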

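Suggestions

Get query suggestions

index.search({
  query: "John Doe",
  suggest: true
});

With suggestions enabled, the result list is filled up to the limit (1000 by default) with similar matches, ordered by relevance.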

Update

Index.update(id, string)

index.update(10025, "Road Runner");

Remove

Index.remove(id)

index.remove(10025);

Reset the index

index.clear();

Destroy

index.destroy();

Re-initialize

Index.init(<options>)

// re-initialize with the same configuration
index.init();
// re-initialize with a new configuration
index.init({
  /* options */
});
// note: re-initializing destroys the old index

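Get the length

var length = index.length;

Get the register

var register = index.index;

Keys in the register have the form "@" + id.

Do not modify the register manually; treat it as read-only.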

Add custom matchers

FlexSearch.registerMatcher({REGEX: REPLACE})

Add global matchers for all instances:

FlexSearch.registerMatcher({
  'ä': 'a', // replaces all 'ä' to 'a'
  'ó': 'o',
  '[ûúù]': 'u' // replaces multiple
});

Add private matchers for a specific instance:

index.addMatcher({
  'ä': 'a', // replaces all 'ä' to 'a'
  'ó': 'o',
  '[ûúù]': 'u' // replaces multiple
});
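
A matcher object maps a regular-expression source string to its replacement. The sketch below shows one plausible way such a map could be applied to a string before indexing; `applyMatchers` is a hypothetical helper written for this example and is not part of the FlexSearch API.

```javascript
// Hypothetical helper: apply every {regex source: replacement} pair globally, in order.
function applyMatchers(str, matchers) {
  let out = str;
  for (const [pattern, replacement] of Object.entries(matchers)) {
    out = out.replace(new RegExp(pattern, "g"), replacement);
  }
  return out;
}

const normalized = applyMatchers("bär óst mûr", {
  "ä": "a",
  "ó": "o",
  "[ûúù]": "u", // a character class covers several characters at once
});
// normalized is "bar ost mur"
```

Normalizing both the indexed text and the query this way lets a query typed without accents match accented content.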

Add a custom encoder

Assign a custom encoder by passing a function when the index is created or initialized:

var index = new FlexSearch({
  encode: function(str){
    // do something with str ...
    return str;
  }
});

The encoder function receives a string as its argument and returns the modified string.

Call the custom encoder directly:

var encoded = index.encode("sample text");

Register a global encoder

FlexSearch.registerEncoder(name, encoder)

Global encoders can be shared and used by all instances:

FlexSearch.registerEncoder("whitespace", function(str){
  return str.replace(/\s/g, "");
});

Initialize an index and assign the global encoder:

var index = new FlexSearch({ encode: "whitespace" });

Call a global encoder directly:

var encoded = FlexSearch.encode("whitespace", "sample text");

Mix or extend multiple encoders:

FlexSearch.registerEncoder('mixed', function(str){
  str = this.encode("icase", str);      // built-in
  str = this.encode("whitespace", str); // custom
  // do something additional with str ...
  return str;
});
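
The registration pattern above amounts to a named registry of string-to-string functions in which one encoder may call others. The standalone sketch below illustrates that composition without the library. The `encoders` registry and the `registerEncoder` and `encode` functions here are hypothetical stand-ins, and this `icase` simply lowercases, which is a simplification of whatever the built-in encoder does.

```javascript
// Hypothetical registry of named encoders (string -> string).
const encoders = {
  icase: (str) => str.toLowerCase(), // simplified stand-in for a built-in encoder
};

function registerEncoder(name, fn) {
  encoders[name] = fn;
}

function encode(name, str) {
  const fn = encoders[name];
  if (!fn) throw new Error("Unknown encoder: " + name);
  return fn(str);
}

registerEncoder("whitespace", (str) => str.replace(/\s/g, ""));

// A "mixed" encoder chains two registered encoders.
registerEncoder("mixed", (str) => encode("whitespace", encode("icase", str)));

const encodedSample = encode("mixed", "Sample Text");
// encodedSample is "sampletext"
```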

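Add a custom tokenizer

Define a private custom tokenizer when the index is created or initialized:

var index = new FlexSearch({
  tokenize: function(str){
    // split on whitespace, hyphens and slashes
    return str.split(/[\s\-\/]+/);
  }
});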

Add language-specific stemmers and/or filters

Stemmer: several linguistic mutations of the same word (e.g. "run" and "running")
Filter: a blacklist of words to be filtered out from indexing at all (e.g. "and", "to" or "be")

Assign a private custom stemmer or filter when the index is created or initialized:

var index=new FlexSearch({
 stemmer: {
 
 // object {key: replacement}
 "ational": "ate",
 "tional": "tion",
 "enci": "ence",
 "ing": ""
 },
 filter: [ 
 
 // array blacklist
 "in",
 "into",
 "is",
 "isn't",
 "it",
 "it's"
 ]
});

Use a custom stemmer function

var index=new FlexSearch({
 stemmer: function(value){
 // apply some replacements
 // ...
 
 return value;
 }
});

Use a custom filter function

var index=new FlexSearch({
 filter: function(value){
 
 // just add values with length > 1 to the index
 
 return value.length > 1;
 }
});
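
To make the effect of the object-style stemmer and array-style filter concrete, the sketch below applies both to a sentence: blacklisted words are dropped, and the first matching suffix of each remaining word is replaced. `stem` and `prepareTokens` are hypothetical helpers written for this example; the library's own matching rules may differ in detail.

```javascript
// Hypothetical helper: replace the first matching suffix from the stemmer map.
function stem(word, stemmer) {
  for (const [suffix, replacement] of Object.entries(stemmer)) {
    if (word.length > suffix.length && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length) + replacement;
    }
  }
  return word;
}

// Hypothetical helper: split into words, drop filtered words, stem the rest.
function prepareTokens(text, stemmer, filter) {
  const blocked = new Set(filter);
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .filter((word) => !blocked.has(word))
    .map((word) => stem(word, stemmer));
}

const stemmer = { ational: "ate", tional: "tion", enci: "ence", ing: "" };
const filter = ["in", "into", "is", "isn't", "it", "it's"];

const prepared = prepareTokens(
  "It is running into relational conditional walls",
  stemmer,
  filter
);
// prepared is ["runn", "relate", "condition", "walls"]
```

Order matters in the stemmer map: "ational" is listed before "tional" so that "relational" becomes "relate" rather than "relation".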

Or assign a stemmer/filter globally to a language

The stemmer is passed as an object (key-value pairs) and the filter as an array:

FlexSearch.registerLanguage("us", {
 stemmer: { /* ... */ },
 filter: [ /* ... */ ]
});

Or use the predefined stemmers or filters for your preferred languages

<html>
<head>
 <script src="js/flexsearch.min.js"></script>
 <script src="js/lang/en.min.js"></script>
 <script src="js/lang/de.min.js"></script>
</head>
...

Now you can assign the built-in stemmers when the index is created or initialized

var index_en=new FlexSearch({
 stemmer: "en", 
 filter: "en" 
});
var index_de=new FlexSearch({
 stemmer: "de",
 filter: [ /* custom */ ]
});

In Node.js, simply require the language pack files to use them

require("flexsearch.js");
require("lang/en.js");
require("lang/de.js");

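Summary

This article is only a rough translation of part of the usage documentation. For the complete and more powerful features, see the official GitHub documentation, which covers usage in much more detail.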
