Real_Myth

An Implementation of Double-Array Trie

What is Trie?
What Does It Take to Implement a Trie?
Triple-Array Trie
Double-Array Trie
Suffix Compression
Key Insertion
Key Deletion
Double-Array Pool Allocation
An Implementation
Download
Other Implementations
References

What is Trie?

A trie is a kind of digital search tree. (See [Knuth1972] for details of digital search trees.) [Fredkin1960] introduced the term "trie", which is abbreviated from "Retrieval".

A trie is an efficient indexing method. It is also a kind of deterministic finite automaton (DFA). (See [Cohen1990], for example, for the definition of a DFA.) Within the tree structure, each node corresponds to a DFA state, and each (directed) labeled edge from a parent node to a child node corresponds to a DFA transition. The traversal starts at the root node. Then, from head to tail, the characters of the key string are taken one by one to determine the next state. The edge labeled with the same character is chosen to walk. Notice that each step of such walking consumes one character from the key and descends one step down the tree. If the key is exhausted and a leaf node is reached, then we arrive at the exit for that key. If we get stuck at some node, either because there is no branch labeled with the current character or because the key is exhausted at an internal node, then the key is not recognized by the trie.
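As a rough illustration of the walk described above, here is a minimal Python sketch that uses nested dictionaries as trie nodes. The helper names (build_trie, accepts) and the use of "#" as a key terminator are assumptions made for this example, not part of any library.

```python
def build_trie(keys):
    """Build a nested-dict trie: each dict is a node, each dict key an edge label."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
    return root


def accepts(root, key):
    """Walk from the root, consuming one character of the key per step."""
    node = root
    for ch in key:
        if ch not in node:      # stuck: no branch labeled with this character
            return False
        node = node[ch]
    return not node             # recognized only if the key ends at a leaf


# "#" is an assumed terminator, so that no key is a prefix of another key.
trie = build_trie(["pool#", "prize#", "produce#", "producer#"])
print(accepts(trie, "prize#"), accepts(trie, "pri"))
```

The number of steps depends only on the length of the key, not on how many keys the trie holds.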

Notice that the time needed to traverse from the root to the leaf does not depend on the size of the database, but is proportional to the length of the key. Therefore, it is usually much faster than a B-tree or any comparison-based indexing method in general cases. Its time complexity is comparable with hashing techniques.

In addition to efficiency, a trie also provides flexibility in searching for the closest path when the key is misspelled. For example, by skipping a certain character in the key while walking, we can fix an insertion typo. By walking toward all the immediate children of a node without consuming a character from the key, we can fix a deletion typo, or even a substitution typo if we drop the key character that has no branch to follow and descend to all the immediate children of the current node.
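The three kinds of repair can be sketched on a nested-dict trie as follows. This is only an illustration under assumptions: the function names, the "#" terminator, and the single-typo budget are choices made for the example, and a real implementation would work on the double-array structure instead.

```python
def build_trie(keys):
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
    return root


def close_matches(node, key, budget=1, path=""):
    """Return the stored keys reachable with at most `budget` typos."""
    found = set()
    if not key and not node:
        found.add(path)                       # key exhausted at a leaf
    if key and key[0] in node:                # ordinary step, no typo spent
        found |= close_matches(node[key[0]], key[1:], budget, path + key[0])
    if budget > 0:
        if key:                               # insertion typo: skip a key character
            found |= close_matches(node, key[1:], budget - 1, path)
        for ch, child in node.items():
            # deletion typo: descend without consuming a key character
            found |= close_matches(child, key, budget - 1, path + ch)
            if key and ch != key[0]:          # substitution typo: replace key[0] by ch
                found |= close_matches(child, key[1:], budget - 1, path + ch)
    return found


trie = build_trie(["pool#", "prize#", "produce#"])
print(close_matches(trie, "prizze#"))   # extra "z" typed
print(close_matches(trie, "prze#"))     # "i" left out
print(close_matches(trie, "prise#"))    # "s" typed instead of "z"
```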

What Does It Take to Implement a Trie?

In general, a DFA is represented with a transition table, in which the rows correspond to the states and the columns correspond to the transition labels. The data kept in each cell is then the next state to go to for a given state when the input is equal to the label.

This is an efficient method for traversal, because every transition can be calculated by two-dimensional array indexing. However, in terms of space usage, this is rather extravagant, because, in the case of a trie, most nodes have only a few branches, leaving the majority of the table cells blank.

Meanwhile, a more compact scheme is to use a linked list to store the transitions out of each state. But this results in slower access, due to the linear search.

Hence, table compression techniques which still allow fast access have been devised to solve the problem.

[Johnson1975] (also explained in [Aho+1985] pp. 144-146) represented a DFA with four arrays, which can be simplified to three in the case of a trie. The transition table rows are allocated in an overlapping manner, allowing the free cells to be used by other rows.
[Aoe1989] proposed an improvement on the three-array structure by reducing the arrays to two.

Triple-Array Trie

As explained in [Aho+1985] pp. 144-146, DFA compression can be done using four linear arrays, namely default, base, next, and check. However, in a case simpler than a lexical analyzer, such as a mere trie for information retrieval, the default array can be omitted. Thus, a trie can be implemented using three arrays according to this scheme.

Structure

The triple-array structure is composed of:

base. Each element in base corresponds to a node of the trie. For a trie node s, base[s] is the starting index within the next and check pool (to be explained later) for the row of the node s in the transition table.
next. This array, in coordination with check, provides a pool for the allocation of the sparse vectors for the rows in the trie transition table. The vector data, that is, the vector of transitions from every node, would be stored in this array.
check. This array works in parallel to next. It marks the owner of every cell in next. This allows the cells next to one another to be allocated to different trie nodes. That means the sparse vectors of transitions from more than one node are allowed to be overlapped.

Definition 1. For a transition from state s to t which takes character c as the input, the condition maintained in the triple-array trie is:

check[base[s] + c] = s
next[base[s] + c] = t

Walking

According to Definition 1, the walking algorithm for a given state s and the input character c is:

t := base[s] + c;
if check[t] = s then next state := next[t] else fail endif
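The walking rule can be tried out on a tiny hand-built triple-array trie. The sketch below is an assumption-laden example: the character codes, the state numbers, the NONE marker, and the array contents are all made up for illustration, and the next array is called nxt because next is a Python builtin.

```python
NONE = 0                      # assumed marker for a cell that nobody owns
CODE = {"a": 1, "b": 2}       # hypothetical character codes

# Transitions encoded: 1 -a-> 2, 1 -b-> 3, 2 -b-> 4 (state 1 is the root).
base = [0, 0, 1, 0, 0]        # indexed by state; index 0 is unused
nxt = [0, 2, 3, 4]            # indexed by pool position
check = [NONE, 1, 1, 2]       # owner of each pool position


def walk(s, ch):
    """One step of Definition 1: follow the edge labeled ch out of state s."""
    t = base[s] + CODE[ch]
    if t < len(check) and check[t] == s:
        return nxt[t]
    return None               # fail: the cell belongs to another state


def lookup(key, root=1):
    s = root
    for ch in key:
        s = walk(s, ch)
        if s is None:
            return None
    return s


print(lookup("ab"), lookup("b"), lookup("aa"))
```

Note how the rows of states 1 and 2 overlap in the pool: state 1 owns cells 1 and 2, while state 2 starts its row at cell 1 but only owns cell 3.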

Construction

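To insert a transition that takes character c to traverse from a state s to another state t, the cell next[base[s] + c] must be made available. If it is already vacant, we are lucky. Otherwise, either the entire transition vector for the current owner of the cell or that of the state s itself must be relocated. The estimated cost of each case can determine which one to move. After finding free slots to place the vector, the transition vector must be recalculated as follows. Assuming the new place begins at b, the procedure for the relocation is:

Procedure Relocate(s : state; b : base_index)
{ Move base for state s to a new place beginning at b }
begin
    foreach input character c for the state s
    { i.e. foreach c such that check[base[s] + c] = s }
    begin
        check[b + c] := s;     { mark owner }
        next[b + c] := next[base[s] + c];     { copy data }
        check[base[s] + c] := none     { free the cell }
    end;
    base[s] := b
end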
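A relocation in the triple-array structure can be sketched in Python as below. This is an illustration only: the array contents, the NONE marker, and the function names are assumptions, and the caller is assumed to have already checked that every target cell b + c is free.

```python
NONE = 0                                  # assumed marker for a free cell


def relocate(base, nxt, check, s, b, alphabet):
    """Move the transition vector of state s so that it starts at b."""
    for c in alphabet:
        old = base[s] + c
        if old < len(check) and check[old] == s:   # s has an edge labeled c
            check[b + c] = s              # mark owner
            nxt[b + c] = nxt[old]         # copy data
            check[old] = NONE             # free the old cell
    base[s] = b


def walk(base, nxt, check, s, c):
    t = base[s] + c
    return nxt[t] if t < len(check) and check[t] == s else None


# Transitions: 1 -1-> 2, 1 -2-> 3, 2 -2-> 4, with spare room at the end.
base = [0, 0, 1, 0, 0]
nxt = [0, 2, 3, 4, 0, 0, 0, 0]
check = [0, 1, 1, 2, 0, 0, 0, 0]

relocate(base, nxt, check, s=1, b=4, alphabet=(1, 2))
print(walk(base, nxt, check, 1, 1), walk(base, nxt, check, 2, 2))
```

Because states are referenced by number through base, the target states keep their identity after the move, so no other cell needs to be updated.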

Double-Array Trie

The triple-array structure for implementing a trie appears to be well defined, but is still not practical to keep in a single file. The next/check pool can be kept in a single array of integer pairs, but the base array does not grow in parallel with the pool, and is therefore usually kept separately.

To solve this problem, [Aoe1989] reduced the structure to two parallel arrays. In the double-array structure, base and next are merged, resulting in only two parallel arrays, namely base and check.

Structure

Instead of referencing indirectly through state numbers as in the triple-array trie, nodes in the double-array trie are linked directly within the base/check pool.

Definition 2. For a transition from state s to t which takes character c as the input, the condition maintained in the double-array trie is:

check[base[s] + c] = s
base[s] + c = t

Walking

According to Definition 2, the walking algorithm for a given state s and the input character c is:

t := base[s] + c;
if check[t] = s then next state := t else fail endif
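The same toy trie can be encoded with only two arrays, where a state is simply a cell index. As before, the character codes, the NONE marker, and the array contents are assumptions made for this sketch.

```python
NONE = 0                      # assumed marker for "no owner"
CODE = {"a": 1, "b": 2}       # hypothetical character codes

# Cell 1 is the root. Transitions: 1 -a-> 2, 1 -b-> 3, 2 -b-> 4.
base = [0, 1, 2, 0, 0]
check = [NONE, NONE, 1, 1, 2]


def walk(s, ch):
    """One step of Definition 2: the target cell index is the next state."""
    t = base[s] + CODE[ch]
    if t < len(check) and check[t] == s:
        return t
    return None


def lookup(key, root=1):
    s = root
    for ch in key:
        s = walk(s, ch)
        if s is None:
            return None
    return s


print(lookup("ab"), lookup("b"), lookup("aa"))
```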

Construction

The construction of a double-array trie is in principle the same as that of a triple-array trie. The difference is the base relocation:

Procedure Relocate(s : state; b : base_index)
{ Move base for state s to a new place beginning at b }
begin
    foreach input character c for the state s
    { i.e. foreach c such that check[base[s] + c] = s }
    begin
        check[b + c] := s;     { mark owner }
        base[b + c] := base[base[s] + c];     { copy data }
        { the node base[s] + c is to be moved to b + c;
          Hence, for any i for which check[i] = base[s] + c,
          update check[i] to b + c }
        foreach input character d for the node base[s] + c
        begin
            check[base[base[s] + c] + d] := b + c
        end;
        check[base[s] + c] := none     { free the cell }
    end;
    base[s] := b
end
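The relocation procedure can be exercised with a small Python sketch. It is an illustration under assumptions: the array contents, the NONE marker, and the integer alphabet are made up, and the caller is assumed to have verified that all target cells b + c are free.

```python
NONE = 0                                   # assumed marker for a free cell


def relocate(base, check, s, b, alphabet):
    """Move the children of state s so that they start at base b."""
    for c in alphabet:
        old = base[s] + c
        if old >= len(check) or check[old] != s:
            continue                       # s has no edge labeled c
        new = b + c
        check[new] = s                     # mark owner
        base[new] = base[old]              # copy data
        for d in alphabet:                 # children of the moved node
            child = base[old] + d
            if child < len(check) and check[child] == old:
                check[child] = new         # point them at the new cell
        check[old] = NONE                  # free the old cell
        base[old] = NONE
    base[s] = b


def walk(base, check, s, c):
    t = base[s] + c
    return t if t < len(check) and check[t] == s else None


# Cell 1 is the root. Transitions: 1 -1-> 2, 1 -2-> 3, 2 -2-> 4.
base = [0, 1, 2, 0, 0, 0, 0, 0, 0]
check = [0, 0, 1, 1, 2, 0, 0, 0, 0]

relocate(base, check, s=1, b=5, alphabet=(1, 2))
moved = walk(base, check, 1, 1)
print(moved, walk(base, check, moved, 2))
```

Unlike the triple-array case, the moved nodes change their cell numbers, which is why the check entries of their children have to be rewritten.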

Suffix Compression

[Aoe1989] also suggested a storage compression strategy: splitting non-branching suffixes into single string storages, called tail, so that the remaining non-branching steps are reduced to mere string comparison.

With the two separate data structures, the double-array branches and the suffix-pool tail, the key insertion and deletion algorithms must be modified accordingly.
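To get a feel for where the split between branches and tail falls, the following Python sketch computes, for each key, the shortest prefix that no other key shares (the part that needs branch nodes) and the remaining suffix (the part that can go to tail). The function name is hypothetical, "#" is assumed to be the key terminator, and the sketch describes a freshly built trie only.

```python
def split_keys(keys):
    """Map each key to (branch part, tail part)."""
    result = {}
    for key in keys:
        others = [k for k in keys if k != key]
        n = 1
        # Grow the prefix until no other key shares it.
        while n < len(key) and any(k.startswith(key[:n]) for k in others):
            n += 1
        result[key] = (key[:n], key[n:])
    return result


keys = ["pool#", "prepare#", "preview#", "prize#",
        "produce#", "producer#", "progress#"]
parts = split_keys(keys)
for key in keys:
    print(key, parts[key])
```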

Key Insertion

To insert a new key, the branching position can be found by traversing the trie with the key, one character at a time, until it gets stuck. The state where there is no branch to follow is the very place to insert a new edge, labeled by the failing character. However, with the branch-tail structure, the insertion point can be either in the branch or in the tail.

1. When the branching point is in the double-array structure

Suppose that the new key is a string a₁a₂...a_h-1a_ha_h+1...a_n, where a₁a₂...a_h-1 traverses the trie from the root to a node s_r in the double-array structure, and there is no edge labeled a_h that goes out of s_r. The algorithm called A_INSERT in [Aoe1989] does the following:

From s_r, insert edge labeled a_h to new node s_t;
Let s_t be a separate node pointing to a string a_h+1...a_n in tail pool.

2. When the branching point is in the tail pool

Since the path through a tail string has no branch, and therefore correspondsto exactly one key, suppose that the key corresponding to the tail is

a₁a₂...a_h-1a_h...a_h+k-1b₁...b_m,

where a₁a₂...a_h-1 is in the double-array structure, and a_h...a_h+k-1b₁...b_m is in tail. Suppose that the substring a₁a₂...a_h-1 traverses the trie from the root to a node s_r.

And suppose that the new key is in the form

a₁a₂...a_h-1a_h...a_h+k-1a_h+k...a_n,

where a_h+k <> b₁. The algorithm called B_INSERT in [Aoe1989] does the following:

From s_r, insert straight path with a_h...a_h+k-1, ending at a new node s_t;
From s_t, insert edge labeled b₁ to new node s_u;
Let s_u be separate node pointing to a string b₂...b_m in tail pool;
From s_t, insert edge labeled a_h+k to new node s_v;
Let s_v be separate node pointing to a string a_h+k+1...a_n in tail pool.
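The string bookkeeping behind B_INSERT can be sketched in Python without touching the arrays at all. The function below only works out which characters become the straight path, which become the two new edge labels, and which remain in tail. Its name, the dictionary keys, and the example key "program#" are assumptions for illustration.

```python
def plan_b_insert(old_tail, new_rest):
    """old_tail is a_h...a_h+k-1 b_1...b_m; new_rest is a_h...a_n."""
    k = 0
    while (k < len(old_tail) and k < len(new_rest)
           and old_tail[k] == new_rest[k]):
        k += 1
    if k == len(old_tail) or k == len(new_rest):
        # Cannot happen when every key ends with a unique terminator.
        raise ValueError("one key is a prefix of the other")
    return {
        "straight_path": old_tail[:k],     # a_h...a_h+k-1
        "old_edge": old_tail[k],           # b_1
        "old_tail": old_tail[k + 1:],      # b_2...b_m
        "new_edge": new_rest[k],           # a_h+k
        "new_tail": new_rest[k + 1:],      # a_h+k+1...a_n
    }


# "progress#" is stored with "ress#" in tail; "program#" arrives with "ram#" left.
plan = plan_b_insert("ress#", "ram#")
print(plan)
```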

Key Deletion

To delete a key from the trie, all we need to do is delete the tail block occupied by the key, and all double-array nodes belonging exclusively to the key, without touching any node belonging to other keys.

Consider a trie which accepts a language K = {pool#, prepare#, preview#,prize#, produce#, producer#, progress#} :

The key "pool#" can be deleted by removing the tail string "ol#" from the tail pool, and node 3 from the double-array structure. This is the simplest case.

To remove the key "produce#", it is sufficient to delete node 14 from the double-array structure. But the resulting trie will not obey the convention that every node in the double-array structure, except the separate nodes which point to tail blocks, must belong to more than one key. The path from node 10 on will belong solely to the key "producer#".

But there is no harm in violating this rule. The only drawback is that the trie becomes less compact. Traversal, insertion and deletion algorithms are intact. Therefore, this rule should be relaxed, for the sake of simplicity and efficiency of the deletion algorithm. Otherwise, there must be extra steps to examine other keys in the same subtree ("producer#" for the deletion of "produce#") to see if any node needs to be moved from the double-array structure to the tail pool.

Suppose further that having removed "produce#" as such (by removing only node 14), we also need to remove "producer#" from the trie. What we have to do is remove string "#" from tail, and remove nodes 15, 13, 12, 11, 10 (which now belong solely to the key "producer#") from the double-array structure.

We can thus summarize the algorithm to delete a key k = a₁a₂...a_h-1a_h...a_n, where a₁a₂...a_h-1 is in the double-array structure, and a_h...a_n is in the tail pool, as follows:

Let s_r := the node reached by a₁a₂...a_h-1;
Delete a_h...a_n from tail;
s := s_r;
repeat
    p := parent of s;
    Delete node s from double-array structure;
    s := p
until s = root or outdegree(s) > 0.

where outdegree(s) is the number of child nodes of s.
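The repeat loop can be replayed on the "produce#" and "producer#" example with a small Python sketch. Here the tree is kept in explicit parent and children maps instead of the double-array, which is an assumption made to keep the example short. Node numbers 10 to 15 follow the text, while nodes 1, 2 and 3 are invented stand-ins for the rest of the trie.

```python
def delete_branch(parent, children, s_r, root):
    """Delete s_r and every ancestor left without children; return them in order."""
    removed = []
    s = s_r
    while True:
        p = parent[s]
        children[p].discard(s)       # delete node s from the structure
        del parent[s]
        removed.append(s)
        s = p
        if s == root or len(children[s]) > 0:
            return removed


parent = {2: 1, 3: 2, 10: 2, 11: 10, 12: 11, 13: 12, 14: 13, 15: 13}
children = {1: {2}, 2: {3, 10}, 3: set(), 10: {11}, 11: {12},
            12: {13}, 13: {14, 15}, 14: set(), 15: set()}

first = delete_branch(parent, children, 14, root=1)    # "produce#"
second = delete_branch(parent, children, 15, root=1)   # "producer#"
print(first, second)
```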

Double-Array Pool Allocation

When inserting a new branch for a node, it is possible that the array element for the new branch has already been allocated to another node. In that case, relocation is needed. The efficiency-critical part then turns out to be the search for a new place. A brute force algorithm iterates along the check array to find an empty cell to place the first branch, and then ensures that there are empty cells for all other branches as well. The time used is therefore proportional to the size of the double-array pool and the size of the alphabet.

Suppose that there are n nodes in the trie, and the alphabet is of size m. The size of the double-array structure would be n + cm, where c is a coefficient which depends on the characteristics of the trie. And the time complexity of the brute force algorithm would be O(nm + cm²).

[Aoe1989] proposed a free-space list in the double-array structure to make the time complexity independent of the size of the trie, and dependent on the number of free cells only. The check array for the free cells is redefined to keep a pointer to the next free cell (called G-link):

Definition 3. Let r₁, r₂, ..., r_cm be the free cells in the double-array structure, ordered by position. G-link is defined as follows:

check[0] = -r₁
check[r_i] = -r_i+1 ; 1 <= i <= cm-1
check[r_cm] = -1

By this definition, a negative check means unoccupied, in the same sense as a "none" check in the ordinary algorithm. This encoding scheme forms a singly-linked list of free cells. When searching for an empty cell, only the cm free cells are visited, instead of all n + cm cells as in the brute force algorithm.
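Definition 3 can be sketched in Python as follows. The function names are hypothetical, occupied cells are given an arbitrary positive check value, and the traversal assumes that cell 1 is never free (it holds the root), so that reaching the value 1 marks the end of the list.

```python
def build_g_link(check, free_cells):
    """Thread the free cells (in ascending order) through check, per Definition 3."""
    prev = 0
    for r in free_cells:
        check[prev] = -r
        prev = r
    check[prev] = -1              # end-of-list marker


def free_cells_of(check):
    """Visit only the free cells by following the G-link."""
    cells = []
    s = -check[0]
    while s != 1:                 # assumption: cell 1 is the root, never free
        cells.append(s)
        s = -check[s]
    return cells


check = [1] * 8                   # positive check: occupied (dummy owner)
build_g_link(check, [3, 4, 6])
print(check, free_cells_of(check))
```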

This, however, can still be improved. Notice that for those cells with negative check, the corresponding base values are not given any definition. Therefore, in our implementation, Aoe's G-link is modified to be a doubly-linked list by letting the base of every free cell point to the previous free cell. This can speed up the insertion and deletion processes. And, for convenience in referencing the list head and tail, we let the list be circular. The zeroth node is dedicated to be the entry point of the list. And the root node of the trie will begin at cell number one.

Definition 4. Let r₁, r₂, ..., r_cm be the free cells in the double-array structure, ordered by position. G-link is defined as follows:

check[0] = -r₁
check[r_i] = -r_i+1 ; 1 <= i <= cm-1
check[r_cm] = 0
base[0] = -r_cm
base[r₁] = 0
base[r_i+1] = -r_i ; 1 <= i <= cm-1
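The circular doubly-linked layout of Definition 4 can be checked with a short Python sketch. The function names are hypothetical and occupied cells simply carry a dummy positive check value.

```python
def build_free_list(base, check, free_cells):
    """Link the free cells (ascending) into a circular list headed by cell 0."""
    prev = 0
    for r in free_cells:
        check[prev] = -r          # forward link
        base[r] = -prev           # backward link
        prev = r
    check[prev] = 0               # last free cell points back to the head
    base[0] = -prev               # head points back to the last free cell


def forward(check):
    cells, s = [], -check[0]
    while s != 0:
        cells.append(s)
        s = -check[s]
    return cells


def backward(base):
    cells, s = [], -base[0]
    while s != 0:
        cells.append(s)
        s = -base[s]
    return cells


base, check = [0] * 8, [1] * 8
build_free_list(base, check, [3, 4, 6])
print(forward(check), backward(base))
```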

Then, searching for the slots for a node with input symbol set P = {c₁, c₂, ..., c_p} needs to iterate over only the cells with negative check:

{find least free cell s such that s > c₁}
s := -check[0];
while s <> 0 and s <= c₁ do
    s := -check[s]
end;
if s = 0 then return FAIL;  {or reserve some additional space}

{continue searching for the row, given that s matches c₁}
while s <> 0 do
    i := 2;
    while i <= p and check[s + c_i - c₁] < 0 do
        i := i + 1
    end;
    if i = p + 1 then return s - c₁;  {all cells required are free, so return it}
    s := -check[s]
end;
return FAIL;  {or reserve some additional space}
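The slot search can be sketched in Python on top of the Definition 4 layout. Several things here are assumptions: the function names, the bounds check that treats cells past the end of the array as unavailable, and the is_free helper, which also accepts the last free cell (its check is 0 rather than negative under Definition 4) by comparing against base[0].

```python
def build_free_list(base, check, free_cells):
    prev = 0
    for r in free_cells:
        check[prev], base[r] = -r, -prev
        prev = r
    check[prev] = 0
    base[0] = -prev


def is_free(base, check, i):
    # Negative check means free; the last free cell is recognized via base[0].
    return 0 < i < len(check) and (check[i] < 0 or i == -base[0])


def find_free_base(base, check, codes):
    """codes: ascending character codes c1 < c2 < ...; return a base b, or None."""
    c1 = codes[0]
    s = -check[0]
    while s != 0 and s <= c1:     # find the least free cell s with s > c1
        s = -check[s]
    while s != 0:                 # s is the candidate cell for c1
        if all(is_free(base, check, s + c - c1) for c in codes[1:]):
            return s - c1         # all required cells are free
        s = -check[s]
    return None                   # FAIL; a caller could extend the pool here


base, check = [0] * 8, [1] * 8
build_free_list(base, check, [3, 4, 6, 7])
print(find_free_base(base, check, [1, 2]), find_free_base(base, check, [1, 3]))
```

Returning None leaves the decision to the caller, which might grow the arrays and retry.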

The time complexity for free slot searching is reduced to O(cm²). The relocation stage takes O(m²). The total time complexity is therefore O(cm² + m²) = O(cm²).

It is useful to keep the free list ordered by position, so that access through the array becomes more sequential. This is beneficial when the trie is stored in a disk file or virtual memory, because disk caching or page swapping is used more efficiently. So, free cell reuse should maintain this strategy:

t := -check[0];
while t <> 0 and t < s do
    t := -check[t]
end;
{t now points to the cell after s' place, or to the list head 0 if s follows all free cells}
check[s] := -t;
check[-base[t]] := -s;
base[s] := base[t];
base[t] := -s;

Time complexity of freeing a cell is thus O(cm).
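Returning a cell to the ordered free list can be sketched in Python as below. The function names are hypothetical. The loop in this sketch stops when t wraps around to the list head (t == 0), an assumption made so that a cell lying beyond the last free cell is appended at the end and the list stays ordered by position.

```python
def build_free_list(base, check, free_cells):
    prev = 0
    for r in free_cells:
        check[prev], base[r] = -r, -prev
        prev = r
    check[prev] = 0
    base[0] = -prev


def free_cell(base, check, s):
    """Insert cell s into the circular free list, keeping it ordered."""
    t = -check[0]
    while t != 0 and t < s:
        t = -check[t]
    # t is the first free cell after s, or 0 (the head) if s goes last.
    prev = -base[t]
    check[s] = -t
    base[s] = -prev
    check[prev] = -s
    base[t] = -s


def forward(check):
    cells, s = [], -check[0]
    while s != 0:
        cells.append(s)
        s = -check[s]
    return cells


def backward(base):
    cells, s = [], -base[0]
    while s != 0:
        cells.append(s)
        s = -base[s]
    return cells


base, check = [0] * 10, [1] * 10
build_free_list(base, check, [3, 6])
for cell in (4, 8, 2):
    free_cell(base, check, cell)
print(forward(check), backward(base))
```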

An Implementation

In my implementation, I designed the API with persistent data in mind. Tries can be saved to disk and loaded for use afterward. And in newer versions, non-persistent usage is also possible. You can create a trie in memory, populate it with data, use it, and free it, without any disk I/O. Alternatively, you can load a trie from disk and save it to disk whenever you want.

The trie data is portable across platforms. The byte order on disk is always little-endian, and is read correctly on either little-endian or big-endian systems.

The trie index is a 32-bit signed integer. This allows 2,147,483,646 (2³¹ - 2) total nodes in the trie data, which should be sufficient for most problem domains. And each data entry can store a 32-bit integer value associated with it. This value can be used for any purpose, according to your needs. If you don't need it, just store some dummy value.

For sparse data compactness, the trie alphabet set should be continuous, but that is usually not the case in general character sets. Therefore, a map between the input characters and the low-level alphabet set for the trie is created in the middle. You will have to define your input character set by listing its continuous ranges of character codes in a .abm (alphabet map) file when creating a trie. Then, each character will be automatically assigned an internal code, and the internal codes form a continuous range.
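The idea of the alphabet map can be sketched in Python as follows. This is not the library's actual code or file format: the function name, the representation of ranges as (first, last) tuples, and the choice to start internal codes at 1 are all assumptions for illustration.

```python
def build_alphabet_map(ranges):
    """Map each code point in the given inclusive ranges to a continuous internal code."""
    mapping = {}
    code = 1                               # assumption: internal codes start at 1
    for first, last in sorted(ranges):
        for cp in range(first, last + 1):
            if cp not in mapping:          # tolerate overlapping ranges
                mapping[cp] = code
                code += 1
    return mapping


# Lowercase Latin letters and decimal digits, listed as code point ranges.
alpha = build_alphabet_map([(0x61, 0x7A), (0x30, 0x39)])
print(alpha[ord("0")], alpha[ord("9")], alpha[ord("a")], alpha[ord("z")])
```

With such a map, the sparse vectors in the double-array only need to span 36 positions for this alphabet, instead of the full range of character codes.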

Download

Update: The double-array trie implementation has been simplified and rewritten from scratch in C, and is now named libdatrie. It is now available under the terms of the GNU Lesser General Public License (LGPL):

libdatrie-0.2.9 (3 May 2015)
libdatrie-0.2.8 (10 January 2014)
libdatrie-0.2.7.1 (22 October 2013)
libdatrie-0.2.6 (23 January 2013)
libdatrie-0.2.5 (4 November 2011)
libdatrie-0.2.4 (30 June 2010)
libdatrie-0.2.3 (27 February 2010)
libdatrie-0.2.2 (29 April 2009)
libdatrie-0.2.1 (5 April 2009)
libdatrie-0.2.0 (24 March 2009)
libdatrie-0.1.3 (28 January 2008)
libdatrie-0.1.2 (25 August 2007)
libdatrie-0.1.1 (12 October 2006)
libdatrie-0.1.0 (18 September 2006)

SVN: svn co http://linux.thai.net/svn/software/datrie

The old C++ source code below is under the terms of the GNU Lesser General Public License (LGPL):

midatrie-0.3.3 (2 October 2001)
midatrie-0.3.3 (16 July 2001)
midatrie-0.3.2 (21 May 2001)
midatrie-0.3.1 (8 May 2001)
midatrie-0.3.0 (23 Mar 2001)

Other Implementations

DoubleArrayTrie: Java implementation by Christos Gioran (More information)

References

[Knuth1972] Knuth, D. E. The Art of Computer Programming Vol. 3, Sorting and Searching. Addison-Wesley. 1972.
[Fredkin1960] Fredkin, E. Trie Memory. Communications of the ACM. Vol. 3:9 (Sep 1960). pp. 490-499.
[Cohen1990] Cohen, D. Introduction to Theory of Computing. John Wiley & Sons. 1990.
[Johnson1975] Johnson, S. C. YACC-Yet another compiler-compiler. Bell Lab. NJ. Computing Science Technical Report 32. pp.1-34. 1975.
[Aho+1985] Aho, A. V., Sethi, R., Ullman, J. D. Compilers : Principles, Techniques, and Tools. Addison-Wesley. 1985.
[Aoe1989] Aoe, J. An Efficient Digital Search Algorithm by Using a Double-Array Structure. IEEE Transactions on Software Engineering. Vol. 15, 9 (Sep 1989). pp. 1066-1077.
[Virach+1993] Virach Sornlertlamvanich, Apichit Pittayaratsophon, Kriangchai Chansaenwilai. Thai Dictionary Data Base Manipulation using Multi-indexed Double Array Trie. 5th Annual Conference. National Electronics and Computer Technology Center. Bangkok. 1993. pp 197-206. (in Thai)
