Setting Up a Reading Environment for Large Codebases with OpenGrok


The official wiki gives a brief overview of how to set up OpenGrok; see https://github.com/oracle/opengrok/wiki/How-to-setup-OpenGrok

During my own setup I still ran into a few small problems, so I am writing them down to save others from the same pitfalls.

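This article uses Ubuntu 22.10, the latest release at the time of writing, as the example system. (In my testing, Ubuntu 22.10 also works as an AOSP build environment without any problems.)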


  • Installing Tomcat

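OpenGrok needs Tomcat 10.x, but the apt repository only provides 9.x, so Tomcat has to be installed by hand.
For the manual installation steps and a ready-made script, see https://github.com/lashwang2022/tomcat-installation-ubuntu/blob/main/install-tomcat-ubuntu.sh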

  • Installing and configuring OpenGrok

The command given in the official documentation:

opengrok-deploy -c /opengrok/etc/configuration.xml \
    /opengrok/dist/lib/source.war /var/lib/tomcat8/webapps
  1. Mind the Python version: opengrok-deploy has to run under Python 3. You can create a separate Python 3 virtual environment for it with venv (python3 -m venv).

  2. Replace the Tomcat path (/var/lib/tomcat8/webapps) with the webapps directory of the Tomcat 10 you actually installed.
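The two notes above can be sketched as a small script. It is only a sketch under assumptions: OpenGrok unpacked to /opt/opengrok, Tomcat 10 installed to /opt/tomcat, and a virtualenv at ~/opengrok-venv (all three paths are hypothetical; override them through the environment). The Python tools, including opengrok-deploy, come from dist/tools/opengrok-tools.tar.gz in the OpenGrok release, as described on the official wiki. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Deploy source.war into a manually installed Tomcat 10.
# Assumed locations (override via environment): OpenGrok in /opt/opengrok,
# Tomcat in /opt/tomcat, Python virtualenv in ~/opengrok-venv.
OPENGROK_HOME="${OPENGROK_HOME:-/opt/opengrok}"
TOMCAT_HOME="${TOMCAT_HOME:-/opt/tomcat}"
VENV_DIR="${VENV_DIR:-$HOME/opengrok-venv}"

# Print each command instead of running it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. A Python 3 virtualenv for the OpenGrok management tools.
run python3 -m venv "$VENV_DIR" || exit 1
run "$VENV_DIR/bin/python" -m pip install "$OPENGROK_HOME/dist/tools/opengrok-tools.tar.gz" || exit 1

# 2. Deploy the web app into the real Tomcat 10 webapps directory.
run "$VENV_DIR/bin/opengrok-deploy" -c "$OPENGROK_HOME/etc/configuration.xml" \
    "$OPENGROK_HOME/dist/lib/source.war" "$TOMCAT_HOME/webapps"
```

Check the printed paths first, then run it again with DRY_RUN=0. Calling opengrok-deploy by its absolute path inside the venv avoids depending on whichever python happens to be first in PATH.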

  • Preparing the source code

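Instead of putting the sources under OpenGrok's src directory, you can symlink the trees you want indexed into it. With symlinks, adding or removing a project is also trivial.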
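A minimal sketch of this workflow, assuming the source root is /opt/opengrok/src; link_project and unlink_project are hypothetical helper names, not part of OpenGrok. Placing the links directly under the source root matters: according to the indexer's -N/--symlink help text, only symlinks directly under the source root are followed by default.

```shell
#!/usr/bin/env bash
# Link checkouts into OpenGrok's source root instead of copying them.
# Assumed source root; override with SRC_ROOT=... if yours differs.
SRC_ROOT="${SRC_ROOT:-/opt/opengrok/src}"

# link_project <checkout-dir> [project-name]
# The project name defaults to the checkout's directory name.
link_project() {
    local target name
    target="$(realpath "$1")" || return 1
    name="${2:-$(basename "$target")}"
    # -n: replace an existing link instead of creating a new one inside its target.
    ln -sfn "$target" "$SRC_ROOT/$name"
}

# unlink_project <project-name>: removes only the link, never the sources.
unlink_project() {
    if [ -L "$SRC_ROOT/$1" ]; then
        rm -- "$SRC_ROOT/$1"
    else
        echo "not a symlink: $SRC_ROOT/$1" >&2
        return 1
    fi
}
```

For example, link_project ~/aosp creates /opt/opengrok/src/aosp; when the indexer runs with -P, each such top-level entry becomes its own project. unlink_project aosp removes only the link, so the checkout itself is untouched.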

  • Indexing the source code

The core of OpenGrok indexing is opengrok.jar. Run "java -jar /opengrok/dist/lib/opengrok.jar -h" to list the options it supports:


simon@simon-ubuntu-server:~$ java -jar /opt/opengrok/dist/lib/opengrok.jar -h
Jan 19, 2023 5:05:40 PM org.opengrok.indexer.index.Indexer parseOptions
INFO: Indexer options: [-h]

Usage: java -jar opengrok.jar [options] [subDir1 [...]]

  -h, -?, --help [mode]
        With no mode specified, display this usage summary. Or specify a mode:
          config - display configuration.xml examples.
           ctags - display ctags command-line.
            guru - display AnalyzerGuru details.
           repos - display enabled repositories.

  --annotationCache on|off
        Annotation cache provides speedup when getting annotation
        for files in the webapp at the cost of significantly increased
        indexing time (multiple times slower) and slightly increased
        disk space (comparable to history cache size).
        Can be enabled per project.

  --apiTimeout number
        Set timeout for asynchronous API requests.

  --connectTimeout number
        Set connect timeout. Used for API requests.

  -A, --analyzer (.ext|prefix.):(-|analyzer)
        Associates files with the specified prefix or extension (case-
        insensitive) to be analyzed with the given analyzer, where 'analyzer'
        may be specified using a class name (case-sensitive e.g. RubyAnalyzer)
        or analyzer language name (case-sensitive e.g. C). Option may be
        repeated.
          Ex: -A .foo:CAnalyzer
              will use the C analyzer for all files ending with .FOO
          Ex: -A bar.:Perl
              will use the Perl analyzer for all files starting with
              "BAR" (no full-stop)
          Ex: -A .c:-
              will disable specialized analyzers for all files ending with .c

  -c, --ctags /path/to/ctags
        Path to Universal Ctags. Default is ctags in environment PATH.

  --canonicalRoot /path/
        Allow symlinks to canonical targets starting with the specified root
        without otherwise needing to specify -N,--symlink for such symlinks. A
        canonical root must end with a file separator. For security, a canonical
        root cannot be the root directory. Option may be repeated.

  --checkIndex
        Check index, exit with 0 on success,
        with 1 on failure.

  -d, --dataRoot /path/to/data/root
        The directory where OpenGrok stores the generated data.

  --depth number
        Scanning depth for repositories in directory structure relative to
        source root. Default is 3.

  --disableRepository type_name
        Disables operation of an OpenGrok-supported repository. See also
        -h,--help repos. Option may be repeated.
          Ex: --disableRepository git
              will disable the GitRepository
          Ex: --disableRepository MercurialRepository

  -e, --economical
        To consume less disk space, OpenGrok will not generate and save
        hypertext cross-reference files but will generate on demand, which could
        be slightly slow.

  -G, --assignTags
        Assign commit tags to all entries in history for all repositories.

  -H
        Enable history.

  --historyBased on|off
        If history based reindex is in effect, the set of files
        changed/deleted since the last reindex is determined from history
        of the repositories. This needs history, history cache and
        projects to be enabled. This should be much faster than the
        classic way of traversing the directory structure.
        The default is on. If you need to e.g. index files untracked by
        SCM, set this to off. Currently works only for Git.
        All repositories in a project need to support this in order
        to be indexed using history.

  --historyThreads number
        The number of threads to use for history cache generation on repository level.
        By default the number of threads will be set to the number of available CPUs.
        Assumes -H/--history.

  --historyFileThreads number
        The number of threads to use for history cache generation
        when dealing with individual files.
        By default the number of threads will be set to the number of available CPUs.
        Assumes -H/--history.

  -I, --include pattern
        Only files matching this pattern will be examined. Pattern supports
        wildcards (example: -I '*.java' -I '*.c'). Option may be repeated.

  -i, --ignore pattern
        Ignore matching files (prefixed with 'f:' or no prefix) or directories
        (prefixed with 'd:'). Pattern supports wildcards (example: -i '*.so'
        -i d:'test*'). Option may be repeated.

  -l, --lock on|off|simple|native
        Set OpenGrok/Lucene locking mode of the Lucene database during index
        generation. "on" is an alias for "simple". Default is off.

  --leadingWildCards on|off
        Allow or disallow leading wildcards in a search. Default is on.

  -m, --memory number
        Amount of memory (MB) that may be used for buffering added documents and
        deletions before they are flushed to the directory (default 16.0).
        Please increase JVM heap accordingly too.

  --mandoc /path/to/mandoc
        Path to mandoc(1) binary.

  -N, --symlink /path/to/symlink
        Allow the symlink to be followed. Other symlinks targeting the same
        canonical target or canonical children will be allowed too. Option may
        be repeated. (By default only symlinks directly under the source root
        directory are allowed. See also --canonicalRoot)

  -n, --noIndex
        Do not generate indexes and other data (such as history cache and xref
        files), but process all other command line options.

  --nestingMaximum number
        Maximum depth of nested repositories. Default is 1.

  --reduceSegmentCount
        Reduce the number of segments in each index database to 1. This might
        (or might not) bring some improved performance. Anyhow, this operation
        takes non-trivial time to complete.

  -o, --ctagOpts path
        File with extra command line options for ctags.

  -P, --projects
        Generate a project for each top-level directory in source root.

  -p, --defaultProject path/to/default/project
        Path (relative to the source root) to a project that should be selected
        by default in the web application (when no other project is set either
        in a cookie or in parameter). Option may be repeated to specify several
        projects. Use the special value __all__ to indicate all projects.

  --profiler
        Pause to await profiler or debugger.

  --progress
        Print per-project percentage progress information.

  -Q, --quickScan on|off
        Turn on/off quick context scan. By default, only the first 1024KB of a
        file is scanned, and a link ('[..all..]') is inserted when the file is
        bigger. Activating this may slow the server down. (Note: this setting
        only affects the web application.) Default is on.

  -q, --quiet
        Run as quietly as possible. Sets logging level to WARNING.

  -R /path/to/configuration
        Read configuration from the specified file.

  -r, --remote on|off|uionly|dirbased
        Specify support for remote SCM systems.
              on - allow retrieval for remote SCM systems.
             off - ignore SCM for remote systems.
          uionly - support remote SCM for user interface only.
        dirbased - allow retrieval during history index only for repositories
                   which allow getting history for directories.

  --renamedHistory on|off
        Enable or disable generating history for renamed files.
        If set to on, makes history indexing slower for repositories
        with lots of renamed files. Default is off.

  --repository [path/to/repository|@file_with_paths]
        Path (relative to the source root) to a repository for generating
        history (if -H,--history is on). By default all discovered repositories
        are history-eligible; using --repository limits to only those specified.
        File containing paths can be specified via @path syntax.
        Option may be repeated.

  -S, --search [path/to/repository|@file_with_paths]
        Search for source repositories under source root (-s,--source),
        and add them. Path (relative to the source root) is optional.
        File containing the paths can be specified via @path syntax.
        Option may be repeated.

  -s, --source /path/to/source/root
        The root directory of the source tree.

  --style path
        Path to the subdirectory in the web application containing the requested
        stylesheet. The factory-setting is: "default".

  -T, --threads number
        The number of threads to use for index generation, repository scan
        and repository invalidation.
        By default the number of threads will be set to the number of available
        CPUs. This influences the number of spawned ctags processes as well.

  -t, --tabSize number
        Default tab size to use (number of spaces per tab character).

  --token string|@file_with_string
        Authorization bearer API token to use when making API calls
        to the web application

  -U, --uri SCHEME://webappURI:port/contextPath
        Send the current configuration to the specified web application.

  --updateConfig
        Populate the web application with a bare configuration, and exit.

  --userPage URL
        Base URL of the user Information provider.
        Example: "https://www.example.org/viewProfile.jspa?username=".
        Use "none" to disable link.

  --userPageSuffix URL-suffix
        URL Suffix for the user Information provider. Default: "".

  -V, --version
        Print version, and quit.

  -v, --verbose
        Set logging level to INFO.

  -W, --writeConfig /path/to/configuration
        Write the current configuration to the specified file (so that the web
        application can use the same configuration).

  --webappCtags on|off
        Web application should run ctags when necessary. Default is off.
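Putting the main options from this list together, a first full index run might look like the sketch below. Assumptions, all adjustable: OpenGrok in /opt/opengrok (as on the machine above), Universal Ctags at /usr/bin/ctags, the web app deployed as source.war and therefore reachable at http://localhost:8080/source, and an 8 GB heap (-Xmx8g), which is a guess for a tree the size of AOSP rather than a measured value. The command is printed unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Full index of everything under the source root, then push the new
# configuration to the running web app. Assumed paths/URL; override via environment.
OPENGROK_HOME="${OPENGROK_HOME:-/opt/opengrok}"
CTAGS="${CTAGS:-/usr/bin/ctags}"            # must be Universal Ctags
WEBAPP_URL="${WEBAPP_URL:-http://localhost:8080/source}"

index_cmd=(
    java -Xmx8g                                 # heap size: assumption, tune for your RAM
    -jar "$OPENGROK_HOME/dist/lib/opengrok.jar"
    -c "$CTAGS"                                 # Universal Ctags binary
    -s "$OPENGROK_HOME/src"                     # source root (holds the symlinks)
    -d "$OPENGROK_HOME/data"                    # index and history cache go here
    -H -P -S                                    # history, one project per top dir, find repos
    --progress                                  # per-project progress output
    -W "$OPENGROK_HOME/etc/configuration.xml"   # write the config for the web app
    -U "$WEBAPP_URL"                            # and send it to the web app
)

if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%q ' "${index_cmd[@]}"; echo    # show the command, quoted
else
    "${index_cmd[@]}"
fi
```

For AOSP, also add the -i patterns from the ignore list further down (for example -i d:out) to the array before the real run; otherwise build output and prebuilts get indexed too.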

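  • Ignoring files and directories
    Syncing and indexing AOSP once takes several hours, so it pays to tell the indexer to skip directories you do not need, such as out and toolchain. Below are the directories I ignore; each line is a pattern for the indexer's -i option.
    To ignore files instead of directories, change the d: prefix to f:.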
d:out
d:prebuilts
d:cts
d:platform_testing
d:autotest
d:*old_codebase*
d:toolchain
d:rockdev
d:pdk
d:.repo
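One way to pass such a list to the indexer is sketched below: keep it as plain text, one pattern per line, and expand each line into a separate -i option (the help text says the option may be repeated). IGNORE_PATTERNS and ignore_args are illustrative names. The patterns stay quoted throughout, so the shell never glob-expands d:*old_codebase*.

```shell
#!/usr/bin/env bash
# The ignore list above, one pattern per line.
# "d:" = directory, "f:" (or no prefix) = file; wildcards are allowed.
IGNORE_PATTERNS='d:out
d:prebuilts
d:cts
d:platform_testing
d:autotest
d:*old_codebase*
d:toolchain
d:rockdev
d:pdk
d:.repo'

# Turn every non-empty, non-comment line into "-i <pattern>".
ignore_args=()
while IFS= read -r pattern; do
    case "$pattern" in
        ''|'#'*) continue ;;
    esac
    ignore_args+=(-i "$pattern")
done <<< "$IGNORE_PATTERNS"

# Show the resulting options, quoted exactly as the indexer will receive them.
printf '%q ' "${ignore_args[@]}"; echo
```

Append the array to the indexer command line together with your other options, e.g. java -jar /opt/opengrok/dist/lib/opengrok.jar -s /opt/opengrok/src -d /opt/opengrok/data "${ignore_args[@]}".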
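  • Updating the sources and the index periodically
    Add the indexing command to crontab as a scheduled job so that the index is refreshed automatically.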
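A sketch of such a job, with hypothetical paths: the AOSP checkout in ~/aosp, OpenGrok in /opt/opengrok, the web app at http://localhost:8080/source, and the script itself saved as /opt/opengrok/bin/reindex.sh. repo sync -c fetches only the current branch, which shortens the nightly sync, and the indexer updates an existing index incrementally, so later runs only reprocess changed files. Commands are printed unless DRY_RUN=0, which the example crontab line sets.

```shell
#!/usr/bin/env bash
# reindex.sh (hypothetical): sync AOSP, then update the OpenGrok index.
# Example crontab entry (crontab -e), every day at 03:00:
#   0 3 * * * DRY_RUN=0 /opt/opengrok/bin/reindex.sh >> /tmp/opengrok-reindex.log 2>&1
AOSP_DIR="${AOSP_DIR:-$HOME/aosp}"
OPENGROK_HOME="${OPENGROK_HOME:-/opt/opengrok}"
WEBAPP_URL="${WEBAPP_URL:-http://localhost:8080/source}"
# cron starts jobs with a minimal PATH; assumption: repo lives in ~/bin.
export PATH="$HOME/bin:/usr/local/bin:$PATH"

# Print each command instead of running it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Update the sources; repo has to be run from inside the checkout.
run cd "$AOSP_DIR" || exit 1
run repo sync -c -j8 || exit 1

# 2. Incremental re-index, then push the configuration to the web app.
run java -jar "$OPENGROK_HOME/dist/lib/opengrok.jar" \
    -s "$OPENGROK_HOME/src" -d "$OPENGROK_HOME/data" -H -P -S \
    -W "$OPENGROK_HOME/etc/configuration.xml" -U "$WEBAPP_URL"
```

Add the same -i options you use for the initial index to the java line. If a sync can take longer than the interval between runs, pick a less frequent schedule so two jobs never overlap.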

  • Storage space for the sources
    A single AOSP project alone takes up several hundred GB.


Follow my WeChat official account "虎哥 LoveDroid" to get my original technical articles as soon as they are published.
