Options to add additional file types to index #3958

Shooter3k · 2019-01-18T18:50:53Z

Is there any way to add additional file types to the index, such as .docx or .xlsx? If not, how does the indexer convert files into an indexable type, and where would we need to focus to write our own code to add these file types ourselves?

Current list of supported formats:
https://github.com/oracle/opengrok/wiki/Supported-Languages-and-Formats

vladak · 2019-01-18T22:28:30Z

Maintainer

Basically, you need to add an analyzer (see https://github.com/oracle/opengrok/wiki/Internals#analysis) under opengrok-indexer/src/main/resources/analysis. If Universal ctags does not support a given file type, it might be necessary to add basic support for definitions to opengrok-indexer/src/main/java/org/opengrok/indexer/analysis/Ctags.java.

If you look into the history of the OpenGrok repository, there are quite a few examples of new analyzers being added. I guess we should put a basic how-to on a new wiki page.

vladak · 2019-01-18T22:29:39Z

Maintainer

Also, I think we should replace https://github.com/oracle/opengrok/wiki/Supported-Languages-and-Formats with a command-line option of the Indexer.

vladak · 2019-01-23T12:15:52Z

Maintainer

#492 tracks something similar. Possibly @tarzanek can push the changes soon?

vladak · 2019-01-23T12:19:40Z

Maintainer

As for the question of how the analyzer converts the files: it does not. Each analyzer dissects the files and grabs the terms that are useful. For formats that are not plain text, the analyzer needs to know the format.
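
To make "dissect the file and grab the useful terms" concrete, here is a minimal sketch for .docx. It deliberately does not use any OpenGrok class: `DocxTermSketch`, `extractTerms` and `termsFromXml` are hypothetical names invented for illustration. The only format knowledge it assumes is that a .docx file is a ZIP archive whose main text lives in the `word/document.xml` entry. Tag stripping with a regular expression is crude (XML entities are not decoded, for instance), so a real analyzer would want a proper parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper for illustration only; not part of OpenGrok. */
public class DocxTermSketch {

    /** Finds word/document.xml inside the .docx (a ZIP archive) and returns its terms. */
    public static List<String> extractTerms(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    return termsFromXml(xml);
                }
            }
        }
        return new ArrayList<>(); // no main document part found, nothing to index
    }

    /** Crude term extraction: drop the markup, split on non-alphanumerics, lower-case. */
    public static List<String> termsFromXml(String xml) {
        String text = xml.replaceAll("<[^>]*>", " ");
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocxTermSketch <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println(extractTerms(in));
        }
    }
}
```

An actual analyzer would feed such terms into the index instead of returning a list, but the division of labor is the same: the format-specific part only has to locate the text, and tokenizing is generic.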

tarzanek · 2019-02-10T10:04:04Z

I see this will be easily doable once #2588 is merged.

tarzanek · 2019-02-10T10:05:27Z

Also, on doc/docx: I have long had a Tika library bundle connected to the analyzers. Let me publish the PDF analyzer (#492); adding a doc one would then be easy.

Ymoise · 2020-05-11T11:02:13Z

Is there anywhere I could find a how-to manual for adding a new analyzer?

The explanation that describes it like an office is really great for understanding it conceptually, but I still have no idea how to actually go about adding a new analyzer. For example, where is the Ctags directory? Are the analyzers in that directory, too? If so, I could look at them, see whether any of them extends an existing one, and get an idea of what I need to do.

vladak · 2020-05-11T14:17:53Z

Maintainer

@idodeclare should have the up-to-date info on how to add one. Time to recast it into documentation.

Javadoc is published at https://oracle.github.io/opengrok/javadoc/ so you can get some idea about the analyzer classes. The history of the OpenGrok repository can also be perused; see e.g. 4f9cbae.

Ymoise · 2020-05-11T15:03:44Z

Thank you :)

idodeclare · 2020-05-12T03:17:01Z

@vladak, let's not gloss over Oracle's lockdown of the OpenGrok wiki, which prevents nearly everyone (in general terms) from contributing documentation, nor the fact that the alternative PR model for the wiki is wretched.

You're right that the best option at the moment is to look at the history. A slightly better commit for Verilog is the merge commit 8ee2396, which shows the xref and tokenizer tests that are crucial for a new analyzer. (Also an argument for merge commits for PRs.)
