Options to add additional file types to index #3958
Replies: 10 comments
-
Basically, you need to add an analyzer (see https://github.com/oracle/opengrok/wiki/Internals#analysis). If you look into the history of the OpenGrok repository, there are quite a few examples of new analyzers being added. I guess we should put a basic how-to into a new wiki page.
-
Also, I think we should replace https://github.com/oracle/opengrok/wiki/Supported-Languages-and-Formats with a command-line option of the Indexer.
-
#492 tracks something similar. Possibly @tarzanek can push the changes soon?
-
As for the question of how an analyzer converts the files: it does not. Each analyzer dissects the files and extracts the terms that are useful. For formats that are not plain text, the analyzer needs to know the format.
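To make "knowing the format" concrete, here is a minimal sketch for the .docx case from the original question. It relies only on the fact that a .docx file is a ZIP archive whose main text lives in `<w:t>` elements of `word/document.xml`. The class `DocxTermSketch` and its methods are hypothetical names for illustration; this is not an OpenGrok analyzer and does not use the OpenGrok or Lucene APIs. A real analyzer would also need a proper XML parser, since this regex-based version does not decode entities such as `&amp;` and treats a word split across two text runs as two terms.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical illustration only; not part of OpenGrok. */
public class DocxTermSketch {

    // Text runs in WordprocessingML are <w:t> elements, optionally with attributes.
    // The pattern deliberately does not match <w:tab/> or <w:tbl>.
    private static final Pattern TEXT_RUN =
            Pattern.compile("<w:t(?:\\s[^>]*)?>([^<]*)</w:t>");

    // A "term" here is simply a run of letters, digits, or underscores.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}_]+");

    /** Reads word/document.xml from a .docx stream and returns lower-cased terms. */
    public static List<String> extractTerms(InputStream docx) throws IOException {
        List<String> terms = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue; // styles, media, etc. carry no body text
                }
                // readAllBytes() stops at the end of the current ZIP entry.
                String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                Matcher run = TEXT_RUN.matcher(xml);
                while (run.find()) {
                    Matcher word = WORD.matcher(run.group(1));
                    while (word.find()) {
                        terms.add(word.group().toLowerCase(Locale.ROOT));
                    }
                }
            }
        }
        return terms;
    }

    /** Builds a tiny in-memory archive that mimics the relevant part of a .docx. */
    public static byte[] sampleDocx(String documentXml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(documentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<w:document><w:body><w:p>"
                + "<w:r><w:t>Quarterly Report</w:t></w:r>"
                + "<w:r><w:tab/><w:t xml:space=\"preserve\"> draft 2</w:t></w:r>"
                + "</w:p></w:body></w:document>";
        // Prints [quarterly, report, draft, 2]
        System.out.println(extractTerms(new ByteArrayInputStream(sampleDocx(xml))));
    }
}
```

Nothing is converted into another file type here: the code walks the container, picks out the text, and hands back terms, which is the part an indexer cares about.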
-
I see this will be easily doable once #2588 is merged.
-
Also, on doc / docx: I have long had a Tika library bundle connected to the analyzers. Let me publish the PDF analyzer (#492) first; adding a doc one would then be easy.
-
Is there anywhere I could find a how-to manual for adding a new analyzer? The explanation that describes it like an office is really great for understanding it conceptually, but I still have no idea how to actually go about adding a new analyzer. For example, where is the Ctags directory? Are the analyzers in that directory, too? If so, I could take a look at them, see whether any of them extends an existing one, and get an idea of what I need to do.
-
@idodeclare should have the up-to-date info on how to add one. Time to recast it into documentation. There is Javadoc generated at https://oracle.github.io/opengrok/javadoc/, so you can get some idea about the analyzer classes. The history of the OpenGrok repository can also be perused; see e.g. 4f9cbae.
-
Thank you :)
-
@vladak, let's not gloss over Oracle's lockdown of the OpenGrok wiki, which prevents nearly everyone (in general terms) from contributing documentation, nor that the alternative PR model for the wiki is wretched. You're right that the best option at the moment is to look at the history. A slightly better commit for Verilog is the merge commit 8ee2396, which shows the xref and tokenizer tests that are crucial for a new analyzer. (Also an argument for merge commits for PRs.)
-
Is there any way to add additional file types to index such as .docx or .xlsx? If not, how does the indexer convert files into an indexable type and where would we need to focus to write our own code to add these file types ourselves?
Current list of supported formats:
https://github.com/oracle/opengrok/wiki/Supported-Languages-and-Formats