Howdy...

I need to write a class that will take a Java file as input, strip all the comments out, and save the result in a different file.

Can I use StreamTokenizer to do that? I can't really understand how it works... Help, anyone?

I guess I could also write the code myself, but how would I handle code like this:

1: // this is a one line comment
2: System.out.println ("//");

In the example above, the first line would be removed... What's the best way to know when "//" is not part of a comment? For that matter, the same with "/*".

Any help is welcome. Tks.
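On the StreamTokenizer part of the question, here is a minimal sketch of what java.io.StreamTokenizer does with the two example lines. The class name TokenizerDemo and the method tokens are made up for illustration. It assumes the default syntax table, where a lone '/' already starts a comment, so '/' is reset to an ordinary character before the two Java-style comment flags are switched on.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists the tokens StreamTokenizer reports for a source string.
final class TokenizerDemo {

    static List<String> tokens(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.ordinaryChar('/');         // by default a single '/' is a comment character
        st.slashSlashComments(true);  // skip // comments
        st.slashStarComments(true);   // skip /* ... */ comments
        List<String> result = new ArrayList<String>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                    case '"':
                    case '\'':
                        result.add(st.sval);               // word or quoted-literal body
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        result.add(String.valueOf(st.nval));
                        break;
                    default:
                        result.add(String.valueOf((char) st.ttype)); // single ordinary char
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String src = "// this is a one line comment\nSystem.out.println (\"//\");";
        System.out.println(tokens(src));
    }
}
```

Both comments are skipped and the "//" inside the string literal comes back as a quoted token, so the tokenizer does tell the two cases apart. What it returns is a token stream, though: whitespace, the original quote escapes and the exact number spelling are gone, so it is a poor fit for writing the file back out unchanged.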
silviocortes (4) | 10/30/2007 3:22:09 AM

silviocortes@yahoo.com wrote:
> Howdy...
>
> I need to write a class that will take a java file as input, strip all
> the comments out, and save thre result in a different file....
>
> Can I use that StreamTokenizer to do that? I can't really understand
> how that work... Help, anyone?
>
> I guess I could also write code myself, but how would I handle a code
> like that:
>
> 1: // this is a one line comment
> 2: System.out.println ("//");
>
> In the example below, the first line would be removed... What's the
> best way to know when "//" is not part of a comment. For that matter,
> the same with "/*"
>
> Any help is welcome. Tks.

It's actually not that easy of a problem, but there is hope! You can probably find a Java source parser out there somewhere by using a little website I call Google. Check it out at Google.com.

--
Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Daniel | 10/30/2007 4:53:19 AM

On Mon, 29 Oct 2007 20:22:09 -0700, silviocortes@yahoo.com wrote:
> I need to write a class that will take a java file as input, strip
> all the comments out, and save thre result in a different file....

Run the code through a C preprocessor.

/gordon
--
Gordon | 10/30/2007 7:36:07 AM

silviocortes@yahoo.com wrote:
> I need to write a class that will take a java file as input, strip all
> the comments out, and save thre result in a different file....

Why?
Esmond | 10/30/2007 9:31:00 AM

silviocortes@yahoo.com wrote:
> I need to write a class that will take a java file as input, strip all
> the comments out, and save thre result in a different file....

Have your class call javac.

--
Lew
Lew | 10/30/2007 1:20:35 PM

On Mon, 29 Oct 2007 20:22:09 -0700, silviocortes@yahoo.com wrote,
quoted or indirectly quoted someone who said :
>Can I use that StreamTokenizer to do that? I can't really understand
>how that work... Help, anyone?
You would do it with a little finite state machine, or a parser.
See http://mindprod.com/jgloss/finitestate.html
http://mindprod.com/jgloss/parser.html
You can see an example of such a parser as part of
http://mindprod.com/products1.html#JDISPLAY
see com.mindprod.jprep.JavaTokenizer
You can strip out all the code except that which deals with comments,
by collapsing other states (each implemented with an enum constant)
into one.
You could simply search for all // with indexOf and rip out till nl or
all /* and rip out till */
However make sure you handle // embedded in /* ... */
and /* embedded in //.
You have to scan for next /* or // whichever comes first, then process
that.
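A minimal sketch of such a finite state machine, assuming plain Java source without backslash-u escapes in delimiters and without text blocks. The class name CommentFsm and its state names are invented for this example and are not taken from JavaTokenizer. String and character literals get their own state so that "//" or "/*" inside them is copied through instead of being treated as a comment.

```java
// Hypothetical sketch: a character-level state machine that drops Java comments.
final class CommentFsm {

    private enum State { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR, LITERAL, LITERAL_ESCAPE }

    // Returns src with line and block comments removed; literals are copied untouched.
    static String strip(String src) {
        StringBuilder out = new StringBuilder(src.length());
        State state = State.CODE;
        char quote = 0; // delimiter that opened the current literal
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            switch (state) {
                case CODE:
                    if (c == '/') {
                        state = State.SLASH; // hold the slash back until the next char is known
                    } else {
                        if (c == '"' || c == '\'') { quote = c; state = State.LITERAL; }
                        out.append(c);
                    }
                    break;
                case SLASH:
                    if (c == '/') {
                        state = State.LINE_COMMENT;
                    } else if (c == '*') {
                        state = State.BLOCK_COMMENT;
                    } else {
                        out.append('/');     // it was a plain division sign
                        state = State.CODE;
                        i--;                 // re-scan c in the CODE state
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n' || c == '\r') { out.append(c); state = State.CODE; }
                    break;
                case BLOCK_COMMENT:
                case BLOCK_STAR:
                    if (c == '/' && state == State.BLOCK_STAR) {
                        out.append(' ');     // a comment separates tokens, so leave one space
                        state = State.CODE;
                    } else {
                        if (c == '\n' || c == '\r') out.append(c); // keep line numbers stable
                        state = (c == '*') ? State.BLOCK_STAR : State.BLOCK_COMMENT;
                    }
                    break;
                case LITERAL:
                    out.append(c);
                    if (c == '\\') state = State.LITERAL_ESCAPE;
                    else if (c == quote) state = State.CODE;
                    break;
                case LITERAL_ESCAPE:
                    out.append(c);           // an escaped char can never close the literal
                    state = State.LITERAL;
                    break;
            }
        }
        if (state == State.SLASH) out.append('/'); // input ended right after a lone slash
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(strip("// this is a one line comment\nSystem.out.println (\"//\");\n"));
    }
}
```

Scanning one character at a time, rather than jumping ahead with indexOf, is what makes the "whichever comes first" rule fall out naturally: whichever delimiter is reached first decides the state, and the other one is then just ordinary text inside it.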
Then there is the simplest solution of all. Google for "strip Java
comments" and see what comes up.
--
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Roedy | 10/30/2007 2:19:58 PM

silviocortes@yahoo.com burped up warm pablum in news:1193714529.702225.302810@o38g2000hse.googlegroups.com:
> Howdy...
>
> I need to write a class that will take a java file as input, strip all
> the comments out, and save thre result in a different file....
>
> Can I use that StreamTokenizer to do that? I can't really understand
> how that work... Help, anyone?
>
> I guess I could also write code myself, but how would I handle a code
> like that:
>
> 1: // this is a one line comment
> 2: System.out.println ("//");
>
> In the example below, the first line would be removed... What's the
> best way to know when "//" is not part of a comment. For that matter,
> the same with "/*"

All you need is a lexer (lex) to pick up tokens--no parsing (yacc or bison) required. I have a version in C and lex for MS-DOS which was slapped together in 1990. You can find it at http://sourceforge.net/projects/cshroud . Comments are naturally disposed of since that is half the job of shrouding.

--
Tris Orendorff
[ Anyone naming their child should spend a few minutes checking rhyming slang and dodgy sounding names. Brad and Angelina failed to do this when naming their kid Shiloh Pitt. At some point, someone at school is going to spoonerise her name.
Craig Stark]
Tris | 10/30/2007 7:38:13 PM

<silviocortes@yahoo.com> wrote:
>I need to write a class that will take a java file as input, strip all
>the comments out, and save thre result in a different file....

This is harder than you think. Use a real parser.

>1: // this is a one line comment
>2: System.out.println ("//");

Here are some more test cases for you:

    public class Comment {
        public static void main(String[] args) {
            String note = "// 1 "; // this is a comment
            System.out.println(note);
            /* // comment */ note = "2";
            System.out.println(note);
            char ch = '"'; // code = "3 if broken"
            System.out.println(note);
            note=\u0022 // 4";
            System.out.println(note);
        }
    }

The output should be

    // 1
    2
    2
    // 4

--
Mark Rafn dagon@dagon.net <http://www.dagon.net/>
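The last test case works because the compiler translates Unicode escapes before it looks for tokens, so \u0022 opens a string literal just as a typed double quote would. A rough sketch of that pre-pass follows, under stated assumptions: the class name UnicodeEscapes is made up, malformed escapes are copied through unchanged instead of being reported as errors, and a backslash only starts an escape when an even number of backslashes comes directly before it.

```java
// Hypothetical sketch: translate backslash-u escapes before any comment scanning.
final class UnicodeEscapes {

    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int backslashes = 0; // run of backslashes directly before position i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            boolean eligible = c == '\\' && backslashes % 2 == 0
                    && i + 1 < src.length() && src.charAt(i + 1) == 'u';
            if (eligible) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') j++; // any number of u's is allowed
                boolean ok = j + 4 <= src.length();
                int value = 0;
                for (int k = j; ok && k < j + 4; k++) {
                    int digit = Character.digit(src.charAt(k), 16);
                    if (digit < 0) ok = false; else value = value * 16 + digit;
                }
                if (ok) {
                    out.append((char) value); // the result takes no part in further escapes
                    i = j + 4;
                    backslashes = 0;
                    continue;
                }
            }
            backslashes = (c == '\\') ? backslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("note=\\u0022 // 4\";"));
    }
}
```

A stripper that wants to keep the original spelling of the source, as opposed to this lossy translation, has to remember the raw text next to each decoded character.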
dagon | 10/31/2007 12:11:59 AM

silviocortes@yahoo.com wrote:
> I need to write a class that will take a java file as input, strip all
> the comments out, and save thre result in a different file....

Assuming the use of correct Java sources as an input, the code below should do the trick. (Warning: not tested intensively!)

Note that it tries to preserve as much of the original code as possible. That is, the line numbers, positions, and escape sequences of the code in output should be the same as in input (that may help in debugging).

piotr

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

public class CommentStripper {

    public static void main(String[] args) throws Exception {
        InputStream in = new BufferedInputStream(
                new FileInputStream("CommentStripper.java"));
        Reader source = new InputStreamReader(in);
        PrintWriter out = new PrintWriter(System.out, true);
        stripComments(source, out);
    }

    public static void stripComments(
            Reader source, PrintWriter out) throws IOException {
        SourceReader reader = new SourceReader(source);
        StringBuilder outbf = new StringBuilder();
        boolean inComment = false;
        for(Char next; (next = reader.next()) != Char.EOF;) {
            int commentCharsInLine = 0;
            for(Char sc; !(sc = next).isEOL();) {
                next = reader.next();
                if (inComment) {
                    if (sc.codePoint == '*' && next.codePoint == '/') {
                        // end of comment
                        // read next
                        next = reader.next();
                        if (!next.isEOL()) {
                            // write out spaces
                            int ix = outbf.length();
                            outbf.setLength(ix + commentCharsInLine + 2);
                            for(final int len = outbf.length(); ix < len; ++ix) {
                                outbf.setCharAt(ix, ' ');
                            }
                        }
                        commentCharsInLine = 0;
                        inComment = false;
                    } else {
                        commentCharsInLine++;
                    }
                } else if (sc.codePoint == '/' && next.codePoint == '*') {
                    // start of multiline comment
                    inComment = true;
                    commentCharsInLine = 2;
                    // read next
                    next = reader.next();
                } else if (sc.codePoint == '/' && next.codePoint == '/') {
                    // single line comment
                    // skip to the end of line
                    while(!next.isEOL()) {
                        next = reader.next();
                    }
                } else if (sc.codePoint == '"' || sc.codePoint == '\'' ) {
                    // text literal...
                    sc.appendSource(outbf);
                    // lookup end of literal (should be in the same line)
                    boolean literalEndFound = false;
                    for(; !next.isEOL(); next = reader.next()) {
                        next.appendSource(outbf);
                        if (next.codePoint == '\\') {
                            // read & write next
                            next = reader.next();
                            if (!next.isEOL()) {
                                next.appendSource(outbf);
                            }
                            continue;
                        }
                        if (literalEndFound = next.codePoint == sc.codePoint) {
                            // read next
                            next = reader.next();
                            break;
                        }
                    }
                    if (!literalEndFound) {
                        // syntax error in input...
                        throw new IOException("End of text literal not found");
                    }
                } else {
                    // write out source "as is"
                    sc.appendSource(outbf);
                }
            }
            // flush buffered line
            String outLine = outbf.toString();
            if (outLine.trim().length() == 0) {
                out.println();
            } else {
                out.println(outLine);
            }
            outbf.setLength(0);
        }
    }

    private static abstract class Char {
        final int codePoint;

        Char(int codePoint) {
            this.codePoint = codePoint;
        }

        boolean isEOL() {
            return codePoint == '\n';
        }

        abstract void appendSource(StringBuilder sb);

        static final Char EOF = new Char(-1) {
            @Override
            public void appendSource(StringBuilder sb) {
                // write nothing
            }
            @Override
            boolean isEOL() {
                return true;
            }
        };

        static Char newInstance(final InputChar c) {
            return new Char(c.value) {
                @Override
                void appendSource(StringBuilder sb) {
                    c.appendSource(sb);
                }
            };
        }

        static Char newInstance(int codePoint, final InputChar c) {
            return new Char(codePoint) {
                @Override
                void appendSource(StringBuilder sb) {
                    c.appendSource(sb);
                }
            };
        }

        static Char newInstance(int codePoint, final InputChar... chars) {
            return new Char(codePoint) {
                @Override
                void appendSource(StringBuilder sb) {
                    for(InputChar c : chars) {
                        c.appendSource(sb);
                    }
                }
            };
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            appendSource(sb);
            return "[" + codePoint + "]=" + sb.toString();
        }
    }

    private static abstract class InputChar {
        final int value;

        static final InputChar EOF = new InputChar(-1) {
            @Override
            void appendSource(StringBuilder sb) {
                // write nothing
            }
        };

        InputChar(int value) {
            this.value = value;
        }

        abstract void appendSource(StringBuilder sb);

        static InputChar newCharInstance(int value) {
            return new InputChar(value) {
                @Override
                void appendSource(StringBuilder sb) {
                    sb.append((char)value);
                }
            };
        }

        static InputChar newEscapeSequenceInstance(int value, final CharSequence seq) {
            return new InputChar(value) {
                @Override
                void appendSource(StringBuilder sb) {
                    sb.append(seq);
                }
            };
        }
    }

    private static class SourceReader {
        private Reader in;

        SourceReader(Reader in) {
            this.in = in;
        }

        private Deque<InputChar> inputChars = new ArrayDeque<InputChar>();

        Char next() throws IOException {
            InputChar nc = nextInputChar();
            if (nc == InputChar.EOF) {
                return Char.EOF;
            }
            InputChar fc = nextInputChar();
            if (nc.value == '\r' && fc.value == '\n') {
                return Char.newInstance('\n', nc, fc);
            }
            if (nc.value == '\r' || nc.value == '\n') {
                unread(fc);
                return Char.newInstance('\n', nc);
            }
            if (Character.isSurrogatePair((char)nc.value, (char)fc.value)) {
                return Char.newInstance(
                        Character.toCodePoint((char)nc.value, (char)fc.value),
                        nc, fc);
            }
            unread(fc);
            return Char.newInstance(nc);
        }

        private void unread(InputChar c) {
            if (inputChars == null) {
                if (c != InputChar.EOF) {
                    inputChars = new ArrayDeque<InputChar>();
                } else {
                    return;
                }
            }
            inputChars.addFirst(c);
        }

        private InputChar nextInputChar() throws IOException {
            if (inputChars == null) {
                return InputChar.EOF;
            }
            if (!inputChars.isEmpty()) {
                return inputChars.removeFirst();
            }
            int r0 = in.read();
            if (r0 == -1) {
                inputChars = null;
                return InputChar.EOF;
            }
            if (r0 == '\\') {
                int r1 = in.read();
                if (r1 == '\\') {
                    // double backslash, read each separately
                    inputChars.add(InputChar.newCharInstance(r0));
                    return inputChars.peek();
                }
                if (r1 == 'u') {
                    // escape sequence
                    StringBuilder seqbf = new StringBuilder();
                    // collect all 'u's
                    seqbf.append((char)r0);
                    do {
                        seqbf.append((char)r1);
                        r1 = in.read();
                    } while(r1 == 'u');
                    // parse escape sequence value
                    parseSeq:
                    if (r1 != -1) {
                        seqbf.append((char)r1);
                        for(int i = 3; i > 0; --i) {
                            r1 = in.read();
                            if (r1 == -1) break parseSeq;
                            seqbf.append((char)r1);
                        }
                        if (r1 != -1) {
                            int val = Integer.parseInt(
                                    seqbf.substring(seqbf.length() - 4), 16);
                            return InputChar.newEscapeSequenceInstance(val, seqbf);
                        }
                    }
                    // incorrect escape sequence...
                    throw new IOException("Incorrect escape sequence: '" + seqbf + "'");
                }
                // unknown...
                inputChars.add(InputChar.newCharInstance(r1));
            }
            return InputChar.newCharInstance(r0);
        }

        void close() throws IOException {
            if (in != null) {
                in.close();
            }
            in = null;
            inputChars = null;
        }
    }
}
Piotr | 10/31/2007 4:00:21 AM

silviocortes@yahoo.com wrote:
>> I need to write a class that will take a java file as input, strip all
>> the comments out, and save thre result in a different file....

Why not just decompile the bytecode?

--
Lew
Lew | 10/31/2007 4:43:43 AM

Mark Rafn wrote:
> This is harder than you think. Use a real parser.

You don't need a real parser. You need a real lexer. Javac removes comments in the lexer, as does every compiler I've ever written. So can you.
Esmond | 10/31/2007 5:49:48 AM

Lew wrote:
> silviocortes@yahoo.com wrote:
>>> I need to write a class that will take a java file as input, strip all
>>> the comments out, and save thre result in a different file....
>
> Why not just decompile the bytecode?

Because it's not always possible to get even equivalent source code back from the bytecode? (Keywords: type erasure, compile-time constant expression resolution, obfuscation, etc...)

piotr
Piotr | 10/31/2007 9:16:26 AM

Esmond Pitt wrote:
> Mark Rafn wrote:
>> This is harder than you think. Use a real parser.
>
> You don't need a real parser. You need a real lexer. Javac removes
> comments in the lexer, as does every compiler I've ever written. So can
> you.

Javac's lexer does not remove comments (not all of them, at least). Important comments, i.e. /** ... */, must be preserved for the parser because they may contain information needed for code generation (e.g. @deprecated Javadoc tags).

In fact, there is no clear distinction between the javac lexer and parser, I think...

BTW, the OP may also utilize the Java Compiler API (JSR-199) and its Tree API (the latter is still under com.sun.*, but AFAIK is "almost" stable now...). A starting-point example is below (requires tools.jar!). It needs more detailed scanning of the source tree (extend TreeScanner) because the current Tree.toString() implementation gives a not-so-exact preview of the original source code (e.g. annotations' attribute default values are skipped from output, etc...). For the OP's particular problem I prefer to use the simplified "stripper" (the one sent by me earlier to this thread), because everything is under "my control" there. However, the uses of the 199 API are much wider than that, so its importance goes well beyond my simple approach.

piotr

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

import com.sun.source.tree.AnnotationTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ImportTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.TreeVisitor;
import com.sun.source.util.TreeScanner;

public class JavaCBasedCommentStripper {

    public static void main(String[] args) throws Exception {
        final JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        final StandardJavaFileManager fileManager = compiler
                .getStandardFileManager(null, null, null);
        Iterable<? extends JavaFileObject> compilationUnits = fileManager
                .getJavaFileObjects("JavaCBasedCommentStripper.java");
        com.sun.source.util.JavacTask jt = (com.sun.source.util.JavacTask) compiler
                .getTask(null, fileManager, null, null, null, compilationUnits);
        Iterable<? extends CompilationUnitTree> ts = jt.parse();
        for (CompilationUnitTree cu : ts) {
            // System.out.println(cu); // preserves /** comments */
            for(AnnotationTree at : cu.getPackageAnnotations()) {
                System.out.println(at);
            }
            String pkg = cu.getPackageName().toString();
            if (!pkg.equals("")) {
                System.out.println("package " + pkg + ";\n");
            }
            for(ImportTree it : cu.getImports()) {
                System.out.print(it);
            }
            for(Tree td : cu.getTypeDecls()) {
                System.out.println(td); // not all details in output!
                // extend the following instead...
//                TreeVisitor<Void, Void> tv = new TreeScanner<Void, Void>() {
//
//                    @Override
//                    public Void visit...
//
//                };
//                td.accept(tv, null);
            }
        }
    }
}
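To make the "extend TreeScanner" hint a little more concrete, here is a small sketch that parses a source string and visits its method declarations. It is only an illustration of the visitor mechanics, not a comment stripper: MethodLister, methodNames and the string:///A.java URI are invented names, and it assumes a full JDK, since ToolProvider.getSystemJavaCompiler() returns null when no compiler is available.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical sketch: walk the parsed tree of an in-memory source with a TreeScanner.
final class MethodLister {

    static List<String> methodNames(final String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null without a JDK
        // Wrap the string so the compiler can read it like a source file.
        JavaFileObject unit = new SimpleJavaFileObject(
                URI.create("string:///A.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(unit));
        final List<String> names = new ArrayList<String>();
        try {
            for (CompilationUnitTree cu : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree node, Void p) {
                        names.add(node.getName().toString());
                        return super.visitMethod(node, p); // keep descending into the body
                    }
                }.scan(cu, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(methodNames(
                "class A { /** doc */ void f() {} // c\n int g() { return 1; } }"));
    }
}
```

A printer built this way would override more visit methods and emit text for each node, which is where the loss of original layout mentioned above comes from.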
Piotr | 10/31/2007 12:28:04 PM

Tris Orendorff wrote:
> silviocortes@yahoo.com burped up warm pablum in
> news:1193714529.702225.302810@o38g2000hse.googlegroups.com:
>
>> Howdy...
>>
>> I need to write a class that will take a java file as input, strip all
>> the comments out, and save thre result in a different file....
>>
>> Can I use that StreamTokenizer to do that? I can't really understand
>> how that work... Help, anyone?
>>
>> I guess I could also write code myself, but how would I handle a code
>> like that:
>>
>> 1: // this is a one line comment
>> 2: System.out.println ("//");
>>
>> In the example below, the first line would be removed... What's the
>> best way to know when "//" is not part of a comment. For that matter,
>> the same with "/*"
>
> All you need is a lexer (lex) to pick up tokens--no parsing (yacc or bison) required. I have a version in C and
> lex for MS-DOS which was slapped together in 1990. You can find it at
> http://sourceforge.net/projects/cshroud . Comments are naturally disposed of since that is half the job of
> shrouding.
>

As you want to process Java and can read it, you're better off using Coco/R. Unlike lex+yacc, it has a Java port which is written in Java and generates Java. It's fractionally easier to get your head round as well.

http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/

--
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
Martin | 10/31/2007 12:42:36 PM

Lew wrote:
>> Why not just decompile the bytecode?

Piotr Kobzda wrote:
> Because that's not always possible to achieve even equivalent source
> code from the bytecode? (Keywords: Type erasure, compile-time constant
> expressions resolution, obfuscation, etc...)

You make good points, except for the obfuscation part.

--
Lew
Lew | 10/31/2007 12:57:44 PM

Lew wrote:
>> Because that's not always possible to achieve even equivalent source
>> code from the bytecode? (Keywords: Type erasure, compile-time
>> constant expressions resolution, obfuscation, etc...)
>
> You make good points, except for the obfuscation part.

Well, obfuscation is mentioned here to indicate the possibility of a one-way-only transformation of the source code into the bytecode. Compilers are free to optimize, or -- just like obfuscators -- to "mangle" the code in a way that prevents reverse engineering (even incompletely generated debug info, for example an LVT not present in a class file, is a kind of obfuscation in the sense I mean here).

piotr
Piotr | 10/31/2007 3:36:12 PM

On 31 oct, 05:43, Lew <l...@lewscanon.com> wrote:
> silviocor...@yahoo.com wrote:
> >> I need to write a class that will take a java file as input, strip all
> >> the comments out, and save thre result in a different file....
>
> Why not just decompile the bytecode?
>

Because decompilers change the syntax. Sometimes making it hard to understand, sometimes easier, but changed anyway.

For instance

    public String toString() {
        String myname = this.getName();
        return("#<" + (myname!=null ? (" " + myname) : "" ) + ">");
    }

becomes (with jad)

    public String toString() {
        String myname = getName();
        return (new StringBuilder()).append("#<").append(myname == null ? "" : (new StringBuilder()).append("").append(myname).toString()).append(">").toString();
    }

Also, decompilers have problems, in particular with inline functions or static declarations (see for instance http://www.kpdus.com/jad.html#bugs, and JAD is one of the best AFAIK).

So, it was a nice idea, but does not provide a good answer to the need. I like the idea of using a C/C++ preprocessor (even though there might be side effects, too).

--
Régis
iso | 10/31/2007 4:14:21 PM

Piotr Kobzda wrote:
> Javac's lexer do not removes comments (not all at least).

In other words it could. So in other words it can be done by a lexer.
Esmond | 10/31/2007 10:49:50 PM

Martin Gregorie <martin@see.sig.for.address> burped up warm pablum in news:0r9mv4-rn3.ln1@zoogz.gregorie.org:
> As you want to process Java and can read it, you're better off using
> Coco/R. Unlike lex+yacc, it has a Java port which is written in Java and
> generates Java. Its fractionally easier to get your head round as well.
>
> http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/

Agreed! Coco looks like a well thought out tool.

--
Tris Orendorff
[ Anyone naming their child should spend a few minutes checking rhyming slang and dodgy sounding names. Brad and Angelina failed to do this when naming their kid Shiloh Pitt. At some point, someone at school is going to spoonerise her name.
Craig Stark]
Tris | 11/1/2007 10:47:18 PM

Tris Orendorff wrote:
> Agreed! Coco looks like a well thought out tool.

All you really need is JavaCC 4.0 which comes with the Java 5.0 grammar. Then just use the tokenizer.
Esmond | 11/2/2007 8:06:31 AM