Hi,
There's a DocBook XML file that I want to modify. The file contains
something like
....
<mediaobject>
<imageobject>
<imagedata fileref="PathToImage" format="ImgFormat"/>
</imageobject>
</mediaobject>
....
I just want to match the whole <mediaobject> thingy and prepend one
line which contains the PathToImage as an XML comment, like this:
<!-- PathToImage -->
My input to the matcher is the whole file as is. First I tried to get a
regex to match the whole thing:
content = content.replaceFirst(
"<mediaobject>" +
"\\s*<imageobject>" +
"\\s*<imagedata fileref=\".*\".*/>" +
"\\s*</imageobject>" +
"\\s*</mediaobject>",
"<!-- Test -->"
);
But when I use a backreference (like \0 for the whole match, or \1 if I use
parentheses around the filename) in the replacement string, like this:
"<!-- Test -->\0"
I just get
<!-- Test --> followed by a square character that cannot be displayed here.
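A likely cause of the square: in Java source code, \0 and \1 inside a string literal are octal escape sequences, so the compiler turns them into the control characters U+0000 and U+0001 before the regex engine ever sees the replacement string. The class name below is made up; the sketch only shows what the literals actually contain.

```java
public class OctalEscapeDemo {
    public static void main(String[] args) {
        // In a Java string literal, \0 and \1 are octal escapes, not regex backreferences
        String zero = "<!-- Test -->\0";
        String one = "<!-- Test -->\1";

        // Each escape compiles to a single control character (U+0000 and U+0001)
        System.out.println((int) zero.charAt(zero.length() - 1)); // prints 0
        System.out.println((int) one.charAt(one.length() - 1));   // prints 1

        // A doubled backslash keeps the two characters '\' and '1' in the string
        System.out.println("\\1".length()); // prints 2
    }
}
```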
The strange thing is that when I use exactly the same pattern with
Pattern.compile(regex).matcher(str).replaceAll(repl)
nothing matches at all, contrary to the Java API documentation for
String.replaceAll(), which says both calls give the same result.
I tried Pattern.MULTILINE and Pattern.DOTALL in every combination. I
tried .* instead of \\s and even used \r?\n? for the line
endings ... nothing works.
Can anyone please help me?
_
Tom
Posted by bauer, 5/24/2005 12:46:29 PM
bauer@b3s.de wrote:
> I just want to match the whole <mediaobject> thingy and prepend one
> line which contains the PathToImage as a XML comment just like
> <!-- PathToImage -->
Have you tried a pattern of "(<mediaobject)(.*)(</mediaobject>)"? You
can then use a replacement along the lines of
"<!-- PathToImage -->$1$2$3". I'd also use Pattern.MULTILINE | Pattern.DOTALL
when building the pattern.
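A sketch of this suggestion as a self-contained program (the class name and the sample input are made up for illustration). For this particular pattern only Pattern.DOTALL matters: Pattern.MULTILINE changes the meaning of ^ and $ only, and the pattern uses neither.

```java
import java.util.regex.Pattern;

public class PrependComment {
    // DOTALL lets "." match line terminators, so the element may span several lines
    private static final Pattern MEDIA =
            Pattern.compile("(<mediaobject)(.*)(</mediaobject>)", Pattern.DOTALL);

    public static String prepend(String content) {
        // $1$2$3 re-inserts the three captured groups right after the comment
        return MEDIA.matcher(content).replaceAll("<!-- PathToImage -->$1$2$3");
    }

    public static void main(String[] args) {
        String xml = "<para/>\n<mediaobject>\n<imageobject/>\n</mediaobject>\n";
        System.out.print(prepend(xml));
    }
}
```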
Hope that helps.
Pan
======================================================================
TechBookReport Java http://www.techbookreport.com/JavaIndex.html
Posted by tbr8059, 5/24/2005 2:20:13 PM
TechBookReport wrote:
> Have you tried a pattern of "(<mediaobject)(.*)(</mediaobject>)". You
> can then use a replacement along the lines of "<!-- PathToImage
> -->$1$2$3". I'd also use Pattern.MULTILINE | Pattern.DOTALL when
> building the pattern.
>
> Hope that helps.
Not really ... this results in the same problem I already described.
Instead of substituting \1\2\3 with the matching groups, I get only this
special char (it looks like a square and cannot be displayed here). By the
way, I noticed that you used $1$2$3. That's Perl syntax, right? In Java it
would be \1\2\3, or am I wrong?
You can try it yourself. Save the following content to a file:
<chapter>
<title>Chapter 1</title>
<sect1>
<title>Section 1</title>
<para>
Test Test Test Test Test Test Test Test Test
</para>
<mediaobject>
<imageobject>
<imagedata fileref="image.svg" format="SVG"/>
</imageobject>
</mediaobject>
<para>
Test Test Test Test Test Test Test Test Test
</para>
</sect1>
</chapter>
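Read this file with:
import java.io.*;

public String readPlain( File file ) throws IOException
{
    StringBuilder content = new StringBuilder();
    BufferedReader brd = new BufferedReader( new FileReader( file ) );
    try {
        String line;
        // readLine() strips the line terminator, so append a Windows one again
        while ( ( line = brd.readLine() ) != null )
            content.append( line ).append( "\r\n" );
    } finally {
        brd.close(); // release the file even if reading fails
    }
    return content.toString();
}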
and then apply:
content = Pattern.compile( "(<mediaobject)(.*)(</mediaobject>)",
Pattern.MULTILINE|Pattern.DOTALL).matcher(
content).replaceAll("<!-- Test -->\1\2\3");
_
Tom
Posted by bauer, 5/24/2005 4:22:09 PM
Damn Java regex!!! It is $1$2$3. That was the point: I used the wrong
syntax for backrefs. But the Java 1.4.2 API documentation for
java.util.regex.Pattern says, under "Back references":
\n    Whatever the nth capturing group matched
So what ... ?!?
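The Javadoc entry quoted here describes back references inside the pattern itself; the replacement string passed to replaceAll has its own syntax, in which group n is written $n. A small illustration (the class, method and example text are hypothetical) that uses both forms in one call:

```java
public class BackrefDemo {
    public static String collapseDoubledWords(String text) {
        // Pattern side: \1 (written "\\1" in Java source) matches whatever group 1 matched
        // Replacement side: the same group is referred to as $1
        return text.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubledWords("see the the image")); // prints: see the image
    }
}
```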
Posted by bauer, 5/24/2005 4:30:44 PM
bauer@b3s.de wrote:
> Damn Java regex !!! It is $1$2$3. That was the point. I used the wrong
> syntax for backrefs. But in Java API 1.4.2 under
> java.util.regex.Pattern stands
>
> Back references
> \n Whatever the nth capturing group matched
>
> So what ... ?!?
>
Did you escape the backslashes? Also, the funny square character is
probably the \r\n you are using. Try
System.getProperty("line.separator") instead.
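Escaping the backslashes would not have helped either. In the replacement string of replaceAll, a backslash makes the following character literal, so a replacement written as "\\1" in Java source inserts the plain digit 1; only $1 refers to the group. A sketch with a made-up input (the class and method names are hypothetical):

```java
public class ReplacementEscapeDemo {
    private static final String PATTERN = "(<mediaobject)(.*)(</mediaobject>)";

    public static String apply(String input, String replacement) {
        return input.replaceAll(PATTERN, replacement);
    }

    public static void main(String[] args) {
        String input = "<mediaobject>x</mediaobject>";
        // Escaped backslashes: each \ quotes the following digit, giving literal text
        System.out.println(apply(input, "<!-- Test -->\\1\\2\\3")); // prints: <!-- Test -->123
        // Dollar signs are group references, so the element is re-inserted
        System.out.println(apply(input, "<!-- Test -->$1$2$3")); // prints: <!-- Test --><mediaobject>x</mediaobject>
    }
}
```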
Pan
Posted by tbr8059, 5/24/2005 4:43:31 PM
TechBookReport wrote:
> bauer@b3s.de wrote:
> > Damn Java regex !!! It is $1$2$3. That was the point. I used the wrong
> > syntax for backrefs. But in Java API 1.4.2 under
> > java.util.regex.Pattern stands
> >
> > Back references
> > \n Whatever the nth capturing group matched
> >
> > So what ... ?!?
> >
> Did you escape the backslashes? Also, the funny square character is
> probably the \r\n you are using. Try
> System.getProperty("line.separator") instead.
>
No, the funny square char is not the \r\n, because if it were, it would
appear on every line, independent of the regex code. I'm on Windows and
the app runs only on this system, but you are right, I'd better use
System.getProperty("line.separator").
I guess the funny square is some Unicode character (\1 = 0x01?) when I use
\1 without escaping the backslash.
But that doesn't matter anymore; my problem is solved. Thanks for your
help.
Posted by bauer, 5/24/2005 5:08:51 PM
On Tue, 24 May 2005 15:20:13 +0100, TechBookReport <tbr@nospam.nos>
wrote:
>Have you tried a pattern of "(<mediaobject)(.*)(</mediaobject>)". You
>can then use a replacement along the lines of "<!-- PathToImage
>-->$1$2$3". I'd also use Pattern.MULTILINE | Pattern.DOTALL when
>building the pattern.
If there can be more than one mediaobject element in a document, you
need to use a reluctant dot-star:
"<mediaobject.*?</mediaobject>"
Otherwise, it will match everything from the first opening tag to the
last closing tag. Even if there's only one such element, it will
probably be more efficient this way.
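The difference is easy to see by counting matches on a made-up document with two elements. The class and method names are hypothetical, and DOTALL is set in both cases so that only the quantifier differs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    public static int countMatches(String regex, String input) {
        // DOTALL so that "." also matches newlines between the elements
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<mediaobject>a</mediaobject>\n<para/>\n<mediaobject>b</mediaobject>";
        // Greedy: one match from the first opening tag to the last closing tag
        System.out.println(countMatches("<mediaobject.*</mediaobject>", xml));  // prints 1
        // Reluctant: one match per element
        System.out.println(countMatches("<mediaobject.*?</mediaobject>", xml)); // prints 2
    }
}
```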
You don't really need to use capturing parentheses, since you're
re-inserting the whole match; just use $0:
// (?s) turns on DOTALL, since String.replaceAll() takes no flags argument
str = str.replaceAll("(?s)<mediaobject.*?</mediaobject>",
        "<!-- PathToImage -->$0");
The JDK regex package uses the same syntax as Perl with respect to
backreferences ("\n" within the regex and "$n" in the replacement
string), except that it uses $0 instead of $& for the whole match, and
doesn't emulate the other dollar-plus-punctuation variables: $`, $',
and $+.
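A quick check of those differences (class and method names are hypothetical): $0 inserts the whole match, the Perl-style $& is rejected with an IllegalArgumentException when the replacement is processed, and Matcher.quoteReplacement, which arrived only in Java 5 and so is not in the 1.4.2 API mentioned earlier, escapes text that should be inserted literally.

```java
import java.util.regex.Matcher;

public class DollarZeroDemo {
    public static String replaceTag(String input, String replacement) {
        return input.replaceAll("<mediaobject>", replacement);
    }

    public static void main(String[] args) {
        // $0 is the whole match, the counterpart of Perl's $&
        System.out.println(replaceTag("<mediaobject>", "<!-- x -->$0")); // prints: <!-- x --><mediaobject>

        // $& is not a valid group reference in Java
        try {
            replaceTag("<mediaobject>", "<!-- x -->$&");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // quoteReplacement escapes $ and \ so the text is inserted literally
        System.out.println(replaceTag("<mediaobject>", Matcher.quoteReplacement("$0"))); // prints: $0
    }
}
```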
Posted by jbigboote1, 5/24/2005 9:54:43 PM