sqlite utf8 encoding error


I have an application that uses sqlite3 to store job/error data.  When
I log in as a German user the error codes generated are translated into
German.  The error code text is then stored in the db.  When I use the
fetchall() to retrieve the data to generate a report I get the
following error:

Traceback (most recent call last):
  File "c:\Pest3\Glosser\baseApp\reportGen.py", line 199, in
OnGenerateButtonNow
    self.OnGenerateButton(event)
  File "c:\Pest3\Glosser\baseApp\reportGen.py", line 243, in
OnGenerateButton
    warningresult = messagecursor1.fetchall()
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
unsupported Unicode code range

does anyone have any idea on what could be going wrong?  The string
that I store in the database table is:

'Keinen Text für Übereinstimmungsfehler gefunden'

I thought that all strings were stored in unicode in sqlite.

Greg Miller

0
Reply et1ssgmiller (9) 11/17/2005 11:47:00 AM


Greg Miller wrote:

> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> unsupported Unicode code range
>
> does anyone have any idea on what could be going wrong?  The string
> that I store in the database table is:
>
> 'Keinen Text für Übereinstimmungsfehler gefunden'

$ more test.py
# -*- coding: iso-8859-1 -*-
u = u'Keinen Text für Übereinstimmungsfehler gefunden'
s = u.encode("iso-8859-1")
u = s.decode("utf-8") # <-- this gives an error

$ python test.py
Traceback (most recent call last):
  File "test.py", line 4, in ?
    u = s.decode("utf-8") # <-- this gives an error
  File "lib/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
unsupported Unicode code range

> I thought that all strings were stored in unicode in sqlite.

Did you pass in a Unicode string or an 8-bit string when you stored the text?

</F> 
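
For readers on Python 3, the same mismatch can be reproduced without a database. This is only a sketch: the variable names are mine, and it assumes the stored bytes were Latin-1 (iso-8859-1), as the test script above suggests.

```python
# The message as text, and as the Latin-1 bytes that presumably reached the database.
text = "Keinen Text für Übereinstimmungsfehler gefunden"
latin1_bytes = text.encode("iso-8859-1")

# Latin-1 writes "ü" as the single byte 0xFC, which is not valid UTF-8 at this position.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Decoding with the encoding that was actually used recovers the original text.
recovered = latin1_bytes.decode("iso-8859-1")
```

The offending byte sits at index 13, which matches the start position reported in the traceback.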



0
Reply fredrik2101 (5275) 11/17/2005 12:04:25 PM

Greg Miller enlightened us with:
> 'Keinen Text für Übereinstimmungsfehler gefunden'

You posted it as "Keinen Text f<FC>r ...", which is Latin-1, not
UTF-8.

> I thought that all strings were stored in unicode in sqlite.

Only if you put them into the DB as such. Make sure you're inserting
UTF-8 text, since the DB won't do character conversion for you.

Sybren
-- 
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself? 
                                             Frank Zappa
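
The point that the database does no character conversion can be checked with a small sketch. It assumes Python 3 and the standard-library sqlite3 module (not the pysqlite version discussed in this thread); the table name and the CAST trick used to force raw bytes into a TEXT value are illustrative choices of mine.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (msg TEXT)")

# A str parameter is written to the database as UTF-8 by the sqlite3 module.
con.execute("INSERT INTO messages VALUES (?)", ("für",))

# Raw Latin-1 bytes forced into a TEXT value are stored unchanged.
con.execute(
    "INSERT INTO messages VALUES (CAST(? AS TEXT))",
    ("für".encode("iso-8859-1"),),
)

# hex() shows the bytes SQLite actually holds for each row.
stored_hex = [
    row[0] for row in con.execute("SELECT hex(msg) FROM messages ORDER BY rowid")
]
```

The first row holds the two-byte UTF-8 sequence C3 BC for "ü", the second holds the single Latin-1 byte FC, exactly as it was passed in.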
0
Reply sybrenUSE8145 (706) 11/17/2005 12:39:02 PM

Fredrik Lundh wrote:

>>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
>>unsupported Unicode code range
>>
>>does anyone have any idea on what could be going wrong?  The string
>>that I store in the database table is:
>>
>>'Keinen Text für Übereinstimmungsfehler gefunden'
> 
> $ more test.py
> # -*- coding: iso-8859-1 -*-
> u = u'Keinen Text für Übereinstimmungsfehler gefunden'
> s = u.encode("iso-8859-1")
> u = s.decode("utf-8") # <-- this gives an error
> 
> $ python test.py
> Traceback (most recent call last):
>   File "test.py", line 4, in ?
>     u = s.decode("utf-8") # <-- this gives an error
>   File "lib/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> unsupported Unicode code range

I can't wait for the moment when encoded strings go away from Python.
The more I program in this language, the more confusion this difference
causes. Most functions and many objects' methods now accept both
strings and unicode, making it hard to find the source of Unicode*Errors.

-- 
Jarek Zgoda
http://jpa.berlios.de/
0
Reply jzgoda (227) 11/17/2005 8:08:45 PM

Jarek Zgoda wrote:
> Fredrik Lundh wrote:
>
> >>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> >>unsupported Unicode code range
> >>
> >>does anyone have any idea on what could be going wrong?  The string
> >>that I store in the database table is:
> >>
> >>'Keinen Text für Übereinstimmungsfehler gefunden'
> >
> > $ more test.py
> > # -*- coding: iso-8859-1 -*-
> > u = u'Keinen Text für Übereinstimmungsfehler gefunden'
> > s = u.encode("iso-8859-1")
> > u = s.decode("utf-8") # <-- this gives an error
> >
> > $ python test.py
> > Traceback (most recent call last):
> >   File "test.py", line 4, in ?
> >     u = s.decode("utf-8") # <-- this gives an error
> >   File "lib/encodings/utf_8.py", line 16, in decode
> >     return codecs.utf_8_decode(input, errors, True)
> > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> > unsupported Unicode code range
>
> I can't wait for the moment when encoded strings go away from Python.
> The more I program in this language, the more confusion this difference
> causes. Most functions and many objects' methods now accept both
> strings and unicode, making it hard to find the source of Unicode*Errors.

Library writers can speed up the transition by hiding the 8-bit interface,
for example:

import sqlite
sqlite.I_promise_to_pass_8bit_string_only_in_utf8_encoding(my_signature="sig.gif")

If you don't call this function, 8-bit strings will not be accepted :)
IMHO, if libraries keep on accepting both str and unicode until Python
3.0, it will just prolong the confusion of Unicode newbies instead of
guiding them in the right direction _right now_.
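
A rough sketch of what such a strict interface could look like in application code, written for Python 3. The helpers `require_text` and `strict_params` are hypothetical names of mine and not part of any sqlite library.

```python
def require_text(value):
    """Hypothetical guard: pass values through, but refuse undecoded byte strings."""
    if isinstance(value, (bytes, bytearray)):
        raise TypeError(
            "decode byte strings to str before passing them to the database"
        )
    return value


def strict_params(params):
    """Check every query parameter and return them as a tuple."""
    return tuple(require_text(p) for p in params)
```

Wrapping parameters as `cursor.execute(sql, strict_params(values))` would make an encoding mistake fail at insert time rather than later, when the data is fetched.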

0
Reply Serge.Orlov (257) 11/17/2005 9:35:42 PM

On 17 Nov 2005 03:47:00 -0800, "Greg Miller" <et1ssgmiller@gmail.com>
wrote:

>I have an application that uses sqlite3 to store job/error data.  When
>I log in as a German user the error codes generated are translated into
>German.  The error code text is then stored in the db.  When I use the
>fetchall() to retrieve the data to generate a report I get the
>following error:
>
>Traceback (most recent call last):
>  File "c:\Pest3\Glosser\baseApp\reportGen.py", line 199, in
>OnGenerateButtonNow
>    self.OnGenerateButton(event)
>  File "c:\Pest3\Glosser\baseApp\reportGen.py", line 243, in
>OnGenerateButton
>    warningresult = messagecursor1.fetchall()
>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
>unsupported Unicode code range
>
>does anyone have any idea on what could be going wrong?  The string
>that I store in the database table is:
>
>'Keinen Text für Übereinstimmungsfehler gefunden'
>
>I thought that all strings were stored in unicode in sqlite.
>


No, they are stored as UTF-8 in sqlite, and pysqlite has no way to make
sure that the string you insert into the database is really encoded in
UTF-8 (the only safe way is to use Unicode strings).

How did you insert that string?

As a partial solution, try disabling the automatic conversion of text
fields to Unicode strings:


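from pysqlite2 import dbapi2 as sqlite


def convert_text(s):
    # Return the raw 8-bit string as stored; do not decode it to Unicode
    return s


# Register the converter for columns declared as TEXT
sqlite.register_converter("TEXT", convert_text)

# Converters only run when type detection is enabled on the connection
con = sqlite.connect("...",
    detect_types=sqlite.PARSE_DECLTYPES | sqlite.PARSE_COLNAMES)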




Regards  Manlio Perillo
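
In Python 3's standard-library sqlite3 module, a comparable workaround is to set `text_factory` to `bytes` and decode by hand. This is a sketch under assumptions: the in-memory table and the CAST used to plant Latin-1 bytes are mine, standing in for the poster's existing data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (msg TEXT)")

# Simulate a row that already holds Latin-1 bytes in a TEXT column.
raw = "Keinen Text für Übereinstimmungsfehler gefunden".encode("iso-8859-1")
con.execute("INSERT INTO messages VALUES (CAST(? AS TEXT))", (raw,))

# With the default text_factory (str), the UTF-8 decode fails and the
# module raises a subclass of sqlite3.Error.
try:
    con.execute("SELECT msg FROM messages").fetchall()
    default_fetch_failed = False
except sqlite3.Error:
    default_fetch_failed = True

# Ask for raw bytes instead, then decode with the encoding actually used.
con.text_factory = bytes
rows = con.execute("SELECT msg FROM messages").fetchall()
message = rows[0][0].decode("iso-8859-1")
```

This only reads the existing data safely; the rows still contain Latin-1 bytes until they are rewritten as proper text.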
0
Reply NOmanlio_perilloSPAM (55) 11/18/2005 9:44:11 AM

Thank you for all your suggestions.  I ended up casting the string to
unicode prior to inserting into the database.  

Greg Miller
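
A minimal sketch of that fix in Python 3 terms, assuming the translated message arrives as Latin-1 bytes (the table name and variable names are illustrative):

```python
import sqlite3

# The translated message as it arrives, encoded in Latin-1.
raw = b"Keinen Text f\xfcr \xdcbereinstimmungsfehler gefunden"

# Decode to text before handing it to the database layer.
text = raw.decode("iso-8859-1")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (msg TEXT)")
con.execute("INSERT INTO messages VALUES (?)", (text,))

# The value is stored as UTF-8 and comes back as str without errors.
fetched = con.execute("SELECT msg FROM messages").fetchall()
```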

0
Reply et1ssgmiller (9) 11/18/2005 5:09:24 PM

On 18 Nov 2005 09:09:24 -0800, "Greg Miller" <et1ssgmiller@gmail.com>
wrote:

>Thank you for all your suggestions.  I ended up casting the string to
>unicode prior to inserting into the database.  
>

Don't do it by hand if it can be done by an automated system.

Try with:

from pysqlite2 import dbapi2 as sqlite

def adapt_str(s):
    # Decode 8-bit strings using the encoding declared at the top of the module
    return s.decode("iso-8859-1")

sqlite.register_adapter(str, adapt_str)


Read the pysqlite documentation for more information:
http://initd.org/pub/software/pysqlite/doc/usage-guide.html



Regards  Manlio Perillo
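
In Python 3, `str` is already Unicode, so an adapter on `str` no longer applies. The same idea can be sketched with the standard-library sqlite3 module by registering an adapter on a small wrapper class; `Latin1Text` and `adapt_latin1` are hypothetical names of mine, not part of sqlite3.

```python
import sqlite3


class Latin1Text:
    """Hypothetical wrapper marking a byte string as Latin-1 encoded text."""

    def __init__(self, raw):
        self.raw = raw


def adapt_latin1(value):
    # Called by sqlite3 whenever a Latin1Text is passed as a query parameter.
    return value.raw.decode("iso-8859-1")


sqlite3.register_adapter(Latin1Text, adapt_latin1)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE messages (msg TEXT)")
con.execute("INSERT INTO messages VALUES (?)", (Latin1Text(b"f\xfcr"),))
stored = con.execute("SELECT msg FROM messages").fetchone()[0]
```

The decoding then happens in one registered place instead of at every call site.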
0
Reply NOmanlio_perilloSPAM (55) 11/19/2005 8:55:50 AM

Thanks again, I'll look into this method.

Greg Miller

0
Reply et1ssgmiller (9) 11/21/2005 12:13:09 PM
comp.lang.python 72543 articles. 12 followers. Post
