 Merged revisions 57778-58052 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/trunk
........
r57820 | georg.brandl | 2007-08-31 08:59:27 +0200 (Fri, 31 Aug 2007) | 2 lines
Document new shorthand notation for index entries.
........
r57827 | georg.brandl | 2007-08-31 10:47:51 +0200 (Fri, 31 Aug 2007) | 2 lines
Fix subitem markup.
........
r57833 | martin.v.loewis | 2007-08-31 12:01:07 +0200 (Fri, 31 Aug 2007) | 1 line
Mark registry components as 64-bit on Win64.
........
r57854 | bill.janssen | 2007-08-31 21:02:23 +0200 (Fri, 31 Aug 2007) | 1 line
deprecate use of FakeSocket
........
r57855 | bill.janssen | 2007-08-31 21:02:46 +0200 (Fri, 31 Aug 2007) | 1 line
remove mentions of socket.ssl in comments
........
r57856 | bill.janssen | 2007-08-31 21:03:31 +0200 (Fri, 31 Aug 2007) | 1 line
remove use of non-existent SSLFakeSocket in apparently untested code
........
r57859 | martin.v.loewis | 2007-09-01 08:36:03 +0200 (Sat, 01 Sep 2007) | 3 lines
Bug #1737210: Change Manufacturer of Windows installer to PSF.
Will backport to 2.5.
........
r57865 | georg.brandl | 2007-09-01 09:51:24 +0200 (Sat, 01 Sep 2007) | 2 lines
Fix RST link (backport from Py3k).
........
r57876 | georg.brandl | 2007-09-01 17:49:49 +0200 (Sat, 01 Sep 2007) | 2 lines
Document sets' ">" and "<" operations (backport from py3k).
........
r57878 | skip.montanaro | 2007-09-01 19:40:03 +0200 (Sat, 01 Sep 2007) | 4 lines
Added a note and examples to explain that re.split does not split on an
empty pattern match. (issue 852532).
........
r57879 | walter.doerwald | 2007-09-01 20:18:09 +0200 (Sat, 01 Sep 2007) | 2 lines
Fix wrong function names.
........
r57880 | walter.doerwald | 2007-09-01 20:34:05 +0200 (Sat, 01 Sep 2007) | 2 lines
Fix typo.
........
r57889 | andrew.kuchling | 2007-09-01 22:31:59 +0200 (Sat, 01 Sep 2007) | 1 line
Markup fix
........
r57892 | andrew.kuchling | 2007-09-01 22:43:36 +0200 (Sat, 01 Sep 2007) | 1 line
Add various items
........
r57895 | andrew.kuchling | 2007-09-01 23:17:58 +0200 (Sat, 01 Sep 2007) | 1 line
Wording change
........
r57896 | andrew.kuchling | 2007-09-01 23:18:31 +0200 (Sat, 01 Sep 2007) | 1 line
Add more items
........
r57904 | ronald.oussoren | 2007-09-02 11:46:07 +0200 (Sun, 02 Sep 2007) | 3 lines
Macosx: this patch ensures that the value of MACOSX_DEPLOYMENT_TARGET used
by the Makefile is also used at configure-time.
........
r57925 | georg.brandl | 2007-09-03 09:16:46 +0200 (Mon, 03 Sep 2007) | 2 lines
Fix #883466: don't allow Unicode as arguments to quopri and uu codecs.
........
r57936 | matthias.klose | 2007-09-04 01:33:04 +0200 (Tue, 04 Sep 2007) | 2 lines
- Added support for linking the bsddb module against BerkeleyDB 4.6.x.
........
r57954 | mark.summerfield | 2007-09-04 10:16:15 +0200 (Tue, 04 Sep 2007) | 3 lines
Added cross-references plus a note about dict & list shallow copying.
........
r57958 | martin.v.loewis | 2007-09-04 11:51:57 +0200 (Tue, 04 Sep 2007) | 3 lines
Document that we rely on the OS to release the crypto
context. Fixes #1626801.
........
r57960 | martin.v.loewis | 2007-09-04 15:13:14 +0200 (Tue, 04 Sep 2007) | 3 lines
Patch #1388440: Add set_completion_display_matches_hook and
get_completion_type to readline.
........
r57961 | martin.v.loewis | 2007-09-04 16:19:28 +0200 (Tue, 04 Sep 2007) | 3 lines
Patch #1031213: Decode source line in SyntaxErrors back to its original
source encoding. Will backport to 2.5.
........
r57972 | matthias.klose | 2007-09-04 20:17:36 +0200 (Tue, 04 Sep 2007) | 3 lines
- Makefile.pre.in(buildbottest): Run an optional script pybuildbot.identify
to include some information about the build environment.
........
r57973 | matthias.klose | 2007-09-04 21:05:38 +0200 (Tue, 04 Sep 2007) | 2 lines
- Makefile.pre.in(buildbottest): Remove whitespace at eol.
........
r57975 | matthias.klose | 2007-09-04 22:46:02 +0200 (Tue, 04 Sep 2007) | 2 lines
- Fix libffi configure for hppa*-*-linux* | parisc*-*-linux*.
........
r57980 | bill.janssen | 2007-09-05 02:46:27 +0200 (Wed, 05 Sep 2007) | 1 line
SSL certificate distinguished names should be represented by tuples
........
r57985 | martin.v.loewis | 2007-09-05 08:39:17 +0200 (Wed, 05 Sep 2007) | 3 lines
Patch #1105: Explain that one needs to build the solution
to get dependencies right.
........
r57987 | armin.rigo | 2007-09-05 09:51:21 +0200 (Wed, 05 Sep 2007) | 4 lines
PyDict_GetItem() returns a borrowed reference.
There are probably a number of places that are open to attacks
such as the following one, in bltinmodule.c:min_max().
........
r57991 | martin.v.loewis | 2007-09-05 13:47:34 +0200 (Wed, 05 Sep 2007) | 3 lines
Patch #786737: Allow building in a tree of symlinks pointing to
a readonly source.
........
r57993 | georg.brandl | 2007-09-05 15:36:44 +0200 (Wed, 05 Sep 2007) | 2 lines
Backport from Py3k: Bug #1684991: explain lookup semantics for __special__ methods (new-style classes only).
........
r58004 | armin.rigo | 2007-09-06 10:30:51 +0200 (Thu, 06 Sep 2007) | 4 lines
Patch #1733973 by peaker:
ptrace_enter_call() assumes no exception is currently set.
This assumption is broken when throwing into a generator.
........
r58006 | armin.rigo | 2007-09-06 11:30:38 +0200 (Thu, 06 Sep 2007) | 4 lines
PyDict_GetItem() returns a borrowed reference.
This attack is against ceval.c:IMPORT_NAME, which calls an
object (__builtin__.__import__) without holding a reference to it.
........
r58013 | georg.brandl | 2007-09-06 16:49:56 +0200 (Thu, 06 Sep 2007) | 2 lines
Backport from 3k: #1116: fix reference to old filename.
........
r58021 | thomas.heller | 2007-09-06 22:26:20 +0200 (Thu, 06 Sep 2007) | 1 line
Fix typo: c_float represents to C float type.
........
r58022 | skip.montanaro | 2007-09-07 00:29:06 +0200 (Fri, 07 Sep 2007) | 3 lines
If this is correct for py3k branch and it's already in the release25-maint
branch, seems like it ought to be on the trunk as well.
........
r58023 | gregory.p.smith | 2007-09-07 00:59:59 +0200 (Fri, 07 Sep 2007) | 4 lines
Apply the fix from Issue1112 to make this test more robust and keep
windows happy.
........
r58031 | brett.cannon | 2007-09-07 05:17:50 +0200 (Fri, 07 Sep 2007) | 4 lines
Make uuid1 and uuid4 tests conditional on whether ctypes can be imported;
implementation of either function depends on ctypes but uuid as a whole does
not.
........
r58032 | brett.cannon | 2007-09-07 06:18:30 +0200 (Fri, 07 Sep 2007) | 6 lines
Fix a crasher where Python code managed to infinitely recurse in C code without
ever going back out to Python code in PyObject_Call(). Required introducing a
static RuntimeError instance so that normalizing an exception there is no
reliance on a recursive call that would put the exception system over the
recursion check itself.
........
r58034 | thomas.heller | 2007-09-07 08:32:17 +0200 (Fri, 07 Sep 2007) | 1 line
Add a 'c_longdouble' type to the ctypes module.
........
r58035 | thomas.heller | 2007-09-07 11:30:40 +0200 (Fri, 07 Sep 2007) | 1 line
Remove unneeded #include.
........
r58036 | thomas.heller | 2007-09-07 11:33:24 +0200 (Fri, 07 Sep 2007) | 6 lines
Backport from py3k branch:
Add a workaround for a strange bug on win64, when _ctypes is compiled
with the SDK compiler. This should fix the failing
Lib\ctypes\test\test_as_parameter.py test.
........
r58037 | georg.brandl | 2007-09-07 16:14:40 +0200 (Fri, 07 Sep 2007) | 2 lines
Fix a wrong indentation for sublists.
........
r58043 | georg.brandl | 2007-09-07 22:10:49 +0200 (Fri, 07 Sep 2007) | 2 lines
#1095: ln -f doesn't work portably, fix in Makefile.
........
r58049 | skip.montanaro | 2007-09-08 02:34:17 +0200 (Sat, 08 Sep 2007) | 1 line
be explicit about the actual location of the missing file
........
/* Tokenizer implementation */
#include "Python.h"
#include "pgenheaders.h"
#include <ctype.h>
#include <assert.h>
#include "tokenizer.h"
#include "errcode.h"
#ifndef PGEN
#include "unicodeobject.h"
#include "bytesobject.h"
#include "fileobject.h"
#include "codecs.h"
#include "abstract.h"
#endif /* PGEN */
#define is_potential_identifier_start(c) (\
              (c >= 'a' && c <= 'z')\
               || (c >= 'A' && c <= 'Z')\
               || c == '_'\
               || (c >= 128))

#define is_potential_identifier_char(c) (\
              (c >= 'a' && c <= 'z')\
               || (c >= 'A' && c <= 'Z')\
               || (c >= '0' && c <= '9')\
               || c == '_'\
               || (c >= 128))
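/* Illustration only, not part of tokenizer.c: a minimal sketch of the two
   classification macros above, written as functions so the argument is
   evaluated once.  The sketch_* names are hypothetical.  Any byte >= 128 is
   accepted here as a *potential* identifier byte; real validation of
   non-ASCII identifiers happens later (see verify_identifier). */

```c
#include <assert.h>

/* `c` is assumed to be a byte value in 0..255; EOF (-1) is rejected. */
static int
sketch_is_identifier_start(int c)
{
    return (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || c == '_'
        || c >= 128;
}

/* Same as above, but digits are also allowed after the first byte. */
static int
sketch_is_identifier_char(int c)
{
    return sketch_is_identifier_start(c) || (c >= '0' && c <= '9');
}
```

/* Usage: sketch_is_identifier_start('1') is false, while
   sketch_is_identifier_char('1') is true. */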
extern char *PyOS_Readline(FILE *, FILE *, const char *);
/* Return malloc'ed string including trailing \n;
   empty malloc'ed string for EOF;
   NULL if interrupted */

/* Don't ever change this -- it would break the portability of Python code */
#define TABSIZE 8

/* Forward */
static struct tok_state *tok_new(void);
static int tok_nextc(struct tok_state *tok);
static void tok_backup(struct tok_state *tok, int c);

/* Token names */

const char *_PyParser_TokenNames[] = {
    "ENDMARKER",
    "NAME",
    "NUMBER",
    "STRING",
    "NEWLINE",
    "INDENT",
    "DEDENT",
    "LPAR",
    "RPAR",
    "LSQB",
    "RSQB",
    "COLON",
    "COMMA",
    "SEMI",
    "PLUS",
    "MINUS",
    "STAR",
    "SLASH",
    "VBAR",
    "AMPER",
    "LESS",
    "GREATER",
    "EQUAL",
    "DOT",
    "PERCENT",
    "LBRACE",
    "RBRACE",
    "EQEQUAL",
    "NOTEQUAL",
    "LESSEQUAL",
    "GREATEREQUAL",
    "TILDE",
    "CIRCUMFLEX",
    "LEFTSHIFT",
    "RIGHTSHIFT",
    "DOUBLESTAR",
    "PLUSEQUAL",
    "MINEQUAL",
    "STAREQUAL",
    "SLASHEQUAL",
    "PERCENTEQUAL",
    "AMPEREQUAL",
    "VBAREQUAL",
    "CIRCUMFLEXEQUAL",
    "LEFTSHIFTEQUAL",
    "RIGHTSHIFTEQUAL",
    "DOUBLESTAREQUAL",
    "DOUBLESLASH",
    "DOUBLESLASHEQUAL",
    "AT",
    "ATEQUAL",
    "RARROW",
    "ELLIPSIS",
    /* This table must match the #defines in token.h! */
    "OP",
    "AWAIT",
    "ASYNC",
    "<ERRORTOKEN>",
    "<N_TOKENS>"
};

/* Create and initialize a new tok_state structure */

static struct tok_state *
tok_new(void)
{
    struct tok_state *tok = (struct tok_state *)PyMem_MALLOC(
                                            sizeof(struct tok_state));
    if (tok == NULL)
        return NULL;
    tok->buf = tok->cur = tok->end = tok->inp = tok->start = NULL;
    tok->done = E_OK;
    tok->fp = NULL;
    tok->input = NULL;
    tok->tabsize = TABSIZE;
    tok->indent = 0;
    tok->indstack[0] = 0;

    tok->atbol = 1;
    tok->pendin = 0;
    tok->prompt = tok->nextprompt = NULL;
    tok->lineno = 0;
    tok->level = 0;
    tok->altwarning = 1;
    tok->alterror = 1;
    tok->alttabsize = 1;
    tok->altindstack[0] = 0;
    tok->decoding_state = STATE_INIT;
    tok->decoding_erred = 0;
    tok->read_coding_spec = 0;
    tok->enc = NULL;
    tok->encoding = NULL;
    tok->cont_line = 0;
#ifndef PGEN
    tok->filename = NULL;
    tok->decoding_readline = NULL;
    tok->decoding_buffer = NULL;
#endif

    tok->async_def = 0;
    tok->async_def_indent = 0;
    tok->async_def_nl = 0;

    return tok;
}

static char *
new_string(const char *s, Py_ssize_t len, struct tok_state *tok)
{
    char* result = (char *)PyMem_MALLOC(len + 1);
    if (!result) {
        tok->done = E_NOMEM;
        return NULL;
    }
    memcpy(result, s, len);
    result[len] = '\0';
    return result;
}

#ifdef PGEN

static char *
decoding_fgets(char *s, int size, struct tok_state *tok)
{
    return fgets(s, size, tok->fp);
}

static int
decoding_feof(struct tok_state *tok)
{
    return feof(tok->fp);
}

static char *
decode_str(const char *str, int exec_input, struct tok_state *tok)
{
    return new_string(str, strlen(str), tok);
}

#else /* PGEN */

static char *
error_ret(struct tok_state *tok) /* XXX */
{
    tok->decoding_erred = 1;
    if (tok->fp != NULL && tok->buf != NULL) /* see PyTokenizer_Free */
        PyMem_FREE(tok->buf);
    tok->buf = NULL;
    return NULL;                /* as if it were EOF */
}


static char *
get_normal_name(char *s)        /* for utf-8 and latin-1 */
{
    char buf[13];
    int i;
    for (i = 0; i < 12; i++) {
        int c = s[i];
        if (c == '\0')
            break;
        else if (c == '_')
            buf[i] = '-';
        else
            buf[i] = tolower(c);
    }
    buf[i] = '\0';
    if (strcmp(buf, "utf-8") == 0 ||
        strncmp(buf, "utf-8-", 6) == 0)
        return "utf-8";
    else if (strcmp(buf, "latin-1") == 0 ||
             strcmp(buf, "iso-8859-1") == 0 ||
             strcmp(buf, "iso-latin-1") == 0 ||
             strncmp(buf, "latin-1-", 8) == 0 ||
             strncmp(buf, "iso-8859-1-", 11) == 0 ||
             strncmp(buf, "iso-latin-1-", 12) == 0)
        return "iso-8859-1";
    else
        return s;
}

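/* Illustration only, not part of tokenizer.c: a standalone sketch of the
   normalization done by get_normal_name(), using only the C library.  The
   name sketch_normal_name is hypothetical.  As in the original, only the
   first 12 characters are inspected, '_' is treated as '-', and names other
   than the utf-8 and latin-1 families are returned unchanged. */

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *
sketch_normal_name(const char *s)
{
    char buf[13];
    int i;

    /* Lower-case the first 12 characters and map '_' to '-'. */
    for (i = 0; i < 12 && s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        buf[i] = (c == '_') ? '-' : (char)tolower(c);
    }
    buf[i] = '\0';

    if (strcmp(buf, "utf-8") == 0 || strncmp(buf, "utf-8-", 6) == 0)
        return "utf-8";
    if (strcmp(buf, "latin-1") == 0 ||
        strcmp(buf, "iso-8859-1") == 0 ||
        strcmp(buf, "iso-latin-1") == 0 ||
        strncmp(buf, "latin-1-", 8) == 0 ||
        strncmp(buf, "iso-8859-1-", 11) == 0 ||
        strncmp(buf, "iso-latin-1-", 12) == 0)
        return "iso-8859-1";
    return s;   /* unknown spelling: hand the caller's string back */
}
```

/* Usage: sketch_normal_name("UTF_8") yields "utf-8", and
   sketch_normal_name("Latin_1") yields "iso-8859-1". */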

/* Store the coding spec found in S in *SPEC, or NULL if none is found.
   Return 1 on success, 0 if memory allocation fails. */

static int
get_coding_spec(const char *s, char **spec, Py_ssize_t size, struct tok_state *tok)
{
    Py_ssize_t i;
    *spec = NULL;
    /* Coding spec must be in a comment, and that comment must be
     * the only statement on the source code line. */
    for (i = 0; i < size - 6; i++) {
        if (s[i] == '#')
            break;
        if (s[i] != ' ' && s[i] != '\t' && s[i] != '\014')
            return 1;
    }
    for (; i < size - 6; i++) { /* XXX inefficient search */
        const char* t = s + i;
        if (strncmp(t, "coding", 6) == 0) {
            const char* begin = NULL;
            t += 6;
            if (t[0] != ':' && t[0] != '=')
                continue;
            do {
                t++;
            } while (t[0] == '\x20' || t[0] == '\t');

            begin = t;
            while (Py_ISALNUM(t[0]) ||
                   t[0] == '-' || t[0] == '_' || t[0] == '.')
                t++;

            if (begin < t) {
                char* r = new_string(begin, t - begin, tok);
                char* q;
                if (!r)
                    return 0;
                q = get_normal_name(r);
                if (r != q) {
                    PyMem_FREE(r);
                    r = new_string(q, strlen(q), tok);
                    if (!r)
                        return 0;
                }
                *spec = r;
            }
        }
    }
    return 1;
}

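/* Illustration only, not part of tokenizer.c: a simplified sketch of the
   search performed by get_coding_spec().  The helper name and its
   caller-supplied output buffer are assumptions made for the example; the
   real function allocates the result and normalizes it.  Unlike the
   original, this sketch stops at the first match and works on a
   NUL-terminated line rather than an explicit size. */

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the encoding name declared on `line` into `out`.
   Return 1 if a declaration was found and fits, 0 otherwise. */
static int
sketch_find_coding(const char *line, char *out, size_t outsz)
{
    const char *p = line;

    /* Only whitespace may precede the comment marker. */
    while (*p == ' ' || *p == '\t' || *p == '\014')
        p++;
    if (*p != '#')
        return 0;

    /* Look for "coding" followed directly by ':' or '='. */
    for (p = strstr(p, "coding"); p != NULL; p = strstr(p + 1, "coding")) {
        const char *t = p + 6;
        const char *begin;
        size_t len;

        if (*t != ':' && *t != '=')
            continue;
        do {
            t++;
        } while (*t == ' ' || *t == '\t');

        begin = t;
        while (isalnum((unsigned char)*t) ||
               *t == '-' || *t == '_' || *t == '.')
            t++;

        len = (size_t)(t - begin);
        if (len == 0)
            continue;           /* no name after the separator */
        if (len >= outsz)
            return 0;           /* name does not fit in `out` */
        memcpy(out, begin, len);
        out[len] = '\0';
        return 1;
    }
    return 0;
}
```

/* Usage: for "# -*- coding: utf-8 -*-" the sketch stores "utf-8"; for
   "x = 1  # coding: latin-1" it reports no declaration, because the comment
   is not the only thing on the line. */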

/* Check whether the line contains a coding spec. If it does,
   invoke the set_readline function for the new encoding.
   This function receives the tok_state and the new encoding.
   Return 1 on success, 0 on failure.  */

static int
check_coding_spec(const char* line, Py_ssize_t size, struct tok_state *tok,
                  int set_readline(struct tok_state *, const char *))
{
    char *cs;
    int r = 1;

    if (tok->cont_line) {
        /* It's a continuation line, so it can't be a coding spec. */
        tok->read_coding_spec = 1;
        return 1;
    }
    if (!get_coding_spec(line, &cs, size, tok))
        return 0;
    if (!cs) {
        Py_ssize_t i;
        for (i = 0; i < size; i++) {
            if (line[i] == '#' || line[i] == '\n' || line[i] == '\r')
                break;
            if (line[i] != ' ' && line[i] != '\t' && line[i] != '\014') {
                /* Stop checking coding spec after a line containing
                 * anything except a comment. */
                tok->read_coding_spec = 1;
                break;
            }
        }
        return 1;
    }
    tok->read_coding_spec = 1;
    if (tok->encoding == NULL) {
        assert(tok->decoding_state == STATE_RAW);
        if (strcmp(cs, "utf-8") == 0) {
            tok->encoding = cs;
        }
        else {
            r = set_readline(tok, cs);
            if (r) {
                tok->encoding = cs;
                tok->decoding_state = STATE_NORMAL;
            }
            else {
                PyErr_Format(PyExc_SyntaxError,
                             "encoding problem: %s", cs);
                PyMem_FREE(cs);
            }
        }
    }
    else {                /* then, compare cs with BOM */
        r = (strcmp(tok->encoding, cs) == 0);
        if (!r)
            PyErr_Format(PyExc_SyntaxError,
                         "encoding problem: %s with BOM", cs);
        PyMem_FREE(cs);
    }
    return r;
}


/* See whether the file starts with a BOM. If it does,
   invoke the set_readline function with the new encoding.
   Return 1 on success, 0 on failure.  */

static int
check_bom(int get_char(struct tok_state *),
          void unget_char(int, struct tok_state *),
          int set_readline(struct tok_state *, const char *),
          struct tok_state *tok)
{
    int ch1, ch2, ch3;
    ch1 = get_char(tok);
    tok->decoding_state = STATE_RAW;
    if (ch1 == EOF) {
        return 1;
    } else if (ch1 == 0xEF) {
        ch2 = get_char(tok);
        if (ch2 != 0xBB) {
            unget_char(ch2, tok);
            unget_char(ch1, tok);
            return 1;
        }
        ch3 = get_char(tok);
        if (ch3 != 0xBF) {
            unget_char(ch3, tok);
            unget_char(ch2, tok);
            unget_char(ch1, tok);
            return 1;
        }
#if 0
    /* Disable support for UTF-16 BOMs until a decision
       is made whether this needs to be supported.  */
    } else if (ch1 == 0xFE) {
        ch2 = get_char(tok);
        if (ch2 != 0xFF) {
            unget_char(ch2, tok);
            unget_char(ch1, tok);
            return 1;
        }
        if (!set_readline(tok, "utf-16-be"))
            return 0;
        tok->decoding_state = STATE_NORMAL;
    } else if (ch1 == 0xFF) {
        ch2 = get_char(tok);
        if (ch2 != 0xFE) {
            unget_char(ch2, tok);
            unget_char(ch1, tok);
            return 1;
        }
        if (!set_readline(tok, "utf-16-le"))
            return 0;
        tok->decoding_state = STATE_NORMAL;
#endif
    } else {
        unget_char(ch1, tok);
        return 1;
    }
    if (tok->encoding != NULL)
        PyMem_FREE(tok->encoding);
    tok->encoding = new_string("utf-8", 5, tok);
    if (!tok->encoding)
        return 0;
    /* No need to set_readline: input is already utf-8 */
    return 1;
}

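/* Illustration only, not part of tokenizer.c: a minimal sketch of the UTF-8
   BOM test that check_bom() performs through its get_char/unget_char
   callbacks, here applied directly to a NUL-terminated buffer.  The helper
   name is hypothetical. */

```c
#include <assert.h>
#include <string.h>

/* Return a pointer just past a UTF-8 BOM (EF BB BF) if `s` starts with one,
   otherwise return `s` unchanged.  The && chain stops at the first mismatch,
   so the terminating NUL is never read past. */
static const char *
sketch_skip_utf8_bom(const char *s)
{
    const unsigned char *u = (const unsigned char *)s;
    if (u[0] == 0xEF && u[1] == 0xBB && u[2] == 0xBF)
        return s + 3;
    return s;
}
```

/* Usage: a buffer holding EF BB BF followed by "x = 1" yields a pointer to
   "x = 1"; a partial BOM such as EF BB alone is left untouched, mirroring
   the unget_char() calls in check_bom(). */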

/* Read a line of text from TOK into S, using the stream in TOK.
   Return NULL on failure, else S.

   On entry, tok->decoding_buffer will be one of:
     1) NULL: need to call tok->decoding_readline to get a new line
     2) PyUnicodeObject *: decoding_feof has called tok->decoding_readline and
           stored the result in tok->decoding_buffer
     3) PyByteArrayObject *: previous call to fp_readl did not have enough room
           (in the s buffer) to copy entire contents of the line read
           by tok->decoding_readline.  tok->decoding_buffer has the overflow.
           In this case, fp_readl is called in a loop (with an expanded buffer)
           until the buffer ends with a '\n' (or until the end of the file is
           reached): see tok_nextc and its calls to decoding_fgets.
*/

static char *
fp_readl(char *s, int size, struct tok_state *tok)
{
    PyObject* bufobj;
    const char *buf;
    Py_ssize_t buflen;

    /* Ask for one less byte so we can terminate it */
    assert(size > 0);
    size--;

    if (tok->decoding_buffer) {
        bufobj = tok->decoding_buffer;
        Py_INCREF(bufobj);
    }
    else {
        bufobj = PyObject_CallObject(tok->decoding_readline, NULL);
        if (bufobj == NULL)
            goto error;
    }
    if (PyUnicode_CheckExact(bufobj)) {
        buf = _PyUnicode_AsStringAndSize(bufobj, &buflen);
        if (buf == NULL) {
            goto error;
        }
    }
    else {
        buf = PyByteArray_AsString(bufobj);
        if (buf == NULL) {
            goto error;
        }
        buflen = PyByteArray_GET_SIZE(bufobj);
    }

    Py_XDECREF(tok->decoding_buffer);
    if (buflen > size) {
        /* Too many chars, the rest goes into tok->decoding_buffer */
        tok->decoding_buffer = PyByteArray_FromStringAndSize(buf+size,
                                                             buflen-size);
        if (tok->decoding_buffer == NULL)
            goto error;
        buflen = size;
    }
    else
        tok->decoding_buffer = NULL;

    memcpy(s, buf, buflen);
    s[buflen] = '\0';
    if (buflen == 0) /* EOF */
        s = NULL;
    Py_DECREF(bufobj);
    return s;

error:
    Py_XDECREF(bufobj);
    return error_ret(tok);
}


/* Set the readline function for TOK to a StreamReader's
   readline function. The StreamReader is named ENC.

   This function is called from check_bom and check_coding_spec.

   ENC is usually identical to the future value of tok->encoding,
   except for the (currently unsupported) case of UTF-16.

   Return 1 on success, 0 on failure. */

static int
fp_setreadl(struct tok_state *tok, const char* enc)
{
    PyObject *readline = NULL, *stream = NULL, *io = NULL;
    _Py_IDENTIFIER(open);
    _Py_IDENTIFIER(readline);
    int fd;
    long pos;

    io = PyImport_ImportModuleNoBlock("io");
    if (io == NULL)
        goto cleanup;

    fd = fileno(tok->fp);
    /* Due to buffering the file offset for fd can be different from the file
     * position of tok->fp.  If tok->fp was opened in text mode on Windows,
     * its file position counts CRLF as one char and can't be directly mapped
     * to the file offset for fd.  Instead we step back one byte and read to
     * the end of line.*/
    pos = ftell(tok->fp);
    if (pos == -1 ||
        lseek(fd, (off_t)(pos > 0 ? pos - 1 : pos), SEEK_SET) == (off_t)-1) {
        PyErr_SetFromErrnoWithFilename(PyExc_OSError, NULL);
        goto cleanup;
    }

    stream = _PyObject_CallMethodId(io, &PyId_open, "isisOOO",
                    fd, "r", -1, enc, Py_None, Py_None, Py_False);
    if (stream == NULL)
        goto cleanup;

    Py_XDECREF(tok->decoding_readline);
    readline = _PyObject_GetAttrId(stream, &PyId_readline);
    tok->decoding_readline = readline;
    if (pos > 0) {
        if (PyObject_CallObject(readline, NULL) == NULL) {
            readline = NULL;
            goto cleanup;
        }
    }

  cleanup:
    Py_XDECREF(stream);
    Py_XDECREF(io);
    return readline != NULL;
}

/* Fetch the next byte from TOK. */

static int fp_getc(struct tok_state *tok) {
    return getc(tok->fp);
}

/* Unfetch the last byte back into TOK.  */

static void fp_ungetc(int c, struct tok_state *tok) {
    ungetc(c, tok->fp);
}

/* Check whether the characters at s start a valid
   UTF-8 sequence. Return the number of characters forming
   the sequence if yes, 0 if not.  */
static int valid_utf8(const unsigned char* s)
{
    int expected = 0;
    int length;
    if (*s < 0x80)
        /* single-byte code */
        return 1;
    if (*s < 0xc0)
        /* following byte */
        return 0;
    if (*s < 0xE0)
        expected = 1;
    else if (*s < 0xF0)
        expected = 2;
    else if (*s < 0xF8)
        expected = 3;
    else
        return 0;
    length = expected + 1;
    for (; expected; expected--)
        if (s[expected] < 0x80 || s[expected] >= 0xC0)
            return 0;
    return length;
}

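/* Illustration only, not part of tokenizer.c: a standalone sketch of the
   lead-byte / continuation-byte check in valid_utf8().  The name is
   hypothetical.  One deliberate difference: continuation bytes are checked
   front to back, so a sequence truncated by a NUL terminator is rejected
   before any byte beyond the terminator is read.  Like the original, the
   sketch checks only the byte pattern, not overlong forms or surrogates. */

```c
#include <assert.h>

/* Return the length in bytes of the UTF-8 sequence starting at `s`,
   or 0 if the bytes do not form a well-shaped sequence. */
static int
sketch_utf8_sequence_length(const unsigned char *s)
{
    int expected, i;

    if (s[0] < 0x80)        /* single-byte (ASCII) code */
        return 1;
    if (s[0] < 0xC0)        /* stray continuation byte */
        return 0;
    if (s[0] < 0xE0)
        expected = 1;
    else if (s[0] < 0xF0)
        expected = 2;
    else if (s[0] < 0xF8)
        expected = 3;
    else
        return 0;

    /* Every following byte must look like 10xxxxxx. */
    for (i = 1; i <= expected; i++)
        if (s[i] < 0x80 || s[i] >= 0xC0)
            return 0;
    return expected + 1;
}
```

/* Usage: the two bytes C3 A9 (U+00E9) give 2, the three bytes E2 82 AC
   (U+20AC) give 3, and a lone 80 gives 0. */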
/* Read a line of input from TOK. Determine encoding
   if necessary.  */

static char *
decoding_fgets(char *s, int size, struct tok_state *tok)
{
    char *line = NULL;
    int badchar = 0;
    for (;;) {
        if (tok->decoding_state == STATE_NORMAL) {
            /* We already have a codec associated with
               this input. */
            line = fp_readl(s, size, tok);
            break;
        } else if (tok->decoding_state == STATE_RAW) {
            /* We want a 'raw' read. */
            line = Py_UniversalNewlineFgets(s, size,
                                            tok->fp, NULL);
            break;
        } else {
            /* We have not yet determined the encoding.
               If an encoding is found, use the file-pointer
               reader functions from now on. */
            if (!check_bom(fp_getc, fp_ungetc, fp_setreadl, tok))
                return error_ret(tok);
            assert(tok->decoding_state != STATE_INIT);
        }
    }
    if (line != NULL && tok->lineno < 2 && !tok->read_coding_spec) {
        if (!check_coding_spec(line, strlen(line), tok, fp_setreadl)) {
            return error_ret(tok);
        }
    }
#ifndef PGEN
    /* The default encoding is UTF-8, so make sure we don't have any
       non-UTF-8 sequences in it. */
    if (line && !tok->encoding) {
        unsigned char *c;
        int length;
        for (c = (unsigned char *)line; *c; c += length)
            if (!(length = valid_utf8(c))) {
                badchar = *c;
                break;
            }
    }
    if (badchar) {
        /* Need to add 1 to the line number, since this line
           has not been counted, yet.  */
        PyErr_Format(PyExc_SyntaxError,
                "Non-UTF-8 code starting with '\\x%.2x' "
                "in file %U on line %i, "
                "but no encoding declared; "
                "see http://python.org/dev/peps/pep-0263/ for details",
                badchar, tok->filename, tok->lineno + 1);
        return error_ret(tok);
    }
#endif
    return line;
}

static int
decoding_feof(struct tok_state *tok)
{
    if (tok->decoding_state != STATE_NORMAL) {
        return feof(tok->fp);
    } else {
        PyObject* buf = tok->decoding_buffer;
        if (buf == NULL) {
            buf = PyObject_CallObject(tok->decoding_readline, NULL);
            if (buf == NULL) {
                error_ret(tok);
                return 1;
            } else {
                tok->decoding_buffer = buf;
            }
        }
        return PyObject_Length(buf) == 0;
    }
}

/* Fetch a byte from TOK, using the string buffer. */

static int
buf_getc(struct tok_state *tok) {
    return Py_CHARMASK(*tok->str++);
}

/* Unfetch a byte from TOK, using the string buffer. */

static void
buf_ungetc(int c, struct tok_state *tok) {
    tok->str--;
    assert(Py_CHARMASK(*tok->str) == c);        /* tok->cur may point to read-only segment */
}

/* Set the readline function for TOK to ENC. For the string-based
   tokenizer, this means to just record the encoding. */

static int
buf_setreadl(struct tok_state *tok, const char* enc) {
    tok->enc = enc;
    return 1;
}

/* Return a UTF-8 encoded Python bytes object from the
   C byte string STR, which is encoded with ENC. */

static PyObject *
translate_into_utf8(const char* str, const char* enc) {
    PyObject *utf8;
    PyObject* buf = PyUnicode_Decode(str, strlen(str), enc, NULL);
    if (buf == NULL)
        return NULL;
    utf8 = PyUnicode_AsUTF8String(buf);
    Py_DECREF(buf);
    return utf8;
}


static char *
translate_newlines(const char *s, int exec_input, struct tok_state *tok) {
    int skip_next_lf = 0;
    size_t needed_length = strlen(s) + 2, final_length;
    char *buf, *current;
    char c = '\0';
    buf = PyMem_MALLOC(needed_length);
    if (buf == NULL) {
        tok->done = E_NOMEM;
        return NULL;
    }
    for (current = buf; *s; s++, current++) {
        c = *s;
        if (skip_next_lf) {
            skip_next_lf = 0;
            if (c == '\n') {
                c = *++s;
                if (!c)
                    break;
            }
        }
        if (c == '\r') {
            skip_next_lf = 1;
            c = '\n';
        }
        *current = c;
    }
    /* If this is exec input, add a newline to the end of the string if
       there isn't one already. */
    if (exec_input && c != '\n') {
        *current = '\n';
        current++;
    }
    *current = '\0';
    final_length = current - buf + 1;
    if (final_length < needed_length && final_length)
        /* should never fail */
        buf = PyMem_REALLOC(buf, final_length);
    return buf;
}

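/* Illustration only, not part of tokenizer.c: a sketch of the newline
   translation done by translate_newlines(), using malloc() in place of
   PyMem_MALLOC and dropping the tok_state error reporting.  The name is
   hypothetical.  "\r\n" and a lone "\r" both become "\n"; when exec_input is
   nonzero a final "\n" is appended if the text does not already end in
   one. */

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'ed copy of `s` with universal newlines applied,
   or NULL if allocation fails.  The caller frees the result. */
static char *
sketch_translate_newlines(const char *s, int exec_input)
{
    /* +2: room for one appended '\n' and the terminating NUL. */
    char *buf = malloc(strlen(s) + 2);
    char *out = buf;
    char last = '\0';

    if (buf == NULL)
        return NULL;
    while (*s) {
        char c = *s++;
        if (c == '\r') {
            if (*s == '\n')     /* swallow the LF of a CRLF pair */
                s++;
            c = '\n';
        }
        *out++ = c;
        last = c;
    }
    if (exec_input && last != '\n')
        *out++ = '\n';
    *out = '\0';
    return buf;
}
```

/* Usage: sketch_translate_newlines("a\r\nb\rc", 0) produces "a\nb\nc", and
   sketch_translate_newlines("x", 1) produces "x\n". */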

/* Decode a byte string STR for use as the buffer of TOK.
   Look for encoding declarations inside STR, and record them
   inside TOK.  */

static const char *
decode_str(const char *input, int single, struct tok_state *tok)
{
    PyObject* utf8 = NULL;
    const char *str;
    const char *s;
    const char *newl[2] = {NULL, NULL};
    int lineno = 0;
    tok->input = str = translate_newlines(input, single, tok);
    if (str == NULL)
        return NULL;
    tok->enc = NULL;
    tok->str = str;
    if (!check_bom(buf_getc, buf_ungetc, buf_setreadl, tok))
        return error_ret(tok);
    str = tok->str;             /* string after BOM if any */
    assert(str);
    if (tok->enc != NULL) {
        utf8 = translate_into_utf8(str, tok->enc);
        if (utf8 == NULL)
            return error_ret(tok);
        str = PyBytes_AsString(utf8);
    }
    for (s = str;; s++) {
        if (*s == '\0')
            break;
        else if (*s == '\n') {
            assert(lineno < 2);
            newl[lineno] = s;
            lineno++;
            if (lineno == 2)
                break;
        }
    }
    tok->enc = NULL;
    /* need to check line 1 and 2 separately since check_coding_spec
       assumes a single line as input */
    if (newl[0]) {
        if (!check_coding_spec(str, newl[0] - str, tok, buf_setreadl))
            return error_ret(tok);
        if (tok->enc == NULL && !tok->read_coding_spec && newl[1]) {
            if (!check_coding_spec(newl[0]+1, newl[1] - newl[0],
                                   tok, buf_setreadl))
                return error_ret(tok);
        }
    }
    if (tok->enc != NULL) {
        assert(utf8 == NULL);
        utf8 = translate_into_utf8(str, tok->enc);
        if (utf8 == NULL)
            return error_ret(tok);
        str = PyBytes_AS_STRING(utf8);
    }
    assert(tok->decoding_buffer == NULL);
    tok->decoding_buffer = utf8; /* CAUTION */
    return str;
}

#endif /* PGEN */

/* Set up tokenizer for string */

struct tok_state *
PyTokenizer_FromString(const char *str, int exec_input)
{
    struct tok_state *tok = tok_new();
    if (tok == NULL)
        return NULL;
    str = decode_str(str, exec_input, tok);
    if (str == NULL) {
        PyTokenizer_Free(tok);
        return NULL;
    }

    /* XXX: constify members. */
    tok->buf = tok->cur = tok->end = tok->inp = (char*)str;
    return tok;
}

struct tok_state *
PyTokenizer_FromUTF8(const char *str, int exec_input)
{
    struct tok_state *tok = tok_new();
    if (tok == NULL)
        return NULL;
#ifndef PGEN
    tok->input = str = translate_newlines(str, exec_input, tok);
#endif
    if (str == NULL) {
        PyTokenizer_Free(tok);
        return NULL;
    }
    tok->decoding_state = STATE_RAW;
    tok->read_coding_spec = 1;
    tok->enc = NULL;
    tok->str = str;
    tok->encoding = (char *)PyMem_MALLOC(6);
    if (!tok->encoding) {
        PyTokenizer_Free(tok);
        return NULL;
    }
    strcpy(tok->encoding, "utf-8");

    /* XXX: constify members. */
    tok->buf = tok->cur = tok->end = tok->inp = (char*)str;
    return tok;
}

/* Set up tokenizer for file */

struct tok_state *
PyTokenizer_FromFile(FILE *fp, const char* enc,
                     const char *ps1, const char *ps2)
{
    struct tok_state *tok = tok_new();
    if (tok == NULL)
        return NULL;
    if ((tok->buf = (char *)PyMem_MALLOC(BUFSIZ)) == NULL) {
        PyTokenizer_Free(tok);
        return NULL;
    }
    tok->cur = tok->inp = tok->buf;
    tok->end = tok->buf + BUFSIZ;
    tok->fp = fp;
    tok->prompt = ps1;
    tok->nextprompt = ps2;
    if (enc != NULL) {
        /* Must copy encoding declaration since it
           gets copied into the parse tree. */
        tok->encoding = PyMem_MALLOC(strlen(enc)+1);
        if (!tok->encoding) {
            PyTokenizer_Free(tok);
            return NULL;
        }
        strcpy(tok->encoding, enc);
        tok->decoding_state = STATE_NORMAL;
    }
    return tok;
}


/* Free a tok_state structure */

void
PyTokenizer_Free(struct tok_state *tok)
{
    if (tok->encoding != NULL)
        PyMem_FREE(tok->encoding);
#ifndef PGEN
    Py_XDECREF(tok->decoding_readline);
    Py_XDECREF(tok->decoding_buffer);
    Py_XDECREF(tok->filename);
#endif
    if (tok->fp != NULL && tok->buf != NULL)
        PyMem_FREE(tok->buf);
    if (tok->input)
        PyMem_FREE((char *)tok->input);
    PyMem_FREE(tok);
}

/* Get next char, updating state; error code goes into tok->done */

static int
tok_nextc(struct tok_state *tok)
{
    for (;;) {
        if (tok->cur != tok->inp) {
            return Py_CHARMASK(*tok->cur++); /* Fast path */
        }
        if (tok->done != E_OK)
            return EOF;
        if (tok->fp == NULL) {
            char *end = strchr(tok->inp, '\n');
            if (end != NULL)
                end++;
            else {
                end = strchr(tok->inp, '\0');
                if (end == tok->inp) {
                    tok->done = E_EOF;
                    return EOF;
                }
            }
            if (tok->start == NULL)
                tok->buf = tok->cur;
            tok->line_start = tok->cur;
            tok->lineno++;
            tok->inp = end;
            return Py_CHARMASK(*tok->cur++);
        }
        if (tok->prompt != NULL) {
            char *newtok = PyOS_Readline(stdin, stdout, tok->prompt);
#ifndef PGEN
            if (newtok != NULL) {
                char *translated = translate_newlines(newtok, 0, tok);
                PyMem_FREE(newtok);
                if (translated == NULL)
                    return EOF;
                newtok = translated;
            }
            if (tok->encoding && newtok && *newtok) {
                /* Recode to UTF-8 */
                Py_ssize_t buflen;
                const char* buf;
                PyObject *u = translate_into_utf8(newtok, tok->encoding);
                PyMem_FREE(newtok);
                if (!u) {
                    tok->done = E_DECODE;
                    return EOF;
                }
                buflen = PyBytes_GET_SIZE(u);
                buf = PyBytes_AS_STRING(u);
                if (!buf) {
                    Py_DECREF(u);
                    tok->done = E_DECODE;
                    return EOF;
                }
                newtok = PyMem_MALLOC(buflen+1);
                strcpy(newtok, buf);
                Py_DECREF(u);
            }
#endif
            if (tok->nextprompt != NULL)
                tok->prompt = tok->nextprompt;
            if (newtok == NULL)
                tok->done = E_INTR;
            else if (*newtok == '\0') {
                PyMem_FREE(newtok);
                tok->done = E_EOF;
            }
            else if (tok->start != NULL) {
                size_t start = tok->start - tok->buf;
                size_t oldlen = tok->cur - tok->buf;
                size_t newlen = oldlen + strlen(newtok);
                char *buf = tok->buf;
                buf = (char *)PyMem_REALLOC(buf, newlen+1);
                tok->lineno++;
                if (buf == NULL) {
                    PyMem_FREE(tok->buf);
                    tok->buf = NULL;
                    PyMem_FREE(newtok);
                    tok->done = E_NOMEM;
                    return EOF;
                }
                tok->buf = buf;
                tok->cur = tok->buf + oldlen;
                tok->line_start = tok->cur;
                strcpy(tok->buf + oldlen, newtok);
                PyMem_FREE(newtok);
                tok->inp = tok->buf + newlen;
                tok->end = tok->inp + 1;
                tok->start = tok->buf + start;
            }
            else {
                tok->lineno++;
                if (tok->buf != NULL)
                    PyMem_FREE(tok->buf);
                tok->buf = newtok;
                tok->line_start = tok->buf;
                tok->cur = tok->buf;
                tok->line_start = tok->buf;
                tok->inp = strchr(tok->buf, '\0');
                tok->end = tok->inp + 1;
            }
        }
        else {
            int done = 0;
            Py_ssize_t cur = 0;
            char *pt;
            if (tok->start == NULL) {
                if (tok->buf == NULL) {
                    tok->buf = (char *)
                        PyMem_MALLOC(BUFSIZ);
                    if (tok->buf == NULL) {
                        tok->done = E_NOMEM;
                        return EOF;
                    }
                    tok->end = tok->buf + BUFSIZ;
                }
                if (decoding_fgets(tok->buf, (int)(tok->end - tok->buf),
                                   tok) == NULL) {
                    tok->done = E_EOF;
                    done = 1;
                }
                else {
                    tok->done = E_OK;
                    tok->inp = strchr(tok->buf, '\0');
                    done = tok->inp[-1] == '\n';
                }
            }
            else {
                cur = tok->cur - tok->buf;
                if (decoding_feof(tok)) {
                    tok->done = E_EOF;
                    done = 1;
                }
                else
                    tok->done = E_OK;
            }
            tok->lineno++;
            /* Read until '\n' or EOF */
            while (!done) {
                Py_ssize_t curstart = tok->start == NULL ? -1 :
                                      tok->start - tok->buf;
                Py_ssize_t curvalid = tok->inp - tok->buf;
                Py_ssize_t newsize = curvalid + BUFSIZ;
                char *newbuf = tok->buf;
                newbuf = (char *)PyMem_REALLOC(newbuf,
                                               newsize);
                if (newbuf == NULL) {
                    tok->done = E_NOMEM;
                    tok->cur = tok->inp;
                    return EOF;
                }
                tok->buf = newbuf;
                tok->inp = tok->buf + curvalid;
                tok->end = tok->buf + newsize;
                tok->start = curstart < 0 ? NULL :
                             tok->buf + curstart;
                if (decoding_fgets(tok->inp,
                                   (int)(tok->end - tok->inp),
                                   tok) == NULL) {
                    /* Break out early on decoding
                       errors, as tok->buf will be NULL
                     */
                    if (tok->decoding_erred)
                        return EOF;
                    /* Last line does not end in \n,
                       fake one */
                    strcpy(tok->inp, "\n");
                }
                tok->inp = strchr(tok->inp, '\0');
                done = tok->inp[-1] == '\n';
            }
            if (tok->buf != NULL) {
                tok->cur = tok->buf + cur;
                tok->line_start = tok->cur;
                /* replace "\r\n" with "\n" */
                /* For Mac leave the \r, giving a syntax error */
                pt = tok->inp - 2;
                if (pt >= tok->buf && *pt == '\r') {
                    *pt++ = '\n';
                    *pt = '\0';
                    tok->inp = pt;
                }
            }
        }
        if (tok->done != E_OK) {
            if (tok->prompt != NULL)
                PySys_WriteStderr("\n");
            tok->cur = tok->inp;
            return EOF;
        }
    }
    /*NOTREACHED*/
}


/* Back-up one character */

static void
tok_backup(struct tok_state *tok, int c)
{
    if (c != EOF) {
        if (--tok->cur < tok->buf)
            Py_FatalError("tok_backup: beginning of buffer");
        if (*tok->cur != c)
            *tok->cur = c;
    }
}

/* Return the token corresponding to a single character */
intPyToken_OneChar(int c){ switch (c) { case '(': return LPAR; case ')': return RPAR; case '[': return LSQB; case ']': return RSQB; case ':': return COLON; case ',': return COMMA; case ';': return SEMI; case '+': return PLUS; case '-': return MINUS; case '*': return STAR; case '/': return SLASH; case '|': return VBAR; case '&': return AMPER; case '<': return LESS; case '>': return GREATER; case '=': return EQUAL; case '.': return DOT; case '%': return PERCENT; case '{': return LBRACE; case '}': return RBRACE; case '^': return CIRCUMFLEX; case '~': return TILDE; case '@': return AT; default: return OP; }}
intPyToken_TwoChars(int c1, int c2){ switch (c1) { case '=': switch (c2) { case '=': return EQEQUAL; } break; case '!': switch (c2) { case '=': return NOTEQUAL; } break; case '<': switch (c2) { case '>': return NOTEQUAL; case '=': return LESSEQUAL; case '<': return LEFTSHIFT; } break; case '>': switch (c2) { case '=': return GREATEREQUAL; case '>': return RIGHTSHIFT; } break; case '+': switch (c2) { case '=': return PLUSEQUAL; } break; case '-': switch (c2) { case '=': return MINEQUAL; case '>': return RARROW; } break; case '*': switch (c2) { case '*': return DOUBLESTAR; case '=': return STAREQUAL; } break; case '/': switch (c2) { case '/': return DOUBLESLASH; case '=': return SLASHEQUAL; } break; case '|': switch (c2) { case '=': return VBAREQUAL; } break; case '%': switch (c2) { case '=': return PERCENTEQUAL; } break; case '&': switch (c2) { case '=': return AMPEREQUAL; } break; case '^': switch (c2) { case '=': return CIRCUMFLEXEQUAL; } break; case '@': switch (c2) { case '=': return ATEQUAL; } break; } return OP;}
int
PyToken_ThreeChars(int c1, int c2, int c3)
{
    switch (c1) {
    case '<':
        switch (c2) {
        case '<':
            switch (c3) {
            case '=':
                return LEFTSHIFTEQUAL;
            }
            break;
        }
        break;
    case '>':
        switch (c2) {
        case '>':
            switch (c3) {
            case '=':
                return RIGHTSHIFTEQUAL;
            }
            break;
        }
        break;
    case '*':
        switch (c2) {
        case '*':
            switch (c3) {
            case '=':
                return DOUBLESTAREQUAL;
            }
            break;
        }
        break;
    case '/':
        switch (c2) {
        case '/':
            switch (c3) {
            case '=':
                return DOUBLESLASHEQUAL;
            }
            break;
        }
        break;
    case '.':
        switch (c2) {
        case '.':
            switch (c3) {
            case '.':
                return ELLIPSIS;
            }
            break;
        }
        break;
    }
    return OP;
}
static int
indenterror(struct tok_state *tok)
{
    if (tok->alterror) {
        tok->done = E_TABSPACE;
        tok->cur = tok->inp;
        return 1;
    }
    if (tok->altwarning) {
#ifdef PGEN
        PySys_WriteStderr("inconsistent use of tabs and spaces "
                          "in indentation\n");
#else
        PySys_FormatStderr("%U: inconsistent use of tabs and spaces "
                           "in indentation\n", tok->filename);
#endif
        tok->altwarning = 0;
    }
    return 0;
}
#ifdef PGEN
#define verify_identifier(tok) 1
#else
/* Verify that the identifier follows PEP 3131.
   All identifier strings are guaranteed to be "ready" unicode objects.
 */
static int
verify_identifier(struct tok_state *tok)
{
    PyObject *s;
    int result;
    if (tok->decoding_erred)
        return 0;
    s = PyUnicode_DecodeUTF8(tok->start, tok->cur - tok->start, NULL);
    if (s == NULL || PyUnicode_READY(s) == -1) {
        if (PyErr_ExceptionMatches(PyExc_UnicodeDecodeError)) {
            PyErr_Clear();
            tok->done = E_IDENTIFIER;
        } else {
            tok->done = E_ERROR;
        }
        return 0;
    }
    result = PyUnicode_IsIdentifier(s);
    Py_DECREF(s);
    if (result == 0)
        tok->done = E_IDENTIFIER;
    return result;
}
#endif
/* Get next token, after space stripping etc. */

static int
tok_get(struct tok_state *tok, char **p_start, char **p_end)
{
    int c;
    int blankline, nonascii;

    *p_start = *p_end = NULL;
  nextline:
    tok->start = NULL;
    blankline = 0;
    /* Get indentation level */
    if (tok->atbol) {
        int col = 0;
        int altcol = 0;
        tok->atbol = 0;
        for (;;) {
            c = tok_nextc(tok);
            if (c == ' ')
                col++, altcol++;
            else if (c == '\t') {
                col = (col/tok->tabsize + 1) * tok->tabsize;
                altcol = (altcol/tok->alttabsize + 1)
                    * tok->alttabsize;
            }
            else if (c == '\014') /* Control-L (formfeed) */
                col = altcol = 0; /* For Emacs users */
            else
                break;
        }
        tok_backup(tok, c);
        if (c == '#' || c == '\n') {
            /* Lines with only whitespace and/or comments
               shouldn't affect the indentation and are
               not passed to the parser as NEWLINE tokens,
               except *totally* empty lines in interactive
               mode, which signal the end of a command group. */
            if (col == 0 && c == '\n' && tok->prompt != NULL)
                blankline = 0; /* Let it through */
            else
                blankline = 1; /* Ignore completely */
            /* We can't jump back right here since we still
               may need to skip to the end of a comment */
        }
        if (!blankline && tok->level == 0) {
            if (col == tok->indstack[tok->indent]) {
                /* No change */
                if (altcol != tok->altindstack[tok->indent]) {
                    if (indenterror(tok))
                        return ERRORTOKEN;
                }
            }
            else if (col > tok->indstack[tok->indent]) {
                /* Indent -- always one */
                if (tok->indent+1 >= MAXINDENT) {
                    tok->done = E_TOODEEP;
                    tok->cur = tok->inp;
                    return ERRORTOKEN;
                }
                if (altcol <= tok->altindstack[tok->indent]) {
                    if (indenterror(tok))
                        return ERRORTOKEN;
                }
                tok->pendin++;
                tok->indstack[++tok->indent] = col;
                tok->altindstack[tok->indent] = altcol;
            }
            else /* col < tok->indstack[tok->indent] */ {
                /* Dedent -- any number, must be consistent */
                while (tok->indent > 0 &&
                       col < tok->indstack[tok->indent]) {
                    tok->pendin--;
                    tok->indent--;
                }
                if (col != tok->indstack[tok->indent]) {
                    tok->done = E_DEDENT;
                    tok->cur = tok->inp;
                    return ERRORTOKEN;
                }
                if (altcol != tok->altindstack[tok->indent]) {
                    if (indenterror(tok))
                        return ERRORTOKEN;
                }
            }
        }
    }

    tok->start = tok->cur;

    /* Return pending indents/dedents */
    if (tok->pendin != 0) {
        if (tok->pendin < 0) {
            tok->pendin++;
            return DEDENT;
        }
        else {
            tok->pendin--;
            return INDENT;
        }
    }

    if (tok->async_def
        && !blankline
        && tok->level == 0
        /* There was a NEWLINE after ASYNC DEF,
           so we're past the signature. */
        && tok->async_def_nl
        /* Current indentation level is less than where
           the async function was defined */
        && tok->async_def_indent >= tok->indent)
    {
        tok->async_def = 0;
        tok->async_def_indent = 0;
        tok->async_def_nl = 0;
    }

 again:
    tok->start = NULL;
    /* Skip spaces */
    do {
        c = tok_nextc(tok);
    } while (c == ' ' || c == '\t' || c == '\014');

    /* Set start of current token */
    tok->start = tok->cur - 1;

    /* Skip comment */
    if (c == '#')
        while (c != EOF && c != '\n')
            c = tok_nextc(tok);

    /* Check for EOF and errors now */
    if (c == EOF) {
        return tok->done == E_EOF ? ENDMARKER : ERRORTOKEN;
    }

    /* Identifier (most frequent token!) */
    nonascii = 0;
    if (is_potential_identifier_start(c)) {
        /* Process b"", r"", u"", br"" and rb"" */
        int saw_b = 0, saw_r = 0, saw_u = 0;
        while (1) {
            if (!(saw_b || saw_u) && (c == 'b' || c == 'B'))
                saw_b = 1;
            /* Since this is a backwards compatibility support literal we don't
               want to support it in arbitrary order like byte literals. */
            else if (!(saw_b || saw_u || saw_r) &&
                     (c == 'u' || c == 'U'))
                saw_u = 1;
            /* ur"" and ru"" are not supported */
            else if (!(saw_r || saw_u) && (c == 'r' || c == 'R'))
                saw_r = 1;
            else
                break;
            c = tok_nextc(tok);
            if (c == '"' || c == '\'')
                goto letter_quote;
        }
        while (is_potential_identifier_char(c)) {
            if (c >= 128)
                nonascii = 1;
            c = tok_nextc(tok);
        }
        tok_backup(tok, c);
        if (nonascii && !verify_identifier(tok))
            return ERRORTOKEN;
        *p_start = tok->start;
        *p_end = tok->cur;

        /* async/await parsing block. */
        if (tok->cur - tok->start == 5) {
            /* Current token length is 5. */
            if (tok->async_def) {
                /* We're inside an 'async def' function. */
                if (memcmp(tok->start, "async", 5) == 0)
                    return ASYNC;
                if (memcmp(tok->start, "await", 5) == 0)
                    return AWAIT;
            }
            else if (memcmp(tok->start, "async", 5) == 0) {
                /* The current token is 'async'.
                   Look ahead one token. */

                struct tok_state ahead_tok;
                char *ahead_tok_start = NULL, *ahead_tok_end = NULL;
                int ahead_tok_kind;

                memcpy(&ahead_tok, tok, sizeof(ahead_tok));
                ahead_tok_kind = tok_get(&ahead_tok, &ahead_tok_start,
                                         &ahead_tok_end);

                if (ahead_tok_kind == NAME
                    && ahead_tok.cur - ahead_tok.start == 3
                    && memcmp(ahead_tok.start, "def", 3) == 0)
                {
                    /* The next token is going to be 'def', so instead of
                       returning 'async' NAME token, we return ASYNC. */
                    tok->async_def_indent = tok->indent;
                    tok->async_def = 1;
                    return ASYNC;
                }
            }
        }

        return NAME;
    }

    /* Newline */
    if (c == '\n') {
        tok->atbol = 1;
        if (blankline || tok->level > 0)
            goto nextline;
        *p_start = tok->start;
        *p_end = tok->cur - 1; /* Leave '\n' out of the string */
        tok->cont_line = 0;
        if (tok->async_def) {
            /* We're somewhere inside an 'async def' function, and
               we've encountered a NEWLINE after its signature. */
            tok->async_def_nl = 1;
        }
        return NEWLINE;
    }

    /* Period or number starting with period? */
    if (c == '.') {
        c = tok_nextc(tok);
        if (isdigit(c)) {
            goto fraction;
        } else if (c == '.') {
            c = tok_nextc(tok);
            if (c == '.') {
                *p_start = tok->start;
                *p_end = tok->cur;
                return ELLIPSIS;
            } else {
                tok_backup(tok, c);
            }
            tok_backup(tok, '.');
        } else {
            tok_backup(tok, c);
        }
        *p_start = tok->start;
        *p_end = tok->cur;
        return DOT;
    }

    /* Number */
    if (isdigit(c)) {
        if (c == '0') {
            /* Hex, octal or binary -- maybe. */
            c = tok_nextc(tok);
            if (c == '.')
                goto fraction;
            if (c == 'j' || c == 'J')
                goto imaginary;
            if (c == 'x' || c == 'X') {
                /* Hex */
                c = tok_nextc(tok);
                if (!isxdigit(c)) {
                    tok->done = E_TOKEN;
                    tok_backup(tok, c);
                    return ERRORTOKEN;
                }
                do {
                    c = tok_nextc(tok);
                } while (isxdigit(c));
            }
            else if (c == 'o' || c == 'O') {
                /* Octal */
                c = tok_nextc(tok);
                if (c < '0' || c >= '8') {
                    tok->done = E_TOKEN;
                    tok_backup(tok, c);
                    return ERRORTOKEN;
                }
                do {
                    c = tok_nextc(tok);
                } while ('0' <= c && c < '8');
            }
            else if (c == 'b' || c == 'B') {
                /* Binary */
                c = tok_nextc(tok);
                if (c != '0' && c != '1') {
                    tok->done = E_TOKEN;
                    tok_backup(tok, c);
                    return ERRORTOKEN;
                }
                do {
                    c = tok_nextc(tok);
                } while (c == '0' || c == '1');
            }
            else {
                int nonzero = 0;
                /* maybe old-style octal; c is first char of it */
                /* in any case, allow '0' as a literal */
                while (c == '0')
                    c = tok_nextc(tok);
                while (isdigit(c)) {
                    nonzero = 1;
                    c = tok_nextc(tok);
                }
                if (c == '.')
                    goto fraction;
                else if (c == 'e' || c == 'E')
                    goto exponent;
                else if (c == 'j' || c == 'J')
                    goto imaginary;
                else if (nonzero) {
                    tok->done = E_TOKEN;
                    tok_backup(tok, c);
                    return ERRORTOKEN;
                }
            }
        }
        else {
            /* Decimal */
            do {
                c = tok_nextc(tok);
            } while (isdigit(c));
            {
                /* Accept floating point numbers. */
                if (c == '.') {
        fraction:
                    /* Fraction */
                    do {
                        c = tok_nextc(tok);
                    } while (isdigit(c));
                }
                if (c == 'e' || c == 'E') {
                    int e;
                  exponent:
                    e = c;
                    /* Exponent part */
                    c = tok_nextc(tok);
                    if (c == '+' || c == '-') {
                        c = tok_nextc(tok);
                        if (!isdigit(c)) {
                            tok->done = E_TOKEN;
                            tok_backup(tok, c);
                            return ERRORTOKEN;
                        }
                    } else if (!isdigit(c)) {
                        tok_backup(tok, c);
                        tok_backup(tok, e);
                        *p_start = tok->start;
                        *p_end = tok->cur;
                        return NUMBER;
                    }
                    do {
                        c = tok_nextc(tok);
                    } while (isdigit(c));
                }
                if (c == 'j' || c == 'J')
                    /* Imaginary part */
        imaginary:
                    c = tok_nextc(tok);
            }
        }
        tok_backup(tok, c);
        *p_start = tok->start;
        *p_end = tok->cur;
        return NUMBER;
    }

  letter_quote:
    /* String */
    if (c == '\'' || c == '"') {
        int quote = c;
        int quote_size = 1;             /* 1 or 3 */
        int end_quote_size = 0;

        /* Find the quote size and start of string */
        c = tok_nextc(tok);
        if (c == quote) {
            c = tok_nextc(tok);
            if (c == quote)
                quote_size = 3;
            else
                end_quote_size = 1;     /* empty string found */
        }
        if (c != quote)
            tok_backup(tok, c);

        /* Get rest of string */
        while (end_quote_size != quote_size) {
            c = tok_nextc(tok);
            if (c == EOF) {
                if (quote_size == 3)
                    tok->done = E_EOFS;
                else
                    tok->done = E_EOLS;
                tok->cur = tok->inp;
                return ERRORTOKEN;
            }
            if (quote_size == 1 && c == '\n') {
                tok->done = E_EOLS;
                tok->cur = tok->inp;
                return ERRORTOKEN;
            }
            if (c == quote)
                end_quote_size += 1;
            else {
                end_quote_size = 0;
                if (c == '\\')
                    c = tok_nextc(tok);  /* skip escaped char */
            }
        }

        *p_start = tok->start;
        *p_end = tok->cur;
        return STRING;
    }

    /* Line continuation */
    if (c == '\\') {
        c = tok_nextc(tok);
        if (c != '\n') {
            tok->done = E_LINECONT;
            tok->cur = tok->inp;
            return ERRORTOKEN;
        }
        tok->cont_line = 1;
        goto again; /* Read next line */
    }

    /* Check for two-character token */
    {
        int c2 = tok_nextc(tok);
        int token = PyToken_TwoChars(c, c2);
        if (token != OP) {
            int c3 = tok_nextc(tok);
            int token3 = PyToken_ThreeChars(c, c2, c3);
            if (token3 != OP) {
                token = token3;
            } else {
                tok_backup(tok, c3);
            }
            *p_start = tok->start;
            *p_end = tok->cur;
            return token;
        }
        tok_backup(tok, c2);
    }

    /* Keep track of parentheses nesting level */
    switch (c) {
    case '(':
    case '[':
    case '{':
        tok->level++;
        break;
    case ')':
    case ']':
    case '}':
        tok->level--;
        break;
    }

    /* Punctuation character */
    *p_start = tok->start;
    *p_end = tok->cur;
    return PyToken_OneChar(c);
}

int
PyTokenizer_Get(struct tok_state *tok, char **p_start, char **p_end)
{
    int result = tok_get(tok, p_start, p_end);
    if (tok->decoding_erred) {
        result = ERRORTOKEN;
        tok->done = E_DECODE;
    }
    return result;
}

/* Get the encoding of a Python file. Check for the coding cookie and check if
   the file starts with a BOM.

   PyTokenizer_FindEncodingFilename() returns NULL when it can't find the
   encoding in the first or second line of the file (in which case the encoding
   should be assumed to be UTF-8).

   The char* returned is malloc'ed via PyMem_MALLOC() and thus must be freed
   by the caller. */

char *
PyTokenizer_FindEncodingFilename(int fd, PyObject *filename)
{
    struct tok_state *tok;
    FILE *fp;
    char *p_start = NULL, *p_end = NULL, *encoding = NULL;

#ifndef PGEN
    fd = _Py_dup(fd);
#else
    fd = dup(fd);
#endif
    if (fd < 0) {
        return NULL;
    }

    fp = fdopen(fd, "r");
    if (fp == NULL) {
        return NULL;
    }
    tok = PyTokenizer_FromFile(fp, NULL, NULL, NULL);
    if (tok == NULL) {
        fclose(fp);
        return NULL;
    }
#ifndef PGEN
    if (filename != NULL) {
        Py_INCREF(filename);
        tok->filename = filename;
    }
    else {
        tok->filename = PyUnicode_FromString("<string>");
        if (tok->filename == NULL) {
            fclose(fp);
            PyTokenizer_Free(tok);
            return encoding;
        }
    }
#endif
    while (tok->lineno < 2 && tok->done == E_OK) {
        PyTokenizer_Get(tok, &p_start, &p_end);
    }
    fclose(fp);
    if (tok->encoding) {
        encoding = (char *)PyMem_MALLOC(strlen(tok->encoding) + 1);
        if (encoding)
            strcpy(encoding, tok->encoding);
    }
    PyTokenizer_Free(tok);
    return encoding;
}

char *
PyTokenizer_FindEncoding(int fd)
{
    return PyTokenizer_FindEncodingFilename(fd, NULL);
}

#ifdef Py_DEBUG

void
tok_dump(int type, char *start, char *end)
{
    printf("%s", _PyParser_TokenNames[type]);
    if (type == NAME || type == NUMBER || type == STRING || type == OP)
        printf("(%.*s)", (int)(end - start), start);
}

#endif