Local Variable Referenced Before Assignment Classification

Created on 2008-12-10 12:18 by amaury.forgeotdarc, last changed 2013-04-19 00:49 by barry. This issue is now closed.

File name | Uploaded | Description
delete_deref.patch | amaury.forgeotdarc, 2008-12-10 12:18 |
msg77536 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * Date: 2008-12-10 12:18
This issue comes from issue4613. The following code raises a SyntaxError("can not delete variable 'e' referenced in nested scope"):

    def f():
        e = None
        def g():
            e
        try:
            pass
        except Exception as e:
            pass # SyntaxError here???

The reason is that, because of http://www.python.org/dev/peps/pep-3110/#semantic-changes, a "del e" statement is inserted.

The above code is correct, and should work. I suggest that the limitation: "can not delete variable referenced in nested scope" could be removed. After all, the "variable referenced" has no value before it is set, accessing it raises either NameError("free variable referenced before assignment in enclosing scope") or UnboundLocalError("local variable referenced before assignment").

The attached patch adds a DELETE_DEREF opcode, which removes the value of a cell variable and puts it in a "before assignment" state. Some compiler experts should review it. Few regressions are possible, since the new opcode is emitted where a SyntaxError was previously raised. The patch could also be applied to 2.7, even if it is less critical there.

Tests are to come, but I'd like others' suggestions.
msg77691 - Author: Terry J. Reedy (terry.reedy) * Date: 2008-12-12 23:03
-1 as I understand the proposal. Your code is bugged and should fail as soon as possible.

If I understand correctly, you agree that the SyntaxError is correct as the language is currently defined, but you want the definition changed. It is not clear if you only want implicit deletes at the end of except clauses to work or if you only want explicit deletes to work. If the latter, you want

    def f():
        e = 1
        del e
        def g():
            print(e)
        return g

to compile. I would not.

Your reason "After all, the "variable referenced" has no value before it is set," (duh, right) makes no sense to me in this context. g must have a valid value of e to run. So you seem to be suggesting that detection of buggy code should be delayed.
msg77693 - Author: Raymond Hettinger (rhettinger) * Date: 2008-12-12 23:29
Not sure the "del e" idea was a good solution to the garbage collection problem. Amaury's code looks correct to me. Maybe the existing e variable should be overwritten and then left intact (as it used to be), or perhaps it should be made both temporary and invisible like the induction variable in a list comprehension. Phillip, any thoughts?
msg77696 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * Date: 2008-12-12 23:49
Terry, my motivation is that the sample code above runs correctly with python 2.6, but python 3.0 cannot even compile it. The sample looks valid python code, and should run. Yes, the same 'e' is used both as a nested variable and as an exception target, but this should not matter with our dynamic language. First I thought to turn the implicit "del e" into something else (and change PEP3110), but then I saw that the error "can not delete variable referenced in nested scope" is actually a limitation of the interpreter that is easy to remove.
msg77703 - Author: PJ Eby (pje) * Date: 2008-12-13 00:37
I could argue either way on this one; it's true that deleting a nested-scope variable is sometimes desirable, but it also seems to me like closing over an except: variable is a Potentially Bad Idea. In neither case, however, do I think it's appropriate to drop the temporary nature of the variable. I could perhaps get behind resetting the variable to None instead of deleting it, but of course the PEP would need changing. There's also a question of whether we should do the same thing with "with ... as" variables. (Btw, I'm not sure why this one's assigned to me; ISTM I might have proposed the current except/as GC semantics, but I'm not familiar with the actual implementation in 2.6 or 3.0)
msg77704 - Author: Raymond Hettinger (rhettinger) * Date: 2008-12-13 00:48
Guido, any thoughts?
msg78434 - Author: Benjamin Peterson (benjamin.peterson) * Date: 2008-12-28 21:28
I think being able to delete free variables is reasonable and brings more consistency as well as solving corner cases like this.
msg79228 - Author: Guido van Rossum (gvanrossum) * Date: 2009-01-06 05:01
I don't think this has much to do with try/except. That it works in 2.6 but not in 3.0 isn't a big deal; the semantics of variables used in except clauses has changed dramatically.

It has to do with deletion of a variable that's held in a cell for reference by an inner function, like this:

    def outer():
        x = 0
        def inner():
            return x
        del x # SyntaxError

I suspect (but do not know for sure) that the reason this is considered a SyntaxError is that the implementer of cells punted on the 'del' implementation and inserted a SyntaxError instead. (You can tell it's a pass-two SyntaxError because it doesn't mention a line number.)

I think it's fine to fix this in 2.7 and 3.1, but I don't see it as a priority given that this has always been this way (and despite that it now affects try/except). It will probably require a new opcode.

I don't see a reason to declare this a release blocker just because the try/except code is affected, and I don't think try/except needs to be changed to avoid this.
msg99247 - Author: Craig McQueen (cmcqueen1975) Date: 2010-02-12 02:09
There's also this one which caught me out:

    def outer():
        x = 0
        y = (x for i in range(10))
        del x # SyntaxError
msg99855 - Author: Jeremy Hylton (jhylton) Date: 2010-02-22 22:18
It's an interesting bug. Maybe the compiler shouldn't allow you to use such a variable as a free variable in a nested function?
msg99866 - Author: Guido van Rossum (gvanrossum) * Date: 2010-02-22 23:10
All examples so far (*) have to do with our inability to have properly nested blocks. If we did, I'd make the except clause a block, and I'd issue a syntax warning or error if a nested block shadowed a variable referenced outside it. Ditto for generator expressions and comprehensions.

As long as we don't have nested blocks, I think it's okay to see the limitation on (implicit or explicit) "del" of a cell variable as a compiler deficiency and fix that deficiency.

__________
(*) However there's also this example:

    >>> def f():
    ...   try: 1/0
    ...   except Exception as a:
    ...     def g(): return a
    ...     return g
    ...
    SyntaxError: can not delete variable 'a' referenced in nested scope
    >>>
msg99880 - Author: Jeremy Hylton (jhylton) Date: 2010-02-22 23:51
On Mon, Feb 22, 2010 at 6:10 PM, Guido van Rossum <report@bugs.python.org> wrote:
> All examples so far (*) have to do with our inability to have properly nested blocks. If we did, I'd make the except clause a block, and I'd issue a syntax warning or error if a nested block shadowed a variable referenced outside it. Ditto for generator expressions and comprehensions.

There's no reason we couldn't revise the language spec to explain that except clauses and comprehensions are block statements, i.e. statements that introduce a new block.

For the except case, there would be some weird effects.

    y = 10
    try:
        ...
    except SomeError as err:
        y = 12
    print y # prints 10

In the example above, y would be a local variable in the scope of the except handler that shadows the local variable in the block that contains the try/except. It might be confusing that you couldn't assign to a local variable in the except handler without using a nonlocal statement.

> As long as we don't have nested blocks, I think it's okay to see the limitation on (implicit or explicit) "del" of a cell variable as a compiler deficiency and fix that deficiency.

The general request here is to remove all the SyntaxErrors about deleting cell variables, right? Instead, you'd get a NameError at runtime saying that the variable is currently undefined. You'd want that change regardless of whether we change the language as described above.

hoping-for-some-bdfl-pronouncements-ly y'rs,
Jeremy
msg99911 - Author: Guido van Rossum (gvanrossum) * Date: 2010-02-23 13:43
On Mon, Feb 22, 2010 at 6:51 PM, Jeremy Hylton <report@bugs.python.org> wrote:
> There's no reason we couldn't revise the language spec to explain that except clauses and comprehensions are block statements, i.e. statements that introduce a new block.

However (even apart from the below example) it would be tough to implement cleanly in CPython.

> For the except case, there would be some weird effects.
>
>     y = 10
>     try:
>         ...
>     except SomeError as err:
>         y = 12
>     print y  # prints 10
>
> In the example above, y would be a local variable in the scope of the except handler that shadows the local variable in the block that contains the try/except. It might be confusing that you couldn't assign to a local variable in the except handler without using a nonlocal statement.

Yeah, there are all sorts of problems with less-conspicuous nested scopes like this, for a language that defaults to local assignment like Python. Hence the horrible hacks.

>> As long as we don't have nested blocks, I think it's okay to see the limitation on (implicit or explicit) "del" of a cell variable as a compiler deficiency and fix that deficiency.
>
> The general request here is to remove all the SyntaxErrors about deleting cell variables, right? Instead, you'd get a NameError at runtime saying that the variable is currently undefined. You'd want that change regardless of whether we change the language as described above.

Yeah, if we could kill those SyntaxErrors we can leave the rest as is.
msg99915 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * Date: 2010-02-23 13:52
The above patch adds a new opcode (DELETE_DEREF), does the Moratorium apply here?
msg99918 - Author: Guido van Rossum (gvanrossum) * Date: 2010-02-23 14:38
I don't think so. It's very marginal.

--Guido (on Android)
msg99950 - Author: Jeremy Hylton (jhylton) Date: 2010-02-23 20:41
The patch looks pretty good. I'd factor out the common error-checking code (common between LOAD_DEREF and DELETE_DEREF) into a helper function. It would also be good to add some test cases.

Jeremy
msg113312 - Author: Florent Xicluna (flox) * Date: 2010-08-08 20:22
This bug is waiting for unit tests and a small patch cleanup. See previous message: http://bugs.python.org/issue4617#msg99950
msg113335 - Author: Terry J. Reedy (terry.reedy) * Date: 2010-08-08 21:58
I have changed my mind on this issue. Since

    e = 1
    del e
    def g():
        print(e)
    g()

compiles and raises a run-time name error, so should the same code embedded within a function. In either case, the premature deletion is a logic error, not a syntax error.

However, changing the language definition, even to fix what is considered a design bug, is a feature request. For both 2.7 and 3.1, section 6.5. "The del statement", says "It is illegal to delete a name from the local namespace if it occurs as a free variable in a nested block." So this seems too late for 2.7. On the other hand, Guido has allowed it for 3.2 in spite of the moratorium, but I think it should go in the initial release.
msg113340 - Author: Guido van Rossum (gvanrossum) * Date: 2010-08-08 22:10
Yeah, please fix in 3.2, don't fix in 2.7.
msg116048 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * Date: 2010-09-10 22:02
Fixed in r84685, with tests and doc updates.
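For readers on a current interpreter, the outcome of this fix can be checked directly. The following is only a sketch, assuming Python 3.2 or later; the function names are borrowed from the examples in this thread, and the result strings are illustrative. Both the explicit "del" of a cell variable and the implicit one at the end of an except clause now compile, and the error shows up at run time as a NameError instead.

```python
def outer():
    x = 0
    def inner():
        return x
    del x                      # explicit delete of a cell variable
    try:
        inner()
    except NameError:
        return "NameError at run time"
    return "no error"


def f():
    try:
        1 / 0
    except Exception as a:
        def g():
            return a
        return g               # 'a' is implicitly deleted when the except clause ends


explicit_result = outer()

try:
    f()()                      # call the returned closure
    implicit_result = "no error"
except NameError:
    implicit_result = "NameError at run time"
```

On interpreters older than 3.2 the same source is expected to be rejected at compile time with the SyntaxError discussed above.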
Date | User | Action | Args
2013-04-19 00:49:14 | barry | set | nosy: + barry
2010-09-10 22:02:17 | amaury.forgeotdarc | set | status: open -> closed; resolution: accepted -> fixed; messages: + msg116048; stage: test needed -> resolved
2010-09-10 14:44:14 | amaury.forgeotdarc | set | assignee: amaury.forgeotdarc; resolution: accepted
2010-08-09 06:16:06 | scoder | set | nosy: - scoder
2010-08-08 22:10:44 | gvanrossum | set | messages: + msg113340
2010-08-08 21:58:04 | terry.reedy | set | type: behavior -> enhancement; messages: + msg113335; versions: - Python 2.7
2010-08-08 20:22:38 | flox | set | versions: + Python 3.2, - Python 3.0; nosy: + scoder, ezio.melotti; messages: + msg113312; keywords: - needs review
2010-03-13 08:50:33 | flox | set | nosy: + flox; type: behavior; components: + Interpreter Core; stage: test needed
2010-03-13 08:50:00 | flox | set | files: - unnamed
2010-03-13 08:48:44 | flox | link | issue8130 superseder
2010-02-23 20:41:28 | jhylton | set | messages: + msg99950
2010-02-23 14:38:31 | gvanrossum | set | files: + unnamed; messages: + msg99918
2010-02-23 13:52:22 | amaury.forgeotdarc | set | messages: + msg99915
2010-02-23 13:43:28 | gvanrossum | set | messages: + msg99911
2010-02-22 23:51:09 | jhylton | set | messages: + msg99880
2010-02-22 23:10:52 | gvanrossum | set | messages: + msg99866
2010-02-22 22:18:06 | jhylton | set | messages: + msg99855
2010-02-12 02:09:11 | cmcqueen1975 | set | nosy: + cmcqueen1975; messages: + msg99247
2009-01-06 05:01:44 | gvanrossum | set | priority: release blocker -> normal; assignee: gvanrossum -> (no value); messages: + msg79228
2008-12-28 21:28:21 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg78434
2008-12-20 02:41:17 | loewis | set | priority: deferred blocker -> release blocker
2008-12-13 00:48:42 | rhettinger | set | assignee: pje -> gvanrossum; messages: + msg77704; nosy: + gvanrossum
2008-12-13 00:37:22 | pje | set | messages: + msg77703
2008-12-12 23:49:05 | amaury.forgeotdarc | set | messages: + msg77696
2008-12-12 23:29:12 | rhettinger | set | assignee: pje; messages: + msg77693; nosy: + pje, rhettinger
2008-12-12 23:03:43 | terry.reedy | set | nosy: + terry.reedy; messages: + msg77691
2008-12-10 18:46:53 | benjamin.peterson | set | nosy: + jhylton
2008-12-10 16:41:24 | loewis | set | priority: release blocker -> deferred blocker
2008-12-10 12:18:42 | amaury.forgeotdarc | create |

If you're closely following the Python tag on StackOverflow, you'll notice that the same question comes up at least once a week. The question goes something like this:

    x = 10
    def foo():
        x += 1
        print x
    foo()

Why does running this code result in the following error?

    Traceback (most recent call last):
      File "unboundlocalerror.py", line 8, in <module>
        foo()
      File "unboundlocalerror.py", line 4, in foo
        x += 1
    UnboundLocalError: local variable 'x' referenced before assignment

There are a few variations on this question, with the same core hiding underneath. Here's one:

    lst = [1, 2, 3]

    def foo():
        lst.append(5)      # OK
        #lst += [5]        # ERROR here

    foo()
    print lst

Running the lst.append(5) statement successfully appends 5 to the list. However, replace it with lst += [5], and it raises UnboundLocalError, although at first sight it should accomplish the same.
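To try both variants side by side on Python 3, here is a minimal sketch. The helper names append_item and augment_item and the outcome variable are made up for illustration; they are not part of the original question.

```python
lst = [1, 2, 3]

def append_item():
    lst.append(5)        # only reads the name lst, then mutates the list

def augment_item():
    lst += [5]           # an assignment, so lst is local to augment_item

append_item()

try:
    augment_item()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"
```

After this runs, the list holds the appended 5 from the first call, while the second call fails before it can touch the list.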

Although this exact question is answered in Python's official FAQ, I decided to write this article with the intent of giving a deeper explanation. It will start with a basic FAQ-level answer, which should satisfy anyone who only wants to know how to "solve the damn problem and move on". Then, I will dive deeper, looking at the formal definition of Python to understand what's going on. Finally, I'll take a look at what happens behind the scenes in the implementation of CPython to cause this behavior.

The simple answer

As mentioned above, this problem is covered in the Python FAQ. For completeness, I want to explain it here as well, quoting the FAQ when necessary.

Let's take the first code snippet again:

    x = 10
    def foo():
        x += 1
        print x
    foo()

So where does the exception come from? Quoting the FAQ:

This is because when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope.

But x += 1 is similar to x = x + 1, so it should first read x, perform the addition and then assign back to x. As mentioned in the quote above, Python considers x a variable local to foo, so we have a problem - a variable is read (referenced) before it's been assigned. Python raises the UnboundLocalError exception in this case [1].

So what do we do about this? The solution is very simple - Python has the global statement just for this purpose:

    x = 10
    def foo():
        global x
        x += 1
        print x
    foo()

This prints 11, without any errors. The global statement tells Python that inside foo, x refers to the global variable x, even if it's assigned in foo.

Actually, there is another variation on the question, for which the answer is a bit different. Consider this code:

    def external():
        x = 10
        def internal():
            x += 1
            print(x)
        internal()

    external()

This kind of code may come up if you're into closures and other techniques that use Python's lexical scoping rules. The error this generates is the familiar UnboundLocalError. However, applying the "global fix":

    def external():
        x = 10
        def internal():
            global x
            x += 1
            print(x)
        internal()

    external()

Doesn't help - another error is generated: a NameError complaining that x is not defined. Python is right here - after all, there's no global variable named x, there's only an x in external. It may not be local to internal, but it's not global. So what can you do in this situation? If you're using Python 3, you have the nonlocal keyword. Replacing global by nonlocal in the last snippet makes everything work as expected. nonlocal is a new statement in Python 3, and there is no equivalent in Python 2 [2].
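For reference, this is roughly what the Python 3 version looks like. It is a sketch that returns the value instead of printing it so the result is easy to check; the result variable is illustrative.

```python
def external():
    x = 10
    def internal():
        nonlocal x       # x refers to the variable bound in external
        x += 1
        return x
    return internal()

result = external()
```

Here result ends up as 11, since the increment is applied to the x that lives in external.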

The formal answer

Assignments in Python are used to bind names to values and to modify attributes or items of mutable objects. I could find two places in the Python (2.x) documentation where it's defined how an assignment to a local variable works.

One is section 6.2 "Assignment statements" in the Simple Statements chapter of the language reference:

Assignment of an object to a single target is recursively defined as follows. If the target is an identifier (name):

  • If the name does not occur in a global statement in the current code block: the name is bound to the object in the current local namespace.
  • Otherwise: the name is bound to the object in the current global namespace.

Another is section 4.1 "Naming and binding" of the Execution model chapter:

If a name is bound in a block, it is a local variable of that block.

[...]

When a name is used in a code block, it is resolved using the nearest enclosing scope. [...] If the name refers to a local variable that has not been bound, a UnboundLocalError exception is raised.
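The two quoted rules can be observed together in a small sketch (Python 3; the function and variable names are made up for illustration). Because x is bound somewhere in the function's block, it is local to the whole block, including the line that reads it before the binding.

```python
x = 10

def read_then_bind():
    value = x      # x is local to the whole block because of the binding below
    x = 20
    return value

# co_varnames lists the names the compiler treats as local to the function.
local_names = read_then_bind.__code__.co_varnames

try:
    read_then_bind()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"
```

The name x shows up among the function's locals, and the call fails on the read even though an outer x exists.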

This is all clear, but still, another small doubt remains. All these rules apply to assignments of the form var = value, which clearly bind var to value. But the code snippets we're having a problem with here use the += assignment. Shouldn't that just modify the bound value, without re-binding it?

Well, no. += and its cousins (-=, *=, etc.) are what Python calls "augmented assignment statements" [emphasis mine]:

An augmented assignment evaluates the target (which, unlike normal assignment statements, cannot be an unpacking) and the expression list, performs the binary operation specific to the type of assignment on the two operands, and assigns the result to the original target. The target is only evaluated once.

An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead.

With the exception of assigning to tuples and multiple targets in a single statement, the assignment done by augmented assignment statements is handled the same way as normal assignments. Similarly, with the exception of the possible in-place behavior, the binary operation performed by augmented assignment is the same as the normal binary operations.

So when earlier I said that x += 1 is similar to x = x + 1, I wasn't telling the whole truth, but it was accurate with respect to binding. Apart from possible optimization, += counts exactly as = when binding is considered. If you think carefully about it, it's unavoidable, because some types Python works with are immutable. Consider strings, for example:

    x = "abc"
    x += "def"

The first line binds x to the value "abc". The second line doesn't modify the value "abc" to be "abcdef". Strings are immutable in Python. Rather, it creates the new value "abcdef" somewhere in memory, and re-binds x to it. This can be seen clearly when examining the object ID for x before and after the +=:

    >>> x = "abc"
    >>> id(x)
    11173824
    >>> x += "def"
    >>> id(x)
    32831648
    >>> x
    'abcdef'

Note that some types in Python are mutable. For example, lists can actually be modified in-place:

    >>> y = [1, 2]
    >>> id(y)
    32413376
    >>> y += [2, 3]
    >>> id(y)
    32413376
    >>> y
    [1, 2, 2, 3]

id(y) didn't change after +=, because the object y referenced was just modified. Still, Python re-bound y to the same object [3].
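One way to observe that the re-binding step really happens even for a mutable object is to put a list where re-binding is not allowed. The sketch below is an illustration of that idea, not something from the quoted documentation; the names t, items and outcome are made up. The in-place operation modifies the list, and the assignment back into the tuple slot then fails.

```python
t = ([1, 2],)

try:
    t[0] += [3]          # the list is extended in place, then the item assignment fails
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

items = t[0]
```

The list inside the tuple ends up extended even though the statement raised, which shows the modification and the assignment are two separate steps.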

The "too much information" answer

This section is of interest only to those curious about the implementation internals of Python itself.

One of the stages in the compilation of Python into bytecode is building the symbol table [4]. An important goal of building the symbol table is for Python to be able to mark the scope of variables it encounters - which variables are local to functions, which are global, which are free (lexically bound) and so on.

When the symbol table code sees a variable is assigned in a function, it marks it as local. Note that it doesn't matter if the assignment was done before usage, after usage, or maybe not actually executed due to a condition in code like this:

    x = 10
    def foo():
        if something_false_at_runtime:
            x = 20
        print(x)

We can use the symtable module to examine the symbol table information gathered on some Python code during compilation:

    import symtable

    code = '''
    x = 10
    def foo():
        x += 1
        print(x)
    '''

    table = symtable.symtable(code, '<string>', 'exec')

    foo_namespace = table.lookup('foo').get_namespace()
    sym_x = foo_namespace.lookup('x')

    print(sym_x.get_name())
    print(sym_x.is_local())

This prints:

    x
    True

So we see that x was marked as local in foo. Marking variables as local turns out to be important for optimization in the bytecode, since the compiler can generate a special instruction for it that's very fast to execute. There's an excellent article here explaining this topic in depth; I'll just focus on the outcome.

The compiler_nameop function in Python/compile.c handles variable name references. To generate the correct opcode, it queries the symbol table function PyST_GetScope. For our x, this returns a bitfield with LOCAL in it. Having seen LOCAL, compiler_nameop generates a LOAD_FAST. We can see this in the disassembly of foo:

     35           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_FAST               0 (x)

     36          10 LOAD_GLOBAL              0 (print)
                 13 LOAD_FAST                0 (x)
                 16 CALL_FUNCTION            1
                 19 POP_TOP
                 20 LOAD_CONST               0 (None)
                 23 RETURN_VALUE

The first block of instructions shows what x += 1 was compiled to. You will note that already here (before it's actually assigned), LOAD_FAST is used to retrieve the value of x.
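The listing above comes from an older CPython, and opcode names and offsets vary between releases. A hedged way to reproduce the observation on whatever interpreter is at hand is the dis module. In this sketch, foo returns x instead of printing it, and only the opcode name prefix is checked, since some newer releases use variants such as LOAD_FAST_CHECK; the helper variable names are illustrative.

```python
import dis

def foo():
    x += 1
    return x

# Opcode names of every instruction whose argument is the name x.
ops_on_x = [ins.opname for ins in dis.get_instructions(foo) if ins.argval == "x"]

# Exact names differ between CPython versions, so only the prefix is
# compared: x is accessed through the fast-locals opcodes, never as a global.
uses_fast_ops = all(
    name.startswith(("LOAD_FAST", "STORE_FAST")) for name in ops_on_x
)
```

Calling dis.dis(foo) prints the full listing for the running interpreter, which can be compared against the one shown above.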

This LOAD_FAST is the instruction that will cause the UnboundLocalError exception to be raised at runtime, because it is actually executed before any STORE_FAST is done for x. The gory details are in the bytecode interpreter code in Python/ceval.c:

    TARGET(LOAD_FAST)
        x = GETLOCAL(oparg);
        if (x != NULL) {
            Py_INCREF(x);
            PUSH(x);
            FAST_DISPATCH();
        }
        format_exc_check_arg(PyExc_UnboundLocalError,
            UNBOUNDLOCAL_ERROR_MSG,
            PyTuple_GetItem(co->co_varnames, oparg));
        break;

Ignoring the macro-fu for the moment, what this basically says is that once LOAD_FAST is seen, the value of x is obtained from an indexed array of objects [5]. If no STORE_FAST was done before, this value is still NULL, the if branch is not taken [6] and the exception is raised.

You may wonder why Python waits until runtime to raise this exception, instead of detecting it in the compiler. The reason is this code:

    x = 10
    def foo():
        if something_true():
            x = 1
        x += 1
        print(x)

Suppose something_true is a function that returns True, possibly due to some user input. In this case, x = 1 binds x locally, so the reference to it in x += 1 is no longer unbound. This code will then run without exceptions. Of course, if something_true actually turns out to return False, the exception will be raised. Python has no way to resolve this at compile time, so the error detection is postponed to runtime.
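The same point can be tried out with a parameter standing in for the runtime condition. This is a sketch for Python 3; the bump function and its bind_first flag are hypothetical names, not part of the original example.

```python
def bump(bind_first):
    if bind_first:
        x = 1            # binds the local x only on this path
    x += 1               # fails at run time if the branch above was skipped
    return x

bound_result = bump(True)

try:
    bump(False)
    unbound_outcome = "no error"
except UnboundLocalError:
    unbound_outcome = "UnboundLocalError"
```

The same compiled function succeeds or fails depending only on the argument, which is why the check cannot be done by the compiler.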

