Code construction

From 22113

Learning how to construct good code is a main goal of the course. An important tool to facilitate this is peer evaluation. It is beneficial for you both to evaluate others' work, which shows you other ways of solving the same problem, and to get feedback from other people on your own work. When evaluating, you should check the teachers' solutions so that you have a reference apart from your own.
Tip: If you want your peer to pay specific attention to something when evaluating, please write a comment about it at the top of your hand-in.

Good code and peer evaluation

This is a walk-through of several aspects of how programs in any language should be written, with a particular focus on Python. You can use this page as a reference for how your own programs should be written, and as a basis for your peer evaluation.

Comments

Comments in code have no influence on how the program runs, but they are an important visual aid. They can make the code easier (or harder) to understand in several ways, both for the author of the program and for a casual reader who must understand or maintain the code. The importance of comments for making code understandable should not be underestimated. Both reading and writing comments enhance this understanding. It is therefore a good idea to write the comment first, as that focuses your mind on what the code has to do.

Placement

As a surface requirement, the placement of comments is critical. Comments can disturb the reading of the code by making it difficult to distinguish code from comments, usually because they are placed in an unsystematic way. Good placement of comments enhances the readability of the code, simply by "looking nice" (no actual reading required), and by separating the different parts of the program from each other. The general rule is that a comment is placed just above, and at the same indentation level as, the statement(s) it explains. Alternatively, it can be placed on the same line as the statement it explains, but clearly separated from it. All such trailing comments must start at the same column.
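A minimal sketch of the placement rule, assuming a small DNA-counting task (the function `count_bases` is hypothetical and only serves as an illustration): block comments sit directly above the code they describe, at the same indentation, and the two trailing comments are aligned at the same column.

```python
def count_bases(sequence):
    """Count A, C, G and T in a DNA sequence; other symbols are skipped."""
    # Start every base at zero so that absent bases are still reported
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    skipped = 0

    # Tally each base (this comment sits above the loop, at its indentation)
    for base in sequence.upper():
        if base in counts:
            counts[base] += 1    # known base: count it
        else:
            skipped += 1         # anything else (N, gaps) is skipped
    return counts, skipped
```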

Content

The content of a comment must describe what is going on in that part of the code. Irrelevant, misleading or outright wrong content is harmful to understanding and hence to maintenance. The content must be easy to read and understand. It can be a headline for trivial code, or a more thorough explanation for intricate parts.
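As an illustration of the two kinds of content, here is a sketch using a hypothetical `reverse_complement` function (assumed input: an upper-case A/C/G/T string). The first comment is a short headline for trivial code; the second explains *why* the intricate step is done, rather than repeating what the code says.

```python
def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    # Map each base to its complementary base
    partner = {"A": "T", "T": "A", "C": "G", "G": "C"}

    # Complement AND reverse: the opposite strand runs in the other
    # direction, so complementing alone would give that strand read
    # backwards (3' to 5') instead of in the usual 5' to 3' order.
    return "".join(partner[base] for base in reversed(dna))
```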

Amount

The amount of comments also matters. Filling the code with comments about every little detail is not beneficial, but neither are too few comments, which leave the reader guessing at what is going on. The "right" amount of comments is largely a matter of personal taste, but a good rule of thumb is that comments make up 10% to 25% of the text in the program.
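If you want a rough check of the 10% to 25% rule of thumb, a small helper can be sketched with the standard `tokenize` module. The name `comment_ratio` is hypothetical, and it measures *lines* containing a comment against non-blank lines, which is only one assumed way of interpreting "text"; docstrings are not counted as comments.

```python
import io
import tokenize


def comment_ratio(source):
    """Return the fraction of non-blank lines that contain a # comment."""
    # Record the line number of every COMMENT token in the source
    comment_lines = set()
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type == tokenize.COMMENT:
            comment_lines.add(token.start[0])

    # Compare against the number of lines that are not empty
    non_blank = [line for line in source.splitlines() if line.strip()]
    if not non_blank:
        return 0.0
    return len(comment_lines) / len(non_blank)
```

For example, `comment_ratio(open("my_program.py").read())` gives a number to compare with the 0.10 to 0.25 range; treat it as a hint, not a grade.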

Spacing

The code should appear segmented, i.e. separated by empty lines. This helps readability and clarity, as one can concentrate on reading and understanding one segment at a time. Writing code is stringing a lot of ideas together step by step, and by judiciously placing empty lines these steps/segments become apparent. Comments are also part of the spacing. Spacing is improper if the code looks "cramped" or contains large amounts of empty space.
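A sketch of segmentation, assuming a hypothetical `gc_content_report` function over a list of DNA strings: each step of the idea (clean, compute, format) is one segment, headed by a comment and separated from the next by a single empty line.

```python
def gc_content_report(sequences):
    """Return one report line per non-empty sequence with its GC percentage."""
    # Step 1: clean the input (strip whitespace, upper-case, drop empties)
    cleaned = [seq.strip().upper() for seq in sequences if seq.strip()]

    # Step 2: compute the GC percentage of each sequence
    percentages = []
    for seq in cleaned:
        gc = seq.count("G") + seq.count("C")
        percentages.append(100 * gc / len(seq))

    # Step 3: format the result
    return [f"{seq}: {pct:.1f}%" for seq, pct in zip(cleaned, percentages)]
```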

Naming

Any program will contain variables and other objects of a generic nature. Naming them well greatly enhances the readability of the code; conversely, nonsensical, misleading or outright wrong names will confuse the reader, including yourself. A name must reflect the nature and/or purpose of the variable/object. It is well worth considering a name for a while, as you will use it several times in your code, and the way you think about the name will influence how you use the variable and the meaning you attach to it. Being systematic and consistent in naming is beneficial. This can be achieved in several ways; here are some:

  • DOs
    • Use a specific naming convention, like camelCase
    • Consistently use i and j in trivial for-loops that iterate over numbers
    • Include a word in the variable name that tells something about what it is, e.g. 'List' for a list, 'Pos' for a position, etc.
  • DON'Ts
    • Nonsensical assignment, like age = "Joe"
    • Too long a name, which is annoying to use/read
    • Too short or abbreviated a name, which can become nonsensical
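The DOs and DON'Ts can be contrasted in a short sketch. The function names are hypothetical. It uses snake_case (with `_list`/`_pos` suffixes), which is what PEP 8 recommends for Python; camelCase as suggested above works equally well, as long as one convention is used consistently.

```python
# DON'T: meaningless, too short names; the reader must decode a, b and x[1]
def f(a, b):
    return [x for x in a if x[1] >= b]

# DON'T: age = "Joe"  (the name promises a number, the value is a name)


# DO: names that say what the data is, with a hint of its type
def filter_hits(hit_list, min_score):
    """Keep the (name, score) hits whose score is at least min_score."""
    return [hit for hit in hit_list if hit[1] >= min_score]


def positions_of(base, sequence):
    """Return every index at which base occurs in sequence."""
    pos_list = []
    # Trivial loop over numbers: the conventional i
    for i in range(len(sequence)):
        if sequence[i] == base:
            pos_list.append(i)
    return pos_list
```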

Code quality

There are many aspects to consider under this heading.

Amount of variables

Code can easily become confusing when it uses more variables than truly necessary. This is typically seen when content is copied from one variable to another, or in other situations where one variable carries a subset of the same meaning/value as another. It is bad code and a sign of a lack of overview.
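A sketch of the problem with two hypothetical functions that compute the same total length of a list of sequences. In the first, `seq_list`, `current_seq`, `seq_length` and `lengths` only copy or repeat information that is already available; the second keeps just the one variable it actually needs.

```python
# Too many variables: several of them merely copy information
def total_length_verbose(sequences):
    seq_list = sequences
    lengths = []
    for seq in seq_list:
        current_seq = seq
        seq_length = len(current_seq)
        lengths.append(seq_length)
    total = 0
    for length in lengths:
        total += length
    return total


# Direct: one accumulator, nothing copied
def total_length(sequences):
    total = 0
    for seq in sequences:
        total += len(seq)
    return total
```

In Python the direct version could even be written as `sum(len(seq) for seq in sequences)`.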

Memory usage

Gross misuse of memory is often easily spotted. The classic example is reading an entire file into memory and then looping through it line by line; instead, just read the file line by line directly. Other easily spotted forms of misuse are when big data structures are copied to new variables with no or few changes. The changes usually consist of filtering out some lines/elements.
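A sketch of reading line by line, assuming a FASTA-style file where header lines start with ">" (the function names are hypothetical). The wasteful pattern would be `all_lines = infile.readlines()` followed by building a filtered copy such as `header_list = [line for line in all_lines if line.startswith(">")]` just to count it; here only the current line is held in memory.

```python
def count_headers(lines):
    """Count FASTA header lines ('>' lines) in any iterable of lines."""
    count = 0
    # Only the current line is kept in memory while looping
    for line in lines:
        if line.startswith(">"):
            count += 1
    return count


def count_headers_in_file(path):
    """Count header lines in a file, reading it line by line."""
    # Iterating over a file object yields one line at a time,
    # so neither read() nor readlines() is needed
    with open(path) as infile:
        return count_headers(infile)
```

Taking any iterable of lines in `count_headers` also makes it easy to test with a plain list instead of a real file.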

Correct use of data type

In other situations the given problem (the data) is represented by an inappropriate data type. This happens most frequently with the collection types, e.g. a dictionary is used where a list would be better, or vice versa.
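A sketch of one common case, assuming sequences identified by a unique ID (the helper names are hypothetical). Stored as a list of `(id, sequence)` pairs, every lookup scans the list; stored as a dictionary keyed by ID, the lookup is direct. The reverse also holds: data where order and position matter, such as the codons of a gene, belongs in a list.

```python
# List of (id, sequence) pairs: finding one id means scanning the list
def find_in_list(pair_list, seq_id):
    for current_id, sequence in pair_list:
        if current_id == seq_id:
            return sequence
    return None


# Dictionary keyed by id: a direct lookup, and ids are unique by design
def find_in_dict(seq_dict, seq_id):
    return seq_dict.get(seq_id)
```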

Simple structural flaws in code

This covers directly observable minor flaws in the structure of the code.

  • If-statements: no empty statements, i.e. no "else: pass", and no "if x == 0: pass else: z = 1" (write "if x != 0: z = 1" instead).
    Do not repeat a statement in both the true and the false case; usually it is an increment of the same variable, which belongs after the if-statement.
  • For-loops: don't change the value of the loop variable inside the loop.
  • While-loops: sometimes a while is used instead of an if, and the code works with both because the while is effectively only iterated once. Change it to an if.
  • Conditional logic: "or" is used instead of "and", or vice versa. This happens because in everyday speech we often mix up "and" and "or" when we argue logic, and this carries over into the code we write.
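The flaws in the list above can be illustrated with a hypothetical scoring function. `score_flawed` has an empty branch, a statement repeated in both branches and a while that can only run once; `score_clean` gives the same results without them. `is_invalid_base` shows the and/or mix-up: "not A or not C or ..." is true for every base.

```python
def score_flawed(x, count):
    if x == 0:
        pass                 # empty branch
    else:
        count += 10
    if x > 5:
        count += 1           # same statement in both branches
        label = "high"
    else:
        count += 1
        label = "low"
    while x < 0:             # runs at most once: really an if
        label = "negative"
        x = 0
    return count, label


def score_clean(x, count):
    if x != 0:
        count += 10
    count += 1               # executed in either case, so moved out
    label = "high" if x > 5 else "low"
    if x < 0:
        label = "negative"
    return count, label


def is_invalid_base(base):
    # Wrong: base != "A" or base != "C" ... is True for every base.
    # Right: a base is invalid only if it differs from ALL four letters.
    return base != "A" and base != "C" and base != "G" and base != "T"
```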

Code clarity

The simple structural flaws above will definitely reduce the clarity of the code, and so will "bad" comments and variable names, wrong spacing and poor modularization. Setting aside these, which have already been covered, there are still some elements of clarity not yet discussed.

  • All loops, especially while loops, should be initialized properly. Basically this means that variables should be set to the proper values so the loop can progress correctly. This is done just before the loop and can be thought of as part of the modularization. In some cases no initialization is required, and in more complex situations it may not be possible.
  • With nested loops, where possible, do not reset a key variable after the inner loop; initialize it before the inner loop instead. Functionally the result is the same, but initialization is easier to read.
  • Declare/initialize your variables near the code where they are used, when possible. This can be considered an aspect of modularization.
  • Avoid using break and sys.exit() in ways that break the natural flow of the code.
  • Write explicit code. In Python this is most easily achieved by not writing long and complex lines. Break them up into simpler lines.
  • Write direct code, i.e. avoid code bloat. While code bloat is hard to quantify, it is often easy to see. It happens when people write a lot of code that could have been written much more simply and directly towards the goal. When you see code bloat, it can be hard to point to any specific part of the code that is wrong. The code might not be wrong at all; it is just bloated.
  • (Utility) functions should be written with a single, specific purpose in mind, i.e. they should not do 2 different things. They should be general in their purpose. This is part of modularization.
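Two hypothetical functions sketch several of these points. In `count_gc_per_line` the inner counter is initialized right before the inner loop instead of being reset after it. In `first_stop_codon` the while loop's variables are initialized just before the loop, and the loop condition, rather than a break, decides when to stop. Each function has one purpose.

```python
def count_gc_per_line(lines):
    """Return the number of G/C characters on each line."""
    gc_counts = []
    for line in lines:
        # Initialize before the inner loop (not reset after it)
        gc = 0
        for char in line.upper():
            if char in "GC":
                gc += 1
        gc_counts.append(gc)
    return gc_counts


def first_stop_codon(dna):
    """Return the index of the first in-frame stop codon, or -1 if none."""
    stop_codons = {"TAA", "TAG", "TGA"}
    # Initialize the loop variables just before the loop
    pos = 0
    found = -1
    # The condition, not a break, ends the loop
    while found == -1 and pos + 3 <= len(dna):
        if dna[pos:pos + 3] in stop_codons:
            found = pos
        pos += 3
    return found
```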

Input control

A lot of the code in a program handles the input to the program, from keyboard or file. Many trivial errors come from accepting inappropriate input. It could be the wrong type of input (text instead of a number), a number that is not within the expected/accepted range, or a protein sequence where DNA was expected. As far as possible, trivial input control should be implemented and "nice" error handling enforced, i.e. informing the user of the problem so it can be corrected.
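A sketch of "nice" input control covering the three cases mentioned: wrong type, out-of-range number, and non-DNA sequence. The names and the default range 1 to 10000 are assumptions for illustration. Validation is kept in separate functions that raise `ValueError` with a readable message, so `ask_for_length` only has to show the message and ask again.

```python
def parse_length(text, minimum=1, maximum=10_000):
    """Convert text to an int in [minimum, maximum] or raise ValueError."""
    try:
        value = int(text)
    except ValueError:
        raise ValueError(f"'{text}' is not a whole number") from None
    if not minimum <= value <= maximum:
        raise ValueError(f"{value} is outside the range {minimum}-{maximum}")
    return value


def check_dna(sequence):
    """Return sequence in upper case, or raise ValueError if it is not DNA."""
    invalid = set(sequence.upper()) - set("ACGT")
    if invalid:
        letters = "".join(sorted(invalid))
        raise ValueError(f"Not a DNA sequence; unexpected characters: {letters}")
    return sequence.upper()


def ask_for_length():
    """Keep asking until the user types an acceptable length."""
    length = None
    while length is None:
        try:
            length = parse_length(input("Sequence length: "))
        except ValueError as error:
            print("Invalid input:", error)   # tell the user what is wrong
    return length
```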

Error scenarios

Even when the program is fed input that is correct, at least on the surface, situations can occur where the program breaks down or produces wrong output. This is often due to extreme cases that the programmer does not care about or cannot imagine occurring. Nonetheless, it is possible to find and handle such errors by observing inconsistencies in the result. A good way of finding such error scenarios is to try to imagine some input that will cause the program to break. Good code is robust and deals with unforeseen input.
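A typical error scenario is an empty or lower-case sequence given to a GC-content calculation: the input is "valid DNA", yet a naive `count / len(sequence)` crashes with `ZeroDivisionError` or silently returns 0 for lower-case letters. The sketch below (hypothetical `gc_fraction`) handles both; returning 0.0 for an empty sequence is an assumed convention, and raising a clear error would be an equally valid choice.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C in sequence (case-insensitive)."""
    # Edge case: an empty sequence would divide by zero
    if len(sequence) == 0:
        return 0.0

    # Edge case: lower-case input would otherwise count as no G/C at all
    upper = sequence.upper()
    gc = upper.count("G") + upper.count("C")
    return gc / len(upper)
```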

Modularization

Part of modularization is correctly placed comments and spacing, but it very much has to do with the code itself.
On a small scale, when segmenting the code into parts/steps, the segmentation should be true, i.e. a line (or more) of code that conceptually belongs to one part should not be placed in another part. This can be (and often is) as simple as placing the empty lines in the right place, or it can involve more complex considerations about which code belongs to which part. If a piece of code consists of a mix of two (or more) segments, it is not well modularized.
On a larger scale, a number of smaller code segments can constitute a larger part of the code that is self-contained (has little or no interaction with other parts of the program) and has well-defined entry and exit conditions (the data looks like this when we enter and like this when we exit). The proper use of functions/classes/libraries is a strong part of good modularization.
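A sketch of larger-scale modularization, assuming FASTA input (all names are hypothetical). Each function is self-contained and states its entry and exit conditions in its docstring: `read_fasta` only parses, `compute_gc` only computes, and `gc_table` just connects the two parts. For simplicity the whole header line after ">" is used as the ID.

```python
def read_fasta(lines):
    """Entry: iterable of FASTA lines. Exit: dict mapping id -> sequence."""
    parts = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current_id = line[1:].strip()
            parts[current_id] = []
        elif line and current_id is not None:
            parts[current_id].append(line.upper())
    # Join the collected pieces once per sequence
    return {seq_id: "".join(chunks) for seq_id, chunks in parts.items()}


def compute_gc(sequence):
    """Entry: non-empty upper-case DNA string. Exit: fraction of G/C."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def gc_table(lines):
    """Parse, then compute; empty sequences are left out."""
    sequences = read_fasta(lines)
    return {seq_id: compute_gc(seq) for seq_id, seq in sequences.items() if seq}
```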

Correctness

This is simply the degree to which the solution(s) give the right answer when using the data set supplied with the exercise, or similar natural data sets.
"Natural data sets" in this context are data sets that

  • are NOT specially constructed to catch minor flaws in the programming.
  • do NOT differ significantly from what has been used in exercises.
  • are NOT unlikely to occur in Real Life.

Algorithm clarity

This is the highest level of abstraction covered. It is a measure of how easy it is to follow the "purpose" of the code and the "ways" or "mechanisms" it uses to achieve this purpose. It could also be described as how easy it is to follow the thoughts and logic of the programmer. All the previous elements are part of this evaluation. This evaluation can be somewhat subjective, so be careful about that.

Anti-Patterns

Anti-patterns are also called “bad practices”, “dark patterns”, or “pitfalls”.
An anti-pattern is just like a pattern, except that instead of a solution it gives something that looks superficially like a solution, but isn’t one.
Andrew Koenig

https://sourcemaking.com/antipatterns/software-development-antipatterns