
Saturday, October 29, 2011

Overview: The Intel C/C++/Fortran compilers


One of the worst things a person with no stable internet connection (like me) can do is install Ubuntu Linux. Based on Debian, it boasts extensible design, support and popularity, but ships with little to no third-party software. I also believe it to be the only distribution that comes without any development tools. No compilers, no IDEs, no translators. Nothing. What that means is that I, a student of CSE, cannot write and compile programs. Until I install a compiler. But seeing how I do not have an active connection, I cannot do that.

So I posted on Facebook about my issue, and sooner or later someone brought me a .tar archive of the Intel Compiler Suite, along with its IDE, in .deb format. OK, now we have the compiler, so let us install it.

I fired up the console, started the install process, entered my (legal!) serial number, and... the installer asked for my GCC directory.

Wait, what?


That's right: apparently, you need to have a compiler installed to install a compiler. It seems the Intel Compiler Suite makes some pretty interesting calls into its main competitor. I bet Dr. Stallman will laugh at this for decades.

So now we're back to square one, with me trying to download GCC. So much for the ICS.

Algorithm development: Computer programming vs Mathematics




There has been a lot of debate between me and my colleagues over whether mathematical skill is necessary for computer programming. While I used to be a strong advocate of the "logical" programming style, I have recently seen the light, so to speak, regarding the use of mathematics and its application in computer science.
So, what does my oh-so-important opinion state nowadays? Let us look at it from start to finish.

C o m p u t i n g  H i s t o r y 


Today's computers are used in almost none of the ways they were meant to be. Today we send email, open up Facebook, listen to music and do all sorts of weird stuff only sleep-deprived maniacs do (e.g. program). But the original computers were meant for calculating missile trajectories and decrypting enemy messages. Cool and flashy, huh? I bet all of us have written a few dozen uber-programs like that. Jokes aside, computers were designed to be the perfect calculator, which they were, and still are. A computer doesn't do anything except shift bit patterns to simulate addition, subtraction, multiplication and division, and it didn't do much less in ages past. In fact, it's about the same. So when asked what the basic role of a computer is, one has to ask oneself whether it really is "uhm... stuff", or automated maths.

P r o g r a m m i n g  T h e o r y :  A l g o r i t h m  d e s i g n 


I am not really going to cover everything I have to say about the intricacies of algorithm design and research. Needless to say, it is a broad topic best discussed in person, and with a really big chalkboard (or, in my case, a brown door. And chalk.) But what I always notice when working with, or teaching, novice programmers is that they skip a big part of the software creation process. Namely, they do not design. They start hacking away, writing code and thinking as they go, and always end up wondering why it doesn't work, or doesn't work well. As many times as I have said this, programming does not work that way.

Yet I keep wondering whether it is really their fault.

Most CS courses and classes start straight away with programming and do not focus on algorithms, which I find wrong. The thing about algorithms, however, is that there are exactly two ways to develop one:

-Either logically
-Or mathematically

This does not, however, imply that there are only two algorithms for every problem in existence. No, there are probably many, many more. But these two general mindsets, these two categorizations of algorithms, are all there is. I used to believe that any problem could be solved logically. However, I have recently been proven wrong. Some problems, like file formats, low-level programming and so on, can only be solved mathematically. There is practically no way for one to do low-level programming without Fourier transforms and triple integrals. One cannot write graphics programs without a very, very firm grasp of geometry, trigonometry and calculus. Computer security and networking are all based on numerical methods and mathematical analysis. And even if there is a logical solution to a problem, rest assured that it is always both slower and less elegant than the mathematical one.
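To make the contrast concrete, here is a toy example of my own (deliberately simple, and not from any textbook): summing the first n integers. The "logical" version walks the whole range; the "mathematical" version uses Gauss's closed formula and finishes in constant time.

#include <stdio.h>

/* "Logical" approach: visit every number and accumulate. O(n). */
unsigned long sum_loop(unsigned long n)
{
    unsigned long sum = 0;
    for (unsigned long i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* "Mathematical" approach: Gauss's formula n(n+1)/2. O(1). */
unsigned long sum_formula(unsigned long n)
{
    return n * (n + 1) / 2;
}

int main(void)
{
    printf("%lu %lu\n", sum_loop(100), sum_formula(100));
    return 0;
}

Both print 5050, but only one of them would still be looping if n had ten more digits.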

Now, this stems from two, and only two, things (which are essentially the same). First of all, computers were never meant to be logic machines. In fact, they are far too stupid for that. Instead, they were meant for automating mathematical problems. And second, while you can simulate logical principles with computers, you can't really do it well. There have been attempts to rectify this, like the programming languages Prolog and Lisp, but we all know how many people use them. And as souped-up as AI seems to be these days, I haven't met a compiler that could out-program me in assembly, nor an AI program that can beat me at any strategy game, unless it is a ridiculously simple one with the solutions hard-coded.

My conclusion and final words are ones of remorse. Kids, learn maths. Perfect algebra. Master trigonometry. Own calculus. I didn't, and now I'm paying for it; I am even seriously thinking about dropping out of university because of it. Don't do what I did, and rest assured that you'll become a far better computer programmer than I ever will.

Friday, October 28, 2011

Programming theory: Static vs Dynamic compilation



There was a bit of debate in my structured programming class about whether it is possible to declare an array with a number of elements dependent on a variable. I argued for, because there is nothing in the C language standard that forbids it. The assistant professor, however, argued against, because of static compilation. Seeing how it turned into a rather interesting debate, I thought I'd post about it, and about how both of us came to our conclusions. To do that, however, we will first need some

C o m p i l e r   t h e o r y :   t h e  a b s o l u t e  b a s i c s




First of all, let's settle on compilers instead of interpreters. Of course, this debate could be extended to interpreters, but a) they ultimately hand the code over to some compiling machinery anyway, and b) they always allocate memory dynamically. So it would be a moot debate in the first place.

Now that we have settled that difference, let us examine the bare-bones basics of compilers. A compiler is a software package that takes instructions written in one language as input and translates them into machine-compliant instructions as output. That dictates that a compiler consists of three parts: a lexical parser/analyzer, which scans the entire code and makes sense of it; an optimizer, which speeds up our code before it is actually compiled; and a translator, which performs the final stage of compilation (reserving memory included).

Now that we have that bit of information, let us focus on the second and third parts of the compiler, as the parser doesn't actually affect memory allocation. Assume the following visualization of the Java interpreting system, in which code either goes straight to the interpreter, or first passes through an accelerator:


Apparently, there are two branches here, leading to different code speeds. Now this is because of two very different approaches to compilation: static compilation, and dynamic compilation.

A static compiler does everything in a minimal number of passes. It takes the code, calculates line by line how much memory you need, and requests that the operating system allocate it. This does indeed mean that memory gets a bit inflexible, as the compiler needs to know how much memory you need beforehand: you cannot allocate it at runtime. Not only does this mean you cannot declare an array x whose size is given by a variable, it also means you cannot use dynamic structures, such as dynamic arrays. It likewise rules out some of the more interesting forms of linked lists, which expand on demand and use pointers to link their nodes instead of adjacent memory addresses, as arrays do. Most traditional compilers, like gcc (one of the very, very best), work on this principle, and it's all good. Until you need something more, that is.
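For reference, here is a minimal sketch of the exact construct we argued about in class. Note that the C99 revision of the standard explicitly legalized these "variable-length arrays", which is the side I was arguing:

#include <stdio.h>

int main(void)
{
    int n;
    if (scanf("%d", &n) != 1 || n <= 0)
        return 1;

    int arr[n]; /* the array size depends on a variable:
                   a C99 VLA, allocated at runtime on the stack */
    for (int i = 0; i < n; i++)
        arr[i] = i * i;

    printf("last element: %d\n", arr[n - 1]);
    return 0;
}

Whether a given compiler accepts it is another matter; gcc does, in its default modes.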

Now let us look at the second branch of the diagram. Instead of going directly to the interpreter, the code first passes through the Java Accelerator. The Accelerator is a semi-compiler based on dynamic allocation, also known as a JIT or "lazy" compiler. The lazy approach is a bit different. Instead of parsing everything in a single pass, or close to it, it goes on and on, simplifying statements until it reaches their simplest form. For a practical demonstration, assume the following algorithm sample:

exec start:
var x = 3, y = 4, sum = 0
const z = 12
sum = x*y - z
exec stop

A fairly simple code snippet, as anyone would agree. It takes two variables (3 and 4), multiplies them, and assigns the difference between that product and a constant (12) to sum.

Let us see how a traditional compiler would parse this:

-Notice 4 variables, all integer type.
-Assign memory for the 4 variables (in C, this would be 4x4 = 16 bytes minimum.)
-Assign values to x, y, z, sum
-Calculate x*y
-Calculate (x*y)-z
-Assign ((x*y)-z) to sum

Six steps, and perhaps a dozen machine instructions or more, depending on optimization. Rather slow at the hardware level.
Now let us look at the JIT approach:

-Notice 4 variables, all integer types
-Assign memory for the variables
-Assign values
-Calculate x*y
-Calculate (x*y)-z

And there isn't an assignment statement after this. Wait, what? No assignment to sum?

Nope. sum already contains that value. Why would you reassign it? See, a lazy (JIT) compiler doesn't do anything until it absolutely needs to. That includes assignment and memory allocation. Not only do you lose the unnecessary assignment, the memory allocation itself is also delayed. Which implies that memory is assigned at runtime, and that makes the variable-sized array declaration perfectly legal.
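You can try the static half of this experiment yourself. Here is my C rendering of the snippet above (a sketch, nothing more): feed it to an optimizing compiler like gcc with -O2, and the whole expression is folded away at compile time, the same "never compute more than you must" idea applied ahead of time instead of just in time:

#include <stdio.h>

int main(void)
{
    int x = 3, y = 4, sum = 0;
    const int z = 12;

    sum = x * y - z; /* with gcc -O2 this whole line is folded at
                        compile time; the generated code just uses 0 */
    printf("%d\n", sum);
    return 0;
}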

Now, some argue about the speed of JIT compilers, but most modern compilers are JIT. The CLI compiler (the .NET architecture) is JIT, for example. And JIT offers more than just optimization: it would be impossible to write self-modifying code without JIT and dynamic compilation, to name one thing.

Tuesday, October 25, 2011

OS Review: Ubuntu 11.10 with Unity desktop






There is nothing I enjoy writing about more than operating systems, and the Unix variants have always been my number one love in journalism. More than an opportunity to inform you, the reader, about what is new, what is now deprecated and the best way to operate your hardware, it gives me an excuse to play around with low-level code, as well as explore an OS too many. But this particular version of the Linux system is something I was eager to write about, not because I am a fan of Ubuntu (which I am not), but because it is rather historic.


With the 11.10 version, Canonical has gone where no Linux vendor has gone before: they are literally showing both Microsoft and Apple the finger, and just after both have announced radical changes in the way their own systems interface with the user. Namely, just as Microsoft announced that Windows 8 would be tablet-centric and oriented toward low-end hardware, and revealed their new "streamlined" GUI, and just as Apple was about to release its own flavor of minimalistic yet attractive look-and-feel, Canonical quietly brought the Unity interface to Linux, saying "We did it better and faster than both of you". So how does Unity fare against its main competitors, both in the OS wars and in the intra-system battle against the established GNOME 3 and KDE 4?

1. Installation


 


Installation was always the big, scary monster of the Linux world, and surprisingly, Canonical never did much to improve it. Sure, they have the Wubi installer, but it rarely works; for 11.10, it simply doesn't. So I had to either do a classic live-CD install, or install from a thumb drive. Seeing how I didn't have a spare CD, I went with the thumb drive. After downloading a little bit of software - namely this one - I managed to turn my 1-gigabyte flash drive into a live, bootable medium. Thing is, since my drive is USB 2.0, I had to wait 30 minutes before everything was done with it, even though I have the uber 3.0 interface. With the classic CD-ROM approach, though, it boils down to the 2 minutes you need to burn a 600-megabyte image to disk.

Now here is where the fun starts: formatting hard drives. The Ubuntu format utility is pretty much terrible, as it is probably meant to be: cryptic enough to scam you into formatting your entire partition. Now, I have nothing against Ubuntu, but I don't plan on removing the Windows 7 Home Ultimate I paid so much for, so I had to reboot into Windows and write down the names of all the hard drives and partitions, so that I would know what I was partitioning and wouldn't butcher my system accidentally. After that, installation is a breeze: you get a huge repository to choose packages from - all .deb, of course - and the installation is fully automated. Then you just choose a language and a root password, and that's it.


2. Proprietary software


Ubuntu 11.10 comes with a bunch of third-party proprietary software, mostly drivers and codecs. Now, some might complain, but I'd say it's a good thing for everyone, except perhaps Richard Stallman. It means that Ubuntu is compatible with most hardware right off the bat: it recognized my graphics cards, all 8 processors, all my networking utilities (WiFi, WiMAX and Bluetooth) and all the exotics like USB 3.0, FireWire, Light Peak and even Sony MagicGate with no problems. In fact, it does so better than Windows does even after driver installation.

3. The user interface

OK, now we're getting somewhere. There's been a lot of buzz among Linux hackers about the Unity system: apparently, not too many people like it. I found it rather interesting for portable work, though. Simple and minimalistic enough to be used on a laptop, yet elegant enough to effectively finger Microsoft's Aero. Admittedly, it might not be everyone's brand of orange juice, but I find it far better than GNOME 3. Now, there are a few bugs here and there (but not too many and not too bad), and the left-side window panel is a bit strange for those of us coming from Windows and KDE, but overall, I like it. I don't see Microsoft and Canonical swapping market shares, but I do at least see Ubuntu making it into OEM machines rather more often. I haven't had the chance to test out its API, but I'm willing to place a good bet that it's going to be fun writing decent software for Unity.

At least for me, if not for the Free Software Foundation.


Final words


I've already written enough, but I don't believe I have effectively conveyed my opinion of it. I installed it expecting a rough Debian with a face-lift, but ended up really liking Ubuntu. I am thinking about installing it permanently, and that says something right there.

Free 64-bit SuSE DVDs, anyone?









Versus-approved

Monday, October 24, 2011

Issues of Peer to Peer file sharing



I was just asked via Facebook whether it is recommended to keep BearShare installed after downloading music from it. The answer took 10 minutes to lay out in layman's terms, so I decided to post my opinion here for future reference (basically, I'm too impatient to answer the same question again, and would rather have a link). The answer relates not just to BearShare, but to all P2P programs in general, like uTorrent, Morpheus, LimeWire etc.

First of all, while the average visitor here knows what P2P is, the average person does not. A peer-to-peer network is a protocol for transferring information between two or more equal sources. That means there is no server, and that all computers access information from each other. Namely, when you download a song via P2P, you are downloading it from a random person's computer. Now, this is kind of good, as you can find almost anything you are looking for, and it is usually faster than traditional server-client networking. It is bad, however, because you are downloading from a random bunch of people and have no idea who they are. For all you know, they could be crackers (known as "hackers" to the average John Doe) baiting you with potential malware.

Here is how that works:

Say I find out that a person I want to obtain information about likes polka music. So I get myself a file, mask it as an .mp3, and embed some pretty nasty scripts in it that retrieve the registry information from every computer holding the file and send it back to me. I then put that on a couple of P2P trackers (like BearShare) and wait for that person to hit the file for download. Now, while it doesn't necessarily mean that the person I am baiting will get it, I am willing to place a 2,000-dollar bet that someone will. And voila: I have access to your computer. Now I can read your emails, send spam messages, commit fraud and steal your passwords and Personal Identification Numbers. And that is not good for anyone except me (and the FBI busting down my doors yet again).


There is another reason why P2P client programs are bad, and that is that they are annoying. I don't just mean that they install random toolbars and change your homepage (which they do anyway); I mean that they are set to start up as soon as your computer does. That means they are always on, leaving you exposed to cracker ("hacker" for all you John Does, again) attacks and eating up system resources, slowing down your computer. So if you have one, make sure to have an up-to-date, quality antivirus program (I mean Avast, not Avira), to turn the client off manually from the system tray whenever you are done with it, and to use a startup management program that prevents it from starting at boot-up.

Data deletion and retrieval



The problem here is, it doesn't actually work. Deleting files is, in reality, far more involved than emptying your recycle bin. Since the hard drive is basically just a big magnetic plate (or a stack of smaller magnetic plates), it stores everything as bits and bytes of information. And since a single bit ranges from 0 to 1, a single byte goes from 0 to 255. Now, if you had a 16-byte text file, and it went something like 10010110 01100101 etc., what would the deleted version of the file be?

Bet you don't know.

Simple: the deleted version of the file is the file itself. See, since a byte ranges from 0 to 255, there is no "empty" state for memory blocks. What happens when you delete a file is that your operating system just removes the reference to that block (or those blocks, as a file rarely occupies only one block) from its table. The file is very much alive until you write something over its initial position on the drive, and that could take quite a while. So there is actually a neat trick for restoring data from deleted files: you just read them as if they were still there. Now, this can be done by hand, but it is a major hassle, so it is far smarter and easier to use a data retrieval utility, like this bit of freeware here. It will automatically search your hard drive, comparing blocks against your OS's references, and whatever doesn't match is your file. The algorithm is simple, and usually works. Just to be safe, though, I recommend keeping this kind of software on external media, such as a thumb drive, so you don't overwrite the very files you need while downloading and installing it.
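To show just how alive "deleted" data is, here is a crude sketch of the read-them-as-if-they-were-still-there idea (my own toy, not the algorithm that particular utility uses): scan a raw disk image for the JPEG magic bytes, no filesystem references needed at all.

#include <stdio.h>

/* Toy file carver: walk a raw disk image byte by byte and report
   every spot where the JPEG signature (0xFF 0xD8 0xFF) appears,
   i.e. where a deleted-but-unreferenced picture might start. */
int main(int argc, char **argv)
{
    if (argc != 2)
        return 1;
    FILE *img = fopen(argv[1], "rb");
    if (!img)
        return 1;

    int a = 0, b = 0, c;  /* sliding window over the last two bytes */
    long off = 0;
    while ((c = fgetc(img)) != EOF) {
        if (a == 0xFF && b == 0xD8 && c == 0xFF)
            printf("possible JPEG at offset %ld\n", off - 2);
        a = b;
        b = c;
        off++;
    }
    fclose(img);
    return 0;
}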

So now that we've raised the dead, how do we put them back in their graves for good?

As I said, by overwriting them with other data. Now, this isn't easy, and it is usually done in multiple passes, seven or so. And to do it, you need some special ware, like this guy here - also freeware. There are commercial utilities, as well as bigger software packages, that do this too; personally, I use the wiper in my Norton Internet Security for data wiping. Now, the catch in actually deleting the files is that not only do you want to run the procedure n times, but on the n-th pass you need to write all zeros. Why? Well, it's cool. Also, no file should consist of all zeros, as most OSes use zeros somewhere in the pattern to indicate non-referenced blocks, or something along those lines.
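The core of such a wiper is nothing magical, by the way. A minimal sketch, assuming the simplest possible scheme (random passes, zeros last); real tools also force the data onto the physical platters and deal with filesystem quirks, which this toy does not:

#include <stdio.h>
#include <stdlib.h>

/* Toy file wiper: overwrite the file's bytes in place several times
   with pseudo-random junk, then once with zeros, then drop the
   directory reference. */
static int wipe(const char *path, int passes)
{
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);

    for (int p = 0; p <= passes; p++) {
        rewind(f);
        for (long i = 0; i < size; i++)
            fputc(p < passes ? rand() % 256 : 0, f); /* zeros on the last pass */
        fflush(f);
    }
    fclose(f);
    return remove(path);
}

int main(int argc, char **argv)
{
    if (argc != 2)
        return 1;
    return wipe(argv[1], 7) == 0 ? 0 : 1;
}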


Well, regardless of the algorithms these programs use (they seem simple enough, at least at first glance), they are handy bits of software to have around. We all have files we need deleted, and we have all experienced the frustration of deleting a file we actually need. So get those two, put them on a thumb drive, and set them to read-only so you don't delete them. Otherwise you'll have a hassle un-deleting your un-deleter.

Sunday, October 23, 2011

Python: Object-oriented programming


I looked up some of the object-oriented features of Python last night, and apparently, it doesn't fare badly here. In fact, its object orientation is more similar to Modula-3 than to Java, which is inherently a good thing. Namely, one can program in the object-oriented paradigm, but isn't obligated to. Classes can be defined with no members at all, and data members can be added dynamically, but methods cannot. Everything is public, which is weird, but livable with. What's also kind of weird is that everything in Python seems to be an object. Namely, everything inherits from the object class, whether it's a simple integer or an uber-special epic data type. Cool in some ways, bad with multiple inheritance, which is otherwise rather well supported, even though it's optimization hell with the multiple inheritance routes back to object. Method parameters are passed by reference, which makes up for the lack of pointers in some cases; sadly, not well and not everywhere, as it's a major hassle to wrap up everything you want passed by reference in a class. I will definitely need to ask some Python programmers at the hacklab how on earth they handle this problem (though I'll bet they'll tell me to wrap in classes, then pass).
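A quick sketch of the bits above, in Python itself (my own toy example, nothing more):

class Point(object):
    pass                          # a class with no members at all

p = Point()
p.x, p.y = 3, 4                   # data members added dynamically, per instance
print(p.x * p.y)                  # 12

print(isinstance(1, object))      # True: even a plain integer is an object

class Named(object):
    pass

class Aged(object):
    pass

class Person(Named, Aged):        # multiple inheritance; both routes lead back to object
    pass

print(Person.__mro__)             # the method lookup order, ending at object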