Strukture podataka i algoritmi u Javi, 1. dio: Pregled

Java programeri koriste podatkovne strukture za pohranu i organiziranje podataka, a mi koristimo algoritme za manipulaciju podacima u tim strukturama. Što više razumijete strukture podataka i algoritme i kako oni rade zajedno, to će vaši Java programi biti učinkovitiji.

Ovaj tutorial pokreće kratku seriju koja predstavlja strukture podataka i algoritme. U 1. dijelu naučit ćete što je struktura podataka i kako su strukture podataka klasificirane. Također ćete naučiti što je algoritam, kako su algoritmi predstavljeni i kako koristiti funkcije složenosti vremena i prostora za usporedbu sličnih algoritama. Kad steknete ove osnove, bit ćete spremni naučiti o pretraživanju i sortiranju s jednodimenzionalnim nizovima, u 2. dijelu.

Što je struktura podataka?

Strukture podataka temelje se na apstraktnim vrstama podataka (ADT), što Wikipedia definira kako slijedi:

[A] matematički model za tipove podataka gdje je vrsta podataka definirana svojim ponašanjem (semantikom) sa stajališta korisnika podataka, posebno u pogledu mogućih vrijednosti, mogućih operacija s podacima ove vrste i ponašanja ovih operacija.

ADT-a nije briga za memorijski prikaz njegovih vrijednosti ili kako se njegove operacije provode. To je poput Java sučelja, što je vrsta podataka koja nije povezana s bilo kojom implementacijom. Suprotno tome, struktura podataka je konkretna implementacija jednog ili više ADT-ova, slično onome kako Java klase implementiraju sučelja.

Primjeri ADT-a uključuju zaposlenika, vozilo, niz i popis. Razmotrimo ADT popisa (poznat i kao Sequence ADT) koji opisuje uređenu zbirku elemenata koji dijele zajednički tip. Svaki element u ovoj zbirci ima svoj položaj i dopušteni su duplicirani elementi. Osnovne operacije podržane List ADT uključuju:

  • Stvaranje novog i praznog popisa
  • Dodavanje vrijednosti na kraj popisa
  • Umetanje vrijednosti u popis
  • Brisanje vrijednosti s popisa
  • Iteracija po popisu
  • Uništavanje popisa

Strukture podataka koje mogu implementirati popis ADT uključuju jednodimenzionalne nizove fiksne veličine i dinamički veličine i pojedinačno povezane popise. (S poljima ćete se upoznati u 2. dijelu, a povezani popisi u 3. dijelu.)

Klasificiranje struktura podataka

Postoje mnoge vrste struktura podataka, od pojedinačnih varijabli do nizova ili povezanih popisa objekata koji sadrže više polja. Sve se strukture podataka mogu klasificirati kao primitivi ili agregati, a neke se klasificiraju kao spremnici.

Primitivi vs agregati

Najjednostavnija vrsta strukture podataka pohranjuje pojedine podatke; na primjer, varijabla koja pohranjuje logičku vrijednost ili varijabla koja pohranjuje cijeli broj. Takve strukture podataka nazivam primitivima .

Mnoge podatkovne strukture mogu pohraniti više podataka. Na primjer, niz može pohraniti više podataka u različite utora, a objekt može pohraniti više podataka kroz svoja polja. Te strukture podataka nazivam agregatima .

Sve strukture podataka koje ćemo pogledati u ovoj seriji su agregati.

Spremnici

Sve iz čega se pohranjuju i preuzimaju stavke podataka moglo bi se smatrati strukturom podataka. Primjeri uključuju strukture podataka izvedene iz prethodno spomenutih ADT-ova zaposlenika, vozila, niza i popisa.

Mnoge su strukture podataka dizajnirane za opis različitih entiteta. Primjerci Employeeklase su strukture podataka koje postoje da bi, na primjer, opisale razne zaposlenike. Suprotno tome, neke strukture podataka postoje kao generičke posude za pohranu drugih struktura podataka. Na primjer, niz može pohraniti primitivne vrijednosti ili reference na objekt. Ovu posljednju kategoriju podatkovnih struktura nazivam spremnicima .

Osim što su agregati, sve strukture podataka koje ćemo pogledati u ovoj seriji su i spremnici.

Strukture podataka i algoritmi u Java kolekcijama

Okvir Java Collections podržava mnoge vrste struktura podataka orijentiranih na spremnik i pridružene algoritme. Ova serija pomoći će vam da bolje razumijete ovaj okvir.

Uzorci dizajna i strukture podataka

Postalo je prilično uobičajeno koristiti obrasce dizajna za upoznavanje studenata sa strukturama podataka. Članak Sveučilišta Brown istražuje nekoliko obrazaca dizajna koji su korisni za dizajniranje visokokvalitetnih struktura podataka. Između ostalog, rad pokazuje da je obrazac adaptera koristan za dizajniranje hrpa i redova. Demonstracijski kôd prikazan je u Popisu 1.

Popis 1. Korištenje uzorka adaptera za hrpe i redove (DequeStack.java)

public class DequeStack implements Stack { Deque D; // holds the elements of the stack public DequeStack() { D = new MyDeque(); } @Override public int size() { return D.size(); } @Override public boolean isEmpty() { return D.isEmpty(); } @Override public void push(Object obj) { D.insertLast(obj); } @Override public Object top() throws StackEmptyException { try { return D.lastElement(); } catch(DequeEmptyException err) { throw new StackEmptyException(); } } @Override public Object pop() throws StackEmptyException { try { return D.removeLast(); } catch(DequeEmptyException err) { throw new StackEmptyException(); } } }

Popis 1 izvlači DequeStacknastavu iz časopisa Brown University , koja pokazuje uzorak adaptera. Imajte na umu da Stacki Dequejesu sučelja koja opisuju Stack i Deque ADT-ove. MyDequeje klasa koja provodi Deque.

Nadjačavanje metoda sučelja

Izvorni kod koji Prikazane 1 temelji se na ne predstavljaju izvorni kod Stack, Dequei MyDeque. Radi jasnoće, uveo sam @Overridebilješke kako bih pokazao da sve DequeStackne-konstruktorske metode nadjačavaju Stackmetode.

DequeStackprilagođava MyDequetako da može implementirati Stack. Sve DequeStackmetode su jednoslojni pozivi Dequemetoda sučelja. Međutim, postoji mala bora u kojoj se Dequeiznimke pretvaraju u Stackiznimke.

What is an algorithm?

Historically used as a tool for mathematical computation, algorithms are deeply connected with computer science, and with data structures in particular. An algorithm is a sequence of instructions that accomplishes a task in a finite period of time. Qualities of an algorithm are as follows:

  • Receives zero or more inputs
  • Produces at least one output
  • Consists of clear and unambiguous instructions
  • Terminates after a finite number of steps
  • Is basic enough that a person can carry it out using a pencil and paper

Note that while programs may be algorithmic in nature, many programs do not terminate without external intervention.

Many code sequences qualify as algorithms. One example is a code sequence that prints a report. More famously, Euclid's algorithm is used to calculate the mathematical greatest common divisor. A case could even be made that a data structure's basic operations (such as store value in array slot) are algorithms. In this series, for the most part, I'll focus on higher-level algorithms used to process data structures, such as the Binary Search and Matrix Multiplication algorithms.

Flowcharts and pseudocode

How do you represent an algorithm? Writing code before fully understanding its underlying algorithm can lead to bugs, so what's a better alternative? Two options are flowcharts and pseudocode.

Using flowcharts to represent algorithms

A flowchart is a visual representation of an algorithm's control flow. This representation illustrates statements that need to be executed, decisions that need to be made, logic flow (for iteration and other purposes), and terminals that indicate start and end points. Figure 1 reveals the various symbols that flowcharts use to visualize algorithms.

Consider an algorithm that initializes a counter to 0, reads characters until a newline (\n) character is seen, increments the counter for each digit character that's been read, and prints the counter's value after the newline character has been read. The flowchart in Figure 2 illustrates this algorithm's control flow.

A flowchart's simplicity and its ability to present an algorithm's control flow visually (so that it's is easy to follow) are its major advantages. Flowcharts also have several disadvantages, however:

  • It's easy to introduce errors or inaccuracies into highly-detailed flowcharts because of the tedium associated with drawing them.
  • It takes time to position, label, and connect a flowchart's symbols, even using tools to speed up this process. This delay might slow your understanding of an algorithm.
  • Flowcharts belong to the structured programming era and aren't as useful in an object-oriented context. In contrast, the Unified Modeling Language (UML) is more appropriate for creating object-oriented visual representations.

Using pseudocode to represent algorithms

An alternative to flowcharts is pseudocode, which is a textual representation of an algorithm that approximates the final source code. Pseudocode is useful for quickly writing down an algorithm's representation. Because syntax is not a concern, there are no hard-and-fast rules for writing pseudocode.

You should strive for consistency when writing pseudocode. Being consistent will make it much easier to translate the pseudocode into actual source code. For example, consider the following pseudocode representation of the previous counter-oriented flowchart:

 DECLARE CHARACTER ch = '' DECLARE INTEGER count = 0 DO READ ch IF ch GE '0' AND ch LE '9' THEN count = count + 1 END IF UNTIL ch EQ '\n' PRINT count END

The pseudocode first presents a couple of DECLARE statements that introduce variables ch and count, initialized to default values. It then presents a DO loop that executes UNTILch contains \n (the newline character), at which point the loop ends and a PRINT statement outputs count's value.

For each loop iteration, READ causes a character to be read from the keyboard (or perhaps a file--in this case it doesn't matter what constitutes the underlying input source) and assigned to ch. If this character is a digit (one of 0 through 9), count is incremented by 1.

Choosing the right algorithm

The data structures and algorithms you use critically affect two factors in your applications:

  1. Memory usage (for data structures).
  2. CPU time (for algorithms that interact with those data structures).

It follows that you should be especially mindful of the algorithms and data structures you use for applications that will process lots of data. These include applications used for big data and the Internet of Things.

Balancing memory and CPU

When choosing a data structure or algorithm, you will sometimes discover an inverse relationship between memory usage and CPU time: the less memory a data structure uses, the more CPU time associated algorithms need to process the data structure's data items. Also, the more memory a data structure uses, the less CPU time associated algorithms will need to process the data items–leading to faster algorithm results.

As much as possible, you should strive to balance memory use with CPU time. You can simplify this task by analyzing algorithms to determine their efficiency. How well does one algorithm perform against another of similar nature? Answering this question will help you make good choices given a choice between multiple algorithms.

Measuring algorithm efficiency

Some algorithms perform better than others. For example, the Binary Search algorithm is almost always more efficient than the Linear Search algorithm–something you'll see for yourself in Part 2. You want to choose the most efficient algorithm for your application's needs, but that choice might not be as obvious as you would think.

For instance, what does it mean if the Selection Sort algorithm (introduced in Part 2) takes 0.4 seconds to sort 10,000 integers on a given machine? That benchmark is only valid for that particular machine, that particular implementation of the algorithm, and for the size of the input data.

As computer scientist, we use time complexity and space complexity to measure an algorithm's efficiency, distilling these into complexity functions to abstract implementation and runtime environment details. Complexity functions reveal the variance in an algorithm's time and space requirements based on the amount of input data:

  • A time-complexity function measures an algorithm's time complexity--meaning how long an algorithm takes to complete.
  • Funkcija prostor složenost mjeri algoritam je složenost prostora --meaning količinu memorije nadzemne zahtijeva algoritma za obavljanje njegove zadaće.

Obje funkcije složenosti temelje se na veličini ulaznih podataka ( n ), što nekako odražava količinu ulaznih podataka. Uzmite u obzir sljedeći pseudokod za ispis polja:

 DECLARE INTEGER i, x[] = [ 10, 15, -1, 32 ] FOR i = 0 TO LENGTH(x) - 1 PRINT x[i] NEXT i END

Funkcije vremenske složenosti i složenosti vremena

Složenost vremena ovog algoritma možete izraziti specificiranjem funkcije vremenske složenosti , gdje (konstantni množitelj) predstavlja količinu vremena za završetak jedne iteracije petlje i predstavlja vrijeme postavljanja algoritma. U ovom primjeru vremenska složenost je linearna.t(n) = an+bab